XS::Parse::Keyword(3) User Contributed Perl Documentation XS::Parse::Keyword(3)

NAME

       "XS::Parse::Keyword" - XS functions to assist in parsing keyword syntax

DESCRIPTION

       This module provides some XS functions to assist in writing syntax
       modules that provide new perl-visible syntax, primarily for authors of
       keyword plugins using the "PL_keyword_plugin" hook mechanism. It is
       unlikely to be of much use to anyone else, and highly unlikely to be
       of any use when writing Perl code that merely uses such syntax modules.
       Unless you are writing a keyword plugin using XS, this module is not
       for you.

       This module is also currently experimental, and the design is still
       evolving and subject to change. Later versions may break ABI
       compatibility, requiring changes or at least a rebuild of any module
       that depends on it.

XS FUNCTIONS

   boot_xs_parse_keyword
          void boot_xs_parse_keyword(double ver);

       Call this function from your "BOOT" section in order to initialise the
       module and parsing hooks.

       ver should either be 0 or a decimal number for the module version
       requirement; e.g.

          boot_xs_parse_keyword(0.14);

   register_xs_parse_keyword
          void register_xs_parse_keyword(const char *keyword,
            const struct XSParseKeywordHooks *hooks, void *hookdata);

       This function installs a set of parsing hooks to be associated with the
       given keyword. Such a keyword will then be handled automatically by a
       keyword parser installed by "XS::Parse::Keyword" itself.

PARSE HOOKS

       The "XSParseKeywordHooks" structure provides the following hook stages,
       which are invoked in the given order.

   flags
       The following flags are defined:

       "XPK_FLAG_EXPR"
           The parse or build function is expected to return
           "KEYWORD_PLUGIN_EXPR".

       "XPK_FLAG_STMT"
           The parse or build function is expected to return
           "KEYWORD_PLUGIN_STMT".

           These two flags mainly provide static information at registration
           time, so that static parsing or other related tasks can know what
           kind of grammatical element this keyword will produce.

       "XPK_FLAG_AUTOSEMI"
           The syntax forms a complete statement, which should be followed by
           a statement separator semicolon (";"). This semicolon is optional
           at the end of a block.

           The semicolon, if present, will be consumed automatically.

   The "permit" Stage
          const char *permit_hintkey;
          bool (*permit) (pTHX_ void *hookdata);

       Called by the installed keyword parser hook which is used to handle
       keywords registered by "register_xs_parse_keyword".

       As a shortcut for the common case, the "permit_hintkey" may point to a
       string to look up from the hints hash. If the given key name is not
       found in the hints hash then the keyword is not permitted. If the key
       is present then the "permit" function is invoked as normal.

       If the keyword has not been rejected because of a missing hint key, the
       function part of the stage is called next. It should decide whether the
       keyword is permitted at this time, perhaps by inspecting other lexical
       clues, and return true only if it is.

       Both the string and the function are optional. Either or both may be
       present. If neither is present then the keyword is always permitted,
       which is likely not what you wanted.
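
       The decision rules of this stage can be modelled in plain C. The
       following is a standalone sketch and not the module's implementation:
       the hints hash is replaced by a NULL-terminated array of strings, and
       the names model_permit_hooks, hint_present, keyword_permitted and
       always_no are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the two "permit" fields of the hooks structure. */
struct model_permit_hooks {
  const char *permit_hintkey;       /* may be NULL */
  bool (*permit)(void *hookdata);   /* may be NULL */
};

/* Stand-in for a hints hash lookup: hints is a NULL-terminated key list. */
static bool hint_present(const char *const *hints, const char *key)
{
  assert(hints != NULL && key != NULL);
  for (size_t i = 0; hints[i] != NULL; i++)
    if (strcmp(hints[i], key) == 0)
      return true;
  return false;
}

static bool keyword_permitted(const struct model_permit_hooks *hooks,
                              const char *const *hints, void *hookdata)
{
  /* A hint key that is absent from the hints hash rejects immediately. */
  if (hooks->permit_hintkey && !hint_present(hints, hooks->permit_hintkey))
    return false;
  /* Otherwise the function, if there is one, has the final say. */
  if (hooks->permit)
    return hooks->permit(hookdata);
  /* No function to consult: the keyword is permitted. */
  return true;
}

/* Example permit function that always refuses. */
static bool always_no(void *hookdata)
{
  (void)hookdata;
  return false;
}
```

       In this model a structure with both fields NULL always permits the
       keyword, which mirrors the warning above about leaving both unset.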

   The "check" Stage
          void (*check)(pTHX_ void *hookdata);

       Invoked once the keyword has been permitted. If present, this hook
       function can check the surrounding lexical context, state, or other
       information and throw an exception if it is unhappy that the keyword
       should apply in this position.

   The "parse" Stage
       This stage is invoked once the keyword has been checked, and actually
       parses the incoming text into an optree. It is implemented by calling
       the first of the following function pointers which is not NULL. The
       invoked function may optionally build an optree to represent the parsed
       syntax, and place it into the variable addressed by "out". If it does
       not, then a simple "OP_NULL" will be constructed in its place.

       "lex_read_space()" is called both before and after this stage is
       invoked, so in many simple cases the hook function itself does not need
       to bother with it.

          int (*parse)(pTHX_ OP **out, void *hookdata);

       If present, this should consume text from the parser buffer by invoking
       "lex_*" or "parse_*" functions and eventually return a
       "KEYWORD_PLUGIN_*" result value.

       This is the most generic and powerful of the options, but requires the
       most implementation work.

          int (*build)(pTHX_ OP **out, XSParseKeywordPiece *args[], size_t nargs, void *hookdata);

       If "parse" is not present, this is called instead after parsing a
       sequence of arguments, of types given by the pieces field, which should
       be a zero-terminated array of piece types.

       This alternative is somewhat less generic and powerful than providing
       "parse" yourself, but involves much less parsing work and is shorter
       and easier to implement.

          int (*build1)(pTHX_ OP **out, XSParseKeywordPiece *arg0, void *hookdata);

       If neither "parse" nor "build" is present, this is called as a simpler
       variant of "build" when only a single argument is required. It takes
       its type from the "piece1" field instead.
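
       The selection rule for the three alternatives can be sketched as a
       standalone C model. The names model_parse_hooks, choose_alternative
       and stub_hook are hypothetical, the function pointer types are
       simplified, and the ALT_NONE result for a structure with no
       alternative set is an assumption of the sketch rather than documented
       behaviour.

```c
#include <assert.h>
#include <stddef.h>

enum model_alternative { ALT_NONE, ALT_PARSE, ALT_BUILD, ALT_BUILD1 };

/* Simplified stand-in: only the presence of each pointer matters here. */
struct model_parse_hooks {
  int (*parse)(void);
  int (*build)(void);
  int (*build1)(void);
};

/* Picks the first alternative that is not NULL, in documented order. */
static enum model_alternative
choose_alternative(const struct model_parse_hooks *hooks)
{
  assert(hooks != NULL);
  if (hooks->parse)  return ALT_PARSE;   /* hook does all the parsing itself */
  if (hooks->build)  return ALT_BUILD;   /* arguments described by "pieces" */
  if (hooks->build1) return ALT_BUILD1;  /* single argument from "piece1" */
  return ALT_NONE;                       /* assumption: nothing to call */
}

/* Placeholder hook body so that a pointer can be non-NULL. */
static int stub_hook(void)
{
  return 0;
}
```

       Because the first non-NULL pointer wins, setting "parse" makes any
       "build" or "build1" in the same structure irrelevant in this model.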

PIECES AND PIECE TYPES

       When using the "build" or "build1" alternatives for the "parse" phase,
       the actual syntax is parsed automatically by this module, according to
       the specification given by the pieces or piece1 field. The result of
       that parsing step is placed into the args or arg0 parameter to the
       invoked function, using a "struct" type consisting of the following
       fields:

          typedef struct {
             union {
                OP *op;
                CV *cv;
                SV *sv;
                int i;
                struct {
                   SV *name;
                   SV *value;
                } attr;
                PADOFFSET padix;
                struct XSParseInfixInfo *infix;
             };
             int line;
          } XSParseKeywordPiece;

       Which field of the anonymous union is set depends on the type of the
       piece. The line field contains the line number of the source file
       where parsing of that piece began.

       Some piece types are "atomic", whose definition is self-contained.
       Others are structural, defined in terms of inner pieces. Together these
       form an entire tree-shaped definition of the syntax that the keyword
       expects to find.

       Atomic types generally provide exactly one argument into the list of
       args (with the exception of literal matches, which do not provide
       anything). Structural types may provide an initial argument
       themselves, followed by a list of the values of each sub-piece they
       contained inside them. Thus, while the data structure defining the
       syntax shape is a tree, the argument values it parses into are passed
       as a flat array to the "build" function.

       Some structural types need to be able to determine whether or not
       syntax relating to some optional part of them is present in the
       incoming source text. In this case, the pieces relating to those
       optional parts must support "probing". This ability is also noted
       below.

       The type of each piece should be one of the following macro values.

   XPK_BLOCK
       atomic, can probe, emits op.

          XPK_BLOCK

       A brace-delimited block of code is expected, passed as an optree in the
       op field. This will be parsed as a block within the current function
       scope.

       This can be probed by checking for the presence of an open-brace ("{")
       character.

       Be careful defining grammars with this because an open-brace is also a
       valid character to start a term expression, for example. Given a choice
       between "XPK_BLOCK" and "XPK_TERMEXPR", either of them could try to
       consume such code as

          { 123, 456 }

   XPK_BLOCK_VOIDCTX, XPK_BLOCK_SCALARCTX, XPK_BLOCK_LISTCTX
       Variants of "XPK_BLOCK" which wrap a void, scalar or list-context scope
       around the block.

   XPK_PREFIXED_BLOCK
       structural, emits op.

          XPK_PREFIXED_BLOCK(pieces ...)

       Some pieces are expected, followed by a brace-delimited block of code,
       which is passed as an optree in the op field. The prefix pieces are
       parsed first, and their results are passed before the block itself.

       The entire sequence, including the prefix items, is contained within a
       pair of "block_start()" / "block_end()" calls. This permits the prefix
       pieces to introduce new items into the lexical scope of the block - for
       example by the use of "XPK_LEXVAR_MY".

       A call to "intro_my()" is automatically made at the end of the prefix
       pieces, before the block itself is parsed, ensuring any new lexical
       variables are now visible.

       In addition, the following extra piece types are recognised here:

       XPK_SETUP
              void setup(pTHX_ void *hookdata);

              XPK_SETUP(&setup)

           atomic, emits nothing.

           This piece type runs a function given by pointer. Typically this
           function may be used to introduce new lexical state into the
           parser, or in some other way have some side-effect on the parsing
           context of the block to be parsed.

   XPK_PREFIXED_BLOCK_ENTERLEAVE
       A variant of "XPK_PREFIXED_BLOCK" which additionally wraps the entire
       parsing operation, including the "block_start()", "block_end()" and any
       calls to "XPK_SETUP" functions, within an "ENTER"/"LEAVE" pair.

       This should not make a difference to the standard parser pieces
       provided here, but may be useful behaviour for the code in the setup
       function, especially if it wishes to modify parser state and use the
       savestack to ensure it is restored again when parsing has finished.

   XPK_ANONSUB
       atomic, emits cv.

       A brace-delimited block of code is expected, and assembled into the
       body of a new anonymous subroutine. This will be passed as a protosub
       CV in the cv field.

   XPK_ARITHEXPR
       atomic, emits op.

          XPK_ARITHEXPR

       An arithmetic expression is expected, parsed using "parse_arithexpr()",
       and passed as an optree in the op field.

   XPK_ARITHEXPR_VOIDCTX, XPK_ARITHEXPR_SCALARCTX
       Variants of "XPK_ARITHEXPR" which put the expression in void or scalar
       context.

   XPK_TERMEXPR
       atomic, emits op.

          XPK_TERMEXPR

       A term expression is expected, parsed using "parse_termexpr()", and
       passed as an optree in the op field.

   XPK_TERMEXPR_VOIDCTX, XPK_TERMEXPR_SCALARCTX
       Variants of "XPK_TERMEXPR" which put the expression in void or scalar
       context.

   XPK_LISTEXPR
       atomic, emits op.

          XPK_LISTEXPR

       A list expression is expected, parsed using "parse_listexpr()", and
       passed as an optree in the op field.

   XPK_LISTEXPR_LISTCTX
       Variant of "XPK_LISTEXPR" which puts the expression in list context.

   XPK_IDENT, XPK_IDENT_OPT
       atomic, can probe, emits sv.

       A bareword identifier name is expected, and passed as an SV containing
       a PV in the sv field. An identifier is not permitted to contain a
       double colon ("::").

       The "_OPT"-suffixed version is optional; if no identifier is found then
       sv is set to "NULL".

   XPK_PACKAGENAME, XPK_PACKAGENAME_OPT
       atomic, can probe, emits sv.

       A bareword package name is expected, and passed as an SV containing a
       PV in the sv field. A package name is similar to an identifier, except
       it permits double colons in the middle.

       The "_OPT"-suffixed version is optional; if no package name is found
       then sv is set to "NULL".

   XPK_LEXVARNAME
       atomic, emits sv.

          XPK_LEXVARNAME(kind)

       A lexical variable name is expected, and passed as an SV containing a
       PV in the sv field. The "kind" argument specifies what kinds of
       variable are permitted, and should be a bitmask of one or more bits
       from "XPK_LEXVAR_SCALAR", "XPK_LEXVAR_ARRAY" and "XPK_LEXVAR_HASH". A
       convenient shortcut "XPK_LEXVAR_ANY" permits all three.
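
       How a "kind" bitmask filters variable names can be illustrated with a
       small standalone model. The MODEL_LEXVAR_* constants and their bit
       values are stand-ins invented for this sketch; real code should use
       the "XPK_LEXVAR_*" macros from the module's header, whose values are
       not stated here. The function sigil_permitted is likewise
       hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bit values for this sketch only. */
enum {
  MODEL_LEXVAR_SCALAR = 1 << 0,
  MODEL_LEXVAR_ARRAY  = 1 << 1,
  MODEL_LEXVAR_HASH   = 1 << 2,
  MODEL_LEXVAR_ANY    = MODEL_LEXVAR_SCALAR | MODEL_LEXVAR_ARRAY |
                        MODEL_LEXVAR_HASH
};

/* Returns true if a variable with the given sigil is allowed by kind. */
static bool sigil_permitted(char sigil, int kind)
{
  /* The mask must contain only the three known bits. */
  assert((kind & ~MODEL_LEXVAR_ANY) == 0);
  switch (sigil) {
    case '$': return (kind & MODEL_LEXVAR_SCALAR) != 0;
    case '@': return (kind & MODEL_LEXVAR_ARRAY)  != 0;
    case '%': return (kind & MODEL_LEXVAR_HASH)   != 0;
    default:  return false;  /* not a lexical variable sigil */
  }
}
```

       Combining bits with "|" widens what is accepted, so a mask of array
       and hash bits accepts "@name" and "%name" but not "$name".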

   XPK_ATTRIBUTES
       atomic, emits i followed by more args.

       A list of ":"-prefixed attributes is expected, in the same format as
       sub or variable attributes. An optional leading ":" indicates the
       presence of attributes, then one or more of them are parsed. Attributes
       may be optionally separated by additional ":"s, but this is not
       required.

       Each attribute is expected to be an identifier name, followed by an
       optional value wrapped in parentheses. Whitespace is NOT permitted
       between the name and value, as per standard Perl parsing rules.

          :attrname
          :attrname(value)

       The i field indicates how many attributes were found. That number of
       additional arguments are then passed, each containing two SVs in the
       attr.name and attr.value fields. This number may be zero.

       It is not an error for there to be no attributes present, or for the
       optional colon to be missing. In this case i will be set to zero.
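
       A "build" function therefore has to step over a count argument and
       then that many attribute arguments in the flat array. The following
       standalone sketch shows that walk using a cut-down stand-in for
       "XSParseKeywordPiece" in which plain C strings replace SVs. ModelPiece
       and consume_attributes are hypothetical names, and representing a
       missing attribute value as NULL is a convention of this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for XSParseKeywordPiece: strings replace SVs. */
typedef struct {
  union {
    int i;
    struct {
      const char *name;
      const char *value;  /* sketch convention: NULL if no (value) given */
    } attr;
  };
} ModelPiece;

/* Walks the flat args[] starting at *argi: first the count piece, then
 * that many attribute pieces. Advances *argi past everything consumed,
 * stores in *nvalues how many attributes carried a value, and returns
 * the number of attributes seen. */
static int consume_attributes(ModelPiece *args[], size_t nargs,
                              size_t *argi, int *nvalues)
{
  assert(*argi < nargs);
  int count = args[(*argi)++]->i;
  *nvalues = 0;
  for (int n = 0; n < count; n++) {
    assert(*argi < nargs);
    ModelPiece *p = args[(*argi)++];
    /* Real code would inspect p->attr.name and p->attr.value here. */
    if (p->attr.value != NULL)
      (*nvalues)++;
  }
  return count;
}
```

       After the call, the index refers to the argument produced by whatever
       piece follows the attributes, which is how later pieces are located
       in the flat array.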

   XPK_VSTRING, XPK_VSTRING_OPT
       atomic, can probe, emits sv.

       A version string is expected, of the form "v1.234" including the
       leading "v" character. It is passed as a version SV object in the sv
       field.

       The "_OPT"-suffixed version is optional; if no version string is found
       then sv is set to "NULL".

   XPK_LEXVAR_MY
       atomic, emits padix.

          XPK_LEXVAR_MY(kind)

       A lexical variable name is expected, added to the current pad as if
       specified in a "my" expression, and passed as the pad index in the
       padix field.

       The "kind" argument specifies what kinds of variable are permitted, as
       per "XPK_LEXVARNAME".

   XPK_COMMA, XPK_COLON, XPK_EQUALS
       atomic, can probe, emits nothing.

       A literal character (",", ":" or "=") is expected. No argument value is
       passed.

   XPK_AUTOSEMI
       atomic, emits nothing.

       A literal semicolon (";") as a statement terminator is optionally
       expected. If the next token is a closing brace to indicate the end of
       a block, then a semicolon is not required. If anything else is
       encountered an error will be raised.

       This piece type is the same as specifying the "XPK_FLAG_AUTOSEMI"
       flag. It is useful to put at the end of a sequence that forms part of
       a choice of syntax, where some forms indicate a statement ending in a
       semicolon, whereas others may end in a full block that does not need
       one.
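
       The acceptance rule of "XPK_AUTOSEMI" can be modelled over a plain C
       string as follows. This is a standalone sketch, not the module's
       code: model_autosemi is a hypothetical name, a false return stands in
       for the error the module would raise, and treating the end of the
       input as "anything else" is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Skips whitespace from *pos, then applies the rule:
 *   ';'           consumed, success
 *   '}'           left in place, success
 *   anything else failure, *pos unchanged
 * On success *pos is moved to just after whatever was consumed. */
static bool model_autosemi(const char *src, size_t *pos)
{
  assert(src != NULL && pos != NULL);
  size_t p = *pos;
  while (isspace((unsigned char)src[p]))
    p++;
  if (src[p] == ';') {
    *pos = p + 1;   /* the semicolon itself is consumed */
    return true;
  }
  if (src[p] == '}') {
    *pos = p;       /* the brace belongs to the enclosing block */
    return true;
  }
  return false;
}
```

       Note that the closing brace is accepted but not consumed, since it
       terminates the surrounding block rather than this statement.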

   XPK_INFIX_*
       atomic, can probe, emits infix.

       An infix operator as recognised by XS::Parse::Infix. The returned
       pointer points to a structure allocated by "XS::Parse::Infix"
       describing the operator.

       Various versions of the macro are provided, each using a different
       selection filter to choose certain available infix operators:

          XPK_INFIX_RELATION         # any relational operator
          XPK_INFIX_EQUALITY         # an equality operator like `==` or `eq`
          XPK_INFIX_MATCH_NOSMART    # any sort of "match"-like operator, except smartmatch
          XPK_INFIX_MATCH_SMART      # XPK_INFIX_MATCH_NOSMART plus smartmatch

   XPK_LITERAL
       atomic, can probe, emits nothing.

          XPK_LITERAL("literal")

       A literal string match is expected. No argument value is passed.

       This form should generally be avoided if at all possible, because it is
       very easy to abuse to make syntaxes which confuse humans and code tools
       alike. Generally it is best reserved just for the first component of a
       "XPK_OPTIONAL" or "XPK_REPEATED" sequence, to provide a "secondary
       keyword" that such a repeated item can look out for.

   XPK_KEYWORD
       atomic, can probe, emits nothing.

          XPK_KEYWORD("keyword")

       A literal string match is expected. No argument value is passed.

       This is similar to "XPK_LITERAL" except that it additionally checks
       that the following character is not an identifier character. This
       ensures that the expected keyword-like behaviour is preserved. For
       example, given the input "keyword", the piece "XPK_LITERAL("key")"
       would match it, whereas "XPK_KEYWORD("key")" would not because of the
       subsequent "w" character.
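
       The difference between the two matching rules can be shown with a
       standalone sketch over plain C strings. The functions match_literal,
       match_keyword and is_ident_char are hypothetical, and the identifier
       test is limited to ASCII letters, digits and underscore, which is a
       simplification made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* ASCII-only approximation of "identifier character". */
static bool is_ident_char(char c)
{
  return isalnum((unsigned char)c) || c == '_';
}

/* XPK_LITERAL-style rule: the input merely has to start with the text. */
static bool match_literal(const char *input, const char *lit)
{
  assert(input != NULL && lit != NULL);
  return strncmp(input, lit, strlen(lit)) == 0;
}

/* XPK_KEYWORD-style rule: as above, and the character that follows the
 * match must not be an identifier character. */
static bool match_keyword(const char *input, const char *kw)
{
  if (!match_literal(input, kw))
    return false;
  return !is_ident_char(input[strlen(kw)]);
}
```

       With the input "keyword", match_literal accepts "key" while
       match_keyword rejects it, matching the example given above.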

   XPK_SEQUENCE
       structural, might support probe, emits nothing.

          XPK_SEQUENCE(pieces ...)

       A structural type which contains a number of pieces. This is normally
       equivalent to simply placing the pieces in sequence inside their own
       container, but it is useful inside "XPK_CHOICE" or "XPK_TAGGEDCHOICE".

       An "XPK_SEQUENCE" supports probe if its first contained piece does;
       i.e. it is transparent to probing.

   XPK_OPTIONAL
       structural, emits i.

          XPK_OPTIONAL(pieces ...)

       A structural type which may expect to find its contained pieces, or is
       happy not to. This will pass an argument whose i field contains either
       1 or 0, depending on whether the contents were found. The first piece
       type within must support probe.

   XPK_REPEATED
       structural, emits i.

          XPK_REPEATED(pieces ...)

       A structural type which expects to find zero or more repeats of its
       contained pieces. This will pass an argument whose i field contains the
       count of the number of repeats it found. The first piece type within
       must support probe.

   XPK_CHOICE
       structural, can probe, emits i.

          XPK_CHOICE(options ...)

       A structural type which expects to find one of a number of alternative
       options. An ordered list of types is provided, all of which must
       support probe. This will pass an argument whose i field gives the index
       of the first choice that was accepted. The first option takes the value
       0.

       As each of the options is interpreted as an alternative, not a
       sequence, you should use "XPK_SEQUENCE" if a sequence of multiple items
       should be considered as a single alternative.

       It is not an error if no choice matches. At that point, the i field
       will be set to -1.

       If you require a failure message in this case, set the final choice to
       be of type "XPK_FAILURE". This will cause an error message to be
       printed instead.

          XPK_FAILURE("message string")
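
       The index that "XPK_CHOICE" reports can be modelled as "first option
       whose probe succeeds, else -1". The sketch below is standalone and
       uses hypothetical names (probe_fn, model_choice, probe_block,
       probe_comma); each probe looks only at the first character of a
       string, which is a simplification of real probing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*probe_fn)(const char *input);

/* Returns the index of the first option whose probe accepts the input,
 * or -1 if none does. */
static int model_choice(const char *input, const probe_fn options[],
                        size_t noptions)
{
  assert(input != NULL);
  for (size_t i = 0; i < noptions; i++)
    if (options[i](input))
      return (int)i;
  return -1;
}

/* Probe in the style of XPK_BLOCK: looks for an open brace. */
static bool probe_block(const char *input)
{
  return input[0] == '{';
}

/* Probe in the style of XPK_COMMA: looks for a literal comma. */
static bool probe_comma(const char *input)
{
  return input[0] == ',';
}
```

       Because options are tried in order, an earlier option shadows any
       later one that could accept the same input, so the ordering of the
       list is significant.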

   XPK_TAGGEDCHOICE
       structural, can probe, emits i.

          XPK_TAGGEDCHOICE(choice, tag, ...)

       A structural type similar to "XPK_CHOICE", except that each choice type
       is followed by an element of type "XPK_TAG" which gives an integer. It
       is that integer value, rather than the positional index of the choice
       within the list, which is passed in the i field.

          XPK_TAG(value)

       As each of the options is interpreted as an alternative, not a
       sequence, you should use "XPK_SEQUENCE" if a sequence of multiple items
       should be considered as a single alternative.

   XPK_COMMALIST
       structural, might support probe, emits i.

          XPK_COMMALIST(pieces ...)

       A structural type which expects to find one or more repeats of its
       contained pieces, separated by literal comma (",") characters. This is
       somewhat similar to "XPK_REPEATED", except that it needs at least one
       copy, needs commas between its items, but does not require that the
       first contained piece support probe (the comma itself is sufficient to
       indicate a repeat).

       An "XPK_COMMALIST" supports probe if its first contained piece does;
       i.e. it is transparent to probing.

   XPK_PARENSCOPE
       structural, can probe, emits nothing.

          XPK_PARENSCOPE(pieces ...)

       A structural type which expects to find a sequence of pieces, all
       contained in parentheses as "( ... )". This will pass no extra
       arguments.

   XPK_ARGSCOPE
       structural, emits nothing.

          XPK_ARGSCOPE(pieces ...)

       A structural type similar to "XPK_PARENSCOPE", except that the
       parentheses themselves are optional, much like Perl's parsing of calls
       to known functions.

       If parentheses are encountered in the input, they will be consumed by
       this piece and it will behave identically to "XPK_PARENSCOPE". If there
       is no open parenthesis, this piece will behave like "XPK_SEQUENCE" and
       consume all the pieces inside it, without expecting a closing
       parenthesis.

   XPK_BRACKETSCOPE
       structural, can probe, emits nothing.

          XPK_BRACKETSCOPE(pieces ...)

       A structural type which expects to find a sequence of pieces, all
       contained in square brackets as "[ ... ]". This will pass no extra
       arguments.

   XPK_BRACESCOPE
       structural, can probe, emits nothing.

          XPK_BRACESCOPE(pieces ...)

       A structural type which expects to find a sequence of pieces, all
       contained in braces as "{ ... }". This will pass no extra arguments.

       Note that this is not necessary to use with "XPK_BLOCK" or
       "XPK_ANONSUB"; those will already consume a set of braces. This is
       intended for special constrained syntax that should not just accept an
       arbitrary block.

   XPK_CHEVRONSCOPE
       structural, can probe, emits nothing.

          XPK_CHEVRONSCOPE(pieces ...)

       A structural type which expects to find a sequence of pieces, all
       contained in angle brackets as "< ... >". This will pass no extra
       arguments.

       Remember that expressions like "a > b" are valid term expressions, so
       the contents of this scope shouldn't allow arbitrary expressions or the
       closing bracket will be ambiguous.

   XPK_PARENSCOPE_OPT, XPK_BRACKETSCOPE_OPT, XPK_BRACESCOPE_OPT,
       XPK_CHEVRONSCOPE_OPT
       structural, can probe, emits i.

          XPK_PARENSCOPE_OPT(pieces ...)
          XPK_BRACKETSCOPE_OPT(pieces ...)
          XPK_BRACESCOPE_OPT(pieces ...)
          XPK_CHEVRONSCOPE_OPT(pieces ...)

       Each of the four "XPK_...SCOPE" macros above has an optional variant,
       whose name is suffixed by "_OPT". These pass an argument whose i field
       is either true or false, indicating whether the scope was found,
       followed by the values from the scope itself.

       This is a convenient shortcut to nesting the scope within a
       "XPK_OPTIONAL" macro.

AUTHOR

       Paul Evans <leonerd@leonerd.org.uk>

perl v5.36.0                      2022-07-26             XS::Parse::Keyword(3)