XS::Parse::Keyword(3)    User Contributed Perl Documentation    XS::Parse::Keyword(3)

"XS::Parse::Keyword" - XS functions to assist in parsing keyword syntax

This module provides some XS functions to assist in writing syntax
modules that provide new perl-visible syntax, primarily for authors of
keyword plugins using the "PL_keyword_plugin" hook mechanism. It is
unlikely to be of much use to anyone else; and highly unlikely to be
any use when writing perl code using these. Unless you are writing a
keyword plugin using XS, this module is not for you.

This module is also currently experimental, and the design is still
evolving and subject to change. Later versions may break ABI
compatibility, requiring changes or at least a rebuild of any module
that depends on it.

boot_xs_parse_keyword
    void boot_xs_parse_keyword(double ver);

Call this function from your "BOOT" section in order to initialise the
module and parsing hooks.

ver should either be 0 or a decimal number for the module version
requirement; e.g.

    boot_xs_parse_keyword(0.14);

register_xs_parse_keyword
    void register_xs_parse_keyword(const char *keyword,
        const struct XSParseKeywordHooks *hooks, void *hookdata);

This function installs a set of parsing hooks to be associated with the
given keyword. Such a keyword will then be handled automatically by a
keyword parser installed by "XS::Parse::Keyword" itself.

The "XSParseKeywordHooks" structure provides the following hook stages,
which are invoked in the given order.

flags
The following flags are defined:

"XPK_FLAG_EXPR"
    The parse or build function is expected to return
    "KEYWORD_PLUGIN_EXPR".

"XPK_FLAG_STMT"
    The parse or build function is expected to return
    "KEYWORD_PLUGIN_STMT".

These two flags mainly provide static information at registration
time, so that static parsing and other related tasks know what kind of
grammatical element this keyword will produce.

"XPK_FLAG_AUTOSEMI"
    The syntax forms a complete statement, which should be followed by
    a statement separator semicolon (";"). This semicolon is optional
    at the end of a block.

    The semicolon, if present, will be consumed automatically.

The "permit" Stage
    const char *permit_hintkey;
    bool (*permit) (pTHX_ void *hookdata);

Called by the installed keyword parser hook which is used to handle
keywords registered by "register_xs_parse_keyword".

As a shortcut for the common case, the "permit_hintkey" may point to a
string to look up from the hints hash. If the given key name is not
found in the hints hash then the keyword is not permitted. If the key
is present then the "permit" function is invoked as normal.

If the keyword was not rejected because of a missing hint key, the
function part of the stage is called next. It should check whether the
keyword is permitted at this time, perhaps by inspecting other lexical
clues, and return true only if the keyword is permitted.

Both the string and the function are optional. Either or both may be
present. If neither is present then the keyword is always permitted -
which is likely not what you wanted to do.
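The decision described above can be modelled in plain C. The following is only a sketch: "struct model_permit", "hint_present" and "keyword_permitted" are hypothetical names, the hints hash is replaced by a NULL-terminated list of strings, and the "pTHX_" parameter of the real "permit" function is omitted so that the code builds without perl.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two permit fields; this is not the
 * real struct XSParseKeywordHooks. */
struct model_permit {
    const char *permit_hintkey;        /* may be NULL */
    bool (*permit)(void *hookdata);    /* may be NULL */
};

/* Model of the hints hash: a NULL-terminated list of keys in scope. */
static bool hint_present(const char *const *hints, const char *key)
{
    for (size_t i = 0; hints[i] != NULL; i++)
        if (strcmp(hints[i], key) == 0)
            return true;
    return false;
}

/* Example permit function that always refuses the keyword. */
static bool always_no(void *hookdata)
{
    (void)hookdata;
    return false;
}

/* Returns true if the keyword would be permitted under the rules above. */
static bool keyword_permitted(const struct model_permit *h,
                              const char *const *hints, void *hookdata)
{
    if (h->permit_hintkey && !hint_present(hints, h->permit_hintkey))
        return false;                  /* key missing: rejected outright */
    if (h->permit)
        return h->permit(hookdata);    /* the function decides */
    return true;                       /* neither check objected */
}
```

With a list containing only "My::Module/foo", a model whose key is "My::Module/foo" is permitted, one whose key is "Other/bar" is not, and one with neither field set is always permitted.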

The "check" Stage
    void (*check)(pTHX_ void *hookdata);

Invoked once the keyword has been permitted. If present, this hook
function can check the surrounding lexical context, state, or other
information and throw an exception if it is unhappy that the keyword
should apply in this position.

The "parse" Stage
This stage is invoked once the keyword has been checked, and actually
parses the incoming text into an optree. It is implemented by calling
the first of the following function pointers which is not NULL. The
invoked function may optionally build an optree to represent the parsed
syntax, and place it into the variable addressed by "out". If it does
not, then a simple "OP_NULL" will be constructed in its place.

lex_read_space() is called both before and after this stage is invoked,
so in many simple cases the hook function itself does not need to
bother with it.

    int (*parse)(pTHX_ OP **out, void *hookdata);

If present, this should consume text from the parser buffer by invoking
"lex_*" or "parse_*" functions and eventually return a
"KEYWORD_PLUGIN_*" result value.

This is the most generic and powerful of the options, but requires the
most implementation work.

    int (*build)(pTHX_ OP **out, XSParseKeywordPiece *args[], size_t nargs, void *hookdata);

If "parse" is not present, this is called instead after parsing a
sequence of arguments, of types given by the pieces field, which should
be a zero-terminated array of piece types.

This alternative is somewhat less generic and powerful than providing
"parse" yourself, but involves much less parsing work and is shorter
and easier to implement.

    int (*build1)(pTHX_ OP **out, XSParseKeywordPiece *arg0, void *hookdata);

If neither "parse" nor "build" is present, this is called as a simpler
variant of "build" when only a single argument is required. It takes
its type from the "piece1" field instead.
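The rule that the first non-NULL pointer wins can be illustrated in isolation. This sketch assumes a simplified hook table: "struct model_hooks", "model_fn" and "select_parse_stage" are hypothetical, and the real function signatures (with "pTHX_", "OP **out" and piece arguments) are reduced to a bare "int (*)(void)".

```c
#include <stddef.h>

/* Hypothetical, simplified function pointer type; the real hooks take
 * pTHX_, OP **out and piece arguments. */
typedef int (*model_fn)(void);

struct model_hooks {
    model_fn parse;
    model_fn build;
    model_fn build1;
};

enum model_stage { STAGE_NONE, STAGE_PARSE, STAGE_BUILD, STAGE_BUILD1 };

/* Placeholder hook body used only to obtain a non-NULL pointer. */
static int model_stub(void)
{
    return 0;
}

/* Picks the first non-NULL alternative, in the documented order. */
static enum model_stage select_parse_stage(const struct model_hooks *h)
{
    if (h->parse)
        return STAGE_PARSE;
    if (h->build)
        return STAGE_BUILD;
    if (h->build1)
        return STAGE_BUILD1;
    return STAGE_NONE;
}
```

A table that sets all three pointers therefore only ever has "parse" invoked; "build" and "build1" come into play only when the earlier alternatives are left NULL.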

When using the "build" or "build1" alternatives for the "parse" phase,
the actual syntax is parsed automatically by this module, according to
the specification given by the pieces or piece1 field. The result of
that parsing step is placed into the args or arg0 parameter to the
invoked function, using a "struct" type consisting of the following
fields:

    typedef struct {
        union {
            OP *op;
            CV *cv;
            SV *sv;
            int i;
            struct {
                SV *name;
                SV *value;
            } attr;
            PADOFFSET padix;
            struct XSParseInfixInfo *infix;
        };
        int line;
    } XSParseKeywordPiece;

Which field of the anonymous union is set depends on the type of the
piece. The line field contains the line number of the source file
where parsing of that piece began.

Some piece types are "atomic", whose definition is self-contained.
Others are structural, defined in terms of inner pieces. Together these
form an entire tree-shaped definition of the syntax that the keyword
expects to find.

Atomic types generally provide exactly one argument into the list of
args (with the exception of literal matches, which do not provide
anything). Structural types may provide an initial argument
themselves, followed by a list of the values of each sub-piece they
contained inside them. Thus, while the data structure defining the
syntax shape is a tree, the argument values it parses into are passed
as a flat array to the "build" function.

Some structural types need to be able to determine whether or not
syntax relating to some optional part of them is present in the
incoming source text. In this case, the pieces relating to those
optional parts must support "probing". This ability is also noted
below.

The type of each piece should be one of the following macro values.

XPK_BLOCK
atomic, can probe, emits op.

    XPK_BLOCK

A brace-delimited block of code is expected, passed as an optree in the
op field. This will be parsed as a block within the current function
scope.

This can be probed by checking for the presence of an open-brace ("{")
character.

Be careful defining grammars with this because an open-brace is also a
valid character to start a term expression, for example. Given a choice
between "XPK_BLOCK" and "XPK_TERMEXPR", either of them could try to
consume such code as

    { 123, 456 }

XPK_BLOCK_VOIDCTX, XPK_BLOCK_SCALARCTX, XPK_BLOCK_LISTCTX
Variants of "XPK_BLOCK" which wrap a void, scalar or list-context scope
around the block.

XPK_PREFIXED_BLOCK
structural, emits op.

    XPK_PREFIXED_BLOCK(pieces ...)

Some pieces are expected, followed by a brace-delimited block of code,
which is passed as an optree in the op field. The prefix pieces are
parsed first, and their results are passed before the block itself.

The entire sequence, including the prefix items, is contained within a
pair of block_start() / block_end() calls. This permits the prefix
pieces to introduce new items into the lexical scope of the block - for
example by the use of "XPK_LEXVAR_MY".

A call to intro_my() is automatically made at the end of the prefix
pieces, before the block itself is parsed, ensuring any new lexical
variables are now visible.

In addition, the following extra piece types are recognised here:

XPK_SETUP
        void setup(pTHX_ void *hookdata);

        XPK_SETUP(&setup)

    atomic, emits nothing.

    This piece type runs a function given by pointer. Typically this
    function may be used to introduce new lexical state into the
    parser, or in some other way have some side-effect on the parsing
    context of the block to be parsed.

XPK_PREFIXED_BLOCK_ENTERLEAVE
A variant of "XPK_PREFIXED_BLOCK" which additionally wraps the entire
parsing operation, including the block_start(), block_end() and any
calls to "XPK_SETUP" functions, within an "ENTER"/"LEAVE" pair.

This should not make a difference to the standard parser pieces
provided here, but may be useful behaviour for the code in the setup
function, especially if it wishes to modify parser state and use the
savestack to ensure it is restored again when parsing has finished.

XPK_ANONSUB
atomic, emits cv.

A brace-delimited block of code is expected, and assembled into the
body of a new anonymous subroutine. This will be passed as a protosub
CV in the cv field.

XPK_STAGED_ANONSUB
    XPK_STAGED_ANONSUB(stages ...)

structural, emits cv.

A variant of "XPK_ANONSUB" which accepts additional function pointers
to be invoked at various points during parsing and compilation. These
can be used to interrupt the normal parsing in a manner similar to
XS::Parse::Sublike, though currently somewhat less flexibly.

The stages list may contain elements of the following types. Not every
stage must be present, but any that are present must be in the
following order. Multiple copies of each stage are permitted; they are
invoked in the written order, with parser code happening in between.

XPK_ANONSUB_PREPARE
        XPK_ANONSUB_PREPARE(&callback)

    atomic, emits nothing.

    Invokes the callback before start_subparse().

XPK_ANONSUB_START
        XPK_ANONSUB_START(&callback)

    atomic, emits nothing.

    Invokes the callback after block_start() but before parsing the
    actual block contents.

XPK_ANONSUB_END
        OP *op_wrapper_callback(pTHX_ OP *o, void *hookdata);

        XPK_ANONSUB_END(&op_wrapper_callback)

    atomic, emits nothing.

    Invokes the callback after parsing the block contents but before
    calling block_end(). The callback may modify the optree if required
    and return a new one.

XPK_ANONSUB_WRAP
        XPK_ANONSUB_WRAP(&op_wrapper_callback)

    atomic, emits nothing.

    Invokes the callback after block_end() but before passing the
    optree to newATTRSUB(). The callback may modify the optree if
    required and return a new one.

XPK_ARITHEXPR
atomic, emits op.

    XPK_ARITHEXPR

An arithmetic expression is expected, parsed using parse_arithexpr(),
and passed as an optree in the op field.

XPK_ARITHEXPR_VOIDCTX, XPK_ARITHEXPR_SCALARCTX
Variants of "XPK_ARITHEXPR" which put the expression in void or scalar
context.

XPK_TERMEXPR
atomic, emits op.

    XPK_TERMEXPR

A term expression is expected, parsed using parse_termexpr(), and
passed as an optree in the op field.

XPK_TERMEXPR_VOIDCTX, XPK_TERMEXPR_SCALARCTX
Variants of "XPK_TERMEXPR" which put the expression in void or scalar
context.

XPK_PREFIXED_TERMEXPR_ENTERLEAVE
    XPK_PREFIXED_TERMEXPR_ENTERLEAVE(pieces ...)

A variant of "XPK_TERMEXPR" which expects a sequence of pieces first
before it parses a term expression, similar to how
"XPK_PREFIXED_BLOCK_ENTERLEAVE" works. The entire operation is wrapped
in an "ENTER"/"LEAVE" pair.

This is intended just for use of "XPK_SETUP" pieces as prefixes. Any
other pieces which actually parse real input are likely to cause
overly-complex, subtle, or outright ambiguous grammars, and should be
avoided.

XPK_LISTEXPR
atomic, emits op.

    XPK_LISTEXPR

A list expression is expected, parsed using parse_listexpr(), and
passed as an optree in the op field.

XPK_LISTEXPR_LISTCTX
Variant of "XPK_LISTEXPR" which puts the expression in list context.

XPK_IDENT, XPK_IDENT_OPT
atomic, can probe, emits sv.

A bareword identifier name is expected, and passed as an SV containing
a PV in the sv field. An identifier is not permitted to contain a
double colon ("::").

The "_OPT"-suffixed version is optional; if no identifier is found then
sv is set to "NULL".

XPK_PACKAGENAME, XPK_PACKAGENAME_OPT
atomic, can probe, emits sv.

A bareword package name is expected, and passed as an SV containing a
PV in the sv field. A package name is similar to an identifier, except
it permits double colons in the middle.

The "_OPT"-suffixed version is optional; if no package name is found
then sv is set to "NULL".
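The difference between the two name forms can be shown with a small standalone check. This is a sketch under stated assumptions: it handles ASCII names only, it tests a whole string rather than consuming a prefix of parser input as the real pieces do, and the helper names ("ident_len", "is_identifier", "is_package_name") are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the ASCII identifier at the start of s, or 0 if none. */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* True if the whole string is a single identifier; "::" is not allowed. */
static bool is_identifier(const char *s)
{
    size_t n = ident_len(s);
    return n > 0 && s[n] == '\0';
}

/* True if the whole string is identifiers joined by "::" in the middle. */
static bool is_package_name(const char *s)
{
    for (;;) {
        size_t n = ident_len(s);
        if (n == 0)
            return false;              /* empty segment */
        s += n;
        if (*s == '\0')
            return true;               /* ended on a full segment */
        if (s[0] != ':' || s[1] != ':')
            return false;              /* some other character */
        s += 2;                        /* step over the "::" */
    }
}
```

Under this model "Foo::Bar" is a package name but not an identifier, while "foo" satisfies both.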

XPK_LEXVARNAME
atomic, emits sv.

    XPK_LEXVARNAME(kind)

A lexical variable name is expected, and passed as an SV containing a
PV in the sv field. The "kind" argument specifies what kinds of
variable are permitted, and should be a bitmask of one or more bits
from "XPK_LEXVAR_SCALAR", "XPK_LEXVAR_ARRAY" and "XPK_LEXVAR_HASH". A
convenient shortcut "XPK_LEXVAR_ANY" permits all three.
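A bitmask of this shape can be tested as sketched below. The "MODEL_LEXVAR_*" constants and their bit values are assumptions made for illustration; the real "XPK_LEXVAR_*" values come from the module's header and may differ. The helper names are likewise hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit values; not taken from the real header. */
enum {
    MODEL_LEXVAR_SCALAR = 1 << 0,
    MODEL_LEXVAR_ARRAY  = 1 << 1,
    MODEL_LEXVAR_HASH   = 1 << 2,
    MODEL_LEXVAR_ANY    = MODEL_LEXVAR_SCALAR | MODEL_LEXVAR_ARRAY
                        | MODEL_LEXVAR_HASH
};

/* Maps a sigil to its kind bit, or 0 for anything else. */
static int kind_of_sigil(char sigil)
{
    switch (sigil) {
    case '$': return MODEL_LEXVAR_SCALAR;
    case '@': return MODEL_LEXVAR_ARRAY;
    case '%': return MODEL_LEXVAR_HASH;
    default:  return 0;
    }
}

/* True if a variable name such as "$x" is allowed by the kind mask. */
static bool lexvar_permitted(const char *name, int kind)
{
    return (kind_of_sigil(name[0]) & kind) != 0;
}
```

A mask of scalar-or-array accepts "$x" and "@xs" but rejects "%h"; the "ANY" mask accepts all three sigils.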

XPK_ATTRIBUTES
atomic, emits i followed by more args.

A list of ":"-prefixed attributes is expected, in the same format as
sub or variable attributes. An optional leading ":" indicates the
presence of attributes, then one or more of them are parsed. Attributes
may be optionally separated by additional ":"s, but this is not
required.

Each attribute is expected to be an identifier name, followed by an
optional value wrapped in parentheses. Whitespace is NOT permitted
between the name and value, as per standard Perl parsing rules.

    :attrname
    :attrname(value)

The i field indicates how many attributes were found. That number of
additional arguments are then passed, each containing two SVs in the
attr.name and attr.value fields. This number may be zero.

It is not an error for there to be no attributes present, or for the
optional colon to be missing. In this case i will be set to zero.
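The "count followed by that many arguments" layout can be walked as in the sketch below. "ModelPiece" is a hypothetical stand-in for "XSParseKeywordPiece" in which plain C strings replace the SV pointers, and using NULL for an attribute without a value is an assumption of the model rather than documented behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for XSParseKeywordPiece, so that the sketch
 * builds without perl. */
typedef struct {
    union {
        int i;
        struct {
            const char *name;
            const char *value;   /* NULL models "no value" (assumed) */
        } attr;
    };
    int line;
} ModelPiece;

/* Reads the count at args[argi], then scans that many attr entries.
 * Returns the value of the attribute called want, or NULL if it is
 * absent or has no value. *next receives the index of the first
 * argument after the attribute list. */
static const char *find_attr(ModelPiece *args[], size_t argi,
                             const char *want, size_t *next)
{
    int count = args[argi++]->i;
    const char *found = NULL;
    for (int k = 0; k < count; k++, argi++)
        if (strcmp(args[argi]->attr.name, want) == 0)
            found = args[argi]->attr.value;
    *next = argi;
    return found;
}
```

Returning the next index lets a "build" function continue with whatever pieces follow the attribute list in the same flat array.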

XPK_VSTRING, XPK_VSTRING_OPT
atomic, can probe, emits sv.

A version string is expected, of the form "v1.234" including the
leading "v" character. It is passed as a version SV object in the sv
field.

The "_OPT"-suffixed version is optional; if no version string is found
then sv is set to "NULL".

XPK_LEXVAR
atomic, emits padix.

    XPK_LEXVAR(kind)

A lexical variable name is expected and looked up from the current pad.
The resulting pad index is passed in the padix field. No error happens
if the variable is not found; the value "NOT_IN_PAD" is passed instead.

The "kind" argument specifies what kinds of variable are permitted, as
per "XPK_LEXVARNAME".

XPK_LEXVAR_MY
atomic, emits padix.

    XPK_LEXVAR_MY(kind)

A lexical variable name is expected, added to the current pad as if
specified in a "my" expression, and passed as the pad index in the
padix field.

The "kind" argument specifies what kinds of variable are permitted, as
per "XPK_LEXVARNAME".

XPK_COMMA, XPK_COLON, XPK_EQUALS
atomic, can probe, emits nothing.

A literal character (",", ":" or "=") is expected. No argument value is
passed.

XPK_AUTOSEMI
atomic, emits nothing.

A literal semicolon (";") as a statement terminator is optionally
expected. If the next token is a closing brace to indicate the end of
a block, then a semicolon is not required. If anything else is
encountered an error will be raised.

This piece type is the same as specifying the "XPK_FLAG_AUTOSEMI" flag.
It is useful to put at the end of a sequence that forms part of a
choice of syntax, where some forms indicate a statement ending in a
semicolon, whereas others may end in a full block that does not need
one.
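The terminator rule for "XPK_AUTOSEMI" can be modelled on a plain string. This is a sketch only: "model_autosemi" and its result names are hypothetical, whitespace handling is reduced to spaces, tabs and newlines, and end of input is simply treated as "anything else" because the text above does not describe that case.

```c
#include <stddef.h>

/* Result of looking for a statement terminator at the parse position. */
enum autosemi_result {
    AUTOSEMI_CONSUMED,       /* a ';' was found and consumed */
    AUTOSEMI_AT_BLOCK_END,   /* a '}' follows; no semicolon required */
    AUTOSEMI_ERROR           /* anything else */
};

/* Applies the rule to src starting at *pos. *pos is advanced past any
 * leading whitespace and past a consumed semicolon; a closing brace is
 * left in place for the enclosing block. */
static enum autosemi_result model_autosemi(const char *src, size_t *pos)
{
    while (src[*pos] == ' ' || src[*pos] == '\t' || src[*pos] == '\n')
        (*pos)++;
    if (src[*pos] == ';') {
        (*pos)++;
        return AUTOSEMI_CONSUMED;
    }
    if (src[*pos] == '}')
        return AUTOSEMI_AT_BLOCK_END;
    return AUTOSEMI_ERROR;
}
```

Leaving the closing brace unconsumed matters because it belongs to the surrounding block, not to the keyword's own syntax.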

XPK_INFIX_*
atomic, can probe, emits infix.

An infix operator as recognised by XS::Parse::Infix. The returned
pointer points to a structure allocated by "XS::Parse::Infix"
describing the operator.

Various versions of the macro are provided, each using a different
selection filter to choose certain available infix operators:

    XPK_INFIX_RELATION         # any relational operator
    XPK_INFIX_EQUALITY         # an equality operator like `==` or `eq`
    XPK_INFIX_MATCH_NOSMART    # any sort of "match"-like operator, except smartmatch
    XPK_INFIX_MATCH_SMART      # XPK_INFIX_MATCH_NOSMART plus smartmatch

XPK_LITERAL
atomic, can probe, emits nothing.

    XPK_LITERAL("literal")

A literal string match is expected. No argument value is passed.

This form should generally be avoided if at all possible, because it is
very easy to abuse to make syntaxes which confuse humans and code tools
alike. Generally it is best reserved just for the first component of an
"XPK_OPTIONAL" or "XPK_REPEATED" sequence, to provide a "secondary
keyword" that such a repeated item can look out for.

XPK_KEYWORD
atomic, can probe, emits nothing.

    XPK_KEYWORD("keyword")

A literal string match is expected. No argument value is passed.

This is similar to "XPK_LITERAL" except that it additionally checks
that the following character is not an identifier character. This
ensures that the expected keyword-like behaviour is preserved. For
example, given the input "keyword", the piece XPK_LITERAL("key") would
match it, whereas XPK_KEYWORD("key") would not because of the
subsequent "w" character.
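The distinction between the two matchers can be reproduced on plain strings. This sketch assumes ASCII identifier characters (letters, digits and underscore); "match_literal" and "match_keyword" are hypothetical names, not functions provided by the module.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Model of XPK_LITERAL: the input merely has to start with the text. */
static bool match_literal(const char *input, const char *lit)
{
    return strncmp(input, lit, strlen(lit)) == 0;
}

/* Model of XPK_KEYWORD: as above, but the character after the match
 * must not be an identifier character. */
static bool match_keyword(const char *input, const char *kw)
{
    size_t n = strlen(kw);
    if (strncmp(input, kw, n) != 0)
        return false;
    unsigned char next = (unsigned char)input[n];
    return !(isalnum(next) || next == '_');
}
```

Given the input "keyword", the literal model matches "key" and the keyword model does not; given "key word" both match.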

XPK_SEQUENCE
structural, might support probe, emits nothing.

    XPK_SEQUENCE(pieces ...)

A structural type which contains a number of pieces. This is normally
equivalent to simply placing the pieces in sequence inside their own
container, but it is useful inside "XPK_CHOICE" or "XPK_TAGGEDCHOICE".

An "XPK_SEQUENCE" supports probe if its first contained piece does;
i.e. is transparent to probing.

XPK_OPTIONAL
structural, emits i.

    XPK_OPTIONAL(pieces ...)

A structural type which may expect to find its contained pieces, or is
happy not to. This will pass an argument whose i field contains either
1 or 0, depending on whether the contents were found. The first piece
type within must support probe.

XPK_REPEATED
structural, emits i.

    XPK_REPEATED(pieces ...)

A structural type which expects to find zero or more repeats of its
contained pieces. This will pass an argument whose i field contains the
count of the number of repeats it found. The first piece type within
must support probe.
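Because structural results arrive in a flat array, a "build" function has to do its own index arithmetic over a repeated group. The sketch below assumes that every repetition contributes the same fixed number of values directly after the count; that layout is inferred from the description of structural types above, not separately confirmed. "ModelPiece", "repeated_value" and "after_repeated" are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for XSParseKeywordPiece, with a C string in
 * place of the SV pointer. */
typedef struct {
    union {
        int i;
        const char *sv;
    };
    int line;
} ModelPiece;

/* Returns value number field of repetition number rep, for a repeated
 * group whose count sits at args[argi] and which yields per_repeat
 * values each time. Returns NULL if either index is out of range. */
static const ModelPiece *repeated_value(ModelPiece *args[], size_t argi,
                                        size_t per_repeat, size_t rep,
                                        size_t field)
{
    size_t count = (size_t)args[argi]->i;
    if (rep >= count || field >= per_repeat)
        return NULL;
    return args[argi + 1 + rep * per_repeat + field];
}

/* Index of the first argument after the whole repeated group. */
static size_t after_repeated(ModelPiece *args[], size_t argi,
                             size_t per_repeat)
{
    return argi + 1 + (size_t)args[argi]->i * per_repeat;
}
```

If the contained pieces themselves include optional or repeated parts, the number of values per repetition is no longer fixed and the array has to be walked sequentially instead.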

XPK_CHOICE
structural, can probe, emits i.

    XPK_CHOICE(options ...)

A structural type which expects to find one of a number of alternative
options. An ordered list of types is provided, all of which must
support probe. This will pass an argument whose i field gives the index
of the first choice that was accepted. The first option takes the value
0.

As each of the options is interpreted as an alternative, not a
sequence, you should use "XPK_SEQUENCE" if a sequence of multiple items
should be considered as a single alternative.

It is not an error if no choice matches. At that point, the i field
will be set to -1.

If you require a failure message in this case, set the final choice to
be of type "XPK_FAILURE". This will cause an error message to be
printed instead.

    XPK_FAILURE("message string")

XPK_TAGGEDCHOICE
structural, can probe, emits i.

    XPK_TAGGEDCHOICE(choice, tag, ...)

A structural type similar to "XPK_CHOICE", except that each choice type
is followed by an element of type "XPK_TAG" which gives an integer. It
is that integer value, rather than the positional index of the choice
within the list, which is passed in the i field.

    XPK_TAG(value)

As each of the options is interpreted as an alternative, not a
sequence, you should use "XPK_SEQUENCE" if a sequence of multiple items
should be considered as a single alternative.
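The two ways of reporting which alternative matched can be compared in a small model. Here each option "probes" by comparing a single character, which is a simplification; "struct model_option", "model_choice" and "model_taggedchoice" are hypothetical. Since the text above does not say what a tagged choice reports when nothing matches, the model signals that case through a separate boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option record: a one-character probe plus the integer
 * that an XPK_TAG element would supply. */
struct model_option {
    char probe_char;
    int  tag;
};

/* XPK_CHOICE-style result: positional index of the first option whose
 * probe succeeds, or -1 when none does. */
static int model_choice(const struct model_option *opts, size_t n,
                        char next)
{
    for (size_t i = 0; i < n; i++)
        if (opts[i].probe_char == next)
            return (int)i;
    return -1;
}

/* XPK_TAGGEDCHOICE-style result: stores the tag of the first matching
 * option in *tag and returns true; returns false if nothing matched. */
static bool model_taggedchoice(const struct model_option *opts, size_t n,
                               char next, int *tag)
{
    int idx = model_choice(opts, n, next);
    if (idx < 0)
        return false;
    *tag = opts[idx].tag;
    return true;
}
```

Tags keep the values seen by the "build" function stable when alternatives are later reordered or inserted, whereas positional indexes shift.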

XPK_COMMALIST
structural, might support probe, emits i.

    XPK_COMMALIST(pieces ...)

A structural type which expects to find one or more repeats of its
contained pieces, separated by literal comma (",") characters. This is
somewhat similar to "XPK_REPEATED", except that it needs at least one
copy, needs commas between its items, but does not require that the
first contained piece support probe (the comma itself is sufficient to
indicate a repeat).

An "XPK_COMMALIST" supports probe if its first contained piece does;
i.e. is transparent to probing.

XPK_PARENSCOPE
structural, can probe, emits nothing.

    XPK_PARENSCOPE(pieces ...)

A structural type which expects to find a sequence of pieces, all
contained in parentheses as "( ... )". This will pass no extra
arguments.

XPK_ARGSCOPE
structural, emits nothing.

    XPK_ARGSCOPE(pieces ...)

A structural type similar to "XPK_PARENSCOPE", except that the
parentheses themselves are optional; much like Perl's parsing of calls
to known functions.

If parentheses are encountered in the input, they will be consumed by
this piece and it will behave identically to "XPK_PARENSCOPE". If there
is no open parenthesis, this piece will behave like "XPK_SEQUENCE" and
consume all the pieces inside it, without expecting a closing
parenthesis.

XPK_BRACKETSCOPE
structural, can probe, emits nothing.

    XPK_BRACKETSCOPE(pieces ...)

A structural type which expects to find a sequence of pieces, all
contained in square brackets as "[ ... ]". This will pass no extra
arguments.

XPK_BRACESCOPE
structural, can probe, emits nothing.

    XPK_BRACESCOPE(pieces ...)

A structural type which expects to find a sequence of pieces, all
contained in braces as "{ ... }". This will pass no extra arguments.

Note that this is not necessary to use with "XPK_BLOCK" or
"XPK_ANONSUB"; those will already consume a set of braces. This is
intended for special constrained syntax that should not just accept an
arbitrary block.

XPK_CHEVRONSCOPE
structural, can probe, emits nothing.

    XPK_CHEVRONSCOPE(pieces ...)

A structural type which expects to find a sequence of pieces, all
contained in angle brackets as "< ... >". This will pass no extra
arguments.

Remember that expressions like "a > b" are valid term expressions, so
the contents of this scope shouldn't allow arbitrary expressions or the
closing bracket will be ambiguous.

XPK_PARENSCOPE_OPT, XPK_BRACKETSCOPE_OPT, XPK_BRACESCOPE_OPT,
XPK_CHEVRONSCOPE_OPT
structural, can probe, emits i.

    XPK_PARENSCOPE_OPT(pieces ...)
    XPK_BRACKETSCOPE_OPT(pieces ...)
    XPK_BRACESCOPE_OPT(pieces ...)
    XPK_CHEVRONSCOPE_OPT(pieces ...)

Each of the four "XPK_...SCOPE" macros above has an optional variant,
whose name is suffixed by "_OPT". These pass an argument whose i field
is either true or false, indicating whether the scope was found,
followed by the values from the scope itself.

This is a convenient shortcut to nesting the scope within an
"XPK_OPTIONAL" macro.

XPK_..._pieces
    XPK_SEQUENCE_pieces(ptr)
    XPK_OPTIONAL_pieces(ptr)
    ...

For each of the "XPK_..." macros that takes a variable-length list of
pieces, there is a variant whose name ends with "..._pieces", taking a
single pointer argument directly. This must point at a "const
XSParseKeywordPieceType []" array whose final element is the zero
element.

Normally hand-written C code of a fixed grammar would be unlikely to
use these forms, but they may be useful in dynamically-generated cases.
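The zero-terminated array convention can be illustrated with a model of a dynamically built list. "struct model_piece_type" and the "MODEL_*" codes are hypothetical and carry only a type code, with 0 reserved for the terminator; the real piece type structure has more fields, and real code would fill the array using the "XPK_..." macros instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical piece-type record; 0 marks the terminating element. */
struct model_piece_type {
    int type;
};

enum { MODEL_IDENT = 1, MODEL_COMMA = 2 };

/* Counts the entries before the zero element. */
static size_t count_pieces(const struct model_piece_type *pieces)
{
    size_t n = 0;
    while (pieces[n].type != 0)
        n++;
    return n;
}

/* Fills out with idents identifiers separated by commas, followed by
 * the zero element. Returns false, writing nothing, if the array of
 * cap elements is too small to hold the pieces and the terminator. */
static bool build_ident_list(struct model_piece_type *out, size_t cap,
                             size_t idents)
{
    size_t need = (idents ? 2 * idents - 1 : 0) + 1;
    if (need > cap)
        return false;
    size_t n = 0;
    for (size_t i = 0; i < idents; i++) {
        if (i > 0)
            out[n++].type = MODEL_COMMA;
        out[n++].type = MODEL_IDENT;
    }
    out[n].type = 0;                   /* the terminating zero element */
    return true;
}
```

Reserving room for the terminator up front is the important detail: a consumer that walks the array has no other way to find its end.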

Paul Evans <leonerd@leonerd.org.uk>

perl v5.36.1    2023-06-15    XS::Parse::Keyword(3)