1Parse::RecDescent(3) User Contributed Perl Documentation Parse::RecDescent(3)
2
3
4
6 Parse::RecDescent - Generate Recursive-Descent Parsers
7
9 This document describes version 1.94 of Parse::RecDescent, released
10 April 9, 2003.
11
13 use Parse::RecDescent;
14
15 # Generate a parser from the specification in $grammar:
16
17 $parser = new Parse::RecDescent ($grammar);
18
19 # Generate a parser from the specification in $othergrammar
20
21 $anotherparser = new Parse::RecDescent ($othergrammar);
22
23 # Parse $text using rule 'startrule' (which must be
24 # defined in $grammar):
25
26 $parser->startrule($text);
27
28 # Parse $text using rule 'otherrule' (which must also
29 # be defined in $grammar):
30
31 $parser->otherrule($text);
32
33 # Change the universal token prefix pattern
34 # (the default is: '\s*'):
35
36 $Parse::RecDescent::skip = '[ \t]+';
37
38 # Replace productions of existing rules (or create new ones)
39 # with the productions defined in $newgrammar:
40
41 $parser->Replace($newgrammar);
42
43 # Extend existing rules (or create new ones)
44 # by adding extra productions defined in $moregrammar:
45
46 $parser->Extend($moregrammar);
47
48 # Global flags (useful as command line arguments under -s):
49
    $::RD_ERRORS       # unless undefined, report fatal errors
    $::RD_WARN         # unless undefined, also report non-fatal problems
    $::RD_HINT         # if defined, also suggest remedies
    $::RD_TRACE        # if defined, also trace parsers' behaviour
    $::RD_AUTOSTUB     # if defined, generates "stubs" for undefined rules
    $::RD_AUTOACTION   # if defined, appends specified action to productions
56
58 Overview
59
60 Parse::RecDescent incrementally generates top-down recursive-descent
61 text parsers from simple yacc-like grammar specifications. It provides:
62
63 · Regular expressions or literal strings as terminals (tokens),
64
65 · Multiple (non-contiguous) productions for any rule,
66
67 · Repeated and optional subrules within productions,
68
69 · Full access to Perl within actions specified as part of the gram‐
70 mar,
71
72 · Simple automated error reporting during parser generation and pars‐
73 ing,
74
75 · The ability to commit to, uncommit to, or reject particular produc‐
76 tions during a parse,
77
78 · The ability to pass data up and down the parse tree ("down" via
79 subrule argument lists, "up" via subrule return values)
80
81 · Incremental extension of the parsing grammar (even during a parse),
82
83 · Precompilation of parser objects,
84
85 · User-definable reduce-reduce conflict resolution via "scoring" of
86 matching productions.
87
88 Using "Parse::RecDescent"
89
90 Parser objects are created by calling "Parse::RecDescent::new", passing
91 in a grammar specification (see the following subsections). If the
92 grammar is correct, "new" returns a blessed reference which can then be
93 used to initiate parsing through any rule specified in the original
94 grammar. A typical sequence looks like this:
95
96 $grammar = q {
97 # GRAMMAR SPECIFICATION HERE
98 };
99
100 $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";
101
102 # acquire $text
103
104 defined $parser->startrule($text) or print "Bad text!\n";
105
The rule through which parsing is initiated must be explicitly defined
in the grammar (i.e. for the above example, the grammar must include a
rule of the form: "startrule: <subrules>").
109
110 If the starting rule succeeds, its value (see below) is returned. Fail‐
111 ure to generate the original parser or failure to match a text is indi‐
112 cated by returning "undef". Note that it's easy to set up grammars that
113 can succeed, but which return a value of 0, "0", or "". So don't be
114 tempted to write:
115
116 $parser->startrule($text) or print "Bad text!\n";
117
118 Normally, the parser has no effect on the original text. So in the pre‐
119 vious example the value of $text would be unchanged after having been
120 parsed.
121
122 If, however, the text to be matched is passed by reference:
123
124 $parser->startrule(\$text)
125
126 then any text which was consumed during the match will be removed from
127 the start of $text.
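
A minimal sketch of both points, assuming a hypothetical one-rule
grammar named "count" (the rule name and input strings are illustrative
only):

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: a single rule that matches an unsigned integer.
my $parser = Parse::RecDescent->new(q{
    count: /\d+/
}) or die "Bad grammar!\n";

# A successful match may return a false value such as "0",
# so success is tested with defined() rather than truthiness.
my $value = $parser->count("0 apples");
print defined $value ? "matched '$value'\n" : "no match\n";

# Passing a reference lets the parser remove the consumed text
# from the start of the caller's string.
my $text = "42 apples";
$parser->count(\$text);
print "remaining: '$text'\n";
```

The second call leaves the unconsumed remainder in $text, which is
convenient when several start rules are applied to the same input in
turn.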
128
129 Rules
130
131 In the grammar from which the parser is built, rules are specified by
132 giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
133 colon on the same line, followed by one or more productions, separated
134 by single vertical bars. The layout of the productions is entirely
135 free-format:
136
    rule1: production1
         | production2 |
           production3 | production4
140
141 At any point in the grammar previously defined rules may be extended
142 with additional productions. This is achieved by redeclaring the rule
143 with the new productions. Thus:
144
    rule1: a | b | c
    rule2: d | e | f
    rule1: g | h

is exactly equivalent to:

    rule1: a | b | c | g | h
    rule2: d | e | f
153
154 Each production in a rule consists of zero or more items, each of which
155 may be either: the name of another rule to be matched (a "subrule"), a
156 pattern or string literal to be matched directly (a "token"), a block
157 of Perl code to be executed (an "action"), a special instruction to the
158 parser (a "directive"), or a standard Perl comment (which is ignored).
159
A rule matches a text if one of its productions matches. A production
matches if each of its items matches consecutive substrings of the text.
The productions of a rule being matched are tried in the same order
that they appear in the original grammar, and the first matching
production terminates the match attempt (successfully). If all
productions are tried and none matches, the match attempt fails.
166
167 Note that this behaviour is quite different from the "prefer the longer
168 match" behaviour of yacc. For example, if yacc were parsing the rule:
169
    seq : 'A' 'B'
        | 'A' 'B' 'C'
172
upon matching "AB" it would look ahead to see if a 'C' is next and, if
so, would match the second production in preference to the first. In
other words, yacc effectively tries all the productions of a rule
breadth-first in parallel, and selects the "best" match, where "best"
means longest (note that this is a gross simplification of the true
behaviour of yacc but it will do for our purposes).
179
180 In contrast, "Parse::RecDescent" tries each production depth-first in
181 sequence, and selects the "best" match, where "best" means first. This
182 is the fundamental difference between "bottom-up" and "recursive
183 descent" parsing.
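
The practical consequence is that production order matters. The sketch
below (the parser variable names are illustrative) builds the same two
productions in both orders and shows how much of the input each parser
consumes:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Shorter production first: it matches "AB" and the rule stops there.
my $short_first = Parse::RecDescent->new(q{
    seq: 'A' 'B'
       | 'A' 'B' 'C'
}) or die "Bad grammar!\n";

# Longer production first: it gets the chance to match "ABC".
my $long_first = Parse::RecDescent->new(q{
    seq: 'A' 'B' 'C'
       | 'A' 'B'
}) or die "Bad grammar!\n";

# Pass references so the unconsumed remainder can be inspected.
my ($text1, $text2) = ("ABC", "ABC");
$short_first->seq(\$text1);
$long_first->seq(\$text2);

print "short-first leaves '$text1'\n";
print "long-first leaves '$text2'\n";
```

When one production is a prefix of another, listing the longer one
first is the usual way to obtain the "longest match" behaviour.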
184
185 Each successfully matched item in a production is assigned a value,
186 which can be accessed in subsequent actions within the same production
187 (or, in some cases, as the return value of a successful subrule call).
188 Unsuccessful items don't have an associated value, since the failure of
189 an item causes the entire surrounding production to immediately fail.
190 The following sections describe the various types of items and their
191 success values.
192
193 Subrules
194
195 A subrule which appears in a production is an instruction to the parser
196 to attempt to match the named rule at that point in the text being
197 parsed. If the named subrule is not defined when requested the produc‐
198 tion containing it immediately fails (unless it was "autostubbed" - see
199 Autostubbing).
200
201 A rule may (recursively) call itself as a subrule, but not as the left-
202 most item in any of its productions (since such recursions are usually
203 non-terminating).
204
205 The value associated with a subrule is the value associated with its
206 $return variable (see "Actions" below), or with the last successfully
207 matched item in the subrule match.
208
209 Subrules may also be specified with a trailing repetition specifier,
210 indicating that they are to be (greedily) matched the specified number
211 of times. The available specifiers are:
212
213 subrule(?) # Match one-or-zero times
214 subrule(s) # Match one-or-more times
215 subrule(s?) # Match zero-or-more times
216 subrule(N) # Match exactly N times for integer N > 0
217 subrule(N..M) # Match between N and M times
218 subrule(..M) # Match between 1 and M times
219 subrule(N..) # Match at least N times
220
221 Repeated subrules keep matching until either the subrule fails to
222 match, or it has matched the minimal number of times but fails to con‐
223 sume any of the parsed text (this second condition prevents the subrule
224 matching forever in some cases).
225
226 Since a repeated subrule may match many instances of the subrule
227 itself, the value associated with it is not a simple scalar, but rather
228 a reference to a list of scalars, each of which is the value associated
229 with one of the individual subrule matches. In other words in the rule:
230
231 program: statement(s)
232
233 the value associated with the repeated subrule "statement(s)" is a ref‐
234 erence to an array containing the values matched by each call to the
235 individual subrule "statement".
236
Repetition specifiers may include a separator pattern:
238
239 program: statement(s /;/)
240
241 specifying some sequence of characters to be skipped between each repe‐
242 tition. This is really just a shorthand for the <leftop:...> directive
243 (see below).
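
As an illustration, the following sketch uses a hypothetical "list"
rule with a comma separator and dereferences the array reference that
the repeated subrule yields:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: one or more numbers separated by commas.
my $parser = Parse::RecDescent->new(q{
    list:   number(s /,/)
    number: /\d+/
}) or die "Bad grammar!\n";

# The value of the repeated subrule is a reference to an array
# holding one value per individual "number" match.
my $numbers = $parser->list("1, 2, 3");
print join('+', @$numbers), "\n";
```

Because "list" has no explicit $return, its value is that of its last
item, which here is the array reference itself.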
244
245 Tokens
246
247 If a quote-delimited string or a Perl regex appears in a production,
248 the parser attempts to match that string or pattern at that point in
249 the text. For example:
250
251 typedef: "typedef" typename identifier ';'
252
253 identifier: /[A-Za-z_][A-Za-z0-9_]*/
254
255 As in regular Perl, a single quoted string is uninterpolated, whilst a
256 double-quoted string or a pattern is interpolated (at the time of
257 matching, not when the parser is constructed). Hence, it is possible to
258 define rules in which tokens can be set at run-time:
259
260 typedef: "$::typedefkeyword" typename identifier ';'
261
262 identifier: /$::identpat/
263
Note that, since each rule is implemented inside a special namespace
belonging to its parser, it is necessary to explicitly qualify
variables from the main package.
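
A short sketch of a run-time token, assuming a hypothetical global
$::keyword and rule "decl" (neither is part of the module). The same
parser object accepts a different keyword once the variable changes:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical package variable consulted each time the token is matched.
our $keyword = 'let';

# q{} does not interpolate, so "$::keyword" reaches the grammar intact
# and is interpolated when the token is matched.
my $parser = Parse::RecDescent->new(q{
    decl: "$::keyword" /\w+/
}) or die "Bad grammar!\n";

my $before = $parser->decl("let x");   # keyword is currently 'let'

$keyword = 'var';                      # change the token at run-time
my $after = $parser->decl("var y");    # now 'var' is expected
my $stale = $parser->decl("let x");    # 'let' no longer matches

print defined $stale ? "still matches\n" : "old keyword rejected\n";
```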
267
268 Regex tokens can be specified using just slashes as delimiters or with
269 the explicit "m<delimiter>......<delimiter>" syntax:
270
271 typedef: "typedef" typename identifier ';'
272
273 typename: /[A-Za-z_][A-Za-z0-9_]*/
274
275 identifier: m{[A-Za-z_][A-Za-z0-9_]*}
276
277 A regex of either type can also have any valid trailing parameter(s)
278 (that is, any of [cgimsox]):
279
280 typedef: "typedef" typename identifier ';'
281
282 identifier: / [a-z_] # LEADING ALPHA OR UNDERSCORE
283 [a-z0-9_]* # THEN DIGITS ALSO ALLOWED
284 /ix # CASE/SPACE/COMMENT INSENSITIVE
285
286 The value associated with any successfully matched token is a string
287 containing the actual text which was matched by the token.
288
289 It is important to remember that, since each grammar is specified in a
290 Perl string, all instances of the universal escape character '\' within
291 a grammar must be "doubled", so that they interpolate to single '\'s
292 when the string is compiled. For example, to use the grammar:
293
    word:      /\S+/ | backslash
    line:      prefix word(s) "\n"
    backslash: '\\'
297
298 the following code is required:
299
    $parser = new Parse::RecDescent (q{

        word:      /\\S+/ | backslash
        line:      prefix word(s) "\\n"
        backslash: '\\\\'

    });
307
308 Terminal Separators
309
310 For the purpose of matching, each terminal in a production is consid‐
311 ered to be preceded by a "prefix" - a pattern which must be matched
312 before a token match is attempted. By default, the prefix is optional
313 whitespace (which always matches, at least trivially), but this default
314 may be reset in any production.
315
316 The variable $Parse::RecDescent::skip stores the universal prefix,
317 which is the default for all terminal matches in all parsers built with
318 "Parse::RecDescent".
319
320 The prefix for an individual production can be altered by using the
321 "<skip:...>" directive (see below).
322
323 Actions
324
325 An action is a block of Perl code which is to be executed (as the block
326 of a "do" statement) when the parser reaches that point in a produc‐
327 tion. The action executes within a special namespace belonging to the
328 active parser, so care must be taken in correctly qualifying variable
329 names (see also "Start-up Actions" below).
330
331 The action is considered to succeed if the final value of the block is
332 defined (that is, if the implied "do" statement evaluates to a defined
333 value - even one which would be treated as "false"). Note that the
334 value associated with a successful action is also the final value in
335 the block.
336
337 An action will fail if its last evaluated value is "undef". This is
338 surprisingly easy to accomplish by accident. For instance, here's an
339 infuriating case of an action that makes its production fail, but only
340 when debugging isn't activated:
341
342 description: name rank serial_number
343 { print "Got $item[2] $item[1] ($item[3])\n"
344 if $::debugging
345 }
346
347 If $debugging is false, no statement in the block is executed, so the
348 final value is "undef", and the entire production fails. The solution
349 is:
350
351 description: name rank serial_number
352 { print "Got $item[2] $item[1] ($item[3])\n"
353 if $::debugging;
354 1;
355 }
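
The difference can be observed directly. In this sketch the rule names
"fragile" and "robust" are hypothetical, and $::debugging is
deliberately left undefined so that the first action evaluates to
"undef":

```perl
use strict;
use warnings;
use Parse::RecDescent;

# $::debugging is never set in this program, so it is undefined.
my $parser = Parse::RecDescent->new(q{
    fragile: /\w+/ { print "Got $item[1]\n" if $::debugging }
    robust:  /\w+/ { print "Got $item[1]\n" if $::debugging; 1 }
}) or die "Bad grammar!\n";

# The action in "fragile" yields undef, so its production fails
# even though the token itself matched.
my $fragile = $parser->fragile("abc");

# The trailing 1 gives the action in "robust" a defined final value.
my $robust = $parser->robust("abc");

print "fragile: ", (defined $fragile ? "matched" : "failed"), "\n";
print "robust:  ", (defined $robust  ? "matched" : "failed"), "\n";
```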
356
357 Within an action, a number of useful parse-time variables are available
358 in the special parser namespace (there are other variables also acces‐
359 sible, but meddling with them will probably just break your parser. As
360 a general rule, if you avoid referring to unqualified variables - espe‐
361 cially those starting with an underscore - inside an action, things
362 should be okay):
363
364 @item and %item
365 The array slice @item[1..$#item] stores the value associated with
366 each item (that is, each subrule, token, or action) in the current
367 production. The analogy is to $1, $2, etc. in a yacc grammar. Note
368 that, for obvious reasons, @item only contains the values of items
369 before the current point in the production.
370
371 The first element ($item[0]) stores the name of the current rule
372 being matched.
373
374 @item is a standard Perl array, so it can also be indexed with neg‐
375 ative numbers, representing the number of items back from the cur‐
376 rent position in the parse:
377
378 stuff: /various/ bits 'and' pieces "then" data 'end'
379 { print $item[-2] } # PRINTS data
380 # (EASIER THAN: $item[6])
381
The %item hash complements the @item array, providing named
access to the same item values:

    stuff: /various/ bits 'and' pieces "then" data 'end'
           { print $item{data} }   # PRINTS data
                                   # (EVEN EASIER THAN USING @item)
388
The results of named subrules are stored in the hash under each
subrule's name (including the repetition specifier, if any), whilst
all other items are stored under a "named positional" key that
indicates their ordinal position within their item type:
__STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:
394
395 stuff: /various/ bits 'and' pieces "then" data 'end' { save }
396 { print $item{__PATTERN1__}, # PRINTS 'various'
397 $item{__STRING2__}, # PRINTS 'then'
398 $item{__ACTION1__}, # PRINTS RETURN
399 # VALUE OF save
400 }
401
402 If you want proper named access to patterns or literals, you need
403 to turn them into separate rules:
404
405 stuff: various bits 'and' pieces "then" data 'end'
406 { print $item{various} # PRINTS various
407 }
408
409 various: /various/
410
The special entry $item{__RULE__} stores the name of the current
rule (i.e. the same value as $item[0]).

The advantage of using %item instead of @item is that it removes
the need to track item positions that may change as a grammar
evolves. For example, adding an interim "<skip>" directive or
action can silently ruin a trailing action, by moving an @item
element "down" the array one place. In contrast, the named entry of
%item is unaffected by such an insertion.
420
421 A limitation of the %item hash is that it only records the last
422 value of a particular subrule. For example:
423
    range: '(' number '..' number ')'
           { $return = $item{number} }
426
427 will return only the value corresponding to the second match of the
428 "number" subrule. In other words, successive calls to a subrule
429 overwrite the corresponding entry in %item. Once again, the solu‐
430 tion is to rename each subrule in its own rule:
431
    range: '(' from_num '..' to_num ')'
           { $return = $item{from_num} }

    from_num: number
    to_num:   number
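
Both behaviours can be checked side by side. The sketch below assumes
a simple "number" rule and two hypothetical start rules, "overwritten"
and "range":

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # Same subrule twice: the second match overwrites $item{number}.
    overwritten: '(' number '..' number ')'
                 { $return = $item{number} }

    # Distinct wrapper rules keep both values addressable by name.
    range: '(' from_num '..' to_num ')'
           { $return = [ $item{from_num}, $item{to_num} ] }

    from_num: number
    to_num:   number
    number:   /\d+/
}) or die "Bad grammar!\n";

my $last_only = $parser->overwritten("(3..9)");
my $bounds    = $parser->range("(3..9)");

print "overwritten: $last_only\n";
print "range: $bounds->[0] to $bounds->[1]\n";
```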
437
438 @arg and %arg
The array @arg and the hash %arg store any arguments passed to the
rule from some other rule (see "Subrule argument lists"). Changes
to the elements of either variable do not propagate back to the
calling rule (data can be passed back from a subrule via the
$return variable - see next item).
444
445 $return
446 If a value is assigned to $return within an action, that value is
447 returned if the production containing the action eventually matches
448 successfully. Note that setting $return doesn't cause the current
449 production to succeed. It merely tells it what to return if it does
450 succeed. Hence $return is analogous to $$ in a yacc grammar.
451
452 If $return is not assigned within a production, the value of the
453 last component of the production (namely: $item[$#item]) is
454 returned if the production succeeds.
455
456 $commit
457 The current state of commitment to the current production (see
458 "Directives" below).
459
460 $skip
461 The current terminal prefix (see "Directives" below).
462
463 $text
464 The remaining (unparsed) text. Changes to $text do not propagate
465 out of unsuccessful productions, but do survive successful produc‐
466 tions. Hence it is possible to dynamically alter the text being
467 parsed - for example, to provide a "#include"-like facility:
468
    hash_include: '#include' filename
                  { $text = ::loadfile($item[2]) . $text }

    filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
            | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }
474
475 $thisline and $prevline
476 $thisline stores the current line number within the current parse
477 (starting from 1). $prevline stores the line number for the last
478 character which was already successfully parsed (this will be dif‐
479 ferent from $thisline at the end of each line).
480
481 For efficiency, $thisline and $prevline are actually tied hashes,
482 and only recompute the required line number when the variable's
483 value is used.
484
485 Assignment to $thisline adjusts the line number calculator, so that
486 it believes that the current line number is the value being
487 assigned. Note that this adjustment will be reflected in all subse‐
488 quent line numbers calculations.
489
490 Modifying the value of the variable $text (as in the previous
491 "hash_include" example, for instance) will confuse the line count‐
492 ing mechanism. To prevent this, you should call "Parse::RecDes‐
493 cent::LineCounter::resync($thisline)" immediately after any assign‐
494 ment to the variable $text (or, at least, before the next attempt
495 to use $thisline).
496
497 Note that if a production fails after assigning to or resync'ing
498 $thisline, the parser's line counter mechanism will usually be cor‐
499 rupted.
500
501 Also see the entry for @itempos.
502
503 The line number can be set to values other than 1, by calling the
504 start rule with a second argument. For example:
505
506 $parser = new Parse::RecDescent ($grammar);
507
508 $parser->input($text, 10); # START LINE NUMBERS AT 10
509
510 $thiscolumn and $prevcolumn
511 $thiscolumn stores the current column number within the current
512 line being parsed (starting from 1). $prevcolumn stores the column
513 number of the last character which was actually successfully
514 parsed. Usually "$prevcolumn == $thiscolumn-1", but not at the end
515 of lines.
516
517 For efficiency, $thiscolumn and $prevcolumn are actually tied
518 hashes, and only recompute the required column number when the
519 variable's value is used.
520
521 Assignment to $thiscolumn or $prevcolumn is a fatal error.
522
523 Modifying the value of the variable $text (as in the previous
524 "hash_include" example, for instance) may confuse the column count‐
525 ing mechanism.
526
527 Note that $thiscolumn reports the column number before any white‐
528 space that might be skipped before reading a token. Hence if you
529 wish to know where a token started (and ended) use something like
530 this:
531
532 rule: token1 token2 startcol token3 endcol token4
533 { print "token3: columns $item[3] to $item[5]"; }
534
535 startcol: '' { $thiscolumn } # NEED THE '' TO STEP PAST TOKEN SEP
536 endcol: { $prevcolumn }
537
538 Also see the entry for @itempos.
539
540 $thisoffset and $prevoffset
541 $thisoffset stores the offset of the current parsing position
542 within the complete text being parsed (starting from 0). $prevoff‐
543 set stores the offset of the last character which was actually suc‐
544 cessfully parsed. In all cases "$prevoffset == $thisoffset-1".
545
546 For efficiency, $thisoffset and $prevoffset are actually tied
547 hashes, and only recompute the required offset when the variable's
548 value is used.
549
Assignment to $thisoffset or $prevoffset is a fatal error.
551
552 Modifying the value of the variable $text will not affect the off‐
553 set counting mechanism.
554
555 Also see the entry for @itempos.
556
557 @itempos
558 The array @itempos stores a hash reference corresponding to each
559 element of @item. The elements of the hash provide the following:
560
561 $itempos[$n]{offset}{from} # VALUE OF $thisoffset BEFORE $item[$n]
562 $itempos[$n]{offset}{to} # VALUE OF $prevoffset AFTER $item[$n]
563 $itempos[$n]{line}{from} # VALUE OF $thisline BEFORE $item[$n]
564 $itempos[$n]{line}{to} # VALUE OF $prevline AFTER $item[$n]
565 $itempos[$n]{column}{from} # VALUE OF $thiscolumn BEFORE $item[$n]
566 $itempos[$n]{column}{to} # VALUE OF $prevcolumn AFTER $item[$n]
567
568 Note that the various "$itempos[$n]...{from}" values record the
569 appropriate value after any token prefix has been skipped.
570
571 Hence, instead of the somewhat tedious and error-prone:
572
573 rule: startcol token1 endcol
574 startcol token2 endcol
575 startcol token3 endcol
576 { print "token1: columns $item[1]
577 to $item[3]
578 token2: columns $item[4]
579 to $item[6]
580 token3: columns $item[7]
581 to $item[9]" }
582
583 startcol: '' { $thiscolumn } # NEED THE '' TO STEP PAST TOKEN SEP
584 endcol: { $prevcolumn }
585
586 it is possible to write:
587
588 rule: token1 token2 token3
589 { print "token1: columns $itempos[1]{column}{from}
590 to $itempos[1]{column}{to}
591 token2: columns $itempos[2]{column}{from}
592 to $itempos[2]{column}{to}
593 token3: columns $itempos[3]{column}{from}
594 to $itempos[3]{column}{to}" }
595
596 Note however that (in the current implementation) the use of @item‐
597 pos anywhere in a grammar implies that item positioning information
598 is collected everywhere during the parse. Depending on the grammar
599 and the size of the text to be parsed, this may be prohibitively
600 expensive and the explicit use of $thisline, $thiscolumn, etc. may
601 be a better choice.
602
603 $thisparser
604 A reference to the "Parse::RecDescent" object through which parsing
605 was initiated.
606
607 The value of $thisparser propagates down the subrules of a parse
608 but not back up. Hence, you can invoke subrules from another parser
609 for the scope of the current rule as follows:
610
    rule: subrule1 subrule2
        | { $thisparser = $::otherparser } <reject>
        | subrule3 subrule4
        | subrule5
615
The result is that the first production calls "subrule1" and "subrule2"
of the current parser, and the remaining productions call the named
subrules from $::otherparser. Note, however, that "Bad Things" will
happen if $::otherparser isn't a blessed reference and/or doesn't
have methods with the same names as the required subrules!
621
622 $thisrule
623 A reference to the "Parse::RecDescent::Rule" object corresponding
624 to the rule currently being matched.
625
626 $thisprod
627 A reference to the "Parse::RecDescent::Production" object corre‐
628 sponding to the production currently being matched.
629
630 $score and $score_return
631 $score stores the best production score to date, as specified by an
632 earlier "<score:...>" directive. $score_return stores the corre‐
633 sponding return value for the successful production.
634
635 See "Scored productions".
636
637 Warning: the parser relies on the information in the various "this..."
638 objects in some non-obvious ways. Tinkering with the other members of
639 these objects will probably cause Bad Things to happen, unless you
640 really know what you're doing. The only exception to this advice is
641 that the use of "$this...->{local}" is always safe.
642
643 Start-up Actions
644
645 Any actions which appear before the first rule definition in a grammar
646 are treated as "start-up" actions. Each such action is stripped of its
647 outermost brackets and then evaluated (in the parser's special names‐
648 pace) just before the rules of the grammar are first compiled.
649
650 The main use of start-up actions is to declare local variables within
651 the parser's special namespace:
652
    { my $lastitem = '???'; }

    list: item(s)   { $return = $lastitem }

    item: book      { $lastitem = 'book'; }
        | bell      { $lastitem = 'bell'; }
        | candle    { $lastitem = 'candle'; }
660
661 but start-up actions can be used to execute any valid Perl code within
662 a parser's special namespace.
663
664 Start-up actions can appear within a grammar extension or replacement
665 (that is, a partial grammar installed via "Parse::RecDescent::Extend()"
666 or "Parse::RecDescent::Replace()" - see "Incremental Parsing"), and
667 will be executed before the new grammar is installed. Note, however,
668 that a particular start-up action is only ever executed once.
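
A runnable variant of the example above, as a sketch: here "book",
"bell" and "candle" are written as literal tokens (an assumption, since
the original leaves them as undefined subrules) so that the grammar is
complete:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # Start-up action: declares a variable visible to all rule actions.
    { my $lastitem = '???'; }

    list: item(s) { $return = $lastitem }

    item: 'book'   { $lastitem = 'book' }
        | 'bell'   { $lastitem = 'bell' }
        | 'candle' { $lastitem = 'candle' }
}) or die "Bad grammar!\n";

# Each matched item overwrites $lastitem; "list" returns the final one.
my $last = $parser->list("book bell candle");
print "last item: $last\n";
```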
669
670 Autoactions
671
It is sometimes desirable to be able to specify a default action to be
taken at the end of every production (for example, in order to easily
build a parse tree). If the variable $::RD_AUTOACTION is defined when
"Parse::RecDescent::new()" is called, the contents of that variable are
treated as a specification of an action which is to be appended to each
production in the corresponding grammar. So, for example, to construct
a simple parse tree:
679
    $::RD_AUTOACTION = q { [@item] };

    $parser = new Parse::RecDescent (q{
        expression: and_expr '||' expression | and_expr
        and_expr:   not_expr '&&' and_expr   | not_expr
        not_expr:   '!' brack_expr           | brack_expr
        brack_expr: '(' expression ')'       | identifier
        identifier: /[a-z]+/i
    });
689
690 which is equivalent to:
691
    $parser = new Parse::RecDescent (q{
        expression: and_expr '||' expression
                        { [@item] }
                  | and_expr
                        { [@item] }

        and_expr:   not_expr '&&' and_expr
                        { [@item] }
                  | not_expr
                        { [@item] }

        not_expr:   '!' brack_expr
                        { [@item] }
                  | brack_expr
                        { [@item] }

        brack_expr: '(' expression ')'
                        { [@item] }
                  | identifier
                        { [@item] }

        identifier: /[a-z]+/i
                        { [@item] }
    });
716
Alternatively, we could take an object-oriented approach, using a
different class for each node (and also eliminating redundant
intermediate nodes):

    $::RD_AUTOACTION = q
      { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

    $parser = new Parse::RecDescent (q{
        expression: and_expr '||' expression | and_expr
        and_expr:   not_expr '&&' and_expr   | not_expr
        not_expr:   '!' brack_expr           | brack_expr
        brack_expr: '(' expression ')'       | identifier
        identifier: /[a-z]+/i
    });
731
732 which is equivalent to:
733
    $parser = new Parse::RecDescent (q{
        expression: and_expr '||' expression
                        { new expression_node (@item[1..3]) }
                  | and_expr

        and_expr:   not_expr '&&' and_expr
                        { new and_expr_node (@item[1..3]) }
                  | not_expr

        not_expr:   '!' brack_expr
                        { new not_expr_node (@item[1..2]) }
                  | brack_expr

        brack_expr: '(' expression ')'
                        { new brack_expr_node (@item[1..3]) }
                  | identifier

        identifier: /[a-z]+/i
                        { new identifier_node (@item[1]) }
    });
754
755 Note that, if a production already ends in an action, no autoaction is
756 appended to it. For example, in this version:
757
    $::RD_AUTOACTION = q
      { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

    $parser = new Parse::RecDescent (q{
        expression: and_expr '||' expression | and_expr
        and_expr:   not_expr '&&' and_expr   | not_expr
        not_expr:   '!' brack_expr           | brack_expr
        brack_expr: '(' expression ')'       | identifier
        identifier: /[a-z]+/i
                        { new terminal_node($item[1]) }
    });
769
770 each "identifier" match produces a "terminal_node" object, not an
771 "identifier_node" object.
772
773 A level 1 warning is issued each time an "autoaction" is added to some
774 production.
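
A compact sketch of the first form of autoaction, using a hypothetical
three-rule grammar ("pair", "key" and "value" are illustrative names).
The flag is localized so that it only affects this one call to "new":

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = do {
    # Every production without a trailing action gets { [@item] }.
    local $::RD_AUTOACTION = q{ [@item] };
    Parse::RecDescent->new(q{
        pair:  key '=' value
        key:   /[a-z]+/i
        value: /\d+/
    });
} or die "Bad grammar!\n";

# Each node is an array: the rule name followed by its item values.
my $tree = $parser->pair("x = 1");

print "root rule: $tree->[0]\n";
print "key:       $tree->[1][1]\n";
print "value:     $tree->[3][1]\n";
```

Localizing the variable is a design choice rather than a requirement;
it simply avoids the autoaction leaking into parsers built later in the
same program.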
775
776 Autotrees
777
778 A commonly needed autoaction is one that builds a parse-tree. It is
779 moderately tricky to set up such an action (which must treat terminals
780 differently from non-terminals), so Parse::RecDescent simplifies the
781 process by providing the "<autotree>" directive.
782
If this directive appears at the start of a grammar, it causes
Parse::RecDescent to insert autoactions at the end of any production
except those which already end in an action. The action inserted
depends on whether the production is an intermediate rule (two or more
items), or a terminal of the grammar (i.e. a single pattern or string
item).
788
789 So, for example, the following grammar:
790
    <autotree>

    file    : command(s)
    command : get | set | vet
    get     : 'get' ident ';'
    set     : 'set' ident 'to' value ';'
    vet     : 'check' ident 'is' value ';'
    ident   : /\w+/
    value   : /\d+/
800
801 is equivalent to:
802
    file    : command(s)                        { bless \%item, $item[0] }
    command : get                               { bless \%item, $item[0] }
            | set                               { bless \%item, $item[0] }
            | vet                               { bless \%item, $item[0] }
    get     : 'get' ident ';'                   { bless \%item, $item[0] }
    set     : 'set' ident 'to' value ';'        { bless \%item, $item[0] }
    vet     : 'check' ident 'is' value ';'      { bless \%item, $item[0] }

    ident   : /\w+/          { bless {__VALUE__=>$item[1]}, $item[0] }
    value   : /\d+/          { bless {__VALUE__=>$item[1]}, $item[0] }
813
814 Note that each node in the tree is blessed into a class of the same
815 name as the rule itself. This makes it easy to build object-oriented
816 processors for the parse-trees that the grammar produces. Note too that
817 the last two rules produce special objects with the single attribute
818 '__VALUE__'. This is because they consist solely of a single terminal.
819
820 This autoaction-ed grammar would then produce a parse tree in a data
821 structure like this:
822
823 {
824 file => {
825 command => {
826 [ get => {
827 identifier => { __VALUE__ => 'a' },
828 },
829 set => {
830 identifier => { __VALUE__ => 'b' },
831 value => { __VALUE__ => '7' },
832 },
833 vet => {
834 identifier => { __VALUE__ => 'b' },
835 value => { __VALUE__ => '7' },
836 },
837 ],
838 },
839 }
840 }
841
842 (except, of course, that each nested hash would also be blessed into
843 the appropriate class).
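
The following sketch walks such a tree for a cut-down version of the
grammar (only "get" and "set" are kept, and the input string is
illustrative). It relies on the %item naming described earlier, under
which a repeated subrule is stored with its repetition specifier, so
the commands are found under the key 'command(s)':

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autotree>

    file    : command(s)
    command : get | set
    get     : 'get' ident ';'
    set     : 'set' ident 'to' value ';'
    ident   : /[a-z]\w*/i
    value   : /\d+/
}) or die "Bad grammar!\n";

my $tree = $parser->file("get a; set b to 7;");

# The repeated subrule is keyed by its full name, 'command(s)',
# and holds a reference to an array of "command" nodes.
my @commands = @{ $tree->{'command(s)'} };

print "root class: ", ref($tree), "\n";
print "first ident: $commands[0]{get}{ident}{__VALUE__}\n";
print "second value: $commands[1]{set}{value}{__VALUE__}\n";
```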
844
845 Autostubbing
846
Normally, if a subrule appears in some production, but no rule of that
name is ever defined in the grammar, the production which refers to the
non-existent subrule fails immediately. This typically occurs as a
result of misspellings, and is a sufficiently common occurrence that a
warning is generated for such situations.
852
853 However, when prototyping a grammar it is sometimes useful to be able
854 to use subrules before a proper specification of them is really possi‐
855 ble. For example, a grammar might include a section like:
856
857 function_call: identifier '(' arg(s?) ')'
858
859 identifier: /[a-z]\w*/i
860
861 where the possible format of an argument is sufficiently complex that
862 it is not worth specifying in full until the general function call syn‐
863 tax has been debugged. In this situation it is convenient to leave the
864 real rule "arg" undefined and just slip in a placeholder (or "stub"):
865
866 arg: 'arg'
867
868 so that the function call syntax can be tested with dummy input such
869 as:
870
871 f0()
872 f1(arg)
873 f2(arg arg)
874 f3(arg arg arg)
875
876 et cetera.
877
878 Early in prototyping, many such "stubs" may be required, so
879 "Parse::RecDescent" provides a means of automating their definition.
880 If the variable $::RD_AUTOSTUB is defined when a parser is built, a
881 subrule reference to any non-existent rule (say, "sr"), causes a "stub"
882 rule of the form:
883
884 sr: 'sr'
885
886 to be automatically defined in the generated parser. A level 1 warning
887 is issued for each such "autostubbed" rule.
888
889 Hence, with $::RD_AUTOSTUB defined, it is possible to specify a grammar
890 only partially, and then "fake" matches of the unspecified (sub)rules by
891 just typing in their names.
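
A minimal sketch of autostubbing in use, assuming the function-call grammar shown earlier (the test inputs are invented for illustration). The variable is set to 1 here, which satisfies the "defined" requirement described above:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Must be set before the parser is built.
$::RD_AUTOSTUB = 1;

# Note that no 'arg' rule is defined anywhere in this grammar.
my $grammar = <<'GRAMMAR';
function_call : identifier '(' arg(s?) ')'
identifier    : /[a-z]\w*/i
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The autogenerated stub "arg: 'arg'" matches only the literal word 'arg'.
my (@matched, @failed);
for my $input ('f0()', 'f1(arg)', 'f2(arg arg)', 'f3(other)') {
    if (defined $parser->function_call($input)) { push @matched, $input }
    else                                        { push @failed,  $input }
}
print "matched: @matched\n";
print "failed:  @failed\n";
```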
892
893 Look-ahead
894
895 If a subrule, token, or action is prefixed by "...", then it is treated
896 as a "look-ahead" request. That means that the current production can
897 (as usual) only succeed if the specified item is matched, but that the
898 matching does not consume any of the text being parsed. This is very
899 similar to the "/(?=...)/" look-ahead construct in Perl patterns. Thus,
900 the rule:
901
902 inner_word: word ...word
903
904 will match whatever the subrule "word" matches, provided that match is
905 followed by some more text which subrule "word" would also match
906 (although this second substring is not actually consumed by
907 "inner_word").
908
909 Likewise, a "...!" prefix causes the following item to succeed (without
910 consuming any text) if and only if it would normally fail. Hence, a
911 rule such as:
912
913 identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/
914
915 matches a string of characters which satisfies the pattern
916 "/[A-Za-z_]\w*/", but only if the same sequence of characters would not
917 match either subrule "keyword" or the literal token '_'.
918
919 Sequences of look-ahead prefixes accumulate, multiplying their positive
920 and/or negative senses. Hence:
921
922 inner_word: word ...!......!word
923
924 is exactly equivalent to the original example above (a warning is
925 issued in cases like these, since they often indicate something left
926 out, or misunderstood).
927
928 Note that actions can also be treated as look-aheads. In such cases,
929 the state of the parser text (in the local variable $text) after the
930 look-ahead action is guaranteed to be identical to its state before the
931 action, regardless of how it's changed within the action (unless you
932 actually undefine $text, in which case you get the disaster you deserve
933 :-).
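
The following sketch illustrates both kinds of look-ahead. The "keyword" rule, the trailing action on "inner_word", and the input strings are assumptions chosen for the example. Passing a reference to the text lets the caller see what remains unconsumed:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
inner_word : word ...word { $item[1] }
word       : /\w+/
identifier : ...!keyword /[A-Za-z_]\w*/
keyword    : /(?:if|while)\b/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Positive look-ahead: the second word must be present but is not consumed.
my $text  = 'alpha beta';
my $inner = $parser->inner_word(\$text);
my $last  = $parser->inner_word('omega');    # no following word, so no match

# Negative look-ahead: an identifier must not be a keyword.
my $ident = $parser->identifier('whilst');
my $kw    = $parser->identifier('while');

print "inner word: $inner; remaining text: '$text'\n";
print "identifier: $ident\n";
```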
934
935 Directives
936
937 Directives are special pre-defined actions which may be used to alter
938 the behaviour of the parser. There are currently eighteen directives:
939 "<commit>", "<uncommit>", "<reject>", "<score>", "<autoscore>",
940 "<skip>", "<resync>", "<error>", "<rulevar>", "<matchrule>",
941 "<leftop>", "<rightop>", "<defer>", "<nocheck>", "<perl_quotelike>",
942 "<perl_codeblock>", "<perl_variable>", and "<token>".
943
944 Committing and uncommitting
945 The "<commit>" and "<uncommit>" directives permit the recursive
946 descent of the parse tree to be pruned (or "cut") for efficiency.
947 Within a rule, a "<commit>" directive instructs the rule to ignore
948 subsequent productions if the current production fails. For exam‐
949 ple:
950
951 command: 'find' <commit> filename
952 ⎪ 'open' <commit> filename
953 ⎪ 'move' filename filename
954
955 Clearly, if the leading token 'find' is matched in the first pro‐
956 duction but that production fails for some other reason, then the
957 remaining productions cannot possibly match. The presence of the
958 "<commit>" causes the "command" rule to fail immediately if an
959 invalid "find" command is found, and likewise if an invalid "open"
960 command is encountered.
961
962 It is also possible to revoke a previous commitment. For example:
963
964 if_statement: 'if' <commit> condition
965 'then' block <uncommit>
966 'else' block
967 ⎪ 'if' <commit> condition
968 'then' block
969
970 In this case, a failure to find an "else" block in the first pro‐
971 duction shouldn't preclude trying the second production, but a
972 failure to find a "condition" certainly should.
973
974 As a special case, any production in which the first item is an
975 "<uncommit>" immediately revokes a preceding "<commit>" (even
976 though the production would not otherwise have been tried). For
977 example, in the rule:
978
979 request: 'explain' expression
980 ⎪ 'explain' <commit> keyword
981 ⎪ 'save'
982 ⎪ 'quit'
983 ⎪ <uncommit> term '?'
984
985 if the text being matched was "explain?", and the first two produc‐
986 tions failed, then the "<commit>" in production two would cause
987 productions three and four to be skipped, but the leading
988 "<uncommit>" in production five would allow that production to
989 attempt a match.
990
991 Note in the preceding example, that the "<commit>" was only placed
992 in production two. If production one had been:
993
994 request: 'explain' <commit> expression
995
996 then production two would be (inappropriately) skipped if a leading
997 "explain..." was encountered.
998
999 Both "<commit>" and "<uncommit>" directives always succeed, and
1000 their value is always 1.
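
A small sketch of the pruning effect of "<commit>". The fallback production, the "filename" pattern, and the inputs are assumptions made for illustration. Without the "<commit>", the input "find 42" would fall through to the second production:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
command  : 'find' <commit> filename { "find $item[3]" }
         | /\w+/                    { "other $item[1]" }
filename : /\w+\.txt/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Matches the first production in full.
my $found = $parser->command('find report.txt');

# 'find' matches and commits, then filename fails: the rule fails outright
# instead of trying the second production.
my $stuck = $parser->command('find 42');

# 'find' does not match, so no commitment is made and production two is tried.
my $other = $parser->command('open 42');

print defined $_ ? "$_\n" : "(no match)\n" for $found, $stuck, $other;
```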
1001
1002 Rejecting a production
1003 The "<reject>" directive immediately causes the current production
1004 to fail (it is exactly equivalent to, but more obvious than, the
1005 action "{undef}"). A "<reject>" is useful when it is desirable to
1006 get the side effects of the actions in one production, without
1007 prejudicing a match by some other production later in the rule. For
1008 example, to insert tracing code into the parse:
1009
1010 complex_rule: { print "In complex rule...\n"; } <reject>
1011
1012 complex_rule: simple_rule '+' 'i' '*' simple_rule
1013 ⎪ 'i' '*' simple_rule
1014 ⎪ simple_rule
1015
1016 It is also possible to specify a conditional rejection, using the
1017 form "<reject:condition>", which only rejects if the specified con‐
1018 dition is true. This form of rejection is exactly equivalent to the
1019 action "{(condition)?undef:1}". For example:
1020
1021 command: save_command
1022 ⎪ restore_command
1023 ⎪ <reject: defined $::tolerant> { exit }
1024 ⎪ <error: Unknown command. Ignored.>
1025
1026 A "<reject>" directive never succeeds (and hence has no associated
1027 value). A conditional rejection may succeed (if its condition is
1028 not satisfied), in which case its value is 1.
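
As a brief sketch of a conditional rejection (the rule name "nonzero" and the inputs are hypothetical), the following rule accepts a run of digits only if its numeric value is not zero. Because a successful "<reject:...>" has the value 1, a trailing action is used to return the matched text instead:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
nonzero : /\d+/ <reject: $item[1] == 0> { $item[1] }
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $kept     = $parser->nonzero('42');    # condition false: production continues
my $rejected = $parser->nonzero('000');   # condition true: production fails

print 'kept: ', $kept, "\n";
print 'rejected: ', (defined $rejected ? $rejected : '(no match)'), "\n";
```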
1029
1030 As an extra optimization, "Parse::RecDescent" ignores any produc‐
1031 tion which begins with an unconditional "<reject>" directive, since
1032 any such production can never successfully match or have any useful
1033 side-effects. A level 1 warning is issued in all such cases.
1034
1035 Note that productions beginning with conditional "<reject:...>"
1036 directives are never "optimized away" in this manner, even if they
1037 are always guaranteed to fail (for example: "<reject:1>").
1038
1039 Due to the way grammars are parsed, there is a minor restriction on
1040 the condition of a conditional "<reject:...>": it cannot contain
1041 any raw '<' or '>' characters. For example:
1042
1043 line: cmd <reject: $thiscolumn > max> data
1044
1045 results in an error when a parser is built from this grammar (since
1046 the grammar parser has no way of knowing whether the first > is a
1047 "greater than" or the end of the "<reject:...>").
1048
1049 To overcome this problem, put the condition inside a do{} block:
1050
1051 line: cmd <reject: do{$thiscolumn > max}> data
1052
1053 Note that the same problem may occur in other directives that take
1054 arguments. The same solution will work in all cases.
1055
1056 Skipping between terminals
1057 The "<skip>" directive enables the terminal prefix used in a pro‐
1058 duction to be changed. For example:
1059
1060 OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/
1061
1062 causes only blanks and tabs to be skipped before terminals in the
1063 "Arg" subrule (and any of its subrules), and also before the final
1064 "/;/" terminal. Once the production is complete, the previous ter‐
1065 minal prefix is reinstated. Note that this implies that distinct
1066 productions of a rule must reset their terminal prefixes individu‐
1067 ally.
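
The effect of changing the prefix can be seen in a sketch such as the following, where the rule names and the two-line input are assumptions made for illustration. With the default prefix a newline is skipped like any other whitespace; with the prefix restricted to blanks and tabs, matching stops at the end of the first line:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
first_line : <skip: '[ \t]*'> word(s) { $item[2] }
all_words  : word(s)
word       : /\w+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $input = "a b\nc d";
my $first = $parser->first_line($input);   # newline is not skipped here
my $all   = $parser->all_words($input);    # default prefix skips the newline

print "first line: @$first\n";
print "all words:  @$all\n";
```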
1068
1069 The "<skip>" directive evaluates to the previous terminal prefix,
1070 so it's easy to reinstate a prefix later in a production:
1071
1072 Command: <skip:","> CSV(s) <skip:$item[1]> Modifier
1073
1074 The value specified after the colon is interpolated into a pattern,
1075 so all of the following are equivalent (though their efficiency
1076 increases down the list):
1077
1078 <skip: "$colon⎪$comma"> # ASSUMING THE VARS HOLD THE OBVIOUS VALUES
1079
1080 <skip: ':⎪,'>
1081
1082 <skip: q{[:,]}>
1083
1084 <skip: qr/[:,]/>
1085
1086 There is no way of directly setting the prefix for an entire rule,
1087 except as follows:
1088
1089 Rule: <skip: '[ \t]*'> Prod1
1090 ⎪ <skip: '[ \t]*'> Prod2a Prod2b
1091 ⎪ <skip: '[ \t]*'> Prod3
1092
1093 or, better:
1094
1095 Rule: <skip: '[ \t]*'>
1096 (
1097 Prod1
1098 ⎪ Prod2a Prod2b
1099 ⎪ Prod3
1100 )
1101
1102 Note: Up to release 1.51 of Parse::RecDescent, an entirely differ‐
1103 ent mechanism was used for specifying terminal prefixes. The cur‐
1104 rent method is not backwards-compatible with that early approach.
1105 The current approach is stable and will not change again.
1106
1107 Resynchronization
1108 The "<resync>" directive provides a visually distinctive means of
1109 consuming some of the text being parsed, usually to skip an erro‐
1110 neous input. In its simplest form "<resync>" simply consumes text
1111 up to and including the next newline ("\n") character, succeeding
1112 only if the newline is found, in which case it causes its surround‐
1113 ing rule to return zero on success.
1114
1115 In other words, a "<resync>" is exactly equivalent to the token
1116 "/[^\n]*\n/" followed by the action "{ $return = 0 }" (except that
1117 productions beginning with a "<resync>" are ignored when generating
1118 error messages). A typical use might be:
1119
1120 script : command(s)
1121
1122 command: save_command
1123 ⎪ restore_command
1124 ⎪ <resync> # TRY NEXT LINE, IF POSSIBLE
1125
1126 It is also possible to explicitly specify a resynchronization pat‐
1127 tern, using the "<resync:pattern>" variant. This version succeeds
1128 only if the specified pattern matches (and consumes) the parsed
1129 text. In other words, "<resync:pattern>" is exactly equivalent to
1130 the token "/pattern/" (followed by a "{ $return = 0 }" action). For
1131 example, if commands were terminated by newlines or semi-colons:
1132
1133 command: save_command
1134 ⎪ restore_command
1135 ⎪ <resync:[^;\n]*[;\n]>
1136
1137 The value of a successfully matched "<resync>" directive (of either
1138 type) is the text that it consumed. Note, however, that since the
1139 directive also sets $return, a production consisting of a lone
1140 "<resync>" succeeds but returns the value zero (which a calling
1141 rule may find useful to distinguish between "true" matches and
1142 "tolerant" matches). Remember that returning a zero value
1143 indicates that the rule succeeded (since only an "undef" denotes
1144 failure within "Parse::RecDescent" parsers).
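
A minimal sketch of tolerant parsing with "<resync>", assuming a hypothetical one-command grammar and an input containing one bad line. Since a resynchronized command returns zero, the caller can filter the results on truth to keep only the "true" matches:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
script  : command(s)
command : 'save' /\w+/ { $item[2] }
        | <resync>
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $results = $parser->script("save a\nbogus line\nsave b\n")
    or die "Parse failed\n";

# Entries produced by <resync> are 0; real commands return the saved name.
my @saved = grep { $_ } @$results;
print "saved: @saved\n";
```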
1145
1146 Error handling
1147 The "<error>" directive provides automatic or user-defined genera‐
1148 tion of error messages during a parse. In its simplest form
1149 "<error>" prepares an error message based on the mismatch between
1150 the last item expected and the text which caused it to fail. For
1151 example, given the rule:
1152
1153 McCoy: curse ',' name ", I'm a doctor, not a" a_profession '!'
1154 ⎪ pronoun 'dead,' name '!'
1155 ⎪ <error>
1156
1157 the following strings would produce the following messages:
1158
1159 "Amen, Jim!"
1160 ERROR (line 1): Invalid McCoy: Expected curse or pronoun
1161 not found
1162
1163 "Dammit, Jim, I'm a doctor!"
1164 ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
1165 but found ", I'm a doctor!" instead
1166
1167 "He's dead,\n"
1168 ERROR (line 2): Invalid McCoy: Expected name not found
1169
1170 "He's alive!"
1171 ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
1172 "alive!" instead
1173
1174 "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
1175 ERROR (line 1): Invalid McCoy: Expected a profession but found
1176 "pointy-eared Vulcan!" instead
1177
1178 Note that, when autogenerating error messages, all underscores in
1179 any rule name used in a message are replaced by single spaces (for
1180 example "a_production" becomes "a production"). Judicious choice of
1181 rule names can therefore considerably improve the readability of
1182 automatic error messages (as well as the maintainability of the
1183 original grammar).
1184
1185 If the automatically generated error is not sufficient, it is pos‐
1186 sible to provide an explicit message as part of the error direc‐
1187 tive. For example:
1188
1189 Spock: "Fascinating" ',' (name ⎪ 'Captain') '.'
1190 ⎪ "Highly illogical, doctor."
1191 ⎪ <error: He never said that!>
1192
1193 which would result in all failures to parse a "Spock" subrule
1194 printing the following message:
1195
1196 ERROR (line <N>): Invalid Spock: He never said that!
1197
1198 The error message is treated as a "qq{...}" string and interpolated
1199 when the error is generated (not when the directive is specified!).
1200 Hence:
1201
1202 <error: Mystical error near "$text">
1203
1204 would correctly insert the ambient text string which caused the
1205 error.
1206
1207 There are two other forms of error directive: "<error?>" and
1208 "<error?: msg>". These behave just like "<error>" and
1209 "<error: msg>" respectively, except that they are only triggered if
1210 the rule is "committed" at the time they are encountered. For exam‐
1211 ple:
1212
1213 Scotty: "Ya kenna change the Laws o' Phusics," <commit> name
1214 ⎪ name <commit> ',' "she's goanta blaw!"
1215 ⎪ <error?>
1216
1217 will only generate an error for a string beginning with "Ya kenna
1218 change the Laws o' Phusics," or a valid name, but which still fails
1219 to match the corresponding production. That is,
1220 "$parser->Scotty("Aye, Cap'ain")" will fail silently (since neither
1221 production will "commit" the rule on that input), whereas
1222 "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")" will fail with
1223 the error message:
1224
1225 ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
1226 but found 'ah jest kenna do'ut!' instead.
1227
1228 since in that case the second production would commit after match‐
1229 ing the leading name.
1230
1231 Note that to allow this behaviour, all "<error>" directives which
1232 are the first item in a production automatically uncommit the rule
1233 just long enough to allow their production to be attempted (that
1234 is, when their production fails, the commitment is reinstated so
1235 that subsequent productions are skipped).
1236
1237 In order to permanently uncommit the rule before an error message,
1238 it is necessary to put an explicit "<uncommit>" before the
1239 "<error>". For example:
1240
1241 line: 'Kirk:' <commit> Kirk
1242 ⎪ 'Spock:' <commit> Spock
1243 ⎪ 'McCoy:' <commit> McCoy
1244 ⎪ <uncommit> <error?> <reject>
1245 ⎪ <resync>
1246
1247 Error messages generated by the various "<error...>" directives are
1248 not displayed immediately. Instead, they are "queued" in a buffer
1249 and are only displayed once parsing ultimately fails. Moreover,
1250 "<error...>" directives that cause one production of a rule to fail
1251 are automatically removed from the message queue if another produc‐
1252 tion subsequently causes the entire rule to succeed. This means
1253 that you can put "<error...>" directives wherever useful diagnosis
1254 can be done, and only those associated with actual parser failure
1255 will ever be displayed. Also see "Gotchas".
1256
1257 As a general rule, the most useful diagnostics are usually gener‐
1258 ated either at the very lowest level within the grammar, or at the
1259 very highest. A good rule of thumb is to identify those subrules
1260 which consist mainly (or entirely) of terminals, and then put an
1261 "<error...>" directive at the end of any other rule which calls one
1262 or more of those subrules.
1263
1264 There is one other situation in which the output of the various
1265 types of error directive is suppressed; namely, when the rule con‐
1266 taining them is being parsed as part of a "look-ahead" (see
1267 "Look-ahead"). In this case, the error directive will still cause
1268 the rule to fail, but will do so silently.
1269
1270 An unconditional "<error>" directive always fails (and hence has no
1271 associated value). This means that encountering such a directive
1272 always causes the production containing it to fail. Hence an
1273 "<error>" directive will inevitably be the last (useful) item of a
1274 rule (a level 3 warning is issued if a production contains items
1275 after an unconditional "<error>" directive).
1276
1277 An "<error?>" directive will succeed (that is: fail to fail :-), if
1278 the current rule is uncommitted when the directive is encountered.
1279 In that case the directive's associated value is zero. Hence, this
1280 type of error directive can be used before the end of a production.
1281 For example:
1282
1283 command: 'do' <commit> something
1284 ⎪ 'report' <commit> something
1285 ⎪ <error?: Syntax error> <error: Unknown command>
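
A sketch of the same pattern as a complete script, assuming a hypothetical "thing" rule in place of "something" and invented inputs. Both failing calls return "undef"; the queued message differs depending on whether the rule was committed when the last production was reached. The messages themselves are written to STDERR:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
command : 'do' <commit> thing
        | 'report' <commit> thing
        | <error?: Missing argument> <error: Unknown command>
thing   : /\w+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Succeeds: the rule's value is that of its last item.
my $ok = $parser->command('do backup');

# 'do' matches and commits, then 'thing' fails: <error?: ...> is triggered.
my $committed = $parser->command('do');

# Nothing commits, so <error?: ...> succeeds and <error: ...> is triggered.
my $unknown = $parser->command('dance');

print "ok: $ok\n";
```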
1286
1287 Warning: The "<error?>" directive does not mean "always fail (but
1288 do so silently unless committed)". It actually means "only fail
1289 (and report) if committed, otherwise succeed". To achieve the "fail
1290 silently if uncommitted" semantics, it is necessary to use:
1291
1292 rule: item <commit> item(s)
1293 ⎪ <error?> <reject> # FAIL SILENTLY UNLESS COMMITTED
1294
1295 However, because people seem to expect a lone "<error?>" directive
1296 to work like this:
1297
1298 rule: item <commit> item(s)
1299 ⎪ <error?: Error message if committed>
1300 ⎪ <error: Error message if uncommitted>
1301
1302 Parse::RecDescent automatically appends a "<reject>" directive if
1303 the "<error?>" directive is the only item in a production. A level
1304 2 warning (see below) is issued when this happens.
1305
1306 The level of error reporting during both parser construction and
1307 parsing is controlled by the presence or absence of four global
1308 variables: $::RD_ERRORS, $::RD_WARN, $::RD_HINT, and $::RD_TRACE.
1309 If $::RD_ERRORS is defined (and, by default, it is) then fatal
1310 errors are reported.
1311
1312 Whenever $::RD_WARN is defined, certain non-fatal problems are also
1313 reported. Warnings have an associated "level": 1, 2, or 3. The
1314 higher the level, the more serious the warning. The value of the
1315 corresponding global variable ($::RD_WARN) determines the lowest
1316 level of warning to be displayed. Hence, to see all warnings, set
1317 $::RD_WARN to 1. To see only the most serious warnings set
1318 $::RD_WARN to 3. By default $::RD_WARN is initialized to 3, ensur‐
1319 ing that serious but non-fatal errors are automatically reported.
1320
1321 See "DIAGNOSTICS" for a list of the various error and warning
1322 messages that Parse::RecDescent generates when these two variables are
1323 defined.
1324
1325 Defining any of the remaining variables (which are not defined by
1326 default) further increases the amount of information reported.
1327 Defining $::RD_HINT causes the parser generator to offer more
1328 detailed analyses and hints on both errors and warnings. Note that
1329 setting $::RD_HINT at any point automagically sets $::RD_WARN to 1.
1330
1331 Defining $::RD_TRACE causes the parser generator and the parser to
1332 report their progress to STDERR in excruciating detail (although,
1333 without hints unless $::RD_HINT is separately defined). This detail
1334 can be moderated in only one respect: if $::RD_TRACE has an integer
1335 value (N) greater than 1, only the N characters of the "current
1336 parsing context" (that is, where in the input string we are at any
1337 point in the parse) are reported at any time.
1338 $::RD_TRACE is mainly useful for debugging a grammar that
1339 isn't behaving as you expected it to. To this end, if $::RD_TRACE
1340 is defined when a parser is built, any actual parser code which is
1341 generated is also written to a file named "RD_TRACE" in the local
1342 directory.
1343
1344 Note that the four variables belong to the "main" package, which
1345 makes them easier to refer to in the code controlling the parser,
1346 and also makes it easy to turn them into command line flags
1347 ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.
1348
1349 Specifying local variables
1350 It is occasionally convenient to specify variables which are local
1351 to a single rule. This may be achieved by including a "<rule‐
1352 var:...>" directive anywhere in the rule. For example:
1353
1354 markup: <rulevar: $tag>
1355
1356 markup: tag {($tag=$item[1]) =~ s/^<⎪>$//g} body[$tag]
1357
1358 The example "<rulevar: $tag>" directive causes a "my" variable
1359 named $tag to be declared at the start of the subroutine implement‐
1360 ing the "markup" rule (that is, before the first production,
1361 regardless of where in the rule it is specified).
1362
1363 Specifically, any directive of the form: "<rulevar:text>" causes a
1364 line of the form "my text;" to be added at the beginning of the
1365 rule subroutine, immediately after the definitions of the following
1366 local variables:
1367
1368 $thisparser $commit
1369 $thisrule @item
1370 $thisline @arg
1371 $text %arg
1372
1373 This means that the following "<rulevar>" directives work as
1374 expected:
1375
1376 <rulevar: $count = 0 >
1377
1378 <rulevar: $firstarg = $arg[0] ⎪⎪ '' >
1379
1380 <rulevar: $myItems = \@item >
1381
1382 <rulevar: @context = ( $thisline, $text, @arg ) >
1383
1384 <rulevar: ($name,$age) = @arg{"name","age"} >
1385
1386 If a variable that is also visible to subrules is required, it
1387 needs to be "local"'d, not "my"'d. "rulevar" defaults to "my", but
1388 if "local" is explicitly specified:
1389
1390 <rulevar: local $count = 0 >
1391
1392 then a "local"-ized variable is declared instead, and will be
1393 available within subrules.
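
A minimal sketch of a "local" rule variable shared with a subrule. The rule names "tally" and "element" and the inputs are hypothetical. The counter is reinitialized each time "tally" is invoked, and the "element" subrule increments it:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
tally   : <rulevar: local $count = 0>
tally   : element(s) { $count }
element : /\w+/ { $count++; $item[1] }
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $three = $parser->tally('a b c');
my $two   = $parser->tally('x y');    # the count starts again from zero

print "first call counted $three, second call counted $two\n";
```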
1394
1395 Note however that, because all such variables are "my" variables,
1396 their values do not persist between match attempts on a given rule.
1397 To preserve values between match attempts, values can be stored
1398 within the "local" member of the $thisrule object:
1399
1400 countedrule: { $thisrule->{"local"}{"count"}++ }
1401 <reject>
1402 ⎪ subrule1
1403 ⎪ subrule2
1404 ⎪ <reject: $thisrule->{"local"}{"count"} == 1>
1405 subrule3
1406
1407 When matching a rule, each "<rulevar>" directive is matched as if
1408 it were an unconditional "<reject>" directive (that is, it causes
1409 any production in which it appears to immediately fail to match).
1410 For this reason (and to improve readability) it is usual to specify
1411 any "<rulevar>" directive in a separate production at the start of
1412 the rule (this has the added advantage that it enables
1413 "Parse::RecDescent" to optimize away such productions, just as it
1414 does for the "<reject>" directive).
1415
1416 Dynamically matched rules
1417 Because regexes and double-quoted strings are interpolated, it is
1418 relatively easy to specify productions with "context sensitive"
1419 tokens. For example:
1420
1421 command: keyword body "end $item[1]"
1422
1423 which ensures that a command block is bounded by a "<keyword>...end
1424 <same keyword>" pair.
1425
1426 Building productions in which subrules are context sensitive is
1427 also possible, via the "<matchrule:...>" directive. This directive
1428 behaves identically to a subrule item, except that the rule which
1429 is invoked to match it is determined by the string specified after
1430 the colon. For example, we could rewrite the "command" rule like
1431 this:
1432
1433 command: keyword <matchrule:body> "end $item[1]"
1434
1435 Whatever appears after the colon in the directive is treated as an
1436 interpolated string (that is, as if it appeared in a "qq{...}"
1437 operator) and the value of that interpolated string is the name of the
1438 subrule to be matched.
1439
1440 Of course, just putting a constant string like "body" in a
1441 "<matchrule:...>" directive is of little interest or benefit. The
1442 power of the directive is seen when we use a string that interpolates
1443 to something interesting. For example:
1444
1445 command: keyword <matchrule:$item[1]_body> "end $item[1]"
1446
1447 keyword: 'while' ⎪ 'if' ⎪ 'function'
1448
1449 while_body: condition block
1450
1451 if_body: condition block ('else' block)(?)
1452
1453 function_body: arglist block
1454
1455 Now the "command" rule selects how to proceed on the basis of the
1456 keyword that is found. It is as if "command" were declared:
1457
1458 command: 'while' while_body "end while"
1459 ⎪ 'if' if_body "end if"
1460 ⎪ 'function' function_body "end function"
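
A reduced sketch of this dispatching technique, using two hypothetical keywords with deliberately trivial bodies so that the selection of the body rule is easy to observe:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'GRAMMAR';
command    : keyword <matchrule:$item[1]_body> "end $item[1]"
             { [ $item[1], $item[2] ] }
keyword    : 'while' | 'print'
while_body : /\d+/
print_body : /[a-z]+/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The keyword selects which *_body rule is applied to the text that follows.
my $loop   = $parser->command('while 10 end while');
my $output = $parser->command('print hello end print');

# 'hello' is not a valid while_body, and the closing keyword must agree.
my $wrong_body = $parser->command('while hello end while');
my $wrong_end  = $parser->command('while 10 end print');

print "matched: @$loop\n";
print "matched: @$output\n";
```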
1461
1462 When a "<matchrule:...>" directive is used as a repeated subrule,
1463 the rule name expression is "late-bound". That is, the name of the
1464 rule to be called is re-evaluated each time a match attempt is
1465 made. Hence, the following grammar:
1466
1467 { $::species = 'dogs' }
1468
1469 pair: 'two' <matchrule:$::species>(s)
1470
1471 dogs: /dogs/ { $::species = 'cats' }
1472
1473 cats: /cats/
1474
1475 will match the string "two dogs cats cats" completely, whereas it
1476 will only match the string "two dogs dogs dogs" up to the eighth
1477 letter. If the rule name were "early bound" (that is, evaluated
1478 only the first time the directive is encountered in a production),
1479 the reverse behaviour would be expected.
1480
1481 Note that the "matchrule" directive takes a string that is to be
1482 treated as a rule name, not as a rule invocation. That is, it's
1483 like a Perl symbolic reference, not an "eval". Just as you can say:
1484
1485 $subname = 'foo';
1486
1487 # and later...
1488
1489 &{$subname}(@args);
1490
1491 but not:
1492
1493 $subname = 'foo(@args)';
1494
1495 # and later...
1496
1497 &{$subname};
1498
1499 likewise you can say:
1500
1501 $rulename = 'foo';
1502
1503 # and in the grammar...
1504
1505 <matchrule:$rulename>[@args]
1506
1507 but not:
1508
1509 $rulename = 'foo[@args]';
1510
1511 # and in the grammar...
1512
1513 <matchrule:$rulename>
1514
1515 Deferred actions
1516 The "<defer:...>" directive is used to specify an action to be per‐
1517 formed when (and only if!) the current production ultimately suc‐
1518 ceeds.
1519
1520 Whenever a "<defer:...>" directive appears, the code it specifies
1521 is converted to a closure (an anonymous subroutine reference) which
1522 is queued within the active parser object. Note that, because the
1523 deferred code is converted to a closure, the values of any "local"
1524 variable (such as $text, @item, etc.) are preserved until the
1525 deferred code is actually executed.
1526
1527 If the parse ultimately succeeds and the production in which the
1528 "<defer:...>" directive was evaluated formed part of the successful
1529 parse, then the deferred code is executed immediately before the
1530 parse returns. If however the production which queued a deferred
1531 action fails, or one of the higher-level rules which called that
1532 production fails, then the deferred action is removed from the
1533 queue, and hence is never executed.
1534
1535 For example, given the grammar:
1536
1537 sentence: noun trans noun
1538 ⎪ noun intrans
1539
1540 noun: 'the dog'
1541 { print "$item[1]\t(noun)\n" }
1542 ⎪ 'the meat'
1543 { print "$item[1]\t(noun)\n" }
1544
1545 trans: 'ate'
1546 { print "$item[1]\t(transitive)\n" }
1547
1548 intrans: 'ate'
1549 { print "$item[1]\t(intransitive)\n" }
1550 ⎪ 'barked'
1551 { print "$item[1]\t(intransitive)\n" }
1552
1553 then parsing the sentence "the dog ate" would produce the output:
1554
1555 the dog (noun)
1556 ate (transitive)
1557 the dog (noun)
1558 ate (intransitive)
1559
1560 This is because, even though the first production of "sentence"
1561 ultimately fails, its initial subrules "noun" and "trans" do match,
1562 and hence they execute their associated actions. Then the second
1563 production of "sentence" succeeds, causing the actions of the sub‐
1564 rules "noun" and "intrans" to be executed as well.
1565
1566 On the other hand, if the actions were replaced by "<defer:...>"
1567 directives:
1568
1569 sentence: noun trans noun
1570 ⎪ noun intrans
1571
1572 noun: 'the dog'
1573 <defer: print "$item[1]\t(noun)\n" >
1574 ⎪ 'the meat'
1575 <defer: print "$item[1]\t(noun)\n" >
1576
1577 trans: 'ate'
1578 <defer: print "$item[1]\t(transitive)\n" >
1579
1580 intrans: 'ate'
1581 <defer: print "$item[1]\t(intransitive)\n" >
1582 ⎪ 'barked'
1583 <defer: print "$item[1]\t(intransitive)\n" >
1584
1585 the output would be:
1586
1587 the dog (noun)
1588 ate (intransitive)
1589
1590 since deferred actions are only executed if they were evaluated in
1591 a production which ultimately contributes to the successful parse.
1592
1593 In this case, even though the first production of "sentence" caused
1594 the subrules "noun" and "trans" to match, that production ulti‐
1595 mately failed and so the deferred actions queued by those subrules
1596 were subsequently discarded. The second production then succeeded,
1597 causing the entire parse to succeed, and so the deferred actions
1598 queued by the (second) match of the "noun" subrule and the subse‐
1599 quent match of "intrans" are preserved and eventually executed.
1600
1601 Deferred actions provide a means of improving the performance of a
1602 parser, by only executing those actions which are part of the final
1603 parse-tree for the input data.
1604
1605 Alternatively, deferred actions can be viewed as a mechanism for
1606 building (and executing) a customized subroutine corresponding to
1607 the given input data, much in the same way that autoactions (see
1608 "Autoactions") can be used to build a customized data structure for
1609 specific input.
1610
1611 Whether or not the action it specifies is ever executed, a
1612 "<defer:...>" directive always succeeds, returning the number of
1613 deferred actions currently queued at that point.
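
The contrast between immediate and deferred actions can be sketched in a single grammar by attaching both kinds of action to each production. This is a reduced variant of the example above; the arrays @eager and @lazy are assumptions introduced to record what runs:

```perl
use strict;
use warnings;
use Parse::RecDescent;

our (@eager, @lazy);    # filled in by the grammar's actions

my $grammar = <<'GRAMMAR';
sentence : noun trans noun
         | noun intrans
noun     : 'the dog' { push @main::eager, 'noun' }
                     <defer: push @main::lazy, 'noun'>
trans    : 'ate'     { push @main::eager, 'trans' }
                     <defer: push @main::lazy, 'trans'>
intrans  : 'ate'     { push @main::eager, 'intrans' }
                     <defer: push @main::lazy, 'intrans'>
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $result = $parser->sentence('the dog ate');

# Immediate actions also ran during the failed first production;
# deferred actions ran only for the production that finally matched.
print "immediate: @eager\n";
print "deferred:  @lazy\n";
```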
1614
1615 Parsing Perl
1616 Parse::RecDescent provides limited support for parsing subsets of
1617 Perl, namely: quote-like operators, Perl variables, and complete
1618 code blocks.
1619
1620 The "<perl_quotelike>" directive can be used to parse any Perl
1621 quote-like operator: 'a string', "m/a pattern/", "tr{ans}{lation}",
1622 etc. It does this by calling Text::Balanced::extract_quotelike().
1623
1624 If a quote-like operator is found, a reference to an array of eight
1625 elements is returned. Those elements are identical to the last
1626 eight elements returned by Text::Balanced::extract_quotelike() in
1627 an array context, namely:
1628
1629 [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr'
1630 -- if the operator was named; otherwise "undef",
1631
1632 [1] the left delimiter of the first block of the operation,
1633
1634 [2] the text of the first block of the operation (that is, the
1635 contents of a quote, the regex of a match or substitution, or the
1636 target list of a translation),
1637
1638 [3] the right delimiter of the first block of the operation,
1639
1640 [4] the left delimiter of the second block of the operation if
1641 there is one (that is, if it is a "s", "tr", or "y"); otherwise
1642 "undef",
1643
1644 [5] the text of the second block of the operation if there is one
1645 (that is, the replacement of a substitution or the translation
1646 list of a translation); otherwise "undef",
1647
1648 [6] the right delimiter of the second block of the operation (if
1649 any); otherwise "undef",
1650
1651 [7] the trailing modifiers on the operation (if any); otherwise
1652 "undef".
1653
1654 If a quote-like expression is not found, the directive fails with
1655 the usual "undef" value.
1656
1657 The "<perl_variable>" directive can be used to parse any Perl vari‐
1658 able: $scalar, @array, %hash, $ref->{field}[$index], etc. It does
1659 this by calling Text::Balanced::extract_variable().
1660
1661 If the directive matches text representing a valid Perl variable
1662 specification, it returns that text. Otherwise it fails with the
1663 usual "undef" value.
1664
The "<perl_codeblock>" directive can be used to parse a
curly-brace-delimited block of Perl code, such as:
{ $a = 1; f() =~ m/pat/; }. It does this by calling
Text::Balanced::extract_codeblock().
1668
1669 If the directive matches text representing a valid Perl code block,
1670 it returns that text. Otherwise it fails with the usual "undef"
1671 value.
1672
1673 You can also tell it what kind of brackets to use as the outermost
1674 delimiters. For example:
1675
1676 arglist: <perl_codeblock ()>
1677
causes an arglist to match a Perl code block whose outermost
delimiters are "(...)" (rather than the default "{...}").
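
The three directives can be tried side by side. This is a minimal
sketch rather than a recommended grammar: the rule names (quoted,
variable, block, arglist) and the sample inputs are assumptions chosen
for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    quoted:   <perl_quotelike>
    variable: <perl_variable>
    block:    <perl_codeblock>
    arglist:  <perl_codeblock ()>
}) or die "Bad grammar";

# <perl_quotelike> yields a reference to the eight-element array.
my $parts = $parser->quoted('s{foo}{bar}gi')
    or die "No quote-like operator found";
my ($name, $pattern, $replacement, $modifiers) = @{$parts}[0, 2, 5, 7];
print "operator=$name pattern=$pattern ",
      "replacement=$replacement modifiers=$modifiers\n";

# The other directives return the matched source text itself.
my $variable = $parser->variable('$ref->{field}[$index]');
my $block    = $parser->block('{ $total = $total + 1; }');
my $arglist  = $parser->arglist('($x, $y + 1)');
print "$_\n" for grep { defined } $variable, $block, $arglist;
```

The inputs are single-quoted so that Perl does not interpolate them
before the parser sees them.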
1680
1681 Constructing tokens
1682 Eventually, Parse::RecDescent will be able to parse tokenized
1683 input, as well as ordinary strings. In preparation for this joyous
1684 day, the "<token:...>" directive has been provided. This directive
1685 creates a token which will be suitable for input to a Parse::RecDe‐
1686 scent parser (when it eventually supports tokenized input).
1687
1688 The text of the token is the value of the immediately preceding
1689 item in the production. A "<token:...>" directive always succeeds
1690 with a return value which is the hash reference that is the new
1691 token. It also sets the return value for the production to that
1692 hash ref.
1693
1694 The "<token:...>" directive makes it easy to build a Parse::RecDes‐
1695 cent-compatible lexer in Parse::RecDescent:
1696
    my $lexer = new Parse::RecDescent q
    {
        lex:    token(s)

        token:  /a\b/                      <token:INDEF>
             |  /the\b/                    <token:DEF>
             |  /fly\b/                    <token:NOUN,VERB>
             |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
             |  <error: Unknown token>

    };
1708
1709 which will eventually be able to be used with a regular
1710 Parse::RecDescent grammar:
1711
    my $parser = new Parse::RecDescent q
    {
        startrule: subrule1 subrule2

        # ETC...
    };
1718
1719 either with a pre-lexing phase:
1720
1721 $parser->startrule( $lexer->lex($data) );
1722
1723 or with a lex-on-demand approach:
1724
1725 $parser->startrule( sub{$lexer->token(\$data)} );
1726
1727 But at present, only the "<token:...>" directive is actually imple‐
1728 mented. The rest is vapourware.
1729
1730 Specifying operations
1731 One of the commonest requirements when building a parser is to
1732 specify binary operators. Unfortunately, in a normal grammar, the
1733 rules for such things are awkward:
1734
1735 disjunction: conjunction ('or' conjunction)(s?)
1736 { $return = [ $item[1], @{$item[2]} ] }
1737
1738 conjunction: atom ('and' atom)(s?)
1739 { $return = [ $item[1], @{$item[2]} ] }
1740
1741 or inefficient:
1742
    disjunction: conjunction 'or' disjunction
                     { $return = [ $item[1], @{$item[3]} ] }
               | conjunction
                     { $return = [ $item[1] ] }

    conjunction: atom 'and' conjunction
                     { $return = [ $item[1], @{$item[3]} ] }
               | atom
                     { $return = [ $item[1] ] }
1752
1753 and either way is ugly and hard to get right.
1754
1755 The "<leftop:...>" and "<rightop:...>" directives provide an easier
1756 way of specifying such operations. Using "<leftop:...>" the above
1757 examples become:
1758
1759 disjunction: <leftop: conjunction 'or' conjunction>
1760 conjunction: <leftop: atom 'and' atom>
1761
1762 The "<leftop:...>" directive specifies a left-associative binary
1763 operator. It is specified around three other grammar elements
1764 (typically subrules or terminals), which match the left operand,
1765 the operator itself, and the right operand respectively.
1766
1767 A "<leftop:...>" directive such as:
1768
1769 disjunction: <leftop: conjunction 'or' conjunction>
1770
1771 is converted to the following:
1772
1773 disjunction: ( conjunction ('or' conjunction)(s?)
1774 { $return = [ $item[1], @{$item[2]} ] } )
1775
1776 In other words, a "<leftop:...>" directive matches the left operand
1777 followed by zero or more repetitions of both the operator and the
1778 right operand. It then flattens the matched items into an anonymous
1779 array which becomes the (single) value of the entire "<leftop:...>"
1780 directive.
1781
For example, a "<leftop:...>" directive such as:
1783
1784 output: <leftop: ident '<<' expr >
1785
1786 when given a string such as:
1787
1788 cout << var << "str" << 3
1789
1790 would match, and $item[1] would be set to:
1791
1792 [ 'cout', 'var', '"str"', '3' ]
1793
1794 In other words:
1795
1796 output: <leftop: ident '<<' expr >
1797
1798 is equivalent to a left-associative operator:
1799
    output: ident                                { $return = [$item[1]]       }
          | ident '<<' expr                      { $return = [@item[1,3]]     }
          | ident '<<' expr '<<' expr            { $return = [@item[1,3,5]]   }
          | ident '<<' expr '<<' expr '<<' expr  { $return = [@item[1,3,5,7]] }
          # ...etc...
1805
1806 Similarly, the "<rightop:...>" directive takes a left operand, an
1807 operator, and a right operand:
1808
1809 assign: <rightop: var '=' expr >
1810
1811 and converts them to:
1812
1813 assign: ( (var '=' {$return=$item[1]})(s?) expr
1814 { $return = [ @{$item[1]}, $item[2] ] } )
1815
1816 which is equivalent to a right-associative operator:
1817
    assign: var                                  { $return = [$item[1]]       }
          | var '=' expr                         { $return = [@item[1,3]]     }
          | var '=' var '=' expr                 { $return = [@item[1,3,5]]   }
          | var '=' var '=' var '=' expr         { $return = [@item[1,3,5,7]] }
          # ...etc...
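
Both directives can be exercised in one small grammar. This is a
sketch only: the supporting rules (ident, expr, var, value) are
assumptions made up for the example, and "value" stands in for the
right operand so that the two operator rules do not share an "expr"
rule.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    output: <leftop: ident '<<' expr>
    assign: <rightop: var '=' value>

    ident:  /[a-z_]\w*/i
    expr:   /"[^"]*"/ | /\d+/ | ident
    var:    /[a-z_]\w*/i
    value:  /\d+/
}) or die "Bad grammar";

# Each directive returns a single array reference holding the operands.
my $operands = $parser->output('cout << var << "str" << 3')
    or die "output did not match";
print join(', ', @$operands), "\n";     # cout, var, "str", 3

my $targets = $parser->assign('a = b = 42')
    or die "assign did not match";
print join(', ', @$targets), "\n";      # a, b, 42
```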
1823
1824 Note that for both the "<leftop:...>" and "<rightop:...>" direc‐
1825 tives, the directive does not normally return the operator itself,
1826 just a list of the operands involved. This is particularly handy
1827 for specifying lists:
1828
1829 list: '(' <leftop: list_item ',' list_item> ')'
1830 { $return = $item[2] }
1831
1832 There is, however, a problem: sometimes the operator is itself sig‐
1833 nificant. For example, in a Perl list a comma and a "=>" are both
1834 valid separators, but the "=>" has additional stringification
1835 semantics. Hence it's important to know which was used in each
1836 case.
1837
1838 To solve this problem the "<leftop:...>" and "<rightop:...>" direc‐
1839 tives do return the operator(s) as well, under two circumstances.
1840 The first case is where the operator is specified as a subrule. In
1841 that instance, whatever the operator matches is returned (on the
1842 assumption that if the operator is important enough to have its own
1843 subrule, then it's important enough to return).
1844
1845 The second case is where the operator is specified as a regular
1846 expression. In that case, if the first bracketed subpattern of the
1847 regular expression matches, that matching value is returned (this
1848 is analogous to the behaviour of the Perl "split" function, except
1849 that only the first subpattern is returned).
1850
1851 In other words, given the input:
1852
1853 ( a=>1, b=>2 )
1854
1855 the specifications:
1856
    list: '(' <leftop: list_item separator list_item> ')'

    separator: ',' | '=>'
1860
1861 or:
1862
    list: '(' <leftop: list_item /(,|=>)/ list_item> ')'
1864
1865 cause the list separators to be interleaved with the operands in
1866 the anonymous array in $item[2]:
1867
1868 [ 'a', '=>', '1', ',', 'b', '=>', '2' ]
1869
But the following version:

    list: '(' <leftop: list_item /,|=>/ list_item> ')'

returns only the operands:
1875
1876 [ 'a', '1', 'b', '2' ]
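
The effect of the capturing brackets can be compared directly. In this
sketch the "list_item" rule and the trailing action that returns
$item[2] are assumptions added so that each grammar is complete and the
array can be printed.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Two variants of the same rule: the operator pattern either captures or not.
my %grammar_for = (
    with_separators => q{
        list:      '(' <leftop: list_item /(,|=>)/ list_item> ')'
                   { $return = $item[2] }
        list_item: /\w+/
    },
    operands_only => q{
        list:      '(' <leftop: list_item /,|=>/ list_item> ')'
                   { $return = $item[2] }
        list_item: /\w+/
    },
);

my %items_for;
for my $variant (sort keys %grammar_for) {
    my $parser = Parse::RecDescent->new($grammar_for{$variant})
        or die "Bad grammar";
    my $items = $parser->list('( a=>1, b=>2 )') or die "Parse failed";
    $items_for{$variant} = [@$items];
    print "$variant: @$items\n";
}
```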
1877
1878 Of course, none of the above specifications handle the case of an
1879 empty list, since the "<leftop:...>" and "<rightop:...>" directives
1880 require at least a single right or left operand to match. To spec‐
1881 ify that the operator can match "trivially", it's necessary to add
1882 a "(?)" qualifier to the directive:
1883
    list: '(' <leftop: list_item /(,|=>)/ list_item>(?) ')'
1885
1886 Note that in almost all the above examples, the first and third
1887 arguments of the "<leftop:...>" directive were the same subrule.
1888 That is because "<leftop:...>"'s are frequently used to specify
1889 "separated" lists of the same type of item. To make such lists eas‐
1890 ier to specify, the following syntax:
1891
1892 list: element(s /,/)
1893
1894 is exactly equivalent to:
1895
1896 list: <leftop: element /,/ element>
1897
1898 Note that the separator must be specified as a raw pattern (i.e.
1899 not a string or subrule).
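
A minimal sketch of the shorthand follows; the "element" rule is an
assumption supplied to make the grammar complete.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    list:    element(s /,/)
    element: /\w+/
}) or die "Bad grammar";

# The separators are matched but only the elements are returned.
my $elements = $parser->list('alpha, beta, gamma') or die "Parse failed";
print "@$elements\n";    # alpha beta gamma
```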
1900
1901 Scored productions
1902 By default, Parse::RecDescent grammar rules always accept the first
1903 production that matches the input. But if two or more productions
1904 may potentially match the same input, choosing the first that does
1905 so may not be optimal.
1906
1907 For example, if you were parsing the sentence "time flies like an
1908 arrow", you might use a rule like this:
1909
    sentence: verb noun preposition article noun { [@item] }
            | adjective noun verb article noun   { [@item] }
            | noun verb preposition article noun { [@item] }
1913
1914 Each of these productions matches the sentence, but the third one
1915 is the most likely interpretation. However, if the sentence had
1916 been "fruit flies like a banana", then the second production is
1917 probably the right match.
1918
To cater for such situations, the "<score:...>" directive can be used.
The directive is equivalent to an unconditional "<reject>", except that
it allows you to specify a "score" for the current production. If that
score is numerically greater than the best score of any preceding
production, the current production is cached for later consideration.
If no later production matches, then the cached production is treated
as having matched, and the value of the item immediately before its
"<score:...>" directive is returned as the result.
1928
1929 In other words, by putting a "<score:...>" directive at the end of
1930 each production, you can select which production matches using cri‐
1931 teria other than specification order. For example:
1932
    sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
            | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
            | noun verb preposition article noun { [@item] } <score: sensible(@item)>
1936
1937 Now, when each production reaches its respective "<score:...>"
1938 directive, the subroutine "sensible" will be called to evaluate the
1939 matched items (somehow). Once all productions have been tried, the
1940 one which "sensible" scored most highly will be the one that is
1941 accepted as a match for the rule.
1942
1943 The variable $score always holds the current best score of any pro‐
1944 duction, and the variable $score_return holds the corresponding
1945 return value.
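
Since "sensible" is left undefined above, a smaller self-contained case
may help. The sketch below is an assumption-laden illustration, not part
of the module: the "number" rule scores each production by the length of
the text it matched, so the longer match should be preferred even though
it is listed second.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    number: /\d+/        <score: length $item[1]>
          | /\d+\.\d+/   <score: length $item[1]>
}) or die "Bad grammar";

# Without the <score:...> directives the first production would win
# and only the leading integer would be returned.
my $number = $parser->number('3.14');
print defined $number ? "$number\n" : "no match\n";
```

The value returned is that of the item immediately before the winning
"<score:...>" directive, which here is the matched text itself.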
1946
As another example, the following grammar matches lines whose fields may
be separated by commas, colons, or spaces. This can be tricky if a
colon-separated line also contains commas, or vice versa. The grammar
resolves the ambiguity by selecting the rule that results in the fewest
fields:
1952
    line: seplist[sep=>','] <score: -@{$item[1]}>
        | seplist[sep=>':'] <score: -@{$item[1]}>
        | seplist[sep=>" "] <score: -@{$item[1]}>

    seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>
1958
1959 Note the use of negation within the "<score:...>" directive to
1960 ensure that the seplist with the most items gets the lowest score.
1961
1962 As the above examples indicate, it is often the case that all pro‐
1963 ductions in a rule use exactly the same "<score:...>" directive. It
1964 is tedious to have to repeat this identical directive in every pro‐
1965 duction, so Parse::RecDescent also provides the "<autoscore:...>"
1966 directive.
1967
1968 If an "<autoscore:...>" directive appears in any production of a
1969 rule, the code it specifies is used as the scoring code for every
1970 production of that rule, except productions that already end with
1971 an explicit "<score:...>" directive. Thus the rules above could be
1972 rewritten:
1973
    line: <autoscore: -@{$item[1]}>
    line: seplist[sep=>',']
        | seplist[sep=>':']
        | seplist[sep=>" "]

    sentence: <autoscore: sensible(@item)>
            | verb noun preposition article noun { [@item] }
            | adjective noun verb article noun   { [@item] }
            | noun verb preposition article noun { [@item] }
1983
1984 Note that the "<autoscore:...>" directive itself acts as an uncon‐
1985 ditional "<reject>", and (like the "<rulevar:...>" directive) is
1986 pruned at compile-time wherever possible.
1987
1988 Dispensing with grammar checks
1989 During the compilation phase of parser construction, Parse::RecDes‐
1990 cent performs a small number of checks on the grammar it's given.
1991 Specifically it checks that the grammar is not left-recursive, that
1992 there are no "insatiable" constructs of the form:
1993
1994 rule: subrule(s) subrule
1995
1996 and that there are no rules missing (i.e. referred to, but never
1997 defined).
1998
1999 These checks are important during development, but can slow down
2000 parser construction in stable code. So Parse::RecDescent provides
2001 the <nocheck> directive to turn them off. The directive can only
2002 appear before the first rule definition, and switches off checking
2003 throughout the rest of the current grammar.
2004
2005 Typically, this directive would be added when a parser has been
2006 thoroughly tested and is ready for release.
2007
2008 Subrule argument lists
2009
2010 It is occasionally useful to pass data to a subrule which is being
2011 invoked. For example, consider the following grammar fragment:
2012
    classdecl: keyword decl

    keyword:   'struct' | 'class'

    decl:      # WHATEVER
2018
2019 The "decl" rule might wish to know which of the two keywords was used
2020 (since it may affect some aspect of the way the subsequent declaration
2021 is interpreted). "Parse::RecDescent" allows the grammar designer to
2022 pass data into a rule, by placing that data in an argument list (that
2023 is, in square brackets) immediately after any subrule item in a produc‐
2024 tion. Hence, we could pass the keyword to "decl" as follows:
2025
    classdecl: keyword decl[ $item[1] ]

    keyword:   'struct' | 'class'

    decl:      # WHATEVER
2031
2032 The argument list can consist of any number (including zero!) of comma-
2033 separated Perl expressions. In other words, it looks exactly like a
2034 Perl anonymous array reference. For example, we could pass the keyword,
2035 the name of the surrounding rule, and the literal 'keyword' to "decl"
2036 like so:
2037
    classdecl: keyword decl[$item[1],$item[0],'keyword']

    keyword:   'struct' | 'class'

    decl:      # WHATEVER
2043
2044 Within the rule to which the data is passed ("decl" in the above exam‐
2045 ples) that data is available as the elements of a local variable @arg.
2046 Hence "decl" might report its intentions as follows:
2047
    classdecl: keyword decl[$item[1],$item[0],'keyword']

    keyword:   'struct' | 'class'

    decl:      { print "Declaring $arg[0] (a $arg[2])\n";
                 print "(this rule called by $arg[1])" }
2054
2055 Subrule argument lists can also be interpreted as hashes, simply by
2056 using the local variable %arg instead of @arg. Hence we could rewrite
2057 the previous example:
2058
    classdecl: keyword decl[keyword => $item[1],
                            caller  => $item[0],
                            type    => 'keyword']

    keyword:   'struct' | 'class'

    decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
                 print "(this rule called by $arg{caller})" }
2067
2068 Both @arg and %arg are always available, so the grammar designer may
2069 choose whichever convention (or combination of conventions) suits best.
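
The following sketch shows both views of the same argument list in a
complete program. The /\w+/ terminal in "decl" and the format of the
returned string are assumptions made for the example.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    classdecl: keyword decl[ keyword => $item[1], caller => $item[0] ]

    keyword:   'struct' | 'class'

    decl:      /\w+/
               { $return = "$arg{keyword} $item[1] (from $arg{caller})" }
}) or die "Bad grammar";

my $description = $parser->classdecl('class Widget')
    or die "Parse failed";
print "$description\n";    # class Widget (from classdecl)
```

Inside "decl" the same data is also visible positionally, so $arg[0]
holds the string 'keyword' and $arg[1] holds the matched keyword.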
2070
2071 Subrule argument lists are also useful for creating "rule templates"
2072 (especially when used in conjunction with the "<matchrule:...>" direc‐
2073 tive). For example, the subrule:
2074
    list: <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
              { $return = [ $item[1], @{$item[3]} ] }
        | <matchrule:$arg{rule}>
              { $return = [ $item[1] ] }
2079
2080 is a handy template for the common problem of matching a separated
2081 list. For example:
2082
2083 function: 'func' name '(' list[rule=>'param',sep=>';'] ')'
2084
2085 param: list[rule=>'name',sep=>','] ':' typename
2086
2087 name: /\w+/
2088
2089 typename: name
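
Put together, the template and its two uses might be run as follows.
This is a sketch under stated assumptions: the actions attached to
"function" and "param", the hash keys they use, and the sample input are
all invented here to make the result easy to inspect.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    function: 'func' name '(' list[rule=>'param',sep=>';'] ')'
              { $return = { name => $item[2], params => $item[4] } }

    param:    list[rule=>'name',sep=>','] ':' typename
              { $return = { names => $item[1], type => $item[3] } }

    list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
              { $return = [ $item[1], @{$item[3]} ] }
            | <matchrule:$arg{rule}>
              { $return = [ $item[1] ] }

    name:     /\w+/

    typename: name
}) or die "Bad grammar";

my $function = $parser->function('func area (width, height: int; unit: str)')
    or die "Parse failed";

# One line per parameter group: the type, then the names sharing it.
for my $param (@{ $function->{params} }) {
    print "$param->{type}: @{ $param->{names} }\n";
}
```

The same "list" rule serves both the semicolon-separated parameter
groups and the comma-separated names within each group.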
2090
2091 When a subrule argument list is used with a repeated subrule, the argu‐
2092 ment list goes before the repetition specifier:
2093
    list: /some|many/ thing[ $item[1] ](s)
2095
2096 The argument list is "late bound". That is, it is re-evaluated for
2097 every repetition of the repeated subrule. This means that each
2098 repeated attempt to match the subrule may be passed a completely dif‐
2099 ferent set of arguments if the value of the expression in the argument
2100 list changes between attempts. So, for example, the grammar:
2101
2102 { $::species = 'dogs' }
2103
2104 pair: 'two' animal[$::species](s)
2105
2106 animal: /$arg[0]/ { $::species = 'cats' }
2107
will match the string "two dogs cats cats" completely, whereas it will
only match the string "two dogs dogs dogs" up to the eighth character.
If the value of the argument list were "early bound" (that is, evaluated
only the first time a repeated subrule match is attempted), one would
expect the matching behaviours to be reversed.
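
One way to watch the late binding is to pass the text by reference, so
that whatever the parser consumes is removed from the caller's string.
The script below is a sketch: the "our $species" declaration and the
manual reset between the two parses are assumptions added so that the
example runs cleanly under strict and warnings.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Declared here so that main's $::species is known before the grammar uses it.
our $species;

my $parser = Parse::RecDescent->new(q{
    { $::species = 'dogs' }

    pair:   'two' animal[$::species](s)

    animal: /$arg[0]/ { $::species = 'cats' }
}) or die "Bad grammar";

my $first = 'two dogs cats cats';
$parser->pair(\$first);
print "left over: '$first'\n";

# Restore the starting species before the second attempt.
$species = 'dogs';
my $second = 'two dogs dogs dogs';
$parser->pair(\$second);
print "left over: '$second'\n";
```

Whatever remains in each string after the call is the part that the
"pair" rule could not match.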
2113
2114 Of course, it is possible to effectively "early bind" such argument
2115 lists by passing them a value which does not change on each repetition.
2116 For example:
2117
2118 { $::species = 'dogs' }
2119
2120 pair: 'two' { $::species } animal[$item[2]](s)
2121
2122 animal: /$arg[0]/ { $::species = 'cats' }
2123
2124 Arguments can also be passed to the start rule, simply by appending
2125 them to the argument list with which the start rule is called (after
2126 the "line number" parameter). For example, given:
2127
    $parser = new Parse::RecDescent ( $grammar );

    $parser->data($text, 1, "str", 2, \@arr);

    #             ^^^^^  ^  ^^^^^^^^^^^^^^^
    #               |    |         |
    # TEXT TO BE PARSED  |         |
    # STARTING LINE NUMBER         |
    # ELEMENTS OF @arg WHICH IS PASSED TO RULE data
2137
2138 then within the productions of the rule "data", the array @arg will
2139 contain "("str", 2, \@arr)".
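
A small sketch of a start rule reading its arguments follows. The body
of the "data" rule and the format of its return value are assumptions
for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    data: /\w+/ { $return = "$item[1]: $arg[0], $arg[1]" }
}) or die "Bad grammar";

# Arguments after the text and the starting line number end up in @arg.
my $result = $parser->data('payload', 1, 'str', 2) or die "Parse failed";
print "$result\n";    # payload: str, 2
```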
2140
2141 Alternations
2142
Alternations are implicit (unnamed) rules defined as part of a
production. An alternation is defined as a series of '|'-separated
productions inside a pair of round brackets. For example:

    character: 'the' ( good | bad | ugly ) /dude/
2148
Every alternation implicitly defines a new subrule, whose
automatically-generated name indicates its origin:
"_alternation_<I>_of_production_<P>_of_rule_<R>" for the appropriate
values of <I>, <P>, and <R>. A call to this implicit subrule is then
inserted in place of the brackets. Hence the above example is merely a
convenient short-hand for:
2154
    character: 'the'
               _alternation_1_of_production_1_of_rule_character
               /dude/

    _alternation_1_of_production_1_of_rule_character:
               good | bad | ugly
2161
2162 Since alternations are parsed by recursively calling the parser genera‐
2163 tor, any type(s) of item can appear in an alternation. For example:
2164
    character: 'the' ( 'high' "plains"      # Silent, with poncho
                     | /no[- ]name/         # Silent, no poncho
                     | vengeance_seeking    # Poncho-optional
                     | <error>
                     ) drifter
2170
2171 In this case, if an error occurred, the automatically generated message
2172 would be:
2173
    ERROR (line <N>): Invalid implicit subrule: Expected
                      'high' or /no[- ]name/ or vengeance_seeking,
                      but found "pacifist" instead
2177
2178 Since every alternation actually has a name, it's even possible to
2179 extend or replace them:
2180
    $parser->Replace(
        "_alternation_1_of_production_1_of_rule_character:
            'generic Eastwood'"
    );
2185
2186 More importantly, since alternations are a form of subrule, they can be
2187 given repetition specifiers:
2188
    character: 'the' ( good | bad | ugly )(?) /dude/
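
An optional alternation can be tried out as below. This is a sketch:
literal strings replace the "good", "bad" and "ugly" subrules, and the
action returning $item[2] is an assumption added to expose what the
alternation matched.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    character: 'the' ( 'good' | 'bad' | 'ugly' )(?) /dude/
               { $return = $item[2] }
}) or die "Bad grammar";

# A (?) repetition returns an array reference with zero or one element.
my %matched;
for my $text ('the bad dude', 'the dude') {
    my $adjectives = $parser->character($text) or die "Parse failed";
    $matched{$text} = [@$adjectives];
    print "'$text' matched with [@$adjectives]\n";
}
```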
2190
2191 Incremental Parsing
2192
"Parse::RecDescent" provides two methods - "Extend" and "Replace" -
which can be used to alter the grammar matched by a parser. Both
methods take the same argument as "Parse::RecDescent::new", namely a
grammar specification string.
2197
2198 "Parse::RecDescent::Extend" interprets the grammar specification and
2199 adds any productions it finds to the end of the rules for which they
2200 are specified. For example:
2201
    $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
    $parser->Extend($add);
2204
2205 adds two productions to the rule "name" (creating it if necessary) and
2206 one production to the rule "desc".
2207
"Parse::RecDescent::Replace" is identical, except that it first resets
each rule specified in the additional grammar, removing any existing
productions. Hence after:

    $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
    $parser->Replace($add);

'Jimmy-Bob' and 'Bobby-Jim' are the only valid "name"s, and there is
only the one possible description.
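
The difference between the two methods can be seen by probing a rule
before and after each call. In this sketch the initial 'Billy-Bob'
production is an assumption, included so that there is something for
"Replace" to remove.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    name: 'Billy-Bob'
}) or die "Bad grammar";

# Report which of the given names the "name" rule currently accepts.
sub accepted {
    return join ' ', grep { defined $parser->name($_) } @_;
}

$parser->Extend(q{ name: 'Jimmy-Bob' | 'Bobby-Jim' });
my $after_extend = accepted('Billy-Bob', 'Jimmy-Bob');
print "after Extend:  $after_extend\n";

$parser->Replace(q{ name: 'Jimmy-Bob' | 'Bobby-Jim' });
my $after_replace = accepted('Billy-Bob', 'Jimmy-Bob');
print "after Replace: $after_replace\n";
```

After "Extend" the original production is still available; after
"Replace" only the newly supplied productions remain.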
2216
2217 A more interesting use of the "Extend" and "Replace" methods is to call
2218 them inside the action of an executing parser. For example:
2219
    typedef: 'typedef' type_name identifier ';'
                   { $thisparser->Extend("type_name: '$item[3]'") }
           | <error>

    identifier: ...!type_name /[A-Za-z_]\w*/
2225
2226 which automatically prevents type names from being typedef'd, or:
2227
    command: 'map' key_name 'to' abort_key
                   { $thisparser->Replace("abort_key: '$item[2]'") }
           | 'map' key_name 'to' key_name
                   { map_key($item[2],$item[4]) }
           | abort_key
                   { exit if confirm("abort?") }

    abort_key: 'q'

    key_name: ...!abort_key /[A-Za-z]/
2238
2239 which allows the user to change the abort key binding, but not to
2240 unbind it.
2241
The careful use of such constructs makes it possible to reconfigure a
running parser, eliminating the need for semantic feedback by providing
syntactic feedback instead. However, as currently implemented,
"Replace()" and "Extend()" have to regenerate and re-"eval" the entire
parser whenever they are called. This makes them quite slow for large
grammars.
2248
2249 In such cases, the judicious use of an interpolated regex is likely to
2250 be far more efficient:
2251
    typedef: 'typedef' type_name identifier ';'
                   { $thisparser->{local}{type_name} .= "|$item[3]" }
           | <error>

    identifier: ...!type_name /[A-Za-z_]\w*/

    type_name: /$thisparser->{local}{type_name}/
2259
2260 Precompiling parsers
2261
2262 Normally Parse::RecDescent builds a parser from a grammar at run-time.
2263 That approach simplifies the design and implementation of parsing code,
2264 but has the disadvantage that it slows the parsing process down - you
2265 have to wait for Parse::RecDescent to build the parser every time the
2266 program runs. Long or complex grammars can be particularly slow to
2267 build, leading to unacceptable delays at start-up.
2268
2269 To overcome this, the module provides a way of "pre-building" a parser
2270 object and saving it in a separate module. That module can then be used
2271 to create clones of the original parser.
2272
2273 A grammar may be precompiled using the "Precompile" class method. For
2274 example, to precompile a grammar stored in the scalar $grammar, and
2275 produce a class named PreGrammar in a module file named PreGrammar.pm,
2276 you could use:
2277
2278 use Parse::RecDescent;
2279
2280 Parse::RecDescent->Precompile($grammar, "PreGrammar");
2281
2282 The first argument is the grammar string, the second is the name of the
2283 class to be built. The name of the module file is generated automati‐
2284 cally by appending ".pm" to the last element of the class name. Thus
2285
2286 Parse::RecDescent->Precompile($grammar, "My::New::Parser");
2287
2288 would produce a module file named Parser.pm.
2289
2290 It is somewhat tedious to have to write a small Perl program just to
2291 generate a precompiled grammar class, so Parse::RecDescent has some
2292 special magic that allows you to do the job directly from the com‐
2293 mand-line.
2294
2295 If your grammar is specified in a file named grammar, you can generate
2296 a class named Yet::Another::Grammar like so:
2297
2298 > perl -MParse::RecDescent - grammar Yet::Another::Grammar
2299
2300 This would produce a file named Grammar.pm containing the full defini‐
2301 tion of a class called Yet::Another::Grammar. Of course, to use that
2302 class, you would need to put the Grammar.pm file in a directory named
2303 Yet/Another, somewhere in your Perl include path.
2304
2305 Having created the new class, it's very easy to use it to build a
2306 parser. You simply "use" the new module, and then call its "new" method
2307 to create a parser object. For example:
2308
2309 use Yet::Another::Grammar;
2310 my $parser = Yet::Another::Grammar->new();
2311
2312 The effect of these two lines is exactly the same as:
2313
2314 use Parse::RecDescent;
2315
2316 open GRAMMAR_FILE, "grammar" or die;
2317 local $/;
2318 my $grammar = <GRAMMAR_FILE>;
2319
2320 my $parser = Parse::RecDescent->new($grammar);
2321
2322 only considerably faster.
2323
2324 Note however that the parsers produced by either approach are exactly
2325 the same, so whilst precompilation has an effect on set-up speed, it
2326 has no effect on parsing speed. RecDescent 2.0 will address that prob‐
2327 lem.
2328
2329 A Metagrammar for "Parse::RecDescent"
2330
2331 The following is a specification of grammar format accepted by
2332 "Parse::RecDescent::new" (specified in the "Parse::RecDescent" grammar
2333 format!):
2334
    grammar    : component(s)

    component  : rule | comment

    rule       : "\n" identifier ":" production(s?)

    production : item(s)

    item       : lookahead(?) simpleitem
               | directive
               | comment

    lookahead  : '...' | '...!'                   # +'ve or -'ve lookahead

    simpleitem : subrule args(?)                  # match another rule
               | repetition                       # match repeated subrules
               | terminal                         # match the next input
               | bracket args(?)                  # match alternative items
               | action                           # do something

    subrule    : identifier                       # the name of the rule

    args       : {extract_codeblock($text,'[]')}  # just like a [...] array ref

    repetition : subrule args(?) howoften

    howoften   : '(?)'                            # 0 or 1 times
               | '(s?)'                           # 0 or more times
               | '(s)'                            # 1 or more times
               | /[(](\d+)[.][.](\d+)[)]/         # $1 to $2 times
               | /[(][.][.](\d*)[)]/              # at most $1 times
               | /[(](\d*)[.][.][)]/              # at least $1 times

    terminal   : /[\/]([\\][\/]|[^\/])*[\/]/      # interpolated pattern
               | /"([\\]"|[^"])*"/                # interpolated literal
               | /'([\\]'|[^'])*'/                # uninterpolated literal

    action     : { extract_codeblock($text) }     # embedded Perl code

    bracket    : '(' item(s) production(s?) ')'   # alternative subrules

    directive  : '<commit>'                       # commit to production
               | '<uncommit>'                     # cancel commitment
               | '<resync>'                       # skip to newline
               | '<resync:' pattern '>'           # skip <pattern>
               | '<reject>'                       # fail this production
               | '<reject:' condition '>'         # fail if <condition>
               | '<error>'                        # report an error
               | '<error:' string '>'             # report error as "<string>"
               | '<error?>'                       # error only if committed
               | '<error?:' string '>'            #   "    "    "    "
               | '<rulevar:' /[^>]+/ '>'          # define rule-local variable
               | '<matchrule:' string '>'         # invoke rule named in string

    identifier : /[a-z]\w*/i                      # must start with alpha

    comment    : /#[^\n]*/                        # same as Perl

    pattern    : {extract_bracketed($text,'<')}   # allow embedded "<..>"

    condition  : {extract_codeblock($text,'{<')}  # full Perl expression

    string     : {extract_variable($text)}        # any Perl variable
               | {extract_quotelike($text)}       # or quotelike string
               | {extract_bracketed($text,'<')}   # or balanced brackets
2400
GOTCHAS
2402 This section describes common mistakes that grammar writers seem to
2403 make on a regular basis.
2404
2405 1. Expecting an error to always invalidate a parse
2406
2407 A common mistake when using error messages is to write the grammar like
2408 this:
2409
    file: line(s)

    line: line_type_1
        | line_type_2
        | line_type_3
        | <error>
2416
2417 The expectation seems to be that any line that is not of type 1, 2 or 3
2418 will invoke the "<error>" directive and thereby cause the parse to
2419 fail.
2420
2421 Unfortunately, that only happens if the error occurs in the very first
2422 line. The first rule states that a "file" is matched by one or more
2423 lines, so if even a single line succeeds, the first rule is completely
2424 satisfied and the parse as a whole succeeds. That means that any error
2425 messages generated by subsequent failures in the "line" rule are qui‐
2426 etly ignored.
2427
2428 Typically what's really needed is this:
2429
    file: line(s) eofile    { $return = $item[1] }

    line: line_type_1
        | line_type_2
        | line_type_3
        | <error>

    eofile: /^\Z/
2438
2439 The addition of the "eofile" subrule to the first production means
2440 that a file only matches a series of successful "line" matches that
2441 consume the complete input text. If any input text remains after the
2442 lines are matched, there must have been an error in the last "line". In
2443 that case the "eofile" rule will fail, causing the entire "file" rule
2444 to fail too.
2445
2446 Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/"
2447 or "/^\cD/" (end-of-file).
2448
2449 And don't forget the action at the end of the production. If you just
2450 write:
2451
2452 file: line(s) eofile
2453
2454 then the value returned by the "file" rule will be the value of its
2455 last item: "eofile". Since "eofile" always returns an empty string on
2456 success, that will cause the "file" rule to return that empty string.
2457 Apart from returning the wrong value, returning an empty string will
2458 trip up code such as:
2459
    $parser->file($filetext) || die;
2461
2462 (since "" is false).
2463
2464 Remember that Parse::RecDescent returns undef on failure, so the only
2465 safe test for failure is:
2466
    defined($parser->file($filetext)) || die;
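
The whole pattern can be tried with a toy grammar. This sketch assumes
a single, invented line type (a run of digits) in place of
line_type_1 to line_type_3. The second call is expected to fail, and
Parse::RecDescent reports the queued error message on STDERR when it
does.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    file:   line(s) eofile    { $return = $item[1] }

    line:   /\d+/
          | <error>

    eofile: /^\Z/
}) or die "Bad grammar";

# Every "line" matches, so "eofile" is reached at the end of the text.
my $good = $parser->file("1 2 3");

# The third item is not a valid "line", so text is left over and
# "eofile" (and therefore "file") fails.
my $bad = $parser->file("1 2 oops");

print defined $good ? "good input: @$good\n" : "good input: rejected\n";
print defined $bad  ? "bad input: @$bad\n"   : "bad input: rejected\n";
```

Both results are tested with "defined", in line with the advice above.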
2468
DIAGNOSTICS
2470 Diagnostics are intended to be self-explanatory (particularly if you
2471 use -RD_HINT (under perl -s) or define $::RD_HINT inside the program).

"Parse::RecDescent" currently diagnoses the following:

· Invalid regular expressions used as pattern terminals (fatal error).

· Invalid Perl code in code blocks (fatal error).

· Lookahead used in the wrong place or in a nonsensical way (fatal
  error).

· "Obvious" cases of left-recursion (fatal error).

· Missing or extra components in a "<leftop>" or "<rightop>" directive.

· Unrecognisable components in the grammar specification (fatal error).

· "Orphaned" rule components specified before the first rule (fatal
  error) or after an "<error>" directive (level 3 warning).

· Missing rule definitions (this only generates a level 3 warning, since
  you may be providing them later via "Parse::RecDescent::Extend()").

· Instances where greedy repetition behaviour will almost certainly
  cause the failure of a production (a level 3 warning - see "ON-GOING
  ISSUES AND FUTURE DIRECTIONS" below).

· Attempts to define rules named 'Replace' or 'Extend', which cannot be
  called directly through the parser object because of the predefined
  meaning of "Parse::RecDescent::Replace" and
  "Parse::RecDescent::Extend". (Only a level 2 warning is generated,
  since such rules can still be used as subrules).

· Productions which consist of a single "<error?>" directive, and which
  therefore may succeed unexpectedly (a level 2 warning, since this
  might conceivably be the desired effect).

· Multiple consecutive lookahead specifiers (a level 1 warning only,
  since their effects simply accumulate).

· Productions which start with a "<reject>" or "<rulevar:...>"
  directive. Such productions are optimized away (a level 1 warning).

· Rules which are autogenerated under $::RD_AUTOSTUB (a level 1
  warning).
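
Since a fatal diagnostic means no usable parser is produced, it is worth
checking the value returned by the constructor. The sketch below uses an
invented, deliberately left-recursive grammar, and assumes that new()
returns undef once a fatal error has been reported (the behaviour that
the common "or die" idiom relies on).

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar with "obvious" left recursion:
# the first item of "sum" is "sum" itself.
my $grammar = q{
    sum:    sum '+' number
       |    number

    number: /\d+/
};

# Assumption: new() returns undef after reporting a fatal error.
my $parser = Parse::RecDescent->new($grammar);
print "grammar rejected\n" unless defined $parser;
```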

AUTHOR
Damian Conway (damian@conway.org)

BUGS AND IRRITATIONS
There are undoubtedly serious bugs lurking somewhere in this much code
:-) Bug reports and other feedback are most welcome.

Ongoing annoyances include:

· There's no support for parsing directly from an input stream. If and
  when the Perl Gods give us regular expressions on streams, this should
  be trivial (ahem!) to implement.

· The parser generator can get confused if actions aren't properly
  closed or if they contain particularly nasty Perl syntax errors
  (especially unmatched curly brackets).

· The generator only detects the most obvious form of left recursion
  (potential recursion on the first subrule in a rule). More subtle
  forms of left recursion (for example, through the second item in a
  rule after a "zero" match of a preceding "zero-or-more" repetition, or
  after a match of a subrule with an empty production) are not found.

· Instead of complaining about left-recursion, the generator should
  silently transform the grammar to remove it. Don't expect this feature
  any time soon as it would require a more sophisticated approach to
  parser generation than is currently used.

· The generated parsers don't always run as fast as might be wished.

· The meta-parser should be bootstrapped using "Parse::RecDescent" :-)
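
Until the generator can remove left recursion itself, the transformation
has to be done by hand. One common rewrite, sketched here with invented
rule names, replaces a left-recursive rule such as
"sum: sum '+' number | number" with a first operand followed by a
zero-or-more repetition.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: the left-recursive "sum" is rewritten as one
# number followed by any number of "+ number" pairs.
my $grammar = q{
    sum:    number addend(s?) eofile
                { my $total = $item[1];
                  $total += $_ for @{ $item[2] };
                  $return = $total }

    addend: '+' number    { $return = $item[2] }

    number: /\d+/

    eofile: /^\Z/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $answer = $parser->sum("1 + 2 + 3");
defined $answer or die "Parse failed\n";
print "total: $answer\n";
```

The repetition returns a reference to an array of the values of its
matches, so the action can fold them into the first operand from left to
right.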

ON-GOING ISSUES AND FUTURE DIRECTIONS
1. Repetitions are "incorrigibly greedy" in that they will eat
   everything they can and won't backtrack if that behaviour causes a
   production to fail needlessly. So, for example:

       rule: subrule(s) subrule

   will never succeed, because the repetition will eat all the subrules
   it finds, leaving none to match the second item. Such constructions
   are relatively rare (and "Parse::RecDescent::new" generates a warning
   whenever they occur) so this may not be a problem, especially since
   the insatiable behaviour can be overcome "manually" by writing:

       rule: penultimate_subrule(s) subrule

       penultimate_subrule: subrule ...subrule

   The issue is that this construction is exactly twice as expensive as
   the original, whereas backtracking would add only 1/N to the cost
   (for matching N repetitions of "subrule"). I would welcome feedback
   on the need for backtracking; particularly on cases where the lack of
   it makes parsing performance problematical.

2. Having opened that can of worms, it's also necessary to consider
   whether there is a need for non-greedy repetition specifiers. Again,
   it's possible (at some cost) to manually provide the required
   functionality:

       rule: nongreedy_subrule(s) othersubrule

       nongreedy_subrule: ...!othersubrule subrule

   Overall, the issue is whether the benefit of this extra functionality
   outweighs the drawbacks of further complicating the (currently
   minimalist) grammar specification syntax, and (worse) introducing
   more overhead into the generated parsers.

3. An "<autocommit>" directive would be nice. That is, it would be
   useful to be able to say:

       command: <autocommit>
       command: 'find' name
              | 'find' address
              | 'do' command 'at' time 'if' condition
              | 'do' command 'at' time
              | 'do' command
              | unusual_command

   and have the generator work out that this should be "pruned" thus:

       command: 'find' name
              | 'find' <commit> address
              | 'do' <commit> command <uncommit>
                   'at' time
                   'if' <commit> condition
              | 'do' <commit> command <uncommit>
                   'at' <commit> time
              | 'do' <commit> command
              | unusual_command

   There are several issues here. Firstly, should the "<autocommit>"
   automatically install an "<uncommit>" at the start of the last
   production (on the grounds that the "command" rule doesn't know
   whether an "unusual_command" might start with "find" or "do") or
   should the "unusual_command" subgraph be analysed (to see if it might
   be viable after a "find" or "do")?

   The second issue is how regular expressions should be treated. The
   simplest approach would be to uncommit before them (on the grounds
   that they might match). Better efficiency would be obtained by
   analyzing all preceding literal tokens to determine whether the
   pattern would match them.

   Overall, the issues are: can such automated "pruning" approach a
   hand-tuned version sufficiently closely to warrant the extra set-up
   expense, and (more importantly) is the problem important enough to
   even warrant the non-trivial effort of building an automated
   solution?
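
The manual workarounds described in items 1 and 2 can be tried out with
a small grammar. What follows is only a sketch: every rule name and both
sample inputs are invented. The non-greedy rule here tests its negative
lookahead before consuming each item, so that the repetition stops
exactly where the terminator begins.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar; all rule names are invented.
#
# "phrase" follows item 1: the repetition only takes a word when
# another word follows, leaving the last word for the final item.
#
# "block" follows item 2: the negative lookahead is checked before
# each word is consumed, so the repetition stops at the terminator
# even though "word" could also match the text "end".
my $grammar = q{
    phrase:       leading_word(s) word
                      { $return = { leading => $item[1], final => $item[2] } }

    leading_word: word ...word              { $return = $item[1] }

    block:        body_word(s) terminator   { $return = $item[1] }

    body_word:    ...!terminator word

    word:         /\w+/

    terminator:   /end\b/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $phrase = $parser->phrase("alpha beta gamma");
defined $phrase or die "phrase failed\n";
print "leading: @{ $phrase->{leading} }; final: $phrase->{final}\n";

my $body = $parser->block("one two end");
defined $body or die "block failed\n";
print "body: @$body\n";
```

Both workarounds cost extra matching, because each lookahead matches
text that a later item then matches again.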

COPYRIGHT
Copyright (c) 1997-2000, Damian Conway. All Rights Reserved. This module
is free software. It may be used, redistributed and/or modified under
the terms of the Perl Artistic License
(see http://www.perl.com/perl/misc/Artistic.html)

perl v5.8.8                   2003-04-09            Parse::RecDescent(3)