Parse::RecDescent(3pm)

Parse::RecDescent(3)  User Contributed Perl Documentation  Parse::RecDescent(3)

NAME

       Parse::RecDescent - Generate Recursive-Descent Parsers

VERSION

       This document describes version 1.94 of Parse::RecDescent, released
       April 9, 2003.

SYNOPSIS

        use Parse::RecDescent;

        # Generate a parser from the specification in $grammar:

                $parser = new Parse::RecDescent ($grammar);

        # Generate a parser from the specification in $othergrammar

                $anotherparser = new Parse::RecDescent ($othergrammar);

        # Parse $text using rule 'startrule' (which must be
        # defined in $grammar):

                $parser->startrule($text);

        # Parse $text using rule 'otherrule' (which must also
        # be defined in $grammar):

                $parser->otherrule($text);

        # Change the universal token prefix pattern
        # (the default is: '\s*'):

                $Parse::RecDescent::skip = '[ \t]+';

        # Replace productions of existing rules (or create new ones)
        # with the productions defined in $newgrammar:

                $parser->Replace($newgrammar);

        # Extend existing rules (or create new ones)
        # by adding extra productions defined in $moregrammar:

                $parser->Extend($moregrammar);

        # Global flags (useful as command line arguments under -s):

                $::RD_ERRORS       # unless undefined, report fatal errors
                $::RD_WARN         # unless undefined, also report non-fatal problems
                $::RD_HINT         # if defined, also suggest remedies
                $::RD_TRACE        # if defined, also trace parsers' behaviour
                $::RD_AUTOSTUB     # if defined, generates "stubs" for undefined rules
                $::RD_AUTOACTION   # if defined, appends specified action to productions

DESCRIPTION

       Overview

       Parse::RecDescent incrementally generates top-down recursive-descent
       text parsers from simple yacc-like grammar specifications. It provides:

       ·   Regular expressions or literal strings as terminals (tokens),

       ·   Multiple (non-contiguous) productions for any rule,

       ·   Repeated and optional subrules within productions,

       ·   Full access to Perl within actions specified as part of the
           grammar,

       ·   Simple automated error reporting during parser generation and
           parsing,

       ·   The ability to commit to, uncommit to, or reject particular
           productions during a parse,

       ·   The ability to pass data up and down the parse tree ("down" via
           subrule argument lists, "up" via subrule return values),

       ·   Incremental extension of the parsing grammar (even during a parse),

       ·   Precompilation of parser objects,

       ·   User-definable reduce-reduce conflict resolution via "scoring" of
           matching productions.

       Using "Parse::RecDescent"

       Parser objects are created by calling "Parse::RecDescent::new", passing
       in a grammar specification (see the following subsections). If the
       grammar is correct, "new" returns a blessed reference which can then be
       used to initiate parsing through any rule specified in the original
       grammar. A typical sequence looks like this:

               $grammar = q {
                               # GRAMMAR SPECIFICATION HERE
                            };

               $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

               # acquire $text

               defined $parser->startrule($text) or print "Bad text!\n";

       The rule through which parsing is initiated must be explicitly defined
       in the grammar (i.e. for the above example, the grammar must include a
       rule of the form: "startrule: <subrules>").

       If the starting rule succeeds, its value (see below) is returned.
       Failure to generate the original parser or failure to match a text is
       indicated by returning "undef". Note that it's easy to set up grammars
       that can succeed, but which return a value of 0, "0", or "".  So don't
       be tempted to write:

               $parser->startrule($text) or print "Bad text!\n";

       Normally, the parser has no effect on the original text. So in the
       previous example the value of $text would be unchanged after having
       been parsed.

       If, however, the text to be matched is passed by reference:

               $parser->startrule(\$text)

       then any text which was consumed during the match will be removed from
       the start of $text.

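       As a minimal sketch of the sequence above, the following script checks
       the result with "defined" and passes the text by reference so that the
       consumed input is removed. The grammar, the rule names "startrule" and
       "number", and the input text are invented for illustration only.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: one or more integers. The start rule returns
# how many integers were matched.
my $grammar = q{
    startrule: number(s)  { $return = scalar @{ $item[1] } }
    number:    /\d+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

my $text  = "10 20 30 abc";
my $count = $parser->startrule(\$text);    # by reference: consumed text is removed

# Test for definedness, not truth, so that a legitimate 0 is not mistaken
# for a failed parse.
print defined $count ? "matched $count numbers\n" : "Bad text!\n";
print "remaining text: '$text'\n";
```

       Because the text was passed as "\$text", only the unparsed remainder
       (the part containing "abc") is left in $text after the call.
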
       Rules

       In the grammar from which the parser is built, rules are specified by
       giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
       colon on the same line, followed by one or more productions, separated
       by single vertical bars. The layout of the productions is entirely
       free-format:

               rule1:  production1
                    |  production2 |
                       production3 | production4

       At any point in the grammar previously defined rules may be extended
       with additional productions. This is achieved by redeclaring the rule
       with the new productions. Thus:

               rule1: a | b | c
               rule2: d | e | f
               rule1: g | h

       is exactly equivalent to:

               rule1: a | b | c | g | h
               rule2: d | e | f

       Each production in a rule consists of zero or more items, each of which
       may be either: the name of another rule to be matched (a "subrule"), a
       pattern or string literal to be matched directly (a "token"), a block
       of Perl code to be executed (an "action"), a special instruction to the
       parser (a "directive"), or a standard Perl comment (which is ignored).

       A rule matches a text if one of its productions matches. A production
       matches if each of its items matches consecutive substrings of the
       text. The productions of a rule being matched are tried in the same
       order that they appear in the original grammar, and the first matching
       production terminates the match attempt (successfully). If all
       productions are tried and none matches, the match attempt fails.

       Note that this behaviour is quite different from the "prefer the longer
       match" behaviour of yacc. For example, if yacc were parsing the rule:

               seq : 'A' 'B'
                   | 'A' 'B' 'C'

       upon matching "AB" it would look ahead to see if a 'C' is next and, if
       so, would match the second production in preference to the first. In
       other words, yacc effectively tries all the productions of a rule
       breadth-first in parallel, and selects the "best" match, where "best"
       means longest (note that this is a gross simplification of the true
       behaviour of yacc but it will do for our purposes).

       In contrast, "Parse::RecDescent" tries each production depth-first in
       sequence, and selects the "best" match, where "best" means first. This
       is the fundamental difference between "bottom-up" and "recursive
       descent" parsing.

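       The "first match wins" behaviour can be sketched with the "seq" rule
       above written in both orders. The rule names "short_first" and
       "long_first" and the explicit $return actions are additions made for
       this illustration; they are not part of the original example.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    short_first: 'A' 'B'      { $return = 'AB'  }
               | 'A' 'B' 'C'  { $return = 'ABC' }

    long_first:  'A' 'B' 'C'  { $return = 'ABC' }
               | 'A' 'B'      { $return = 'AB'  }
}) or die "Bad grammar!\n";

# Productions are tried in grammar order; the first one that matches wins,
# even if a later production would have consumed more text.
my $first  = $parser->short_first('ABC');
my $second = $parser->long_first('ABC');

print "short_first matched: $first\n";
print "long_first matched:  $second\n";
```

       When a longer alternative shares a prefix with a shorter one, listing
       the longer production first is the usual way to get yacc-like results.
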
       Each successfully matched item in a production is assigned a value,
       which can be accessed in subsequent actions within the same production
       (or, in some cases, as the return value of a successful subrule call).
       Unsuccessful items don't have an associated value, since the failure of
       an item causes the entire surrounding production to immediately fail.
       The following sections describe the various types of items and their
       success values.

       Subrules

       A subrule which appears in a production is an instruction to the parser
       to attempt to match the named rule at that point in the text being
       parsed. If the named subrule is not defined when requested the
       production containing it immediately fails (unless it was
       "autostubbed" - see Autostubbing).

       A rule may (recursively) call itself as a subrule, but not as the
       left-most item in any of its productions (since such recursions are
       usually non-terminating).

       The value associated with a subrule is the value associated with its
       $return variable (see "Actions" below), or with the last successfully
       matched item in the subrule match.

       Subrules may also be specified with a trailing repetition specifier,
       indicating that they are to be (greedily) matched the specified number
       of times. The available specifiers are:

                       subrule(?)      # Match one-or-zero times
                       subrule(s)      # Match one-or-more times
                       subrule(s?)     # Match zero-or-more times
                       subrule(N)      # Match exactly N times for integer N > 0
                       subrule(N..M)   # Match between N and M times
                       subrule(..M)    # Match between 1 and M times
                       subrule(N..)    # Match at least N times

       Repeated subrules keep matching until either the subrule fails to
       match, or it has matched the minimal number of times but fails to
       consume any of the parsed text (this second condition prevents the
       subrule matching forever in some cases).

       Since a repeated subrule may match many instances of the subrule
       itself, the value associated with it is not a simple scalar, but rather
       a reference to a list of scalars, each of which is the value associated
       with one of the individual subrule matches. In other words, in the
       rule:

                       program: statement(s)

       the value associated with the repeated subrule "statement(s)" is a
       reference to an array containing the values matched by each call to the
       individual subrule "statement".

       Repetition modifiers may include a separator pattern:

                       program: statement(s /;/)

       specifying some sequence of characters to be skipped between each
       repetition.  This is really just a shorthand for the <leftop:...>
       directive (see below).

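       The following sketch shows the array reference produced by a repeated
       subrule, with and without a separator pattern. The grammar is
       hypothetical: the rules "arglist" and "argument" and both input
       strings are made up for the example.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    program:   statement(s)             { $return = $item[1] }
    statement: /\w+/ ';'                { $return = $item[1] }

    arglist:   '(' argument(s /,/) ')'  { $return = $item[2] }
    argument:  /\w+/
}) or die "Bad grammar!\n";

# Each repeated subrule yields a reference to an array holding one value
# per individual match.
my $statements = $parser->program('a; b; c;');
my $arguments  = $parser->arglist('(x, y, z)');

print "statements: @$statements\n";
print "arguments:  @$arguments\n";
```

       In "arglist" the commas are handled by the separator pattern, so the
       "argument" rule itself never has to mention them.
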
       Tokens

       If a quote-delimited string or a Perl regex appears in a production,
       the parser attempts to match that string or pattern at that point in
       the text. For example:

                       typedef: "typedef" typename identifier ';'

                       identifier: /[A-Za-z_][A-Za-z0-9_]*/

       As in regular Perl, a single quoted string is uninterpolated, whilst a
       double-quoted string or a pattern is interpolated (at the time of
       matching, not when the parser is constructed). Hence, it is possible to
       define rules in which tokens can be set at run-time:

                       typedef: "$::typedefkeyword" typename identifier ';'

                       identifier: /$::identpat/

       Note that, since each rule is implemented inside a special namespace
       belonging to its parser, it is necessary to explicitly qualify
       variables from the main package.

       Regex tokens can be specified using just slashes as delimiters or with
       the explicit "m<delimiter>......<delimiter>" syntax:

                       typedef: "typedef" typename identifier ';'

                       typename: /[A-Za-z_][A-Za-z0-9_]*/

                       identifier: m{[A-Za-z_][A-Za-z0-9_]*}

       A regex of either type can also have any valid trailing parameter(s)
       (that is, any of [cgimsox]):

                       typedef: "typedef" typename identifier ';'

                       identifier: / [a-z_]            # LEADING ALPHA OR UNDERSCORE
                                     [a-z0-9_]*        # THEN DIGITS ALSO ALLOWED
                                   /ix                 # CASE/SPACE/COMMENT INSENSITIVE

       The value associated with any successfully matched token is a string
       containing the actual text which was matched by the token.

       It is important to remember that, since each grammar is specified in a
       Perl string, all instances of the universal escape character '\' within
       a grammar must be "doubled", so that they interpolate to single '\'s
       when the string is compiled. For example, to use the grammar:

                       word:       /\S+/ | backslash
                       line:       prefix word(s) "\n"
                       backslash:  '\\'

       the following code is required:

                       $parser = new Parse::RecDescent (q{

                               word:       /\\S+/ | backslash
                               line:       prefix word(s) "\\n"
                               backslash:  '\\\\'

                       });

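       The effect of doubling can be checked in plain Perl, without building
       a parser. The sketch below prints what a grammar string contains once
       Perl has processed it. It relies only on Perl's standard quoting
       rules, and the variable names are illustrative. Doubling every
       backslash, as recommended above, is always safe. Under Perl's
       single-quote rules a "q{}" string collapses only the pair '\\', so the
       doubling is strictly required there for a literal backslash, and
       everywhere in an interpolating string.

```perl
use strict;
use warnings;

# What the grammar text looks like after Perl has processed a q{} string.
my $single  = q{word: /\S+/};        # backslash before a non-backslash is kept
my $doubled = q{word: /\\S+/};       # '\\' collapses to a single '\'
my $literal = q{backslash: '\\\\'};  # four backslashes collapse to two

print "$single\n";     # word: /\S+/
print "$doubled\n";    # word: /\S+/
print "$literal\n";    # backslash: '\\'

# In an interpolating (double-quoted) string the doubling is not optional,
# because an undoubled "\S" would lose its backslash.
my $interpolated = "word: /\\S+/";
print "$interpolated\n";   # word: /\S+/
```
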
       Terminal Separators

       For the purpose of matching, each terminal in a production is
       considered to be preceded by a "prefix" - a pattern which must be
       matched before a token match is attempted. By default, the prefix is
       optional whitespace (which always matches, at least trivially), but
       this default may be reset in any production.

       The variable $Parse::RecDescent::skip stores the universal prefix,
       which is the default for all terminal matches in all parsers built with
       "Parse::RecDescent".

       The prefix for an individual production can be altered by using the
       "<skip:...>" directive (see below).

       Actions

       An action is a block of Perl code which is to be executed (as the block
       of a "do" statement) when the parser reaches that point in a
       production. The action executes within a special namespace belonging to
       the active parser, so care must be taken in correctly qualifying
       variable names (see also "Start-up Actions" below).

       The action is considered to succeed if the final value of the block is
       defined (that is, if the implied "do" statement evaluates to a defined
       value - even one which would be treated as "false"). Note that the
       value associated with a successful action is also the final value in
       the block.

       An action will fail if its last evaluated value is "undef". This is
       surprisingly easy to accomplish by accident. For instance, here's an
       infuriating case of an action that makes its production fail, but only
       when debugging isn't activated:

               description: name rank serial_number
                               { print "Got $item[2] $item[1] ($item[3])\n"
                                       if $::debugging
                               }

       If $::debugging is false, no statement in the block is executed, so the
       final value is "undef", and the entire production fails. The solution
       is:

               description: name rank serial_number
                               { print "Got $item[2] $item[1] ($item[3])\n"
                                       if $::debugging;
                                 1;
                               }

       Within an action, a number of useful parse-time variables are available
       in the special parser namespace (there are other variables also
       accessible, but meddling with them will probably just break your
       parser. As a general rule, if you avoid referring to unqualified
       variables - especially those starting with an underscore - inside an
       action, things should be okay):

       @item and %item
           The array slice @item[1..$#item] stores the value associated with
           each item (that is, each subrule, token, or action) in the current
           production. The analogy is to $1, $2, etc. in a yacc grammar.  Note
           that, for obvious reasons, @item only contains the values of items
           before the current point in the production.

           The first element ($item[0]) stores the name of the current rule
           being matched.

           @item is a standard Perl array, so it can also be indexed with
           negative numbers, representing the number of items back from the
           current position in the parse:

                   stuff: /various/ bits 'and' pieces "then" data 'end'
                                   { print $item[-2] }  # PRINTS data
                                                        # (EASIER THAN: $item[6])

           The %item hash complements the @item array, providing named
           access to the same item values:

                   stuff: /various/ bits 'and' pieces "then" data 'end'
                                   { print $item{data}  # PRINTS data
                                                        # (EVEN EASIER THAN USING @item)
                                   }

           The results of named subrules are stored in the hash under each
           subrule's name (including the repetition specifier, if any), whilst
           all other items are stored under a "named positional" key that
           indicates their ordinal position within their item type:
           __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:

                   stuff: /various/ bits 'and' pieces "then" data 'end' { save }
                                   { print $item{__PATTERN1__}, # PRINTS 'various'
                                           $item{__STRING2__},  # PRINTS 'then'
                                           $item{__ACTION1__},  # PRINTS RETURN
                                                                # VALUE OF save
                                   }

           If you want proper named access to patterns or literals, you need
           to turn them into separate rules:

                   stuff: various bits 'and' pieces "then" data 'end'
                                   { print $item{various}  # PRINTS various
                                   }

                   various: /various/

           The special entry $item{__RULE__} stores the name of the current
           rule (i.e. the same value as $item[0]).

           The advantage of using %item, instead of @item, is that it removes
           the need to track item positions that may change as a grammar
           evolves. For example, adding an interim "<skip>" directive or
           action can silently ruin a trailing action, by moving an @item
           element "down" the array one place. In contrast, the named entry of
           %item is unaffected by such an insertion.

           A limitation of the %item hash is that it only records the last
           value of a particular subrule. For example:

                   range: '(' number '..' number ')'
                                   { $return = $item{number} }

           will return only the value corresponding to the second match of the
           "number" subrule. In other words, successive calls to a subrule
           overwrite the corresponding entry in %item. Once again, the
           solution is to rename each subrule in its own rule:

                   range: '(' from_num '..' to_num ')'
                                   { $return = $item{from_num} }

                   from_num: number
                   to_num:   number

       @arg and %arg
           The array @arg and the hash %arg store any arguments passed to the
           rule from some other rule (see "Subrule argument lists"). Changes
           to the elements of either variable do not propagate back to the
           calling rule (data can be passed back from a subrule via the
           $return variable - see next item).

       $return
           If a value is assigned to $return within an action, that value is
           returned if the production containing the action eventually matches
           successfully. Note that setting $return doesn't cause the current
           production to succeed. It merely tells it what to return if it does
           succeed.  Hence $return is analogous to $$ in a yacc grammar.

           If $return is not assigned within a production, the value of the
           last component of the production (namely: $item[$#item]) is
           returned if the production succeeds.

       $commit
           The current state of commitment to the current production (see
           "Directives" below).

       $skip
           The current terminal prefix (see "Directives" below).

       $text
           The remaining (unparsed) text. Changes to $text do not propagate
           out of unsuccessful productions, but do survive successful
           productions. Hence it is possible to dynamically alter the text
           being parsed - for example, to provide a "#include"-like facility:

                   hash_include: '#include' filename
                                           { $text = ::loadfile($item[2]) . $text }

                   filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
                           | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }

       $thisline and $prevline
           $thisline stores the current line number within the current parse
           (starting from 1). $prevline stores the line number for the last
           character which was already successfully parsed (this will be
           different from $thisline at the end of each line).

           For efficiency, $thisline and $prevline are actually tied scalars,
           and only recompute the required line number when the variable's
           value is used.

           Assignment to $thisline adjusts the line number calculator, so that
           it believes that the current line number is the value being
           assigned. Note that this adjustment will be reflected in all
           subsequent line number calculations.

           Modifying the value of the variable $text (as in the previous
           "hash_include" example, for instance) will confuse the line
           counting mechanism. To prevent this, you should call
           "Parse::RecDescent::LineCounter::resync($thisline)" immediately
           after any assignment to the variable $text (or, at least, before
           the next attempt to use $thisline).

           Note that if a production fails after assigning to or resync'ing
           $thisline, the parser's line counter mechanism will usually be
           corrupted.

           Also see the entry for @itempos.

           The line number can be set to values other than 1, by calling the
           start rule with a second argument. For example:

                   $parser = new Parse::RecDescent ($grammar);

                   $parser->input($text, 10);      # START LINE NUMBERS AT 10

       $thiscolumn and $prevcolumn
           $thiscolumn stores the current column number within the current
           line being parsed (starting from 1). $prevcolumn stores the column
           number of the last character which was actually successfully
           parsed. Usually "$prevcolumn == $thiscolumn-1", but not at the end
           of lines.

           For efficiency, $thiscolumn and $prevcolumn are actually tied
           scalars, and only recompute the required column number when the
           variable's value is used.

           Assignment to $thiscolumn or $prevcolumn is a fatal error.

           Modifying the value of the variable $text (as in the previous
           "hash_include" example, for instance) may confuse the column
           counting mechanism.

           Note that $thiscolumn reports the column number before any
           whitespace that might be skipped before reading a token. Hence if
           you wish to know where a token started (and ended) use something
           like this:

                   rule: token1 token2 startcol token3 endcol token4
                                   { print "token3: columns $item[3] to $item[5]"; }

                   startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
                   endcol:      { $prevcolumn }

           Also see the entry for @itempos.

       $thisoffset and $prevoffset
           $thisoffset stores the offset of the current parsing position
           within the complete text being parsed (starting from 0).
           $prevoffset stores the offset of the last character which was
           actually successfully parsed. In all cases
           "$prevoffset == $thisoffset-1".

           For efficiency, $thisoffset and $prevoffset are actually tied
           scalars, and only recompute the required offset when the variable's
           value is used.

           Assignment to $thisoffset or $prevoffset is a fatal error.

           Modifying the value of the variable $text will not affect the
           offset counting mechanism.

           Also see the entry for @itempos.

       @itempos
           The array @itempos stores a hash reference corresponding to each
           element of @item. The elements of the hash provide the following:

                   $itempos[$n]{offset}{from}      # VALUE OF $thisoffset BEFORE $item[$n]
                   $itempos[$n]{offset}{to}        # VALUE OF $prevoffset AFTER $item[$n]
                   $itempos[$n]{line}{from}        # VALUE OF $thisline BEFORE $item[$n]
                   $itempos[$n]{line}{to}          # VALUE OF $prevline AFTER $item[$n]
                   $itempos[$n]{column}{from}      # VALUE OF $thiscolumn BEFORE $item[$n]
                   $itempos[$n]{column}{to}        # VALUE OF $prevcolumn AFTER $item[$n]

           Note that the various "$itempos[$n]...{from}" values record the
           appropriate value after any token prefix has been skipped.

           Hence, instead of the somewhat tedious and error-prone:

                   rule: startcol token1 endcol
                         startcol token2 endcol
                         startcol token3 endcol
                                   { print "token1: columns $item[1]
                                                         to $item[3]
                                            token2: columns $item[4]
                                                         to $item[6]
                                            token3: columns $item[7]
                                                         to $item[9]" }

                   startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
                   endcol:      { $prevcolumn }

           it is possible to write:

                   rule: token1 token2 token3
                                   { print "token1: columns $itempos[1]{column}{from}
                                                         to $itempos[1]{column}{to}
                                            token2: columns $itempos[2]{column}{from}
                                                         to $itempos[2]{column}{to}
                                            token3: columns $itempos[3]{column}{from}
                                                         to $itempos[3]{column}{to}" }

           Note however that (in the current implementation) the use of
           @itempos anywhere in a grammar implies that item positioning
           information is collected everywhere during the parse. Depending on
           the grammar and the size of the text to be parsed, this may be
           prohibitively expensive and the explicit use of $thisline,
           $thiscolumn, etc. may be a better choice.

       $thisparser
           A reference to the "Parse::RecDescent" object through which parsing
           was initiated.

           The value of $thisparser propagates down the subrules of a parse
           but not back up. Hence, you can invoke subrules from another parser
           for the scope of the current rule as follows:

                   rule: subrule1 subrule2
                       | { $thisparser = $::otherparser } <reject>
                       | subrule3 subrule4
                       | subrule5

           The result is that the first production calls "subrule1" and
           "subrule2" of the current parser, and the remaining productions
           call the named subrules from $::otherparser. Note, however, that
           "Bad Things" will happen if $::otherparser isn't a blessed
           reference and/or doesn't have methods with the same names as the
           required subrules!

       $thisrule
           A reference to the "Parse::RecDescent::Rule" object corresponding
           to the rule currently being matched.

       $thisprod
           A reference to the "Parse::RecDescent::Production" object
           corresponding to the production currently being matched.

       $score and $score_return
           $score stores the best production score to date, as specified by an
           earlier "<score:...>" directive. $score_return stores the
           corresponding return value for the successful production.

           See "Scored productions".

       Warning: the parser relies on the information in the various "this..."
       objects in some non-obvious ways. Tinkering with the other members of
       these objects will probably cause Bad Things to happen, unless you
       really know what you're doing. The only exception to this advice is
       that the use of "$this...->{local}" is always safe.

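       The $return and %item entries in the list above can be sketched
       together as follows. The grammar is hypothetical: the rules "pair",
       "plain", "ident" and "num" exist only for this example. The "pair"
       rule assigns $return explicitly and "plain" does not, so "plain"
       yields the value of its last item.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    pair:  ident '=' num   { $return = { $item{ident} => $item{num} } }
    plain: ident '=' num
    ident: /[a-z]\w*/i
    num:   /\d+/
}) or die "Bad grammar!\n";

# 'pair' assigns $return, so the rule yields a hash reference built from
# the named entries of %item.
my $pair = $parser->pair('width = 42');

# 'plain' never assigns $return, so the rule yields the value of its last
# item, which here is the text matched by 'num'.
my $plain = $parser->plain('width = 42');

print "pair:  width => $pair->{width}\n";
print "plain: $plain\n";
```
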
643       Start-up Actions
644
645       Any actions which appear before the first rule definition in a grammar
646       are treated as "start-up" actions. Each such action is stripped of its
647       outermost brackets and then evaluated (in the parser's special names‐
648       pace) just before the rules of the grammar are first compiled.
649
650       The main use of start-up actions is to declare local variables within
651       the parser's special namespace:
652
653               { my $lastitem = '???'; }
654
655               list: item(s)   { $return = $lastitem }
656
657               item: book      { $lastitem = 'book'; }
658                     bell      { $lastitem = 'bell'; }
659                     candle    { $lastitem = 'candle'; }
660
661       but start-up actions can be used to execute any valid Perl code within
662       a parser's special namespace.
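
       As a rough, runnable sketch of the example above: the terminals
       'book', 'bell' and 'candle' are written here as literal strings (an
       assumption, made only so that the script needs no further rules), and
       the names $parser and $last are illustrative.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The start-up action declares $lastitem once, before the rules are compiled.
my $parser = Parse::RecDescent->new(q{
    { my $lastitem = '???'; }

    list: item(s)   { $return = $lastitem }

    # Literal terminals are an assumption made for this sketch.
    item: 'book'    { $lastitem = 'book'; }
        | 'bell'    { $lastitem = 'bell'; }
        | 'candle'  { $lastitem = 'candle'; }
}) or die "Bad grammar\n";

# $last receives whatever the final matched item stored in $lastitem.
my $last = $parser->list('book bell candle');
print "Last item: $last\n";
```

       Because $lastitem lives in the parser's own namespace, it is shared by
       every action in the grammar but is invisible to the calling program.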
663
664       Start-up actions can appear within a grammar extension or replacement
665       (that is, a partial grammar installed via "Parse::RecDescent::Extend()"
666       or "Parse::RecDescent::Replace()" - see "Incremental Parsing"), and
667       will be executed before the new grammar is installed. Note, however,
668       that a particular start-up action is only ever executed once.
669
670       Autoactions
671
       It is sometimes desirable to be able to specify a default action to be
       taken at the end of every production (for example, in order to easily
       build a parse tree). If the variable $::RD_AUTOACTION is defined when
       "Parse::RecDescent::new()" is called, the contents of that variable are
       treated as a specification of an action which is to be appended to each
       production in the corresponding grammar. So, for example, to construct
       a simple parse tree:
679
           $::RD_AUTOACTION = q { [@item] };

           $parser = new Parse::RecDescent (q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
               });
689
690       which is equivalent to:
691
           $parser = new Parse::RecDescent (q{
               expression: and_expr '||' expression
                               { [@item] }
                         | and_expr
                               { [@item] }

               and_expr:   not_expr '&&' and_expr
                               { [@item] }
                       |   not_expr
                               { [@item] }

               not_expr:   '!' brack_expr
                               { [@item] }
                       |   brack_expr
                               { [@item] }

               brack_expr: '(' expression ')'
                               { [@item] }
                         | identifier
                               { [@item] }

               identifier: /[a-z]+/i
                               { [@item] }
               });
716
       Alternatively, we could take an object-oriented approach, using a
       different class for each node (and also eliminating redundant
       intermediate nodes):
720
           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

           $parser = new Parse::RecDescent (q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
               });
731
732       which is equivalent to:
733
           $parser = new Parse::RecDescent (q{
               expression: and_expr '||' expression
                               { new expression_node (@item[1..3]) }
                         | and_expr

               and_expr:   not_expr '&&' and_expr
                               { new and_expr_node (@item[1..3]) }
                       |   not_expr

               not_expr:   '!' brack_expr
                               { new not_expr_node (@item[1..2]) }
                       |   brack_expr

               brack_expr: '(' expression ')'
                               { new brack_expr_node (@item[1..3]) }
                         | identifier

               identifier: /[a-z]+/i
                               { new identifier_node (@item[1]) }
               });
754
755       Note that, if a production already ends in an action, no autoaction is
756       appended to it. For example, in this version:
757
           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

           $parser = new Parse::RecDescent (q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
                               { new terminal_node($item[1]) }
               });
769
770       each "identifier" match produces a "terminal_node" object, not an
771       "identifier_node" object.
772
773       A level 1 warning is issued each time an "autoaction" is added to some
774       production.
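
       The first autoaction above can be tried out with a short script along
       the following lines. This is only a sketch: the input string 'a && b'
       and the variable names are illustrative, and local() is used (as an
       assumption about good practice, not a requirement) so that the
       autoaction does not leak into parsers built later.

```perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

my $tree;
{
    # Must be defined at the time the parser is constructed.
    local $::RD_AUTOACTION = q{ [@item] };

    my $parser = Parse::RecDescent->new(q{
        expression: and_expr '||' expression | and_expr
        and_expr:   not_expr '&&' and_expr   | not_expr
        not_expr:   '!' brack_expr           | brack_expr
        brack_expr: '(' expression ')'       | identifier
        identifier: /[a-z]+/i
    }) or die "Bad grammar\n";

    # Every production returns [ rule_name, item, item, ... ].
    $tree = $parser->expression('a && b');
}

print Dumper($tree);
```

       Each array in the result starts with the name of the rule that built
       it, followed by the values of the items that production matched.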
775
776       Autotrees
777
778       A commonly needed autoaction is one that builds a parse-tree. It is
779       moderately tricky to set up such an action (which must treat terminals
780       differently from non-terminals), so Parse::RecDescent simplifies the
781       process by providing the "<autotree>" directive.
782
       If this directive appears at the start of a grammar, it causes
       Parse::RecDescent to insert autoactions at the end of any production
       except those which already end in an action. The action inserted
       depends on whether the production is an intermediate rule (two or more
       items), or a terminal of the grammar (i.e. a single pattern or string
       item).
788
789       So, for example, the following grammar:
790
               <autotree>

               file    : command(s)
               command : get | set | vet
               get     : 'get' ident ';'
               set     : 'set' ident 'to' value ';'
               vet     : 'check' ident 'is' value ';'
               ident   : /\w+/
               value   : /\d+/
800
801       is equivalent to:
802
               file    : command(s)                    { bless \%item, $item[0] }
               command : get                           { bless \%item, $item[0] }
                       | set                           { bless \%item, $item[0] }
                       | vet                           { bless \%item, $item[0] }
               get     : 'get' ident ';'               { bless \%item, $item[0] }
               set     : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
               vet     : 'check' ident 'is' value ';'  { bless \%item, $item[0] }

               ident   : /\w+/          { bless {__VALUE__=>$item[1]}, $item[0] }
               value   : /\d+/          { bless {__VALUE__=>$item[1]}, $item[0] }
813
814       Note that each node in the tree is blessed into a class of the same
815       name as the rule itself. This makes it easy to build object-oriented
816       processors for the parse-trees that the grammar produces. Note too that
817       the last two rules produce special objects with the single attribute
818       '__VALUE__'. This is because they consist solely of a single terminal.
819
820       This autoaction-ed grammar would then produce a parse tree in a data
821       structure like this:
822
               {
                 file => {
                           command => {
                                        [ get => {
                                                   ident => { __VALUE__ => 'a' },
                                                 },
                                          set => {
                                                   ident => { __VALUE__ => 'b' },
                                                   value => { __VALUE__ => '7' },
                                                 },
                                          vet => {
                                                   ident => { __VALUE__ => 'b' },
                                                   value => { __VALUE__ => '7' },
                                                 },
                                         ],
                                      },
                         }
               }
841
842       (except, of course, that each nested hash would also be blessed into
843       the appropriate class).
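
       The exact shape of the tree is easiest to see by building one. The
       following sketch parses a small, made-up input with the grammar above
       and then walks down to the first identifier. The hash keys used
       ('command(s)', 'get', 'ident') follow from the "bless \%item" actions
       shown earlier; the input text and variable names are illustrative
       assumptions.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autotree>

    file    : command(s)
    command : get | set | vet
    get     : 'get' ident ';'
    set     : 'set' ident 'to' value ';'
    vet     : 'check' ident 'is' value ';'
    ident   : /\w+/
    value   : /\d+/
}) or die "Bad grammar\n";

my $tree = $parser->file('get a; set b to 7; check b is 7;')
    or die "Parse failed\n";

# Each node is a hash blessed into a class named after its rule.
my $first = $tree->{'command(s)'}[0];          # a "command" object
my $name  = $first->{get}{ident}{__VALUE__};   # text matched by /\w+/

print ref($tree), ": first command gets '$name'\n";
```

       A tree walker can then dispatch on ref($node), for instance by
       defining methods in packages named "get", "set" and "vet".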
844
845       Autostubbing
846
       Normally, if a subrule appears in some production, but no rule of that
       name is ever defined in the grammar, the production which refers to the
       non-existent subrule fails immediately. This typically occurs as a
       result of misspellings, and is a sufficiently common occurrence that a
       warning is generated for such situations.
852
853       However, when prototyping a grammar it is sometimes useful to be able
854       to use subrules before a proper specification of them is really possi‐
855       ble.  For example, a grammar might include a section like:
856
857               function_call: identifier '(' arg(s?) ')'
858
859               identifier: /[a-z]\w*/i
860
861       where the possible format of an argument is sufficiently complex that
862       it is not worth specifying in full until the general function call syn‐
863       tax has been debugged. In this situation it is convenient to leave the
864       real rule "arg" undefined and just slip in a placeholder (or "stub"):
865
866               arg: 'arg'
867
868       so that the function call syntax can be tested with dummy input such
869       as:
870
871               f0()
872               f1(arg)
873               f2(arg arg)
874               f3(arg arg arg)
875
876       et cetera.
877
878       Early in prototyping, many such "stubs" may be required, so
879       "Parse::RecDescent" provides a means of automating their definition.
880       If the variable $::RD_AUTOSTUB is defined when a parser is built, a
881       subrule reference to any non-existent rule (say, "sr"), causes a "stub"
882       rule of the form:
883
884               sr: 'sr'
885
886       to be automatically defined in the generated parser.  A level 1 warning
887       is issued for each such "autostubbed" rule.
888
       Hence, with $::RD_AUTOSTUB defined, it is possible to only partially
       specify a grammar, and then "fake" matches of the unspecified
       (sub)rules by just typing in their name.
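
       A minimal sketch of autostubbing in use, based on the function-call
       grammar above. The value 1 assigned to $::RD_AUTOSTUB, the use of
       local(), and the test inputs are illustrative choices rather than
       requirements.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser;
{
    # Defined while the parser is built, so the missing "arg" rule
    # is stubbed as:  arg: 'arg'
    local $::RD_AUTOSTUB = 1;

    $parser = Parse::RecDescent->new(q{
        function_call: identifier '(' arg(s?) ')'

        identifier: /[a-z]\w*/i
    }) or die "Bad grammar\n";
}

# The stub matches only the literal text 'arg'.
my $ok  = $parser->function_call('f2(arg arg)');
my $bad = $parser->function_call('f2(xyz)');

print "f2(arg arg): ", (defined $ok  ? "matched" : "failed"), "\n";
print "f2(xyz):     ", (defined $bad ? "matched" : "failed"), "\n";
```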
892
893       Look-ahead
894
895       If a subrule, token, or action is prefixed by "...", then it is treated
896       as a "look-ahead" request. That means that the current production can
897       (as usual) only succeed if the specified item is matched, but that the
898       matching does not consume any of the text being parsed. This is very
899       similar to the "/(?=...)/" look-ahead construct in Perl patterns. Thus,
900       the rule:
901
902               inner_word: word ...word
903
       will match whatever the subrule "word" matches, provided that match is
       followed by some more text which subrule "word" would also match
       (although this second substring is not actually consumed by
       "inner_word").
908
       Likewise, a "...!" prefix causes the following item to succeed
       (without consuming any text) if and only if it would normally fail.
       Hence, a rule such as:
912
913               identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/
914
915       matches a string of characters which satisfies the pattern
916       "/[A-Za-z_]\w*/", but only if the same sequence of characters would not
917       match either subrule "keyword" or the literal token '_'.
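
       A cut-down version of this rule can be exercised as follows. The
       "keyword" rule here (matching only "if" and "while") and the two test
       strings are assumptions made for the sake of a self-contained sketch.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # Succeeds only where "keyword" would fail; consumes nothing itself.
    identifier: ...!keyword /[A-Za-z_]\w*/

    # Hypothetical keyword list, for illustration only.
    keyword: /(?:if|while)\b/
}) or die "Bad grammar\n";

my $plain    = $parser->identifier('iffy');    # not a keyword
my $reserved = $parser->identifier('while');   # rejected by the look-ahead

print "iffy:  ", (defined $plain    ? "identifier" : "rejected"), "\n";
print "while: ", (defined $reserved ? "identifier" : "rejected"), "\n";
```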
918
919       Sequences of look-ahead prefixes accumulate, multiplying their positive
920       and/or negative senses. Hence:
921
922               inner_word: word ...!......!word
923
       is exactly equivalent to the original example above (a warning is
       issued in cases like these, since they often indicate something left
       out, or misunderstood).
927
928       Note that actions can also be treated as look-aheads. In such cases,
929       the state of the parser text (in the local variable $text) after the
930       look-ahead action is guaranteed to be identical to its state before the
931       action, regardless of how it's changed within the action (unless you
932       actually undefine $text, in which case you get the disaster you deserve
933       :-).
934
935       Directives
936
937       Directives are special pre-defined actions which may be used to alter
938       the behaviour of the parser. There are currently eighteen directives:
939       "<commit>", "<uncommit>", "<reject>", "<score>", "<autoscore>",
940       "<skip>", "<resync>", "<error>", "<rulevar>", "<matchrule>",
941       "<leftop>", "<rightop>", "<defer>", "<nocheck>", "<perl_quotelike>",
942       "<perl_codeblock>", "<perl_variable>", and "<token>".
943
944       Committing and uncommitting
945           The "<commit>" and "<uncommit>" directives permit the recursive
946           descent of the parse tree to be pruned (or "cut") for efficiency.
947           Within a rule, a "<commit>" directive instructs the rule to ignore
948           subsequent productions if the current production fails. For exam‐
949           ple:
950
                   command: 'find' <commit> filename
                          | 'open' <commit> filename
                          | 'move' filename filename
954
955           Clearly, if the leading token 'find' is matched in the first pro‐
956           duction but that production fails for some other reason, then the
957           remaining productions cannot possibly match. The presence of the
958           "<commit>" causes the "command" rule to fail immediately if an
959           invalid "find" command is found, and likewise if an invalid "open"
960           command is encountered.
961
962           It is also possible to revoke a previous commitment. For example:
963
                   if_statement: 'if' <commit> condition
                                           'then' block <uncommit>
                                           'else' block
                               | 'if' <commit> condition
                                           'then' block
969
970           In this case, a failure to find an "else" block in the first pro‐
971           duction shouldn't preclude trying the second production, but a
972           failure to find a "condition" certainly should.
973
974           As a special case, any production in which the first item is an
975           "<uncommit>" immediately revokes a preceding "<commit>" (even
976           though the production would not otherwise have been tried). For
977           example, in the rule:
978
                   request: 'explain' expression
                          | 'explain' <commit> keyword
                          | 'save'
                          | 'quit'
                          | <uncommit> term '?'
984
           if the text being matched was "explain?", and the first two
           productions failed, then the "<commit>" in production two would
           cause productions three and four to be skipped, but the leading
           "<uncommit>" in production five would allow that production to
           attempt a match.
990
991           Note in the preceding example, that the "<commit>" was only placed
992           in production two. If production one had been:
993
994                   request: 'explain' <commit> expression
995
996           then production two would be (inappropriately) skipped if a leading
997           "explain..." was encountered.
998
999           Both "<commit>" and "<uncommit>" directives always succeed, and
1000           their value is always 1.
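
           The "if_statement" example can be made runnable by supplying
           placeholder definitions for its subrules. In the sketch below,
           "condition" and "block" are each reduced to a single word; that
           simplification, and the test inputs, are assumptions made purely
           for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    if_statement: 'if' <commit> condition
                      'then' block <uncommit>
                      'else' block
                | 'if' <commit> condition
                      'then' block

    # Placeholder subrules, for illustration only.
    condition: /\w+/
    block:     /\w+/
}) or die "Bad grammar\n";

# Matches the first production.
my $with_else    = $parser->if_statement('if ready then go else wait');

# The first production fails at 'else', but only after <uncommit>,
# so the second production is still tried.
my $without_else = $parser->if_statement('if ready then go');

# Fails at 'then' while still committed: the rule gives up at once.
my $broken       = $parser->if_statement('if ready go');
```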
1001
1002       Rejecting a production
1003           The "<reject>" directive immediately causes the current production
1004           to fail (it is exactly equivalent to, but more obvious than, the
1005           action "{undef}"). A "<reject>" is useful when it is desirable to
1006           get the side effects of the actions in one production, without
1007           prejudicing a match by some other production later in the rule. For
1008           example, to insert tracing code into the parse:
1009
                   complex_rule: { print "In complex rule...\n"; } <reject>

                   complex_rule: simple_rule '+' 'i' '*' simple_rule
                               | 'i' '*' simple_rule
                               | simple_rule
1015
           It is also possible to specify a conditional rejection, using the
           form "<reject:condition>", which only rejects if the specified
           condition is true. This form of rejection is exactly equivalent to
           the action "{(condition)?undef:1}".  For example:
1020
                   command: save_command
                          | restore_command
                          | <reject: defined $::tolerant> { exit }
                          | <error: Unknown command. Ignored.>
1025
1026           A "<reject>" directive never succeeds (and hence has no associated
1027           value). A conditional rejection may succeed (if its condition is
1028           not satisfied), in which case its value is 1.
1029
1030           As an extra optimization, "Parse::RecDescent" ignores any produc‐
1031           tion which begins with an unconditional "<reject>" directive, since
1032           any such production can never successfully match or have any useful
1033           side-effects. A level 1 warning is issued in all such cases.
1034
           Note that productions beginning with conditional "<reject:...>"
           directives are never "optimized away" in this manner, even if they
           are always guaranteed to fail (for example: "<reject:1>").
1038
1039           Due to the way grammars are parsed, there is a minor restriction on
1040           the condition of a conditional "<reject:...>": it cannot contain
1041           any raw '<' or '>' characters. For example:
1042
1043                   line: cmd <reject: $thiscolumn > max> data
1044
           results in an error when a parser is built from this grammar (since
           the grammar parser has no way of knowing whether the first > is a
           "greater than" or the end of the "<reject:...>").
1048
1049           To overcome this problem, put the condition inside a do{} block:
1050
1051                   line: cmd <reject: do{$thiscolumn > max}> data
1052
1053           Note that the same problem may occur in other directives that take
1054           arguments. The same solution will work in all cases.
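
           As a sketch of a conditional rejection that uses the do{} form, the
           following hypothetical "byte" rule accepts a run of digits only if
           its value does not exceed 255. The rule name, the limit, and the
           trailing action that returns the matched text are all illustrative
           assumptions.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # The do{} block hides the '>' comparison from the grammar parser.
    # The final action returns the digits, since a successful
    # conditional <reject:...> itself has the value 1.
    byte: /\d+/ <reject: do{ $item[1] > 255 }> { $item[1] }
}) or die "Bad grammar\n";

my $ok      = $parser->byte('200');   # condition false: not rejected
my $too_big = $parser->byte('300');   # condition true: production fails

print "200: ", (defined $ok      ? "accepted" : "rejected"), "\n";
print "300: ", (defined $too_big ? "accepted" : "rejected"), "\n";
```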
1055
1056       Skipping between terminals
1057           The "<skip>" directive enables the terminal prefix used in a pro‐
1058           duction to be changed. For example:
1059
1060                   OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/
1061
           causes only blanks and tabs to be skipped before terminals in the
           "Arg" subrule (and any of its subrules), and also before the final
           "/;/" terminal.  Once the production is complete, the previous
           terminal prefix is reinstated. Note that this implies that distinct
           productions of a rule must reset their terminal prefixes
           individually.
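
           The effect of a changed prefix is easiest to see with input that
           contains a newline. In this sketch (the rule "line" and both input
           strings are made up for illustration) the terminals after the
           "<skip>" directive may be preceded by blanks and tabs only, so a
           newline before the semicolon makes the production fail.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # After the directive, only spaces and tabs are skipped.
    line: 'set' <skip:'[ \t]*'> /[a-z]+/ /;/
}) or die "Bad grammar\n";

my $same_line  = $parser->line("set a ;");    # blanks only: matches
my $split_line = $parser->line("set a\n;");   # newline is not skipped: fails

print "one line:  ", (defined $same_line  ? "matched" : "failed"), "\n";
print "two lines: ", (defined $split_line ? "matched" : "failed"), "\n";
```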
1068
1069           The "<skip>" directive evaluates to the previous terminal prefix,
1070           so it's easy to reinstate a prefix later in a production:
1071
1072                   Command: <skip:","> CSV(s) <skip:$item[1]> Modifier
1073
1074           The value specified after the colon is interpolated into a pattern,
1075           so all of the following are equivalent (though their efficiency
1076           increases down the list):
1077
                   <skip: "$colon|$comma">   # ASSUMING THE VARS HOLD THE OBVIOUS VALUES

                   <skip: ':|,'>

                   <skip: q{[:,]}>

                   <skip: qr/[:,]/>
1085
1086           There is no way of directly setting the prefix for an entire rule,
1087           except as follows:
1088
                   Rule: <skip: '[ \t]*'> Prod1
                       | <skip: '[ \t]*'> Prod2a Prod2b
                       | <skip: '[ \t]*'> Prod3
1092
1093           or, better:
1094
                   Rule: <skip: '[ \t]*'>
                       (
                           Prod1
                         | Prod2a Prod2b
                         | Prod3
                       )
1101
           Note: Up to release 1.51 of Parse::RecDescent, an entirely
           different mechanism was used for specifying terminal prefixes. The
           current method is not backwards-compatible with that early
           approach. The current approach is stable and will not change again.
1106
1107       Resynchronization
1108           The "<resync>" directive provides a visually distinctive means of
1109           consuming some of the text being parsed, usually to skip an erro‐
1110           neous input. In its simplest form "<resync>" simply consumes text
1111           up to and including the next newline ("\n") character, succeeding
1112           only if the newline is found, in which case it causes its surround‐
1113           ing rule to return zero on success.
1114
1115           In other words, a "<resync>" is exactly equivalent to the token
1116           "/[^\n]*\n/" followed by the action "{ $return = 0 }" (except that
1117           productions beginning with a "<resync>" are ignored when generating
1118           error messages). A typical use might be:
1119
                   script : command(s)

                   command: save_command
                          | restore_command
                          | <resync> # TRY NEXT LINE, IF POSSIBLE
1125
1126           It is also possible to explicitly specify a resynchronization pat‐
1127           tern, using the "<resync:pattern>" variant. This version succeeds
1128           only if the specified pattern matches (and consumes) the parsed
1129           text. In other words, "<resync:pattern>" is exactly equivalent to
1130           the token "/pattern/" (followed by a "{ $return = 0 }" action). For
1131           example, if commands were terminated by newlines or semi-colons:
1132
                   command: save_command
                          | restore_command
                          | <resync:[^;\n]*[;\n]>
1136
           The value of a successfully matched "<resync>" directive (of either
           type) is the text that it consumed. Note, however, that since the
           directive also sets $return, a production consisting of a lone
           "<resync>" succeeds but returns the value zero (which a calling
           rule may find useful to distinguish between "true" matches and
           "tolerant" matches).  Remember that returning a zero value
           indicates that the rule succeeded (since only an "undef" denotes
           failure within "Parse::RecDescent" parsers).
1145
1146       Error handling
           The "<error>" directive provides automatic or user-defined
           generation of error messages during a parse. In its simplest form
           "<error>" prepares an error message based on the mismatch between
           the last item expected and the text which caused it to fail. For
           example, given the rule:
1152
                   McCoy: curse ',' name ", I'm a doctor, not a" a_profession '!'
                        | pronoun 'dead,' name '!'
                        | <error>
1156
1157           the following strings would produce the following messages:
1158
1159           "Amen, Jim!"
1160                      ERROR (line 1): Invalid McCoy: Expected curse or pronoun
1161                                      not found
1162
1163           "Dammit, Jim, I'm a doctor!"
1164                      ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
1165                                      but found ", I'm a doctor!" instead
1166
1167           "He's dead,\n"
1168                      ERROR (line 2): Invalid McCoy: Expected name not found
1169
1170           "He's alive!"
1171                      ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
1172                                      "alive!" instead
1173
1174           "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
1175                      ERROR (line 1): Invalid McCoy: Expected a profession but found
1176                                      "pointy-eared Vulcan!" instead
1177
1178           Note that, when autogenerating error messages, all underscores in
1179           any rule name used in a message are replaced by single spaces (for
1180           example "a_production" becomes "a production"). Judicious choice of
1181           rule names can therefore considerably improve the readability of
1182           automatic error messages (as well as the maintainability of the
1183           original grammar).
1184
1185           If the automatically generated error is not sufficient, it is pos‐
1186           sible to provide an explicit message as part of the error direc‐
1187           tive. For example:
1188
                   Spock: "Fascinating" ',' (name | 'Captain') '.'
                        | "Highly illogical, doctor."
                        | <error: He never said that!>
1192
1193           which would result in all failures to parse a "Spock" subrule
1194           printing the following message:
1195
1196                  ERROR (line <N>): Invalid Spock:  He never said that!
1197
1198           The error message is treated as a "qq{...}" string and interpolated
1199           when the error is generated (not when the directive is specified!).
1200           Hence:
1201
1202                   <error: Mystical error near "$text">
1203
1204           would correctly insert the ambient text string which caused the
1205           error.
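
           A small sketch of an explicit, interpolated message. The rule
           names "greeting" and "name", the wording of the message, and the
           inputs are hypothetical. The message itself is written to STDERR
           once the parse has failed, and the call simply returns "undef".

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    greeting: 'hello' name
            | <error: Expected a greeting near "$text">

    name: /[A-Z][a-z]+/
}) or die "Bad grammar\n";

my $ok  = $parser->greeting('hello World');     # first production matches
my $bad = $parser->greeting('goodbye World');   # <error> queues a message, then fails

print "hello World:   ", (defined $ok  ? "parsed" : "failed"), "\n";
print "goodbye World: ", (defined $bad ? "parsed" : "failed"), "\n";
```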
1206
1207           There are two other forms of error directive: "<error?>" and
1208           "<error?: msg>". These behave just like "<error>" and
1209           "<error: msg>" respectively, except that they are only triggered if
1210           the rule is "committed" at the time they are encountered. For exam‐
1211           ple:
1212
                   Scotty: "Ya kenna change the Laws of Phusics," <commit> name
                         | name <commit> ',' "she's goanta blaw!"
                         | <error?>
1216
           will only generate an error for a string beginning with "Ya kenna
           change the Laws of Phusics," or a valid name, but which still fails
           to match the corresponding production. That is,
           "$parser->Scotty("Aye, Cap'ain")" will fail silently (since neither
           production will "commit" the rule on that input), whereas
           "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")"  will fail with
           the error message:

                  ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
                                  but found 'ah jest kenna do'ut!' instead.
1227
1228           since in that case the second production would commit after match‐
1229           ing the leading name.
1230
1231           Note that to allow this behaviour, all "<error>" directives which
1232           are the first item in a production automatically uncommit the rule
1233           just long enough to allow their production to be attempted (that
1234           is, when their production fails, the commitment is reinstated so
1235           that subsequent productions are skipped).
1236
1237           In order to permanently uncommit the rule before an error message,
1238           it is necessary to put an explicit "<uncommit>" before the
1239           "<error>". For example:
1240
                   line: 'Kirk:'  <commit> Kirk
                       | 'Spock:' <commit> Spock
                       | 'McCoy:' <commit> McCoy
                       | <uncommit> <error?> <reject>
                       | <resync>
1246
1247           Error messages generated by the various "<error...>" directives are
1248           not displayed immediately. Instead, they are "queued" in a buffer
1249           and are only displayed once parsing ultimately fails. Moreover,
1250           "<error...>" directives that cause one production of a rule to fail
1251           are automatically removed from the message queue if another produc‐
1252           tion subsequently causes the entire rule to succeed.  This means
1253           that you can put "<error...>" directives wherever useful diagnosis
1254           can be done, and only those associated with actual parser failure
1255           will ever be displayed. Also see "Gotchas".
1256
1257           As a general rule, the most useful diagnostics are usually gener‐
1258           ated either at the very lowest level within the grammar, or at the
1259           very highest. A good rule of thumb is to identify those subrules
1260           which consist mainly (or entirely) of terminals, and then put an
1261           "<error...>" directive at the end of any other rule which calls one
1262           or more of those subrules.
1263
1264           There is one other situation in which the output of the various
1265           types of error directive is suppressed; namely, when the rule con‐
1266           taining them is being parsed as part of a "look-ahead" (see
1267           "Look-ahead"). In this case, the error directive will still cause
1268           the rule to fail, but will do so silently.
1269
1270           An unconditional "<error>" directive always fails (and hence has no
1271           associated value). This means that encountering such a directive
1272           always causes the production containing it to fail. Hence an
1273           "<error>" directive will inevitably be the last (useful) item of a
1274           rule (a level 3 warning is issued if a production contains items
1275           after an unconditional "<error>" directive).
1276
1277           An "<error?>" directive will succeed (that is: fail to fail :-), if
1278           the current rule is uncommitted when the directive is encountered.
1279           In that case the directive's associated value is zero. Hence, this
1280           type of error directive can be used before the end of a production.
1281           For example:
1282
                   command: 'do' <commit> something
                          | 'report' <commit> something
                          | <error?: Syntax error> <error: Unknown command>
1286
1287           Warning: The "<error?>" directive does not mean "always fail (but
1288           do so silently unless committed)". It actually means "only fail
1289           (and report) if committed, otherwise succeed". To achieve the "fail
1290           silently if uncommitted" semantics, it is necessary to use:
1291
                   rule: item <commit> item(s)
                       | <error?> <reject>      # FAIL SILENTLY UNLESS COMMITTED
1294
1295           However, because people seem to expect a lone "<error?>" directive
1296           to work like this:
1297
                   rule: item <commit> item(s)
                       | <error?: Error message if committed>
                       | <error:  Error message if uncommitted>
1301
1302           Parse::RecDescent automatically appends a "<reject>" directive if
1303           the "<error?>" directive is the only item in a production. A level
1304           2 warning (see below) is issued when this happens.
1305
1306           The level of error reporting during both parser construction and
1307           parsing is controlled by the presence or absence of four global
1308           variables: $::RD_ERRORS, $::RD_WARN, $::RD_HINT, and $::RD_TRACE.
1309           If $::RD_ERRORS is defined (and, by default, it is) then fatal
1310           errors are reported.
1311
1312           Whenever $::RD_WARN is defined, certain non-fatal problems are also
1313           reported.  Warnings have an associated "level": 1, 2, or 3. The
1314           higher the level, the more serious the warning. The value of the
1315           corresponding global variable ($::RD_WARN) determines the lowest
1316           level of warning to be displayed. Hence, to see all warnings, set
1317           $::RD_WARN to 1.  To see only the most serious warnings set
1318           $::RD_WARN to 3.  By default $::RD_WARN is initialized to 3, ensur‐
1319           ing that serious but non-fatal errors are automatically reported.
1320
1321           See "DIAGNOSTICS" for a list of the various error and warning
1322           messages that Parse::RecDescent generates when these two variables are
1323           defined.
1324
1325           Defining any of the remaining variables (which are not defined by
1326           default) further increases the amount of information reported.
1327           Defining $::RD_HINT causes the parser generator to offer more
1328           detailed analyses and hints on both errors and warnings.  Note that
1329           setting $::RD_HINT at any point automagically sets $::RD_WARN to 1.
1330
1331           Defining $::RD_TRACE causes the parser generator and the parser to
1332           report their progress to STDERR in excruciating detail (although
1333           without hints unless $::RD_HINT is separately defined). This detail
1334           can be moderated in only one respect: if $::RD_TRACE has an integer
1335           value (N) greater than 1, only N characters of the "current
1336           parsing context" (that is, where in the input string we are at any
1337           point in the parse) are reported at any time.

1338           $::RD_TRACE is mainly useful for debugging a grammar that
1339           isn't behaving as you expected it to. To this end, if $::RD_TRACE
1340           is defined when a parser is built, any actual parser code which is
1341           generated is also written to a file named "RD_TRACE" in the local
1342           directory.
1343
1344           Note that the four variables belong to the "main" package, which
1345           makes them easier to refer to in the code controlling the parser,
1346           and also makes it easy to turn them into command line flags
1347           ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.
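
               As a minimal sketch of how these variables might be set from the
               controlling program: the grammar and the rule name "greeting" are
               invented for illustration, and the chosen levels are only examples.
               The flags are assigned before the parser is built so that they also
               apply while the grammar is being compiled.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The diagnostic globals live in package main, hence the $:: prefix.
# Assign them before building the parser so that they also apply
# while the grammar itself is being compiled.
$::RD_ERRORS = 1;      # report fatal errors (already the default)
$::RD_WARN   = 2;      # report warnings of level 2 and above
# $::RD_HINT  = 1;     # uncomment to add hints to errors and warnings
# $::RD_TRACE = 40;    # uncomment to trace, limiting the context shown

# Illustrative grammar: the rule name "greeting" is an assumption.
my $parser = Parse::RecDescent->new(q{
    greeting: 'hello' /\w+/
}) or die "Bad grammar\n";

my $result = $parser->greeting("hello world");
print defined $result ? "matched: $result\n" : "no match\n";
```

               When the script is run under perl -s, the same flags can instead be
               given after the script name, for example "perl -s script.pl
               -RD_WARN=1", where script.pl is a placeholder name.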
1348
1349       Specifying local variables
1350           It is occasionally convenient to specify variables which are local
1351           to a single rule. This may be achieved by including a "<rule‐
1352           var:...>" directive anywhere in the rule. For example:
1353
1354                   markup: <rulevar: $tag>
1355
1356                   markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]
1357
1358           The example "<rulevar: $tag>" directive causes a "my" variable
1359           named $tag to be declared at the start of the subroutine implement‐
1360           ing the "markup" rule (that is, before the first production,
1361           regardless of where in the rule it is specified).
1362
1363           Specifically, any directive of the form: "<rulevar:text>" causes a
1364           line of the form "my text;" to be added at the beginning of the
1365           rule subroutine, immediately after the definitions of the following
1366           local variables:
1367
1368                   $thisparser     $commit
1369                   $thisrule       @item
1370                   $thisline       @arg
1371                   $text           %arg
1372
1373           This means that the following "<rulevar>" directives work as
1374           expected:
1375
1376                   <rulevar: $count = 0 >
1377
1378                   <rulevar: $firstarg = $arg[0] || '' >
1379
1380                   <rulevar: $myItems = \@item >
1381
1382                   <rulevar: @context = ( $thisline, $text, @arg ) >
1383
1384                   <rulevar: ($name,$age) = @arg{"name","age"} >
1385
1386           If a variable that is also visible to subrules is required, it
1387           needs to be "local"'d, not "my"'d. "rulevar" defaults to "my", but
1388           if "local" is explicitly specified:
1389
1390                   <rulevar: local $count = 0 >
1391
1392           then a "local"-ized variable is declared instead, and will be
1393           available within subrules.
1394
1395           Note however that, because all such variables are "my" variables,
1396           their values do not persist between match attempts on a given rule.
1397           To preserve values between match attempts, values can be stored
1398           within the "local" member of the $thisrule object:
1399
1400                   countedrule: { $thisrule->{"local"}{"count"}++ }
1401                                <reject>
1402                              | subrule1
1403                              | subrule2
1404                              | <reject: $thisrule->{"local"}{"count"} == 1>
1405                                subrule3
1406
1407           When matching a rule, each "<rulevar>" directive is matched as if
1408           it were an unconditional "<reject>" directive (that is, it causes
1409           any production in which it appears to immediately fail to match).
1410           For this reason (and to improve readability) it is usual to specify
1411           any "<rulevar>" directive in a separate production at the start of
1412           the rule (this has the added advantage that it enables
1413           "Parse::RecDescent" to optimize away such productions, just as it
1414           does for the "<reject>" directive).
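
               The following sketch shows a "my"-style rule variable in use. The
               rule name "total" and the variable $sum are assumptions made for
               illustration only. Because the variable is declared afresh each
               time the rule is invoked, the second call is not expected to see
               the value accumulated by the first.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Illustrative grammar: the rule name "total" and the variable $sum
# are assumptions made for this sketch.
my $parser = Parse::RecDescent->new(q{
    total: <rulevar: $sum = 0>
    total: /\d+/ { $sum += $item[1] }
           /\d+/ { $sum += $item[3]; $return = $sum }
}) or die "Bad grammar\n";

# $sum is declared (and reset to 0) on every call to the rule,
# so the second result does not include the first.
my $first  = $parser->total("3 4");
my $second = $parser->total("5 6");
print "$first $second\n";
```

               The "<rulevar>" directive sits in its own production at the start
               of the rule, following the convention described above.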
1415
1416       Dynamically matched rules
1417           Because regexes and double-quoted strings are interpolated, it is
1418           relatively easy to specify productions with "context sensitive"
1419           tokens. For example:
1420
1421                   command:  keyword  body  "end $item[1]"
1422
1423           which ensures that a command block is bounded by a "<keyword>...end
1424           <same keyword>" pair.
1425
1426           Building productions in which subrules are context sensitive is
1427           also possible, via the "<matchrule:...>" directive. This directive
1428           behaves identically to a subrule item, except that the rule which
1429           is invoked to match it is determined by the string specified after
1430           the colon. For example, we could rewrite the "command" rule like
1431           this:
1432
1433                   command:  keyword  <matchrule:body>  "end $item[1]"
1434
1435           Whatever appears after the colon in the directive is treated as an
1436           interpolated string (that is, as if it appeared in "qq{...}" opera‐
1437           tor) and the value of that interpolated string is the name of the
1438           subrule to be matched.
1439
1440           Of course, just putting a constant string like "body" in a
1441           "<matchrule:...>" directive is of little interest or benefit.  The
1442           power of the directive is seen when we use a string that interpolates
1443           to something interesting. For example:
1444
1445                   command:        keyword <matchrule:$item[1]_body> "end $item[1]"
1446
1447                   keyword:        'while' | 'if' | 'function'
1448
1449                   while_body:     condition block
1450
1451                   if_body:        condition block ('else' block)(?)
1452
1453                   function_body:  arglist block
1454
1455           Now the "command" rule selects how to proceed on the basis of the
1456           keyword that is found. It is as if "command" were declared:
1457
1458                   command:        'while'    while_body    "end while"
1459                          |        'if'       if_body       "end if"
1460                          |        'function' function_body "end function"
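
               A cut-down, runnable sketch of this keyword-driven selection might
               look as follows. The bodies are simplified placeholders (a number
               for "while", a lowercase word for "if") and are assumptions made
               for illustration, not part of the grammar above.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The rule invoked for the second item is chosen from the text matched
# by "keyword". The *_body rules are simplified placeholders.
my $parser = Parse::RecDescent->new(q{
    command:     keyword <matchrule:$item[1]_body> "end $item[1]"
                     { $return = "$item[1] => $item[2]" }
    keyword:     'while' | 'if'
    while_body:  /\d+/
    if_body:     /[a-z]+/
}) or die "Bad grammar\n";

my $while_ok  = $parser->command("while 10 end while");
my $if_ok     = $parser->command("if yes end if");
my $while_bad = $parser->command("while yes end while");

print "$while_ok\n$if_ok\n";
print "third command rejected\n" unless defined $while_bad;
```

               The third call is expected to fail because "while" selects
               while_body, which in this sketch accepts only digits.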
1461
1462           When a "<matchrule:...>" directive is used as a repeated subrule,
1463           the rule name expression is "late-bound". That is, the name of the
1464           rule to be called is re-evaluated each time a match attempt is
1465           made. Hence, the following grammar:
1466
1467                   { $::species = 'dogs' }
1468
1469                   pair:   'two' <matchrule:$::species>(s)
1470
1471                   dogs:   /dogs/ { $::species = 'cats' }
1472
1473                   cats:   /cats/
1474
1475           will match the string "two dogs cats cats" completely, whereas it
1476           will only match the string "two dogs dogs dogs" up to the eighth
1477           letter. If the rule name were "early bound" (that is, evaluated
1478           only the first time the directive is encountered in a production),
1479           the reverse behaviour would be expected.
1480
1481           Note that the "matchrule" directive takes a string that is to be
1482           treated as a rule name, not as a rule invocation. That is, it's
1483           like a Perl symbolic reference, not an "eval". Just as you can say:
1484
1485                   $subname = 'foo';
1486
1487                   # and later...
1488
1489                   &{$subname}(@args);
1490
1491           but not:
1492
1493                   $subname = 'foo(@args)';
1494
1495                   # and later...
1496
1497                   &{$subname};
1498
1499           likewise you can say:
1500
1501                   $rulename = 'foo';
1502
1503                   # and in the grammar...
1504
1505                   <matchrule:$rulename>[@args]
1506
1507           but not:
1508
1509                   $rulename = 'foo[@args]';
1510
1511                   # and in the grammar...
1512
1513                   <matchrule:$rulename>
1514
1515       Deferred actions
1516           The "<defer:...>" directive is used to specify an action to be per‐
1517           formed when (and only if!) the current production ultimately suc‐
1518           ceeds.
1519
1520           Whenever a "<defer:...>" directive appears, the code it specifies
1521           is converted to a closure (an anonymous subroutine reference) which
1522           is queued within the active parser object. Note that, because the
1523           deferred code is converted to a closure, the values of any "local"
1524           variables (such as $text, @item, etc.) are preserved until the
1525           deferred code is actually executed.
1526
1527           If the parse ultimately succeeds and the production in which the
1528           "<defer:...>" directive was evaluated formed part of the successful
1529           parse, then the deferred code is executed immediately before the
1530           parse returns. If however the production which queued a deferred
1531           action fails, or one of the higher-level rules which called that
1532           production fails, then the deferred action is removed from the
1533           queue, and hence is never executed.
1534
1535           For example, given the grammar:
1536
1537                   sentence: noun trans noun
1538                           | noun intrans
1539
1540                   noun:     'the dog'
1541                                   { print "$item[1]\t(noun)\n" }
1542                       |     'the meat'
1543                                   { print "$item[1]\t(noun)\n" }
1544
1545                   trans:    'ate'
1546                                   { print "$item[1]\t(transitive)\n" }
1547
1548                   intrans:  'ate'
1549                                   { print "$item[1]\t(intransitive)\n" }
1550                          |  'barked'
1551                                   { print "$item[1]\t(intransitive)\n" }
1552
1553           then parsing the sentence "the dog ate" would produce the output:
1554
1555                   the dog  (noun)
1556                   ate      (transitive)
1557                   the dog  (noun)
1558                   ate      (intransitive)
1559
1560           This is because, even though the first production of "sentence"
1561           ultimately fails, its initial subrules "noun" and "trans" do match,
1562           and hence they execute their associated actions.  Then the second
1563           production of "sentence" succeeds, causing the actions of the sub‐
1564           rules "noun" and "intrans" to be executed as well.
1565
1566           On the other hand, if the actions were replaced by "<defer:...>"
1567           directives:
1568
1569                   sentence: noun trans noun
1570                           | noun intrans
1571
1572                   noun:     'the dog'
1573                                   <defer: print "$item[1]\t(noun)\n" >
1574                       |     'the meat'
1575                                   <defer: print "$item[1]\t(noun)\n" >
1576
1577                   trans:    'ate'
1578                                   <defer: print "$item[1]\t(transitive)\n" >
1579
1580                   intrans:  'ate'
1581                                   <defer: print "$item[1]\t(intransitive)\n" >
1582                          |  'barked'
1583                                   <defer: print "$item[1]\t(intransitive)\n" >
1584
1585           the output would be:
1586
1587                   the dog  (noun)
1588                   ate      (intransitive)
1589
1590           since deferred actions are only executed if they were evaluated in
1591           a production which ultimately contributes to the successful parse.
1592
1593           In this case, even though the first production of "sentence" caused
1594           the subrules "noun" and "trans" to match, that production ulti‐
1595           mately failed and so the deferred actions queued by those subrules
1596           were subsequently discarded. The second production then succeeded,
1597           causing the entire parse to succeed, and so the deferred actions
1598           queued by the (second) match of the "noun" subrule and the subse‐
1599           quent match of "intrans" are preserved and eventually executed.
1600
1601           Deferred actions provide a means of improving the performance of a
1602           parser, by only executing those actions which are part of the final
1603           parse-tree for the input data.
1604
1605           Alternatively, deferred actions can be viewed as a mechanism for
1606           building (and executing) a customized subroutine corresponding to
1607           the given input data, much in the same way that autoactions (see
1608           "Autoactions") can be used to build a customized data structure for
1609           specific input.
1610
1611           Whether or not the action it specifies is ever executed, a
1612           "<defer:...>" directive always succeeds, returning the number of
1613           deferred actions currently queued at that point.
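
               The difference in timing between ordinary and deferred actions can
               be sketched as follows. The rule names, the @log array, and the
               log messages are all assumptions made for illustration. Ordinary
               actions record their messages while the parse is in progress,
               whereas the deferred action should run only after the whole parse
               has succeeded, and not at all if it fails.

```perl
use strict;
use warnings;
use Parse::RecDescent;

our @log;    # filled in by the grammar's actions via @::log

# Illustrative grammar: rule names and log messages are assumptions.
my $parser = Parse::RecDescent->new(q{
    pair:   first second
                { push @::log, "immediate: pair" }
    first:  /\w+/
                <defer: push @::log, "deferred: $item[1]" >
    second: /\w+/
                { push @::log, "immediate: $item[1]" }
}) or die "Bad grammar\n";

# Successful parse: the deferred action runs last, after the parse.
@log = ();
$parser->pair("a b");
my @on_success = @log;

# Failed parse: "second" cannot match, so the queued action is dropped.
@log = ();
$parser->pair("a");
my @on_failure = @log;

print "success: @on_success\n";
print "failure: @on_failure\n";
```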
1614
1615       Parsing Perl
1616           Parse::RecDescent provides limited support for parsing subsets of
1617           Perl, namely: quote-like operators, Perl variables, and complete
1618           code blocks.
1619
1620           The "<perl_quotelike>" directive can be used to parse any Perl
1621           quote-like operator: 'a string', "m/a pattern/", "tr{ans}{lation}",
1622           etc.  It does this by calling Text::Balanced::extract_quotelike().
1623
1624           If a quote-like operator is found, a reference to an array of eight
1625           elements is returned. Those elements are identical to the last
1626           eight elements returned by Text::Balanced::extract_quotelike() in
1627           an array context, namely:
1628
1629           [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr'
1630               -- if the operator was named; otherwise "undef",
1631
1632           [1] the left delimiter of the first block of the operation,
1633
1634           [2] the text of the first block of the operation (that is, the con‐
1635               tents of a quote, the regex of a match, or substitution or the
1636               target list of a translation),
1637
1638           [3] the right delimiter of the first block of the operation,
1639
1640           [4] the left delimiter of the second block of the operation if
1641               there is one (that is, if it is a "s", "tr", or "y"); otherwise
1642               "undef",
1643
1644           [5] the text of the second block of the operation if there is one
1645               (that is, the replacement of a substitution or the translation
1646               list of a translation); otherwise "undef",
1647
1648           [6] the right delimiter of the second block of the operation (if
1649               any); otherwise "undef",
1650
1651           [7] the trailing modifiers on the operation (if any); otherwise
1652               "undef".
1653
1654           If a quote-like expression is not found, the directive fails with
1655           the usual "undef" value.
1656
1657           The "<perl_variable>" directive can be used to parse any Perl vari‐
1658           able: $scalar, @array, %hash, $ref->{field}[$index], etc.  It does
1659           this by calling Text::Balanced::extract_variable().
1660
1661           If the directive matches text representing a valid Perl variable
1662           specification, it returns that text. Otherwise it fails with the
1663           usual "undef" value.
1664
1665           The "<perl_codeblock>" directive can be used to parse a curly-brace-
1666           delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }.
1667           It does this by calling Text::Balanced::extract_codeblock().
1668
1669           If the directive matches text representing a valid Perl code block,
1670           it returns that text. Otherwise it fails with the usual "undef"
1671           value.
1672
1673           You can also tell it what kind of brackets to use as the outermost
1674           delimiters. For example:
1675
1676                   arglist: <perl_codeblock ()>
1677
1678           causes an arglist to match a perl code block whose outermost delim‐
1679           iters are "(...)" (rather than the default "{...}").
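
               As a minimal sketch, two of these directives can be combined to
               pick apart a simple assignment. The rule name "assignment" and the
               shape of the returned hash are assumptions made for illustration.
               The indices used on the "<perl_quotelike>" result follow the list
               of eight elements given above.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# $item[1] is the text matched by <perl_variable>;
# $item[3] is the array reference returned by <perl_quotelike>,
# where element 0 is the operator name and element 2 is the text
# of the first block.
my $parser = Parse::RecDescent->new(q{
    assignment: <perl_variable> '=' <perl_quotelike> ';'
        {
            $return = {
                variable => $item[1],
                operator => $item[3][0],
                contents => $item[3][2],
            };
        }
}) or die "Bad grammar\n";

my $parsed = $parser->assignment('$name = q{hello};');
print "$parsed->{variable} $parsed->{operator} $parsed->{contents}\n"
    if defined $parsed;
```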
1680
1681       Constructing tokens
1682           Eventually, Parse::RecDescent will be able to parse tokenized
1683           input, as well as ordinary strings. In preparation for this joyous
1684           day, the "<token:...>" directive has been provided.  This directive
1685           creates a token which will be suitable for input to a Parse::RecDe‐
1686           scent parser (when it eventually supports tokenized input).
1687
1688           The text of the token is the value of the immediately preceding
1689           item in the production. A "<token:...>" directive always succeeds
1690           with a return value which is the hash reference that is the new
1691           token. It also sets the return value for the production to that
1692           hash ref.
1693
1694           The "<token:...>" directive makes it easy to build a Parse::RecDes‐
1695           cent-compatible lexer in Parse::RecDescent:
1696
1697                   my $lexer = new Parse::RecDescent q
1698                   {
1699                           lex:    token(s)
1700
1701                           token:  /a\b/                      <token:INDEF>
1702                                |  /the\b/                    <token:DEF>
1703                                |  /fly\b/                    <token:NOUN,VERB>
1704                                |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
1705                                |  <error: Unknown token>
1706
1707                   };
1708
1709           which will eventually be able to be used with a regular
1710           Parse::RecDescent grammar:
1711
1712                   my $parser = new Parse::RecDescent q
1713                   {
1714                           startrule: subrule1 subrule2
1715
1716                           # ETC...
1717                   };
1718
1719           either with a pre-lexing phase:
1720
1721                   $parser->startrule( $lexer->lex($data) );
1722
1723           or with a lex-on-demand approach:
1724
1725                   $parser->startrule( sub{$lexer->token(\$data)} );
1726
1727           But at present, only the "<token:...>" directive is actually imple‐
1728           mented. The rest is vapourware.
1729
1730       Specifying operations
1731           One of the commonest requirements when building a parser is to
1732           specify binary operators. Unfortunately, in a normal grammar, the
1733           rules for such things are awkward:
1734
1735                   disjunction:    conjunction ('or' conjunction)(s?)
1736                                           { $return = [ $item[1], @{$item[2]} ] }
1737
1738                   conjunction:    atom ('and' atom)(s?)
1739                                           { $return = [ $item[1], @{$item[2]} ] }
1740
1741           or inefficient:
1742
1743                   disjunction:    conjunction 'or' disjunction
1744                                           { $return = [ $item[1], @{$item[3]} ] }
1745                              |    conjunction
1746                                           { $return = [ $item[1] ] }
1747
1748                   conjunction:    atom 'and' conjunction
1749                                           { $return = [ $item[1], @{$item[3]} ] }
1750                              |    atom
1751                                           { $return = [ $item[1] ] }
1752
1753           and either way is ugly and hard to get right.
1754
1755           The "<leftop:...>" and "<rightop:...>" directives provide an easier
1756           way of specifying such operations. Using "<leftop:...>" the above
1757           examples become:
1758
1759                   disjunction:    <leftop: conjunction 'or' conjunction>
1760                   conjunction:    <leftop: atom 'and' atom>
1761
1762           The "<leftop:...>" directive specifies a left-associative binary
1763           operator.  It is specified around three other grammar elements
1764           (typically subrules or terminals), which match the left operand,
1765           the operator itself, and the right operand respectively.
1766
1767           A "<leftop:...>" directive such as:
1768
1769                   disjunction:    <leftop: conjunction 'or' conjunction>
1770
1771           is converted to the following:
1772
1773                   disjunction:    ( conjunction ('or' conjunction)(s?)
1774                                           { $return = [ $item[1], @{$item[2]} ] } )
1775
1776           In other words, a "<leftop:...>" directive matches the left operand
1777           followed by zero or more repetitions of both the operator and the
1778           right operand. It then flattens the matched items into an anonymous
1779           array which becomes the (single) value of the entire "<leftop:...>"
1780           directive.
1781
1782           For example, a "<leftop:...>" directive such as:
1783
1784                   output:  <leftop: ident '<<' expr >
1785
1786           when given a string such as:
1787
1788                   cout << var << "str" << 3
1789
1790           would match, and $item[1] would be set to:
1791
1792                   [ 'cout', 'var', '"str"', '3' ]
1793
1794           In other words:
1795
1796                   output:  <leftop: ident '<<' expr >
1797
1798           is equivalent to a left-associative operator:
1799
1800                   output:  ident                                  { $return = [$item[1]]       }
1801                         |  ident '<<' expr                        { $return = [@item[1,3]]     }
1802                         |  ident '<<' expr '<<' expr              { $return = [@item[1,3,5]]   }
1803                         |  ident '<<' expr '<<' expr '<<' expr    { $return = [@item[1,3,5,7]] }
1804                         #  ...etc...
1805
1806           Similarly, the "<rightop:...>" directive takes a left operand, an
1807           operator, and a right operand:
1808
1809                   assign:  <rightop: var '=' expr >
1810
1811           and converts them to:
1812
1813                   assign:  ( (var '=' {$return=$item[1]})(s?) expr
1814                                           { $return = [ @{$item[1]}, $item[2] ] } )
1815
1816           which is equivalent to a right-associative operator:
1817
1818                   assign:  var                            { $return = [$item[1]]       }
1819                         |  var '=' expr                   { $return = [@item[1,3]]     }
1820                         |  var '=' var '=' expr           { $return = [@item[1,3,5]]   }
1821                         |  var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
1822                         #  ...etc...
1823
1824           Note that for both the "<leftop:...>" and "<rightop:...>" direc‐
1825           tives, the directive does not normally return the operator itself,
1826           just a list of the operands involved. This is particularly handy
1827           for specifying lists:
1828
1829                   list: '(' <leftop: list_item ',' list_item> ')'
1830                                   { $return = $item[2] }
1831
1832           There is, however, a problem: sometimes the operator is itself sig‐
1833           nificant.  For example, in a Perl list a comma and a "=>" are both
1834           valid separators, but the "=>" has additional stringification
1835           semantics.  Hence it's important to know which was used in each
1836           case.
1837
1838           To solve this problem the "<leftop:...>" and "<rightop:...>" direc‐
1839           tives do return the operator(s) as well, under two circumstances.
1840           The first case is where the operator is specified as a subrule. In
1841           that instance, whatever the operator matches is returned (on the
1842           assumption that if the operator is important enough to have its own
1843           subrule, then it's important enough to return).
1844
1845           The second case is where the operator is specified as a regular
1846           expression. In that case, if the first bracketed subpattern of the
1847           regular expression matches, that matching value is returned (this
1848           is analogous to the behaviour of the Perl "split" function, except
1849                   separator: ',' | '=>'
1850
1851           In other words, given the input:
1852
1853                   ( a=>1, b=>2 )
1854
1855           the specifications:
1856
1857                   list:      '('  <leftop: list_item separator list_item>  ')'
1858
1859                   separator: ',' ⎪ '=>'
1860
1861           or:
1862
1863                   list:      '('  <leftop: list_item /(,|=>)/ list_item>  ')'
1864
1865           cause the list separators to be interleaved with the operands in
1866           the anonymous array in $item[2]:
1867
1868                   [ 'a', '=>', '1', ',', 'b', '=>', '2' ]
1869
1870           But the following version:
1871
1872                   list:      '('  <leftop: list_item /,|=>/ list_item>  ')'
1873
1874           returns only the operands:
1875
1876                   [ 'a', '1', 'b', '2' ]
1877
1878           Of course, none of the above specifications handle the case of an
1879           empty list, since the "<leftop:...>" and "<rightop:...>" directives
1880           require at least a single right or left operand to match. To spec‐
1881           ify that the operator can match "trivially", it's necessary to add
1882           a "(?)" qualifier to the directive:
1883
1884                   list:      '('  <leftop: list_item /(,|=>)/ list_item>(?)  ')'
1885
1886           Note that in almost all the above examples, the first and third
1887           arguments of the "<leftop:...>" directive were the same subrule.
1888           That is because "<leftop:...>"'s are frequently used to specify
1889           "separated" lists of the same type of item. To make such lists eas‐
1890           ier to specify, the following syntax:
1891
1892                   list:   element(s /,/)
1893
1894           is exactly equivalent to:
1895
1896                   list:   <leftop: element /,/ element>
1897
1898           Note that the separator must be specified as a raw pattern (i.e.
1899           not a string or subrule).
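
               The effect of a capturing subpattern in the operator can be
               sketched with a small runnable grammar. The rule names "with_ops",
               "without_ops" and "item" are assumptions made for illustration;
               the two rules differ only in whether the separator pattern
               contains a bracketed subpattern.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Illustrative grammar: the rule names are assumptions.
# with_ops uses a capturing separator, without_ops does not.
my $parser = Parse::RecDescent->new(q{
    with_ops:    '(' <leftop: item /(,|=>)/ item> ')'  { $return = $item[2] }
    without_ops: '(' <leftop: item /,|=>/   item> ')'  { $return = $item[2] }
    item:        /\w+/
}) or die "Bad grammar\n";

my $input   = '( a=>1, b=>2 )';
my $with    = $parser->with_ops($input);
my $without = $parser->without_ops($input);

# Each result is a reference to a flat array of matched items.
print join(' ', @$with), "\n";
print join(' ', @$without), "\n";
```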
1900
1901       Scored productions
1902           By default, Parse::RecDescent grammar rules always accept the first
1903           production that matches the input. But if two or more productions
1904           may potentially match the same input, choosing the first that does
1905           so may not be optimal.
1906
1907           For example, if you were parsing the sentence "time flies like an
1908           arrow", you might use a rule like this:
1909           To cater for such situations, the "<score:...>" directive can be
1910           used. It is equivalent to an unconditional "<reject>", except that
1911                           | adjective noun verb article noun   { [@item] }
1912                           | noun verb preposition article noun { [@item] }
1913
1914           Each of these productions matches the sentence, but the third one
1915           is the most likely interpretation. However, if the sentence had
1916           been "fruit flies like a banana", then the second production is
1917           probably the right match.
1918
1919           To cater for such situtations, the "<score:...>" can be used.  The
1920           directive is equivalent to an unconditional "<reject>", except that
1921           it allows you to specify a "score" for the current production. If
1922           that score is numerically greater than the best score of any pre‐
1923           ceding production, the current production is cached for later con‐
1924           sideration. If no later production matches, then the cached produc‐
1925           tion is treated as having matched, and the value of the item imme‐
1926           diately before its "<score:...>" directive is returned as the
1927           result.
1928
1929           In other words, by putting a "<score:...>" directive at the end of
1930           each production, you can select which production matches using cri‐
1931           teria other than specification order. For example:
1932
1933                   sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
1934                           | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
1935                           | noun verb preposition article noun { [@item] } <score: sensible(@item)>
1936
1937           Now, when each production reaches its respective "<score:...>"
1938           directive, the subroutine "sensible" will be called to evaluate the
1939           matched items (somehow). Once all productions have been tried, the
1940           one which "sensible" scored most highly will be the one that is
1941           accepted as a match for the rule.
1942
1943           The variable $score always holds the current best score of any pro‐
1944           duction, and the variable $score_return holds the corresponding
1945           return value.
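
               A small runnable sketch of scoring, using the length of the
               matched text as the score. The rule names "first_match" and
               "best_match" are assumptions made for illustration. The two rules
               have the same alternatives, but only the second scores them.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Illustrative grammar: the rule names are assumptions.
# first_match accepts the first production that matches;
# best_match scores every production by the length of its match.
my $parser = Parse::RecDescent->new(q{
    first_match: /\d+/
               | /\d+\.\d+/

    best_match:  /\d+/       <score: length $item[1]>
               | /\d+\.\d+/  <score: length $item[1]>
}) or die "Bad grammar\n";

my $first = $parser->first_match("3.14");
my $best  = $parser->best_match("3.14");

print "first_match: $first\n";
print "best_match:  $best\n";
```

               Here the unscored rule is expected to stop at the integer part,
               while the scored rule should prefer the longer decimal match.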
1946
1947           As another example, the following grammar matches lines whose fields
1948           may be separated by commas, colons, or spaces. This can be tricky if
1949           a colon-separated line also contains commas, or vice versa. The
1950           grammar resolves the ambiguity by selecting the rule that results
1951           in the fewest fields:
1952
1953                   line: seplist[sep=>',']  <score: -@{$item[1]}>
1954                       | seplist[sep=>':']  <score: -@{$item[1]}>
1955                       | seplist[sep=>" "]  <score: -@{$item[1]}>
1956
1957                   seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>
1958
1959           Note the use of negation within the "<score:...>" directive to
1960           ensure that the seplist with the most items gets the lowest score.
1961
1962           As the above examples indicate, it is often the case that all pro‐
1963           ductions in a rule use exactly the same "<score:...>" directive. It
1964           is tedious to have to repeat this identical directive in every pro‐
1965           duction, so Parse::RecDescent also provides the "<autoscore:...>"
1966           directive.
1967
1968           If an "<autoscore:...>" directive appears in any production of a
1969           rule, the code it specifies is used as the scoring code for every
1970           production of that rule, except productions that already end with
1971           an explicit "<score:...>" directive. Thus the rules above could be
1972           rewritten:
1973
1974                   line: <autoscore: -@{$item[1]}>
1975                   line: seplist[sep=>',']
1976                       | seplist[sep=>':']
1977                       | seplist[sep=>" "]
1978
1979                   sentence: <autoscore: sensible(@item)>
1980                           | verb noun preposition article noun { [@item] }
1981                           | adjective noun verb article noun   { [@item] }
1982                           | noun verb preposition article noun { [@item] }
1983
1984           Note that the "<autoscore:...>" directive itself acts as an uncon‐
1985           ditional "<reject>", and (like the "<rulevar:...>" directive) is
1986           pruned at compile-time wherever possible.
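
           As a minimal, self-contained sketch of score-based selection, the
           following script scores each production by the length of the text
           it matched, so the longest match wins regardless of production
           order. The rule name "number" and the sample inputs are
           illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: each production is rejected at its <score:...>
# directive, and the highest-scoring one is accepted once all have been tried.
my $grammar = q{
    number: /\d+/             <score: length $item[1]>
          | /\d+(?:\.\d+)?/   <score: length $item[1]>
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The result is the item immediately before the winning <score:...> directive.
my $decimal = $parser->number("3.14");
my $integer = $parser->number("42");

print "decimal: $decimal\n";
print "integer: $integer\n";
```

           For "42" both productions score the same, so the earlier one is
           kept, because a later production replaces the cached one only when
           its score is strictly greater.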
1987
1988       Dispensing with grammar checks
1989           During the compilation phase of parser construction, Parse::RecDes‐
1990           cent performs a small number of checks on the grammar it's given.
1991           Specifically it checks that the grammar is not left-recursive, that
1992           there are no "insatiable" constructs of the form:
1993
1994                   rule: subrule(s) subrule
1995
1996           and that there are no rules missing (i.e. referred to, but never
1997           defined).
1998
1999           These checks are important during development, but can slow down
2000           parser construction in stable code. So Parse::RecDescent provides
2001           the "<nocheck>" directive to turn them off. The directive can only
2002           appear before the first rule definition, and switches off checking
2003           throughout the rest of the current grammar.
2004
2005           Typically, this directive would be added when a parser has been
2006           thoroughly tested and is ready for release.
2007
2008       Subrule argument lists
2009
2010       It is occasionally useful to pass data to a subrule which is being
2011       invoked. For example, consider the following grammar fragment:
2012
2013               classdecl: keyword decl
2014
2015               keyword:   'struct' | 'class'
2016
2017               decl:      # WHATEVER
2018
2019       The "decl" rule might wish to know which of the two keywords was used
2020       (since it may affect some aspect of the way the subsequent declaration
2021       is interpreted). "Parse::RecDescent" allows the grammar designer to
2022       pass data into a rule, by placing that data in an argument list (that
2023       is, in square brackets) immediately after any subrule item in a produc‐
2024       tion. Hence, we could pass the keyword to "decl" as follows:
2025
2026               classdecl: keyword decl[ $item[1] ]
2027
2028               keyword:   'struct' | 'class'
2029
2030               decl:      # WHATEVER
2031
2032       The argument list can consist of any number (including zero!) of comma-
2033       separated Perl expressions. In other words, it looks exactly like a
2034       Perl anonymous array reference. For example, we could pass the keyword,
2035       the name of the surrounding rule, and the literal 'keyword' to "decl"
2036       like so:
2037
2038               classdecl: keyword decl[$item[1],$item[0],'keyword']
2039
2040               keyword:   'struct' | 'class'
2041
2042               decl:      # WHATEVER
2043
2044       Within the rule to which the data is passed ("decl" in the above exam‐
2045       ples) that data is available as the elements of a local variable @arg.
2046       Hence "decl" might report its intentions as follows:
2047
2048               classdecl: keyword decl[$item[1],$item[0],'keyword']
2049
2050               keyword:   'struct' | 'class'
2051
2052               decl:      { print "Declaring $arg[0] (a $arg[2])\n";
2053                            print "(this rule called by $arg[1])" }
2054
2055       Subrule argument lists can also be interpreted as hashes, simply by
2056       using the local variable %arg instead of @arg. Hence we could rewrite
2057       the previous example:
2058
2059               classdecl: keyword decl[keyword => $item[1],
2060                                       caller  => $item[0],
2061                                       type    => 'keyword']
2062
2063               keyword:   'struct' | 'class'
2064
2065               decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
2066                            print "(this rule called by $arg{caller})" }
2067
2068       Both @arg and %arg are always available, so the grammar designer may
2069       choose whichever convention (or combination of conventions) suits best.
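
       The hash-style convention can be tried out with a short script such as
       the following sketch. The "decl" body (a single word) and the input
       string are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The keyword matched by the first item is passed to "decl" as a named
# argument, and "decl" reads it back through %arg.
my $grammar = q{
    classdecl: keyword decl[ kind => $item[1] ]

    keyword:   'struct' | 'class'

    decl:      /\w+/ { $return = "$arg{kind} $item[1]" }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# "classdecl" has no action of its own, so it returns the value of its
# last item, which is whatever "decl" returned.
my $result = $parser->classdecl("class Widget");
print "$result\n";
```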
2070
2071       Subrule argument lists are also useful for creating "rule templates"
2072       (especially when used in conjunction with the "<matchrule:...>" direc‐
2073       tive). For example, the subrule:
2074
2075               list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
2076                               { $return = [ $item[1], @{$item[3]} ] }
2077                   |     <matchrule:$arg{rule}>
2078                               { $return = [ $item[1]] }
2079
2080       is a handy template for the common problem of matching a separated
2081       list.  For example:
2082
2083               function: 'func' name '(' list[rule=>'param',sep=>';'] ')'
2084
2085               param:    list[rule=>'name',sep=>','] ':' typename
2086
2087               name:     /\w+/
2088
2089               typename: name
2090
2091       When a subrule argument list is used with a repeated subrule, the argu‐
2092       ment list goes before the repetition specifier:
2093
2094               list:   /some|many/ thing[ $item[1] ](s)
2095
2096       The argument list is "late bound". That is, it is re-evaluated for
2097       every repetition of the repeated subrule.  This means that each
2098       repeated attempt to match the subrule may be passed a completely dif‐
2099       ferent set of arguments if the value of the expression in the argument
2100       list changes between attempts. So, for example, the grammar:
2101
2102               { $::species = 'dogs' }
2103
2104               pair:   'two' animal[$::species](s)
2105
2106               animal: /$arg[0]/ { $::species = 'cats' }
2107
2108       will match the string "two dogs cats cats" completely, whereas it will
2109       only match the string "two dogs dogs dogs" up to the eighth letter. If
2110       the value of the argument list were "early bound" (that is, evaluated
2111       only the first time a repeated subrule match is attempted), one would
2112       expect the matching behaviours to be reversed.
2113
2114       Of course, it is possible to effectively "early bind" such argument
2115       lists by passing them a value which does not change on each repetition.
2116       For example:
2117
2118               { $::species = 'dogs' }
2119
2120               pair:   'two' { $::species } animal[$item[2]](s)
2121
2122               animal: /$arg[0]/ { $::species = 'cats' }
2123
2124       Arguments can also be passed to the start rule, simply by appending
2125       them to the argument list with which the start rule is called (after
2126       the "line number" parameter). For example, given:
2127
2128               $parser = new Parse::RecDescent ( $grammar );
2129
2130               $parser->data($text, 1, "str", 2, \@arr);
2131
2132               #             ^^^^^  ^  ^^^^^^^^^^^^^^^
2133               #               |    |         |
2134               # TEXT TO BE PARSED  |         |
2135               # STARTING LINE NUMBER         |
2136               # ELEMENTS OF @arg WHICH IS PASSED TO RULE data
2137
2138       then within the productions of the rule "data", the array @arg will
2139       contain "("str", 2, \@arr)".
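
       A small sketch of this calling convention follows. The rule body and
       the argument values are assumptions chosen only to make the effect
       visible.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The start rule returns its matched word together with whatever
# arrived in @arg.
my $parser = Parse::RecDescent->new(q{
    data: /\w+/ { $return = [ $item[1], @arg ] }
}) or die "Bad grammar\n";

# Second parameter is the starting line number; the rest become @arg.
my $result = $parser->data("payload", 1, "str", 2);

print join(", ", @$result), "\n";
```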
2140
2141       Alternations
2142
2143       Alternations are implicit (unnamed) rules defined as part of a
2144       production. An alternation is defined as a series of '|'-separated
2145       productions inside a pair of round brackets. For example:
2146
2147               character: 'the' ( good | bad | ugly ) /dude/
2148
2149       Every alternation implicitly defines a new subrule, whose auto-generated
2150       name indicates its origin: "_alternation_<I>_of_production_<P>_of_rule_<R>"
2151       for the appropriate values of <I>, <P>, and <R>. A call to this implicit
2152       subrule is then inserted in place of the brackets. Hence the above
2153       example is merely a convenient short-hand for:
2154
2155               character: 'the'
2156                          _alternation_1_of_production_1_of_rule_character
2157                          /dude/
2158
2159               _alternation_1_of_production_1_of_rule_character:
2160                          good | bad | ugly
2161
2162       Since alternations are parsed by recursively calling the parser genera‐
2163       tor, any type(s) of item can appear in an alternation. For example:
2164
2165               character: 'the' ( 'high' "plains"      # Silent, with poncho
2166                                | /no[- ]name/         # Silent, no poncho
2167                                | vengeance_seeking    # Poncho-optional
2168                                | <error>
2169                                ) drifter
2170
2171       In this case, if an error occurred, the automatically generated message
2172       would be:
2173
2174               ERROR (line <N>): Invalid implicit subrule: Expected
2175                                 'high' or /no[- ]name/ or generic,
2176                                 but found "pacifist" instead
2177
2178       Since every alternation actually has a name, it's even possible to
2179       extend or replace them:
2180
2181               $parser->Replace(
2182                       "_alternation_1_of_production_1_of_rule_character:
2183                               'generic Eastwood'"
2184                               );
2185
2186       More importantly, since alternations are a form of subrule, they can be
2187       given repetition specifiers:
2188
2189               character: 'the' ( good | bad | ugly )(?) /dude/
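
       The following sketch shows an optional alternation in a runnable form.
       It assumes literal terminals in place of the "good", "bad" and "ugly"
       subrules, and adds an action that returns the value of the repeated
       alternation (an array reference holding zero or one matches).

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    character: 'the' ( 'good' | 'bad' | 'ugly' )(?) /dude/
                   { $return = $item[2] }
}) or die "Bad grammar\n";

my $plain   = $parser->character("the dude");       # alternation matched 0 times
my $bad     = $parser->character("the bad dude");   # alternation matched once
my $unknown = $parser->character("the nice dude");  # no production matches

print "plain: ",   scalar(@$plain), " adjective(s)\n";
print "bad: ",     $bad->[0], "\n";
print "unknown: ", (defined $unknown ? "matched" : "no match"), "\n";
```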
2190
2191       Incremental Parsing
2192
2193       "Parse::RecDescent" provides two methods - "Extend" and "Replace" -
2194       which can be used to alter the grammar matched by a parser. Both
2195       methods take the same argument as "Parse::RecDescent::new", namely a
2196       grammar specification string.
2197
2198       "Parse::RecDescent::Extend" interprets the grammar specification and
2199       adds any productions it finds to the end of the rules for which they
2200       are specified. For example:
2201
2202               $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
2203               $parser->Extend($add);
2204
2205       adds two productions to the rule "name" (creating it if necessary) and
2206       one production to the rule "desc".
2207
2208       "Parse::RecDescent::Replace" is identical, except that it first resets
2209       each rule specified in the additional grammar, removing any existing
2210       productions.  Hence after:
2211
2212               $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
2213               $parser->Replace($add);
2214
2215       there are only two valid "name"s and only one possible description.
2216
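       The difference between the two methods can be seen in a short script.
       This is only a sketch: the initial one-production grammar and the name
       'Billy-Joe' are assumptions introduced for the illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{ name: 'Jimmy-Bob' })
    or die "Bad grammar\n";

# Before extending, only the original production exists.
my $before = $parser->name("Bobby-Jim");

# Extend appends a production to the existing "name" rule.
$parser->Extend(q{ name: 'Bobby-Jim' });
my $extended = $parser->name("Bobby-Jim");
my $original = $parser->name("Jimmy-Bob");

# Replace resets "name" first, so earlier productions disappear.
$parser->Replace(q{ name: 'Billy-Joe' });
my $replaced = $parser->name("Billy-Joe");
my $removed  = $parser->name("Jimmy-Bob");

printf "%-9s %s\n", $_->[0], defined $_->[1] ? $_->[1] : '(no match)'
    for [ before   => $before   ], [ extended => $extended ],
        [ original => $original ], [ replaced => $replaced ],
        [ removed  => $removed  ];
```
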
2217       A more interesting use of the "Extend" and "Replace" methods is to call
2218       them inside the action of an executing parser. For example:
2219
2220               typedef: 'typedef' type_name identifier ';'
2221                              { $thisparser->Extend("type_name: '$item[3]'") }
2222                      | <error>
2223
2224               identifier: ...!type_name /[A-Za-z_]\w*/
2225
2226       which automatically prevents type names from being typedef'd, or:
2227
2228               command: 'map' key_name 'to' abort_key
2229                              { $thisparser->Replace("abort_key: '$item[2]'") }
2230                      | 'map' key_name 'to' key_name
2231                              { map_key($item[2],$item[4]) }
2232                      | abort_key
2233                              { exit if confirm("abort?") }
2234
2235               abort_key: 'q'
2236
2237               key_name: ...!abort_key /[A-Za-z]/
2238
2239       which allows the user to change the abort key binding, but not to
2240       unbind it.
2241
2242       The careful use of such constructs makes it possible to reconfigure a
2243       running parser, eliminating the need for semantic feedback by providing
2244       syntactic feedback instead. However, as currently implemented,
2245       "Replace()" and "Extend()" have to regenerate and re-"eval" the entire
2246       parser whenever they are called. This makes them quite slow for large
2247       grammars.
2248
2249       In such cases, the judicious use of an interpolated regex is likely to
2250       be far more efficient:
2251
2252               typedef: 'typedef' type_name identifier ';'
2253                              { $thisparser->{local}{type_name} .= "|$item[3]" }
2254                      | <error>
2255
2256               identifier: ...!type_name /[A-Za-z_]\w*/
2257
2258               type_name: /$thisparser->{local}{type_name}/
2259
2260       Precompiling parsers
2261
2262       Normally Parse::RecDescent builds a parser from a grammar at run-time.
2263       That approach simplifies the design and implementation of parsing code,
2264       but has the disadvantage that it slows the parsing process down - you
2265       have to wait for Parse::RecDescent to build the parser every time the
2266       program runs. Long or complex grammars can be particularly slow to
2267       build, leading to unacceptable delays at start-up.
2268
2269       To overcome this, the module provides a way of "pre-building" a parser
2270       object and saving it in a separate module. That module can then be used
2271       to create clones of the original parser.
2272
2273       A grammar may be precompiled using the "Precompile" class method.  For
2274       example, to precompile a grammar stored in the scalar $grammar, and
2275       produce a class named PreGrammar in a module file named PreGrammar.pm,
2276       you could use:
2277
2278               use Parse::RecDescent;
2279
2280               Parse::RecDescent->Precompile($grammar, "PreGrammar");
2281
2282       The first argument is the grammar string, the second is the name of the
2283       class to be built. The name of the module file is generated automati‐
2284       cally by appending ".pm" to the last element of the class name. Thus
2285
2286               Parse::RecDescent->Precompile($grammar, "My::New::Parser");
2287
2288       would produce a module file named Parser.pm.
2289
2290       It is somewhat tedious to have to write a small Perl program just to
2291       generate a precompiled grammar class, so Parse::RecDescent has some
2292       special magic that allows you to do the job directly from the com‐
2293       mand-line.
2294
2295       If your grammar is specified in a file named grammar, you can generate
2296       a class named Yet::Another::Grammar like so:
2297
2298               > perl -MParse::RecDescent - grammar Yet::Another::Grammar
2299
2300       This would produce a file named Grammar.pm containing the full defini‐
2301       tion of a class called Yet::Another::Grammar. Of course, to use that
2302       class, you would need to put the Grammar.pm file in a directory named
2303       Yet/Another, somewhere in your Perl include path.
2304
2305       Having created the new class, it's very easy to use it to build a
2306       parser. You simply "use" the new module, and then call its "new" method
2307       to create a parser object. For example:
2308
2309               use Yet::Another::Grammar;
2310               my $parser = Yet::Another::Grammar->new();
2311
2312       The effect of these two lines is exactly the same as:
2313
2314               use Parse::RecDescent;
2315
2316               open GRAMMAR_FILE, "grammar" or die;
2317               local $/;
2318               my $grammar = <GRAMMAR_FILE>;
2319
2320               my $parser = Parse::RecDescent->new($grammar);
2321
2322       only considerably faster.
2323
2324       Note however that the parsers produced by either approach are exactly
2325       the same, so whilst precompilation has an effect on set-up speed, it
2326       has no effect on parsing speed. RecDescent 2.0 will address that prob‐
2327       lem.
2328
2329       A Metagrammar for "Parse::RecDescent"
2330
2331       The following is a specification of grammar format accepted by
2332       "Parse::RecDescent::new" (specified in the "Parse::RecDescent" grammar
2333       format!):
2334
2335        grammar    : component(s)
2336
2337        component  : rule | comment
2338
2339        rule       : "\n" identifier ":" production(s?)
2340
2341        production : item(s)
2342
2343        item       : lookahead(?) simpleitem
2344                   | directive
2345                   | comment
2346
2347        lookahead  : '...' | '...!'                   # +'ve or -'ve lookahead
2348
2349        simpleitem : subrule args(?)                  # match another rule
2350                   | repetition                       # match repeated subrules
2351                   | terminal                         # match the next input
2352                   | bracket args(?)                  # match alternative items
2353                   | action                           # do something
2354
2355        subrule    : identifier                       # the name of the rule
2356
2357        args       : {extract_codeblock($text,'[]')}  # just like a [...] array ref
2358
2359        repetition : subrule args(?) howoften
2360
2361        howoften   : '(?)'                            # 0 or 1 times
2362                   | '(s?)'                           # 0 or more times
2363                   | '(s)'                            # 1 or more times
2364                   | /[(](\d+)[.][.](\d+)[)]/         # $1 to $2 times
2365                   | /[(][.][.](\d+)[)]/              # at most $1 times
2366                   | /[(](\d+)[.][.][)]/              # at least $1 times
2367
2368        terminal   : /[/]([\\][/]|[^/])*[/]/          # interpolated pattern
2369                   | /"([\\]"|[^"])*"/                # interpolated literal
2370                   | /'([\\]'|[^'])*'/                # uninterpolated literal
2371
2372        action     : { extract_codeblock($text) }     # embedded Perl code
2373
2374        bracket    : '(' item(s) production(s?) ')'   # alternative subrules
2375
2376        directive  : '<commit>'                       # commit to production
2377                   | '<uncommit>'                     # cancel commitment
2378                   | '<resync>'                       # skip to newline
2379                   | '<resync:' pattern '>'           # skip <pattern>
2380                   | '<reject>'                       # fail this production
2381                   | '<reject:' condition '>'         # fail if <condition>
2382                   | '<error>'                        # report an error
2383                   | '<error:' string '>'             # report error as "<string>"
2384                   | '<error?>'                       # error only if committed
2385                   | '<error?:' string '>'            #   "    "    "    "
2386                   | '<rulevar:' /[^>]+/ '>'          # define rule-local variable
2387                   | '<matchrule:' string '>'         # invoke rule named in string
2388
2389        identifier : /[a-z]\w*/i                      # must start with alpha
2390
2391        comment    : /#[^\n]*/                        # same as Perl
2392
2393        pattern    : {extract_bracketed($text,'<')}   # allow embedded "<..>"
2394
2395        condition  : {extract_codeblock($text,'{<')}  # full Perl expression
2396
2397        string     : {extract_variable($text)}        # any Perl variable
2398                   | {extract_quotelike($text)}       #   or quotelike string
2399                   | {extract_bracketed($text,'<')}   #   or balanced brackets
2400

GOTCHAS

2402       This section describes common mistakes that grammar writers seem to
2403       make on a regular basis.
2404
2405       1. Expecting an error to always invalidate a parse
2406
2407       A common mistake when using error messages is to write the grammar like
2408       this:
2409
2410               file: line(s)
2411
2412               line: line_type_1
2413                   | line_type_2
2414                   | line_type_3
2415                   | <error>
2416
2417       The expectation seems to be that any line that is not of type 1, 2 or 3
2418       will invoke the "<error>" directive and thereby cause the parse to
2419       fail.
2420
2421       Unfortunately, that only happens if the error occurs in the very first
2422       line.  The first rule states that a "file" is matched by one or more
2423       lines, so if even a single line succeeds, the first rule is completely
2424       satisfied and the parse as a whole succeeds. That means that any error
2425       messages generated by subsequent failures in the "line" rule are qui‐
2426       etly ignored.
2427
2428       Typically what's really needed is this:
2429
2430               file: line(s) eofile    { $return = $item[1] }
2431
2432               line: line_type_1
2433                   | line_type_2
2434                   | line_type_3
2435                   | <error>
2436
2437               eofile: /^\Z/
2438
2439       The addition of the "eofile" subrule  to the first production means
2440       that a file only matches a series of successful "line" matches that
2441       consume the complete input text. If any input text remains after the
2442       lines are matched, there must have been an error in the last "line". In
2443       that case the "eofile" rule will fail, causing the entire "file" rule
2444       to fail too.
2445
2446       Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/"
2447       or "/^\cD/" (end-of-file).
2448
2449       And don't forget the action at the end of the production. If you just
2450       write:
2451
2452               file: line(s) eofile
2453
2454       then the value returned by the "file" rule will be the value of its
2455       last item: "eofile". Since "eofile" always returns an empty string on
2456       success, that will cause the "file" rule to return that empty string.
2457       Apart from returning the wrong value, returning an empty string will
2458       trip up code such as:
2459
2460               $parser->file($filetext) || die;
2461
2462       (since "" is false).
2463
2464       Remember that Parse::RecDescent returns undef on failure, so the only
2465       safe test for failure is:
2466
2467               defined($parser->file($filetext)) || die;
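
       Putting these points together, a minimal sketch might look like the
       following. The single-pattern "line" rule and the sample inputs are
       assumptions that stand in for a real set of line types.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = q{
    file:   line(s) eofile    { $return = $item[1] }

    line:   /\d+/

    eofile: /^\Z/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Every "line" matches and the input is fully consumed.
my $ok = $parser->file("1 2 3");

# The trailing "x" is not a valid "line", so "eofile" fails and the
# whole "file" rule returns undef.
my $bad = $parser->file("1 2 x");

print "ok:  ", (defined $ok  ? "@$ok"   : "parse failed"), "\n";
print "bad: ", (defined $bad ? "@$bad"  : "parse failed"), "\n";
```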
2468

DIAGNOSTICS

2470       Diagnostics are intended to be self-explanatory (particularly if you
2471       use -RD_HINT (under perl -s) or define $::RD_HINT inside the program).
2472
2473       "Parse::RecDescent" currently diagnoses the following:
2474
2475       ·   Invalid regular expressions used as pattern terminals (fatal
2476           error).
2477
2478       ·   Invalid Perl code in code blocks (fatal error).
2479
2480       ·   Lookahead used in the wrong place or in a nonsensical way (fatal
2481           error).
2482
2483       ·   "Obvious" cases of left-recursion (fatal error).
2484
2485       ·   Missing or extra components in a "<leftop>" or "<rightop>" direc‐
2486           tive.
2487
2488       ·   Unrecognisable components in the grammar specification (fatal
2489           error).
2490
2491       ·   "Orphaned" rule components specified before the first rule (fatal
2492           error) or after an "<error>" directive (level 3 warning).
2493
2494       ·   Missing rule definitions (this only generates a level 3 warning,
2495           since you may be providing them later via "Parse::RecDes‐
2496           cent::Extend()").
2497
2498       ·   Instances where greedy repetition behaviour will almost certainly
2499           cause the failure of a production (a level 3 warning - see
2500           "ON-GOING ISSUES AND FUTURE DIRECTIONS" below).
2501
2502       ·   Attempts to define rules named 'Replace' or 'Extend', which cannot
2503           be called directly through the parser object because of the prede‐
2504           fined meaning of "Parse::RecDescent::Replace" and "Parse::RecDes‐
2505           cent::Extend". (Only a level 2 warning is generated, since such
2506           rules can still be used as subrules).
2507
2508       ·   Productions which consist of a single "<error?>" directive, and
2509           which therefore may succeed unexpectedly (a level 2 warning, since
2510           this might conceivably be the desired effect).
2511
2512       ·   Multiple consecutive lookahead specifiers (a level 1 warning only,
2513           since their effects simply accumulate).
2514
2515       ·   Productions which start with a "<reject>" or "<rulevar:...>" direc‐
2516           tive. Such productions are optimized away (a level 1 warning).
2517
2518       ·   Rules which are autogenerated under $::RD_AUTOSTUB (a level 1
2519           warning).
2520

AUTHOR

2522       Damian Conway (damian@conway.org)
2523

BUGS AND IRRITATIONS

2525       There are undoubtedly serious bugs lurking somewhere in this much code
2526       :-) Bug reports and other feedback are most welcome.
2527
2528       Ongoing annoyances include:
2529
2530       ·   There's no support for parsing directly from an input stream.  If
2531           and when the Perl Gods give us regular expressions on streams, this
2532           should be trivial (ahem!) to implement.
2533
2534       ·   The parser generator can get confused if actions aren't properly
2535           closed or if they contain particularly nasty Perl syntax errors
2536           (especially unmatched curly brackets).
2537
2538       ·   The generator only detects the most obvious form of left recursion
2539           (potential recursion on the first subrule in a rule). More subtle
2540           forms of left recursion (for example, through the second item in a
2541           rule after a "zero" match of a preceding "zero-or-more" repetition,
2542           or after a match of a subrule with an empty production) are not
2543           found.
2544
2545       ·   Instead of complaining about left-recursion, the generator should
2546           silently transform the grammar to remove it. Don't expect this fea‐
2547           ture any time soon as it would require a more sophisticated
2548           approach to parser generation than is currently used.
2549
2550       ·   The generated parsers don't always run as fast as might be wished.
2551
2552       ·   The meta-parser should be bootstrapped using "Parse::RecDescent"
2553           :-)
2554

ON-GOING ISSUES AND FUTURE DIRECTIONS

2556       1.  Repetitions are "incorrigibly greedy" in that they will eat every‐
2557           thing they can and won't backtrack if that behaviour causes a pro‐
2558           duction to fail needlessly.  So, for example:
2559
2560                   rule: subrule(s) subrule
2561
2562           will never succeed, because the repetition will eat all the sub‐
2563           rules it finds, leaving none to match the second item. Such con‐
2564           structions are relatively rare (and "Parse::RecDescent::new" gener‐
2565           ates a warning whenever they occur) so this may not be a problem,
2566           especially since the insatiable behaviour can be overcome "manu‐
2567           ally" by writing:
2568
2569                   rule: penultimate_subrule(s) subrule
2570
2571                   penultimate_subrule: subrule ...subrule
2572
2573           The issue is that this construction is exactly twice as expensive
2574           as the original, whereas backtracking would add only 1/N to the
2575           cost (for matching N repetitions of "subrule"). I would welcome
2576           feedback on the need for backtracking; particularly on cases where
2577           the lack of it makes parsing performance problematical.
2578
2579       2.  Having opened that can of worms, it's also necessary to consider
2580           whether there is a need for non-greedy repetition specifiers.
2581           Again, it's possible (at some cost) to manually provide the
2582           required functionality:
2583
2584                   rule: nongreedy_subrule(s) othersubrule
2585
2586                   nongreedy_subrule: subrule ...!othersubrule
2587
2588           Overall, the issue is whether the benefit of this extra functional‐
2589           ity outweighs the drawbacks of further complicating the (currently
2590           minimalist) grammar specification syntax, and (worse) introducing
2591           more overhead into the generated parsers.
2592
2593       3.  An "<autocommit>" directive would be nice. That is, it would be
2594           useful to be able to say:
2595
2596                   command: <autocommit>
2597                   command: 'find' name
2598                          | 'find' address
2599                          | 'do' command 'at' time 'if' condition
2600                          | 'do' command 'at' time
2601                          | 'do' command
2602                          | unusual_command
2603
2604           and have the generator work out that this should be "pruned" thus:
2605
2606                   command: 'find' name
2607                          | 'find' <commit> address
2608                          | 'do' <commit> command <uncommit>
2609                                   'at' time
2610                                   'if' <commit> condition
2611                          | 'do' <commit> command <uncommit>
2612                                   'at' <commit> time
2613                          | 'do' <commit> command
2614                          | unusual_command
2615
2616           There are several issues here. Firstly, should the "<autocommit>"
2617           automatically install an "<uncommit>" at the start of the last pro‐
2618           duction (on the grounds that the "command" rule doesn't know
2619           whether an "unusual_command" might start with "find" or "do") or
2620           should the "unusual_command" subgraph be analysed (to see if it
2621           might be viable after a "find" or "do")?
2622
2623           The second issue is how regular expressions should be treated. The
2624           simplest approach would be simply to uncommit before them (on the
2625           grounds that they might match). Better efficiency would be obtained
2626           by analyzing all preceding literal tokens to determine whether the
2627           pattern would match them.
2628
2629           Overall, the issues are: can such automated "pruning" approach a
2630           hand-tuned version sufficiently closely to warrant the extra set-up
2631           expense, and (more importantly) is the problem important enough to
2632           even warrant the non-trivial effort of building an automated solu‐
2633           tion?
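
           The manual workaround for greedy repetition described in item 1
           can be sketched as a complete script. The rule names "list",
           "leading" and "word" are hypothetical, and the actions are added
           only so that the result is easy to inspect.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# "leading" matches a word only when another word follows it (positive
# lookahead), so the repetition always leaves one word for the final item.
my $grammar = q{
    list:    leading(s) word   { $return = [ @{$item[1]}, $item[2] ] }

    leading: word ...word      { $return = $item[1] }

    word:    /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $words = $parser->list("a b c");
print join(", ", @$words), "\n";
```

           Each word except the last is matched twice here (once as a
           lookahead and once for real), which is the extra cost discussed
           in item 1.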
2634

COPYRIGHT

2636       Copyright (c) 1997-2000, Damian Conway. All Rights Reserved.  This mod‐
2637       ule is free software. It may be used, redistributed and/or modified
2638       under the terms of the Perl Artistic License
2639         (see http://www.perl.com/perl/misc/Artistic.html)
2640
2641
2642
2643perl v5.8.8                       2003-04-09              Parse::RecDescent(3)