Parse::RecDescent(3)  User Contributed Perl Documentation Parse::RecDescent(3)

NAME

       Parse::RecDescent - Generate Recursive-Descent Parsers

VERSION

       This document describes version 1.967015 of Parse::RecDescent released
       April 4th, 2017.

SYNOPSIS

13        use Parse::RecDescent;
14
15        # Generate a parser from the specification in $grammar:
16
17            $parser = new Parse::RecDescent ($grammar);
18
19        # Generate a parser from the specification in $othergrammar
20
21            $anotherparser = new Parse::RecDescent ($othergrammar);
22
23
24        # Parse $text using rule 'startrule' (which must be
25        # defined in $grammar):
26
27           $parser->startrule($text);
28
29
30        # Parse $text using rule 'otherrule' (which must also
31        # be defined in $grammar):
32
33            $parser->otherrule($text);
34
35
36        # Change the universal token prefix pattern
37        # before building a grammar
38        # (the default is: '\s*'):
39
40           $Parse::RecDescent::skip = '[ \t]+';
41
42
43        # Replace productions of existing rules (or create new ones)
44        # with the productions defined in $newgrammar:
45
46           $parser->Replace($newgrammar);
47
48
49        # Extend existing rules (or create new ones)
50        # by adding extra productions defined in $moregrammar:
51
52           $parser->Extend($moregrammar);
53
54
55        # Global flags (useful as command line arguments under -s):
56
           $::RD_ERRORS       # unless undefined, report fatal errors
           $::RD_WARN         # unless undefined, also report non-fatal problems
           $::RD_HINT         # if defined, also suggest remedies
           $::RD_TRACE        # if defined, also trace parsers' behaviour
           $::RD_AUTOSTUB     # if defined, generates "stubs" for undefined rules
           $::RD_AUTOACTION   # if defined, appends specified action to productions
63

DESCRIPTION

65   Overview
66       Parse::RecDescent incrementally generates top-down recursive-descent
67       text parsers from simple yacc-like grammar specifications. It provides:
68
69       ·   Regular expressions or literal strings as terminals (tokens),
70
71       ·   Multiple (non-contiguous) productions for any rule,
72
73       ·   Repeated and optional subrules within productions,
74
75       ·   Full access to Perl within actions specified as part of the
76           grammar,
77
78       ·   Simple automated error reporting during parser generation and
79           parsing,
80
81       ·   The ability to commit to, uncommit to, or reject particular
82           productions during a parse,
83
       ·   The ability to pass data up and down the parse tree ("down" via
           subrule argument lists, "up" via subrule return values),
86
87       ·   Incremental extension of the parsing grammar (even during a parse),
88
89       ·   Precompilation of parser objects,
90
91       ·   User-definable reduce-reduce conflict resolution via "scoring" of
92           matching productions.
93
94   Using "Parse::RecDescent"
95       Parser objects are created by calling "Parse::RecDescent::new", passing
96       in a grammar specification (see the following subsections). If the
97       grammar is correct, "new" returns a blessed reference which can then be
98       used to initiate parsing through any rule specified in the original
99       grammar. A typical sequence looks like this:
100
101           $grammar = q {
102               # GRAMMAR SPECIFICATION HERE
103                };
104
105           $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";
106
107           # acquire $text
108
109           defined $parser->startrule($text) or print "Bad text!\n";
110
       The rule through which parsing is initiated must be explicitly defined
       in the grammar (i.e. for the above example, the grammar must include a
       rule of the form: "startrule: <subrules>").
114
115       If the starting rule succeeds, its value (see below) is returned.
116       Failure to generate the original parser or failure to match a text is
117       indicated by returning "undef". Note that it's easy to set up grammars
118       that can succeed, but which return a value of 0, "0", or "".  So don't
119       be tempted to write:
120
121           $parser->startrule($text) or print "Bad text!\n";
122
123       Normally, the parser has no effect on the original text. So in the
124       previous example the value of $text would be unchanged after having
125       been parsed.
126
127       If, however, the text to be matched is passed by reference:
128
129           $parser->startrule(\$text)
130
131       then any text which was consumed during the match will be removed from
132       the start of $text.
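
       The two points above (testing with "defined", and passing the text by
       reference) can be sketched as follows. The rule name "number" and the
       sample strings are invented for illustration; this is a minimal
       sketch, not the only way to drive a parser:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical one-rule grammar, used only for this illustration.
my $parser = Parse::RecDescent->new(q{
    number: /\d+/
}) or die "Bad grammar!\n";

# A successful match may return a false-but-defined value such as "0",
# so success must be tested with defined().
my $zero = $parser->number("0 apples");
print "matched\n"              if defined $zero;
print "truth test misleads\n"  unless $zero;

# Passing the text by value leaves the caller's string untouched...
my $by_value = "42 apples";
$parser->number($by_value);

# ...whereas passing a reference removes the consumed prefix.
my $by_ref = "42 apples";
$parser->number(\$by_ref);

print "by value: [$by_value]\n";
print "by ref:   [$by_ref]\n";
```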
133
134   Rules
135       In the grammar from which the parser is built, rules are specified by
136       giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
137       colon on the same line, followed by one or more productions, separated
138       by single vertical bars. The layout of the productions is entirely
139       free-format:
140
141           rule1:  production1
142            |  production2 |
143           production3 | production4
144
145       At any point in the grammar previously defined rules may be extended
146       with additional productions. This is achieved by redeclaring the rule
147       with the new productions. Thus:
148
149           rule1: a | b | c
150           rule2: d | e | f
151           rule1: g | h
152
153       is exactly equivalent to:
154
155           rule1: a | b | c | g | h
156           rule2: d | e | f
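
       As a minimal sketch of this behaviour (the rule names and input
       strings are invented for illustration), redeclaring "greeting" below
       adds a third production rather than replacing the first two:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: 'greeting' is declared twice, so the second
# declaration extends the first instead of overriding it.
my $parser = Parse::RecDescent->new(q{
    greeting: 'hello' | 'hi'
    farewell: 'bye'
    greeting: 'hey'
}) or die "Bad grammar!\n";

my $old = $parser->greeting("hello");   # tried against an original production
my $new = $parser->greeting("hey");     # tried against the added production

print defined $old ? "original production kept\n" : "original production lost\n";
print defined $new ? "extra production added\n"   : "extra production missing\n";
```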
157
158       Each production in a rule consists of zero or more items, each of which
159       may be either: the name of another rule to be matched (a "subrule"), a
160       pattern or string literal to be matched directly (a "token"), a block
161       of Perl code to be executed (an "action"), a special instruction to the
162       parser (a "directive"), or a standard Perl comment (which is ignored).
163
164       A rule matches a text if one of its productions matches. A production
165       matches if each of its items match consecutive substrings of the text.
166       The productions of a rule being matched are tried in the same order
167       that they appear in the original grammar, and the first matching
168       production terminates the match attempt (successfully). If all
169       productions are tried and none matches, the match attempt fails.
170
171       Note that this behaviour is quite different from the "prefer the longer
172       match" behaviour of yacc. For example, if yacc were parsing the rule:
173
174           seq : 'A' 'B'
175           | 'A' 'B' 'C'
176
177       upon matching "AB" it would look ahead to see if a 'C' is next and, if
178       so, will match the second production in preference to the first. In
179       other words, yacc effectively tries all the productions of a rule
180       breadth-first in parallel, and selects the "best" match, where "best"
181       means longest (note that this is a gross simplification of the true
182       behaviour of yacc but it will do for our purposes).
183
184       In contrast, "Parse::RecDescent" tries each production depth-first in
185       sequence, and selects the "best" match, where "best" means first. This
186       is the fundamental difference between "bottom-up" and "recursive
187       descent" parsing.
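
       The "first match wins" behaviour can be observed directly. In the
       sketch below (rule names and action values are invented for
       illustration) the same two productions are listed in both orders, and
       the text is passed by reference so that the unconsumed remainder is
       visible:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Two hypothetical rules with the same productions in opposite orders.
my $parser = Parse::RecDescent->new(q{
    short_first: 'A' 'B'      { 'short' }
               | 'A' 'B' 'C'  { 'long'  }

    long_first:  'A' 'B' 'C'  { 'long'  }
               | 'A' 'B'      { 'short' }
}) or die "Bad grammar!\n";

my $text1 = "ABC";
my $first = $parser->short_first(\$text1);   # stops at the first production

my $text2 = "ABC";
my $second = $parser->long_first(\$text2);   # longer production is tried first

print "short_first: $first, left over: '$text1'\n";
print "long_first:  $second, left over: '$text2'\n";
```

       If the longer alternative is the one that should win, it has to be
       listed first.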
188
189       Each successfully matched item in a production is assigned a value,
190       which can be accessed in subsequent actions within the same production
191       (or, in some cases, as the return value of a successful subrule call).
192       Unsuccessful items don't have an associated value, since the failure of
193       an item causes the entire surrounding production to immediately fail.
194       The following sections describe the various types of items and their
195       success values.
196
197   Subrules
198       A subrule which appears in a production is an instruction to the parser
199       to attempt to match the named rule at that point in the text being
200       parsed. If the named subrule is not defined when requested the
201       production containing it immediately fails (unless it was "autostubbed"
202       - see Autostubbing).
203
204       A rule may (recursively) call itself as a subrule, but not as the left-
205       most item in any of its productions (since such recursions are usually
206       non-terminating).
207
208       The value associated with a subrule is the value associated with its
209       $return variable (see "Actions" below), or with the last successfully
210       matched item in the subrule match.
211
212       Subrules may also be specified with a trailing repetition specifier,
213       indicating that they are to be (greedily) matched the specified number
214       of times. The available specifiers are:
215
216           subrule(?)  # Match one-or-zero times
217           subrule(s)  # Match one-or-more times
218           subrule(s?) # Match zero-or-more times
219           subrule(N)  # Match exactly N times for integer N > 0
220           subrule(N..M)   # Match between N and M times
221           subrule(..M)    # Match between 1 and M times
222           subrule(N..)    # Match at least N times
223
224       Repeated subrules keep matching until either the subrule fails to
225       match, or it has matched the minimal number of times but fails to
226       consume any of the parsed text (this second condition prevents the
227       subrule matching forever in some cases).
228
229       Since a repeated subrule may match many instances of the subrule
230       itself, the value associated with it is not a simple scalar, but rather
231       a reference to a list of scalars, each of which is the value associated
232       with one of the individual subrule matches. In other words in the rule:
233
234           program: statement(s)
235
236       the value associated with the repeated subrule "statement(s)" is a
237       reference to an array containing the values matched by each call to the
238       individual subrule "statement".
239
240       Repetition modifiers may include a separator pattern:
241
242           program: statement(s /;/)
243
244       specifying some sequence of characters to be skipped between each
245       repetition.  This is really just a shorthand for the <leftop:...>
246       directive (see below).
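
       A short sketch of how repetition values look from the caller's side.
       The rule names ("list", "signed", "number", "sign") are invented for
       illustration:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # One or more numbers separated by commas; the rule's value is the
    # array reference produced by the repeated subrule.
    list:   number(s /,/)

    # An optional sign: the repetition still yields an array reference,
    # holding zero or one element.
    signed: sign(?) number
        { [ scalar @{ $item[1] }, $item[2] ] }

    number: /\d+/
    sign:   /[-+]/
}) or die "Bad grammar!\n";

my $list     = $parser->list("1, 2, 3");
my $unsigned = $parser->signed("7");
my $negative = $parser->signed("-7");

print "list:     @$list\n";
print "unsigned: $unsigned->[0] sign(s), value $unsigned->[1]\n";
print "negative: $negative->[0] sign(s), value $negative->[1]\n";
```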
247
248   Tokens
249       If a quote-delimited string or a Perl regex appears in a production,
250       the parser attempts to match that string or pattern at that point in
251       the text. For example:
252
253           typedef: "typedef" typename identifier ';'
254
255           identifier: /[A-Za-z_][A-Za-z0-9_]*/
256
257       As in regular Perl, a single quoted string is uninterpolated, whilst a
258       double-quoted string or a pattern is interpolated (at the time of
259       matching, not when the parser is constructed). Hence, it is possible to
260       define rules in which tokens can be set at run-time:
261
262           typedef: "$::typedefkeyword" typename identifier ';'
263
264           identifier: /$::identpat/
265
       Note that, since each rule is implemented inside a special namespace
       belonging to its parser, it is necessary to explicitly qualify
       variables from the main package.
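
       The following sketch illustrates match-time interpolation. The
       variable $main::keyword and the rule name "declaration" are
       assumptions made for this example only:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical run-time setting, fully qualified so the parser's own
# namespace can see it as $::keyword.
$main::keyword = 'typedef';

my $parser = Parse::RecDescent->new(q{
    declaration: "$::keyword" /\w+/ ';'  { $item[2] }
}) or die "Bad grammar!\n";

my $before = $parser->declaration("typedef size_t;");

# Changing the variable changes what the existing parser accepts,
# because the double-quoted token is interpolated at match time.
$main::keyword = 'alias';
my $after = $parser->declaration("alias size_t;");
my $stale = $parser->declaration("typedef size_t;");

print "before: ", (defined $before ? $before : 'no match'), "\n";
print "after:  ", (defined $after  ? $after  : 'no match'), "\n";
print "stale:  ", (defined $stale  ? $stale  : 'no match'), "\n";
```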
269
270       Regex tokens can be specified using just slashes as delimiters or with
271       the explicit "m<delimiter>......<delimiter>" syntax:
272
273           typedef: "typedef" typename identifier ';'
274
275           typename: /[A-Za-z_][A-Za-z0-9_]*/
276
277           identifier: m{[A-Za-z_][A-Za-z0-9_]*}
278
       A regex of either type can also have any valid trailing modifier(s)
       (that is, any of [cgimsox]):
281
282           typedef: "typedef" typename identifier ';'
283
284           identifier: / [a-z_]        # LEADING ALPHA OR UNDERSCORE
285                 [a-z0-9_]*    # THEN DIGITS ALSO ALLOWED
286               /ix     # CASE/SPACE/COMMENT INSENSITIVE
287
288       The value associated with any successfully matched token is a string
289       containing the actual text which was matched by the token.
290
291       It is important to remember that, since each grammar is specified in a
292       Perl string, all instances of the universal escape character '\' within
293       a grammar must be "doubled", so that they interpolate to single '\'s
294       when the string is compiled. For example, to use the grammar:
295
296           word:       /\S+/ | backslash
297           line:       prefix word(s) "\n"
298           backslash:  '\\'
299
300       the following code is required:
301
302           $parser = new Parse::RecDescent (q{
303
304               word:   /\\S+/ | backslash
305               line:   prefix word(s) "\\n"
306               backslash:  '\\\\'
307
308           });
309
310   Anonymous subrules
311       Parentheses introduce a nested scope that is very like a call to an
312       anonymous subrule. Hence they are useful for "in-lining" subroutine
313       calls, and other kinds of grouping behaviour. For example, instead of:
314
315           word:       /\S+/ | backslash
316           line:       prefix word(s) "\n"
317
318       you could write:
319
320           line:       prefix ( /\S+/ | backslash )(s) "\n"
321
322       and get exactly the same effects.
323
       Parentheses are also used for collecting unrepeated alternations within
       a single production:
326
327           secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq."
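
       A runnable sketch of that last rule, with an action added (an
       assumption for illustration) so that the value of the parenthesised
       alternation becomes the value of the rule:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq."
        { $item[2] }    # value of the anonymous subrule
}) or die "Bad grammar!\n";

my $known   = $parser->secret_identity("Mr Fantastic, Esq.");
my $unknown = $parser->secret_identity("Mr Ordinary, Esq.");

print "known:   ", (defined $known   ? $known   : 'no match'), "\n";
print "unknown: ", (defined $unknown ? $unknown : 'no match'), "\n";
```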
328
329   Terminal Separators
330       For the purpose of matching, each terminal in a production is
331       considered to be preceded by a "prefix" - a pattern which must be
332       matched before a token match is attempted. By default, the prefix is
333       optional whitespace (which always matches, at least trivially), but
334       this default may be reset in any production.
335
336       The variable $Parse::RecDescent::skip stores the universal prefix,
337       which is the default for all terminal matches in all parsers built with
338       "Parse::RecDescent".
339
340       If you want to change the universal prefix using
341       $Parse::RecDescent::skip, be careful to set it before creating the
342       grammar object, because it is applied statically (when a grammar is
343       built) rather than dynamically (when the grammar is used).
344       Alternatively you can provide a global "<skip:...>" directive in your
345       grammar before any rules (described later).
346
347       The prefix for an individual production can be altered by using the
348       "<skip:...>" directive (described later).  Setting this directive in
349       the top-level rule is an alternative approach to setting
350       $Parse::RecDescent::skip before creating the object, but in this case
351       you don't get the intended skipping behaviour if you directly invoke
352       methods different from the top-level rule.
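
       As a hedged sketch of a per-production prefix, the two hypothetical
       rules below differ only in their "<skip:...>" directive. It assumes
       that the altered prefix also applies to subrules called from that
       production, as the paragraph above implies:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    # Default prefix: any whitespace, including newlines, is skipped.
    any_space: word(s)

    # Prefix restricted to spaces and tabs, so a newline stops the match.
    same_line: <skip:'[ \t]*'> word(s)

    word: /\w+/
}) or die "Bad grammar!\n";

my $input = "alpha beta\ngamma";

my $all   = $parser->any_space($input);
my $first = $parser->same_line($input);

print "any_space: @$all\n";
print "same_line: @$first\n";
```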
353
354   Actions
355       An action is a block of Perl code which is to be executed (as the block
356       of a "do" statement) when the parser reaches that point in a
357       production. The action executes within a special namespace belonging to
358       the active parser, so care must be taken in correctly qualifying
359       variable names (see also "Start-up Actions" below).
360
361       The action is considered to succeed if the final value of the block is
362       defined (that is, if the implied "do" statement evaluates to a defined
363       value - even one which would be treated as "false"). Note that the
364       value associated with a successful action is also the final value in
365       the block.
366
367       An action will fail if its last evaluated value is "undef". This is
368       surprisingly easy to accomplish by accident. For instance, here's an
369       infuriating case of an action that makes its production fail, but only
370       when debugging isn't activated:
371
372           description: name rank serial_number
373               { print "Got $item[2] $item[1] ($item[3])\n"
374               if $::debugging
375               }
376
       If $::debugging is false, no statement in the block is executed, so the
378       final value is "undef", and the entire production fails. The solution
379       is:
380
381           description: name rank serial_number
382               { print "Got $item[2] $item[1] ($item[3])\n"
383               if $::debugging;
384                 1;
385               }
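
       The defined-versus-undefined distinction can be checked in isolation.
       In this sketch (rule names invented for illustration) one action ends
       with a false but defined value and the other ends with "undef":

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    returns_zero:  /\d+/ { 0 }       # false but defined: the action succeeds
    returns_undef: /\d+/ { undef }   # undefined: the production fails
}) or die "Bad grammar!\n";

my $zero  = $parser->returns_zero("5");
my $undef = $parser->returns_undef("5");

print "returns_zero:  ", (defined $zero  ? "matched ($zero)" : "failed"), "\n";
print "returns_undef: ", (defined $undef ? "matched"         : "failed"), "\n";
```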
386
387       Within an action, a number of useful parse-time variables are available
388       in the special parser namespace (there are other variables also
389       accessible, but meddling with them will probably just break your
390       parser. As a general rule, if you avoid referring to unqualified
391       variables - especially those starting with an underscore - inside an
392       action, things should be okay):
393
394       @item and %item
395           The array slice @item[1..$#item] stores the value associated with
396           each item (that is, each subrule, token, or action) in the current
397           production. The analogy is to $1, $2, etc. in a yacc grammar.  Note
398           that, for obvious reasons, @item only contains the values of items
399           before the current point in the production.
400
401           The first element ($item[0]) stores the name of the current rule
402           being matched.
403
404           @item is a standard Perl array, so it can also be indexed with
405           negative numbers, representing the number of items back from the
406           current position in the parse:
407
408               stuff: /various/ bits 'and' pieces "then" data 'end'
409                   { print $item[-2] }  # PRINTS data
410                        # (EASIER THAN: $item[6])
411
           The %item hash complements the @item array, providing named
           access to the same item values:
414
415               stuff: /various/ bits 'and' pieces "then" data 'end'
                   { print $item{data}  # PRINTS data
                        # (EVEN EASIER THAN USING @item)
                   }
418
419           The results of named subrules are stored in the hash under each
420           subrule's name (including the repetition specifier, if any), whilst
421           all other items are stored under a "named positional" key that
422           indicates their ordinal position within their item type:
423           __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:
424
425               stuff: /various/ bits 'and' pieces "then" data 'end' { save }
426                   { print $item{__PATTERN1__}, # PRINTS 'various'
427                   $item{__STRING2__},  # PRINTS 'then'
428                   $item{__ACTION1__},  # PRINTS RETURN
429                            # VALUE OF save
430                   }
431
432           If you want proper named access to patterns or literals, you need
433           to turn them into separate rules:
434
435               stuff: various bits 'and' pieces "then" data 'end'
436                   { print $item{various}  # PRINTS various
437                   }
438
439               various: /various/
440
           The special entry $item{__RULE__} stores the name of the current
           rule (i.e. the same value as $item[0]).
443
           The advantage of using %item instead of @item is that it removes
           the need to track item positions that may change as a grammar
           evolves. For example, adding an interim "<skip>" directive or
           action can silently ruin a trailing action, by moving an @item
448           element "down" the array one place. In contrast, the named entry of
449           %item is unaffected by such an insertion.
450
451           A limitation of the %item hash is that it only records the last
452           value of a particular subrule. For example:
453
               range: '(' number '..' number ')'
455                   { $return = $item{number} }
456
457           will return only the value corresponding to the second match of the
458           "number" subrule. In other words, successive calls to a subrule
459           overwrite the corresponding entry in %item. Once again, the
460           solution is to rename each subrule in its own rule:
461
462               range: '(' from_num '..' to_num ')'
463                   { $return = $item{from_num} }
464
465               from_num: number
466               to_num:   number
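
           The overwriting behaviour can be seen by returning both views of
           the same production. The hash keys "first", "second" and
           "by_name" are invented for this sketch:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    range: '(' number '..' number ')'
        { $return = { first   => $item[2],
                      second  => $item[4],
                      by_name => $item{number} } }

    number: /\d+/
}) or die "Bad grammar!\n";

my $range = $parser->range("(3..9)");

print "positional: $range->{first} and $range->{second}\n";
print "by name:    $range->{by_name}\n";   # only the later match survives
```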
467
468       @arg and %arg
469           The array @arg and the hash %arg store any arguments passed to the
470           rule from some other rule (see "Subrule argument lists"). Changes
471           to the elements of either variable do not propagate back to the
472           calling rule (data can be passed back from a subrule via the
473           $return variable - see next item).
474
475       $return
476           If a value is assigned to $return within an action, that value is
477           returned if the production containing the action eventually matches
478           successfully. Note that setting $return doesn't cause the current
479           production to succeed. It merely tells it what to return if it does
480           succeed.  Hence $return is analogous to $$ in a yacc grammar.
481
482           If $return is not assigned within a production, the value of the
483           last component of the production (namely: $item[$#item]) is
484           returned if the production succeeds.
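
           A small sketch of the three cases just described: an explicit
           $return, the default last-item value, and a $return that is
           discarded because the production goes on to fail. Rule names are
           invented for illustration:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    explicit: /\w+/ '=' /\w+/  { $return = [ $item[1], $item[3] ] }

    implicit: /\w+/ '=' /\w+/

    # $return is assigned first, but only matters if the rest matches.
    early:    { $return = 'set early' } /\w+/ '=' /\w+/
}) or die "Bad grammar!\n";

my $pair    = $parser->explicit("colour = red");
my $last    = $parser->implicit("colour = red");
my $kept    = $parser->early("colour = red");
my $dropped = $parser->early("colour = ");

print "explicit: @$pair\n";
print "implicit: $last\n";
print "early:    $kept\n";
print "failed:   ", (defined $dropped ? $dropped : 'undef'), "\n";
```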
485
486       $commit
487           The current state of commitment to the current production (see
488           "Directives" below).
489
490       $skip
491           The current terminal prefix (see "Directives" below).
492
493       $text
494           The remaining (unparsed) text. Changes to $text do not propagate
495           out of unsuccessful productions, but do survive successful
496           productions. Hence it is possible to dynamically alter the text
497           being parsed - for example, to provide a "#include"-like facility:
498
499               hash_include: '#include' filename
500                   { $text = ::loadfile($item[2]) . $text }
501
502               filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
503               | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }
504
505       $thisline and $prevline
506           $thisline stores the current line number within the current parse
507           (starting from 1). $prevline stores the line number for the last
508           character which was already successfully parsed (this will be
509           different from $thisline at the end of each line).
510
511           For efficiency, $thisline and $prevline are actually tied hashes,
512           and only recompute the required line number when the variable's
513           value is used.
514
515           Assignment to $thisline adjusts the line number calculator, so that
516           it believes that the current line number is the value being
517           assigned. Note that this adjustment will be reflected in all
           subsequent line number calculations.
519
520           Modifying the value of the variable $text (as in the previous
521           "hash_include" example, for instance) will confuse the line
522           counting mechanism. To prevent this, you should call
523           "Parse::RecDescent::LineCounter::resync($thisline)" immediately
524           after any assignment to the variable $text (or, at least, before
525           the next attempt to use $thisline).
526
527           Note that if a production fails after assigning to or resync'ing
528           $thisline, the parser's line counter mechanism will usually be
529           corrupted.
530
531           Also see the entry for @itempos.
532
533           The line number can be set to values other than 1, by calling the
534           start rule with a second argument. For example:
535
536               $parser = new Parse::RecDescent ($grammar);
537
538               $parser->input($text, 10);  # START LINE NUMBERS AT 10
539
540       $thiscolumn and $prevcolumn
541           $thiscolumn stores the current column number within the current
542           line being parsed (starting from 1). $prevcolumn stores the column
543           number of the last character which was actually successfully
544           parsed. Usually "$prevcolumn == $thiscolumn-1", but not at the end
545           of lines.
546
547           For efficiency, $thiscolumn and $prevcolumn are actually tied
548           hashes, and only recompute the required column number when the
549           variable's value is used.
550
551           Assignment to $thiscolumn or $prevcolumn is a fatal error.
552
553           Modifying the value of the variable $text (as in the previous
554           "hash_include" example, for instance) may confuse the column
555           counting mechanism.
556
557           Note that $thiscolumn reports the column number before any
558           whitespace that might be skipped before reading a token. Hence if
559           you wish to know where a token started (and ended) use something
560           like this:
561
562               rule: token1 token2 startcol token3 endcol token4
563                   { print "token3: columns $item[3] to $item[5]"; }
564
565               startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
566               endcol:  { $prevcolumn }
567
568           Also see the entry for @itempos.
569
570       $thisoffset and $prevoffset
571           $thisoffset stores the offset of the current parsing position
572           within the complete text being parsed (starting from 0).
573           $prevoffset stores the offset of the last character which was
574           actually successfully parsed. In all cases "$prevoffset ==
575           $thisoffset-1".
576
577           For efficiency, $thisoffset and $prevoffset are actually tied
578           hashes, and only recompute the required offset when the variable's
579           value is used.
580
           Assignment to $thisoffset or $prevoffset is a fatal error.
582
583           Modifying the value of the variable $text will not affect the
584           offset counting mechanism.
585
586           Also see the entry for @itempos.
587
588       @itempos
589           The array @itempos stores a hash reference corresponding to each
590           element of @item. The elements of the hash provide the following:
591
592               $itempos[$n]{offset}{from}  # VALUE OF $thisoffset BEFORE $item[$n]
593               $itempos[$n]{offset}{to}    # VALUE OF $prevoffset AFTER $item[$n]
594               $itempos[$n]{line}{from}    # VALUE OF $thisline BEFORE $item[$n]
595               $itempos[$n]{line}{to}  # VALUE OF $prevline AFTER $item[$n]
596               $itempos[$n]{column}{from}  # VALUE OF $thiscolumn BEFORE $item[$n]
597               $itempos[$n]{column}{to}    # VALUE OF $prevcolumn AFTER $item[$n]
598
599           Note that the various "$itempos[$n]...{from}" values record the
600           appropriate value after any token prefix has been skipped.
601
602           Hence, instead of the somewhat tedious and error-prone:
603
604               rule: startcol token1 endcol
605                 startcol token2 endcol
606                 startcol token3 endcol
607                   { print "token1: columns $item[1]
608                         to $item[3]
609                    token2: columns $item[4]
610                         to $item[6]
611                    token3: columns $item[7]
612                         to $item[9]" }
613
614               startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
615               endcol:  { $prevcolumn }
616
617           it is possible to write:
618
619               rule: token1 token2 token3
620                   { print "token1: columns $itempos[1]{column}{from}
621                         to $itempos[1]{column}{to}
622                    token2: columns $itempos[2]{column}{from}
623                         to $itempos[2]{column}{to}
624                    token3: columns $itempos[3]{column}{from}
625                         to $itempos[3]{column}{to}" }
626
627           Note however that (in the current implementation) the use of
628           @itempos anywhere in a grammar implies that item positioning
629           information is collected everywhere during the parse. Depending on
630           the grammar and the size of the text to be parsed, this may be
631           prohibitively expensive and the explicit use of $thisline,
632           $thiscolumn, etc. may be a better choice.
633
634       $thisparser
635           A reference to the "Parse::RecDescent" object through which parsing
636           was initiated.
637
638           The value of $thisparser propagates down the subrules of a parse
639           but not back up. Hence, you can invoke subrules from another parser
640           for the scope of the current rule as follows:
641
642               rule: subrule1 subrule2
643               | { $thisparser = $::otherparser } <reject>
644               | subrule3 subrule4
645               | subrule5
646
           The result is that the first production calls "subrule1" and
           "subrule2" of the current parser, and the remaining productions
           call the named subrules from $::otherparser. Note, however, that
           "Bad Things" will happen if $::otherparser isn't a blessed
           reference and/or doesn't have methods with the same names as the
           required subrules!
652
653       $thisrule
654           A reference to the "Parse::RecDescent::Rule" object corresponding
655           to the rule currently being matched.
656
657       $thisprod
658           A reference to the "Parse::RecDescent::Production" object
659           corresponding to the production currently being matched.
660
661       $score and $score_return
662           $score stores the best production score to date, as specified by an
663           earlier "<score:...>" directive. $score_return stores the
664           corresponding return value for the successful production.
665
666           See "Scored productions".
667
668       Warning: the parser relies on the information in the various "this..."
669       objects in some non-obvious ways. Tinkering with the other members of
670       these objects will probably cause Bad Things to happen, unless you
671       really know what you're doing. The only exception to this advice is
672       that the use of "$this...->{local}" is always safe.
673
674   Start-up Actions
675       Any actions which appear before the first rule definition in a grammar
676       are treated as "start-up" actions. Each such action is stripped of its
677       outermost brackets and then evaluated (in the parser's special
678       namespace) just before the rules of the grammar are first compiled.
679
680       The main use of start-up actions is to declare local variables within
681       the parser's special namespace:
682
683           { my $lastitem = '???'; }
684
685           list: item(s)   { $return = $lastitem }
686
           item: book  { $lastitem = 'book'; }
             | bell  { $lastitem = 'bell'; }
             | candle    { $lastitem = 'candle'; }
690
691       but start-up actions can be used to execute any valid Perl code within
692       a parser's special namespace.
693
694       Start-up actions can appear within a grammar extension or replacement
695       (that is, a partial grammar installed via "Parse::RecDescent::Extend()"
696       or "Parse::RecDescent::Replace()" - see "Incremental Parsing"), and
697       will be executed before the new grammar is installed. Note, however,
698       that a particular start-up action is only ever executed once.
699
700   Autoactions
       It is sometimes desirable to be able to specify a default action to be
       taken at the end of every production (for example, in order to easily
       build a parse tree). If the variable $::RD_AUTOACTION is defined when
       "Parse::RecDescent::new()" is called, the contents of that variable are
       treated as a specification of an action which is to be appended to each
       production in the corresponding grammar.
707
708       Alternatively, you can hard-code the autoaction within a grammar, using
709       the "<autoaction:...>" directive.
710
711       So, for example, to construct a simple parse tree you could write:
712
           $::RD_AUTOACTION = q { [@item] };

           $parser = Parse::RecDescent->new(q{
           expression: and_expr '||' expression | and_expr
           and_expr:   not_expr '&&' and_expr   | not_expr
           not_expr:   '!' brack_expr           | brack_expr
           brack_expr: '(' expression ')'       | identifier
           identifier: /[a-z]+/i
           });
722
723       or:
724
           $parser = Parse::RecDescent->new(q{
           <autoaction: { [@item] } >

           expression: and_expr '||' expression | and_expr
           and_expr:   not_expr '&&' and_expr   | not_expr
           not_expr:   '!' brack_expr           | brack_expr
           brack_expr: '(' expression ')'       | identifier
           identifier: /[a-z]+/i
           });
734
735       Either of these is equivalent to:
736
           $parser = Parse::RecDescent->new(q{
           expression: and_expr '||' expression
               { [@item] }
             | and_expr
               { [@item] }

           and_expr:   not_expr '&&' and_expr
               { [@item] }
             | not_expr
               { [@item] }

           not_expr:   '!' brack_expr
               { [@item] }
             | brack_expr
               { [@item] }

           brack_expr: '(' expression ')'
               { [@item] }
             | identifier
               { [@item] }

           identifier: /[a-z]+/i
               { [@item] }
           });
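
       To see what such a tree looks like in practice, here is a small
       self-contained sketch using the "<autoaction:...>" form. The
       sample input and the flatten() helper are assumptions made for
       illustration; they are not part of Parse::RecDescent:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autoaction: { [@item] } >

    expression: and_expr '||' expression | and_expr
    and_expr:   not_expr '&&' and_expr   | not_expr
    not_expr:   '!' brack_expr           | brack_expr
    brack_expr: '(' expression ')'       | identifier
    identifier: /[a-z]+/i
}) or die "Bad grammar\n";

# Every production returns [ rule_name, matched_items... ].
my $tree = $parser->expression('a && !b')
    or die "Parse failed\n";

# flatten() is a helper for this sketch only: it renders the nested
# array references as a parenthesised string.
sub flatten {
    my ($node) = @_;
    return $node unless ref $node;
    return '(' . join(' ', map { flatten($_) } @$node) . ')';
}

print flatten($tree), "\n";
```

       The first element of every node is the rule name (that is,
       $item[0]), which is why the tree is deeply nested even for a short
       expression.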
761
       Alternatively, we could take an object-oriented approach, using
       different classes for each node (and also eliminating redundant
       intermediate nodes):
765
           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

           $parser = Parse::RecDescent->new(q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
           });
776
777       or:
778
           $parser = Parse::RecDescent->new(q{
               <autoaction:
                 $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item])
               >

               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
           });
790
791       which are equivalent to:
792
           $parser = Parse::RecDescent->new(q{
               expression: and_expr '||' expression
                   { "expression_node"->new(@item[1..3]) }
               | and_expr

               and_expr:   not_expr '&&' and_expr
                   { "and_expr_node"->new(@item[1..3]) }
               | not_expr

               not_expr:   '!' brack_expr
                   { "not_expr_node"->new(@item[1..2]) }
               | brack_expr

               brack_expr: '(' expression ')'
                   { "brack_expr_node"->new(@item[1..3]) }
               | identifier

               identifier: /[a-z]+/i
                   { "identifier_node"->new($item[1]) }
           });
813
814       Note that, if a production already ends in an action, no autoaction is
815       appended to it. For example, in this version:
816
           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

           $parser = Parse::RecDescent->new(q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
                   { 'terminal_node'->new($item[1]) }
           });
828
829       each "identifier" match produces a "terminal_node" object, not an
830       "identifier_node" object.
831
832       A level 1 warning is issued each time an "autoaction" is added to some
833       production.
834
835   Autotrees
836       A commonly needed autoaction is one that builds a parse-tree. It is
837       moderately tricky to set up such an action (which must treat terminals
838       differently from non-terminals), so Parse::RecDescent simplifies the
839       process by providing the "<autotree>" directive.
840
       If this directive appears at the start of a grammar, it causes
       Parse::RecDescent to insert autoactions at the end of any production
       except those which already end in an action. The action inserted
       depends on whether the production is an intermediate rule (two or more
       items), or a terminal of the grammar (i.e. a single pattern or string
       item).
846
847       So, for example, the following grammar:
848
849           <autotree>
850
851           file    : command(s)
852           command : get | set | vet
853           get : 'get' ident ';'
854           set : 'set' ident 'to' value ';'
855           vet : 'check' ident 'is' value ';'
856           ident   : /\w+/
857           value   : /\d+/
858
859       is equivalent to:
860
861           file    : command(s)        { bless \%item, $item[0] }
862           command : get       { bless \%item, $item[0] }
863           | set           { bless \%item, $item[0] }
864           | vet           { bless \%item, $item[0] }
865           get : 'get' ident ';'   { bless \%item, $item[0] }
866           set : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
867           vet : 'check' ident 'is' value ';'  { bless \%item, $item[0] }
868
869           ident   : /\w+/  { bless {__VALUE__=>$item[1]}, $item[0] }
870           value   : /\d+/  { bless {__VALUE__=>$item[1]}, $item[0] }
871
872       Note that each node in the tree is blessed into a class of the same
873       name as the rule itself. This makes it easy to build object-oriented
874       processors for the parse-trees that the grammar produces. Note too that
875       the last two rules produce special objects with the single attribute
876       '__VALUE__'. This is because they consist solely of a single terminal.
877
878       This autoaction-ed grammar would then produce a parse tree in a data
879       structure like this:
880
           {
             file => {
               command => [
                 get => {
                   ident => { __VALUE__ => 'a' },
                 },
                 set => {
                   ident => { __VALUE__ => 'b' },
                   value => { __VALUE__ => '7' },
                 },
                 vet => {
                   ident => { __VALUE__ => 'b' },
                   value => { __VALUE__ => '7' },
                 },
               ],
             }
           }
899
900       (except, of course, that each nested hash would also be blessed into
901       the appropriate class).
902
903       You can also specify a base class for the "<autotree>" directive.  The
904       supplied prefix will be prepended to the rule names when creating tree
905       nodes.  The following are equivalent:
906
907           <autotree:MyBase::Class>
908           <autotree:MyBase::Class::>
909
       Either form will produce a root node blessed into the
       "MyBase::Class::file" package in the example above.
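
       The following sketch shows one way to walk the resulting objects.
       It assumes, as with any %item hash, that a repeated subrule is
       stored under its full name (here 'command(s)') and that each node
       also carries a "__RULE__" entry; the sample input is likewise an
       assumption:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    <autotree>

    file    : command(s)
    command : get | set | vet
    get     : 'get' ident ';'
    set     : 'set' ident 'to' value ';'
    vet     : 'check' ident 'is' value ';'
    ident   : /\w+/
    value   : /\d+/
}) or die "Bad grammar\n";

my $tree = $parser->file('get a; set b to 7;')
    or die "Parse failed\n";

print "root class: ", ref($tree), "\n";

for my $command (@{ $tree->{'command(s)'} }) {
    # Skip bookkeeping keys such as __RULE__; the remaining key names
    # the alternative (get, set or vet) that actually matched.
    my ($kind) = grep { !/^__/ } keys %$command;
    my $node   = $command->{$kind};
    my $value  = exists $node->{value} ? $node->{value}{__VALUE__} : '(none)';
    print "$kind: ident=$node->{ident}{__VALUE__} value=$value\n";
}
```

       Since every node is blessed into a package named after its rule,
       a processor could equally define methods in packages "get", "set"
       and "vet" and dispatch on the objects directly.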
912
913   Autostubbing
914       Normally, if a subrule appears in some production, but no rule of that
915       name is ever defined in the grammar, the production which refers to the
916       non-existent subrule fails immediately. This typically occurs as a
917       result of misspellings, and is a sufficiently common occurrence that a
918       warning is generated for such situations.
919
920       However, when prototyping a grammar it is sometimes useful to be able
921       to use subrules before a proper specification of them is really
922       possible.  For example, a grammar might include a section like:
923
924           function_call: identifier '(' arg(s?) ')'
925
926           identifier: /[a-z]\w*/i
927
928       where the possible format of an argument is sufficiently complex that
929       it is not worth specifying in full until the general function call
930       syntax has been debugged. In this situation it is convenient to leave
931       the real rule "arg" undefined and just slip in a placeholder (or
932       "stub"):
933
934           arg: 'arg'
935
936       so that the function call syntax can be tested with dummy input such
937       as:
938
939           f0()
940           f1(arg)
941           f2(arg arg)
942           f3(arg arg arg)
943
944       et cetera.
945
       Early in prototyping, many such "stubs" may be required, so
       "Parse::RecDescent" provides a means of automating their definition.
       If the variable $::RD_AUTOSTUB is defined when a parser is built, a
       subrule reference to any non-existent rule (say, "subrule") will cause
       a "stub" rule to be automatically defined in the generated parser.  If
       $::RD_AUTOSTUB is false or equal to '1', a stub rule of the form:

           subrule: 'subrule'

       will be generated.  The special case for a value of '1' allows the
       "-RD_AUTOSTUB" switch to be used under "perl -s" without generating
       "subrule: '1'", as would otherwise follow from the rule below. If
       $::RD_AUTOSTUB has any other true value, a stub rule of the form:

           subrule: $::RD_AUTOSTUB

       will be generated.  $::RD_AUTOSTUB must contain a valid production
       item; no checking is performed.  $::RD_AUTOSTUB is not evaluated
       lazily: its value is read at the time the parser is generated.
964
965       Hence, with $::RD_AUTOSTUB defined, it is possible to only partially
966       specify a grammar, and then "fake" matches of the unspecified
967       (sub)rules by just typing in their name, or a literal value that was
968       assigned to $::RD_AUTOSTUB.
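
       A minimal sketch of autostubbing, using the "function_call"
       grammar from above with "arg" deliberately left undefined (the
       test inputs are assumptions):

```perl
use strict;
use warnings;
use Parse::RecDescent;

# With a value of 1, the missing rule "arg" is stubbed as:  arg: 'arg'
# local() keeps the setting from leaking into parsers built elsewhere.
local $::RD_AUTOSTUB = 1;

my $parser = Parse::RecDescent->new(q{
    function_call: identifier '(' arg(s?) ')'

    identifier: /[a-z]\w*/i
}) or die "Bad grammar\n";

for my $input ('f0()', 'f2(arg arg)', 'f1(42)') {
    # A rule method returns undef on failure, so test definedness.
    my $ok = defined $parser->function_call($input);
    printf "%-12s %s\n", $input, $ok ? 'matches' : 'does not match';
}
```

       Only the literal text "arg" satisfies the stub, so the last input
       is expected to be rejected until a real "arg" rule is written.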
969
970   Look-ahead
971       If a subrule, token, or action is prefixed by "...", then it is treated
972       as a "look-ahead" request. That means that the current production can
973       (as usual) only succeed if the specified item is matched, but that the
974       matching does not consume any of the text being parsed. This is very
975       similar to the "/(?=...)/" look-ahead construct in Perl patterns. Thus,
976       the rule:
977
978           inner_word: word ...word
979
       will match whatever the subrule "word" matches, provided that match is
       followed by some more text which subrule "word" would also match
       (although this second substring is not actually consumed by
       "inner_word").

       Likewise, a "...!" prefix causes the following item to succeed
       (without consuming any text) if and only if it would normally fail.
       Hence, a rule such as:
988
989           identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/
990
991       matches a string of characters which satisfies the pattern
992       "/[A-Za-z_]\w*/", but only if the same sequence of characters would not
993       match either subrule "keyword" or the literal token '_'.
994
995       Sequences of look-ahead prefixes accumulate, multiplying their positive
996       and/or negative senses. Hence:
997
998           inner_word: word ...!......!word
999
1000       is exactly equivalent to the original example above (a warning is
1001       issued in cases like these, since they often indicate something left
1002       out, or misunderstood).
1003
1004       Note that actions can also be treated as look-aheads. In such cases,
1005       the state of the parser text (in the local variable $text) after the
1006       look-ahead action is guaranteed to be identical to its state before the
1007       action, regardless of how it's changed within the action (unless you
1008       actually undefine $text, in which case you get the disaster you deserve
1009       :-).
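
       Both kinds of look-ahead can be tried out with a short sketch
       like the one below. The "keyword" rule and the test strings are
       assumptions; the explicit action on "inner_word" is there so that
       the rule returns the consumed word rather than the value of its
       last item:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    inner_word: word ...word      { $item[1] }
    word:       /\w+/

    identifier: ...!keyword /[A-Za-z_]\w*/
    keyword:    /(?:if|else|while)\b/
}) or die "Bad grammar\n";

# Positive look-ahead: "alpha" matches only because another word follows.
my $inner  = $parser->inner_word('alpha beta');
my $alone  = $parser->inner_word('alpha');

# Negative look-ahead: reserved words are refused, look-alikes are not.
my $plain  = $parser->identifier('iffy');
my $banned = $parser->identifier('while');

print "inner_word('alpha beta'): ", $inner  // 'undef', "\n";
print "inner_word('alpha'):      ", $alone  // 'undef', "\n";
print "identifier('iffy'):       ", $plain  // 'undef', "\n";
print "identifier('while'):      ", $banned // 'undef', "\n";
```

       The "\b" in the assumed "keyword" pattern matters: without it,
       "iffy" would be refused as well, because its first two characters
       match "if".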
1010
1011   Directives
       Directives are special pre-defined actions which may be used to alter
       the behaviour of the parser. The directives currently available
       include "<commit>", "<uncommit>", "<reject>", "<score>",
       "<autoscore>", "<skip>", "<resync>", "<error>", "<warn>", "<hint>",
       "<trace_build>", "<trace_parse>", "<nocheck>", "<rulevar>",
       "<matchrule>", "<leftop>", "<rightop>", "<defer>",
       "<perl_quotelike>", "<perl_codeblock>", "<perl_variable>", and
       "<token>".
1020
1021       Committing and uncommitting
1022           The "<commit>" and "<uncommit>" directives permit the recursive
1023           descent of the parse tree to be pruned (or "cut") for efficiency.
1024           Within a rule, a "<commit>" directive instructs the rule to ignore
1025           subsequent productions if the current production fails. For
1026           example:
1027
1028               command: 'find' <commit> filename
1029                  | 'open' <commit> filename
1030                  | 'move' filename filename
1031
1032           Clearly, if the leading token 'find' is matched in the first
1033           production but that production fails for some other reason, then
1034           the remaining productions cannot possibly match. The presence of
1035           the "<commit>" causes the "command" rule to fail immediately if an
1036           invalid "find" command is found, and likewise if an invalid "open"
1037           command is encountered.
1038
1039           It is also possible to revoke a previous commitment. For example:
1040
1041               if_statement: 'if' <commit> condition
1042                   'then' block <uncommit>
1043                   'else' block
1044                   | 'if' <commit> condition
1045                   'then' block
1046
1047           In this case, a failure to find an "else" block in the first
1048           production shouldn't preclude trying the second production, but a
1049           failure to find a "condition" certainly should.
1050
1051           As a special case, any production in which the first item is an
1052           "<uncommit>" immediately revokes a preceding "<commit>" (even
1053           though the production would not otherwise have been tried). For
1054           example, in the rule:
1055
1056               request: 'explain' expression
1057                      | 'explain' <commit> keyword
1058                      | 'save'
1059                      | 'quit'
1060                      | <uncommit> term '?'
1061
           if the text being matched was "explain?", and the first two
           productions failed, then the "<commit>" in production two would
           cause productions three and four to be skipped, but the leading
           "<uncommit>" in production five would allow that production to
           attempt a match.

           Note that, in the preceding example, the "<commit>" was only
           placed in production two. If production one had been:
1070
1071               request: 'explain' <commit> expression
1072
1073           then production two would be (inappropriately) skipped if a leading
1074           "explain..." was encountered.
1075
1076           Both "<commit>" and "<uncommit>" directives always succeed, and
1077           their value is always 1.
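
           The effect of a commitment can be observed by building the
           same rule with and without the directive. In this sketch the
           two parsers, the catch-all second production and the inputs
           are all assumptions chosen to make the difference visible:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Same rule twice: once committing after 'find', once not.
my $committed = Parse::RecDescent->new(q{
    command:  'find' <commit> filename   { "find $item{filename}" }
           |  /\w+/                      { "other $item[1]" }
    filename: /[\w.]+/
}) or die "Bad grammar\n";

my $uncommitted = Parse::RecDescent->new(q{
    command:  'find' filename            { "find $item{filename}" }
           |  /\w+/                      { "other $item[1]" }
    filename: /[\w.]+/
}) or die "Bad grammar\n";

# 'find' with no filename: the first production fails in both parsers,
# but only the uncommitted one goes on to try the catch-all production.
my $with    = $committed->command('find');
my $without = $uncommitted->command('find');

print "committed:   ", $with    // 'undef', "\n";
print "uncommitted: ", $without // 'undef', "\n";
```

           Inputs that never reach the "<commit>" (such as "open") are
           unaffected and fall through to the second production in both
           parsers.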
1078
1079       Rejecting a production
1080           The "<reject>" directive immediately causes the current production
1081           to fail (it is exactly equivalent to, but more obvious than, the
1082           action "{undef}"). A "<reject>" is useful when it is desirable to
1083           get the side effects of the actions in one production, without
1084           prejudicing a match by some other production later in the rule. For
1085           example, to insert tracing code into the parse:
1086
1087               complex_rule: { print "In complex rule...\n"; } <reject>
1088
1089               complex_rule: simple_rule '+' 'i' '*' simple_rule
1090                   | 'i' '*' simple_rule
1091                   | simple_rule
1092
           It is also possible to specify a conditional rejection, using the
           form "<reject:condition>", which only rejects if the specified
           condition is true. This form of rejection is exactly equivalent to
           the action "{(condition)?undef:1}".  For example:
1097
1098               command: save_command
1099                  | restore_command
1100                  | <reject: defined $::tolerant> { exit }
1101                  | <error: Unknown command. Ignored.>
1102
1103           A "<reject>" directive never succeeds (and hence has no associated
1104           value). A conditional rejection may succeed (if its condition is
1105           not satisfied), in which case its value is 1.
1106
1107           As an extra optimization, "Parse::RecDescent" ignores any
1108           production which begins with an unconditional "<reject>" directive,
1109           since any such production can never successfully match or have any
1110           useful side-effects. A level 1 warning is issued in all such cases.
1111
           Note that productions beginning with conditional "<reject:...>"
           directives are never "optimized away" in this manner, even if they
           are always guaranteed to fail (for example: "<reject:1>").
1115
1116           Due to the way grammars are parsed, there is a minor restriction on
1117           the condition of a conditional "<reject:...>": it cannot contain
1118           any raw '<' or '>' characters. For example:
1119
1120               line: cmd <reject: $thiscolumn > max> data
1121
           results in an error when a parser is built from this grammar (since
           the grammar parser has no way of knowing whether the first > is a
           "greater than" or the end of the "<reject:...>").
1125
1126           To overcome this problem, put the condition inside a do{} block:
1127
1128               line: cmd <reject: do{$thiscolumn > max}> data
1129
1130           Note that the same problem may occur in other directives that take
1131           arguments. The same solution will work in all cases.
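
           As a small runnable sketch of a conditional rejection that
           needs the do{} workaround, the hypothetical "octet" rule
           below accepts a run of digits only if its value does not
           exceed 255:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The comparison uses '>', so it is wrapped in do{} to keep the grammar
# parser from mistaking it for the end of the directive.
my $parser = Parse::RecDescent->new(q{
    octet: /\d+/ <reject: do{ $item[1] > 255 }> { $item[1] }
}) or die "Bad grammar\n";

for my $input ('200', '300') {
    my $value = $parser->octet($input);
    print "$input => ", defined $value ? "accepted ($value)" : 'rejected', "\n";
}
```

           When the condition is false the directive succeeds with the
           value 1, so the trailing action is what gives the production
           a useful return value.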
1132
1133       Skipping between terminals
1134           The "<skip>" directive enables the terminal prefix used in a
1135           production to be changed. For example:
1136
1137               OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/
1138
           causes only blanks and tabs to be skipped before terminals in the
           "Arg" subrule (and any of its subrules), and also before the final
           "/;/" terminal.  Once the production is complete, the previous
           terminal prefix is reinstated. Note that this implies that distinct
           productions of a rule must reset their terminal prefixes
           individually.
1145
1146           The "<skip>" directive evaluates to the previous terminal prefix,
1147           so it's easy to reinstate a prefix later in a production:
1148
1149               Command: <skip:","> CSV(s) <skip:$item[1]> Modifier
1150
1151           The value specified after the colon is interpolated into a pattern,
1152           so all of the following are equivalent (though their efficiency
1153           increases down the list):
1154
1155               <skip: "$colon|$comma">   # ASSUMING THE VARS HOLD THE OBVIOUS VALUES
1156
1157               <skip: ':|,'>
1158
1159               <skip: q{[:,]}>
1160
1161               <skip: qr/[:,]/>
1162
1163           There is no way of directly setting the prefix for an entire rule,
1164           except as follows:
1165
1166               Rule: <skip: '[ \t]*'> Prod1
1167                   | <skip: '[ \t]*'> Prod2a Prod2b
1168                   | <skip: '[ \t]*'> Prod3
1169
1170           or, better:
1171
1172               Rule: <skip: '[ \t]*'>
1173               (
1174                   Prod1
1175                 | Prod2a Prod2b
1176                 | Prod3
1177               )
1178
1179           The skip pattern is passed down to subrules, so setting the skip
1180           for the top-level rule as described above actually sets the prefix
1181           for the entire grammar (provided that you only call the method
1182           corresponding to the top-level rule itself). Alternatively, or if
1183           you have more than one top-level rule in your grammar, you can
1184           provide a global "<skip>" directive prior to defining any rules in
1185           the grammar. These are the preferred alternatives to setting
1186           $Parse::RecDescent::skip.
1187
1188           Additionally, using "<skip>" actually allows you to have a
1189           completely dynamic skipping behaviour. For example:
1190
1191              Rule_with_dynamic_skip: <skip: $::skip_pattern> Rule
1192
1193           Then you can set $::skip_pattern before invoking
1194           "Rule_with_dynamic_skip" and have it skip whatever you specified.
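
           The sketch below contrasts the default prefix with a
           production-level "<skip>" that refuses to cross newlines.
           The rule names and input are assumptions:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    first_line: <skip: '[ \t]*'> word(s)   { $item[2] }
    all_words:  word(s)
    word:       /[a-z]+/
}) or die "Bad grammar\n";

my $text = "a b\nc d";

# Under the default prefix the newline is skipped like any other
# whitespace; with the restricted prefix the repetition stops at it.
my $default    = $parser->all_words($text);
my $restricted = $parser->first_line($text);

print "default skip:    @$default\n";
print "restricted skip: @$restricted\n";
```

           The action returns $item[2] because the "<skip>" directive
           itself occupies $item[1], holding the previous prefix.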
1195
1196           Note: Up to release 1.51 of Parse::RecDescent, an entirely
1197           different mechanism was used for specifying terminal prefixes. The
1198           current method is not backwards-compatible with that early
1199           approach. The current approach is stable and will not change again.
1200
           Note: the global "<skip>" directive added in 1.967_004 did not
           interpolate the pattern argument; instead, the pattern was placed
           inside single quotes and then interpolated. This behaviour was
           changed in 1.967_010 so that all "<skip>" directives behave
           similarly.
1206
1207       Resynchronization
1208           The "<resync>" directive provides a visually distinctive means of
1209           consuming some of the text being parsed, usually to skip an
1210           erroneous input. In its simplest form "<resync>" simply consumes
1211           text up to and including the next newline ("\n") character,
1212           succeeding only if the newline is found, in which case it causes
1213           its surrounding rule to return zero on success.
1214
1215           In other words, a "<resync>" is exactly equivalent to the token
1216           "/[^\n]*\n/" followed by the action "{ $return = 0 }" (except that
1217           productions beginning with a "<resync>" are ignored when generating
1218           error messages). A typical use might be:
1219
1220               script : command(s)
1221
1222               command: save_command
1223                  | restore_command
1224                  | <resync> # TRY NEXT LINE, IF POSSIBLE
1225
1226           It is also possible to explicitly specify a resynchronization
1227           pattern, using the "<resync:pattern>" variant. This version
1228           succeeds only if the specified pattern matches (and consumes) the
1229           parsed text. In other words, "<resync:pattern>" is exactly
1230           equivalent to the token "/pattern/" (followed by a
1231           "{ $return = 0 }" action). For example, if commands were terminated
1232           by newlines or semi-colons:
1233
1234               command: save_command
1235                  | restore_command
1236                  | <resync:[^;\n]*[;\n]>
1237
           The value of a successfully matched "<resync>" directive (of either
           type) is the text that it consumed. Note, however, that since the
           directive also sets $return, a production consisting of a lone
           "<resync>" succeeds but returns the value zero (which a calling
           rule may find useful to distinguish between "true" matches and
           "tolerant" matches).  Remember that returning a zero value
           indicates that the rule succeeded (since only an "undef" denotes
           failure within "Parse::RecDescent" parsers).
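
           A caller can use that zero to separate real matches from
           resynchronized ones. In this sketch the "save" command
           syntax and the input are assumptions:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser = Parse::RecDescent->new(q{
    script:  command(s)

    command: 'save' /\w+/   { "save $item[2]" }
           | <resync>
}) or die "Bad grammar\n";

my $results = $parser->script("save a\nbogus line\nsave b\n")
    or die "Parse failed\n";

# Resynchronized lines come back as 0, which is defined but false,
# so a simple truth test filters them out.
my @good    = grep {  $_ } @$results;
my @skipped = grep { !$_ } @$results;

print "commands: @good\n";
print "resynchronizations: ", scalar(@skipped), "\n";
```

           Filtering on truth rather than on position avoids depending
           on exactly how many times "<resync>" fires around blank
           lines.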
1246
1247       Error handling
           The "<error>" directive provides automatic or user-defined
           generation of error messages during a parse. In its simplest form
           "<error>" prepares an error message based on the mismatch between
           the last item expected and the text which caused it to fail. For
           example, given the rule:

               McCoy: curse ',' name ", I'm a doctor, not a" a_profession '!'
                    | pronoun 'dead,' name '!'
                    | <error>
1257
1258           the following strings would produce the following messages:
1259
1260           "Amen, Jim!"
1261                      ERROR (line 1): Invalid McCoy: Expected curse or pronoun
1262                          not found
1263
1264           "Dammit, Jim, I'm a doctor!"
1265                      ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
1266                          but found ", I'm a doctor!" instead
1267
1268           "He's dead,\n"
1269                      ERROR (line 2): Invalid McCoy: Expected name not found
1270
1271           "He's alive!"
1272                      ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
1273                          "alive!" instead
1274
1275           "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
1276                      ERROR (line 1): Invalid McCoy: Expected a profession but found
1277                          "pointy-eared Vulcan!" instead
1278
1279           Note that, when autogenerating error messages, all underscores in
1280           any rule name used in a message are replaced by single spaces (for
1281           example "a_production" becomes "a production"). Judicious choice of
1282           rule names can therefore considerably improve the readability of
1283           automatic error messages (as well as the maintainability of the
1284           original grammar).
1285
1286           If the automatically generated error is not sufficient, it is
1287           possible to provide an explicit message as part of the error
1288           directive. For example:
1289
               Spock: "Fascinating" ',' (name | 'Captain') '.'
                    | "Highly illogical, doctor."
                    | <error: He never said that!>
1293
1294           which would result in all failures to parse a "Spock" subrule
1295           printing the following message:
1296
1297                  ERROR (line <N>): Invalid Spock:  He never said that!
1298
1299           The error message is treated as a "qq{...}" string and interpolated
1300           when the error is generated (not when the directive is specified!).
1301           Hence:
1302
1303               <error: Mystical error near "$text">
1304
1305           would correctly insert the ambient text string which caused the
1306           error.
1307
1308           There are two other forms of error directive: "<error?>" and
1309           "<error?: msg>". These behave just like "<error>" and
1310           "<error: msg>" respectively, except that they are only triggered if
1311           the rule is "committed" at the time they are encountered. For
1312           example:
1313
               Scotty: "Ya kenna change the Laws o' Phusics," <commit> name
                     | name <commit> ',' "she's goanta blaw!"
                     | <error?>

           will only generate an error for a string beginning with "Ya kenna
           change the Laws o' Phusics," or a valid name, but which still fails
           to match the corresponding production. That is,
           "$parser->Scotty("Aye, Cap'ain")" will fail silently (since neither
           production will "commit" the rule on that input), whereas
           "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")"  will fail with
           the error message:

                  ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
                      but found 'ah jest kenna do'ut!' instead.
1328
1329           since in that case the second production would commit after
1330           matching the leading name.
1331
1332           Note that to allow this behaviour, all "<error>" directives which
1333           are the first item in a production automatically uncommit the rule
1334           just long enough to allow their production to be attempted (that
1335           is, when their production fails, the commitment is reinstated so
1336           that subsequent productions are skipped).
1337
1338           In order to permanently uncommit the rule before an error message,
1339           it is necessary to put an explicit "<uncommit>" before the
1340           "<error>". For example:
1341
1342               line: 'Kirk:'  <commit> Kirk
1343               | 'Spock:' <commit> Spock
1344               | 'McCoy:' <commit> McCoy
1345               | <uncommit> <error?> <reject>
1346               | <resync>
1347
1348           Error messages generated by the various "<error...>" directives are
1349           not displayed immediately. Instead, they are "queued" in a buffer
1350           and are only displayed once parsing ultimately fails. Moreover,
1351           "<error...>" directives that cause one production of a rule to fail
1352           are automatically removed from the message queue if another
1353           production subsequently causes the entire rule to succeed.  This
1354           means that you can put "<error...>" directives wherever useful
1355           diagnosis can be done, and only those associated with actual parser
1356           failure will ever be displayed. Also see "GOTCHAS".
1357
1358           As a general rule, the most useful diagnostics are usually
1359           generated either at the very lowest level within the grammar, or at
1360           the very highest. A good rule of thumb is to identify those
1361           subrules which consist mainly (or entirely) of terminals, and then
1362           put an "<error...>" directive at the end of any other rule which
1363           calls one or more of those subrules.
1364
1365           There is one other situation in which the output of the various
1366           types of error directive is suppressed; namely, when the rule
1367           containing them is being parsed as part of a "look-ahead" (see
1368           "Look-ahead"). In this case, the error directive will still cause
1369           the rule to fail, but will do so silently.
1370
1371           An unconditional "<error>" directive always fails (and hence has no
1372           associated value). This means that encountering such a directive
1373           always causes the production containing it to fail. Hence an
1374           "<error>" directive will inevitably be the last (useful) item of a
1375           rule (a level 3 warning is issued if a production contains items
1376           after an unconditional "<error>" directive).
1377
1378           An "<error?>" directive will succeed (that is: fail to fail :-), if
1379           the current rule is uncommitted when the directive is encountered.
1380           In that case the directive's associated value is zero. Hence, this
1381           type of error directive can be used before the end of a production.
1382           For example:
1383
1384               command: 'do' <commit> something
1385                      | 'report' <commit> something
1386                      | <error?: Syntax error> <error: Unknown command>
1387
1388           Warning: The "<error?>" directive does not mean "always fail (but
1389           do so silently unless committed)". It actually means "only fail
1390           (and report) if committed, otherwise succeed". To achieve the "fail
1391           silently if uncommitted" semantics, it is necessary to use:
1392
1393               rule: item <commit> item(s)
1394                   | <error?> <reject>  # FAIL SILENTLY UNLESS COMMITTED
1395
1396           However, because people seem to expect a lone "<error?>" directive
1397           to work like this:
1398
1399               rule: item <commit> item(s)
1400                   | <error?: Error message if committed>
1401                   | <error:  Error message if uncommitted>
1402
1403           Parse::RecDescent automatically appends a "<reject>" directive if
1404           the "<error?>" directive is the only item in a production. A level
1405           2 warning (see below) is issued when this happens.
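
               The difference between the committed and uncommitted cases can
               be checked with a short script. This is only a sketch: the
               grammar is a hypothetical variant of the "command" rule above,
               with /\w+/ standing in for the "something" subrule. Any queued
               error messages are written to STDERR when a parse fails.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar, modelled on the "command" rule above.
my $parser = Parse::RecDescent->new(q{
    command: 'do'     <commit> /\w+/  { $return = "do $item[3]" }
           | 'report' <commit> /\w+/  { $return = "report $item[3]" }
           | <error?: Syntax error> <error: Unknown command>
}) or die "Bad grammar\n";

# Matches the first production.
my $ok = $parser->command('do backup');

# 'do' commits the rule, then /\w+/ fails: "<error?>" reports and fails.
my $syntax = $parser->command('do !!!');

# Nothing commits: "<error?>" succeeds silently, "<error>" reports and fails.
my $unknown = $parser->command('fly away');

for my $result ($ok, $syntax, $unknown) {
    print defined $result ? "matched: $result\n" : "no match\n";
}
```

               Only the first call returns a defined value; the other two
               return "undef" after reporting their respective messages.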
1406
1407           The level of error reporting during both parser construction and
1408           parsing is controlled by the presence or absence of four global
1409           variables: $::RD_ERRORS, $::RD_WARN, $::RD_HINT, and $::RD_TRACE.
1410           If $::RD_ERRORS is defined (and, by default, it is) then fatal
1411           errors are reported.
1412
1413           Whenever $::RD_WARN is defined, certain non-fatal problems are also
1414           reported.
1415
1416           Warnings have an associated "level": 1, 2, or 3. The higher the
1417           level, the more serious the warning. The value of the corresponding
1418           global variable ($::RD_WARN) determines the lowest level of warning
1419           to be displayed. Hence, to see all warnings, set $::RD_WARN to 1.
1420           To see only the most serious warnings set $::RD_WARN to 3.  By
1421           default $::RD_WARN is initialized to 3, ensuring that serious but
1422           non-fatal errors are automatically reported.
1423
1424           There is also a grammar directive to turn on warnings from within
1425           the grammar: "<warn>". It takes an optional argument, which
1426           specifies the warning level: "<warn: 2>".
1427
1428           See "DIAGNOSTICS" for a list of the various error and warning
1429           messages that Parse::RecDescent generates when these two variables
1430           are defined.
1431
1432           Defining any of the remaining variables (which are not defined by
1433           default) further increases the amount of information reported.
1434           Defining $::RD_HINT causes the parser generator to offer more
1435           detailed analyses and hints on both errors and warnings.  Note that
1436           setting $::RD_HINT at any point automagically sets $::RD_WARN to 1.
1437           There is also a "<hint>" directive, which can be hard-coded into a
1438           grammar.
1439
1440           Defining $::RD_TRACE causes the parser generator and the parser to
1441           report their progress to STDERR in excruciating detail (although
1442           without hints unless $::RD_HINT is separately defined). This detail
1443           can be moderated in only one respect: if $::RD_TRACE has an integer
1444           value (N) greater than 1, only N characters of the "current
1445           parsing context" (that is, where in the input string we are at any
1446           point in the parse) are reported at any time.
1447
1448           $::RD_TRACE is mainly useful for debugging a grammar that isn't
1449           behaving as you expected it to. To this end, if $::RD_TRACE is
1450           defined when a parser is built, any actual parser code which is
1451           generated is also written to a file named "RD_TRACE" in the local
1452           directory.
1453
1454           There are two directives associated with the $::RD_TRACE variable.
1455           If a grammar contains a "<trace_build>" directive anywhere in its
1456           specification, $::RD_TRACE is turned on during the parser
1457           construction phase.  If a grammar contains a "<trace_parse>"
1458           directive anywhere in its specification, $::RD_TRACE is turned on
1459           during any parse the parser performs.
1460
1461           Note that the four variables belong to the "main" package, which
1462           makes them easier to refer to in the code controlling the parser,
1463           and also makes it easy to turn them into command line flags
1464           ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.
1465
1466           The corresponding directives are useful to "hardwire" the various
1467           debugging features into a particular grammar (rather than having to
1468           set and reset external variables).
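
               As a minimal sketch of setting one of these variables from the
               controlling code rather than from the command line, the
               following localizes $::RD_WARN while a parser is built. The
               one-rule grammar and the rule name "greeting" are hypothetical
               and serve only as an illustration.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $parser;
{
    # Report warnings of every level, but only while this grammar is built.
    local $::RD_WARN = 1;

    # Hypothetical one-rule grammar, used only for illustration.
    $parser = Parse::RecDescent->new(q{
        greeting: 'hello' /\w+/ { $return = $item[2] }
    }) or die "Bad grammar\n";
}

# Outside the block $::RD_WARN has its previous value again.
my $name = $parser->greeting('hello world');
print "greeted: $name\n";
```

               Running a script as "perl -s script.pl -RD_WARN" has a similar
               effect for the whole run, since -s sets $::RD_WARN to 1.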
1469
1470       Redirecting diagnostics
1471           The diagnostics provided by the tracing mechanism always go to
1472           STDERR.  If you need them to go elsewhere, localize and reopen
1473           STDERR prior to the parse.
1474
1475           For example (with IO::File already loaded via "use IO::File"):
1476
1477               {
1478                   local *STDERR = IO::File->new(">$filename") or die $!;
1479
1480                   my $result = $parser->startrule($text);
1481               }
1482
1483       Consistency checks
1484           Whenever a parser is built, Parse::RecDescent carries out a number
1485           of (potentially expensive) consistency checks. These include:
1486           verifying that the grammar is not left-recursive and that no rules
1487           have been left undefined.
1488
1489           These checks are important safeguards during development, but
1490           unnecessary overheads when the grammar is stable and ready to be
1491           deployed. So Parse::RecDescent provides a directive to disable
1492           them: "<nocheck>".
1493
1494           If a grammar contains a "<nocheck>" directive anywhere in its
1495           specification, the extra compile-time checks are by-passed.
1496
1497       Specifying local variables
1498           It is occasionally convenient to specify variables which are local
1499           to a single rule. This may be achieved by including a
1500           "<rulevar:...>" directive anywhere in the rule. For example:
1501
1502               markup: <rulevar: $tag>
1503
1504               markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]
1505
1506           The example "<rulevar: $tag>" directive causes a "my" variable
1507           named $tag to be declared at the start of the subroutine
1508           implementing the "markup" rule (that is, before the first
1509           production, regardless of where in the rule it is specified).
1510
1511           Specifically, any directive of the form: "<rulevar:text>" causes a
1512           line of the form "my text;" to be added at the beginning of the
1513           rule subroutine, immediately after the definitions of the following
1514           local variables:
1515
1516               $thisparser    $commit
1517               $thisrule      @item
1518               $thisline      @arg
1519               $text          %arg
1520
1521           This means that the following "<rulevar>" directives work as
1522           expected:
1523
1524               <rulevar: $count = 0 >
1525
1526               <rulevar: $firstarg = $arg[0] || '' >
1527
1528               <rulevar: $myItems = \@item >
1529
1530               <rulevar: @context = ( $thisline, $text, @arg ) >
1531
1532               <rulevar: ($name,$age) = @arg{"name","age"} >
1533
1534           If a variable that is also visible to subrules is required, it
1535           needs to be "local"'d, not "my"'d. "rulevar" defaults to "my", but
1536           if "local" is explicitly specified:
1537
1538               <rulevar: local $count = 0 >
1539
1540           then a "local"-ized variable is declared instead, and will be
1541           available within subrules.
1542
1543           Note however that, because all such variables are "my" variables,
1544           their values do not persist between match attempts on a given rule.
1545           To preserve values between match attempts, values can be stored
1546           within the "local" member of the $thisrule object:
1547
1548               countedrule: { $thisrule->{"local"}{"count"}++ }
1549                            <reject>
1550                          | subrule1
1551                          | subrule2
1552                          | <reject: $thisrule->{"local"}{"count"} == 1>
1553                            subrule3
1554
1555           When matching a rule, each "<rulevar>" directive is matched as if
1556           it were an unconditional "<reject>" directive (that is, it causes
1557           any production in which it appears to immediately fail to match).
1558           For this reason (and to improve readability) it is usual to specify
1559           any "<rulevar>" directive in a separate production at the start of
1560           the rule (this has the added advantage that it enables
1561           "Parse::RecDescent" to optimize away such productions, just as it
1562           does for the "<reject>" directive).
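
               A small runnable sketch of this layout follows. The "pair"
               rule and the $key variable are hypothetical names chosen for
               the illustration: the "<rulevar>" directive sits in its own
               production, and the variable it declares is shared by the
               actions of the production that does the matching.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: $key is a "my" variable local to the "pair" rule.
my $parser = Parse::RecDescent->new(q{
    pair: <rulevar: $key>
    pair: /\w+/ { $key = $item[1] } '=' /\w+/
              { $return = "$key => $item[4]" }
}) or die "Bad grammar\n";

my $result = $parser->pair('colour = red');
print defined $result ? "$result\n" : "no match\n";
```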
1563
1564       Dynamically matched rules
1565           Because regexes and double-quoted strings are interpolated, it is
1566           relatively easy to specify productions with "context sensitive"
1567           tokens. For example:
1568
1569               command:  keyword  body  "end $item[1]"
1570
1571           which ensures that a command block is bounded by a "<keyword>...end
1572           <same keyword>" pair.
1573
1574           Building productions in which subrules are context sensitive is
1575           also possible, via the "<matchrule:...>" directive. This directive
1576           behaves identically to a subrule item, except that the rule which
1577           is invoked to match it is determined by the string specified after
1578           the colon. For example, we could rewrite the "command" rule like
1579           this:
1580
1581               command:  keyword  <matchrule:body>  "end $item[1]"
1582
1583           Whatever appears after the colon in the directive is treated as an
1584           interpolated string (that is, as if it appeared in a "qq{...}"
1585           operator) and the value of that interpolated string is the name of
1586           the subrule to be matched.
1587
1588           Of course, just putting a constant string like "body" in a
1589           "<matchrule:...>" directive is of little interest or benefit.  The
1590           power of the directive is seen when we use a string that interpolates
1591           to something interesting. For example:
1592
1593               command:    keyword <matchrule:$item[1]_body> "end $item[1]"
1594
1595               keyword:    'while' | 'if' | 'function'
1596
1597               while_body: condition block
1598
1599               if_body:    condition block ('else' block)(?)
1600
1601               function_body:  arglist block
1602
1603           Now the "command" rule selects how to proceed on the basis of the
1604           keyword that is found. It is as if "command" were declared:
1605
1606               command:    'while'    while_body    "end while"
1607                      |    'if'       if_body       "end if"
1608                      |    'function' function_body "end function"
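
               The keyword-driven dispatch can be tried out with a cut-down
               grammar. In this sketch the "*_body" rules are hypothetical
               stand-ins (a number for "while", a lower-case word for "if")
               rather than the condition/block rules shown above.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical bodies: "while" expects digits, "if" expects a lower-case word.
my $parser = Parse::RecDescent->new(q{
    command:    keyword <matchrule:$item[1]_body> "end $item[1]"
                    { $return = "$item[1]:$item[2]" }

    keyword:    'while' | 'if'

    while_body: /\d+/

    if_body:    /[a-z]+/
}) or die "Bad grammar\n";

my $while = $parser->command('while 42 end while');    # uses while_body
my $if    = $parser->command('if ready end if');       # uses if_body
my $bad   = $parser->command('while ready end while'); # while_body rejects "ready"

for my $result ($while, $if, $bad) {
    print defined $result ? "$result\n" : "no match\n";
}
```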
1609
1610           When a "<matchrule:...>" directive is used as a repeated subrule,
1611           the rule name expression is "late-bound". That is, the name of the
1612           rule to be called is re-evaluated each time a match attempt is
1613           made. Hence, the following grammar:
1614
1615               { $::species = 'dogs' }
1616
1617               pair:   'two' <matchrule:$::species>(s)
1618
1619               dogs:   /dogs/ { $::species = 'cats' }
1620
1621               cats:   /cats/
1622
1623           will match the string "two dogs cats cats" completely, whereas it
1624           will only match the string "two dogs dogs dogs" up to the eighth
1625           letter. If the rule name were "early bound" (that is, evaluated
1626           only the first time the directive is encountered in a production),
1627           the reverse behaviour would be expected.
1628
1629           Note that the "matchrule" directive takes a string that is to be
1630           treated as a rule name, not as a rule invocation. That is, it's
1631           like a Perl symbolic reference, not an "eval". Just as you can say:
1632
1633               $subname = 'foo';
1634
1635               # and later...
1636
1637               &{$subname}(@args);
1638
1639           but not:
1640
1641               $subname = 'foo(@args)';
1642
1643               # and later...
1644
1645               &{$subname};
1646
1647           likewise you can say:
1648
1649               $rulename = 'foo';
1650
1651               # and in the grammar...
1652
1653               <matchrule:$rulename>[@args]
1654
1655           but not:
1656
1657               $rulename = 'foo[@args]';
1658
1659               # and in the grammar...
1660
1661               <matchrule:$rulename>
1662
1663       Deferred actions
1664           The "<defer:...>" directive is used to specify an action to be
1665           performed when (and only if!) the current production ultimately
1666           succeeds.
1667
1668           Whenever a "<defer:...>" directive appears, the code it specifies
1669           is converted to a closure (an anonymous subroutine reference) which
1670           is queued within the active parser object. Note that, because the
1671           deferred code is converted to a closure, the values of any "local"
1672           variable (such as $text, @item, etc.) are preserved until the
1673           deferred code is actually executed.
1674
1675           If the parse ultimately succeeds and the production in which the
1676           "<defer:...>" directive was evaluated formed part of the successful
1677           parse, then the deferred code is executed immediately before the
1678           parse returns. If however the production which queued a deferred
1679           action fails, or one of the higher-level rules which called that
1680           production fails, then the deferred action is removed from the
1681           queue, and hence is never executed.
1682
1683           For example, given the grammar:
1684
1685               sentence: noun trans noun
1686                       | noun intrans
1687
1688               noun:     'the dog'
1689                             { print "$item[1]\t(noun)\n" }
1690                       | 'the meat'
1691                             { print "$item[1]\t(noun)\n" }
1692
1693               trans:    'ate'
1694                             { print "$item[1]\t(transitive)\n" }
1695
1696               intrans:  'ate'
1697                             { print "$item[1]\t(intransitive)\n" }
1698                       | 'barked'
1699                             { print "$item[1]\t(intransitive)\n" }
1700
1701           then parsing the sentence "the dog ate" would produce the output:
1702
1703               the dog  (noun)
1704               ate  (transitive)
1705               the dog  (noun)
1706               ate  (intransitive)
1707
1708           This is because, even though the first production of "sentence"
1709           ultimately fails, its initial subrules "noun" and "trans" do match,
1710           and hence they execute their associated actions.  Then the second
1711           production of "sentence" succeeds, causing the actions of the
1712           subrules "noun" and "intrans" to be executed as well.
1713
1714           On the other hand, if the actions were replaced by "<defer:...>"
1715           directives:
1716
1717               sentence: noun trans noun
1718                       | noun intrans
1719
1720               noun:     'the dog'
1721                             <defer: print "$item[1]\t(noun)\n" >
1722                       | 'the meat'
1723                             <defer: print "$item[1]\t(noun)\n" >
1724
1725               trans:    'ate'
1726                             <defer: print "$item[1]\t(transitive)\n" >
1727
1728               intrans:  'ate'
1729                             <defer: print "$item[1]\t(intransitive)\n" >
1730                       | 'barked'
1731                             <defer: print "$item[1]\t(intransitive)\n" >
1732
1733           the output would be:
1734
1735               the dog  (noun)
1736               ate  (intransitive)
1737
1738           since deferred actions are only executed if they were evaluated in
1739           a production which ultimately contributes to the successful parse.
1740
1741           In this case, even though the first production of "sentence" caused
1742           the subrules "noun" and "trans" to match, that production
1743           ultimately failed and so the deferred actions queued by those
1744           subrules were subsequently discarded. The second production then
1745           succeeded, causing the entire parse to succeed, and so the deferred
1746           actions queued by the (second) match of the "noun" subrule and the
1747           subsequent match of "intrans" are preserved and eventually
1748           executed.
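
               The deferred version of the grammar can be exercised with the
               sketch below. It differs from the grammar above in one
               assumed detail: instead of printing, each deferred action
               appends to a package array (here called @main::log, a name
               chosen for this illustration) so that the result can be
               inspected after the parse.

```perl
use strict;
use warnings;
use Parse::RecDescent;

our @log;   # filled in by the deferred actions (hypothetical name)

my $parser = Parse::RecDescent->new(q{
    sentence: noun trans noun
            | noun intrans

    noun:     'the dog'   <defer: push @main::log, "$item[1] (noun)" >
            | 'the meat'  <defer: push @main::log, "$item[1] (noun)" >

    trans:    'ate'       <defer: push @main::log, "$item[1] (transitive)" >

    intrans:  'ate'       <defer: push @main::log, "$item[1] (intransitive)" >
            | 'barked'    <defer: push @main::log, "$item[1] (intransitive)" >
}) or die "Bad grammar\n";

defined $parser->sentence('the dog ate') or die "Parse failed\n";

# Only the actions queued by the successful production should have run.
print "$_\n" for @log;
```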
1749
1750           Deferred actions provide a means of improving the performance of a
1751           parser, by only executing those actions which are part of the final
1752           parse-tree for the input data.
1753
1754           Alternatively, deferred actions can be viewed as a mechanism for
1755           building (and executing) a customized subroutine corresponding to
1756           the given input data, much in the same way that autoactions (see
1757           "Autoactions") can be used to build a customized data structure for
1758           specific input.
1759
1760           Whether or not the action it specifies is ever executed, a
1761           "<defer:...>" directive always succeeds, returning the number of
1762           deferred actions currently queued at that point.
1763
1764       Parsing Perl
1765           Parse::RecDescent provides limited support for parsing subsets of
1766           Perl, namely: quote-like operators, Perl variables, and complete
1767           code blocks.
1768
1769           The "<perl_quotelike>" directive can be used to parse any Perl
1770           quote-like operator: 'a string', "m/a pattern/", "tr{ans}{lation}",
1771           etc.  It does this by calling Text::Balanced::extract_quotelike().
1772
1773           If a quote-like operator is found, a reference to an array of eight
1774           elements is returned. Those elements are identical to the last
1775           eight elements returned by Text::Balanced::extract_quotelike() in
1776           an array context, namely:
1777
1778           [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr'
1779               -- if the operator was named; otherwise "undef",
1780
1781           [1] the left delimiter of the first block of the operation,
1782
1783           [2] the text of the first block of the operation (that is, the
1784               contents of a quote, the regex of a match, or substitution or
1785               the target list of a translation),
1786
1787           [3] the right delimiter of the first block of the operation,
1788
1789           [4] the left delimiter of the second block of the operation if
1790               there is one (that is, if it is a "s", "tr", or "y"); otherwise
1791               "undef",
1792
1793           [5] the text of the second block of the operation if there is one
1794               (that is, the replacement of a substitution or the translation
1795               list of a translation); otherwise "undef",
1796
1797           [6] the right delimiter of the second block of the operation (if
1798               any); otherwise "undef",
1799
1800           [7] the trailing modifiers on the operation (if any); otherwise
1801               "undef".
1802
1803           If a quote-like expression is not found, the directive fails with
1804           the usual "undef" value.
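
               As an illustrative sketch, the following unpacks the eight
               elements for a substitution written with brace delimiters.
               The rule name "quoted" and the variable names are
               hypothetical.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical one-rule grammar wrapping the directive.
my $parser = Parse::RecDescent->new(q{
    quoted: <perl_quotelike>
}) or die "Bad grammar\n";

my $parts = $parser->quoted('s{foo}{bar}g') or die "Not a quote-like operator\n";

# Elements [0]..[7], in the order listed above.
my ($name, $ldel1, $text1, $rdel1, $ldel2, $text2, $rdel2, $mods) = @$parts;

print "operator:    $name\n";
print "first block: $ldel1$text1$rdel1\n";
print "second:      $ldel2$text2$rdel2\n";
print "modifiers:   $mods\n";
```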
1805
1806           The "<perl_variable>" directive can be used to parse any Perl
1807           variable: $scalar, @array, %hash, $ref->{field}[$index], etc.  It
1808           does this by calling Text::Balanced::extract_variable().
1809
1810           If the directive matches text representing a valid Perl variable
1811           specification, it returns that text. Otherwise it fails with the
1812           usual "undef" value.
1813
1814           The "<perl_codeblock>" directive can be used to parse a curly-brace-
1815           delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }.
1816           It does this by calling Text::Balanced::extract_codeblock().
1817
1818           If the directive matches text representing a valid Perl code block,
1819           it returns that text. Otherwise it fails with the usual "undef"
1820           value.
1821
1822           You can also tell it what kind of brackets to use as the outermost
1823           delimiters. For example:
1824
1825               arglist: <perl_codeblock ()>
1826
1827           causes an arglist to match a perl code block whose outermost
1828           delimiters are "(...)" (rather than the default "{...}").
1829
1830       Constructing tokens
1831           Eventually, Parse::RecDescent will be able to parse tokenized
1832           input, as well as ordinary strings. In preparation for this joyous
1833           day, the "<token:...>" directive has been provided.  This directive
1834           creates a token which will be suitable for input to a
1835           Parse::RecDescent parser (when it eventually supports tokenized
1836           input).
1837
1838           The text of the token is the value of the immediately preceding
1839           item in the production. A "<token:...>" directive always succeeds
1840           with a return value which is the hash reference that is the new
1841           token. It also sets the return value for the production to that
1842           hash ref.
1843
1844           The "<token:...>" directive makes it easy to build a
1845           Parse::RecDescent-compatible lexer in Parse::RecDescent:
1846
1847               my $lexer = new Parse::RecDescent q
1848               {
1849               lex:    token(s)
1850
1851               token:  /a\b/          <token:INDEF>
1852                    |  /the\b/        <token:DEF>
1853                    |  /fly\b/        <token:NOUN,VERB>
1854                    |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
1855                    |  <error: Unknown token>
1856
1857               };
1858
1859           which will eventually be able to be used with a regular
1860           Parse::RecDescent grammar:
1861
1862               my $parser = new Parse::RecDescent q
1863               {
1864               startrule: subrule1 subrule2
1865
1866               # ETC...
1867               };
1868
1869           either with a pre-lexing phase:
1870
1871               $parser->startrule( $lexer->lex($data) );
1872
1873           or with a lex-on-demand approach:
1874
1875               $parser->startrule( sub{$lexer->token(\$data)} );
1876
1877           But at present, only the "<token:...>" directive is actually
1878           implemented. The rest is vapourware.
1879
1880       Specifying operations
1881           One of the commonest requirements when building a parser is to
1882           specify binary operators. Unfortunately, in a normal grammar, the
1883           rules for such things are awkward:
1884
1885               disjunction:    conjunction ('or' conjunction)(s?)
1886                   { $return = [ $item[1], @{$item[2]} ] }
1887
1888               conjunction:    atom ('and' atom)(s?)
1889                   { $return = [ $item[1], @{$item[2]} ] }
1890
1891           or inefficient:
1892
1893               disjunction:    conjunction 'or' disjunction
1894                   { $return = [ $item[1], @{$item[3]} ] }
1895                          |    conjunction
1896                   { $return = [ $item[1] ] }
1897
1898               conjunction:    atom 'and' conjunction
1899                   { $return = [ $item[1], @{$item[3]} ] }
1900                          |    atom
1901                   { $return = [ $item[1] ] }
1902
1903           and either way is ugly and hard to get right.
1904
1905           The "<leftop:...>" and "<rightop:...>" directives provide an easier
1906           way of specifying such operations. Using "<leftop:...>" the above
1907           examples become:
1908
1909               disjunction:    <leftop: conjunction 'or' conjunction>
1910               conjunction:    <leftop: atom 'and' atom>
1911
1912           The "<leftop:...>" directive specifies a left-associative binary
1913           operator.  It is specified around three other grammar elements
1914           (typically subrules or terminals), which match the left operand,
1915           the operator itself, and the right operand respectively.
1916
1917           A "<leftop:...>" directive such as:
1918
1919               disjunction:    <leftop: conjunction 'or' conjunction>
1920
1921           is converted to the following:
1922
1923               disjunction:    ( conjunction ('or' conjunction)(s?)
1924                   { $return = [ $item[1], @{$item[2]} ] } )
1925
1926           In other words, a "<leftop:...>" directive matches the left operand
1927           followed by zero or more repetitions of both the operator and the
1928           right operand. It then flattens the matched items into an anonymous
1929           array which becomes the (single) value of the entire "<leftop:...>"
1930           directive.
1931
1932           For example, a "<leftop:...>" directive such as:
1933
1934               output:  <leftop: ident '<<' expr >
1935
1936           when given a string such as:
1937
1938               cout << var << "str" << 3
1939
1940           would match, and $item[1] would be set to:
1941
1942               [ 'cout', 'var', '"str"', '3' ]
1943
1944           In other words:
1945
1946               output:  <leftop: ident '<<' expr >
1947
1948           is equivalent to a left-associative operator:
1949
1950               output:  ident                                  { $return = [$item[1]]       }
1951                     |  ident '<<' expr                        { $return = [@item[1,3]]     }
1952                     |  ident '<<' expr '<<' expr              { $return = [@item[1,3,5]]   }
1953                     |  ident '<<' expr '<<' expr '<<' expr    { $return = [@item[1,3,5,7]] }
1954                     #  ...etc...
1955
1956           Similarly, the "<rightop:...>" directive takes a left operand, an
1957           operator, and a right operand:
1958
1959               assign:  <rightop: var '=' expr >
1960
1961           and converts them to:
1962
1963               assign:  ( (var '=' {$return=$item[1]})(s?) expr
1964                   { $return = [ @{$item[1]}, $item[2] ] } )
1965
1966           which is equivalent to a right-associative operator:
1967
1968               assign:  expr                           { $return = [$item[1]]       }
1969                     |  var '=' expr                   { $return = [@item[1,3]]     }
1970                     |  var '=' var '=' expr           { $return = [@item[1,3,5]]   }
1971                     |  var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
1972                     #  ...etc...
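
               Both directives can be tried with the sketch below. The
               terminal rules ("ident", "expr", "var", "number") are
               hypothetical definitions added so that the example is
               self-contained, and "number" is used as the right operand of
               the assignment to keep the grammar unambiguous.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical terminals supplied for the two operator rules.
my $parser = Parse::RecDescent->new(q{
    output: <leftop: ident '<<' expr>
    assign: <rightop: var '=' number>

    ident:  /[a-z_]\w*/i
    expr:   /"[^"]*"/ | /\d+/ | ident
    var:    /[a-z_]\w*/i
    number: /\d+/
}) or die "Bad grammar\n";

my $operands = $parser->output('cout << var << "str" << 3')
    or die "output did not match\n";
my $targets  = $parser->assign('a = b = 3')
    or die "assign did not match\n";

# Each rule returns a flat array of operands; the operators are dropped.
print join(' | ', @$operands), "\n";
print join(' | ', @$targets),  "\n";
```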
1973
1974           Note that for both the "<leftop:...>" and "<rightop:...>"
1975           directives, the directive does not normally return the operator
1976           itself, just a list of the operands involved. This is particularly
1977           handy for specifying lists:
1978
1979               list: '(' <leftop: list_item ',' list_item> ')'
1980                   { $return = $item[2] }
1981
1982           There is, however, a problem: sometimes the operator is itself
1983           significant.  For example, in a Perl list a comma and a "=>" are
1984           both valid separators, but the "=>" has additional stringification
1985           semantics.  Hence it's important to know which was used in each
1986           case.
1987
1988           To solve this problem the "<leftop:...>" and "<rightop:...>"
1989           directives do return the operator(s) as well, under two
1990           circumstances.  The first case is where the operator is specified
1991           as a subrule. In that instance, whatever the operator matches is
1992           returned (on the assumption that if the operator is important
1993           enough to have its own subrule, then it's important enough to
1994           return).
1995
1996           The second case is where the operator is specified as a regular
1997           expression. In that case, if the first bracketed subpattern of the
1998           regular expression matches, that matching value is returned (this
1999           is analogous to the behaviour of the Perl "split" function, except
2000           that only the first subpattern is returned).
2001
2002           In other words, given the input:
2003
2004               ( a=>1, b=>2 )
2005
2006           the specifications:
2007
2008               list:      '('  <leftop: list_item separator list_item>  ')'
2009
2010               separator: ',' | '=>'
2011
2012           or:
2013
2014               list:      '('  <leftop: list_item /(,|=>)/ list_item>  ')'
2015
2016           cause the list separators to be interleaved with the operands in
2017           the anonymous array in $item[2]:
2018
2019               [ 'a', '=>', '1', ',', 'b', '=>', '2' ]
2020
2021           But the following version:
2022
2023               list:      '('  <leftop: list_item /,|=>/ list_item>  ')'
2024
2025           returns only the operands:
2026
2027               [ 'a', '1', 'b', '2' ]
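
               The effect of the capturing parentheses can be compared
               directly. In this sketch "list_item" is given a hypothetical
               definition (/\w+/), and an explicit action returns the value
               of the "<leftop:...>" directive from each rule.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Separator pattern with a bracketed subpattern: separators are returned too.
my $with = Parse::RecDescent->new(q{
    list:      '(' <leftop: list_item /(,|=>)/ list_item> ')' { $return = $item[2] }
    list_item: /\w+/
}) or die "Bad grammar\n";

# Same grammar without the brackets: only the operands are returned.
my $without = Parse::RecDescent->new(q{
    list:      '(' <leftop: list_item /,|=>/ list_item> ')' { $return = $item[2] }
    list_item: /\w+/
}) or die "Bad grammar\n";

my $input = '( a=>1, b=>2 )';
my $interleaved = $with->list($input)    or die "No match\n";
my $operands    = $without->list($input) or die "No match\n";

print join(' ', @$interleaved), "\n";
print join(' ', @$operands),    "\n";
```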
2028
2029           Of course, none of the above specifications handle the case of an
2030           empty list, since the "<leftop:...>" and "<rightop:...>" directives
2031           require at least a single right or left operand to match. To
2032           specify that the operator can match "trivially", it's necessary to
2033           add a "(s?)" qualifier to the directive:
2034
2035               list:      '('  <leftop: list_item /(,|=>)/ list_item>(s?)  ')'
2036
2037           Note that in almost all the above examples, the first and third
2038           arguments of the "<leftop:...>" directive were the same subrule.
2039           That is because "<leftop:...>" directives are frequently used to
2040           specify "separated" lists of the same type of item. To make such
2041           lists easier to specify, the following syntax:
2042
2043               list:   element(s /,/)
2044
2045           is exactly equivalent to:
2046
2047               list:   <leftop: element /,/ element>
2048
2049           Note that the separator must be specified as a raw pattern (i.e.
2050           not a string or subrule).
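
               A minimal sketch of the shorthand, assuming a hypothetical
               "element" rule that matches a run of digits:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: a comma-separated list of numbers.
my $parser = Parse::RecDescent->new(q{
    list:    element(s /,/)
    element: /\d+/
}) or die "Bad grammar\n";

my $elements = $parser->list('1, 2, 3') or die "No match\n";

# The separators are not returned, only the matched elements.
print join(' ', @$elements), "\n";
```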
2051
2052       Scored productions
2053           By default, Parse::RecDescent grammar rules always accept the first
2054           production that matches the input. But if two or more productions
2055           may potentially match the same input, choosing the first that does
2056           so may not be optimal.
2057
2058           For example, if you were parsing the sentence "time flies like an
2059           arrow", you might use a rule like this:
2060
               sentence: verb noun preposition article noun { [@item] }
                       | adjective noun verb article noun   { [@item] }
                       | noun verb preposition article noun { [@item] }
2064
2065           Each of these productions matches the sentence, but the third one
2066           is the most likely interpretation. However, if the sentence had
2067           been "fruit flies like a banana", then the second production is
2068           probably the right match.
2069
           To cater for such situations, the "<score:...>" directive can be
           used. It is equivalent to an unconditional "<reject>", except that
2072           it allows you to specify a "score" for the current production. If
2073           that score is numerically greater than the best score of any
2074           preceding production, the current production is cached for later
2075           consideration. If no later production matches, then the cached
2076           production is treated as having matched, and the value of the item
2077           immediately before its "<score:...>" directive is returned as the
2078           result.
2079
2080           In other words, by putting a "<score:...>" directive at the end of
2081           each production, you can select which production matches using
2082           criteria other than specification order. For example:
2083
               sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
                       | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
                       | noun verb preposition article noun { [@item] } <score: sensible(@item)>
2087
2088           Now, when each production reaches its respective "<score:...>"
2089           directive, the subroutine "sensible" will be called to evaluate the
2090           matched items (somehow). Once all productions have been tried, the
2091           one which "sensible" scored most highly will be the one that is
2092           accepted as a match for the rule.
2093
2094           The variable $score always holds the current best score of any
2095           production, and the variable $score_return holds the corresponding
2096           return value.
2097
           As another example, the following grammar matches lines whose
           fields may be separated by commas, colons, or spaces. This can be
           tricky if a colon-separated line also contains commas, or vice
           versa. The grammar resolves the ambiguity by selecting the
           production that results in the fewest fields:
2103
               line: seplist[sep=>',']  <score: -@{$item[1]}>
                   | seplist[sep=>':']  <score: -@{$item[1]}>
                   | seplist[sep=>" "]  <score: -@{$item[1]}>
2107
2108               seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>
2109
2110           Note the use of negation within the "<score:...>" directive to
2111           ensure that the seplist with the most items gets the lowest score.
2112
2113           As the above examples indicate, it is often the case that all
2114           productions in a rule use exactly the same "<score:...>" directive.
2115           It is tedious to have to repeat this identical directive in every
2116           production, so Parse::RecDescent also provides the
2117           "<autoscore:...>" directive.
2118
2119           If an "<autoscore:...>" directive appears in any production of a
2120           rule, the code it specifies is used as the scoring code for every
2121           production of that rule, except productions that already end with
2122           an explicit "<score:...>" directive. Thus the rules above could be
2123           rewritten:
2124
               line: <autoscore: -@{$item[1]}>
               line: seplist[sep=>',']
                   | seplist[sep=>':']
                   | seplist[sep=>" "]
2129
2130
               sentence: <autoscore: sensible(@item)>
                       | verb noun preposition article noun { [@item] }
                       | adjective noun verb article noun   { [@item] }
                       | noun verb preposition article noun { [@item] }
2135
2136           Note that the "<autoscore:...>" directive itself acts as an
2137           unconditional "<reject>", and (like the "<rulevar:...>" directive)
2138           is pruned at compile-time wherever possible.
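
           To see the difference between first-match and scored selection,
           consider the following minimal sketch. The rule names
           "first_match" and "best_match" are hypothetical, and the score
           used here (the length of the matched token) is only one possible
           criterion:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# first_match has no scores, so its first matching production is accepted.
# best_match scores every production by the length of the token it matched,
# so the production that matched the longest token should be accepted.
my $grammar = q{
    first_match: /[a-z]+/
               | /[a-z0-9]+/

    best_match:  /[a-z]+/     <score: length $item[1]>
               | /[a-z0-9]+/  <score: length $item[1]>
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $first = $parser->first_match('abc123');
my $best  = $parser->best_match('abc123');

print "first match: $first\n";
print "best match:  $best\n";
```

           Both productions match the input, but they consume different
           amounts of it; the score decides which result is returned.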
2139
2140       Dispensing with grammar checks
2141           During the compilation phase of parser construction,
2142           Parse::RecDescent performs a small number of checks on the grammar
2143           it's given. Specifically it checks that the grammar is not left-
2144           recursive, that there are no "insatiable" constructs of the form:
2145
2146               rule: subrule(s) subrule
2147
2148           and that there are no rules missing (i.e. referred to, but never
2149           defined).
2150
2151           These checks are important during development, but can slow down
2152           parser construction in stable code. So Parse::RecDescent provides
2153           the <nocheck> directive to turn them off. The directive can only
2154           appear before the first rule definition, and switches off checking
2155           throughout the rest of the current grammar.
2156
2157           Typically, this directive would be added when a parser has been
2158           thoroughly tested and is ready for release.
2159
2160   Subrule argument lists
2161       It is occasionally useful to pass data to a subrule which is being
2162       invoked. For example, consider the following grammar fragment:
2163
2164           classdecl: keyword decl
2165
2166           keyword:   'struct' | 'class';
2167
2168           decl:      # WHATEVER
2169
2170       The "decl" rule might wish to know which of the two keywords was used
2171       (since it may affect some aspect of the way the subsequent declaration
2172       is interpreted). "Parse::RecDescent" allows the grammar designer to
2173       pass data into a rule, by placing that data in an argument list (that
2174       is, in square brackets) immediately after any subrule item in a
2175       production. Hence, we could pass the keyword to "decl" as follows:
2176
2177           classdecl: keyword decl[ $item[1] ]
2178
2179           keyword:   'struct' | 'class';
2180
2181           decl:      # WHATEVER
2182
2183       The argument list can consist of any number (including zero!) of comma-
2184       separated Perl expressions. In other words, it looks exactly like a
2185       Perl anonymous array reference. For example, we could pass the keyword,
2186       the name of the surrounding rule, and the literal 'keyword' to "decl"
2187       like so:
2188
2189           classdecl: keyword decl[$item[1],$item[0],'keyword']
2190
2191           keyword:   'struct' | 'class';
2192
2193           decl:      # WHATEVER
2194
2195       Within the rule to which the data is passed ("decl" in the above
2196       examples) that data is available as the elements of a local variable
2197       @arg. Hence "decl" might report its intentions as follows:
2198
2199           classdecl: keyword decl[$item[1],$item[0],'keyword']
2200
2201           keyword:   'struct' | 'class';
2202
           decl:      { print "Declaring $arg[0] (a $arg[2])\n";
                        print "(this rule called by $arg[1])" }
2205
2206       Subrule argument lists can also be interpreted as hashes, simply by
2207       using the local variable %arg instead of @arg. Hence we could rewrite
2208       the previous example:
2209
           classdecl: keyword decl[keyword => $item[1],
                                   caller  => $item[0],
                                   type    => 'keyword']
2213
2214           keyword:   'struct' | 'class';
2215
           decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
                        print "(this rule called by $arg{caller})" }
2218
2219       Both @arg and %arg are always available, so the grammar designer may
2220       choose whichever convention (or combination of conventions) suits best.
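
       A runnable variation on the examples above might look like the
       following sketch. The "kind" key and the class name "Widget" are
       invented for illustration, and "decl" is given a trivial body so that
       there is something to match:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The keyword matched by classdecl is handed to decl as a named argument,
# and decl reads it back through %arg.
my $grammar = q{
    classdecl: keyword decl[ kind => $item[1] ]   { $return = $item[2] }
    keyword:   'struct' | 'class'
    decl:      /\w+/   { $return = "$arg{kind} $item[1]" }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $result = $parser->classdecl('class Widget');
print "$result\n";
```

       Inside the argument list, $item[1] refers to the first item of
       "classdecl" (the keyword), whereas inside the action of "decl" it
       refers to the first item of "decl" itself.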
2221
2222       Subrule argument lists are also useful for creating "rule templates"
2223       (especially when used in conjunction with the "<matchrule:...>"
2224       directive). For example, the subrule:
2225
           list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
                         { $return = [ $item[1], @{$item[3]} ] }
               |     <matchrule:$arg{rule}>
                         { $return = [ $item[1] ] }
2230
2231       is a handy template for the common problem of matching a separated
2232       list.  For example:
2233
2234           function: 'func' name '(' list[rule=>'param',sep=>';'] ')'
2235
2236           param:    list[rule=>'name',sep=>','] ':' typename
2237
2238           name:     /\w+/
2239
2240           typename: name
2241
2242       When a subrule argument list is used with a repeated subrule, the
2243       argument list goes before the repetition specifier:
2244
2245           list:   /some|many/ thing[ $item[1] ](s)
2246
2247       The argument list is "late bound". That is, it is re-evaluated for
2248       every repetition of the repeated subrule.  This means that each
2249       repeated attempt to match the subrule may be passed a completely
2250       different set of arguments if the value of the expression in the
2251       argument list changes between attempts. So, for example, the grammar:
2252
2253           { $::species = 'dogs' }
2254
2255           pair:   'two' animal[$::species](s)
2256
2257           animal: /$arg[0]/ { $::species = 'cats' }
2258
2259       will match the string "two dogs cats cats" completely, whereas it will
       only match the string "two dogs dogs dogs" up to the eighth character. If
2261       the value of the argument list were "early bound" (that is, evaluated
2262       only the first time a repeated subrule match is attempted), one would
2263       expect the matching behaviours to be reversed.
2264
2265       Of course, it is possible to effectively "early bind" such argument
2266       lists by passing them a value which does not change on each repetition.
2267       For example:
2268
2269           { $::species = 'dogs' }
2270
2271           pair:   'two' { $::species } animal[$item[2]](s)
2272
2273           animal: /$arg[0]/ { $::species = 'cats' }
2274
2275       Arguments can also be passed to the start rule, simply by appending
2276       them to the argument list with which the start rule is called (after
2277       the "line number" parameter). For example, given:
2278
2279           $parser = new Parse::RecDescent ( $grammar );
2280
2281           $parser->data($text, 1, "str", 2, \@arr);
2282
           #             ^^^^^  ^  ^^^^^^^^^^^^^^^
           #               |    |         |
           # TEXT TO BE PARSED  |         |
           # STARTING LINE NUMBER         |
           # ELEMENTS OF @arg WHICH IS PASSED TO RULE data
2288
2289       then within the productions of the rule "data", the array @arg will
2290       contain "("str", 2, \@arr)".
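
       As a minimal sketch of the call shown above, the following script
       gives the rule "data" a hypothetical body that simply returns the
       matched token together with whatever arrived in @arg:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Illustrative rule: return the matched word followed by the extra arguments.
my $grammar = q{
    data: /\w+/   { $return = [ $item[1], @arg ] }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my @list = (10, 20);

# Text to parse, starting line number, then the elements of @arg.
my $result = $parser->data('payload', 1, 'str', 2, \@list);

print "token:     $result->[0]\n";
print "arguments: ", scalar(@{$result}) - 1, "\n";
```

       The starting line number is not part of @arg; only the values after
       it are passed through to the rule.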
2291
2292   Alternations
2293       Alternations are implicit (unnamed) rules defined as part of a
2294       production. An alternation is defined as a series of '|'-separated
2295       productions inside a pair of round brackets. For example:
2296
2297           character: 'the' ( good | bad | ugly ) /dude/
2298
2299       Every alternation implicitly defines a new subrule, whose
2300       automatically-generated name indicates its origin:
       "_alternation_<I>_of_production_<P>_of_rule_<R>" for the appropriate
2302       values of <I>, <P>, and <R>. A call to this implicit subrule is then
2303       inserted in place of the brackets. Hence the above example is merely a
2304       convenient short-hand for:
2305
2306           character: 'the'
2307              _alternation_1_of_production_1_of_rule_character
2308              /dude/
2309
2310           _alternation_1_of_production_1_of_rule_character:
2311              good | bad | ugly
2312
2313       Since alternations are parsed by recursively calling the parser
2314       generator, any type(s) of item can appear in an alternation. For
2315       example:
2316
           character: 'the' ( 'high' "plains"     # Silent, with poncho
                            | /no[- ]name/        # Silent, no poncho
                            | vengeance_seeking   # Poncho-optional
                            | <error>
                            ) drifter
2322
2323       In this case, if an error occurred, the automatically generated message
2324       would be:
2325
2326           ERROR (line <N>): Invalid implicit subrule: Expected
2327                 'high' or /no[- ]name/ or generic,
2328                 but found "pacifist" instead
2329
2330       Since every alternation actually has a name, it's even possible to
2331       extend or replace them:
2332
           $parser->Replace(
               "_alternation_1_of_production_1_of_rule_character:
                   'generic Eastwood'"
           );
2337
2338       More importantly, since alternations are a form of subrule, they can be
2339       given repetition specifiers:
2340
2341           character: 'the' ( good | bad | ugly )(?) /dude/
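
       A small script can show what an optional alternation returns. This is
       a sketch only: the alternatives are written as literal strings rather
       than subrules so that the grammar is self-contained:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The alternation is an implicit subrule, so a (?) repetition turns its
# value into an array reference with zero or one element.
my $grammar = q{
    character: 'the' ( 'good' | 'bad' | 'ugly' )(?) /dude/
                   { $return = $item[2] }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $with    = $parser->character('the bad dude');
my $without = $parser->character('the dude');

print "with adjective:    ", scalar(@{$with}),    " element(s)\n";
print "without adjective: ", scalar(@{$without}), " element(s)\n";
```

       Note that the second call still succeeds: an empty array reference is
       a defined (and true) value, so it is not confused with a failed
       parse.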
2342
2343   Incremental Parsing
2344       "Parse::RecDescent" provides two methods - "Extend" and "Replace" -
2345       which can be used to alter the grammar matched by a parser. Both
2346       methods take the same argument as "Parse::RecDescent::new", namely a
       grammar specification string.
2348
2349       "Parse::RecDescent::Extend" interprets the grammar specification and
2350       adds any productions it finds to the end of the rules for which they
2351       are specified. For example:
2352
2353           $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
           $parser->Extend($add);
2355
2356       adds two productions to the rule "name" (creating it if necessary) and
2357       one production to the rule "desc".
2358
2359       "Parse::RecDescent::Replace" is identical, except that it first resets
       any rule specified in the additional grammar, removing any existing
2361       productions.  Hence after:
2362
2363           $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
           $parser->Replace($add);
2365
       there are only two valid "name"s and the one possible description.
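
       The difference between the two methods can be observed with a sketch
       like the following. The names used in the grammar and the helper
       "matches" are hypothetical:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Start with a rule that has a single production.
my $parser = Parse::RecDescent->new(q{ name: 'Alice' })
    or die "Bad grammar\n";

# Hypothetical helper: 1 if the text parses as a name, 0 otherwise.
sub matches {
    my ($text) = @_;
    return defined( $parser->name($text) ) ? 1 : 0;
}

my @initial = map { matches($_) } qw(Alice Bob);

# Extend adds a production to the existing rule.
$parser->Extend(q{ name: 'Bob' });
my @extended = map { matches($_) } qw(Alice Bob);

# Replace discards the existing productions first.
$parser->Replace(q{ name: 'Carol' });
my @replaced = map { matches($_) } qw(Alice Bob Carol);

print "initial:  @initial\n";
print "extended: @extended\n";
print "replaced: @replaced\n";
```

       After "Extend" both names should be accepted; after "Replace" only
       the newly supplied production remains.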
2367
2368       A more interesting use of the "Extend" and "Replace" methods is to call
2369       them inside the action of an executing parser. For example:
2370
           typedef: 'typedef' type_name identifier ';'
                          { $thisparser->Extend("type_name: '$item[3]'") }
                  | <error>

           identifier: ...!type_name /[A-Za-z_]\w*/
2376
2377       which automatically prevents type names from being typedef'd, or:
2378
2379           command: 'map' key_name 'to' abort_key
2380                  { $thisparser->Replace("abort_key: '$item[2]'") }
2381              | 'map' key_name 'to' key_name
2382                  { map_key($item[2],$item[4]) }
2383              | abort_key
2384                  { exit if confirm("abort?") }
2385
2386           abort_key: 'q'
2387
2388           key_name: ...!abort_key /[A-Za-z]/
2389
2390       which allows the user to change the abort key binding, but not to
2391       unbind it.
2392
       The careful use of such constructs makes it possible to reconfigure a
2394       running parser, eliminating the need for semantic feedback by providing
2395       syntactic feedback instead. However, as currently implemented,
2396       "Replace()" and "Extend()" have to regenerate and re-"eval" the entire
2397       parser whenever they are called. This makes them quite slow for large
2398       grammars.
2399
2400       In such cases, the judicious use of an interpolated regex is likely to
2401       be far more efficient:
2402
           typedef: 'typedef' type_name identifier ';'
                          { $thisparser->{local}{type_name} .= "|$item[3]" }
                  | <error>

           identifier: ...!type_name /[A-Za-z_]\w*/
2408
2409           type_name: /$thisparser->{local}{type_name}/
2410
2411   Precompiling parsers
2412       Normally Parse::RecDescent builds a parser from a grammar at run-time.
2413       That approach simplifies the design and implementation of parsing code,
2414       but has the disadvantage that it slows the parsing process down - you
2415       have to wait for Parse::RecDescent to build the parser every time the
2416       program runs. Long or complex grammars can be particularly slow to
2417       build, leading to unacceptable delays at start-up.
2418
2419       To overcome this, the module provides a way of "pre-building" a parser
2420       object and saving it in a separate module. That module can then be used
2421       to create clones of the original parser.
2422
2423       A grammar may be precompiled using the "Precompile" class method.  For
2424       example, to precompile a grammar stored in the scalar $grammar, and
2425       produce a class named PreGrammar in a module file named PreGrammar.pm,
2426       you could use:
2427
2428           use Parse::RecDescent;
2429
2430           Parse::RecDescent->Precompile([$options_hashref], $grammar, "PreGrammar", ["RuntimeClass"]);
2431
2432       The first required argument is the grammar string, the second is the
2433       name of the class to be built. The name of the module file is generated
2434       automatically by appending ".pm" to the last element of the class name.
2435       Thus
2436
2437           Parse::RecDescent->Precompile($grammar, "My::New::Parser");
2438
2439       would produce a module file named Parser.pm.
2440
2441       After the class name, you may specify the name of the runtime_class
2442       called by the Precompiled parser.  See "Precompiled runtimes" for more
2443       details.
2444
2445       An optional hash reference may be supplied as the first argument to
2446       "Precompile".  This argument is currently EXPERIMENTAL, and may change
2447       in a future release of Parse::RecDescent.  The only supported option is
       currently "-standalone" (see "Standalone precompiled parsers").
2449
2450       It is somewhat tedious to have to write a small Perl program just to
2451       generate a precompiled grammar class, so Parse::RecDescent has some
2452       special magic that allows you to do the job directly from the command-
2453       line.
2454
2455       If your grammar is specified in a file named grammar, you can generate
2456       a class named Yet::Another::Grammar like so:
2457
2458           > perl -MParse::RecDescent - grammar Yet::Another::Grammar [Runtime::Class]
2459
2460       This would produce a file named Grammar.pm containing the full
2461       definition of a class called Yet::Another::Grammar. Of course, to use
2462       that class, you would need to put the Grammar.pm file in a directory
2463       named Yet/Another, somewhere in your Perl include path.
2464
2465       Having created the new class, it's very easy to use it to build a
2466       parser. You simply "use" the new module, and then call its "new" method
2467       to create a parser object. For example:
2468
2469           use Yet::Another::Grammar;
2470           my $parser = Yet::Another::Grammar->new();
2471
2472       The effect of these two lines is exactly the same as:
2473
2474           use Parse::RecDescent;
2475
2476           open GRAMMAR_FILE, "grammar" or die;
2477           local $/;
2478           my $grammar = <GRAMMAR_FILE>;
2479
2480           my $parser = Parse::RecDescent->new($grammar);
2481
2482       only considerably faster.
2483
2484       Note however that the parsers produced by either approach are exactly
2485       the same, so whilst precompilation has an effect on set-up speed, it
2486       has no effect on parsing speed. RecDescent 2.0 will address that
2487       problem.
2488
2489       Standalone precompiled parsers
2490
2491       Until version 1.967003 of Parse::RecDescent, parser modules built with
2492       "Precompile" were dependent on Parse::RecDescent.  Future
2493       Parse::RecDescent releases with different internal implementations
2494       would break pre-existing precompiled parsers.
2495
2496       Version 1.967_005 added the ability for Parse::RecDescent to include
2497       itself in the resulting .pm file if you pass the boolean option
2498       "-standalone" to "Precompile":
2499
2500           Parse::RecDescent->Precompile({ -standalone => 1, },
2501               $grammar, "My::New::Parser");
2502
2503       Parse::RecDescent is included as $class::_Runtime in order to avoid
2504       conflicts between an installed version of Parse::RecDescent and other
       precompiled, standalone parsers made with Parse::RecDescent.  The name
2506       of this class may be changed with the "-runtime_class" option to
2507       Precompile.  This renaming is experimental, and is subject to change in
2508       future versions.
2509
2510       Precompiled parsers remain dependent on Parse::RecDescent by default,
2511       as this feature is still considered experimental.  In the future,
2512       standalone parsers will become the default.
2513
2514       Precompiled runtimes
2515
2516       Standalone precompiled parsers each include a copy of
2517       Parse::RecDescent.  For users who have a family of related precompiled
2518       parsers, this is very inefficient.  "Precompile" now supports an
2519       experimental "-runtime_class" option.  To build a precompiled parser
2520       with a different runtime name, call:
2521
2522           Parse::RecDescent->Precompile({
2523                   -standalone => 1,
2524                   -runtime_class => "My::Runtime",
2525               },
2526               $grammar, "My::New::Parser");
2527
2528       The resulting standalone parser will contain a copy of
2529       Parse::RecDescent, renamed to "My::Runtime".
2530
2531       To build a set of parsers that "use" a custom-named runtime, without
2532       including that runtime in the output, simply build those parsers with
2533       "-runtime_class" and without "-standalone":
2534
2535           Parse::RecDescent->Precompile({
2536                   -runtime_class => "My::Runtime",
2537               },
2538               $grammar, "My::New::Parser");
2539
2540       The runtime itself must be generated as well, so that it may be "use"d
2541       by My::New::Parser.  To generate the runtime file, use one of the two
       following calls:
2543
2544           Parse::RecDescent->PrecompiledRuntime("My::Runtime");
2545
2546           Parse::RecDescent->Precompile({
2547                   -standalone => 1,
2548                   -runtime_class => "My::Runtime",
2549               },
2550               '', # empty grammar
2551               "My::Runtime");
2552

GOTCHAS

2554       This section describes common mistakes that grammar writers seem to
2555       make on a regular basis.
2556
2557   1. Expecting an error to always invalidate a parse
2558       A common mistake when using error messages is to write the grammar like
2559       this:
2560
2561           file: line(s)
2562
           line: line_type_1
               | line_type_2
               | line_type_3
               | <error>
2567
2568       The expectation seems to be that any line that is not of type 1, 2 or 3
2569       will invoke the "<error>" directive and thereby cause the parse to
2570       fail.
2571
2572       Unfortunately, that only happens if the error occurs in the very first
2573       line.  The first rule states that a "file" is matched by one or more
2574       lines, so if even a single line succeeds, the first rule is completely
2575       satisfied and the parse as a whole succeeds. That means that any error
2576       messages generated by subsequent failures in the "line" rule are
2577       quietly ignored.
2578
2579       Typically what's really needed is this:
2580
2581           file: line(s) eofile    { $return = $item[1] }
2582
           line: line_type_1
               | line_type_2
               | line_type_3
               | <error>
2587
2588           eofile: /^\Z/
2589
2590       The addition of the "eofile" subrule  to the first production means
2591       that a file only matches a series of successful "line" matches that
2592       consume the complete input text. If any input text remains after the
2593       lines are matched, there must have been an error in the last "line". In
2594       that case the "eofile" rule will fail, causing the entire "file" rule
2595       to fail too.
2596
2597       Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/"
2598       or "/^\cD/" (end-of-file).
2599
2600       And don't forget the action at the end of the production. If you just
2601       write:
2602
2603           file: line(s) eofile
2604
2605       then the value returned by the "file" rule will be the value of its
2606       last item: "eofile". Since "eofile" always returns an empty string on
2607       success, that will cause the "file" rule to return that empty string.
2608       Apart from returning the wrong value, returning an empty string will
2609       trip up code such as:
2610
2611           $parser->file($filetext) || die;
2612
2613       (since "" is false).
2614
2615       Remember that Parse::RecDescent returns undef on failure, so the only
2616       safe test for failure is:
2617
2618           defined($parser->file($filetext)) || die;
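
       The effect of the "eofile" subrule can be demonstrated with a
       minimal sketch. The rule names "loose" and "checked" are
       hypothetical, a "line" is reduced to a run of digits, and the
       "<error>" production is left out to keep the output quiet:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# loose accepts any leading run of valid lines; checked also insists
# that nothing is left over once the lines have been matched.
my $grammar = q{
    loose:   line(s)
    checked: line(s) eofile   { $return = $item[1] }
    line:    /\d+/
    eofile:  /^\Z/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $loose = $parser->loose('1 2 oops');     # trailing junk goes unnoticed
my $bad   = $parser->checked('1 2 oops');   # trailing junk makes the rule fail
my $good  = $parser->checked('1 2 3');

print "loose:   ", scalar(@{$loose}), " line(s) matched\n";
print "checked: ", ( defined $bad  ? "matched" : "failed" ), "\n";
print "checked: ", ( defined $good ? "matched" : "failed" ), "\n";
```

       All three results are tested with "defined" (or dereferenced), never
       with a plain truth test, for the reason given above.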
2619
2620   2. Using a "return" in an action
2621       An action is like a "do" block inside the subroutine implementing the
2622       surrounding rule. So if you put a "return" statement in an action:
2623
           range: '(' start '..' end ')'
                      { return $item{end} }
                  /\s+/
2627
2628       that subroutine will immediately return, without checking the rest of
2629       the items in the current production (e.g. the "/\s+/") and without
2630       setting up the necessary data structures to tell the parser that the
2631       rule has succeeded.
2632
2633       The correct way to set a return value in an action is to set the
2634       $return variable:
2635
           range: '(' start '..' end ')'
                      { $return = $item{end} }
                  /\s+/
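
       As a runnable illustration, the following sketch sets $return in an
       action that is not the last item of its production. The rule names
       "low" and "high" are hypothetical stand-ins for "start" and "end":

```perl
use strict;
use warnings;
use Parse::RecDescent;

# The action sets $return part-way through the production; matching then
# continues with the closing parenthesis.
my $grammar = q{
    range: '(' low '..' high { $return = [ $item{low}, $item{high} ] } ')'
    low:   /\d+/
    high:  /\d+/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $range = $parser->range('(1..5)');
print "from $range->[0] to $range->[1]\n";
```

       Because $return was set explicitly, the rule returns the array
       reference rather than the value of its last item (the closing
       parenthesis).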
2639
   3. Setting $Parse::RecDescent::skip at parse time
2641       If you want to change the default skipping behaviour (see "Terminal
2642       Separators" and the "<skip:...>" directive) by setting
2643       $Parse::RecDescent::skip you have to remember to set this variable
2644       before creating the grammar object.
2645
2646       For example, you might want to skip all Perl-like comments with this
2647       regular expression:
2648
          my $skip_spaces_and_comments = qr/
                (?:
                   \s+         # either spaces
                   | \# .*?$   # or a hash and whatever up to the end of line
                )*             # repeated at will (in whatever order)
             /mxs;
2655
2656       And then:
2657
2658          my $parser1 = Parse::RecDescent->new($grammar);
2659
2660          $Parse::RecDescent::skip = $skip_spaces_and_comments;
2661
2662          my $parser2 = Parse::RecDescent->new($grammar);
2663
2664          $parser1->parse($text); # this does not cope with comments
2665          $parser2->parse($text); # this skips comments correctly
2666
2667       The two parsers behave differently, because any skipping behaviour
2668       specified via $Parse::RecDescent::skip is hard-coded when the grammar
2669       object is built, not at parse time.
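
       A comment-skipping pattern of this kind can be checked on its own,
       without building a parser. The sketch below assumes the modifiers are
       written on the "qr//" itself, so that "/x" applies to the whole
       pattern including its layout and comments; the sample text is made
       up:

```perl
use strict;
use warnings;

my $skip = qr/
    (?:
        \s+         # either whitespace
      | \# .*?$     # or a hash and everything up to the end of the line
    )*              # repeated any number of times, in any order
/mxs;

my $text = "  # leading comment\n  token # trailing comment\n";

# Strip whatever the skip pattern matches at the start of the text.
( my $rest = $text ) =~ s/\A$skip//;

print "[$rest]";
```

       Only the leading whitespace and comment are removed here, because a
       prefix pattern is applied once before each terminal rather than
       across the whole text.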
2670

DIAGNOSTICS

2672       Diagnostics are intended to be self-explanatory (particularly if you
2673       use -RD_HINT (under perl -s) or define $::RD_HINT inside the program).
2674
2675       "Parse::RecDescent" currently diagnoses the following:
2676
2677       ·   Invalid regular expressions used as pattern terminals (fatal
2678           error).
2679
2680       ·   Invalid Perl code in code blocks (fatal error).
2681
2682       ·   Lookahead used in the wrong place or in a nonsensical way (fatal
2683           error).
2684
2685       ·   "Obvious" cases of left-recursion (fatal error).
2686
2687       ·   Missing or extra components in a "<leftop>" or "<rightop>"
2688           directive.
2689
2690       ·   Unrecognisable components in the grammar specification (fatal
2691           error).
2692
2693       ·   "Orphaned" rule components specified before the first rule (fatal
2694           error) or after an "<error>" directive (level 3 warning).
2695
2696       ·   Missing rule definitions (this only generates a level 3 warning,
2697           since you may be providing them later via
2698           "Parse::RecDescent::Extend()").
2699
2700       ·   Instances where greedy repetition behaviour will almost certainly
2701           cause the failure of a production (a level 3 warning - see "ON-
2702           GOING ISSUES AND FUTURE DIRECTIONS" below).
2703
2704       ·   Attempts to define rules named 'Replace' or 'Extend', which cannot
2705           be called directly through the parser object because of the
2706           predefined meaning of "Parse::RecDescent::Replace" and
2707           "Parse::RecDescent::Extend". (Only a level 2 warning is generated,
2708           since such rules can still be used as subrules).
2709
2710       ·   Productions which consist of a single "<error?>" directive, and
2711           which therefore may succeed unexpectedly (a level 2 warning, since
2712           this might conceivably be the desired effect).
2713
2714       ·   Multiple consecutive lookahead specifiers (a level 1 warning only,
2715           since their effects simply accumulate).
2716
2717       ·   Productions which start with a "<reject>" or "<rulevar:...>"
2718           directive. Such productions are optimized away (a level 1 warning).
2719
       ·   Rules which are autogenerated under $::RD_AUTOSTUB (a level 1
2721           warning).
2722

AUTHOR

       Damian Conway (damian@conway.org)
       Jeremy T. Braun (JTBRAUN@CPAN.org) [current maintainer]
2726

BUGS AND IRRITATIONS

2728       There are undoubtedly serious bugs lurking somewhere in this much code
2729       :-) Bug reports, test cases and other feedback are most welcome.
2730
2731       Ongoing annoyances include:
2732
2733       ·   There's no support for parsing directly from an input stream.  If
2734           and when the Perl Gods give us regular expressions on streams, this
2735           should be trivial (ahem!) to implement.
2736
2737       ·   The parser generator can get confused if actions aren't properly
2738           closed or if they contain particularly nasty Perl syntax errors
2739           (especially unmatched curly brackets).
2740
2741       ·   The generator only detects the most obvious form of left recursion
2742           (potential recursion on the first subrule in a rule). More subtle
2743           forms of left recursion (for example, through the second item in a
2744           rule after a "zero" match of a preceding "zero-or-more" repetition,
2745           or after a match of a subrule with an empty production) are not
2746           found.
2747
2748       ·   Instead of complaining about left-recursion, the generator should
2749           silently transform the grammar to remove it. Don't expect this
2750           feature any time soon as it would require a more sophisticated
2751           approach to parser generation than is currently used.
2752
2753       ·   The generated parsers don't always run as fast as might be wished.
2754
2755       ·   The meta-parser should be bootstrapped using "Parse::RecDescent"
2756           :-)
2757
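       Since the generator does not remove left recursion itself, a
       left-recursive rule has to be rewritten by hand. One common rewrite
       (a sketch only, not something the generator does for you) replaces
       the recursion with a repetition and folds the results in an action,
       which keeps the left-associative meaning. The rule names "expr",
       "minus_term" and "term" are hypothetical:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Left-recursive form, which the generator complains about:
#     expr: expr '-' term | term
# Hand-written equivalent using a repetition instead:
my $grammar = q{
    expr:       term minus_term(s?)
                { my $value = $item[1]; $value -= $_ for @{ $item[2] }; $value }
    minus_term: '-' term
                { $item[2] }
    term:       /\d+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Folding from the left gives (10 - 3) - 2 rather than 10 - (3 - 2).
my $result = $parser->expr('10 - 3 - 2');
print "$result\n";
```

       A right-recursive rewrite ("expr: term '-' expr | term") would also
       parse the input, but its natural evaluation order groups the
       operands from the right, which is wrong for subtraction.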

ON-GOING ISSUES AND FUTURE DIRECTIONS

2759       1.  Repetitions are "incorrigibly greedy" in that they will eat
2760           everything they can and won't backtrack if that behaviour causes a
2761           production to fail needlessly.  So, for example:
2762
2763               rule: subrule(s) subrule
2764
2765           will never succeed, because the repetition will eat all the
2766           subrules it finds, leaving none to match the second item. Such
2767           constructions are relatively rare (and "Parse::RecDescent::new"
2768           generates a warning whenever they occur) so this may not be a
2769           problem, especially since the insatiable behaviour can be overcome
2770           "manually" by writing:
2771
2772               rule: penultimate_subrule(s) subrule
2773
2774               penultimate_subrule: subrule ...subrule
2775
2776           The issue is that this construction is exactly twice as expensive
2777           as the original, whereas backtracking would add only 1/N to the
2778           cost (for matching N repetitions of "subrule"). I would welcome
2779           feedback on the need for backtracking; particularly on cases where
2780           the lack of it makes parsing performance problematical.
2781
2782       2.  Having opened that can of worms, it's also necessary to consider
2783           whether there is a need for non-greedy repetition specifiers.
2784           Again, it's possible (at some cost) to manually provide the
2785           required functionality:
2786
2787               rule: nongreedy_subrule(s) othersubrule
2788
2789               nongreedy_subrule: subrule ...!othersubrule
2790
2791           Overall, the issue is whether the benefit of this extra
2792           functionality outweighs the drawbacks of further complicating the
2793           (currently minimalist) grammar specification syntax, and (worse)
2794           introducing more overhead into the generated parsers.
2795
2796       3.  An "<autocommit>" directive would be nice. That is, it would be
2797           useful to be able to say:
2798
2799               command: <autocommit>
2800               command: 'find' name
2801                  | 'find' address
2802                  | 'do' command 'at' time 'if' condition
2803                  | 'do' command 'at' time
2804                  | 'do' command
2805                  | unusual_command
2806
2807           and have the generator work out that this should be "pruned" thus:
2808
2809               command: 'find' name
2810                  | 'find' <commit> address
2811                  | 'do' <commit> command <uncommit>
2812                   'at' time
2813                   'if' <commit> condition
2814                  | 'do' <commit> command <uncommit>
2815                   'at' <commit> time
2816                  | 'do' <commit> command
2817                  | unusual_command
2818
2819           There are several issues here. Firstly, should the "<autocommit>"
2820           automatically install an "<uncommit>" at the start of the last
2821           production (on the grounds that the "command" rule doesn't know
2822           whether an "unusual_command" might start with "find" or "do") or
2823           should the "unusual_command" subgraph be analysed (to see if it
2824           might be viable after a "find" or "do")?
2825
2826           The second issue is how regular expressions should be treated. The
2827           simplest approach would be simply to uncommit before them (on the
2828           grounds that they might match). Better efficiency would be obtained
2829           by analyzing all preceding literal tokens to determine whether the
2830           pattern would match them.
2831
2832           Overall, the issues are: can such automated "pruning" approach a
2833           hand-tuned version sufficiently closely to warrant the extra set-up
2834           expense, and (more importantly) is the problem important enough to
2835           even warrant the non-trivial effort of building an automated
2836           solution?
2837
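       The greedy behaviour described in item 1, and the manual lookahead
       workaround, can be tried out with a small grammar. This is a sketch
       in which "word" stands in for "subrule"; the rule names
       "greedy_list", "careful_list" and "penultimate" are hypothetical:

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = q{
    greedy_list:  word(s) word
    careful_list: penultimate(s) word
                  { [ @{ $item[1] }, $item[2] ] }
    penultimate:  word ...word
                  { $item[1] }
    word:         /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The repetition in greedy_list consumes every word, so the final
# 'word' item has nothing left to match and the rule fails.
my $greedy = $parser->greedy_list('a b c');

# Each 'penultimate' only matches a word that is followed by another
# word, so the last word is left over for the final item.
my $careful = $parser->careful_list('a b c');

print 'greedy_list:  ', (defined $greedy ? 'matched' : 'failed'), "\n";
print 'careful_list: ', (defined $careful ? join(' ', @$careful) : 'failed'), "\n";
```

       The extra cost mentioned in item 1 is visible here: every word
       except the last is matched twice, once as a lookahead and once for
       real.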

SUPPORT

2839   Source Code Repository
2840       <http://github.com/jtbraun/Parse-RecDescent>
2841
2842   Mailing List
2843       Visit <http://www.perlfoundation.org/perl5/index.cgi?parse_recdescent>
2844       to sign up for the mailing list.
2845
2846       <http://www.PerlMonks.org> is also a good place to ask questions.
2847       Previous posts about Parse::RecDescent can typically be found with this
2848       search: <http://perlmonks.org/index.pl?node=recdescent>.
2849
2850   FAQ
2851       Visit Parse::RecDescent::FAQ for answers to frequently (and not so
2852       frequently) asked questions about Parse::RecDescent.
2853
2854   View/Report Bugs
2855       To view the current bug list or report a new issue visit
2856       <https://rt.cpan.org/Public/Dist/Display.html?Name=Parse-RecDescent>.
2857

SEE ALSO

2859       Regexp::Grammars provides Parse::RecDescent style parsing using native
2860       Perl 5.10 regular expressions.
2861

LICENCE AND COPYRIGHT

2863       Copyright (c) 1997-2007, Damian Conway "<DCONWAY@CPAN.org>". All rights
2864       reserved.
2865
2866       This module is free software; you can redistribute it and/or modify it
2867       under the same terms as Perl itself. See perlartistic.
2868

DISCLAIMER OF WARRANTY

2870       BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
2871       FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT
2872       WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
2873       PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND,
2874       EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2875       WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
2876       ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH
2877       YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
2878       NECESSARY SERVICING, REPAIR, OR CORRECTION.
2879
2880       IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
2881       WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
2882       REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE
2883       TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR
2884       CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
2885       SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
2886       RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
2887       FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
2888       SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
2889       DAMAGES.
2890
2891
2892
2893perl v5.32.0                      2020-07-28              Parse::RecDescent(3)