1Marpa::XS::Grammar(3) User Contributed Perl DocumentationMarpa::XS::Grammar(3)
2
3
4

NAME

6       Marpa::XS::Grammar - Marpa grammars
7

SYNOPSIS

9           my $grammar = Marpa::XS::Grammar->new(
10               {   start   => 'Expression',
11                   actions => 'My_Actions',
12                   default_action => 'first_arg',
13                   rules   => [
14                       { lhs => 'Expression', rhs => [qw/Term/] },
15                       { lhs => 'Term', rhs => [qw/Factor/] },
16                       { lhs => 'Factor', rhs => [qw/Number/] },
17                       { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
18                       {   lhs    => 'Factor',
19                           rhs    => [qw/Factor Multiply Factor/],
20                           action => 'do_multiply'
21                       },
22                   ],
23               }
24           );
25
26           $grammar->precompute();
27

DESCRIPTION

29       To create a Marpa grammar object, use the "new" method.  Rules and
30       symbols may be specified when the grammar is created.
31
32       To change a Marpa grammar object, use the "set" method.  New rules may
33       be added until a grammar is precomputed.
34
35       A grammar cannot be used for parsing until it is precomputed.  To
36       precompute a Marpa grammar object, use the "precompute" method.  After
37       precomputation, no new rules may added and most other changes are
38       forbidden.
39
40   Symbol names
41       Symbol names that end in right square brackets are reserved for Marpa's
42       internal use.  Any other valid Perl string is an acceptable symbol
43       name.
44
45   Terminal symbols
46       If a grammar has no empty rules, all symbols are terminals by default.
47       Unlike most parser generators, Marpa will allow terminals to appear on
48       the left hand side of rules.  Marpa defines a terminal as a symbol
49       which is valid as an input token symbol.
50
51       Marpa allows direct marking of terminals using the "terminals" named
52       argument, and the "terminal" property.  If any terminal is directly
53       marked, only directly marked symbols will be terminals.
54
55       Marpa's behavior can be changed by unsetting the "lhs_terminals" named
56       argument.  When the "lhs_terminals" named argument is unset, only
57       symbols which do not appear on the LHS of a rule can be terminals.  If
58       no symbols are directly marked, Marpa will implicitly mark all the non-
59       LHS symbols as terminals.
60
61       Marpa is stricter for grammars with empty rules -- it does not allow
62       them to take the default.  There is an efficiency hit whenever the LHS
63       of an empty rule is a terminal as well, so Marpa does not allow this to
64       happen by default -- the user has to be explicitly indicate that that
65       is what she wants.  If a grammar has any empty rules, it must either
66       directly mark its terminals, or must unset the "lhs_terminals" named
67       argument.
68
69       One advantage of not taking the default is efficiency.  Precomputation
70       is faster for grammars which mark their terminals.  Also, the
71       recognizer checks that input tokens are terminals, so that being
72       selective about which symbols are terminals and which are not can allow
73       better input checking.
74
75       Summary
76
77       The following summarizes the logic which determines whether a symbol
78       "S" is terminal.  As a reminder, a symbol is said to be directly marked
79       as a terminal if it is one of the symbols in the "terminals" named
80       argument, or if the symbol has the "terminal" property set.
81
82       •   If any symbol is directly marked as a terminal, then a symbol "S"
83           is a terminal if and only if it is also directly marked as a
84           terminal.
85
86       •   If no symbol is directly marked as a terminal, and the
87           "lhs_terminals" named argument is unset, then all non-LHS symbols
88           are terminals.
89
90       •   If the grammar contains no empty rules, no symbol is directly
91           marked as a terminal, and the "lhs_terminals" named argument is set
92           or left at its default, then all symbols are terminals.
93
94       •   The only case not covered above is that in which a grammar contains
95           one or more empty rules, no symbol is directly marked as a
96           terminal, and the "lhs_terminals" named argument is set or left at
97           its default.  This is a fatal error.
98
99   Sequence rules
100       It is very common in a grammar for one symbol to produce a repeating
101       sequence.  Marpa allows a shorthand for this: sequence rules.  The RHS
102       of a sequence rule will be repeated, as specified by the "min" rule
103       property.  In sequence rules the RHS must always be one symbol in
104       length, and that symbol may not be a nullable symbol.
105
106       A rule is a sequence rule if the "min" rule property is defined.  "min"
107       can be 0 or 1, and specifies the minimum number of times that the
108       sequence is allowed to repeat.  As of this writing, the maximum number
109       of repetitions is always infinite.
110
111           { lhs => 'sequence', rhs => ['item'], min => 0 }
112
113       A "min" of zero indicates a sequence that repeats zero or more times.
114       This is the equivalent of using the star quantifier (""*"") in the
115       standard regular expression notation.
116
117           { lhs => 'sequence', rhs => ['item'], min => 1 }
118
119       A "min" of one indicates a sequence that repeats one or more times.
120       This is the equivalent of using the plus quantifier (""+"") in the
121       standard regular expression notation.
122
123       Sequences can have a separator, specified with the "separator" rule
124       property.  By default, separation is Perl-style: trailing separators
125       are allowed.  In ""proper"" separation, a separator must actually
126       separate two sequence items and therefore is not allowed after the last
127       item of a sequence.  If you prefer ""proper"" separation, you can set
128       the "proper" rule property.
129
130       Advantages of sequence rules
131
132       You are never forced to use sequence rules, but it's usually better if
133       you do.  When a sequence is written as a sequence rule, Marpa optimizes
134       it.
135
136       When a sequence is written using non-sequence rules, the semantics
137       typically wind up being spread over two or three Perl closures.  The
138       semantic action for a sequence rule is a single Perl closure.  Putting
139       the semantics into a single Perl closure often results in simpler and
140       more natural code.  See the section on sequences in the semantics
141       document.
142
143       Caveats
144
145       Marpa throws an exception if you try to use a nullable symbol as the
146       right hand side of a sequence rule, or as the separator for a sequence
147       rule.  The ban on nullables in sequences only applies to sequences when
148       they are written using sequence rules.  Nothing prevents you from
149       specifying a sequence of nullables using non-sequence rules.  But
150       usually there is no good reason to do this, and sequences of nullable
151       can be highly ambiguous, which makes them a good thing to avoid for
152       efficiency reasons.
153
154       To keep things simple, the right hand side of a sequence rule must be a
155       single symbol.  Of course, applications will often want to repeat
156       sequences of multiple symbols.  That is easy to do indirectly:
157
158           { lhs => 'sequence', rhs => [qw(item)], min => 0 },
159           { lhs => 'item', rhs => [qw(part1 part2)], },
160

CONSTRUCTOR

162   new
163           my $grammar = Marpa::XS::Grammar->new(
164               {   start   => 'Expression',
165                   actions => 'My_Actions',
166                   default_action => 'first_arg',
167                   rules   => [
168                       { lhs => 'Expression', rhs => [qw/Term/] },
169                       { lhs => 'Term', rhs => [qw/Factor/] },
170                       { lhs => 'Factor', rhs => [qw/Number/] },
171                       { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
172                       {   lhs    => 'Factor',
173                           rhs    => [qw/Factor Multiply Factor/],
174                           action => 'do_multiply'
175                       },
176                   ],
177               }
178           );
179
180       "Marpa::XS::Grammar::new" returns a new Marpa grammar object or throws
181       an exception.  The arguments to "Marpa::XS::Grammar::new" are
182       references to hashes of named arguments.  In each key/value pair of
183       this hash, the hash key is the argument name and the hash value is the
184       value of the named argument.  The available named arguments are
185       described below.
186

MUTATORS

188   precompute
189           $grammar->precompute();
190
191       The "precompute" method compiles data structures that the recognizer
192       will need.  It returns the grammar object or throws an exception.
193
194   set
195           $grammar->set( { trace_file_handle => $trace_fh } );
196
197       The arguments to the "set" method are references to hashes of named
198       arguments.  The available named arguments are described below.  "set"
199       either returns true or throws an exception.
200

ACCESSORS

202   check_terminal
203       Returns a Perl true when its argument is the name of a terminal symbol.
204       Otherwise, returns a Perl false.  Not often needed, but a lexer may
205       find this the most convenient way to determine if a symbol is a
206       terminal.
207

TRACE ACCESSORS

209   show_AHFA
210           print $grammar->show_AHFA()
211               or die "print failed: $ERRNO";
212
213       Not of interest to most users.  Returns a multi-line string listing the
214       states of an internal data structure called the Aycock-Horspool Finite
215       Automaton.  Not useful before the grammar is precomputed.
216
217   show_problems
218           print $grammar->show_problems()
219               or die "print failed: $ERRNO";
220
221       Usually the application does not call this method directly.  Returns a
222       string describing any serious but non-fatal problems a grammar had in
223       the precomputation phase.  A serious problem is one that will prevent
224       parsing.  Warnings are not serious problems in this sense.  If there
225       were no serious problems, returns a string saying so.  This method is
226       not useful before precomputation.
227
228       In Marpa, most serious grammar problems are not immediately thrown as
229       exceptions.  This is because there can be a number of serious problems
230       in a grammar, particularly one that is large or in an early draft.  If
231       each serious problem caused an immediate exception, the user would have
232       to fix them one at a time -- very tedious.
233
234       The recognizer throws an exception when the user attempts to create a
235       parse from a grammar with serious problems.  When that happens, the
236       string returned by "show_problems" is part of the error message.
237
238   show_rules
239           print $grammar->show_rules()
240               or die "print failed: $ERRNO";
241
242       Returns a string listing the rules.  Each rule is shown with comments
243       which indicate either rule properties or internal settings.
244       "show_rules" is useful in debugging grammars.
245
246       Marpa does extensive rewriting of its grammars, and both the original
247       rules and the rewritten rules appear in the "show_rules" list.  When a
248       rule is rewritten, the original rule is often not used.  In that case,
249       ""!used"" will be one of the comments for the original rule.  The
250       ""!used"" comment also marks rules not used for reasons other than
251       rewrites.  For example, inaccessible and unproductive rules are also
252       marked ""!used"".
253
254       The ""discard_sep"" comment indicates that the rule discards separators
255       This is only relevant in sequence rules.  Other comments indicate
256       whether rules were nullable, unproductive, inaccessible, or empty.
257
258       The "vlhs", "vrhs" and "real" comments show rule settings relevant in
259       tracking "virtual" internal symbols.  These are used internally, to
260       optimize evaluation.
261
262   show_symbols
263           print $grammar->show_symbols()
264               or die "print failed: $ERRNO";
265
266       Returns a string listing the symbols, along with comments indicating
267       whether they were terminal, nulling, nullable, unproductive or
268       inaccessible.  Also shown is a list of rules with that symbol on the
269       left hand side, and a list of rules which have that symbol anywhere on
270       the right hand side.  Useful for debugging grammars.
271

NAMED ARGUMENTS

273   action_object
274       The "action_object" named argument specifies a class name to be used in
275       resolving action names to Perl closures.  A "new" constructor must be
276       defined in the "action_object" package.  It will be used to create the
277       per-parse-tree variables.  Details are in the document on semantics.
278
279   actions
280                   actions => 'My_Actions',
281
282       The "actions" named argument specifies the Perl package that Marpa will
283       use when resolving action names to Perl closures.  If both an "actions"
284       named argument and an "action_object" named argument are specified, the
285       package from the "actions" named argument is the only one used to
286       resolve action names.  The "actions" package is treated only as a
287       package, and not as a class.  Any "new" constructor in the "actions"
288       package is ignored.  Details are given in the document on semantics.
289
290   default_action
291                   default_action => 'first_arg',
292
293       The "default_action" named argument specifies the value action name for
294       rules without an "action" property.  Details are given in the document
295       on semantics.
296
297   default_null_value
298       The "default_null_value" named argument specifies the null value for
299       symbols without a "null_value" property.  Details are given in the
300       document on semantics.
301
302   inaccessible_ok
303       The value must be a reference to an array of symbol names.  By default,
304       Marpa warns if a symbol is inaccessible, but the warning is suppressed
305       for any symbol named in the array.  Setting the "inaccessible_ok" named
306       argument after grammar precomputation is useless, and itself results in
307       a warning.
308
309       Inaccessible symbols are symbols which cannot be derived from the start
310       symbol, and which therefore can never be part of a successful parse.
311       Inaccessible symbols often indicate errors in grammar design.  But a
312       user may have plans for these symbols, may wish to keep them as notes,
313       or may simply wish to deal with them later.
314
315   infinite_action
316       Takes as its value a string specifying what Marpa should do if it
317       discovers that its grammar is infinitely ambiguous.  The value must be
318       one of ""fatal"", ""warn"" or ""quiet"".  A grammar is infinitely
319       ambiguous if there is some input for which it produces an endless
320       number of parses.
321
322       If the value is ""fatal"", Marpa throws an exception when it encounters
323       an infinitely ambiguous grammar.  This is the default and will usually
324       be what the user wants.  In most cases, an infinitely ambiguous grammar
325       is simply a mistake.
326
327       ""quiet"" indicates that the user wants to allow infinitely ambiguous
328       grammars.  ""warn"" indicates that the user wants to allow infinitely
329       ambiguous grammars, but wants a warning message to be printed to the
330       trace file handle.
331
332   lhs_terminals
333       The value of the "lhs_terminals" named argument is a Boolean.  If true,
334       symbols which appear on the LHS of a rule are allowed to be terminals.
335       "lhs_terminals" is true by default.
336
337       If "lhs_terminals" is unset and no symbols are directly marked as
338       terminals, Marpa will mark all non-LHS symbols as terminals.  If any
339       symbol is directly marked, all terminals must be directly marked.  If
340       "lhs_terminals" is unset, but some symbols are directly marked, it is a
341       fatal error for a terminal to appear on the LHS of a rule.  For more,
342       see the discussion of terminals above.
343
344   rules
345       The value of the "rules" named argument is a reference to an array of
346       rule descriptors.  The "rules" named argument may be specified multiple
347       times, adding new rules to the grammar each time.  New rules may be
348       added until the grammar is precomputed.  The format of rule descriptors
349       is explained below.
350
351   start
352           start => 'Expression',
353
354       The value of the "start" named argument must be a symbol name.  It will
355       be used as the start symbol for the grammar.  The "start" named
356       argument is required.
357
358   symbols
359       The value of the "symbols" named arguments must be a reference to a
360       hash.  In each key/value pair of this hash, the hash key is the symbol
361       property name and the hash value is the symbol descriptor.  Symbol
362       descriptors are described below.
363
364       Note that the value of "symbols" named argument is a hash, but the
365       value of the "rules" named argument is an array.  This is because
366       symbol names make convenient hash keys.  For rules, there is no equally
367       natural choice for a hash key.
368
369   terminals
370       The value of the "terminals" named argument must be a reference to an
371       array of symbol names.  Specifying a symbol's name in a "terminals"
372       named argument is one way of directly marking it as a terminal.  See
373       the discussion of terminals above.
374
375   trace_file_handle
376       The value is a file handle.  Trace output and warning messages go to
377       the trace file handle.  By default the trace file handle is "STDERR".
378
379   trace_rules
380       Traces rules as they are added to the grammar.  Useful, but you will
381       usually prefer the "show_rules" method.  A "trace_rules" setting
382       becomes effective within the named argument hash which sets it.  A
383       trace message warns the user if he turns on rule tracing when rules
384       have already been added.
385
386   unproductive_ok
387       The value must be a reference to an array of symbol names.  By default,
388       Marpa warns if a symbol is unproductive, but the warning is suppressed
389       for any symbol named in the array.  Setting the "unproductive_ok" named
390       argument after grammar precomputation is useless, and itself results in
391       a warning.
392
393       Unproductive symbols are symbols which can never derive a sentence.  (A
394       sentence is a string of zero or more terminals.)  That means that
395       unproductive symbols can never be part of a successful parse.
396       Unproductive symbols often indicate errors in grammar design.  But a
397       user may have plans for these symbols, may wish to keep them as notes,
398       or may simply wish to deal with them later.
399
400   warnings
401       The value is a boolean.  Warnings are written to the trace file handle.
402       By default, warnings are on.  Usually, an application will want to
403       leave them on.  If warnings are turned off, turning them back on after
404       grammar precomputation is useless, and itself results in a warning.
405

RULE DESCRIPTORS

407           rules => [
408               { lhs => 'Expression', rhs => [qw/Term/] },
409               { lhs => 'Term',       rhs => [qw/Factor/] },
410               { lhs => 'Factor',     rhs => [qw/Number/] },
411               { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
412               {   lhs    => 'Factor',
413                   rhs    => [qw/Factor Multiply Factor/],
414                   action => 'do_multiply'
415               },
416           ],
417
418   Rule descriptors as hashes
419       The long form descriptor of a rule is a reference to a hash of rule
420       properties.  In each key/value pair of this hash, the hash key is the
421       rule property name and the hash value is the value of that property.
422
423   action
424       The value of the "action" rule property is a string which specifies the
425       semantics for the rule.  For details, see the document on semantics.
426
427       The semantics of nulling symbols are dealt with on a per-symbol basis,
428       rather than a per-rule basis.  For this reason the "action" rule
429       property is useless for empty rules.  An exception is thrown if an
430       "action" property is defined for an empty rule.
431
432   keep
433       Separators in sequence rules are usually not semantically significant.
434       By default, Marpa throws away separators during parse tree traversal
435       and before node evaluation time, so that the semantic actions do not
436       see the separators.
437
438       If the value of the "keep" rule property is a Perl true, Marpa keeps
439       separators.  This allows the semantic actions to examine them.  The
440       downside is that the work of distinguishing sequence separators from
441       sequence items is pushed into the semantic actions.  For details about
442       the semantics, see the document on semantics.
443
444   lhs
445       The value of the "lhs" rule property must be a string containing the
446       name of the rule's left hand side symbol.  Every Marpa rule must have a
447       left hand side symbol.
448
449       The "lhs" plays a role in the semantics.  If no "action" rule property
450       is defined, Marpa looks in either the grammar's "actions" package or
451       the grammar's "action_object" for a Perl closure that has the name of
452       the "lhs" symbol.  For details, see the document on semantics.
453
454   min
455       "min" must be 0, 1, or undefined.  If "min" is 0 or 1, the rule is a
456       sequence rule.  If "min" is undefined, the rule is an ordinary, non-
457       sequence rule.
458
459       Only one symbol, called the sequence item, is allowed on the right hand
460       side of a sequence rule.  The sequence item may not be a nullable
461       symbol.  The input will be required to match the sequence item at least
462       "min" times and will be allowed to match the sequence item an unlimited
463       number of times.
464
465   null_ranking
466       "null_ranking" is ignored unless the recognizer's "ranking_method"
467       named argument is set to something other than its default.  The
468       "null_ranking" named argument allows the application to control the
469       order in which rules with nullable symbols are returned by the "value"
470       method.  Such rules can match the same input in several ways depending
471       on which symbols are nulled.  These different ways of nulling symbols
472       in a rule are called its null variants.
473
474       If "null_ranking" is undefined, the order of the null variants will be
475       arbitrary.  This is the default, and is acceptable to most
476       applications.  For details on using the "null_ranking" named argument,
477       see the document on parse order.
478
479   proper
480       By default, sequence rules with separators allow trailing separators,
481       Perl-style.  If the "proper" rule property is a Perl true, ""proper""
482       separation is enforced.  In proper separation, separation must actually
483       separate sequence items, and trailing separators are not allowed.
484
485   rank
486       "rank" is ignored unless the recognizer's "ranking_method" named
487       argument is set to something other than its default.  "rank", when
488       specified, must be an integer.  It may be negative.  "rank" is 0 by
489       default.  For details on using the "rank" named argument, see the
490       document on parse order.
491
492   rhs
493       The value of the "rhs" property is a reference to an array of strings
494       containing the names of the rule's right hand symbols, in order.  This
495       array may be zero length, in which case this is an empty rule -- a rule
496       with no symbols on the right hand side.  A rule is also empty if there
497       is no "rhs" specifier in its descriptor.
498
499   separator
500       Any sequence rule may have a "separator" defined.  The value must be a
501       symbol name.  By default, Marpa allows trailing separators.  This is
502       the usual style in Perl.  The separator must not be a nullable symbol.
503
504   Rule descriptors as arrays
505           rules => [
506               [ 'E', [qw/E Add E/],      'do_add' ],
507               [ 'E', [qw/E Multiply E/], 'do_multiply' ],
508               [ 'E', [qw/Number/], ],
509           ],
510
511       Rule descriptors may be given in "short form" -- as a reference to an
512       array.  The elements of the array, in order, are the "lhs" property,
513       the "rhs" property, and the "action" property.  The last two are
514       optional.  Omission of an optional property in a short form descriptor
515       has the same effect that omitting the same optional property would have
516       in the long form.
517
518   Duplicate rules
519       Marpa throws an exception if a duplicate rule is added.  Two rules are
520       considered duplicates if
521
522       •   Both rules have the same left hand symbol.
523
524       •   Both rules have the same right hand symbols in the same order.
525
526       This definition applies to sequence rules, as well as to ordinary
527       rules.  As a consequence, sequence rules can be considered duplicates
528       even when they have different separators and/or different minimum
529       counts.
530

SYMBOL DESCRIPTORS

532           symbols        => {
533               L => { null_value => 'null L' },
534               R => { null_value => 'null R' },
535               A => { null_value => 'null A' },
536               B => { null_value => 'null B' },
537               X => { null_value => 'null X', terminal => 1 },
538               Y => { null_value => 'null Y', terminal => 1 },
539           },
540
541       A symbol descriptor is a hash.  In the key/value pairs of this hash,
542       the hash key is the symbol property name and the hash value is the
543       value of that property.  The available symbol properties are as
544       follows:
545
546   null_value
547       The "null_value" symbol property specifies the null value of its
548       symbol.  Details are given in the document on semantics.
549
550   terminal
551       A boolean.  If true, it directly marks the symbol as a terminal.  If
552       false, it unmarks the symbol as a terminal.  For details, see the
553       section on terminals.
554
556         Copyright 2012 Jeffrey Kegler
557         This file is part of Marpa::XS.  Marpa::XS is free software: you can
558         redistribute it and/or modify it under the terms of the GNU Lesser
559         General Public License as published by the Free Software Foundation,
560         either version 3 of the License, or (at your option) any later version.
561
562         Marpa::XS is distributed in the hope that it will be useful,
563         but WITHOUT ANY WARRANTY; without even the implied warranty of
564         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
565         Lesser General Public License for more details.
566
567         You should have received a copy of the GNU Lesser
568         General Public License along with Marpa::XS.  If not, see
569         http://www.gnu.org/licenses/.
570
571
572
573perl v5.34.0                      2022-01-21             Marpa::XS::Grammar(3)
Impressum