1Marpa::XS(3) User Contributed Perl Documentation Marpa::XS(3)
2
3
4
6 Marpa::XS - XS version of Marpa
7
9 use Marpa::XS;
10
11 my $grammar = Marpa::XS::Grammar->new(
12 { start => 'Expression',
13 actions => 'My_Actions',
14 default_action => 'first_arg',
15 rules => [
16 { lhs => 'Expression', rhs => [qw/Term/] },
17 { lhs => 'Term', rhs => [qw/Factor/] },
18 { lhs => 'Factor', rhs => [qw/Number/] },
19 { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
20 { lhs => 'Factor',
21 rhs => [qw/Factor Multiply Factor/],
22 action => 'do_multiply'
23 },
24 ],
25 }
26 );
27
28 $grammar->precompute();
29
30 my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
31
32 $recce->read( 'Number', 42 );
33 $recce->read( 'Multiply', );
34 $recce->read( 'Number', 1 );
35 $recce->read( 'Add', );
36 $recce->read( 'Number', 7 );
37
38 sub My_Actions::do_add {
39 my ( undef, $t1, undef, $t2 ) = @_;
40 return $t1 + $t2;
41 }
42
43 sub My_Actions::do_multiply {
44 my ( undef, $t1, undef, $t2 ) = @_;
45 return $t1 * $t2;
46 }
47
48 sub My_Actions::first_arg { shift; return shift; }
49
50 my $value_ref = $recce->value;
51 my $value = $value_ref ? ${$value_ref} : 'No Parse';
52
54 Overview
55 Marpa parses any language whose grammar can be written in BNF. That
56 includes recursive grammars, ambiguous grammars, infinitely ambiguous
57 grammars and grammars with useless or empty productions.
58
59 This document contains a top-level overview of the API for the Marpa
60 parse engine. The two examples in this document show the typical flows
61 of Marpa method calls. This document will use these examples to
62 describe the basic features of Marpa in semi-tutorial fashion. Marpa's
63 advanced features, and full reference details of all features, can be
64 found in the other Marpa API documents.
65
66 The three phases
67 A parser needs to:
68
69 • Accept a grammar.
70
71 • Read input.
72
73 • Return values from the parses, according to a semantics.
74
75 In Marpa these three tasks are, for the most part, distinct phases.
76 Grammars are "Marpa::XS::Grammar" objects. The reading of input and
77 the evaluation of the parse according to the semantics is performed by
78 "Marpa::XS::Recognizer" objects.
79
81 The synopsis shows the code for a very simple calculator. It handles
82 only addition and multiplication of integers. This section explains,
83 line by line, how it works.
84
85 Marpa::XS::Grammar::new
86 my $grammar = Marpa::XS::Grammar->new(
87 { start => 'Expression',
88 actions => 'My_Actions',
89 default_action => 'first_arg',
90 rules => [
91 { lhs => 'Expression', rhs => [qw/Term/] },
92 { lhs => 'Term', rhs => [qw/Factor/] },
93 { lhs => 'Factor', rhs => [qw/Number/] },
94 { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
95 { lhs => 'Factor',
96 rhs => [qw/Factor Multiply Factor/],
97 action => 'do_multiply'
98 },
99 ],
100 }
101 );
102
103 Marpa grammars are "Marpa::XS::Grammar" objects. They are created with
104 the Marpa::XS::Grammar::new constructor. The arguments to
105 Marpa::XS::Grammar::new are references to hashes of named arguments.
106 In the key/value pairs of these hashes, the hash key is the name of the
107 argument, and the hash value is the value of the named argument.
108
109 The start named argument
110
111 start => 'Expression',
112
113 The "start" named argument is required. Its value is a string
114 containing the name of the grammar's start symbol.
115
116 Named arguments for the semantics
117
118 actions => 'My_Actions',
119 default_action => 'first_arg',
120
121 The "actions" and "default_action" named arguments specify semantics.
122 Their argument values are strings, which acquire their semantics during
123 evaluation.
124
125 Evaluation will be described later. Peeking ahead, the
126 "default_action" named argument will be interpreted as an action name.
127 This action name will resolve to an action -- a Perl closure that
128 implements semantics. The action specified by "default_action" is used
129 as the action for rules with no action of their own. "actions"
130 provides the name of a Perl package where Marpa will look for its
131 actions.
132
133 The rules named argument
134
135 rules => [
136 { lhs => 'Expression', rhs => [qw/Term/] },
137 { lhs => 'Term', rhs => [qw/Factor/] },
138 { lhs => 'Factor', rhs => [qw/Number/] },
139 { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
140 { lhs => 'Factor',
141 rhs => [qw/Factor Multiply Factor/],
142 action => 'do_multiply'
143 },
144 ],
145
146 The value of the "rules" named argument is a reference to an array of
147 rule descriptors. In this example, all the rule descriptors are in the
148 "long" form -- they are references to hashes of rule properties. In
149 each key/value pair of a rule descriptor hash, the key is the name of a
150 rule property, and the hash value is the value of that rule property.
151
152 The lhs property
153
154 The value of the "lhs" rule property must be a string containing the
155 name of the rule's left hand side symbol. Every Marpa rule must have a
156 left hand side symbol.
157
158 The rhs property
159
160 The value of the "rhs" property is a reference to an array of strings
161 containing names of the rule's right hand symbols, in order. This
162 array may be zero length, in which case this is an empty rule -- a rule
163 with no symbols on the right hand side. There are no empty rules in
164 this example.
165
166 The action property
167
168 The value of the "action" rule property is a string. Peeking ahead,
169 each "action" property string will be interpreted as an action name.
170 This action name will be resolved to a Perl closure that implements the
171 rule's semantics.
172
173 Marpa::XS::Grammar::precompute
174 $grammar->precompute();
175
176 Before a Marpa grammar object can be used by a Marpa recognizer, it
177 must be precomputed. Precomputation compiles data structures that the
178 recognizer will need.
179
180 Marpa::XS::Recognizer::new
181 my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );
182
183 "Marpa::XS::Recognizer::new" creates a new recognizer. Its arguments
184 are references to hashes of named arguments. In this example the only
185 named argument is the required argument: ""grammar"". The value of the
186 "grammar" named argument must be a precomputed Marpa grammar.
187
188 Marpa::XS::Recognizer::read
189 $recce->read( 'Number', 42 );
190 $recce->read( 'Multiply', );
191 $recce->read( 'Number', 1 );
192 $recce->read( 'Add', );
193 $recce->read( 'Number', 7 );
194
195 The "Marpa::XS::Recognizer::read" method takes two arguments, a token
196 name and a token value. The token name must be the name of a valid
197 terminal symbol in the grammar. By default all symbols are valid as
198 terminal symbols, unless a grammar contains empty rules.
199
200 The grammars in the examples in this document do not contain empty
201 rules, and therefore are free to use any symbol in the grammar as a
202 token name. For more on terminals, including how to explicitly mark
203 terminal symbols, see "Terminals" in Marpa::XS::Grammar.
204
205 The token value must be a Perl scalar, but otherwise its form and
206 semantics are entirely up to the application. If the token value is
207 omitted, it is a Perl "undef". In the calculator example, the values
208 of the ""Add"" and ""Multiply"" tokens are never used, so they are
209 allowed to default to "undef".
210
211 Marpa::XS::Recognizer::value
212 my $value_ref = $recce->value;
213 my $value = $value_ref ? ${$value_ref} : 'No Parse';
214
215 The "Marpa::XS::Recognizer::value" method returns a reference to the
216 parse result's value, if there was a parse result. If there was no
217 parse result, "Marpa::XS::Recognizer::value" returns "undef".
218
219 Resolving the semantics
220 The first thing "Marpa::XS::Recognizer::value" needs to do is to
221 resolve the semantics. Resolving the semantics means mapping the
222 action names into actions. Actions are Perl closures which directly
223 implement semantics. In this example, the "actions" named argument is
224 specified. "actions" is a Perl package name. Marpa will look for
225 actions in that package.
226
227 actions => 'My_Actions',
228
229 { lhs => 'Factor', rhs => [qw/Factor Multiply Factor/], action => 'do_multiply' },
230
231 For example, the "action" property for the above rule is
232 ""do_multiply"" and the "actions" named argument to the grammar was
233 ""My_Actions"". So Marpa looks for a closure whose fully qualified
234 name is "My_Actions::do_multiply", which it finds:
235
236 sub My_Actions::do_multiply {
237 my ( undef, $t1, undef, $t2 ) = @_;
238 return $t1 * $t2;
239 }
240
241 Rules do not always have "action" properties. That is the case with
242 these rules in this example:
243
244 { lhs => 'Expression', rhs => [qw/Term/] },
245 { lhs => 'Term', rhs => [qw/Factor/] },
246 { lhs => 'Factor', rhs => [qw/Number/] },
247
248 Where there is no "action" rule property, Marpa tries to use the "lhs"
249 property as an action name. When Marpa cannot resolve the "lhs"
250 property as an action name, it will fall back to using the default
251 action for the grammar.
252
253 For example, in the first rule in the above display, Marpa will look
254 for a Perl closure with the fully qualified name
255 ""My_Actions::Expression"". When Marpa does not find a closure by that
256 name, Marpa will fall back to trying to use the default action.
257
258 The other two rules in the above display work similarly. Marpa will
259 look for Perl closures named ""My_Actions::Term"" and
260 ""My_Actions::Factor"". It will not find them and will fall back to
261 trying to use the default action, as described next.
262
263 default_action => 'first_arg',
264
265 The "default_action" named argument is resolved in the same way as are
266 the "action" properties of the rules. In this example, default_action
267 is specified as ""first_arg"" and resolves to "My_Actions::first_arg".
268
269 Actions
270 sub My_Actions::first_arg { shift; return shift; }
271
272 sub My_Actions::do_add {
273 my ( undef, $t1, undef, $t2 ) = @_;
274 return $t1 + $t2;
275 }
276
277 Value actions are Perl closures used as callbacks. Value actions are
278 called when nodes in a parse tree are evaluated. A value action
279 receives one or more arguments. The first argument to a value action
280 is always a per-parse-tree object, which the callbacks can use as a
281 scratchpad. In these examples, the per-parse-tree object is not used.
282
283 For a non-empty rule, the second and any subsequent arguments to the
284 callback are the values, in lexical order, of the symbols on the right
285 hand side of the rule. If the action is for an empty rule, the per-
286 parse-tree object will be its only argument.
287
288 Every value action is expected to return a value. With one exception,
289 this value is passed up to a parent node as an argument. The exception
290 is the value for the start rule. The return value for the start rule
291 becomes the parse result.
292
293 Rules with no action specified for them take their semantics from the
294 "default_action" named argument. If there is no default action for a
295 grammar, rules with no action specified for them return a Perl "undef".
296
298 This is the same calculator as before, rewritten to be ambiguous.
299 Rather than give multiplication precedence over addition, the rewritten
300 calculator allows any order of operations. In this example, the
301 actions ("My_Actions::do_add", etc.) and the @tokens array remain the
302 same as before.
303
304 Eliminating precedence makes the grammar shorter, but it also means
305 there can be multiple parse trees, and that the different parse trees
306 can have different parse results. In this application we decide, for
307 each input, to return every one of the parse results.
308
309 use Marpa::XS;
310
311 my $ambiguous_grammar = Marpa::XS::Grammar->new(
312 { start => 'E',
313 actions => 'My_Actions',
314 rules => [
315 [ 'E', [qw/E Add E/], 'do_add' ],
316 [ 'E', [qw/E Multiply E/], 'do_multiply' ],
317 [ 'E', [qw/Number/], ],
318 ],
319 default_action => 'first_arg',
320 }
321 );
322
323 $ambiguous_grammar->precompute();
324
325 my $ambiguous_recce =
326 Marpa::XS::Recognizer->new( { grammar => $ambiguous_grammar } );
327
328 $ambiguous_recce->read( 'Number', 42 );
329 $ambiguous_recce->read( 'Multiply', );
330 $ambiguous_recce->read( 'Number', 1 );
331 $ambiguous_recce->read( 'Add', );
332 $ambiguous_recce->read( 'Number', 7 );
333
334 my @values = ();
335 while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
336 push @values, ${$ambiguous_value_ref};
337 }
338
339 Short form rule descriptors
340 rules => [
341 [ 'E', [qw/E Add E/], 'do_add' ],
342 [ 'E', [qw/E Multiply E/], 'do_multiply' ],
343 [ 'E', [qw/Number/], ],
344 ],
345
346 The rule descriptors in the ambiguous example demonstrate the "short"
347 or array form of rule descriptors. Array form rule descriptors are
348 references to arrays. Here the elements are, in order, the "lhs"
349 property, the "rhs" property, and the "action" property.
350
351 Marpa::XS::Recognizer::value
352 my @values = ();
353 while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value() ) ) {
354 push @values, ${$ambiguous_value_ref};
355 }
356
357 When called more than once, the "Marpa::XS::Recognizer::value" method
358 iterates through the parse results. For each call, it returns a
359 reference to the parse result. At the end of the iteration, after all
360 parse results have been returned, "Marpa::XS::Recognizer::value"
361 returns "undef". If there were no parse results,
362 "Marpa::XS::Recognizer::value" returns "undef" the first time that it
363 is called.
364
366 Methods in the Marpa API do not return errors. When there are errors,
367 Marpa API methods throw an exception.
368
370 Classes in the Marpa API are not designed to be inherited.
371
373 This document gives a semi-tutorial overview of the entire Marpa API.
374 For full details on Marpa's grammar objects and their methods, see the
375 Marpa::XS::Grammar document. For full details on Marpa's recognizer
376 objects and their methods, see the Marpa::XS::Recognizer document.
377
378 Marpa::XS::Vocabulary is intended as a quick refresher in parsing
379 terminology, emphasizing how the standard terms are used in the Marpa
380 context. Marpa's standard semantics are fully described in the
381 Marpa::XS::Semantics document. Techniques for tracing and for
382 debugging your Marpa grammars are described in the Marpa::XS::Tracing
383 document and the Marpa::XS::Debug document. For those with a
384 theoretical bent, my sources, and other useful references, are
385 described in Marpa::XS::Advanced::Bibliography.
386
388 Jeffrey Kegler
389
390 Why is it called "Marpa"?
391 Marpa is the name of the greatest of the Tibetan "translators". In his
392 time (the 11th century AD) Indian Buddhism was at its height. Marpa's
393 generation of scholars was devoted to producing Tibetan versions of
394 Buddhism's Sanskrit scriptures. Marpa became the greatest of them, and
395 today is known as Marpa Lotsawa: "Marpa the Translator".
396
397 Blatant plug
398 Marpa is a character in my novel, The God Proof. The God Proof centers
399 around Kurt Goedel's proof of God's existence. Yes, that Kurt Goedel,
400 and yes, he really did work out a God Proof (it's in his Collected
401 Works, Vol. 3, pp. 403-404). The God Proof is available as a free
402 download (<http://www.lulu.com/content/933192>). It can be purchased
403 in print form at Amazon.com:
404 <http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355>.
405
407 Marpa is directly derived from two other parsers. The first was
408 discovered by John Aycock and R. Nigel Horspool and is described in
409 their Aycock and Horspool 2002. The second was described by Joop Leo
410 and is described in Leo 1991. Aycock, Horspool, and Leo, in turn,
411 based their algorithms on the algorithm discovered by Jay Earley. I
412 combined the Aycock-Horspool algorithm with the Leo algorithm, and
413 added significant changes of my own.
414
415 I'm grateful to Randal Schwartz for his support over the years that
416 I've been working on Marpa. My chats with Larry Wall have been few and
417 brief, but his openness to new ideas has been a major encouragement and
418 his insight into the relationship between "natural language" and
419 computer language has been a major influence. More recently, Allison
420 Randal and Patrick Michaud have been generous with their very valuable
421 time. They might have preferred that I volunteered as a Parrot cage-
422 cleaner, but if so, they were too polite to say.
423
424 Many at perlmonks.org answered questions for me. I used answers from
425 chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among
426 others, in writing this module. I'm just as grateful to those whose
427 answers I didn't use. My inquiries were made while I was thinking out
428 the code and it wasn't always 100% clear what I was after. If the butt
429 is moved after the round, it shouldn't count against the archer.
430
431 In writing the Pure Perl version of Marpa, I benefited from studying
432 the work of Francois Desarmenien ("Parse::Yapp"), Damian Conway
433 ("Parse::RecDescent") and Graham Barr ("Scalar::Util"). Adam Kennedy
434 patiently instructed me in module writing, both on the finer points and
435 on issues about which I really should have know better.
436
438 Marpa::XS comes without warranty. Support is provided on a volunteer
439 basis through the standard mechanisms for CPAN modules. The Support
440 document has details.
441
443 Copyright 2012 Jeffrey Kegler
444 This file is part of Marpa::XS. Marpa::XS is free software: you can
445 redistribute it and/or modify it under the terms of the GNU Lesser
446 General Public License as published by the Free Software Foundation,
447 either version 3 of the License, or (at your option) any later version.
448
449 Marpa::XS is distributed in the hope that it will be useful,
450 but WITHOUT ANY WARRANTY; without even the implied warranty of
451 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
452 Lesser General Public License for more details.
453
454 You should have received a copy of the GNU Lesser
455 General Public License along with Marpa::XS. If not, see
456 http://www.gnu.org/licenses/.
457
458
459
460perl v5.34.0 2022-01-21 Marpa::XS(3)