1Parse::Yapp(3) User Contributed Perl Documentation Parse::Yapp(3)
2
3
4
6 Parse::Yapp - Perl extension for generating and using LALR parsers.
7
9 yapp -m MyParser grammar_file.yp
10
11 ...
12
13 use MyParser;
14
15 $parser=new MyParser();
16 $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);
17
18 $nberr=$parser->YYNberr();
19
20 $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];
21
22 $data=$parser->YYData->{DATA}[0];
23
25 Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
26 modules that let you generate and use yacc-like, thread-safe (reentrant)
27 parsers with a Perl object-oriented interface.
28
29 The script yapp is a front-end to the Parse::Yapp module and lets you
30 easily create a Perl OO parser from an input grammar file.
31
32 The Grammar file
33 "Comments"
34 Throughout your grammar files, comments are either Perl style,
35 introduced by # and running to the end of the line, or C style,
36 enclosed between /* and */.
37
38 "Tokens and string literals"
39 Throughout the grammar files, two kinds of symbols may appear:
40 non-terminal symbols, also called left-hand-side symbols, which are
41 the names of your rules, and terminal symbols, also called tokens.
42
43 Tokens are the symbols your lexer function will feed your parser
44 with (see below). They are of two flavours: symbolic tokens and
45 string literals.
46
47 Non-terminals and symbolic tokens share the same identifier syntax:
48
49 [A-Za-z][A-Za-z0-9_]*
50
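As a minimal sketch, the identifier pattern above can be tried out in plain Perl. The candidate names below are hypothetical and only illustrate which spellings the pattern accepts; the anchors \A and \z are added here so that the whole name must match.

```perl
use strict;
use warnings;

# Anchored form of the identifier pattern given above.
my $identifier = qr/\A[A-Za-z][A-Za-z0-9_]*\z/;

# Hypothetical candidate names, not taken from any real grammar.
my @candidates = ('exp', 'NUM', 'line_2', '2nd', '_tmp', 'a-b');

# Split the candidates into names the pattern accepts and rejects.
my @valid   = grep { $_ =~ $identifier } @candidates;
my @invalid = grep { $_ !~ $identifier } @candidates;

print "valid: @valid\n";
print "invalid: @invalid\n";
```

Names that start with a digit or an underscore, or that contain a dash, are rejected by this pattern.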
51 String literals are enclosed in single quotes and can contain
52 almost anything. They are written to your parser file double-quoted,
53 so special characters keep their special meaning. '"', '$' and '@'
54 are automatically escaped with '\', which makes literals more natural
55 to write. On the other hand, if you need a single quote inside your
56 literal, just escape it with '\'.
57
58 You cannot have a literal 'error' in your grammar, as it would
59 confuse the driver with the error token. Use a symbolic token
60 instead. If you inadvertently use it, this will produce a warning
61 telling you that you should have written it as error, and it will
62 be treated as if it were the error token, which is certainly NOT
63 what you meant.
64
65 "Grammar file syntax"
66 It is very close to yacc syntax (in fact, Parse::Yapp should
67 compile a clean yacc grammar without any modification, whereas the
68 opposite is not true).
69
70 This file is divided into three sections, separated by "%%":
71
72 header section
73 %%
74 rules section
75 %%
76 footer section
77
78 The Header Section may optionally contain:
79 * One or more code blocks enclosed inside "%{" and "%}" just like
80 in yacc. They may contain any valid Perl code and will be
81 copied verbatim at the very beginning of the parser module.
82 They are not as useful as they are in yacc, but you can use
83 them, for example, for global variable declarations, though you
84 will notice later that such global variables can be avoided to
85 make a reentrant parser module.
86
87 * Precedence declarations, introduced by %left, %right and
88 %nonassoc specifying associativity, followed by the list of
89 tokens or literals having the same precedence and
90 associativity. The later a precedence is declared, the higher
91 its level. (See the yacc or bison manuals for a full
92 explanation of how they work, as they are implemented exactly
93 the same way in Parse::Yapp.)
94
95 * %start followed by a rule's left hand side, declaring this rule
96 to be the starting rule of your grammar. The default, when
97 %start is not used, is the first rule in your grammar section.
98
99 * %token followed by a list of symbols, forcing them to be
100 recognized as tokens, generating a syntax error if used in the
101 left hand side of a rule declaration. Note that in
102 Parse::Yapp, you don't need to declare tokens as in yacc: any
103 symbol not appearing as a left hand side of a rule is
104 considered to be a token. Other yacc declarations or
105 constructs such as %type and %union are parsed but (almost)
106 ignored.
107
108 * %expect followed by a number, which suppresses warnings about the
109 number of shift/reduce conflicts when both numbers match, as in bison.
110
111 The Rule Section contains your grammar rules:
112 A rule is made of a left-hand-side symbol, followed by a ':' and
113 one or more right-hand-sides separated by '|' and terminated by a
114 ';':
115
116 exp: exp '+' exp
117 | exp '-' exp
118 ;
119
120 A right hand side may be empty:
121
122 input: #empty
123 | input line
124 ;
125
126 (if you have more than one empty rhs, Parse::Yapp will issue a
127 warning, as this is usually a mistake, and you will certainly have
128 a reduce/reduce conflict)
129
130 A rhs may be followed by an optional %prec directive and a token,
131 giving the rule an explicit precedence (see the yacc manuals for
132 its precise meaning), and by an optional semantic action code block
133 (see below).
134
135 exp: '-' exp %prec NEG { -$_[2] }
136 | exp '+' exp { $_[1] + $_[3] }
137 | NUM
138 ;
139
140 Note that in Parse::Yapp, a lhs cannot appear more than once as a
141 rule name (This differs from yacc).
142
143 "The footer section"
144 may contain any valid Perl code and will be appended at the very
145 end of your parser module. Here you can write your lexer, error
146 report subs and anything relevant to your parser.
147
148 "Semantic actions"
149 Semantic actions are run every time a reduction occurs in the
150 parsing flow and they must return a semantic value.
151
152 They are (usually, but see below "In rule actions") written at the
153 very end of the rhs, enclosed with "{ }", and are copied verbatim
154 to your parser file, inside of the rules table.
155
156 Be aware that matching braces in Perl is much more difficult than
157 in C: inside strings they don't need to match. While in C it is
158 very easy to detect the beginning of a string construct, or a
159 single character, it is much more difficult in Perl, as there are
160 so many ways of writing such literals. So there is no check for
161 that today. If you need a brace in a double-quoted string, just
162 quote it ("\{" or "\}"). For single-quoted strings, you will need
163 to make a comment matching it in the right order. Sorry for the
164 inconvenience.
165
166 {
167 "{ My string block }".
168 "\{ My other string block \}".
169 qq/ My unmatched brace \} /.
170 # Force the match: {
171 q/ for my closing brace } /.
172 q/ My opening brace { /
173 # must be closed: }
174 }
175
176 All of these constructs should work.
177
178 In Parse::Yapp, semantic actions are called like normal Perl sub
179 calls, with their arguments passed in @_, and their semantic values
180 are their return values.
181
182 $_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
183 $_[0] is the parser object itself.
184
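The calling convention can be sketched in plain Perl, without a generated parser. The $parser placeholder below is a hypothetical stand-in for the real parser object, and the parenthesized rule is an assumed example; only the layout of @_ is the point here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object; a real one comes from
# the module that yapp generates.
my $parser = {};

# Action body for the rule:  exp: exp '+' exp { $_[1] + $_[3] }
my $add = sub { $_[1] + $_[3] };

# Action body for an assumed rule:  exp: '(' exp ')' { $_[2] }
my $paren = sub { $_[2] };

# The parser object comes first, then one semantic value per rhs symbol.
my $sum   = $add->($parser, 2, '+', 3);
my $inner = $paren->($parser, '(', 42, ')');

print "sum=$sum inner=$inner\n";
```

Note that literal tokens such as '+' occupy a slot in @_ too, which is why the second operand is $_[3] and not $_[2].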
185 Having $_[0] be the parser object itself allows you to call
186 parser methods. That's how the yacc macros are implemented:
187
188 yyerrok is done by calling $_[0]->YYErrok
189 YYERROR is done by calling $_[0]->YYError
190 YYACCEPT is done by calling $_[0]->YYAccept
191 YYABORT is done by calling $_[0]->YYAbort
192
193 All those methods explicitly return undef, for convenience.
194
195 YYRECOVERING is done by calling $_[0]->YYRecovering
196
197 Four methods are useful in an error recovery sub:
198
199 $_[0]->YYCurtok
200 $_[0]->YYCurval
201 $_[0]->YYExpect
202 $_[0]->YYLexer
203
204 They return, respectively, the current input token that made the
205 parse fail, its semantic value (both can also be used to modify
206 these values, but know what you are doing! See the Error reporting
207 routine section for an example), a list of the tokens the parser
208 expected when the failure occurred, and a reference to the lexer
209 routine.
210
211 Note that if "$_[0]->YYCurtok" is declared as a %nonassoc token, it
212 can be included in the "$_[0]->YYExpect" list whenever the input tries
213 to use it in an associative way. This is not a bug: the token IS
214 expected to report an error if encountered.
215
216 To detect such a thing in your error reporting sub, the following
217 example should do the trick:
218
219 grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
220 and do {
221 #Non-associative token used in an associative expression
222 };
223
224 Accessing semantic values on the left of your reducing rule is
225 done through the method
226
227 $_[0]->YYSemval( index )
228
229 where index is an integer. Values 1 .. n return the same values as
230 $_[1] .. $_[n], but -n .. 0 return values on the left of the rule
231 being reduced. (This is related to $-n .. $0 .. $n in yacc, but you
232 cannot use $_[0] or $_[-n] constructs in Parse::Yapp for obvious
233 reasons.)
234
235 There is also a provision for a user data area in the parser
236 object, accessed by the method:
237
238 $_[0]->YYData
239
240 which returns a reference to an anonymous hash, which lets you hold
241 all of your parsing data inside the object (see the Calc.yp or
242 ParseYapp.yp files in the distribution for some examples). That's
243 how you can make your parser module reentrant: all of your module
244 states and variables are held inside the parser object.
245
246 Note: unfortunately, method calls in Perl have a lot of overhead,
247 and when YYData is used, it may be called a huge number
248 of times. If you are not a *real* purist and efficiency
249 is your concern, you may directly access the user space
250 in the object, $parser->{USER}, which is a reference to an
251 anonymous hash, and then benchmark.
252
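The idea of keeping state in the object instead of in globals can be sketched as follows. MockParser is a hypothetical stand-in that mimics only the YYData method and the USER slot described above; it is not the real driver, and the VARS key is an arbitrary name chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics only the YYData interface described
# above; a real parser object comes from the yapp-generated module.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]->{USER} }

package main;

my $p1 = MockParser->new;
my $p2 = MockParser->new;

# Each object carries its own state, so no global variables are needed.
$p1->YYData->{VARS}{x} = 1;
$p2->YYData->{VARS}{x} = 2;

# Direct hash access reaches the same hash as the method call does.
my $same_hash = ($p1->YYData == $p1->{USER}) ? 1 : 0;

printf "p1.x=%d p2.x=%d same_hash=%d\n",
    $p1->YYData->{VARS}{x}, $p2->{USER}{VARS}{x}, $same_hash;
```

Because the two objects never share a hash, two parses can be in progress at the same time without disturbing each other.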
253 If no action is specified for a rule, the equivalent of a default
254 action is run, which returns the first parameter:
255
256 { $_[1] }
257
258 "In rule actions"
259 It is also possible to embed semantic actions inside of a rule:
260
261 typedef: TYPE { $type = $_[1] } identlist { ... } ;
262
263 When Parse::Yapp's parser encounters such an embedded action, it
264 modifies the grammar as if you wrote (although @x-1 is not a legal
265 lhs value):
266
267 @x-1: /* empty */ { $type = $_[1] };
268 typedef: TYPE @x-1 identlist { ... } ;
269
270 where x is a sequential number incremented for each "in rule"
271 action, and -1 represents the "dot position" in the rule where the
272 action arises.
273
274 In such actions, you can use $_[1]..$_[n] variables, which are the
275 semantic values on the left of your action.
276
277 Be aware that the way Parse::Yapp modifies your grammar because of
278 in rule actions can produce, in some cases, spurious conflicts that
279 wouldn't happen otherwise.
280
281 "Generating the Parser Module"
282 Now that your grammar file is written, you can use yapp on it to
283 generate your parser module:
284
285 yapp -v Calc.yp
286
287 will create two files: Calc.pm, your parser module, and Calc.output,
288 a verbose output of your parser rules, conflicts, warnings, states
289 and summary.
290
291 What you are missing now is a lexer routine.
292
293 "The Lexer sub"
294 is called each time the parser needs to read the next token.
295
296 It is called with only one argument, the parser object itself, so
297 you can access its methods, especially the
298
299 $_[0]->YYData
300
301 data area.
302
303 It is its duty to return the next token and value to the parser.
304 They "must" be returned as a list of two values: the first is the
305 token as known by the parser (symbolic or literal), the second can
306 be anything you want (usually the content of the token, or the
307 literal value), from a simple scalar value to any complex
308 reference, as the parsing driver never uses it except to pass it to
309 semantic actions:
310
311 ( 'NUMBER', $num )
312 or
313 ( '>=', '>=' )
314 or
315 ( 'ARRAY', [ @values ] )
316
317 When the lexer reaches the end of input, it must return the '' empty
318 token with an undef value:
319
320 ( '', undef )
321
322 Note that your lexer should never return 'error' as a token:
323 for the driver, this is the error token used for error recovery and
324 would lead to odd reactions.
325
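A minimal lexer along these lines might look like the sketch below. It assumes the input text has been stored under an INPUT key in the YYData area, and it uses NUM and VAR as symbolic token names; both are choices made for this example. MockParser is a hypothetical stand-in exposing only YYData, so the sub can be exercised without a generated parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in exposing only YYData; a real parser object
# comes from the yapp-generated module.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]->{USER} }

package main;

# Consume text from YYData->{INPUT} and return a (token, value) pair.
sub lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;

    for ($data->{INPUT}) {                  # alias $_ to the input buffer
        s/^[ \t]+//;                        # skip blanks
        return ('', undef) if $_ eq '';     # end of input
        return ('NUM', $1) if s/^([0-9]+(?:\.[0-9]+)?)//;
        return ('VAR', $1) if s/^([A-Za-z][A-Za-z0-9_]*)//;
        return ($1, $1)    if s/^(.)//s;    # anything else is a literal token
    }
}

# Drive the lexer by hand, the way the parser would.
my $parser = MockParser->new;
$parser->YYData->{INPUT} = "x = 12 + 3.5";

my @tokens;
while (1) {
    my ($tok, $val) = lexer($parser);
    last if $tok eq '';
    push @tokens, "$tok:$val";
}
print join(' ', @tokens), "\n";
```

Since the buffer lives in the parser object and not in a global variable, the same lexer sub can serve several parser objects at once.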
326 Now that you have your lexer written, maybe you will need to output
327 meaningful error messages, instead of the default which is to print
328 'Parse error.' on STDERR.
329
330 So you will need an Error reporting sub.
331
332 "Error reporting routine"
333 If you want one, write it knowing that it is passed the parser
334 object as its parameter, so you can share information with the
335 lexer routine quite easily.
336
337 You can also use the "$_[0]->YYErrok" method in it, which will
338 resume parsing as if no error occurred. Of course, since the invalid
339 token is still invalid, you're supposed to fix the problem by
340 yourself.
341
342 The method "$_[0]->YYLexer" may help you, as it returns a reference
343 to the lexer routine, and can be called as
344
345 ($tok,$val)=&{$_[0]->YYLexer}
346
347 to get the next token and semantic value from the input stream. To
348 make them current for the parser, use:
349
350 $_[0]->YYCurtok($tok); $_[0]->YYCurval($val);
351
352 and know what you're doing...
353
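An error reporting sub built on these methods could be sketched as follows. MockParser is a hypothetical stand-in that returns canned values for YYCurtok, YYCurval and YYExpect; the ERRORS key in the YYData area and the wording of the message are likewise choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in returning canned values for the methods an
# error sub typically uses; the real ones are provided by the driver.
package MockParser;
sub new      { my ($class, %args) = @_; return bless { %args, USER => {} }, $class }
sub YYData   { return $_[0]->{USER} }
sub YYCurtok { return $_[0]->{TOKEN} }
sub YYCurval { return $_[0]->{VALUE} }
sub YYExpect { return @{ $_[0]->{EXPECT} } }

package main;

# Build a readable message and keep it in the user data area.
sub error_sub {
    my ($parser) = @_;

    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = sort { $a cmp $b } $parser->YYExpect;

    # The empty token marks the end of input and has no value to show.
    my $found = $tok eq '' ? 'end of input' : "'$val'";
    my $msg   = "Syntax error: found $found, expected one of: @expected";

    push @{ $parser->YYData->{ERRORS} }, $msg;
    warn "$msg\n";
}

my $parser = MockParser->new(
    TOKEN  => 'NUM',
    VALUE  => '42',
    EXPECT => [ ';', '+', '-' ],
);
error_sub($parser);
```

Storing the messages in the parser object lets the calling code inspect them after the parse without relying on STDERR.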
354 "Parsing"
355 Now you've got everything to do the parsing.
356
357 First, use the parser module:
358
359 use Calc;
360
361 Then create the parser object:
362
363 $parser=new Calc;
364
365 Now, call the YYParse method, telling it where to find the lexer
366 and error report subs:
367
368 $result=$parser->YYParse(yylex => \&Lexer,
369 yyerror => \&ErrorReport);
370
371 (assuming Lexer and ErrorReport subs have been written in your
372 current package)
373
374 The order in which parameters appear is unimportant.
375
376 Et voila.
377
378 The YYParse method will do the parse, then return the last semantic
379 value returned, or undef if error recovery cannot recover.
380
381 If you need to be sure the parse has been successful (in case your
382 last returned semantic value is undef) make a call to:
383
384 $parser->YYNberr()
385
386 which returns the total number of times the error reporting sub has
387 been called.
388
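The check described above can be wrapped in a small helper, sketched here. The name parse_checked is hypothetical, and MockParser is a stand-in whose YYParse and YYNberr simply return canned values, so that the helper can be shown without a generated parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in: YYParse and YYNberr return canned values.
package MockParser;
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub YYParse { return $_[0]->{RESULT} }
sub YYNberr { return $_[0]->{NBERR} }

package main;

# Run the parse, then judge success by the error count, not by the
# returned value, which may legitimately be undef.
sub parse_checked {
    my ($parser, @args) = @_;

    my $value  = $parser->YYParse(@args);
    my $errors = $parser->YYNberr;

    return { ok => ($errors == 0 ? 1 : 0), errors => $errors, value => $value };
}

my $clean  = parse_checked(MockParser->new(RESULT => undef, NBERR => 0));
my $failed = parse_checked(MockParser->new(RESULT => undef, NBERR => 2));

printf "clean: ok=%d failed: ok=%d errors=%d\n",
    $clean->{ok}, $failed->{ok}, $failed->{errors};
```

With a real parser, the arguments would be the yylex and yyerror pairs passed through to YYParse unchanged.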
389 "Error Recovery"
390 in Parse::Yapp is implemented the same way it is in yacc.
391
392 "Debugging Parser"
393 To debug your parser, you can call the YYParse method with a debug
394 parameter:
395
396 $parser->YYParse( ... , yydebug => value, ... )
397
398 where value is a bitfield, each bit representing a specific debug
399 output:
400
401 Bit Value Outputs
402 0x01 Token reading (useful for Lexer debugging)
403 0x02 States information
404 0x04 Driver actions (shifts, reduces, accept...)
405 0x08 Parse Stack dump
406 0x10 Error Recovery tracing
407
408 To have full debugging output, use
409
410 yydebug => 0x1F
411
412 Debugging output is sent to STDERR, and be aware that it can
413 produce "huge" outputs.
414
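Since the debug value is a bitfield, individual outputs can be combined with a bitwise OR. The constant names in this sketch are hypothetical labels for the bit values in the table above; Parse::Yapp itself only sees the resulting number.

```perl
use strict;
use warnings;

# Hypothetical names for the bit values listed in the table above.
use constant {
    DEBUG_TOKENS   => 0x01,   # token reading
    DEBUG_STATES   => 0x02,   # states information
    DEBUG_ACTIONS  => 0x04,   # driver actions
    DEBUG_STACK    => 0x08,   # parse stack dump
    DEBUG_RECOVERY => 0x10,   # error recovery tracing
};

# Trace only the lexer and error recovery.
my $lexer_and_recovery = DEBUG_TOKENS | DEBUG_RECOVERY;

# Turn on every output.
my $everything = DEBUG_TOKENS | DEBUG_STATES | DEBUG_ACTIONS
               | DEBUG_STACK  | DEBUG_RECOVERY;

printf "0x%02X 0x%02X\n", $lexer_and_recovery, $everything;
```

Either value would then be handed to YYParse as the yydebug parameter.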
415 "Standalone Parsers"
416 By default, the parser modules generated will need the Parse::Yapp
417 module installed on the system to run. They use the
418 Parse::Yapp::Driver which can be safely shared between parsers in
419 the same script.
420
421 In the case you'd prefer to have a standalone module generated, use
422 the "-s" switch with yapp: this will automagically copy the driver
423 code into your module so you can use/distribute it without the need
424 of the Parse::Yapp module, making it really a "Standalone Parser".
425
426 If you do so, please remember to include Parse::Yapp's copyright
427 notice in your main module copyright, so others can know about
428 the Parse::Yapp module.
429
430 "Source file line numbers"
431 by default will be included in the generated parser module, which
432 will help to find the guilty line in your source file in case of a
433 syntax error. You can disable this feature by compiling your
434 grammar with yapp using the "-n" switch.
435
437 If you find bugs, think of anything that could improve Parse::Yapp or
438 have any questions related to it, feel free to contact the author.
439
441 Francois Desarmenien <francois@fdesar.net>
442
444 yapp(1) perl(1) yacc(1) bison(1).
445
447 The Parse::Yapp module and its related modules and shell scripts are
448 copyright (c) 1998-2001 Francois Desarmenien, France. All rights
449 reserved.
450
451 You may use and distribute them under the terms of either the GNU
452 General Public License or the Artistic License, as specified in the
453 Perl README file.
454
455 If you use the "standalone parser" option so people don't need to
456 install Parse::Yapp on their systems in order to run your software,
457 this copyright notice should be included in your software copyright
458 too, and the copyright notice in the embedded driver should be left
459 untouched.
460
480
481
482perl v5.16.3 2014-06-10 Parse::Yapp(3)