Parse::Yapp(3)        User Contributed Perl Documentation       Parse::Yapp(3)

NAME
Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS
    yapp -m MyParser grammar_file.yp

    ...

    use MyParser;

    $parser = MyParser->new();
    $value  = $parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

    $nberr = $parser->YYNberr();

    $parser->YYData->{DATA} = [ 'Anything', 'You Want' ];

    $data = $parser->YYData->{DATA}[0];

DESCRIPTION
Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
modules that lets you generate and use yacc-like, thread-safe (reentrant)
parsers with a Perl object-oriented interface.

The yapp script is a front end to the Parse::Yapp module and lets you
easily create a Perl OO parser from an input grammar file.

The Grammar file
"Comments"
Throughout your grammar files, comments are either Perl style,
introduced by # and running to the end of the line, or C style,
enclosed between /* and */.

"Tokens and string literals"
Two kinds of symbols may appear in a grammar file: non-terminal
symbols, also called left-hand-side symbols, which are the names of
your rules, and terminal symbols, also called tokens.

Tokens are the symbols your lexer function feeds to your parser
(see below). They come in two flavours: symbolic tokens and string
literals.

Non-terminals and symbolic tokens share the same identifier syntax:

    [A-Za-z][A-Za-z0-9_]*

String literals are enclosed in single quotes and can contain
almost anything. They are written to your parser file double-quoted,
so any special character is interpreted as such. '"', '$' and '@' are
automatically escaped with '\', which makes them more natural to
write. On the other hand, if you need a single quote inside your
literal, just escape it with '\'.

You cannot have a literal 'error' in your grammar, as it would
confuse the driver with the error token. Use a symbolic token
instead. If you inadvertently use it, yapp issues a warning telling
you that you should have written it as error, and treats it as if it
were the error token, which is certainly NOT what you meant.

"Grammar file syntax"
It is very close to yacc syntax (in fact, Parse::Yapp should
compile a clean yacc grammar without any modification, whereas the
opposite is not true).

The file is divided into three sections, separated by "%%":

    header section
    %%
    rules section
    %%
    footer section

The Header Section may optionally contain:
* One or more code blocks enclosed inside "%{" and "%}", just like
  in yacc. They may contain any valid Perl code and are copied
  verbatim to the very beginning of the parser module. They are
  not as useful as they are in yacc, but you can use them, for
  example, for global variable declarations, though you will
  notice later that such global variables can be avoided to make
  a reentrant parser module.

* Precedence declarations, introduced by %left, %right and
  %nonassoc to specify associativity, followed by the list of
  tokens or literals having the same precedence and
  associativity. The later a precedence is declared, the higher
  its level. (See the yacc or bison manuals for a full
  explanation of how they work, as they are implemented exactly
  the same way in Parse::Yapp.)

* %start followed by a rule's left-hand side, declaring this rule
  to be the starting rule of your grammar. The default, when
  %start is not used, is the first rule in your grammar section.

* %token followed by a list of symbols, forcing them to be
  recognized as tokens and generating a syntax error if they are
  used as the left-hand side of a rule declaration. Note that in
  Parse::Yapp you don't need to declare tokens as in yacc: any
  symbol not appearing as the left-hand side of a rule is
  considered to be a token. Other yacc declarations or
  constructs such as %type and %union are parsed but (almost)
  ignored.

* %expect followed by a number suppresses warnings about the
  number of shift/reduce conflicts when both numbers match, a la
  bison.

The Rule Section contains your grammar rules:
A rule is made of a left-hand-side symbol, followed by a ':'
and one or more right-hand sides separated by '|' and
terminated by a ';':

    exp:    exp '+' exp
        |   exp '-' exp
        ;

A right-hand side may be empty:

    input:  #empty
        |   input line
        ;

(If you have more than one empty rhs, Parse::Yapp issues a
warning, as this is usually a mistake, and you will certainly
have a reduce/reduce conflict.)

A rhs may be followed by an optional %prec directive, followed
by a token, giving the rule an explicit precedence (see the yacc
manuals for its precise meaning), and by an optional semantic
action code block (see below).

    exp:    '-' exp %prec NEG   { -$_[2] }
        |   exp '+' exp         { $_[1] + $_[3] }
        |   NUM
        ;

Note that in Parse::Yapp, a lhs cannot appear more than once as
a rule name (this differs from yacc).

"The footer section"
may contain any valid Perl code and is appended to the very end
of your parser module. Here you can write your lexer, error
reporting subs and anything else relevant to your parser.

"Semantic actions"
Semantic actions are run every time a reduction occurs in the
parsing flow, and they must return a semantic value.

They are usually (but see "In rule actions" below) written at
the very end of the rhs, enclosed in "{ }", and are copied
verbatim to your parser file, inside the rules table.

Be aware that matching braces in Perl is much more difficult
than in C: inside strings they don't need to match. While in C
it is very easy to detect the beginning of a string construct
or of a single character, it is much more difficult in Perl, as
there are so many ways of writing such literals. So there is no
check for that today. If you need a brace in a double-quoted
string, just escape it ("\{" or "\}"). For single-quoted
strings, you will need to add a comment that matches it in the
right order. Sorry for the inconvenience.

    {
        "{ My string block }".
        "\{ My other string block \}".
        qq/ My unmatched brace \} /.
        # Force the match: {
        q/ for my closing brace } /.
        q/ My opening brace { /
        # must be closed: }
    }

All of these constructs should work.

In Parse::Yapp, semantic actions are called like normal Perl
sub calls, with their arguments passed in @_, and their
semantic values are their return values.

$_[1] to $_[n] are the parameters, just as $1 to $n are in yacc,
while $_[0] is the parser object itself.

Having $_[0] be the parser object itself allows you to call
parser methods. That's how the yacc macros are implemented:

    yyerrok is done by calling $_[0]->YYErrok
    YYERROR is done by calling $_[0]->YYError
    YYACCEPT is done by calling $_[0]->YYAccept
    YYABORT is done by calling $_[0]->YYAbort

All those methods explicitly return undef, for convenience.

    YYRECOVERING is done by calling $_[0]->YYRecovering

Four methods are useful in an error recovery sub:

    $_[0]->YYCurtok
    $_[0]->YYCurval
    $_[0]->YYExpect
    $_[0]->YYLexer

They return, respectively, the current input token that made the
parse fail, its semantic value (both can also be used to modify
their values, but know what you are doing! See the "Error
reporting routine" section for an example), a list containing
the tokens the parser expected when the failure occurred, and a
reference to the lexer routine.

Note that if "$_[0]->YYCurtok" is declared as a %nonassoc
token, it can be included in the "$_[0]->YYExpect" list whenever
the input tries to use it in an associative way. This is not a
bug: the token IS expected, in order to report an error if it is
encountered.

To detect such a case in your error reporting sub, the
following example should do the trick:

    grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
        and do {
            # Non-associative token used in an associative expression
        };

Semantic values to the left of the rule being reduced are accessed
through the method

    $_[0]->YYSemval( index )

where index is an integer. Values 1 .. n return the same values as
$_[1] .. $_[n], while -n .. 0 return values to the left of the rule
being reduced. (This is related to $-n .. $0 .. $n in yacc, but you
cannot use $_[0] or $_[-n] constructs in Parse::Yapp, for obvious
reasons.)

There is also a provision for a user data area in the parser
object, accessed by the method:

    $_[0]->YYData

which returns a reference to an anonymous hash, which lets you
keep all of your parsing data inside the object (see the
Calc.yp or ParseYapp.yp files in the distribution for some
examples). That's how you can make your parser module
reentrant: all of your module's state and variables are held
inside the parser object.

Note: unfortunately, method calls in Perl have a lot of
overhead, and when YYData is used, it may be called a huge
number of times. If you are not a *real* purist and efficiency
is your concern, you may directly access the user space in the
object, $parser->{USER}, which is a reference to an anonymous
hash, and then benchmark.

If no action is specified for a rule, the equivalent of a
default action is run, which returns the first parameter:

    { $_[1] }

"In rule actions"
It is also possible to embed semantic actions inside a rule:

    typedef:    TYPE { $type = $_[1] } identlist { ... } ;

When Parse::Yapp's parser encounters such an embedded
action, it modifies the grammar as if you had written (although
@x-1 is not a legal lhs value):

    @x-1:       /* empty */ { $type = $_[1] };
    typedef:    TYPE @x-1 identlist { ... } ;

where x is a sequential number incremented for each "in rule"
action, and -1 represents the "dot position" in the rule where
the action arises.

In such actions, you can use the $_[1]..$_[n] variables, which are
the semantic values to the left of your action.

Be aware that the way Parse::Yapp modifies your grammar because
of in rule actions can, in some cases, produce spurious
conflicts that wouldn't happen otherwise.

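Putting the three sections together, a complete grammar file might look
like the following sketch. The file name Expr.yp, the NUM and NEG token
names and the choice of operators are assumptions made for illustration;
they are not taken from the distribution.

```yacc
# Expr.yp (hypothetical file name)
%{
# Header code block: copied verbatim to the top of the generated module.
%}

%left   '+' '-'
%left   '*' '/'
%left   NEG         # declared last, so it has the highest precedence

%start  exp

%%

exp:    NUM                                     # default action: { $_[1] }
    |   exp '+' exp         { $_[1] + $_[3] }
    |   exp '-' exp         { $_[1] - $_[3] }
    |   exp '*' exp         { $_[1] * $_[3] }
    |   exp '/' exp         { $_[1] / $_[3] }
    |   '-' exp %prec NEG   { -$_[2] }
    |   '(' exp ')'         { $_[2] }
    ;

%%

# Footer: the lexer sub, the error reporting sub and any helpers go here.
```

NEG is never returned by the lexer; it exists only to give the unary
minus rule a precedence through %prec.
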
"Generating the Parser Module"
Now that your grammar file is written, you can use yapp on it to
generate your parser module:

    yapp -v Calc.yp

will create two files: Calc.pm, your parser module, and
Calc.output, a verbose output of your parser rules, conflicts,
warnings, states and summary.

What you are missing now is a lexer routine.

"The Lexer sub"
is called each time the parser needs to read the next token.

It is called with a single argument, the parser object itself,
so you can access its methods, especially the

    $_[0]->YYData

data area.

Its duty is to return the next token and its value to the
parser. They "must" be returned as a list of two values: the
first is the token as known by the parser (symbolic or
literal), the second can be anything you want (usually the
content of the token, or the literal value), from a simple
scalar value to any complex reference, as the parsing driver
never uses it except to pass it to semantic actions:

    ( 'NUMBER', $num )
or
    ( '>=', '>=' )
or
    ( 'ARRAY', [ @values ] )

When the lexer reaches the end of input, it must return the ''
empty token with an undef value:

    ( '', undef )

Note that your lexer should never return 'error' as a token
value: for the driver, this is the error token used for error
recovery, and returning it would lead to odd reactions.

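As an illustration of the contract described above, here is a minimal
lexer sketch. It assumes the text to parse was stored in
YYData->{INPUT} beforehand and that the grammar uses a symbolic token
named NUM; both names are assumptions made for this example. The
StubParser package is also hypothetical: it only stands in for a
generated parser so the sub can be tried without Parse::Yapp installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser: it only provides YYData.
package StubParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]->{USER} }

package main;

# Lexer sub: receives the parser object, returns a (token, value) pair.
sub lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;

    # End of input: the empty token with an undef value.
    return ('', undef)
        unless defined $data->{INPUT} && $data->{INPUT} =~ /\S/;

    for ($data->{INPUT}) {
        s/^\s+//;                                    # skip leading whitespace
        return ('NUM', $1) if s/^(\d+(?:\.\d+)?)//;  # symbolic token
        return ($1, $1)    if s/^(.)//s;             # any other char: literal
    }
    return ('', undef);
}

# Drive the lexer by hand, the way the parser would.
my $parser = StubParser->new;
$parser->YYData->{INPUT} = "12 + 3";

my @tokens;
while (1) {
    my ($tok, $val) = lexer($parser);
    last if $tok eq '';
    push @tokens, [ $tok, $val ];
}
```

With a real generated parser, the same sub would be passed as
yylex => \&lexer after filling YYData->{INPUT}.
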
Now that you have written your lexer, you may need to
output meaningful error messages instead of the default, which
is to print 'Parse error.' on STDERR.

So you will need an error reporting sub.

"Error reporting routine"
If you want one, write it knowing that it is passed the parser
object as its parameter, so you can share information with
the lexer routine quite easily.

You can also use the "$_[0]->YYErrok" method in it, which
resumes parsing as if no error had occurred. Of course, since the
invalid token is still invalid, you're supposed to fix the
problem yourself.

The method "$_[0]->YYLexer" may help you, as it returns a
reference to the lexer routine, which can be called as

    ($tok, $val) = &{$_[0]->YYLexer}

to get the next token and semantic value from the input stream.
To make them current for the parser, use:

    ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

and know what you're doing...

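A possible shape for such a routine is sketched below. It only uses
the YYCurtok, YYCurval and YYExpect accessors described earlier; the
message wording, the error_message and error_sub names and the
StubParser package (a stand-in returning canned values, so the code
can be tried without Parse::Yapp installed) are assumptions made for
this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser, with canned accessor values.
package StubParser;
sub new      { my ($class, %args) = @_; return bless { %args }, $class }
sub YYCurtok { return $_[0]{tok} }
sub YYCurval { return $_[0]{val} }
sub YYExpect { return @{ $_[0]{expect} } }

package main;

# Build a message from the offending token and the expected tokens.
sub error_message {
    my ($parser) = @_;
    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = $parser->YYExpect;
    @expected    = sort @expected;

    my $found = (defined $tok && $tok ne '')
        ? "token $tok" . (defined $val ? " ('$val')" : '')
        : 'end of input';
    my $wanted = join ', ',
        map { $_ eq '' ? 'end of input' : $_ } @expected;

    return "Syntax error: found $found, expected one of: $wanted";
}

# The sub you would pass as yyerror: it gets the parser object in $_[0].
sub error_sub { print STDERR error_message($_[0]), "\n" }

my $msg = error_message(
    StubParser->new(tok => 'NUM', val => '42', expect => [ '-', '+' ])
);
```

Keeping the message construction separate from the printing makes the
routine easy to check in isolation.
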
"Parsing"
Now you've got everything you need to do the parsing.

First, use the parser module:

    use Calc;

Then create the parser object:

    $parser = Calc->new;

Now call the YYParse method, telling it where to find the
lexer and error reporting subs:

    $result = $parser->YYParse(yylex   => \&Lexer,
                               yyerror => \&ErrorReport);

(assuming the Lexer and ErrorReport subs have been written in
your current package)

The order in which the parameters appear is unimportant.

Et voila.

The YYParse method does the parse, then returns the last
semantic value returned, or undef if error recovery cannot
recover.

If you need to be sure the parse was successful (in case
your last returned semantic value is undef), make a call to:

    $parser->YYNberr()

which returns the total number of times the error reporting sub
has been called.

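Because undef can be a legitimate result, it may help to report success
separately from the returned value. The sketch below wraps the two calls
described above in a helper; run_parse is a hypothetical name, and
StubParser is a stand-in for a generated parser (it returns canned
values) so the snippet can be tried without Parse::Yapp installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser, with canned results.
package StubParser;
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub YYParse { return $_[0]{value} }
sub YYNberr { return $_[0]{nberr} }

package main;

# Hypothetical helper: run the parse, then decide success from the
# error count rather than from the semantic value.
sub run_parse {
    my ($parser, %args) = @_;
    my $value  = $parser->YYParse(%args);
    my $errors = $parser->YYNberr();
    return ($errors == 0, $value, $errors);
}

# A parse whose last semantic value is undef but which had no errors.
my ($ok, $value, $errors) = run_parse(
    StubParser->new(value => undef, nberr => 0),
    yylex   => sub { return ('', undef) },
    yyerror => sub { },
);
print $ok ? "parse succeeded\n" : "parse failed with $errors error(s)\n";
```
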
"Error Recovery"
in Parse::Yapp is implemented the same way it is in yacc.

"Debugging Parser"
To debug your parser, you can call the YYParse method with a
debug parameter:

    $parser->YYParse( ... , yydebug => value, ... )

where value is a bitfield, each bit representing a specific
debug output:

    Bit Value    Outputs
    0x01         Token reading (useful for Lexer debugging)
    0x02         States information
    0x04         Driver actions (shifts, reduces, accept...)
    0x08         Parse Stack dump
    0x10         Error Recovery tracing

To get full debugging output, use

    yydebug => 0x1F

Debugging output is sent to STDERR; be aware that it can
produce "huge" outputs.

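Since the value is a bitfield, individual outputs can be combined with
a bitwise OR. In the sketch below only the bit values come from the
table above; the constant names are hypothetical and chosen for
readability.

```perl
use strict;
use warnings;

# Hypothetical names for the debug bits listed in the table above.
use constant {
    DBG_TOKENS   => 0x01,   # token reading
    DBG_STATES   => 0x02,   # states information
    DBG_ACTIONS  => 0x04,   # driver actions
    DBG_STACK    => 0x08,   # parse stack dump
    DBG_RECOVERY => 0x10,   # error recovery tracing
};

# Trace only what the lexer returns and what the driver does with it.
my $lexer_and_actions = DBG_TOKENS | DBG_ACTIONS;

# Everything at once, equivalent to 0x1F.
my $everything = DBG_TOKENS | DBG_STATES | DBG_ACTIONS
               | DBG_STACK  | DBG_RECOVERY;

# With a real parser object, the value is passed as the yydebug parameter:
# $parser->YYParse(yylex => \&Lexer, yyerror => \&ErrorReport,
#                  yydebug => $lexer_and_actions);
```
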
"Standalone Parsers"
By default, the generated parser modules need the
Parse::Yapp module installed on the system to run. They use the
Parse::Yapp::Driver, which can be safely shared between parsers
in the same script.

If you'd prefer to have a standalone module generated,
use the "-s" switch with yapp: this automagically copies the
driver code into your module so you can use and distribute it
without needing the Parse::Yapp module, making it truly a
"Standalone Parser".

If you do so, please remember to include Parse::Yapp's
copyright notice in your main module's copyright, so others can
know about the Parse::Yapp module.

"Source file line numbers"
are included in the generated parser module by default,
which helps you find the guilty line in your source file in
case of a syntax error. You can disable this feature by
compiling your grammar with yapp using the "-n" switch.

BUGS AND SUGGESTIONS
If you find bugs, think of anything that could improve Parse::Yapp or
have any questions related to it, feel free to contact the author.

AUTHOR
Francois Desarmenien <francois@fdesar.net>

SEE ALSO
yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT
The Parse::Yapp module and its related modules and shell scripts are
copyright (c) 1998-2001 Francois Desarmenien, France. All rights
reserved.

You may use and distribute them under the terms of either the GNU
General Public License or the Artistic License, as specified in the
Perl README file.

If you use the "standalone parser" option so people don't need to
install Parse::Yapp on their systems in order to run your software, this
copyright notice should be included in your software's copyright too,
and the copyright notice in the embedded driver should be left
untouched.

perl v5.10.1                      2001-02-11                    Parse::Yapp(3)