Parse::Yapp(3)        User Contributed Perl Documentation       Parse::Yapp(3)

NAME
       Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS
           yapp -m MyParser grammar_file.yp

           ...

           use MyParser;

           $parser = MyParser->new();
           $value  = $parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

           $nberr = $parser->YYNberr();

           $parser->YYData->{DATA} = [ 'Anything', 'You Want' ];

           $data = $parser->YYData->{DATA}[0];

DESCRIPTION
       Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
       modules that let you generate and use yacc-like, thread-safe
       (reentrant) parsers with a Perl object-oriented interface.

       The script yapp is a front-end to the Parse::Yapp module and lets you
       easily create a Perl OO parser from an input grammar file.

   The Grammar file
       "Comments"
           Throughout your files, comments are either Perl style, introduced
           by # up to the end of line, or C style, enclosed between /* and
           */.

       "Tokens and string literals"
           Throughout the grammar files, two kinds of symbols may appear:
           non-terminal symbols, also called left-hand-side symbols, which
           are the names of your rules, and terminal symbols, also called
           tokens.

           Tokens are the symbols your lexer function feeds to your parser
           (see below). They come in two flavours: symbolic tokens and
           string literals.

           Non-terminals and symbolic tokens share the same identifier
           syntax:

               [A-Za-z][A-Za-z0-9_]*

           String literals are enclosed in single quotes and can contain
           almost anything. They are written to your parser file in double
           quotes, so any special character sequence is interpreted as such.
           '"', '$' and '@' are automatically escaped with '\', which makes
           writing them more natural. On the other hand, if you need a
           single quote inside your literal, just escape it with '\'.
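
           The quoting rule above can be illustrated with a short Perl
           sketch. The quote_literal function is hypothetical and is not
           yapp's own code: it only applies the documented rule (escape '"',
           '$' and '@', then wrap the text in double quotes) and shows that
           evaluating the result gives back the original literal text.

```perl
use strict;
use warnings;

# Hypothetical illustration of the documented rule, not yapp's implementation:
# the body of a single-quoted grammar literal is emitted in double quotes,
# with '"', '$' and '@' escaped so Perl does not interpolate them.
sub quote_literal {
    my ($body) = @_;
    (my $out = $body) =~ s/(["\$\@])/\\$1/g;
    return qq{"$out"};
}

my $emitted = quote_literal(q{a"b$c@d});
print "$emitted\n";      # prints "a\"b\$c\@d"

# Evaluating the emitted Perl string yields the literal's original text.
my $roundtrip = eval $emitted;
print "$roundtrip\n";    # prints a"b$c@d
```

           A grammar literal written as '\'' goes through the same path: its
           body \' is emitted as "\'", which Perl reads as a single quote.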

           You cannot have a literal 'error' in your grammar, as it would
           confuse the driver with the error token. Use a symbolic token
           instead. If you inadvertently use it, a warning tells you that
           you should have written error instead, and the literal is treated
           as if it were the error token, which is certainly NOT what you
           meant.

       "Grammar file syntax"
           It is very close to yacc syntax (in fact, Parse::Yapp should
           compile a clean yacc grammar without any modification, whereas
           the opposite is not true).

           This file is divided into three sections, separated by "%%":

               header section
               %%
               rules section
               %%
               footer section

           The header section may optionally contain:

           •   One or more code blocks enclosed inside "%{" and "%}", just
               like in yacc. They may contain any valid Perl code and will
               be copied verbatim at the very beginning of the parser
               module. They are not as useful as they are in yacc, but you
               can use them, for example, for global variable
               declarations, though you will notice later that such global
               variables can be avoided to make a reentrant parser module.

           •   Precedence declarations, introduced by %left, %right and
               %nonassoc specifying associativity, followed by the list of
               tokens or literals having the same precedence and
               associativity. Declarations that appear later have higher
               precedence. (See the yacc or bison manuals for a full
               explanation of how they work, as they are implemented
               exactly the same way in Parse::Yapp.)

           •   %start followed by a rule's left-hand side, declaring this
               rule to be the starting rule of your grammar. The default,
               when %start is not used, is the first rule in your rules
               section.

           •   %token followed by a list of symbols, forcing them to be
               recognized as tokens and generating a syntax error if one of
               them is used as the left-hand side of a rule declaration.
               Note that in Parse::Yapp, you don't need to declare tokens
               as in yacc: any symbol not appearing as a left-hand side of
               a rule is considered to be a token. Other yacc declarations
               or constructs such as %type and %union are parsed but
               (almost) ignored.

           •   %expect followed by a number suppresses warnings about the
               number of shift/reduce conflicts when both numbers match, à
               la bison.

           The rules section contains your grammar rules. A rule is made of
           a left-hand-side symbol, followed by a ':' and one or more
           right-hand sides separated by '|' and terminated by a ';':

               exp:    exp '+' exp
                   |   exp '-' exp
                   ;

           A right-hand side may be empty:

               input:  #empty
                   |   input line
                   ;

           (If you have more than one empty rhs, Parse::Yapp will issue a
           warning, as this is usually a mistake, and you will certainly get
           a reduce/reduce conflict.)

           A rhs may be followed by an optional %prec directive, followed by
           a token, giving the rule an explicit precedence (see the yacc
           manuals for its precise meaning), and by an optional semantic
           action code block (see below).

               exp:    '-' exp %prec NEG { -$_[1] }
                   |   exp '+' exp       { $_[1] + $_[3] }
                   |   NUM
                   ;

           Note that in Parse::Yapp, a lhs cannot appear more than once as a
           rule name (this differs from yacc).

       "The footer section"
           may contain any valid Perl code and will be appended at the very
           end of your parser module. Here you can write your lexer, error
           report subs and anything else relevant to your parser.

       "Semantic actions"
           Semantic actions are run every time a reduction occurs in the
           parsing flow, and they must return a semantic value.

           They are (usually, but see "In rule actions" below) written at
           the very end of the rhs, enclosed in "{ }", and are copied
           verbatim to your parser file, inside the rules table.

           Be aware that matching braces in Perl is much more difficult than
           in C: inside strings they don't need to match. While in C it is
           very easy to detect the beginning of a string construct or a
           single character, it is much more difficult in Perl, as there are
           so many ways of writing such literals. So there is currently no
           check for that. If you need a brace in a double-quoted string,
           just escape it ("\{" or "\}"). For single-quoted strings, you
           will need to add a comment containing the matching brace, in the
           right order. Sorry for the inconvenience.

               {
                   "{ My string block }".
                   "\{ My other string block \}".
                   qq/ My unmatched brace \} /.
                   # Force the match: {
                   q/ for my closing brace } /.
                   q/ My opening brace { /
                   # must be closed: }
               }

           All of these constructs should work.

           In Parse::Yapp, semantic actions are called like normal Perl sub
           calls, with their arguments passed in @_, and their semantic
           values are their return values.

           $_[1] to $_[n] are the parameters, just as $1 to $n in yacc,
           while $_[0] is the parser object itself.
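
           The calling convention can be sketched in plain Perl, without the
           driver. Below, an anonymous sub plays the role of the action
           { $_[1] + $_[3] } from the rule exp: exp '+' exp; the fake parser
           object and the explicit call are assumptions that only simulate
           what the driver does on a reduction.

```perl
use strict;
use warnings;

# Plays the role of the action  { $_[1] + $_[3] }  for  exp: exp '+' exp
my $add_action = sub {
    my ($parser, $left, $op, $right) = @_;   # $_[0] is the parser object
    return $left + $right;                   # return value = semantic value
};

# Hypothetical stand-in for the parser object the driver would pass.
my $fake_parser = bless {}, 'FakeParser';

# Simulate reducing  exp '+' exp  with semantic values 2, '+' and 3.
my $value = $add_action->($fake_parser, 2, '+', 3);
print "$value\n";    # prints 5
```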

           Because $_[0] is the parser object itself, you can call parser
           methods from an action. That's how the yacc macros are
           implemented:

               yyerrok      is done by calling  $_[0]->YYErrok
               YYERROR      is done by calling  $_[0]->YYError
               YYACCEPT     is done by calling  $_[0]->YYAccept
               YYABORT      is done by calling  $_[0]->YYAbort

           All those methods explicitly return undef, for convenience.

               YYRECOVERING is done by calling  $_[0]->YYRecovering

           Four methods are useful in an error recovery sub:

               $_[0]->YYCurtok
               $_[0]->YYCurval
               $_[0]->YYExpect
               $_[0]->YYLexer

           They return, respectively, the current input token that made the
           parse fail, its semantic value (both can also be used to modify
           their values, but know what you are doing! See the "Error
           reporting routine" section for an example), a list of the tokens
           the parser expected when the failure occurred, and a reference to
           the lexer routine.

           Note that if "$_[0]->YYCurtok" is declared as a %nonassoc token,
           it can be included in the "$_[0]->YYExpect" list whenever the
           input tries to use it in an associative way. This is not a bug:
           the token IS expected, so that an error is reported if it is
           encountered.

           To detect such a case in your error reporting sub, the following
           example should do the trick:

               grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
                   and do {
                       # Non-associative token used in an associative expression
                   };

           Semantic values on the left of your reducing rule are accessed
           through the method

               $_[0]->YYSemval( index )

           where index is an integer. Values 1 .. n return the same values
           as $_[1] .. $_[n], but -n .. 0 return values on the left of the
           rule being reduced. (This is related to $-n .. $0 .. $n in yacc,
           but you cannot use $_[0] or $_[-n] constructs in Parse::Yapp,
           since $_[0] is the parser object and negative indices count from
           the end of @_.)

           There is also a provision for a user data area in the parser
           object, accessed by the method:

               $_[0]->YYData

           which returns a reference to an anonymous hash, which lets you
           hold all of your parsing data inside the object (see the Calc.yp
           or ParseYapp.yp files in the distribution for some examples).
           That's how you can make your parser module reentrant: all of your
           module's states and variables are held inside the parser object.

           Note: unfortunately, method calls in Perl have a lot of overhead,
                 and when YYData is used, it may be called a huge number of
                 times. If you are not a *real* purist and efficiency is
                 your concern, you may directly access the user space in
                 the object, $parser->{USER}, which is a reference to an
                 anonymous hash, and then benchmark.

           If no action is specified for a rule, the equivalent of a default
           action is run, which returns the first parameter:

               { $_[1] }

       "In rule actions"
           It is also possible to embed semantic actions inside a rule:

               typedef:    TYPE { $type = $_[1] } identlist { ... } ;

           When Parse::Yapp's parser encounters such an embedded action, it
           modifies the grammar as if you had written (although @x-1 is not
           a legal lhs value):

               @x-1:       /* empty */ { $type = $_[1] };
               typedef:    TYPE @x-1 identlist { ... } ;

           where x is a sequential number incremented for each "in rule"
           action, and -1 represents the "dot position" in the rule where
           the action arises.

           In such actions, you can use the $_[1]..$_[n] variables, which
           are the semantic values on the left of your action.

           Be aware that the way Parse::Yapp modifies your grammar because
           of in rule actions can, in some cases, produce spurious conflicts
           that wouldn't happen otherwise.

       "Generating the Parser Module"
           Now that your grammar file is written, you can use yapp on it to
           generate your parser module:

               yapp -v Calc.yp

           This will create two files: Calc.pm, your parser module, and
           Calc.output, a verbose output of your parser rules, conflicts,
           warnings, states and summary.

           What you are missing now is a lexer routine.

       "The Lexer sub"
           is called each time the parser needs to read the next token.

           It is called with only one argument, the parser object itself, so
           you can access its methods, especially the

               $_[0]->YYData

           data area.

           Its duty is to return the next token and value to the parser.
           They "must" be returned as a list of two values: the first one is
           the token known by the parser (symbolic or literal), the second
           one can be anything you want (usually the content of the token,
           or the literal value), from a simple scalar value to any complex
           reference, as the parsing driver never uses it except to pass it
           to semantic actions:

               ( 'NUMBER', $num )
           or
               ( '>=', '>=' )
           or
               ( 'ARRAY', [ @values ] )

           When the lexer reaches the end of input, it must return the ''
           empty token with an undef value:

               ( '', undef )

           Note that your lexer should never return 'error' as a token: for
           the driver, this is the error token used for error recovery, and
           it would lead to odd reactions.
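
           As a sketch, here is a lexer that keeps its input in the YYData
           area, in the spirit of the Calc.yp example mentioned above, so
           every parser object carries its own state. The INPUT key, the NUM
           token name and the MockParser package (a stand-in that lets the
           sub run without a generated parser) are assumptions; with a real
           parser, YYData is provided by the driver and the sub is passed as
           yylex => \&Lexer.

```perl
use strict;
use warnings;

# Stand-in for a generated parser (an assumption for this sketch): only the
# YYData method is provided, returning this object's own user data hash.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

# Hypothetical lexer: consumes the string stored in YYData->{INPUT}.
# Numbers become NUM tokens; any other character is returned as a literal.
sub Lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;
    $data->{INPUT} //= '';

    for ($data->{INPUT}) {                 # alias $_ to the input buffer
        s/^\s+//;                          # skip whitespace
        return ('', undef) if $_ eq '';    # end of input
        s/^(\d+(?:\.\d+)?)// and return ('NUM', $1);
        s/^(.)//s            and return ($1, $1);
    }
}

my $p = MockParser->new;
$p->YYData->{INPUT} = "3 + 4.5";

my @tokens;
while (1) {
    my ($tok, $val) = Lexer($p);
    push @tokens, $tok;
    last if $tok eq '';
}
print join('|', @tokens), "\n";    # prints NUM|+|NUM|
```

           Because the buffer lives in the object rather than in a package
           variable, two parser objects can lex two inputs independently.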

           Once your lexer is written, you will probably want to output
           meaningful error messages, instead of the default, which is to
           print 'Parse error.' on STDERR.

           For that, you need an error reporting sub.

       "Error reporting routine"
           If you want one, write it knowing that it is passed the parser
           object as its parameter, so you can share information with the
           lexer routine quite easily.

           You can also use the "$_[0]->YYErrok" method in it, which will
           resume parsing as if no error occurred. Of course, since the
           invalid token is still invalid, you're supposed to fix the
           problem yourself.

           The method "$_[0]->YYLexer" may help you, as it returns a
           reference to the lexer routine, and can be called as

               ($tok,$val)=&{$_[0]->YYLexer}

           to get the next token and semantic value from the input stream.
           (Calling it with & and no argument list passes the current @_,
           so the lexer receives the parser object as usual.) To make them
           current for the parser, pass them to the accessor methods:

               $_[0]->YYCurtok($tok);
               $_[0]->YYCurval($val);

           and know what you're doing...
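
           A possible error reporting sub, sketched under assumptions: the
           message wording, the error_message helper and the MockParser
           package (which fakes YYCurtok, YYCurval and YYExpect so the code
           runs standalone) are hypothetical. It only reads the parser
           state, reports through warn, and includes the %nonassoc check
           described earlier; it does not attempt any recovery.

```perl
use strict;
use warnings;

# Builds the message from the parser state (hypothetical helper, kept
# separate from ErrorReport so it can be exercised without a real parse).
sub error_message {
    my ($parser) = @_;
    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = $parser->YYExpect;

    return 'unexpected end of input' if $tok eq '';

    # A %nonassoc token used associatively shows up in the YYExpect list.
    return "'$val' cannot be used associatively"
        if grep { $_ eq $tok } @expected;

    return "unexpected '$val', expected one of: " . join(', ', sort @expected);
}

# Hypothetical sub to pass as  yyerror => \&ErrorReport.
sub ErrorReport {
    my ($parser) = @_;
    warn 'Parse error: ', error_message($parser), "\n";
    return;
}

# Stand-in exposing only the three methods used above (not Parse::Yapp code).
package MockParser;
sub new {
    my ($class, %state) = @_;
    my $self = {%state};
    return bless $self, $class;
}
sub YYCurtok { return $_[0]{tok} }
sub YYCurval { return $_[0]{val} }
sub YYExpect { return @{ $_[0]{expect} } }

package main;

my $msg = error_message(
    MockParser->new(tok => ')', val => ')', expect => ['NUM', '(']));
print "$msg\n";    # prints unexpected ')', expected one of: (, NUM
```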

       "Parsing"
           Now you've got everything to do the parsing.

           First, use the parser module:

               use Calc;

           Then create the parser object:

               $parser = Calc->new;

           Now, call the YYParse method, telling it where to find the lexer
           and error report subs:

               $result = $parser->YYParse(yylex   => \&Lexer,
                                          yyerror => \&ErrorReport);

           (assuming the Lexer and ErrorReport subs have been written in
           your current package)

           The order in which the parameters appear is unimportant.

           Et voila.

           The YYParse method will do the parse, then return the last
           semantic value returned, or undef if error recovery cannot
           recover.

           If you need to be sure the parse has been successful (in case
           your last returned semantic value is undef), call:

               $parser->YYNberr()

           which returns the total number of times the error reporting sub
           has been called.
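
           A small sketch of that check, built around a hypothetical
           parse_ok wrapper. MockParser only fakes YYParse and YYNberr so
           the code runs without a generated parser; it models the case
           where a parse succeeds but its last semantic value is undef.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns the semantic value together with a success
# flag based on YYNberr, so an undef result is not taken for a failure.
sub parse_ok {
    my ($parser, %args) = @_;             # yylex, yyerror, ... as for YYParse
    my $result = $parser->YYParse(%args);
    return ($result, $parser->YYNberr == 0);
}

# Stand-in for a generated parser (not part of Parse::Yapp).
package MockParser;
sub new {
    my ($class, %state) = @_;
    my $self = {%state};
    return bless $self, $class;
}
sub YYParse { return $_[0]{result} }    # pretends to parse
sub YYNberr { return $_[0]{nberr} }

package main;

my ($value, $ok) = parse_ok(MockParser->new(result => undef, nberr => 0),
                            yylex => sub { ('', undef) }, yyerror => sub { });
print defined $value ? "defined\n" : "undef\n";    # prints undef
print $ok ? "success\n" : "failure\n";             # prints success
```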

       "Error Recovery"
           in Parse::Yapp is implemented the same way as in yacc.

       "Debugging Parser"
           To debug your parser, you can call the YYParse method with a
           yydebug parameter:

               $parser->YYParse( ... , yydebug => value, ... )

           where value is a bitfield, each bit representing a specific debug
           output:

               Bit Value    Outputs
               0x01         Token reading (useful for Lexer debugging)
               0x02         States information
               0x04         Driver actions (shifts, reduces, accept...)
               0x08         Parse Stack dump
               0x10         Error Recovery tracing

           To have a full debugging output, use

               yydebug => 0x1F

           Debugging output is sent to STDERR; be aware that it can be
           "huge".
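
           Since yydebug is just a number, named constants can make the
           intent of a debug setting clearer. The DBG_* names below are
           hypothetical; only the bit values come from the table above.

```perl
use strict;
use warnings;

# Hypothetical names for the yydebug bits; YYParse only receives the number.
use constant {
    DBG_TOKENS  => 0x01,    # token reading
    DBG_STATES  => 0x02,    # states information
    DBG_ACTIONS => 0x04,    # driver actions
    DBG_STACK   => 0x08,    # parse stack dump
    DBG_ERRORS  => 0x10,    # error recovery tracing
};

# Lexer debugging: token reading plus driver actions.
my $lexer_debug = DBG_TOKENS | DBG_ACTIONS;
# Every bit set, the same value as 0x1F.
my $full_debug  = DBG_TOKENS | DBG_STATES | DBG_ACTIONS | DBG_STACK | DBG_ERRORS;

printf "0x%02X 0x%02X\n", $lexer_debug, $full_debug;    # prints 0x05 0x1F

# Then, for example:
#   $parser->YYParse(yylex => \&Lexer, yyerror => \&ErrorReport,
#                    yydebug => $lexer_debug);
```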

       "Standalone Parsers"
           By default, the generated parser modules need the Parse::Yapp
           module installed on the system to run. They use
           Parse::Yapp::Driver, which can be safely shared between parsers
           in the same script.

           If you'd prefer to have a standalone module generated, use the
           "-s" switch with yapp: this will automagically copy the driver
           code into your module, so you can use/distribute it without the
           need for the Parse::Yapp module, making it really a "Standalone
           Parser".

           If you do so, please remember to include Parse::Yapp's copyright
           notice in your main module copyright, so others can know about
           the Parse::Yapp module.

       "Source file line numbers"
           are included by default in the generated parser module, which
           helps to find the guilty line in your source file in case of a
           syntax error. You can disable this feature by compiling your
           grammar with yapp using the "-n" switch.

BUGS AND SUGGESTIONS
       If you find bugs, think of anything that could improve Parse::Yapp
       or have any questions related to it, feel free to contact the
       author.

AUTHOR
       William N. Braswell, Jr. <wbraswell_cpan@NOSPAM.nym.hush.com>
       (Remove "NOSPAM".)

SEE ALSO
       yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT
       The Parse::Yapp module and its related modules and shell scripts are
       copyright: Copyright © 1998, 1999, 2000, 2001, Francois Desarmenien.
       Copyright © 2017 William N. Braswell, Jr.

       You may use and distribute them under the terms of either the GNU
       General Public License or the Artistic License, as specified in the
       Perl README file.

       If you use the "standalone parser" option so people don't need to
       install Parse::Yapp on their systems in order to run your software,
       this copyright notice should be included in your software copyright
       too, and the copyright notice in the embedded driver should be left
       untouched.

perl v5.36.0                      2023-01-20                 Parse::Yapp(3)