Parse::Yapp(3)        User Contributed Perl Documentation       Parse::Yapp(3)

NAME

       Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS

         yapp -m MyParser grammar_file.yp

         ...

         use MyParser;

         my $parser = MyParser->new();
         my $value  = $parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

         my $nberr = $parser->YYNberr();

         $parser->YYData->{DATA} = [ 'Anything', 'You Want' ];

         my $data = $parser->YYData->{DATA}[0];

DESCRIPTION

       Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
       modules that let you generate and use yacc-like, thread-safe
       (reentrant) parsers with a Perl object-oriented interface.

       The script yapp is a front-end to the Parse::Yapp module and lets you
       easily create a Perl OO parser from an input grammar file.

   The Grammar file
       "Comments"
           Throughout your grammar files, comments are either Perl style,
           introduced by # and running to the end of the line, or C style,
           enclosed between /* and */.

       "Tokens and string literals"
           Two kinds of symbols may appear in a grammar file: non-terminal
           symbols, also called left-hand-side symbols, which are the names of
           your rules, and terminal symbols, also called tokens.

           Tokens are the symbols your lexer function feeds to your parser
           (see below). They come in two flavours: symbolic tokens and string
           literals.

           Non-terminals and symbolic tokens share the same identifier syntax:

                           [A-Za-z][A-Za-z0-9_]*

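           As an illustration only, the identifier rule above can be checked
           with an anchored Perl regex. The helper name is_yapp_identifier is
           hypothetical and not part of Parse::Yapp; this is a sketch of the
           syntax, not of yapp's own grammar scanner.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::Yapp): returns 1 when $name
# matches the identifier syntax [A-Za-z][A-Za-z0-9_]* shared by
# non-terminals and symbolic tokens. \A and \z anchor the whole string,
# so a trailing newline is rejected too.
sub is_yapp_identifier {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z][A-Za-z0-9_]*\z/ ? 1 : 0;
}

for my $name (qw(exp NUM expr_list _hidden 2nd)) {
    printf "%-10s %s\n", $name,
        is_yapp_identifier($name) ? 'identifier' : 'not an identifier';
}
```
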
           String literals are enclosed in single quotes and can contain
           almost anything. They are written to your parser file
           double-quoted, so any special character sequence is interpreted as
           such. The characters '"', '$' and '@' are automatically escaped
           with '\', so you can write them naturally. On the other hand, if
           you need a single quote inside your literal, escape it with '\'.

           You cannot have a literal 'error' in your grammar, as the driver
           would confuse it with the error token. Use a symbolic token
           instead. If you use it inadvertently, yapp warns you that you
           should have written it as error, and treats it as if it were the
           error token, which is certainly NOT what you meant.

       "Grammar file syntax"
           It is very close to yacc syntax (in fact, Parse::Yapp should
           compile a clean yacc grammar without any modification, whereas the
           opposite is not true).

           The file is divided into three sections, separated by "%%":

                   header section
                   %%
                   rules section
                   %%
                   footer section

           The header section may optionally contain:
               •   One or more code blocks enclosed inside "%{" and "%}", just
                   like in yacc. They may contain any valid Perl code and will
                   be copied verbatim to the very beginning of the parser
                   module. They are not as useful as they are in yacc, but you
                   can use them, for example, for global variable
                   declarations, though you will see later that such global
                   variables can be avoided to make a reentrant parser module.

               •   Precedence declarations, introduced by %left, %right or
                   %nonassoc (specifying associativity) and followed by the
                   list of tokens or literals sharing the same precedence and
                   associativity. Declarations appearing later have higher
                   precedence. (See the yacc or bison manuals for a full
                   explanation of how they work, as they are implemented
                   exactly the same way in Parse::Yapp.)

               •   %start followed by a rule's left-hand side, declaring this
                   rule to be the starting rule of your grammar. When %start
                   is not used, the default is the first rule in your rules
                   section.

               •   %token followed by a list of symbols, forcing them to be
                   recognized as tokens and generating a syntax error if any
                   of them is used as the left-hand side of a rule. Note that
                   in Parse::Yapp, you don't need to declare tokens as in
                   yacc: any symbol not appearing as the left-hand side of a
                   rule is considered to be a token. Other yacc declarations
                   or constructs such as %type and %union are parsed but
                   (almost) ignored.

               •   %expect followed by a number, which suppresses the warning
                   about the number of shift/reduce conflicts when both
                   numbers match, à la bison.

       The Rule Section contains your grammar rules:
           A rule is made of a left-hand-side symbol, followed by a ':' and
           one or more right-hand sides separated by '|' and terminated by a
           ';':

               exp:    exp '+' exp
                   |   exp '-' exp
                   ;

           A right-hand side may be empty:

               input:  #empty
                   |   input line
                   ;

           (If you have more than one empty rhs, Parse::Yapp issues a warning,
           as this is usually a mistake and will certainly cause a
           reduce/reduce conflict.)

           A rhs may be followed by an optional %prec directive, followed by a
           token, giving the rule an explicit precedence (see the yacc manuals
           for its precise meaning), and by an optional semantic action code
           block (see below).

               exp:   '-' exp %prec NEG { -$_[2] }
                   |  exp '+' exp       { $_[1] + $_[3] }
                   |  NUM
                   ;

           Note that in Parse::Yapp, a lhs cannot appear more than once as a
           rule name (this differs from yacc).

       "The footer section"
           may contain any valid Perl code and will be appended at the very
           end of your parser module. Here you can write your lexer, error
           reporting subs and anything else relevant to your parser.

       "Semantic actions"
           Semantic actions are run every time a reduction occurs in the
           parsing flow, and they must return a semantic value.

           They are (usually, but see "In rule actions" below) written at the
           very end of the rhs, enclosed in "{ }", and are copied verbatim to
           your parser file, inside the rules table.

           Be aware that matching braces in Perl is much more difficult than
           in C: inside strings they don't need to match. While in C it is
           very easy to detect the beginning of a string construct or a
           single character, it is much more difficult in Perl, as there are
           so many ways of writing such literals. So there is currently no
           check for that. If you need a brace in a double-quoted string, just
           escape it ("\{" or "\}"). For single-quoted strings, you will need
           to add a comment containing the matching brace, in the right
           order. Sorry for the inconvenience.

               {
                   "{ My string block }".
                   "\{ My other string block \}".
                   qq/ My unmatched brace \} /.
                   # Force the match: {
                   q/ for my closing brace } /.
                   q/ My opening brace { /
                   # must be closed: }
               }

           All of these constructs should work.

           In Parse::Yapp, semantic actions are called like normal Perl subs,
           with their arguments passed in @_, and their return values are
           their semantic values.

           $_[1] to $_[n] are the parameters, just as $1 to $n in yacc, while
           $_[0] is the parser object itself.

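           A minimal sketch of this calling convention, which does not use
           Parse::Yapp at all: actions like those of the exp rule are written
           as plain anonymous subs and called by hand. The $parser value is a
           hypothetical stand-in (an empty hash reference) for the real parser
           object, and the token values passed in are assumptions chosen for
           illustration.

```perl
use strict;
use warnings;

# Illustration only: each semantic action behaves like an anonymous sub
# receiving the parser object in $_[0] and the rhs semantic values in
# $_[1] .. $_[n]. $parser is a hypothetical stand-in, not a real parser.
my $parser = {};

# exp: '-' exp %prec NEG   ($_[1] holds the '-' token's value)
my $negate = sub { -$_[2] };

# exp: exp '+' exp         ($_[2] holds the '+' token's value)
my $add = sub { $_[1] + $_[3] };

# A rule without an action behaves like { $_[1] }
my $default = sub { $_[1] };

print $negate->($parser, '-', 5), "\n";
print $add->($parser, 2, '+', 3), "\n";
print $default->($parser, 42), "\n";
```

           Since the '-' token itself occupies $_[1], the unary minus action
           must negate $_[2].
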
           Because $_[0] is the parser object itself, you can call parser
           methods on it. That's how the yacc macros are implemented:

                   yyerrok is done by calling $_[0]->YYErrok
                   YYERROR is done by calling $_[0]->YYError
                   YYACCEPT is done by calling $_[0]->YYAccept
                   YYABORT is done by calling $_[0]->YYAbort

           All those methods explicitly return undef, for convenience.

               YYRECOVERING is done by calling $_[0]->YYRecovering

           Four methods are useful in an error recovery sub:

               $_[0]->YYCurtok
               $_[0]->YYCurval
               $_[0]->YYExpect
               $_[0]->YYLexer

           They return, respectively, the current input token that made the
           parse fail, its semantic value (both can also be used to modify
           these values, but know what you are doing! See the "Error
           reporting routine" section for an example), the list of tokens the
           parser expected when the failure occurred, and a reference to the
           lexer routine.

           Note that if "$_[0]->YYCurtok" is declared as a %nonassoc token, it
           can be included in the "$_[0]->YYExpect" list whenever the input
           tries to use it in an associative way. This is not a bug: the token
           IS expected, so that an error can be reported if it is encountered.

           To detect such a case in your error reporting sub, the following
           example should do the trick:

                   grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
               and do {
                   #Non-associative token used in an associative expression
               };

           Semantic values to the left of the rule being reduced are accessed
           through the method

               $_[0]->YYSemval( index )

           where index is an integer. Values 1 .. n return the same values as
           $_[1] .. $_[n], while values -n .. 0 return values to the left of
           the rule being reduced. (This is related to $-n .. $0 .. $n in
           yacc, but you cannot use $_[0] or $_[-n] constructs for that
           purpose in Parse::Yapp: $_[0] is the parser object, and negative
           indices into @_ count back from its end.)

           There is also a provision for a user data area in the parser
           object, accessed by the method:

               $_[0]->YYData

           which returns a reference to an anonymous hash that lets you keep
           all of your parsing data inside the object (see the Calc.yp or
           ParseYapp.yp files in the distribution for some examples). That's
           how you can make your parser module reentrant: all of your module's
           state and variables are held inside the parser object.

           Note: unfortunately, method calls in Perl have a lot of overhead,
                 and when YYData is used, it may be called a huge number
                 of times. If you are not a *real* purist and efficiency
                 is your concern, you may access the user space in the
                 object directly: $parser->{USER}, which is a reference to
                 an anonymous hash, and then benchmark.

           If no action is specified for a rule, the equivalent of a default
           action is run, which returns the first parameter:

              { $_[1] }

       "In rule actions"
           It is also possible to embed semantic actions inside a rule:

               typedef:    TYPE { $type = $_[1] } identlist { ... } ;

           When Parse::Yapp encounters such an embedded action, it modifies
           the grammar as if you had written (although @x-1 is not a legal
           lhs value):

               @x-1:   /* empty */ { $type = $_[1] };
               typedef:    TYPE @x-1 identlist { ... } ;

           where x is a sequential number incremented for each "in rule"
           action, and -1 represents the "dot position" in the rule where the
           action arises.

           In such actions, you can use the $_[1]..$_[n] variables, which are
           the semantic values on the left of your action.

           Be aware that the way Parse::Yapp modifies your grammar because of
           in-rule actions can, in some cases, produce spurious conflicts that
           wouldn't happen otherwise.

       "Generating the Parser Module"
           Now that your grammar file is written, you can use yapp on it to
           generate your parser module:

               yapp -v Calc.yp

           will create two files: Calc.pm, your parser module, and
           Calc.output, a verbose listing of your parser's rules, conflicts,
           warnings, states and summary.

           What you are missing now is a lexer routine.

       "The Lexer sub"
           is called each time the parser needs to read the next token.

           It is called with only one argument, the parser object itself, so
           you can access its methods, especially the

               $_[0]->YYData

           data area.

           Its job is to return the next token and its value to the parser.
           They must be returned as a list of two values: the first one is
           the token as known by the parser (symbolic or literal), the second
           one can be anything you want (usually the content of the token, or
           the literal value), from a simple scalar value to any complex
           reference, as the parsing driver never uses it except to pass it
           to semantic actions:

               ( 'NUMBER', $num )
           or
               ( '>=', '>=' )
           or
               ( 'ARRAY', [ @values ] )

           When the lexer reaches the end of input, it must return the ''
           empty token with an undef value:

                ( '', undef )

           Note that your lexer should never return 'error' as a token: for
           the driver, this is the error token used for error recovery, and
           returning it would lead to odd reactions.

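           The following is a minimal lexer sketch, assuming a grammar with a
           NUM token and single-character literals such as '+', '(' and ')'.
           It keeps the remaining input in the YYData area, so each parser
           object carries its own state. The INPUT key and the StubParser
           class (which only provides a YYData method, so the sub can be
           tried without a generated parser) are hypothetical.

```perl
use strict;
use warnings;

# Minimal lexer sketch: assumes a NUM token plus single-character literals.
# The INPUT key in the YYData hash is a hypothetical convention.
sub Lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;           # per-parser data area (hash ref)
    $data->{INPUT} //= '';

    for ($data->{INPUT}) {                # alias $_ to the stored input
        s/\A\s+//;                        # skip leading whitespace
        return ('', undef) if $_ eq '';   # end of input: '' token, undef value
        s/\A(\d+(?:\.\d+)?)//
            and return ('NUM', $1);       # numbers become NUM tokens
        s/\A(.)//s;                       # any other character is a literal
        return ($1, $1);                  # token and value are the same text
    }
}

# Hypothetical stand-in for a generated parser object: it only provides
# YYData, which is all this lexer uses.
package StubParser;
sub new    { my ($class) = @_; return bless { USER => {} }, $class }
sub YYData { my ($self) = @_; return $self->{USER} }

package main;

my $stub = StubParser->new;
$stub->YYData->{INPUT} = '12 + (3.5-x)';
while (1) {
    my ($tok, $val) = Lexer($stub);
    last if $tok eq '';
    print "$tok => $val\n";
}
```

           With a real parser, you would store the text in
           $parser->YYData->{INPUT} and pass \&Lexer as the yylex parameter
           of YYParse.
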
           Now that your lexer is written, you may want to output meaningful
           error messages instead of the default, which is to print
           'Parse error.' on STDERR.

           For that, you will need an error reporting sub.

       "Error reporting routine"
           If you want one, write it knowing that it is passed the parser
           object as its parameter, so it can share information with the
           lexer routine quite easily.

           You can also use the "$_[0]->YYErrok" method in it, which will
           resume parsing as if no error occurred. Of course, since the
           invalid token is still invalid, you're supposed to fix the problem
           yourself.

           The method "$_[0]->YYLexer" may help you, as it returns a
           reference to the lexer routine, and can be called as

               ($tok,$val)=&{$_[0]->YYLexer}

           to get the next token and semantic value from the input stream.
           (The &{...} form without parentheses passes the current @_, here
           the parser object, on to the lexer.) To make them current for the
           parser, use:

               $_[0]->YYCurtok($tok);
               $_[0]->YYCurval($val);

           and know what you're doing...

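           A sketch of such an error reporting sub, using only the YYCurtok,
           YYCurval and YYExpect methods described earlier, including the
           %nonassoc check shown with YYExpect. The helper name
           describe_parse_error, the message wording and the StubParser test
           class are hypothetical.

```perl
use strict;
use warnings;

# Builds an error message from the parser's current token, its value and
# the list of expected tokens. Names and wording are hypothetical.
sub describe_parse_error {
    my ($parser) = @_;
    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = sort { $a cmp $b } $parser->YYExpect;   # stable order

    # '' is the end-of-input token; show it in a readable way.
    my $show = sub { $_[0] eq '' ? 'end of input' : "'$_[0]'" };

    my $msg = 'Syntax error near ' . $show->($tok);
    $msg .= " (value '$val')" if defined $val && $val ne $tok;

    if (grep { $_ eq $tok } @expected) {
        # The offending token is itself expected: a %nonassoc token was
        # used in an associative way.
        $msg .= '; non-associative operator used associatively';
    }
    else {
        $msg .= '; expected ' . join(', ', map { $show->($_) } @expected);
    }
    return $msg;
}

# The sub passed as yyerror: it receives the parser object as $_[0].
sub ErrorReport {
    my ($parser) = @_;
    warn describe_parse_error($parser), "\n";
    return;
}

# Hypothetical stand-in exposing the same accessors as a real parser.
package StubParser;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub YYCurtok { $_[0]{tok} }
sub YYCurval { $_[0]{val} }
sub YYExpect { @{ $_[0]{expect} } }

package main;

ErrorReport(StubParser->new(tok => 'NUM', val => 7, expect => ['+', '-', '']));
```

           With a generated parser, ErrorReport would be passed as the
           yyerror parameter of YYParse.
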
       "Parsing"
           Now you've got everything you need to do the parsing.

           First, use the parser module:

               use Calc;

           Then create the parser object:

               my $parser = Calc->new;

           Now, call the YYParse method, telling it where to find the lexer
           and error reporting subs:

               my $result = $parser->YYParse(yylex   => \&Lexer,
                                             yyerror => \&ErrorReport);

           (assuming the Lexer and ErrorReport subs have been written in your
           current package)

           The order in which the parameters appear is unimportant.

           Et voila.

           The YYParse method will do the parse, then return the last semantic
           value returned, or undef if error recovery cannot recover.

           If you need to be sure the parse has been successful (in case your
           last returned semantic value is undef), make a call to:

               $parser->YYNberr()

           which returns the total number of times the error reporting sub
           has been called.

       "Error Recovery"
           in Parse::Yapp is implemented the same way it is in yacc.

       "Debugging Parser"
           To debug your parser, you can call the YYParse method with a debug
           parameter:

               $parser->YYParse( ... , yydebug => value, ... )

           where value is a bitfield, each bit enabling a specific kind of
           debug output:

               Bit Value    Outputs
               0x01         Token reading (useful for Lexer debugging)
               0x02         States information
               0x04         Driver actions (shifts, reduces, accept...)
               0x08         Parse Stack dump
               0x10         Error Recovery tracing

           To get full debugging output, use

               yydebug => 0x1F

           Debugging output is sent to STDERR, and be aware that it can be
           huge.

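           Since value is a plain bitfield, debug options are combined with
           Perl's bitwise OR. The constant names below are hypothetical
           conveniences; Parse::Yapp itself only receives the resulting
           number.

```perl
use strict;
use warnings;

# Hypothetical names for the yydebug bits listed in the table above.
use constant {
    DBG_TOKENS   => 0x01,   # token reading
    DBG_STATES   => 0x02,   # states information
    DBG_ACTIONS  => 0x04,   # driver actions (shifts, reduces, accept...)
    DBG_STACK    => 0x08,   # parse stack dump
    DBG_RECOVERY => 0x10,   # error recovery tracing
};

# Combine bits with bitwise OR, e.g. trace only token reading and actions.
my $lexer_and_actions = DBG_TOKENS | DBG_ACTIONS;

# All five bits together give the full-debugging value 0x1F.
my $everything = DBG_TOKENS | DBG_STATES | DBG_ACTIONS | DBG_STACK | DBG_RECOVERY;

printf "lexer+actions: 0x%02X, everything: 0x%02X\n",
    $lexer_and_actions, $everything;

# With a generated parser, the value would be passed as, for example:
#   $parser->YYParse(yylex => \&Lexer, yyerror => \&ErrorReport,
#                    yydebug => $lexer_and_actions);
```
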
       "Standalone Parsers"
           By default, the generated parser modules need the Parse::Yapp
           module installed on the system to run. They use
           Parse::Yapp::Driver, which can be safely shared between parsers in
           the same script.

           If you'd prefer to have a standalone module generated, use the
           "-s" switch with yapp: this will automagically copy the driver code
           into your module so you can use or distribute it without needing
           the Parse::Yapp module, making it a true "Standalone Parser".

           If you do so, please remember to include Parse::Yapp's copyright
           notice in your main module's copyright notice, so that others can
           learn about the Parse::Yapp module.

       "Source file line numbers"
           are included by default in the generated parser module, which
           helps you find the guilty line in your source file in case of a
           syntax error. You can disable this feature by compiling your
           grammar with yapp using the "-n" switch.

BUGS AND SUGGESTIONS

       If you find bugs, think of anything that could improve Parse::Yapp or
       have any questions related to it, feel free to contact the author.

AUTHOR

       William N. Braswell, Jr. <wbraswell_cpan@NOSPAM.nym.hush.com> (Remove
       "NOSPAM".)

SEE ALSO

       yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT

       The Parse::Yapp module and its related modules and shell scripts are
       copyright: Copyright © 1998, 1999, 2000, 2001, Francois Desarmenien.
       Copyright © 2017 William N. Braswell, Jr.

       You may use and distribute them under the terms of either the GNU
       General Public License or the Artistic License, as specified in the
       Perl README file.

       If you use the "standalone parser" option so people don't need to
       install Parse::Yapp on their systems in order to run your software,
       this copyright notice should be included in your software copyright
       too, and the copyright notice in the embedded driver should be left
       untouched.

perl v5.36.0                      2023-01-20                    Parse::Yapp(3)