Parse::Yapp(3)        User Contributed Perl Documentation       Parse::Yapp(3)

NAME

       Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS

         yapp -m MyParser grammar_file.yp

         ...

         use MyParser;

         $parser=new MyParser();
         $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

         $nberr=$parser->YYNberr();

         $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];

         $data=$parser->YYData->{DATA}[0];

DESCRIPTION

       Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
       modules that lets you generate and use yacc-like, thread-safe
       (reentrant) parsers with a Perl object-oriented interface.

       The script yapp is a front-end to the Parse::Yapp module and lets you
       easily create a Perl OO parser from an input grammar file.

       The Grammar file

       "Comments"
           Throughout your grammar files, comments are either Perl style,
           introduced by # and running up to the end of the line, or C style,
           enclosed between /* and */.

       "Tokens and string literals"
           Throughout the grammar files, two kinds of symbols may appear:
           non-terminal symbols, also called left-hand-side symbols, which are
           the names of your rules, and terminal symbols, also called tokens.

           Tokens are the symbols your lexer function feeds to your parser
           (see below). They come in two flavours: symbolic tokens and string
           literals.

           Non-terminals and symbolic tokens share the same identifier syntax:

                           [A-Za-z][A-Za-z0-9_]*

           String literals are enclosed in single quotes and can contain
           almost anything. They are written to your parser file
           double-quoted, so any special characters are interpreted as such.
           '"', '$' and '@' are automatically escaped with '\', which makes
           them more natural to write. On the other hand, if you need a
           single quote inside your literal, escape it with '\'.

           You cannot have a literal 'error' in your grammar, as it would
           confuse the driver with the error token. Use a symbolic token
           instead. If you use it inadvertently, Parse::Yapp issues a warning
           telling you that you should have written error, and treats it as
           if it were the error token, which is certainly NOT what you meant.

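           The identifier syntax above can be checked with a small helper.
           This is only a sketch: the function name is_symbol_name is
           hypothetical and not part of Parse::Yapp; it simply applies the
           documented pattern, anchored at both ends.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::Yapp): returns 1 when the
# given name matches the documented identifier syntax, 0 otherwise.
sub is_symbol_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z][A-Za-z0-9_]*\z/ ? 1 : 0;
}

# Report which candidate names fit the pattern.
for my $candidate (qw(NUM exp my_rule 9lives _hidden)) {
    printf "%-8s %s\n", $candidate,
        is_symbol_name($candidate) ? 'valid' : 'invalid';
}
```

           Note that, under this pattern, a name may not start with a digit
           or an underscore.
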
       "Grammar file syntax"
           It is very close to yacc syntax (in fact, Parse::Yapp should
           compile a clean yacc grammar without any modification, whereas the
           opposite is not true).

           This file is divided into three sections, separated by "%%":

                   header section
                   %%
                   rules section
                   %%
                   footer section

           The Header Section may optionally contain:
           *   One or more code blocks enclosed inside "%{" and "%}", just
               like in yacc. They may contain any valid Perl code and are
               copied verbatim at the very beginning of the parser module.
               They are not as useful as they are in yacc, but you can use
               them, for example, for global variable declarations, though you
               will notice later that such global variables can be avoided to
               make a reentrant parser module.

           *   Precedence declarations, introduced by %left, %right and
               %nonassoc specifying associativity, followed by the list of
               tokens or literals having the same precedence and
               associativity. The precedence declared last has the highest
               level (see the yacc or bison manuals for a full explanation of
               how they work, as they are implemented exactly the same way in
               Parse::Yapp).

           *   %start followed by a rule's left-hand side, declaring this rule
               to be the starting rule of your grammar. The default, when
               %start is not used, is the first rule in your grammar section.

           *   %token followed by a list of symbols, forcing them to be
               recognized as tokens and generating a syntax error if they are
               used as the left-hand side of a rule declaration. Note that in
               Parse::Yapp, you don't need to declare tokens as in yacc: any
               symbol not appearing as the left-hand side of a rule is
               considered to be a token. Other yacc declarations or constructs
               such as %type and %union are parsed but (almost) ignored.

           *   %expect followed by a number, which suppresses warnings about
               the number of shift/reduce conflicts when both numbers match,
               a la bison.

           The Rule Section contains your grammar rules:
               A rule is made of a left-hand-side symbol, followed by a ':'
               and one or more right-hand sides separated by '|' and
               terminated by a ';':

                   exp:    exp '+' exp
                       |   exp '-' exp
                       ;

               A right-hand side may be empty:

                   input:  #empty
                       |   input line
                       ;

               (If you have more than one empty rhs, Parse::Yapp will issue a
               warning, as this is usually a mistake, and you will certainly
               have a reduce/reduce conflict.)

               A rhs may be followed by an optional %prec directive, followed
               by a token, giving the rule an explicit precedence (see the
               yacc manuals for its precise meaning), and by an optional
               semantic action code block (see below).

                   exp:   '-' exp %prec NEG { -$_[2] }
                       |  exp '+' exp       { $_[1] + $_[3] }
                       |  NUM
                       ;

               Note that in Parse::Yapp, a lhs cannot appear more than once as
               a rule name (this differs from yacc).

           "The footer section"
               may contain any valid Perl code and is appended at the very
               end of your parser module. Here you can write your lexer,
               error reporting subs and anything else relevant to your
               parser.

           "Semantic actions"
               Semantic actions are run every time a reduction occurs in the
               parsing flow, and they must return a semantic value.

               They are (usually, but see "In rule actions" below) written at
               the very end of the rhs, enclosed in "{ }", and are copied
               verbatim to your parser file, inside the rules table.

               Be aware that matching braces in Perl is much more difficult
               than in C: inside strings they don't need to match. While in C
               it is very easy to detect the beginning of a string construct,
               or of a single character, it is much more difficult in Perl, as
               there are so many ways of writing such literals. So there is no
               check for that today. If you need a brace in a double-quoted
               string, just quote it ("\{" or "\}"). For single-quoted
               strings, you will need to add a comment that matches it in the
               right order. Sorry for the inconvenience.

                   {
                       "{ My string block }".
                       "\{ My other string block \}".
                       qq/ My unmatched brace \} /.
                       # Force the match: {
                       q/ for my closing brace } /.
                       q/ My opening brace { /
                       # must be closed: }
                   }

               All of these constructs should work.

               In Parse::Yapp, semantic actions are called like normal Perl
               sub calls, with their arguments passed in @_, and their
               semantic values are their return values.

               $_[1] to $_[n] are the parameters, just as $1 to $n in yacc,
               while $_[0] is the parser object itself.

               Having $_[0] be the parser object itself allows you to call
               parser methods. That's how the yacc macros are implemented:

                       yyerrok is done by calling $_[0]->YYErrok
                       YYERROR is done by calling $_[0]->YYError
                       YYACCEPT is done by calling $_[0]->YYAccept
                       YYABORT is done by calling $_[0]->YYAbort

               All those methods explicitly return undef, for convenience.

                   YYRECOVERING is done by calling $_[0]->YYRecovering

               Four methods are useful in an error recovery sub:

                   $_[0]->YYCurtok
                   $_[0]->YYCurval
                   $_[0]->YYExpect
                   $_[0]->YYLexer

               They return, respectively, the current input token that made
               the parse fail, its semantic value (both can also be used to
               modify their values, but know what you are doing! See the
               Error reporting routine section for an example), a list
               containing the tokens the parser expected when the failure
               occurred, and a reference to the lexer routine.

               Note that if "$_[0]->YYCurtok" is declared as a %nonassoc
               token, it can be included in the "$_[0]->YYExpect" list
               whenever the input tries to use it in an associative way. This
               is not a bug: the token IS expected to report an error if
               encountered.

               To detect such a thing in your error reporting sub, the
               following example should do the trick:

                       grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
                   and do {
                       #Non-associative token used in an associative expression
                   };

               Accessing semantic values on the left of the rule being reduced
               is done through the method

                   $_[0]->YYSemval( index )

               where index is an integer. Values 1 .. n return the same
               values as $_[1] .. $_[n], while -n .. 0 return values on the
               left of the rule being reduced (this is related to $-n .. $0 ..
               $n in yacc, but you cannot use $_[0] or $_[-n] constructs in
               Parse::Yapp for obvious reasons).

               There is also a provision for a user data area in the parser
               object, accessed by the method:

                   $_[0]->YYData

               which returns a reference to an anonymous hash. This lets you
               keep all of your parsing data inside the object (see the
               Calc.yp or ParseYapp.yp files in the distribution for some
               examples). That's how you can make your parser module
               reentrant: all of your module's state and variables are held
               inside the parser object.

               Note: unfortunately, method calls in Perl have a lot of
               overhead, and when YYData is used, it may be called a huge
               number of times. If you are not a *real* purist and efficiency
               is your concern, you may access the user space in the object
               directly: $parser->{USER}, which is a reference to an anonymous
               hash, and then benchmark.

               If no action is specified for a rule, the equivalent of a
               default action is run, which returns the first parameter:

                  { $_[1] }

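               The calling convention described above can be tried outside a
               grammar. The following is only a sketch under stated
               assumptions: MockParser is a hypothetical stand-in that
               provides nothing but YYData, the rule "assign: VAR '=' exp"
               and the VARS key are invented for illustration, and the
               actions are written as plain anonymous subs rather than as
               blocks in a grammar file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser object. It only mimics
# the YYData method; a real parser gets it from the Parse::Yapp driver.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

# Action for an assumed rule:  assign: VAR '=' exp
# $_[0] is the parser object, $_[1] the VAR value, $_[2] the '=' value
# and $_[3] the exp value. The return value is the semantic value.
my $assign_action = sub {
    my ($parser, $name, undef, $value) = @_;
    $parser->YYData->{VARS}{$name} = $value;   # state lives in the object
    return $value;
};

# What the default action amounts to: return the first parameter.
my $default_action = sub { return $_[1] };

# Two parser objects keep independent data areas.
my $first  = MockParser->new;
my $second = MockParser->new;
$assign_action->($first,  'x', '=', 3);
$assign_action->($second, 'x', '=', 7);
```

               Because each object carries its own hash, the two assignments
               to x do not interfere with each other, which is the point of
               keeping state in YYData instead of in global variables.
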
           "In rule actions"
               It is also possible to embed semantic actions inside a rule:

                   typedef:    TYPE { $type = $_[1] } identlist { ... } ;

               When Parse::Yapp's parser encounters such an embedded action,
               it modifies the grammar as if you had written (although @x-1
               is not a legal lhs value):

                   @x-1:   /* empty */ { $type = $_[1] };
                   typedef:    TYPE @x-1 identlist { ... } ;

               where x is a sequential number incremented for each "in rule"
               action, and -1 represents the "dot position" in the rule where
               the action arises.

               In such actions, you can use the $_[1]..$_[n] variables, which
               are the semantic values on the left of your action.

               Be aware that the way Parse::Yapp modifies your grammar because
               of in rule actions can produce, in some cases, spurious
               conflicts that wouldn't happen otherwise.

           "Generating the Parser Module"
               Now that your grammar file is written, you can use yapp on it
               to generate your parser module:

                   yapp -v Calc.yp

               will create two files: Calc.pm, your parser module, and
               Calc.output, a verbose output of your parser rules, conflicts,
               warnings, states and summary.

               What you are missing now is a lexer routine.

           "The Lexer sub"
               is called each time the parser needs to read the next token.

               It is called with only one argument, the parser object itself,
               so you can access its methods, especially the

                   $_[0]->YYData

               data area.

               It is its duty to return the next token and value to the
               parser. They "must" be returned as a list of two values: the
               first one is the token known by the parser (symbolic or
               literal), the second one can be anything you want (usually the
               content of the token, or the literal value), from a simple
               scalar value to any complex reference, as the parsing driver
               never uses it except to pass it to semantic actions:

                   ( 'NUMBER', $num )
               or
                   ( '>=', '>=' )
               or
                   ( 'ARRAY', [ @values ] )

               When the lexer reaches the end of input, it must return the
               empty token '' with an undef value:

                    ( '', undef )

               Note that your lexer should never return 'error' as a token:
               for the driver, this is the error token used for error
               recovery, and it would lead to odd reactions.

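               A lexer following these rules might look like the sketch
               below. It is not taken from the distribution: the INPUT key,
               the NUM and VAR token names and the MockParser package are
               assumptions made for illustration. MockParser only provides
               YYData so that the sub can be exercised without a generated
               parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser object; it only
# provides the YYData method used by the lexer below.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

# Lexer sketch: consumes the string stored under the assumed key
# YYData->{INPUT} and returns one ( token, value ) pair per call.
sub lexer_sub {
    my ($parser) = @_;
    my $data = $parser->YYData;
    $data->{INPUT} = '' unless defined $data->{INPUT};

    for ($data->{INPUT}) {                       # alias $_ to the input
        s/\A[ \t]+//;                            # skip leading blanks
        return ('', undef) if $_ eq '';          # end of input
        return ('NUM', $1) if s/\A(\d+(?:\.\d+)?)//;
        return ('VAR', $1) if s/\A([A-Za-z][A-Za-z0-9_]*)//;
        return ($1, $1)    if s/\A(.)//s;        # any other char: literal
    }
}
```

               With a real parser, a reference to this sub would be passed
               as the yylex parameter of YYParse, and the input string would
               be stored in the data area before parsing starts.
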
               Now that you have your lexer written, you may need to output
               meaningful error messages instead of the default, which is to
               print 'Parse error.' on STDERR.

               So you will need an error reporting sub.

           "Error reporting routine"
               If you want one, write it knowing that it is passed the parser
               object as its parameter, so you can share information with the
               lexer routine quite easily.

               You can also use the "$_[0]->YYErrok" method in it, which will
               resume parsing as if no error occurred. Of course, since the
               invalid token is still invalid, you are expected to fix the
               problem yourself.

               The method "$_[0]->YYLexer" may help you, as it returns a
               reference to the lexer routine, which can be called as

                   ($tok,$val)=&{$_[0]->YYLexer}

               to get the next token and semantic value from the input stream.
               To make them current for the parser, use:

                   ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

               and know what you're doing...

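               An error reporting sub built on the methods named in this
               document could be sketched as follows. The message wording,
               the ERRMSG key and the MockParser package (with its TOKEN,
               VALUE and EXPECT fields) are assumptions for illustration
               only; a real parser object gets YYCurtok, YYCurval, YYExpect
               and YYData from the driver.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object, exposing only the
# methods the error sub relies on.
package MockParser;
sub new {
    my ($class, %args) = @_;
    return bless { USER => {}, %args }, $class;
}
sub YYData   { return $_[0]{USER} }
sub YYCurtok { return $_[0]{TOKEN} }
sub YYCurval { return $_[0]{VALUE} }
sub YYExpect { return @{ $_[0]{EXPECT} || [] } }

package main;

# Build a message from the offending token and the expected tokens.
sub error_message {
    my ($parser) = @_;
    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = sort $parser->YYExpect;

    my $found = (!defined $tok || $tok eq '') ? 'end of input'
              : defined $val                  ? "'$val'"
              :                                 "'$tok'";
    my $msg = "Syntax error: unexpected $found";
    $msg .= ", expected one of: @expected" if @expected;
    return $msg;
}

# The sub to pass as yyerror: it receives the parser object.
sub error_sub {
    my ($parser) = @_;
    my $msg = error_message($parser);
    $parser->YYData->{ERRMSG} = $msg;   # assumed key, shared with caller
    print STDERR "$msg\n";
    return;
}
```

               Keeping the message in the data area lets the calling code
               retrieve it after YYParse returns, instead of relying only on
               what was printed to STDERR.
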
           "Parsing"
               Now you've got everything to do the parsing.

               First, use the parser module:

                   use Calc;

               Then create the parser object:

                   $parser=new Calc;

               Now, call the YYParse method, telling it where to find the
               lexer and error reporting subs:

                   $result=$parser->YYParse(yylex => \&Lexer,
                                          yyerror => \&ErrorReport);

               (assuming the Lexer and ErrorReport subs have been written in
               your current package)

               The order in which parameters appear is unimportant.

               Et voila.

               The YYParse method will do the parse, then return the last
               semantic value returned, or undef if error recovery cannot
               recover.

               If you need to be sure the parse has been successful (in case
               your last returned semantic value is undef), make a call to:

                   $parser->YYNberr()

               which returns the total number of times the error reporting
               sub has been called.

           "Error Recovery"
               in Parse::Yapp is implemented the same way it is in yacc.

           "Debugging Parser"
               To debug your parser, you can call the YYParse method with a
               debug parameter:

                   $parser->YYParse( ... , yydebug => value, ... )

               where value is a bitfield, each bit representing a specific
               debug output:

                   Bit Value    Outputs
                   0x01         Token reading (useful for Lexer debugging)
                   0x02         States information
                   0x04         Driver actions (shifts, reduces, accept...)
                   0x08         Parse Stack dump
                   0x10         Error Recovery tracing

               To get full debugging output, use

                   yydebug => 0x1F

               Debugging output is sent to STDERR; be aware that it can
               produce "huge" outputs.

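               Since the value is a bitfield, individual outputs can be
               combined with a bitwise or. The constant names in this sketch
               are hypothetical (Parse::Yapp does not export them); only the
               numeric values come from the table above.

```perl
use strict;
use warnings;

# Hypothetical names for the documented yydebug bits.
use constant {
    DEBUG_TOKENS   => 0x01,   # token reading
    DEBUG_STATES   => 0x02,   # states information
    DEBUG_ACTIONS  => 0x04,   # driver actions
    DEBUG_STACK    => 0x08,   # parse stack dump
    DEBUG_RECOVERY => 0x10,   # error recovery tracing
};

# Trace only what the lexer returns and what the driver does with it.
my $lexer_and_actions = DEBUG_TOKENS | DEBUG_ACTIONS;

# All bits set, the same as the literal 0x1F.
my $everything = DEBUG_TOKENS | DEBUG_STATES | DEBUG_ACTIONS
               | DEBUG_STACK  | DEBUG_RECOVERY;

printf "lexer+actions: 0x%02X, everything: 0x%02X\n",
    $lexer_and_actions, $everything;
```

               The resulting number is what would be passed as the yydebug
               parameter of YYParse.
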
           "Standalone Parsers"
               By default, the generated parser modules need the Parse::Yapp
               module installed on the system to run. They use the
               Parse::Yapp::Driver module, which can be safely shared between
               parsers in the same script.

               If you'd prefer to have a standalone module generated, use the
               "-s" switch with yapp: this will automagically copy the driver
               code into your module so you can use and distribute it without
               needing the Parse::Yapp module, making it really a "Standalone
               Parser".

               If you do so, please remember to include Parse::Yapp's
               copyright notice in your main module's copyright, so others
               can know about the Parse::Yapp module.

           "Source file line numbers"
               are included in the generated parser module by default, which
               helps to find the guilty line in your source file in case of a
               syntax error. You can disable this feature by compiling your
               grammar with yapp using the "-n" switch.

BUGS AND SUGGESTIONS

       If you find bugs, think of anything that could improve Parse::Yapp or
       have any questions related to it, feel free to contact the author.

AUTHOR

       Francois Desarmenien  <francois@fdesar.net>

SEE ALSO

       yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT

       The Parse::Yapp module and its related modules and shell scripts are
       copyright (c) 1998-2001 Francois Desarmenien, France. All rights
       reserved.

       You may use and distribute them under the terms of either the GNU
       General Public License or the Artistic License, as specified in the
       Perl README file.

       If you use the "standalone parser" option so people don't need to
       install Parse::Yapp on their systems in order to run your software,
       this copyright notice should be included in your software copyright
       too, and the copyright notice in the embedded driver should be left
       untouched.

perl v5.8.8                       2001-02-11                    Parse::Yapp(3)