Parse::Yapp(3)        User Contributed Perl Documentation       Parse::Yapp(3)

NAME

       Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS

         yapp -m MyParser grammar_file.yp

         ...

         use MyParser;

         $parser=MyParser->new();
         $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

         $nberr=$parser->YYNberr();

         $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];

         $data=$parser->YYData->{DATA}[0];

DESCRIPTION

       Parse::Yapp (Yet Another Perl Parser compiler) is a collection of
       modules that let you generate and use yacc-like, thread-safe
       (reentrant) parsers with a Perl object-oriented interface.

       The script yapp is a front end to the Parse::Yapp module and lets you
       easily create a Perl OO parser from an input grammar file.

   The Grammar file
       "Comments"
           Throughout your grammar files, comments are either Perl style,
           introduced by # up to the end of the line, or C style, enclosed
           between /* and */.

       "Tokens and string literals"
           Throughout the grammar files, two kinds of symbols may appear:
           non-terminal symbols, also called left-hand-side symbols, which
           are the names of your rules, and terminal symbols, also called
           tokens.

           Tokens are the symbols your lexer function will feed your parser
           with (see below). They come in two flavours: symbolic tokens and
           string literals.

           Non-terminals and symbolic tokens share the same identifier syntax:

                           [A-Za-z][A-Za-z0-9_]*

           String literals are enclosed in single quotes and can contain
           almost anything. They will be output to your parser file
           double-quoted, so any special character in them is interpreted as
           such. '"', '$' and '@' will be automatically quoted with '\',
           making their writing more natural. On the other hand, if you need
           a single quote inside your literal, just quote it with '\'.

           You cannot have a literal 'error' in your grammar as it would
           confuse the driver with the error token. Use a symbolic token
           instead. If you inadvertently use it, this produces a warning
           telling you that you should have written it error, and it is
           treated as if it were the error token, which is certainly NOT what
           you meant.

       "Grammar file syntax"
           It is very close to yacc syntax (in fact, Parse::Yapp should
           compile a clean yacc grammar without any modification, whereas the
           opposite is not true).

           This file is divided into three sections, separated by "%%":

                   header section
                   %%
                   rules section
                   %%
                   footer section

           The Header Section may optionally contain:
           *   One or more code blocks enclosed inside "%{" and "%}" just like
               in yacc. They may contain any valid Perl code and will be
               copied verbatim at the very beginning of the parser module.
               They are not as useful as they are in yacc, but you can use
               them, for example, for global variable declarations, though you
               will notice later that such global variables can be avoided to
               make a reentrant parser module.

           *   Precedence declarations, introduced by %left, %right and
               %nonassoc specifying associativity, followed by the list of
               tokens or literals having the same precedence and
               associativity. The later a precedence is declared, the higher
               its level. (See the yacc or bison manuals for a full
               explanation of how they work, as they are implemented exactly
               the same way in Parse::Yapp.)

           *   %start followed by a rule's left-hand side, declaring this rule
               to be the starting rule of your grammar. The default, when
               %start is not used, is the first rule in your grammar section.

           *   %token followed by a list of symbols, forcing them to be
               recognized as tokens, generating a syntax error if used in the
               left-hand side of a rule declaration. Note that in
               Parse::Yapp, you don't need to declare tokens as in yacc: any
               symbol not appearing as a left-hand side of a rule is
               considered to be a token. Other yacc declarations or
               constructs such as %type and %union are parsed but (almost)
               ignored.

           *   %expect followed by a number suppresses warnings about the
               number of shift/reduce conflicts when both numbers match, a la
               bison.

           The Rule Section contains your grammar rules:
               A rule is made of a left-hand-side symbol, followed by a ':'
               and one or more right-hand sides separated by '|' and
               terminated by a ';':

                   exp:    exp '+' exp
                       |   exp '-' exp
                       ;

               A right-hand side may be empty:

                   input:  #empty
                       |   input line
                       ;

               (If you have more than one empty rhs, Parse::Yapp will issue a
               warning, as this is usually a mistake, and you will certainly
               have a reduce/reduce conflict.)

               A rhs may be followed by an optional %prec directive, followed
               by a token, giving the rule an explicit precedence (see yacc
               manuals for its precise meaning), and an optional semantic
               action code block (see below).

                   exp:   '-' exp %prec NEG { -$_[2] }
                       |  exp '+' exp       { $_[1] + $_[3] }
                       |  NUM
                       ;

               Note that in Parse::Yapp, a lhs cannot appear more than once as
               a rule name (this differs from yacc).

           "The footer section"
               may contain any valid Perl code and will be appended at the
               very end of your parser module. Here you can write your lexer,
               error report subs and anything relevant to your parser.

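               Putting the three sections together, a minimal grammar file
               might look like the following sketch. The rule names, the NUM
               token and the comments are illustrative assumptions, and the
               lexer and error subs are deliberately left out of the footer.

```yacc
%{
    # Header code: copied verbatim to the top of the generated module.
%}

%left '+' '-'

%%

input:  #empty
    |   input line
    ;

line:   '\n'        { undef }
    |   exp '\n'    { $_[1] }
    ;

exp:    NUM
    |   exp '+' exp { $_[1] + $_[3] }
    |   exp '-' exp { $_[1] - $_[3] }
    ;

%%

# Footer code: appended verbatim to the end of the generated module.
# A lexer sub and an error reporting sub would normally go here.
```

               The exp rule without an action relies on the default action,
               so its value is simply the value the lexer returned for NUM.
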
           "Semantic actions"
               Semantic actions are run every time a reduction occurs in the
               parsing flow and they must return a semantic value.

               They are (usually, but see "In rule actions" below) written at
               the very end of the rhs, enclosed in "{ }", and are copied
               verbatim to your parser file, inside the rules table.

               Be aware that matching braces in Perl is much more difficult
               than in C: inside strings they don't need to match. While in C
               it is very easy to detect the beginning of a string construct,
               or a single character, it is much more difficult in Perl, as
               there are so many ways of writing such literals. So there is no
               check for that today. If you need a brace in a double-quoted
               string, just quote it ("\{" or "\}"). For single-quoted
               strings, you will need to add a comment matching it in the
               right order. Sorry for the inconvenience.

                   {
                       "{ My string block }".
                       "\{ My other string block \}".
                       qq/ My unmatched brace \} /.
                       # Force the match: {
                       q/ for my closing brace } /.
                       q/ My opening brace { /
                       # must be closed: }
                   }

               All of these constructs should work.

               In Parse::Yapp, semantic actions are called like normal Perl
               subs, with their arguments passed in @_, and their return
               values are their semantic values.

               $_[1] to $_[n] are the parameters, just as $1 to $n are in
               yacc, while $_[0] is the parser object itself.

               Having $_[0] be the parser object itself allows you to call
               parser methods. That's how the yacc macros are implemented:

                       yyerrok is done by calling $_[0]->YYErrok
                       YYERROR is done by calling $_[0]->YYError
                       YYACCEPT is done by calling $_[0]->YYAccept
                       YYABORT is done by calling $_[0]->YYAbort

               All those methods explicitly return undef, for convenience.

                   YYRECOVERING is done by calling $_[0]->YYRecovering

               Four methods are useful in an error recovery sub:

                   $_[0]->YYCurtok
                   $_[0]->YYCurval
                   $_[0]->YYExpect
                   $_[0]->YYLexer

               They return, respectively, the current input token that made
               the parse fail, its semantic value (both can also be used to
               modify their values, but know what you are doing! See the
               "Error reporting routine" section for an example), a list
               containing the tokens the parser expected when the failure
               occurred, and a reference to the lexer routine.

               Note that if "$_[0]->YYCurtok" is declared as a %nonassoc
               token, it can be included in the "$_[0]->YYExpect" list
               whenever the input tries to use it in an associative way. This
               is not a bug: the token IS expected to report an error if
               encountered.

               To detect such a case in your error reporting sub, the
               following example should do the trick:

                       grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
                   and do {
                       #Non-associative token used in an associative expression
                   };

               Accessing semantic values on the left of your reducing rule is
               done through the method

                   $_[0]->YYSemval( index )

               where index is an integer. Values 1 .. n return the same
               values as $_[1] .. $_[n], but -n .. 0 return values on the
               left of the rule being reduced. (This is related to $-n .. $0
               .. $n in yacc, but you cannot use $_[0] or $_[-n] constructs in
               Parse::Yapp for obvious reasons.)

               There is also a provision for a user data area in the parser
               object, accessed by the method:

                   $_[0]->YYData

               which returns a reference to an anonymous hash, which lets you
               have all of your parsing data held inside the object (see the
               Calc.yp or ParseYapp.yp files in the distribution for some
               examples). That's how you can make your parser module
               reentrant: all of your module states and variables are held
               inside the parser object.

               Note: unfortunately, method calls in Perl have a lot of
               overhead, and when YYData is used, it may be called a huge
               number of times. If you are not a *real* purist and
               efficiency is your concern, you may directly access the user
               space in the object: $parser->{USER}, which is a reference to
               an anonymous hash, and then benchmark.

               If no action is specified for a rule, the equivalent of a
               default action is run, which returns the first parameter:

                  { $_[1] }

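               To see how the argument positions line up, an action body can
               be exercised as an ordinary anonymous sub. The sketch below
               does this outside of any generated parser: MockParser is a
               hypothetical stand-in that mimics only YYData, and the
               REDUCTIONS key is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object (only YYData is mimicked).
package MockParser;
sub new    { my ($class) = @_; return bless({ USER => {} }, $class) }
sub YYData { return $_[0]->{USER} }

package main;

# Body of:  exp: exp '+' exp { ... }
my $add = sub {
    $_[0]->YYData->{REDUCTIONS}++;   # per-parser state, no globals
    return $_[1] + $_[3];            # $_[2] is the value of the '+' token
};

# Equivalent of the default action { $_[1] }
my $default = sub { $_[1] };

my $parser = MockParser->new;
my $sum    = $add->($parser, 2, '+', 3);
my $first  = $default->($parser, 'NUM-value');

print "sum=$sum first=$first\n";
```

               Keeping the counter in YYData rather than in a file-level
               variable is what lets two parser objects run side by side.
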
           "In rule actions"
               It is also possible to embed semantic actions inside a rule:

                   typedef:    TYPE { $type = $_[1] } identlist { ... } ;

               When the Parse::Yapp parser encounters such an embedded
               action, it modifies the grammar as if you had written
               (although @x-1 is not a legal lhs value):

                   @x-1:   /* empty */ { $type = $_[1] };
                   typedef:    TYPE @x-1 identlist { ... } ;

               where x is a sequential number incremented for each "in rule"
               action, and -1 represents the "dot position" in the rule where
               the action arises.

               In such actions, you can use the $_[1]..$_[n] variables, which
               are the semantic values on the left of your action.

               Be aware that the way Parse::Yapp modifies your grammar because
               of in rule actions can produce, in some cases, spurious
               conflicts that wouldn't happen otherwise.

           "Generating the Parser Module"
               Now that your grammar file is written, you can use yapp on it
               to generate your parser module:

                   yapp -v Calc.yp

               will create two files: Calc.pm, your parser module, and
               Calc.output, a verbose output of your parser rules, conflicts,
               warnings, states and summary.

               What you are missing now is a lexer routine.

           "The Lexer sub"
               is called each time the parser needs to read the next token.

               It is called with only one argument, the parser object itself,
               so you can access its methods, especially the

                   $_[0]->YYData

               data area.

               Its duty is to return the next token and value to the parser.
               They "must" be returned as a list of two values: the first is
               the token known by the parser (symbolic or literal), the second
               is anything you want (usually the content of the token, or the
               literal value), from a simple scalar value to any complex
               reference, as the parsing driver never uses it except to pass
               it to semantic actions:

                   ( 'NUMBER', $num )
               or
                   ( '>=', '>=' )
               or
                   ( 'ARRAY', [ @values ] )

               When the lexer reaches the end of input, it must return the ''
               empty token with an undef value:

                    ( '', undef )

               Note that your lexer should never return 'error' as a token
               value: for the driver, this is the error token used for error
               recovery, and returning it would lead to odd reactions.

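               The contract above can be sketched as a small lexer that
               keeps its input buffer in the YYData area. This is only an
               illustration: MockParser is a hypothetical stand-in so the
               code runs without a generated parser, and the INPUT key and
               NUM token name are assumptions, not part of Parse::Yapp.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser (only YYData is mimicked).
package MockParser;
sub new    { my ($class) = @_; return bless({ USER => {} }, $class) }
sub YYData { return $_[0]->{USER} }

package main;

# Lexer sub: receives the parser object, returns a (token, value) pair.
sub lexer_sub {
    my ($parser) = @_;
    my $data = $parser->YYData;

    # Treat a missing buffer as end of input.
    return ('', undef) unless defined $data->{INPUT};

    for ($data->{INPUT}) {                  # alias $_ to the buffer
        s/^[ \t]+//;                        # skip blanks, keep newlines
        return ('', undef) if $_ eq '';     # end of input
        return ('NUM', $1) if s/^(\d+(?:\.\d+)?)//;   # symbolic token
        return ($1, $1)    if s/^(.)//s;    # any other char: literal token
    }
    return ('', undef);                     # not reached; kept for safety
}

my $parser = MockParser->new;
$parser->YYData->{INPUT} = "12 + 3.5\n";

my @tokens;
while (1) {
    my ($tok, $val) = lexer_sub($parser);
    last if $tok eq '';
    push @tokens, [ $tok, $val ];
}
print scalar(@tokens), " tokens read\n";
```

               Because the buffer lives in the parser object, the same
               lexer_sub can serve several parser objects at once.
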
               Now that you have your lexer written, you may need to output
               meaningful error messages, instead of the default, which is to
               print 'Parse error.' on STDERR.

               So you will need an error reporting sub.

           "Error reporting routine"
               If you want one, write it knowing that it is passed the parser
               object as its parameter, so you can share information with the
               lexer routine quite easily.

               You can also use the "$_[0]->YYErrok" method in it, which will
               resume parsing as if no error had occurred. Of course, since
               the invalid token is still invalid, you're supposed to fix the
               problem yourself.

               The method "$_[0]->YYLexer" may help you, as it returns a
               reference to the lexer routine, and can be called as

                   ($tok,$val)=&{$_[0]->YYLexer}

               to get the next token and semantic value from the input stream.
               To make them current for the parser, use:

                   ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

               and know what you're doing...

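               A read-only error reporting sub might be sketched as follows.
               It only calls the accessors named above; MockParser is a
               hypothetical stand-in that returns canned values, and the
               ERRMSG key and message wording are illustrative choices.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object. It mimics only the
# accessors described in the text, returning canned values.
package MockParser;
sub new {
    my ($class, %args) = @_;
    return bless({ %args, USER => {} }, $class);
}
sub YYCurtok { return $_[0]->{tok} }
sub YYCurval { return $_[0]->{val} }
sub YYExpect { return @{ $_[0]->{expect} } }
sub YYData   { return $_[0]->{USER} }

package main;

# Error sub: receives the parser object as its only argument.
sub error_sub {
    my ($parser) = @_;

    my $tok      = $parser->YYCurtok;
    my $val      = $parser->YYCurval;
    my @expected = $parser->YYExpect;

    my $found = $tok eq '' ? 'end of input' : "$tok ($val)";
    my $msg   = "Syntax error: found $found";
    $msg .= ', expected one of: ' . join(' ', sort @expected)
        if @expected;

    $parser->YYData->{ERRMSG} = $msg;   # keep it where the caller can read it
    print STDERR "$msg\n";
    return;
}

my $parser = MockParser->new(tok => 'NUM', val => '42', expect => [ '+', '-' ]);
error_sub($parser);
```

               Storing the message in YYData as well as printing it lets the
               calling code decide how to present the failure.
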
           "Parsing"
               Now you've got everything you need to do the parsing.

               First, use the parser module:

                   use Calc;

               Then create the parser object:

                   $parser=Calc->new;

               Now, call the YYParse method, telling it where to find the
               lexer and error report subs:

                   $result=$parser->YYParse(yylex => \&Lexer,
                                          yyerror => \&ErrorReport);

               (assuming the Lexer and ErrorReport subs have been written in
               your current package)

               The order in which the parameters appear is unimportant.

               Et voila.

               The YYParse method will do the parse, then return the last
               semantic value returned, or undef if error recovery cannot
               recover.

               If you need to be sure the parse has been successful (in case
               your last returned semantic value is undef), make a call to:

                   $parser->YYNberr()

               which returns the total number of times the error reporting sub
               has been called.

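               The check described above can be wrapped in a small helper.
               This is a sketch under assumptions: parse_checked is a
               hypothetical name, it treats any reported error as a failure
               (even one the parser recovered from), and MockParser merely
               returns canned results so the code runs without Parse::Yapp.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser: YYParse and YYNberr
# simply return canned results.
package MockParser;
sub new     { my ($class, %args) = @_; return bless({ %args }, $class) }
sub YYParse { return $_[0]->{value} }
sub YYNberr { return $_[0]->{nberr} }

package main;

# Returns (ok, value): ok is true only when the error sub was never called.
sub parse_checked {
    my ($parser, %args) = @_;
    my $value = $parser->YYParse(%args);
    my $ok    = $parser->YYNberr() == 0 ? 1 : 0;
    return ($ok, $value);
}

# A successful parse whose last semantic value happens to be undef ...
my ($ok1, $val1) = parse_checked(MockParser->new(value => undef, nberr => 0));
# ... versus a parse that gave up after one reported error.
my ($ok2, $val2) = parse_checked(MockParser->new(value => undef, nberr => 1));

print "first: ",  ($ok1 ? "ok" : "failed"), "\n";
print "second: ", ($ok2 ? "ok" : "failed"), "\n";
```

               With a real parser, the yylex and yyerror arguments would be
               passed through parse_checked unchanged.
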
           "Error Recovery"
               in Parse::Yapp is implemented the same way it is in yacc.

           "Debugging Parser"
               To debug your parser, you can call the YYParse method with a
               debug parameter:

                   $parser->YYParse( ... , yydebug => value, ... )

               where value is a bitfield, each bit representing a specific
               debug output:

                   Bit Value    Outputs
                   0x01         Token reading (useful for Lexer debugging)
                   0x02         States information
                   0x04         Driver actions (shifts, reduces, accept...)
                   0x08         Parse Stack dump
                   0x10         Error Recovery tracing

               To have full debugging output, use

                   yydebug => 0x1F

               Debugging output is sent to STDERR, and be aware that it can
               produce "huge" outputs.

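               Since the value is a bitfield, individual outputs can be
               combined with a bitwise OR. The constant names in this sketch
               are illustrative only; the text above documents just the
               numeric values.

```perl
use strict;
use warnings;

# Illustrative names for the documented bit values.
use constant {
    DBG_TOKENS   => 0x01,   # token reading
    DBG_STATES   => 0x02,   # states information
    DBG_ACTIONS  => 0x04,   # driver actions
    DBG_STACK    => 0x08,   # parse stack dump
    DBG_RECOVERY => 0x10,   # error recovery tracing
};

my $trace      = DBG_TOKENS | DBG_ACTIONS;
my $everything = DBG_TOKENS | DBG_STATES | DBG_ACTIONS
               | DBG_STACK  | DBG_RECOVERY;

printf "trace=0x%02X everything=0x%02X\n", $trace, $everything;

# With a generated parser this would be passed as, for example:
#   $parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub,
#                    yydebug => $trace);
```

               Starting with a narrow mask such as token reading plus driver
               actions keeps the output manageable.
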
           "Standalone Parsers"
               By default, the generated parser modules need the Parse::Yapp
               module installed on the system to run. They use
               Parse::Yapp::Driver, which can be safely shared between parsers
               in the same script.

               If you'd prefer to have a standalone module generated, use the
               "-s" switch with yapp: this will automagically copy the driver
               code into your module so you can use and distribute it without
               needing the Parse::Yapp module, making it really a "Standalone
               Parser".

               If you do so, please remember to include Parse::Yapp's
               copyright notice in your main module copyright, so others can
               know about the Parse::Yapp module.

           "Source file line numbers"
               by default will be included in the generated parser module,
               which will help to find the guilty line in your source file in
               case of a syntax error. You can disable this feature by
               compiling your grammar with yapp using the "-n" switch.

BUGS AND SUGGESTIONS

       If you find bugs, think of anything that could improve Parse::Yapp or
       have any questions related to it, feel free to contact the author.

AUTHOR

       Francois Desarmenien  <francois@fdesar.net>

SEE ALSO

       yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT

       The Parse::Yapp module and its related modules and shell scripts are
       copyright (c) 1998-2001 Francois Desarmenien, France. All rights
       reserved.

       You may use and distribute them under the terms of either the GNU
       General Public License or the Artistic License, as specified in the
       Perl README file.

       If you use the "standalone parser" option so people don't need to
       install Parse::Yapp on their systems in order to run your software,
       this copyright notice should be included in your software copyright
       too, and the copyright notice in the embedded driver should be left
       untouched.

perl v5.12.0                      2001-02-11                    Parse::Yapp(3)