1YACC(P) POSIX Programmer's Manual YACC(P)
2
3
4
6 yacc - yet another compiler compiler (DEVELOPMENT)
7
9 yacc [-dltv][-b file_prefix][-p sym_prefix] grammar
10
12 The yacc utility shall read a description of a context-free grammar in
13 grammar and write C source code, conforming to the ISO C standard, to a
14 code file, and optionally header information into a header file, in the
15 current directory. The C code shall define a function and related rou‐
16 tines and macros for an automaton that executes a parsing algorithm
17 meeting the requirements in Algorithms .
18
19 The form and meaning of the grammar are described in the EXTENDED
20 DESCRIPTION section.
21
22 The C source code and header file shall be produced in a form suitable
23 as input for the C compiler (see c99 ).
24
26 The yacc utility shall conform to the Base Definitions volume of
27 IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
28
29 The following options shall be supported:
30
31 -b file_prefix
32 Use file_prefix instead of y as the prefix for all output file‐
33 names. The code file y.tab.c, the header file y.tab.h (created
34 when -d is specified), and the description file y.output (cre‐
35 ated when -v is specified), shall be changed to file_prefix
36 .tab.c, file_prefix .tab.h, and file_prefix .output, respec‐
37 tively.
38
39 -d Write the header file; by default only the code file is written.
40 The #define statements associate the token codes assigned by
41 yacc with the user-declared token names. This allows source
42 files other than y.tab.c to access the token codes.
43
44 -l Produce a code file that does not contain any #line constructs.
45 If this option is not present, it is unspecified whether the
46 code file or header file contains #line directives. This should
47 only be used after the grammar and the associated actions are
48 fully debugged.
49
50 -p sym_prefix
51
52 Use sym_prefix instead of yy as the prefix for all external
53 names produced by yacc. The names affected shall include the
54 functions yyparse(), yylex(), and yyerror(), and the variables
55 yylval, yychar, and yydebug. (In the remainder of this section,
56 the six symbols cited are referenced using their default names
57 only as a notational convenience.) Local names may also be
58 affected by the -p option; however, the -p option shall not
59 affect #define symbols generated by yacc.
60
61 -t Modify conditional compilation directives to permit compilation
62 of debugging code in the code file. Runtime debugging statements
63 shall always be contained in the code file, but by default con‐
64 ditional compilation directives prevent their compilation.
65
66 -v Write a file containing a description of the parser and a report
67 of conflicts generated by ambiguities in the grammar.
68
69
71 The following operand is required:
72
73 grammar
74 A pathname of a file containing instructions, hereafter called
75 grammar, for which a parser is to be created. The format for the
76 grammar is described in the EXTENDED DESCRIPTION section.
77
78
80 Not used.
81
83 The file grammar shall be a text file formatted as specified in the
84 EXTENDED DESCRIPTION section.
85
87 The following environment variables shall affect the execution of yacc:
88
89 LANG Provide a default value for the internationalization variables
90 that are unset or null. (See the Base Definitions volume of
91 IEEE Std 1003.1-2001, Section 8.2, Internationalization Vari‐
92 ables for the precedence of internationalization variables used
93 to determine the values of locale categories.)
94
95 LC_ALL If set to a non-empty string value, override the values of all
96 the other internationalization variables.
97
98 LC_CTYPE
99 Determine the locale for the interpretation of sequences of
100 bytes of text data as characters (for example, single-byte as
101 opposed to multi-byte characters in arguments and input files).
102
103 LC_MESSAGES
104 Determine the locale that should be used to affect the format
105 and contents of diagnostic messages written to standard error.
106
107 NLSPATH
108 Determine the location of message catalogs for the processing of
109 LC_MESSAGES .
110
111
112 The LANG and LC_* variables affect the execution of the yacc utility as
113 stated. The main() function defined in Yacc Library shall call:
114
115
116 setlocale(LC_ALL, "")
117
118 and thus the program generated by yacc shall also be affected by the
119 contents of these variables at runtime.
120
122 Default.
123
125 Not used.
126
128 If shift/reduce or reduce/reduce conflicts are detected in grammar,
129 yacc shall write a report of those conflicts to the standard error in
130 an unspecified format.
131
132 Standard error shall also be used for diagnostic messages.
133
135 The code file, the header file, and the description file shall be text
136 files. All are described in the following sections.
137
138 Code File
139 This file shall contain the C source code for the yyparse() function.
140 It shall contain code for the various semantic actions with macro sub‐
141 stitution performed on them as described in the EXTENDED DESCRIPTION
142 section. It also shall contain a copy of the #define statements in the
143 header file. If a %union declaration is used, the declaration for
144 YYSTYPE shall also be included in this file.
145
146 Header File
147 The header file shall contain #define statements that associate the
148 token numbers with the token names. This allows source files other than
149 the code file to access the token codes. If a %union declaration is
150 used, the declaration for YYSTYPE and an extern YYSTYPE yylval declara‐
151 tion shall also be included in this file.
152
153 Description File
154 The description file shall be a text file containing a description of
155 the state machine corresponding to the parser, using an unspecified
156 format. Limits for internal tables (see Limits ) shall also be
157 reported, in an implementation-defined manner. (Some implementations
158 may use dynamic allocation techniques and have no specific limit values
159 to report.)
160
162 The yacc command accepts a language that is used to define a grammar
163 for a target language to be parsed by the tables and code generated by
164 yacc. The language accepted by yacc as a grammar for the target lan‐
165 guage is described below using the yacc input language itself.
166
167 The input grammar includes rules describing the input structure of the
168 target language and code to be invoked when these rules are recognized
169 to provide the associated semantic action. The code to be executed
170 shall appear as bodies of text that are intended to be C-language code.
171 The C-language inclusions are presumed to form a correct function when
172 processed by yacc into its output files. The code included in this way
173 shall be executed during the recognition of the target language.
174
175 Given a grammar, the yacc utility generates the files described in the
176 OUTPUT FILES section. The code file can be compiled and linked using
177 c99. If the declaration and programs sections of the grammar file did
178 not include definitions of main(), yylex(), and yyerror(), the compiled
179 output requires linking with externally supplied versions of those
180 functions. Default versions of main() and yyerror() are supplied in the
181 yacc library and can be linked in by using the -l y operand to c99.
182 The yacc library interfaces need not support interfaces with other than
183 the default yy symbol prefix. The application provides the lexical ana‐
184 lyzer function, yylex(); the lex utility is specifically designed to
185 generate such a routine.
186
187 Input Language
188 The application shall ensure that every specification file consists of
189 three sections in order: declarations, grammar rules, and programs,
190 separated by double percent signs ( "%%" ). The declarations and pro‐
191 grams sections can be empty. If the latter is empty, the preceding "%%"
192 mark separating it from the rules section can be omitted.
193
194 The input is free form text following the structure of the grammar
195 defined below.
196
197 Lexical Structure of the Grammar
198 The <blank>s, <newline>s, and <form-feed>s shall be ignored, except
199 that the application shall ensure that they do not appear in names or
200 multi-character reserved symbols. Comments shall be enclosed in
201 "/* ... */" , and can appear wherever a name is valid.
202
203 Names are of arbitrary length, made up of letters, periods ( '.' ),
204 underscores ( '_' ), and non-initial digits. Uppercase and lowercase
205 letters are distinct. Conforming applications shall not use names
206 beginning in yy or YY since the yacc parser uses such names. Many of
207 the names appear in the final output of yacc, and thus they should be
208 chosen to conform with any additional rules created by the C compiler
209 to be used. In particular they appear in #define statements.
210
211 A literal shall consist of a single character enclosed in single-quotes
212 ( '" ). All of the escape sequences supported for character constants
213 by the ISO C standard shall be supported by yacc.
214
215 The relationship with the lexical analyzer is discussed in detail
216 below.
217
218 The application shall ensure that the NUL character is not used in
219 grammar rules or literals.
220
221 Declarations Section
222 The declarations section is used to define the symbols used to define
223 the target language and their relationship with each other. In particu‐
224 lar, much of the additional information required to resolve ambiguities
225 in the context-free grammar for the target language is provided here.
226
227 Usually yacc assigns the relationship between the symbolic names it
228 generates and their underlying numeric value. The declarations section
229 makes it possible to control the assignment of these values.
230
231 It is also possible to keep semantic information associated with the
232 tokens currently on the parse stack in a user-defined C-language union,
233 if the members of the union are associated with the various names in
234 the grammar. The declarations section provides for this as well.
235
236 The first group of declarators below all take a list of names as argu‐
237 ments. That list can optionally be preceded by the name of a C union
238 member (called a tag below) appearing within '<' and '>' . (As an
239 exception to the typographical conventions of the rest of this volume
240 of IEEE Std 1003.1-2001, in this case <tag> does not represent a
241 metavariable, but the literal angle bracket characters surrounding a
242 symbol.) The use of tag specifies that the tokens named on this line
243 shall be of the same C type as the union member referenced by tag. This
244 is discussed in more detail below.
245
246 For lists used to define tokens, the first appearance of a given token
247 can be followed by a positive integer (as a string of decimal digits).
248 If this is done, the underlying value assigned to it for lexical pur‐
249 poses shall be taken to be that number.
250
251 The following declares name to be a token:
252
253
254 %token [<tag>] name [number][name [number]]...
255
256 If tag is present, the C type for all tokens on this line shall be
257 declared to be the type referenced by tag. If a positive integer, num‐
258 ber, follows a name, that value shall be assigned to the token.
259
260 The following declares name to be a token, and assigns precedence to
261 it:
262
263
264 %left [<tag>] name [number][name [number]]...
265 %right [<tag>] name [number][name [number]]...
266
267 One or more lines, each beginning with one of these symbols, can appear
268 in this section. All tokens on the same line have the same precedence
269 level and associativity; the lines are in order of increasing prece‐
270 dence or binding strength. %left denotes that the operators on that
271 line are left associative, and %right similarly denotes right associa‐
272 tive operators. If tag is present, it shall declare a C type for names
273 as described for %token.
274
275 The following declares name to be a token, and indicates that this can‐
276 not be used associatively:
277
278
279 %nonassoc [<tag>] name [number][name [number]]...
280
281 If the parser encounters associative use of this token it reports an
282 error. If tag is present, it shall declare a C type for names as
283 described for %token.
284
285 The following declares that union member names are non-terminals, and
286 thus it is required to have a tag field at its beginning:
287
288
289 %type <tag> name...
290
291 Because it deals with non-terminals only, assigning a token number or
292 using a literal is also prohibited. If this construct is present, yacc
293 shall perform type checking; if this construct is not present, the
294 parse stack shall hold only the int type.
295
296 Every name used in grammar not defined by a %token, %left, %right, or
297 %nonassoc declaration is assumed to represent a non-terminal symbol.
298 The yacc utility shall report an error for any non-terminal symbol that
299 does not appear on the left side of at least one grammar rule.
300
301 Once the type, precedence, or token number of a name is specified, it
302 shall not be changed. If the first declaration of a token does not
303 assign a token number, yacc shall assign a token number. Once this
304 assignment is made, the token number shall not be changed by explicit
305 assignment.
306
307 The following declarators do not follow the previous pattern.
308
309 The following declares the non-terminal name to be the start symbol,
310 which represents the largest, most general structure described by the
311 grammar rules:
312
313
314 %start name
315
316 By default, it is the left-hand side of the first grammar rule; this
317 default can be overridden with this declaration.
318
319 The following declares the yacc value stack to be a union of the vari‐
320 ous types of values desired:
321
322
323 %union { body of union (in C) }
324
325 By default, the values returned by actions (see below) and the lexical
326 analyzer shall be of type int. The yacc utility keeps track of types,
327 and it shall insert corresponding union member names in order to per‐
328 form strict type checking of the resulting parser.
329
330 Alternatively, given that at least one <tag> construct is used, the
331 union can be declared in a header file (which shall be included in the
332 declarations section by using a #include construct within %{ and %}),
333 and a typedef used to define the symbol YYSTYPE to represent this
334 union. The effect of %union is to provide the declaration of YYSTYPE
335 directly from the yacc input.
336
337 C-language declarations and definitions can appear in the declarations
338 section, enclosed by the following marks:
339
340
341 %{ ... %}
342
343 These statements shall be copied into the code file, and have global
344 scope within it so that they can be used in the rules and program sec‐
345 tions.
346
347 The application shall ensure that the declarations section is termi‐
348 nated by the token %%.
349
350 Grammar Rules in yacc
351 The rules section defines the context-free grammar to be accepted by
352 the function yacc generates, and associates with those rules C-language
353 actions and additional precedence information. The grammar is
354 described below, and a formal definition follows.
355
356 The rules section is comprised of one or more grammar rules. A grammar
357 rule has the form:
358
359
360 A : BODY ;
361
362 The symbol A represents a non-terminal name, and BODY represents a
363 sequence of zero or more names, literals, and semantic actions that can
364 then be followed by optional precedence rules. Only the names and lit‐
365 erals participate in the formation of the grammar; the semantic actions
366 and precedence rules are used in other ways. The colon and the semi‐
367 colon are yacc punctuation. If there are several successive grammar
368 rules with the same left-hand side, the vertical bar '|' can be used to
369 avoid rewriting the left-hand side; in this case the semicolon appears
370 only after the last rule. The BODY part can be empty (or empty of names
371 and literals) to indicate that the non-terminal symbol matches the
372 empty string.
373
374 The yacc utility assigns a unique number to each rule. Rules using the
375 vertical bar notation are distinct rules. The number assigned to the
376 rule appears in the description file.
377
378 The elements comprising a BODY are:
379
380 name, literal
381 These form the rules of the grammar: name is either a token or a
382 non-terminal; literal stands for itself (less the lexically
383 required quotation marks).
384
385 semantic action
386
387 With each grammar rule, the user can associate actions to be
388 performed each time the rule is recognized in the input process.
389 (Note that the word "action" can also refer to the actions of
390 the parser-shift, reduce, and so on.)
391
392 These actions can return values and can obtain the values returned by
393 previous actions. These values are kept in objects of type YYSTYPE (see
394 %union). The result value of the action shall be kept on the parse
395 stack with the left-hand side of the rule, to be accessed by other
396 reductions as part of their right-hand side. By using the <tag> infor‐
397 mation provided in the declarations section, the code generated by yacc
398 can be strictly type checked and contain arbitrary information. In
399 addition, the lexical analyzer can provide the same kinds of values for
400 tokens, if desired.
401
402 An action is an arbitrary C statement and as such can do input or out‐
403 put, call subprograms, and alter external variables. An action is one
404 or more C statements enclosed in curly braces '{' and '}' .
405
406 Certain pseudo-variables can be used in the action. These are macros
407 for access to data structures known internally to yacc.
408
409 $$
410 The value of the action can be set by assigning it to $$. If
411 type checking is enabled and the type of the value to be
412 assigned cannot be determined, a diagnostic message may be gen‐
413 erated.
414
415 $number
416 This refers to the value returned by the component specified by
417 the token number in the right side of a rule, reading from left
418 to right; number can be zero or negative. If number is zero or
419 negative, it refers to the data associated with the name on the
420 parser's stack preceding the leftmost symbol of the current
421 rule. (That is, "$0" refers to the name immediately preceding
422 the leftmost name in the current rule to be found on the
423 parser's stack and "$-1" refers to the symbol to its left.) If
424 number refers to an element past the current point in the rule,
425 or beyond the bottom of the stack, the result is undefined. If
426 type checking is enabled and the type of the value to be
427 assigned cannot be determined, a diagnostic message may be gen‐
428 erated.
429
430 $<tag>number
431
432 These correspond exactly to the corresponding symbols without
433 the tag inclusion, but allow for strict type checking (and pre‐
434 clude unwanted type conversions). The effect is that the macro
435 is expanded to use tag to select an element from the YYSTYPE
436 union (using dataname.tag). This is particularly useful if num‐
437 ber is not positive.
438
439 $<tag>$
440 This imposes on the reference the type of the union member ref‐
441 erenced by tag. This construction is applicable when a reference
442 to a left context value occurs in the grammar, and provides yacc
443 with a means for selecting a type.
444
445
446 Actions can occur anywhere in a rule (not just at the end); an action
447 can access values returned by actions to its left, and in turn the
448 value it returns can be accessed by actions to its right. An action
449 appearing in the middle of a rule shall be equivalent to replacing the
450 action with a new non-terminal symbol and adding an empty rule with
451 that non-terminal symbol on the left-hand side. The semantic action
452 associated with the new rule shall be equivalent to the original
453 action. The use of actions within rules might introduce conflicts that
454 would not otherwise exist.
455
456 By default, the value of a rule shall be the value of the first element
457 in it. If the first element does not have a type (particularly in the
458 case of a literal) and type checking is turned on by %type, an error
459 message shall result.
460
461 precedence
462 The keyword %prec can be used to change the precedence level
463 associated with a particular grammar rule. Examples of this are
464 in cases where a unary and binary operator have the same sym‐
465 bolic representation, but need to be given different prece‐
466 dences, or where the handling of an ambiguous if-else construc‐
467 tion is necessary. The reserved symbol %prec can appear immedi‐
468 ately after the body of the grammar rule and can be followed by
469 a token name or a literal. It shall cause the precedence of the
470 grammar rule to become that of the following token name or lit‐
471 eral. The action for the rule as a whole can follow %prec.
472
473
474 If a program section follows, the application shall ensure that the
475 grammar rules are terminated by %%.
476
477 Programs Section
478 The programs section can include the definition of the lexical analyzer
479 yylex(), and any other functions; for example, those used in the
480 actions specified in the grammar rules. It is unspecified whether the
481 programs section precedes or follows the semantic actions in the output
482 file; therefore, if the application contains any macro definitions and
483 declarations intended to apply to the code in the semantic actions, it
484 shall place them within "%{ ... %}" in the declarations section.
485
486 Input Grammar
487 The following input to yacc yields a parser for the input to yacc. This
488 formal syntax takes precedence over the preceding text syntax descrip‐
489 tion.
490
491 The lexical structure is defined less precisely; Lexical Structure of
492 the Grammar defines most terms. The correspondence between the previous
493 terms and the tokens below is as follows.
494
495 IDENTIFIER
496 This corresponds to the concept of name, given previously. It
497 also includes literals as defined previously.
498
499 C_IDENTIFIER
500 This is a name, and additionally it is known to be followed by a
501 colon. A literal cannot yield this token.
502
503 NUMBER A string of digits (a non-negative decimal integer).
504
505 TYPE, LEFT, MARK, LCURL, RCURL
506
507 These correspond directly to %type, %left, %%, %{, and %}.
508
509 { ... }
510 This indicates C-language source code, with the possible inclu‐
511 sion of '$' macros as discussed previously.
512
513
514
515 /* Grammar for the input to yacc. */
516 /* Basic entries. */
517 /* The following are recognized by the lexical analyzer. */
518
519
520 %token IDENTIFIER /* Includes identifiers and literals */
521 %token C_IDENTIFIER /* identifier (but not literal)
522 followed by a :. */
523 %token NUMBER /* [0-9][0-9]* */
524
525
526 /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
527
528
529 %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
530
531
532 %token MARK /* The %% mark. */
533 %token LCURL /* The %{ mark. */
534 %token RCURL /* The %} mark. */
535
536
537 /* 8-bit character literals stand for themselves; */
538 /* tokens have to be defined for multi-byte characters. */
539
540
541 %start spec
542
543
544 %%
545
546
547 spec : defs MARK rules tail
548 ;
549 tail : MARK
550 {
551 /* In this action, set up the rest of the file. */
552 }
553 | /* Empty; the second MARK is optional. */
554 ;
555 defs : /* Empty. */
556 | defs def
557 ;
558 def : START IDENTIFIER
559 | UNION
560 {
561 /* Copy union definition to output. */
562 }
563 | LCURL
564 {
565 /* Copy C code to output file. */
566 }
567 RCURL
568 | rword tag nlist
569 ;
570 rword : TOKEN
571 | LEFT
572 | RIGHT
573 | NONASSOC
574 | TYPE
575 ;
576 tag : /* Empty: union tag ID optional. */
577 | '<' IDENTIFIER '>'
578 ;
579 nlist : nmno
580 | nlist nmno
581 ;
582 nmno : IDENTIFIER /* Note: literal invalid with % type. */
583 | IDENTIFIER NUMBER /* Note: invalid with % type. */
584 ;
585
586
587 /* Rule section */
588
589
590 rules : C_IDENTIFIER rbody prec
591 | rules rule
592 ;
593 rule : C_IDENTIFIER rbody prec
594 | '|' rbody prec
595 ;
596 rbody : /* empty */
597 | rbody IDENTIFIER
598 | rbody act
599 ;
600 act : '{'
601 {
602 /* Copy action, translate $$, and so on. */
603 }
604 '}'
605 ;
606 prec : /* Empty */
607 | PREC IDENTIFIER
608 | PREC IDENTIFIER act
609 | prec ';'
610 ;
611
612 Conflicts
613 The parser produced for an input grammar may contain states in which
614 conflicts occur. The conflicts occur because the grammar is not
615 LALR(1). An ambiguous grammar always contains at least one LALR(1) con‐
616 flict. The yacc utility shall resolve all conflicts, using either
617 default rules or user-specified precedence rules.
618
619 Conflicts are either shift/reduce conflicts or reduce/reduce conflicts.
620 A shift/reduce conflict is where, for a given state and lookahead sym‐
621 bol, both a shift action and a reduce action are possible. A
622 reduce/reduce conflict is where, for a given state and lookahead sym‐
623 bol, reductions by two different rules are possible.
624
625 The rules below describe how to specify what actions to take when a
626 conflict occurs. Not all shift/reduce conflicts can be successfully
627 resolved this way because the conflict may be due to something other
628 than ambiguity, so incautious use of these facilities can cause the
629 language accepted by the parser to be much different from that which
630 was intended. The description file shall contain sufficient information
631 to understand the cause of the conflict. Where ambiguity is the reason
632 either the default or explicit rules should be adequate to produce a
633 working parser.
634
635 The declared precedences and associativities (see Declarations Section
636 ) are used to resolve parsing conflicts as follows:
637
638 1. A precedence and associativity is associated with each grammar
639 rule; it is the precedence and associativity of the last token or
640 literal in the body of the rule. If the %prec keyword is used, it
641 overrides this default. Some grammar rules might not have both
642 precedence and associativity.
643
644 2. If there is a shift/reduce conflict, and both the grammar rule and
645 the input symbol have precedence and associativity associated with
646 them, then the conflict is resolved in favor of the action (shift
647 or reduce) associated with the higher precedence. If the prece‐
648 dences are the same, then the associativity is used; left associa‐
649 tive implies reduce, right associative implies shift, and non-asso‐
650 ciative implies an error in the string being parsed.
651
652 3. When there is a shift/reduce conflict that cannot be resolved by
653 rule 2, the shift is done. Conflicts resolved this way are counted
654 in the diagnostic output described in Error Handling .
655
656 4. When there is a reduce/reduce conflict, a reduction is done by the
657 grammar rule that occurs earlier in the input sequence. Conflicts
658 resolved this way are counted in the diagnostic output described in
659 Error Handling .
660
661 Conflicts resolved by precedence or associativity shall not be counted
662 in the shift/reduce and reduce/reduce conflicts reported by yacc on
663 either standard error or in the description file.
664
665 Error Handling
666 The token error shall be reserved for error handling. The name error
667 can be used in grammar rules. It indicates places where the parser can
668 recover from a syntax error. The default value of error shall be 256.
669 Its value can be changed using a %token declaration. The lexical ana‐
670 lyzer should not return the value of error.
671
672 The parser shall detect a syntax error when it is in a state where the
673 action associated with the lookahead symbol is error. A semantic action
674 can cause the parser to initiate error handling by executing the macro
675 YYERROR. When YYERROR is executed, the semantic action passes control
676 back to the parser. YYERROR cannot be used outside of semantic actions.
677
678 When the parser detects a syntax error, it normally calls yyerror()
679 with the character string "syntax error" as its argument. The call
680 shall not be made if the parser is still recovering from a previous
681 error when the error is detected. The parser is considered to be recov‐
682 ering from a previous error until the parser has shifted over at least
683 three normal input symbols since the last error was detected or a
684 semantic action has executed the macro yyerrok. The parser shall not
685 call yyerror() when YYERROR is executed.
686
687 The macro function YYRECOVERING shall return 1 if a syntax error has
688 been detected and the parser has not yet fully recovered from it. Oth‐
689 erwise, zero shall be returned.
690
691 When a syntax error is detected by the parser, the parser shall check
692 if a previous syntax error has been detected. If a previous error was
693 detected, and if no normal input symbols have been shifted since the
694 preceding error was detected, the parser checks if the lookahead symbol
695 is an endmarker (see Interface to the Lexical Analyzer ). If it is, the
696 parser shall return with a non-zero value. Otherwise, the lookahead
697 symbol shall be discarded and normal parsing shall resume.
698
699 When YYERROR is executed or when the parser detects a syntax error and
700 no previous error has been detected, or at least one normal input sym‐
701 bol has been shifted since the previous error was detected, the parser
702 shall pop back one state at a time until the parse stack is empty or
703 the current state allows a shift over error. If the parser empties the
704 parse stack, it shall return with a non-zero value. Otherwise, it shall
705 shift over error and then resume normal parsing. If the parser reads a
706 lookahead symbol before the error was detected, that symbol shall still
707 be the lookahead symbol when parsing is resumed.
708
709 The macro yyerrok in a semantic action shall cause the parser to act as
710 if it has fully recovered from any previous errors. The macro yyclearin
711 shall cause the parser to discard the current lookahead token. If the
712 current lookahead token has not yet been read, yyclearin shall have no
713 effect.
714
715 The macro YYACCEPT shall cause the parser to return with the value
716 zero. The macro YYABORT shall cause the parser to return with a non-
717 zero value.
718
719 Interface to the Lexical Analyzer
720 The yylex() function is an integer-valued function that returns a token
721 number representing the kind of token read. If there is a value associ‐
722 ated with the token returned by yylex() (see the discussion of tag
723 above), it shall be assigned to the external variable yylval.
724
725 If the parser and yylex() do not agree on these token numbers, reliable
726 communication between them cannot occur. For (single-byte character)
727 literals, the token is simply the numeric value of the character in the
728 current character set. The numbers for other tokens can either be cho‐
729 sen by yacc, or chosen by the user. In either case, the #define con‐
730 struct of C is used to allow yylex() to return these numbers symboli‐
731 cally. The #define statements are put into the code file, and the
732 header file if that file is requested. The set of characters permitted
733 by yacc in an identifier is larger than that permitted by C. Token
734 names found to contain such characters shall not be included in the
735 #define declarations.
736
737 If the token numbers are chosen by yacc, the tokens other than literals
738 shall be assigned numbers greater than 256, although no order is
739 implied. A token can be explicitly assigned a number by following its
740 first appearance in the declarations section with a number. Names and
741 literals not defined this way retain their default definition. All
742 token numbers assigned by yacc shall be unique and distinct from the
743 token numbers used for literals and user-assigned tokens. If duplicate
744 token numbers cause conflicts in parser generation, yacc shall report
745 an error; otherwise, it is unspecified whether the token assignment is
746 accepted or an error is reported.
747
748 The end of the input is marked by a special token called the endmarker,
749 which has a token number that is zero or negative. (These values are
750 invalid for any other token.) All lexical analyzers shall return zero
751 or negative as a token number upon reaching the end of their input. If
752 the tokens up to, but excluding, the endmarker form a structure that
753 matches the start symbol, the parser shall accept the input. If the
754 endmarker is seen in any other context, it shall be considered an
755 error.
756
757 Completing the Program
758 In addition to yyparse() and yylex(), the functions yyerror() and
759 main() are required to make a complete program. The application can
760 supply main() and yyerror(), or those routines can be obtained from the
761 yacc library.
762
763 Yacc Library
764 The following functions shall appear only in the yacc library accessi‐
765 ble through the -l y operand to c99; they can therefore be redefined by
766 a conforming application:
767
768 int main(void)
769
770 This function shall call yyparse() and exit with an unspecified
771 value. Other actions within this function are unspecified.
772
773 int yyerror(const char *s)
774
775 This function shall write the NUL-terminated argument to stan‐
776 dard error, followed by a <newline>.
777
778
779 The order of the -l y and -l l operands given to c99 is significant;
780 the application shall either provide its own main() function or ensure
781 that -l y precedes -l l.
782
783 Debugging the Parser
784 The parser generated by yacc shall have diagnostic facilities in it
785 that can be optionally enabled at either compile time or at runtime (if
786 enabled at compile time). The compilation of the runtime debugging code
787 is under the control of YYDEBUG, a preprocessor symbol. If YYDEBUG has
788 a non-zero value, the debugging code shall be included. If its value is
789 zero, the code shall not be included.
790
791 In parsers where the debugging code has been included, the external int
792 yydebug can be used to turn debugging on (with a non-zero value) and
793 off (zero value) at runtime. The initial value of yydebug shall be
794 zero.
795
796 When -t is specified, the code file shall be built such that, if YYDE‐
797 BUG is not already defined at compilation time (using the c99 -D YYDE‐
798 BUG option, for example), YYDEBUG shall be set explicitly to 1. When -t
799 is not specified, the code file shall be built such that, if YYDEBUG is
800 not already defined, it shall be set explicitly to zero.
801
802 The format of the debugging output is unspecified but includes at least
803 enough information to determine the shift and reduce actions, and the
804 input symbols. It also provides information about error recovery.
805
806 Algorithms
807 The parser constructed by yacc implements an LALR(1) parsing algorithm
808 as documented in the literature. It is unspecified whether the parser
809 is table-driven or direct-coded.
810
811 A parser generated by yacc shall never request an input symbol from
812 yylex() while in a state where the only actions other than the error
813 action are reductions by a single rule.
814
815 The literature of parsing theory defines these concepts.
816
817 Limits
818 The yacc utility may have several internal tables. The minimum maximums
819 for these tables are shown in the following table. The exact meaning of
820 these values is implementation-defined. The implementation shall
821 define the relationship between these values and between them and any
822 error messages that the implementation may generate should it run out
823 of space for any internal structure. An implementation may combine
824 groups of these resources into a single pool as long as the total
825 available to the user does not fall below the sum of the sizes speci‐
826 fied by this section.
827
828 Table: Internal Limits in yacc
829
830
831 Minimum
832 Limit Maximum Description
833 {NTERMS} 126 Number of tokens.
834 {NNONTERM} 200 Number of non-terminals.
835 {NPROD} 300 Number of rules.
836 {NSTATES} 600 Number of states.
837 {MEMSIZE} 5200 Length of rules. The total length, in
838 names (tokens and non-terminals), of all
839 the rules of the grammar. The left-hand
840 side is counted for each rule, even if
841 it is not explicitly repeated, as speci‐
842 fied in Grammar Rules in yacc .
843 {ACTSIZE} 4000 Number of actions. "Actions" here (and
844 in the description file) refer to parser
845 actions (shift, reduce, and so on) not
846 to semantic actions defined in Grammar
847 Rules in yacc .
848
850 The following exit values shall be returned:
851
852 0 Successful completion.
853
854 >0 An error occurred.
855
856
858 If any errors are encountered, the run is aborted and yacc exits with a
859 non-zero status. Partial code files and header files may be produced.
860 The summary information in the description file shall always be pro‐
861 duced if the -v flag is present.
862
863 The following sections are informative.
864
866 Historical implementations experience name conflicts on the names
867 yacc.tmp, yacc.acts, yacc.debug, y.tab.c, y.tab.h, and y.output if more
868 than one copy of yacc is running in a single directory at one time. The
869 -b option was added to overcome this problem. The related problem of
870 allowing multiple yacc parsers to be placed in the same file was
871 addressed by adding a -p option to override the previously hard-coded
872 yy variable prefix.
873
874 The description of the -p option specifies the minimal set of function
875 and variable names that cause conflict when multiple parsers are linked
876 together. YYSTYPE does not need to be changed. Instead, the programmer
877 can use -b to give the header files for different parsers different
878 names, and then the file with the yylex() for a given parser can
879 include the header for that parser. Names such as yyclearerr do not
880 need to be changed because they are used only in the actions; they do
881 not have linkage. It is possible that an implementation has other
882 names, either internal ones for implementing things such as yyclearerr,
883 or providing non-standard features that it wants to change with -p.
884
885 Unary operators that are the same token as a binary operator in general
886 need their precedence adjusted. This is handled by the %prec advisory
887 symbol associated with the particular grammar rule defining that unary
888 operator. (See Grammar Rules in yacc .) Applications are not required
889 to use this operator for unary operators, but the grammars that do not
890 require it are rare.
891
893 Access to the yacc library is obtained with library search operands to
894 c99. To use the yacc library main():
895
896
897 c99 y.tab.c -l y
898
899 Both the lex library and the yacc library contain main(). To access
900 the yacc main():
901
902
903 c99 y.tab.c lex.yy.c -l y -l l
904
905 This ensures that the yacc library is searched first, so that its
906 main() is used.
907
908 The historical yacc libraries have contained two simple functions that
909 are normally coded by the application programmer. These functions are
910 similar to the following code:
911
912
913 #include <locale.h>
914 int main(void)
915 {
916 extern int yyparse();
917
918
919 setlocale(LC_ALL, "");
920
921
922 /* If the following parser is one created by lex, the
923 application must be careful to ensure that LC_CTYPE
924 and LC_COLLATE are set to the POSIX locale. */
925 (void) yyparse();
926 return (0);
927 }
928
929
930 #include <stdio.h>
931
932
933 int yyerror(const char *msg)
934 {
935 (void) fprintf(stderr, "%s\n", msg);
936 return (0);
937 }
938
940 The references in may be helpful in constructing the parser generator.
941 The referenced DeRemer and Pennello article (along with the works it
942 references) describes a technique to generate parsers that conform to
943 this volume of IEEE Std 1003.1-2001. Work in this area continues to be
944 done, so implementors should consult current literature before doing
945 any new implementations. The original Knuth article is the theoretical
946 basis for this kind of parser, but the tables it generates are imprac‐
947 tically large for reasonable grammars and should not be used. The
948 "equivalent to" wording is intentional to assure that the best tables
949 that are LALR(1) can be generated.
950
951 There has been confusion between the class of grammars, the algorithms
952 needed to generate parsers, and the algorithms needed to parse the lan‐
953 guages. They are all reasonably orthogonal. In particular, a parser
954 generator that accepts the full range of LR(1) grammars need not gener‐
955 ate a table any more complex than one that accepts SLR(1) (a relatively
956 weak class of LR grammars) for a grammar that happens to be SLR(1).
957 Such an implementation need not recognize the case, either; table com‐
958 pression can yield the SLR(1) table (or one even smaller than that)
959 without recognizing that the grammar is SLR(1). The speed of an LR(1)
960 parser for any class is dependent more upon the table representation
961 and compression (or the code generation if a direct parser is gener‐
962 ated) than upon the class of grammar that the table generator handles.
963
964 The speed of the parser generator is somewhat dependent upon the class
965 of grammar it handles. However, the original Knuth article algorithms
966 for constructing LR parsers were judged by its author to be impracti‐
967 cally slow at that time. Although full LR is more complex than LALR(1),
968 as computer speeds and algorithms improve, the difference (in terms of
969 acceptable wall-clock execution time) is becoming less significant.
970
971 Potential authors are cautioned that the referenced DeRemer and Pen‐
972 nello article previously cited identifies a bug (an over-simplification
973 of the computation of LALR(1) lookahead sets) in some of the LALR(1)
974 algorithm statements that preceded it to publication. They should take
975 the time to seek out that paper, as well as current relevant work, par‐
976 ticularly Aho's.
977
978 The -b option was added to provide a portable method for permitting
979 yacc to work on multiple separate parsers in the same directory. If a
980 directory contains more than one yacc grammar, and both grammars are
981 constructed at the same time (by, for example, a parallel make pro‐
982 gram), conflict results. While the solution is not historical prac‐
983 tice, it corrects a known deficiency in historical implementations.
984 Corresponding changes were made to all sections that referenced the
985 filenames y.tab.c (now "the code file"), y.tab.h (now "the header
986 file"), and y.output (now "the description file").
987
988 The grammar for yacc input is based on System V documentation. The
989 textual description shows there that the ';' is required at the end of
990 the rule. The grammar and the implementation do not require this. (The
991 use of C_IDENTIFIER causes a reduce to occur in the right place.)
992
993 Also, in that implementation, the constructs such as %token can be ter‐
994 minated by a semicolon, but this is not permitted by the grammar. The
995 keywords such as %token can also appear in uppercase, which is again
996 not discussed. In most places where '%' is used, '\' can be substi‐
997 tuted, and there are alternate spellings for some of the symbols (for
998 example, %LEFT can be "%<" or even "\<" ).
999
1000 Historically, <tag> can contain any characters except '>' , including
1001 white space, in the implementation. However, since the tag must refer‐
1002 ence an ISO C standard union member, in practice conforming implementa‐
1003 tions need to support only the set of characters for ISO C standard
1004 identifiers in this context.
1005
1006 Some historical implementations are known to accept actions that are
1007 terminated by a period. Historical implementations often allow '$' in
1008 names. A conforming implementation does not need to support either of
1009 these behaviors.
1010
1011 Deciding when to use %prec illustrates the difficulty in specifying the
1012 behavior of yacc. There may be situations in which the grammar is not,
1013 strictly speaking, in error, and yet yacc cannot interpret it unambigu‐
1014 ously. The resolution of ambiguities in the grammar can in many
1015 instances be resolved by providing additional information, such as
1016 using %type or %union declarations. It is often easier and it usually
1017 yields a smaller parser to take this alternative when it is appropri‐
1018 ate.
1019
1020 The size and execution time of a program produced without the runtime
1021 debugging code is usually smaller and slightly faster in historical
1022 implementations.
1023
1024 Statistics messages from several historical implementations include the
1025 following types of information:
1026
1027
1028 n/512 terminals, n/300 non-terminals
1029 n/600 grammar rules, n/1500 states
1030 n shift/reduce, n reduce/reduce conflicts reported
1031 n/350 working sets used
1032 Memory: states, etc. n/15000, parser n/15000
1033 n/600 distinct lookahead sets
1034 n extra closures
1035 n shift entries, n exceptions
1036 n goto entries
1037 n entries saved by goto default
1038 Optimizer space used: input n/15000, output n/15000
1039 n table entries, n zero
1040 Maximum spread: n, Maximum offset: n
1041
1042 The report of internal tables in the description file is left implemen‐
1043 tation-defined because all aspects of these limits are also implementa‐
1044 tion-defined. Some implementations may use dynamic allocation tech‐
1045 niques and have no specific limit values to report.
1046
1047 The format of the y.output file is not given because specification of
1048 the format was not seen to enhance applications portability. The list‐
1049 ing is primarily intended to help human users understand and debug the
1050 parser; use of y.output by a conforming application script would be
1051 unusual. Furthermore, implementations have not produced consistent out‐
1052 put and no popular format was apparent. The format selected by the
1053 implementation should be human-readable, in addition to the requirement
1054 that it be a text file.
1055
1056 Standard error reports are not specifically described because they are
1057 seldom of use to conforming applications and there was no reason to
1058 restrict implementations.
1059
1060 Some implementations recognize "={" as equivalent to '{' because it
1061 appears in historical documentation. This construction was recognized
1062 and documented as obsolete as long ago as 1978, in the referenced Yacc:
1063 Yet Another Compiler-Compiler. This volume of IEEE Std 1003.1-2001
1064 chose to leave it as obsolete and omit it.
1065
1066 Multi-byte characters should be recognized by the lexical analyzer and
1067 returned as tokens. They should not be returned as multi-byte character
1068 literals. The token error that is used for error recovery is normally
1069 assigned the value 256 in the historical implementation. Thus, the
1070 token value 256, which is used in many multi-byte character sets, is
1071 not available for use as the value of a user-defined token.
1072
1074 None.
1075
1077 c99 , lex
1078
1080 Portions of this text are reprinted and reproduced in electronic form
1081 from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
1082 -- Portable Operating System Interface (POSIX), The Open Group Base
1083 Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
1084 Electrical and Electronics Engineers, Inc and The Open Group. In the
1085 event of any discrepancy between this version and the original IEEE and
1086 The Open Group Standard, the original IEEE and The Open Group Standard
1087 is the referee document. The original Standard can be obtained online
1088 at http://www.opengroup.org/unix/online.html .
1089
1090
1091
1092IEEE/The Open Group 2003 YACC(P)