yacc(1p)

1YACC(P)                    POSIX Programmer's Manual                   YACC(P)
2
3
4

NAME

6       yacc - yet another compiler compiler (DEVELOPMENT)
7

SYNOPSIS

9       yacc [-dltv][-b file_prefix][-p sym_prefix] grammar
10

DESCRIPTION

12       The  yacc utility shall read a description of a context-free grammar in
13       grammar and write C source code, conforming to the ISO C standard, to a
14       code file, and optionally header information into a header file, in the
15       current directory. The C code shall define a function and related  rou‐
16       tines  and  macros  for  an automaton that executes a parsing algorithm
17       meeting the requirements in Algorithms .
18
19       The form and meaning of the  grammar  are  described  in  the  EXTENDED
20       DESCRIPTION section.
21
22       The  C source code and header file shall be produced in a form suitable
23       as input for the C compiler (see c99 ).
24

OPTIONS

26       The yacc utility shall  conform  to  the  Base  Definitions  volume  of
27       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
28
29       The following options shall be supported:
30
31       -b  file_prefix
32              Use  file_prefix instead of y as the prefix for all output file‐
33              names. The code file y.tab.c, the header file  y.tab.h  (created
34              when  -d  is specified), and the description file y.output (cre‐
35              ated when -v is specified),  shall  be  changed  to  file_prefix
36              .tab.c,  file_prefix  .tab.h,  and  file_prefix .output, respec‐
37              tively.
38
39       -d     Write the header file; by default only the code file is written.
40              The  #define  statements  associate  the token codes assigned by
41              yacc with the user-declared  token  names.  This  allows  source
42              files other than y.tab.c to access the token codes.
43
44       -l     Produce  a code file that does not contain any #line constructs.
45              If this option is not present, it  is  unspecified  whether  the
46              code  file or header file contains #line directives. This should
47              only be used after the grammar and the  associated  actions  are
48              fully debugged.
49
50       -p  sym_prefix
51
52              Use  sym_prefix  instead  of  yy  as the prefix for all external
53              names produced by yacc. The names  affected  shall  include  the
54              functions  yyparse(),  yylex(), and yyerror(), and the variables
55              yylval, yychar, and yydebug. (In the remainder of this  section,
56              the  six  symbols cited are referenced using their default names
57              only as a notational  convenience.)  Local  names  may  also  be
58              affected  by  the  -p  option;  however, the -p option shall not
59              affect #define symbols generated by yacc.
60
61       -t     Modify conditional compilation directives to permit  compilation
62              of debugging code in the code file. Runtime debugging statements
63              shall always be contained in the code file, but by default  con‐
64              ditional compilation directives prevent their compilation.
65
66       -v     Write a file containing a description of the parser and a report
67              of conflicts generated by ambiguities in the grammar.
68
69

OPERANDS

71       The following operand is required:
72
73       grammar
74              A pathname of a file containing instructions,  hereafter  called
75              grammar, for which a parser is to be created. The format for the
76              grammar is described in the EXTENDED DESCRIPTION section.
77
78

STDIN

80       Not used.
81

INPUT FILES

83       The file grammar shall be a text file formatted  as  specified  in  the
84       EXTENDED DESCRIPTION section.
85

ENVIRONMENT VARIABLES

87       The following environment variables shall affect the execution of yacc:
88
89       LANG   Provide  a  default value for the internationalization variables
90              that are unset or null. (See  the  Base  Definitions  volume  of
91              IEEE Std 1003.1-2001,  Section  8.2,  Internationalization Vari‐
92              ables for the precedence of internationalization variables  used
93              to determine the values of locale categories.)
94
95       LC_ALL If  set  to a non-empty string value, override the values of all
96              the other internationalization variables.
97
98       LC_CTYPE
99              Determine the locale for  the  interpretation  of  sequences  of
100              bytes  of  text  data as characters (for example, single-byte as
101              opposed to multi-byte characters in arguments and input files).
102
103       LC_MESSAGES
104              Determine the locale that should be used to  affect  the  format
105              and contents of diagnostic messages written to standard error.
106
107       NLSPATH
108              Determine the location of message catalogs for the processing of
109              LC_MESSAGES .
110
111
112       The LANG and LC_* variables affect the execution of the yacc utility as
113       stated. The main() function defined in Yacc Library shall call:
114
115
116              setlocale(LC_ALL, "")
117
118       and  thus  the  program generated by yacc shall also be affected by the
119       contents of these variables at runtime.
120

ASYNCHRONOUS EVENTS

122       Default.
123

STDOUT

125       Not used.
126

STDERR

128       If shift/reduce or reduce/reduce conflicts  are  detected  in  grammar,
129       yacc  shall  write a report of those conflicts to the standard error in
130       an unspecified format.
131
132       Standard error shall also be used for diagnostic messages.
133

OUTPUT FILES

135       The code file, the header file, and the description file shall be  text
136       files. All are described in the following sections.
137
138   Code File
139       This  file  shall contain the C source code for the yyparse() function.
140       It shall contain code for the various semantic actions with macro  sub‐
141       stitution  performed  on  them as described in the EXTENDED DESCRIPTION
142       section. It also shall contain a copy of the #define statements in  the
143       header  file.  If  a  %union  declaration  is used, the declaration for
144       YYSTYPE shall also be included in this file.
145
146   Header File
147       The header file shall contain #define  statements  that  associate  the
148       token numbers with the token names. This allows source files other than
149       the code file to access the token codes. If  a  %union  declaration  is
150       used, the declaration for YYSTYPE and an extern YYSTYPE yylval declara‐
151       tion shall also be included in this file.
152
153   Description File
154       The description file shall be a text file containing a  description  of
155       the  state  machine  corresponding  to the parser, using an unspecified
156       format. Limits  for  internal  tables  (see  Limits  )  shall  also  be
157       reported,  in  an  implementation-defined manner. (Some implementations
158       may use dynamic allocation techniques and have no specific limit values
159       to report.)
160

EXTENDED DESCRIPTION

162       The  yacc  command  accepts a language that is used to define a grammar
163       for a target language to be parsed by the tables and code generated  by
164       yacc.  The  language  accepted by yacc as a grammar for the target lan‐
165       guage is described below using the yacc input language itself.
166
167       The input grammar includes rules describing the input structure of  the
168       target  language and code to be invoked when these rules are recognized
169       to provide the associated semantic action.  The  code  to  be  executed
170       shall appear as bodies of text that are intended to be C-language code.
171       The C-language inclusions are presumed to form a correct function  when
172       processed  by yacc into its output files. The code included in this way
173       shall be executed during the recognition of the target language.
174
175       Given a grammar, the yacc utility generates the files described in  the
176       OUTPUT  FILES  section.  The code file can be compiled and linked using
177       c99. If the declaration and programs sections of the grammar  file  did
178       not include definitions of main(), yylex(), and yyerror(), the compiled
179       output requires linking with  externally  supplied  versions  of  those
180       functions. Default versions of main() and yyerror() are supplied in the
181       yacc library and can be linked in by using the  -l y  operand  to  c99.
182       The yacc library interfaces need not support interfaces with other than
183       the default yy symbol prefix. The application provides the lexical ana‐
184       lyzer  function,  yylex();  the lex utility is specifically designed to
185       generate such a routine.
186
187   Input Language
188       The application shall ensure that every specification file consists  of
189       three  sections  in  order:  declarations, grammar rules, and programs,
190       separated by double percent signs ( "%%" ). The declarations  and  pro‐
191       grams sections can be empty. If the latter is empty, the preceding "%%"
192       mark separating it from the rules section can be omitted.
193
194       The input is free form text following  the  structure  of  the  grammar
195       defined below.
196
197   Lexical Structure of the Grammar
198       The  <blank>s,  <newline>s,  and  <form-feed>s shall be ignored, except
199       that the application shall ensure that they do not appear in  names  or
200       multi-character   reserved  symbols.  Comments  shall  be  enclosed  in
201       "/* ... */" , and can appear wherever a name is valid.
202
203       Names are of arbitrary length, made up of letters, periods  (  '.'   ),
204       underscores  (  '_'  ), and non-initial digits. Uppercase and lowercase
205       letters are distinct.  Conforming  applications  shall  not  use  names
206       beginning  in  yy  or YY since the yacc parser uses such names. Many of
207       the names appear in the final output of yacc, and thus they  should  be
208       chosen  to  conform with any additional rules created by the C compiler
209       to be used. In particular they appear in #define statements.
210
211       A literal shall consist of a single character enclosed in single-quotes
212       (  '"  ). All of the escape sequences supported for character constants
213       by the ISO C standard shall be supported by yacc.
214
215       The relationship with the  lexical  analyzer  is  discussed  in  detail
216       below.
217
218       The  application  shall  ensure  that  the NUL character is not used in
219       grammar rules or literals.
220
221   Declarations Section
222       The declarations section is used to define the symbols used  to  define
223       the target language and their relationship with each other. In particu‐
224       lar, much of the additional information required to resolve ambiguities
225       in the context-free grammar for the target language is provided here.
226
227       Usually  yacc  assigns  the  relationship between the symbolic names it
228       generates and their underlying numeric value. The declarations  section
229       makes it possible to control the assignment of these values.
230
231       It  is  also  possible to keep semantic information associated with the
232       tokens currently on the parse stack in a user-defined C-language union,
233       if  the  members  of the union are associated with the various names in
234       the grammar. The declarations section provides for this as well.
235
236       The first group of declarators below all take a list of names as  argu‐
237       ments.   That  list can optionally be preceded by the name of a C union
238       member (called a tag below) appearing within  '<'  and  '>'  .  (As  an
239       exception  to  the typographical conventions of the rest of this volume
240       of IEEE Std 1003.1-2001, in  this  case  <tag>  does  not  represent  a
241       metavariable,  but  the  literal angle bracket characters surrounding a
242       symbol.) The use of tag specifies that the tokens named  on  this  line
243       shall be of the same C type as the union member referenced by tag. This
244       is discussed in more detail below.
245
246       For lists used to define tokens, the first appearance of a given  token
247       can  be followed by a positive integer (as a string of decimal digits).
248       If this is done, the underlying value assigned to it for  lexical  pur‐
249       poses shall be taken to be that number.
250
251       The following declares name to be a token:
252
253
254              %token [<tag>] name [number][name [number]]...
255
256       If  tag  is  present,  the  C type for all tokens on this line shall be
257       declared to be the type referenced by tag. If a positive integer,  num‐
258       ber, follows a name, that value shall be assigned to the token.
259
260       The  following  declares  name to be a token, and assigns precedence to
261       it:
262
263
264              %left [<tag>] name [number][name [number]]...
265              %right [<tag>] name [number][name [number]]...
266
267       One or more lines, each beginning with one of these symbols, can appear
268       in  this  section. All tokens on the same line have the same precedence
269       level and associativity; the lines are in order  of  increasing  prece‐
270       dence  or  binding  strength.  %left denotes that the operators on that
271       line are left associative, and %right similarly denotes right  associa‐
272       tive  operators. If tag is present, it shall declare a C type for names
273       as described for %token.
274
275       The following declares name to be a token, and indicates that this can‐
276       not be used associatively:
277
278
279              %nonassoc [<tag>] name [number][name [number]]...
280
281       If  the  parser  encounters associative use of this token it reports an
282       error. If tag is present, it shall  declare  a  C  type  for  names  as
283       described for %token.
284
285       The  following  declares that union member names are non-terminals, and
286       thus it is required to have a tag field at its beginning:
287
288
289              %type <tag> name...
290
291       Because it deals with non-terminals only, assigning a token  number  or
292       using  a literal is also prohibited. If this construct is present, yacc
293       shall perform type checking; if this  construct  is  not  present,  the
294       parse stack shall hold only the int type.
295
296       Every  name  used in grammar not defined by a %token, %left, %right, or
297       %nonassoc declaration is assumed to represent  a  non-terminal  symbol.
298       The yacc utility shall report an error for any non-terminal symbol that
299       does not appear on the left side of at least one grammar rule.
300
301       Once the type, precedence, or token number of a name is  specified,  it
302       shall  not  be  changed.  If  the first declaration of a token does not
303       assign a token number, yacc shall assign a  token  number.   Once  this
304       assignment  is  made, the token number shall not be changed by explicit
305       assignment.
306
307       The following declarators do not follow the previous pattern.
308
309       The following declares the non-terminal name to be  the  start  symbol,
310       which  represents  the largest, most general structure described by the
311       grammar rules:
312
313
314              %start name
315
316       By default, it is the left-hand side of the first  grammar  rule;  this
317       default can be overridden with this declaration.
318
319       The  following declares the yacc value stack to be a union of the vari‐
320       ous types of values desired:
321
322
323              %union { body of union (in C) }
324
325       By default, the values returned by actions (see below) and the  lexical
326       analyzer  shall  be of type int. The yacc utility keeps track of types,
327       and it shall insert corresponding union member names in order  to  per‐
328       form strict type checking of the resulting parser.
329
330       Alternatively,  given  that  at  least one <tag> construct is used, the
331       union can be declared in a header file (which shall be included in  the
332       declarations  section  by using a #include construct within %{ and %}),
333       and a typedef used to define  the  symbol  YYSTYPE  to  represent  this
334       union.  The  effect  of %union is to provide the declaration of YYSTYPE
335       directly from the yacc input.
336
337       C-language declarations and definitions can appear in the  declarations
338       section, enclosed by the following marks:
339
340
341              %{ ... %}
342
343       These  statements  shall  be copied into the code file, and have global
344       scope within it so that they can be used in the rules and program  sec‐
345       tions.
346
347       The  application  shall  ensure that the declarations section is termi‐
348       nated by the token %%.
349
350   Grammar Rules in yacc
351       The rules section defines the context-free grammar to  be  accepted  by
352       the function yacc generates, and associates with those rules C-language
353       actions  and  additional  precedence  information.   The   grammar   is
354       described below, and a formal definition follows.
355
356       The  rules section is comprised of one or more grammar rules. A grammar
357       rule has the form:
358
359
360              A : BODY ;
361
362       The symbol A represents a non-terminal  name,  and  BODY  represents  a
363       sequence of zero or more names, literals, and semantic actions that can
364       then be followed by optional precedence rules. Only the names and  lit‐
365       erals participate in the formation of the grammar; the semantic actions
366       and precedence rules are used in other ways. The colon  and  the  semi‐
367       colon  are  yacc  punctuation.  If there are several successive grammar
368       rules with the same left-hand side, the vertical bar '|' can be used to
369       avoid  rewriting the left-hand side; in this case the semicolon appears
370       only after the last rule. The BODY part can be empty (or empty of names
371       and  literals)  to  indicate  that  the non-terminal symbol matches the
372       empty string.
373
374       The yacc utility assigns a unique number to each rule. Rules using  the
375       vertical  bar  notation  are distinct rules. The number assigned to the
376       rule appears in the description file.
377
378       The elements comprising a BODY are:
379
380       name, literal
381              These form the rules of the grammar: name is either a token or a
382              non-terminal;  literal  stands  for  itself  (less the lexically
383              required quotation marks).
384
385       semantic action
386
387              With each grammar rule, the user can  associate  actions  to  be
388              performed each time the rule is recognized in the input process.
389              (Note that the word "action" can also refer to  the  actions  of
390              the parser-shift, reduce, and so on.)
391
392       These  actions  can return values and can obtain the values returned by
393       previous actions. These values are kept in objects of type YYSTYPE (see
394       %union).  The  result  value  of  the action shall be kept on the parse
395       stack with the left-hand side of the rule,  to  be  accessed  by  other
396       reductions as part of their right-hand side.  By using the <tag> infor‐
397       mation provided in the declarations section, the code generated by yacc
398       can  be  strictly  type  checked  and contain arbitrary information. In
399       addition, the lexical analyzer can provide the same kinds of values for
400       tokens, if desired.
401
402       An  action is an arbitrary C statement and as such can do input or out‐
403       put, call subprograms, and alter external variables. An action  is  one
404       or more C statements enclosed in curly braces '{' and '}' .
405
406       Certain  pseudo-variables  can  be used in the action. These are macros
407       for access to data structures known internally to yacc.
408
409       $$
410              The value of the action can be set by assigning  it  to  $$.  If
411              type  checking  is  enabled  and  the  type  of  the value to be
412              assigned cannot be determined, a diagnostic message may be  gen‐
413              erated.
414
415       $number
416              This  refers to the value returned by the component specified by
417              the token number in the right side of a rule, reading from  left
418              to  right;  number can be zero or negative. If number is zero or
419              negative, it refers to the data associated with the name on  the
420              parser's  stack  preceding  the  leftmost  symbol of the current
421              rule. (That is, "$0" refers to the  name  immediately  preceding
422              the  leftmost  name  in  the  current  rule  to  be found on the
423              parser's stack and "$-1" refers to the symbol to its  left.)  If
424              number  refers to an element past the current point in the rule,
425              or beyond the bottom of the stack, the result is  undefined.  If
426              type  checking  is  enabled  and  the  type  of  the value to be
427              assigned cannot be determined, a diagnostic message may be  gen‐
428              erated.
429
430       $<tag>number
431
432              These  correspond  exactly  to the corresponding symbols without
433              the tag inclusion, but allow for strict type checking (and  pre‐
434              clude  unwanted  type conversions). The effect is that the macro
435              is expanded to use tag to select an  element  from  the  YYSTYPE
436              union  (using dataname.tag). This is particularly useful if num‐
437              ber is not positive.
438
439       $<tag>$
440              This imposes on the reference the type of the union member  ref‐
441              erenced by tag. This construction is applicable when a reference
442              to a left context value occurs in the grammar, and provides yacc
443              with a means for selecting a type.
444
445
446       Actions  can  occur anywhere in a rule (not just at the end); an action
447       can access values returned by actions to its  left,  and  in  turn  the
448       value  it  returns  can be accessed by actions to its right.  An action
449       appearing in the middle of a rule shall be equivalent to replacing  the
450       action  with  a  new  non-terminal symbol and adding an empty rule with
451       that non-terminal symbol on the left-hand  side.  The  semantic  action
452       associated  with  the  new  rule  shall  be  equivalent to the original
453       action. The use of actions within rules might introduce conflicts  that
454       would not otherwise exist.
455
456       By default, the value of a rule shall be the value of the first element
457       in it. If the first element does not have a type (particularly  in  the
458       case  of  a  literal) and type checking is turned on by %type, an error
459       message shall result.
460
461       precedence
462              The keyword %prec can be used to  change  the  precedence  level
463              associated  with a particular grammar rule. Examples of this are
464              in cases where a unary and binary operator have  the  same  sym‐
465              bolic  representation,  but  need  to  be given different prece‐
466              dences, or where the handling of an ambiguous if-else  construc‐
467              tion is necessary.  The reserved symbol %prec can appear immedi‐
468              ately after the body of the grammar rule and can be followed  by
469              a  token name or a literal. It shall cause the precedence of the
470              grammar rule to become that of the following token name or  lit‐
471              eral. The action for the rule as a whole can follow %prec.
472
473
474       If  a  program  section  follows, the application shall ensure that the
475       grammar rules are terminated by %%.
476
477   Programs Section
478       The programs section can include the definition of the lexical analyzer
479       yylex(),  and  any  other  functions;  for  example,  those used in the
480       actions specified in the grammar rules.  It is unspecified whether  the
481       programs section precedes or follows the semantic actions in the output
482       file; therefore, if the application contains any macro definitions  and
483       declarations  intended to apply to the code in the semantic actions, it
484       shall place them within "%{ ... %}" in the declarations section.
485
486   Input Grammar
487       The following input to yacc yields a parser for the input to yacc. This
488       formal  syntax takes precedence over the preceding text syntax descrip‐
489       tion.
490
491       The lexical structure is defined less precisely; Lexical  Structure  of
492       the Grammar defines most terms. The correspondence between the previous
493       terms and the tokens below is as follows.
494
495       IDENTIFIER
496              This corresponds to the concept of name,  given  previously.  It
497              also includes literals as defined previously.
498
499       C_IDENTIFIER
500              This is a name, and additionally it is known to be followed by a
501              colon.  A literal cannot yield this token.
502
503       NUMBER A string of digits (a non-negative decimal integer).
504
505       TYPE, LEFT, MARK, LCURL, RCURL
506
507              These correspond directly to %type, %left, %%, %{, and %}.
508
509       { ... }
510              This indicates C-language source code, with the possible  inclu‐
511              sion of '$' macros as discussed previously.
512
513
514
515              /* Grammar for the input to yacc. */
516              /* Basic entries. */
517              /* The following are recognized by the lexical analyzer. */
518
519
520              %token    IDENTIFIER      /* Includes identifiers and literals */
521              %token    C_IDENTIFIER    /* identifier (but not literal)
522                                           followed by a :. */
523              %token    NUMBER          /* [0-9][0-9]* */
524
525
526              /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
527
528
529              %token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
530
531
532              %token    MARK            /* The %% mark. */
533              %token    LCURL           /* The %{ mark. */
534              %token    RCURL           /* The %} mark. */
535
536
537              /* 8-bit character literals stand for themselves; */
538              /* tokens have to be defined for multi-byte characters. */
539
540
541              %start    spec
542
543
544              %%
545
546
547              spec  : defs MARK rules tail
548                    ;
549              tail  : MARK
550                    {
551                      /* In this action, set up the rest of the file. */
552                    }
553                    | /* Empty; the second MARK is optional. */
554                    ;
555              defs  : /* Empty. */
556                    |    defs def
557                    ;
558              def   : START IDENTIFIER
559                    |    UNION
560                    {
561                      /* Copy union definition to output. */
562                    }
563                    |    LCURL
564                    {
565                      /* Copy C code to output file. */
566                    }
567                      RCURL
568                    |    rword tag nlist
569                    ;
570              rword : TOKEN
571                    | LEFT
572                    | RIGHT
573                    | NONASSOC
574                    | TYPE
575                    ;
576              tag   : /* Empty: union tag ID optional. */
577                    | '<' IDENTIFIER '>'
578                    ;
579              nlist : nmno
580                    | nlist nmno
581                    ;
582              nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
583                    | IDENTIFIER NUMBER  /* Note: invalid with % type. */
584                    ;
585
586
587              /* Rule section */
588
589
590              rules : C_IDENTIFIER rbody prec
591                    | rules  rule
592                    ;
593              rule  : C_IDENTIFIER rbody prec
594                    | '|' rbody prec
595                    ;
596              rbody : /* empty */
597                    | rbody IDENTIFIER
598                    | rbody act
599                    ;
600              act   : '{'
601                      {
602                        /* Copy action, translate $$, and so on. */
603                      }
604                      '}'
605                    ;
606              prec  : /* Empty */
607                    | PREC IDENTIFIER
608                    | PREC IDENTIFIER act
609                    | prec ';'
610                    ;
611
612   Conflicts
613       The  parser  produced  for an input grammar may contain states in which
614       conflicts occur.  The  conflicts  occur  because  the  grammar  is  not
615       LALR(1). An ambiguous grammar always contains at least one LALR(1) con‐
616       flict. The yacc utility  shall  resolve  all  conflicts,  using  either
617       default rules or user-specified precedence rules.
618
619       Conflicts are either shift/reduce conflicts or reduce/reduce conflicts.
620       A shift/reduce conflict is where, for a given state and lookahead  sym‐
621       bol,  both  a  shift  action  and  a  reduce  action  are  possible.  A
622       reduce/reduce conflict is where, for a given state and  lookahead  sym‐
623       bol, reductions by two different rules are possible.
624
625       The  rules  below  describe  how to specify what actions to take when a
626       conflict occurs. Not all shift/reduce  conflicts  can  be  successfully
627       resolved  this  way  because the conflict may be due to something other
628       than ambiguity, so incautious use of these  facilities  can  cause  the
629       language  accepted  by  the parser to be much different from that which
630       was intended. The description file shall contain sufficient information
631       to  understand the cause of the conflict. Where ambiguity is the reason
632       either the default or explicit rules should be adequate  to  produce  a
633       working parser.
634
635       The  declared precedences and associativities (see Declarations Section
636       ) are used to resolve parsing conflicts as follows:
637
638        1. A precedence and associativity  is  associated  with  each  grammar
639           rule;  it  is the precedence and associativity of the last token or
640           literal in the body of the rule. If the %prec keyword is  used,  it
641           overrides  this  default.  Some  grammar  rules might not have both
642           precedence and associativity.
643
644        2. If there is a shift/reduce conflict, and both the grammar rule  and
645           the  input symbol have precedence and associativity associated with
646           them, then the conflict is resolved in favor of the  action  (shift
647           or  reduce)  associated  with  the higher precedence. If the prece‐
648           dences are the same, then the associativity is used; left  associa‐
649           tive implies reduce, right associative implies shift, and non-asso‐
650           ciative implies an error in the string being parsed.
651
652        3. When there is a shift/reduce conflict that cannot  be  resolved  by
653           rule  2, the shift is done. Conflicts resolved this way are counted
654           in the diagnostic output described in Error Handling .
655
656        4. When there is a reduce/reduce conflict, a reduction is done by  the
657           grammar  rule that occurs earlier in the input sequence.  Conflicts
658           resolved this way are counted in the diagnostic output described in
659           Error Handling .
660
661       Conflicts  resolved by precedence or associativity shall not be counted
662       in the shift/reduce and reduce/reduce conflicts  reported  by  yacc  on
663       either standard error or in the description file.
664
665   Error Handling
666       The  token  error  shall be reserved for error handling. The name error
667       can be used in grammar rules. It indicates places where the parser  can
668       recover  from  a syntax error. The default value of error shall be 256.
669       Its value can be changed using a %token declaration. The  lexical  ana‐
670       lyzer should not return the value of error.
671
672       The  parser shall detect a syntax error when it is in a state where the
673       action associated with the lookahead symbol is error. A semantic action
674       can  cause the parser to initiate error handling by executing the macro
675       YYERROR. When YYERROR is executed, the semantic action  passes  control
676       back to the parser. YYERROR cannot be used outside of semantic actions.
677
678       When  the  parser  detects  a syntax error, it normally calls yyerror()
679       with the character string "syntax error"  as  its  argument.  The  call
680       shall  not  be  made  if the parser is still recovering from a previous
681       error when the error is detected. The parser is considered to be recov‐
682       ering  from a previous error until the parser has shifted over at least
683       three normal input symbols since the  last  error  was  detected  or  a
684       semantic  action  has  executed the macro yyerrok. The parser shall not
685       call yyerror() when YYERROR is executed.
686
687       The macro function YYRECOVERING shall return 1 if a  syntax  error  has
688       been  detected and the parser has not yet fully recovered from it. Oth‐
689       erwise, zero shall be returned.
690
691       When a syntax error is detected by the parser, the parser  shall  check
692       if  a  previous syntax error has been detected. If a previous error was
693       detected, and if no normal input symbols have been  shifted  since  the
694       preceding error was detected, the parser checks if the lookahead symbol
695       is an endmarker (see Interface to the Lexical Analyzer ). If it is, the
696       parser  shall  return  with  a non-zero value. Otherwise, the lookahead
697       symbol shall be discarded and normal parsing shall resume.
698
699       When YYERROR is executed or when the parser detects a syntax error  and
700       no  previous error has been detected, or at least one normal input sym‐
701       bol has been shifted since the previous error was detected, the  parser
702       shall  pop  back  one state at a time until the parse stack is empty or
703       the current state allows a shift over error.  If the parser empties the
704       parse stack, it shall return with a non-zero value. Otherwise, it shall
705       shift over error and then resume normal parsing. If the parser reads  a
706       lookahead symbol before the error was detected, that symbol shall still
707       be the lookahead symbol when parsing is resumed.
708
709       The macro yyerrok in a semantic action shall cause the parser to act as
710       if it has fully recovered from any previous errors. The macro yyclearin
711       shall cause the parser to discard the current lookahead token.  If  the
712       current  lookahead token has not yet been read, yyclearin shall have no
713       effect.
714
715       The macro YYACCEPT shall cause the parser  to  return  with  the  value
716       zero.  The  macro  YYABORT shall cause the parser to return with a non-
717       zero value.
718
719   Interface to the Lexical Analyzer
720       The yylex() function is an integer-valued function that returns a token
721       number representing the kind of token read. If there is a value associ‐
722       ated with the token returned by yylex()  (see  the  discussion  of  tag
723       above), it shall be assigned to the external variable yylval.
724
725       If the parser and yylex() do not agree on these token numbers, reliable
726       communication between them cannot occur.  For  (single-byte  character)
727       literals, the token is simply the numeric value of the character in the
728       current character set. The numbers for other tokens can either be  cho‐
729       sen  by  yacc,  or chosen by the user. In either case, the #define con‐
730       struct of C is used to allow yylex() to return these  numbers  symboli‐
731       cally.   The  #define  statements  are  put into the code file, and the
732       header file if that file is requested. The set of characters  permitted
733       by  yacc  in  an  identifier  is larger than that permitted by C. Token
734       names found to contain such characters shall not  be  included  in  the
735       #define declarations.
736
737       If the token numbers are chosen by yacc, the tokens other than literals
738       shall be assigned numbers  greater  than  256,  although  no  order  is
739       implied.  A  token can be explicitly assigned a number by following its
740       first appearance in the declarations section with a number.  Names  and
741       literals  not  defined  this  way  retain their default definition. All
742       token numbers assigned by yacc shall be unique and  distinct  from  the
743       token  numbers used for literals and user-assigned tokens. If duplicate
744       token numbers cause conflicts in parser generation, yacc  shall  report
745       an  error; otherwise, it is unspecified whether the token assignment is
746       accepted or an error is reported.
747
748       The end of the input is marked by a special token called the endmarker,
749       which  has  a  token number that is zero or negative. (These values are
750       invalid for any other token.) All lexical analyzers shall  return  zero
751       or  negative as a token number upon reaching the end of their input. If
752       the tokens up to, but excluding, the endmarker form  a  structure  that
753       matches  the  start  symbol,  the parser shall accept the input. If the
754       endmarker is seen in any other  context,  it  shall  be  considered  an
755       error.
756
757   Completing the Program
758       In  addition  to  yyparse()  and  yylex(),  the functions yyerror() and
759       main() are required to make a complete  program.  The  application  can
760       supply main() and yyerror(), or those routines can be obtained from the
761       yacc library.
762
763   Yacc Library
764       The following functions shall appear only in the yacc library  accessi‐
765       ble through the -l y operand to c99; they can therefore be redefined by
766       a conforming application:
767
768       int  main(void)
769
770              This function shall call yyparse() and exit with an  unspecified
771              value. Other actions within this function are unspecified.
772
773       int  yyerror(const char *s)
774
775              This  function  shall write the NUL-terminated argument to stan‐
776              dard error, followed by a <newline>.
777
778
779       The order of the -l y and -l l operands given to  c99  is  significant;
780       the  application shall either provide its own main() function or ensure
781       that -l y precedes -l l.
782
783   Debugging the Parser
784       The parser generated by yacc shall have  diagnostic  facilities  in  it
785       that can be optionally enabled at either compile time or at runtime (if
786       enabled at compile time). The compilation of the runtime debugging code
787       is  under the control of YYDEBUG, a preprocessor symbol. If YYDEBUG has
788       a non-zero value, the debugging code shall be included. If its value is
789       zero, the code shall not be included.
790
791       In parsers where the debugging code has been included, the external int
792       yydebug can be used to turn debugging on (with a  non-zero  value)  and
793       off  (zero  value)  at  runtime.  The initial value of yydebug shall be
794       zero.
795
796       When -t is specified, the code file shall be built such that, if  YYDE‐
797       BUG  is not already defined at compilation time (using the c99 -D YYDE‐
798       BUG option, for example), YYDEBUG shall be set explicitly to 1. When -t
799       is not specified, the code file shall be built such that, if YYDEBUG is
800       not already defined, it shall be set explicitly to zero.
801
802       The format of the debugging output is unspecified but includes at least
803       enough  information  to determine the shift and reduce actions, and the
804       input symbols. It also provides information about error recovery.
805
806   Algorithms
807       The parser constructed by yacc implements an LALR(1) parsing  algorithm
808       as  documented  in the literature. It is unspecified whether the parser
809       is table-driven or direct-coded.
810
811       A parser generated by yacc shall never request  an  input  symbol  from
812       yylex()  while  in  a state where the only actions other than the error
813       action are reductions by a single rule.
814
815       The literature of parsing theory defines these concepts.
816
817   Limits
818       The yacc utility may have several internal tables. The minimum maximums
819       for these tables are shown in the following table. The exact meaning of
820       these  values  is  implementation-defined.   The  implementation  shall
821       define  the  relationship between these values and between them and any
822       error messages that the implementation may generate should it  run  out
823       of  space  for  any  internal  structure. An implementation may combine
824       groups of these resources into a single  pool  as  long  as  the  total
825       available  to  the user does not fall below the sum of the sizes speci‐
826       fied by this section.
827
828                           Table: Internal Limits in yacc
829
830
831                        Minimum
832           Limit        Maximum   Description
833           {NTERMS}     126       Number of tokens.
834           {NNONTERM}   200       Number of non-terminals.
835           {NPROD}      300       Number of rules.
836           {NSTATES}    600       Number of states.
837           {MEMSIZE}    5200      Length of rules. The total length, in
838                                  names (tokens and non-terminals), of all
839                                  the rules of the grammar. The left-hand
840                                  side is counted for each rule, even if
841                                  it is not explicitly repeated, as speci‐
842                                  fied in Grammar Rules in yacc .
843           {ACTSIZE}    4000      Number of actions. "Actions" here (and
844                                  in the description file) refer to parser
845                                  actions (shift, reduce, and so on) not
846                                  to semantic actions defined in Grammar
847                                  Rules in yacc .
848

EXIT STATUS

850       The following exit values shall be returned:
851
852        0     Successful completion.
853
854       >0     An error occurred.
855
856

CONSEQUENCES OF ERRORS

858       If any errors are encountered, the run is aborted and yacc exits with a
859       non-zero status. Partial code files and header files may  be  produced.
860       The  summary  information  in the description file shall always be pro‐
861       duced if the -v flag is present.
862
863       The following sections are informative.
864

APPLICATION USAGE

866       Historical implementations  experience  name  conflicts  on  the  names
867       yacc.tmp, yacc.acts, yacc.debug, y.tab.c, y.tab.h, and y.output if more
868       than one copy of yacc is running in a single directory at one time. The
869       -b  option  was  added to overcome this problem. The related problem of
870       allowing multiple yacc parsers to  be  placed  in  the  same  file  was
871       addressed  by  adding a -p option to override the previously hard-coded
872       yy variable prefix.
873
874       The description of the -p option specifies the minimal set of  function
875       and variable names that cause conflict when multiple parsers are linked
876       together. YYSTYPE does not need to be changed. Instead, the  programmer
877       can  use  -b  to  give the header files for different parsers different
878       names, and then the file with  the  yylex()  for  a  given  parser  can
879       include  the  header  for  that parser. Names such as yyclearerr do not
880       need to be changed because they are used only in the actions;  they  do
881       not  have  linkage.  It  is  possible  that an implementation has other
882       names, either internal ones for implementing things such as yyclearerr,
883       or providing non-standard features that it wants to change with -p.
884
885       Unary operators that are the same token as a binary operator in general
886       need their precedence adjusted. This is handled by the  %prec  advisory
887       symbol  associated with the particular grammar rule defining that unary
888       operator. (See Grammar Rules in yacc .) Applications are  not  required
889       to  use this operator for unary operators, but the grammars that do not
890       require it are rare.
891

EXAMPLES

893       Access to the yacc library is obtained with library search operands  to
894       c99. To use the yacc library main():
895
896
897              c99 y.tab.c -l y
898
899       Both  the  lex  library and the yacc library contain main().  To access
900       the yacc main():
901
902
903              c99 y.tab.c lex.yy.c -l y -l l
904
905       This ensures that the yacc library  is  searched  first,  so  that  its
906       main() is used.
907
908       The  historical yacc libraries have contained two simple functions that
909       are normally coded by the application programmer.  These functions  are
910       similar to the following code:
911
912
913              #include <locale.h>
914              int main(void)
915              {
916                  extern int yyparse();
917
918
919                  setlocale(LC_ALL, "");
920
921
922                  /* If the following parser is one created by lex, the
923                     application must be careful to ensure that LC_CTYPE
924                     and LC_COLLATE are set to the POSIX locale. */
925                  (void) yyparse();
926                  return (0);
927              }
928
929
930              #include <stdio.h>
931
932
933              int yyerror(const char *msg)
934              {
935                  (void) fprintf(stderr, "%s\n", msg);
936                  return (0);
937              }
938

RATIONALE

940       The  references in may be helpful in constructing the parser generator.
941       The referenced DeRemer and Pennello article (along with  the  works  it
942       references)  describes  a technique to generate parsers that conform to
943       this volume of IEEE Std 1003.1-2001.  Work in this area continues to be
944       done,  so  implementors  should consult current literature before doing
945       any new implementations. The original Knuth article is the  theoretical
946       basis  for this kind of parser, but the tables it generates are imprac‐
947       tically large for reasonable grammars  and  should  not  be  used.  The
948       "equivalent  to"  wording is intentional to assure that the best tables
949       that are LALR(1) can be generated.
950
951       There has been confusion between the class of grammars, the  algorithms
952       needed to generate parsers, and the algorithms needed to parse the lan‐
953       guages. They are all reasonably orthogonal.  In  particular,  a  parser
954       generator that accepts the full range of LR(1) grammars need not gener‐
955       ate a table any more complex than one that accepts SLR(1) (a relatively
956       weak  class  of  LR  grammars) for a grammar that happens to be SLR(1).
957       Such an implementation need not recognize the case, either; table  com‐
958       pression  can  yield  the  SLR(1) table (or one even smaller than that)
959       without recognizing that the grammar is SLR(1). The speed of  an  LR(1)
960       parser  for  any  class is dependent more upon the table representation
961       and compression (or the code generation if a direct  parser  is  gener‐
962       ated) than upon the class of grammar that the table generator handles.
963
964       The  speed of the parser generator is somewhat dependent upon the class
965       of grammar it handles. However, the original Knuth  article  algorithms
966       for  constructing  LR parsers were judged by its author to be impracti‐
967       cally slow at that time. Although full LR is more complex than LALR(1),
968       as  computer speeds and algorithms improve, the difference (in terms of
969       acceptable wall-clock execution time) is becoming less significant.
970
971       Potential authors are cautioned that the referenced  DeRemer  and  Pen‐
972       nello article previously cited identifies a bug (an over-simplification
973       of the computation of LALR(1) lookahead sets) in some  of  the  LALR(1)
974       algorithm  statements that preceded it to publication. They should take
975       the time to seek out that paper, as well as current relevant work, par‐
976       ticularly Aho's.
977
978       The  -b  option  was  added to provide a portable method for permitting
979       yacc to work on multiple separate parsers in the same directory.  If  a
980       directory  contains  more  than one yacc grammar, and both grammars are
981       constructed at the same time (by, for example,  a  parallel  make  pro‐
982       gram),  conflict  results.   While the solution is not historical prac‐
983       tice, it corrects a known  deficiency  in  historical  implementations.
984       Corresponding  changes  were  made  to all sections that referenced the
985       filenames y.tab.c (now "the  code  file"),  y.tab.h  (now  "the  header
986       file"), and y.output (now "the description file").
987
988       The  grammar  for  yacc  input is based on System V documentation.  The
989       textual description shows there that the ';' is required at the end  of
990       the  rule. The grammar and the implementation do not require this. (The
991       use of C_IDENTIFIER causes a reduce to occur in the right place.)
992
993       Also, in that implementation, the constructs such as %token can be ter‐
994       minated  by  a semicolon, but this is not permitted by the grammar. The
995       keywords such as %token can also appear in uppercase,  which  is  again
996       not  discussed.  In  most  places where '%' is used, '\' can be substi‐
997       tuted, and there are alternate spellings for some of the  symbols  (for
998       example, %LEFT can be "%<" or even "\<" ).
999
1000       Historically,  <tag>  can contain any characters except '>' , including
1001       white space, in the implementation. However, since the tag must  refer‐
1002       ence an ISO C standard union member, in practice conforming implementa‐
1003       tions need to support only the set of  characters  for  ISO C  standard
1004       identifiers in this context.
1005
1006       Some  historical  implementations  are known to accept actions that are
1007       terminated by a period. Historical implementations often allow  '$'  in
1008       names.  A  conforming implementation does not need to support either of
1009       these behaviors.
1010
1011       Deciding when to use %prec illustrates the difficulty in specifying the
1012       behavior  of yacc. There may be situations in which the grammar is not,
1013       strictly speaking, in error, and yet yacc cannot interpret it unambigu‐
1014       ously.  The  resolution  of  ambiguities  in  the  grammar  can in many
1015       instances be resolved by  providing  additional  information,  such  as
1016       using  %type  or %union declarations. It is often easier and it usually
1017       yields a smaller parser to take this alternative when it  is  appropri‐
1018       ate.
1019
1020       The  size  and execution time of a program produced without the runtime
1021       debugging code is usually smaller and  slightly  faster  in  historical
1022       implementations.
1023
1024       Statistics messages from several historical implementations include the
1025       following types of information:
1026
1027
1028              n/512 terminals, n/300 non-terminals
1029              n/600 grammar rules, n/1500 states
1030              n shift/reduce, n reduce/reduce conflicts reported
1031              n/350 working sets used
1032              Memory: states, etc. n/15000, parser n/15000
1033              n/600 distinct lookahead sets
1034              n extra closures
1035              n shift entries, n exceptions
1036              n goto entries
1037              n entries saved by goto default
1038              Optimizer space used: input n/15000, output n/15000
1039              n table entries, n zero
1040              Maximum spread: n, Maximum offset: n
1041
1042       The report of internal tables in the description file is left implemen‐
1043       tation-defined because all aspects of these limits are also implementa‐
1044       tion-defined. Some implementations may  use  dynamic  allocation  tech‐
1045       niques and have no specific limit values to report.
1046
1047       The  format  of the y.output file is not given because specification of
1048       the format was not seen to enhance applications portability. The  list‐
1049       ing  is primarily intended to help human users understand and debug the
1050       parser; use of y.output by a conforming  application  script  would  be
1051       unusual. Furthermore, implementations have not produced consistent out‐
1052       put and no popular format was apparent.  The  format  selected  by  the
1053       implementation should be human-readable, in addition to the requirement
1054       that it be a text file.
1055
1056       Standard error reports are not specifically described because they  are
1057       seldom  of  use  to  conforming applications and there was no reason to
1058       restrict implementations.
1059
1060       Some implementations recognize "={" as equivalent  to  '{'  because  it
1061       appears  in  historical documentation. This construction was recognized
1062       and documented as obsolete as long ago as 1978, in the referenced Yacc:
1063       Yet  Another  Compiler-Compiler.  This  volume  of IEEE Std 1003.1-2001
1064       chose to leave it as obsolete and omit it.
1065
1066       Multi-byte characters should be recognized by the lexical analyzer  and
1067       returned as tokens. They should not be returned as multi-byte character
1068       literals. The token error that is used for error recovery  is  normally
1069       assigned  the  value  256  in  the historical implementation. Thus, the
1070       token value 256, which is used in many multi-byte  character  sets,  is
1071       not available for use as the value of a user-defined token.
1072

FUTURE DIRECTIONS

1074       None.
1075

COPYRIGHT

1080       Portions  of  this text are reprinted and reproduced in electronic form
1081       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
1082       --  Portable  Operating  System  Interface (POSIX), The Open Group Base
1083       Specifications Issue 6, Copyright (C) 2001-2003  by  the  Institute  of
1084       Electrical  and  Electronics  Engineers, Inc and The Open Group. In the
1085       event of any discrepancy between this version and the original IEEE and
1086       The  Open Group Standard, the original IEEE and The Open Group Standard
1087       is the referee document. The original Standard can be  obtained  online
1088       at http://www.opengroup.org/unix/online.html .
1089
1090
1091
1092IEEE/The Open Group                  2003                              YACC(P)