1YACC(1P)                   POSIX Programmer's Manual                  YACC(1P)
2
3
4

PROLOG

6       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
7       implementation of this interface may differ (consult the  corresponding
8       Linux  manual page for details of Linux behavior), or the interface may
9       not be implemented on Linux.
10
11

NAME

13       yacc — yet another compiler compiler (DEVELOPMENT)
14

SYNOPSIS

16       yacc [−dltv] [−b file_prefix] [−p sym_prefix] grammar
17

DESCRIPTION

19       The yacc utility shall read a description of a context-free grammar  in
20       grammar and write C source code, conforming to the ISO C standard, to a
21       code file, and optionally header information into a header file, in the
22       current  directory.  The  generated source code shall not depend on any
23       undefined, unspecified, or implementation-defined behavior,  except  in
24       cases  where  it  is  copied  directly from the supplied grammar, or in
25       cases that are documented by  the  implementation.  The  C  code  shall
26       define a function and related routines and macros for an automaton that
27       executes a parsing algorithm meeting the requirements in Algorithms.
28
29       The form and meaning of the  grammar  are  described  in  the  EXTENDED
30       DESCRIPTION section.
31
32       The  C source code and header file shall be produced in a form suitable
33       as input for the C compiler (see c99).
34

OPTIONS

36       The yacc utility shall  conform  to  the  Base  Definitions  volume  of
37       POSIX.1‐2008,  Section  12.2,  Utility  Syntax  Guidelines,  except for
38       Guideline 9.
39
40       The following options shall be supported:
41
42       −b file_prefix
43                 Use file_prefix instead of y as the  prefix  for  all  output
44                 filenames.  The  code  file  y.tab.c, the header file y.tab.h
45                 (created when −d is  specified),  and  the  description  file
46                 y.output  (created when −v is specified), shall be changed to
47                 file_prefix.tab.c, file_prefix.tab.h, and file_prefix.output,
48                 respectively.
49
50       −d        Write the header file; by default only the code file is writ‐
51                 ten.  The  #define  statements  associate  the  token   codes
52                 assigned  by  yacc  with  the user-declared token names. This
53                 allows source files other than y.tab.c to  access  the  token
54                 codes.
55
56       −l        Produce  a  code  file  that  does not contain any #line con‐
57                 structs. If this option is not  present,  it  is  unspecified
58                 whether  the  code  file or header file contains #line direc‐
59                 tives. This should only be used after  the  grammar  and  the
60                 associated actions are fully debugged.
61
62       −p sym_prefix
63                 Use  sym_prefix  instead of yy as the prefix for all external
64                 names produced by yacc.  The names affected shall include the
65                 functions  yyparse(),  yylex(),  and yyerror(), and the vari‐
66                 ables yylval, yychar, and yydebug.  (In the remainder of this
67                 section,  the  six  symbols  cited are referenced using their
68                 default names only as a notational convenience.) Local  names
69                 may also be affected by the −p option; however, the −p option
70                 shall not affect #define symbols generated by yacc.
71
72       −t        Modify conditional compilation directives to permit  compila‐
73                 tion  of  debugging  code in the code file. Runtime debugging
74                 statements shall always be contained in the code file, but by
75                 default conditional compilation directives prevent their com‐
76                 pilation.
77
78       −v        Write a file containing a description of  the  parser  and  a
79                 report of conflicts generated by ambiguities in the grammar.
80

OPERANDS

82       The following operand is required:
83
84       grammar   A  pathname  of  a  file  containing  instructions, hereafter
85                 called grammar, for which a parser is to be created. The for‐
86                 mat  for the grammar is described in the EXTENDED DESCRIPTION
87                 section.
88

STDIN

90       Not used.
91

INPUT FILES

93       The file grammar shall be a text file formatted  as  specified  in  the
94       EXTENDED DESCRIPTION section.
95

ENVIRONMENT VARIABLES

97       The following environment variables shall affect the execution of yacc:
98
99       LANG      Provide  a  default  value for the internationalization vari‐
100                 ables that are unset or null. (See the Base Definitions  vol‐
101                 ume  of POSIX.1‐2008, Section 8.2, Internationalization Vari‐
102                 ables for the precedence  of  internationalization  variables
103                 used to determine the values of locale categories.)
104
105       LC_ALL    If  set  to  a non-empty string value, override the values of
106                 all the other internationalization variables.
107
108       LC_CTYPE  Determine the locale for the interpretation of  sequences  of
109                 bytes of text data as characters (for example, single-byte as
110                 opposed to  multi-byte  characters  in  arguments  and  input
111                 files).
112
113       LC_MESSAGES
114                 Determine the locale that should be used to affect the format
115                 and contents  of  diagnostic  messages  written  to  standard
116                 error.
117
118       NLSPATH   Determine the location of message catalogs for the processing
119                 of LC_MESSAGES.
120
121       The LANG and LC_* variables affect the execution of the yacc utility as
122       stated. The main() function defined in Yacc Library shall call:
123
124           setlocale(LC_ALL, "")
125
126       and  thus  the  program generated by yacc shall also be affected by the
127       contents of these variables at runtime.
128

ASYNCHRONOUS EVENTS

130       Default.
131

STDOUT

133       Not used.
134

STDERR

136       If shift/reduce or reduce/reduce conflicts  are  detected  in  grammar,
137       yacc  shall  write a report of those conflicts to the standard error in
138       an unspecified format.
139
140       Standard error shall also be used for diagnostic messages.
141

OUTPUT FILES

143       The code file, the header file, and the description file shall be  text
144       files. All are described in the following sections.
145
146   Code File
147       This  file  shall contain the C source code for the yyparse() function.
148       It shall contain code for the various semantic actions with macro  sub‐
149       stitution  performed  on  them as described in the EXTENDED DESCRIPTION
150       section. It also shall contain a copy of the #define statements in  the
151       header  file.  If  a  %union  declaration  is used, the declaration for
152       YYSTYPE shall also be included in this file.
153
154   Header File
155       The header file shall contain #define  statements  that  associate  the
156       token numbers with the token names. This allows source files other than
157       the code file to access the token codes. If  a  %union  declaration  is
158       used, the declaration for YYSTYPE and an extern YYSTYPE yylval declara‐
159       tion shall also be included in this file.
160
161   Description File
162       The description file shall be a text file containing a  description  of
163       the  state  machine  corresponding  to the parser, using an unspecified
164       format. Limits for internal tables (see Limits) shall also be reported,
165       in  an  implementation-defined  manner.  (Some  implementations may use
166       dynamic allocation techniques and have  no  specific  limit  values  to
167       report.)
168

EXTENDED DESCRIPTION

170       The  yacc  command  accepts a language that is used to define a grammar
171       for a target language to be parsed by the tables and code generated  by
172       yacc.   The  language accepted by yacc as a grammar for the target lan‐
173       guage is described below using the yacc input language itself.
174
175       The input grammar includes rules describing the input structure of  the
176       target  language and code to be invoked when these rules are recognized
177       to provide the associated semantic action.  The  code  to  be  executed
178       shall appear as bodies of text that are intended to be C-language code.
179       These bodies of text shall not contain C-language trigraphs. The C-lan‐
180       guage inclusions are presumed to form a correct function when processed
181       by yacc into its output files. The code included in this way  shall  be
182       executed during the recognition of the target language.
183
184       Given  a grammar, the yacc utility generates the files described in the
185       OUTPUT FILES section. The code file can be compiled  and  linked  using
186       c99.   If the declaration and programs sections of the grammar file did
187       not include definitions of main(), yylex(), and yyerror(), the compiled
188       output  requires  linking  with  externally  supplied versions of those
189       functions. Default versions of main() and yyerror() are supplied in the
190       yacc  library  and  can  be linked in by using the −l y operand to c99.
191       The yacc library interfaces need not support interfaces with other than
192       the default yy symbol prefix. The application provides the lexical ana‐
193       lyzer function, yylex(); the lex utility is  specifically  designed  to
194       generate such a routine.
195
196   Input Language
197       The  application shall ensure that every specification file consists of
198       three sections in order: declarations,  grammar  rules,  and  programs,
199       separated by double <percent-sign> characters ("%%").  The declarations
200       and programs sections can be empty. If the latter is empty, the preced‐
201       ing "%%" mark separating it from the rules section can be omitted.
202
203       The  input  is  free  form  text following the structure of the grammar
204       defined below.
205
206   Lexical Structure of the Grammar
207       The <blank>, <newline>, and <form-feed>  character  shall  be  ignored,
208       except  that  the  application  shall ensure that they do not appear in
209       names or multi-character reserved symbols. Comments shall  be  enclosed
210       in "/* ... */", and can appear wherever a name is valid.
211
212       Names  are  of  arbitrary  length,  made  up of letters, periods ('.'),
213       underscores ('_'), and non-initial digits. Uppercase and lowercase let‐
214       ters  are distinct.  Conforming applications shall not use names begin‐
215       ning in yy or YY since the yacc parser uses such  names.  Many  of  the
216       names  appear in the final output of yacc, and thus they should be cho‐
217       sen to conform with any additional rules created by the C  compiler  to
218       be used. In particular they appear in #define statements.
219
220       A  literal shall consist of a single character enclosed in single-quote
221       characters. All of the escape sequences supported  for  character  con‐
222       stants by the ISO C standard shall be supported by yacc.
223
224       The  relationship  with  the  lexical  analyzer  is discussed in detail
225       below.
226
227       The application shall ensure that the NUL  character  is  not  used  in
228       grammar rules or literals.
229
230   Declarations Section
231       The  declarations  section is used to define the symbols used to define
232       the target language and their relationship with each other. In particu‐
233       lar, much of the additional information required to resolve ambiguities
234       in the context-free grammar for the target language is provided here.
235
236       Usually yacc assigns the relationship between  the  symbolic  names  it
237       generates  and their underlying numeric value. The declarations section
238       makes it possible to control the assignment of these values.
239
240       It is also possible to keep semantic information  associated  with  the
241       tokens currently on the parse stack in a user-defined C-language union,
242       if the members of the union are associated with the  various  names  in
243       the grammar. The declarations section provides for this as well.
244
245       The  first group of declarators below all take a list of names as argu‐
246       ments. That list can optionally be preceded by the name of  a  C  union
247       member  (called  a  tag  below)  appearing  within '<' and '>'.  (As an
248       exception to the typographical conventions of the rest of  this  volume
249       of  POSIX.1‐2008, in this case <tag> does not represent a metavariable,
250       but the literal angle bracket characters surrounding a symbol.) The use
251       of  tag  specifies  that  the tokens named on this line shall be of the
252       same C type as the union member referenced by tag.  This  is  discussed
253       in more detail below.
254
255       For  lists used to define tokens, the first appearance of a given token
256       can be followed by a positive integer (as a string of decimal  digits).
257       If  this  is done, the underlying value assigned to it for lexical pur‐
258       poses shall be taken to be that number.
259
260       The following declares name to be a token:
261
262           %token [<tag>] name [number] [name [number]]...
263
264       If tag is present, the C type for all tokens  on  this  line  shall  be
265       declared to be the type referenced by tag.  If a positive integer, num‐
266       ber, follows a name, that value shall be assigned to the token.
267
268       The following declares name to be a token, and  assigns  precedence  to
269       it:
270
271           %left [<tag>] name [number] [name [number]]...
272           %right [<tag>] name [number] [name [number]]...
273
274       One or more lines, each beginning with one of these symbols, can appear
275       in this section. All tokens on the same line have the  same  precedence
276       level  and  associativity;  the lines are in order of increasing prece‐
277       dence or binding strength.  %left denotes that the  operators  on  that
278       line  are left associative, and %right similarly denotes right associa‐
279       tive operators. If tag is present, it shall declare a C type for  names
280       as described for %token.
281
282       The following declares name to be a token, and indicates that this can‐
283       not be used associatively:
284
285           %nonassoc [<tag>] name [number] [name [number]]...
286
287       If the parser encounters associative use of this token  it  reports  an
288       error.  If  tag  is  present,  it  shall  declare a C type for names as
289       described for %token.
290
291       The following declares that union member names are  non-terminals,  and
292       thus it is required to have a tag field at its beginning:
293
294           %type <tag> name...
295
296       Because  it  deals with non-terminals only, assigning a token number or
297       using a literal is also prohibited. If this construct is present,  yacc
298       shall  perform  type  checking;  if  this construct is not present, the
299       parse stack shall hold only the int type.
300
301       Every name used in grammar not defined by a %token, %left,  %right,  or
302       %nonassoc  declaration  is  assumed to represent a non-terminal symbol.
303       The yacc utility shall report an error for any non-terminal symbol that
304       does not appear on the left side of at least one grammar rule.
305
306       Once  the  type, precedence, or token number of a name is specified, it
307       shall not be changed. If the first declaration  of  a  token  does  not
308       assign  a  token  number,  yacc  shall assign a token number. Once this
309       assignment is made, the token number shall not be changed  by  explicit
310       assignment.
311
312       The following declarators do not follow the previous pattern.
313
314       The  following  declares  the non-terminal name to be the start symbol,
315       which represents the largest, most general structure described  by  the
316       grammar rules:
317
318           %start name
319
320       By  default,  it  is the left-hand side of the first grammar rule; this
321       default can be overridden with this declaration.
322
323       The following declares the yacc value stack to be a union of the  vari‐
324       ous types of values desired.
325
326           %union { body of union (in C) }
327
328       The  body of the union shall not contain unbalanced curly brace prepro‐
329       cessing tokens.
330
331       By default, the values returned by actions (see below) and the  lexical
332       analyzer  shall be of type int.  The yacc utility keeps track of types,
333       and it shall insert corresponding union member names in order  to  per‐
334       form strict type checking of the resulting parser.
335
336       Alternatively,  given  that  at  least one <tag> construct is used, the
337       union can be declared in a header file (which shall be included in  the
338       declarations  section  by using a #include construct within %{ and %}),
339       and a typedef used to define  the  symbol  YYSTYPE  to  represent  this
340       union.  The  effect  of %union is to provide the declaration of YYSTYPE
341       directly from the yacc input.
342
343       C-language declarations and definitions can appear in the  declarations
344       section, enclosed by the following marks:
345
346           %{ ... %}
347
348       These  statements  shall  be copied into the code file, and have global
349       scope within it so that they can be used in the rules and program  sec‐
350       tions.  The statements shall not contain "%}" outside a comment, string
351       literal, or multi-character constant.
352
353       The application shall ensure that the declarations  section  is  termi‐
354       nated by the token %%.
355
356   Grammar Rules in yacc
357       The  rules  section  defines the context-free grammar to be accepted by
358       the function yacc generates, and associates with those rules C-language
359       actions and additional precedence information. The grammar is described
360       below, and a formal definition follows.
361
362       The rules section is comprised of one or more grammar rules. A  grammar
363       rule has the form:
364
365           A : BODY ;
366
367       The  symbol  A  represents  a  non-terminal name, and BODY represents a
368       sequence of zero or more names, literals, and semantic actions that can
369       then be followed by optional precedence rules.  Only the names and lit‐
370       erals participate in the formation of the grammar; the semantic actions
371       and precedence rules are used in other ways. The <colon> and the <semi‐
372       colon> are yacc punctuation. If there are  several  successive  grammar
373       rules  with  the  same left-hand side, the <vertical-line> ('|') can be
374       used to avoid rewriting the left-hand side; in  this  case  the  <semi‐
375       colon> appears only after the last rule. The BODY part can be empty (or
376       empty of names and literals) to indicate that the  non-terminal  symbol
377       matches the empty string.
378
379       The  yacc utility assigns a unique number to each rule. Rules using the
380       vertical bar notation are distinct rules. The number  assigned  to  the
381       rule appears in the description file.
382
383       The elements comprising a BODY are:
384
385       name, literal
386                 These  form  the rules of the grammar: name is either a token
387                 or a non-terminal; literal stands for itself (less the  lexi‐
388                 cally required quotation marks).
389
390       semantic action
391                 With  each grammar rule, the user can associate actions to be
392                 performed each time the  rule  is  recognized  in  the  input
393                 process. (Note that the word ``action'' can also refer to the
394                 actions of the parser—shift, reduce, and so on.)
395
396                 These actions can return values and  can  obtain  the  values
397                 returned  by  previous  actions.  These  values  are  kept in
398                 objects of type YYSTYPE (see %union).  The  result  value  of
399                 the  action  shall  be kept on the parse stack with the left-
400                 hand side of the rule, to be accessed by other reductions  as
401                 part of their right-hand side. By using the <tag> information
402                 provided in the declarations section, the code  generated  by
403                 yacc  can  be  strictly  type  checked  and contain arbitrary
404                 information. In addition, the lexical  analyzer  can  provide
405                 the same kinds of values for tokens, if desired.
406
407                 An  action  is  an  arbitrary  C statement and as such can do
408                 input or output, call subprograms, and alter  external  vari‐
409                 ables.  An  action  is  one  or more C statements enclosed in
410                 curly braces '{' and '}'.  The statements shall  not  contain
411                 unbalanced curly brace preprocessing tokens.
412
413                 Certain pseudo-variables can be used in the action. These are
414                 macros for access to  data  structures  known  internally  to
415                 yacc.
416
417                 $$        The  value of the action can be set by assigning it
418                           to $$. If type checking is enabled and the type  of
419                           the  value  to  be assigned cannot be determined, a
420                           diagnostic message may be generated.
421
422                 $number   This refers to the value returned by the  component
423                           specified  by the token number in the right side of
424                           a rule, reading from left to right; number  can  be
425                           zero or negative. If number is zero or negative, it
426                           refers to the data associated with the name on  the
427                           parser's stack preceding the leftmost symbol of the
428                           current rule.  (That is, "$0" refers  to  the  name
429                           immediately preceding the leftmost name in the cur‐
430                           rent rule to be found on  the  parser's  stack  and
431                           "$−1"  refers to the symbol to its left.) If number
432                           refers to an element past the current point in  the
433                           rule, or beyond the bottom of the stack, the result
434                           is undefined. If type checking is enabled  and  the
435                           type  of  the value to be assigned cannot be deter‐
436                           mined, a diagnostic message may be generated.
437
438                 $<tag>number
439                           These correspond exactly to the corresponding  sym‐
440                           bols  without  the  tag  inclusion,  but  allow for
441                           strict type checking (and  preclude  unwanted  type
442                           conversions).  The  effect  is  that  the  macro is
443                           expanded to use tag to select an element  from  the
444                           YYSTYPE  union  (using dataname.tag).  This is par‐
445                           ticularly useful if number is not positive.
446
447                 $<tag>$   This imposes on the reference the type of the union
448                           member  referenced  by  tag.   This construction is
449                           applicable when a reference to a left context value
450                           occurs  in  the  grammar,  and provides yacc with a
451                           means for selecting a type.
452
453                 Actions can occur anywhere in a rule (not just at  the  end);
454                 an  action can access values returned by actions to its left,
455                 and in turn the value it returns can be accessed  by  actions
456                 to  its  right.  An  action appearing in the middle of a rule
457                 shall be equivalent to replacing the action with a  new  non-
458                 terminal symbol and adding an empty rule with that non-termi‐
459                 nal symbol on the left-hand side. The semantic action associ‐
460                 ated  with  the  new rule shall be equivalent to the original
461                 action. The use of actions within rules might introduce  con‐
462                 flicts that would not otherwise exist.
463
464                 By  default,  the  value  of a rule shall be the value of the
465                 first element in it. If the first element  does  not  have  a
466                 type  (particularly in the case of a literal) and type check‐
467                 ing is turned on by %type, an error message shall result.
468
469       precedence
470                 The keyword %prec can be used to change the precedence  level
471                 associated  with  a particular grammar rule. Examples of this
472                 are in cases where a unary and binary operator have the  same
473                 symbolic  representation,  but  need  to  be  given different
474                 precedences, or where the handling of  an  ambiguous  if-else
475                 construction  is  necessary.  The  reserved  symbol %prec can
476                 appear immediately after the body of the grammar rule and can
477                 be  followed by a token name or a literal. It shall cause the
478                 precedence of the grammar rule to become that of the  follow‐
479                 ing token name or literal. The action for the rule as a whole
480                 can follow %prec.
481
482       If a program section follows, the application  shall  ensure  that  the
483       grammar rules are terminated by %%.
484
485   Programs Section
486       The programs section can include the definition of the lexical analyzer
487       yylex(), and any other  functions;  for  example,  those  used  in  the
488       actions  specified  in the grammar rules. It is unspecified whether the
489       programs section precedes or follows the semantic actions in the output
490       file;  therefore, if the application contains any macro definitions and
491       declarations intended to apply to the code in the semantic actions,  it
492       shall place them within "%{ ... %}" in the declarations section.
493
494   Input Grammar
495       The  following  input  to  yacc  yields a parser for the input to yacc.
496       This formal syntax takes precedence  over  the  preceding  text  syntax
497       description.
498
499       The  lexical  structure is defined less precisely; Lexical Structure of
500       the Grammar defines most terms. The correspondence between the previous
501       terms and the tokens below is as follows.
502
503       IDENTIFIER  This  corresponds to the concept of name, given previously.
504                   It also includes literals as defined previously.
505
506       C_IDENTIFIER
507                   This is a name, and additionally it is known to be followed
508                   by a <colon>.  A literal cannot yield this token.
509
510       NUMBER      A string of digits (a non-negative decimal integer).
511
512       TYPE, LEFT, MARK, LCURL, RCURL
513                   These correspond directly to %type, %left, %%, %{, and %}.
514
515       { ... }     This  indicates  C-language  source code, with the possible
516                   inclusion of '$' macros as discussed previously.
517
518           /* Grammar for the input to yacc. */
519           /* Basic entries. */
520           /* The following are recognized by the lexical analyzer. */
521
522           %token    IDENTIFIER      /* Includes identifiers and literals */
523           %token    C_IDENTIFIER    /* identifier (but not literal)
524                                        followed by a :. */
525           %token    NUMBER          /* [0-9][0-9]* */
526
527           /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
528
529           %token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
530
531           %token    MARK            /* The %% mark. */
532           %token    LCURL           /* The %{ mark. */
533           %token    RCURL           /* The %} mark. */
534
535           /* 8-bit character literals stand for themselves; */
536           /* tokens have to be defined for multi-byte characters. */
537
538           %start    spec
539
540           %%
541
542           spec  : defs MARK rules tail
543                 ;
544           tail  : MARK
545                 {
546                   /* In this action, set up the rest of the file. */
547                 }
548                 | /* Empty; the second MARK is optional. */
549                 ;
550           defs  : /* Empty. */
551                 |    defs def
552                 ;
553           def   : START IDENTIFIER
554                 |    UNION
555                 {
556                   /* Copy union definition to output. */
557                 }
558                 |    LCURL
559                 {
560                   /* Copy C code to output file. */
561                 }
562                   RCURL
563                 |    rword tag nlist
564                 ;
565           rword : TOKEN
566                 | LEFT
567                 | RIGHT
568                 | NONASSOC
569                 | TYPE
570                 ;
571           tag   : /* Empty: union tag ID optional. */
572                 | '<' IDENTIFIER '>'
573                 ;
574           nlist : nmno
575                 | nlist nmno
576                 ;
577           nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
578                 | IDENTIFIER NUMBER  /* Note: invalid with % type. */
579                 ;
580
581           /* Rule section */
582
583           rules : C_IDENTIFIER rbody prec
584                 | rules  rule
585                 ;
586           rule  : C_IDENTIFIER rbody prec
587                 | '|' rbody prec
588                 ;
589           rbody : /* empty */
590                 | rbody IDENTIFIER
591                 | rbody act
592                 ;
593           act   : '{'
594                   {
595                     /* Copy action, translate $$, and so on. */
596                   }
597                   '}'
598                 ;
599           prec  : /* Empty */
600                 | PREC IDENTIFIER
601                 | PREC IDENTIFIER act
602                 | prec ';'
603                 ;
604
605   Conflicts
606       The parser produced for an input grammar may contain  states  in  which
607       conflicts  occur.  The  conflicts  occur  because  the  grammar  is not
608       LALR(1). An ambiguous grammar always contains at least one LALR(1) con‐
609       flict.  The  yacc  utility  shall  resolve  all conflicts, using either
610       default rules or user-specified precedence rules.
611
612       Conflicts are either shift/reduce conflicts or reduce/reduce conflicts.
613       A  shift/reduce conflict is where, for a given state and lookahead sym‐
614       bol,  both  a  shift  action  and  a  reduce  action  are  possible.  A
615       reduce/reduce  conflict  is where, for a given state and lookahead sym‐
616       bol, reductions by two different rules are possible.
617
618       The rules below describe how to specify what actions  to  take  when  a
619       conflict  occurs.  Not  all  shift/reduce conflicts can be successfully
620       resolved this way because the conflict may be due  to  something  other
621       than  ambiguity,  so  incautious  use of these facilities can cause the
622       language accepted by the parser to be much different  from  that  which
623       was intended. The description file shall contain sufficient information
624       to understand the cause of the conflict. Where ambiguity is the  reason
625       either  the  default  or explicit rules should be adequate to produce a
626       working parser.
627
628       The declared precedences and associativities (see Declarations Section)
629       are used to resolve parsing conflicts as follows:
630
631        1. A  precedence  and  associativity  is  associated with each grammar
632           rule; it is the precedence and associativity of the last  token  or
633           literal  in  the body of the rule. If the %prec keyword is used, it
634           overrides this default. Some grammar  rules  might  not  have  both
635           precedence and associativity.
636
637        2. If  there is a shift/reduce conflict, and both the grammar rule and
638           the input symbol have precedence and associativity associated  with
639           them,  then  the conflict is resolved in favor of the action (shift
640           or reduce) associated with the higher  precedence.  If  the  prece‐
641           dences  are the same, then the associativity is used; left associa‐
642           tive implies reduce, right associative implies shift, and non-asso‐
643           ciative implies an error in the string being parsed.
644
645        3. When  there  is  a shift/reduce conflict that cannot be resolved by
646           rule 2, the shift is done. Conflicts resolved this way are  counted
647           in the diagnostic output described in Error Handling.
648
649        4. When  there is a reduce/reduce conflict, a reduction is done by the
650           grammar rule that occurs earlier in the input  sequence.  Conflicts
651           resolved this way are counted in the diagnostic output described in
652           Error Handling.
653
654       Conflicts resolved by precedence or associativity shall not be  counted
655       in  the  shift/reduce  and  reduce/reduce conflicts reported by yacc on
656       either standard error or in the description file.
657
658   Error Handling
659       The token error shall be reserved for error handling.  The  name  error
660       can  be used in grammar rules. It indicates places where the parser can
661       recover from a syntax error. The default value of error shall  be  256.
662       Its  value  can be changed using a %token declaration. The lexical ana‐
663       lyzer should not return the value of error.
664
665       The parser shall detect a syntax error when it is in a state where  the
666       action  associated  with  the  lookahead  symbol  is error.  A semantic
667       action can cause the parser to initiate error handling by executing the
668       macro  YYERROR.  When  YYERROR  is executed, the semantic action passes
669       control back to the parser. YYERROR cannot be used outside of  semantic
670       actions.
671
672       When  the  parser  detects  a syntax error, it normally calls yyerror()
673       with the character string "syntax error"  as  its  argument.  The  call
674       shall  not  be  made  if the parser is still recovering from a previous
675       error when the error is detected. The parser is considered to be recov‐
676       ering  from a previous error until the parser has shifted over at least
677       three normal input symbols since the  last  error  was  detected  or  a
678       semantic  action  has executed the macro yyerrok.  The parser shall not
679       call yyerror() when YYERROR is executed.
680
681       The macro function YYRECOVERING shall return 1 if a  syntax  error  has
682       been detected and the parser has not yet fully recovered from it.  Oth‐
683       erwise, zero shall be returned.
684
685       When a syntax error is detected by the parser, the parser  shall  check
686       if  a  previous syntax error has been detected. If a previous error was
687       detected, and if no normal input symbols have been  shifted  since  the
688       preceding error was detected, the parser checks if the lookahead symbol
689       is an endmarker (see Interface to the Lexical Analyzer).  If it is, the
690       parser  shall  return  with  a non-zero value. Otherwise, the lookahead
691       symbol shall be discarded and normal parsing shall resume.
692
693       When YYERROR is executed or when the parser detects a syntax error  and
694       no  previous error has been detected, or at least one normal input sym‐
695       bol has been shifted since the previous error was detected, the  parser
696       shall  pop  back  one state at a time until the parse stack is empty or
697       the current state allows a shift over error.  If the parser empties the
698       parse stack, it shall return with a non-zero value. Otherwise, it shall
699       shift over error and then resume normal parsing. If the parser reads  a
700       lookahead symbol before the error was detected, that symbol shall still
701       be the lookahead symbol when parsing is resumed.
702
703       The macro yyerrok in a semantic action shall cause the parser to act as
704       if it has fully recovered from any previous errors. The macro yyclearin
705       shall cause the parser to discard the current lookahead token.  If  the
706       current  lookahead token has not yet been read, yyclearin shall have no
707       effect.
708
709       The macro YYACCEPT shall cause the parser  to  return  with  the  value
710       zero.  The  macro  YYABORT shall cause the parser to return with a non-
711       zero value.
712
713   Interface to the Lexical Analyzer
714       The yylex() function is an integer-valued function that returns a token
715       number representing the kind of token read. If there is a value associ‐
716       ated with the token returned by yylex()  (see  the  discussion  of  tag
717       above), it shall be assigned to the external variable yylval.
718
719       If the parser and yylex() do not agree on these token numbers, reliable
720       communication between them cannot occur.  For  (single-byte  character)
721       literals, the token is simply the numeric value of the character in the
722       current character set.  The numbers for other tokens can either be cho‐
723       sen  by  yacc,  or chosen by the user. In either case, the #define con‐
724       struct of C is used to allow yylex() to return these  numbers  symboli‐
725       cally.  The  #define  statements  are  put  into the code file, and the
726       header file if that file is requested. The set of characters  permitted
727       by  yacc  in  an  identifier  is larger than that permitted by C. Token
728       names found to contain such characters shall not  be  included  in  the
729       #define declarations.
730
731       If the token numbers are chosen by yacc, the tokens other than literals
732       shall be assigned numbers  greater  than  256,  although  no  order  is
733       implied.  A  token can be explicitly assigned a number by following its
734       first appearance in the declarations section with a number.  Names  and
735       literals  not  defined  this  way  retain their default definition. All
736       token numbers assigned by yacc shall be unique and  distinct  from  the
737       token  numbers used for literals and user-assigned tokens. If duplicate
738       token numbers cause conflicts in parser generation, yacc  shall  report
739       an  error; otherwise, it is unspecified whether the token assignment is
740       accepted or an error is reported.
741
742       The end of the input is marked by a special token called the endmarker,
743       which  has  a  token number that is zero or negative. (These values are
744       invalid for any other token.) All lexical analyzers shall  return  zero
745       or  negative as a token number upon reaching the end of their input. If
746       the tokens up to, but excluding, the endmarker form  a  structure  that
747       matches  the  start  symbol,  the parser shall accept the input. If the
748       endmarker is seen in any other  context,  it  shall  be  considered  an
749       error.
750
751   Completing the Program
752       In  addition  to  yyparse()  and  yylex(),  the functions yyerror() and
753       main() are required to make a complete  program.  The  application  can
754       supply main() and yyerror(), or those routines can be obtained from the
755       yacc library.
756
757   Yacc Library
758       The following functions shall appear only in the yacc library  accessi‐
759       ble through the −l y operand to c99; they can therefore be redefined by
760       a conforming application:
761
762       int main(void)
763             This function shall call yyparse() and exit with  an  unspecified
764             value. Other actions within this function are unspecified.
765
766       int yyerror(const char *s)
767             This function shall write the NUL-terminated argument to standard
768             error, followed by a <newline>.
769
770       The order of the −l y and −l l operands given to  c99  is  significant;
771       the  application shall either provide its own main() function or ensure
772       that −l y precedes −l l.
773
774   Debugging the Parser
775       The parser generated by yacc shall have  diagnostic  facilities  in  it
776       that can be optionally enabled at either compile time or at runtime (if
777       enabled at compile time).  The compilation  of  the  runtime  debugging
778       code is under the control of YYDEBUG, a preprocessor symbol. If YYDEBUG
779       has a non-zero value, the debugging code  shall  be  included.  If  its
780       value is zero, the code shall not be included.
781
782       In parsers where the debugging code has been included, the external int
783       yydebug can be used to turn debugging on (with a  non-zero  value)  and
784       off  (zero  value)  at  runtime.  The initial value of yydebug shall be
785       zero.
786
787       When −t is specified, the code file shall be built such that, if  YYDE‐
788       BUG  is not already defined at compilation time (using the c99 −D YYDE‐
789       BUG option, for example), YYDEBUG shall be set explicitly to  1.   When
790       −t is not specified, the code file shall be built such that, if YYDEBUG
791       is not already defined, it shall be set explicitly to zero.
792
793       The format of the debugging output is unspecified but includes at least
794       enough  information  to determine the shift and reduce actions, and the
795       input symbols. It also provides information about error recovery.
796
797   Algorithms
798       The parser constructed by yacc implements an LALR(1) parsing  algorithm
799       as  documented  in the literature. It is unspecified whether the parser
800       is table-driven or direct-coded.
801
802       A parser generated by yacc shall never request  an  input  symbol  from
803       yylex()  while  in  a state where the only actions other than the error
804       action are reductions by a single rule.
805
806       The literature of parsing theory defines these concepts.
807
808   Limits
809       The yacc utility may have several internal tables. The minimum maximums
810       for these tables are shown in the following table. The exact meaning of
811       these values is implementation-defined. The implementation shall define
812       the  relationship  between  these values and between them and any error
813       messages that the implementation may generate  should  it  run  out  of
814       space  for any internal structure. An implementation may combine groups
815       of these resources into a single pool as long as the total available to
816       the  user  does  not  fall below the sum of the sizes specified by this
817       section.
818
819                           Table: Internal Limits in yacc
820
821               ┌───────────┬─────────┬────────────────────────────────┐
822               │           │ Minimum │                                │
823Limit    Maximum Description           
824               ├───────────┼─────────┼────────────────────────────────┤
825               │{NTERMS}   │   126   │ Number of tokens.              │
826               │{NNONTERM} │   200   │ Number of non-terminals.       │
827               │{NPROD}    │   300   │ Number of rules.               │
828               │{NSTATES}  │   600   │ Number of states.              │
829               │{MEMSIZE}  │  5200   │ Length of rules. The total     │
830               │           │         │ length, in names (tokens and   │
831               │           │         │ non-terminals), of all the     │
832               │           │         │ rules of the grammar. The      │
833               │           │         │ left-hand side is counted for  │
834               │           │         │ each rule, even if it is not   │
835               │           │         │ explicitly repeated, as speci‐ │
836               │           │         │ fied in Grammar Rules in yacc. │
837               │{ACTSIZE}  │  4000   │ Number of actions. ``Actions'' │
838               │           │         │ here (and in the description   │
839               │           │         │ file) refer to parser actions  │
840               │           │         │ (shift, reduce, and so on) not │
841               │           │         │ to semantic actions defined in │
842               │           │         │ Grammar Rules in yacc.         │
843               └───────────┴─────────┴────────────────────────────────┘

EXIT STATUS

845       The following exit values shall be returned:
846
847        0    Successful completion.
848
849       >0    An error occurred.
850

CONSEQUENCES OF ERRORS

852       If any errors are encountered, the run is aborted and yacc exits with a
853       non-zero  status.  Partial code files and header files may be produced.
854       The summary information in the description file shall  always  be  pro‐
855       duced if the −v flag is present.
856
857       The following sections are informative.
858

APPLICATION USAGE

860       Historical  implementations  experience  name  conflicts  on  the names
861       yacc.tmp, yacc.acts, yacc.debug, y.tab.c, y.tab.h, and y.output if more
862       than one copy of yacc is running in a single directory at one time. The
863       −b option was added to overcome this problem. The  related  problem  of
864       allowing  multiple  yacc  parsers  to  be  placed  in the same file was
865       addressed by adding a −p option to override the  previously  hard-coded
866       yy variable prefix.
867
868       The  description of the −p option specifies the minimal set of function
869       and variable names that cause conflict when multiple parsers are linked
870       together.  YYSTYPE does not need to be changed. Instead, the programmer
871       can use −b to give the header files  for  different  parsers  different
872       names,  and  then  the  file  with  the  yylex() for a given parser can
873       include the header for that parser. Names such  as  yyclearerr  do  not
874       need  to  be changed because they are used only in the actions; they do
875       not have linkage. It is  possible  that  an  implementation  has  other
876       names, either internal ones for implementing things such as yyclearerr,
877       or providing non-standard features that it wants to change with −p.
878
879       Unary operators that are the same token as a binary operator in general
880       need  their  precedence adjusted. This is handled by the %prec advisory
881       symbol associated with the particular grammar rule defining that  unary
882       operator.  (See  Grammar Rules in yacc.)  Applications are not required
883       to use this operator for unary operators, but the grammars that do  not
884       require it are rare.
885

EXAMPLES

887       Access  to the yacc library is obtained with library search operands to
888       c99.  To use the yacc library main():
889
890           c99 y.tab.c −l y
891
892       Both the lex library and the yacc library contain  main().   To  access
893       the yacc main():
894
895           c99 y.tab.c lex.yy.c −l y −l l
896
897       This  ensures  that  the  yacc  library  is searched first, so that its
898       main() is used.
899
900       The historical yacc libraries have contained two simple functions  that
901       are  normally  coded by the application programmer. These functions are
902       similar to the following code:
903
904           #include <locale.h>
905           int main(void)
906           {
907               extern int yyparse();
908
909               setlocale(LC_ALL, "");
910
911               /* If the following parser is one created by lex, the
912                  application must be careful to ensure that LC_CTYPE
913                  and LC_COLLATE are set to the POSIX locale. */
914               (void) yyparse();
915               return (0);
916           }
917
918           #include <stdio.h>
919
920           int yyerror(const char *msg)
921           {
922               (void) fprintf(stderr, "%s\n", msg);
923               return (0);
924           }
925

RATIONALE

927       The references in Referenced Documents may be helpful  in  constructing
928       the  parser  generator.  The  referenced  DeRemer  and Pennello article
929       (along with the works it references) describes a technique to  generate
930       parsers  that conform to this volume of POSIX.1‐2008. Work in this area
931       continues to be done, so implementors should consult current literature
932       before doing any new implementations. The original Knuth article is the
933       theoretical basis for this kind of parser, but the tables it  generates
934       are impractically large for reasonable grammars and should not be used.
935       The ``equivalent to'' wording is intentional to assure  that  the  best
936       tables that are LALR(1) can be generated.
937
938       There  has been confusion between the class of grammars, the algorithms
939       needed to generate parsers, and the algorithms needed to parse the lan‐
940       guages.  They  are  all  reasonably orthogonal. In particular, a parser
941       generator that accepts the full range of LR(1) grammars need not gener‐
942       ate a table any more complex than one that accepts SLR(1) (a relatively
943       weak class of LR grammars) for a grammar that  happens  to  be  SLR(1).
944       Such  an implementation need not recognize the case, either; table com‐
945       pression can yield the SLR(1) table (or one  even  smaller  than  that)
946       without  recognizing that the grammar is SLR(1).  The speed of an LR(1)
947       parser for any class is dependent more upon  the  table  representation
948       and  compression  (or  the code generation if a direct parser is gener‐
949       ated) than upon the class of grammar that the table generator handles.
950
951       The speed of the parser generator is somewhat dependent upon the  class
952       of  grammar  it handles. However, the original Knuth article algorithms
953       for constructing LR parsers were judged by its author to  be  impracti‐
954       cally slow at that time. Although full LR is more complex than LALR(1),
955       as computer speeds and algorithms improve, the difference (in terms  of
956       acceptable wall-clock execution time) is becoming less significant.
957
958       Potential  authors  are  cautioned that the referenced DeRemer and Pen‐
959       nello article previously cited identifies a bug (an over-simplification
960       of  the  computation  of LALR(1) lookahead sets) in some of the LALR(1)
961       algorithm statements that preceded it to publication. They should  take
962       the time to seek out that paper, as well as current relevant work, par‐
963       ticularly Aho's.
964
965       The −b option was added to provide a  portable  method  for  permitting
966       yacc  to  work on multiple separate parsers in the same directory. If a
967       directory contains more than one yacc grammar, and  both  grammars  are
968       constructed  at  the  same  time (by, for example, a parallel make pro‐
969       gram), conflict results. While the solution is not historical practice,
970       it  corrects  a known deficiency in historical implementations.  Corre‐
971       sponding changes were made to all sections that  referenced  the  file‐
972       names  y.tab.c  (now  ``the  code  file''),  y.tab.h  (now ``the header
973       file''), and y.output (now ``the description file'').
974
975       The grammar for yacc input is based on System V documentation. The tex‐
976       tual description shows there that the ';' is required at the end of the
977       rule. The grammar and the implementation do not require this. (The  use
978       of C_IDENTIFIER causes a reduce to occur in the right place.)
979
980       Also, in that implementation, the constructs such as %token can be ter‐
981       minated by a <semicolon>, but this is not permitted by the grammar. The
982       keywords  such  as  %token can also appear in uppercase, which is again
983       not discussed. In most places where '%' is  used,  <backslash>  can  be
984       substituted,  and there are alternate spellings for some of the symbols
985       (for example, %LEFT can be "%<" or even "\<").
986
987       Historically, <tag> can contain any characters  except  '>',  including
988       white  space, in the implementation. However, since the tag must refer‐
989       ence an ISO C standard union member, in practice conforming implementa‐
990       tions  need  to  support  only the set of characters for ISO C standard
991       identifiers in this context.
992
993       Some historical implementations are known to accept  actions  that  are
994       terminated  by  a period. Historical implementations often allow '$' in
995       names. A conforming implementation does not need to support  either  of
996       these behaviors.
997
998       Deciding when to use %prec illustrates the difficulty in specifying the
999       behavior of yacc.  There may be situations in which the grammar is not,
1000       strictly speaking, in error, and yet yacc cannot interpret it unambigu‐
1001       ously. The resolution  of  ambiguities  in  the  grammar  can  in  many
1002       instances  be  resolved  by  providing  additional information, such as
1003       using %type or %union declarations. It is often easier and  it  usually
1004       yields  a  smaller parser to take this alternative when it is appropri‐
1005       ate.
1006
1007       The size and execution time of a program produced without  the  runtime
1008       debugging  code  is  usually  smaller and slightly faster in historical
1009       implementations.
1010
1011       Statistics messages from several historical implementations include the
1012       following types of information:
1013
1014           n/512 terminals, n/300 non-terminals
1015           n/600 grammar rules, n/1500 states
1016           n shift/reduce, n reduce/reduce conflicts reported
1017           n/350 working sets used
1018           Memory: states, etc. n/15000, parser n/15000
1019           n/600 distinct lookahead sets
1020           n extra closures
1021           n shift entries, n exceptions
1022           n goto entries
1023           n entries saved by goto default
1024           Optimizer space used: input n/15000, output n/15000
1025           n table entries, n zero
1026           Maximum spread: n, Maximum offset: n
1027
1028       The report of internal tables in the description file is left implemen‐
1029       tation-defined because all aspects of these limits are also implementa‐
1030       tion-defined.  Some  implementations  may  use dynamic allocation tech‐
1031       niques and have no specific limit values to report.
1032
1033       The format of the y.output file is not given because  specification  of
1034       the  format was not seen to enhance applications portability. The list‐
1035       ing is primarily intended to help human users understand and debug  the
1036       parser;  use  of  y.output  by a conforming application script would be
1037       unusual. Furthermore, implementations have not produced consistent out‐
1038       put  and  no  popular  format  was apparent. The format selected by the
1039       implementation should be human-readable, in addition to the requirement
1040       that it be a text file.
1041
1042       Standard  error reports are not specifically described because they are
1043       seldom of use to conforming applications and there  was  no  reason  to
1044       restrict implementations.
1045
1046       Some  implementations  recognize  "={"  as equivalent to '{' because it
1047       appears in historical documentation. This construction  was  recognized
1048       and documented as obsolete as long ago as 1978, in the referenced Yacc:
1049       Yet Another Compiler-Compiler. This volume  of  POSIX.1‐2008  chose  to
1050       leave it as obsolete and omit it.
1051
1052       Multi-byte  characters should be recognized by the lexical analyzer and
1053       returned as tokens. They should not be returned as multi-byte character
1054       literals.  The  token error that is used for error recovery is normally
1055       assigned the value 256 in  the  historical  implementation.  Thus,  the
1056       token  value  256,  which is used in many multi-byte character sets, is
1057       not available for use as the value of a user-defined token.
1058

FUTURE DIRECTIONS

1060       None.
1061

SEE ALSO

1063       c99, lex
1064
1065       The Base Definitions volume of  POSIX.1‐2008,  Chapter  8,  Environment
1066       Variables, Section 12.2, Utility Syntax Guidelines
1067
1069       Portions  of  this text are reprinted and reproduced in electronic form
1070       from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
1071       --  Portable  Operating  System  Interface (POSIX), The Open Group Base
1072       Specifications Issue 7, Copyright (C) 2013 by the Institute of Electri‐
1073       cal  and  Electronics  Engineers,  Inc  and  The  Open Group.  (This is
1074       POSIX.1-2008 with the 2013 Technical Corrigendum  1  applied.)  In  the
1075       event of any discrepancy between this version and the original IEEE and
1076       The Open Group Standard, the original IEEE and The Open Group  Standard
1077       is  the  referee document. The original Standard can be obtained online
1078       at http://www.unix.org/online.html .
1079
1080       Any typographical or formatting errors that appear  in  this  page  are
1081       most likely to have been introduced during the conversion of the source
1082       files to man page format. To report such errors,  see  https://www.ker
1083       nel.org/doc/man-pages/reporting_bugs.html .
1084
1085
1086
1087IEEE/The Open Group                  2013                             YACC(1P)
Impressum