yacc(1p) - f35

1YACC(1P)                   POSIX Programmer's Manual                  YACC(1P)
2
3
4

PROLOG

6       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
7       implementation of this interface may differ (consult the  corresponding
8       Linux  manual page for details of Linux behavior), or the interface may
9       not be implemented on Linux.
10

NAME

12       yacc — yet another compiler compiler (DEVELOPMENT)
13

SYNOPSIS

15       yacc [-dltv] [-b file_prefix] [-p sym_prefix] grammar
16

DESCRIPTION

18       The yacc utility shall read a description of a context-free grammar  in
19       grammar and write C source code, conforming to the ISO C standard, to a
20       code file, and optionally header information into a header file, in the
21       current  directory.  The  generated source code shall not depend on any
22       undefined, unspecified, or implementation-defined behavior,  except  in
23       cases  where  it  is  copied  directly from the supplied grammar, or in
24       cases that are documented by  the  implementation.  The  C  code  shall
25       define a function and related routines and macros for an automaton that
26       executes a parsing algorithm meeting the requirements in Algorithms.
27
28       The form and meaning of the  grammar  are  described  in  the  EXTENDED
29       DESCRIPTION section.
30
31       The  C source code and header file shall be produced in a form suitable
32       as input for the C compiler (see c99).
33

OPTIONS

35       The yacc utility shall  conform  to  the  Base  Definitions  volume  of
36       POSIX.1‐2017,  Section  12.2,  Utility  Syntax  Guidelines,  except for
37       Guideline 9.
38
39       The following options shall be supported:
40
41       -b file_prefix
42                 Use file_prefix instead of y as the  prefix  for  all  output
43                 filenames.  The  code  file  y.tab.c, the header file y.tab.h
44                 (created when -d is  specified),  and  the  description  file
45                 y.output  (created when -v is specified), shall be changed to
46                 file_prefix.tab.c, file_prefix.tab.h, and file_prefix.output,
47                 respectively.
48
49       -d        Write the header file; by default only the code file is writ‐
50                 ten. See the OUTPUT FILES section.
51
52       -l        Produce a code file that does  not  contain  any  #line  con‐
53                 structs.  If  this  option  is not present, it is unspecified
54                 whether the code file or header file  contains  #line  direc‐
55                 tives.  This  should  only  be used after the grammar and the
56                 associated actions are fully debugged.
57
58       -p sym_prefix
59                 Use sym_prefix instead of yy as the prefix for  all  external
60                 names produced by yacc.  The names affected shall include the
61                 functions yyparse(), yylex(), and yyerror(),  and  the  vari‐
62                 ables yylval, yychar, and yydebug.  (In the remainder of this
63                 section, the six symbols cited  are  referenced  using  their
64                 default  names only as a notational convenience.) Local names
65                 may also be affected by the -p option; however, the -p option
66                 shall not affect #define symbols generated by yacc.
67
68       -t        Modify  conditional compilation directives to permit compila‐
69                 tion of debugging code in the code  file.  Runtime  debugging
70                 statements shall always be contained in the code file, but by
71                 default conditional compilation directives prevent their com‐
72                 pilation.
73
74       -v        Write  a  file  containing  a description of the parser and a
75                 report of conflicts generated by ambiguities in the grammar.
76

OPERANDS

78       The following operand is required:
79
80       grammar   A pathname  of  a  file  containing  instructions,  hereafter
81                 called grammar, for which a parser is to be created. The for‐
82                 mat for the grammar is described in the EXTENDED  DESCRIPTION
83                 section.
84

STDIN

86       Not used.
87

INPUT FILES

89       The  file  grammar  shall  be a text file formatted as specified in the
90       EXTENDED DESCRIPTION section.
91

ENVIRONMENT VARIABLES

93       The following environment variables shall affect the execution of yacc:
94
95       LANG      Provide a default value for  the  internationalization  vari‐
96                 ables  that are unset or null. (See the Base Definitions vol‐
97                 ume of POSIX.1‐2017, Section 8.2, Internationalization  Vari‐
98                 ables  for  the  precedence of internationalization variables
99                 used to determine the values of locale categories.)
100
101       LC_ALL    If set to a non-empty string value, override  the  values  of
102                 all the other internationalization variables.
103
104       LC_CTYPE  Determine  the  locale for the interpretation of sequences of
105                 bytes of text data as characters (for example, single-byte as
106                 opposed  to  multi-byte  characters  in  arguments  and input
107                 files).
108
109       LC_MESSAGES
110                 Determine the locale that should be used to affect the format
111                 and  contents  of  diagnostic  messages  written  to standard
112                 error.
113
114       NLSPATH   Determine the location of message catalogs for the processing
115                 of LC_MESSAGES.
116
117       The LANG and LC_* variables affect the execution of the yacc utility as
118       stated. The main() function defined in Yacc Library shall call:
119
120
121           setlocale(LC_ALL, "")
122
123       and thus the program generated by yacc shall also be  affected  by  the
124       contents of these variables at runtime.
125

ASYNCHRONOUS EVENTS

127       Default.
128

STDOUT

130       Not used.
131

STDERR

133       If  shift/reduce  or  reduce/reduce  conflicts are detected in grammar,
134       yacc shall write a report of those conflicts to the standard  error  in
135       an unspecified format.
136
137       Standard error shall also be used for diagnostic messages.
138

OUTPUT FILES

140       The  code file, the header file, and the description file shall be text
141       files. All are described in the following sections.
142
143   Code File
144       This file shall contain the C source code for the  yyparse()  function.
145       It  shall contain code for the various semantic actions with macro sub‐
146       stitution performed on them as described in  the  EXTENDED  DESCRIPTION
147       section.  It also shall contain a copy of the #define statements in the
148       header file. If a %union  declaration  is  used,  the  declaration  for
149       YYSTYPE shall also be included in this file.
150
151   Header File
152       The  header  file  shall  contain #define statements that associate the
153       token numbers with the token names. This allows source files other than
154       the  code  file  to  access the token codes. If a %union declaration is
155       used, the declaration for YYSTYPE and an extern YYSTYPE yylval declara‐
156       tion shall also be included in this file.
157
158   Description File
159       The  description  file shall be a text file containing a description of
160       the state machine corresponding to the  parser,  using  an  unspecified
161       format. Limits for internal tables (see Limits) shall also be reported,
162       in an implementation-defined  manner.  (Some  implementations  may  use
163       dynamic  allocation  techniques  and  have  no specific limit values to
164       report.)
165

EXTENDED DESCRIPTION

167       The yacc command accepts a language that is used to  define  a  grammar
168       for  a target language to be parsed by the tables and code generated by
169       yacc.  The language accepted by yacc as a grammar for the  target  lan‐
170       guage is described below using the yacc input language itself.
171
172       The  input grammar includes rules describing the input structure of the
173       target language and code to be invoked when these rules are  recognized
174       to  provide  the  associated  semantic  action. The code to be executed
175       shall appear as bodies of text that are intended to be C-language code.
176       These bodies of text shall not contain C-language trigraphs. The C-lan‐
177       guage inclusions are presumed to form a correct function when processed
178       by  yacc  into its output files. The code included in this way shall be
179       executed during the recognition of the target language.
180
181       Given a grammar, the yacc utility generates the files described in  the
182       OUTPUT  FILES  section.  The code file can be compiled and linked using
183       c99.  If the declaration and programs sections of the grammar file  did
184       not include definitions of main(), yylex(), and yyerror(), the compiled
185       output requires linking with  externally  supplied  versions  of  those
186       functions. Default versions of main() and yyerror() are supplied in the
187       yacc library and can be linked in by using the  -l y  operand  to  c99.
188       The yacc library interfaces need not support interfaces with other than
189       the default yy symbol prefix. The application provides the lexical ana‐
190       lyzer  function,  yylex();  the lex utility is specifically designed to
191       generate such a routine.
192
193   Input Language
194       The application shall ensure that every specification file consists  of
195       three  sections  in  order:  declarations, grammar rules, and programs,
196       separated by double <percent-sign> characters ("%%").  The declarations
197       and programs sections can be empty. If the latter is empty, the preced‐
198       ing "%%" mark separating it from the rules section can be omitted.
199
200       The input is free form text following  the  structure  of  the  grammar
201       defined below.
202
203   Lexical Structure of the Grammar
204       The  <blank>,  <newline>,  and  <form-feed> character shall be ignored,
205       except that the application shall ensure that they  do  not  appear  in
206       names  or  multi-character reserved symbols. Comments shall be enclosed
207       in "/* ... */", and can appear wherever a name is valid.
208
209       Names are of arbitrary length,  made  up  of  letters,  periods  ('.'),
210       underscores ('_'), and non-initial digits. Uppercase and lowercase let‐
211       ters are distinct.  Conforming applications shall not use names  begin‐
212       ning  in  yy  or  YY since the yacc parser uses such names. Many of the
213       names appear in the final output of yacc, and thus they should be  cho‐
214       sen  to  conform with any additional rules created by the C compiler to
215       be used. In particular they appear in #define statements.
216
217       A literal shall consist of a single character enclosed in  single-quote
218       characters.  All  of  the escape sequences supported for character con‐
219       stants by the ISO C standard shall be supported by yacc.
220
221       The relationship with the  lexical  analyzer  is  discussed  in  detail
222       below.
223
224       The  application  shall  ensure  that  the NUL character is not used in
225       grammar rules or literals.
226
227   Declarations Section
228       The declarations section is used to define the symbols used  to  define
229       the target language and their relationship with each other. In particu‐
230       lar, much of the additional information required to resolve ambiguities
231       in the context-free grammar for the target language is provided here.
232
233       Usually  yacc  assigns  the  relationship between the symbolic names it
234       generates and their underlying numeric value. The declarations  section
235       makes it possible to control the assignment of these values.
236
237       It  is  also  possible to keep semantic information associated with the
238       tokens currently on the parse stack in a user-defined C-language union,
239       if  the  members  of the union are associated with the various names in
240       the grammar. The declarations section provides for this as well.
241
242       The first group of declarators below all take a list of names as  argu‐
243       ments.  That  list  can optionally be preceded by the name of a C union
244       member (called a tag below) appearing  within  '<'  and  '>'.   (As  an
245       exception  to  the typographical conventions of the rest of this volume
246       of POSIX.1‐2017, in this case <tag> does not represent a  metavariable,
247       but the literal angle bracket characters surrounding a symbol.) The use
248       of tag specifies that the tokens named on this line  shall  be  of  the
249       same  C  type as the union member referenced by tag.  This is discussed
250       in more detail below.
251
252       For lists used to define tokens, the first appearance of a given  token
253       can  be followed by a positive integer (as a string of decimal digits).
254       If this is done, the underlying value assigned to it for  lexical  pur‐
255       poses shall be taken to be that number.
256
257       The following declares name to be a token:
258
259
260           %token [<tag>] name [number] [name [number]]...
261
262       If  tag  is  present,  the  C type for all tokens on this line shall be
263       declared to be the type referenced by tag.  If a positive integer, num‐
264       ber, follows a name, that value shall be assigned to the token.
265
266       The  following  declares  name to be a token, and assigns precedence to
267       it:
268
269
270           %left [<tag>] name [number] [name [number]]...
271           %right [<tag>] name [number] [name [number]]...
272
273       One or more lines, each beginning with one of these symbols, can appear
274       in  this  section. All tokens on the same line have the same precedence
275       level and associativity; the lines are in order  of  increasing  prece‐
276       dence  or  binding  strength.  %left denotes that the operators on that
277       line are left associative, and %right similarly denotes right  associa‐
278       tive  operators. If tag is present, it shall declare a C type for names
279       as described for %token.
280
281       The following declares name to be a token, and indicates that this can‐
282       not be used associatively:
283
284
285           %nonassoc [<tag>] name [number] [name [number]]...
286
287       If  the  parser  encounters associative use of this token it reports an
288       error. If tag is present, it shall  declare  a  C  type  for  names  as
289       described for %token.
290
291       The  following  declares that union member names are non-terminals, and
292       thus it is required to have a tag field at its beginning:
293
294
295           %type <tag> name...
296
297       Because it deals with non-terminals only, assigning a token  number  or
298       using  a literal is also prohibited. If this construct is present, yacc
299       shall perform type checking; if this  construct  is  not  present,  the
300       parse stack shall hold only the int type.
301
302       Every  name  used in grammar not defined by a %token, %left, %right, or
303       %nonassoc declaration is assumed to represent  a  non-terminal  symbol.
304       The yacc utility shall report an error for any non-terminal symbol that
305       does not appear on the left side of at least one grammar rule.
306
307       Once the type, precedence, or token number of a name is  specified,  it
308       shall  not  be  changed.  If  the first declaration of a token does not
309       assign a token number, yacc shall assign  a  token  number.  Once  this
310       assignment  is  made, the token number shall not be changed by explicit
311       assignment.
312
313       The following declarators do not follow the previous pattern.
314
315       The following declares the non-terminal name to be  the  start  symbol,
316       which  represents  the largest, most general structure described by the
317       grammar rules:
318
319
320           %start name
321
322       By default, it is the left-hand side of the first  grammar  rule;  this
323       default can be overridden with this declaration.
324
325       The  following declares the yacc value stack to be a union of the vari‐
326       ous types of values desired.
327
328
329           %union { body of union (in C) }
330
331       The body of the union shall not contain unbalanced curly brace  prepro‐
332       cessing tokens.
333
334       By  default, the values returned by actions (see below) and the lexical
335       analyzer shall be of type int.  The yacc utility keeps track of  types,
336       and  it  shall insert corresponding union member names in order to per‐
337       form strict type checking of the resulting parser.
338
339       Alternatively, given that at least one <tag>  construct  is  used,  the
340       union  can be declared in a header file (which shall be included in the
341       declarations section by using a #include construct within %{  and  %}),
342       and  a  typedef  used  to  define  the symbol YYSTYPE to represent this
343       union. The effect of %union is to provide the  declaration  of  YYSTYPE
344       directly from the yacc input.
345
346       C-language  declarations and definitions can appear in the declarations
347       section, enclosed by the following marks:
348
349
350           %{ ... %}
351
352       These statements shall be copied into the code file,  and  have  global
353       scope  within it so that they can be used in the rules and program sec‐
354       tions. The statements shall not contain "%}" outside a comment,  string
355       literal, or multi-character constant.
356
357       The  application  shall  ensure that the declarations section is termi‐
358       nated by the token %%.
359
360   Grammar Rules in yacc
361       The rules section defines the context-free grammar to  be  accepted  by
362       the function yacc generates, and associates with those rules C-language
363       actions and additional precedence information. The grammar is described
364       below, and a formal definition follows.
365
366       The  rules section is comprised of one or more grammar rules. A grammar
367       rule has the form:
368
369
370           A : BODY ;
371
372       The symbol A represents a non-terminal  name,  and  BODY  represents  a
373       sequence of zero or more names, literals, and semantic actions that can
374       then be followed by optional precedence rules.  Only the names and lit‐
375       erals participate in the formation of the grammar; the semantic actions
376       and precedence rules are used in other ways. The <colon> and the <semi‐
377       colon>  are  yacc  punctuation. If there are several successive grammar
378       rules with the same left-hand side, the <vertical-line>  ('|')  can  be
379       used  to  avoid  rewriting  the left-hand side; in this case the <semi‐
380       colon> appears only after the last rule. The BODY part can be empty (or
381       empty  of  names and literals) to indicate that the non-terminal symbol
382       matches the empty string.
383
384       The yacc utility assigns a unique number to each rule. Rules using  the
385       vertical  bar  notation  are distinct rules. The number assigned to the
386       rule appears in the description file.
387
388       The elements comprising a BODY are:
389
390       name, literal
391                 These form the rules of the grammar: name is either  a  token
392                 or  a non-terminal; literal stands for itself (less the lexi‐
393                 cally required quotation marks).
394
395       semantic action
396                 With each grammar rule, the user can associate actions to  be
397                 performed  each  time  the  rule  is  recognized in the input
398                 process. (Note that the word ``action'' can also refer to the
399                 actions of the parser—shift, reduce, and so on.)
400
401                 These  actions  can  return  values and can obtain the values
402                 returned by  previous  actions.  These  values  are  kept  in
403                 objects  of  type  YYSTYPE (see %union).  The result value of
404                 the action shall be kept on the parse stack  with  the  left-
405                 hand  side of the rule, to be accessed by other reductions as
406                 part of their right-hand side. By using the <tag> information
407                 provided  in  the declarations section, the code generated by
408                 yacc can be  strictly  type  checked  and  contain  arbitrary
409                 information.  In  addition,  the lexical analyzer can provide
410                 the same kinds of values for tokens, if desired.
411
412                 An action is an arbitrary C statement  and  as  such  can  do
413                 input  or  output, call subprograms, and alter external vari‐
414                 ables. An action is one or  more  C  statements  enclosed  in
415                 curly  braces  '{' and '}'.  The statements shall not contain
416                 unbalanced curly brace preprocessing tokens.
417
418                 Certain pseudo-variables can be used in the action. These are
419                 macros  for  access  to  data  structures known internally to
420                 yacc.
421
422                 $$        The value of the action can be set by assigning  it
423                           to  $$. If type checking is enabled and the type of
424                           the value to be assigned cannot  be  determined,  a
425                           diagnostic message may be generated.
426
427                 $number   This  refers to the value returned by the component
428                           specified by the token number in the right side  of
429                           a  rule,  reading from left to right; number can be
430                           zero or negative. If number is zero or negative, it
431                           refers  to the data associated with the name on the
432                           parser's stack preceding the leftmost symbol of the
433                           current  rule.   (That  is, "$0" refers to the name
434                           immediately preceding the leftmost name in the cur‐
435                           rent  rule  to  be  found on the parser's stack and
436                           "$-1" refers to the symbol to its left.) If  number
437                           refers  to an element past the current point in the
438                           rule, or beyond the bottom of the stack, the result
439                           is  undefined.  If type checking is enabled and the
440                           type of the value to be assigned cannot  be  deter‐
441                           mined, a diagnostic message may be generated.
442
443                 $<tag>number
444                           These  correspond exactly to the corresponding sym‐
445                           bols without  the  tag  inclusion,  but  allow  for
446                           strict  type  checking  (and preclude unwanted type
447                           conversions). The  effect  is  that  the  macro  is
448                           expanded  to  use tag to select an element from the
449                           YYSTYPE union (using dataname.tag).  This  is  par‐
450                           ticularly useful if number is not positive.
451
452                 $<tag>$   This imposes on the reference the type of the union
453                           member referenced by  tag.   This  construction  is
454                           applicable when a reference to a left context value
455                           occurs in the grammar, and  provides  yacc  with  a
456                           means for selecting a type.
457
458                 Actions  can  occur anywhere in a rule (not just at the end);
459                 an action can access values returned by actions to its  left,
460                 and  in  turn the value it returns can be accessed by actions
461                 to its right. An action appearing in the  middle  of  a  rule
462                 shall  be  equivalent to replacing the action with a new non-
463                 terminal symbol and adding an empty rule with that non-termi‐
464                 nal symbol on the left-hand side. The semantic action associ‐
465                 ated with the new rule shall be equivalent  to  the  original
466                 action.  The use of actions within rules might introduce con‐
467                 flicts that would not otherwise exist.
468
469                 By default, the value of a rule shall be  the  value  of  the
470                 first  element  in  it.  If the first element does not have a
471                 type (particularly in the case of a literal) and type  check‐
472                 ing is turned on by %type, an error message shall result.
473
474       precedence
475                 The  keyword %prec can be used to change the precedence level
476                 associated with a particular grammar rule. Examples  of  this
477                 are  in cases where a unary and binary operator have the same
478                 symbolic representation,  but  need  to  be  given  different
479                 precedences,  or  where  the handling of an ambiguous if-else
480                 construction is necessary.  The  reserved  symbol  %prec  can
481                 appear immediately after the body of the grammar rule and can
482                 be followed by a token name or a literal. It shall cause  the
483                 precedence  of the grammar rule to become that of the follow‐
484                 ing token name or literal. The action for the rule as a whole
485                 can follow %prec.
486
487       If  a  program  section  follows, the application shall ensure that the
488       grammar rules are terminated by %%.
489
490   Programs Section
491       The programs section can include the definition of the lexical analyzer
492       yylex(),  and  any  other  functions;  for  example,  those used in the
493       actions specified in the grammar rules. It is unspecified  whether  the
494       programs section precedes or follows the semantic actions in the output
495       file; therefore, if the application contains any macro definitions  and
496       declarations  intended to apply to the code in the semantic actions, it
497       shall place them within "%{ ... %}" in the declarations section.
498
499   Input Grammar
500       The following input to yacc yields a parser  for  the  input  to  yacc.
501       This  formal  syntax  takes  precedence  over the preceding text syntax
502       description.
503
504       The lexical structure is defined less precisely; Lexical  Structure  of
505       the Grammar defines most terms. The correspondence between the previous
506       terms and the tokens below is as follows.
507
508       IDENTIFIER  This corresponds to the concept of name, given  previously.
509                   It also includes literals as defined previously.
510
511       C_IDENTIFIER
512                   This is a name, and additionally it is known to be followed
513                   by a <colon>.  A literal cannot yield this token.
514
515       NUMBER      A string of digits (a non-negative decimal integer).
516
517       TYPE, LEFT, MARK, LCURL, RCURL
518                   These correspond directly to %type, %left, %%, %{, and %}.
519
520       { ... }     This indicates C-language source code,  with  the  possible
521                   inclusion of '$' macros as discussed previously.
522
523
524           /* Grammar for the input to yacc. */
525           /* Basic entries. */
526           /* The following are recognized by the lexical analyzer. */
527
528           %token    IDENTIFIER      /* Includes identifiers and literals */
529           %token    C_IDENTIFIER    /* identifier (but not literal)
530                                        followed by a :. */
531           %token    NUMBER          /* [0-9][0-9]* */
532
533           /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
534
535           %token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
536
537           %token    MARK            /* The %% mark. */
538           %token    LCURL           /* The %{ mark. */
539           %token    RCURL           /* The %} mark. */
540
541           /* 8-bit character literals stand for themselves; */
542           /* tokens have to be defined for multi-byte characters. */
543
544           %start    spec
545
546           %%
547
548           spec  : defs MARK rules tail
549                 ;
550           tail  : MARK
551                 {
552                   /* In this action, set up the rest of the file. */
553                 }
554                 | /* Empty; the second MARK is optional. */
555                 ;
556           defs  : /* Empty. */
557                 |    defs def
558                 ;
559           def   : START IDENTIFIER
560                 |    UNION
561                 {
562                   /* Copy union definition to output. */
563                 }
564                 |    LCURL
565                 {
566                   /* Copy C code to output file. */
567                 }
568                   RCURL
569                 |    rword tag nlist
570                 ;
571           rword : TOKEN
572                 | LEFT
573                 | RIGHT
574                 | NONASSOC
575                 | TYPE
576                 ;
577           tag   : /* Empty: union tag ID optional. */
578                 | '<' IDENTIFIER '>'
579                 ;
580           nlist : nmno
581                 | nlist nmno
582                 ;
583           nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
584                 | IDENTIFIER NUMBER  /* Note: invalid with % type. */
585                 ;
586
587           /* Rule section */
588
589           rules : C_IDENTIFIER rbody prec
590                 | rules  rule
591                 ;
592           rule  : C_IDENTIFIER rbody prec
593                 | '|' rbody prec
594                 ;
595           rbody : /* empty */
596                 | rbody IDENTIFIER
597                 | rbody act
598                 ;
599           act   : '{'
600                   {
601                     /* Copy action, translate $$, and so on. */
602                   }
603                   '}'
604                 ;
605           prec  : /* Empty */
606                 | PREC IDENTIFIER
607                 | PREC IDENTIFIER act
608                 | prec ';'
609                 ;
610
611   Conflicts
612       The  parser  produced  for an input grammar may contain states in which
613       conflicts occur.  The  conflicts  occur  because  the  grammar  is  not
614       LALR(1). An ambiguous grammar always contains at least one LALR(1) con‐
615       flict. The yacc utility  shall  resolve  all  conflicts,  using  either
616       default rules or user-specified precedence rules.
617
618       Conflicts are either shift/reduce conflicts or reduce/reduce conflicts.
619       A shift/reduce conflict is where, for a given state and lookahead  sym‐
620       bol,  both  a  shift  action  and  a  reduce  action  are  possible.  A
621       reduce/reduce conflict is where, for a given state and  lookahead  sym‐
622       bol, reductions by two different rules are possible.
623
624       The  rules  below  describe  how to specify what actions to take when a
625       conflict occurs. Not all shift/reduce  conflicts  can  be  successfully
626       resolved  this  way  because the conflict may be due to something other
627       than ambiguity, so incautious use of these  facilities  can  cause  the
628       language  accepted  by  the parser to be much different from that which
629       was intended. The description file shall contain sufficient information
630       to  understand the cause of the conflict. Where ambiguity is the reason
631       either the default or explicit rules should be adequate  to  produce  a
632       working parser.
633
634       The declared precedences and associativities (see Declarations Section)
635       are used to resolve parsing conflicts as follows:
636
637        1. A precedence and associativity  is  associated  with  each  grammar
638           rule;  it  is the precedence and associativity of the last token or
639           literal in the body of the rule. If the %prec keyword is  used,  it
640           overrides  this  default.  Some  grammar  rules might not have both
641           precedence and associativity.
642
643        2. If there is a shift/reduce conflict, and both the grammar rule  and
644           the  input symbol have precedence and associativity associated with
645           them, then the conflict is resolved in favor of the  action  (shift
646           or  reduce)  associated  with  the higher precedence. If the prece‐
647           dences are the same, then the associativity is used; left  associa‐
648           tive implies reduce, right associative implies shift, and non-asso‐
649           ciative implies an error in the string being parsed.
650
651        3. When there is a shift/reduce conflict that cannot  be  resolved  by
652           rule  2, the shift is done. Conflicts resolved this way are counted
653           in the diagnostic output described in Error Handling.
654
655        4. When there is a reduce/reduce conflict, a reduction is done by  the
656           grammar  rule  that occurs earlier in the input sequence. Conflicts
657           resolved this way are counted in the diagnostic output described in
658           Error Handling.
659
660       Conflicts  resolved by precedence or associativity shall not be counted
661       in the shift/reduce and reduce/reduce conflicts  reported  by  yacc  on
662       either standard error or in the description file.
663
664   Error Handling
665       The  token  error  shall be reserved for error handling. The name error
666       can be used in grammar rules. It indicates places where the parser  can
667       recover  from  a syntax error. The default value of error shall be 256.
668       Its value can be changed using a %token declaration. The  lexical  ana‐
669       lyzer should not return the value of error.
670
671       The  parser shall detect a syntax error when it is in a state where the
672       action associated with the  lookahead  symbol  is  error.   A  semantic
673       action can cause the parser to initiate error handling by executing the
674       macro YYERROR. When YYERROR is executed,  the  semantic  action  passes
675       control  back to the parser. YYERROR cannot be used outside of semantic
676       actions.
677
678       When the parser detects a syntax error,  it  normally  calls  yyerror()
679       with  the  character  string  "syntax error"  as its argument. The call
680       shall not be made if the parser is still  recovering  from  a  previous
681       error when the error is detected. The parser is considered to be recov‐
682       ering from a previous error until the parser has shifted over at  least
683       three  normal  input  symbols  since  the  last error was detected or a
684       semantic action has executed the macro yyerrok.  The parser  shall  not
685       call yyerror() when YYERROR is executed.
686
687       The  macro  function  YYRECOVERING shall return 1 if a syntax error has
688       been detected and the parser has not yet fully recovered from it.  Oth‐
689       erwise, zero shall be returned.
690
691       When  a  syntax error is detected by the parser, the parser shall check
692       if a previous syntax error has been detected. If a previous  error  was
693       detected,  and  if  no normal input symbols have been shifted since the
694       preceding error was detected, the parser checks if the lookahead symbol
695       is an endmarker (see Interface to the Lexical Analyzer).  If it is, the
696       parser shall return with a non-zero  value.  Otherwise,  the  lookahead
697       symbol shall be discarded and normal parsing shall resume.
698
699       When  YYERROR is executed or when the parser detects a syntax error and
700       no previous error has been detected, or at least one normal input  sym‐
701       bol  has been shifted since the previous error was detected, the parser
702       shall pop back one state at a time until the parse stack  is  empty  or
703       the current state allows a shift over error.  If the parser empties the
704       parse stack, it shall return with a non-zero value. Otherwise, it shall
705       shift  over error and then resume normal parsing. If the parser reads a
706       lookahead symbol before the error was detected, that symbol shall still
707       be the lookahead symbol when parsing is resumed.
708
709       The macro yyerrok in a semantic action shall cause the parser to act as
710       if it has fully recovered from any previous errors. The macro yyclearin
711       shall  cause  the parser to discard the current lookahead token. If the
712       current lookahead token has not yet been read, yyclearin shall have  no
713       effect.
714
715       The  macro  YYACCEPT  shall  cause  the parser to return with the value
716       zero. The macro YYABORT shall cause the parser to return  with  a  non-
717       zero value.
718
719   Interface to the Lexical Analyzer
720       The yylex() function is an integer-valued function that returns a token
721       number representing the kind of token read. If there is a value associ‐
722       ated  with  the  token  returned  by yylex() (see the discussion of tag
723       above), it shall be assigned to the external variable yylval.
724
725       If the parser and yylex() do not agree on these token numbers, reliable
726       communication  between  them  cannot occur. For (single-byte character)
727       literals, the token is simply the numeric value of the character in the
728       current character set.  The numbers for other tokens can either be cho‐
729       sen by yacc, or chosen by the user. In either case,  the  #define  con‐
730       struct  of  C is used to allow yylex() to return these numbers symboli‐
731       cally. The #define statements are put  into  the  code  file,  and  the
732       header  file if that file is requested. The set of characters permitted
733       by yacc in an identifier is larger than  that  permitted  by  C.  Token
734       names  found  to  contain  such characters shall not be included in the
735       #define declarations.
736
737       If the token numbers are chosen by yacc, the tokens other than literals
738       shall  be  assigned  numbers  greater  than  256,  although no order is
739       implied. A token can be explicitly assigned a number by  following  its
740       first  appearance  in the declarations section with a number. Names and
741       literals not defined this way  retain  their  default  definition.  All
742       token  numbers  assigned  by yacc shall be unique and distinct from the
743       token numbers used for literals and user-assigned tokens. If  duplicate
744       token  numbers  cause conflicts in parser generation, yacc shall report
745       an error; otherwise, it is unspecified whether the token assignment  is
746       accepted or an error is reported.
747
748       The end of the input is marked by a special token called the endmarker,
749       which has a token number that is zero or negative.  (These  values  are
750       invalid  for  any other token.) All lexical analyzers shall return zero
751       or negative as a token number upon reaching the end of their input.  If
752       the  tokens  up  to, but excluding, the endmarker form a structure that
753       matches the start symbol, the parser shall accept  the  input.  If  the
754       endmarker  is  seen  in  any  other  context, it shall be considered an
755       error.
756
757   Completing the Program
758       In addition to yyparse()  and  yylex(),  the  functions  yyerror()  and
759       main()  are  required  to  make a complete program. The application can
760       supply main() and yyerror(), or those routines can be obtained from the
761       yacc library.
762
763   Yacc Library
764       The  following functions shall appear only in the yacc library accessi‐
765       ble through the -l y operand to c99; they can therefore be redefined by
766       a conforming application:
767
768       int main(void)
769             This  function  shall call yyparse() and exit with an unspecified
770             value. Other actions within this function are unspecified.
771
772       int yyerror(const char *s)
773             This function shall write the NUL-terminated argument to standard
774             error, followed by a <newline>.
775
776       The  order  of  the -l y and -l l operands given to c99 is significant;
777       the application shall either provide its own main() function or  ensure
778       that -l y precedes -l l.
779
780   Debugging the Parser
781       The  parser  generated  by  yacc shall have diagnostic facilities in it
782       that can be optionally enabled at either compile time or at runtime (if
783       enabled  at  compile  time).   The compilation of the runtime debugging
784       code is under the control of YYDEBUG, a preprocessor symbol. If YYDEBUG
785       has  a  non-zero  value,  the  debugging code shall be included. If its
786       value is zero, the code shall not be included.
787
788       In parsers where the debugging code has been included, the external int
789       yydebug  can  be  used to turn debugging on (with a non-zero value) and
790       off (zero value) at runtime. The initial  value  of  yydebug  shall  be
791       zero.
792
793       When  -t is specified, the code file shall be built such that, if YYDE‐
794       BUG is not already defined at compilation time (using the c99 -D  YYDE‐
795       BUG  option,  for example), YYDEBUG shall be set explicitly to 1.  When
796       -t is not specified, the code file shall be built such that, if YYDEBUG
797       is not already defined, it shall be set explicitly to zero.
798
799       The format of the debugging output is unspecified but includes at least
800       enough information to determine the shift and reduce actions,  and  the
801       input symbols. It also provides information about error recovery.
802
803   Algorithms
804       The  parser constructed by yacc implements an LALR(1) parsing algorithm
805       as documented in the literature. It is unspecified whether  the  parser
806       is table-driven or direct-coded.
807
808       A  parser  generated  by  yacc shall never request an input symbol from
809       yylex() while in a state where the only actions other  than  the  error
810       action are reductions by a single rule.
811
812       The literature of parsing theory defines these concepts.
813
814   Limits
815       The yacc utility may have several internal tables. The minimum maximums
816       for these tables are shown in the following table. The exact meaning of
817       these values is implementation-defined. The implementation shall define
818       the relationship between these values and between them  and  any  error
819       messages  that  the  implementation  may  generate should it run out of
820       space for any internal structure. An implementation may combine  groups
821       of these resources into a single pool as long as the total available to
822       the user does not fall below the sum of the  sizes  specified  by  this
823       section.
824
825                           Table: Internal Limits in yacc
826
827               ┌───────────┬─────────┬────────────────────────────────┐
828               │           │ Minimum │                                │
829               │  Limit    │ Maximum │          Description           │
830               ├───────────┼─────────┼────────────────────────────────┤
831               │{NTERMS}   │   126   │ Number of tokens.              │
832               │{NNONTERM} │   200   │ Number of non-terminals.       │
833               │{NPROD}    │   300   │ Number of rules.               │
834               │{NSTATES}  │   600   │ Number of states.              │
835               │{MEMSIZE}  │  5200   │ Length of rules. The total     │
836               │           │         │ length, in names (tokens and   │
837               │           │         │ non-terminals), of all the     │
838               │           │         │ rules of the grammar. The      │
839               │           │         │ left-hand side is counted for  │
840               │           │         │ each rule, even if it is not   │
841               │           │         │ explicitly repeated, as speci‐ │
842               │           │         │ fied in Grammar Rules in yacc. │
843               │{ACTSIZE}  │  4000   │ Number of actions. ``Actions'' │
844               │           │         │ here (and in the description   │
845               │           │         │ file) refer to parser actions  │
846               │           │         │ (shift, reduce, and so on) not │
847               │           │         │ to semantic actions defined in │
848               │           │         │ Grammar Rules in yacc.         │
849               └───────────┴─────────┴────────────────────────────────┘

EXIT STATUS

851       The following exit values shall be returned:
852
853        0    Successful completion.
854
855       >0    An error occurred.
856

CONSEQUENCES OF ERRORS

858       If any errors are encountered, the run is aborted and yacc exits with a
859       non-zero status. Partial code files and header files may  be  produced.
860       The  summary  information  in the description file shall always be pro‐
861       duced if the -v flag is present.
862
863       The following sections are informative.
864

APPLICATION USAGE

866       Historical implementations  experience  name  conflicts  on  the  names
867       yacc.tmp, yacc.acts, yacc.debug, y.tab.c, y.tab.h, and y.output if more
868       than one copy of yacc is running in a single directory at one time. The
869       -b  option  was  added to overcome this problem. The related problem of
870       allowing multiple yacc parsers to  be  placed  in  the  same  file  was
871       addressed  by  adding a -p option to override the previously hard-coded
872       yy variable prefix.
873
874       The description of the -p option specifies the minimal set of  function
875       and variable names that cause conflict when multiple parsers are linked
876       together. YYSTYPE does not need to be changed. Instead, the  programmer
877       can  use  -b  to  give the header files for different parsers different
878       names, and then the file with  the  yylex()  for  a  given  parser  can
879       include  the  header  for  that parser. Names such as yyclearerr do not
880       need to be changed because they are used only in the actions;  they  do
881       not  have  linkage.  It  is  possible  that an implementation has other
882       names, either internal ones for implementing things such as yyclearerr,
883       or providing non-standard features that it wants to change with -p.
884
885       Unary operators that are the same token as a binary operator in general
886       need their precedence adjusted. This is handled by the  %prec  advisory
887       symbol  associated with the particular grammar rule defining that unary
888       operator. (See Grammar Rules in yacc.)  Applications are  not  required
889       to  use this operator for unary operators, but the grammars that do not
890       require it are rare.
891

EXAMPLES

893       Access to the yacc library is obtained with library search operands  to
894       c99.  To use the yacc library main():
895
896
897           c99 y.tab.c -l y
898
899       Both  the  lex  library and the yacc library contain main().  To access
900       the yacc main():
901
902
903           c99 y.tab.c lex.yy.c -l y -l l
904
905       This ensures that the yacc library  is  searched  first,  so  that  its
906       main() is used.
907
908       The  historical yacc libraries have contained two simple functions that
909       are normally coded by the application programmer. These  functions  are
910       similar to the following code:
911
912
913           #include <locale.h>
914           int main(void)
915           {
916               extern int yyparse();
917
918               setlocale(LC_ALL, "");
919
920               /* If the following parser is one created by lex, the
921                  application must be careful to ensure that LC_CTYPE
922                  and LC_COLLATE are set to the POSIX locale. */
923               (void) yyparse();
924               return (0);
925           }
926
927           #include <stdio.h>
928
929           int yyerror(const char *msg)
930           {
931               (void) fprintf(stderr, "%s\n", msg);
932               return (0);
933           }
934

RATIONALE

936       The  references  in Referenced Documents may be helpful in constructing
937       the parser generator.  The  referenced  DeRemer  and  Pennello  article
938       (along  with the works it references) describes a technique to generate
939       parsers that conform to this volume of POSIX.1‐2017. Work in this  area
940       continues to be done, so implementors should consult current literature
941       before doing any new implementations. The original Knuth article is the
942       theoretical  basis for this kind of parser, but the tables it generates
943       are impractically large for reasonable grammars and should not be used.
944       The  ``equivalent  to''  wording is intentional to assure that the best
945       tables that are LALR(1) can be generated.
946
947       There has been confusion between the class of grammars, the  algorithms
948       needed to generate parsers, and the algorithms needed to parse the lan‐
949       guages. They are all reasonably orthogonal.  In  particular,  a  parser
950       generator that accepts the full range of LR(1) grammars need not gener‐
951       ate a table any more complex than one that accepts SLR(1) (a relatively
952       weak  class  of  LR  grammars) for a grammar that happens to be SLR(1).
953       Such an implementation need not recognize the case, either; table  com‐
954       pression  can  yield  the  SLR(1) table (or one even smaller than that)
955       without recognizing that the grammar is SLR(1).  The speed of an  LR(1)
956       parser  for  any  class is dependent more upon the table representation
957       and compression (or the code generation if a direct  parser  is  gener‐
958       ated) than upon the class of grammar that the table generator handles.
959
960       The  speed of the parser generator is somewhat dependent upon the class
961       of grammar it handles. However, the original Knuth  article  algorithms
962       for  constructing  LR parsers were judged by its author to be impracti‐
963       cally slow at that time. Although full LR is more complex than LALR(1),
964       as  computer speeds and algorithms improve, the difference (in terms of
965       acceptable wall-clock execution time) is becoming less significant.
966
967       Potential authors are cautioned that the referenced  DeRemer  and  Pen‐
968       nello article previously cited identifies a bug (an over-simplification
969       of the computation of LALR(1) lookahead sets) in some  of  the  LALR(1)
970       algorithm  statements that preceded it to publication. They should take
971       the time to seek out that paper, as well as current relevant work, par‐
972       ticularly Aho's.
973
974       The  -b  option  was  added to provide a portable method for permitting
975       yacc to work on multiple separate parsers in the same directory.  If  a
976       directory  contains  more  than one yacc grammar, and both grammars are
977       constructed at the same time (by, for example,  a  parallel  make  pro‐
978       gram), conflict results. While the solution is not historical practice,
979       it corrects a known deficiency in historical  implementations.   Corre‐
980       sponding  changes  were  made to all sections that referenced the file‐
981       names y.tab.c (now  ``the  code  file''),  y.tab.h  (now  ``the  header
982       file''), and y.output (now ``the description file'').
983
984       The grammar for yacc input is based on System V documentation. The tex‐
985       tual description shows there that the ';' is required at the end of the
986       rule.  The grammar and the implementation do not require this. (The use
987       of C_IDENTIFIER causes a reduce to occur in the right place.)
988
989       Also, in that implementation, the constructs such as %token can be ter‐
990       minated by a <semicolon>, but this is not permitted by the grammar. The
991       keywords such as %token can also appear in uppercase,  which  is  again
992       not  discussed.  In  most  places where '%' is used, <backslash> can be
993       substituted, and there are alternate spellings for some of the  symbols
994       (for example, %LEFT can be "%<" or even "\<").
995
996       Historically,  <tag>  can  contain any characters except '>', including
997       white space, in the implementation. However, since the tag must  refer‐
998       ence an ISO C standard union member, in practice conforming implementa‐
999       tions need to support only the set of  characters  for  ISO C  standard
1000       identifiers in this context.
1001
1002       Some  historical  implementations  are known to accept actions that are
1003       terminated by a period. Historical implementations often allow  '$'  in
1004       names.  A  conforming implementation does not need to support either of
1005       these behaviors.
1006
1007       Deciding when to use %prec illustrates the difficulty in specifying the
1008       behavior of yacc.  There may be situations in which the grammar is not,
1009       strictly speaking, in error, and yet yacc cannot interpret it unambigu‐
1010       ously.  The  resolution  of  ambiguities  in  the  grammar  can in many
1011       instances be resolved by  providing  additional  information,  such  as
1012       using  %type  or %union declarations. It is often easier and it usually
1013       yields a smaller parser to take this alternative when it  is  appropri‐
1014       ate.
1015
1016       The  size  and execution time of a program produced without the runtime
1017       debugging code is usually smaller and  slightly  faster  in  historical
1018       implementations.
1019
1020       Statistics messages from several historical implementations include the
1021       following types of information:
1022
1023
1024           n/512 terminals, n/300 non-terminals
1025           n/600 grammar rules, n/1500 states
1026           n shift/reduce, n reduce/reduce conflicts reported
1027           n/350 working sets used
1028           Memory: states, etc. n/15000, parser n/15000
1029           n/600 distinct lookahead sets
1030           n extra closures
1031           n shift entries, n exceptions
1032           n goto entries
1033           n entries saved by goto default
1034           Optimizer space used: input n/15000, output n/15000
1035           n table entries, n zero
1036           Maximum spread: n, Maximum offset: n
1037
1038       The report of internal tables in the description file is left implemen‐
1039       tation-defined because all aspects of these limits are also implementa‐
1040       tion-defined. Some implementations may  use  dynamic  allocation  tech‐
1041       niques and have no specific limit values to report.
1042
1043       The  format  of the y.output file is not given because specification of
1044       the format was not seen to enhance applications portability. The  list‐
1045       ing  is primarily intended to help human users understand and debug the
1046       parser; use of y.output by a conforming  application  script  would  be
1047       unusual. Furthermore, implementations have not produced consistent out‐
1048       put and no popular format was apparent.  The  format  selected  by  the
1049       implementation should be human-readable, in addition to the requirement
1050       that it be a text file.
1051
1052       Standard error reports are not specifically described because they  are
1053       seldom  of  use  to  conforming applications and there was no reason to
1054       restrict implementations.
1055
1056       Some implementations recognize "={" as equivalent  to  '{'  because  it
1057       appears  in  historical documentation. This construction was recognized
1058       and documented as obsolete as long ago as 1978, in the referenced Yacc:
1059       Yet  Another  Compiler-Compiler.  This  volume of POSIX.1‐2017 chose to
1060       leave it as obsolete and omit it.
1061
1062       Multi-byte characters should be recognized by the lexical analyzer  and
1063       returned as tokens. They should not be returned as multi-byte character
1064       literals. The token error that is used for error recovery  is  normally
1065       assigned  the  value  256  in  the historical implementation. Thus, the
1066       token value 256, which is used in many multi-byte  character  sets,  is
1067       not available for use as the value of a user-defined token.
1068

FUTURE DIRECTIONS

1070       None.
1071

COPYRIGHT

1079       Portions of this text are reprinted and reproduced in  electronic  form
1080       from  IEEE Std 1003.1-2017, Standard for Information Technology -- Por‐
1081       table Operating System Interface (POSIX), The Open Group Base  Specifi‐
1082       cations  Issue  7, 2018 Edition, Copyright (C) 2018 by the Institute of
1083       Electrical and Electronics Engineers, Inc and The Open Group.   In  the
1084       event of any discrepancy between this version and the original IEEE and
1085       The Open Group Standard, the original IEEE and The Open Group  Standard
1086       is  the  referee document. The original Standard can be obtained online
1087       at http://www.opengroup.org/unix/online.html .
1088
1089       Any typographical or formatting errors that appear  in  this  page  are
1090       most likely to have been introduced during the conversion of the source
1091       files to man page format. To report such errors,  see  https://www.ker‐
1092       nel.org/doc/man-pages/reporting_bugs.html .
1093
1094
1095
1096IEEE/The Open Group                  2017                             YACC(1P)