awk(1p) - f35

AWK(1P)                    POSIX Programmer's Manual                   AWK(1P)

PROLOG

       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
       implementation of this interface may differ (consult the  corresponding
       Linux  manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       awk - pattern scanning and processing language

SYNOPSIS

       awk [-F sepstring] [-v assignment]... program [argument...]

       awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]...
            [argument...]

DESCRIPTION

       The awk utility shall execute programs written in the  awk  programming
       language,  which  is  specialized for textual data manipulation. An awk
       program is a sequence of patterns and corresponding actions. When input
       is read that matches a pattern, the action associated with that pattern
       is carried out.

       Input shall be interpreted as a sequence  of  records.  By  default,  a
       record  is  a  line,  less  its  terminating <newline>, but this can be
       changed by using the RS built-in variable. Each record of  input  shall
       be  matched  in turn against each pattern in the program. For each pat‐
       tern matched, the associated action shall be executed.

       The awk utility shall interpret each input  record  as  a  sequence  of
       fields  where, by default, a field is a string of non-<blank> non-<new‐
       line> characters. This default <blank> and  <newline>  field  delimiter
       can  be  changed  by using the FS built-in variable or the -F sepstring
       option. The awk utility shall denote the first field in  a  record  $1,
       the  second  $2,  and  so  on.  The symbol $0 shall refer to the entire
       record; setting any other field causes the re-evaluation of $0. Assign‐
       ing  to $0 shall reset the values of all other fields and the NF built-
       in variable.

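       As an informal illustration of the field model described above, the
       sketch below splits one made-up record and then assigns to a field.
       The shell variable names (line, fields, rebuilt) and the sample text
       are assumptions chosen for the example, not part of the standard.

```shell
# Hypothetical one-line input used only for this illustration.
line='alpha beta gamma'

# NF holds the number of fields; $2 is the second field.
fields=$(printf '%s\n' "$line" | awk '{ print NF, $2 }')

# Assigning to a field causes $0 to be re-evaluated from the fields.
rebuilt=$(printf '%s\n' "$line" | awk '{ $2 = "X"; print $0 }')

printf '%s\n%s\n' "$fields" "$rebuilt"
```

       With the default field separator this should report three fields and
       show the rebuilt record with the second field replaced.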

OPTIONS

       The awk utility  shall  conform  to  the  Base  Definitions  volume  of
       POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -F sepstring
                 Define the input field separator. This option shall be equiv‐
                 alent to:

                     -v FS=sepstring

                 except that if -F sepstring  and  -v  FS=sepstring  are  both
                 used,  it  is unspecified whether the FS assignment resulting
                 from -F sepstring is processed in command line  order  or  is
                 processed  after  the last -v FS=sepstring.  See the descrip‐
                 tion of the FS built-in variable, and how it is used, in  the
                 EXTENDED DESCRIPTION section.

       -f progfile
                 Specify  the  pathname of the file progfile containing an awk
                 program. A pathname of '-' shall denote the  standard  input.
                 If  multiple instances of this option are specified, the con‐
                 catenation of the files specified as progfile  in  the  order
                 specified  shall  be  the  awk  program.  The awk program can
                 alternatively be specified in the command line  as  a  single
                 argument.

       -v assignment
                 The  application shall ensure that the assignment argument is
                 in the same form as  an  assignment  operand.  The  specified
                 variable  assignment  shall  occur prior to executing the awk
                 program, including the actions associated with BEGIN patterns
                 (if  any).  Multiple occurrences of this option can be speci‐
                 fied.

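       A minimal sketch of the -F and -v options follows. The colon-separated
       record and the variable name n are invented for the example; only the
       option behavior comes from the text above.

```shell
# -F sets the input field separator before any record is read.
first=$(printf 'root:x:0:0\n' | awk -F: '{ print $1 }')

# -v assignments occur before the BEGIN actions are executed.
early=$(awk -v n=3 'BEGIN { print n + 1 }')

printf '%s\n%s\n' "$first" "$early"
```

       Because the second command has only a BEGIN action, no input is read.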

OPERANDS

81       The following operands shall be supported:
82
       program   If no -f option is specified, the first operand to awk  shall
                 be  the text of the awk program. The application shall supply
                 the program operand as a single argument to awk.  If the text
                 does  not end in a <newline>, awk shall interpret the text as
                 if it did.

       argument  Either of the following two types of argument can  be  inter‐
                 mixed:

                 file      A  pathname of a file that contains the input to be
                           read, which is matched against the set of  patterns
                           in  the program. If no file operands are specified,
                           or if a file operand is  '-',  the  standard  input
                           shall be used.

                 assignment
                           An  operand  that  begins  with  an <underscore> or
                           alphabetic character from  the  portable  character
                           set  (see  the table in the Base Definitions volume
                           of POSIX.1‐2017, Section  6.1,  Portable  Character
                           Set),  followed  by a sequence of underscores, dig‐
                           its, and alphabetics from  the  portable  character
                           set, followed by the '=' character, shall specify a
                           variable assignment rather than  a  pathname.   The
                           characters  before the '=' represent the name of an
                           awk variable; if that name is an awk reserved  word
                           (see  Grammar) the behavior is undefined. The char‐
                           acters following the <equals-sign> shall be  inter‐
                           preted  as if they appeared in the awk program pre‐
                           ceded and followed by a double-quote ('"')  charac‐
                           ter,  as  a STRING token (see Grammar), except that
                           if the last character is an unescaped  <backslash>,
                           it  shall  be  interpreted as a literal <backslash>
                           rather than as the first character of the  sequence
                           "\"".   The variable shall be assigned the value of
                           that STRING token and,  if  appropriate,  shall  be
                           considered  a  numeric  string  (see Expressions in
                           awk), the  variable  shall  also  be  assigned  its
                           numeric  value. Each such variable assignment shall
                           occur just prior to the processing of the following
                           file,  if any. Thus, an assignment before the first
                           file argument shall be  executed  after  the  BEGIN
                           actions  (if  any),  while  an assignment after the
                           last file  argument  shall  occur  before  the  END
                           actions  (if  any). If there are no file arguments,
                           assignments shall be executed before processing the
                           standard input.

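       The timing of assignment operands can be sketched as follows. The
       variable x, the one-record input, and the message wording are
       assumptions made for this example; '-' stands for the standard input
       as described for the file operand.

```shell
# x=1 precedes the only file operand ("-"), x=2 follows it.
timeline=$(printf 'rec\n' | awk '
    BEGIN { print "BEGIN sees [" x "]" }    # runs before x=1 is applied
          { print "record sees [" x "]" }   # x=1 applied just before "-" is read
    END   { print "END sees [" x "]" }      # x=2 applied before END actions
' x=1 - x=2)

printf '%s\n' "$timeline"
```

       On a conforming implementation the BEGIN action should see x still
       uninitialized, the record action should see 1, and the END action 2.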

STDIN

       The  standard  input  shall be used only if no file operands are speci‐
       fied, or if a file operand is '-', or if a progfile option-argument  is
       '-';  see  the  INPUT  FILES  section.  If  the awk program contains no
       actions and no patterns, but is otherwise a valid awk program, standard
       input and any file operands shall not be read and awk shall exit with a
       return status of zero.

INPUT FILES

       Input files to the awk program from any of the following sources  shall
       be text files:

        *  Any  file  operands or their equivalents, achieved by modifying the
           awk variables ARGV and ARGC

        *  Standard input in the absence of any file operands

        *  Arguments to the getline function

       Whether the variable RS is set to a value other  than  a  <newline>  or
       not,  for these files, implementations shall support records terminated
       with the specified separator up to {LINE_MAX}  bytes  and  may  support
       longer records.

       If  -f  progfile  is  specified,  the application shall ensure that the
       files named by each of the progfile option-arguments are text files and
       their concatenation, in the same order as they appear in the arguments,
       is an awk program.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of awk:

       LANG      Provide a default value for  the  internationalization  vari‐
                 ables  that are unset or null. (See the Base Definitions vol‐
                 ume of POSIX.1‐2017, Section 8.2, Internationalization  Vari‐
                 ables  for  the  precedence of internationalization variables
                 used to determine the values of locale categories.)

       LC_ALL    If set to a non-empty string value, override  the  values  of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine  the locale for the behavior of ranges, equivalence
                 classes, and multi-character collating elements within  regu‐
                 lar expressions and in comparisons of string values.

       LC_CTYPE  Determine  the  locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed  to  multi-byte  characters  in  arguments  and input
                 files), the behavior  of  character  classes  within  regular
                 expressions, the identification of characters as letters, and
                 the mapping of uppercase and  lowercase  characters  for  the
                 toupper and tolower functions.

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and contents  of  diagnostic  messages  written  to  standard
                 error.

       LC_NUMERIC
                 Determine  the radix character used when interpreting numeric
                 input, performing conversions between numeric and string val‐
                 ues, and formatting numeric output. Regardless of locale, the
                 <period> character (the decimal-point character of the  POSIX
                 locale) is the decimal-point character recognized in process‐
                 ing awk programs (including assignments in command line argu‐
                 ments).

       NLSPATH   Determine the location of message catalogs for the processing
                 of LC_MESSAGES.

       PATH      Determine the search path when looking for commands  executed
                 by system(expr), or input and output pipes; see the Base Def‐
                 initions volume of POSIX.1‐2017, Chapter 8, Environment Vari‐
                 ables.

       In  addition,  all  environment  variables shall be visible via the awk
       variable ENVIRON.

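       A short sketch of reading the environment through ENVIRON follows.
       The environment variable GREETING and its value are hypothetical and
       exist only for this example.

```shell
# Export GREETING for this one command, then read it back through ENVIRON.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

printf '%s\n' "$greeting"
```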

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The nature of the output files depends on the awk program.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       The nature of the output files depends on the awk program.

EXTENDED DESCRIPTION

   Overall Program Structure
       An awk program is composed of pairs of the form:

           pattern { action }

       Either the pattern or the action (including the enclosing brace charac‐
       ters) can be omitted.

       A missing pattern shall match any record of input, and a missing action
       shall be equivalent to:

           { print }

       Execution of the awk program shall start by first executing the actions
       associated  with all BEGIN patterns in the order they occur in the pro‐
       gram. Then each file operand (or standard input if no files were speci‐
       fied)  shall be processed in turn by reading data from the file until a
       record separator is seen (<newline> by default). Before the first  ref‐
       erence to a field in the record is evaluated, the record shall be split
       into fields, according to the rules in Regular Expressions,  using  the
       value of FS that was current at the time the record was read. Each pat‐
       tern in the program then shall be evaluated in the order of occurrence,
       and  the  action  associated with each pattern that matches the current
       record executed. The action for a matching pattern  shall  be  executed
       before  evaluating subsequent patterns. Finally, the actions associated
       with all END patterns shall be executed in the order they occur in  the
       program.

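       The execution order described above can be sketched with a small
       program that combines a BEGIN action, a pattern with no action, an
       action with no pattern, and an END action. The three-line input and
       the variable name count are assumptions made for this example.

```shell
report=$(printf '1\n2\n3\n' | awk '
    BEGIN  { print "start" }             # executed before any input is read
    $1 > 1                               # missing action: same as { print }
           { count++ }                   # missing pattern: matches every record
    END    { print "records:", count }   # executed after all input is read
')

printf '%s\n' "$report"
```

       The pattern-only rule should print the second and third records, and
       the pattern-less rule should count all three.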
   Expressions in awk
       Expressions describe computations used in patterns and actions.  In the
       following table, valid expression operations are given in  groups  from
       highest  precedence  first to lowest precedence last, with equal-prece‐
       dence operators grouped between horizontal lines. In expression evalua‐
       tion, where the grammar is formally ambiguous, higher precedence opera‐
       tors shall be evaluated before lower precedence operators. In this  ta‐
       ble  expr,  expr1,  expr2,  and  expr3  represent any expression, while
       lvalue represents any entity that can be assigned to (that is,  on  the
       left  side  of  an assignment operator).  The precise syntax of expres‐
       sions is given in Grammar.

               Table 4-1: Expressions in Decreasing Precedence in awk

   ┌─────────────────────┬─────────────────────────┬────────────────┬──────────────┐
   │       Syntax        │          Name           │ Type of Result │Associativity │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │( expr )             │Grouping                 │Type of expr    │N/A           │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │$expr                │Field reference          │String          │N/A           │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │lvalue ++            │Post-increment           │Numeric         │N/A           │
   │lvalue --            │Post-decrement           │Numeric         │N/A           │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │++ lvalue            │Pre-increment            │Numeric         │N/A           │
   │-- lvalue            │Pre-decrement            │Numeric         │N/A           │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr ^ expr          │Exponentiation           │Numeric         │Right         │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │! expr               │Logical not              │Numeric         │N/A           │
   │+ expr               │Unary plus               │Numeric         │N/A           │
   │- expr               │Unary minus              │Numeric         │N/A           │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr * expr          │Multiplication           │Numeric         │Left          │
   │expr / expr          │Division                 │Numeric         │Left          │
   │expr % expr          │Modulus                  │Numeric         │Left          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr + expr          │Addition                 │Numeric         │Left          │
   │expr - expr          │Subtraction              │Numeric         │Left          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr expr            │String concatenation     │String          │Left          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr < expr          │Less than                │Numeric         │None          │
   │expr <= expr         │Less than or equal to    │Numeric         │None          │
   │expr != expr         │Not equal to             │Numeric         │None          │
   │expr == expr         │Equal to                 │Numeric         │None          │
   │expr > expr          │Greater than             │Numeric         │None          │
   │expr >= expr         │Greater than or equal to │Numeric         │None          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr ~ expr          │ERE match                │Numeric         │None          │
   │expr !~ expr         │ERE non-match            │Numeric         │None          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr in array        │Array membership         │Numeric         │Left          │
   │( index ) in array   │Multi-dimension array    │Numeric         │Left          │
   │                     │membership               │                │              │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr && expr         │Logical AND              │Numeric         │Left          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr || expr         │Logical OR               │Numeric         │Left          │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │expr1 ? expr2 : expr3│Conditional expression   │Type of selected│Right         │
   │                     │                         │expr2 or expr3  │              │
   ├─────────────────────┼─────────────────────────┼────────────────┼──────────────┤
   │lvalue ^= expr       │Exponentiation assignment│Numeric         │Right         │
   │lvalue %= expr       │Modulus assignment       │Numeric         │Right         │
   │lvalue *= expr       │Multiplication assignment│Numeric         │Right         │
   │lvalue /= expr       │Division assignment      │Numeric         │Right         │
   │lvalue += expr       │Addition assignment      │Numeric         │Right         │
   │lvalue -= expr       │Subtraction assignment   │Numeric         │Right         │
   │lvalue = expr        │Assignment               │Type of expr    │Right         │
   └─────────────────────┴─────────────────────────┴────────────────┴──────────────┘

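
       A few rows of the precedence table can be checked with a sketch such
       as the one below. The expressions are arbitrary examples; the shell
       variable name prec is an assumption.

```shell
prec=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2      # exponentiation is right associative: 2 ^ (3 ^ 2)
    print 1 + 2 * 3      # multiplication binds tighter than addition
    print 1 " " 2 + 3    # addition binds tighter than concatenation
}')

printf '%s\n' "$prec"
```

       Parentheses remain the clearest way to state the intended grouping
       when operators from different rows are mixed.
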
       Each expression shall have either a string value, a numeric  value,  or
       both.  Except  as stated for specific contexts, the value of an expres‐
       sion shall be implicitly converted to the type needed for  the  context
       in  which  it  is  used. A string value shall be converted to a numeric
       value either by the equivalent of  the  following  calls  to  functions
       defined by the ISO C standard:

           setlocale(LC_NUMERIC, "");
           numeric_value = atof(string_value);

       or  by converting the initial portion of the string to type double rep‐
       resentation as follows:

              The input string is decomposed into two parts: an initial,  pos‐
              sibly empty, sequence of white-space characters (as specified by
              isspace()) and a subject sequence  interpreted  as  a  floating-
              point constant.

              The  expected form of the subject sequence is an optional '+' or
              '-' sign, then a non-empty sequence of  digits  optionally  con‐
              taining  a <period>, then an optional exponent part. An exponent
              part consists of 'e' or 'E', followed by an optional sign,  fol‐
              lowed by one or more decimal digits.

              The  sequence  starting  with  the  first  digit or the <period>
              (whichever occurs first) is interpreted as a  floating  constant
              of  the  C  language,  and  if  neither  an  exponent part nor a
              <period> appears, a <period> is assumed to follow the last digit
              in  the  string.  If the subject sequence begins with a <hyphen-
              minus>, the value resulting from the conversion is negated.

       A numeric value that is exactly equal to the value of an  integer  (see
       Section  1.1.2, Concepts Derived from the ISO C Standard) shall be con‐
       verted to a string by the equivalent of a call to the sprintf  function
       (see String Functions) with the string "%d" as the fmt argument and the
       numeric value being converted as the first and only expr argument.  Any
       other numeric value shall be converted to a string by the equivalent of
       a call to the sprintf function with the value of the  variable  CONVFMT
       as  the fmt argument and the numeric value being converted as the first
       and only expr argument. The result of the conversion is unspecified  if
       the value of CONVFMT is not a floating-point format specification. This
       volume of POSIX.1‐2017 specifies no explicit conversions  between  num‐
       bers  and strings. An application can force an expression to be treated
       as a number by adding zero to it, or can force it to be  treated  as  a
       string by concatenating the null string ("") to it.

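       The conversion rules above can be sketched as follows. The values,
       the CONVFMT setting "%.2f", and the names whole, frac and conv are
       assumptions picked to make the two number-to-string paths visible.

```shell
conv=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    whole = 6 / 2            # integer-valued: converted as if by "%d"
    frac  = 3.14159          # not an integer: converted using CONVFMT
    print (whole "")         # concatenating "" forces a string
    print (frac "")
    print ("3.50abc" + 0)    # adding zero forces a number from the initial portion
}')

printf '%s\n' "$conv"
```

       The last line is a number when printed, so it is formatted by the
       output conversion rather than by CONVFMT.
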
       A  string  value  shall be considered a numeric string if it comes from
       one of the following:

        1. Field variables

        2. Input from the getline() function

        3. FILENAME

        4. ARGV array elements

        5. ENVIRON array elements

        6. Array elements created by the split() function

        7. A command line variable assignment

        8. Variable assignment from another numeric string variable

       and an implementation-dependent condition corresponding to either  case
       (a) or (b) below is met.

        a. After the equivalent of the following calls to functions defined by
           the   ISO C   standard,   string_value_end   would   differ    from
           string_value,  and any characters before the terminating null char‐
           acter in string_value_end would be <blank> characters:

               char *string_value_end;
               setlocale(LC_NUMERIC, "");
               numeric_value = strtod (string_value, &string_value_end);

        b. After all the following conversions have been applied, the  result‐
           ing  string  would  lexically  be  recognized  as a NUMBER token as
           described by the lexical conventions in Grammar:

           --  All leading and trailing <blank> characters are discarded.

           --  If the first non-<blank> is '+' or '-', it is discarded.

           --  Each occurrence of the decimal point character from the current
               locale is changed to a <period>.

       In  case (a) the numeric value of the numeric string shall be the value
       that would be returned by the strtod() call. In case (b) if  the  first
       non-<blank>  is  '-',  the numeric value of the numeric string shall be
       the negation of the numeric value of the recognized NUMBER token;  oth‐
       erwise,  the  numeric  value of the numeric string shall be the numeric
       value of the recognized NUMBER token. Whether or  not  a  string  is  a
       numeric  string  shall  be relevant only in contexts where that term is
       used in this section.

       When an expression is used in a Boolean context, if it  has  a  numeric
       value,  a  value  of zero shall be treated as false and any other value
       shall be treated as true. Otherwise, a string value of the null  string
       shall be treated as false and any other value shall be treated as true.
       A Boolean context shall be one of the following:

        *  The first subexpression of a conditional expression

        *  An expression operated on by logical NOT, logical AND,  or  logical
           OR

        *  The second expression of a for statement

        *  The expression of an if statement

        *  The  expression of the while clause in either a while or do...while
           statement

        *  An expression used as a pattern (as in Overall Program Structure)

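       The truth rules above can be sketched with a handful of if statements.
       The variable name unset and the message texts are assumptions; each
       message is printed only when the stated rule holds.

```shell
truth=$(awk 'BEGIN {
    if (!unset) print "uninitialized value is false"
    if (!0)     print "numeric zero is false"
    if (!"")    print "null string is false"
    if ("a")    print "non-null string is true"
    if (0.5)    print "non-zero number is true"
    if ("0")    print "string constant 0 is true"   # a string, not a number
}')

printf '%s\n' "$truth"
```

       The last case relies on a string constant having only a string value,
       so it is tested as a non-null string rather than as the number zero.
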
       All arithmetic shall follow the semantics of floating-point  arithmetic
       as specified by the ISO C standard (see Section 1.1.2, Concepts Derived
       from the ISO C Standard).

       The value of the expression:

           expr1 ^ expr2

       shall be equivalent to the value returned by the ISO C  standard  func‐
       tion call:

           pow(expr1, expr2)

       The expression:

           lvalue ^= expr

       shall be equivalent to the ISO C standard expression:

           lvalue = pow(lvalue, expr)

       except  that  lvalue  shall  be  evaluated  only once. The value of the
       expression:

           expr1 % expr2

       shall be equivalent to the value returned by the ISO C  standard  func‐
       tion call:

           fmod(expr1, expr2)

       The expression:

           lvalue %= expr

       shall be equivalent to the ISO C standard expression:

           lvalue = fmod(lvalue, expr)

       except that lvalue shall be evaluated only once.

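       A brief sketch of the pow() and fmod() equivalences follows. The
       operands and the names arith, x and y are arbitrary choices for the
       example.

```shell
arith=$(awk 'BEGIN {
    print 2 ^ 10             # pow(2, 10)
    print 7 % 3              # fmod(7, 3)
    print -7 % 3             # fmod keeps the sign of the first operand
    print 5.5 % 2            # fmod is defined for non-integer operands
    x = 3;  x ^= 2; print x  # x = pow(x, 2)
    y = 10; y %= 4; print y  # y = fmod(y, 4)
}')

printf '%s\n' "$arith"
```
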
       Variables and fields shall be set by the assignment statement:

           lvalue = expression

       and the type of expression shall determine the resulting variable type.
       The assignment includes the arithmetic assignments ("+=",  "-=",  "*=",
       "/=",  "%=",  "^=",  "++",  "--")  all of which shall produce a numeric
       result. The left-hand side of an assignment and the target of increment
       and  decrement operators can be one of a variable, an array with index,
       or a field selector.

       The awk language supplies arrays that are used for storing  numbers  or
       strings.   Arrays  need not be declared. They shall initially be empty,
       and their sizes shall change dynamically. The  subscripts,  or  element
       identifiers,  are  strings, providing a type of associative array capa‐
       bility. An array name followed by a subscript  within  square  brackets
       can be used as an lvalue and thus as an expression, as described in the
       grammar; see Grammar.  Unsubscripted array names can be  used  in  only
       the following contexts:

        *  A parameter in a function definition or function call

        *  The  NAME token following any use of the keyword in as specified in
           the grammar (see Grammar); if the name used in this context is  not
           an array name, the behavior is undefined

       A  valid  array  index  shall  consist of one or more <comma>-separated
       expressions, similar to the way in which multi-dimensional  arrays  are
       indexed  in  some  programming languages. Because awk arrays are really
       one-dimensional, such a <comma>-separated list shall be converted to  a
       single  string  by  concatenating  the  string  values  of the separate
       expressions, each separated from the other by the value of  the  SUBSEP
       variable. Thus, the following two index operations shall be equivalent:

           var[expr1, expr2, ... exprn]

           var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       The  application  shall ensure that a multi-dimensioned index used with
       the in operator is parenthesized. The in operator, which tests for  the
       existence  of  a particular array element, shall not cause that element
       to exist. Any other reference to  a  nonexistent  array  element  shall
       automatically create it.

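       The SUBSEP equivalence and the behavior of the in operator can be
       sketched as follows. The array names grid and seen, the subscripts,
       and the message texts are all assumptions made for this example.

```shell
arrays=$(awk 'BEGIN {
    grid["x", "y"] = 1
    key = "x" SUBSEP "y"          # the same subscript, built by hand

    print ((("x", "y") in grid) ? "tuple found" : "tuple missing")
    print ((key in grid) ? "joined key found" : "joined key missing")

    if ("ghost" in seen) print "unexpected"      # in does not create the element
    n = 0; for (k in seen) n++
    print "after in test:", n

    tmp = seen["ghost"]                          # a plain reference creates it
    n = 0; for (k in seen) n++
    print "after reference:", n
}')

printf '%s\n' "$arrays"
```

       Elements are counted with a for-in loop because this section defines
       no other way to obtain the size of an array.
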
       Comparisons  (with  the '<', "<=", "!=", "==", '>', and ">=" operators)
       shall be made numerically if both  operands  are  numeric,  if  one  is
       numeric  and  the other has a string value that is a numeric string, or
       if one is numeric and the other has the  uninitialized  value.   Other‐
       wise,  operands  shall be converted to strings as required and a string
       comparison shall be made as follows:

        *  For the "!=" and "==" operators, the strings should be compared  to
           check  if  they are identical but may be compared using the locale-
           specific collation sequence to check if they collate equally.

        *  For the other operators, the strings shall be  compared  using  the
           locale-specific collation sequence.

       The  value  of  the comparison expression shall be 1 if the relation is
       true, or 0 if the relation is false.

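       The difference between numeric and string comparison can be sketched
       as follows. The operand values and the names numeric, as_strings and
       mixed are assumptions for the example.

```shell
# Both operands are numbers: compared numerically.
numeric=$(awk 'BEGIN { print (10 > 9) }')

# String constants are not numeric strings: compared as strings.
as_strings=$(awk 'BEGIN { a = "10"; b = "9"; print (a > b) }')

# A numeric-string field against a number: compared numerically.
mixed=$(printf '10\n' | awk '{ print ($1 == 10.0) }')

printf '%s %s %s\n' "$numeric" "$as_strings" "$mixed"
```

       The comparisons are parenthesized so that '>' is not taken as an
       output redirection by the print statement.
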
554   Variables and Special Variables
555       Variables can be used in an awk program by referencing them.  With  the
556       exception of function parameters (see User-Defined Functions), they are
557       not explicitly declared. Function parameter names shall be local to the
558       function; all other variable names shall be global. The same name shall
559       not be used as both a function parameter name and  as  the  name  of  a
560       function  or  a  special  awk variable. The same name shall not be used
561       both as a variable name with global scope and as the name  of  a  func‐
562       tion.  The  same name shall not be used within the same scope both as a
563       scalar variable and as an array.   Uninitialized  variables,  including
564       scalar  variables,  array  elements, and field variables, shall have an
565       uninitialized value. An uninitialized value shall have both  a  numeric
566       value  of  zero  and  a string value of the empty string. Evaluation of
567       variables with an uninitialized value, to  either  string  or  numeric,
568       shall be determined by the context in which they are used.
569
570       Field  variables  shall  be designated by a '$' followed by a number or
571       numerical expression. The effect of the field number expression  evalu‐
572       ating  to  anything  other  than a non-negative integer is unspecified;
573       uninitialized variables or string  values  need  not  be  converted  to
574       numeric  values  in this context. New field variables can be created by
575       assigning a value to them. References to nonexistent fields  (that  is,
576       fields after $NF), shall evaluate to the uninitialized value. Such ref‐
577       erences shall not create new fields. However, assigning to  a  nonexis‐
578       tent  field  (for  example,  $(NF+2)=5) shall increase the value of NF;
579       create any intervening fields with the uninitialized value;  and  cause
580       the  value  of  $0 to be recomputed, with the fields being separated by
581       the value of OFS.  Each field variable shall have a string value or  an
582       uninitialized value when created. Field variables shall have the unini‐
583       tialized value when created from $0 using FS and the variable does  not
584       contain  any  characters.  If  appropriate, the field variable shall be
585       considered a numeric string (see Expressions in awk).
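
       The field rules above can be illustrated with a minimal sketch. It
       assumes a POSIX shell with a conforming awk on PATH; the input line
       and the value of OFS are made up for the example. Reading a field
       past $NF should leave NF alone, while assigning to one should extend
       the record and rebuild $0 using OFS.

```shell
# Reading $(NF+3) must not create fields; assigning to $(NF+2) must.
fields_out=$(printf 'a b c\n' | awk -v OFS=- '{
    x = $(NF+3)              # reference only: NF stays 3, x is uninitialized
    print NF, length(x)
    $(NF+2) = 5              # assignment: NF becomes 5, $4 is created empty
    print NF
    print $0                 # $0 is rebuilt with OFS between the fields
}')
printf '%s\n' "$fields_out"
```

       The empty fourth field shows up as two adjacent OFS characters in
       the rebuilt record.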
586
587       Implementations shall support the  following  other  special  variables
588       that are set by awk:
589
590       ARGC      The number of elements in the ARGV array.
591
592       ARGV      An array of command line arguments, excluding options and the
593                 program argument, numbered from zero to ARGC-1.
594
595                 The arguments in ARGV can be modified or added to;  ARGC  can
596                 be altered. As each input file ends, awk shall treat the next
597                 non-null element of ARGV, up to the current value of  ARGC-1,
598                 inclusive,  as the name of the next input file. Thus, setting
599                 an element of ARGV to null means that it shall not be treated
600                 as  an input file. The name '-' indicates the standard input.
601                 If an argument matches the format of an  assignment  operand,
602                 this argument shall be treated as an assignment rather than a
603                 file argument.
604
605       CONVFMT   The printf format for converting numbers to  strings  (except
606                 for  output  statements,  where  OFMT  is  used);  "%.6g"  by
607                 default.
608
609       ENVIRON   An array  representing  the  value  of  the  environment,  as
610                 described  in the exec functions defined in the System Inter‐
611                 faces volume of POSIX.1‐2017. The indices of the array  shall
612                 be  strings  consisting of the names of the environment vari‐
613                 ables, and the value of each array element shall be a  string
614                 consisting of the value of that variable. If appropriate, the
615                 environment variable shall be  considered  a  numeric  string
616                 (see  Expressions  in awk); the array element shall also have
617                 its numeric value.
618
619                 In all cases where the behavior of awk is affected  by  envi‐
620                 ronment  variables (including the environment of any commands
621                 that awk executes via the system  function  or  via  pipeline
622                 redirections  with the print statement, the printf statement,
623                 or the getline function), the environment used shall  be  the
624                 environment  at the time awk began executing; it is implemen‐
625                 tation-defined whether any modification  of  ENVIRON  affects
626                 this environment.
627
628       FILENAME  A  pathname  of the current input file. Inside a BEGIN action
629                 the value is undefined. Inside an END action the value  shall
630                 be the name of the last input file processed.
631
632       FNR       The ordinal number of the current record in the current file.
633                 Inside a BEGIN action the value shall be zero. Inside an  END
634                 action  the value shall be the number of the last record pro‐
635                 cessed in the last file processed.
636
637       FS        Input  field  separator  regular  expression;  a  <space>  by
638                 default.
639
640       NF        The  number  of  fields in the current record. Inside a BEGIN
641                 action, the use of NF is undefined unless a getline  function
642                 without  a var argument is executed previously. Inside an END
643                 action, NF shall retain the value it had for the last  record
644                 read, unless a subsequent, redirected, getline function with‐
645                 out a var argument is performed prior  to  entering  the  END
646                 action.
647
648       NR        The  ordinal  number  of the current record from the start of
649                 input.  Inside a BEGIN action the value shall be zero. Inside
650                 an  END  action  the  value  shall  be the number of the last
651                 record processed.
652
653       OFMT      The printf format for converting numbers to strings in output
654                 statements  (see  Output  Statements); "%.6g" by default. The
655                 result of the conversion is unspecified if the value of  OFMT
656                 is not a floating-point format specification.
657
658       OFS       The  print  statement  output  field  separator;  <space>  by
659                 default.
660
661       ORS       The print statement output record separator; a  <newline>  by
662                 default.
663
664       RLENGTH   The length of the string matched by the match function.
665
666       RS        The  first  character  of the string value of RS shall be the
667                 input record separator; a <newline> by default.  If  RS  con‐
668                 tains  more  than one character, the results are unspecified.
669                 If RS is null, then records are separated by sequences
670                 consisting of a <newline> plus one or more blank lines;
671                 leading or trailing blank lines shall not result in empty
672                 records at the beginning or end of the input; and a <newline>
673                 shall always be a field separator, no matter what the value of
674                 FS is.
675
676       RSTART    The  starting  position  of  the  string matched by the match
677                 function, numbering from 1. This shall always  be  equivalent
678                 to the return value of the match function.
679
680       SUBSEP    The  subscript separator string for multi-dimensional arrays;
681                 the default value is implementation-defined.
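
       Several of the special variables listed above can be observed with
       a short sketch. It assumes a POSIX shell, mktemp, and a conforming
       awk on PATH. The scratch files "one" and "two", the variable x, and
       the environment variable DEMO_VAR are hypothetical names introduced
       only for this illustration.

```shell
# Hypothetical scratch files used only for this demonstration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"

# FILENAME, FNR, and NR across two input files.
counts_out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' one two)
printf '%s\n' "$counts_out"

# An operand of the form name=value is an assignment; "-" is standard input.
args_out=$(printf 'line\n' | awk '{ print x, $0 }' x=1 -)
printf '%s\n' "$args_out"

# CONVFMT governs number-to-string conversion; OFMT governs print.
fmt_out=$(awk 'BEGIN {
    CONVFMT = "%.2f"; OFMT = "%.4f"
    x = 3.14159265
    s = x ""                 # concatenation converts with CONVFMT
    print s
    print x                  # print converts with OFMT
}')
printf '%s\n' "$fmt_out"

# Null RS: blank lines separate records and <newline> also splits fields.
para_out=$(printf 'a b\nc\n\n\nd e\n' | awk 'BEGIN { RS = "" } { print NR, NF, $1 }')
printf '%s\n' "$para_out"

# SUBSEP joins the subscripts of a multi-dimensional array index.
subsep_out=$(awk 'BEGIN {
    a["r", "c"] = 1
    for (k in a) { n = split(k, parts, SUBSEP); print n, parts[1], parts[2] }
}')
printf '%s\n' "$subsep_out"

# ENVIRON exposes the environment; a numeric-looking value can be used as a number.
env_out=$(DEMO_VAR=42 awk 'BEGIN { print ENVIRON["DEMO_VAR"] + 1 }')
printf '%s\n' "$env_out"

rm -rf "$dir"
```

       The first record of the second file reports FNR 1 but NR 3, which
       is the usual way to tell per-file and overall record counts apart.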
682
683   Regular Expressions
684       The awk utility shall make use of the extended regular expression nota‐
685       tion  (see  the  Base  Definitions volume of POSIX.1‐2017, Section 9.4,
686       Extended Regular Expressions) except that it shall allow the use of  C-
687       language  conventions  for escaping special characters within the EREs,
688       as  specified  in  the  table  in  the  Base  Definitions   volume   of
689       POSIX.1‐2017,  Chapter 5, File Format Notation ('\\', '\a', '\b', '\f',
690       '\n', '\r', '\t', '\v') and the following table; these escape sequences
691       shall  be  recognized both inside and outside bracket expressions. Note
692       that records need not be separated by <newline> characters  and  string
693       constants  can  contain <newline> characters, so even the "\n" sequence
694       is valid in awk EREs. Using a <slash> character within an ERE  requires
695       the escaping shown in the following table.
696
697                         Table 4-2: Escape Sequences in awk
698
699 ┌─────────┬────────────────────────────────────┬────────────────────────────────────┐
700 │ Escape  │                                    │                                    │
701 │Sequence │            Description             │              Meaning               │
702 ├─────────┼────────────────────────────────────┼────────────────────────────────────┤
703 │\"       │ <backslash> <quotation-mark>       │ <quotation-mark> character         │
704 ├─────────┼────────────────────────────────────┼────────────────────────────────────┤
705 │\/       │ <backslash> <slash>                │ <slash> character                  │
706 ├─────────┼────────────────────────────────────┼────────────────────────────────────┤
707 │\ddd     │ A <backslash> character followed   │ The character whose encoding is    │
708 │         │ by the longest sequence of one,    │ represented by the one, two, or    │
709 │         │ two, or three octal-digit charac‐  │ three-digit octal integer. Multi-  │
710 │         │ ters (01234567). If all of the     │ byte characters require multiple,  │
711 │         │ digits are 0 (that is, representa‐ │ concatenated escape sequences of   │
712 │         │ tion of the NUL character), the    │ this type, including the leading   │
713 │         │ behavior is undefined.             │ <backslash> for each byte.         │
714 ├─────────┼────────────────────────────────────┼────────────────────────────────────┤
715 │\c       │ A <backslash> character followed   │ Undefined                          │
716 │         │ by any character not described in  │                                    │
717 │         │ this table or in the table in the  │                                    │
718 │         │ Base Definitions volume of         │                                    │
719 │         │ POSIX.1‐2017, Chapter 5, File For‐ │                                    │
720 │         │ mat Notation ('\\', '\a', '\b',    │                                    │
721 │         │ '\f', '\n', '\r', '\t', '\v').     │                                    │
722 └─────────┴────────────────────────────────────┴────────────────────────────────────┘
723       A  regular expression can be matched against a specific field or string
724       by using one of the two regular expression matching operators, '~'  and
725       "!~".   These  operators  shall interpret their right-hand operand as a
726       regular expression and their left-hand operand as a string. If the reg‐
727       ular  expression  matches the string, the '~' expression shall evaluate
728       to a value of 1, and the "!~" expression shall evaluate to a  value  of
729       0. (The regular expression matching operation is as defined by the term
730       matched in the Base Definitions volume of  POSIX.1‐2017,  Section  9.1,
731       Regular Expression Definitions, where a match occurs on any part of the
732       string unless the regular expression is limited with  the  <circumflex>
733       or  <dollar-sign>  special  characters.) If the regular expression does
734       not match the string, the '~' expression shall evaluate to a  value  of
735       0,  and  the  "!~"  expression  shall  evaluate to a value of 1. If the
736       right-hand operand is any expression other than the lexical token  ERE,
737       the  string value of the expression shall be interpreted as an extended
738       regular expression, including the escape conventions  described  above.
739       Note that these same escape conventions shall also be applied in deter‐
740       mining the value of a string literal (the lexical  token  STRING),  and
741       thus  shall  be  applied a second time when a string literal is used in
742       this context.
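
       The double application of escape conventions to string literals
       can be seen in a small sketch, assuming a POSIX shell and a
       conforming awk on PATH; the sample input is made up. The same
       pattern is written once as an ERE token and once as a string
       literal, where the <backslash> has to be doubled to survive
       lexical processing.

```shell
# In the string form, "\\." is lexed to \. and only then read as an ERE.
regex_out=$(printf 'a.b\naxb\n' | awk '
    $0 ~ /a\.b/  { print "ere token:", $0 }
    $0 ~ "a\\.b" { print "string:", $0 }
')
printf '%s\n' "$regex_out"
```

       Both rules match only the line containing a literal <period>; the
       line "axb" is rejected by each of them.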
743
744       When an ERE token appears as an expression in any context other than as
745       the  right-hand  of  the '~' or "!~" operator or as one of the built-in
746       function arguments described below, the value of the resulting  expres‐
747       sion shall be the equivalent of:
748
749
750           $0 ~ /ere/
751
752       The ere argument to the gsub, match, and sub functions, and the fs
753       argument to the split function (see String Functions), shall be
754       interpreted as extended regular expressions. These can be either ERE
755       tokens or arbitrary expressions, and shall be interpreted in the same
756       manner as the right-hand side of the '~' or "!~" operator.
757
758       An  extended  regular  expression  can  be  used  to separate fields by
759       assigning a string containing the expression to the  built-in  variable
760       FS,  either  directly  or  as  a  consequence of using the -F sepstring
761       option.  The default value  of  the  FS  variable  shall  be  a  single
762       <space>.  The following describes FS behavior:
763
764        1. If FS is a null string, the behavior is unspecified.
765
766        2. If FS is a single character:
767
768            a. If  FS  is <space>, skip leading and trailing <blank> and <new‐
769               line> characters; fields shall be delimited by sets of  one  or
770               more <blank> or <newline> characters.
771
772            b. Otherwise,  if  FS  is  any  other character c, fields shall be
773               delimited by each single occurrence of c.
774
775        3. Otherwise, the string value of FS shall  be  considered  to  be  an
776           extended regular expression. Each occurrence of a sequence matching
777           the extended regular expression shall delimit fields.
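
       The three FS cases above can be sketched as follows, assuming a
       POSIX shell and a conforming awk on PATH. The input strings and
       separators are made up for the example.

```shell
# FS is <space> (the default): leading and trailing blanks are skipped.
fs_default=$(printf '  a   b  \n' | awk '{ print NF }')
# Any other single character: each occurrence delimits, so empty fields count.
fs_comma=$(printf 'a,,b\n' | awk -F, '{ print NF }')
# A longer value is treated as an extended regular expression.
fs_ere=$(printf 'a1b22c\n' | awk -F '[0-9]+' '{ print NF, $3 }')
printf '%s\n%s\n%s\n' "$fs_default" "$fs_comma" "$fs_ere"
```

       Note the empty middle field in the comma case: two adjacent
       separators delimit a field that contains no characters.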
778
779       Except for the '~' and "!~" operators, and in the gsub,  match,  split,
780       and  sub  built-in  functions,  ERE  matching  shall  be based on input
781       records; that is, record separator characters (the first  character  of
782       the  value of the variable RS, <newline> by default) cannot be embedded
783       in the expression, and no expression shall match the  record  separator
784       character.  If the record separator is not <newline>, <newline> charac‐
785       ters embedded in the expression can be matched. For the  '~'  and  "!~"
786       operators,  and in those four built-in functions, ERE matching shall be
787       based on text strings; that is, any character (including <newline>  and
788       the  record separator) can be embedded in the pattern, and an appropri‐
789       ate pattern shall match any character. However, in all awk  ERE  match‐
790       ing,  the  use  of  one  or  more  NUL characters in the pattern, input
791       record, or text string produces undefined results.
792
793   Patterns
794       A pattern is any valid expression, a range specified by two expressions
795       separated by a comma, or one of the two special patterns BEGIN or END.
796
797   Special Patterns
798       The  awk  utility  shall recognize two special patterns, BEGIN and END.
799       Each BEGIN pattern shall be matched once and its associated action
800       executed before the first record of input is read (except possibly by
801       use of the getline function in a prior BEGIN action; see Input/Output
802       and General Functions) and before command line assignment is done.
803       Each END pattern shall be matched once and its associated action
804       executed after the last record of input has been read. These two
805       patterns shall have associated actions.
806
807       BEGIN and END shall not combine with other patterns. Multiple BEGIN and
808       END  patterns  shall  be allowed. The actions associated with the BEGIN
809       patterns shall be executed in the order specified in  the  program,  as
810       are  the  END  actions. An END pattern can precede a BEGIN pattern in a
811       program.
812
813       If an awk program consists of only actions with the pattern BEGIN,  and
814       the  BEGIN  action contains no getline function, awk shall exit without
815       reading its input when the last statement in the last BEGIN  action  is
816       executed.  If  an awk program consists of only actions with the pattern
817       END or only actions with the patterns BEGIN and END, the input shall be
818       read before the statements in the END actions are executed.
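
       A minimal sketch of the ordering rules for the special patterns,
       assuming a POSIX shell and a conforming awk on PATH. The END rule
       is deliberately written first to show that source order between
       BEGIN and END does not matter.

```shell
order_out=$(printf 'x\ny\n' | awk '
    END   { print "end", NR }        # runs after the last record
    BEGIN { print "begin", NR }      # runs before any input is read
          { print "record", $0 }
')
printf '%s\n' "$order_out"
```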
819
820   Expression Patterns
821       An expression pattern shall be evaluated as if it were an expression in
822       a Boolean context. If the result is true, the pattern shall be  consid‐
823       ered to match, and the associated action (if any) shall be executed. If
824       the result is false, the action shall not be executed.
825
826   Pattern Ranges
827       A pattern range consists of two expressions separated by  a  comma;  in
828       this  case,  the  action  shall  be performed for all records between a
829       match of the first expression and the following  match  of  the  second
830       expression, inclusive. At this point, the pattern range can be repeated
831       starting at input records subsequent to the end of the matched range.
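
       As a sketch of a pattern range, assuming a POSIX shell and a
       conforming awk on PATH, the following selects records from a line
       matching "start" through the next line matching "stop". The marker
       words and input are made up. No action is given, so matching
       records are printed.

```shell
range_out=$(printf '1\nstart\n2\nstop\n3\nstart\n4\n' | awk '/start/, /stop/')
printf '%s\n' "$range_out"
```

       The range opens a second time at the later "start" line and, as no
       further "stop" line follows, stays open until the end of input.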
832
833   Actions
834       An action is a sequence of statements as shown in the grammar in  Gram‐
835       mar.  Any single statement can be replaced by a statement list enclosed
836       in curly braces. The application shall  ensure  that  statements  in  a
837       statement  list  are  separated by <newline> or <semicolon> characters.
838       Statements in a statement list shall be executed  sequentially  in  the
839       order that they appear.
840
841       The  expression  acting  as the conditional in an if statement shall be
842       evaluated and if it is non-zero or non-null,  the  following  statement
843       shall be executed; otherwise, if else is present, the statement follow‐
844       ing the else shall be executed.
845
846       The if, while, do...while, for,  break,  and  continue  statements  are
847       based  on  the ISO C standard (see Section 1.1.2, Concepts Derived from
848       the ISO C Standard), except  that  the  Boolean  expressions  shall  be
849       treated as described in Expressions in awk, and except in the case of:
850
851
852           for (variable in array)
853
854       which  shall  iterate,  assigning each index of array to variable in an
855       unspecified order. The results of adding new elements to  array  within
856       such  a for loop are undefined. If a break or continue statement occurs
857       outside of a loop, the behavior is undefined.
858
859       The delete statement shall remove an individual  array  element.  Thus,
860       the following code deletes an entire array:
861
862
863           for (index in array)
864               delete array[index]
865
866       The  next  statement  shall cause all further processing of the current
867       input record to be abandoned. The  behavior  is  undefined  if  a  next
868       statement appears or is invoked in a BEGIN or END action.
869
870       The  exit  statement shall invoke all END actions in the order in which
871       they occur in the program source and then terminate the program without
872       reading  further  input.  An  exit statement inside an END action shall
873       terminate the program without further execution of END actions.  If  an
874       expression  is  specified in an exit statement, its numeric value shall
875       be the exit status of awk, unless subsequent errors are encountered  or
876       a subsequent exit statement with an expression is executed.
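
       The behavior of exit can be sketched as follows, assuming a POSIX
       shell and a conforming awk on PATH. The status value 3 and the
       shell variable names are arbitrary choices for the example.

```shell
exit_status=0
exit_out=$(printf 'a\nb\n' | awk '
    { print; exit 3 }                # stop after the first record
    END { print "END still runs" }   # END actions run before termination
') || exit_status=$?
printf '%s\n(exit status %s)\n' "$exit_out" "$exit_status"
```

       The second record is never processed, the END action is still
       executed, and the expression given to exit becomes the exit status.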
877
878   Output Statements
879       Both  print  and  printf  statements  shall write to standard output by
880       default. The output shall be written to the location specified by  out‐
881       put_redirection if one is supplied, as follows:
882
883
884           > expression
885           >> expression
886           | expression
887
888       In  all  cases,  the  expression shall be evaluated to produce a string
889       that is used as a pathname into which to write (for '>' or ">>") or  as
890       a  command to be executed (for '|').  Using the first two forms, if the
891       file of that name is not currently open, it shall be  opened,  creating
892       it if necessary and, when the first form is used, truncating the file.
893       The output then shall be appended to the file. As long as the file remains
894       open, subsequent calls in which expression evaluates to the same string
895       value shall simply append output to the file.  The  file  remains  open
896       until  the  close  function (see Input/Output and General Functions) is
897       called with an expression that evaluates to the same string value.
898
899       The third form shall write output onto a stream piped to the input of a
900       command.  The  stream  shall  be created if no stream is currently open
901       with the value of expression as its command name.  The  stream  created
902       shall  be  equivalent  to one created by a call to the popen() function
903       defined in the System Interfaces volume of POSIX.1‐2017 with the  value
904       of  expression  as  the  command  argument and a value of w as the mode
905       argument. As long as the stream remains open, subsequent calls in which
906       expression evaluates to the same string value shall write output to the
907       existing stream. The stream shall remain open until the close  function
908       (see  Input/Output  and General Functions) is called with an expression
909       that evaluates to the same string value.   At  that  time,  the  stream
910       shall be closed as if by a call to the pclose() function defined in the
911       System Interfaces volume of POSIX.1‐2017.
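
       The redirection forms can be sketched together, assuming a POSIX
       shell, mktemp, the sort utility, and a conforming awk on PATH. The
       variable outfile and the file name out.txt are hypothetical names
       used only here.

```shell
dir=$(mktemp -d)
redir_out=$(printf '3\n1\n2\n' | awk -v outfile="$dir/out.txt" '
    { print | "sort -n" }            # one stream, reused for every record
    { print "saw " $0 > outfile }    # truncated on first open, then appended
    END {
        close("sort -n")             # sort emits its output once its input closes
        close(outfile)
        print "done"
    }
')
file_out=$(cat "$dir/out.txt")
rm -rf "$dir"
printf '%s\n%s\n' "$redir_out" "$file_out"
```

       Although the '>' form is used for every record, the file is
       truncated only when it is first opened, so all three lines are kept.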
912
913       As described in detail by the grammar in Grammar, these  output  state‐
914       ments shall take a <comma>-separated list of expressions referred to in
915       the grammar by the non-terminal symbols expr_list, print_expr_list,  or
916       print_expr_list_opt.   This  list is referred to here as the expression
917       list, and each member is referred to as an expression argument.
918
919       The print statement shall write the value of each  expression  argument
920       onto  the indicated output stream separated by the current output field
921       separator (see variable OFS above), and terminated by the output record
922       separator  (see  variable ORS above). All expression arguments shall be
923       taken as strings, being converted if necessary; this  conversion  shall
924       be  as  described  in  Expressions  in awk, with the exception that the
925       printf format in OFMT shall be used instead of the  value  in  CONVFMT.
926       An empty expression list shall stand for the whole input record ($0).
927
928       The  printf  statement shall produce output based on a notation similar
929       to the File Format Notation used to describe file formats in this  vol‐
930       ume  of  POSIX.1‐2017 (see the Base Definitions volume of POSIX.1‐2017,
931       Chapter 5, File Format Notation).  Output shall be produced  as  speci‐
932       fied with the first expression argument as the string format and subse‐
933       quent expression arguments as the strings arg1 to argn, inclusive, with
934       the following exceptions:
935
936        1. The format shall be an actual character string rather than a graph‐
937           ical representation. Therefore, it cannot contain  empty  character
938           positions.  The  <space> in the format string, in any context other
939           than a flag of a conversion specification, shall be treated  as  an
940           ordinary character that is copied to the output.
941
942        2. If the character set contains a 'Δ' character and that character
943           appears in the format string, it shall be treated  as  an  ordinary
944           character that is copied to the output.
945
946        3. The  escape  sequences beginning with a <backslash> character shall
947           be treated as sequences of ordinary characters that are  copied  to
948           the  output.  Note  that  these same sequences shall be interpreted
949           lexically by awk when they appear  in  literal  strings,  but  they
950           shall not be treated specially by the printf statement.
951
952        4. A  field  width  or precision can be specified as the '*' character
953           instead of a digit string. In this case the next argument from  the
954           expression list shall be fetched and its numeric value taken as the
955           field width or precision.
956
957        5. The implementation shall not precede or follow output from the d or
958           u conversion specifier characters with <blank> characters not spec‐
959           ified by the format string.
960
961        6. The implementation shall not precede output from the  o  conversion
962           specifier  character with leading zeros not specified by the format
963           string.
964
965        7. For the c conversion specifier character: if  the  argument  has  a
966           numeric  value, the character whose encoding is that value shall be
967           output. If the value is zero or is not the encoding of any  charac‐
968           ter  in  the character set, the behavior is undefined. If the argu‐
969           ment does not have a numeric value,  the  first  character  of  the
970           string  value  shall  be output; if the string does not contain any
971           characters, the behavior is undefined.
972
973        8. For each conversion specification that consumes  an  argument,  the
974           next  expression argument shall be evaluated. With the exception of
975           the c conversion specifier character, the value shall be  converted
976           (according  to  the  rules  specified in Expressions in awk) to the
977           appropriate type for the conversion specification.
978
979        9. If there are insufficient expression arguments to satisfy  all  the
980           conversion  specifications  in  the  format string, the behavior is
981           undefined.
982
983       10. If any character sequence in the format string begins  with  a  '%'
984           character,  but does not form a valid conversion specification, the
985           behavior is unspecified.
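
       Two of the exceptions above, the '*' field width and the c
       conversion specifier, can be sketched as follows. The example
       assumes a POSIX shell, a conforming awk on PATH, and an ASCII-based
       character set in which the value 65 encodes "A".

```shell
printf_out=$(awk 'BEGIN {
    printf "%*d|%-5s|\n", 5, 42, "ab"    # field width 5 is taken from the argument list
    printf "%c%c\n", 65, "xyz"           # numeric value, then first character of a string
}')
printf '%s\n' "$printf_out"
```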
986
987       Both print and printf can output at least {LINE_MAX} bytes.
988
989   Functions
990       The awk language has  a  variety  of  built-in  functions:  arithmetic,
991       string, input/output, and general.
992
993   Arithmetic Functions
994       The  arithmetic  functions, except for int, shall be based on the ISO C
995       standard (see Section 1.1.2, Concepts Derived from the ISO C Standard).
996       The  behavior  is undefined in cases where the ISO C standard specifies
997       that an error be returned or that the behavior is  undefined.  Although
998       the  grammar (see Grammar) permits built-in functions to appear with no
999       arguments or parentheses, unless the argument or parentheses are  indi‐
1000       cated  as optional in the following list (by displaying them within the
1001       "[]" brackets), such use is undefined.
1002
1003       atan2(y,x)
1004                 Return arctangent of y/x in radians in the range [-π,π].
1005
1006       cos(x)    Return cosine of x, where x is in radians.
1007
1008       sin(x)    Return sine of x, where x is in radians.
1009
1010       exp(x)    Return the exponential function of x.
1011
1012       log(x)    Return the natural logarithm of x.
1013
1014       sqrt(x)   Return the square root of x.
1015
1016       int(x)    Return the argument truncated to an integer. Truncation shall
1017                 be toward 0 when x>0.
1018
1019       rand()    Return a random number n, such that 0≤n<1.
1020
1021       srand([expr])
1022                 Set the seed value for rand to expr or use the time of day if
1023                 expr is omitted. The previous seed value shall be returned.
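
       A brief sketch of the arithmetic functions, assuming a POSIX shell
       and a conforming awk on PATH. The seed values 42 and 7 are
       arbitrary. The value returned by rand is not shown because it
       depends on the implementation; only its range is checked.

```shell
math_out=$(awk 'BEGIN {
    print int(3.9), sqrt(16), exp(0)
    printf "%.4f\n", atan2(0, -1)    # pi, to four decimal places
    srand(42)                        # set a known seed
    print srand(7)                   # returns the previous seed
    r = rand()
    print (r >= 0 && r < 1)          # 1 when r lies in the documented range
}')
printf '%s\n' "$math_out"
```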
1024
1025   String Functions
1026       The  string  functions  in  the  following  list  shall  be  supported.
1027       Although the grammar (see Grammar) permits built-in functions to appear
1028       with no arguments or parentheses, unless the  argument  or  parentheses
1029       are  indicated  as  optional  in the following list (by displaying them
1030       within the "[]" brackets), such use is undefined.
1031
1032       gsub(ere, repl[, in])
1033                 Behave like sub (see below), except that it shall replace all
1034                 occurrences  of  the  regular expression (like the ed utility
1035                 global substitute) in $0 or in the in argument,  when  speci‐
1036                 fied.
1037
1038       index(s, t)
1039                 Return  the  position,  in  characters,  numbering from 1, in
1040                 string s where string t first occurs, or zero if it does  not
1041                 occur at all.
1042
1043       length[([s])]
1044                 Return  the length, in characters, of its argument taken as a
1045                 string, or of the whole record, $0, if there is no argument.
1046
1047       match(s, ere)
1048                 Return the position, in  characters,  numbering  from  1,  in
1049                 string s where the extended regular expression ere occurs, or
1050                 zero if it does not occur at all. RSTART shall be set to  the
1051                 starting position (which is the same as the returned value),
1052                 or zero if no match is found; RLENGTH shall be set to the length
1053                 of the matched string, or -1 if no match is found.
1054
1055       split(s, a[, fs ])
1056                 Split the string s into array elements a[1], a[2], ..., a[n],
1057                 and return n.  All elements of the  array  shall  be  deleted
1058                 before  the  split is performed. The separation shall be done
1059                 with the ERE fs or with the field separator FS if fs  is  not
1060                 given. Each array element shall have a string value when cre‐
1061                 ated and, if appropriate, the array element shall be  consid‐
1062                 ered  a  numeric string (see Expressions in awk).  The effect
1063                 of a null string as the value of fs is unspecified.
1064
1065       sprintf(fmt, expr, expr, ...)
1066                 Format the expressions according to the printf  format  given
1067                 by fmt and return the resulting string.
1068
1069       sub(ere, repl[, in ])
1070                 Substitute  the string repl in place of the first instance of
1071                 the extended regular expression ere in string in and return
1072                 the  number  of substitutions. An <ampersand> ('&') appearing
1073                 in the string repl shall be replaced by the  string  from  in
1074                 that  matches  the ERE. An <ampersand> preceded with a <back‐
1075                 slash> shall be interpreted as the literal <ampersand>  char‐
1076                 acter.  An  occurrence of two consecutive <backslash> charac‐
1077                 ters shall be interpreted as just  a  single  literal  <back‐
1078                 slash>  character. Any other occurrence of a <backslash> (for
1079                 example, preceding any other character) shall be treated as a
1080                 literal  <backslash> character. Note that if repl is a string
1081                 literal (the lexical token STRING; see Grammar), the handling
1082                 of  the  <ampersand>  character occurs after any lexical pro‐
1083                 cessing, including any  lexical  <backslash>-escape  sequence
1084                 processing.  If  in is specified and it is not an lvalue (see
1085                 Expressions in awk), the behavior  is  undefined.  If  in  is
1086                 omitted, awk shall use the current record ($0) in its place.
1087
1088       substr(s, m[, n ])
1089                 Return  the at most n-character substring of s that begins at
1090                 position m, numbering from 1. If n is omitted, or if n speci‐
1091                 fies  more characters than are left in the string, the length
1092                 of the substring shall be limited by the length of the string
1093                 s.
1094
1095       tolower(s)
1096                 Return  a  string based on the string s.  Each character in s
1097                 that is an uppercase letter specified to have a tolower  map‐
1098                 ping  by the LC_CTYPE category of the current locale shall be
1099                 replaced in the returned string by the lowercase letter spec‐
1100                 ified  by  the  mapping.  Other  characters  in  s  shall  be
1101                 unchanged in the returned string.
1102
1103       toupper(s)
                     Return a string based on the string s. Each character in s
                     that is a lowercase letter specified to have a toupper
                     mapping by the LC_CTYPE category of the current locale shall
                     be replaced in the returned string by the uppercase letter
                     specified by the mapping. Other characters in s shall be
                     unchanged in the returned string.
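
                     A brief sketch of tolower and toupper together. The function
                     name case_demo is an illustrative assumption, and LC_ALL=C
                     is set only so that the mapping used is the one for the
                     POSIX locale.

```shell
# Sketch: letters are case-mapped; digits, blanks and punctuation are unchanged.
case_demo() {
    LC_ALL=C awk 'BEGIN {
        print toupper("Hello, awk 1"), tolower("Hello, AWK 1")
    }'
}
case_demo
```

                     The expected output is HELLO, AWK 1 hello, awk 1.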
1110
           All of the preceding functions that take an ERE as a parameter expect
           a pattern or a string-valued expression that is a regular expression
           as defined in Regular Expressions.
1114
1115   Input/Output and General Functions
1116       The input/output and general functions are:
1117
1118       close(expression)
1119                 Close  the file or pipe opened by a print or printf statement
1120                 or a call to getline with the same string-valued  expression.
1121                 The  limit  on  the  number  of  open expression arguments is
1122                 implementation-defined. If  the  close  was  successful,  the
1123                 function  shall  return zero; otherwise, it shall return non-
1124                 zero.
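
                     A minimal sketch of how close interacts with a command
                     stream. The function name close_demo and the command echo hi
                     are illustrative assumptions.

```shell
# Sketch: a stream at end-of-file stays there until close() is called with
# the same string; after that the same string starts the command again.
close_demo() {
    awk 'BEGIN {
        cmd = "echo hi"
        r1 = (cmd | getline a)    # opens the stream and reads its only record
        r2 = (cmd | getline b)    # same stream, now at end-of-file
        rc = close(cmd)           # closes the stream
        r3 = (cmd | getline c)    # same string after close: command runs again
        print r1, r2, rc, r3, c
    }'
}
close_demo
```

                     With the common implementations this should print
                     1 0 0 1 hi.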
1125
1126       expression | getline [var]
1127                 Read a record of input from a stream piped from the output of
1128                 a  command.  The stream shall be created if no stream is cur‐
1129                 rently open with the value of expression as its command name.
1130                 The  stream  created  shall be equivalent to one created by a
1131                 call to the popen() function with the value of expression  as
1132                 the  command  argument and a value of r as the mode argument.
1133                 As long as the stream remains open, subsequent calls in which
1134                 expression evaluates to the same string value shall read sub‐
1135                 sequent records from the stream. The stream shall remain open
1136                 until  the  close  function is called with an expression that
1137                 evaluates to the same string value. At that time, the  stream
1138                 shall  be closed as if by a call to the pclose() function. If
1139                 var is omitted, $0 and NF shall be set; otherwise, var  shall
1140                 be  set and, if appropriate, it shall be considered a numeric
1141                 string (see Expressions in awk).
1142
1143                 The getline operator can form ambiguous constructs when there
1144                 are  unparenthesized operators (including concatenate) to the
1145                 left of the '|' (to the beginning of the expression  contain‐
1146                 ing  getline).  In the context of the '$' operator, '|' shall
1147                 behave as if it had a lower precedence than '$'.  The  result
1148                 of  evaluating other operators is unspecified, and conforming
                     applications shall properly parenthesize all such usages.
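
                     A sketch of the parenthesized form recommended above when
                     the command is built by concatenation. The function name
                     pipe_getline_demo and the variable names are illustrative
                     assumptions.

```shell
# Sketch: parenthesize the concatenation to the left of "|".
pipe_getline_demo() {
    awk 'BEGIN {
        word = "hello"
        # ("echo " word) is evaluated first, then piped to getline.
        while ((("echo " word) | getline line) > 0)
            n++
        close("echo " word)       # same string value as the one used to open
        print n, line
    }'
}
pipe_getline_demo
```

                     This should print 1 hello. The unparenthesized form
                     "echo " word | getline is the kind of construct whose result
                     the text above leaves unspecified.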
1150
1151       getline   Set $0 to the next input record from the current input  file.
1152                 This form of getline shall set the NF, NR, and FNR variables.
1153
1154       getline var
1155                 Set  variable  var  to the next input record from the current
1156                 input file and, if appropriate, var  shall  be  considered  a
1157                 numeric  string  (see Expressions in awk).  This form of get‐
1158                 line shall set the FNR and NR variables.
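
                     A sketch contrasting the two forms above on a two-record
                     input. The function name getline_forms_demo and the sample
                     input are illustrative assumptions.

```shell
# Sketch: "getline var" leaves $0 and NF alone; plain "getline" replaces them.
getline_forms_demo() {
    printf 'a b c\nd e\n' | awk 'NR == 1 {
        getline line          # sets line, NR and FNR; $0 and NF keep record 1
        print NF, NR, FNR, line
    }'
    printf 'a b c\nd e\n' | awk 'NR == 1 {
        getline               # sets $0, NF, NR and FNR from record 2
        print NF, NR, FNR, $0
    }'
}
getline_forms_demo
```

                     The first program should print 3 2 2 d e and the second
                     2 2 2 d e.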
1159
1160       getline [var] < expression
1161                 Read the next record of input from a named file. The  expres‐
1162                 sion shall be evaluated to produce a string that is used as a
1163                 pathname.  If the file of that name is not currently open, it
1164                 shall  be  opened. As long as the stream remains open, subse‐
1165                 quent calls in which expression evaluates to the same  string
1166                 value  shall  read subsequent records from the file. The file
1167                 shall remain open until the close function is called with  an
1168                 expression that evaluates to the same string value. If var is
1169                 omitted, $0 and NF shall be set; otherwise, var shall be  set
1170                 and,  if appropriate, it shall be considered a numeric string
1171                 (see Expressions in awk).
1172
1173                 The getline operator can form ambiguous constructs when there
1174                 are  unparenthesized binary operators (including concatenate)
                     to the right of the '<' (up to the end of the expression
                     containing the getline). The result of evaluating such a
                     construct is unspecified, and conforming applications shall
                     properly parenthesize all such usages.
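
                     A sketch of reading a named file with a parenthesized
                     pathname expression. The function name file_getline_demo,
                     the file name data.txt, and the use of the mktemp utility
                     are illustrative assumptions.

```shell
# Sketch: read every record of a file whose name is built by concatenation.
file_getline_demo() {
    dir=$(mktemp -d) || return 1
    printf 'one\ntwo\nthree\n' > "$dir/data.txt"
    awk -v dir="$dir" 'BEGIN {
        # Parenthesize the concatenation to the right of "<".
        while ((getline line < (dir "/data.txt")) > 0)
            n++
        close(dir "/data.txt")
        print n, line
    }'
    rm -rf "$dir"
}
file_getline_demo
```

                     This should print 3 three.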
1179
1180       system(expression)
1181                 Execute  the  command given by expression in a manner equiva‐
1182                 lent to the system() function defined in  the  System  Inter‐
1183                 faces  volume  of  POSIX.1‐2017 and return the exit status of
1184                 the command.
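
                     A small sketch of the value returned by system. The function
                     name system_demo and the commands run are illustrative
                     assumptions.

```shell
# Sketch: system() returns the exit status of the command it ran.
system_demo() {
    awk 'BEGIN {
        ok  = system("exit 0")
        bad = system("exit 3")
        print ok, bad
    }'
}
system_demo
```

                     With the common implementations this should print 0 3.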
1185
1186       All forms of getline shall return 1 for successful input, zero for end-
1187       of-file, and -1 for an error.
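
           The three return values can be observed with a one-record file and a
           pathname that cannot be opened. This is a sketch only; the function
           name getline_status_demo, the suffix .does-not-exist, and the use of
           the mktemp utility are illustrative assumptions.

```shell
# Sketch: getline returns 1 on success, 0 at end-of-file, -1 on error.
getline_status_demo() {
    f=$(mktemp) || return 1
    printf 'only line\n' > "$f"
    awk -v f="$f" 'BEGIN {
        first   = (getline a < f)                        # a record was read
        second  = (getline b < f)                        # end-of-file
        missing = (getline c < (f ".does-not-exist"))    # cannot be opened
        print first, second, missing
    }'
    rm -f "$f"
}
getline_status_demo
```

           This should print 1 0 -1. Because -1 is a true value, a loop written
           as while (getline line < file) does not stop on error; comparing the
           result with > 0 avoids that.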
1188
           Where strings are used as the name of a file or pipeline, the
           application shall ensure that the strings are textually identical. The
           terminology "same string value" implies that "equivalent strings",
           even those that differ only by <space> characters, represent different
           files.
1194
1195   User-Defined Functions
1196       The  awk  language also provides user-defined functions. Such functions
1197       can be defined as:
1198
1199
1200           function name([parameter, ...]) { statements }
1201
1202       A function can be referred to anywhere in an awk program;  in  particu‐
1203       lar,  its  use  can  precede its definition. The scope of a function is
1204       global.
1205
1206       Function parameters, if present, can be either scalars or  arrays;  the
1207       behavior  is  undefined  if an array name is passed as a parameter that
1208       the function uses as a scalar, or if a scalar expression is passed as a
1209       parameter that the function uses as an array. Function parameters shall
           be passed by value if scalar and by reference if an array name.
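
           A minimal sketch of the two passing conventions. The names
           param_passing_demo, bump, n, and arr are illustrative assumptions.

```shell
# Sketch: scalars are passed by value, array names by reference.
param_passing_demo() {
    awk '
    function bump(n, arr) {
        n++            # changes only the local copy of the scalar
        arr["k"]++     # changes the element of the array passed by the caller
    }
    BEGIN {
        x = 1; a["k"] = 1
        bump(x, a)
        print x, a["k"]
    }'
}
param_passing_demo
```

           This should print 1 2: x is unchanged and a["k"] has been incremented.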
1211
1212       The number of parameters in the function definition need not match  the
1213       number of parameters in the function call. Excess formal parameters can
1214       be used as local variables. If fewer arguments are supplied in a  func‐
1215       tion  call  than  are  in the function definition, the extra parameters
1216       that are used in the function body as scalars  shall  evaluate  to  the
1217       uninitialized value until they are otherwise initialized, and the extra
1218       parameters that are used in  the  function  body  as  arrays  shall  be
1219       treated  as  uninitialized  arrays  where each element evaluates to the
1220       uninitialized value until otherwise initialized.
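
           A sketch of excess formal parameters used as local variables. The
           names local_params_demo and sum_to are illustrative assumptions, and
           the extra spacing in the parameter list is only a readability
           convention.

```shell
# Sketch: i and total are never supplied by the caller, so they start out
# uninitialized and act as locals.
local_params_demo() {
    awk '
    function sum_to(n,    i, total) {
        for (i = 1; i <= n; i++)
            total += i
        return total
    }
    BEGIN {
        i = 99; total = "global"
        print sum_to(4), i, total    # the globals i and total are untouched
    }'
}
local_params_demo
```

           This should print 10 99 global.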
1221
1222       When invoking a function, no white space  can  be  placed  between  the
           function name and the opening parenthesis. Function calls can be nested,
           and functions can be called recursively. Upon return from any
1225       nested  or  recursive  function  call, the values of all of the calling
1226       function's parameters shall be unchanged, except for  array  parameters
1227       passed  by  reference.  The  return  statement  can be used to return a
1228       value. If a return statement appears outside of a function  definition,
1229       the behavior is undefined.
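
           A sketch of a recursive function whose use precedes its definition.
           The names recursion_demo and fact are illustrative assumptions.

```shell
# Sketch: recursion, return, and a call that appears before the definition.
recursion_demo() {
    awk '
    BEGIN { print fact(5) }      # call precedes the definition
    function fact(n) {
        # no white space between the name and "(" in the call below
        return (n <= 1) ? 1 : n * fact(n - 1)
    }'
}
recursion_demo
```

           This should print 120.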
1230
1231       In  the  function  definition,  <newline>  characters shall be optional
1232       before the opening brace and after the closing brace. Function  defini‐
1233       tions can appear anywhere in the program where a pattern-action pair is
1234       allowed.
1235
1236   Grammar
1237       The grammar in this section and the lexical conventions in the  follow‐
1238       ing  section  shall  together describe the syntax for awk programs. The
1239       general conventions for this style of grammar are described in  Section
1240       1.3,  Grammar  Conventions.   A valid program can be represented as the
1241       non-terminal symbol program in the grammar. This  formal  syntax  shall
1242       take precedence over the preceding text syntax description.
1243
1244
1245           %token NAME NUMBER STRING ERE
1246           %token FUNC_NAME   /* Name followed by '(' without white space. */
1247
1248           /* Keywords */
1249           %token       Begin   End
1250           /*          'BEGIN' 'END'                            */
1251
1252           %token       Break   Continue   Delete   Do   Else
1253           /*          'break' 'continue' 'delete' 'do' 'else'  */
1254
1255           %token       Exit   For   Function   If   In
1256           /*          'exit' 'for' 'function' 'if' 'in'        */
1257
1258           %token       Next   Print   Printf   Return   While
1259           /*          'next' 'print' 'printf' 'return' 'while' */
1260
1261           /* Reserved function names */
1262           %token BUILTIN_FUNC_NAME
1263                       /* One token for the following:
1264                        * atan2 cos sin exp log sqrt int rand srand
1265                        * gsub index length match split sprintf sub
1266                        * substr tolower toupper close system
1267                        */
1268           %token GETLINE
1269                       /* Syntactically different from other built-ins. */
1270
1271           /* Two-character tokens. */
1272           %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
1273           /*     '+='       '-='       '*='       '/='       '%='       '^=' */
1274
1275           %token OR   AND  NO_MATCH   EQ   LE   GE   NE   INCR  DECR  APPEND
1276           /*     '||' '&&' '!~' '==' '<=' '>=' '!=' '++'  '--'  '>>'   */
1277
1278           /* One-character tokens. */
1279           %token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
1280           %token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='
1281
1282           %start program
1283           %%
1284
1285           program          : item_list
1286                            | item_list item
1287                            ;
1288
1289           item_list        : /* empty */
1290                            | item_list item terminator
1291                            ;
1292
1293           item             : action
1294                            | pattern action
1295                            | normal_pattern
1296                            | Function NAME      '(' param_list_opt ')'
1297                                  newline_opt action
1298                            | Function FUNC_NAME '(' param_list_opt ')'
1299                                  newline_opt action
1300                            ;
1301
1302           param_list_opt   : /* empty */
1303                            | param_list
1304                            ;
1305
1306           param_list       : NAME
1307                            | param_list ',' NAME
1308                            ;
1309
1310           pattern          : normal_pattern
1311                            | special_pattern
1312                            ;
1313
1314           normal_pattern   : expr
1315                            | expr ',' newline_opt expr
1316                            ;
1317
1318           special_pattern  : Begin
1319                            | End
1320                            ;
1321
1322           action           : '{' newline_opt                             '}'
1323                            | '{' newline_opt terminated_statement_list   '}'
1324                            | '{' newline_opt unterminated_statement_list '}'
1325                            ;
1326
1327           terminator       : terminator NEWLINE
1328                            |            ';'
1329                            |            NEWLINE
1330                            ;
1331
1332           terminated_statement_list : terminated_statement
1333                            | terminated_statement_list terminated_statement
1334                            ;
1335
1336           unterminated_statement_list : unterminated_statement
1337                            | terminated_statement_list unterminated_statement
1338                            ;
1339
1340           terminated_statement : action newline_opt
1341                            | If '(' expr ')' newline_opt terminated_statement
1342                            | If '(' expr ')' newline_opt terminated_statement
1343                                  Else newline_opt terminated_statement
1344                            | While '(' expr ')' newline_opt terminated_statement
1345                            | For '(' simple_statement_opt ';'
1346                                 expr_opt ';' simple_statement_opt ')' newline_opt
1347                                 terminated_statement
1348                            | For '(' NAME In NAME ')' newline_opt
1349                                 terminated_statement
1350                            | ';' newline_opt
1351                            | terminatable_statement NEWLINE newline_opt
1352                            | terminatable_statement ';'     newline_opt
1353                            ;
1354
1355           unterminated_statement : terminatable_statement
1356                            | If '(' expr ')' newline_opt unterminated_statement
1357                            | If '(' expr ')' newline_opt terminated_statement
1358                                 Else newline_opt unterminated_statement
1359                            | While '(' expr ')' newline_opt unterminated_statement
1360                            | For '(' simple_statement_opt ';'
                                     expr_opt ';' simple_statement_opt ')' newline_opt
1362                                 unterminated_statement
1363                            | For '(' NAME In NAME ')' newline_opt
1364                                 unterminated_statement
1365                            ;
1366
1367           terminatable_statement : simple_statement
1368                            | Break
1369                            | Continue
1370                            | Next
1371                            | Exit expr_opt
1372                            | Return expr_opt
1373                            | Do newline_opt terminated_statement While '(' expr ')'
1374                            ;
1375
1376           simple_statement_opt : /* empty */
1377                            | simple_statement
1378                            ;
1379
1380           simple_statement : Delete NAME '[' expr_list ']'
1381                            | expr
1382                            | print_statement
1383                            ;
1384
1385           print_statement  : simple_print_statement
1386                            | simple_print_statement output_redirection
1387                            ;
1388
1389           simple_print_statement : Print  print_expr_list_opt
1390                            | Print  '(' multiple_expr_list ')'
1391                            | Printf print_expr_list
1392                            | Printf '(' multiple_expr_list ')'
1393                            ;
1394
1395           output_redirection : '>'    expr
1396                            | APPEND expr
1397                            | '|'    expr
1398                            ;
1399
1400           expr_list_opt    : /* empty */
1401                            | expr_list
1402                            ;
1403
1404           expr_list        : expr
1405                            | multiple_expr_list
1406                            ;
1407
1408           multiple_expr_list : expr ',' newline_opt expr
1409                            | multiple_expr_list ',' newline_opt expr
1410                            ;
1411
1412           expr_opt         : /* empty */
1413                            | expr
1414                            ;
1415
1416           expr             : unary_expr
1417                            | non_unary_expr
1418                            ;
1419
1420           unary_expr       : '+' expr
1421                            | '-' expr
1422                            | unary_expr '^'      expr
1423                            | unary_expr '*'      expr
1424                            | unary_expr '/'      expr
1425                            | unary_expr '%'      expr
1426                            | unary_expr '+'      expr
1427                            | unary_expr '-'      expr
1428                            | unary_expr          non_unary_expr
1429                            | unary_expr '<'      expr
1430                            | unary_expr LE       expr
1431                            | unary_expr NE       expr
1432                            | unary_expr EQ       expr
1433                            | unary_expr '>'      expr
1434                            | unary_expr GE       expr
1435                            | unary_expr '~'      expr
1436                            | unary_expr NO_MATCH expr
1437                            | unary_expr In NAME
1438                            | unary_expr AND newline_opt expr
1439                            | unary_expr OR  newline_opt expr
1440                            | unary_expr '?' expr ':' expr
1441                            | unary_input_function
1442                            ;
1443
1444           non_unary_expr   : '(' expr ')'
1445                            | '!' expr
1446                            | non_unary_expr '^'      expr
1447                            | non_unary_expr '*'      expr
1448                            | non_unary_expr '/'      expr
1449                            | non_unary_expr '%'      expr
1450                            | non_unary_expr '+'      expr
1451                            | non_unary_expr '-'      expr
1452                            | non_unary_expr          non_unary_expr
1453                            | non_unary_expr '<'      expr
1454                            | non_unary_expr LE       expr
1455                            | non_unary_expr NE       expr
1456                            | non_unary_expr EQ       expr
1457                            | non_unary_expr '>'      expr
1458                            | non_unary_expr GE       expr
1459                            | non_unary_expr '~'      expr
1460                            | non_unary_expr NO_MATCH expr
1461                            | non_unary_expr In NAME
1462                            | '(' multiple_expr_list ')' In NAME
1463                            | non_unary_expr AND newline_opt expr
1464                            | non_unary_expr OR  newline_opt expr
1465                            | non_unary_expr '?' expr ':' expr
1466                            | NUMBER
1467                            | STRING
1468                            | lvalue
1469                            | ERE
1470                            | lvalue INCR
1471                            | lvalue DECR
1472                            | INCR lvalue
1473                            | DECR lvalue
1474                            | lvalue POW_ASSIGN expr
1475                            | lvalue MOD_ASSIGN expr
1476                            | lvalue MUL_ASSIGN expr
1477                            | lvalue DIV_ASSIGN expr
1478                            | lvalue ADD_ASSIGN expr
1479                            | lvalue SUB_ASSIGN expr
1480                            | lvalue '=' expr
1481                            | FUNC_NAME '(' expr_list_opt ')'
1482                                 /* no white space allowed before '(' */
1483                            | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
1484                            | BUILTIN_FUNC_NAME
1485                            | non_unary_input_function
1486                            ;
1487
1488           print_expr_list_opt : /* empty */
1489                            | print_expr_list
1490                            ;
1491
1492           print_expr_list  : print_expr
1493                            | print_expr_list ',' newline_opt print_expr
1494                            ;
1495
1496           print_expr       : unary_print_expr
1497                            | non_unary_print_expr
1498                            ;
1499
1500           unary_print_expr : '+' print_expr
1501                            | '-' print_expr
1502                            | unary_print_expr '^'      print_expr
1503                            | unary_print_expr '*'      print_expr
1504                            | unary_print_expr '/'      print_expr
1505                            | unary_print_expr '%'      print_expr
1506                            | unary_print_expr '+'      print_expr
1507                            | unary_print_expr '-'      print_expr
1508                            | unary_print_expr          non_unary_print_expr
1509                            | unary_print_expr '~'      print_expr
1510                            | unary_print_expr NO_MATCH print_expr
1511                            | unary_print_expr In NAME
1512                            | unary_print_expr AND newline_opt print_expr
1513                            | unary_print_expr OR  newline_opt print_expr
1514                            | unary_print_expr '?' print_expr ':' print_expr
1515                            ;
1516
1517           non_unary_print_expr : '(' expr ')'
1518                            | '!' print_expr
1519                            | non_unary_print_expr '^'      print_expr
1520                            | non_unary_print_expr '*'      print_expr
1521                            | non_unary_print_expr '/'      print_expr
1522                            | non_unary_print_expr '%'      print_expr
1523                            | non_unary_print_expr '+'      print_expr
1524                            | non_unary_print_expr '-'      print_expr
1525                            | non_unary_print_expr          non_unary_print_expr
1526                            | non_unary_print_expr '~'      print_expr
1527                            | non_unary_print_expr NO_MATCH print_expr
1528                            | non_unary_print_expr In NAME
1529                            | '(' multiple_expr_list ')' In NAME
1530                            | non_unary_print_expr AND newline_opt print_expr
1531                            | non_unary_print_expr OR  newline_opt print_expr
1532                            | non_unary_print_expr '?' print_expr ':' print_expr
1533                            | NUMBER
1534                            | STRING
1535                            | lvalue
1536                            | ERE
1537                            | lvalue INCR
1538                            | lvalue DECR
1539                            | INCR lvalue
1540                            | DECR lvalue
1541                            | lvalue POW_ASSIGN print_expr
1542                            | lvalue MOD_ASSIGN print_expr
1543                            | lvalue MUL_ASSIGN print_expr
1544                            | lvalue DIV_ASSIGN print_expr
1545                            | lvalue ADD_ASSIGN print_expr
1546                            | lvalue SUB_ASSIGN print_expr
1547                            | lvalue '=' print_expr
1548                            | FUNC_NAME '(' expr_list_opt ')'
1549                                /* no white space allowed before '(' */
1550                            | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
1551                            | BUILTIN_FUNC_NAME
1552                            ;
1553
1554           lvalue           : NAME
1555                            | NAME '[' expr_list ']'
1556                            | '$' expr
1557                            ;
1558
1559           non_unary_input_function : simple_get
1560                            | simple_get '<' expr
1561                            | non_unary_expr '|' simple_get
1562                            ;
1563
1564           unary_input_function : unary_expr '|' simple_get
1565                            ;
1566
1567           simple_get       : GETLINE
1568                            | GETLINE lvalue
1569                            ;
1570
1571           newline_opt      : /* empty */
1572                            | newline_opt NEWLINE
1573                            ;
1574
1575       This grammar has several ambiguities that shall be resolved as follows:
1576
            *  Operator precedence and associativity shall be as described in
               Table 4-1, Expressions in Decreasing Precedence in awk.
1579
1580        *  In case of ambiguity, an else shall be  associated  with  the  most
1581           immediately preceding if that would satisfy the grammar.
1582
1583        *  In  some  contexts, a <slash> ('/') that is used to surround an ERE
1584           could also be the division operator.  This  shall  be  resolved  in
1585           such  a  way  that  wherever  the division operator could appear, a
1586           <slash> is assumed to be the division operator. (There is no  unary
1587           division operator.)
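
           The <slash> rule and the else rule above can be sketched as follows.
           The function name ambiguity_demo and the variable names are
           illustrative assumptions.

```shell
# Sketch: "/" after an operand is division, and else binds to the nearest if.
ambiguity_demo() {
    awk 'BEGIN {
        x = 8
        print x /2/ 2              # division could appear after x: (8 / 2) / 2
        a = 1; b = 0
        if (a) if (b) print "inner-if"; else print "inner-else"
        a = 0
        if (a) if (b) print "inner-if"; else print "not reached"
        print "done"
    }'
}
ambiguity_demo
```

           This should print 2, inner-else, and done on separate lines. Nothing
           is printed for the second if statement because its else belongs to the
           inner if, which is never reached.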
1588
1589       Each  expression  in an awk program shall conform to the precedence and
1590       associativity rules, even when this is not needed to resolve an ambigu‐
1591       ity.  For  example,  because  '$'  has higher precedence than '++', the
1592       string "$x++--" is not a valid awk expression, even though it is  unam‐
1593       biguously parsed by the grammar as "$(x++)--".
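
           A sketch of the precedence of '$' over '++' in a valid expression.
           The function name field_precedence_demo and the sample input are
           illustrative assumptions.

```shell
# Sketch: "$" binds more tightly than "++".
field_precedence_demo() {
    printf '5 7\n' | awk '{
        i = 1
        $i++            # parsed as ($i)++, so field 1 is incremented, not i
        print $1, $2, i
    }'
}
field_precedence_demo
```

           This should print 6 7 1.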
1594
1595       One  convention  that  might  not be obvious from the formal grammar is
1596       where <newline> characters are acceptable. There  are  several  obvious
1597       placements  such  as  terminating a statement, and a <backslash> can be
1598       used to escape <newline> characters  between  any  lexical  tokens.  In
1599       addition,  <newline> characters without <backslash> characters can fol‐
1600       low a comma, an open brace, logical AND  operator  ("&&"),  logical  OR
1601       operator  ("||"),  the  do  keyword,  the else keyword, and the closing
1602       parenthesis of an if, for, or while statement. For example:
1603
1604
1605           { print  $1,
1606                    $2 }
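
           A further sketch, showing unescaped <newline> characters after "&&",
           after the closing parenthesis of an if statement, after a comma, and
           after the else and do keywords. The function name
           newline_placement_demo and the sample input are illustrative
           assumptions.

```shell
# Sketch: places where a <newline> may follow without a <backslash>.
newline_placement_demo() {
    printf '3 4\n' | awk '{
        if ($1 > 0 &&
            $2 > 0)
            print "both positive:", $1,
                  $2
        else
            print "not both positive"
        n = 0
        do
            n++
        while (n < $1)
        print "n =", n
    }'
}
newline_placement_demo
```

           This should print both positive: 3 4 followed by n = 3.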
1607
1608   Lexical Conventions
1609       The lexical conventions for awk programs, with respect to the preceding
1610       grammar, shall be as follows:
1611
1612        1. Except  as noted, awk shall recognize the longest possible token or
1613           delimiter beginning at a given point.
1614
            2. A comment shall consist of any characters beginning with the
               <number-sign> character and terminated by, but excluding the next
               occurrence of, a <newline>. Comments shall have no effect, except
               to delimit lexical tokens.
1619
1620        3. The <newline> shall be recognized as the token NEWLINE.
1621
1622        4. A  <backslash>  character immediately followed by a <newline> shall
1623           have no effect.
1624
1625        5. The token STRING shall represent a string constant. A  string  con‐
1626           stant  shall  begin  with  the character '"'.  Within a string con‐
1627           stant, a <backslash> character shall  be  considered  to  begin  an
1628           escape  sequence  as specified in the table in the Base Definitions
1629           volume of POSIX.1‐2017, Chapter  5,  File  Format  Notation  ('\\',
1630           '\a', '\b', '\f', '\n', '\r', '\t', '\v').  In addition, the escape
1631           sequences in Table 4-2, Escape Sequences in  awk  shall  be  recog‐
1632           nized.  A  <newline>  shall  not  occur within a string constant. A
1633           string constant shall be terminated by the first  unescaped  occur‐
1634           rence  of  the  character  '"' after the one that begins the string
1635           constant. The value of the string shall  be  the  sequence  of  all
1636           unescaped  characters  and  values of escape sequences between, but
1637           not including, the two delimiting '"' characters.
1638
1639        6. The token ERE represents an extended regular  expression  constant.
1640           An  ERE  constant shall begin with the <slash> character. Within an
1641           ERE constant, a <backslash> character shall be considered to  begin
1642           an  escape  sequence  as specified in the table in the Base Defini‐
1643           tions volume of POSIX.1‐2017, Chapter 5, File Format Notation.   In
1644           addition,  the  escape  sequences in Table 4-2, Escape Sequences in
               awk shall be recognized. The application shall ensure that a
               <newline> does not occur within an ERE constant. An ERE
               constant shall be terminated by the first unescaped occurrence
               of the <slash> character after the one that begins the ERE
               constant. The extended regular expression represented by the
               ERE constant shall be the sequence of all unescaped characters
               and values of escape sequences between, but not including, the
               two delimiting <slash> characters.
1652
1653        7. A <blank> shall have no effect, except to delimit lexical tokens or
1654           within STRING or ERE tokens.
1655
1656        8. The  token  NUMBER shall represent a numeric constant. Its form and
1657           numeric value shall either be equivalent to  the  decimal-floating-
1658           constant token as specified by the ISO C standard, or it shall be a
1659           sequence of decimal digits and shall be  evaluated  as  an  integer
1660           constant  in  decimal.  In  addition,  implementations  may  accept
1661           numeric constants with the form and numeric value equivalent to the
1662           hexadecimal-constant  and  hexadecimal-floating-constant  tokens as
1663           specified by the ISO C standard.
1664
1665           If the value is too large or too small  to  be  representable  (see
1666           Section  1.1.2,  Concepts  Derived  from  the  ISO C Standard), the
1667           behavior is undefined.
1668
1669        9. A sequence of underscores, digits, and alphabetics from the  porta‐
1670           ble character set (see the Base Definitions volume of POSIX.1‐2017,
               Section 6.1, Portable Character Set), beginning with an
               <underscore> or alphabetic character, shall be considered a word.
1673
1674       10. The  following words are keywords that shall be recognized as indi‐
1675           vidual tokens; the name of the token is the same as the keyword:
1676
1677           BEGIN      delete     END        function   in         printf
1678           break      do         exit       getline    next       return
1679           continue   else       for        if         print      while
1680
1681       11. The following words are names of built-in functions  and  shall  be
1682           recognized as the token BUILTIN_FUNC_NAME:
1683
1684           atan2     gsub      log       split     sub       toupper
1685           close     index     match     sprintf   substr
1686           cos       int       rand      sqrt      system
1687           exp       length    sin       srand     tolower
1688
1689           The  above-listed keywords and names of built-in functions are con‐
1690           sidered reserved words.
1691
1692       12. The token NAME shall consist of a word that is not a keyword  or  a
1693           name  of a built-in function and is not followed immediately (with‐
1694           out any delimiters) by the '(' character.
1695
1696       13. The token FUNC_NAME shall consist of a word that is not  a  keyword
1697           or a name of a built-in function, followed immediately (without any
1698           delimiters) by the '(' character. The '(' character  shall  not  be
1699           included as part of the token.
1700
1701       14. The  following  two-character  sequences shall be recognized as the
1702           named tokens:
1703
1704                     ┌───────────┬──────────┬────────────┬──────────┐
1705                     │Token Name │ Sequence │ Token Name │ Sequence │
1706                     ├───────────┼──────────┼────────────┼──────────┤
1707                     │ADD_ASSIGN │    +=    │ NO_MATCH   │    !~    │
1708                     │SUB_ASSIGN │    -=    │ EQ         │    ==    │
1709                     │MUL_ASSIGN │    *=    │ LE         │    <=    │
1710                     │DIV_ASSIGN │    /=    │ GE         │    >=    │
1711                     │MOD_ASSIGN │    %=    │ NE         │    !=    │
1712                     │POW_ASSIGN │    ^=    │ INCR       │    ++    │
1713                     │OR         │    ||    │ DECR       │    --    │
1714                     │AND        │    &&    │ APPEND     │    >>    │
1715                     └───────────┴──────────┴────────────┴──────────┘
1716       15. The following single characters shall be recognized as tokens whose
1717           names are the character:
1718
1719
1720               <newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =
1721
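       As a rough illustration of items 8, 12, and 13 (a sketch only; the
       shell variables nums and names and the function twice are made up
       for this example), the following shows decimal numeric constants and
       the difference between a FUNC_NAME and a NAME that is followed by a
       parenthesized expression:

```shell
# Numeric constants in integer, fractional, and exponent form (item 8).
nums=$(awk 'BEGIN { print 1e3, 0.5, 1.5e-1 }')
printf '%s\n' "$nums"

# twice( is a FUNC_NAME because "(" follows the word immediately.
# x (1) is the NAME x concatenated with the expression (1).
names=$(awk 'function twice(n) { return 2 * n }
             BEGIN { x = "a"; print twice(21), x (1) }')
printf '%s\n' "$names"
```

       This is expected to print 1000 0.5 0.15 followed by 42 a1.
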
1722       There  is  a lexical ambiguity between the token ERE and the tokens '/'
1723       and DIV_ASSIGN.  When an input sequence begins with a <slash> character
1724       in any syntactic context where the token '/' or DIV_ASSIGN could appear
1725       as the next token in a valid program, the longer of  those  two  tokens
1726       that can be recognized shall be recognized. In any other syntactic con‐
1727       text where the token ERE could appear as the next token in a valid pro‐
1728       gram, the token ERE shall be recognized.
1729
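       A minimal sketch of the rule above, using assumed sample values and
       the hypothetical shell variables div and ere: after an operand, a
       <slash> is read as division or as DIV_ASSIGN, while at the start of
       a pattern it begins an ERE token.

```shell
# After the operand "a", "/=" is DIV_ASSIGN and "/" is division.
div=$(awk 'BEGIN { a = 8; a /= 2; print a / 2 }')

# At the start of a pattern no "/" operator can appear, so an ERE is read.
ere=$(printf 'a/b\nab\n' | awk '/\// { print "has slash:", $0 }')

printf '%s\n%s\n' "$div" "$ere"
```
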

EXIT STATUS

1731       The following exit values shall be returned:
1732
1733        0    All input files were processed successfully.
1734
1735       >0    An error occurred.
1736
1737       The  exit  status  can  be  altered within the program by using an exit
1738       expression.
1739
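       For illustration only (the value 3 and the shell variable status are
       arbitrary choices for this sketch), an exit expression in a BEGIN
       action sets the exit status seen by the shell:

```shell
# "exit 3" ends the program with status 3 before any input is read.
status=0
awk 'BEGIN { exit 3 }' || status=$?
printf 'exit status: %s\n' "$status"
```
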

CONSEQUENCES OF ERRORS

1741       If any file operand is specified and the named file cannot be accessed,
1742       awk  shall  write  a diagnostic message to standard error and terminate
1743       without any further action.
1744
1745       If the program specified by either the program operand  or  a  progfile
1746       operand  is  not  a  valid  awk  program  (as specified in the EXTENDED
1747       DESCRIPTION section), the behavior is undefined.
1748
1749       The following sections are informative.
1750

APPLICATION USAGE

1752       The index, length, match, and substr functions should not  be  confused
1753       with  similar  functions  in  the ISO C standard; the awk versions deal
1754       with characters, while the ISO C standard deals with bytes.
1755
1756       Because the concatenation operation is represented by adjacent  expres‐
1757       sions  rather  than  an explicit operator, it is often necessary to use
1758       parentheses to enforce the proper evaluation precedence.
1759
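       The following sketch (with made-up values and shell variable names)
       shows how parentheses change the result when concatenation is mixed
       with arithmetic:

```shell
# Addition binds tighter than concatenation: "total: " (1 + 2).
plain=$(awk 'BEGIN { r = "total: " 1 + 2; print r }')

# Parentheses force the concatenation first. The string "total: 1" has
# the numeric value 0, so the sum is 0 + 2.
forced=$(awk 'BEGIN { r = ("total: " 1) + 2; print r }')

printf '%s\n%s\n' "$plain" "$forced"
```

       This is expected to print total: 3 and then 2.
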
1760       When using awk to process pathnames, it is recommended that LC_ALL,  or
1761       at least LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environ‐
1762       ment, since pathnames can contain byte sequences that do not form valid
1763       characters  in some locales, in which case the utility's behavior would
1764       be undefined. In the POSIX locale each  byte  is  a  valid  single-byte
1765       character, and therefore this problem is avoided.
1766
1767       On  implementations  where  the "==" operator checks if strings collate
1768       equally, applications needing to check whether  strings  are  identical
1769       can use:
1770
1771
1772           length(a) == length(b) && index(a,b) == 1
1773
1774       On  implementations where the "==" operator checks if strings are iden‐
1775       tical, applications needing to check whether  strings  collate  equally
1776       can use:
1777
1778
1779           a <= b && a >= b
1780

EXAMPLES

1782       The  awk program specified in the command line is most easily specified
1783       within single-quotes (for example, 'program')  for  applications  using
1784       sh,  because  awk programs commonly contain characters that are special
1785       to the shell, including double-quotes. In the cases where an  awk  pro‐
1786       gram contains single-quote characters, it is usually easiest to specify
1787       most of the program as strings within single-quotes concatenated by the
1788       shell with quoted single-quote characters. For example:
1789
1790
1791           awk '/'\''/ { print "quote:", $0 }'
1792
1793       prints  all  lines  from  the  standard input containing a single-quote
1794       character, prefixed with quote:.
1795
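       As a usage sketch with assumed sample input (the shell variable
       quoted is only for this example), the command can be tried as:

```shell
# Only the first input line contains a single-quote character.
quoted=$(printf "it's here\nplain\n" | awk '/'\''/ { print "quote:", $0 }')
printf '%s\n' "$quoted"
```
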
1796       The following are examples of simple awk programs:
1797
1798        1. Write to the standard output all input lines for which field  3  is
1799           greater than 5:
1800
1801
1802               $3 > 5
1803
1804        2. Write every tenth line:
1805
1806
1807               (NR % 10) == 0
1808
1809        3. Write any line with a substring matching the regular expression:
1810
1811
1812               /(G|D)(2[0-9][[:alpha:]]*)/
1813
1814        4. Print  any  line with a substring containing a 'G' or 'D', followed
1815           by a sequence of digits and characters. This example uses character
1816           classes  digit  and  alpha  to match language-independent digit and
1817           alphabetic characters respectively:
1818
1819
1820               /(G|D)([[:digit:][:alpha:]]*)/
1821
1822        5. Write any line in  which  the  second  field  matches  the  regular
1823           expression and the fourth field does not:
1824
1825
1826               $2 ~ /xyz/ && $4 !~ /xyz/
1827
1828        6. Write any line in which the second field contains a <backslash>:
1829
1830
1831               $2 ~ /\\/
1832
1833        7. Write  any  line  in which the second field contains a <backslash>.
1834           Note that <backslash>-escapes are interpreted twice; once in  lexi‐
1835           cal  processing  of  the  string and once in processing the regular
1836           expression:
1837
1838
1839               $2 ~ "\\\\"
1840
1841        8. Write the second to the last and the last field in each line. Sepa‐
1842           rate the fields by a <colon>:
1843
1844
1845               {OFS=":";print $(NF-1), $NF}
1846
1847        9. Write  the line number and number of fields in each line. The three
1848           strings representing the line number, the <colon>, and  the  number
1849           of  fields  are concatenated and that string is written to standard
1850           output:
1851
1852
1853               {print NR ":" NF}
1854
1855       10. Write lines longer than 72 characters:
1856
1857
1858               length($0) > 72
1859
1860       11. Write the first two fields in opposite order separated by OFS:
1861
1862
1863               { print $2, $1 }
1864
1865       12. Same, with input fields separated by a <comma> or <space> and <tab>
1866           characters, or both:
1867
1868
1869               BEGIN { FS = ",[ \t]*|[ \t]+" }
1870                     { print $2, $1 }
1871
1872       13. Add up the first column, print sum, and average:
1873
1874
1875                     {s += $1 }
1876               END   {print "sum is ", s, " average is", s/NR}
1877
1878       14. Write  fields  in  reverse  order, one per line (many lines out for
1879           each line in):
1880
1881
1882               { for (i = NF; i > 0; --i) print $i }
1883
1884       15. Write all lines between occurrences of the strings start and stop:
1885
1886
1887               /start/, /stop/
1888
1889       16. Write all lines whose first field is different  from  the  previous
1890           one:
1891
1892
1893               $1 != prev { print; prev = $1 }
1894
1895       17. Simulate echo:
1896
1897
1898               BEGIN  {
1899                       for (i = 1; i < ARGC; ++i)
1900                       printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
1901               }
1902
1903       18. Write the path prefixes contained in the PATH environment variable,
1904           one per line:
1905
1906
1907               BEGIN  {
1908                       n = split (ENVIRON["PATH"], path, ":")
1909                       for (i = 1; i <= n; ++i)
1910                       print path[i]
1911               }
1912
1913       19. If there is a file named input containing page headers of the form:
1914           Page #
1915
1916           and a file named program that contains:
1917
1918
1919               /Page/   { $2 = n++; }
1920                        { print }
1921
1922           then the command line:
1923
1924
1925               awk -f program n=5 input
1926
1927           prints the file input, filling in page numbers starting at 5.
1928
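       A few of the programs above can be tried directly from the shell.
       The sample input and the shell variable names below are assumptions
       made for this sketch; the awk programs are examples 8, 13, and 16
       unchanged:

```shell
# Example 8: second-to-last and last field, separated by a colon.
ex8=$(printf 'a b c\n' | awk '{OFS=":";print $(NF-1), $NF}')

# Example 13: sum and average of the first column.
ex13=$(printf '10\n20\n30\n' |
    awk '{s += $1 } END {print "sum is ", s, " average is", s/NR}')

# Example 16: lines whose first field differs from the previous one.
ex16=$(printf 'a 1\na 2\nb 3\na 4\n' | awk '$1 != prev { print; prev = $1 }')

printf '%s\n%s\n%s\n' "$ex8" "$ex13" "$ex16"
```

       Note that example 13 prints doubled spaces, because the string
       constants already contain a space and print adds OFS between
       arguments.
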

RATIONALE

1930       This description is based on the new awk, ``nawk'', (see the referenced
1931       The AWK Programming Language), which introduced a number  of  new  fea‐
1932       tures to the historical awk:
1933
1934        1. New keywords: delete, do, function, return
1935
1936        2. New  built-in functions: atan2, close, cos, gsub, match, rand, sin,
1937           srand, sub, system
1938
1939        3. New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH, SUBSEP
1940
1941        4. New expression operators: ?, :, ,, ^
1942
1943        5. The FS variable and the third argument to  split,  now  treated  as
1944           extended regular expressions.
1945
1946        6. The  operator  precedence, changed to more closely match the C lan‐
1947           guage.  Two examples of code that operate differently are:
1948
1949
1950               while ( n /= 10 > 1) ...
1951               if (!"wk" ~ /bwk/) ...
1952
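       As a hedged sketch of how the two expressions in item 6 group under
       the current precedence rules (the starting value 100 and the shell
       variable names are assumptions for illustration):

```shell
# n /= 10 > 1 groups as n /= (10 > 1), that is n /= 1, so n is unchanged.
assign=$(awk 'BEGIN { n = 100; n /= 10 > 1; print n }')

# !"wk" ~ /bwk/ groups as (!"wk") ~ /bwk/. The value of !"wk" is 0,
# and "0" does not match /bwk/.
negate=$(awk 'BEGIN { r = (!"wk" ~ /bwk/); print r }')

printf '%s\n%s\n' "$assign" "$negate"
```
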
1953       Several features have been added based on newer implementations of awk:
1954
1955        *  Multiple instances of -f progfile are permitted.
1956
1957        *  The new option -v assignment.
1958
1959        *  The new predefined variable ENVIRON.
1960
1961        *  New built-in functions toupper and tolower.
1962
1963        *  More formatting capabilities are added to printf to match the ISO C
1964           standard.
1965
1966       Earlier  versions  of this standard required implementations to support
1967       multiple adjacent <semicolon>s, lines  with  one  or  more  <semicolon>
1968       before  a  rule  (pattern-action  pairs),  and  lines  with only <semi‐
1969       colon>(s).  These are not required by this standard and are  considered
1970       poor  programming practice, but can be accepted by an implementation of
1971       awk as an extension.
1972
1973       The overall awk syntax has always been based on the C language, with  a
1974       few features from the shell command language and other sources. Because
1975       of this, it is not completely compatible with any other language, which
1976       has  caused confusion for some users. It is not the intent of the stan‐
1977       dard developers to address such issues. A few relatively minor  changes
1978       toward making the language more compatible with the ISO C standard were
1979       made; most of these changes are based  on  similar  changes  in  recent
1980       implementations,  as  described  above. There remain several C-language
1981       conventions that are not in awk.   One  of  the  notable  ones  is  the
1982       <comma>  operator,  which  is commonly used to specify multiple expres‐
1983       sions in the C language for statement. Also, there are  various  places
1984       where awk is more restrictive than the C language regarding the type of
1985       expression that can be used in a given context. These  limitations  are
1986       due to the different features that the awk language does provide.
1987
1988       Regular  expressions in awk have been extended somewhat from historical
1989       implementations to make  them  a  pure  superset  of  extended  regular
1990       expressions,  as defined by POSIX.1‐2008 (see the Base Definitions vol‐
1991       ume of POSIX.1‐2017, Section 9.4, Extended Regular  Expressions).   The
1992       main  extensions are internationalization features and interval expres‐
1993       sions. Historical implementations of awk  have  long  supported  <back‐
1994       slash>-escape  sequences  as  an  extension to extended regular expres‐
1995       sions, and this extension has been retained despite inconsistency  with
1996       other  utilities.  The  number  of  escape sequences recognized in both
1997       extended regular expressions and strings has varied (generally increas‐
1998       ing with time) among implementations. The set specified by POSIX.1‐2008
1999       includes most sequences known to be supported  by  popular  implementa‐
2000       tions  and by the ISO C standard. One sequence that is not supported is
2001       hexadecimal value escapes beginning with '\x'.  This would allow values
2002       expressed  in  more  than  9 bits to be used within awk as in the ISO C
2003       standard. However, because this syntax has a non-deterministic  length,
2004       it  does not permit the subsequent character to be a hexadecimal digit.
2005       This limitation can be dealt with in the C language by the use of lexi‐
2006       cal string concatenation. In the awk language, concatenation could also
2007       be a solution for strings, but not  for  extended  regular  expressions
2008       (either  lexical  ERE  tokens  or  strings  used dynamically as regular
2009       expressions). Because of this limitation,  the  feature  has  not  been
2010       added to POSIX.1‐2008.
2011
2012       When  a  string variable is used in a context where an extended regular
2013       expression normally appears (where the lexical token ERE is used in the
2014       grammar) the string does not contain the literal <slash> characters.
2015
2016       Some versions of awk allow the form:
2017
2018
2019           func name(args, ... ) { statements }
2020
2021       This has been deprecated by the authors of the language, who asked that
2022       it not be specified.
2023
2024       Historical implementations of awk produce an error if a next  statement
2025       is  executed  in  a  BEGIN action, and cause awk to terminate if a next
2026       statement is executed in an END action. This behavior has not been doc‐
2027       umented,  and  it was not believed that it was necessary to standardize
2028       it.
2029
2030       The specification of conversions between string and numeric  values  is
2031       much  more detailed than in the documentation of historical implementa‐
2032       tions or in the referenced The AWK Programming Language. Although  most
2033       of  the behavior is designed to be intuitive, the details are necessary
2034       to ensure compatible behavior from different implementations.  This  is
2035       especially  important  in relational expressions since the types of the
2036       operands determine whether a string or numeric comparison is performed.
2037       From  the perspective of an application developer, it is usually suffi‐
2038       cient to expect intuitive behavior and to force conversions (by  adding
2039       zero  or  concatenating  a  null string) when the type of an expression
2040       does not obviously match what is needed. The intent has been to specify
2041       historical  practice in almost all cases. The one exception is that, in
2042       historical  implementations,  variables  and  constants  maintain  both
2043       string  and  numeric  values after their original value is converted by
2044       any use. This means that referencing a variable or  constant  can  have
2045       unexpected  side-effects.  For example, with historical implementations
2046       the following program:
2047
2048
2049           {
2050               a = "+2"
2051               b = 2
2052               if (NR % 2)
2053                   c = a + b
2054               if (a == b)
2055                   print "numeric comparison"
2056               else
2057                   print "string comparison"
2058           }
2059
2060       would perform a numeric comparison (and output numeric comparison)  for
2061       each  odd-numbered  line,  but  perform a string comparison (and output
2062       string comparison) for each even-numbered  line.  POSIX.1‐2008  ensures
2063       that  comparisons  will be numeric if necessary. With historical imple‐
2064       mentations, the following program:
2065
2066
2067           BEGIN {
2068               OFMT = "%e"
2069               print 3.14
2070               OFMT = "%f"
2071               print 3.14
2072           }
2073
2074       would output "3.140000e+00" twice, because in the second  print  state‐
2075       ment  the  constant  "3.14" would have a string value from the previous
2076       conversion. POSIX.1‐2008 requires that the output of the  second  print
2077       statement  be  "3.140000".   The behavior of historical implementations
2078       was seen as too unintuitive and unpredictable.
2079
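       The two programs above can be run as follows; this is a sketch with
       assumed two-line input and hypothetical shell variable names. Under
       the current rules the constant "+2" is a string, so an awk that
       follows them is expected to report a string comparison for every
       line, and the second print is expected to honor the new OFMT:

```shell
# First program: the result no longer depends on whether a + b ran.
cmp=$(printf 'one\ntwo\n' | awk '{
    a = "+2"
    b = 2
    if (NR % 2)
        c = a + b
    if (a == b)
        print "numeric comparison"
    else
        print "string comparison"
}')

# Second program: each print converts 3.14 with the OFMT then in effect.
ofmt=$(awk 'BEGIN { OFMT = "%e"; print 3.14; OFMT = "%f"; print 3.14 }')

printf '%s\n%s\n' "$cmp" "$ofmt"
```
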
2080       It was pointed out that with the rules contained in early  drafts,  the
2081       following script would print nothing:
2082
2083
2084           BEGIN {
2085               y[1.5] = 1
2086               OFMT = "%e"
2087               print y[1.5]
2088           }
2089
2090       Therefore,  a  new variable, CONVFMT, was introduced. The OFMT variable
2091       is now restricted to affecting output conversions of numbers to strings
2092       and  CONVFMT  is  used for internal conversions, such as comparisons or
2093       array indexing. The default value is the same  as  that  for  OFMT,  so
2094       unless  a  program  changes  CONVFMT (which no historical program would
2095       do), it will receive the historical behavior associated  with  internal
2096       string conversions.
2097
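       A small sketch of the split between the two variables (the format
       "%.2g" and the shell variable names are arbitrary choices for this
       example): changing OFMT leaves array subscripts alone, while CONVFMT
       controls conversions that are not part of output.

```shell
# OFMT does not affect the subscript, so y[1.5] is found and prints 1.
idx=$(awk 'BEGIN { y[1.5] = 1; OFMT = "%e"; print y[1.5] }')

# Concatenating a null string converts the number using CONVFMT.
conv=$(awk 'BEGIN { CONVFMT = "%.2g"; a = 3.14159; b = a ""; print b }')

printf '%s\n%s\n' "$idx" "$conv"
```
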
2098       The POSIX awk lexical and syntactic conventions are specified more for‐
2099       mally than in other sources. Again the intent has been to specify  his‐
2100       torical  practice. One convention that may not be obvious from the for‐
2101       mal grammar as in other verbal descriptions is where <newline>  charac‐
2102       ters  are acceptable. There are several obvious placements such as ter‐
2103       minating a statement, and a <backslash> can be used to escape <newline>
2104       characters  between  any lexical tokens. In addition, <newline> charac‐
2105       ters without <backslash> characters can follow a comma, an open  brace,
2106       a  logical  AND  operator  ("&&"), a logical OR operator ("||"), the do
2107       keyword, the else keyword, and the closing parenthesis of an  if,  for,
2108       or while statement. For example:
2109
2110
2111           { print $1,
2112                   $2 }
2113
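       A slightly larger sketch of the same convention, with assumed input
       and a hypothetical shell variable wrapped; it places unescaped
       <newline> characters after "&&", after the closing parenthesis of
       if, after a comma, and after else:

```shell
wrapped=$(printf 'x 5\ny 0\n' | awk '{
    if ($2 > 0 &&
        $1 != "")
        print $1,
              $2
    else
        print "skip"
}')
printf '%s\n' "$wrapped"
```
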
2114       The  requirement that awk add a trailing <newline> to the program argu‐
2115       ment text is to simplify the grammar, making it match a  text  file  in
2116       form.  There  is  no  way for an application or test suite to determine
2117       whether a literal <newline> is added or whether awk simply acts  as  if
2118       it did.
2119
2120       POSIX.1‐2008  requires  several changes from historical implementations
2121       in order to support internationalization. Probably the most  subtle  of
2122       these  is  the  use  of  the  decimal-point  character,  defined by the
2123       LC_NUMERIC category of the locale, in representations of floating-point
2124       numbers.  This locale-specific character is used in recognizing numeric
2125       input, in converting between strings and numeric values, and in format‐
2126       ting output. However, regardless of locale, the <period> character (the
2127       decimal-point character of the POSIX locale) is the decimal-point char‐
2128       acter  recognized  in processing awk programs (including assignments in
2129       command line arguments). This is essentially the same convention as the
2130       one  used  in the ISO C standard. The difference is that the C language
2131       includes the setlocale() function, which permits an application to mod‐
2132       ify its locale. Because of this capability, a C application begins exe‐
2133       cuting with its locale set to the C locale, and only  executes  in  the
2134       environment-specified  locale  after  an  explicit call to setlocale().
2135       However, adding such an elaborate new feature to the awk  language  was
2136       seen  as  inappropriate  for POSIX.1‐2008. It is possible to execute an
2137       awk program explicitly in any desired locale by setting the environment
2138       in the shell.
2139
2140       The  undefined behavior resulting from NULs in extended regular expres‐
2141       sions allows future extensions for the  GNU  gawk  program  to  process
2142       binary data.
2143
2144       The  behavior  in  the case of invalid awk programs (including lexical,
2145       syntactic, and semantic errors) is undefined because it was  considered
2146       overly  limiting  on  implementations  to  specify.  In most cases such
2147       errors can be expected to produce a diagnostic and a non-zero exit sta‐
2148       tus. However, some implementations may choose to extend the language in
2149       ways that make use of certain invalid constructs.  Other  invalid  con‐
2150       structs  might  be deemed worthy of a warning, but otherwise cause some
2151       reasonable behavior. Still other constructs may be  very  difficult  to
2152       detect  in some implementations.  Also, different implementations might
2153       detect a given error during an initial parsing of the  program  (before
2154       reading  any  input  files) while others might detect it when executing
2155       the program after reading some input. Implementors should be aware that
2156       diagnosing errors as early as possible and producing useful diagnostics
2157       can ease debugging of applications, and  thus  make  an  implementation
2158       more usable.
2159
2160       The  unspecified  behavior  from  using multi-character RS values is to
2161       allow possible future extensions based on extended regular  expressions
2162       used  for  record separators. Historical implementations take the first
2163       character of the string and ignore the others.
2164
2165       Unspecified behavior when  split(string,array,<null>)  is  used  is  to
2166       allow  a proposed future extension that would split up a string into an
2167       array of individual characters.
2168
2169       In the context of the getline function, equally good arguments for dif‐
2170       ferent  precedences  of  the  | and < operators can be made. Historical
2171       practice has been that:
2172
2173
2174           getline < "a" "b"
2175
2176       is parsed as:
2177
2178
2179           ( getline < "a" ) "b"
2180
2181       although many would argue that the intent was that the file  ab  should
2182       be read. However:
2183
2184
2185           getline < "x" + 1
2186
2187       parses as:
2188
2189
2190           getline < ( "x" + 1 )
2191
2192       Similar  problems  occur with the | version of getline, particularly in
2193       combination with $.  For example:
2194
2195
2196           $"echo hi" | getline
2197
2198       (This situation is particularly problematic when used in a print state‐
2199       ment, where the |getline part might be a redirection of the print.)
2200
2201       Since in most cases such constructs are not used, or at least should
2202       not be (because they have a natural ambiguity for which there is no
2203       conventional parsing), the meaning of these constructs has been made
2204       explicitly unspecified. (The effect is that  a  conforming  application
2205       that runs into the problem must parenthesize to resolve the ambiguity.)
2206       There appeared to be few if any actual uses of such constructs.
2207
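       One way to write such constructs unambiguously is sketched below. It
       assumes the mktemp utility is available (it is not part of this
       standard), and the file name ab, the variable d, and the shell
       variable names are illustrative only:

```shell
dir=$(mktemp -d)
printf 'from file ab\n' > "$dir/ab"

# Parenthesize the concatenation so the whole string names the file,
# and parenthesize the getline expression before comparing its result.
from_file=$(awk -v d="$dir" 'BEGIN {
    if ((getline line < (d "/" "ab")) > 0)
        print line
}')

# The same applies to the command | getline form.
from_cmd=$(awk 'BEGIN {
    cmd = "echo hi"
    if ((cmd | getline line) > 0)
        print line
    close(cmd)
}')

rm -rf "$dir"
printf '%s\n%s\n' "$from_file" "$from_cmd"
```
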
2208       Grammars can be written that would cause an error under  these  circum‐
2209       stances.  Where  backwards-compatibility  is not a large consideration,
2210       implementors may wish to use such grammars.
2211
2212       Some historical implementations have allowed some built-in functions to
2213       be called without an argument list, the result being a default argument
2214       list chosen in some ``reasonable'' way. Use of length as a synonym  for
2215       length($0)  is the only one of these forms that is thought to be widely
2216       known or widely used; this particular form  is  documented  in  various
2217       places  (for example, most historical awk reference pages, although not
2218       in the referenced The AWK Programming Language) as legitimate practice.
2219       With  this  exception,  default argument lists have always been undocu‐
2220       mented and vaguely defined, and it is not at all clear how (or if) they
2221       should  be  generalized  to  user-defined functions. They add no useful
2222       functionality and preclude possible future extensions that  might  need
2223       to  name  functions  without calling them. Not standardizing them seems
2224       the simplest course. The standard  developers  considered  that  length
2225       merited special treatment, however, since it has been documented in the
2226       past and sees possibly substantial use in historical programs.  Accord‐
2227       ingly,  this  usage  has  been made legitimate, but Issue 5 removed the
2228       obsolescent marking for XSI-conforming implementations and many  other‐
2229       wise conforming applications depend on this feature.
2230
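       For illustration (assumed input, hypothetical shell variable len),
       the bare form and the explicit form give the same result:

```shell
# length with no argument list is a synonym for length($0).
len=$(printf 'hello\n' | awk '{ print length, length($0) }')
printf '%s\n' "$len"
```
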
2231       In  sub  and  gsub,  if  repl  is  a  string literal (the lexical token
2232       STRING), then two consecutive <backslash> characters should be used  in
2233       the  string to ensure a single <backslash> will precede the <ampersand>
2234       when the resultant string is passed to the function. (For  example,  to
2235       specify   one  literal  <ampersand>  in  the  replacement  string,  use
2236       gsub(ERE, "\\&").)
2237
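       The difference between a plain and an escaped <ampersand> can be
       sketched as follows (the sample input and shell variable names are
       assumptions for this example):

```shell
# An unescaped "&" in the replacement stands for the matched text.
matched=$(printf 'a b\n' | awk '{ gsub(/b/, "[&]"); print }')

# "\\&" in the program text reaches gsub as \&, a literal ampersand.
literal=$(printf 'a b\n' | awk '{ gsub(/b/, "\\&"); print }')

printf '%s\n%s\n' "$matched" "$literal"
```

       This is expected to print a [b] and then a &.
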
2238       Historically, the only special character in the repl  argument  of  sub
2239       and  gsub string functions was the <ampersand> ('&') character and pre‐
2240       ceding it with the <backslash> character was used to turn off its  spe‐
2241       cial meaning.
2242
2243       The  description  in  the ISO POSIX‐2:1993 standard introduced behavior
2244       such that the <backslash> character was another special  character  and
2245       it  was  unspecified  whether  there were any other special characters.
2246       This description introduced several portability problems, some of which
2247       are described below, and so it has been replaced with the more histori‐
2248       cal description. Some of the problems include:
2249
2250        *  Historically, to create the replacement string, a script could  use
2251           gsub(ERE,  "\\&"),  but with the ISO POSIX‐2:1993 standard wording,
2252           it was necessary to use gsub(ERE, "\\\\&").  The <backslash>  char‐
2253           acters  are doubled here because all string literals are subject to
2254           lexical analysis, which would reduce each pair of <backslash> char‐
2255           acters to a single <backslash> before being passed to gsub.
2256
2257        *  Since it was unspecified what the special characters were, for por‐
2258           table scripts to guarantee that characters are  printed  literally,
2259           each  character  had to be preceded with a <backslash>.  (For exam‐
2260           ple, a portable script had to use gsub(ERE, "\\h\\i") to produce  a
2261           replacement string of "hi".)
2262
2263       The  description  for  comparisons in the ISO POSIX‐2:1993 standard did
2264       not properly describe historical practice because of  the  way  numeric
2265       strings  are compared as numbers. The current rules cause the following
2266       code:
2267
2268
2269           if (0 == "000")
2270               print "strange, but true"
2271           else
2272               print "not true"
2273
2274       to do a numeric comparison, causing the if to  succeed.  It  should  be
2275       intuitively  obvious  that  this  is incorrect behavior, and indeed, no
2276       historical implementation of awk actually behaves this way.
2277
2278       To fix this problem, the definition of numeric string was  enhanced  to
2279       include  only those values obtained from specific circumstances (mostly
2280       external sources) where it is not possible to  determine  unambiguously
2281       whether the value is intended to be a string or a numeric.
2282
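       A minimal sketch of the resulting distinction, assuming an awk that
       follows the refined definition (the input line and shell variable
       names are illustrative): a string constant in the program is compared
       as a string, while the same characters read from input form a numeric
       string and are compared as a number.

```shell
# The constant "000" is a string, so "0" is compared with "000".
const=$(awk 'BEGIN {
    if (0 == "000") print "strange, but true"; else print "not true"
}')

# The field $1 comes from input and looks numeric, so it is compared as 0.
field=$(printf '000\n' | awk '{
    if ($1 == 0) print "numeric match"; else print "no match"
}')

printf '%s\n%s\n' "$const" "$field"
```
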
2283       Variables  that  are assigned to a numeric string shall also be treated
2284       as a numeric string. (For example, the notion of a numeric  string  can
2285       be propagated across assignments.) In comparisons, all variables having
2286       the uninitialized value are to be treated as a numeric operand evaluat‐
2287       ing to the numeric value zero.
2288
2289       Uninitialized  variables  include  all  types  of  variables  including
2290       scalars, array elements, and fields. The definition of an uninitialized
2291       value  in  Variables and Special Variables is necessary to describe the
2292       value placed on uninitialized variables and on fields  that  are  valid
2293       (for example, < $NF) but have no characters in them and to describe how
2294       these variables are to be used in comparisons. A valid field,  such  as
2295       $1,  that has no characters in it can be obtained from an input line of
2296       "\t\t" when FS='\t'.  Historically, the  comparison  ($1<10)  was  done
2297       numerically after evaluating $1 to the value zero.
2298
2299       The  phrase  ``...  also  shall  have  the numeric value of the numeric
2300       string'' was removed from  several  sections  of  the  ISO POSIX‐2:1993
2301       standard because it specifies an unnecessary implementation detail. It
2302       is not necessary for POSIX.1‐2008 to  specify  that  these  objects  be
2303       assigned  two  different  values.  It is only necessary to specify that
2304       these objects may evaluate to two different values  depending  on  con‐
2305       text.
2306
2307       Historical  implementations of awk did not parse hexadecimal integer or
2308       floating constants like "0xa" and "0xap0".  Due to  an  oversight,  the
2309       2001  through 2004 editions of this standard required support for hexa‐
2310       decimal floating constants. This was due to the  reference  to  atof().
2311       This  version  of  the standard allows but does not require implementa‐
2312       tions to use atof() and includes a description  of  how  floating-point
2313       numbers  are  recognized  as an alternative to match historic behavior.
2314       The intent of this change is  to  allow  implementations  to  recognize
2315       floating-point  constants  according  to  either  the ISO/IEC 9899:1990
2316       standard or ISO/IEC 9899:1999 standard, and to allow (but not  require)
2317       implementations to recognize hexadecimal integer constants.
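
       Because recognition of hexadecimal constants in numeric strings is
       allowed but not required, the result of converting "0xa" depends on
       the implementation. The following probe is a sketch only; the
       variable name hex and the messages are illustrative assumptions.

```sh
# Probe how the local awk converts a hexadecimal-looking numeric string.
hex=$(awk 'BEGIN { print ("0xa" + 0) }')
case "$hex" in
    0)  echo "\"0xa\" converts to 0 (historical behavior)" ;;
    10) echo "\"0xa\" converts to 10 (hexadecimal recognized)" ;;
    *)  echo "unexpected result: $hex" ;;
esac
```

       Portable programs should not rely on either result.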
2318
2319       Historical  implementations  of  awk  did  not  support  floating-point
2320       infinities and NaNs in numeric strings; e.g., "-INF" and  "NaN".   How‐
2321       ever,  implementations  that use the atof() or strtod() functions to do
2322       the conversion picked up support  for  these  values  if  they  used  an
2323       ISO/IEC 9899:1999  standard  version  of  the  function  instead  of  an
2324       ISO/IEC 9899:1990 standard version.  Due  to  an  oversight,  the  2001
2325       through  2004  editions  of  this  standard  did  not allow support for
2326       infinities and NaNs, but in this revision support is allowed  (but  not
2327       required). This is a silent change to the behavior of awk programs; for
2328       example, in the POSIX locale the expression:
2329
2330
2331           ("-INF" + 0 < 0)
2332
2333       formerly had the value 0 because "-INF" converted to 0, but now it  may
2334       have the value 0 or 1.
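
       A hedged way to check which behavior a given awk exhibits is to
       evaluate the expression directly. The variable name inf and the
       messages below are illustrative assumptions; either result is
       permitted by this revision.

```sh
# Evaluate the expression from the text and report which case applies.
inf=$(awk 'BEGIN { print ("-INF" + 0 < 0) }')
case "$inf" in
    0) echo "\"-INF\" converted to 0 (no infinity support)" ;;
    1) echo "\"-INF\" converted to negative infinity" ;;
    *) echo "unexpected result: $inf" ;;
esac
```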
2335

FUTURE DIRECTIONS

2337       A  future version of this standard may require the "!=" and "==" opera‐
2338       tors to perform string comparisons by checking if the strings are iden‐
2339       tical (and not by checking if they collate equally).
2340

COPYRIGHT

2353       Portions  of  this text are reprinted and reproduced in electronic form
2354       from IEEE Std 1003.1-2017, Standard for Information Technology --  Por‐
2355       table  Operating System Interface (POSIX), The Open Group Base Specifi‐
2356       cations Issue 7, 2018 Edition, Copyright (C) 2018 by the  Institute  of
2357       Electrical  and  Electronics Engineers, Inc and The Open Group.  In the
2358       event of any discrepancy between this version and the original IEEE and
2359       The  Open Group Standard, the original IEEE and The Open Group Standard
2360       is the referee document. The original Standard can be  obtained  online
2361       at http://www.opengroup.org/unix/online.html .
2362
2363       Any  typographical  or  formatting  errors that appear in this page are
2364       most likely to have been introduced during the conversion of the source
2365       files  to  man page format. To report such errors, see
2366       https://www.kernel.org/doc/man-pages/reporting_bugs.html .
2367
2368
2369
2370IEEE/The Open Group                  2017                              AWK(1P)