nawk(1)                          User Commands                         nawk(1)

NAME

       nawk - pattern scanning and processing language

SYNOPSIS

       /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile...
            [argument]...

       /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile...
            [argument]...

DESCRIPTION

       The /usr/bin/nawk and /usr/xpg4/bin/awk utilities execute programs
       written in the nawk programming language, which is specialized for
       textual data manipulation. A nawk program is a sequence of patterns
       and corresponding actions. The string specifying program must be
       enclosed in single quotes (') to protect it from interpretation by
       the shell. The sequence of pattern-action statements can be specified
       on the command line as program, or in one or more files specified by
       the -f progfile option. When input is read that matches a pattern,
       the action associated with the pattern is performed.

       Input is interpreted as a sequence of records. By default, a record
       is a line, but this can be changed by using the RS built-in variable.
       Each record of input is matched against each pattern in the program.
       For each pattern matched, the associated action is executed.

       The nawk utility interprets each input record as a sequence of fields
       where, by default, a field is a string of non-blank characters. This
       default white-space field delimiter (blanks and/or tabs) can be
       changed by using the FS built-in variable or the -F ERE option. The
       nawk utility denotes the first field in a record $1, the second $2,
       and so forth. The symbol $0 refers to the entire record; setting any
       other field causes the reevaluation of $0. Assigning to $0 resets the
       values of all fields and the NF built-in variable.

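       The field rules above can be sketched with a short program run from
       the shell. This is an illustration only, not part of the original
       page; it assumes a POSIX-conforming awk is on the PATH as awk (on the
       systems described here, nawk or /usr/xpg4/bin/awk could be
       substituted).

```shell
out=$(printf 'alpha beta gamma\n' | awk '{
    print NF            # number of fields in the record
    print $1            # first field
    $2 = "BETA"         # assigning a field causes $0 to be rebuilt
    print $0
    $0 = "x y"          # assigning $0 re-splits the record and resets NF
    print NF
}')
printf '%s\n' "$out"
```

       The four lines printed should be 3, alpha, alpha BETA gamma, and 2.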

OPTIONS

       The following options are supported:

       -F ERE           Define the input field separator to be the extended
                        regular expression ERE, before any input is read
                        (ERE can be a single character).

       -f progfile      Specifies the pathname of the file progfile containing
                        a nawk program. If multiple instances of this option
                        are specified, the concatenation of the files
                        specified as progfile in the order specified is the
                        nawk program. The nawk program can alternatively be
                        specified in the command line as a single argument.

       -v assignment    The assignment argument must be in the same form as an
                        assignment operand. The assignment is of the form
                        var=value, where var is the name of one of the
                        variables described below. The specified assignment
                        occurs before executing the nawk program, including
                        the actions associated with BEGIN patterns (if any).
                        Multiple occurrences of this option can be specified.


OPERANDS

       The following operands are supported:

       program     If no -f option is specified, the first operand to nawk is
                   the text of the nawk program. The application supplies the
                   program operand as a single argument to nawk. If the text
                   does not end in a newline character, nawk interprets the
                   text as if it did.

       argument    Either of the following two types of argument can be
                   intermixed:

                   file          A pathname of a file that contains the input
                                 to be read, which is matched against the set
                                 of patterns in the program. If no file
                                 operands are specified, or if a file operand
                                 is -, the standard input is used.

                   assignment    An operand that begins with an underscore or
                                 alphabetic character from the portable
                                 character set, followed by a sequence of
                                 underscores, digits and alphabetics from the
                                 portable character set, followed by the =
                                 character specifies a variable assignment
                                 rather than a pathname. The characters before
                                 the = represent the name of a nawk variable.
                                 If that name is a nawk reserved word, the
                                 behavior is undefined. The characters
                                 following the equal sign are interpreted as
                                 if they appeared in the nawk program preceded
                                 and followed by a double-quote (") character,
                                 as a STRING token, except that if the last
                                 character is an unescaped backslash, it is
                                 interpreted as a literal backslash rather
                                 than as the first character of the sequence
                                 \". The variable is assigned the value of
                                 that STRING token. If the value is considered
                                 a numeric string, the variable is also
                                 assigned its numeric value. Each such
                                 variable assignment is performed just before
                                 the processing of the following file, if any.
                                 Thus, an assignment before the first file
                                 argument is executed after the BEGIN actions
                                 (if any), while an assignment after the last
                                 file argument is executed before the END
                                 actions (if any). If there are no file
                                 arguments, assignments are executed before
                                 processing the standard input.

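       The difference in timing between a -v assignment and an assignment
       operand can be seen in the following sketch. It is illustrative only
       and assumes a POSIX-conforming awk on the PATH as awk; the variable
       names a and b are arbitrary.

```shell
out=$(printf 'line\n' | awk -v a=1 '
BEGIN { print "BEGIN a=" a " b=" b }   # -v value is set, operand value is not yet
      { print "main a=" a " b=" b }    # both are set once input is processed
' b=2)
printf '%s\n' "$out"
```

       Because there are no file operands, b=2 takes effect just before the
       standard input is processed, which is after the BEGIN action has run.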

INPUT FILES

       Input files to the nawk program from any of the following sources:

           o      any file operands or their equivalents, achieved by
                  modifying the nawk variables ARGV and ARGC

           o      standard input in the absence of any file operands

           o      arguments to the getline function

       must be text files. Whether or not the variable RS is set to a value
       other than a newline character, for these files, implementations
       support records terminated with the specified separator up to
       {LINE_MAX} bytes and can support longer records.

       If -f progfile is specified, the files named by each of the progfile
       option-arguments must be text files containing a nawk program.

       The standard input is used only if no file operands are specified, or
       if a file operand is -.


EXTENDED DESCRIPTION

       A nawk program is composed of pairs of the form:

         pattern { action }

       Either the pattern or the action (including the enclosing brace
       characters) can be omitted. Pattern-action statements are separated by
       a semicolon or by a newline.

       A missing pattern matches any record of input, and a missing action is
       equivalent to an action that writes the matched record of input to
       standard output.

       Execution of the nawk program starts by first executing the actions
       associated with all BEGIN patterns in the order they occur in the
       program. Then each file operand (or standard input if no files were
       specified) is processed by reading data from the file until a record
       separator is seen (a newline character by default), splitting the
       current record into fields using the current value of FS, evaluating
       each pattern in the program in the order of occurrence, and executing
       the action associated with each pattern that matches the current
       record. The action for a matching pattern is executed before
       evaluating subsequent patterns. Last, the actions associated with all
       END patterns are executed in the order they occur in the program.

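       The execution order described above can be sketched as follows. The
       example is illustrative and assumes a POSIX-conforming awk on the PATH
       as awk. The END pattern is deliberately written first to show that
       position in the program does not change when it runs.

```shell
out=$(printf 'one\ntwo\n' | awk '
END   { print "end: " NR " records" }
BEGIN { print "begin" }
/o/   { print "has o: " $0 }
/t/   { print "has t: " $0 }
')
printf '%s\n' "$out"
```

       For each record, the patterns are tried in program order, so the
       record two produces its /o/ line before its /t/ line.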
   Expressions in nawk
       Expressions describe computations used in patterns and actions. In the
       following table, valid expression operations are given in groups from
       highest precedence first to lowest precedence last, with
       equal-precedence operators grouped between horizontal lines. In
       expression evaluation, where the grammar is formally ambiguous, higher
       precedence operators are evaluated before lower precedence operators.
       In this table expr, expr1, expr2, and expr3 represent any expression,
       while lvalue represents any entity that can be assigned to (that is,
       on the left side of an assignment operator).

           Syntax                  Name              Type of Result     Associativity
       ────────────────────────────────────────────────────────────────────────────────
       ( expr )          Grouping                   type of expr        n/a
       ────────────────────────────────────────────────────────────────────────────────
       $expr             Field reference            string              n/a
       ────────────────────────────────────────────────────────────────────────────────
       ++ lvalue         Pre-increment              numeric             n/a
       -- lvalue         Pre-decrement              numeric             n/a
       lvalue ++         Post-increment             numeric             n/a
       lvalue --         Post-decrement             numeric             n/a
       ────────────────────────────────────────────────────────────────────────────────
       expr ^ expr       Exponentiation             numeric             right
       ────────────────────────────────────────────────────────────────────────────────
       ! expr            Logical not                numeric             n/a
       + expr            Unary plus                 numeric             n/a
       - expr            Unary minus                numeric             n/a
       ────────────────────────────────────────────────────────────────────────────────
       expr * expr       Multiplication             numeric             left
       expr / expr       Division                   numeric             left
       expr % expr       Modulus                    numeric             left
       ────────────────────────────────────────────────────────────────────────────────
       expr + expr       Addition                   numeric             left
       expr - expr       Subtraction                numeric             left
       ────────────────────────────────────────────────────────────────────────────────
       expr expr         String concatenation       string              left
       ────────────────────────────────────────────────────────────────────────────────
       expr < expr       Less than                  numeric             none
       expr <= expr      Less than or equal to      numeric             none
       expr != expr      Not equal to               numeric             none
       expr == expr      Equal to                   numeric             none
       expr > expr       Greater than               numeric             none
       expr >= expr      Greater than or equal to   numeric             none
       ────────────────────────────────────────────────────────────────────────────────
       expr ~ expr       ERE match                  numeric             none
       expr !~ expr      ERE non-match              numeric             none
       ────────────────────────────────────────────────────────────────────────────────
       expr in array     Array membership           numeric             left
       ( index ) in      Multi-dimension array      numeric             left
           array             membership
       ────────────────────────────────────────────────────────────────────────────────
       expr && expr      Logical AND                numeric             left
       ────────────────────────────────────────────────────────────────────────────────
       expr || expr      Logical OR                 numeric             left
       ────────────────────────────────────────────────────────────────────────────────
       expr1 ? expr2     Conditional expression     type of selected    right
           : expr3                                     expr2 or expr3
       ────────────────────────────────────────────────────────────────────────────────
       lvalue ^= expr    Exponentiation             numeric             right
                         assignment
       lvalue %= expr    Modulus assignment         numeric             right
       lvalue *= expr    Multiplication             numeric             right
                         assignment
       lvalue /= expr    Division assignment        numeric             right
       lvalue += expr    Addition assignment        numeric             right
       lvalue -= expr    Subtraction assignment     numeric             right
       lvalue = expr     Assignment                 type of expr        right

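       A few consequences of the precedence and associativity rules in the
       table can be checked with the sketch below. It is illustrative only
       and assumes a POSIX-conforming awk on the PATH as awk.

```shell
out=$(awk 'BEGIN {
    print 1 " " 2 + 3    # addition binds tighter than concatenation
    print 2 ^ 3 ^ 2      # exponentiation is right associative: 2 ^ (3 ^ 2)
    print -2 ^ 2         # exponentiation binds tighter than unary minus
}')
printf '%s\n' "$out"
```

       The output should be 1 5, then 512, then -4.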
       Each expression has either a string value, a numeric value or both.
       Except as stated for specific contexts, the value of an expression is
       implicitly converted to the type needed for the context in which it is
       used. A string value is converted to a numeric value by the equivalent
       of the following calls:

         setlocale(LC_NUMERIC, "");
         numeric_value = atof(string_value);

       A numeric value that is exactly equal to the value of an integer is
       converted to a string by the equivalent of a call to the sprintf
       function with the string %d as the fmt argument and the numeric value
       being converted as the first and only expr argument. Any other numeric
       value is converted to a string by the equivalent of a call to the
       sprintf function with the value of the variable CONVFMT as the fmt
       argument and the numeric value being converted as the first and only
       expr argument.

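       The conversions described above can be sketched as follows. This is
       an illustration only; it assumes a POSIX-conforming awk on the PATH as
       awk that supports CONVFMT (which this page lists for
       /usr/xpg4/bin/awk only) and a locale whose decimal point is a period.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    a = 12               # integral value: converted as if by "%d"
    b = 3.14159          # non-integral value: converted using CONVFMT
    s = a ""             # concatenation forces number-to-string conversion
    t = b ""
    print s " " t
    print "3.0x" + 1     # string-to-number uses the leading numeric prefix
}')
printf '%s\n' "$out"
```

       The first line printed should be 12 3.14 and the second 4.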
       A string value is considered to be a numeric string in the following
       case:

           1.     Any leading and trailing blank characters are ignored.

           2.     If the first unignored character is a + or -, it is ignored.

           3.     If the remaining unignored characters would be lexically
                  recognized as a NUMBER token, the string is considered a
                  numeric string.

       If a - character is ignored in the above steps, the numeric value of
       the numeric string is the negation of the numeric value of the
       recognized NUMBER token. Otherwise the numeric value of the numeric
       string is the numeric value of the recognized NUMBER token. Whether or
       not a string is a numeric string is relevant only in contexts where
       that term is used in this section.

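       One place where the numeric string distinction is visible is in
       comparisons: fields that look numeric compare as numbers, while string
       constants compare as strings. The sketch below is illustrative only
       and assumes a POSIX-conforming awk on the PATH as awk.

```shell
out=$(printf '10 9 +3.0\n' | awk '{
    print ($1 > $2)        # fields are numeric strings: 10 > 9 numerically
    print ("10" > "9")     # string constants: "10" sorts before "9"
    print ($3 == 3)        # the field +3.0 has the numeric value 3
    print ("+3.0" == 3)    # a string constant is compared as a string
}')
printf '%s\n' "$out"
```

       The four lines printed should be 1, 0, 1, and 0.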
       When an expression is used in a Boolean context, if it has a numeric
       value, a value of zero is treated as false and any other value is
       treated as true. Otherwise, a string value of the null string is
       treated as false and any other value is treated as true. A Boolean
       context is one of the following:

           o      the first subexpression of a conditional expression.

           o      an expression operated on by logical NOT, logical AND, or
                  logical OR.

           o      the second expression of a for statement.

           o      the expression of an if statement.

           o      the expression of the while clause in either a while or do
                  ... while statement.

           o      an expression used as a pattern (as in Overall Program
                  Structure).

       The nawk language supplies arrays that are used for storing numbers or
       strings. Arrays need not be declared. They are initially empty, and
       their size changes dynamically. The subscripts, or element
       identifiers, are strings, providing a type of associative array
       capability. An array name followed by a subscript within square
       brackets can be used as an lvalue and as an expression, as described
       in the grammar. Unsubscripted array names are used in only the
       following contexts:

           o      a parameter in a function definition or function call.

           o      the NAME token following any use of the keyword in.

       A valid array index consists of one or more comma-separated
       expressions, similar to the way in which multi-dimensional arrays are
       indexed in some programming languages. Because nawk arrays are really
       one-dimensional, such a comma-separated list is converted to a single
       string by concatenating the string values of the separate expressions,
       each separated from the other by the value of the SUBSEP variable.

       Thus, the following two index operations are equivalent:

         var[expr1, expr2, ... exprn]
         var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       A multi-dimensioned index used with the in operator must be put in
       parentheses. The in operator, which tests for the existence of a
       particular array element, does not create the element if it does not
       exist. Any other reference to a non-existent array element
       automatically creates it.

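       The array rules above can be sketched as follows. The example is
       illustrative only and assumes a POSIX-conforming awk on the PATH as
       awk; the array name grid is arbitrary.

```shell
out=$(awk 'BEGIN {
    grid[1, 2] = "x"                         # stored under the key 1 SUBSEP 2
    if ((1, 2) in grid) print "found"
    if ((1 SUBSEP 2) in grid) print "same key"
    if (!((9, 9) in grid)) print "absent"
    n = 0
    for (k in grid) n++
    print n                                  # 1: the in test created nothing
    if (grid[9, 9] == "") n = 0              # a plain reference creates the element
    for (k in grid) n++
    print n                                  # 2
}')
printf '%s\n' "$out"
```

       Testing with in before reading an element is the usual way to avoid
       creating empty elements as a side effect.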
   Variables and Special Variables
       Variables can be used in a nawk program by referencing them. With the
       exception of function parameters, they are not explicitly declared.
       Uninitialized scalar variables and array elements have both a numeric
       value of zero and a string value of the empty string.

       Field variables are designated by a $ followed by a number or
       numerical expression. The effect of the field number expression
       evaluating to anything other than a non-negative integer is
       unspecified. Uninitialized variables or string values need not be
       converted to numeric values in this context. New field variables are
       created by assigning a value to them. References to non-existent
       fields (that is, fields after $NF) produce the null string. However,
       assigning to a non-existent field (for example, $(NF+2) = 5) increases
       the value of NF, creates any intervening fields with the null string
       as their values, and causes the value of $0 to be recomputed, with the
       fields being separated by the value of OFS. Each field variable has a
       string value when created. If the string, with any occurrence of the
       decimal-point character from the current locale changed to a period
       character, is considered a numeric string (see Expressions in nawk
       above), the field variable also has the numeric value of the numeric
       string.

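       The effect of reading and assigning fields beyond NF can be sketched
       as follows. The example is illustrative only and assumes a
       POSIX-conforming awk on the PATH as awk; OFS is set to a hyphen purely
       to make the rebuilt record easy to read.

```shell
out=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } {
    print ($5 == "")     # a field after $NF reads as the null string
    $(NF + 2) = "d"      # extends the record from two fields to four
    print NF
    print $0             # $0 is rebuilt with OFS between the fields
}')
printf '%s\n' "$out"
```

       The output should be 1, then 4, then a-b--d, where the doubled hyphen
       marks the empty intervening third field.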
   /usr/bin/nawk, /usr/xpg4/bin/awk
       nawk sets the following special variables that are supported by both
       /usr/bin/nawk and /usr/xpg4/bin/awk:

       ARGC        The number of elements in the ARGV array.

       ARGV        An array of command line arguments, excluding options and
                   the program argument, numbered from zero to ARGC-1.

                   The arguments in ARGV can be modified or added to; ARGC can
                   be altered. As each input file ends, nawk treats the next
                   non-null element of ARGV, up to the current value of
                   ARGC-1, inclusive, as the name of the next input file.
                   Setting an element of ARGV to null means that it is not
                   treated as an input file. The name - indicates the
                   standard input. If an argument matches the format of an
                   assignment operand, this argument is treated as an
                   assignment rather than a file argument.

       ENVIRON     The variable ENVIRON is an array representing the value of
                   the environment. The indices of the array are strings
                   consisting of the names of the environment variables, and
                   the value of each array element is a string consisting of
                   the value of that variable. If the value of an environment
                   variable is considered a numeric string, the array element
                   also has its numeric value.

                   In all cases where nawk behavior is affected by environment
                   variables (including the environment of any commands that
                   nawk executes via the system function or via pipeline
                   redirections with the print statement, the printf
                   statement, or the getline function), the environment used
                   is the environment at the time nawk began executing.

       FILENAME    A pathname of the current input file. Inside a BEGIN action
                   the value is undefined. Inside an END action the value is
                   the name of the last input file processed.

       FNR         The ordinal number of the current record in the current
                   file. Inside a BEGIN action the value is zero. Inside an
                   END action the value is the number of the last record
                   processed in the last file processed.

       FS          Input field separator regular expression; a space character
                   by default.

       NF          The number of fields in the current record. Inside a BEGIN
                   action, the use of NF is undefined unless a getline
                   function without a var argument is executed previously.
                   Inside an END action, NF retains the value it had for the
                   last record read, unless a subsequent, redirected, getline
                   function without a var argument is performed prior to
                   entering the END action.

       NR          The ordinal number of the current record from the start of
                   input. Inside a BEGIN action the value is zero. Inside an
                   END action the value is the number of the last record
                   processed.

       OFMT        The printf format for converting numbers to strings in
                   output statements; "%.6g" by default. The result of the
                   conversion is unspecified if the value of OFMT is not a
                   floating-point format specification.

       OFS         The print statement output field separator; a space
                   character by default.

       ORS         The print output record separator; a newline character by
                   default.

       RLENGTH     The length of the string matched by the match function.

       RS          The first character of the string value of RS is the input
                   record separator; a newline character by default. If RS
                   contains more than one character, the results are
                   unspecified. If RS is null, then records are separated by
                   sequences of one or more blank lines. Leading or trailing
                   blank lines do not produce empty records at the beginning
                   or end of input, and a newline is always a field
                   separator, no matter what the value of FS.

       RSTART      The starting position of the string matched by the match
                   function, numbering from 1. This is always equivalent to
                   the return value of the match function.

       SUBSEP      The subscript separator string for multi-dimensional
                   arrays. The default value is \034.

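       Several of the special variables above can be observed with the
       sketch below. It is illustrative only and assumes a POSIX-conforming
       awk on the PATH as awk and a mktemp that supports -d. The file names
       one and two, the path /no/such/file, and the environment variable
       GREETING are made up for the example.

```shell
# FILENAME, FNR and NR across two input files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"
counts=$(cd "$tmp" && awk '{ print FILENAME, FNR, NR }' one two)
rm -rf "$tmp"

# RS set to the null string: blank lines separate records, and a newline
# inside a record also separates fields.
paras=$(printf 'a b\nc\n\n\nd\ne f\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF }')

# match() sets RSTART and RLENGTH.
pos=$(awk 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }')

# ENVIRON exposes the environment awk was started with.
env_val=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

# A null ARGV element is skipped; the operand - names the standard input.
args=$(printf 'hello\n' |
    awk 'BEGIN { ARGV[1] = "" } { print "read: " $0 }' /no/such/file -)

printf '%s\n' "$counts" "$paras" "$pos" "$env_val" "$args"
```

       FNR restarts at 1 for the second file while NR keeps counting, and
       both paragraph-mode records report three fields.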
   /usr/xpg4/bin/awk
       The following variable is supported for /usr/xpg4/bin/awk only:

       CONVFMT    The printf format for converting numbers to strings (except
                  for output statements, where OFMT is used). The default is
                  %.6g.

   Regular Expressions
       The nawk utility makes use of the extended regular expression notation
       (see regex(5)) except that it allows the use of C-language conventions
       to escape special characters within the EREs, namely \\, \a, \b, \f,
       \n, \r, \t, \v, and those specified in the following table. These
       escape sequences are recognized both inside and outside bracket
       expressions. Note that records need not be separated by newline
       characters and string constants can contain newline characters, so
       even the \n sequence is valid in nawk EREs. Using a slash character
       within the regular expression requires escaping as shown in the table
       below:

       Escape Sequence   Description                Meaning
       ───────────────────────────────────────────────────────────────────────
       \"                Backslash quotation-mark   Quotation-mark character
       ───────────────────────────────────────────────────────────────────────
       \/                Backslash slash            Slash character
       ───────────────────────────────────────────────────────────────────────
       \ddd              A backslash character      The character encoded by
                         followed by the longest    the one-, two- or three-
                         sequence of one, two, or   digit octal integer.
                         three octal-digit          Multi-byte characters
                         characters (01234567).     require multiple,
                         If all of the digits are   concatenated escape
                         0 (that is,                sequences, including the
                         representation of the      leading \ for each byte.
                         NULL character), the
                         behavior is undefined.
       ───────────────────────────────────────────────────────────────────────
       \c                A backslash character      Undefined
                         followed by any
                         character not described
                         in this table or special
                         characters (\\, \a, \b,
                         \f, \n, \r, \t, \v).

       A regular expression can be matched against a specific field or string
       by using one of the two regular expression matching operators, ~ and
       !~. These operators interpret their right-hand operand as a regular
       expression and their left-hand operand as a string. If the regular
       expression matches the string, the ~ expression evaluates to the value
       1, and the !~ expression evaluates to the value 0. If the regular
       expression does not match the string, the ~ expression evaluates to
       the value 0, and the !~ expression evaluates to the value 1. If the
       right-hand operand is any expression other than the lexical token ERE,
       the string value of the expression is interpreted as an extended
       regular expression, including the escape conventions described above.
       Notice that these same escape conventions are also applied in
       determining the value of a string literal (the lexical token STRING),
       and so are applied a second time when a string literal is used in this
       context.

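       The double application of escapes to string literals means that a
       backslash intended for the regular expression has to be written twice
       inside a string. The sketch below is illustrative only and assumes a
       POSIX-conforming awk on the PATH as awk.

```shell
out=$(printf 'a.b\naxb\n' | awk '{
    lit = ($0 ~ /a\.b/)      # ERE token: \. matches a literal period
    str = ($0 ~ "a\\.b")     # string: \\ becomes \ first, then \. is the ERE
    any = ($0 ~ "a.b")       # unescaped period matches any character
    print lit, str, any
}')
printf '%s\n' "$out"
```

       The record a.b should give 1 1 1 and the record axb should give 0 0 1.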
       When an ERE token appears as an expression in any context other than
       as the right-hand side of the ~ or !~ operator or as one of the
       built-in function arguments described below, the value of the
       resulting expression is the equivalent of:

         $0 ~ /ere/

       The ere argument to the gsub, match, and sub functions, and the fs
       argument to the split function (see String Functions), are interpreted
       as extended regular expressions. These can be either ERE tokens or
       arbitrary expressions, and are interpreted in the same manner as the
       right-hand side of the ~ or !~ operator.

       An extended regular expression can be used to separate fields by using
       the -F ERE option or by assigning a string containing the expression
       to the built-in variable FS. The default value of the FS variable is a
       single space character. The following describes FS behavior:

           1.     If FS is a single character:

               o      If FS is the space character, skip leading and trailing
                      blank characters; fields are delimited by sets of one or
                      more blank characters.

               o      Otherwise, if FS is any other character c, fields are
                      delimited by each single occurrence of c.

           2.     Otherwise, the string value of FS is considered to be an
                  extended regular expression. Each occurrence of a sequence
                  matching the extended regular expression delimits fields.

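       The three FS cases above can be compared on a single input line, as in
       the following sketch. It is illustrative only and assumes a
       POSIX-conforming awk on the PATH as awk; the sample line is arbitrary.

```shell
line=' a  b:c::d'
# Default FS (a space): leading blanks skipped, runs of blanks delimit.
n_space=$(printf '%s\n' "$line" | awk '{ print NF }')
# Single character other than space: every occurrence delimits a field.
n_colon=$(printf '%s\n' "$line" | awk -F: '{ print NF }')
# ERE: each matching sequence delimits, including the leading blank.
n_ere=$(printf '%s\n' "$line" | awk -F'[: ]+' '{ print NF }')
printf '%s %s %s\n' "$n_space" "$n_colon" "$n_ere"
```

       The counts should be 2, 4, and 5. With -F: the doubled colon yields an
       empty field, and with the ERE the leading blank yields an empty first
       field, which the default space separator would have skipped.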
       Except in the gsub, match, split, and sub built-in functions, regular
       expression matching is based on input records. That is, record
       separator characters (the first character of the value of the variable
       RS, a newline character by default) cannot be embedded in the
       expression, and no expression matches the record separator character.
       If the record separator is not a newline character, newline characters
       embedded in the expression can be matched. In those four built-in
       functions, regular expression matching is based on text strings. So,
       any character (including the newline character and the record
       separator) can be embedded in the pattern and an appropriate pattern
       matches any character. However, in all nawk regular expression
       matching, the use of one or more NULL characters in the pattern, input
       record or text string produces undefined results.

584   Patterns
585       A pattern is any valid expression, a range specified by two expressions
586       separated by a comma, or one of the two special patterns BEGIN or END.
587
588   Special Patterns
589       The nawk utility recognizes two special patterns, BEGIN and  END.  Each
590       BEGIN pattern is matched once and its associated action executed before
591       the first record of input is read (except possibly by use of  the  get‐
592       line  function in a prior BEGIN action) and before command line assign‐
593       ment is done. Each END pattern  is  matched  once  and  its  associated
594       action executed after the last record of input has been read. These two
595       patterns have associated actions.
596
597
598       BEGIN and END do not combine with other patterns.  Multiple  BEGIN  and
599       END  patterns  are  allowed. The actions associated with the BEGIN pat‐
600       terns are executed in the order specified in the program,  as  are  the
601       END actions. An END pattern can precede a BEGIN pattern in a program.
602
603
604       If a nawk program consists of only actions with the pattern BEGIN, and
605       the BEGIN action contains no getline function, nawk exits without
606       reading its input when the last statement in the last BEGIN action is
607       executed. If a nawk program consists of only actions with the pattern
608       END or only actions with the patterns BEGIN and END, the input is read
609       before the statements in the END actions are executed.
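
       The ordering rules for BEGIN and END can be checked with a small
       sketch. It assumes a POSIX-conforming awk is installed as awk
       (substitute nawk where available); the input lines and the shell
       variable name are made up for the example.

```shell
# END is written first in the source, yet both BEGIN actions run before
# any input is read, in the order they appear. END runs after the last
# record, so NR holds the total record count.
order=$(printf 'x\ny\n' | awk '
    END   { print "end", NR }
    BEGIN { printf "b1 " }
    BEGIN { print "b2" }
')
echo "$order"
```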
610
611   Expression Patterns
612       An expression pattern is evaluated as if it were  an  expression  in  a
613       Boolean  context.  If  the result is true, the pattern is considered to
614       match, and the associated action (if any) is executed. If the result is
615       false, the action is not executed.
616
617   Pattern Ranges
618       A  pattern  range  consists of two expressions separated by a comma. In
619       this case, the action is performed for all records between a  match  of
620       the  first expression and the following match of the second expression,
621       inclusive. At this point, the pattern range can be repeated starting at
622       input records subsequent to the end of the matched range.
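
       As an illustration of a pattern range that matches more than once,
       consider the following sketch. It assumes a POSIX-conforming awk is
       installed as awk (substitute nawk where available); the input data
       and the shell variable name are invented.

```shell
# Records 2-4 form the first range (inclusive of both end points).
# The range can start again on a later record, here records 6-7.
range=$(printf 'a\nstart\nb\nstop\nc\nstart\nstop\n' |
    awk '/start/, /stop/ { print NR ": " $0 }')
echo "$range"
```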
623
624   Actions
625       An  action  is  a sequence of statements. A statement can be one of the
626       following:
627
628         if ( expression ) statement [ else statement ]
629         while ( expression ) statement
630         do statement while ( expression )
631         for ( expression ; expression ; expression ) statement
632         for ( var in array ) statement
633         delete array[subscript] #delete an array element
634         break
635         continue
636         { [ statement ] ... }
637         expression        # commonly variable = expression
638         print [ expression-list ] [ >expression ]
639         printf format [ ,expression-list ] [ >expression ]
640         next              # skip remaining patterns on this input line
641         exit [expr] # skip the rest of the input; exit status is expr
642         return [expr]
643
644
645
646       Any single statement can be replaced by a statement  list  enclosed  in
647       braces.   The  statements are terminated by newline characters or semi‐
648       colons, and are executed sequentially in the order that they appear.
649
650
651       The next statement causes all further processing of the  current  input
652       record  to  be abandoned. The behavior is undefined if a next statement
653       appears or is invoked in a BEGIN or END action.
654
655
656       The exit statement invokes all END actions in the order in which they
657       occur in the program source and then terminates the program without
658       reading further input. An exit statement inside an END action
659       terminates the program without further execution of END actions. If an
660       expression is specified in an exit statement, its numeric value is the
661       exit status of nawk, unless subsequent errors are encountered or a
662       subsequent exit statement with an expression is executed.
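
       The interaction between exit, END actions, and the exit status can be
       sketched as follows. This assumes a POSIX-conforming awk is installed
       as awk (substitute nawk where available); the input and the shell
       variable names are hypothetical.

```shell
# exit 3 on the second record: the third record is never read, the END
# action still runs, and 3 becomes the exit status of awk.
status=0
out=$(printf '1\n2\n3\n' |
    awk '$1 == 2 { exit 3 }
         { print }
         END { print "END ran after record", NR }') || status=$?
echo "$out"
echo "exit status: $status"
```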
663
664   Output Statements
665       Both print and printf statements write to standard output  by  default.
666       The  output  is written to the location specified by output_redirection
667       if one is supplied, as follows:
668
669         > expression
670         >> expression
671         | expression
672
673       In all cases, the expression is evaluated to produce a string that is
674       used as a full pathname to write into (for > or >>) or as a command to
675       be executed (for |). Using the first two forms, if the file of that
676       name is not currently open, it is opened, creating it if necessary and,
677       with the first form, truncating it. The output then is appended to the
678       file. As long as the file remains open, subsequent calls in which
679       expression evaluates to the same string value simply append output to
680       the file. The file remains open until the close function is called
681       with an expression that evaluates to the same string value.
682
683
684       The third form writes output onto a stream piped to the input of a
685       command. The stream is created if no stream is currently open with the
686       value of expression as its command name. The stream created is
687       equivalent to one created by a call to the popen(3C) function with the
688       value of expression as the command argument and a value of w as the
689       mode argument. As long as the stream remains open, subsequent calls in
690       which expression evaluates to the same string value write output to
691       the existing stream. The stream remains open until the close function
692       is called with an expression that evaluates to the same string value.
693       At that time, the stream is closed as if by a call to the pclose
694       function.
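
       The three redirection forms can be exercised with the sketch below.
       It assumes a POSIX-conforming awk is installed as awk (substitute
       nawk where available) and that the mktemp and sort utilities exist;
       the temporary file and the shell variable names are illustrative
       only.

```shell
tmp=$(mktemp)
sorted=$(awk -v f="$tmp" 'BEGIN {
    print "first"  > f     # f is not open yet: create or truncate, then write
    print "second" > f     # same string value: written after "first"
    close(f)
    print "third" >> f     # reopened without truncation, so this is appended
    close(f)
    print "b" | "sort"     # creates one stream to a sort command
    print "a" | "sort"     # same command string: reuses that stream
    close("sort")          # sort sees end of input and writes its result
}')
file_lines=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$file_lines" "$sorted"
```

       Because the stream is keyed by the command string, both print
       statements feed a single sort process rather than two.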
695
696
697       These output statements take a comma-separated list of expressions,
698       referred to in the grammar by the non-terminal symbols expr_list,
699       print_expr_list, or print_expr_list_opt. This list is referred to here
700       as the expression list, and each member is referred to as an expression
701       argument.
702
703
704       The print statement writes the value of each expression argument onto
705       the indicated output stream, separated by the current output field
706       separator (see variable OFS above) and terminated by the output record
707       separator (see variable ORS above). All expression arguments are taken
708       as strings, being converted if necessary, with the exception that the
709       printf format in OFMT is used instead of the value in CONVFMT. An empty
710       expression list stands for the whole input record ($0).
711
712
713       The printf statement produces output based on a notation similar to the
714       File Format Notation used to describe file formats in this document.
715       Output is produced as specified with the first expression argument as
716       the string format and subsequent expression arguments as the strings
717       arg1 to argn, inclusive, with the following exceptions:
718
719           1.     The format is an  actual  character  string  rather  than  a
720                  graphical representation. Therefore, it cannot contain empty
721                  character positions.  The  space  character  in  the  format
722                  string,  in  any  context  other than a flag of a conversion
723                  specification, is treated as an ordinary character  that  is
724                  copied to the output.
725
726           2.     If  the  character  set  contains a Delta character and that
727                  character appears in the format string, it is treated as  an
728                  ordinary character that is copied to the output.
729
730           3.     The escape sequences beginning with a backslash character are
731                  treated as sequences of ordinary characters that are  copied
732                  to the output. Note that these same sequences are interpreted
733                  lexically by nawk when they appear in literal  strings,  but
734                  they are not treated specially by the printf statement.
735
736           4.     A field width or precision can be specified as the * charac‐
737                  ter instead of a digit string. In this case the  next  argu‐
738                  ment  from  the  expression  list is fetched and its numeric
739                  value taken as the field width or precision.
740
741           5.     The implementation does not precede or  follow  output  from
742                  the  d  or u conversion specifications with blank characters
743                  not specified by the format string.
744
745           6.     The implementation does not precede output from the  o  con‐
746                  version  specification  with  leading zeros not specified by
747                  the format string.
748
749           7.     For the c conversion specification: if the  argument  has  a
750                  numeric value, the character whose encoding is that value is
751                  output.  If the value is zero or is not the encoding of  any
752                  character  in  the character set, the behavior is undefined.
753                  If the argument does not have a  numeric  value,  the  first
754                  character  of the string value is output; if the string does
755                  not contain any characters the behavior is undefined.
756
757           8.     For each conversion specification that consumes an argument,
758                  the  next  expression argument is evaluated. With the excep‐
759                  tion of the c conversion, the  value  is  converted  to  the
760                  appropriate type for the conversion specification.
761
762           9.     If  there  are  insufficient expression arguments to satisfy
763                  all the conversion specifications in the format string,  the
764                  behavior is undefined.
765
766           10.    If any character sequence in the format string begins with a
767                  % character, but does not form a valid conversion specifica‐
768                  tion, the behavior is unspecified.
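
       A few of the printf rules above (a * field width in item 4 and the c
       conversion in item 7) can be sketched as follows. The example assumes
       a POSIX-conforming awk is installed as awk (substitute nawk where
       available) and an ASCII-compatible character set; the values and the
       shell variable name are arbitrary.

```shell
printf_out=$(awk 'BEGIN {
    printf "[%*d]\n", 5, 42              # width 5 is fetched from the list
    printf "[%-6s|%.2s]\n", "ab", "xyz"  # left-justify; precision trims to 2
    printf "%c%c\n", 65, "hello"         # numeric: encoding 65; string: first char
}')
echo "$printf_out"
```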
769
770
771       Both print and printf can output at least {LINE_MAX} bytes.
772
773   Functions
774       The  nawk  language  has  a  variety of built-in functions: arithmetic,
775       string, input/output and general.
776
777   Arithmetic Functions
778       The arithmetic functions, except for int, are based on the ISO C  stan‐
779       dard. The behavior is undefined in cases where the ISO C standard spec‐
780       ifies that an error be returned or  that  the  behavior  is  undefined.
781       Although the grammar permits built-in functions to appear with no argu‐
782       ments or parentheses, unless the argument or parentheses are  indicated
783       as  optional  in  the following list (by displaying them within the [ ]
784       brackets), such use is undefined.
785
786       atan2(y,x)       Return arctangent of y/x.
787
788
789       cos(x)           Return cosine of x, where x is in radians.
790
791
792       sin(x)           Return sine of x, where x is in radians.
793
794
795       exp(x)           Return the exponential function of x.
796
797
798       log(x)           Return the natural logarithm of x.
799
800
801       sqrt(x)          Return the square root of x.
802
803
804       int(x)           Truncate its argument to an integer. It  is  truncated
805                        toward 0 when x > 0.
806
807
808       rand()           Return a random number n, such that 0 ≤ n < 1.
809
810
811       srand([expr])    Set the seed value for rand to expr or use the time of
812                        day if expr is omitted. The  previous  seed  value  is
813                        returned.
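
       The arithmetic functions can be tried out with a short sketch. It
       assumes a POSIX-conforming awk is installed as awk (substitute nawk
       where available); the seed 42 and the variable names are arbitrary
       choices made for the example.

```shell
math_out=$(awk 'BEGIN {
    print int(3.9), int("12abc")        # truncation; string converted first
    print int(atan2(0, -1) * 1000)      # atan2(0, -1) is pi
    print exp(0), sqrt(16), log(1)
    srand(42); a = rand()
    srand(42); b = rand()               # same seed gives the same sequence
    same = (a == b)
    inrange = (a >= 0 && a < 1)
    print same, inrange
}')
echo "$math_out"
```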
814
815
816   String Functions
817       The string functions in the following list shall be supported. Although
818       the grammar permits built-in functions to appear with no  arguments  or
819       parentheses,  unless  the  argument  or  parentheses  are  indicated as
820       optional in the following list (by  displaying  them  within  the  [  ]
821       brackets), such use is undefined.
822
823       gsub(ere,repl[,in])
824
825           Behave  like  sub  (see  below), except that it replaces all occur‐
826           rences of the regular expression (like the ed utility  global  sub‐
827           stitute) in $0 or in the in argument, when specified.
828
829
830       index(s,t)
831
832           Return  the  position, in characters, numbering from 1, in string s
833           where string t first occurs, or zero if it does not occur at all.
834
835
836       length[([s])]
837
838           Return the length, in  characters,  of  its  argument  taken  as  a
839           string, or of the whole record, $0, if there is no argument.
840
841
842       match(s,ere)
843
844           Return  the  position, in characters, numbering from 1, in string s
845           where the extended regular expression ere occurs,  or  zero  if  it
846           does  not  occur  at  all.  RSTART  is set to the starting position
847           (which is the same as the returned value),  zero  if  no  match  is
848           found; RLENGTH is set to the length of the matched string, −1 if no
849           match is found.
850
851
852       split(s,a[,fs])
853
854           Split the string s into array elements a[1], a[2], ...,  a[n],  and
855           return  n. The separation is done with the extended regular expres‐
856           sion fs or with the field separator FS if fs  is  not  given.  Each
857           array  element  has  a  string  value  when  created. If the string
858           assigned to any array element, with any occurrence of the  decimal-
859           point character from the current locale changed to a period charac‐
860           ter, would be considered a numeric string; the array  element  also
861           has  the  numeric value of the numeric string. The effect of a null
862           string as the value of fs is unspecified.
863
864
865       sprintf(fmt,expr,expr,...)
866
867           Format the expressions according to the printf format given by  fmt
868           and return the resulting string.
869
870
871       sub(ere,repl[,in])
872
873           Substitute  the  string  repl in place of the first instance of the
874           extended regular expression ere in string in and return the  number
875           of  substitutions.  An ampersand ( & ) appearing in the string repl
876           is replaced by the string from in that matches the regular  expres‐
877           sion.  An  ampersand preceded with a backslash ( \ ) is interpreted
878           as the literal ampersand character. An occurrence of  two  consecu‐
879           tive  backslashes is interpreted as just a single literal backslash
880           character.  Any other occurrence of a backslash (for example,  pre‐
881           ceding any other character) is treated as a literal backslash char‐
882           acter. If repl is a string literal, the handling of  the  ampersand
883           character  occurs after any lexical processing, including any lexi‐
884           cal backslash escape sequence processing. If in is specified and it
885           is  not an lvalue the behavior is undefined. If in is omitted, nawk
886           uses the current record ($0) in its place.
887
888
889       substr(s,m[,n])
890
891           Return the at most n-character substring of s that begins at  posi‐
892           tion  m,  numbering from 1. If n is missing, the length of the sub‐
893           string is limited by the length of the string s.
894
895
896       tolower(s)
897
898           Return a string based on the string s. Each character in s that  is
899           an  upper-case  letter  specified  to have a tolower mapping by the
900           LC_CTYPE category of the current locale is replaced in the returned
901           string  by  the  lower-case  letter specified by the mapping. Other
902           characters in s are unchanged in the returned string.
903
904
905       toupper(s)
906
907           Return a string based on the string s. Each character in s that  is
908           a  lower-case  letter  specified  to  have a toupper mapping by the
909           LC_CTYPE category of the current locale is replaced in the returned
910           string  by  the  upper-case  letter specified by the mapping. Other
911           characters in s are unchanged in the returned string.
912
913
914
915       All of the preceding functions that take ere as a  parameter  expect  a
916       pattern  or  a string-valued expression that is a regular expression as
917       defined above.
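
       The string functions described above can be exercised with the
       following sketch. It assumes a POSIX-conforming awk is installed as
       awk (substitute nawk where available); all strings and variable
       names are invented for the example.

```shell
str_out=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)         # & stands for the matched text
    print n, s
    t = "a.b.c"
    sub(/\./, "\\&", t)               # the literal "\\&" yields a plain &
    print t
    print index("banana", "nan"), length("banana"), substr("banana", 2, 3)
    if (match("xx123yy", /[0-9]+/))
        print RSTART, RLENGTH         # start position and length of the match
    k = split("10:20:30", parts, ":")
    print k, parts[1] + parts[3]      # elements are usable as numbers
    print toupper("abc") tolower("DEF")
}')
echo "$str_out"
```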
918
919   Input/Output and General Functions
920       The input/output and general functions are:
921
922       close(expression)          Close the file or pipe opened by a print  or
923                                  printf  statement  or a call to getline with
924                                  the same string-valued  expression.  If  the
925                                  close  was  successful, the function returns
926                                  0; otherwise, it returns non-zero.
927
928
929       expression|getline[var]    Read a record of input from a  stream  piped
930                                  from  the output of a command. The stream is
931                                  created if no stream is currently open  with
932                                  the value of expression as its command name.
933                                  The stream created is equivalent to one cre‐
934                                  ated  by  a  call to the popen function with
935                                  the value of expression as the command argu‐
936                                  ment  and a value of r as the mode argument.
937                                  As long as the stream remains open,
938                                  subsequent calls in which expression
939                                  evaluates to the same string value read
940                                  subsequent records from the stream. The
941                                  stream remains open until the close function
942                                  is called with an expression that evaluates
943                                  to the same string value. At that time, the
944                                  stream is closed as if by a call to the
945                                  pclose function. If var is missing, $0 and
946                                  NF are set. Otherwise, var is set.
947
948                                  The getline operator can form ambiguous con‐
949                                  structs when there are  operators  that  are
950                                  not  in  parentheses (including concatenate)
951                                  to the left of the | (to  the  beginning  of
952                                  the  expression  containing getline). In the
953                                  context of the $ operator, | behaves  as  if
954                                  it had a lower precedence than $. The result
955                                  of evaluating other operators is
956                                  unspecified, and portable applications must
957                                  parenthesize all such uses properly.
958
959
960       getline                       Set $0 to the next input record from  the
961                                     current  input file. This form of getline
962                                     sets the NF, NR, and FNR variables.
963
964
965       getline var                   Set variable var to the next input record
966                                     from the current input file. This form of
967                                     getline sets the FNR and NR variables.
968
969
970       getline [var] < expression    Read the next  record  of  input  from  a
971                                     named  file.  The expression is evaluated
972                                     to produce a string that  is  used  as  a
973                                     full  pathname.  If the file of that name
974                                     is not currently open, it is  opened.  As
975                                     long as the file remains open, subsequent
976                                     calls in which expression evaluates to
977                                     the same string value read subsequent
978                                     records from the file. The file remains
979                                     open until the close function is called
980                                     with an expression that evaluates to the
981                                     same string value. If var is missing, $0
982                                     and NF are set. Otherwise, var is set.
983
984                                     The getline operator can  form  ambiguous
985                                     constructs  when  there are binary opera‐
986                                     tors that are not in parentheses (includ‐
987                                     ing  concatenate)  to  the right of the <
988                                     (up to the end of the expression
989                                     containing the getline). The result of
990                                     evaluating such a construct is
991                                     unspecified, and portable applications
992                                     must parenthesize all such uses properly.
993
994
995       system(expression)            Execute the command given  by  expression
996                                     in  a manner equivalent to the system(3C)
997                                     function and return the  exit  status  of
998                                     the command.
999
1000
1001
1002       All  forms of getline return 1 for successful input, 0 for end of file,
1003       and −1 for an error.
1004
1005
1006       Where strings are used as the name of a file or pipeline,  the  strings
1007       must  be  textually  identical.  The  terminology ``same string value''
1008       implies that ``equivalent strings'', even those  that  differ  only  by
1009       space characters, represent different files.
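
       The getline forms, close, and system can be combined as in the sketch
       below. It assumes a POSIX-conforming awk is installed as awk
       (substitute nawk where available) and that mktemp exists; the file
       contents, the nonexistent path, and the variable names are
       hypothetical.

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
io_out=$(awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0)    # 1 per record, 0 at end of file
        n++
    close(f)                          # the next read of f starts over
    getline first < f
    close(f)
    "echo hi" | getline msg           # one record from the command output
    close("echo hi")
    err = (getline x < "/nonexistent/dir/file")   # cannot open: -1
    rc = system("exit 3")             # exit status of the command
    print n, first, msg, err, rc
}')
rm -f "$tmp"
echo "$io_out"
```

       The parentheses around each getline expression sidestep the
       ambiguity with the < operator described above.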
1010
1011   User-defined Functions
1012       The  nawk language also provides user-defined functions. Such functions
1013       can be defined as:
1014
1015         function name(args,...) { statements }
1016
1017
1018
1019       A function can be referred to anywhere in a nawk program; in
1020       particular, its use can precede its definition. The scope of a
1021       function is global.
1022
1023
1024       Function arguments can be either scalars or  arrays;  the  behavior  is
1025       undefined  if  an array name is passed as an argument that the function
1026       uses as a scalar, or if a scalar expression is passed  as  an  argument
1027       that  the  function  uses as an array. Function arguments are passed by
1028       value if scalar and by reference if  array  name.  Argument  names  are
1029       local to the function; all other variable names are global. The same
1030       name must not be used both as an argument name and as the name of a
1031       function or a special nawk variable. The same name must not be used
1032       both as a variable name with global scope and as the name of a
1033       function. The same name must not be used within the same scope both as
1034       a scalar variable and as an array.
1035
1036
1037       The number of parameters in the function definition need not match  the
1038       number of parameters in the function call. Excess formal parameters can
1039       be used as local variables. If fewer arguments are supplied in a  func‐
1040       tion  call  than  are  in the function definition, the extra parameters
1041       that are used in the function body as scalars are  initialized  with  a
1042       string  value  of  the null string and a numeric value of zero, and the
1043       extra parameters that are used in the function body as arrays are  ini‐
1044       tialized  as empty arrays. If more arguments are supplied in a function
1045       call than are in the function definition, the behavior is undefined.
1046
1047
1048       When invoking a function, no white space  can  be  placed  between  the
1049       function name and the opening parenthesis. Function calls can be nested
1050       and recursive calls can be made upon functions. Upon  return  from  any
1051       nested  or  recursive  function  call, the values of all of the calling
1052       function's parameters are unchanged, except for array parameters passed
1053       by  reference. The return statement can be used to return a value. If a
1054       return statement appears outside of a function definition, the behavior
1055       is undefined.
1056
1057
1058       In  the function definition, newline characters are optional before the
1059       opening brace and after the closing  brace.  Function  definitions  can
1060       appear anywhere in the program where a pattern-action pair is allowed.
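
       The rules for user-defined functions can be sketched as follows. The
       example assumes a POSIX-conforming awk is installed as awk
       (substitute nawk where available); the function names fact, fill,
       and bump are hypothetical and exist only for this illustration.

```shell
fn_out=$(awk '
BEGIN {                       # the calls precede the definitions
    fill(sq, 3)               # sq is an array: passed by reference
    v = 5; w = bump(v)        # v is a scalar: passed by value, stays 5
    print fact(5), sq[3], v, w
}
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursion
function fill(arr, k,    i) { # i: excess formal parameter used as a local
    for (i = 1; i <= k; i++)
        arr[i] = i * i
}
function bump(x) { x++; return x }
')
echo "$fn_out"
```

       Separating the excess parameter i with extra spaces is only a common
       convention for marking locals; the language does not require it.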
1061

USAGE

1063       The  index,  length, match, and substr functions should not be confused
1064       with similar functions in the ISO C standard; the  nawk  versions  deal
1065       with characters, while the ISO C standard deals with bytes.
1066
1067
1068       Because  the concatenation operation is represented by adjacent expres‐
1069       sions rather than an explicit operator, it is often  necessary  to  use
1070       parentheses to enforce the proper evaluation precedence.
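
       The effect of parentheses on concatenation can be seen in a small
       sketch. It assumes a POSIX-conforming awk is installed as awk
       (substitute nawk where available); the operands and variable names
       are arbitrary.

```shell
cat_out=$(awk 'BEGIN {
    a = 1 " " 2 + 3       # addition binds tighter: 1, " ", and 5 are joined
    b = (1 " " 2) + 3     # "1 2" is built first; its numeric value is 1
    print a
    print b
}')
echo "$cat_out"
```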
1071
1072
1073       See  largefile(5)  for  the  description  of  the behavior of nawk when
1074       encountering files greater than or equal to 2 Gbyte (2^31 bytes).
1075

EXAMPLES

1077       The nawk program specified in the command line is most easily specified
1078       within  single-quotes  (for  example, 'program') for applications using
1079       sh, because nawk programs commonly contain characters that are  special
1080       to  the  shell, including double-quotes. In the cases where a nawk pro‐
1081       gram contains single-quote characters, it is usually easiest to specify
1082       most of the program as strings within single-quotes concatenated by the
1083       shell with quoted single-quote characters. For example:
1084
1085         nawk '/'\''/ { print "quote:", $0 }'
1086
1087
1088
1089       prints all lines from the  standard  input  containing  a  single-quote
1090       character, prefixed with quote:.
1091
1092
1093       The following are examples of simple nawk programs:
1094
1095       Example  1 Write to the standard output all input lines for which field
1096       3 is greater than 5:
1097
1098         $3 > 5
1099
1100
1101
1102       Example 2 Write every tenth line:
1103
1104         (NR % 10) == 0
1105
1106
1107
1108       Example 3 Write any line with a substring matching the regular  expres‐
1109       sion:
1110
1111         /(G|D)(2[0-9][[:alpha:]]*)/
1112
1113
1114
1115       Example 4 Print any line with a substring containing a G or D, followed
1116       by a sequence of digits and characters:
1117
1118
1119       This example uses character classes digit and alpha to match  language-
1120       independent digit and alphabetic characters, respectively.
1121
1122
1123         /(G|D)([[:digit:][:alpha:]]*)/
1124
1125
1126
1127       Example  5 Write any line in which the second field matches the regular
1128       expression and the fourth field does not:
1129
1130         $2 ~ /xyz/ && $4 !~ /xyz/
1131
1132
1133
1134       Example 6 Write any line in which the second  field  contains  a  back‐
1135       slash:
1136
1137         $2 ~ /\\/
1138
1139
1140
1141       Example 7 Write any line in which the second field contains a backslash
1142       (alternate method):
1143
1144
1145       Notice that backslash escapes are interpreted twice,  once  in  lexical
1146       processing of the string and once in processing the regular expression.
1147
1148
1149         $2 ~ "\\\\"
1150
1151
1152
1153       Example 8 Write the second to the last and the last field in each line,
1154       separating the fields by a colon:
1155
1156         {OFS=":";print $(NF-1), $NF}
1157
1158
1159
1160       Example 9 Write the line number and number of fields in each line:
1161
1162
1163       The three strings representing the line number, the colon and the  num‐
1164       ber  of  fields are concatenated and that string is written to standard
1165       output.
1166
1167
1168         {print NR ":" NF}
1169
1170
1171
1172       Example 10 Write lines longer than 72 characters:
1173
1174         length($0) > 72
1175
1176
1177
1178       Example 11 Write first two fields in opposite order  separated  by  the
1179       OFS:
1180
1181         { print $2, $1 }
1182
1183
1184
1185       Example  12 Same, with input fields separated by comma or space and tab
1186       characters, or both:
1187
1188         BEGIN { FS = ",[ \t]*|[ \t]+" }
1189               { print $2, $1 }
1190
1191
1192
1193       Example 13 Add up first column, print sum and average:
1194
1195         {s += $1 }
1196         END {print "sum is ", s, " average is", s/NR}
1197
1198
1199
1200       Example 14 Write fields in reverse order, one per line (many lines  out
1201       for each line in):
1202
1203         { for (i = NF; i > 0; --i) print $i }
1204
1205
1206
1207       Example  15  Write all lines between occurrences of the strings "start"
1208       and "stop":
1209
1210         /start/, /stop/
1211
1212
1213
1214       Example 16 Write all lines whose first field is different from the pre‐
1215       vious one:
1216
1217         $1 != prev { print; prev = $1 }
1218
1219
1220
1221       Example 17 Simulate the echo command:
1222
1223         BEGIN  {
1224                for (i = 1; i < ARGC; ++i)
1225                      printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
1226                }
1227
1228
1229
1230       Example  18  Write  the path prefixes contained in the PATH environment
1231       variable, one per line:
1232
1233         BEGIN  {
1234                n = split (ENVIRON["PATH"], path, ":")
1235                for (i = 1; i <= n; ++i)
1236                       print path[i]
1237                }
1238
1239
1240
1241       Example 19 Print the file "input", filling in page numbers starting  at
1242       5:
1243
1244
1245       If there is a file named input containing page headers of the form
1246
1247
1248         Page#
1249
1250
1251
1252       and a file named program that contains
1253
1254
1255         /Page/{ $2 = n++; }
1256         { print }
1257
1258
1259
1260       then the command line
1261
1262
1263         nawk -f program n=5 input
1264
1265
1266
1267
1268       prints the file input, filling in page numbers starting at 5.
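
       The whole of Example 19 can be tried in a scratch directory, as in the
       sketch below. It assumes that each header consists of the word Page
       followed by a separate # field, so that the assignment to $2 replaces
       the #. The temporary directory, the extra "some text" line, and the
       AWK variable are illustrative assumptions.

```shell
AWK=${AWK:-awk}   # assumption: names a POSIX-style awk such as nawk

workdir=$(mktemp -d)

# Sample input with two page headers and one ordinary line.
printf 'Page #\nsome text\nPage #\n' > "$workdir/input"

# The two-rule program from the example.
printf '/Page/ { $2 = n++; }\n{ print }\n' > "$workdir/program"

# n=5 is a command-line assignment, applied before "input" is read.
result=$(cd "$workdir" && "$AWK" -f program n=5 input)

printf '%s\n' "$result"
rm -rf "$workdir"
```

       The first header becomes "Page 5" and the second "Page 6", because n
       is incremented after each use.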
1269
1270

ENVIRONMENT VARIABLES

1272       See  environ(5) for descriptions of the following environment variables
1273       that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.
1274
1275       LC_NUMERIC    Determine the  radix  character  used  when  interpreting
1276                     numeric input, performing conversions between numeric and
1277                     string values and formatting numeric  output.  Regardless
1278                     of  locale, the period character (the decimal-point char‐
1279                     acter of the POSIX locale) is the decimal-point character
1280                     recognized  in processing awk programs (including assign‐
1281                     ments in command-line arguments).
1282
1283

EXIT STATUS

1285       The following exit values are returned:
1286
1287       0     All input files were processed successfully.
1288
1289
1290       >0    An error occurred.
1291
1292
1293
1294       The exit status can be altered within the  program  by  using  an  exit
1295       expression.
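
       As a sketch of an exit expression, the following program ends with
       status 3 as soon as a record contains the string ERROR. The status
       value, the check_log function name, the sample input, and the AWK
       variable are all assumptions chosen for illustration.

```shell
AWK=${AWK:-awk}   # assumption: names a POSIX-style awk such as nawk

# Hypothetical helper: stop with status 3 on the first record containing ERROR.
check_log() {
    "$AWK" '/ERROR/ { exit 3 }'
}

status_bad=0
printf 'ok\nERROR disk full\nok\n' | check_log || status_bad=$?

status_good=0
printf 'ok\nok\n' | check_log || status_good=$?

echo "with error: $status_bad, without: $status_good"
```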
1296

ATTRIBUTES

1298       See attributes(5) for descriptions of the following attributes:
1299
1300   /usr/bin/nawk
1301       ┌─────────────────────────────┬─────────────────────────────┐
1302       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
1303       ├─────────────────────────────┼─────────────────────────────┤
1304       │Availability                 │SUNWcsu                      │
1305       └─────────────────────────────┴─────────────────────────────┘
1306
1307   /usr/xpg4/bin/awk
1308       ┌─────────────────────────────┬─────────────────────────────┐
1309       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
1310       ├─────────────────────────────┼─────────────────────────────┤
1311       │Availability                 │SUNWxcu4                     │
1312       └─────────────────────────────┴─────────────────────────────┘
1313

SEE ALSO

1315       awk(1),   ed(1),   egrep(1),   grep(1),   lex(1),   sed(1),  popen(3C),
1316       printf(3C),  system(3C),   attributes(5),   environ(5),   largefile(5),
1317       regex(5), XPG4(5)
1318
1319
1320       Aho,  A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming
1321       Language, Addison-Wesley, 1988.
1322

DIAGNOSTICS

1324       If any file operand is specified and the named file cannot be accessed,
1325       nawk writes a diagnostic message to standard error and terminates
1326       without any further action.
1327
1328
1329       If the program specified by either the program operand  or  a  progfile
1330       operand  is not a valid nawk program (as specified in EXTENDED DESCRIP‐
1331       TION), the behavior is undefined.
1332

NOTES

1334       Input white space is not preserved on output if fields are involved.
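
       The effect can be seen by assigning a field to itself, which rebuilds
       the record with OFS between fields. A minimal sketch, with a made-up
       input line and an AWK variable that can be set to nawk:

```shell
AWK=${AWK:-awk}   # assumption: names a POSIX-style awk such as nawk

line='a   b    c'

# No field is touched: the record is written exactly as read.
untouched=$(printf '%s\n' "$line" | "$AWK" '{ print }')

# Assigning to a field rebuilds $0 with single OFS separators.
rebuilt=$(printf '%s\n' "$line" | "$AWK" '{ $1 = $1; print }')

printf '[%s]\n[%s]\n' "$untouched" "$rebuilt"
```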
1335
1336
1337       There are no explicit conversions between numbers and strings. To force
1338       an expression to be treated as a number, add 0 to it; to force it to be
1339       treated as a string, concatenate the null string ("") to it.
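
       The difference matters most in comparisons. The sketch below compares
       the same values once as strings and once as numbers; the variable
       names and values are invented for illustration, and AWK can be set to
       nawk.

```shell
AWK=${AWK:-awk}   # assumption: names a POSIX-style awk such as nawk

result=$("$AWK" 'BEGIN {
    a = "10"; b = "9"                  # string constants
    as_strings = (a < b)               # string order: "10" sorts before "9"
    as_numbers = (a + 0 < b + 0)       # adding 0 forces a numeric comparison
    x = 3; y = 10                      # numeric values
    num_cmp = (x < y)                  # numeric order: 3 is less than 10
    str_cmp = ((x "") < (y ""))        # appending "" forces a string comparison
    print as_strings, as_numbers, num_cmp, str_cmp
}')

printf '%s\n' "$result"
```

       Each comparison yields 1 for true and 0 for false, so the forced
       conversions reverse the outcome in both pairs.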
1340
1341
1342
1343SunOS 5.11                        24 May 2006                          nawk(1)