nawk(1)                        User Commands                        nawk(1)

nawk - pattern scanning and processing language
7
9 /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile...
10 [argument]...
11
12
13 /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile...
14 [argument]...
15
16
18 The /usr/bin/nawk and /usr/xpg4/bin/awk utilities execute programs
19 written in the nawk programming language, which is specialized for tex‐
20 tual data manipulation. A nawk program is a sequence of patterns and
21 corresponding actions. The string specifying program must be enclosed
22 in single quotes (') to protect it from interpretation by the shell.
The sequence of pattern-action statements can be specified on the
command line as program or in one or more files specified by the
-f progfile option. When input is read that matches a pattern, the
action associated with the pattern is performed.
27
28
29 Input is interpreted as a sequence of records. By default, a record is
30 a line, but this can be changed by using the RS built-in variable. Each
31 record of input is matched to each pattern in the program. For each
32 pattern matched, the associated action is executed.
33
34
35 The nawk utility interprets each input record as a sequence of fields
36 where, by default, a field is a string of non-blank characters. This
37 default white-space field delimiter (blanks and/or tabs) can be changed
by using the FS built-in variable or the -F ERE option. The nawk utility
39 denotes the first field in a record $1, the second $2, and so forth.
40 The symbol $0 refers to the entire record; setting any other field
41 causes the reevaluation of $0. Assigning to $0 resets the values of all
42 fields and the NF built-in variable.
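
The field model described above can be sketched as follows. This is a
minimal illustration, assuming the awk found on PATH is a POSIX awk (on
the systems this page describes, substitute nawk or /usr/xpg4/bin/awk);
the sample record is hypothetical.

```shell
# Default splitting: runs of blanks separate fields, NF counts them.
fields=$(printf 'alpha beta  gamma\n' | awk '{ print NF ": " $1 "," $3 }')
echo "$fields"

# Assigning to a field causes $0 to be rebuilt from the fields.
rebuilt=$(printf 'alpha beta  gamma\n' | awk '{ $2 = "B"; print $0 }')
echo "$rebuilt"
```

The first command reports three fields; the second shows that the doubled
blank in the input is gone once $0 has been recomputed.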
43
45 The following options are supported:
46
47 -F ERE Define the input field separator to be the extended
48 regular expression ERE, before any input is read (can
49 be a character).
50
51
52 -f progfile Specifies the pathname of the file progfile containing
53 a nawk program. If multiple instances of this option
54 are specified, the concatenation of the files speci‐
55 fied as progfile in the order specified is the nawk
56 program. The nawk program can alternatively be speci‐
57 fied in the command line as a single argument.
58
59
60 -v assignment The assignment argument must be in the same form as an
61 assignment operand. The assignment is of the form
62 var=value, where var is the name of one of the vari‐
63 ables described below. The specified assignment occurs
64 before executing the nawk program, including the
65 actions associated with BEGIN patterns (if any). Mul‐
66 tiple occurrences of this option can be specified.
67
68
70 The following operands are supported:
71
72 program If no -f option is specified, the first operand to nawk is
73 the text of the nawk program. The application supplies the
74 program operand as a single argument to nawk. If the text
75 does not end in a newline character, nawk interprets the
76 text as if it did.
77
78
79 argument Either of the following two types of argument can be inter‐
80 mixed:
81
82 file A pathname of a file that contains the input
83 to be read, which is matched against the set
84 of patterns in the program. If no file oper‐
85 ands are specified, or if a file operand is
86 −, the standard input is used.
87
88
89 assignment An operand that begins with an underscore or
90 alphabetic character from the portable char‐
91 acter set, followed by a sequence of under‐
92 scores, digits and alphabetics from the por‐
93 table character set, followed by the = char‐
94 acter specifies a variable assignment rather
95 than a pathname. The characters before the =
96 represent the name of a nawk variable. If
97 that name is a nawk reserved word, the behav‐
98 ior is undefined. The characters following
the equal sign are interpreted as if they
appeared in the nawk program preceded and
followed by a double-quote (") character, as
a STRING token, except that if the last
character is an unescaped backslash, it is
interpreted as a literal backslash rather
than as the first character of the sequence
\". The variable is assigned the value of
that STRING token. If the value is considered
a numeric string, the variable is assigned its
numeric value. Each such variable assignment
110 is performed just before the processing of
111 the following file, if any. Thus, an assign‐
112 ment before the first file argument is exe‐
113 cuted after the BEGIN actions (if any), while
114 an assignment after the last file argument is
115 executed before the END actions (if any). If
116 there are no file arguments, assignments are
117 executed before processing the standard
118 input.
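
The difference in timing between the -v option and an assignment operand
can be sketched as follows. This is an illustration only, assuming a
POSIX awk is available as awk; the variable name who is hypothetical.

```shell
prog='BEGIN { print "BEGIN:" who } { print "record:" who }'

# -v: the assignment happens before the BEGIN actions run.
early=$(printf 'x\n' | awk -v who=opt "$prog")
echo "$early"

# Operand: the assignment happens after BEGIN, just before input is read.
late=$(printf 'x\n' | awk "$prog" who=operand)
echo "$late"
```

In the second run the BEGIN action sees an uninitialized variable, while
the record action sees the assigned value.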
119
120
121
123 Input files to the nawk program from any of the following sources:
124
125 o any file operands or their equivalents, achieved by modify‐
126 ing the nawk variables ARGV and ARGC
127
128 o standard input in the absence of any file operands
129
130 o arguments to the getline function
131
132
133 must be text files. Whether the variable RS is set to a value other
134 than a newline character or not, for these files, implementations sup‐
135 port records terminated with the specified separator up to {LINE_MAX}
136 bytes and can support longer records.
137
138
139 If -f progfile is specified, the files named by each of the progfile
140 option-arguments must be text files containing an nawk program.
141
142
The standard input is used only if no file operands are specified, or
if a file operand is −.
145
147 A nawk program is composed of pairs of the form:
148
149 pattern { action }
150
151
152
153 Either the pattern or the action (including the enclosing brace charac‐
154 ters) can be omitted. Pattern-action statements are separated by a
155 semicolon or by a newline.
156
157
158 A missing pattern matches any record of input, and a missing action is
159 equivalent to an action that writes the matched record of input to
160 standard output.
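
A short sketch of both defaults, assuming a POSIX awk is available as awk
and using hypothetical sample input:

```shell
input=$(printf 'apple\nbanana\ncherry\n')

# Pattern only: the default action prints each matching record.
only_pattern=$(printf '%s\n' "$input" | awk '/an/')
echo "$only_pattern"

# Action only: the missing pattern matches every record.
only_action=$(printf '%s\n' "$input" | awk '{ print length($0) }')
echo "$only_action"
```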
161
162
163 Execution of the nawk program starts by first executing the actions
164 associated with all BEGIN patterns in the order they occur in the pro‐
165 gram. Then each file operand (or standard input if no files were speci‐
166 fied) is processed by reading data from the file until a record separa‐
167 tor is seen (a newline character by default), splitting the current
168 record into fields using the current value of FS, evaluating each pat‐
169 tern in the program in the order of occurrence, and executing the
170 action associated with each pattern that matches the current record.
The action for a matching pattern is executed before evaluating
subsequent patterns. Last, the actions associated with all END patterns
are executed in the order they occur in the program.
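
The execution order described above can be traced with a small program.
This is a sketch, assuming a POSIX awk is available as awk; the rule
labels A and B are hypothetical.

```shell
trace=$(printf 'one\ntwo\n' | awk '
  END   { print "end after " NR " records" }
  BEGIN { print "begin" }
  /o/   { print "A:" $0 }
  /t/   { print "B:" $0 }
')
echo "$trace"
```

The BEGIN action runs first even though it appears after END in the
source, and for the record "two" rule A runs before rule B.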
174
175 Expressions in nawk
176 Expressions describe computations used in patterns and actions. In the
177 following table, valid expression operations are given in groups from
178 highest precedence first to lowest precedence last, with equal-prece‐
179 dence operators grouped between horizontal lines. In expression evalua‐
180 tion, where the grammar is formally ambiguous, higher precedence opera‐
181 tors are evaluated before lower precedence operators. In this table
182 expr, expr1, expr2, and expr3 represent any expression, while lvalue
183 represents any entity that can be assigned to (that is, on the left
184 side of an assignment operator).
185
186
187
188
Syntax              Name                        Type of Result      Associativity
────────────────────────────────────────────────────────────────────────────────
( expr )            Grouping                    type of expr        n/a
────────────────────────────────────────────────────────────────────────────────
$expr               Field reference             string              n/a
────────────────────────────────────────────────────────────────────────────────
++ lvalue           Pre-increment               numeric             n/a
-- lvalue           Pre-decrement               numeric             n/a
lvalue ++           Post-increment              numeric             n/a
lvalue --           Post-decrement              numeric             n/a
────────────────────────────────────────────────────────────────────────────────
expr ^ expr         Exponentiation              numeric             right
────────────────────────────────────────────────────────────────────────────────
! expr              Logical not                 numeric             n/a
+ expr              Unary plus                  numeric             n/a
- expr              Unary minus                 numeric             n/a
────────────────────────────────────────────────────────────────────────────────
expr * expr         Multiplication              numeric             left
expr / expr         Division                    numeric             left
expr % expr         Modulus                     numeric             left
────────────────────────────────────────────────────────────────────────────────
expr + expr         Addition                    numeric             left
expr - expr         Subtraction                 numeric             left
────────────────────────────────────────────────────────────────────────────────
expr expr           String concatenation        string              left
────────────────────────────────────────────────────────────────────────────────
expr < expr         Less than                   numeric             none
expr <= expr        Less than or equal to       numeric             none
expr != expr        Not equal to                numeric             none
expr == expr        Equal to                    numeric             none
expr > expr         Greater than                numeric             none
expr >= expr        Greater than or equal to    numeric             none
────────────────────────────────────────────────────────────────────────────────
expr ~ expr         ERE match                   numeric             none
expr !~ expr        ERE non-match               numeric             none
────────────────────────────────────────────────────────────────────────────────
expr in array       Array membership            numeric             left
( index ) in array  Multi-dimension array       numeric             left
                    membership
────────────────────────────────────────────────────────────────────────────────
expr && expr        Logical AND                 numeric             left
────────────────────────────────────────────────────────────────────────────────
expr || expr        Logical OR                  numeric             left
────────────────────────────────────────────────────────────────────────────────
expr1 ? expr2       Conditional expression      type of selected    right
  : expr3                                       expr2 or expr3
────────────────────────────────────────────────────────────────────────────────
lvalue ^= expr      Exponentiation assignment   numeric             right
lvalue %= expr      Modulus assignment          numeric             right
lvalue *= expr      Multiplication assignment   numeric             right
lvalue /= expr      Division assignment         numeric             right
lvalue += expr      Addition assignment         numeric             right
lvalue -= expr      Subtraction assignment      numeric             right
lvalue = expr       Assignment                  type of expr        right
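
A few rows of the table can be checked directly. This is a sketch,
assuming a POSIX awk is available as awk; the variable name c is
hypothetical.

```shell
prec=$(awk 'BEGIN {
  print 2 ^ 3 ^ 2          # right-associative: 2 ^ (3 ^ 2)
  print -2 ^ 2             # ^ binds tighter than unary minus
  print 1 " " 2 + 3        # + binds tighter than concatenation
  print 2 + 3 * 4          # * binds tighter than +
  c = (1 < 2) (2 < 1)      # comparisons yield 1 or 0, then concatenate
  print c
}')
echo "$prec"
```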
246
247
248
249 Each expression has either a string value, a numeric value or both.
250 Except as stated for specific contexts, the value of an expression is
251 implicitly converted to the type needed for the context in which it is
252 used. A string value is converted to a numeric value by the equivalent
253 of the following calls:
254
setlocale(LC_NUMERIC, "");
numeric_value = atof(string_value);
257
258
259
260 A numeric value that is exactly equal to the value of an integer is
261 converted to a string by the equivalent of a call to the sprintf func‐
262 tion with the string %d as the fmt argument and the numeric value being
263 converted as the first and only expr argument. Any other numeric value
264 is converted to a string by the equivalent of a call to the sprintf
265 function with the value of the variable CONVFMT as the fmt argument and
266 the numeric value being converted as the first and only expr argument.
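
The two conversion rules can be observed by forcing a number into string
context with an empty-string concatenation. This is a sketch, assuming an
awk that supports CONVFMT (this page lists it for /usr/xpg4/bin/awk
only); the variable names are hypothetical.

```shell
conv=$(awk 'BEGIN {
  CONVFMT = "%.2f"
  whole = 17.0;    s1 = whole ""   # integer value: converted as with %d
  frac  = 3.14159; s2 = frac ""    # any other value: converted with CONVFMT
  print s1
  print s2
}')
echo "$conv"
```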
267
268
269 A string value is considered to be a numeric string in the following
270 case:
271
1. Any leading and trailing blank characters are ignored.
273
274 2. If the first unignored character is a + or −, it is ignored.
275
276 3. If the remaining unignored characters would be lexically
277 recognized as a NUMBER token, the string is considered a
278 numeric string.
279
280
281 If a − character is ignored in the above steps, the numeric value of
282 the numeric string is the negation of the numeric value of the recog‐
283 nized NUMBER token. Otherwise the numeric value of the numeric string
284 is the numeric value of the recognized NUMBER token. Whether or not a
285 string is a numeric string is relevant only in contexts where that term
286 is used in this section.
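
One context where the distinction matters is comparison: input fields
that are numeric strings compare as numbers, while string constants
compare as strings. A minimal sketch, assuming a POSIX awk is available
as awk:

```shell
cmp=$(printf '10 9\n' | awk '{
  print ($1 > $2)        # fields are numeric strings: 10 > 9 is true
  print ("10" > "9")     # string constants: "10" sorts before "9"
}')
echo "$cmp"
```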
287
288
289 When an expression is used in a Boolean context, if it has a numeric
290 value, a value of zero is treated as false and any other value is
291 treated as true. Otherwise, a string value of the null string is
292 treated as false and any other value is treated as true. A Boolean con‐
293 text is one of the following:
294
295 o the first subexpression of a conditional expression.
296
297 o an expression operated on by logical NOT, logical AND, or
298 logical OR.
299
300 o the second expression of a for statement.
301
302 o the expression of an if statement.
303
304 o the expression of the while clause in either a while or do
305 ... while statement.
306
307 o an expression used as a pattern (as in Overall Program
308 Structure).
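
The Boolean rules can be sketched with if statements. This illustration
assumes a POSIX awk is available as awk; the printed labels are
hypothetical. Note that the string constant "0" has no numeric value, so
it is tested as a non-null string.

```shell
truth=$(awk 'BEGIN {
  if (0)     print "num0"      # numeric zero: false
  if ("")    print "empty"     # null string: false
  if ("0")   print "str0"      # non-null string: true
  if (2 - 1) print "nonzero"   # nonzero number: true
}')
echo "$truth"
```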
309
310
311 The nawk language supplies arrays that are used for storing numbers or
312 strings. Arrays need not be declared. They are initially empty, and
their sizes change dynamically. The subscripts, or element identifiers,
are strings, providing a type of associative array capability.
315 An array name followed by a subscript within square brackets can be
316 used as an lvalue and as an expression, as described in the grammar.
317 Unsubscripted array names are used in only the following contexts:
318
319 o a parameter in a function definition or function call.
320
321 o the NAME token following any use of the keyword in.
322
323
324 A valid array index consists of one or more comma-separated expres‐
325 sions, similar to the way in which multi-dimensional arrays are indexed
326 in some programming languages. Because nawk arrays are really one-
327 dimensional, such a comma-separated list is converted to a single
328 string by concatenating the string values of the separate expressions,
329 each separated from the other by the value of the SUBSEP variable.
330
331
332 Thus, the following two index operations are equivalent:
333
334 var[expr1, expr2, ... exprn]
335 var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
336
337
338
339 A multi-dimensioned index used with the in operator must be put in
340 parentheses. The in operator, which tests for the existence of a par‐
341 ticular array element, does not create the element if it does not
342 exist. Any other reference to a non-existent array element automati‐
343 cally creates it.
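
The behavior of SUBSEP and the in operator can be sketched as follows,
assuming a POSIX awk is available as awk. The array name grid and its
subscripts are hypothetical.

```shell
arr=$(awk 'BEGIN {
  grid["r1", "c2"] = "x"
  if (("r1", "c2") in grid)    print "present"
  if (!(("r9", "c9") in grid)) print "absent"   # the test creates nothing
  n = 0
  for (key in grid) {
    n++
    split(key, part, SUBSEP)   # recover the two subscripts
    print part[1] "/" part[2]
  }
  print n
  if (grid["r9", "c9"] == "") print "referenced"  # this reference creates it
  m = 0
  for (key in grid) m++
  print m
}')
echo "$arr"
```

The element count stays at 1 after the in tests and rises to 2 only after
the plain reference.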
344
345 Variables and Special Variables
346 Variables can be used in an nawk program by referencing them. With the
347 exception of function parameters, they are not explicitly declared.
348 Uninitialized scalar variables and array elements have both a numeric
349 value of zero and a string value of the empty string.
350
351
352 Field variables are designated by a $ followed by a number or numerical
353 expression. The effect of the field number expression evaluating to
354 anything other than a non-negative integer is unspecified. Uninitial‐
355 ized variables or string values need not be converted to numeric values
356 in this context. New field variables are created by assigning a value
357 to them. References to non-existent fields (that is, fields after $NF)
produce the null string. However, assigning to a non-existent field
(for example, $(NF+2) = 5) increases the value of NF, creates any
intervening fields with the null string as their values, and causes the
value of $0 to be recomputed, with the fields being separated by the
value of OFS. Each field variable has a string value when created. If the
363 string, with any occurrence of the decimal-point character from the
364 current locale changed to a period character, is considered a numeric
365 string (see Expressions in nawk above), the field variable also has the
366 numeric value of the numeric string.
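
Field creation can be sketched as follows, assuming a POSIX awk is
available as awk. OFS is set to a hyphen here only to make the recomputed
record visible.

```shell
grown=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } {
  $(NF + 2) = 5        # creates $3 (null) and $4
  print NF
  print $0
  print "[" $7 "]"     # reading a non-existent field yields the null string
}')
echo "$grown"
```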
367
368 /usr/bin/nawk, /usr/xpg4/bin/awk
369 nawk sets the following special variables that are supported by both
370 /usr/bin/nawk and /usr/xpg4/bin/awk:
371
372 ARGC The number of elements in the ARGV array.
373
374
375 ARGV An array of command line arguments, excluding options and
376 the program argument, numbered from zero to ARGC−1.
377
378 The arguments in ARGV can be modified or added to; ARGC can
379 be altered. As each input file ends, nawk treats the next
380 non-null element of ARGV, up to the current value of
381 ARGC−1, inclusive, as the name of the next input file.
382 Setting an element of ARGV to null means that it is not
383 treated as an input file. The name − indicates the standard
384 input. If an argument matches the format of an assignment
385 operand, this argument is treated as an assignment rather
386 than a file argument.
387
388
389 ENVIRON The variable ENVIRON is an array representing the value of
390 the environment. The indices of the array are strings con‐
391 sisting of the names of the environment variables, and the
392 value of each array element is a string consisting of the
393 value of that variable. If the value of an environment
394 variable is considered a numeric string, the array element
395 also has its numeric value.
396
397 In all cases where nawk behavior is affected by environment
398 variables (including the environment of any commands that
399 nawk executes via the system function or via pipeline redi‐
400 rections with the print statement, the printf statement, or
401 the getline function), the environment used is the environ‐
402 ment at the time nawk began executing.
403
404
405 FILENAME A pathname of the current input file. Inside a BEGIN action
406 the value is undefined. Inside an END action the value is
407 the name of the last input file processed.
408
409
410 FNR The ordinal number of the current record in the current
411 file. Inside a BEGIN action the value is zero. Inside an
412 END action the value is the number of the last record pro‐
413 cessed in the last file processed.
414
415
416 FS Input field separator regular expression; a space character
417 by default.
418
419
420 NF The number of fields in the current record. Inside a BEGIN
421 action, the use of NF is undefined unless a getline func‐
422 tion without a var argument is executed previously. Inside
423 an END action, NF retains the value it had for the last
424 record read, unless a subsequent, redirected, getline func‐
425 tion without a var argument is performed prior to entering
426 the END action.
427
428
429 NR The ordinal number of the current record from the start of
430 input. Inside a BEGIN action the value is zero. Inside an
431 END action the value is the number of the last record pro‐
432 cessed.
433
434
OFMT The printf format for converting numbers to strings in
output statements; "%.6g" by default. The result of the
conversion is unspecified if the value of OFMT is not a
floating-point format specification.
439
440
441 OFS The print statement output field separator; a space charac‐
442 ter by default.
443
444
445 ORS The print output record separator; a newline character by
446 default.
447
448
RLENGTH The length of the string matched by the match function.
450
451
452 RS The first character of the string value of RS is the input
453 record separator; a newline character by default. If RS
454 contains more than one character, the results are unspeci‐
455 fied. If RS is null, then records are separated by
456 sequences of one or more blank lines. Leading or trailing
457 blank lines do not produce empty records at the beginning
458 or end of input, and the field separator is always newline,
459 no matter what the value of FS.
460
461
462 RSTART The starting position of the string matched by the match
463 function, numbering from 1. This is always equivalent to
464 the return value of the match function.
465
466
467 SUBSEP The subscript separator string for multi-dimensional
468 arrays. The default value is \034.
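
Several of the variables above can be exercised together. This is a
sketch, assuming a POSIX awk is available as awk and that mktemp is
present; the file names, the operand values and the environment variable
DEMO_VALUE are hypothetical.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n'    > "$dir/two"

# NR counts across all input; FNR restarts with each file.
counts=$(awk '{ print NR, FNR, $0 }' "$dir/one" "$dir/two")
echo "$counts"

# ARGV holds the operands (options and the program text are excluded).
args=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' first n=2)
echo "$args"

# ENVIRON elements that are numeric strings also have a numeric value.
env_val=$(DEMO_VALUE=42 awk 'BEGIN { print ENVIRON["DEMO_VALUE"] + 1 }')
echo "$env_val"

# Null RS: blank lines separate records, and newline separates fields.
paras=$(printf 'k1 v1\nk2 v2\n\n\nk3 v3\n' |
  awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }')
echo "$paras"

# match() sets RSTART and RLENGTH.
pos=$(awk 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }')
echo "$pos"

rm -r "$dir"
```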
469
470
471 /usr/xpg4/bin/awk
472 The following variable is supported for /usr/xpg4/bin/awk only:
473
474 CONVFMT The printf format for converting numbers to strings (except
475 for output statements, where OFMT is used). The default is
476 %.6g.
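
The division of labor between OFMT and CONVFMT can be sketched as
follows. This assumes an awk that supports CONVFMT, such as
/usr/xpg4/bin/awk or another POSIX awk; the variable names are
hypothetical.

```shell
fmt=$(awk 'BEGIN {
  OFMT = "%.2f"; CONVFMT = "%.4f"
  x = 3.14159265
  print x          # number in an output statement: OFMT applies
  s = x ""         # number converted to a string: CONVFMT applies
  print s
  print 42 + 0     # integer values are printed as integers
}')
echo "$fmt"
```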
477
478
479 Regular Expressions
480 The nawk utility makes use of the extended regular expression notation
481 (see regex(5)) except that it allows the use of C-language conventions
482 to escape special characters within the EREs, namely \\, \a, \b, \f,
483 \n, \r, \t, \v, and those specified in the following table. These
484 escape sequences are recognized both inside and outside bracket expres‐
485 sions. Note that records need not be separated by newline characters
486 and string constants can contain newline characters, so even the \n
487 sequence is valid in nawk EREs. Using a slash character within the
488 regular expression requires escaping as shown in the table below:
489
490
491
492
Escape Sequence  Description                 Meaning
───────────────────────────────────────────────────────────────────────
\"               Backslash quotation-mark    Quotation-mark character
───────────────────────────────────────────────────────────────────────
\/               Backslash slash             Slash character
───────────────────────────────────────────────────────────────────────
\ddd             A backslash character       The character encoded by
                 followed by the longest     the one-, two- or three-
                 sequence of one, two, or    digit octal integer.
                 three octal-digit           Multi-byte characters
                 characters (01234567).      require multiple,
                 If all of the digits are    concatenated escape
                 0 (that is, representation  sequences, including the
                 of the NULL character),     leading \ for each byte.
                 the behavior is undefined.
───────────────────────────────────────────────────────────────────────
\c               A backslash character       Undefined
                 followed by any character
                 not described in this
                 table or special
                 characters (\\, \a, \b,
                 \f, \n, \r, \t, \v).
516
517
518
519 A regular expression can be matched against a specific field or string
520 by using one of the two regular expression matching operators, ~ and
521 !~. These operators interpret their right-hand operand as a regular
522 expression and their left-hand operand as a string. If the regular
523 expression matches the string, the ~ expression evaluates to the value
524 1, and the !~ expression evaluates to the value 0. If the regular
525 expression does not match the string, the ~ expression evaluates to the
526 value 0, and the !~ expression evaluates to the value 1. If the right-
527 hand operand is any expression other than the lexical token ERE, the
528 string value of the expression is interpreted as an extended regular
expression, including the escape conventions described above. Notice
that these same escape conventions are also applied in determining
the value of a string literal (the lexical token STRING), and so are
applied a second time when a string literal is used in this context.
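
The double application of escapes means that a backslash intended for the
regular expression must be doubled inside a string literal. A minimal
sketch, assuming a POSIX awk is available as awk:

```shell
re=$(awk 'BEGIN {
  print ("a.b" ~ /a\.b/)     # ERE token: \. matches a literal period
  print ("axb" ~ /a\.b/)
  print ("a.b" ~ "a\\.b")    # string literal: \\ yields \, leaving \. for the ERE
  print ("axb" ~ "a\\.b")
  print ("axb" !~ "a\\.b")
}')
echo "$re"
```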
533
534
When an ERE token appears as an expression in any context other than as
the right-hand side of the ~ or !~ operator or as one of the built-in
function arguments described below, the value of the resulting expression
is the equivalent of:
539
540 $0 ~ /ere/
541
542
543
The ere argument to the gsub, match, and sub functions, and the fs
argument to the split function (see String Functions), are interpreted as
extended regular expressions. These can be either ERE tokens or arbitrary
expressions, and are interpreted in the same manner as the right-hand
side of the ~ or !~ operator.
549
550
551 An extended regular expression can be used to separate fields by using
552 the -F ERE option or by assigning a string containing the expression to
553 the built-in variable FS. The default value of the FS variable is a
554 single space character. The following describes FS behavior:
555
556 1. If FS is a single character:
557
558 o If FS is the space character, skip leading and trailing
559 blank characters; fields are delimited by sets of one or
560 more blank characters.
561
562 o Otherwise, if FS is any other character c, fields are
563 delimited by each single occurrence of c.
564
565 2. Otherwise, the string value of FS is considered to be an
566 extended regular expression. Each occurrence of a sequence
567 matching the extended regular expression delimits fields.
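
The three FS cases can be sketched as follows, assuming a POSIX awk is
available as awk; the sample records are hypothetical.

```shell
# Space (the default): leading and trailing blanks skipped, runs collapse.
default_fs=$(printf '  a   b  \n' | awk '{ print NF }')
echo "$default_fs"

# Any other single character: every occurrence delimits, so a,,b has 3 fields.
single_fs=$(printf 'a,,b\n' | awk -F, '{ print NF }')
echo "$single_fs"

# Longer value: treated as an ERE (comma or semicolon, then optional spaces).
regex_fs=$(printf 'a, b;c\n' | awk -F'[,;] *' '{ print NF ":" $2 }')
echo "$regex_fs"

# The same effect by assigning FS before any input is read.
assigned_fs=$(printf 'x|y\n' | awk 'BEGIN { FS = "|" } { print $2 }')
echo "$assigned_fs"
```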
568
569
570 Except in the gsub, match, split, and sub built-in functions, regular
571 expression matching is based on input records. That is, record separa‐
572 tor characters (the first character of the value of the variable RS, a
573 newline character by default) cannot be embedded in the expression, and
574 no expression matches the record separator character. If the record
575 separator is not a newline character, newline characters embedded in
the expression can be matched. In those four built-in functions, regular
expression matching is based on text strings. So, any character
578 (including the newline character and the record separator) can be
579 embedded in the pattern and an appropriate pattern matches any charac‐
580 ter. However, in all nawk regular expression matching, the use of one
581 or more NULL characters in the pattern, input record or text string
582 produces undefined results.
583
584 Patterns
585 A pattern is any valid expression, a range specified by two expressions
586 separated by comma, or one of the two special patterns BEGIN or END.
587
588 Special Patterns
589 The nawk utility recognizes two special patterns, BEGIN and END. Each
590 BEGIN pattern is matched once and its associated action executed before
591 the first record of input is read (except possibly by use of the get‐
592 line function in a prior BEGIN action) and before command line assign‐
593 ment is done. Each END pattern is matched once and its associated
action executed after the last record of input has been read. Unlike
other patterns, BEGIN and END must each have an associated action.
596
597
598 BEGIN and END do not combine with other patterns. Multiple BEGIN and
599 END patterns are allowed. The actions associated with the BEGIN pat‐
600 terns are executed in the order specified in the program, as are the
601 END actions. An END pattern can precede a BEGIN pattern in a program.
602
603
604 If an nawk program consists of only actions with the pattern BEGIN, and
605 the BEGIN action contains no getline function, nawk exits without read‐
606 ing its input when the last statement in the last BEGIN action is exe‐
607 cuted. If an nawk program consists of only actions with the pattern END
608 or only actions with the patterns BEGIN and END, the input is read
609 before the statements in the END actions are executed.
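
Both cases can be sketched as follows, assuming a POSIX awk is available
as awk:

```shell
# BEGIN only: the program finishes without reading any input.
begin_only=$(awk 'BEGIN { print "no input read" }' </dev/null)
echo "$begin_only"

# END only: all input is read first, so NR holds the record count.
end_only=$(printf 'a\nb\nc\n' | awk 'END { print NR }')
echo "$end_only"
```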
610
611 Expression Patterns
612 An expression pattern is evaluated as if it were an expression in a
613 Boolean context. If the result is true, the pattern is considered to
614 match, and the associated action (if any) is executed. If the result is
615 false, the action is not executed.
616
617 Pattern Ranges
618 A pattern range consists of two expressions separated by a comma. In
619 this case, the action is performed for all records between a match of
620 the first expression and the following match of the second expression,
621 inclusive. At this point, the pattern range can be repeated starting at
622 input records subsequent to the end of the matched range.
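
A pattern range can be sketched as follows, assuming a POSIX awk is
available as awk; the marker words start and end are hypothetical.

```shell
range=$(printf 'skip\nstart\nkeep\nend\nskip\nstart\nagain\n' |
  awk '/start/,/end/')
echo "$range"
```

The range matches twice. The second occurrence never sees its closing
expression, so it runs to the end of the input.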
623
624 Actions
625 An action is a sequence of statements. A statement can be one of the
626 following:
627
628 if ( expression ) statement [ else statement ]
629 while ( expression ) statement
630 do statement while ( expression )
631 for ( expression ; expression ; expression ) statement
632 for ( var in array ) statement
633 delete array[subscript] #delete an array element
634 break
635 continue
636 { [ statement ] ... }
637 expression # commonly variable = expression
638 print [ expression-list ] [ >expression ]
639 printf format [ ,expression-list ] [ >expression ]
640 next # skip remaining patterns on this input line
641 exit [expr] # skip the rest of the input; exit status is expr
642 return [expr]
643
644
645
646 Any single statement can be replaced by a statement list enclosed in
647 braces. The statements are terminated by newline characters or semi‐
648 colons, and are executed sequentially in the order that they appear.
649
650
651 The next statement causes all further processing of the current input
652 record to be abandoned. The behavior is undefined if a next statement
653 appears or is invoked in a BEGIN or END action.
654
655
The exit statement invokes all END actions in the order in which they
occur in the program source and then terminates the program without
658 reading further input. An exit statement inside an END action termi‐
659 nates the program without further execution of END actions. If an
660 expression is specified in an exit statement, its numeric value is the
661 exit status of nawk, unless subsequent errors are encountered or a sub‐
662 sequent exit statement with an expression is executed.
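
The next and exit statements can be sketched together. This illustration
assumes a POSIX awk is available as awk; the input lines and the exit
status 3 are hypothetical.

```shell
status=0
out=$(printf '# comment\nrun\nstop\nnever\n' | awk '
  /^#/          { next }            # abandon this record; later rules are skipped
  $0 == "stop"  { exit 3 }          # END actions still run, then awk terminates
                { print "saw " $0 }
  END           { print "end reached" }
') || status=$?
echo "$out"
echo "exit status: $status"
```

The record "never" is not read, and the exit status of awk is the value
given to exit.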
663
664 Output Statements
665 Both print and printf statements write to standard output by default.
666 The output is written to the location specified by output_redirection
667 if one is supplied, as follows:
668
> expression
>> expression
| expression
670
671
672
673 In all cases, the expression is evaluated to produce a string that is
674 used as a full pathname to write into (for > or >>) or as a command to
be executed (for |). Using the first two forms, if the file of that
name is not currently open, it is opened, creating it if necessary and,
using the first form, truncating the file. The output then is appended
to the file. As long as the file remains open, subsequent calls in
which expression evaluates to the same string value simply append
output to the file. The file remains open until the close function
is called with an expression that evaluates to the same string value.
682
683
684 The third form writes output onto a stream piped to the input of a com‐
685 mand. The stream is created if no stream is currently open with the
686 value of expression as its command name. The stream created is equiva‐
687 lent to one created by a call to the popen(3C) function with the value
688 of expression as the command argument and a value of w as the mode
argument. As long as the stream remains open, subsequent calls in
which expression evaluates to the same string value write output to
691 the existing stream. The stream remains open until the close function
692 is called with an expression that evaluates to the same string value.
693 At that time, the stream is closed as if by a call to the pclose func‐
694 tion.
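
File and pipe redirection can be sketched as follows. This assumes a
POSIX awk is available as awk and that mktemp and sort are present; the
variable name log_file and the sample records are hypothetical.

```shell
log=$(mktemp)
sorted=$(printf 'pear\napple\nfig\n' | awk -v log_file="$log" '
  { print $0 > log_file }     # opened and truncated once, then appended to
  { print $0 | "sort" }       # a single sort process receives every record
  END { close(log_file); close("sort") }
')
logged=$(cat "$log")
rm -f "$log"
echo "$sorted"
echo "$logged"
```

Although > is used for every record, the file keeps all three lines
because it stays open between print statements.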
695
696
These output statements take a comma-separated list of expressions,
referred to in the grammar by the non-terminal symbols expr_list,
699 print_expr_list or print_expr_list_opt. This list is referred to here
700 as the expression list, and each member is referred to as an expression
701 argument.
702
703
704 The print statement writes the value of each expression argument onto
705 the indicated output stream separated by the current output field sepa‐
706 rator (see variable OFS above), and terminated by the output record
separator (see variable ORS above). All expression arguments are taken
as strings, being converted if necessary, with the exception that the
printf format in OFMT is used instead of the value in CONVFMT. An empty
710 expression list stands for the whole input record ($0).
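
The roles of OFS and ORS in print can be sketched as follows, assuming a
POSIX awk is available as awk; the separator values are chosen only to be
visible.

```shell
printed=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" } {
  print $1, $3     # comma: arguments joined with OFS
  print $1 $3      # no comma: one concatenated argument
  print            # empty list: the whole record, $0
}')
echo "$printed"
```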
711
712
The printf statement produces output based on a notation similar to the
File Format Notation used to describe file formats in this document.
Output is produced as specified with the first expression argument as
the string format and subsequent expression arguments as the strings
arg1 to argn, inclusive, with the following exceptions:
718
719 1. The format is an actual character string rather than a
720 graphical representation. Therefore, it cannot contain empty
721 character positions. The space character in the format
722 string, in any context other than a flag of a conversion
723 specification, is treated as an ordinary character that is
724 copied to the output.
725
726 2. If the character set contains a Delta character and that
727 character appears in the format string, it is treated as an
728 ordinary character that is copied to the output.
729
3. The escape sequences beginning with a backslash character are
treated as sequences of ordinary characters that are copied
to the output. Note that these same sequences are interpreted
lexically by nawk when they appear in literal strings, but
they are not treated specially by the printf statement.
735
736 4. A field width or precision can be specified as the * charac‐
737 ter instead of a digit string. In this case the next argu‐
738 ment from the expression list is fetched and its numeric
739 value taken as the field width or precision.
740
741 5. The implementation does not precede or follow output from
742 the d or u conversion specifications with blank characters
743 not specified by the format string.
744
745 6. The implementation does not precede output from the o con‐
746 version specification with leading zeros not specified by
747 the format string.
748
749 7. For the c conversion specification: if the argument has a
750 numeric value, the character whose encoding is that value is
751 output. If the value is zero or is not the encoding of any
752 character in the character set, the behavior is undefined.
753 If the argument does not have a numeric value, the first
754 character of the string value is output; if the string does
755 not contain any characters the behavior is undefined.
756
757 8. For each conversion specification that consumes an argument,
758 the next expression argument is evaluated. With the excep‐
759 tion of the c conversion, the value is converted to the
760 appropriate type for the conversion specification.
761
762 9. If there are insufficient expression arguments to satisfy
763 all the conversion specifications in the format string, the
764 behavior is undefined.
765
766 10. If any character sequence in the format string begins with a
767 % character, but does not form a valid conversion specifica‐
768 tion, the behavior is unspecified.
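
A few of the rules above (the * width and precision, and the two behaviors of the c conversion) can be sketched as follows. The values are illustrative, the character code 65 assumes an ASCII-compatible character set, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    printf "[%*d]\n", 5, 42          # field width 5 fetched from the argument list
    printf "[%.*f]\n", 2, 3.14159    # precision 2 fetched from the argument list
    printf "[%c%c]\n", 65, "hello"   # numeric argument, then string argument
}'
```

This should print [   42], [3.14], and [Ah].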
769
770
771 Both print and printf can output at least {LINE_MAX} bytes.
772
773 Functions
774 The nawk language has a variety of built-in functions: arithmetic,
775 string, input/output and general.
776
777 Arithmetic Functions
778 The arithmetic functions, except for int, are based on the ISO C stan‐
779 dard. The behavior is undefined in cases where the ISO C standard spec‐
780 ifies that an error be returned or that the behavior is undefined.
781 Although the grammar permits built-in functions to appear with no argu‐
782 ments or parentheses, unless the argument or parentheses are indicated
783 as optional in the following list (by displaying them within the [ ]
784 brackets), such use is undefined.
785
786 atan2(y,x) Return arctangent of y/x.
787
788
789 cos(x) Return cosine of x, where x is in radians.
790
791
792 sin(x) Return sine of x, where x is in radians.
793
794
795 exp(x) Return the exponential function of x.
796
797
798 log(x) Return the natural logarithm of x.
799
800
801 sqrt(x) Return the square root of x.
802
803
804 int(x) Truncate its argument to an integer. It is truncated
805 toward 0 when x > 0.
806
807
808 rand() Return a random number n, such that 0 ≤ n < 1.
809
810
811 srand([expr]) Set the seed value for rand to expr or use the time of
812 day if expr is omitted. The previous seed value is
813 returned.
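
The arithmetic functions can be exercised with a short sketch such as the following. The input values and the seed are arbitrary assumptions, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    print int(3.9)                    # truncated toward zero: 3
    print sqrt(16), exp(0), log(1)    # 4 1 0
    printf "%.4f\n", atan2(0, -1)     # pi to four decimals: 3.1416
    srand(1); r = rand()              # fixed seed, then one random number
    print (r >= 0 && r < 1)           # rand stays in the range [0, 1): 1
}'
```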
814
815
816 String Functions
817 The string functions in the following list shall be supported. Although
818 the grammar permits built-in functions to appear with no arguments or
819 parentheses, unless the argument or parentheses are indicated as
820 optional in the following list (by displaying them within the [ ]
821 brackets), such use is undefined.
822
823 gsub(ere,repl[,in])
824
825 Behave like sub (see below), except that it replaces all occur‐
826 rences of the regular expression (like the ed utility global sub‐
827 stitute) in $0 or in the in argument, when specified.
828
829
830 index(s,t)
831
832 Return the position, in characters, numbering from 1, in string s
833 where string t first occurs, or zero if it does not occur at all.
834
835
836 length[([s])]
837
838 Return the length, in characters, of its argument taken as a
839 string, or of the whole record, $0, if there is no argument.
840
841
842 match(s,ere)
843
844 Return the position, in characters, numbering from 1, in string s
845 where the extended regular expression ere occurs, or zero if it
846 does not occur at all. RSTART is set to the starting position
847 (which is the same as the returned value), zero if no match is
848 found; RLENGTH is set to the length of the matched string, −1 if no
849 match is found.
850
851
852 split(s,a[,fs])
853
854 Split the string s into array elements a[1], a[2], ..., a[n], and
855 return n. The separation is done with the extended regular expres‐
856 sion fs or with the field separator FS if fs is not given. Each
array element has a string value when created. If the string
assigned to any array element, with any occurrence of the decimal-point
character from the current locale changed to a period character,
would be considered a numeric string, the array element also
has the numeric value of the numeric string. The effect of a null
862 string as the value of fs is unspecified.
863
864
865 sprintf(fmt,expr,expr,...)
866
867 Format the expressions according to the printf format given by fmt
868 and return the resulting string.
869
870
871 sub(ere,repl[,in])
872
873 Substitute the string repl in place of the first instance of the
extended regular expression ere in string in and return the number
875 of substitutions. An ampersand ( & ) appearing in the string repl
876 is replaced by the string from in that matches the regular expres‐
877 sion. An ampersand preceded with a backslash ( \ ) is interpreted
878 as the literal ampersand character. An occurrence of two consecu‐
879 tive backslashes is interpreted as just a single literal backslash
880 character. Any other occurrence of a backslash (for example, pre‐
881 ceding any other character) is treated as a literal backslash char‐
882 acter. If repl is a string literal, the handling of the ampersand
883 character occurs after any lexical processing, including any lexi‐
884 cal backslash escape sequence processing. If in is specified and it
885 is not an lvalue the behavior is undefined. If in is omitted, nawk
886 uses the current record ($0) in its place.
887
888
889 substr(s,m[,n])
890
891 Return the at most n-character substring of s that begins at posi‐
892 tion m, numbering from 1. If n is missing, the length of the sub‐
893 string is limited by the length of the string s.
894
895
896 tolower(s)
897
898 Return a string based on the string s. Each character in s that is
899 an upper-case letter specified to have a tolower mapping by the
900 LC_CTYPE category of the current locale is replaced in the returned
901 string by the lower-case letter specified by the mapping. Other
902 characters in s are unchanged in the returned string.
903
904
905 toupper(s)
906
907 Return a string based on the string s. Each character in s that is
908 a lower-case letter specified to have a toupper mapping by the
909 LC_CTYPE category of the current locale is replaced in the returned
910 string by the upper-case letter specified by the mapping. Other
911 characters in s are unchanged in the returned string.
912
913
914
915 All of the preceding functions that take ERE as a parameter expect a
916 pattern or a string valued expression that is a regular expression as
917 defined below.
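
The string functions above can be sketched together as follows. The sample strings are assumptions for illustration, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    s = "hello world"
    print index(s, "o"), length(s)         # 5 11
    p = match(s, /o+ w/)
    print p, RSTART, RLENGTH               # 5 5 3
    print substr(s, 7), substr(s, 1, 4)    # world hell
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]            # 3 a c
    t = s; c = gsub(/o/, "[&]", t)         # & stands for the matched text
    print c, t                             # 2 hell[o] w[o]rld
    u = s; sub(/world/, "\\&", u)          # backslash-ampersand is a literal &
    print u                                # hello &
    print toupper(s)                       # HELLO WORLD
}'
```

The copies t and u are used because sub and gsub modify their third argument in place.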
918
919 Input/Output and General Functions
920 The input/output and general functions are:
921
922 close(expression) Close the file or pipe opened by a print or
923 printf statement or a call to getline with
924 the same string-valued expression. If the
925 close was successful, the function returns
926 0; otherwise, it returns non-zero.
927
928
929 expression|getline[var] Read a record of input from a stream piped
930 from the output of a command. The stream is
931 created if no stream is currently open with
932 the value of expression as its command name.
933 The stream created is equivalent to one cre‐
934 ated by a call to the popen function with
935 the value of expression as the command argu‐
936 ment and a value of r as the mode argument.
As long as the stream remains open, subsequent
calls in which expression evaluates to
the same string value read subsequent
records from the stream. The stream remains
open until the close function is called with
an expression that evaluates to the same
string value. At that time, the stream is
closed as if by a call to the pclose function.
If var is missing, $0 and NF are set.
Otherwise, var is set.
947
948 The getline operator can form ambiguous con‐
949 structs when there are operators that are
950 not in parentheses (including concatenate)
951 to the left of the | (to the beginning of
952 the expression containing getline). In the
953 context of the $ operator, | behaves as if
954 it had a lower precedence than $. The result
of evaluating other operators is unspecified,
and portable applications must parenthesize
all such uses.
958
959
960 getline Set $0 to the next input record from the
961 current input file. This form of getline
962 sets the NF, NR, and FNR variables.
963
964
965 getline var Set variable var to the next input record
966 from the current input file. This form of
967 getline sets the FNR and NR variables.
968
969
970 getline [var] < expression Read the next record of input from a
971 named file. The expression is evaluated
972 to produce a string that is used as a
973 full pathname. If the file of that name
is not currently open, it is opened. As
long as the stream remains open, subsequent
calls in which expression evaluates
to the same string value read subsequent
records from the file. The file remains
open until the close function is called
with an expression that evaluates to the
same string value. If var is missing, $0
and NF are set. Otherwise, var is set.
983
984 The getline operator can form ambiguous
985 constructs when there are binary opera‐
986 tors that are not in parentheses (includ‐
987 ing concatenate) to the right of the <
(up to the end of the expression containing
the getline). The result of evaluating
such a construct is unspecified, and
portable applications must parenthesize
all such uses.
993
994
995 system(expression) Execute the command given by expression
996 in a manner equivalent to the system(3C)
997 function and return the exit status of
998 the command.
999
1000
1001
1002 All forms of getline return 1 for successful input, 0 for end of file,
1003 and −1 for an error.
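
The return values can be sketched with a read loop over a scratch file. The file created with mktemp, the echo command, and the path /nonexistent/file are assumptions for illustration, and awk stands for any POSIX awk such as nawk:

```shell
tmp=$(mktemp)                       # scratch input file
printf 'one\ntwo\n' > "$tmp"
awk -v f="$tmp" 'BEGIN {
    # getline yields 1 for each record read and 0 at end of file
    while ((getline line < f) > 0)
        n++
    close(f)
    print n, line                   # line still holds the last record read
    "echo piped" | getline cmdout   # one record from a command
    close("echo piped")
    print cmdout
    r = (getline x < "/nonexistent/file")
    print r                         # -1 because the file cannot be opened
}'
rm -f "$tmp"
```

This should print 2 two, then piped, then -1. The getline expressions are parenthesized to avoid the ambiguities described above.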
1004
1005
1006 Where strings are used as the name of a file or pipeline, the strings
1007 must be textually identical. The terminology ``same string value''
1008 implies that ``equivalent strings'', even those that differ only by
1009 space characters, represent different files.
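
The effect of textual identity can be sketched with two command strings that differ only by a space. The echo commands are assumptions for illustration, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    r1 = ("echo hi" | getline a)     # opens the stream and reads "hi"
    r2 = ("echo hi" | getline b)     # same string: same stream, now at end of file
    r3 = ("echo  hi" | getline c)    # two spaces: a different stream
    close("echo hi")
    r4 = ("echo hi" | getline d)     # after close, the command is run again
    print r1, r2, r3, r4, a, c, d
}'
```

This should print 1 0 1 1 hi hi hi.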
1010
1011 User-defined Functions
1012 The nawk language also provides user-defined functions. Such functions
1013 can be defined as:
1014
1015 function name(args,...) { statements }
1016
1017
1018
A function can be referred to anywhere in a nawk program; in particular,
its use can precede its definition. The scope of a function is
global.
1022
1023
1024 Function arguments can be either scalars or arrays; the behavior is
1025 undefined if an array name is passed as an argument that the function
1026 uses as a scalar, or if a scalar expression is passed as an argument
1027 that the function uses as an array. Function arguments are passed by
1028 value if scalar and by reference if array name. Argument names are
1029 local to the function; all other variable names are global. The same
name must not be used as both an argument name and as the name of a function
1031 or a special nawk variable. The same name must not be used both as a
1032 variable name with global scope and as the name of a function. The same
1033 name must not be used within the same scope both as a scalar variable
1034 and as an array.
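
The difference between scalar and array arguments can be sketched as follows. The function name bump and its variables are hypothetical, and awk stands for any POSIX awk such as nawk:

```shell
awk '
function bump(n, arr) {
    n++                  # changes only the local copy of the scalar
    arr["count"]++       # changes the array supplied by the caller
}
BEGIN {
    x = 1; a["count"] = 1
    bump(x, a)
    print x, a["count"]
}'
```

This should print 1 2: the scalar x is unchanged, while the array element was incremented.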
1035
1036
1037 The number of parameters in the function definition need not match the
1038 number of parameters in the function call. Excess formal parameters can
1039 be used as local variables. If fewer arguments are supplied in a func‐
1040 tion call than are in the function definition, the extra parameters
1041 that are used in the function body as scalars are initialized with a
1042 string value of the null string and a numeric value of zero, and the
1043 extra parameters that are used in the function body as arrays are ini‐
1044 tialized as empty arrays. If more arguments are supplied in a function
1045 call than are in the function definition, the behavior is undefined.
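
A common use of excess formal parameters is to obtain local variables, as in the following sketch. The function name total is hypothetical, the extra spaces in the parameter list are only a visual convention, and awk stands for any POSIX awk such as nawk:

```shell
awk '
# sum and i are extra formal parameters used as locals; callers pass only n.
function total(n,    sum, i) {
    for (i = 1; i <= n; i++)
        sum += i         # sum starts out as zero
    return sum
}
BEGIN {
    i = 99               # the global i is not touched by the call
    print total(4), i
}'
```

This should print 10 99.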
1046
1047
1048 When invoking a function, no white space can be placed between the
1049 function name and the opening parenthesis. Function calls can be nested
1050 and recursive calls can be made upon functions. Upon return from any
1051 nested or recursive function call, the values of all of the calling
1052 function's parameters are unchanged, except for array parameters passed
1053 by reference. The return statement can be used to return a value. If a
1054 return statement appears outside of a function definition, the behavior
1055 is undefined.
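
A recursive call with a return value can be sketched as follows. The function name fact is hypothetical, and awk stands for any POSIX awk such as nawk:

```shell
awk '
# Each call keeps its own n; the opening parenthesis follows the name directly.
function fact(n) {
    return (n <= 1) ? 1 : n * fact(n - 1)
}
BEGIN { print fact(5) }'
```

This should print 120.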
1056
1057
1058 In the function definition, newline characters are optional before the
1059 opening brace and after the closing brace. Function definitions can
1060 appear anywhere in the program where a pattern-action pair is allowed.
1061
1063 The index, length, match, and substr functions should not be confused
1064 with similar functions in the ISO C standard; the nawk versions deal
1065 with characters, while the ISO C standard deals with bytes.
1066
1067
1068 Because the concatenation operation is represented by adjacent expres‐
1069 sions rather than an explicit operator, it is often necessary to use
1070 parentheses to enforce the proper evaluation precedence.
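
The following sketch shows how parentheses change the result when concatenation and arithmetic are mixed. The variable names are illustrative, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    a = 1; b = 2
    print a b * 2        # multiplication binds tighter: "1" followed by 4
    y = (a b) * 2        # concatenate first, then multiply: 12 * 2
    print y
}'
```

This should print 14 and then 24.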
1071
1072
1073 See largefile(5) for the description of the behavior of nawk when
1074 encountering files greater than or equal to 2 Gbyte (2^31 bytes).
1075
1077 The nawk program specified in the command line is most easily specified
1078 within single-quotes (for example, 'program') for applications using
1079 sh, because nawk programs commonly contain characters that are special
1080 to the shell, including double-quotes. In the cases where a nawk pro‐
1081 gram contains single-quote characters, it is usually easiest to specify
1082 most of the program as strings within single-quotes concatenated by the
1083 shell with quoted single-quote characters. For example:
1084
1085 nawk '/'\''/ { print "quote:", $0 }'
1086
1087
1088
1089 prints all lines from the standard input containing a single-quote
1090 character, prefixed with quote:.
1091
1092
1093 The following are examples of simple nawk programs:
1094
1095 Example 1 Write to the standard output all input lines for which field
1096 3 is greater than 5:
1097
1098 $3 > 5
1099
1100
1101
1102 Example 2 Write every tenth line:
1103
1104 (NR % 10) == 0
1105
1106
1107
1108 Example 3 Write any line with a substring matching the regular expres‐
1109 sion:
1110
1111 /(G|D)(2[0-9][[:alpha:]]*)/
1112
1113
1114
1115 Example 4 Print any line with a substring containing a G or D, followed
1116 by a sequence of digits and characters:
1117
1118
1119 This example uses character classes digit and alpha to match language-
1120 independent digit and alphabetic characters, respectively.
1121
1122
1123 /(G|D)([[:digit:][:alpha:]]*)/
1124
1125
1126
1127 Example 5 Write any line in which the second field matches the regular
1128 expression and the fourth field does not:
1129
1130 $2 ~ /xyz/ && $4 !~ /xyz/
1131
1132
1133
1134 Example 6 Write any line in which the second field contains a back‐
1135 slash:
1136
1137 $2 ~ /\\/
1138
1139
1140
1141 Example 7 Write any line in which the second field contains a backslash
1142 (alternate method):
1143
1144
1145 Notice that backslash escapes are interpreted twice, once in lexical
1146 processing of the string and once in processing the regular expression.
1147
1148
1149 $2 ~ "\\\\"
1150
1151
1152
1153 Example 8 Write the second to the last and the last field in each line,
1154 separating the fields by a colon:
1155
1156 {OFS=":";print $(NF-1), $NF}
1157
1158
1159
1160 Example 9 Write the line number and number of fields in each line:
1161
1162
1163 The three strings representing the line number, the colon and the num‐
1164 ber of fields are concatenated and that string is written to standard
1165 output.
1166
1167
1168 {print NR ":" NF}
1169
1170
1171
1172 Example 10 Write lines longer than 72 characters:
1173
    length($0) > 72
1175
1176
1177
1178 Example 11 Write first two fields in opposite order separated by the
1179 OFS:
1180
1181 { print $2, $1 }
1182
1183
1184
1185 Example 12 Same, with input fields separated by comma or space and tab
1186 characters, or both:
1187
    BEGIN { FS = ",[ \t]*|[ \t]+" }
1189 { print $2, $1 }
1190
1191
1192
1193 Example 13 Add up first column, print sum and average:
1194
1195 {s += $1 }
1196 END {print "sum is ", s, " average is", s/NR}
1197
1198
1199
1200 Example 14 Write fields in reverse order, one per line (many lines out
1201 for each line in):
1202
1203 { for (i = NF; i > 0; --i) print $i }
1204
1205
1206
1207 Example 15 Write all lines between occurrences of the strings "start"
1208 and "stop":
1209
1210 /start/, /stop/
1211
1212
1213
1214 Example 16 Write all lines whose first field is different from the pre‐
1215 vious one:
1216
1217 $1 != prev { print; prev = $1 }
1218
1219
1220
1221 Example 17 Simulate the echo command:
1222
1223 BEGIN {
1224 for (i = 1; i < ARGC; ++i)
        printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
1226 }
1227
1228
1229
1230 Example 18 Write the path prefixes contained in the PATH environment
1231 variable, one per line:
1232
1233 BEGIN {
1234 n = split (ENVIRON["PATH"], path, ":")
1235 for (i = 1; i <= n; ++i)
1236 print path[i]
1237 }
1238
1239
1240
1241 Example 19 Print the file "input", filling in page numbers starting at
1242 5:
1243
1244
1245 If there is a file named input containing page headers of the form
1246
1247
1248 Page#
1249
1250
1251
1252 and a file named program that contains
1253
1254
1255 /Page/{ $2 = n++; }
1256 { print }
1257
1258
1259
1260 then the command line
1261
1262
1263 nawk -f program n=5 input
1264
1265
1266
1267
1268 prints the file input, filling in page numbers starting at 5.
1269
1270
1272 See environ(5) for descriptions of the following environment variables
1273 that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.
1274
1275 LC_NUMERIC Determine the radix character used when interpreting
1276 numeric input, performing conversions between numeric and
1277 string values and formatting numeric output. Regardless
1278 of locale, the period character (the decimal-point char‐
1279 acter of the POSIX locale) is the decimal-point character
1280 recognized in processing awk programs (including assign‐
1281 ments in command-line arguments).
1282
1283
1285 The following exit values are returned:
1286
1287 0 All input files were processed successfully.
1288
1289
1290 >0 An error occurred.
1291
1292
1293
1294 The exit status can be altered within the program by using an exit
1295 expression.
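
For instance, the following sketch sets the exit status from within the program. The value 3 is arbitrary, and awk stands for any POSIX awk such as nawk:

```shell
status=0
awk 'BEGIN { exit 3 }' || status=$?    # capture the non-zero exit status
echo "exit status: $status"
```

This should print exit status: 3.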
1296
1298 See attributes(5) for descriptions of the following attributes:
1299
1300 /usr/bin/nawk
1301 ┌─────────────────────────────┬─────────────────────────────┐
1302 │ ATTRIBUTE TYPE │ ATTRIBUTE VALUE │
1303 ├─────────────────────────────┼─────────────────────────────┤
1304 │Availability │SUNWcsu │
1305 └─────────────────────────────┴─────────────────────────────┘
1306
1307 /usr/xpg4/bin/awk
1308 ┌─────────────────────────────┬─────────────────────────────┐
1309 │ ATTRIBUTE TYPE │ ATTRIBUTE VALUE │
1310 ├─────────────────────────────┼─────────────────────────────┤
1311 │Availability │SUNWxcu4 │
1312 └─────────────────────────────┴─────────────────────────────┘
1313
1315 awk(1), ed(1), egrep(1), grep(1), lex(1), sed(1), popen(3C),
1316 printf(3C), system(3C), attributes(5), environ(5), largefile(5),
1317 regex(5), XPG4(5)
1318
1319
1320 Aho, A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming
1321 Language, Addison-Wesley, 1988.
1322
If any file operand is specified and the named file cannot be accessed,
nawk writes a diagnostic message to standard error and terminates
without any further action.
1327
1328
1329 If the program specified by either the program operand or a progfile
1330 operand is not a valid nawk program (as specified in EXTENDED DESCRIP‐
1331 TION), the behavior is undefined.
1332
1334 Input white space is not preserved on output if fields are involved.
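
This can be seen in a sketch that assigns a field to itself, which forces $0 to be rebuilt with OFS. The sample input is an assumption, and awk stands for any POSIX awk such as nawk:

```shell
# The first print shows the record as read; after the assignment the
# fields are rejoined with the default OFS, a single space.
printf 'a   b\tc\n' | awk '{ print; $1 = $1; print }'
```

The second line of output should be a b c, with the original run of spaces and the tab replaced.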
1335
1336
1337 There are no explicit conversions between numbers and strings. To force
1338 an expression to be treated as a number add 0 to it; to force it to be
1339 treated as a string concatenate the null string ("") to it.
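
Both idioms can be sketched with comparisons, where the distinction is visible. The values are arbitrary, and awk stands for any POSIX awk such as nawk:

```shell
awk 'BEGIN {
    a = "10"; b = "9"               # string values
    r1 = (a < b)                    # compared as strings: 1
    r2 = (a + 0 < b + 0)            # adding 0 forces numbers: 0
    n = 10; m = 9                   # numeric values
    r3 = (n "" < m "")              # appending "" forces strings: 1
    print r1, r2, r3
}'
```

This should print 1 0 1.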
1340
1341
1342
1343SunOS 5.11 24 May 2006 nawk(1)