1GAWK(1) Utility Commands GAWK(1)
2
3
4
6 gawk - pattern scanning and processing language
7
9 gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
10 gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
11
13 Gawk is the GNU Project's implementation of the AWK programming lan‐
14 guage. It conforms to the definition of the language in the POSIX
15 1003.1 standard. This version in turn is based on the description in
16 The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk
17 provides the additional features found in the current version of Brian
18 Kernighan's awk and numerous GNU-specific extensions.
19
20 The command line consists of options to gawk itself, the AWK program
21 text (if not supplied via the -f or --include options), and values to
22 be made available in the ARGC and ARGV pre-defined AWK variables.
23
24 When gawk is invoked with the --profile option, it starts gathering
25 profiling statistics from the execution of the program. Gawk runs more
26 slowly in this mode, and automatically produces an execution profile in
27 the file awkprof.out when done. See the --profile option, below.
28
29 Gawk also has an integrated debugger. An interactive debugging session
30 can be started by supplying the --debug option to the command line. In
31 this mode of execution, gawk loads the AWK source code and then prompts
32 for debugging commands. Gawk can only debug AWK program source pro‐
33 vided with the -f and --include options. The debugger is documented in
34 GAWK: Effective AWK Programming.
35
37 Gawk options may be either traditional POSIX-style one letter options,
38 or GNU-style long options. POSIX options start with a single “-”,
39 while long options start with “--”. Long options are provided for both
40 GNU-specific features and for POSIX-mandated features.
41
42 Gawk-specific options are typically used in long-option form. Argu‐
43 ments to long options are either joined with the option by an = sign,
44 with no intervening spaces, or they may be provided in the next command
45 line argument. Long options may be abbreviated, as long as the abbre‐
46 viation remains unique.
47
48 Additionally, every long option has a corresponding short option, so
49 that the option's functionality may be used from within #! executable
50 scripts.
51
53 Gawk accepts the following options. Standard options are listed first,
54 followed by options for gawk extensions, listed alphabetically by short
55 option.
56
57 -f program-file
58 --file program-file
59 Read the AWK program source from the file program-file, instead
60 of from the first command line argument. Multiple -f (or
61 --file) options may be used. Files read with -f are treated as
62 if they begin with an implicit @namespace "awk" statement.
63
64 -F fs
65 --field-separator fs
66 Use fs for the input field separator (the value of the FS prede‐
67 fined variable).
68
69 -v var=val
70 --assign var=val
71 Assign the value val to the variable var, before execution of
72 the program begins. Such variable values are available to the
73 BEGIN rule of an AWK program.
74
75 -b
76 --characters-as-bytes
77 Treat all input data as single-byte characters. In other words,
78 don't pay any attention to the locale information when attempt‐
79 ing to process strings as multibyte characters. The --posix
80 option overrides this one.
81
82 -c
83 --traditional
84 Run in compatibility mode. In compatibility mode, gawk behaves
85 identically to Brian Kernighan's awk; none of the GNU-specific
86 extensions are recognized. See GNU EXTENSIONS, below, for more
87 information.
88
89 -C
90 --copyright
91 Print the short version of the GNU copyright information message
92 on the standard output and exit successfully.
93
94 -d[file]
95 --dump-variables[=file]
96 Print a sorted list of global variables, their types and final
97 values to file. If no file is provided, gawk uses a file named
98 awkvars.out in the current directory.
99 Having a list of all the global variables is a good way to look
100 for typographical errors in your programs. You would also use
101 this option if you have a large program with a lot of functions,
102 and you want to be sure that your functions don't inadvertently
103 use global variables that you meant to be local. (This is a
104 particularly easy mistake to make with simple variable names
105 like i, j, and so on.)
106
107 -D[file]
108 --debug[=file]
109 Enable debugging of AWK programs. By default, the debugger
110 reads commands interactively from the keyboard (standard input).
111 The optional file argument specifies a file with a list of com‐
112 mands for the debugger to execute non-interactively.
113
114 -e program-text
115 --source program-text
116 Use program-text as AWK program source code. This option allows
117 the easy intermixing of library functions (used via the -f and
118 --include options) with source code entered on the command line.
119 It is intended primarily for medium to large AWK programs used
120 in shell scripts. Each argument supplied via -e is treated as
121 if it begins with an implicit @namespace "awk" statement.
122
123 -E file
124 --exec file
125 Similar to -f, however, this is option is the last one pro‐
126 cessed. This should be used with #! scripts, particularly for
127 CGI applications, to avoid passing in options or source code (!)
128 on the command line from a URL. This option disables command-
129 line variable assignments.
130
131 -g
132 --gen-pot
133 Scan and parse the AWK program, and generate a GNU .pot (Porta‐
134 ble Object Template) format file on standard output with entries
135 for all localizable strings in the program. The program itself
136 is not executed. See the GNU gettext distribution for more
137 information on .pot files.
138
139 -h
140 --help Print a relatively short summary of the available options on the
141 standard output. (Per the GNU Coding Standards, these options
142 cause an immediate, successful exit.)
143
144 -i include-file
145 --include include-file
146 Load an awk source library. This searches for the library using
147 the AWKPATH environment variable. If the initial search fails,
148 another attempt will be made after appending the .awk suffix.
149 The file will be loaded only once (i.e., duplicates are elimi‐
150 nated), and the code does not constitute the main program
151 source. Files read with --include are treated as if they begin
152 with an implicit @namespace "awk" statement.
153
154 -l lib
155 --load lib
156 Load a gawk extension from the shared library lib. This
157 searches for the library using the AWKLIBPATH environment vari‐
158 able. If the initial search fails, another attempt will be made
159 after appending the default shared library suffix for the plat‐
160 form. The library initialization routine is expected to be
161 named dl_load().
162
163 -L [value]
164 --lint[=value]
165 Provide warnings about constructs that are dubious or non-porta‐
166 ble to other AWK implementations. With an optional argument of
167 fatal, lint warnings become fatal errors. This may be drastic,
168 but its use will certainly encourage the development of cleaner
169 AWK programs. With an optional argument of invalid, only warn‐
170 ings about things that are actually invalid are issued. (This is
171 not fully implemented yet.) With an optional argument of no-
172 ext, warnings about gawk extensions are disabled.
173
174 -M
175 --bignum
176 Force arbitrary precision arithmetic on numbers. This option has
177 no effect if gawk is not compiled to use the GNU MPFR and GMP
178 libraries. (In such a case, gawk issues a warning.)
179
180 -n
181 --non-decimal-data
182 Recognize octal and hexadecimal values in input data. Use this
183 option with great caution!
184
185 -N
186 --use-lc-numeric
187 Force gawk to use the locale's decimal point character when
188 parsing input data. Although the POSIX standard requires this
189 behavior, and gawk does so when --posix is in effect, the
190 default is to follow traditional behavior and use a period as
191 the decimal point, even in locales where the period is not the
192 decimal point character. This option overrides the default
193 behavior, without the full draconian strictness of the --posix
194 option.
195
196 -o[file]
197 --pretty-print[=file]
198 Output a pretty printed version of the program to file. If no
199 file is provided, gawk uses a file named awkprof.out in the cur‐
200 rent directory. This option implies --no-optimize.
201
202 -O
203 --optimize
204 Enable gawk's default optimizations upon the internal represen‐
205 tation of the program. Currently, this just includes simple
206 constant folding. This option is on by default.
207
208 -p[prof-file]
209 --profile[=prof-file]
210 Start a profiling session, and send the profiling data to prof-
211 file. The default is awkprof.out. The profile contains execu‐
212 tion counts of each statement in the program in the left margin
213 and function call counts for each user-defined function. This
214 option implies --no-optimize.
215
216 -P
217 --posix
218 This turns on compatibility mode, with the following additional
219 restrictions:
220
221 · \x escape sequences are not recognized.
222
223 · You cannot continue lines after ? and :.
224
225 · The synonym func for the keyword function is not recognized.
226
227 · The operators ** and **= cannot be used in place of ^ and ^=.
228
229 -r
230 --re-interval
231 Enable the use of interval expressions in regular expression
232 matching (see Regular Expressions, below). Interval expressions
233 were not traditionally available in the AWK language. The POSIX
234 standard added them, to make awk and egrep consistent with each
235 other. They are enabled by default, but this option remains for
236 use together with --traditional.
237
238 -s
239 --no-optimize
240 Disable gawk's default optimizations upon the internal represen‐
241 tation of the program.
242
243 -S
244 --sandbox
245 Run gawk in sandbox mode, disabling the system() function, input
246 redirection with getline, output redirection with print and
247 printf, and loading dynamic extensions. Command execution
248 (through pipelines) is also disabled. This effectively blocks a
249 script from accessing local resources, except for the files
250 specified on the command line.
251
252 -t
253 --lint-old
254 Provide warnings about constructs that are not portable to the
255 original version of UNIX awk.
256
257 -V
258 --version
259 Print version information for this particular copy of gawk on
260 the standard output. This is useful mainly for knowing if the
261 current copy of gawk on your system is up to date with respect
262 to whatever the Free Software Foundation is distributing. This
263 is also useful when reporting bugs. (Per the GNU Coding Stan‐
264 dards, these options cause an immediate, successful exit.)
265
266 -- Signal the end of options. This is useful to allow further argu‐
267 ments to the AWK program itself to start with a “-”. This pro‐
268 vides consistency with the argument parsing convention used by
269 most other POSIX programs.
270
271 In compatibility mode, any other options are flagged as invalid, but
272 are otherwise ignored. In normal operation, as long as program text
273 has been supplied, unknown options are passed on to the AWK program in
274 the ARGV array for processing. This is particularly useful for running
275 AWK programs via the #! executable interpreter mechanism.
276
277 For POSIX compatibility, the -W option may be used, followed by the
278 name of a long option.
279
281 An AWK program consists of a sequence of optional directives, pattern-
282 action statements, and optional function definitions.
283
284 @include "filename"
285 @load "filename"
286 @namespace "name"
287 pattern { action statements }
288 function name(parameter list) { statements }
289
290 Gawk first reads the program source from the program-file(s) if speci‐
291 fied, from arguments to --source, or from the first non-option argument
292 on the command line. The -f and --source options may be used multiple
293 times on the command line. Gawk reads the program text as if all the
294 program-files and command line source texts had been concatenated
295 together. This is useful for building libraries of AWK functions,
296 without having to include them in each new AWK program that uses them.
297 It also provides the ability to mix library functions with command line
298 programs.
299
300 In addition, lines beginning with @include may be used to include other
301 source files into your program, making library use even easier. This
302 is equivalent to using the --include option.
303
304 Lines beginning with @load may be used to load extension functions into
305 your program. This is equivalent to using the --load option.
306
307 The environment variable AWKPATH specifies a search path to use when
308 finding source files named with the -f and --include options. If this
309 variable does not exist, the default path is ".:/usr/local/share/awk".
310 (The actual directory may vary, depending upon how gawk was built and
311 installed.) If a file name given to the -f option contains a “/” char‐
312 acter, no path search is performed.
313
314 The environment variable AWKLIBPATH specifies a search path to use when
315 finding source files named with the --load option. If this variable
316 does not exist, the default path is "/usr/local/lib/gawk". (The actual
317 directory may vary, depending upon how gawk was built and installed.)
318
319 Gawk executes AWK programs in the following order. First, all variable
320 assignments specified via the -v option are performed. Next, gawk com‐
321 piles the program into an internal form. Then, gawk executes the code
322 in the BEGIN rule(s) (if any), and then proceeds to read each file
323 named in the ARGV array (up to ARGV[ARGC-1]). If there are no files
324 named on the command line, gawk reads the standard input.
325
326 If a filename on the command line has the form var=val it is treated as
327 a variable assignment. The variable var will be assigned the value
328 val. (This happens after any BEGIN rule(s) have been run.) Command
329 line variable assignment is most useful for dynamically assigning val‐
330 ues to the variables AWK uses to control how input is broken into
331 fields and records. It is also useful for controlling state if multi‐
332 ple passes are needed over a single data file.
333
334 If the value of a particular element of ARGV is empty (""), gawk skips
335 over it.
336
337 For each input file, if a BEGINFILE rule exists, gawk executes the
338 associated code before processing the contents of the file. Similarly,
339 gawk executes the code associated with ENDFILE after processing the
340 file.
341
342 For each record in the input, gawk tests to see if it matches any pat‐
343 tern in the AWK program. For each pattern that the record matches,
344 gawk executes the associated action. The patterns are tested in the
345 order they occur in the program.
346
347 Finally, after all the input is exhausted, gawk executes the code in
348 the END rule(s) (if any).
349
350 Command Line Directories
351 According to POSIX, files named on the awk command line must be text
352 files. The behavior is ``undefined'' if they are not. Most versions
353 of awk treat a directory on the command line as a fatal error.
354
355 Starting with version 4.0 of gawk, a directory on the command line pro‐
356 duces a warning, but is otherwise skipped. If either of the --posix or
357 --traditional options is given, then gawk reverts to treating directo‐
358 ries on the command line as a fatal error.
359
361 AWK variables are dynamic; they come into existence when they are first
362 used. Their values are either floating-point numbers or strings, or
363 both, depending upon how they are used. Additionally, gawk allows
364 variables to have regular-expression type. AWK also has one dimen‐
365 sional arrays; arrays with multiple dimensions may be simulated. Gawk
366 provides true arrays of arrays; see Arrays, below. Several pre-defined
367 variables are set as a program runs; these are described as needed and
368 summarized below.
369
370 Records
371 Normally, records are separated by newline characters. You can control
372 how records are separated by assigning values to the built-in variable
373 RS. If RS is any single character, that character separates records.
374 Otherwise, RS is a regular expression. Text in the input that matches
375 this regular expression separates the record. However, in compatibil‐
376 ity mode, only the first character of its string value is used for sep‐
377 arating records. If RS is set to the null string, then records are
378 separated by empty lines. When RS is set to the null string, the new‐
379 line character always acts as a field separator, in addition to what‐
380 ever value FS may have.
381
382 Fields
383 As each input record is read, gawk splits the record into fields, using
384 the value of the FS variable as the field separator. If FS is a single
385 character, fields are separated by that character. If FS is the null
386 string, then each individual character becomes a separate field. Oth‐
387 erwise, FS is expected to be a full regular expression. In the special
388 case that FS is a single space, fields are separated by runs of spaces
389 and/or tabs and/or newlines. NOTE: The value of IGNORECASE (see below)
390 also affects how fields are split when FS is a regular expression, and
391 how records are separated when RS is a regular expression.
392
393 If the FIELDWIDTHS variable is set to a space-separated list of num‐
394 bers, each field is expected to have fixed width, and gawk splits up
395 the record using the specified widths. Each field width may optionally
396 be preceded by a colon-separated value specifying the number of charac‐
397 ters to skip before the field starts. The value of FS is ignored.
398 Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.
399
400 Similarly, if the FPAT variable is set to a string representing a regu‐
401 lar expression, each field is made up of text that matches that regular
402 expression. In this case, the regular expression describes the fields
403 themselves, instead of the text that separates the fields. Assigning a
404 new value to FS or FIELDWIDTHS overrides the use of FPAT.
405
406 Each field in the input record may be referenced by its position: $1,
407 $2, and so on. $0 is the whole record, including leading and trailing
408 whitespace. Fields need not be referenced by constants:
409
410 n = 5
411 print $n
412
413 prints the fifth field in the input record.
414
415 The variable NF is set to the total number of fields in the input
416 record.
417
418 References to non-existent fields (i.e., fields after $NF) produce the
419 null string. However, assigning to a non-existent field (e.g., $(NF+2)
420 = 5) increases the value of NF, creates any intervening fields with the
421 null string as their values, and causes the value of $0 to be recom‐
422 puted, with the fields being separated by the value of OFS. References
423 to negative numbered fields cause a fatal error. Decrementing NF
424 causes the values of fields past the new value to be lost, and the
425 value of $0 to be recomputed, with the fields being separated by the
426 value of OFS.
427
428 Assigning a value to an existing field causes the whole record to be
429 rebuilt when $0 is referenced. Similarly, assigning a value to $0
430 causes the record to be resplit, creating new values for the fields.
431
432 Built-in Variables
433 Gawk's built-in variables are:
434
435 ARGC The number of command line arguments (does not include
436 options to gawk, or the program source).
437
438 ARGIND The index in ARGV of the current file being processed.
439
440 ARGV Array of command line arguments. The array is indexed from
441 0 to ARGC - 1. Dynamically changing the contents of ARGV
442 can control the files used for data.
443
444 BINMODE On non-POSIX systems, specifies use of “binary” mode for
445 all file I/O. Numeric values of 1, 2, or 3, specify that
446 input files, output files, or all files, respectively,
447 should use binary I/O. String values of "r", or "w" spec‐
448 ify that input files, or output files, respectively, should
449 use binary I/O. String values of "rw" or "wr" specify that
450 all files should use binary I/O. Any other string value is
451 treated as "rw", but generates a warning message.
452
453 CONVFMT The conversion format for numbers, "%.6g", by default.
454
455 ENVIRON An array containing the values of the current environment.
456 The array is indexed by the environment variables, each
457 element being the value of that variable (e.g., ENVI‐
458 RON["HOME"] might be "/home/arnold").
459
460 In POSIX mode, changing this array does not affect the
461 environment seen by programs which gawk spawns via redi‐
462 rection or the system() function. Otherwise, gawk updates
463 its real environment so that programs it spawns see the
464 changes.
465
466 ERRNO If a system error occurs either doing a redirection for
467 getline, during a read for getline, or during a close(),
468 then ERRNO is set to a string describing the error. The
469 value is subject to translation in non-English locales. If
470 the string in ERRNO corresponds to a system error in the
471 errno(3) variable, then the numeric value can be found in
472 PROCINFO["errno"]. For non-system errors,
473 PROCINFO["errno"] will be zero.
474
475 FIELDWIDTHS A whitespace-separated list of field widths. When set,
476 gawk parses the input into fields of fixed width, instead
477 of using the value of the FS variable as the field separa‐
478 tor. Each field width may optionally be preceded by a
479 colon-separated value specifying the number of characters
480 to skip before the field starts. See Fields, above.
481
482 FILENAME The name of the current input file. If no files are speci‐
483 fied on the command line, the value of FILENAME is “-”.
484 However, FILENAME is undefined inside the BEGIN rule
485 (unless set by getline).
486
487 FNR The input record number in the current input file.
488
489 FPAT A regular expression describing the contents of the fields
490 in a record. When set, gawk parses the input into fields,
491 where the fields match the regular expression, instead of
492 using the value of FS as the field separator. See Fields,
493 above.
494
495 FS The input field separator, a space by default. See Fields,
496 above.
497
498 FUNCTAB An array whose indices and corresponding values are the
499 names of all the user-defined or extension functions in the
500 program. NOTE: You may not use the delete statement with
501 the FUNCTAB array.
502
503 IGNORECASE Controls the case-sensitivity of all regular expression and
504 string operations. If IGNORECASE has a non-zero value,
505 then string comparisons and pattern matching in rules,
506 field splitting with FS and FPAT, record separating with
507 RS, regular expression matching with ~ and !~, and the gen‐
508 sub(), gsub(), index(), match(), patsplit(), split(), and
509 sub() built-in functions all ignore case when doing regular
510 expression operations. NOTE: Array subscripting is not
511 affected. However, the asort() and asorti() functions are
512 affected.
513 Thus, if IGNORECASE is not equal to zero, /aB/ matches all
514 of the strings "ab", "aB", "Ab", and "AB". As with all AWK
515 variables, the initial value of IGNORECASE is zero, so all
516 regular expression and string operations are normally case-
517 sensitive.
518
519 LINT Provides dynamic control of the --lint option from within
520 an AWK program. When true, gawk prints lint warnings. When
521 false, it does not. When assigned the string value
522 "fatal", lint warnings become fatal errors, exactly like
523 --lint=fatal. Any other true value just prints warnings.
524
525 NF The number of fields in the current input record.
526
527 NR The total number of input records seen so far.
528
529 OFMT The output format for numbers, "%.6g", by default.
530
531 OFS The output field separator, a space by default.
532
533 ORS The output record separator, by default a newline.
534
535 PREC The working precision of arbitrary precision floating-point
536 numbers, 53 by default.
537
538 PROCINFO The elements of this array provide access to information
539 about the running AWK program. On some systems, there may
540 be elements in the array, "group1" through "groupn" for
541 some n, which is the number of supplementary groups that
542 the process has. Use the in operator to test for these
543 elements. The following elements are guaranteed to be
544 available:
545
546 PROCINFO["argv"] The command line arguments as received
547 by gawk at the C-language level. The
548 subscripts start from zero.
549
550 PROCINFO["egid"] The value of the getegid(2) system
551 call.
552
553 PROCINFO["errno"] The value of errno(3) when ERRNO is
554 set to the associated error message.
555
556 PROCINFO["euid"] The value of the geteuid(2) system
557 call.
558
559 PROCINFO["FS"] "FS" if field splitting with FS is in
560 effect, "FPAT" if field splitting with
561 FPAT is in effect, "FIELDWIDTHS" if
562 field splitting with FIELDWIDTHS is in
563 effect, or "API" if API input parser
564 field splitting is in effect.
565
566 PROCINFO["gid"] The value of the getgid(2) system
567 call.
568
569 PROCINFO["identifiers"]
570 A subarray, indexed by the names of
571 all identifiers used in the text of
572 the AWK program. The values indicate
573 what gawk knows about the identifiers
574 after it has finished parsing the pro‐
575 gram; they are not updated while the
576 program runs. For each identifier,
577 the value of the element is one of the
578 following:
579
580 "array" The identifier is an
581 array.
582
583 "builtin" The identifier is a built-
584 in function.
585
586 "extension" The identifier is an
587 extension function loaded
588 via @load or --load.
589
590 "scalar" The identifier is a
591 scalar.
592
593 "untyped" The identifier is untyped
594 (could be used as a scalar
595 or array, gawk doesn't
596 know yet).
597
598 "user" The identifier is a user-
599 defined function.
600
601 PROCINFO["pgrpid"] The value of the getpgrp(2) system
602 call.
603
604 PROCINFO["pid"] The value of the getpid(2) system
605 call.
606
607 PROCINFO["platform"] A string indicating the platform for
608 which gawk was compiled. It is one
609 of:
610
611 "djgpp", "mingw"
612 Microsoft Windows, using either
613 DJGPP, or MinGW, respectively.
614
615 "os2" OS/2.
616
617 "posix"
618 GNU/Linux, Cygwin, Mac OS X,
619 and legacy Unix systems.
620
621 "vms" OpenVMS or Vax/VMS.
622
623 PROCINFO["ppid"] The value of the getppid(2) system
624 call.
625
626 PROCINFO["strftime"] The default time format string for
627 strftime(). Changing its value
628 affects how strftime() formats time
629 values when called with no arguments.
630
631 PROCINFO["uid"] The value of the getuid(2) system
632 call.
633
634 PROCINFO["version"] The version of gawk.
635
636 The following elements are present if loading dynamic
637 extensions is available:
638
639 PROCINFO["api_major"]
640 The major version of the extension API.
641
642 PROCINFO["api_minor"]
643 The minor version of the extension API.
644
645 The following elements are available if MPFR support is
646 compiled into gawk:
647
648 PROCINFO["gmp_version"]
649 The version of the GNU GMP library used for arbi‐
650 trary precision number support in gawk.
651
652 PROCINFO["mpfr_version"]
653 The version of the GNU MPFR library used for arbi‐
654 trary precision number support in gawk.
655
656 PROCINFO["prec_max"]
657 The maximum precision supported by the GNU MPFR
658 library for arbitrary precision floating-point num‐
659 bers.
660
661 PROCINFO["prec_min"]
662 The minimum precision allowed by the GNU MPFR
663 library for arbitrary precision floating-point num‐
664 bers.
665
666 The following elements may set by a program to change
667 gawk's behavior:
668
669 PROCINFO["NONFATAL"]
670 If this exists, then I/O errors for all redirections
671 become nonfatal.
672
673 PROCINFO["name", "NONFATAL"]
674 Make I/O errors for name be nonfatal.
675
676 PROCINFO["command", "pty"]
677 Use a pseudo-tty for two-way communication with com‐
678 mand instead of setting up two one-way pipes.
679
680 PROCINFO["input", "READ_TIMEOUT"]
681 The timeout in milliseconds for reading data from
682 input, where input is a redirection string or a
683 filename. A value of zero or less than zero means no
684 timeout.
685
686 PROCINFO["input", "RETRY"]
687 If an I/O error that may be retried occurs when
688 reading data from input, and this array entry
689 exists, then getline returns -2 instead of following
690 the default behavior of returning -1 and configuring
691 input to return no further data. An I/O error that
692 may be retried is one where errno(3) has the value
693 EAGAIN, EWOULDBLOCK, EINTR, or ETIMEDOUT. This may
694 be useful in conjunction with PROCINFO["input",
695 "READ_TIMEOUT"] or in situations where a file
696 descriptor has been configured to behave in a non-
697 blocking fashion.
698
699 PROCINFO["sorted_in"]
700 If this element exists in PROCINFO, then its value
701 controls the order in which array elements are tra‐
702 versed in for loops. Supported values are
703 "@ind_str_asc", "@ind_num_asc", "@val_type_asc",
704 "@val_str_asc", "@val_num_asc", "@ind_str_desc",
705 "@ind_num_desc", "@val_type_desc", "@val_str_desc",
706 "@val_num_desc", and "@unsorted". The value can
707 also be the name (as a string) of any comparison
708 function defined as follows:
709
710 function cmp_func(i1, v1, i2, v2)
711
712 where i1 and i2 are the indices, and v1 and v2 are
713 the corresponding values of the two elements being
714 compared. It should return a number less than,
715 equal to, or greater than 0, depending on how the
716 elements of the array are to be ordered.
717
718 ROUNDMODE The rounding mode to use for arbitrary precision arithmetic
719 on numbers, by default "N" (IEEE-754 roundTiesToEven mode).
720 The accepted values are:
721
722 "A" or "a"
723 for rounding away from zero. These are only avail‐
724 able if your version of the GNU MPFR library sup‐
725 ports rounding away from zero.
726
727 "D" or "d" for roundTowardNegative.
728
729 "N" or "n" for roundTiesToEven.
730
731 "U" or "u" for roundTowardPositive.
732
733 "Z" or "z" for roundTowardZero.
734
735 RS The input record separator, by default a newline.
736
737 RT The record terminator. Gawk sets RT to the input text that
738 matched the character or regular expression specified by
739 RS.
740
741 RSTART The index of the first character matched by match(); 0 if
742 no match. (This implies that character indices start at
743 one.)
744
745 RLENGTH The length of the string matched by match(); -1 if no
746 match.
747
748 SUBSEP The string used to separate multiple subscripts in array
749 elements, by default "\034".
750
751 SYMTAB An array whose indices are the names of all currently
752 defined global variables and arrays in the program. The
753 array may be used for indirect access to read or write the
754 value of a variable:
755
756 foo = 5
757 SYMTAB["foo"] = 4
758 print foo # prints 4
759
760 The typeof() function may be used to test if an element in
761 SYMTAB is an array. You may not use the delete statement
762 with the SYMTAB array, nor assign to elements with an index
763 that is not a variable name.
764
765 TEXTDOMAIN The text domain of the AWK program; used to find the local‐
766 ized translations for the program's strings.
767
768 Arrays
769 Arrays are subscripted with an expression between square brackets ([
770 and ]). If the expression is an expression list (expr, expr ...) then
771 the array subscript is a string consisting of the concatenation of the
772 (string) value of each expression, separated by the value of the SUBSEP
773 variable. This facility is used to simulate multiply dimensioned
774 arrays. For example:
775
776 i = "A"; j = "B"; k = "C"
777 x[i, j, k] = "hello, world\n"
778
779 assigns the string "hello, world\n" to the element of the array x which
780 is indexed by the string "A\034B\034C". All arrays in AWK are associa‐
781 tive, i.e., indexed by string values.
782
783 The special operator in may be used to test if an array has an index
784 consisting of a particular value:
785
786 if (val in array)
787 print array[val]
788
789 If the array has multiple subscripts, use (i, j) in array.
790
791 The in construct may also be used in a for loop to iterate over all the
792 elements of an array. However, the (i, j) in array construct only
793 works in tests, not in for loops.
794
795 An element may be deleted from an array using the delete statement.
796 The delete statement may also be used to delete the entire contents of
797 an array, just by specifying the array name without a subscript.
798
799 gawk supports true multidimensional arrays. It does not require that
800 such arrays be ``rectangular'' as in C or C++. For example:
801
802 a[1] = 5
803 a[2][1] = 6
804 a[2][2] = 7
805
806 NOTE: You may need to tell gawk that an array element is really a sub‐
807 array in order to use it where gawk expects an array (such as in the
808 second argument to split()). You can do this by creating an element in
809 the subarray and then deleting it with the delete statement.
810
811 Namespaces
812 Gawk provides a simple namespace facility to help work around the fact
813 that all variables in AWK are global.
814
815 A qualified name consists of a two simple identifiers joined by a dou‐
816 ble colon (::). The left-hand identifier represents the namespace and
817 the right-hand identifier is the variable within it. All simple (non-
818 qualified) names are considered to be in the ``current'' namespace; the
819 default namespace is awk. However, simple identifiers consisting
820 solely of uppercase letters are forced into the awk namespace, even if
821 the current namespace is different.
822
823 You change the current namespace with an @namespace "name" directive.
824
825 The standard predefined builtin function names may not be used as
826 namespace names. The names of additional functions provided by gawk
827 may be used as namespace names or as simple identifiers in other names‐
828 paces. For more details, see GAWK: Effective AWK Programming.
829
830 Variable Typing And Conversion
831 Variables and fields may be (floating point) numbers, or strings, or
832 both. They may also be regular expressions. How the value of a vari‐
833 able is interpreted depends upon its context. If used in a numeric
834 expression, it will be treated as a number; if used as a string it will
835 be treated as a string.
836
837 To force a variable to be treated as a number, add zero to it; to force
838 it to be treated as a string, concatenate it with the null string.
839
840 Uninitialized variables have the numeric value zero and the string
841 value "" (the null, or empty, string).
842
843 When a string must be converted to a number, the conversion is accom‐
844 plished using strtod(3). A number is converted to a string by using
845 the value of CONVFMT as a format string for sprintf(3), with the
846 numeric value of the variable as the argument. However, even though
847 all numbers in AWK are floating-point, integral values are always con‐
848 verted as integers. Thus, given
849
850 CONVFMT = "%2.2f"
851 a = 12
852 b = a ""
853
854 the variable b has a string value of "12" and not "12.00".
855
856 NOTE: When operating in POSIX mode (such as with the --posix option),
857 beware that locale settings may interfere with the way decimal numbers
858 are treated: the decimal separator of the numbers you are feeding to
859 gawk must conform to what your locale would expect, be it a comma (,)
860 or a period (.).
861
862 Gawk performs comparisons as follows: If two variables are numeric,
863 they are compared numerically. If one value is numeric and the other
864 has a string value that is a “numeric string,” then comparisons are
865 also done numerically. Otherwise, the numeric value is converted to a
866 string and a string comparison is performed. Two strings are compared,
867 of course, as strings.
868
869 Note that string constants, such as "57", are not numeric strings, they
870 are string constants. The idea of “numeric string” only applies to
871 fields, getline input, FILENAME, ARGV elements, ENVIRON elements and
872 the elements of an array created by split() or patsplit() that are
873 numeric strings. The basic idea is that user input, and only user
874 input, that looks numeric, should be treated that way.
875
876 Octal and Hexadecimal Constants
877 You may use C-style octal and hexadecimal constants in your AWK program
878 source code. For example, the octal value 011 is equal to decimal 9,
879 and the hexadecimal value 0x11 is equal to decimal 17.
880
881 String Constants
882 String constants in AWK are sequences of characters enclosed between
883 double quotes (like "value"). Within strings, certain escape sequences
884 are recognized, as in C. These are:
885
886 \\ A literal backslash.
887
888 \a The “alert” character; usually the ASCII BEL character.
889
890 \b Backspace.
891
892 \f Form-feed.
893
894 \n Newline.
895
896 \r Carriage return.
897
898 \t Horizontal tab.
899
900 \v Vertical tab.
901
902 \xhex digits
903 The character represented by the string of hexadecimal digits fol‐
904 lowing the \x. Up to two following hexadecimal digits are consid‐
905 ered part of the escape sequence. E.g., "\x1B" is the ASCII ESC
906 (escape) character.
907
908 \ddd The character represented by the 1-, 2-, or 3-digit sequence of
909 octal digits. E.g., "\033" is the ASCII ESC (escape) character.
910
911 \c The literal character c.
912
913 In compatibility mode, the characters represented by octal and hexadec‐
914 imal escape sequences are treated literally when used in regular
915 expression constants. Thus, /a\52b/ is equivalent to /a\*b/.
916
917 Regexp Constants
918 A regular expression constant is a sequence of characters enclosed
919 between forward slashes (like /value/). Regular expression matching is
920 described more fully below; see Regular Expressions.
921
922 The escape sequences described earlier may also be used inside constant
923 regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace charac‐
924 ters).
925
926 Gawk provides strongly typed regular expression constants. These are
927 written with a leading @ symbol (like so: @/value/). Such constants
928 may be assigned to scalars (variables, array elements) and passed to
929 user-defined functions. Variables that have been so assigned have regu‐
930 lar expression type.
931
933 AWK is a line-oriented language. The pattern comes first, and then the
934 action. Action statements are enclosed in { and }. Either the pattern
935 may be missing, or the action may be missing, but, of course, not both.
936 If the pattern is missing, the action executes for every single record
937 of input. A missing action is equivalent to
938
939 { print }
940
941 which prints the entire record.
942
943 Comments begin with the # character, and continue until the end of the
944 line. Empty lines may be used to separate statements. Normally, a
945 statement ends with a newline, however, this is not the case for lines
946 ending in a comma, {, ?, :, &&, or ||. Lines ending in do or else also
947 have their statements automatically continued on the following line.
948 In other cases, a line can be continued by ending it with a “\”, in
949 which case the newline is ignored. However, a “\” after a # is not
950 special.
951
952 Multiple statements may be put on one line by separating them with a
953 “;”. This applies to both the statements within the action part of a
954 pattern-action pair (the usual case), and to the pattern-action state‐
955 ments themselves.
956
957 Patterns
958 AWK patterns may be one of the following:
959
960 BEGIN
961 END
962 BEGINFILE
963 ENDFILE
964 /regular expression/
965 relational expression
966 pattern && pattern
967 pattern || pattern
968 pattern ? pattern : pattern
969 (pattern)
970 ! pattern
971 pattern1, pattern2
972
973 BEGIN and END are two special kinds of patterns which are not tested
974 against the input. The action parts of all BEGIN patterns are merged
975 as if all the statements had been written in a single BEGIN rule. They
976 are executed before any of the input is read. Similarly, all the END
977 rules are merged, and executed when all the input is exhausted (or when
978 an exit statement is executed). BEGIN and END patterns cannot be com‐
979 bined with other patterns in pattern expressions. BEGIN and END pat‐
980 terns cannot have missing action parts.
981
982 BEGINFILE and ENDFILE are additional special patterns whose actions are
983 executed before reading the first record of each command-line input
984 file and after reading the last record of each file. Inside the BEGIN‐
985 FILE rule, the value of ERRNO is the empty string if the file was
986 opened successfully. Otherwise, there is some problem with the file
987 and the code should use nextfile to skip it. If that is not done, gawk
988 produces its usual fatal error for files that cannot be opened.
989
990 For /regular expression/ patterns, the associated statement is executed
991 for each input record that matches the regular expression. Regular
992 expressions are the same as those in egrep(1), and are summarized
993 below.
994
995 A relational expression may use any of the operators defined below in
996 the section on actions. These generally test whether certain fields
997 match certain regular expressions.
998
999 The &&, ||, and ! operators are logical AND, logical OR, and logical
1000 NOT, respectively, as in C. They do short-circuit evaluation, also as
1001 in C, and are used for combining more primitive pattern expressions.
1002 As in most languages, parentheses may be used to change the order of
1003 evaluation.
1004
1005 The ?: operator is like the same operator in C. If the first pattern
1006 is true then the pattern used for testing is the second pattern, other‐
1007 wise it is the third. Only one of the second and third patterns is
1008 evaluated.
1009
1010 The pattern1, pattern2 form of an expression is called a range pattern.
1011 It matches all input records starting with a record that matches pat‐
1012 tern1, and continuing until a record that matches pattern2, inclusive.
1013 It does not combine with any other sort of pattern expression.
1014
1015 Regular Expressions
1016 Regular expressions are the extended kind found in egrep. They are
1017 composed of characters as follows:
1018
1019 c Matches the non-metacharacter c.
1020
1021 \c Matches the literal character c.
1022
1023 . Matches any character including newline.
1024
1025 ^ Matches the beginning of a string.
1026
1027 $ Matches the end of a string.
1028
1029 [abc...] A character list: matches any of the characters abc.... You
1030 may include a range of characters by separating them with a
1031 dash. To include a literal dash in the list, put it first
1032 or last.
1033
1034 [^abc...] A negated character list: matches any character except
1035 abc....
1036
1037 r1|r2 Alternation: matches either r1 or r2.
1038
1039 r1r2 Concatenation: matches r1, and then r2.
1040
1041 r+ Matches one or more r's.
1042
1043 r* Matches zero or more r's.
1044
1045 r? Matches zero or one r's.
1046
1047 (r) Grouping: matches r.
1048
1049 r{n}
1050 r{n,}
1051 r{n,m} One or two numbers inside braces denote an interval expres‐
1052 sion. If there is one number in the braces, the preceding
1053 regular expression r is repeated n times. If there are two
1054 numbers separated by a comma, r is repeated n to m times.
1055 If there is one number followed by a comma, then r is
1056 repeated at least n times.
1057
1058 \y Matches the empty string at either the beginning or the end
1059 of a word.
1060
1061 \B Matches the empty string within a word.
1062
1063 \< Matches the empty string at the beginning of a word.
1064
1065 \> Matches the empty string at the end of a word.
1066
1067 \s Matches any whitespace character.
1068
1069 \S Matches any nonwhitespace character.
1070
1071 \w Matches any word-constituent character (letter, digit, or
1072 underscore).
1073
1074 \W Matches any character that is not word-constituent.
1075
1076 \` Matches the empty string at the beginning of a buffer
1077 (string).
1078
1079 \' Matches the empty string at the end of a buffer.
1080
1081 The escape sequences that are valid in string constants (see String
1082 Constants) are also valid in regular expressions.
1083
1084 Character classes are a feature introduced in the POSIX standard. A
1085 character class is a special notation for describing lists of charac‐
1086 ters that have a specific attribute, but where the actual characters
1087 themselves can vary from country to country and/or from character set
1088 to character set. For example, the notion of what is an alphabetic
1089 character differs in the USA and in France.
1090
1091 A character class is only valid in a regular expression inside the
1092 brackets of a character list. Character classes consist of [:, a key‐
1093 word denoting the class, and :]. The character classes defined by the
1094 POSIX standard are:
1095
1096 [:alnum:] Alphanumeric characters.
1097
1098 [:alpha:] Alphabetic characters.
1099
1100 [:blank:] Space or tab characters.
1101
1102 [:cntrl:] Control characters.
1103
1104 [:digit:] Numeric characters.
1105
1106 [:graph:] Characters that are both printable and visible. (A space is
1107 printable, but not visible, while an a is both.)
1108
1109 [:lower:] Lowercase alphabetic characters.
1110
1111 [:print:] Printable characters (characters that are not control char‐
1112 acters.)
1113
1114 [:punct:] Punctuation characters (characters that are not letter, dig‐
1115 its, control characters, or space characters).
1116
1117 [:space:] Space characters (such as space, tab, and formfeed, to name
1118 a few).
1119
1120 [:upper:] Uppercase alphabetic characters.
1121
1122 [:xdigit:] Characters that are hexadecimal digits.
1123
1124 For example, before the POSIX standard, to match alphanumeric charac‐
1125 ters, you would have had to write /[A-Za-z0-9]/. If your character set
1126 had other alphabetic characters in it, this would not match them, and
1127 if your character set collated differently from ASCII, this might not
1128 even match the ASCII alphanumeric characters. With the POSIX character
1129 classes, you can write /[[:alnum:]]/, and this matches the alphabetic
1130 and numeric characters in your character set, no matter what it is.
1131
1132 Two additional special sequences can appear in character lists. These
1133 apply to non-ASCII character sets, which can have single symbols
1134 (called collating elements) that are represented with more than one
1135 character, as well as several characters that are equivalent for col‐
1136 lating, or sorting, purposes. (E.g., in French, a plain “e” and a
1137 grave-accented “e`” are equivalent.)
1138
1139 Collating Symbols
1140 A collating symbol is a multi-character collating element
1141 enclosed in [. and .]. For example, if ch is a collating ele‐
1142 ment, then [[.ch.]] is a regular expression that matches this
1143 collating element, while [ch] is a regular expression that
1144 matches either c or h.
1145
1146 Equivalence Classes
1147 An equivalence class is a locale-specific name for a list of
1148 characters that are equivalent. The name is enclosed in [= and
1149 =]. For example, the name e might be used to represent all of
1150 “e”, “e´”, and “e`”. In this case, [[=e=]] is a regular expres‐
1151 sion that matches any of e, e´, or e`.
1152
1153 These features are very valuable in non-English speaking locales. The
1154 library functions that gawk uses for regular expression matching cur‐
1155 rently only recognize POSIX character classes; they do not recognize
1156 collating symbols or equivalence classes.
1157
1158 The \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators are specific
1159 to gawk; they are extensions based on facilities in the GNU regular
1160 expression libraries.
1161
1162 The various command line options control how gawk interprets characters
1163 in regular expressions.
1164
1165 No options
1166 In the default case, gawk provides all the facilities of POSIX
1167 regular expressions and the GNU regular expression operators
1168 described above.
1169
1170 --posix
1171 Only POSIX regular expressions are supported, the GNU operators
1172 are not special. (E.g., \w matches a literal w).
1173
1174 --traditional
1175 Traditional UNIX awk regular expressions are matched. The GNU
1176 operators are not special, and interval expressions are not
1177 available. Characters described by octal and hexadecimal escape
1178 sequences are treated literally, even if they represent regular
1179 expression metacharacters.
1180
1181 --re-interval
1182 Allow interval expressions in regular expressions, even if
1183 --traditional has been provided.
1184
1185 Actions
1186 Action statements are enclosed in braces, { and }. Action statements
1187 consist of the usual assignment, conditional, and looping statements
1188 found in most languages. The operators, control statements, and
1189 input/output statements available are patterned after those in C.
1190
1191 Operators
1192 The operators in AWK, in order of decreasing precedence, are:
1193
1194 (...) Grouping
1195
1196 $ Field reference.
1197
1198 ++ -- Increment and decrement, both prefix and postfix.
1199
1200 ^ Exponentiation (** may also be used, and **= for the
1201 assignment operator).
1202
1203 + - ! Unary plus, unary minus, and logical negation.
1204
1205 * / % Multiplication, division, and modulus.
1206
1207 + - Addition and subtraction.
1208
1209 space String concatenation.
1210
1211 | |& Piped I/O for getline, print, and printf.
1212
1213 < > <= >= == !=
1214 The regular relational operators.
1215
1216 ~ !~ Regular expression match, negated match. NOTE: Do not use
1217 a constant regular expression (/foo/) on the left-hand side
1218 of a ~ or !~. Only use one on the right-hand side. The
1219 expression /foo/ ~ exp has the same meaning as (($0 ~
1220 /foo/) ~ exp). This is usually not what you want.
1221
1222 in Array membership.
1223
1224 && Logical AND.
1225
1226 || Logical OR.
1227
1228 ?: The C conditional expression. This has the form expr1 ?
1229 expr2 : expr3. If expr1 is true, the value of the expres‐
1230 sion is expr2, otherwise it is expr3. Only one of expr2
1231 and expr3 is evaluated.
1232
1233 = += -= *= /= %= ^=
1234 Assignment. Both absolute assignment (var = value) and
1235 operator-assignment (the other forms) are supported.
1236
1237 Control Statements
1238 The control statements are as follows:
1239
1240 if (condition) statement [ else statement ]
1241 while (condition) statement
1242 do statement while (condition)
1243 for (expr1; expr2; expr3) statement
1244 for (var in array) statement
1245 break
1246 continue
1247 delete array[index]
1248 delete array
1249 exit [ expression ]
1250 { statements }
1251 switch (expression) {
1252 case value|regex : statement
1253 ...
1254 [ default: statement ]
1255 }
1256
1257 I/O Statements
1258 The input/output statements are as follows:
1259
1260 close(file [, how]) Close file, pipe or coprocess. The optional how
1261 should only be used when closing one end of a
1262 two-way pipe to a coprocess. It must be a string
1263 value, either "to" or "from".
1264
1265 getline Set $0 from the next input record; set NF, NR,
1266 FNR, RT.
1267
1268 getline <file Set $0 from the next record of file; set NF, RT.
1269
1270 getline var Set var from the next input record; set NR, FNR,
1271 RT.
1272
1273 getline var <file Set var from the next record of file; set RT.
1274
1275 command | getline [var]
1276 Run command, piping the output either into $0 or
1277 var, as above, and RT.
1278
1279 command |& getline [var]
1280 Run command as a coprocess piping the output
1281 either into $0 or var, as above, and RT. Copro‐
1282 cesses are a gawk extension. (The command can
1283 also be a socket. See the subsection Special
1284 File Names, below.)
1285
1286 next Stop processing the current input record. Read
1287 the next input record and start processing over
1288 with the first pattern in the AWK program. Upon
1289 reaching the end of the input data, execute any
1290 END rule(s).
1291
1292 nextfile Stop processing the current input file. The next
1293 input record read comes from the next input file.
1294 Update FILENAME and ARGIND, reset FNR to 1, and
1295 start processing over with the first pattern in
1296 the AWK program. Upon reaching the end of the
1297 input data, execute any ENDFILE and END rule(s).
1298
1299 print Print the current record. The output record is
1300 terminated with the value of ORS.
1301
1302 print expr-list Print expressions. Each expression is separated
1303 by the value of OFS. The output record is termi‐
1304 nated with the value of ORS.
1305
1306 print expr-list >file Print expressions on file. Each expression is
1307 separated by the value of OFS. The output record
1308 is terminated with the value of ORS.
1309
1310 printf fmt, expr-list Format and print. See The printf Statement,
1311 below.
1312
1313 printf fmt, expr-list >file
1314 Format and print on file.
1315
1316 system(cmd-line) Execute the command cmd-line, and return the exit
1317 status. (This may not be available on non-POSIX
1318 systems.) See GAWK: Effective AWK Programming
1319 for the full details on the exit status.
1320
1321 fflush([file]) Flush any buffers associated with the open output
1322 file or pipe file. If file is missing or if it
1323 is the null string, then flush all open output
1324 files and pipes.
1325
1326 Additional output redirections are allowed for print and printf.
1327
1328 print ... >> file
1329 Append output to the file.
1330
1331 print ... | command
1332 Write on a pipe.
1333
1334 print ... |& command
1335 Send data to a coprocess or socket. (See also the subsection
1336 Special File Names, below.)
1337
1338 The getline command returns 1 on success, zero on end of file, and -1
1339 on an error. If the errno(3) value indicates that the I/O operation
1340 may be retried, and PROCINFO["input", "RETRY"] is set, then -2 is
1341 returned instead of -1, and further calls to getline may be attempted.
1342 Upon an error, ERRNO is set to a string describing the problem.
1343
1344 NOTE: Failure in opening a two-way socket results in a non-fatal error
1345 being returned to the calling function. If using a pipe, coprocess, or
1346 socket to getline, or from print or printf within a loop, you must use
1347 close() to create new instances of the command or socket. AWK does not
1348 automatically close pipes, sockets, or coprocesses when they return
1349 EOF.
1350
1351 The printf Statement
1352 The AWK versions of the printf statement and sprintf() function (see
1353 below) accept the following conversion specification formats:
1354
1355 %a, %A A floating point number of the form [-]0xh.hhhhp+-dd (C99 hexa‐
1356 decimal floating point format). For %A, uppercase letters are
1357 used instead of lowercase ones.
1358
1359 %c A single character. If the argument used for %c is numeric, it
1360 is treated as a character and printed. Otherwise, the argument
1361 is assumed to be a string, and the only first character of that
1362 string is printed.
1363
1364 %d, %i A decimal number (the integer part).
1365
1366 %e, %E A floating point number of the form [-]d.dddddde[+-]dd. The %E
1367 format uses E instead of e.
1368
1369 %f, %F A floating point number of the form [-]ddd.dddddd. If the sys‐
1370 tem library supports it, %F is available as well. This is like
1371 %f, but uses capital letters for special “not a number” and
1372 “infinity” values. If %F is not available, gawk uses %f.
1373
1374 %g, %G Use %e or %f conversion, whichever is shorter, with nonsignifi‐
1375 cant zeros suppressed. The %G format uses %E instead of %e.
1376
1377 %o An unsigned octal number (also an integer).
1378
1379 %u An unsigned decimal number (again, an integer).
1380
1381 %s A character string.
1382
1383 %x, %X An unsigned hexadecimal number (an integer). The %X format
1384 uses ABCDEF instead of abcdef.
1385
1386 %% A single % character; no argument is converted.
1387
1388 Optional, additional parameters may lie between the % and the control
1389 letter:
1390
1391 count$ Use the count'th argument at this point in the formatting. This
1392 is called a positional specifier and is intended primarily for
1393 use in translated versions of format strings, not in the origi‐
1394 nal text of an AWK program. It is a gawk extension.
1395
1396 - The expression should be left-justified within its field.
1397
1398 space For numeric conversions, prefix positive values with a space,
1399 and negative values with a minus sign.
1400
1401 + The plus sign, used before the width modifier (see below), says
1402 to always supply a sign for numeric conversions, even if the
1403 data to be formatted is positive. The + overrides the space
1404 modifier.
1405
1406 # Use an “alternate form” for certain control letters. For %o,
1407 supply a leading zero. For %x, and %X, supply a leading 0x or
1408 0X for a nonzero result. For %e, %E, %f and %F, the result
1409 always contains a decimal point. For %g, and %G, trailing zeros
1410 are not removed from the result.
1411
1412 0 A leading 0 (zero) acts as a flag, indicating that output should
1413 be padded with zeroes instead of spaces. This applies only to
1414 the numeric output formats. This flag only has an effect when
1415 the field width is wider than the value to be printed.
1416
1417 ' A single quote character instructs gawk to insert the locale's
1418 thousands-separator character into decimal numbers, and to also
1419 use the locale's decimal point character with floating point
1420 formats. This requires correct locale support in the C library
1421 and in the definition of the current locale.
1422
1423 width The field should be padded to this width. The field is normally
1424 padded with spaces. With the 0 flag, it is padded with zeroes.
1425
1426 .prec A number that specifies the precision to use when printing. For
1427 the %e, %E, %f and %F, formats, this specifies the number of
1428 digits you want printed to the right of the decimal point. For
1429 the %g, and %G formats, it specifies the maximum number of sig‐
1430 nificant digits. For the %d, %i, %o, %u, %x, and %X formats, it
1431 specifies the minimum number of digits to print. For the %s
1432 format, it specifies the maximum number of characters from the
1433 string that should be printed.
1434
1435 The dynamic width and prec capabilities of the ISO C printf() routines
1436 are supported. A * in place of either the width or prec specifications
1437 causes their values to be taken from the argument list to printf or
1438 sprintf(). To use a positional specifier with a dynamic width or pre‐
1439 cision, supply the count$ after the * in the format string. For exam‐
1440 ple, "%3$*2$.*1$s".
1441
1442 Special File Names
1443 When doing I/O redirection from either print or printf into a file, or
1444 via getline from a file, gawk recognizes certain special filenames
1445 internally. These filenames allow access to open file descriptors
1446 inherited from gawk's parent process (usually the shell). These file
1447 names may also be used on the command line to name data files. The
1448 filenames are:
1449
1450 - The standard input.
1451
1452 /dev/stdin The standard input.
1453
1454 /dev/stdout The standard output.
1455
1456 /dev/stderr The standard error output.
1457
1458 /dev/fd/n The file associated with the open file descriptor n.
1459
1460 These are particularly useful for error messages. For example:
1461
1462 print "You blew it!" > "/dev/stderr"
1463
1464 whereas you would otherwise have to use
1465
1466 print "You blew it!" | "cat 1>&2"
1467
1468 The following special filenames may be used with the |& coprocess oper‐
1469 ator for creating TCP/IP network connections:
1470
1471 /inet/tcp/lport/rhost/rport
1472 /inet4/tcp/lport/rhost/rport
1473 /inet6/tcp/lport/rhost/rport
1474 Files for a TCP/IP connection on local port lport to remote host
1475 rhost on remote port rport. Use a port of 0 to have the system
1476 pick a port. Use /inet4 to force an IPv4 connection, and /inet6
1477 to force an IPv6 connection. Plain /inet uses the system
1478 default (most likely IPv4). Usable only with the |& two-way I/O
1479 operator.
1480
1481 /inet/udp/lport/rhost/rport
1482 /inet4/udp/lport/rhost/rport
1483 /inet6/udp/lport/rhost/rport
1484 Similar, but use UDP/IP instead of TCP/IP.
1485
1486 Numeric Functions
1487 AWK has the following built-in arithmetic functions:
1488
1489 atan2(y, x) Return the arctangent of y/x in radians.
1490
1491 cos(expr) Return the cosine of expr, which is in radians.
1492
1493 exp(expr) The exponential function.
1494
1495 int(expr) Truncate to integer.
1496
1497 log(expr) The natural logarithm function.
1498
1499 rand() Return a random number N, between zero and one, such that
1500 0 ≤ N < 1.
1501
1502 sin(expr) Return the sine of expr, which is in radians.
1503
1504 sqrt(expr) Return the square root of expr.
1505
1506 srand([expr]) Use expr as the new seed for the random number generator.
1507 If no expr is provided, use the time of day. Return the
1508 previous seed for the random number generator.
1509
1510 String Functions
1511 Gawk has the following built-in string functions:
1512
1513 asort(s [, d [, how] ]) Return the number of elements in the source
1514 array s. Sort the contents of s using gawk's
1515 normal rules for comparing values, and replace
1516 the indices of the sorted values s with sequen‐
1517 tial integers starting with 1. If the optional
1518 destination array d is specified, first dupli‐
1519 cate s into d, and then sort d, leaving the
1520 indices of the source array s unchanged. The
1521 optional string how controls the direction and
1522 the comparison mode. Valid values for how are
1523 any of the strings valid for
1524 PROCINFO["sorted_in"]. It can also be the name
1525 of a user-defined comparison function as
1526 described in PROCINFO["sorted_in"].
1527
1528 asorti(s [, d [, how] ])
1529 Return the number of elements in the source
1530 array s. The behavior is the same as that of
1531 asort(), except that the array indices are used
1532 for sorting, not the array values. When done,
1533 the array is indexed numerically, and the val‐
1534 ues are those of the original indices. The
1535 original values are lost; thus provide a second
1536 array if you wish to preserve the original.
1537 The purpose of the optional string how is the
1538 same as described previously for asort().
1539
1540 gensub(r, s, h [, t]) Search the target string t for matches of the
1541 regular expression r. If h is a string begin‐
1542 ning with g or G, then replace all matches of r
1543 with s. Otherwise, h is a number indicating
1544 which match of r to replace. If t is not sup‐
1545 plied, use $0 instead. Within the replacement
1546 text s, the sequence \n, where n is a digit
1547 from 1 to 9, may be used to indicate just the
1548 text that matched the n'th parenthesized subex‐
1549 pression. The sequence \0 represents the
1550 entire matched text, as does the character &.
1551 Unlike sub() and gsub(), the modified string is
1552 returned as the result of the function, and the
1553 original target string is not changed.
1554
1555 gsub(r, s [, t]) For each substring matching the regular expres‐
1556 sion r in the string t, substitute the string
1557 s, and return the number of substitutions. If
1558 t is not supplied, use $0. An & in the
1559 replacement text is replaced with the text that
1560 was actually matched. Use \& to get a literal
1561 &. (This must be typed as "\\&"; see GAWK:
1562 Effective AWK Programming for a fuller discus‐
1563 sion of the rules for ampersands and back‐
1564 slashes in the replacement text of sub(),
1565 gsub(), and gensub().)
1566
1567 index(s, t) Return the index of the string t in the string
1568 s, or zero if t is not present. (This implies
1569 that character indices start at one.) It is a
1570 fatal error to use a regexp constant for t.
1571
1572 length([s]) Return the length of the string s, or the
1573 length of $0 if s is not supplied. As a non-
1574 standard extension, with an array argument,
1575 length() returns the number of elements in the
1576 array.
1577
1578 match(s, r [, a]) Return the position in s where the regular
1579 expression r occurs, or zero if r is not
1580 present, and set the values of RSTART and
1581 RLENGTH. Note that the argument order is the
1582 same as for the ~ operator: str ~ re. If array
1583 a is provided, a is cleared and then elements 1
1584 through n are filled with the portions of s
1585 that match the corresponding parenthesized sub‐
1586 expression in r. The zero'th element of a con‐
1587 tains the portion of s matched by the entire
1588 regular expression r. Subscripts a[n,
1589 "start"], and a[n, "length"] provide the start‐
1590 ing index in the string and length respec‐
1591 tively, of each matching substring.
1592
1593 patsplit(s, a [, r [, seps] ])
1594 Split the string s into the array a and the
1595 separators array seps on the regular expression
1596 r, and return the number of fields. Element
1597 values are the portions of s that matched r.
1598 The value of seps[i] is the possibly null sepa‐
1599 rator that appeared after a[i]. The value of
1600 seps[0] is the possibly null leading separator.
1601 If r is omitted, FPAT is used instead. The
1602 arrays a and seps are cleared first. Splitting
1603 behaves identically to field splitting with
1604 FPAT, described above.
1605
1606 split(s, a [, r [, seps] ])
1607 Split the string s into the array a and the
1608 separators array seps on the regular expression
1609 r, and return the number of fields. If r is
1610 omitted, FS is used instead. The arrays a and
1611 seps are cleared first. seps[i] is the field
1612 separator matched by r between a[i] and a[i+1].
1613 If r is a single space, then leading whitespace
1614 in s goes into the extra array element seps[0]
1615 and trailing whitespace goes into the extra
1616 array element seps[n], where n is the return
1617 value of split(s, a, r, seps). Splitting
1618 behaves identically to field splitting,
1619 described above. In particular, if r is a sin‐
1620 gle-character string, that string acts as the
1621 separator, even if it happens to be a regular
1622 expression metacharacter.
1623
1624 sprintf(fmt, expr-list) Print expr-list according to fmt, and return
1625 the resulting string.
1626
1627 strtonum(str) Examine str, and return its numeric value. If
1628 str begins with a leading 0, treat it as an
1629 octal number. If str begins with a leading 0x
1630 or 0X, treat it as a hexadecimal number. Oth‐
1631 erwise, assume it is a decimal number.
1632
1633 sub(r, s [, t]) Just like gsub(), but replace only the first
1634 matching substring. Return either zero or one.
1635
1636 substr(s, i [, n]) Return the at most n-character substring of s
1637 starting at i. If n is omitted, use the rest
1638 of s.
1639
1640 tolower(str) Return a copy of the string str, with all the
1641 uppercase characters in str translated to their
1642 corresponding lowercase counterparts. Non-
1643 alphabetic characters are left unchanged.
1644
1645 toupper(str) Return a copy of the string str, with all the
1646 lowercase characters in str translated to their
1647 corresponding uppercase counterparts. Non-
1648 alphabetic characters are left unchanged.
1649
1650 Gawk is multibyte aware. This means that index(), length(), substr()
1651 and match() all work in terms of characters, not bytes.
1652
1653 Time Functions
1654 Since one of the primary uses of AWK programs is processing log files
1655 that contain time stamp information, gawk provides the following func‐
1656 tions for obtaining time stamps and formatting them.
1657
1658 mktime(datespec [, utc-flag])
1659 Turn datespec into a time stamp of the same form as returned
1660 by systime(), and return the result. The datespec is a
1661 string of the form YYYY MM DD HH MM SS[ DST]. The contents
1662 of the string are six or seven numbers representing respec‐
1663 tively the full year including century, the month from 1 to
1664 12, the day of the month from 1 to 31, the hour of the day
1665 from 0 to 23, the minute from 0 to 59, the second from 0 to
1666 60, and an optional daylight saving flag. The values of
1667 these numbers need not be within the ranges specified; for
1668 example, an hour of -1 means 1 hour before midnight. The
1669 origin-zero Gregorian calendar is assumed, with year 0 pre‐
1670 ceding year 1 and year -1 preceding year 0. If utc-flag is
1671 present and is non-zero or non-null, the time is assumed to
1672 be in the UTC time zone; otherwise, the time is assumed to be
1673 in the local time zone. If the DST daylight saving flag is
1674 positive, the time is assumed to be daylight saving time; if
1675 zero, the time is assumed to be standard time; and if nega‐
1676 tive (the default), mktime() attempts to determine whether
1677 daylight saving time is in effect for the specified time. If
1678 datespec does not contain enough elements or if the resulting
1679 time is out of range, mktime() returns -1.
1680
1681 strftime([format [, timestamp[, utc-flag]]])
1682 Format timestamp according to the specification in format.
1683 If utc-flag is present and is non-zero or non-null, the
1684 result is in UTC, otherwise the result is in local time. The
1685 timestamp should be of the same form as returned by sys‐
1686 time(). If timestamp is missing, the current time of day is
1687 used. If format is missing, a default format equivalent to
1688 the output of date(1) is used. The default format is avail‐
1689 able in PROCINFO["strftime"]. See the specification for the
1690 strftime() function in ISO C for the format conversions that
1691 are guaranteed to be available.
1692
1693 systime() Return the current time of day as the number of seconds since
1694 the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
1695
1696 Bit Manipulations Functions
1697 Gawk supplies the following bit manipulation functions. They work by
1698 converting double-precision floating point values to uintmax_t inte‐
1699 gers, doing the operation, and then converting the result back to
1700 floating point.
1701
1702 NOTE: Passing negative operands to any of these functions causes a
1703 fatal error.
1704
1705 The functions are:
1706
1707 and(v1, v2 [, ...]) Return the bitwise AND of the values provided in
1708 the argument list. There must be at least two.
1709
1710 compl(val) Return the bitwise complement of val.
1711
1712 lshift(val, count) Return the value of val, shifted left by count
1713 bits.
1714
1715 or(v1, v2 [, ...]) Return the bitwise OR of the values provided in the
1716 argument list. There must be at least two.
1717
1718 rshift(val, count) Return the value of val, shifted right by count
1719 bits.
1720
1721 xor(v1, v2 [, ...]) Return the bitwise XOR of the values provided in
1722 the argument list. There must be at least two.
1723
1724 Type Functions
1725 The following functions provide type related information about their
1726 arguments.
1727
1728 isarray(x) Return true if x is an array, false otherwise. This func‐
1729 tion is mainly for use with the elements of multidimensional
1730 arrays and with function parameters.
1731
1732 typeof(x) Return a string indicating the type of x. The string will
1733 be one of "array", "number", "regexp", "string", "strnum",
1734 "unassigned", or "undefined".
1735
1736 Internationalization Functions
1737 The following functions may be used from within your AWK program for
1738 translating strings at run-time. For full details, see GAWK: Effective
1739 AWK Programming.
1740
1741 bindtextdomain(directory [, domain])
1742 Specify the directory where gawk looks for the .gmo files, in
1743 case they will not or cannot be placed in the ``standard'' loca‐
1744 tions (e.g., during testing). It returns the directory where
1745 domain is ``bound.''
1746 The default domain is the value of TEXTDOMAIN. If directory is
1747 the null string (""), then bindtextdomain() returns the current
1748 binding for the given domain.
1749
1750 dcgettext(string [, domain [, category]])
1751 Return the translation of string in text domain domain for
1752 locale category category. The default value for domain is the
1753 current value of TEXTDOMAIN. The default value for category is
1754 "LC_MESSAGES".
1755 If you supply a value for category, it must be a string equal to
1756 one of the known locale categories described in GAWK: Effective
1757 AWK Programming. You must also supply a text domain. Use
1758 TEXTDOMAIN if you want to use the current domain.
1759
1760 dcngettext(string1, string2, number [, domain [, category]])
1761 Return the plural form used for number of the translation of
1762 string1 and string2 in text domain domain for locale category
1763 category. The default value for domain is the current value of
1764 TEXTDOMAIN. The default value for category is "LC_MESSAGES".
1765 If you supply a value for category, it must be a string equal to
1766 one of the known locale categories described in GAWK: Effective
1767 AWK Programming. You must also supply a text domain. Use
1768 TEXTDOMAIN if you want to use the current domain.
1769
1771 Functions in AWK are defined as follows:
1772
1773 function name(parameter list) { statements }
1774
1775 Functions execute when they are called from within expressions in
1776 either patterns or actions. Actual parameters supplied in the function
1777 call are used to instantiate the formal parameters declared in the
1778 function. Arrays are passed by reference, other variables are passed
1779 by value.
1780
1781 Since functions were not originally part of the AWK language, the pro‐
1782 vision for local variables is rather clumsy: They are declared as extra
1783 parameters in the parameter list. The convention is to separate local
1784 variables from real parameters by extra spaces in the parameter list.
1785 For example:
1786
1787 function f(p, q, a, b) # a and b are local
1788 {
1789 ...
1790 }
1791
1792 /abc/ { ... ; f(1, 2) ; ... }
1793
1794 The left parenthesis in a function call is required to immediately fol‐
1795 low the function name, without any intervening whitespace. This avoids
1796 a syntactic ambiguity with the concatenation operator. This restric‐
1797 tion does not apply to the built-in functions listed above.
1798
1799 Functions may call each other and may be recursive. Function parame‐
1800 ters used as local variables are initialized to the null string and the
1801 number zero upon function invocation.
1802
1803 Use return expr to return a value from a function. The return value is
1804 undefined if no value is provided, or if the function returns by “fall‐
1805 ing off” the end.
1806
1807 As a gawk extension, functions may be called indirectly. To do this,
1808 assign the name of the function to be called, as a string, to a vari‐
1809 able. Then use the variable as if it were the name of a function, pre‐
1810 fixed with an @ sign, like so:
1811 function myfunc()
1812 {
1813 print "myfunc called"
1814 ...
1815 }
1816
1817 { ...
1818 the_func = "myfunc"
1819 @the_func() # call through the_func to myfunc
1820 ...
1821 }
1822 As of version 4.1.2, this works with user-defined functions, built-in
1823 functions, and extension functions.
1824
1825 If --lint has been provided, gawk warns about calls to undefined func‐
1826 tions at parse time, instead of at run time. Calling an undefined
1827 function at run time is a fatal error.
1828
1829 The word func may be used in place of function, although this is depre‐
1830 cated.
1831
1833 You can dynamically add new functions written in C or C++ to the run‐
1834 ning gawk interpreter with the @load statement. The full details are
1835 beyond the scope of this manual page; see GAWK: Effective AWK Program‐
1836 ming.
1837
1839 The gawk profiler accepts two signals. SIGUSR1 causes it to dump a
1840 profile and function call stack to the profile file, which is either
1841 awkprof.out, or whatever file was named with the --profile option. It
1842 then continues to run. SIGHUP causes gawk to dump the profile and
1843 function call stack and then exit.
1844
1846 String constants are sequences of characters enclosed in double quotes.
1847 In non-English speaking environments, it is possible to mark strings in
1848 the AWK program as requiring translation to the local natural language.
1849 Such strings are marked in the AWK program with a leading underscore
1850 (“_”). For example,
1851
1852 gawk 'BEGIN { print "hello, world" }'
1853
1854 always prints hello, world. But,
1855
1856 gawk 'BEGIN { print _"hello, world" }'
1857
1858 might print bonjour, monde in France.
1859
1860 There are several steps involved in producing and running a localizable
1861 AWK program.
1862
1863 1. Add a BEGIN action to assign a value to the TEXTDOMAIN variable to
1864 set the text domain to a name associated with your program:
1865
1866 BEGIN { TEXTDOMAIN = "myprog" }
1867
1868 This allows gawk to find the .gmo file associated with your pro‐
1869 gram. Without this step, gawk uses the messages text domain, which
1870 likely does not contain translations for your program.
1871
1872 2. Mark all strings that should be translated with leading under‐
1873 scores.
1874
1875 3. If necessary, use the dcgettext() and/or bindtextdomain() functions
1876 in your program, as appropriate.
1877
1878 4. Run gawk --gen-pot -f myprog.awk > myprog.pot to generate a .pot
1879 file for your program.
1880
1881 5. Provide appropriate translations, and build and install the corre‐
1882 sponding .gmo files.
1883
1884 The internationalization features are described in full detail in GAWK:
1885 Effective AWK Programming.
1886
1888 A primary goal for gawk is compatibility with the POSIX standard, as
1889 well as with the latest version of Brian Kernighan's awk. To this end,
1890 gawk incorporates the following user visible features which are not
1891 described in the AWK book, but are part of the Brian Kernighan's ver‐
1892 sion of awk, and are in the POSIX standard.
1893
1894 The book indicates that command line variable assignment happens when
1895 awk would otherwise open the argument as a file, which is after the
1896 BEGIN rule is executed. However, in earlier implementations, when such
1897 an assignment appeared before any file names, the assignment would hap‐
1898 pen before the BEGIN rule was run. Applications came to depend on this
1899 “feature.” When awk was changed to match its documentation, the -v
1900 option for assigning variables before program execution was added to
1901 accommodate applications that depended upon the old behavior. (This
1902 feature was agreed upon by both the Bell Laboratories developers and
1903 the GNU developers.)
1904
1905 When processing arguments, gawk uses the special option “--” to signal
1906 the end of arguments. In compatibility mode, it warns about but other‐
1907 wise ignores undefined options. In normal operation, such arguments
1908 are passed on to the AWK program for it to process.
1909
1910 The AWK book does not define the return value of srand(). The POSIX
1911 standard has it return the seed it was using, to allow keeping track of
1912 random number sequences. Therefore srand() in gawk also returns its
1913 current seed.
1914
1915 Other features are: The use of multiple -f options (from MKS awk); the
1916 ENVIRON array; the \a, and \v escape sequences (done originally in gawk
1917 and fed back into the Bell Laboratories version); the tolower() and
1918 toupper() built-in functions (from the Bell Laboratories version); and
1919 the ISO C conversion specifications in printf (done first in the Bell
1920 Laboratories version).
1921
1923 There is one feature of historical AWK implementations that gawk sup‐
1924 ports: It is possible to call the length() built-in function not only
1925 with no argument, but even without parentheses! Thus,
1926
1927 a = length # Holy Algol 60, Batman!
1928
1929 is the same as either of
1930
1931 a = length()
1932 a = length($0)
1933
1934 Using this feature is poor practice, and gawk issues a warning about
1935 its use if --lint is specified on the command line.
1936
1938 Gawk has a too-large number of extensions to POSIX awk. They are
1939 described in this section. All the extensions described here can be
1940 disabled by invoking gawk with the --traditional or --posix options.
1941
1942 The following features of gawk are not available in POSIX awk.
1943
1944 · No path search is performed for files named via the -f option.
1945 Therefore the AWKPATH environment variable is not special.
1946
1947 · There is no facility for doing file inclusion (gawk's @include mecha‐
1948 nism).
1949
1950 · There is no facility for dynamically adding new functions written in
1951 C (gawk's @load mechanism).
1952
1953 · The \x escape sequence.
1954
1955 · The ability to continue lines after ? and :.
1956
1957 · Octal and hexadecimal constants in AWK programs.
1958
1959 · The ARGIND, BINMODE, ERRNO, LINT, PREC, ROUNDMODE, RT and TEXTDOMAIN
1960 variables are not special.
1961
1962 · The IGNORECASE variable and its side-effects are not available.
1963
1964 · The FIELDWIDTHS variable and fixed-width field splitting.
1965
1966 · The FPAT variable and field splitting based on field values.
1967
1968 · The FUNCTAB, SYMTAB, and PROCINFO arrays are not available.
1969
1970 · The use of RS as a regular expression.
1971
1972 · The special file names available for I/O redirection are not recog‐
1973 nized.
1974
1975 · The |& operator for creating coprocesses.
1976
1977 · The BEGINFILE and ENDFILE special patterns are not available.
1978
1979 · The ability to split out individual characters using the null string
1980 as the value of FS, and as the third argument to split().
1981
1982 · An optional fourth argument to split() to receive the separator
1983 texts.
1984
1985 · The optional second argument to the close() function.
1986
1987 · The optional third argument to the match() function.
1988
1989 · The ability to use positional specifiers with printf and sprintf().
1990
1991 · The ability to pass an array to length().
1992
1993 · The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
1994 dcngettext(), gensub(), lshift(), mktime(), or(), patsplit(),
1995 rshift(), strftime(), strtonum(), systime() and xor() functions.
1996
1997 · Localizable strings.
1998
1999 · Non-fatal I/O.
2000
2001 · Retryable I/O.
2002
2003 The AWK book does not define the return value of the close() function.
2004 Gawk's close() returns the value from fclose(3), or pclose(3), when
2005 closing an output file or pipe, respectively. It returns the process's
2006 exit status when closing an input pipe. The return value is -1 if the
2007 named file, pipe or coprocess was not opened with a redirection.
2008
2009 When gawk is invoked with the --traditional option, if the fs argument
2010 to the -F option is “t”, then FS is set to the tab character. Note
2011 that typing gawk -F\t ... simply causes the shell to quote the “t,”
2012 and does not pass “\t” to the -F option. Since this is a rather ugly
2013 special case, it is not the default behavior. This behavior also does
2014 not occur if --posix has been specified. To really get a tab character
2015 as the field separator, it is best to use single quotes: gawk -F'\t'
2016 ....
2017
2019 The AWKPATH environment variable can be used to provide a list of
2020 directories that gawk searches when looking for files named via the -f,
2021 --file, -i and --include options, and the @include directive. If the
2022 initial search fails, the path is searched again after appending .awk
2023 to the filename.
2024
2025 The AWKLIBPATH environment variable can be used to provide a list of
2026 directories that gawk searches when looking for files named via the -l
2027 and --load options.
2028
2029 The GAWK_READ_TIMEOUT environment variable can be used to specify a
2030 timeout in milliseconds for reading input from a terminal, pipe or two-
2031 way communication including sockets.
2032
2033 For connection to a remote host via socket, GAWK_SOCK_RETRIES controls
2034 the number of retries, and GAWK_MSEC_SLEEP the interval between
2035 retries. The interval is in milliseconds. On systems that do not sup‐
2036 port usleep(3), the value is rounded up to an integral number of sec‐
2037 onds.
2038
2039 If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
2040 as if --posix had been specified on the command line. If --lint has
2041 been specified, gawk issues a warning message to this effect.
2042
2044 If the exit statement is used with a value, then gawk exits with the
2045 numeric value given to it.
2046
2047 Otherwise, if there were no problems during execution, gawk exits with
2048 the value of the C constant EXIT_SUCCESS. This is usually zero.
2049
2050 If an error occurs, gawk exits with the value of the C constant
2051 EXIT_FAILURE. This is usually one.
2052
2053 If gawk exits because of a fatal error, the exit status is 2. On non-
2054 POSIX systems, this value may be mapped to EXIT_FAILURE.
2055
2057 This man page documents gawk, version 5.0.
2058
2060 The original version of UNIX awk was designed and implemented by Alfred
2061 Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian
2062 Kernighan continues to maintain and enhance it.
2063
2064 Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote
2065 gawk, to be compatible with the original version of awk distributed in
2066 Seventh Edition UNIX. John Woods contributed a number of bug fixes.
2067 David Trueman, with contributions from Arnold Robbins, made gawk com‐
2068 patible with the new version of UNIX awk. Arnold Robbins is the cur‐
2069 rent maintainer.
2070
2071 See GAWK: Effective AWK Programming for a full list of the contributors
2072 to gawk and its documentation.
2073
2074 See the README file in the gawk distribution for up-to-date information
2075 about maintainers and which ports are currently supported.
2076
2078 If you find a bug in gawk, please send electronic mail to
2079 bug-gawk@gnu.org. Please include your operating system and its revi‐
2080 sion, the version of gawk (from gawk --version), which C compiler you
2081 used to compile it, and a test program and data that are as small as
2082 possible for reproducing the problem.
2083
2084 Before sending a bug report, please do the following things. First,
2085 verify that you have the latest version of gawk. Many bugs (usually
2086 subtle ones) are fixed at each release, and if yours is out of date,
2087 the problem may already have been solved. Second, please see if set‐
2088 ting the environment variable LC_ALL to LC_ALL=C causes things to
2089 behave as you expect. If so, it's a locale issue, and may or may not
2090 really be a bug. Finally, please read this man page and the reference
2091 manual carefully to be sure that what you think is a bug really is,
2092 instead of just a quirk in the language.
2093
2094 Whatever you do, do NOT post a bug report in comp.lang.awk. While the
2095 gawk developers occasionally read this newsgroup, posting bug reports
2096 there is an unreliable way to report bugs. Similarly, do NOT use a web
2097 forum (such as Stack Overflow) for reporting bugs. Instead, please use
2098 the electronic mail addresses given above. Really.
2099
2100 If you're using a GNU/Linux or BSD-based system, you may wish to submit
2101 a bug report to the vendor of your distribution. That's fine, but
2102 please send a copy to the official email address as well, since there's
2103 no guarantee that the bug report will be forwarded to the gawk main‐
2104 tainer.
2105
2107 The -F option is not necessary given the command line variable assign‐
2108 ment feature; it remains only for backwards compatibility.
2109
2111 egrep(1), sed(1), getpid(2), getppid(2), getpgrp(2), getuid(2),
2112 geteuid(2), getgid(2), getegid(2), getgroups(2), printf(3), strf‐
2113 time(3), usleep(3)
2114
2115 The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter
2116 J. Weinberger, Addison-Wesley, 1988. ISBN 0-201-07981-X.
2117
2118 GAWK: Effective AWK Programming, Edition 5.0, shipped with the gawk
2119 source. The current version of this document is available online at
2120 https://www.gnu.org/software/gawk/manual.
2121
2122 The GNU gettext documentation, available online at
2123 https://www.gnu.org/software/gettext.
2124
2126 Print and sort the login names of all users:
2127
2128 BEGIN { FS = ":" }
2129 { print $1 | "sort" }
2130
2131 Count lines in a file:
2132
2133 { nlines++ }
2134 END { print nlines }
2135
2136 Precede each line by its number in the file:
2137
2138 { print FNR, $0 }
2139
2140 Concatenate and line number (a variation on a theme):
2141
2142 { print NR, $0 }
2143
2144 Run an external command for particular lines of data:
2145
2146 tail -f access_log |
2147 awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
2148
2150 Brian Kernighan provided valuable assistance during testing and debug‐
2151 ging. We thank him.
2152
2154 Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2155 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013, 2014,
2156 2015, 2016, 2017, 2018, 2019, Free Software Foundation, Inc.
2157
2158 Permission is granted to make and distribute verbatim copies of this
2159 manual page provided the copyright notice and this permission notice
2160 are preserved on all copies.
2161
2162 Permission is granted to copy and distribute modified versions of this
2163 manual page under the conditions for verbatim copying, provided that
2164 the entire resulting derived work is distributed under the terms of a
2165 permission notice identical to this one.
2166
2167 Permission is granted to copy and distribute translations of this man‐
2168 ual page into another language, under the above conditions for modified
2169 versions, except that this permission notice may be stated in a trans‐
2170 lation approved by the Foundation.
2171
2172
2173
2174Free Software Foundation May 22 2019 GAWK(1)