GAWK(1)                        Utility Commands                        GAWK(1)

NAME
gawk - pattern scanning and processing language

SYNOPSIS
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

DESCRIPTION
13 Gawk is the GNU Project's implementation of the AWK programming lan‐
14 guage. It conforms to the definition of the language in the POSIX
15 1003.1 standard. This version in turn is based on the description in
16 The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk
17 provides the additional features found in the current version of Brian
18 Kernighan's awk and numerous GNU-specific extensions.
19
20 The command line consists of options to gawk itself, the AWK program
21 text (if not supplied via the -f or --include options), and values to
22 be made available in the ARGC and ARGV pre-defined AWK variables.
23
24 When gawk is invoked with the --profile option, it starts gathering
25 profiling statistics from the execution of the program. Gawk runs more
26 slowly in this mode, and automatically produces an execution profile in
27 the file awkprof.out when done. See the --profile option, below.
28
29 Gawk also has an integrated debugger. An interactive debugging session
30 can be started by supplying the --debug option to the command line. In
31 this mode of execution, gawk loads the AWK source code and then prompts
32 for debugging commands. Gawk can only debug AWK program source pro‐
33 vided with the -f and --include options. The debugger is documented in
34 GAWK: Effective AWK Programming.

OPTION FORMAT
37 Gawk options may be either traditional POSIX-style one letter options,
38 or GNU-style long options. POSIX options start with a single “-”,
39 while long options start with “--”. Long options are provided for both
40 GNU-specific features and for POSIX-mandated features.
41
42 Gawk-specific options are typically used in long-option form. Argu‐
43 ments to long options are either joined with the option by an = sign,
44 with no intervening spaces, or they may be provided in the next command
45 line argument. Long options may be abbreviated, as long as the abbre‐
46 viation remains unique.
47
48 Additionally, every long option has a corresponding short option, so
49 that the option's functionality may be used from within #! executable
50 scripts.

OPTIONS
53 Gawk accepts the following options. Standard options are listed first,
54 followed by options for gawk extensions, listed alphabetically by short
55 option.
56
57 -f program-file
58 --file program-file
59 Read the AWK program source from the file program-file, instead
60 of from the first command line argument. Multiple -f (or
61 --file) options may be used. Files read with -f are treated as
62 if they begin with an implicit @namespace "awk" statement.
63
64 -F fs
65 --field-separator fs
66 Use fs for the input field separator (the value of the FS prede‐
67 fined variable).
68
69 -v var=val
70 --assign var=val
71 Assign the value val to the variable var, before execution of
72 the program begins. Such variable values are available to the
73 BEGIN rule of an AWK program.
74
75 -b
76 --characters-as-bytes
77 Treat all input data as single-byte characters. In other words,
78 don't pay any attention to the locale information when attempt‐
79 ing to process strings as multibyte characters. The --posix op‐
80 tion overrides this one.
81
82 -c
83 --traditional
84 Run in compatibility mode. In compatibility mode, gawk behaves
85 identically to Brian Kernighan's awk; none of the GNU-specific
86 extensions are recognized. See GNU EXTENSIONS, below, for more
87 information.
88
89 -C
90 --copyright
91 Print the short version of the GNU copyright information message
92 on the standard output and exit successfully.
93
94 -d[file]
95 --dump-variables[=file]
96 Print a sorted list of global variables, their types and final
97 values to file. If no file is provided, gawk uses a file named
98 awkvars.out in the current directory.
99 Having a list of all the global variables is a good way to look
100 for typographical errors in your programs. You would also use
101 this option if you have a large program with a lot of functions,
102 and you want to be sure that your functions don't inadvertently
103 use global variables that you meant to be local. (This is a
104 particularly easy mistake to make with simple variable names
105 like i, j, and so on.)
106
107 -D[file]
108 --debug[=file]
109 Enable debugging of AWK programs. By default, the debugger
110 reads commands interactively from the keyboard (standard input).
111 The optional file argument specifies a file with a list of com‐
112 mands for the debugger to execute non-interactively.
113
114 -e program-text
115 --source program-text
116 Use program-text as AWK program source code. This option allows
117 the easy intermixing of library functions (used via the -f and
118 --include options) with source code entered on the command line.
119 It is intended primarily for medium to large AWK programs used
120 in shell scripts. Each argument supplied via -e is treated as
121 if it begins with an implicit @namespace "awk" statement.
122
123 -E file
124 --exec file
Similar to -f; however, this option is the last one processed.
This should be used with #! scripts, particularly for CGI
applications, to avoid passing in options or source code (!) on
the command line from a URL. This option disables command-line
variable assignments.
130
131 -g
132 --gen-pot
133 Scan and parse the AWK program, and generate a GNU .pot (Porta‐
134 ble Object Template) format file on standard output with entries
135 for all localizable strings in the program. The program itself
136 is not executed. See the GNU gettext distribution for more in‐
137 formation on .pot files.
138
139 -h
140 --help Print a relatively short summary of the available options on the
141 standard output. (Per the GNU Coding Standards, these options
142 cause an immediate, successful exit.)
143
144 -i include-file
145 --include include-file
146 Load an awk source library. This searches for the library using
147 the AWKPATH environment variable. If the initial search fails,
148 another attempt will be made after appending the .awk suffix.
149 The file will be loaded only once (i.e., duplicates are elimi‐
150 nated), and the code does not constitute the main program
151 source. Files read with --include are treated as if they begin
152 with an implicit @namespace "awk" statement.
153
154 -I
155 --trace
Print the internal byte code names as they are executed when
running the program. The trace is printed to standard error.
Each “op code” is preceded by a + sign in the output.
159
160 -l lib
161 --load lib
162 Load a gawk extension from the shared library lib. This
163 searches for the library using the AWKLIBPATH environment vari‐
164 able. If the initial search fails, another attempt will be made
165 after appending the default shared library suffix for the plat‐
166 form. The library initialization routine is expected to be
167 named dl_load().
168
169 -L [value]
170 --lint[=value]
171 Provide warnings about constructs that are dubious or non-porta‐
172 ble to other AWK implementations. With an optional argument of
173 fatal, lint warnings become fatal errors. This may be drastic,
174 but its use will certainly encourage the development of cleaner
175 AWK programs. With an optional argument of invalid, only warn‐
176 ings about things that are actually invalid are issued. (This
177 is not fully implemented yet.) With an optional argument of no-
178 ext, warnings about gawk extensions are disabled.
179
180 -M
181 --bignum
182 Force arbitrary precision arithmetic on numbers. This option has
183 no effect if gawk is not compiled to use the GNU MPFR and GMP
184 libraries. (In such a case, gawk issues a warning.)
185
186 -n
187 --non-decimal-data
188 Recognize octal and hexadecimal values in input data. Use this
189 option with great caution!
190
191 -N
192 --use-lc-numeric
193 Force gawk to use the locale's decimal point character when
194 parsing input data. Although the POSIX standard requires this
195 behavior, and gawk does so when --posix is in effect, the de‐
196 fault is to follow traditional behavior and use a period as the
197 decimal point, even in locales where the period is not the deci‐
198 mal point character. This option overrides the default behav‐
199 ior, without the full draconian strictness of the --posix op‐
200 tion.
201
202 -o[file]
203 --pretty-print[=file]
204 Output a pretty printed version of the program to file. If no
205 file is provided, gawk uses a file named awkprof.out in the cur‐
206 rent directory. This option implies --no-optimize.
207
208 -O
209 --optimize
210 Enable gawk's default optimizations upon the internal represen‐
211 tation of the program. Currently, this just includes simple
212 constant folding. This option is on by default.
213
214 -p[prof-file]
215 --profile[=prof-file]
216 Start a profiling session, and send the profiling data to prof-
217 file. The default is awkprof.out. The profile contains execu‐
218 tion counts of each statement in the program in the left margin
219 and function call counts for each user-defined function. This
220 option implies --no-optimize.
221
222 -P
223 --posix
224 This turns on compatibility mode, with the following additional
225 restrictions:
226
227 • \x escape sequences are not recognized.
228
229 • You cannot continue lines after ? and :.
230
231 • The synonym func for the keyword function is not recognized.
232
233 • The operators ** and **= cannot be used in place of ^ and ^=.
234
235 -r
236 --re-interval
237 Enable the use of interval expressions in regular expression
238 matching (see Regular Expressions, below). Interval expressions
239 were not traditionally available in the AWK language. The POSIX
240 standard added them, to make awk and egrep consistent with each
241 other. They are enabled by default, but this option remains for
242 use together with --traditional.
243
244 -s
245 --no-optimize
246 Disable gawk's default optimizations upon the internal represen‐
247 tation of the program.
248
249 -S
250 --sandbox
251 Run gawk in sandbox mode, disabling the system() function, input
252 redirection with getline, output redirection with print and
253 printf, and loading dynamic extensions. Command execution
254 (through pipelines) is also disabled. This effectively blocks a
255 script from accessing local resources, except for the files
256 specified on the command line.
257
258 -t
259 --lint-old
260 Provide warnings about constructs that are not portable to the
261 original version of UNIX awk.
262
263 -V
264 --version
265 Print version information for this particular copy of gawk on
266 the standard output. This is useful mainly for knowing if the
267 current copy of gawk on your system is up to date with respect
268 to whatever the Free Software Foundation is distributing. This
269 is also useful when reporting bugs. (Per the GNU Coding Stan‐
270 dards, these options cause an immediate, successful exit.)
271
272 -- Signal the end of options. This is useful to allow further argu‐
273 ments to the AWK program itself to start with a “-”. This pro‐
274 vides consistency with the argument parsing convention used by
275 most other POSIX programs.
276
277 In compatibility mode, any other options are flagged as invalid, but
278 are otherwise ignored. In normal operation, as long as program text
279 has been supplied, unknown options are passed on to the AWK program in
280 the ARGV array for processing. This is particularly useful for running
281 AWK programs via the #! executable interpreter mechanism.
282
283 For POSIX compatibility, the -W option may be used, followed by the
284 name of a long option.

AWK PROGRAM EXECUTION
287 An AWK program consists of a sequence of optional directives, pattern-
288 action statements, and optional function definitions.
289
290 @include "filename"
291 @load "filename"
292 @namespace "name"
293 pattern { action statements }
294 function name(parameter list) { statements }
295
296 Gawk first reads the program source from the program-file(s) if speci‐
297 fied, from arguments to --source, or from the first non-option argument
298 on the command line. The -f and --source options may be used multiple
299 times on the command line. Gawk reads the program text as if all the
300 program-files and command line source texts had been concatenated to‐
301 gether. This is useful for building libraries of AWK functions, with‐
302 out having to include them in each new AWK program that uses them. It
303 also provides the ability to mix library functions with command line
304 programs.
305
306 In addition, lines beginning with @include may be used to include other
307 source files into your program, making library use even easier. This
308 is equivalent to using the --include option.
309
310 Lines beginning with @load may be used to load extension functions into
311 your program. This is equivalent to using the --load option.
312
313 The environment variable AWKPATH specifies a search path to use when
314 finding source files named with the -f and --include options. If this
315 variable does not exist, the default path is ".:/usr/local/share/awk".
316 (The actual directory may vary, depending upon how gawk was built and
317 installed.) If a file name given to the -f option contains a “/” char‐
318 acter, no path search is performed.
319
The environment variable AWKLIBPATH specifies a search path to use when
finding shared libraries named with the --load option. If this variable
does not exist, the default path is "/usr/local/lib/gawk". (The actual
directory may vary, depending upon how gawk was built and installed.)
324
325 Gawk executes AWK programs in the following order. First, all variable
326 assignments specified via the -v option are performed. Next, gawk com‐
327 piles the program into an internal form. Then, gawk executes the code
328 in the BEGIN rule(s) (if any), and then proceeds to read each file
329 named in the ARGV array (up to ARGV[ARGC-1]). If there are no files
330 named on the command line, gawk reads the standard input.
331
332 If a filename on the command line has the form var=val it is treated as
333 a variable assignment. The variable var will be assigned the value
334 val. (This happens after any BEGIN rule(s) have been run.) Command
335 line variable assignment is most useful for dynamically assigning val‐
336 ues to the variables AWK uses to control how input is broken into
337 fields and records. It is also useful for controlling state if multi‐
338 ple passes are needed over a single data file.
339
340 If the value of a particular element of ARGV is empty (""), gawk skips
341 over it.
342
343 For each input file, if a BEGINFILE rule exists, gawk executes the as‐
344 sociated code before processing the contents of the file. Similarly,
345 gawk executes the code associated with ENDFILE after processing the
346 file.
347
348 For each record in the input, gawk tests to see if it matches any pat‐
349 tern in the AWK program. For each pattern that the record matches,
350 gawk executes the associated action. The patterns are tested in the
351 order they occur in the program.
352
353 Finally, after all the input is exhausted, gawk executes the code in
354 the END rule(s) (if any).
355
356 Command Line Directories
According to POSIX, files named on the awk command line must be text
files. The behavior is “undefined” if they are not. Most versions
of awk treat a directory on the command line as a fatal error.
360
361 Starting with version 4.0 of gawk, a directory on the command line pro‐
362 duces a warning, but is otherwise skipped. If either of the --posix or
363 --traditional options is given, then gawk reverts to treating directo‐
364 ries on the command line as a fatal error.

VARIABLES, RECORDS AND FIELDS
AWK variables are dynamic; they come into existence when they are first
used. Their values are either floating-point numbers or strings, or
both, depending upon how they are used. Additionally, gawk allows
variables to have regular-expression type. AWK also has
one-dimensional arrays; arrays with multiple dimensions may be
simulated. Gawk provides true arrays of arrays; see Arrays, below.
Several pre-defined variables are set as a program runs; these are
described as needed and summarized below.
375
376 Records
Normally, records are separated by newline characters. You can control
how records are separated by assigning values to the built-in variable
RS. If RS is any single character, that character separates records.
Otherwise, RS is a regular expression, and text in the input that
matches this regular expression separates records. However, in
compatibility mode, only the first character of the string value of RS
is used for separating records. If RS is set to the null string, then
records are separated by empty lines, and the newline character always
acts as a field separator, in addition to whatever value FS may have.
387
388 Fields
389 As each input record is read, gawk splits the record into fields, using
390 the value of the FS variable as the field separator. If FS is a single
391 character, fields are separated by that character. If FS is the null
392 string, then each individual character becomes a separate field. Oth‐
393 erwise, FS is expected to be a full regular expression. In the special
394 case that FS is a single space, fields are separated by runs of spaces
395 and/or tabs and/or newlines. NOTE: The value of IGNORECASE (see below)
396 also affects how fields are split when FS is a regular expression, and
397 how records are separated when RS is a regular expression.
398
399 If the FIELDWIDTHS variable is set to a space-separated list of num‐
400 bers, each field is expected to have fixed width, and gawk splits up
401 the record using the specified widths. Each field width may optionally
402 be preceded by a colon-separated value specifying the number of charac‐
403 ters to skip before the field starts. The value of FS is ignored. As‐
404 signing a new value to FS or FPAT overrides the use of FIELDWIDTHS.
405
406 Similarly, if the FPAT variable is set to a string representing a regu‐
407 lar expression, each field is made up of text that matches that regular
408 expression. In this case, the regular expression describes the fields
409 themselves, instead of the text that separates the fields. Assigning a
410 new value to FS or FIELDWIDTHS overrides the use of FPAT.
411
412 Each field in the input record may be referenced by its position: $1,
413 $2, and so on. $0 is the whole record, including leading and trailing
414 whitespace. Fields need not be referenced by constants:
415
416 n = 5
417 print $n
418
419 prints the fifth field in the input record.
420
421 The variable NF is set to the total number of fields in the input
422 record.
423
424 References to non-existent fields (i.e., fields after $NF) produce the
425 null string. However, assigning to a non-existent field (e.g., $(NF+2)
426 = 5) increases the value of NF, creates any intervening fields with the
427 null string as their values, and causes the value of $0 to be recom‐
428 puted, with the fields being separated by the value of OFS. References
429 to negative numbered fields cause a fatal error. Decrementing NF
430 causes the values of fields past the new value to be lost, and the
431 value of $0 to be recomputed, with the fields being separated by the
432 value of OFS.
433
434 Assigning a value to an existing field causes the whole record to be
435 rebuilt when $0 is referenced. Similarly, assigning a value to $0
436 causes the record to be resplit, creating new values for the fields.
437
438 Built-in Variables
439 Gawk's built-in variables are:
440
441 ARGC The number of command line arguments (does not include op‐
442 tions to gawk, or the program source).
443
444 ARGIND The index in ARGV of the current file being processed.
445
446 ARGV Array of command line arguments. The array is indexed from
447 0 to ARGC - 1. Dynamically changing the contents of ARGV
448 can control the files used for data.
449
BINMODE     On non-POSIX systems, specifies use of “binary” mode for
all file I/O. Numeric values of 1, 2, or 3 specify that
input files, output files, or all files, respectively,
should use binary I/O. String values of "r" or "w" specify
that input files or output files, respectively, should use
binary I/O. String values of "rw" or "wr" specify that all
files should use binary I/O. Any other string value is
treated as "rw", but generates a warning message.
458
459 CONVFMT The conversion format for numbers, "%.6g", by default.
460
ENVIRON     An array containing the values of the current environment.
The array is indexed by the environment variables, each
element being the value of that variable (e.g.,
ENVIRON["HOME"] might be "/home/arnold").
465
466 In POSIX mode, changing this array does not affect the en‐
467 vironment seen by programs which gawk spawns via redirect‐
468 ion or the system() function. Otherwise, gawk updates its
469 real environment so that programs it spawns see the
470 changes.
471
ERRNO       If a system error occurs either doing a redirection for
getline, during a read for getline, or during a close(),
then ERRNO is set to a string describing the error. The
value is subject to translation in non-English locales. If
the string in ERRNO corresponds to a system error in the
errno(3) variable, then the numeric value can be found in
PROCINFO["errno"]. For non-system errors,
PROCINFO["errno"] will be zero.
480
481 FIELDWIDTHS A whitespace-separated list of field widths. When set,
482 gawk parses the input into fields of fixed width, instead
483 of using the value of the FS variable as the field separa‐
484 tor. Each field width may optionally be preceded by a
485 colon-separated value specifying the number of characters
486 to skip before the field starts. See Fields, above.
487
488 FILENAME The name of the current input file. If no files are speci‐
489 fied on the command line, the value of FILENAME is “-”.
490 However, FILENAME is undefined inside the BEGIN rule (un‐
491 less set by getline).
492
493 FNR The input record number in the current input file.
494
495 FPAT A regular expression describing the contents of the fields
496 in a record. When set, gawk parses the input into fields,
497 where the fields match the regular expression, instead of
498 using the value of FS as the field separator. See Fields,
499 above.
500
501 FS The input field separator, a space by default. See Fields,
502 above.
503
504 FUNCTAB An array whose indices and corresponding values are the
505 names of all the user-defined or extension functions in the
506 program. NOTE: You may not use the delete statement with
507 the FUNCTAB array.
508
IGNORECASE  Controls the case-sensitivity of all regular expression and
string operations. If IGNORECASE has a non-zero value,
then string comparisons and pattern matching in rules,
field splitting with FS and FPAT, record separating with
RS, regular expression matching with ~ and !~, and the
gensub(), gsub(), index(), match(), patsplit(), split(),
and sub() built-in functions all ignore case when doing
regular expression operations. NOTE: Array subscripting is
not affected. However, the asort() and asorti() functions
are affected.
519 Thus, if IGNORECASE is not equal to zero, /aB/ matches all
520 of the strings "ab", "aB", "Ab", and "AB". As with all AWK
521 variables, the initial value of IGNORECASE is zero, so all
522 regular expression and string operations are normally case-
523 sensitive.
524
525 LINT Provides dynamic control of the --lint option from within
526 an AWK program. When true, gawk prints lint warnings. When
527 false, it does not. The values allowed for the --lint op‐
528 tion may also be assigned to LINT, with the same effects.
529 Any other true value just prints warnings.
530
531 NF The number of fields in the current input record.
532
533 NR The total number of input records seen so far.
534
535 OFMT The output format for numbers, "%.6g", by default.
536
537 OFS The output field separator, a space by default.
538
539 ORS The output record separator, by default a newline.
540
541 PREC The working precision of arbitrary precision floating-point
542 numbers, 53 by default.
543
544 PROCINFO The elements of this array provide access to information
545 about the running AWK program. On some systems, there may
546 be elements in the array, "group1" through "groupn" for
547 some n, which is the number of supplementary groups that
548 the process has. Use the in operator to test for these el‐
549 ements. The following elements are guaranteed to be avail‐
550 able:
551
552 PROCINFO["argv"] The command line arguments as received
553 by gawk at the C-language level. The
554 subscripts start from zero.
555
556 PROCINFO["egid"] The value of the getegid(2) system
557 call.
558
559 PROCINFO["errno"] The value of errno(3) when ERRNO is
560 set to the associated error message.
561
562 PROCINFO["euid"] The value of the geteuid(2) system
563 call.
564
565 PROCINFO["FS"] "FS" if field splitting with FS is in
566 effect, "FPAT" if field splitting with
567 FPAT is in effect, "FIELDWIDTHS" if
568 field splitting with FIELDWIDTHS is in
569 effect, or "API" if API input parser
570 field splitting is in effect.
571
572 PROCINFO["gid"] The value of the getgid(2) system
573 call.
574
575 PROCINFO["identifiers"]
576 A subarray, indexed by the names of
577 all identifiers used in the text of
578 the AWK program. The values indicate
579 what gawk knows about the identifiers
580 after it has finished parsing the pro‐
581 gram; they are not updated while the
582 program runs. For each identifier,
583 the value of the element is one of the
584 following:
585
586 "array" The identifier is an ar‐
587 ray.
588
589 "builtin" The identifier is a built-
590 in function.
591
592 "extension" The identifier is an ex‐
593 tension function loaded
594 via @load or --load.
595
596 "scalar" The identifier is a
597 scalar.
598
599 "untyped" The identifier is untyped
600 (could be used as a scalar
601 or array, gawk doesn't
602 know yet).
603
604 "user" The identifier is a user-
605 defined function.
606
607 PROCINFO["pgrpid"] The value of the getpgrp(2) system
608 call.
609
610 PROCINFO["pid"] The value of the getpid(2) system
611 call.
612
613 PROCINFO["platform"] A string indicating the platform for
614 which gawk was compiled. It is one
615 of:
616
617 "djgpp", "mingw"
618 Microsoft Windows, using either
619 DJGPP, or MinGW, respectively.
620
621 "os2" OS/2.
622
623 "posix"
624 GNU/Linux, Cygwin, Mac OS X,
625 and legacy Unix systems.
626
627 "vms" OpenVMS or Vax/VMS.
628
629 PROCINFO["ppid"] The value of the getppid(2) system
630 call.
631
632 PROCINFO["strftime"] The default time format string for
633 strftime(). Changing its value af‐
634 fects how strftime() formats time val‐
635 ues when called with no arguments.
636
637 PROCINFO["uid"] The value of the getuid(2) system
638 call.
639
640 PROCINFO["version"] The version of gawk.
641
642 The following elements are present if loading dynamic ex‐
643 tensions is available:
644
645 PROCINFO["api_major"]
646 The major version of the extension API.
647
648 PROCINFO["api_minor"]
649 The minor version of the extension API.
650
651 The following elements are available if MPFR support is
652 compiled into gawk:
653
654 PROCINFO["gmp_version"]
655 The version of the GNU GMP library used for arbi‐
656 trary precision number support in gawk.
657
658 PROCINFO["mpfr_version"]
659 The version of the GNU MPFR library used for arbi‐
660 trary precision number support in gawk.
661
662 PROCINFO["prec_max"]
663 The maximum precision supported by the GNU MPFR li‐
664 brary for arbitrary precision floating-point num‐
665 bers.
666
667 PROCINFO["prec_min"]
668 The minimum precision allowed by the GNU MPFR li‐
669 brary for arbitrary precision floating-point num‐
670 bers.
671
The following elements may be set by a program to change
gawk's behavior:
674
675 PROCINFO["NONFATAL"]
676 If this exists, then I/O errors for all redirections
677 become nonfatal.
678
679 PROCINFO["name", "NONFATAL"]
680 Make I/O errors for name be nonfatal.
681
682 PROCINFO["command", "pty"]
683 Use a pseudo-tty for two-way communication with com‐
684 mand instead of setting up two one-way pipes.
685
686 PROCINFO["input", "READ_TIMEOUT"]
687 The timeout in milliseconds for reading data from
688 input, where input is a redirection string or a
689 filename. A value of zero or less than zero means no
690 timeout.
691
692 PROCINFO["input", "RETRY"]
693 If an I/O error that may be retried occurs when
694 reading data from input, and this array entry ex‐
695 ists, then getline returns -2 instead of following
696 the default behavior of returning -1 and configuring
697 input to return no further data. An I/O error that
698 may be retried is one where errno(3) has the value
699 EAGAIN, EWOULDBLOCK, EINTR, or ETIMEDOUT. This may
700 be useful in conjunction with PROCINFO["input",
701 "READ_TIMEOUT"] or in situations where a file de‐
702 scriptor has been configured to behave in a non-
703 blocking fashion.
704
705 PROCINFO["sorted_in"]
706 If this element exists in PROCINFO, then its value
707 controls the order in which array elements are tra‐
708 versed in for loops. Supported values are
709 "@ind_str_asc", "@ind_num_asc", "@val_type_asc",
710 "@val_str_asc", "@val_num_asc", "@ind_str_desc",
711 "@ind_num_desc", "@val_type_desc", "@val_str_desc",
712 "@val_num_desc", and "@unsorted". The value can
713 also be the name (as a string) of any comparison
714 function defined as follows:
715
716 function cmp_func(i1, v1, i2, v2)
717
718 where i1 and i2 are the indices, and v1 and v2 are
719 the corresponding values of the two elements being
720 compared. It should return a number less than,
721 equal to, or greater than 0, depending on how the
722 elements of the array are to be ordered.
723
724 ROUNDMODE The rounding mode to use for arbitrary precision arithmetic
725 on numbers, by default "N" (IEEE-754 roundTiesToEven mode).
726 The accepted values are:
727
728 "A" or "a"
729 for rounding away from zero. These are only avail‐
730 able if your version of the GNU MPFR library sup‐
731 ports rounding away from zero.
732
733 "D" or "d" for roundTowardNegative.
734
735 "N" or "n" for roundTiesToEven.
736
737 "U" or "u" for roundTowardPositive.
738
739 "Z" or "z" for roundTowardZero.
740
741 RS The input record separator, by default a newline.
742
743 RT The record terminator. Gawk sets RT to the input text that
744 matched the character or regular expression specified by
745 RS.
746
747 RSTART The index of the first character matched by match(); 0 if
748 no match. (This implies that character indices start at
749 one.)
750
751 RLENGTH The length of the string matched by match(); -1 if no
752 match.
753
754 SUBSEP The string used to separate multiple subscripts in array
755 elements, by default "\034".
756
757 SYMTAB An array whose indices are the names of all currently de‐
758 fined global variables and arrays in the program. The ar‐
759 ray may be used for indirect access to read or write the
760 value of a variable:
761
762 foo = 5
763 SYMTAB["foo"] = 4
764 print foo # prints 4
765
766 The typeof() function may be used to test if an element in
767 SYMTAB is an array. You may not use the delete statement
768 with the SYMTAB array, nor assign to elements with an index
769 that is not a variable name.
770
771 TEXTDOMAIN The text domain of the AWK program; used to find the local‐
772 ized translations for the program's strings.
773
774 Arrays
775 Arrays are subscripted with an expression between square brackets ([
776 and ]). If the expression is an expression list (expr, expr ...) then
777 the array subscript is a string consisting of the concatenation of the
778 (string) value of each expression, separated by the value of the SUBSEP
779 variable. This facility is used to simulate multiply dimensioned ar‐
780 rays. For example:
781
782 i = "A"; j = "B"; k = "C"
783 x[i, j, k] = "hello, world\n"
784
785 assigns the string "hello, world\n" to the element of the array x which
786 is indexed by the string "A\034B\034C". All arrays in AWK are associa‐
787 tive, i.e., indexed by string values.
788
789 The special operator in may be used to test if an array has an index
790 consisting of a particular value:
791
792 if (val in array)
793 print array[val]
794
795 If the array has multiple subscripts, use (i, j) in array.
796
797 The in construct may also be used in a for loop to iterate over all the
798 elements of an array. However, the (i, j) in array construct only
799 works in tests, not in for loops.
800
801 An element may be deleted from an array using the delete statement.
802 The delete statement may also be used to delete the entire contents of
803 an array, just by specifying the array name without a subscript.
804
805 Gawk supports true multidimensional arrays. It does not require that
806 such arrays be “rectangular” as in C or C++. For example:
807
808 a[1] = 5
809 a[2][1] = 6
810 a[2][2] = 7
811
812 NOTE: You may need to tell gawk that an array element is really a sub‐
813 array in order to use it where gawk expects an array (such as in the
814 second argument to split()). You can do this by creating an element in
815 the subarray and then deleting it with the delete statement.
816
817 Namespaces
818 Gawk provides a simple namespace facility to help work around the fact
819 that all variables in AWK are global.
820
821 A qualified name consists of two simple identifiers joined by a double
822 colon (::). The left-hand identifier represents the namespace and
823 the right-hand identifier is the variable within it. All simple
824 (non-qualified) names are considered to be in the “current” namespace; the
825 default namespace is awk. However, simple identifiers consisting
826 solely of uppercase letters are forced into the awk namespace, even if
827 the current namespace is different.
828
829 You change the current namespace with an @namespace "name" directive.
830
831 The standard predefined builtin function names may not be used as name‐
832 space names. The names of additional functions provided by gawk may be
833 used as namespace names or as simple identifiers in other namespaces.
834 For more details, see GAWK: Effective AWK Programming.
835
836 Variable Typing And Conversion
837 Variables and fields may be (floating point) numbers, or strings, or
838 both. They may also be regular expressions. How the value of a vari‐
839 able is interpreted depends upon its context. If used in a numeric ex‐
840 pression, it will be treated as a number; if used as a string it will
841 be treated as a string.
842
843 To force a variable to be treated as a number, add zero to it; to force
844 it to be treated as a string, concatenate it with the null string.
845
846 Uninitialized variables have the numeric value zero and the string
847 value "" (the null, or empty, string).
848
849 When a string must be converted to a number, the conversion is accom‐
850 plished using strtod(3). A number is converted to a string by using
851 the value of CONVFMT as a format string for sprintf(3), with the nu‐
852 meric value of the variable as the argument. However, even though all
853 numbers in AWK are floating-point, integral values are always converted
854 as integers. Thus, given
855
856 CONVFMT = "%2.2f"
857 a = 12
858 b = a ""
859
860 the variable b has a string value of "12" and not "12.00".
861
862 NOTE: When operating in POSIX mode (such as with the --posix option),
863 beware that locale settings may interfere with the way decimal numbers
864 are treated: the decimal separator of the numbers you are feeding to
865 gawk must conform to what your locale would expect, be it a comma (,)
866 or a period (.).
867
868 Gawk performs comparisons as follows: If two variables are numeric,
869 they are compared numerically. If one value is numeric and the other
870 has a string value that is a “numeric string,” then comparisons are
871 also done numerically. Otherwise, the numeric value is converted to a
872 string and a string comparison is performed. Two strings are compared,
873 of course, as strings.
874
875 Note that string constants, such as "57", are not numeric strings; they
876 are string constants. The idea of “numeric string” only applies to
877 fields, getline input, FILENAME, ARGV elements, ENVIRON elements and
878 the elements of an array created by split() or patsplit() that are nu‐
879 meric strings. The basic idea is that user input, and only user input,
880 that looks numeric, should be treated that way.
881
882 Octal and Hexadecimal Constants
883 You may use C-style octal and hexadecimal constants in your AWK program
884 source code. For example, the octal value 011 is equal to decimal 9,
885 and the hexadecimal value 0x11 is equal to decimal 17.
886
887 String Constants
888 String constants in AWK are sequences of characters enclosed between
889 double quotes (like "value"). Within strings, certain escape sequences
890 are recognized, as in C. These are:
891
892 \\ A literal backslash.
893
894 \a The “alert” character; usually the ASCII BEL character.
895
896 \b Backspace.
897
898 \f Form-feed.
899
900 \n Newline.
901
902 \r Carriage return.
903
904 \t Horizontal tab.
905
906 \v Vertical tab.
907
908 \xhex digits
909 The character represented by the string of hexadecimal digits fol‐
910 lowing the \x. Up to two following hexadecimal digits are consid‐
911 ered part of the escape sequence. E.g., "\x1B" is the ASCII ESC
912 (escape) character.
913
914 \ddd The character represented by the 1-, 2-, or 3-digit sequence of
915 octal digits. E.g., "\033" is the ASCII ESC (escape) character.
916
917 \c The literal character c.
918
919 In compatibility mode, the characters represented by octal and hexadec‐
920 imal escape sequences are treated literally when used in regular ex‐
921 pression constants. Thus, /a\52b/ is equivalent to /a\*b/.
922
923 Regexp Constants
924 A regular expression constant is a sequence of characters enclosed be‐
925 tween forward slashes (like /value/). Regular expression matching is
926 described more fully below; see Regular Expressions.
927
928 The escape sequences described earlier may also be used inside constant
929 regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace charac‐
930 ters).
931
932 Gawk provides strongly typed regular expression constants. These are
933 written with a leading @ symbol (like so: @/value/). Such constants
934 may be assigned to scalars (variables, array elements) and passed to
935 user-defined functions. Variables that have been so assigned have regu‐
936 lar expression type.
937
939 AWK is a line-oriented language. The pattern comes first, and then the
940 action. Action statements are enclosed in { and }. Either the pattern
941 may be missing, or the action may be missing, but, of course, not both.
942 If the pattern is missing, the action executes for every single record
943 of input. A missing action is equivalent to
944
945 { print }
946
947 which prints the entire record.
948
949 Comments begin with the # character, and continue until the end of the
950 line. Empty lines may be used to separate statements. Normally, a
951 statement ends with a newline; however, this is not the case for lines
952 ending in a comma, {, ?, :, &&, or ||. Lines ending in do or else also
953 have their statements automatically continued on the following line.
954 In other cases, a line can be continued by ending it with a “\”, in
955 which case the newline is ignored. However, a “\” after a # is not
956 special.
957
958 Multiple statements may be put on one line by separating them with a
959 “;”. This applies to both the statements within the action part of a
960 pattern-action pair (the usual case), and to the pattern-action state‐
961 ments themselves.
962
963 Patterns
964 AWK patterns may be one of the following:
965
966 BEGIN
967 END
968 BEGINFILE
969 ENDFILE
970 /regular expression/
971 relational expression
972 pattern && pattern
973 pattern || pattern
974 pattern ? pattern : pattern
975 (pattern)
976 ! pattern
977 pattern1, pattern2
978
979 BEGIN and END are two special kinds of patterns which are not tested
980 against the input. The action parts of all BEGIN patterns are merged
981 as if all the statements had been written in a single BEGIN rule. They
982 are executed before any of the input is read. Similarly, all the END
983 rules are merged, and executed when all the input is exhausted (or when
984 an exit statement is executed). BEGIN and END patterns cannot be com‐
985 bined with other patterns in pattern expressions. BEGIN and END pat‐
986 terns cannot have missing action parts.
987
988 BEGINFILE and ENDFILE are additional special patterns whose actions are
989 executed before reading the first record of each command-line input
990 file and after reading the last record of each file. Inside the BEGIN‐
991 FILE rule, the value of ERRNO is the empty string if the file was
992 opened successfully. Otherwise, there is some problem with the file
993 and the code should use nextfile to skip it. If that is not done, gawk
994 produces its usual fatal error for files that cannot be opened.
995
996 For /regular expression/ patterns, the associated statement is executed
997 for each input record that matches the regular expression. Regular ex‐
998 pressions are the same as those in egrep(1), and are summarized below.
999
1000 A relational expression may use any of the operators defined below in
1001 the section on actions. These generally test whether certain fields
1002 match certain regular expressions.
1003
1004 The &&, ||, and ! operators are logical AND, logical OR, and logical
1005 NOT, respectively, as in C. They do short-circuit evaluation, also as
1006 in C, and are used for combining more primitive pattern expressions.
1007 As in most languages, parentheses may be used to change the order of
1008 evaluation.
1009
1010 The ?: operator is like the same operator in C. If the first pattern
1011 is true then the pattern used for testing is the second pattern, other‐
1012 wise it is the third. Only one of the second and third patterns is
1013 evaluated.
1014
1015 The pattern1, pattern2 form of an expression is called a range pattern.
1016 It matches all input records starting with a record that matches pat‐
1017 tern1, and continuing until a record that matches pattern2, inclusive.
1018 It does not combine with any other sort of pattern expression.
1019
1020 Regular Expressions
1021 Regular expressions are the extended kind found in egrep. They are
1022 composed of characters as follows:
1023
1024 c Matches the non-metacharacter c.
1025
1026 \c Matches the literal character c.
1027
1028 . Matches any character including newline.
1029
1030 ^ Matches the beginning of a string.
1031
1032 $ Matches the end of a string.
1033
1034 [abc...] A character list: matches any of the characters abc.... You
1035 may include a range of characters by separating them with a
1036 dash. To include a literal dash in the list, put it first
1037 or last.
1038
1039 [^abc...] A negated character list: matches any character except
1040 abc....
1041
1042 r1|r2 Alternation: matches either r1 or r2.
1043
1044 r1r2 Concatenation: matches r1, and then r2.
1045
1046 r+ Matches one or more r's.
1047
1048 r* Matches zero or more r's.
1049
1050 r? Matches zero or one r's.
1051
1052 (r) Grouping: matches r.
1053
1054 r{n}
1055 r{n,}
1056 r{n,m} One or two numbers inside braces denote an interval expres‐
1057 sion. If there is one number in the braces, the preceding
1058 regular expression r is repeated n times. If there are two
1059 numbers separated by a comma, r is repeated n to m times.
1060 If there is one number followed by a comma, then r is re‐
1061 peated at least n times.
1062
1063 \y Matches the empty string at either the beginning or the end
1064 of a word.
1065
1066 \B Matches the empty string within a word.
1067
1068 \< Matches the empty string at the beginning of a word.
1069
1070 \> Matches the empty string at the end of a word.
1071
1072 \s Matches any whitespace character.
1073
1074 \S Matches any nonwhitespace character.
1075
1076 \w Matches any word-constituent character (letter, digit, or
1077 underscore).
1078
1079 \W Matches any character that is not word-constituent.
1080
1081 \` Matches the empty string at the beginning of a buffer
1082 (string).
1083
1084 \' Matches the empty string at the end of a buffer.
1085
1086 The escape sequences that are valid in string constants (see String
1087 Constants) are also valid in regular expressions.
1088
1089 Character classes are a feature introduced in the POSIX standard. A
1090 character class is a special notation for describing lists of charac‐
1091 ters that have a specific attribute, but where the actual characters
1092 themselves can vary from country to country and/or from character set
1093 to character set. For example, the notion of what is an alphabetic
1094 character differs in the USA and in France.
1095
1096 A character class is only valid in a regular expression inside the
1097 brackets of a character list. Character classes consist of [:, a key‐
1098 word denoting the class, and :]. The character classes defined by the
1099 POSIX standard are:
1100
1101 [:alnum:] Alphanumeric characters.
1102
1103 [:alpha:] Alphabetic characters.
1104
1105 [:blank:] Space or tab characters.
1106
1107 [:cntrl:] Control characters.
1108
1109 [:digit:] Numeric characters.
1110
1111 [:graph:] Characters that are both printable and visible. (A space is
1112 printable, but not visible, while an a is both.)
1113
1114 [:lower:] Lowercase alphabetic characters.
1115
1116 [:print:] Printable characters (characters that are not control char‐
1117 acters.)
1118
1119 [:punct:] Punctuation characters (characters that are not letters, digits,
1120 control characters, or space characters).
1121
1122 [:space:] Space characters (such as space, tab, and formfeed, to name
1123 a few).
1124
1125 [:upper:] Uppercase alphabetic characters.
1126
1127 [:xdigit:] Characters that are hexadecimal digits.
1128
1129 For example, before the POSIX standard, to match alphanumeric charac‐
1130 ters, you would have had to write /[A-Za-z0-9]/. If your character set
1131 had other alphabetic characters in it, this would not match them, and
1132 if your character set collated differently from ASCII, this might not
1133 even match the ASCII alphanumeric characters. With the POSIX character
1134 classes, you can write /[[:alnum:]]/, and this matches the alphabetic
1135 and numeric characters in your character set, no matter what it is.
1136
1137 Two additional special sequences can appear in character lists. These
1138 apply to non-ASCII character sets, which can have single symbols
1139 (called collating elements) that are represented with more than one
1140 character, as well as several characters that are equivalent for col‐
1141 lating, or sorting, purposes. (E.g., in French, a plain “e” and a
1142 grave-accented “è” are equivalent.)
1143
1144 Collating Symbols
1145 A collating symbol is a multi-character collating element en‐
1146 closed in [. and .]. For example, if ch is a collating ele‐
1147 ment, then [[.ch.]] is a regular expression that matches this
1148 collating element, while [ch] is a regular expression that
1149 matches either c or h.
1150
1151 Equivalence Classes
1152 An equivalence class is a locale-specific name for a list of
1153 characters that are equivalent. The name is enclosed in [= and
1154 =]. For example, the name e might be used to represent all of
1155 “e”, “é”, and “è”. In this case, [[=e=]] is a regular expression
1156 that matches any of e, é, or è.
1157
1158 These features are very valuable in non-English speaking locales. The
1159 library functions that gawk uses for regular expression matching cur‐
1160 rently only recognize POSIX character classes; they do not recognize
1161 collating symbols or equivalence classes.
1162
1163 The \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators are specific
1164 to gawk; they are extensions based on facilities in the GNU regular ex‐
1165 pression libraries.
1166
1167 The various command line options control how gawk interprets characters
1168 in regular expressions.
1169
1170 No options
1171 In the default case, gawk provides all the facilities of POSIX
1172 regular expressions and the GNU regular expression operators de‐
1173 scribed above.
1174
1175 --posix
1176 Only POSIX regular expressions are supported; the GNU operators
1177 are not special. (E.g., \w matches a literal w).
1178
1179 --traditional
1180 Traditional UNIX awk regular expressions are matched. The GNU
1181 operators are not special, and interval expressions are not
1182 available. Characters described by octal and hexadecimal escape
1183 sequences are treated literally, even if they represent regular
1184 expression metacharacters.
1185
1186 --re-interval
1187 Allow interval expressions in regular expressions, even if
1188 --traditional has been provided.
1189
1190 Actions
1191 Action statements are enclosed in braces, { and }. Action statements
1192 consist of the usual assignment, conditional, and looping statements
1193 found in most languages. The operators, control statements, and in‐
1194 put/output statements available are patterned after those in C.
1195
1196 Operators
1197 The operators in AWK, in order of decreasing precedence, are:
1198
1199 (...) Grouping
1200
1201 $ Field reference.
1202
1203 ++ -- Increment and decrement, both prefix and postfix.
1204
1205 ^ Exponentiation (** may also be used, and **= for the as‐
1206 signment operator).
1207
1208 + - ! Unary plus, unary minus, and logical negation.
1209
1210 * / % Multiplication, division, and modulus.
1211
1212 + - Addition and subtraction.
1213
1214 space String concatenation.
1215
1216 | |& Piped I/O for getline, print, and printf.
1217
1218 < > <= >= == !=
1219 The regular relational operators.
1220
1221 ~ !~ Regular expression match, negated match. NOTE: Do not use
1222 a constant regular expression (/foo/) on the left-hand side
1223 of a ~ or !~. Only use one on the right-hand side. The
1224 expression /foo/ ~ exp has the same meaning as (($0 ~
1225 /foo/) ~ exp). This is usually not what you want.
1226
1227 in Array membership.
1228
1229 && Logical AND.
1230
1231 || Logical OR.
1232
1233 ?: The C conditional expression. This has the form expr1 ?
1234 expr2 : expr3. If expr1 is true, the value of the expres‐
1235 sion is expr2, otherwise it is expr3. Only one of expr2
1236 and expr3 is evaluated.
1237
1238 = += -= *= /= %= ^=
1239 Assignment. Both absolute assignment (var = value) and op‐
1240 erator-assignment (the other forms) are supported.
1241
1242 Control Statements
1243 The control statements are as follows:
1244
1245 if (condition) statement [ else statement ]
1246 while (condition) statement
1247 do statement while (condition)
1248 for (expr1; expr2; expr3) statement
1249 for (var in array) statement
1250 break
1251 continue
1252 delete array[index]
1253 delete array
1254 exit [ expression ]
1255 { statements }
1256 switch (expression) {
1257 case value|regex : statement
1258 ...
1259 [ default: statement ]
1260 }
1261
1262 I/O Statements
1263 The input/output statements are as follows:
1264
1265 close(file [, how]) Close file, pipe or coprocess. The optional how
1266 should only be used when closing one end of a
1267 two-way pipe to a coprocess. It must be a string
1268 value, either "to" or "from".
1269
1270 getline Set $0 from the next input record; set NF, NR,
1271 FNR, RT.
1272
1273 getline <file Set $0 from the next record of file; set NF, RT.
1274
1275 getline var Set var from the next input record; set NR, FNR,
1276 RT.
1277
1278 getline var <file Set var from the next record of file; set RT.
1279
1280 command | getline [var]
1281 Run command, piping the output either into $0 or
1282 var, as above, and RT.
1283
1284 command |& getline [var]
1285 Run command as a coprocess piping the output ei‐
1286 ther into $0 or var, as above, and RT. Copro‐
1287 cesses are a gawk extension. (The command can
1288 also be a socket. See the subsection Special
1289 File Names, below.)
1290
1291 next Stop processing the current input record. Read
1292 the next input record and start processing over
1293 with the first pattern in the AWK program. Upon
1294 reaching the end of the input data, execute any
1295 END rule(s).
1296
1297 nextfile Stop processing the current input file. The next
1298 input record read comes from the next input file.
1299 Update FILENAME and ARGIND, reset FNR to 1, and
1300 start processing over with the first pattern in
1301 the AWK program. Upon reaching the end of the
1302 input data, execute any ENDFILE and END rule(s).
1303
1304 print Print the current record. The output record is
1305 terminated with the value of ORS.
1306
1307 print expr-list Print expressions. Each expression is separated
1308 by the value of OFS. The output record is termi‐
1309 nated with the value of ORS.
1310
1311 print expr-list >file Print expressions on file. Each expression is
1312 separated by the value of OFS. The output record
1313 is terminated with the value of ORS.
1314
1315 printf fmt, expr-list Format and print. See The printf Statement, be‐
1316 low.
1317
1318 printf fmt, expr-list >file
1319 Format and print on file.
1320
1321 system(cmd-line) Execute the command cmd-line, and return the exit
1322 status. (This may not be available on non-POSIX
1323 systems.) See GAWK: Effective AWK Programming
1324 for the full details on the exit status.
1325
1326 fflush([file]) Flush any buffers associated with the open output
1327 file or pipe file. If file is missing or if it
1328 is the null string, then flush all open output
1329 files and pipes.
1330
1331 Additional output redirections are allowed for print and printf.
1332
1333 print ... >> file
1334 Append output to the file.
1335
1336 print ... | command
1337 Write on a pipe.
1338
1339 print ... |& command
1340 Send data to a coprocess or socket. (See also the subsection
1341 Special File Names, below.)
1342
1343 The getline command returns 1 on success, zero on end of file, and -1
1344 on an error. If the errno(3) value indicates that the I/O operation
1345 may be retried, and PROCINFO["input", "RETRY"] is set, then -2 is re‐
1346 turned instead of -1, and further calls to getline may be attempted.
1347 Upon an error, ERRNO is set to a string describing the problem.
1348
1349 NOTE: Failure in opening a two-way socket results in a non-fatal error
1350 being returned to the calling function. If using a pipe, coprocess, or
1351 socket to getline, or from print or printf within a loop, you must use
1352 close() to create new instances of the command or socket. AWK does not
1353 automatically close pipes, sockets, or coprocesses when they return
1354 EOF.
1355
1356 The printf Statement
1357 The AWK versions of the printf statement and sprintf() function (see
1358 below) accept the following conversion specification formats:
1359
1360 %a, %A A floating point number of the form [-]0xh.hhhhp+-dd (C99 hexa‐
1361 decimal floating point format). For %A, uppercase letters are
1362 used instead of lowercase ones.
1363
1364 %c A single character. If the argument used for %c is numeric, it
1365 is treated as a character and printed. Otherwise, the argument
1366 is assumed to be a string, and only the first character of that
1367 string is printed.
1368
1369 %d, %i A decimal number (the integer part).
1370
1371 %e, %E A floating point number of the form [-]d.dddddde[+-]dd. The %E
1372 format uses E instead of e.
1373
1374 %f, %F A floating point number of the form [-]ddd.dddddd. If the sys‐
1375 tem library supports it, %F is available as well. This is like
1376 %f, but uses capital letters for special “not a number” and
1377 “infinity” values. If %F is not available, gawk uses %f.
1378
1379 %g, %G Use %e or %f conversion, whichever is shorter, with nonsignifi‐
1380 cant zeros suppressed. The %G format uses %E instead of %e.
1381
1382 %o An unsigned octal number (also an integer).
1383
1384 %u An unsigned decimal number (again, an integer).
1385
1386 %s A character string.
1387
1388 %x, %X An unsigned hexadecimal number (an integer). The %X format
1389 uses ABCDEF instead of abcdef.
1390
1391 %% A single % character; no argument is converted.
1392
1393 Optional, additional parameters may lie between the % and the control
1394 letter:
1395
1396 count$ Use the count'th argument at this point in the formatting. This
1397 is called a positional specifier and is intended primarily for
1398 use in translated versions of format strings, not in the origi‐
1399 nal text of an AWK program. It is a gawk extension.
1400
1401 - The expression should be left-justified within its field.
1402
1403 space For numeric conversions, prefix positive values with a space,
1404 and negative values with a minus sign.
1405
1406 + The plus sign, used before the width modifier (see below), says
1407 to always supply a sign for numeric conversions, even if the
1408 data to be formatted is positive. The + overrides the space
1409 modifier.
1410
1411 # Use an “alternate form” for certain control letters. For %o,
1412 supply a leading zero. For %x and %X, supply a leading 0x or
1413 0X for a nonzero result. For %e, %E, %f, and %F, the result always
1414 contains a decimal point. For %g and %G, trailing zeros
1415 are not removed from the result.
1416
1417 0 A leading 0 (zero) acts as a flag, indicating that output should
1418 be padded with zeroes instead of spaces. This applies only to
1419 the numeric output formats. This flag only has an effect when
1420 the field width is wider than the value to be printed.
1421
1422 ' A single quote character instructs gawk to insert the locale's
1423 thousands-separator character into decimal numbers, and to also
1424 use the locale's decimal point character with floating point
1425 formats. This requires correct locale support in the C library
1426 and in the definition of the current locale.
1427
1428 width The field should be padded to this width. The field is normally
1429 padded with spaces. With the 0 flag, it is padded with zeroes.
1430
1431 .prec A number that specifies the precision to use when printing. For
1432 the %e, %E, %f, and %F formats, this specifies the number of
1433 digits you want printed to the right of the decimal point. For
1434 the %g and %G formats, it specifies the maximum number of significant
1435 digits. For the %d, %i, %o, %u, %x, and %X formats, it
1436 specifies the minimum number of digits to print. For the %s
1437 format, it specifies the maximum number of characters from the
1438 string that should be printed.
1439
1440 The dynamic width and prec capabilities of the ISO C printf() routines
1441 are supported. A * in place of either the width or prec specifications
1442 causes their values to be taken from the argument list to printf or
1443 sprintf(). To use a positional specifier with a dynamic width or pre‐
1444 cision, supply the count$ after the * in the format string. For exam‐
1445 ple, "%3$*2$.*1$s".
1446
1447 Special File Names
1448 When doing I/O redirection from either print or printf into a file, or
1449 via getline from a file, gawk recognizes certain special filenames in‐
1450 ternally. These filenames allow access to open file descriptors inher‐
1451 ited from gawk's parent process (usually the shell). These file names
1452 may also be used on the command line to name data files. The filenames
1453 are:
1454
1455 - The standard input.
1456
1457 /dev/stdin The standard input.
1458
1459 /dev/stdout The standard output.
1460
1461 /dev/stderr The standard error output.
1462
1463 /dev/fd/n The file associated with the open file descriptor n.
1464
1465 These are particularly useful for error messages. For example:
1466
1467 print "You blew it!" > "/dev/stderr"
1468
1469 whereas you would otherwise have to use
1470
1471 print "You blew it!" | "cat 1>&2"
1472
1473 The following special filenames may be used with the |& coprocess oper‐
1474 ator for creating TCP/IP network connections:
1475
1476 /inet/tcp/lport/rhost/rport
1477 /inet4/tcp/lport/rhost/rport
1478 /inet6/tcp/lport/rhost/rport
1479 Files for a TCP/IP connection on local port lport to remote host
1480 rhost on remote port rport. Use a port of 0 to have the system
1481 pick a port. Use /inet4 to force an IPv4 connection, and /inet6
1482 to force an IPv6 connection. Plain /inet uses the system de‐
1483 fault (most likely IPv4). Usable only with the |& two-way I/O
1484 operator.
1485
1486 /inet/udp/lport/rhost/rport
1487 /inet4/udp/lport/rhost/rport
1488 /inet6/udp/lport/rhost/rport
1489 Similar, but use UDP/IP instead of TCP/IP.
1490
1491 Numeric Functions
1492 AWK has the following built-in arithmetic functions:
1493
1494 atan2(y, x) Return the arctangent of y/x in radians.
1495
1496 cos(expr) Return the cosine of expr, which is in radians.
1497
1498 exp(expr) The exponential function.
1499
1500 int(expr) Truncate to integer.
1501
1502 log(expr) The natural logarithm function.
1503
1504 rand() Return a random number N, between zero and one, such that
1505 0 ≤ N < 1.
1506
1507 sin(expr) Return the sine of expr, which is in radians.
1508
1509 sqrt(expr) Return the square root of expr.
1510
1511 srand([expr]) Use expr as the new seed for the random number generator.
1512 If no expr is provided, use the time of day. Return the
1513 previous seed for the random number generator.
1514
1515 String Functions
1516 Gawk has the following built-in string functions:
1517
1518 asort(s [, d [, how] ]) Return the number of elements in the source ar‐
1519 ray s. Sort the contents of s using gawk's
1520 normal rules for comparing values, and replace
1521 the indices of the sorted values s with sequen‐
1522 tial integers starting with 1. If the optional
1523 destination array d is specified, first dupli‐
1524 cate s into d, and then sort d, leaving the in‐
1525 dices of the source array s unchanged. The op‐
1526 tional string how controls the direction and
1527 the comparison mode. Valid values for how are
1528 any of the strings valid for
1529 PROCINFO["sorted_in"]. It can also be the name
1530 of a user-defined comparison function as de‐
1531 scribed in PROCINFO["sorted_in"]. s and d are
1532 allowed to be the same array; this only makes
1533 sense when supplying the third argument as
1534 well.
1535
1536 asorti(s [, d [, how] ])
1537 Return the number of elements in the source ar‐
1538 ray s. The behavior is the same as that of
1539 asort(), except that the array indices are used
1540 for sorting, not the array values. When done,
1541 the array is indexed numerically, and the val‐
1542 ues are those of the original indices. The
1543 original values are lost; thus provide a second
1544 array if you wish to preserve the original.
1545 The purpose of the optional string how is the
1546 same as described previously for asort(). Here
1547 too, s and d are allowed to be the same array;
1548 this only makes sense when supplying the third
1549 argument as well.
1550
1551 gensub(r, s, h [, t]) Search the target string t for matches of the
1552 regular expression r. If h is a string begin‐
1553 ning with g or G, then replace all matches of r
1554 with s. Otherwise, h is a number indicating
1555 which match of r to replace. If t is not sup‐
1556 plied, use $0 instead. Within the replacement
1557 text s, the sequence \n, where n is a digit
1558 from 1 to 9, may be used to indicate just the
1559 text that matched the n'th parenthesized subex‐
1560 pression. The sequence \0 represents the en‐
1561 tire matched text, as does the character &.
1562 Unlike sub() and gsub(), the modified string is
1563 returned as the result of the function, and the
1564 original target string is not changed.
1565
1566 gsub(r, s [, t]) For each substring matching the regular expres‐
1567 sion r in the string t, substitute the string
1568 s, and return the number of substitutions. If
1569 t is not supplied, use $0. An & in the re‐
1570 placement text is replaced with the text that
1571 was actually matched. Use \& to get a literal
1572 &. (This must be typed as "\\&"; see GAWK: Ef‐
1573 fective AWK Programming for a fuller discussion
1574 of the rules for ampersands and backslashes in
1575 the replacement text of sub(), gsub(), and gen‐
1576 sub().)
1577
1578 index(s, t) Return the index of the string t in the string
1579 s, or zero if t is not present. (This implies
1580 that character indices start at one.) It is a
1581 fatal error to use a regexp constant for t.
1582
1583 length([s]) Return the length of the string s, or the
1584 length of $0 if s is not supplied. As a non-
1585 standard extension, with an array argument,
1586 length() returns the number of elements in the
1587 array.
1588
1589 match(s, r [, a]) Return the position in s where the regular ex‐
1590 pression r occurs, or zero if r is not present,
1591 and set the values of RSTART and RLENGTH. Note
1592 that the argument order is the same as for the
1593 ~ operator: str ~ re. If array a is provided,
1594 a is cleared and then elements 1 through n are
1595 filled with the portions of s that match the
1596 corresponding parenthesized subexpression in r.
The zeroth element of a contains the portion of s matched by the
entire regular expression r. The subscripts a[n, "start"] and
a[n, "length"] provide the starting index in the string and the
length, respectively, of each matching substring.
1603
1604 patsplit(s, a [, r [, seps] ])
1605 Split the string s into the array a and the
1606 separators array seps on the regular expression
1607 r, and return the number of fields. Element
1608 values are the portions of s that matched r.
1609 The value of seps[i] is the possibly null sepa‐
1610 rator that appeared after a[i]. The value of
1611 seps[0] is the possibly null leading separator.
1612 If r is omitted, FPAT is used instead. The ar‐
1613 rays a and seps are cleared first. Splitting
1614 behaves identically to field splitting with
1615 FPAT, described above.
1616
1617 split(s, a [, r [, seps] ])
1618 Split the string s into the array a and the
1619 separators array seps on the regular expression
1620 r, and return the number of fields. If r is
1621 omitted, FS is used instead. The arrays a and
1622 seps are cleared first. seps[i] is the field
1623 separator matched by r between a[i] and a[i+1].
1624 If r is a single space, then leading whitespace
1625 in s goes into the extra array element seps[0]
1626 and trailing whitespace goes into the extra ar‐
1627 ray element seps[n], where n is the return
1628 value of split(s, a, r, seps). Splitting be‐
1629 haves identically to field splitting, described
1630 above. In particular, if r is a single-charac‐
1631 ter string, that string acts as the separator,
1632 even if it happens to be a regular expression
1633 metacharacter.
1634
1635 sprintf(fmt, expr-list) Print expr-list according to fmt, and return
1636 the resulting string.
1637
1638 strtonum(str) Examine str, and return its numeric value. If
1639 str begins with a leading 0, treat it as an oc‐
1640 tal number. If str begins with a leading 0x or
1641 0X, treat it as a hexadecimal number. Other‐
1642 wise, assume it is a decimal number.
1643
1644 sub(r, s [, t]) Just like gsub(), but replace only the first
1645 matching substring. Return either zero or one.
1646
substr(s, i [, n]) Return the substring of s that starts at character
position i and is at most n characters long. If n is omitted, use the
rest of s.
1650
1651 tolower(str) Return a copy of the string str, with all the
1652 uppercase characters in str translated to their
1653 corresponding lowercase counterparts. Non-al‐
1654 phabetic characters are left unchanged.
1655
1656 toupper(str) Return a copy of the string str, with all the
1657 lowercase characters in str translated to their
1658 corresponding uppercase counterparts. Non-al‐
1659 phabetic characters are left unchanged.
1660
1661 Gawk is multibyte aware. This means that index(), length(), substr()
1662 and match() all work in terms of characters, not bytes.
1663
1664 Time Functions
1665 Since one of the primary uses of AWK programs is processing log files
1666 that contain time stamp information, gawk provides the following func‐
1667 tions for obtaining time stamps and formatting them.
1668
1669 mktime(datespec [, utc-flag])
1670 Turn datespec into a time stamp of the same form as returned
1671 by systime(), and return the result. The datespec is a
1672 string of the form YYYY MM DD HH MM SS[ DST]. The contents
1673 of the string are six or seven numbers representing respec‐
1674 tively the full year including century, the month from 1 to
1675 12, the day of the month from 1 to 31, the hour of the day
1676 from 0 to 23, the minute from 0 to 59, the second from 0 to
1677 60, and an optional daylight saving flag. The values of
1678 these numbers need not be within the ranges specified; for
1679 example, an hour of -1 means 1 hour before midnight. The
1680 origin-zero Gregorian calendar is assumed, with year 0 pre‐
1681 ceding year 1 and year -1 preceding year 0. If utc-flag is
1682 present and is non-zero or non-null, the time is assumed to
1683 be in the UTC time zone; otherwise, the time is assumed to be
1684 in the local time zone. If the DST daylight saving flag is
1685 positive, the time is assumed to be daylight saving time; if
1686 zero, the time is assumed to be standard time; and if nega‐
1687 tive (the default), mktime() attempts to determine whether
1688 daylight saving time is in effect for the specified time. If
1689 datespec does not contain enough elements or if the resulting
1690 time is out of range, mktime() returns -1.
1691
1692 strftime([format [, timestamp[, utc-flag]]])
1693 Format timestamp according to the specification in format.
1694 If utc-flag is present and is non-zero or non-null, the re‐
1695 sult is in UTC, otherwise the result is in local time. The
1696 timestamp should be of the same form as returned by sys‐
1697 time(). If timestamp is missing, the current time of day is
1698 used. If format is missing, a default format equivalent to
1699 the output of date(1) is used. The default format is avail‐
1700 able in PROCINFO["strftime"]. See the specification for the
1701 strftime() function in ISO C for the format conversions that
1702 are guaranteed to be available.
1703
1704 systime() Return the current time of day as the number of seconds since
1705 the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
1706
Bit Manipulation Functions
1708 Gawk supplies the following bit manipulation functions. They work by
1709 converting double-precision floating point values to uintmax_t inte‐
1710 gers, doing the operation, and then converting the result back to
1711 floating point.
1712
1713 NOTE: Passing negative operands to any of these functions causes a fa‐
1714 tal error.
1715
1716 The functions are:
1717
1718 and(v1, v2 [, ...]) Return the bitwise AND of the values provided in
1719 the argument list. There must be at least two.
1720
1721 compl(val) Return the bitwise complement of val.
1722
1723 lshift(val, count) Return the value of val, shifted left by count
1724 bits.
1725
1726 or(v1, v2 [, ...]) Return the bitwise OR of the values provided in the
1727 argument list. There must be at least two.
1728
1729 rshift(val, count) Return the value of val, shifted right by count
1730 bits.
1731
1732 xor(v1, v2 [, ...]) Return the bitwise XOR of the values provided in
1733 the argument list. There must be at least two.
1734
1735 Type Functions
1736 The following functions provide type related information about their
1737 arguments.
1738
1739 isarray(x) Return true if x is an array, false otherwise. This func‐
1740 tion is mainly for use with the elements of multidimensional
1741 arrays and with function parameters.
1742
typeof(x) Return a string indicating the type of x. The string will
be one of "array", "number", "regexp", "string", "strnum",
"unassigned", or "untyped".
1746
1747 Internationalization Functions
1748 The following functions may be used from within your AWK program for
1749 translating strings at run-time. For full details, see GAWK: Effective
1750 AWK Programming.
1751
bindtextdomain(directory [, domain])
Specify the directory where gawk looks for the .gmo files, in case
they will not or cannot be placed in the “standard” locations (e.g.,
during testing). It returns the directory where domain is “bound.”
The default domain is the value of TEXTDOMAIN. If directory is the
null string (""), then bindtextdomain() returns the current binding
for the given domain.
1760
1761 dcgettext(string [, domain [, category]])
1762 Return the translation of string in text domain domain for lo‐
1763 cale category category. The default value for domain is the
1764 current value of TEXTDOMAIN. The default value for category is
1765 "LC_MESSAGES".
1766 If you supply a value for category, it must be a string equal to
1767 one of the known locale categories described in GAWK: Effective
1768 AWK Programming. You must also supply a text domain. Use
1769 TEXTDOMAIN if you want to use the current domain.
1770
1771 dcngettext(string1, string2, number [, domain [, category]])
1772 Return the plural form used for number of the translation of
1773 string1 and string2 in text domain domain for locale category
1774 category. The default value for domain is the current value of
1775 TEXTDOMAIN. The default value for category is "LC_MESSAGES".
1776 If you supply a value for category, it must be a string equal to
1777 one of the known locale categories described in GAWK: Effective
1778 AWK Programming. You must also supply a text domain. Use
1779 TEXTDOMAIN if you want to use the current domain.
1780
USER-DEFINED FUNCTIONS
1782 Functions in AWK are defined as follows:
1783
1784 function name(parameter list) { statements }
1785
1786 Functions execute when they are called from within expressions in ei‐
1787 ther patterns or actions. Actual parameters supplied in the function
1788 call are used to instantiate the formal parameters declared in the
1789 function. Arrays are passed by reference, other variables are passed
1790 by value.
1791
1792 Since functions were not originally part of the AWK language, the pro‐
1793 vision for local variables is rather clumsy: They are declared as extra
1794 parameters in the parameter list. The convention is to separate local
1795 variables from real parameters by extra spaces in the parameter list.
1796 For example:
1797
1798 function f(p, q, a, b) # a and b are local
1799 {
1800 ...
1801 }
1802
1803 /abc/ { ... ; f(1, 2) ; ... }
1804
1805 The left parenthesis in a function call is required to immediately fol‐
1806 low the function name, without any intervening whitespace. This avoids
1807 a syntactic ambiguity with the concatenation operator. This restric‐
1808 tion does not apply to the built-in functions listed above.
1809
1810 Functions may call each other and may be recursive. Function parame‐
1811 ters used as local variables are initialized to the null string and the
1812 number zero upon function invocation.
1813
1814 Use return expr to return a value from a function. The return value is
1815 undefined if no value is provided, or if the function returns by “fall‐
1816 ing off” the end.
1817
1818 As a gawk extension, functions may be called indirectly. To do this,
1819 assign the name of the function to be called, as a string, to a vari‐
1820 able. Then use the variable as if it were the name of a function, pre‐
1821 fixed with an @ sign, like so:
1822 function myfunc()
1823 {
1824 print "myfunc called"
1825 ...
1826 }
1827
1828 { ...
1829 the_func = "myfunc"
1830 @the_func() # call through the_func to myfunc
1831 ...
1832 }
1833 As of version 4.1.2, this works with user-defined functions, built-in
1834 functions, and extension functions.
1835
1836 If --lint has been provided, gawk warns about calls to undefined func‐
1837 tions at parse time, instead of at run time. Calling an undefined
1838 function at run time is a fatal error.
1839
1840 The word func may be used in place of function, although this is depre‐
1841 cated.
1842
DYNAMICALLY LOADING NEW FUNCTIONS
1844 You can dynamically add new functions written in C or C++ to the run‐
1845 ning gawk interpreter with the @load statement. The full details are
1846 beyond the scope of this manual page; see GAWK: Effective AWK Program‐
1847 ming.
1848
SIGNALS
1850 The gawk profiler accepts two signals. SIGUSR1 causes it to dump a
1851 profile and function call stack to the profile file, which is either
1852 awkprof.out, or whatever file was named with the --profile option. It
1853 then continues to run. SIGHUP causes gawk to dump the profile and
1854 function call stack and then exit.
1855
INTERNATIONALIZATION
1857 String constants are sequences of characters enclosed in double quotes.
1858 In non-English speaking environments, it is possible to mark strings in
1859 the AWK program as requiring translation to the local natural language.
1860 Such strings are marked in the AWK program with a leading underscore
1861 (“_”). For example,
1862
1863 gawk 'BEGIN { print "hello, world" }'
1864
1865 always prints hello, world. But,
1866
1867 gawk 'BEGIN { print _"hello, world" }'
1868
1869 might print bonjour, monde in France.
1870
1871 There are several steps involved in producing and running a localizable
1872 AWK program.
1873
1874 1. Add a BEGIN action to assign a value to the TEXTDOMAIN variable to
1875 set the text domain to a name associated with your program:
1876
1877 BEGIN { TEXTDOMAIN = "myprog" }
1878
1879 This allows gawk to find the .gmo file associated with your pro‐
1880 gram. Without this step, gawk uses the messages text domain, which
1881 likely does not contain translations for your program.
1882
1883 2. Mark all strings that should be translated with leading under‐
1884 scores.
1885
1886 3. If necessary, use the dcgettext() and/or bindtextdomain() functions
1887 in your program, as appropriate.
1888
1889 4. Run gawk --gen-pot -f myprog.awk > myprog.pot to generate a .pot
1890 file for your program.
1891
1892 5. Provide appropriate translations, and build and install the corre‐
1893 sponding .gmo files.
1894
1895 The internationalization features are described in full detail in GAWK:
1896 Effective AWK Programming.
1897
POSIX COMPATIBILITY
A primary goal for gawk is compatibility with the POSIX standard, as
well as with the latest version of Brian Kernighan's awk. To this end,
gawk incorporates the following user-visible features which are not
described in the AWK book, but are part of Brian Kernighan's version
of awk, and are in the POSIX standard.
1904
1905 The book indicates that command line variable assignment happens when
1906 awk would otherwise open the argument as a file, which is after the BE‐
1907 GIN rule is executed. However, in earlier implementations, when such
1908 an assignment appeared before any file names, the assignment would hap‐
1909 pen before the BEGIN rule was run. Applications came to depend on this
1910 “feature.” When awk was changed to match its documentation, the -v op‐
1911 tion for assigning variables before program execution was added to ac‐
1912 commodate applications that depended upon the old behavior. (This fea‐
1913 ture was agreed upon by both the Bell Laboratories developers and the
1914 GNU developers.)
1915
1916 When processing arguments, gawk uses the special option “--” to signal
1917 the end of arguments. In compatibility mode, it warns about but other‐
1918 wise ignores undefined options. In normal operation, such arguments
1919 are passed on to the AWK program for it to process.
1920
1921 The AWK book does not define the return value of srand(). The POSIX
1922 standard has it return the seed it was using, to allow keeping track of
1923 random number sequences. Therefore srand() in gawk also returns its
1924 current seed.
1925
Other features are: the use of multiple -f options (from MKS awk); the
ENVIRON array; the \a and \v escape sequences (done originally in gawk
and fed back into the Bell Laboratories version); the tolower() and
toupper() built-in functions (from the Bell Laboratories version); and
the ISO C conversion specifications in printf (done first in the Bell
Laboratories version).
1932
HISTORICAL FEATURES
1934 There is one feature of historical AWK implementations that gawk sup‐
1935 ports: It is possible to call the length() built-in function not only
1936 with no argument, but even without parentheses! Thus,
1937
1938 a = length # Holy Algol 60, Batman!
1939
1940 is the same as either of
1941
1942 a = length()
1943 a = length($0)
1944
1945 Using this feature is poor practice, and gawk issues a warning about
1946 its use if --lint is specified on the command line.
1947
GNU EXTENSIONS
1949 Gawk has a too-large number of extensions to POSIX awk. They are de‐
1950 scribed in this section. All the extensions described here can be dis‐
1951 abled by invoking gawk with the --traditional or --posix options.
1952
The following features of gawk are not available in POSIX awk.

• The path search for files named via the -f option, controlled by the
AWKPATH environment variable.

• The @include mechanism for file inclusion.

• The @load mechanism for dynamically adding new functions written in
C.

• The \x escape sequence.

• The ability to continue lines after ? and :.

• Octal and hexadecimal constants in AWK programs.

• The special variables ARGIND, BINMODE, ERRNO, LINT, PREC, ROUNDMODE,
RT and TEXTDOMAIN.

• The IGNORECASE variable and its side effects.

• The FIELDWIDTHS variable and fixed-width field splitting.

• The FPAT variable and field splitting based on field values.

• The FUNCTAB, SYMTAB, and PROCINFO arrays.

• The use of RS as a regular expression.

• The special file names available for I/O redirection.

• The |& operator for creating coprocesses.

• The BEGINFILE and ENDFILE special patterns.
1989
1990 • The ability to split out individual characters using the null string
1991 as the value of FS, and as the third argument to split().
1992
1993 • An optional fourth argument to split() to receive the separator
1994 texts.
1995
1996 • The optional second argument to the close() function.
1997
1998 • The optional third argument to the match() function.
1999
2000 • The ability to use positional specifiers with printf and sprintf().
2001
2002 • The ability to pass an array to length().
2003
2004 • The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
2005 dcngettext(), gensub(), lshift(), mktime(), or(), patsplit(),
2006 rshift(), strftime(), strtonum(), systime() and xor() functions.
2007
2008 • Localizable strings.
2009
2010 • Non-fatal I/O.
2011
2012 • Retryable I/O.
2013
2014 The AWK book does not define the return value of the close() function.
2015 Gawk's close() returns the value from fclose(3), or pclose(3), when
2016 closing an output file or pipe, respectively. It returns the process's
2017 exit status when closing an input pipe. The return value is -1 if the
2018 named file, pipe or coprocess was not opened with a redirection.
2019
2020 When gawk is invoked with the --traditional option, if the fs argument
2021 to the -F option is “t”, then FS is set to the tab character. Note
2022 that typing gawk -F\t ... simply causes the shell to quote the “t,”
2023 and does not pass “\t” to the -F option. Since this is a rather ugly
2024 special case, it is not the default behavior. This behavior also does
2025 not occur if --posix has been specified. To really get a tab character
2026 as the field separator, it is best to use single quotes: gawk -F'\t'
2027 ....
2028
ENVIRONMENT VARIABLES
2030 The AWKPATH environment variable can be used to provide a list of di‐
2031 rectories that gawk searches when looking for files named via the -f,
2032 --file, -i and --include options, and the @include directive. If the
2033 initial search fails, the path is searched again after appending .awk
2034 to the filename.
2035
2036 The AWKLIBPATH environment variable can be used to provide a list of
2037 directories that gawk searches when looking for files named via the -l
2038 and --load options.
2039
2040 The GAWK_READ_TIMEOUT environment variable can be used to specify a
2041 timeout in milliseconds for reading input from a terminal, pipe or two-
2042 way communication including sockets.
2043
2044 For connection to a remote host via socket, GAWK_SOCK_RETRIES controls
2045 the number of retries, and GAWK_MSEC_SLEEP the interval between re‐
2046 tries. The interval is in milliseconds. On systems that do not support
2047 usleep(3), the value is rounded up to an integral number of seconds.
2048
2049 If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
2050 as if --posix had been specified on the command line. If --lint has
2051 been specified, gawk issues a warning message to this effect.
2052
EXIT STATUS
2054 If the exit statement is used with a value, then gawk exits with the
2055 numeric value given to it.
2056
2057 Otherwise, if there were no problems during execution, gawk exits with
2058 the value of the C constant EXIT_SUCCESS. This is usually zero.
2059
2060 If an error occurs, gawk exits with the value of the C constant
2061 EXIT_FAILURE. This is usually one.
2062
2063 If gawk exits because of a fatal error, the exit status is 2. On non-
2064 POSIX systems, this value may be mapped to EXIT_FAILURE.
2065
VERSION INFORMATION
2067 This man page documents gawk, version 5.1.
2068
AUTHORS
2070 The original version of UNIX awk was designed and implemented by Alfred
2071 Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian
2072 Kernighan continues to maintain and enhance it.
2073
2074 Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote
2075 gawk, to be compatible with the original version of awk distributed in
2076 Seventh Edition UNIX. John Woods contributed a number of bug fixes.
2077 David Trueman, with contributions from Arnold Robbins, made gawk com‐
2078 patible with the new version of UNIX awk. Arnold Robbins is the cur‐
2079 rent maintainer.
2080
2081 See GAWK: Effective AWK Programming for a full list of the contributors
2082 to gawk and its documentation.
2083
2084 See the README file in the gawk distribution for up-to-date information
2085 about maintainers and which ports are currently supported.
2086
BUG REPORTS
2088 If you find a bug in gawk, please send electronic mail to
2089 bug-gawk@gnu.org. Please include your operating system and its revi‐
2090 sion, the version of gawk (from gawk --version), which C compiler you
2091 used to compile it, and a test program and data that are as small as
2092 possible for reproducing the problem.
2093
Before sending a bug report, please do the following things. First,
verify that you have the latest version of gawk. Many bugs (usually
subtle ones) are fixed at each release, and if yours is out of date,
the problem may already have been solved. Second, please see if
setting the environment variable LC_ALL to C (LC_ALL=C) causes things
to behave as you expect. If so, it's a locale issue, and may or may
not really be a bug. Finally, please read this man page and the
reference manual carefully to be sure that what you think is a bug
really is, instead of just a quirk in the language.
2103
Whatever you do, do NOT post a bug report in comp.lang.awk. While the
gawk developers occasionally read this newsgroup, posting bug reports
there is an unreliable way to report bugs. Similarly, do NOT use a web
forum (such as Stack Overflow) for reporting bugs. Instead, please use
the electronic mail address given above. Really.
2109
2110 If you're using a GNU/Linux or BSD-based system, you may wish to submit
2111 a bug report to the vendor of your distribution. That's fine, but
2112 please send a copy to the official email address as well, since there's
2113 no guarantee that the bug report will be forwarded to the gawk main‐
2114 tainer.
2115
BUGS
2117 The -F option is not necessary given the command line variable assign‐
2118 ment feature; it remains only for backwards compatibility.
2119
SEE ALSO
2121 egrep(1), sed(1), getpid(2), getppid(2), getpgrp(2), getuid(2), ge‐
2122 teuid(2), getgid(2), getegid(2), getgroups(2), printf(3), strftime(3),
2123 usleep(3)
2124
2125 The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter
2126 J. Weinberger, Addison-Wesley, 1988. ISBN 0-201-07981-X.
2127
2128 GAWK: Effective AWK Programming, Edition 5.1, shipped with the gawk
2129 source. The current version of this document is available online at
2130 https://www.gnu.org/software/gawk/manual.
2131
2132 The GNU gettext documentation, available online at
2133 https://www.gnu.org/software/gettext.
2134
EXAMPLES
2136 Print and sort the login names of all users:
2137
2138 BEGIN { FS = ":" }
2139 { print $1 | "sort" }
2140
2141 Count lines in a file:
2142
2143 { nlines++ }
2144 END { print nlines }
2145
2146 Precede each line by its number in the file:
2147
2148 { print FNR, $0 }
2149
2150 Concatenate and line number (a variation on a theme):
2151
2152 { print NR, $0 }
2153
2154 Run an external command for particular lines of data:
2155
2156 tail -f access_log |
2157 awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
2158
ACKNOWLEDGEMENTS
2160 Brian Kernighan provided valuable assistance during testing and debug‐
2161 ging. We thank him.
2162
COPYING PERMISSIONS
2164 Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2165 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013, 2014,
2166 2015, 2016, 2017, 2018, 2019, 2020, 2021, Free Software Foundation,
2167 Inc.
2168
2169 Permission is granted to make and distribute verbatim copies of this
2170 manual page provided the copyright notice and this permission notice
2171 are preserved on all copies.
2172
2173 Permission is granted to copy and distribute modified versions of this
2174 manual page under the conditions for verbatim copying, provided that
2175 the entire resulting derived work is distributed under the terms of a
2176 permission notice identical to this one.
2177
2178 Permission is granted to copy and distribute translations of this man‐
2179 ual page into another language, under the above conditions for modified
2180 versions, except that this permission notice may be stated in a trans‐
2181 lation approved by the Foundation.
2182
2183
2184
Free Software Foundation Jul 05 2021 GAWK(1)