GAWK(1)                        Utility Commands                        GAWK(1)

NAME

       gawk - pattern scanning and processing language

SYNOPSIS

       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

DESCRIPTION

       Gawk is the GNU Project's implementation of the AWK programming
       language.  It conforms to the definition of the language in the POSIX
       1003.1 standard.  This version in turn is based on the description in
       The AWK Programming Language, by Aho, Kernighan, and Weinberger.  Gawk
       provides the additional features found in the current version of Brian
       Kernighan's awk and numerous GNU-specific extensions.

       The command line consists of options to gawk itself, the AWK program
       text (if not supplied via the -f or --include options), and values to
       be made available in the ARGC and ARGV pre-defined AWK variables.

       When gawk is invoked with the --profile option, it starts gathering
       profiling statistics from the execution of the program.  Gawk runs more
       slowly in this mode, and automatically produces an execution profile in
       the file awkprof.out when done.  See the --profile option, below.

       Gawk also has an integrated debugger.  An interactive debugging session
       can be started by supplying the --debug option to the command line.  In
       this mode of execution, gawk loads the AWK source code and then prompts
       for debugging commands.  Gawk can only debug AWK program source
       provided with the -f and --include options.  The debugger is documented
       in GAWK: Effective AWK Programming.

OPTION FORMAT

       Gawk options may be either traditional POSIX-style one letter options,
       or GNU-style long options.  POSIX options start with a single “-”,
       while long options start with “--”.  Long options are provided for both
       GNU-specific features and for POSIX-mandated features.

       Gawk-specific options are typically used in long-option form.
       Arguments to long options are either joined with the option by an =
       sign, with no intervening spaces, or they may be provided in the next
       command line argument.  Long options may be abbreviated, as long as the
       abbreviation remains unique.

       Additionally, every long option has a corresponding short option, so
       that the option's functionality may be used from within #! executable
       scripts.

OPTIONS

       Gawk accepts the following options.  Standard options are listed first,
       followed by options for gawk extensions, listed alphabetically by short
       option.

       -f program-file
       --file program-file
              Read the AWK program source from the file program-file, instead
              of from the first command line argument.  Multiple -f (or
              --file) options may be used.  Files read with -f are treated as
              if they begin with an implicit @namespace "awk" statement.

       -F fs
       --field-separator fs
              Use fs for the input field separator (the value of the FS
              predefined variable).

       -v var=val
       --assign var=val
              Assign the value val to the variable var, before execution of
              the program begins.  Such variable values are available to the
              BEGIN rule of an AWK program.

       -b
       --characters-as-bytes
              Treat all input data as single-byte characters.  In other words,
              don't pay any attention to the locale information when
              attempting to process strings as multibyte characters.  The
              --posix option overrides this one.

       -c
       --traditional
              Run in compatibility mode.  In compatibility mode, gawk behaves
              identically to Brian Kernighan's awk; none of the GNU-specific
              extensions are recognized.  See GNU EXTENSIONS, below, for more
              information.

       -C
       --copyright
              Print the short version of the GNU copyright information message
              on the standard output and exit successfully.

       -d[file]
       --dump-variables[=file]
              Print a sorted list of global variables, their types and final
              values to file.  If no file is provided, gawk uses a file named
              awkvars.out in the current directory.
              Having a list of all the global variables is a good way to look
              for typographical errors in your programs.  You would also use
              this option if you have a large program with a lot of functions,
              and you want to be sure that your functions don't inadvertently
              use global variables that you meant to be local.  (This is a
              particularly easy mistake to make with simple variable names
              like i, j, and so on.)

       -D[file]
       --debug[=file]
              Enable debugging of AWK programs.  By default, the debugger
              reads commands interactively from the keyboard (standard input).
              The optional file argument specifies a file with a list of
              commands for the debugger to execute non-interactively.

       -e program-text
       --source program-text
              Use program-text as AWK program source code.  This option allows
              the easy intermixing of library functions (used via the -f and
              --include options) with source code entered on the command line.
              It is intended primarily for medium to large AWK programs used
              in shell scripts.  Each argument supplied via -e is treated as
              if it begins with an implicit @namespace "awk" statement.

       -E file
       --exec file
              Similar to -f; however, this option is the last one processed.
              This should be used with #! scripts, particularly for CGI
              applications, to avoid passing in options or source code (!)
              on the command line from a URL.  This option disables
              command-line variable assignments.

       -g
       --gen-pot
              Scan and parse the AWK program, and generate a GNU .pot
              (Portable Object Template) format file on standard output with
              entries for all localizable strings in the program.  The program
              itself is not executed.  See the GNU gettext distribution for
              more information on .pot files.

       -h
       --help Print a relatively short summary of the available options on the
              standard output.  (Per the GNU Coding Standards, these options
              cause an immediate, successful exit.)

       -i include-file
       --include include-file
              Load an awk source library.  This searches for the library using
              the AWKPATH environment variable.  If the initial search fails,
              another attempt will be made after appending the .awk suffix.
              The file will be loaded only once (i.e., duplicates are
              eliminated), and the code does not constitute the main program
              source.  Files read with --include are treated as if they begin
              with an implicit @namespace "awk" statement.

       -I
       --trace
              Print the internal byte code names as they are executed when
              running the program.  The trace is printed to standard error.
              Each “op code” is preceded by a + sign in the output.

       -l lib
       --load lib
              Load a gawk extension from the shared library lib.  This
              searches for the library using the AWKLIBPATH environment
              variable.  If the initial search fails, another attempt will be
              made after appending the default shared library suffix for the
              platform.  The library initialization routine is expected to be
              named dl_load().

       -L [value]
       --lint[=value]
              Provide warnings about constructs that are dubious or
              non-portable to other AWK implementations.  With an optional
              argument of fatal, lint warnings become fatal errors.  This may
              be drastic, but its use will certainly encourage the development
              of cleaner AWK programs.  With an optional argument of invalid,
              only warnings about things that are actually invalid are issued.
              (This is not fully implemented yet.)  With an optional argument
              of no-ext, warnings about gawk extensions are disabled.

       -M
       --bignum
              Force arbitrary precision arithmetic on numbers.  This option
              has no effect if gawk is not compiled to use the GNU MPFR and
              GMP libraries.  (In such a case, gawk issues a warning.)

       -n
       --non-decimal-data
              Recognize octal and hexadecimal values in input data.  Use this
              option with great caution!

       -N
       --use-lc-numeric
              Force gawk to use the locale's decimal point character when
              parsing input data.  Although the POSIX standard requires this
              behavior, and gawk does so when --posix is in effect, the
              default is to follow traditional behavior and use a period as
              the decimal point, even in locales where the period is not the
              decimal point character.  This option overrides the default
              behavior, without the full draconian strictness of the --posix
              option.

       -o[file]
       --pretty-print[=file]
              Output a pretty printed version of the program to file.  If no
              file is provided, gawk uses a file named awkprof.out in the
              current directory.  This option implies --no-optimize.

       -O
       --optimize
              Enable gawk's default optimizations upon the internal
              representation of the program.  Currently, this just includes
              simple constant folding.  This option is on by default.

       -p[prof-file]
       --profile[=prof-file]
              Start a profiling session, and send the profiling data to
              prof-file.  The default is awkprof.out.  The profile contains
              execution counts of each statement in the program in the left
              margin and function call counts for each user-defined function.
              This option implies --no-optimize.

       -P
       --posix
              This turns on compatibility mode, with the following additional
              restrictions:

              • \x escape sequences are not recognized.

              • You cannot continue lines after ? and :.

              • The synonym func for the keyword function is not recognized.

              • The operators ** and **= cannot be used in place of ^ and ^=.

       -r
       --re-interval
              Enable the use of interval expressions in regular expression
              matching (see Regular Expressions, below).  Interval expressions
              were not traditionally available in the AWK language.  The POSIX
              standard added them, to make awk and egrep consistent with each
              other.  They are enabled by default, but this option remains for
              use together with --traditional.

       -s
       --no-optimize
              Disable gawk's default optimizations upon the internal
              representation of the program.

       -S
       --sandbox
              Run gawk in sandbox mode, disabling the system() function, input
              redirection with getline, output redirection with print and
              printf, and loading dynamic extensions.  Command execution
              (through pipelines) is also disabled.  This effectively blocks a
              script from accessing local resources, except for the files
              specified on the command line.

       -t
       --lint-old
              Provide warnings about constructs that are not portable to the
              original version of UNIX awk.

       -V
       --version
              Print version information for this particular copy of gawk on
              the standard output.  This is useful mainly for knowing if the
              current copy of gawk on your system is up to date with respect
              to whatever the Free Software Foundation is distributing.  This
              is also useful when reporting bugs.  (Per the GNU Coding
              Standards, these options cause an immediate, successful exit.)

       --     Signal the end of options.  This is useful to allow further
              arguments to the AWK program itself to start with a “-”.  This
              provides consistency with the argument parsing convention used
              by most other POSIX programs.

       In compatibility mode, any other options are flagged as invalid, but
       are otherwise ignored.  In normal operation, as long as program text
       has been supplied, unknown options are passed on to the AWK program in
       the ARGV array for processing.  This is particularly useful for running
       AWK programs via the #! executable interpreter mechanism.

       For POSIX compatibility, the -W option may be used, followed by the
       name of a long option.

AWK PROGRAM EXECUTION

       An AWK program consists of a sequence of optional directives,
       pattern-action statements, and optional function definitions.

              @include "filename"
              @load "filename"
              @namespace "name"
              pattern   { action statements }
              function name(parameter list) { statements }

       Gawk first reads the program source from the program-file(s) if
       specified, from arguments to --source, or from the first non-option
       argument on the command line.  The -f and --source options may be used
       multiple times on the command line.  Gawk reads the program text as if
       all the program-files and command line source texts had been
       concatenated together.  This is useful for building libraries of AWK
       functions, without having to include them in each new AWK program that
       uses them.  It also provides the ability to mix library functions with
       command line programs.

       In addition, lines beginning with @include may be used to include other
       source files into your program, making library use even easier.  This
       is equivalent to using the --include option.

       Lines beginning with @load may be used to load extension functions into
       your program.  This is equivalent to using the --load option.

       The environment variable AWKPATH specifies a search path to use when
       finding source files named with the -f and --include options.  If this
       variable does not exist, the default path is ".:/usr/local/share/awk".
       (The actual directory may vary, depending upon how gawk was built and
       installed.)  If a file name given to the -f option contains a “/”
       character, no path search is performed.

       The environment variable AWKLIBPATH specifies a search path to use when
       finding extension libraries named with the --load option.  If this
       variable does not exist, the default path is "/usr/local/lib/gawk".
       (The actual directory may vary, depending upon how gawk was built and
       installed.)

       Gawk executes AWK programs in the following order.  First, all variable
       assignments specified via the -v option are performed.  Next, gawk
       compiles the program into an internal form.  Then, gawk executes the
       code in the BEGIN rule(s) (if any), and then proceeds to read each file
       named in the ARGV array (up to ARGV[ARGC-1]).  If there are no files
       named on the command line, gawk reads the standard input.

       If a filename on the command line has the form var=val it is treated as
       a variable assignment.  The variable var will be assigned the value
       val.  (This happens after any BEGIN rule(s) have been run.)  Command
       line variable assignment is most useful for dynamically assigning
       values to the variables AWK uses to control how input is broken into
       fields and records.  It is also useful for controlling state if
       multiple passes are needed over a single data file.

       If the value of a particular element of ARGV is empty (""), gawk skips
       over it.

       For each input file, if a BEGINFILE rule exists, gawk executes the
       associated code before processing the contents of the file.  Similarly,
       gawk executes the code associated with ENDFILE after processing the
       file.

       For each record in the input, gawk tests to see if it matches any
       pattern in the AWK program.  For each pattern that the record matches,
       gawk executes the associated action.  The patterns are tested in the
       order they occur in the program.

       Finally, after all the input is exhausted, gawk executes the code in
       the END rule(s) (if any).

   Command Line Directories
       According to POSIX, files named on the awk command line must be text
       files.  The behavior is “undefined” if they are not.  Most versions
       of awk treat a directory on the command line as a fatal error.

       Starting with version 4.0 of gawk, a directory on the command line
       produces a warning, but is otherwise skipped.  If either of the --posix
       or --traditional options is given, then gawk reverts to treating
       directories on the command line as a fatal error.

VARIABLES, RECORDS AND FIELDS

       AWK variables are dynamic; they come into existence when they are first
       used.  Their values are either floating-point numbers or strings, or
       both, depending upon how they are used.  Additionally, gawk allows
       variables to have regular-expression type.  AWK also has
       one-dimensional arrays; arrays with multiple dimensions may be
       simulated.  Gawk provides true arrays of arrays; see Arrays, below.
       Several pre-defined variables are set as a program runs; these are
       described as needed and summarized below.

   Records
       Normally, records are separated by newline characters.  You can control
       how records are separated by assigning values to the built-in variable
       RS.  If RS is any single character, that character separates records.
       Otherwise, RS is a regular expression.  Text in the input that matches
       this regular expression separates the records.  However, in
       compatibility mode, only the first character of its string value is
       used for separating records.  If RS is set to the null string, then
       records are separated by empty lines.  When RS is set to the null
       string, the newline character always acts as a field separator, in
       addition to whatever value FS may have.

   Fields
       As each input record is read, gawk splits the record into fields, using
       the value of the FS variable as the field separator.  If FS is a single
       character, fields are separated by that character.  If FS is the null
       string, then each individual character becomes a separate field.
       Otherwise, FS is expected to be a full regular expression.  In the
       special case that FS is a single space, fields are separated by runs of
       spaces and/or tabs and/or newlines.  NOTE: The value of IGNORECASE (see
       below) also affects how fields are split when FS is a regular
       expression, and how records are separated when RS is a regular
       expression.

       If the FIELDWIDTHS variable is set to a space-separated list of
       numbers, each field is expected to have fixed width, and gawk splits up
       the record using the specified widths.  Each field width may optionally
       be preceded by a colon-separated value specifying the number of
       characters to skip before the field starts.  The value of FS is
       ignored.  Assigning a new value to FS or FPAT overrides the use of
       FIELDWIDTHS.

       Similarly, if the FPAT variable is set to a string representing a
       regular expression, each field is made up of text that matches that
       regular expression.  In this case, the regular expression describes the
       fields themselves, instead of the text that separates the fields.
       Assigning a new value to FS or FIELDWIDTHS overrides the use of FPAT.

       Each field in the input record may be referenced by its position: $1,
       $2, and so on.  $0 is the whole record, including leading and trailing
       whitespace.  Fields need not be referenced by constants:

              n = 5
              print $n

       prints the fifth field in the input record.

       The variable NF is set to the total number of fields in the input
       record.

       References to non-existent fields (i.e., fields after $NF) produce the
       null string.  However, assigning to a non-existent field (e.g., $(NF+2)
       = 5) increases the value of NF, creates any intervening fields with the
       null string as their values, and causes the value of $0 to be
       recomputed, with the fields being separated by the value of OFS.
       References to negative numbered fields cause a fatal error.
       Decrementing NF causes the values of fields past the new value to be
       lost, and the value of $0 to be recomputed, with the fields being
       separated by the value of OFS.

       Assigning a value to an existing field causes the whole record to be
       rebuilt when $0 is referenced.  Similarly, assigning a value to $0
       causes the record to be resplit, creating new values for the fields.

   Built-in Variables
       Gawk's built-in variables are:

       ARGC        The number of command line arguments (does not include
                   options to gawk, or the program source).

       ARGIND      The index in ARGV of the current file being processed.

       ARGV        Array of command line arguments.  The array is indexed from
                   0 to ARGC - 1.  Dynamically changing the contents of ARGV
                   can control the files used for data.

       BINMODE     On non-POSIX systems, specifies use of “binary” mode for
                   all file I/O.  Numeric values of 1, 2, or 3, specify that
                   input files, output files, or all files, respectively,
                   should use binary I/O.  String values of "r", or "w"
                   specify that input files, or output files, respectively,
                   should use binary I/O.  String values of "rw" or "wr"
                   specify that all files should use binary I/O.  Any other
                   string value is treated as "rw", but generates a warning
                   message.

       CONVFMT     The conversion format for numbers, "%.6g", by default.

       ENVIRON     An array containing the values of the current environment.
                   The array is indexed by the environment variables, each
                   element being the value of that variable (e.g.,
                   ENVIRON["HOME"] might be "/home/arnold").

                   In POSIX mode, changing this array does not affect the
                   environment seen by programs which gawk spawns via
                   redirection or the system() function.  Otherwise, gawk
                   updates its real environment so that programs it spawns
                   see the changes.

       ERRNO       If a system error occurs either doing a redirection for
                   getline, during a read for getline, or during a close(),
                   then ERRNO is set to a string describing the error.  The
                   value is subject to translation in non-English locales.  If
                   the string in ERRNO corresponds to a system error in the
                   errno(3) variable, then the numeric value can be found in
                   PROCINFO["errno"].  For non-system errors,
                   PROCINFO["errno"] will be zero.

       FIELDWIDTHS A whitespace-separated list of field widths.  When set,
                   gawk parses the input into fields of fixed width, instead
                   of using the value of the FS variable as the field
                   separator.  Each field width may optionally be preceded by
                   a colon-separated value specifying the number of characters
                   to skip before the field starts.  See Fields, above.

       FILENAME    The name of the current input file.  If no files are
                   specified on the command line, the value of FILENAME is
                   “-”.  However, FILENAME is undefined inside the BEGIN rule
                   (unless set by getline).

       FNR         The input record number in the current input file.

       FPAT        A regular expression describing the contents of the fields
                   in a record.  When set, gawk parses the input into fields,
                   where the fields match the regular expression, instead of
                   using the value of FS as the field separator.  See Fields,
                   above.

       FS          The input field separator, a space by default.  See Fields,
                   above.

       FUNCTAB     An array whose indices and corresponding values are the
                   names of all the user-defined or extension functions in the
                   program.  NOTE: You may not use the delete statement with
                   the FUNCTAB array.

       IGNORECASE  Controls the case-sensitivity of all regular expression and
                   string operations.  If IGNORECASE has a non-zero value,
                   then string comparisons and pattern matching in rules,
                   field splitting with FS and FPAT, record separating with
                   RS, regular expression matching with ~ and !~, and the
                   gensub(), gsub(), index(), match(), patsplit(), split(),
                   and sub() built-in functions all ignore case when doing
                   regular expression operations.  NOTE: Array subscripting is
                   not affected.  However, the asort() and asorti() functions
                   are affected.
                   Thus, if IGNORECASE is not equal to zero, /aB/ matches all
                   of the strings "ab", "aB", "Ab", and "AB".  As with all AWK
                   variables, the initial value of IGNORECASE is zero, so all
                   regular expression and string operations are normally
                   case-sensitive.

       LINT        Provides dynamic control of the --lint option from within
                   an AWK program.  When true, gawk prints lint warnings.
                   When false, it does not.  The values allowed for the --lint
                   option may also be assigned to LINT, with the same effects.
                   Any other true value just prints warnings.

       NF          The number of fields in the current input record.

       NR          The total number of input records seen so far.

       OFMT        The output format for numbers, "%.6g", by default.

       OFS         The output field separator, a space by default.

       ORS         The output record separator, by default a newline.

       PREC        The working precision of arbitrary precision floating-point
                   numbers, 53 by default.

       PROCINFO    The elements of this array provide access to information
                   about the running AWK program.  On some systems, there may
                   be elements in the array, "group1" through "groupn" for
                   some n, which is the number of supplementary groups that
                   the process has.  Use the in operator to test for these
                   elements.  The following elements are guaranteed to be
                   available:

                   PROCINFO["argv"]     The command line arguments as received
                                        by gawk at the C-language level.  The
                                        subscripts start from zero.

                   PROCINFO["egid"]     The value of the getegid(2) system
                                        call.

                   PROCINFO["errno"]    The value of errno(3) when ERRNO is
                                        set to the associated error message.

                   PROCINFO["euid"]     The value of the geteuid(2) system
                                        call.

                   PROCINFO["FS"]       "FS" if field splitting with FS is in
                                        effect, "FPAT" if field splitting with
                                        FPAT is in effect, "FIELDWIDTHS" if
                                        field splitting with FIELDWIDTHS is in
                                        effect, or "API" if API input parser
                                        field splitting is in effect.

                   PROCINFO["gid"]      The value of the getgid(2) system
                                        call.

                   PROCINFO["identifiers"]
                                        A subarray, indexed by the names of
                                        all identifiers used in the text of
                                        the AWK program.  The values indicate
                                        what gawk knows about the identifiers
                                        after it has finished parsing the
                                        program; they are not updated while
                                        the program runs.  For each
                                        identifier, the value of the element
                                        is one of the following:

                                        "array"     The identifier is an
                                                    array.

                                        "builtin"   The identifier is a
                                                    built-in function.

                                        "extension" The identifier is an
                                                    extension function loaded
                                                    via @load or --load.

                                        "scalar"    The identifier is a
                                                    scalar.

                                        "untyped"   The identifier is untyped
                                                    (could be used as a scalar
                                                    or array, gawk doesn't
                                                    know yet).

                                        "user"      The identifier is a
                                                    user-defined function.

                   PROCINFO["pgrpid"]   The value of the getpgrp(2) system
                                        call.

                   PROCINFO["pid"]      The value of the getpid(2) system
                                        call.

                   PROCINFO["platform"] A string indicating the platform for
                                        which gawk was compiled.  It is one
                                        of:

                                        "djgpp", "mingw"
618                                               Microsoft Windows, using either
619                                               DJGPP, or MinGW, respectively.
620
621                                        "os2"  OS/2.
622
623                                        "posix"
624                                               GNU/Linux,  Cygwin,  Mac  OS X,
625                                               and legacy Unix systems.
626
627                                        "vms"  OpenVMS or Vax/VMS.
628
629                   PROCINFO["ppid"]     The value  of  the  getppid(2)  system
630                                        call.
631
632                   PROCINFO["strftime"] The  default  time  format  string for
633                                        strftime().  Changing  its  value  af‐
634                                        fects how strftime() formats time val‐
635                                        ues when called with no arguments.
636
637                   PROCINFO["uid"]      The  value  of  the  getuid(2)  system
638                                        call.
639
640                   PROCINFO["version"]  The version of gawk.
641
642                   The  following  elements are present if loading dynamic ex‐
643                   tensions is available:
644
645                   PROCINFO["api_major"]
646                          The major version of the extension API.
647
648                   PROCINFO["api_minor"]
649                          The minor version of the extension API.
650
651                   The following elements are available  if  MPFR  support  is
652                   compiled into gawk:
653
654                   PROCINFO["gmp_version"]
655                          The  version  of  the GNU GMP library used for arbi‐
656                          trary precision number support in gawk.
657
658                   PROCINFO["mpfr_version"]
659                          The version of the GNU MPFR library used  for  arbi‐
660                          trary precision number support in gawk.
661
662                   PROCINFO["prec_max"]
663                          The  maximum precision supported by the GNU MPFR li‐
664                          brary for arbitrary  precision  floating-point  num‐
665                          bers.
666
667                   PROCINFO["prec_min"]
668                          The  minimum  precision  allowed by the GNU MPFR li‐
669                          brary for arbitrary  precision  floating-point  num‐
670                          bers.
671
672                   The  following  elements  may be set by a program to change
673                   gawk's behavior:
674
675                   PROCINFO["NONFATAL"]
676                          If this exists, then I/O errors for all redirections
677                          become nonfatal.
678
679                   PROCINFO["name", "NONFATAL"]
680                          Make I/O errors for name be nonfatal.
681
682                   PROCINFO["command", "pty"]
683                          Use a pseudo-tty for two-way communication with com‐
684                          mand instead of setting up two one-way pipes.
685
686                   PROCINFO["input", "READ_TIMEOUT"]
687                          The timeout in milliseconds for  reading  data  from
688                          input,  where  input  is  a  redirection string or a
689                          filename. A value of zero or less than zero means no
690                          timeout.
691
692                   PROCINFO["input", "RETRY"]
693                          If  an  I/O  error  that  may be retried occurs when
694                          reading data from input, and this  array  entry  ex‐
695                          ists,  then  getline returns -2 instead of following
696                          the default behavior of returning -1 and configuring
697                          input  to return no further data.  An I/O error that
698                          may be retried is one where errno(3) has  the  value
699                          EAGAIN,  EWOULDBLOCK, EINTR, or ETIMEDOUT.  This may
700                          be  useful  in  conjunction  with  PROCINFO["input",
701                          "READ_TIMEOUT"]  or  in  situations where a file de‐
702                          scriptor has been configured to  behave  in  a  non-
703                          blocking fashion.
704
705                   PROCINFO["sorted_in"]
706                          If  this  element exists in PROCINFO, then its value
707                          controls the order in which array elements are  tra‐
708                          versed   in   for   loops.    Supported  values  are
709                          "@ind_str_asc",   "@ind_num_asc",   "@val_type_asc",
710                          "@val_str_asc",   "@val_num_asc",   "@ind_str_desc",
711                          "@ind_num_desc", "@val_type_desc",  "@val_str_desc",
712                          "@val_num_desc",  and  "@unsorted".   The  value can
713                          also be the name (as a  string)  of  any  comparison
714                          function defined as follows:
715
716                               function cmp_func(i1, v1, i2, v2)
717
718                          where  i1  and i2 are the indices, and v1 and v2 are
719                          the corresponding values of the two  elements  being
720                          compared.   It  should  return  a  number less than,
721                          equal to, or greater than 0, depending  on  how  the
722                          elements of the array are to be ordered.
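
                             As a sketch of the comparison-function form  (the
                             function  name by_length and the sample data are
                             illustrative assumptions, not part of gawk):

                                  function by_length(i1, v1, i2, v2)
                                  {
                                      # Negative, zero, or positive result:
                                      # shorter values are visited first.
                                      return length(v1) - length(v2)
                                  }

                                  BEGIN {
                                      word[1] = "pear"
                                      word[2] = "fig"
                                      word[3] = "banana"
                                      PROCINFO["sorted_in"] = "by_length"
                                      for (i in word)
                                          print i, word[i]
                                      # prints "2 fig", "1 pear", "3 banana"
                                  }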
723
724       ROUNDMODE   The rounding mode to use for arbitrary precision arithmetic
725                   on numbers, by default "N" (IEEE-754 roundTiesToEven mode).
726                   The accepted values are:
727
728                   "A" or "a"
729                          for  rounding away from zero.  These are only avail‐
730                          able if your version of the GNU  MPFR  library  sup‐
731                          ports rounding away from zero.
732
733                   "D" or "d" for roundTowardNegative.
734
735                   "N" or "n" for roundTiesToEven.
736
737                   "U" or "u" for roundTowardPositive.
738
739                   "Z" or "z" for roundTowardZero.
740
741       RS          The input record separator, by default a newline.
742
743       RT          The record terminator.  Gawk sets RT to the input text that
744                   matched the character or regular  expression  specified  by
745                   RS.
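
                      A minimal sketch with a regular expression as RS (the
                      separator set and the sample input are assumptions):

                           BEGIN { RS = "[,;]" }
                           { printf "%d: [%s] ended by [%s]\n", NR, $0, RT }

                      Given the input a,b;c the first two records  end  with
                      "," and ";".  The last record ends at end of input, so
                      RT is empty for it.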
746
747       RSTART      The  index  of the first character matched by match(); 0 if
748                   no match.  (This implies that character  indices  start  at
749                   one.)
750
751       RLENGTH     The  length  of  the  string  matched  by match(); -1 if no
752                   match.
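
                      RSTART and RLENGTH are typically passed to substr()  to
                      extract  the  matched  text.   A short sketch (the sample
                      string is an assumption):

                           BEGIN {
                               s = "error at line 42"
                               if (match(s, /[0-9]+/))
                                   print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
                                   # prints "15 2 42"
                               match(s, /xyz/)          # no match
                               print RSTART, RLENGTH    # prints "0 -1"
                           }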
753
754       SUBSEP      The string used to separate multiple  subscripts  in  array
755                   elements, by default "\034".
756
757       SYMTAB      An  array  whose indices are the names of all currently de‐
758                   fined global variables and arrays in the program.  The  ar‐
759                   ray  may  be  used for indirect access to read or write the
760                   value of a variable:
761
762                        foo = 5
763                        SYMTAB["foo"] = 4
764                        print foo    # prints 4
765
766                   The typeof() function may be used to test if an element  in
767                   SYMTAB  is  an array.  You may not use the delete statement
768                   with the SYMTAB array, nor assign to elements with an index
769                   that is not a variable name.
770
771       TEXTDOMAIN  The text domain of the AWK program; used to find the local‐
772                   ized translations for the program's strings.
773
774   Arrays
775       Arrays are subscripted with an expression between  square  brackets  ([
776       and ]).  If the expression is an expression list (expr, expr ...)  then
777       the array subscript is a string consisting of the concatenation of  the
778       (string) value of each expression, separated by the value of the SUBSEP
779       variable.  This facility is used to simulate multiply  dimensioned  ar‐
780       rays.  For example:
781
782              i = "A"; j = "B"; k = "C"
783              x[i, j, k] = "hello, world\n"
784
785       assigns the string "hello, world\n" to the element of the array x which
786       is indexed by the string "A\034B\034C".  All arrays in AWK are associa‐
787       tive, i.e., indexed by string values.
788
789       The  special  operator  in may be used to test if an array has an index
790       consisting of a particular value:
791
792              if (val in array)
793                   print array[val]
794
795       If the array has multiple subscripts, use (i, j) in array.
796
797       The in construct may also be used in a for loop to iterate over all the
798       elements  of  an  array.   However,  the (i, j) in array construct only
799       works in tests, not in for loops.
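
          The following sketch combines these forms (the array name sales and
          its contents are illustrative).  Because a for loop yields only the
          combined index, the program splits it on SUBSEP to recover the parts:

                 BEGIN {
                     sales["east", "q1"] = 10
                     sales["west", "q1"] = 7

                     if (("east", "q1") in sales)     # membership test
                         print "east/q1 present"

                     for (key in sales) {             # order is unspecified
                         split(key, part, SUBSEP)
                         print part[1], part[2], sales[key]
                     }
                 }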
800
801       An element may be deleted from an array  using  the  delete  statement.
802       The  delete statement may also be used to delete the entire contents of
803       an array, just by specifying the array name without a subscript.
804
805       Gawk supports true multidimensional arrays. It does  not  require  that
806       such arrays be “rectangular” as in C or C++.  For example:
807
808              a[1] = 5
809              a[2][1] = 6
810              a[2][2] = 7
811
812       NOTE:  You may need to tell gawk that an array element is really a sub‐
813       array in order to use it where gawk expects an array (such  as  in  the
814       second argument to split()).  You can do this by creating an element in
815       the subarray and then deleting it with the delete statement.
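
          A sketch of that technique (the names data and "tags" are  illustra‐
          tive):

                 BEGIN {
                     data["tags"][1] = ""     # forces data["tags"] to be an array
                     delete data["tags"][1]   # leaves an empty subarray behind
                     n = split("red green blue", data["tags"], " ")
                     print n, data["tags"][2] # prints "3 green"
                 }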
816
817   Namespaces
818       Gawk provides a simple namespace facility to help work around the  fact
819       that all variables in AWK are global.
820
821       A  qualified name consists of two simple identifiers joined by a dou‐
822       ble colon (::).  The left-hand identifier represents the namespace  and
823       the  right-hand identifier is the variable within it.  All simple (non-
824       qualified) names are considered to be in the “current” namespace; the
825       default  namespace  is  awk.   However,  simple  identifiers consisting
826       solely of uppercase letters are forced into the awk namespace, even  if
827       the current namespace is different.
828
829       You change the current namespace with an @namespace "name" directive.
830
831       The standard predefined builtin function names may not be used as name‐
832       space names.  The names of additional functions provided by gawk may be
833       used  as  namespace names or as simple identifiers in other namespaces.
834       For more details, see GAWK: Effective AWK Programming.
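
          A minimal sketch, assuming a gawk version with namespace support and
          neither --posix nor --traditional in effect (the namespace  counter
          and its function bump are illustrative names):

                 @namespace "counter"

                 # Defined as counter::bump; total is counter::total.
                 function bump() { return ++total }

                 @namespace "awk"

                 BEGIN {
                     counter::bump()
                     counter::bump()
                     total = "unrelated"           # awk::total, a separate variable
                     print counter::total, total   # prints "2 unrelated"
                 }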
835
836   Variable Typing And Conversion
837       Variables and fields may be (floating point) numbers,  or  strings,  or
838       both.   They  may also be regular expressions. How the value of a vari‐
839       able is interpreted depends upon its context.  If used in a numeric ex‐
840       pression,  it  will be treated as a number; if used as a string it will
841       be treated as a string.
842
843       To force a variable to be treated as a number, add zero to it; to force
844       it to be treated as a string, concatenate it with the null string.
845
846       Uninitialized  variables  have  the  numeric  value zero and the string
847       value "" (the null, or empty, string).
848
849       When a string must be converted to a number, the conversion  is  accom‐
850       plished  using  strtod(3).   A number is converted to a string by using
851       the value of CONVFMT as a format string for sprintf(3),  with  the  nu‐
852       meric  value of the variable as the argument.  However, even though all
853       numbers in AWK are floating-point, integral values are always converted
854       as integers.  Thus, given
855
856              CONVFMT = "%2.2f"
857              a = 12
858              b = a ""
859
860       the variable b has a string value of "12" and not "12.00".
861
862       NOTE:  When  operating in POSIX mode (such as with the --posix option),
863       beware that locale settings may interfere with the way decimal  numbers
864       are  treated:  the  decimal separator of the numbers you are feeding to
865       gawk must conform to what your locale would expect, be it a  comma  (,)
866       or a period (.).
867
868       Gawk  performs  comparisons  as  follows: If two variables are numeric,
869       they are compared numerically.  If one value is numeric and  the  other
870       has  a  string  value  that is a “numeric string,” then comparisons are
871       also done numerically.  Otherwise, the numeric value is converted to  a
872       string and a string comparison is performed.  Two strings are compared,
873       of course, as strings.
874
875       Note that string constants, such as "57", are not numeric strings; they
876       are  string  constants.   The  idea of “numeric string” only applies to
877       fields, getline input, FILENAME, ARGV elements,  ENVIRON  elements  and
878       the  elements of an array created by split() or patsplit() that are nu‐
879       meric strings.  The basic idea is that user input, and only user input,
880       that looks numeric, should be treated that way.
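
          The difference can be seen in a short sketch (the sample values  are
          assumptions).   Elements  created  by  split() that look numeric are
          numeric strings, while quoted constants are always strings:

                 BEGIN {
                     split("10 9", f, " ")
                     print (f[1] > f[2])    # prints 1: both are numeric strings
                     print ("10" > "9")     # prints 0: string comparison
                     print (f[1] > "9")     # prints 0: a string constant forces
                                            # a string comparison
                 }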
881
882   Octal and Hexadecimal Constants
883       You may use C-style octal and hexadecimal constants in your AWK program
884       source code.  For example, the octal value 011 is equal to  decimal  9,
885       and the hexadecimal value 0x11 is equal to decimal 17.
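
          For instance, in gawk's default mode (strtonum() is a  gawk  function
          not  described  in  this  section; it is shown here only to contrast
          source constants with string data):

                 BEGIN {
                     print 011, 0x11           # prints "9 17"
                     print strtonum("0x11")    # prints 17
                 }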
886
887   String Constants
888       String  constants  in  AWK are sequences of characters enclosed between
889       double quotes (like "value").  Within strings, certain escape sequences
890       are recognized, as in C.  These are:
891
892       \\   A literal backslash.
893
894       \a   The “alert” character; usually the ASCII BEL character.
895
896       \b   Backspace.
897
898       \f   Form-feed.
899
900       \n   Newline.
901
902       \r   Carriage return.
903
904       \t   Horizontal tab.
905
906       \v   Vertical tab.
907
908       \xhex digits
909            The character represented by the string of hexadecimal digits fol‐
910            lowing the \x.  Up to two following hexadecimal digits are consid‐
911            ered  part  of the escape sequence.  E.g., "\x1B" is the ASCII ESC
912            (escape) character.
913
914       \ddd The character represented by the 1-, 2-, or  3-digit  sequence  of
915            octal digits.  E.g., "\033" is the ASCII ESC (escape) character.
916
917       \c   The literal character c.
918
919       In compatibility mode, the characters represented by octal and hexadec‐
920       imal escape sequences are treated literally when used  in  regular  ex‐
921       pression constants.  Thus, /a\52b/ is equivalent to /a\*b/.
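
          A brief sketch of the string escape sequences listed above (the sam‐
          ple strings are assumptions):

                 BEGIN {
                     print ("\x1B" == "\033")   # prints 1: both are ASCII ESC
                     printf "name\tvalue\n"     # tab-separated, newline-terminated
                     print "a \"quoted\" word and a backslash: \\"
                 }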
922
923   Regexp Constants
924       A  regular expression constant is a sequence of characters enclosed be‐
925       tween forward slashes (like /value/).  Regular expression  matching  is
926       described more fully below; see Regular Expressions.
927
928       The escape sequences described earlier may also be used inside constant
929       regular expressions (e.g., /[ \t\f\n\r\v]/ matches  whitespace  charac‐
930       ters).
931
932       Gawk  provides  strongly  typed regular expression constants. These are
933       written with a leading @ symbol (like so:  @/value/).   Such  constants
934       may  be  assigned  to scalars (variables, array elements) and passed to
935       user-defined functions. Variables that have been so assigned have regu‐
936       lar expression type.
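
          A short sketch, assuming a gawk version that provides  strongly  typed
          regexp  constants  and  the typeof() function (the variable name re is
          illustrative):

                 BEGIN {
                     re = @/^[a-z]+[0-9]$/
                     print typeof(re)                    # prints "regexp"
                     print ("abc7" ~ re), ("abc" ~ re)   # prints "1 0"
                 }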
937

PATTERNS AND ACTIONS

939       AWK is a line-oriented language.  The pattern comes first, and then the
940       action.  Action statements are enclosed in { and }.  Either the pattern
941       may be missing, or the action may be missing, but, of course, not both.
942       If the pattern is missing, the action executes for every single  record
943       of input.  A missing action is equivalent to
944
945              { print }
946
947       which prints the entire record.
948
949       Comments  begin with the # character, and continue until the end of the
950       line.  Empty lines may be used to  separate  statements.   Normally,  a
951       statement  ends with a newline; however, this is not the case for lines
952       ending in a comma, {, ?, :, &&, or ||.  Lines ending in do or else also
953       have  their  statements  automatically continued on the following line.
954       In other cases, a line can be continued by ending it  with  a  “\”,  in
955       which  case  the  newline  is ignored.  However, a “\” after a # is not
956       special.
957
958       Multiple statements may be put on one line by separating  them  with  a
959       “;”.   This  applies to both the statements within the action part of a
960       pattern-action pair (the usual case), and to the pattern-action  state‐
961       ments themselves.
962
963   Patterns
964       AWK patterns may be one of the following:
965
966              BEGIN
967              END
968              BEGINFILE
969              ENDFILE
970              /regular expression/
971              relational expression
972              pattern && pattern
973              pattern || pattern
974              pattern ? pattern : pattern
975              (pattern)
976              ! pattern
977              pattern1, pattern2
978
979       BEGIN  and  END  are two special kinds of patterns which are not tested
980       against the input.  The action parts of all BEGIN patterns  are  merged
981       as if all the statements had been written in a single BEGIN rule.  They
982       are executed before any of the input is read.  Similarly, all  the  END
983       rules are merged, and executed when all the input is exhausted (or when
984       an exit statement is executed).  BEGIN and END patterns cannot be  com‐
985       bined  with  other patterns in pattern expressions.  BEGIN and END pat‐
986       terns cannot have missing action parts.
987
988       BEGINFILE and ENDFILE are additional special patterns whose actions are
989       executed  before  reading  the  first record of each command-line input
990       file and after reading the last record of each file.  Inside the BEGIN‐
991       FILE  rule,  the  value  of  ERRNO  is the empty string if the file was
992       opened successfully.  Otherwise, there is some problem  with  the  file
993       and  the code should use nextfile to skip it. If that is not done, gawk
994       produces its usual fatal error for files that cannot be opened.
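
          One way to apply this is sketched below; the  message  text  and  the
          array name lines are illustrative:

                 BEGINFILE {
                     if (ERRNO != "") {
                         # The file could not be opened: report it and move on.
                         print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
                         nextfile
                     }
                 }

                 { lines[FILENAME]++ }

                 END {
                     for (name in lines)
                         print name, lines[name]
                 }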
995
996       For /regular expression/ patterns, the associated statement is executed
997       for each input record that matches the regular expression.  Regular ex‐
998       pressions are the same as those in egrep(1), and are summarized below.
999
1000       A relational expression may use any of the operators defined  below  in
1001       the  section  on  actions.  These generally test whether certain fields
1002       match certain regular expressions.
1003
1004       The &&, ||, and !  operators are logical AND, logical OR,  and  logical
1005       NOT,  respectively, as in C.  They do short-circuit evaluation, also as
1006       in C, and are used for combining more  primitive  pattern  expressions.
1007       As  in  most  languages, parentheses may be used to change the order of
1008       evaluation.
1009
1010       The ?: operator is like the same operator in C.  If the  first  pattern
1011       is true then the pattern used for testing is the second pattern, other‐
1012       wise it is the third.  Only one of the second  and  third  patterns  is
1013       evaluated.
1014
1015       The pattern1, pattern2 form of an expression is called a range pattern.
1016       It matches all input records starting with a record that  matches  pat‐
1017       tern1,  and continuing until a record that matches pattern2, inclusive.
1018       It does not combine with any other sort of pattern expression.
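
          For example, assuming the input marks a region with lines consisting
          of START and STOP (hypothetical markers), the following rule  prints
          the region, including both marker lines:

                 /^START$/, /^STOP$/ { print NR ": " $0 }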
1019
1020   Regular Expressions
1021       Regular expressions are the extended kind found  in  egrep.   They  are
1022       composed of characters as follows:
1023
1024       c          Matches the non-metacharacter c.
1025
1026       \c         Matches the literal character c.
1027
1028       .          Matches any character including newline.
1029
1030       ^          Matches the beginning of a string.
1031
1032       $          Matches the end of a string.
1033
1034       [abc...]   A character list: matches any of the characters abc....  You
1035                  may include a range of characters by separating them with  a
1036                  dash.   To  include a literal dash in the list, put it first
1037                  or last.
1038
1039       [^abc...]  A negated  character  list:  matches  any  character  except
1040                  abc....
1041
1042       r1|r2      Alternation: matches either r1 or r2.
1043
1044       r1r2       Concatenation: matches r1, and then r2.
1045
1046       r+         Matches one or more r's.
1047
1048       r*         Matches zero or more r's.
1049
1050       r?         Matches zero or one r's.
1051
1052       (r)        Grouping: matches r.
1053
1054       r{n}
1055       r{n,}
1056       r{n,m}     One  or two numbers inside braces denote an interval expres‐
1057                  sion.  If there is one number in the braces,  the  preceding
1058                  regular  expression r is repeated n times.  If there are two
1059                  numbers separated by a comma, r is repeated n  to  m  times.
1060                  If  there  is  one number followed by a comma, then r is re‐
1061                  peated at least n times.
1062
1063       \y         Matches the empty string at either the beginning or the  end
1064                  of a word.
1065
1066       \B         Matches the empty string within a word.
1067
1068       \<         Matches the empty string at the beginning of a word.
1069
1070       \>         Matches the empty string at the end of a word.
1071
1072       \s         Matches any whitespace character.
1073
1074       \S         Matches any nonwhitespace character.
1075
1076       \w         Matches  any  word-constituent  character (letter, digit, or
1077                  underscore).
1078
1079       \W         Matches any character that is not word-constituent.
1080
1081       \`         Matches the empty  string  at  the  beginning  of  a  buffer
1082                  (string).
1083
1084       \'         Matches the empty string at the end of a buffer.
1085
1086       The  escape  sequences  that  are valid in string constants (see String
1087       Constants) are also valid in regular expressions.
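
          A few of the operators above, sketched  for  gawk's  default  mode
          (the test strings are assumptions):

                 BEGIN {
                     print ("aaa"    ~ /^a{2,3}$/)   # prints 1
                     print ("aaaa"   ~ /^a{2,3}$/)   # prints 0: too many a's
                     print ("a cat"  ~ /\<cat\>/)    # prints 1: cat is a whole word
                     print ("concat" ~ /\<cat/)      # prints 0: cat is not at a word start
                     print ("concat" ~ /\Bcat/)      # prints 1: cat begins inside a word
                 }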
1088
1089       Character classes are a feature introduced in the  POSIX  standard.   A
1090       character  class  is a special notation for describing lists of charac‐
1091       ters that have a specific attribute, but where  the  actual  characters
1092       themselves  can  vary from country to country and/or from character set
1093       to character set.  For example, the notion of  what  is  an  alphabetic
1094       character differs in the USA and in France.
1095
1096       A  character  class  is  only  valid in a regular expression inside the
1097       brackets of a character list.  Character classes consist of [:, a  key‐
1098       word  denoting the class, and :].  The character classes defined by the
1099       POSIX standard are:
1100
1101       [:alnum:]  Alphanumeric characters.
1102
1103       [:alpha:]  Alphabetic characters.
1104
1105       [:blank:]  Space or tab characters.
1106
1107       [:cntrl:]  Control characters.
1108
1109       [:digit:]  Numeric characters.
1110
1111       [:graph:]  Characters that are both printable and visible.  (A space is
1112                  printable, but not visible, while an a is both.)
1113
1114       [:lower:]  Lowercase alphabetic characters.
1115
1116       [:print:]  Printable  characters (characters that are not control char‐
1117                  acters.)
1118
1119       [:punct:]  Punctuation characters (characters that are not letters, dig‐
1120                  its, control characters, or space characters).
1121
1122       [:space:]  Space  characters (such as space, tab, and formfeed, to name
1123                  a few).
1124
1125       [:upper:]  Uppercase alphabetic characters.
1126
1127       [:xdigit:] Characters that are hexadecimal digits.
1128
1129       For example, before the POSIX standard, to match  alphanumeric  charac‐
1130       ters, you would have had to write /[A-Za-z0-9]/.  If your character set
1131       had other alphabetic characters in it, this would not match  them,  and
1132       if  your  character set collated differently from ASCII, this might not
1133       even match the ASCII alphanumeric characters.  With the POSIX character
1134       classes,  you  can write /[[:alnum:]]/, and this matches the alphabetic
1135       and numeric characters in your character set, no matter what it is.
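
          As a sketch, a negated character class can be  used  to  normalize  a
          string (the sample text is an assumption; the output shown is for an
          ASCII string):

                 BEGIN {
                     s = "id: 42, name=gawk!"
                     gsub(/[^[:alnum:]]+/, "_", s)
                     print s     # prints "id_42_name_gawk_"
                 }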
1136
1137       Two additional special sequences can appear in character lists.   These
1138       apply  to  non-ASCII  character  sets,  which  can  have single symbols
1139       (called collating elements) that are represented  with  more  than  one
1140       character,  as  well as several characters that are equivalent for col‐
1141       lating, or sorting, purposes.  (E.g., in French,  a  plain  “e”  and  a
1142       grave-accented “è” are equivalent.)
1143
1144       Collating Symbols
1145              A  collating  symbol  is a multi-character collating element en‐
1146              closed in [.  and .].  For example, if ch is  a  collating  ele‐
1147              ment,  then  [[.ch.]]  is a regular expression that matches this
1148              collating element, while  [ch]  is  a  regular  expression  that
1149              matches either c or h.
1150
1151       Equivalence Classes
1152              An  equivalence  class  is  a locale-specific name for a list of
1153              characters that are equivalent.  The name is enclosed in [=  and
1154              =].   For  example, the name e might be used to represent all of
1155              “e”, “é”, and “è”.  In this case, [[=e=]] is a  regular  expres‐
1156              sion that matches any of e, é, or è.
1157
1158       These  features are very valuable in non-English speaking locales.  The
1159       library functions that gawk uses for regular expression  matching  cur‐
1160       rently  only  recognize  POSIX character classes; they do not recognize
1161       collating symbols or equivalence classes.
1162
1163       The \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators  are  specific
1164       to gawk; they are extensions based on facilities in the GNU regular ex‐
1165       pression libraries.
1166
1167       The various command line options control how gawk interprets characters
1168       in regular expressions.
1169
1170       No options
1171              In  the  default case, gawk provides all the facilities of POSIX
1172              regular expressions and the GNU regular expression operators de‐
1173              scribed above.
1174
1175       --posix
1176              Only  POSIX regular expressions are supported; the GNU operators
1177              are not special.  (E.g., \w matches a literal w).
1178
1179       --traditional
1180              Traditional UNIX awk regular expressions are matched.   The  GNU
1181              operators  are  not  special,  and  interval expressions are not
1182              available.  Characters described by octal and hexadecimal escape
1183              sequences  are treated literally, even if they represent regular
1184              expression metacharacters.
1185
1186       --re-interval
1187              Allow interval  expressions  in  regular  expressions,  even  if
1188              --traditional has been provided.
1189
1190   Actions
1191       Action  statements  are enclosed in braces, { and }.  Action statements
1192       consist of the usual assignment, conditional,  and  looping  statements
1193       found  in  most  languages.  The operators, control statements, and in‐
1194       put/output statements available are patterned after those in C.
1195
1196   Operators
1197       The operators in AWK, in order of decreasing precedence, are:
1198
1199       (...)       Grouping
1200
1201       $           Field reference.
1202
1203       ++ --       Increment and decrement, both prefix and postfix.
1204
1205       ^           Exponentiation (** may also be used, and **=  for  the  as‐
1206                   signment operator).
1207
1208       + - !       Unary plus, unary minus, and logical negation.
1209
1210       * / %       Multiplication, division, and modulus.
1211
1212       + -         Addition and subtraction.
1213
1214       space       String concatenation.
1215
1216       |   |&      Piped I/O for getline, print, and printf.
1217
1218       < > <= >= == !=
1219                   The regular relational operators.
1220
1221       ~ !~        Regular  expression match, negated match.  NOTE: Do not use
1222                   a constant regular expression (/foo/) on the left-hand side
1223                   of  a  ~  or !~.  Only use one on the right-hand side.  The
1224                   expression /foo/ ~ exp has  the  same  meaning  as  (($0  ~
1225                   /foo/) ~ exp).  This is usually not what you want.
1226
1227       in          Array membership.
1228
1229       &&          Logical AND.
1230
1231       ||          Logical OR.
1232
1233       ?:          The  C  conditional  expression.  This has the form expr1 ?
1234                   expr2 : expr3.  If expr1 is true, the value of the  expres‐
1235                   sion  is  expr2,  otherwise it is expr3.  Only one of expr2
1236                   and expr3 is evaluated.
1237
1238       = += -= *= /= %= ^=
1239                   Assignment.  Both absolute assignment (var = value) and op‐
1240                   erator-assignment (the other forms) are supported.
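
           A few of the precedence rules in the table can be checked with a
           short sketch.  The variable names are illustrative only, and the
           comments state the grouping each expression is expected to get.

```awk
BEGIN {
    # Unary minus binds less tightly than ^, so this is -(2 ^ 2).
    neg_pow = -2 ^ 2
    # Multiplication binds more tightly than addition: 1 + (2 * 3).
    arith = 1 + 2 * 3
    # Addition binds more tightly than concatenation: 1 " " (2 + 3).
    concat = 1 " " 2 + 3
    # Only one branch of ?: is evaluated, so the division is skipped.
    divisor = 0
    safe = (divisor != 0) ? 10 / divisor : "n/a"
    # Operator-assignment: total = total ^ 2.
    total = 3
    total ^= 2
    print neg_pow, arith, concat, safe, total    # -4 7 1 5 n/a 9
}
```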
1241
1242   Control Statements
1243       The control statements are as follows:
1244
1245              if (condition) statement [ else statement ]
1246              while (condition) statement
1247              do statement while (condition)
1248              for (expr1; expr2; expr3) statement
1249              for (var in array) statement
1250              break
1251              continue
1252              delete array[index]
1253              delete array
1254              exit [ expression ]
1255              { statements }
1256              switch (expression) {
1257              case value|regex : statement
1258              ...
1259              [ default: statement ]
1260              }
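
           The following sketch combines several of the statements listed
           above.  The function classify() and the array names are
           hypothetical and exist only for this illustration.

```awk
# classify() is a made-up helper, not part of gawk.
function classify(token)
{
    switch (token) {
    case /^[0-9]+$/:            # a regex case is matched against the value
        return "integer"
    case "yes":                 # a constant case is compared for equality
        return "affirmative"
    default:
        return "other"
    }
}

BEGIN {
    n = split("42 yes no maybe", words, " ")
    for (i = 1; i <= n; i++)
        count[classify(words[i])]++

    delete count["other"]       # remove one element
    for (kind in count)         # traversal order is unspecified
        remaining++
    print count["integer"], count["affirmative"], remaining    # 1 1 2

    delete count                # remove every element
}
```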
1261
1262   I/O Statements
1263       The input/output statements are as follows:
1264
1265       close(file [, how])   Close  file, pipe or coprocess.  The optional how
1266                             should only be used when closing  one  end  of  a
1267                             two-way pipe to a coprocess.  It must be a string
1268                             value, either "to" or "from".
1269
1270       getline               Set $0 from the next input record;  set  NF,  NR,
1271                             FNR, RT.
1272
1273       getline <file         Set $0 from the next record of file; set NF, RT.
1274
1275       getline var           Set  var from the next input record; set NR, FNR,
1276                             RT.
1277
1278       getline var <file     Set var from the next record of file; set RT.
1279
1280       command | getline [var]
1281                             Run command, piping the output either into $0  or
1282                             var, as above, and RT.
1283
1284       command |& getline [var]
1285                             Run  command as a coprocess piping the output ei‐
1286                             ther into $0 or var, as above,  and  RT.   Copro‐
1287                             cesses  are  a  gawk extension.  (The command can
1288                             also be a socket.   See  the  subsection  Special
1289                             File Names, below.)
1290
1291       next                  Stop  processing  the current input record.  Read
1292                             the next input record and start  processing  over
1293                             with  the first pattern in the AWK program.  Upon
1294                             reaching the end of the input data,  execute  any
1295                             END rule(s).
1296
1297       nextfile              Stop processing the current input file.  The next
1298                             input record read comes from the next input file.
1299                             Update  FILENAME  and ARGIND, reset FNR to 1, and
1300                             start processing over with the first  pattern  in
1301                             the  AWK  program.   Upon reaching the end of the
1302                             input data, execute any ENDFILE and END rule(s).
1303
1304       print                 Print the current record.  The output  record  is
1305                             terminated with the value of ORS.
1306
1307       print expr-list       Print  expressions.  Each expression is separated
1308                             by the value of OFS.  The output record is termi‐
1309                             nated with the value of ORS.
1310
1311       print expr-list >file Print  expressions  on  file.  Each expression is
1312                             separated by the value of OFS.  The output record
1313                             is terminated with the value of ORS.
1314
1315       printf fmt, expr-list Format  and print.  See The printf Statement, be‐
1316                             low.
1317
1318       printf fmt, expr-list >file
1319                             Format and print on file.
1320
1321       system(cmd-line)      Execute the command cmd-line, and return the exit
1322                             status.   (This may not be available on non-POSIX
1323                             systems.)  See GAWK:  Effective  AWK  Programming
1324                             for the full details on the exit status.
1325
1326       fflush([file])        Flush any buffers associated with the open output
1327                             file or pipe file.  If file is missing or  if  it
1328                             is  the  null  string, then flush all open output
1329                             files and pipes.
1330
1331       Additional output redirections are allowed for print and printf.
1332
1333       print ... >> file
1334              Append output to the file.
1335
1336       print ... | command
1337              Write on a pipe.
1338
1339       print ... |& command
1340              Send data to a coprocess or socket.  (See  also  the  subsection
1341              Special File Names, below.)
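
           As a sketch of two-way I/O, the program below sends three lines to
           a coprocess and reads the sorted result back.  It assumes that a
           POSIX shell and sort(1) are available; the variable names are
           illustrative.

```awk
BEGIN {
    sorter = "LC_ALL=C sort"        # assumption: sort(1) is on the PATH

    print "pear"  |& sorter
    print "apple" |& sorter
    print "mango" |& sorter
    close(sorter, "to")             # end of input; the "from" side stays open

    while ((sorter |& getline fruit) > 0)
        sorted = sorted (sorted == "" ? "" : ",") fruit
    close(sorter)                   # release the coprocess entirely

    print sorted                    # apple,mango,pear
}
```

           Closing only the "to" end matters here because sort(1) produces
           no output until it has seen end of file on its input.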
1342
1343       The  getline  command returns 1 on success, zero on end of file, and -1
1344       on an error.  If the errno(3) value indicates that  the  I/O  operation
1345       may  be  retried, and PROCINFO["input", "RETRY"] is set, then -2 is re‐
1346       turned instead of -1, and further calls to getline  may  be  attempted.
1347       Upon an error, ERRNO is set to a string describing the problem.
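
           The return values can be observed with a sketch like the one
           below.  It assumes a POSIX shell for the pipe and that the path
           stored in missing does not exist; both names are hypothetical.

```awk
BEGIN {
    producer = "echo one; echo two"     # assumption: a POSIX shell runs this
    while ((status = (producer | getline line)) > 0)
        records++
    close(producer)
    # status is now 0: the loop stopped at end of file, not on an error.

    missing = "/nonexistent/gawk-example-input"    # hypothetical path
    failed = (getline line < missing)
    # failed is -1 because the file cannot be opened; ERRNO says why.

    print records, status, failed, (ERRNO != "")   # 2 0 -1 1
}
```

           Comparing the result with "> 0" rather than testing it for truth
           keeps the loop from spinning forever when getline returns -1.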
1348
1349       NOTE: Failure to open a two-way socket results in a non-fatal error
1350       being returned to the calling function.  If you use a pipe, coprocess,
1351       or socket with getline, print, or printf inside a loop, you must call
1352       close() before a new instance of the command or socket can be created.
1353       AWK does not automatically close pipes, sockets, or coprocesses when
1354       they return EOF.
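
           The effect of close() on a pipe can be sketched as follows,
           assuming a POSIX shell; the variable names are illustrative.

```awk
BEGIN {
    cmd = "echo ping"                 # assumption: a POSIX shell runs this
    first  = (cmd | getline reply)    # 1: a record was read
    second = (cmd | getline reply)    # 0: same pipe, now at end of file
    close(cmd)                        # discard the finished instance
    third  = (cmd | getline reply)    # 1: the command was run again
    close(cmd)
    print first, second, third        # 1 0 1
}
```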
1355
1356   The printf Statement
1357       The AWK versions of the printf statement and  sprintf()  function  (see
1358       below) accept the following conversion specification formats:
1359
1360       %a, %A  A floating point number of the form [-]0xh.hhhhp+-dd (C99 hexa‐
1361               decimal floating point format).  For %A, uppercase letters  are
1362               used instead of lowercase ones.
1363
1364       %c      A single character.  If the argument used for %c is numeric, it
1365               is treated as a character and printed.  Otherwise, the argument
1366               is assumed to be a string, and only the first character of that
1367               string is printed.
1368
1369       %d, %i  A decimal number (the integer part).
1370
1371       %e, %E  A floating point number of the form [-]d.dddddde[+-]dd.  The %E
1372               format uses E instead of e.
1373
1374       %f, %F  A floating point number of the form [-]ddd.dddddd.  If the sys‐
1375               tem library supports it, %F is available as well. This is  like
1376               %f,  but  uses  capital  letters for special “not a number” and
1377               “infinity” values. If %F is not available, gawk uses %f.
1378
1379       %g, %G  Use %e or %f conversion, whichever is shorter, with nonsignifi‐
1380               cant zeros suppressed.  The %G format uses %E instead of %e.
1381
1382       %o      An unsigned octal number (also an integer).
1383
1384       %u      An unsigned decimal number (again, an integer).
1385
1386       %s      A character string.
1387
1388       %x, %X  An  unsigned  hexadecimal  number  (an integer).  The %X format
1389               uses ABCDEF instead of abcdef.
1390
1391       %%      A single % character; no argument is converted.
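
           A few of the conversions above, sketched with sprintf().  The
           results in the comments assume an ASCII-compatible locale and the
           default "." decimal point.

```awk
BEGIN {
    chars = sprintf("%c%c", 65, "hello")    # numeric code, then first char
    ints  = sprintf("%d %i", 42.9, -42.9)   # integer part only
    sci   = sprintf("%e", 1234.5)
    fixed = sprintf("%f", 1234.5)
    octal = sprintf("%o", 8)
    hex   = sprintf("%x %X", 255, 255)
    pct   = sprintf("100%%")

    print chars     # Ah
    print ints      # 42 -42
    print sci       # 1.234500e+03
    print fixed     # 1234.500000
    print octal     # 10
    print hex       # ff FF
    print pct       # 100%
}
```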
1392
1393       Optional, additional parameters may lie between the % and  the  control
1394       letter:
1395
1396       count$ Use the count'th argument at this point in the formatting.  This
1397              is called a positional specifier and is intended  primarily  for
1398              use  in translated versions of format strings, not in the origi‐
1399              nal text of an AWK program.  It is a gawk extension.
1400
1401       -      The expression should be left-justified within its field.
1402
1403       space  For numeric conversions, prefix positive values  with  a  space,
1404              and negative values with a minus sign.
1405
1406       +      The  plus sign, used before the width modifier (see below), says
1407              to always supply a sign for numeric  conversions,  even  if  the
1408              data  to  be  formatted  is positive.  The + overrides the space
1409              modifier.
1410
1411       #      Use an “alternate form” for certain control  letters.   For  %o,
1412              supply a leading zero.  For %x and %X, supply a leading 0x or 0X
1413              for a nonzero result.  For %e, %E, %f, and %F, the result always
1414              contains a decimal point.  For %g and %G, trailing zeros are not
1415              removed from the result.
1416
1417       0      A leading 0 (zero) acts as a flag, indicating that output should
1418              be  padded  with zeroes instead of spaces.  This applies only to
1419              the numeric output formats.  This flag only has an  effect  when
1420              the field width is wider than the value to be printed.
1421
1422       '      A  single  quote character instructs gawk to insert the locale's
1423              thousands-separator character into decimal numbers, and to  also
1424              use  the  locale's  decimal  point character with floating point
1425              formats.  This requires correct locale support in the C  library
1426              and in the definition of the current locale.
1427
1428       width  The field should be padded to this width.  The field is normally
1429              padded with spaces.  With the 0 flag, it is padded with zeroes.
1430
1431       .prec  A number that specifies the precision to use when printing.  For
1432              the %e, %E, %f, and %F formats, this specifies the number of
1433              digits you want printed to the right of the decimal point.  For
1434              the %g and %G formats, it specifies the maximum number of
1435              significant digits.  For the %d, %i, %o, %u, %x, and %X
1436              formats, it specifies the minimum number of digits to print.
1437              For the %s format, it specifies the maximum number of characters
1438              from the string that should be printed.
1439
1440       The  dynamic width and prec capabilities of the ISO C printf() routines
1441       are supported.  A * in place of either the width or prec specifications
1442       causes  their  values  to  be taken from the argument list to printf or
1443       sprintf().  To use a positional specifier with a dynamic width or  pre‐
1444       cision,  supply the count$ after the * in the format string.  For exam‐
1445       ple, "%3$*2$.*1$s".
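
           The modifiers can be compared side by side with a sketch such as
           this one.  Square brackets are added only to make the padding
           visible; the variable names are illustrative.

```awk
BEGIN {
    left    = sprintf("[%-6d]", 42)               # [42    ]
    right   = sprintf("[%6d]", 42)                # [    42]
    zeros   = sprintf("[%06d]", 42)               # [000042]
    signed  = sprintf("[%+d]", 42)                # [+42]
    spaced  = sprintf("[% d]", 42)                # [ 42]
    althex  = sprintf("[%#x]", 255)               # [0xff]
    prec    = sprintf("[%.2f]", 3.14159)          # [3.14]
    trunc   = sprintf("[%.3s]", "abcdef")         # [abc]
    mindig  = sprintf("[%.4d]", 42)               # [0042]
    dynamic = sprintf("[%*.*f]", 8, 3, 3.14159)   # [   3.142]
    swapped = sprintf("%2$s %1$s", "world", "hello")   # hello world

    print left, right, zeros, signed, spaced
    print althex, prec, trunc, mindig, dynamic
    print swapped
}
```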
1446
1447   Special File Names
1448       When doing I/O redirection from either print or printf into a file,  or
1449       via  getline from a file, gawk recognizes certain special filenames in‐
1450       ternally.  These filenames allow access to open file descriptors inher‐
1451       ited  from gawk's parent process (usually the shell).  These file names
1452       may also be used on the command line to name data files.  The filenames
1453       are:
1454
1455       -           The standard input.
1456
1457       /dev/stdin  The standard input.
1458
1459       /dev/stdout The standard output.
1460
1461       /dev/stderr The standard error output.
1462
1463       /dev/fd/n   The file associated with the open file descriptor n.
1464
1465       These are particularly useful for error messages.  For example:
1466
1467              print "You blew it!" > "/dev/stderr"
1468
1469       whereas you would otherwise have to use
1470
1471              print "You blew it!" | "cat 1>&2"
1472
1473       The following special filenames may be used with the |& coprocess oper‐
1474       ator for creating TCP/IP network connections:
1475
1476       /inet/tcp/lport/rhost/rport
1477       /inet4/tcp/lport/rhost/rport
1478       /inet6/tcp/lport/rhost/rport
1479              Files for a TCP/IP connection on local port lport to remote host
1480              rhost  on remote port rport.  Use a port of 0 to have the system
1481              pick a port.  Use /inet4 to force an IPv4 connection, and /inet6
1482              to  force  an  IPv6 connection.  Plain /inet uses the system de‐
1483              fault (most likely IPv4).  Usable only with the |&  two-way  I/O
1484              operator.
1485
1486       /inet/udp/lport/rhost/rport
1487       /inet4/udp/lport/rhost/rport
1488       /inet6/udp/lport/rhost/rport
1489              Similar, but use UDP/IP instead of TCP/IP.
1490
1491   Numeric Functions
1492       AWK has the following built-in arithmetic functions:
1493
1494       atan2(y, x)   Return the arctangent of y/x in radians.
1495
1496       cos(expr)     Return the cosine of expr, which is in radians.
1497
1498       exp(expr)     The exponential function.
1499
1500       int(expr)     Truncate to integer.
1501
1502       log(expr)     The natural logarithm function.
1503
1504       rand()        Return a random number N, between zero and one, such that
1505                     0 ≤ N < 1.
1506
1507       sin(expr)     Return the sine of expr, which is in radians.
1508
1509       sqrt(expr)    Return the square root of expr.
1510
1511       srand([expr]) Use expr as the new seed for the random number generator.
1512                     If  no expr is provided, use the time of day.  Return the
1513                     previous seed for the random number generator.
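
           A brief sketch of the arithmetic functions follows.  The seed
           value 42 is arbitrary and the variable names are illustrative.

```awk
BEGIN {
    pi        = atan2(0, -1)    # arctangent of 0/-1 is pi
    trunc_pos = int(3.7)        # 3
    trunc_neg = int(-3.7)       # -3: truncation is toward zero
    root      = sqrt(16)        # 4

    srand(42)
    a = rand()
    previous = srand(42)        # returns the seed set just above
    b = rand()                  # same seed, so the same first value

    printf "%.5f %d %d %d %d %d\n", pi, trunc_pos, trunc_neg, root,
           previous, (a == b)   # 3.14159 3 -3 4 42 1
}
```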
1514
1515   String Functions
1516       Gawk has the following built-in string functions:
1517
1518       asort(s [, d [, how] ]) Return the number of elements in the source ar‐
1519                               ray  s.   Sort  the  contents of s using gawk's
1520                               normal rules for comparing values, and  replace
1521                               the  indices of the sorted values with sequen‐
1522                               tial integers starting with 1. If the  optional
1523                               destination  array d is specified, first dupli‐
1524                               cate s into d, and then sort d, leaving the in‐
1525                               dices  of the source array s unchanged. The op‐
1526                               tional string how controls  the  direction  and
1527                               the  comparison mode.  Valid values for how are
1528                               any    of     the     strings     valid     for
1529                               PROCINFO["sorted_in"].  It can also be the name
1530                               of a user-defined comparison  function  as  de‐
1531                               scribed  in PROCINFO["sorted_in"].  s and d are
1532                               allowed to be the same array; this  only  makes
1533                               sense  when  supplying  the  third  argument as
1534                               well.
1535
1536       asorti(s [, d [, how] ])
1537                               Return the number of elements in the source ar‐
1538                               ray  s.   The  behavior  is the same as that of
1539                               asort(), except that the array indices are used
1540                               for  sorting, not the array values.  When done,
1541                               the array is indexed numerically, and the  val‐
1542                               ues  are  those  of  the original indices.  The
1543                               original values are lost; thus provide a second
1544                               array  if  you  wish  to preserve the original.
1545                               The purpose of the optional string how  is  the
1546                               same as described previously for asort().  Here
1547                               too, s and d are allowed to be the same  array;
1548                               this  only makes sense when supplying the third
1549                               argument as well.
1550
1551       gensub(r, s, h [, t])   Search the target string t for matches  of  the
1552                               regular  expression r.  If h is a string begin‐
1553                               ning with g or G, then replace all matches of r
1554                               with  s.   Otherwise,  h is a number indicating
1555                               which match of r to replace.  If t is not  sup‐
1556                               plied,  use $0 instead.  Within the replacement
1557                               text s, the sequence \n, where  n  is  a  digit
1558                               from  1  to 9, may be used to indicate just the
1559                               text that matched the n'th parenthesized subex‐
1560                               pression.   The  sequence \0 represents the en‐
1561                               tire matched text, as  does  the  character  &.
1562                               Unlike sub() and gsub(), the modified string is
1563                               returned as the result of the function, and the
1564                               original target string is not changed.
1565
1566       gsub(r, s [, t])        For each substring matching the regular expres‐
1567                               sion r in the string t, substitute  the  string
1568                               s,  and return the number of substitutions.  If
1569                               t is not supplied, use $0.  An  &  in  the  re‐
1570                               placement  text  is replaced with the text that
1571                               was actually matched.  Use \& to get a  literal
1572                               &.  (This must be typed as "\\&"; see GAWK: Ef‐
1573                               fective AWK Programming for a fuller discussion
1574                               of  the rules for ampersands and backslashes in
1575                               the replacement text of sub(), gsub(), and gen‐
1576                               sub().)
1577
1578       index(s, t)             Return  the index of the string t in the string
1579                               s, or zero if t is not present.  (This  implies
1580                               that  character indices start at one.)  It is a
1581                               fatal error to use a regexp constant for t.
1582
1583       length([s])             Return the length  of  the  string  s,  or  the
1584                               length  of  $0 if s is not supplied.  As a non-
1585                               standard extension,  with  an  array  argument,
1586                               length()  returns the number of elements in the
1587                               array.
1588
1589       match(s, r [, a])       Return the position in s where the regular  ex‐
1590                               pression r occurs, or zero if r is not present,
1591                               and set the values of RSTART and RLENGTH.  Note
1592                               that  the argument order is the same as for the
1593                               ~ operator: str ~ re.  If array a is  provided,
1594                               a  is cleared and then elements 1 through n are
1595                               filled with the portions of s  that  match  the
1596                               corresponding parenthesized subexpression in r.
1597                               The zero'th element of a contains  the  portion
1598                               of  s  matched by the entire regular expression
1599                               r.  Subscripts a[n, "start"] and a[n, "length"]
1600                               provide the starting index in the string and
1601                               the length, respectively, of each matching
1602                               substring.
1603
1604       patsplit(s, a [, r [, seps] ])
1605                               Split  the  string  s  into the array a and the
1606                               separators array seps on the regular expression
1607                               r,  and  return  the number of fields.  Element
1608                               values are the portions of s  that  matched  r.
1609                               The value of seps[i] is the possibly null sepa‐
1610                               rator that appeared after a[i].  The  value  of
1611                               seps[0] is the possibly null leading separator.
1612                               If r is omitted, FPAT is used instead.  The ar‐
1613                               rays  a  and seps are cleared first.  Splitting
1614                               behaves identically  to  field  splitting  with
1615                               FPAT, described above.
1616
1617       split(s, a [, r [, seps] ])
1618                               Split  the  string  s  into the array a and the
1619                               separators array seps on the regular expression
1620                               r,  and  return  the number of fields.  If r is
1621                               omitted, FS is used instead.  The arrays a  and
1622                               seps  are  cleared first.  seps[i] is the field
1623                               separator matched by r between a[i] and a[i+1].
1624                               If r is a single space, then leading whitespace
1625                               in s goes into the extra array element  seps[0]
1626                               and trailing whitespace goes into the extra ar‐
1627                               ray element seps[n],  where  n  is  the  return
1628                               value  of  split(s, a, r, seps).  Splitting be‐
1629                               haves identically to field splitting, described
1630                               above.  In particular, if r is a single-charac‐
1631                               ter string, that string acts as the  separator,
1632                               even  if  it happens to be a regular expression
1633                               metacharacter.
1634
1635       sprintf(fmt, expr-list) Print expr-list according to  fmt,  and  return
1636                               the resulting string.
1637
1638       strtonum(str)           Examine  str, and return its numeric value.  If
1639                               str begins with a leading 0, treat it as an oc‐
1640                               tal number.  If str begins with a leading 0x or
1641                               0X, treat it as a hexadecimal  number.   Other‐
1642                               wise, assume it is a decimal number.
1643
1644       sub(r, s [, t])         Just  like  gsub(),  but replace only the first
1645                               matching substring.  Return either zero or one.
1646
1647       substr(s, i [, n])      Return the at most n-character substring  of  s
1648                               starting  at  i.  If n is omitted, use the rest
1649                               of s.
1650
1651       tolower(str)            Return a copy of the string str, with  all  the
1652                               uppercase characters in str translated to their
1653                               corresponding lowercase counterparts.   Non-al‐
1654                               phabetic characters are left unchanged.
1655
1656       toupper(str)            Return  a  copy of the string str, with all the
1657                               lowercase characters in str translated to their
1658                               corresponding  uppercase counterparts.  Non-al‐
1659                               phabetic characters are left unchanged.
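
           The substitution, matching, and splitting functions above can be
           sketched together.  All variable and array names here are
           illustrative.

```awk
BEGIN {
    text = "hello world"

    # gensub() returns the result and leaves text untouched.
    swapped = gensub(/(hello) (world)/, "\\2 \\1", "g", text)
    second  = gensub(/o/, "0", 2, text)     # replace only the 2nd match

    # gsub() edits its target in place and returns the count.
    copy = text
    hits = gsub(/o/, "[&]", copy)

    # match() with an array captures parenthesized subexpressions.
    where = match("key=value", /([a-z]+)=([a-z]+)/, m)

    # split() and patsplit() also record what lay between the pieces.
    fields = split("a:b:c", parts, ":", seps)
    pieces = patsplit("a1b22c333", nums, /[0-9]+/, gaps)

    print swapped                               # world hello
    print second                                # hello w0rld
    print hits, copy                            # 2 hell[o] w[o]rld
    print where, m[1], m[2], m[2, "start"]      # 1 key value 5
    print fields, parts[3], seps[1]             # 3 c :
    print pieces, nums[2], gaps[0]              # 3 22 a
}
```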
1660
1661       Gawk is multibyte aware.  This means that index(),  length(),  substr()
1662       and match() all work in terms of characters, not bytes.
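
           The sorting functions asort() and asorti() described earlier in
           this subsection can be sketched as follows.  The array names are
           illustrative, and "@val_num_desc" is one of the strings accepted
           by PROCINFO["sorted_in"].

```awk
BEGIN {
    score["carol"] = 72
    score["alice"] = 95
    score["bob"]   = 88

    n = asort(score, by_value)                      # values, ascending
    asort(score, by_value_desc, "@val_num_desc")    # values, descending
    asorti(score, names)                            # indices, ascending

    # score itself is unchanged because a destination array was given.
    print n, by_value[1], by_value_desc[1], names[1], score["alice"]
    # 3 72 95 alice 95
}
```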
1663
1664   Time Functions
1665       Since  one  of the primary uses of AWK programs is processing log files
1666       that contain time stamp information, gawk provides the following  func‐
1667       tions for obtaining time stamps and formatting them.
1668
1669       mktime(datespec [, utc-flag])
1670                 Turn  datespec into a time stamp of the same form as returned
1671                 by systime(), and return  the  result.   The  datespec  is  a
1672                 string  of  the form YYYY MM DD HH MM SS[ DST].  The contents
1673                 of the string are six or seven numbers  representing  respec‐
1674                 tively  the  full year including century, the month from 1 to
1675                 12, the day of the month from 1 to 31, the hour  of  the  day
1676                 from  0  to 23, the minute from 0 to 59, the second from 0 to
1677                 60, and an optional daylight  saving  flag.   The  values  of
1678                 these  numbers  need  not be within the ranges specified; for
1679                 example, an hour of -1 means 1  hour  before  midnight.   The
1680                 origin-zero  Gregorian  calendar is assumed, with year 0 pre‐
1681                 ceding year 1 and year -1 preceding year 0.  If  utc-flag  is
1682                 present  and  is non-zero or non-null, the time is assumed to
1683                 be in the UTC time zone; otherwise, the time is assumed to be
1684                 in  the  local time zone.  If the DST daylight saving flag is
1685                 positive, the time is assumed to be daylight saving time;  if
1686                 zero,  the  time is assumed to be standard time; and if nega‐
1687                 tive (the default), mktime() attempts  to  determine  whether
1688                 daylight saving time is in effect for the specified time.  If
1689                 datespec does not contain enough elements or if the resulting
1690                 time is out of range, mktime() returns -1.
1691
1692       strftime([format [, timestamp[, utc-flag]]])
1693                 Format  timestamp  according  to the specification in format.
1694                 If utc-flag is present and is non-zero or non-null,  the  re‐
1695                 sult  is  in UTC, otherwise the result is in local time.  The
1696                 timestamp should be of the same  form  as  returned  by  sys‐
1697                 time().   If timestamp is missing, the current time of day is
1698                 used.  If format is missing, a default format  equivalent  to
1699                 the  output of date(1) is used.  The default format is avail‐
1700                 able in PROCINFO["strftime"].  See the specification for  the
1701                 strftime()  function in ISO C for the format conversions that
1702                 are guaranteed to be available.
1703
1704       systime() Return the current time of day as the number of seconds since
1705                 the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
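
           A sketch of the time functions follows.  It passes a utc-flag of
           1 throughout so that the results do not depend on the local time
           zone; the variable names are illustrative.

```awk
BEGIN {
    day_two = mktime("1970 01 02 00 00 00", 1)        # one day past the Epoch
    epoch   = strftime("%Y-%m-%d %H:%M:%S", 0, 1)
    # Out-of-range fields are normalized: February 30 becomes March 1.
    rolled  = strftime("%Y-%m-%d", mktime("2024 02 30 12 00 00", 1), 1)
    short   = mktime("2024 02")                       # too few fields

    print day_two       # 86400
    print epoch         # 1970-01-01 00:00:00
    print rolled        # 2024-03-01
    print short         # -1
}
```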
1706
1707   Bit Manipulation Functions
1708       Gawk  supplies  the following bit manipulation functions.  They work by
1709       converting double-precision floating point values  to  uintmax_t  inte‐
1710       gers,  doing  the  operation,  and  then  converting the result back to
1711       floating point.
1712
1713       NOTE: Passing negative operands to any of these functions causes a  fa‐
1714       tal error.
1715
1716       The functions are:
1717
1718       and(v1, v2 [, ...]) Return  the  bitwise  AND of the values provided in
1719                           the argument list.  There must be at least two.
1720
1721       compl(val)          Return the bitwise complement of val.
1722
1723       lshift(val, count)  Return the value of  val,  shifted  left  by  count
1724                           bits.
1725
1726       or(v1, v2 [, ...])  Return the bitwise OR of the values provided in the
1727                           argument list.  There must be at least two.
1728
1729       rshift(val, count)  Return the value of val,  shifted  right  by  count
1730                           bits.
1731
1732       xor(v1, v2 [, ...]) Return  the  bitwise  XOR of the values provided in
1733                           the argument list.  There must be at least two.
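
           The functions can be sketched on small non-negative values.
           compl() is left out because its result depends on the width of
           uintmax_t.  The variable names are illustrative.

```awk
BEGIN {
    flags   = or(1, 4)              # 001 | 100 = 101 (5)
    both    = and(12, 10)           # 1100 & 1010 = 1000 (8)
    differ  = xor(12, 10)           # 1100 ^ 1010 = 0110 (6)
    left    = lshift(1, 4)          # 16
    right   = rshift(256, 4)        # 16
    many    = or(1, 2, 4, 8)        # more than two arguments: 15
    has_bit = (and(flags, 4) != 0)  # test a single bit

    print flags, both, differ, left, right, many, has_bit
    # 5 8 6 16 16 15 1
}
```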
1734
1735   Type Functions
1736       The following functions provide type-related  information  about  their
1737       arguments.
1738
1739       isarray(x) Return  true  if x is an array, false otherwise.  This func‐
1740                  tion is mainly for use with the elements of multidimensional
1741                  arrays and with function parameters.
1742
1743       typeof(x)  Return  a  string indicating the type of x.  The string will
1744                  be one of "array", "number", "regexp",  "string",  "strnum",
1745                  "unassigned", or "undefined".
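
           A sketch of both functions applied to a function parameter, which
           may receive either an array or a scalar.  The function describe()
           is hypothetical.

```awk
# describe() is a made-up helper, not part of gawk.
function describe(x)
{
    return isarray(x) ? "array" : typeof(x)
}

BEGIN {
    table["k"] = 1
    print describe(table), describe(42), describe("text")
    # array number string
}
```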
1746
1747   Internationalization Functions
1748       The  following  functions  may be used from within your AWK program for
1749       translating strings at run-time.  For full details, see GAWK: Effective
1750       AWK Programming.
1751
1752       bindtextdomain(directory [, domain])
1753              Specify  the  directory  where gawk looks for the .gmo files, in
1754              case they will not or cannot be placed in the “standard”  loca‐
1755              tions  (e.g.,  during  testing).  It returns the directory where
1756              domain is “bound.”
1757              The default domain is the value of TEXTDOMAIN.  If directory  is
1758              the  null string (""), then bindtextdomain() returns the current
1759              binding for the given domain.
1760
1761       dcgettext(string [, domain [, category]])
1762              Return the translation of string in text domain domain  for  lo‐
1763              cale  category  category.   The  default value for domain is the
1764              current value of TEXTDOMAIN.  The default value for category  is
1765              "LC_MESSAGES".
1766              If you supply a value for category, it must be a string equal to
1767              one of the known locale categories described in GAWK:  Effective
1768              AWK  Programming.   You  must  also  supply  a text domain.  Use
1769              TEXTDOMAIN if you want to use the current domain.
1770
1771       dcngettext(string1, string2, number [, domain [, category]])
1772              Return the plural form used for number  of  the  translation  of
1773              string1  and  string2  in text domain domain for locale category
1774              category.  The default value for domain is the current value  of
1775              TEXTDOMAIN.  The default value for category is "LC_MESSAGES".
1776              If you supply a value for category, it must be a string equal to
1777              one of the known locale categories described in GAWK:  Effective
1778              AWK  Programming.   You  must  also  supply  a text domain.  Use
1779              TEXTDOMAIN if you want to use the current domain.
1780

USER-DEFINED FUNCTIONS

1782       Functions in AWK are defined as follows:
1783
1784              function name(parameter list) { statements }
1785
       Functions execute when they are called from within expressions in either
       patterns or actions.  Actual parameters supplied in the function call are
       used to instantiate the formal parameters declared in the function.
       Arrays are passed by reference; other variables are passed by value.
1791
1792       Since functions were not originally part of the AWK language, the  pro‐
1793       vision for local variables is rather clumsy: They are declared as extra
1794       parameters in the parameter list.  The convention is to separate  local
1795       variables  from  real parameters by extra spaces in the parameter list.
1796       For example:
1797
              function  f(p, q,     a, b)   # a and b are local
              {
                   ...
              }

              /abc/     { ... ; f(1, 2) ; ... }
1804
1805       The left parenthesis in a function call is required to immediately fol‐
1806       low the function name, without any intervening whitespace.  This avoids
1807       a syntactic ambiguity with the concatenation operator.   This  restric‐
1808       tion does not apply to the built-in functions listed above.
1809
1810       Functions  may  call each other and may be recursive.  Function parame‐
1811       ters used as local variables are initialized to the null string and the
1812       number zero upon function invocation.
1813
1814       Use return expr to return a value from a function.  The return value is
1815       undefined if no value is provided, or if the function returns by “fall‐
1816       ing off” the end.
1817
1818       As  a  gawk  extension, functions may be called indirectly. To do this,
1819       assign the name of the function to be called, as a string, to  a  vari‐
1820       able.  Then use the variable as if it were the name of a function, pre‐
1821       fixed with an @ sign, like so:
              function myfunc()
              {
                   print "myfunc called"
                   ...
              }

              {    ...
                   the_func = "myfunc"
                   @the_func()    # call through the_func to myfunc
                   ...
              }
1833       As of version 4.1.2, this works with user-defined  functions,  built-in
1834       functions, and extension functions.
1835
1836       If  --lint has been provided, gawk warns about calls to undefined func‐
1837       tions at parse time, instead of at  run  time.   Calling  an  undefined
1838       function at run time is a fatal error.
1839
1840       The word func may be used in place of function, although this is depre‐
1841       cated.
1842

DYNAMICALLY LOADING NEW FUNCTIONS

1844       You can dynamically add new functions written in C or C++ to  the  run‐
1845       ning  gawk  interpreter with the @load statement.  The full details are
1846       beyond the scope of this manual page; see GAWK: Effective AWK  Program‐
1847       ming.
1848

SIGNALS

1850       The  gawk  profiler  accepts  two signals.  SIGUSR1 causes it to dump a
1851       profile and function call stack to the profile file,  which  is  either
1852       awkprof.out,  or whatever file was named with the --profile option.  It
1853       then continues to run.  SIGHUP causes gawk  to  dump  the  profile  and
1854       function call stack and then exit.
1855

INTERNATIONALIZATION

1857       String constants are sequences of characters enclosed in double quotes.
1858       In non-English speaking environments, it is possible to mark strings in
1859       the AWK program as requiring translation to the local natural language.
1860       Such strings are marked in the AWK program with  a  leading  underscore
1861       (“_”).  For example,
1862
              gawk 'BEGIN { print "hello, world" }'
1864
1865       always prints hello, world.  But,
1866
              gawk 'BEGIN { print _"hello, world" }'
1868
1869       might print bonjour, monde in France.
1870
1871       There are several steps involved in producing and running a localizable
1872       AWK program.
1873
1874       1.  Add a BEGIN action to assign a value to the TEXTDOMAIN variable  to
1875           set the text domain to a name associated with your program:
1876
                BEGIN { TEXTDOMAIN = "myprog" }
1878
1879           This  allows  gawk  to find the .gmo file associated with your pro‐
1880           gram.  Without this step, gawk uses the messages text domain, which
1881           likely does not contain translations for your program.
1882
1883       2.  Mark  all  strings  that  should  be translated with leading under‐
1884           scores.
1885
1886       3.  If necessary, use the dcgettext() and/or bindtextdomain() functions
1887           in your program, as appropriate.
1888
1889       4.  Run  gawk  --gen-pot  -f myprog.awk > myprog.pot to generate a .pot
1890           file for your program.
1891
1892       5.  Provide appropriate translations, and build and install the  corre‐
1893           sponding .gmo files.
1894
1895       The internationalization features are described in full detail in GAWK:
1896       Effective AWK Programming.
1897

POSIX COMPATIBILITY

       A primary goal for gawk is compatibility with the POSIX standard, as well
       as with the latest version of Brian Kernighan's awk.  To this end, gawk
       incorporates the following user-visible features that are not described
       in the AWK book, but are part of Brian Kernighan's version of awk and are
       in the POSIX standard.
1904
1905       The book indicates that command line variable assignment  happens  when
1906       awk would otherwise open the argument as a file, which is after the BE‐
1907       GIN rule is executed.  However, in earlier implementations,  when  such
1908       an assignment appeared before any file names, the assignment would hap‐
1909       pen before the BEGIN rule was run.  Applications came to depend on this
1910       “feature.”  When awk was changed to match its documentation, the -v op‐
1911       tion for assigning variables before program execution was added to  ac‐
1912       commodate applications that depended upon the old behavior.  (This fea‐
1913       ture was agreed upon by both the Bell Laboratories developers  and  the
1914       GNU developers.)
1915
1916       When  processing arguments, gawk uses the special option “--” to signal
1917       the end of arguments.  In compatibility mode, it warns about but other‐
1918       wise  ignores  undefined  options.  In normal operation, such arguments
1919       are passed on to the AWK program for it to process.
1920
1921       The AWK book does not define the return value of  srand().   The  POSIX
1922       standard has it return the seed it was using, to allow keeping track of
1923       random number sequences.  Therefore srand() in gawk  also  returns  its
1924       current seed.
1925
       Other features are: the use of multiple -f options (from MKS awk); the
       ENVIRON array; the \a and \v escape sequences (done originally in gawk
       and fed back into the Bell Laboratories version); the tolower() and
       toupper() built-in functions (from the Bell Laboratories version); and
       the ISO C conversion specifications in printf (done first in the Bell
       Laboratories version).
1932

HISTORICAL FEATURES

1934       There is one feature of historical AWK implementations that  gawk  sup‐
1935       ports:  It  is possible to call the length() built-in function not only
1936       with no argument, but even without parentheses!  Thus,
1937
              a = length     # Holy Algol 60, Batman!
1939
1940       is the same as either of
1941
              a = length()
              a = length($0)
1944
1945       Using this feature is poor practice, and gawk issues  a  warning  about
1946       its use if --lint is specified on the command line.
1947

GNU EXTENSIONS

1949       Gawk  has  a too-large number of extensions to POSIX awk.  They are de‐
1950       scribed in this section.  All the extensions described here can be dis‐
1951       abled by invoking gawk with the --traditional or --posix options.
1952
1953       The following features of gawk are not available in POSIX awk.
1954
       • A path search for files named via the -f option, controlled by the
         AWKPATH environment variable.

       • File inclusion (gawk's @include mechanism).

       • Dynamically adding new functions written in C (gawk's @load
         mechanism).
1963
1964       • The \x escape sequence.
1965
1966       • The ability to continue lines after ?  and :.
1967
1968       • Octal and hexadecimal constants in AWK programs.
1969
       • The special variables ARGIND, BINMODE, ERRNO, LINT, PREC, ROUNDMODE,
         RT and TEXTDOMAIN.

       • The IGNORECASE variable and its side effects.
1974
1975       • The FIELDWIDTHS variable and fixed-width field splitting.
1976
1977       • The FPAT variable and field splitting based on field values.
1978
       • The FUNCTAB, SYMTAB, and PROCINFO arrays.
1980
1981       • The use of RS as a regular expression.
1982
       • The special file names available for I/O redirection.
1985
1986       • The |& operator for creating coprocesses.
1987
       • The BEGINFILE and ENDFILE special patterns.
1989
1990       • The ability to split out individual characters using the null  string
1991         as the value of FS, and as the third argument to split().
1992
1993       • An  optional  fourth  argument  to  split()  to receive the separator
1994         texts.
1995
1996       • The optional second argument to the close() function.
1997
1998       • The optional third argument to the match() function.
1999
2000       • The ability to use positional specifiers with printf and sprintf().
2001
2002       • The ability to pass an array to length().
2003
2004       • The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
2005         dcngettext(),   gensub(),   lshift(),   mktime(),  or(),  patsplit(),
2006         rshift(), strftime(), strtonum(), systime() and xor() functions.
2007
2008       • Localizable strings.
2009
2010       • Non-fatal I/O.
2011
2012       • Retryable I/O.
2013
2014       The AWK book does not define the return value of the close()  function.
2015       Gawk's  close()  returns  the  value from fclose(3), or pclose(3), when
2016       closing an output file or pipe, respectively.  It returns the process's
2017       exit  status when closing an input pipe.  The return value is -1 if the
2018       named file, pipe or coprocess was not opened with a redirection.
2019
2020       When gawk is invoked with the --traditional option, if the fs  argument
2021       to  the  -F  option  is “t”, then FS is set to the tab character.  Note
2022       that typing gawk -F\t ...  simply causes the shell to  quote  the  “t,”
2023       and  does  not pass “\t” to the -F option.  Since this is a rather ugly
2024       special case, it is not the default behavior.  This behavior also  does
2025       not occur if --posix has been specified.  To really get a tab character
2026       as the field separator, it is best to use single  quotes:  gawk  -F'\t'
2027       ....
2028

ENVIRONMENT VARIABLES

2030       The  AWKPATH  environment variable can be used to provide a list of di‐
2031       rectories that gawk searches when looking for files named via  the  -f,
2032       --file,  -i  and --include options, and the @include directive.  If the
2033       initial search fails, the path is searched again after  appending  .awk
2034       to the filename.
2035
2036       The  AWKLIBPATH  environment  variable can be used to provide a list of
2037       directories that gawk searches when looking for files named via the  -l
2038       and --load options.
2039
2040       The  GAWK_READ_TIMEOUT  environment  variable  can be used to specify a
2041       timeout in milliseconds for reading input from a terminal, pipe or two-
2042       way communication including sockets.
2043
2044       For  connection to a remote host via socket, GAWK_SOCK_RETRIES controls
2045       the number of retries, and GAWK_MSEC_SLEEP  the  interval  between  re‐
2046       tries.  The interval is in milliseconds. On systems that do not support
2047       usleep(3), the value is rounded up to an integral number of seconds.
2048
2049       If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
2050       as  if  --posix  had been specified on the command line.  If --lint has
2051       been specified, gawk issues a warning message to this effect.
2052

EXIT STATUS

2054       If the exit statement is used with a value, then gawk  exits  with  the
2055       numeric value given to it.
2056
2057       Otherwise,  if there were no problems during execution, gawk exits with
2058       the value of the C constant EXIT_SUCCESS.  This is usually zero.
2059
2060       If an error occurs, gawk  exits  with  the  value  of  the  C  constant
2061       EXIT_FAILURE.  This is usually one.
2062
2063       If  gawk exits because of a fatal error, the exit status is 2.  On non-
2064       POSIX systems, this value may be mapped to EXIT_FAILURE.
2065

VERSION INFORMATION

2067       This man page documents gawk, version 5.1.
2068

AUTHORS

2070       The original version of UNIX awk was designed and implemented by Alfred
2071       Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories.  Brian
2072       Kernighan continues to maintain and enhance it.
2073
2074       Paul Rubin and Jay Fenlason, of the  Free  Software  Foundation,  wrote
2075       gawk,  to be compatible with the original version of awk distributed in
2076       Seventh Edition UNIX.  John Woods contributed a number  of  bug  fixes.
2077       David  Trueman,  with contributions from Arnold Robbins, made gawk com‐
2078       patible with the new version of UNIX awk.  Arnold Robbins is  the  cur‐
2079       rent maintainer.
2080
2081       See GAWK: Effective AWK Programming for a full list of the contributors
2082       to gawk and its documentation.
2083
2084       See the README file in the gawk distribution for up-to-date information
2085       about maintainers and which ports are currently supported.
2086

BUG REPORTS

2088       If   you   find   a  bug  in  gawk,  please  send  electronic  mail  to
2089       bug-gawk@gnu.org.  Please include your operating system and  its  revi‐
2090       sion,  the  version of gawk (from gawk --version), which C compiler you
2091       used to compile it, and a test program and data that are  as  small  as
2092       possible for reproducing the problem.
2093
       Before sending a bug report, please do the following things.  First,
       verify that you have the latest version of gawk.  Many bugs (usually
       subtle ones) are fixed at each release, and if yours is out of date, the
       problem may already have been solved.  Second, please see if setting the
       environment variable LC_ALL to C (LC_ALL=C) causes things to behave as
       you expect.  If so, it's a locale issue, and may or may not really be a
       bug.  Finally, please read this man page and the reference manual
       carefully to be sure that what you think is a bug really is, instead of
       just a quirk in the language.
2103
       Whatever you do, do NOT post a bug report in comp.lang.awk.  While the
       gawk developers occasionally read this newsgroup, posting bug reports
       there is an unreliable way to report bugs.  Similarly, do NOT use a web
       forum (such as Stack Overflow) for reporting bugs.  Instead, please use
       the electronic mail address given above.  Really.
2109
2110       If you're using a GNU/Linux or BSD-based system, you may wish to submit
2111       a bug report to the vendor of  your  distribution.   That's  fine,  but
2112       please send a copy to the official email address as well, since there's
2113       no guarantee that the bug report will be forwarded to  the  gawk  main‐
2114       tainer.
2115

BUGS

2117       The  -F option is not necessary given the command line variable assign‐
2118       ment feature; it remains only for backwards compatibility.
2119

SEE ALSO

2121       egrep(1), sed(1), getpid(2),  getppid(2),  getpgrp(2),  getuid(2),  ge‐
2122       teuid(2),  getgid(2), getegid(2), getgroups(2), printf(3), strftime(3),
2123       usleep(3)
2124
2125       The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan,  Peter
2126       J. Weinberger, Addison-Wesley, 1988.  ISBN 0-201-07981-X.
2127
2128       GAWK:  Effective  AWK  Programming,  Edition 5.1, shipped with the gawk
2129       source.  The current version of this document is  available  online  at
2130       https://www.gnu.org/software/gawk/manual.
2131
2132       The     GNU     gettext     documentation,    available    online    at
2133       https://www.gnu.org/software/gettext.
2134

EXAMPLES

       Print and sort the login names of all users:

            BEGIN { FS = ":" }
                  { print $1 | "sort" }

       Count lines in a file:

                  { nlines++ }
            END   { print nlines }

       Precede each line by its number in the file:

            { print FNR, $0 }

       Concatenate and line number (a variation on a theme):

            { print NR, $0 }

       Run an external command for particular lines of data:

            tail -f access_log |
            awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
2158

ACKNOWLEDGEMENTS

2160       Brian Kernighan provided valuable assistance during testing and  debug‐
2161       ging.  We thank him.
2162

COPYING PERMISSIONS

2164       Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2165       2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013, 2014,
2166       2015,  2016,  2017,  2018,  2019, 2020, 2021, Free Software Foundation,
2167       Inc.
2168
2169       Permission is granted to make and distribute verbatim  copies  of  this
2170       manual  page  provided  the copyright notice and this permission notice
2171       are preserved on all copies.
2172
2173       Permission is granted to copy and distribute modified versions of  this
2174       manual  page  under  the conditions for verbatim copying, provided that
2175       the entire resulting derived work is distributed under the terms  of  a
2176       permission notice identical to this one.
2177
2178       Permission  is granted to copy and distribute translations of this man‐
2179       ual page into another language, under the above conditions for modified
2180       versions,  except that this permission notice may be stated in a trans‐
2181       lation approved by the Foundation.
2182
Free Software Foundation          Jul 05 2021                          GAWK(1)