GAWK(1)                        Utility Commands                        GAWK(1)

NAME

6       gawk - pattern scanning and processing language
7

SYNOPSIS

9       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
10       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
11
12       pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
13       pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...
14

DESCRIPTION

16       Gawk  is  the  GNU Project's implementation of the AWK programming lan‐
17       guage.  It conforms to the definition of  the  language  in  the  POSIX
18       1003.1  Standard.   This version in turn is based on the description in
19       The AWK Programming Language, by Aho, Kernighan, and  Weinberger,  with
20       the additional features found in the System V Release 4 version of UNIX
21       awk.  Gawk also provides more recent Bell Laboratories awk  extensions,
22       and a number of GNU-specific extensions.
23
24       Pgawk  is  the profiling version of gawk.  It is identical in every way
25       to gawk, except that programs run more  slowly,  and  it  automatically
26       produces  an  execution profile in the file awkprof.out when done.  See
27       the --profile option, below.
28
29       The command line consists of options to gawk itself,  the  AWK  program
30       text  (if  not supplied via the -f or --file options), and values to be
31       made available in the ARGC and ARGV pre-defined AWK variables.
32

OPTION FORMAT

       Gawk options may be either traditional POSIX one letter options, or
       GNU-style long options.  POSIX options start with a single “-”, while
       long options start with “--”.  Long options are provided for both
       GNU-specific features and for POSIX-mandated features.

       Following the POSIX standard, gawk-specific options are supplied via
       arguments to the -W option.  Multiple -W options may be supplied.  Each
       -W option has a corresponding long option, as detailed below.  Arguments
       to long options are either joined with the option by an = sign, with no
       intervening spaces, or they may be provided in the next command line
       argument.  Long options may be abbreviated, as long as the abbreviation
       remains unique.
46

OPTIONS

48       Gawk accepts the following options, listed by frequency.
49
50       -F fs
51       --field-separator fs
52              Use fs for the input field separator (the value of the FS prede‐
53              fined variable).
54
55       -v var=val
56       --assign var=val
57              Assign the value val to the variable var,  before  execution  of
58              the  program  begins.  Such variable values are available to the
59              BEGIN block of an AWK program.
60
61       -f program-file
62       --file program-file
63              Read the AWK program source from the file program-file,  instead
64              of  from  the  first  command  line  argument.   Multiple -f (or
65              --file) options may be used.
66
67       -mf NNN
68       -mr NNN
69              Set various memory limits to the value NNN.  The f flag sets the
70              maximum number of fields, and the r flag sets the maximum record
71              size.  These two flags and the -m option  are  from  an  earlier
72              version  of  the Bell Laboratories research version of UNIX awk.
73              They are ignored by gawk, since gawk has no pre-defined  limits.
74              (Current  versions of the Bell Laboratories awk no longer accept
75              them.)
76
77       -O
78       --optimize
79              Enable optimizations upon the  internal  representation  of  the
80              program.  Currently, this includes just simple constant-folding.
81              The gawk maintainer hopes to add additional  optimizations  over
82              time.
83
84       -W compat
85       -W traditional
86       --compat
87       --traditional
88              Run  in compatibility mode.  In compatibility mode, gawk behaves
89              identically to UNIX awk; none of the GNU-specific extensions are
90              recognized.   The  use  of  --traditional  is preferred over the
91              other forms of this option.  See GNU EXTENSIONS, below, for more
92              information.
93
94       -W copyleft
95       -W copyright
96       --copyleft
97       --copyright
98              Print the short version of the GNU copyright information message
99              on the standard output and exit successfully.
100
101       -W dump-variables[=file]
102       --dump-variables[=file]
103              Print a sorted list of global variables, their types  and  final
104              values  to file.  If no file is provided, gawk uses a file named
105              awkvars.out in the current directory.
106              Having a list of all the global variables is a good way to  look
107              for  typographical  errors in your programs.  You would also use
108              this option if you have a large program with a lot of functions,
109              and  you want to be sure that your functions don't inadvertently
110              use global variables that you meant to be  local.   (This  is  a
111              particularly  easy  mistake  to  make with simple variable names
112              like i, j, and so on.)
113
       -W exec file
       --exec file
              Similar to -f; however, this option is the last one processed.
              This should be used with #! scripts, particularly for CGI
              applications, to avoid passing in options or source code (!) on
              the command line from a URL.  This option disables command-line
              variable assignments.
121
122       -W gen-po
123       --gen-po
124              Scan and parse the AWK program, and generate a  GNU  .po  format
125              file on standard output with entries for all localizable strings
126              in the program.  The program itself is not  executed.   See  the
127              GNU gettext distribution for more information on .po files.
128
129       -W help
130       -W usage
131       --help
132       --usage
133              Print a relatively short summary of the available options on the
134              standard output.  (Per the GNU Coding Standards,  these  options
135              cause an immediate, successful exit.)
136
137       -W lint[=value]
138       --lint[=value]
139              Provide warnings about constructs that are dubious or non-porta‐
140              ble to other AWK implementations.  With an optional argument  of
141              fatal,  lint warnings become fatal errors.  This may be drastic,
142              but its use will certainly encourage the development of  cleaner
143              AWK  programs.  With an optional argument of invalid, only warn‐
144              ings about things that are actually invalid are issued. (This is
145              not fully implemented yet.)
146
147       -W lint-old
148       --lint-old
149              Provide  warnings  about constructs that are not portable to the
150              original version of Unix awk.
151
152       -W non-decimal-data
153       --non-decimal-data
154              Recognize octal and hexadecimal values in input data.  Use  this
155              option with great caution!
156
157       -W posix
158       --posix
159              This  turns on compatibility mode, with the following additional
160              restrictions:
161
162              · \x escape sequences are not recognized.
163
164              · Only space and tab act as field separators when FS is set to a
165                single space, newline does not.
166
167              · You cannot continue lines after ?  and :.
168
169              · The synonym func for the keyword function is not recognized.
170
171              · The operators ** and **= cannot be used in place of ^ and ^=.
172
173              · The fflush() function is not available.
174
175       -W profile[=prof_file]
176       --profile[=prof_file]
177              Send  profiling  data to prof_file.  The default is awkprof.out.
178              When run with gawk, the profile is just a “pretty printed”  ver‐
179              sion  of the program.  When run with pgawk, the profile contains
180              execution counts of each statement in the program  in  the  left
181              margin and function call counts for each user-defined function.
182
183       -W re-interval
184       --re-interval
185              Enable  the  use  of  interval expressions in regular expression
186              matching (see Regular Expressions, below).  Interval expressions
187              were not traditionally available in the AWK language.  The POSIX
188              standard added them, to make awk and egrep consistent with  each
189              other.   However, their use is likely to break old AWK programs,
190              so gawk only provides them  if  they  are  requested  with  this
191              option, or when --posix is specified.
192
193       -W source program-text
194       --source program-text
195              Use program-text as AWK program source code.  This option allows
196              the easy intermixing of library functions (used via the  -f  and
197              --file  options)  with  source code entered on the command line.
198              It is intended primarily for medium to large AWK  programs  used
199              in shell scripts.
200
201       -W use-lc-numeric
202       --use-lc-numeric
203              This  forces  gawk  to  use the locale's decimal point character
204              when parsing input data.  Although the POSIX  standard  requires
205              this  behavior,  and gawk does so when --posix is in effect, the
206              default is to follow traditional behavior and use  a  period  as
207              the  decimal  point, even in locales where the period is not the
208              decimal point character.   This  option  overrides  the  default
209              behavior,  without  the full draconian strictness of the --posix
210              option.
211
212       -W version
213       --version
214              Print version information for this particular copy  of  gawk  on
215              the  standard  output.  This is useful mainly for knowing if the
216              current copy of gawk on your system is up to date  with  respect
217              to  whatever the Free Software Foundation is distributing.  This
218              is also useful when reporting bugs.  (Per the GNU  Coding  Stan‐
219              dards, these options cause an immediate, successful exit.)
220
221       --     Signal the end of options. This is useful to allow further argu‐
222              ments to the AWK program itself to start with a “-”.  This  pro‐
223              vides  consistency  with the argument parsing convention used by
224              most other POSIX programs.
225
226       In compatibility mode, any other options are flagged  as  invalid,  but
227       are  otherwise  ignored.   In normal operation, as long as program text
228       has been supplied, unknown options are passed on to the AWK program  in
229       the ARGV array for processing.  This is particularly useful for running
230       AWK programs via the “#!” executable interpreter mechanism.
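
       As a minimal sketch of the -F and -v options described above: the
       sample data and the variable name limit are made up for illustration,
       and the AWK shell variable is only a convenience that falls back to
       the system awk when gawk is not installed.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# -F: sets FS to a colon; -v limit=28 is assigned before BEGIN runs,
# so the BEGIN block can already see the value.
out=$(printf 'alice:30\nbob:25\n' |
    "$AWK" -F: -v limit=28 '
        BEGIN { print "limit is " limit }
        $2 > limit { print $1 }')
echo "$out"
# prints:
#   limit is 28
#   alice
```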
231

AWK PROGRAM EXECUTION

233       An AWK program consists of a sequence of pattern-action statements  and
234       optional function definitions.
235
236              pattern   { action statements }
237              function name(parameter list) { statements }
238
239       Gawk  first reads the program source from the program-file(s) if speci‐
240       fied, from arguments to --source, or from the first non-option argument
241       on  the command line.  The -f and --source options may be used multiple
242       times on the command line.  Gawk reads the program text as if  all  the
243       program-files  and  command  line  source  texts  had been concatenated
244       together.  This is useful for  building  libraries  of  AWK  functions,
245       without  having to include them in each new AWK program that uses them.
246       It also provides the ability to mix library functions with command line
247       programs.
248
249       The  environment  variable  AWKPATH specifies a search path to use when
250       finding source files named with the -f option.  If this  variable  does
251       not  exist,  the default path is ".:/usr/local/share/awk".  (The actual
252       directory may vary, depending upon how gawk was built  and  installed.)
253       If a file name given to the -f option contains a “/” character, no path
254       search is performed.
255
256       Gawk executes AWK programs in the following order.  First, all variable
257       assignments specified via the -v option are performed.  Next, gawk com‐
258       piles the program into an internal form.  Then, gawk executes the  code
259       in  the  BEGIN  block(s)  (if any), and then proceeds to read each file
260       named in the ARGV array.  If there are no files named  on  the  command
261       line, gawk reads the standard input.
262
       If a filename on the command line has the form var=val, it is treated
       as a variable assignment.  The variable var will be assigned the value
       val.  (This happens after any BEGIN block(s) have been run.)  Command
       line variable assignment is most useful for dynamically assigning
       values to the variables AWK uses to control how input is broken into
       fields and records.  It is also useful for controlling state if
       multiple passes are needed over a single data file.
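
       The timing of a command line assignment can be checked with a small
       experiment.  This is only a sketch: the variable name x is arbitrary,
       and “-” is used as an explicit file operand naming the standard input.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# x=5 appears among the file operands, so it is assigned only when gawk
# reaches it in ARGV, which is after the BEGIN block has already run.
out=$(printf 'a b\n' |
    "$AWK" '
        BEGIN { print "in BEGIN: [" x "]" }
              { print "in main: [" x "]" }' x=5 -)
echo "$out"
# prints:
#   in BEGIN: []
#   in main: [5]
```

       Using -v x=5 instead would make the value visible in the BEGIN block
       as well.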
270
271       If  the value of a particular element of ARGV is empty (""), gawk skips
272       over it.
273
274       For each record in the input, gawk tests to see if it matches any  pat‐
275       tern in the AWK program.  For each pattern that the record matches, the
276       associated action is executed.  The patterns are tested  in  the  order
277       they occur in the program.
278
279       Finally,  after  all  the input is exhausted, gawk executes the code in
280       the END block(s) (if any).
281

VARIABLES, RECORDS AND FIELDS

283       AWK variables are dynamic; they come into existence when they are first
284       used.   Their  values  are either floating-point numbers or strings, or
285       both, depending upon how they are used.  AWK also has  one  dimensional
286       arrays; arrays with multiple dimensions may be simulated.  Several pre-
287       defined variables are set as a program runs;  these  are  described  as
288       needed and summarized below.
289
290   Records
291       Normally, records are separated by newline characters.  You can control
292       how records are separated by assigning values to the built-in  variable
293       RS.   If  RS is any single character, that character separates records.
294       Otherwise, RS is a regular expression.  Text in the input that  matches
295       this  regular expression separates the record.  However, in compatibil‐
296       ity mode, only the first character of its string value is used for sep‐
297       arating  records.   If  RS  is set to the null string, then records are
298       separated by blank lines.  When RS is set to the null string, the  new‐
299       line  character  always acts as a field separator, in addition to what‐
300       ever value FS may have.
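
       The blank-line ("paragraph") mode selected by a null RS might be
       exercised as follows.  The input is invented for illustration; FS is
       left at its default, so newlines inside a record split fields as well.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# Two records separated by a blank line; the first spans two input lines.
out=$(printf 'a b\nc\n\nd e\n' |
    "$AWK" 'BEGIN { RS = "" } { print NR ": " NF " fields" }')
echo "$out"
# prints:
#   1: 3 fields
#   2: 2 fields
```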
301
302   Fields
       As each input record is read, gawk splits the record into fields, using
       the value of the FS variable as the field separator.  If FS is a single
       character, fields are separated by that character.  If FS is the null
       string, then each individual character becomes a separate field.
       Otherwise, FS is expected to be a full regular expression.  In the
       special case that FS is a single space, fields are separated by runs of
       spaces and/or tabs and/or newlines.  (But see the section POSIX
       COMPATIBILITY, below.)  NOTE: The value of IGNORECASE (see below) also
       affects how fields are split when FS is a regular expression, and how
       records are separated when RS is a regular expression.
313
314       If  the  FIELDWIDTHS  variable is set to a space separated list of num‐
315       bers, each field is expected to have fixed width, and  gawk  splits  up
316       the  record  using  the  specified widths.  The value of FS is ignored.
317       Assigning a new value to FS  overrides  the  use  of  FIELDWIDTHS,  and
318       restores the default behavior.
319
320       Each  field  in the input record may be referenced by its position, $1,
321       $2, and so on.  $0 is the whole record.  Fields need not be  referenced
322       by constants:
323
324              n = 5
325              print $n
326
327       prints the fifth field in the input record.
328
329       The  variable  NF  is  set  to  the total number of fields in the input
330       record.
331
       References to non-existent fields (i.e., fields after $NF) produce the
       null string.  However, assigning to a non-existent field (e.g.,
       $(NF+2) = 5) increases the value of NF, creates any intervening fields
       with the null string as their value, and causes the value of $0 to be
       recomputed, with the fields being separated by the value of OFS.
       References to negative numbered fields cause a fatal error.
       Decrementing NF causes the values of fields past the new value to be
       lost, and the value of $0 to be recomputed, with the fields being
       separated by the value of OFS.
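
       The effect of assigning past $NF can be seen in a small sketch.  The
       OFS value "-" is chosen only to make the rebuilt record easy to read.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# The record starts with three fields.  Assigning $(NF+2) creates an empty
# $4, sets $5, raises NF to 5, and rebuilds $0 using OFS.
out=$(echo 'a b c' |
    "$AWK" 'BEGIN { OFS = "-" } { $(NF+2) = "e"; print; print NF }')
echo "$out"
# prints:
#   a-b-c--e
#   5
```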
341
342       Assigning a value to an existing field causes the whole  record  to  be
343       rebuilt  when  $0  is  referenced.   Similarly, assigning a value to $0
344       causes the record to be resplit, creating new values for the fields.
345
346   Built-in Variables
347       Gawk's built-in variables are:
348
349       ARGC        The number of command  line  arguments  (does  not  include
350                   options to gawk, or the program source).
351
352       ARGIND      The index in ARGV of the current file being processed.
353
354       ARGV        Array of command line arguments.  The array is indexed from
355                   0 to ARGC - 1.  Dynamically changing the contents  of  ARGV
356                   can control the files used for data.
357
358       BINMODE     On  non-POSIX  systems,  specifies use of “binary” mode for
359                   all file I/O.  Numeric values of 1, 2, or 3,  specify  that
360                   input  files,  output  files,  or  all files, respectively,
361                   should use binary I/O.  String values of "r", or "w"  spec‐
362                   ify that input files, or output files, respectively, should
363                   use binary I/O.  String values of "rw" or "wr" specify that
364                   all files should use binary I/O.  Any other string value is
365                   treated as "rw", but generates a warning message.
366
367       CONVFMT     The conversion format for numbers, "%.6g", by default.
368
369       ENVIRON     An array containing the values of the current  environment.
370                   The  array  is  indexed  by the environment variables, each
371                   element being the  value  of  that  variable  (e.g.,  ENVI‐
372                   RON["HOME"]  might  be  /home/arnold).  Changing this array
373                   does not affect the environment seen by programs which gawk
374                   spawns via redirection or the system() function.
375
376       ERRNO       If  a  system  error  occurs either doing a redirection for
377                   getline, during a read for getline, or  during  a  close(),
378                   then ERRNO will contain a string describing the error.  The
379                   value is subject to translation in non-English locales.
380
       FIELDWIDTHS A white-space separated list of field widths.  When set,
                   gawk parses the input into fields of fixed width, instead
                   of using the value of the FS variable as the field
                   separator.
385
386       FILENAME    The name of the current input file.  If no files are speci‐
387                   fied on the command line, the value  of  FILENAME  is  “-”.
388                   However,  FILENAME  is  undefined  inside  the  BEGIN block
389                   (unless set by getline).
390
391       FNR         The input record number in the current input file.
392
393       FS          The input field separator, a space by default.  See Fields,
394                   above.
395
396       IGNORECASE  Controls the case-sensitivity of all regular expression and
397                   string operations.  If IGNORECASE  has  a  non-zero  value,
398                   then  string  comparisons  and  pattern  matching in rules,
399                   field splitting with FS, record separating with RS, regular
400                   expression  matching  with  ~  and  !~,  and  the gensub(),
401                   gsub(), index(), match(), split(), and sub() built-in func‐
402                   tions  all ignore case when doing regular expression opera‐
403                   tions.  NOTE: Array subscripting is not affected.  However,
404                   the asort() and asorti() functions are affected.
405                   Thus,  if IGNORECASE is not equal to zero, /aB/ matches all
406                   of the strings "ab", "aB", "Ab", and "AB".  As with all AWK
407                   variables,  the initial value of IGNORECASE is zero, so all
408                   regular expression and string operations are normally case-
409                   sensitive.  Under Unix, the full ISO 8859-1 Latin-1 charac‐
410                   ter set is used when ignoring case.  As of gawk 3.1.4,  the
411                   case  equivalencies  are fully locale-aware, based on the C
412                   <ctype.h> facilities such as isalpha(), and toupper().
413
414       LINT        Provides dynamic control of the --lint option  from  within
415                   an AWK program.  When true, gawk prints lint warnings. When
416                   false,  it  does  not.   When  assigned  the  string  value
417                   "fatal",  lint  warnings  become fatal errors, exactly like
418                   --lint=fatal.  Any other true value just prints warnings.
419
420       NF          The number of fields in the current input record.
421
422       NR          The total number of input records seen so far.
423
424       OFMT        The output format for numbers, "%.6g", by default.
425
426       OFS         The output field separator, a space by default.
427
428       ORS         The output record separator, by default a newline.
429
430       PROCINFO    The elements of this array provide  access  to  information
431                   about  the running AWK program.  On some systems, there may
432                   be elements in the array,  "group1"  through  "groupn"  for
433                   some  n,  which  is the number of supplementary groups that
434                   the process has.  Use the in operator  to  test  for  these
435                   elements.   The  following  elements  are  guaranteed to be
436                   available:
437
438                   PROCINFO["egid"]    the  value  of  the  getegid(2)  system
439                                       call.
440
441                   PROCINFO["euid"]    the  value  of  the  geteuid(2)  system
442                                       call.
443
444                   PROCINFO["FS"]      "FS" if field splitting with FS  is  in
445                                       effect,   or   "FIELDWIDTHS"  if  field
446                                       splitting  with   FIELDWIDTHS   is   in
447                                       effect.
448
449                   PROCINFO["gid"]     the value of the getgid(2) system call.
450
451                   PROCINFO["pgrpid"]  the  process  group  ID  of the current
452                                       process.
453
454                   PROCINFO["pid"]     the process ID of the current process.
455
456                   PROCINFO["ppid"]    the parent process ID  of  the  current
457                                       process.
458
459                   PROCINFO["uid"]     the value of the getuid(2) system call.
460
461                   PROCINFO["version"] the version of gawk.  This is available
462                                       from version 3.1.4 and later.
463
464       RS          The input record separator, by default a newline.
465
466       RT          The record terminator.  Gawk sets RT to the input text that
467                   matched  the  character  or regular expression specified by
468                   RS.
469
470       RSTART      The index of the first character matched by match();  0  if
471                   no  match.   (This  implies that character indices start at
472                   one.)
473
474       RLENGTH     The length of the string  matched  by  match();  -1  if  no
475                   match.
476
477       SUBSEP      The character used to separate multiple subscripts in array
478                   elements, by default "\034".
479
480       TEXTDOMAIN  The text domain of the AWK program; used to find the local‐
481                   ized translations for the program's strings.
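
       Of the variables above, RSTART and RLENGTH are perhaps the easiest to
       misread, so here is a brief sketch of the values match() leaves in
       them.  The test string and patterns are arbitrary.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# A successful match() reports a 1-based start and the matched length;
# a failed match() sets RSTART to 0 and RLENGTH to -1.
out=$(echo 'foobar' | "$AWK" '{
    if (match($0, /o+/))  print RSTART, RLENGTH
    if (!match($0, /z/))  print RSTART, RLENGTH
}')
echo "$out"
# prints:
#   2 2
#   0 -1
```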
482
483   Arrays
484       Arrays  are  subscripted  with an expression between square brackets ([
485       and ]).  If the expression is an expression list (expr, expr ...)  then
486       the  array subscript is a string consisting of the concatenation of the
487       (string) value of each expression, separated by the value of the SUBSEP
488       variable.   This  facility  is  used  to  simulate multiply dimensioned
489       arrays.  For example:
490
491              i = "A"; j = "B"; k = "C"
492              x[i, j, k] = "hello, world\n"
493
494       assigns the string "hello, world\n" to the element of the array x which
495       is indexed by the string "A\034B\034C".  All arrays in AWK are associa‐
496       tive, i.e. indexed by string values.
497
498       The special operator in may be used to test if an array  has  an  index
499       consisting of a particular value.
500
501              if (val in array)
502                   print array[val]
503
504       If the array has multiple subscripts, use (i, j) in array.
505
506       The in construct may also be used in a for loop to iterate over all the
507       elements of an array.
508
509       An element may be deleted from an array  using  the  delete  statement.
510       The  delete statement may also be used to delete the entire contents of
511       an array, just by specifying the array name without a subscript.
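
       The array facilities described in this section can be combined in one
       short sketch.  The array name x and the helper array parts are
       illustrative only.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN {
    x["A", "B"] = "hello"                 # stored under the key "A" SUBSEP "B"
    if (("A", "B") in x) print "found"    # membership test with two subscripts
    for (k in x) {                        # iterate over the combined keys
        split(k, parts, SUBSEP)           # recover the individual subscripts
        print parts[1] "+" parts[2]
    }
    delete x["A", "B"]                    # remove the single element
    if (!(("A", "B") in x)) print "deleted"
}')
echo "$out"
# prints:
#   found
#   A+B
#   deleted
```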
512
513   Variable Typing And Conversion
514       Variables and fields may be (floating point) numbers,  or  strings,  or
515       both.  How the value of a variable is interpreted depends upon its con‐
516       text.  If used in a numeric expression, it will be treated as a number;
517       if used as a string it will be treated as a string.
518
519       To force a variable to be treated as a number, add 0 to it; to force it
520       to be treated as a string, concatenate it with the null string.
521
522       When a string must be converted to a number, the conversion  is  accom‐
523       plished  using  strtod(3).   A number is converted to a string by using
524       the value of CONVFMT as  a  format  string  for  sprintf(3),  with  the
525       numeric  value  of  the variable as the argument.  However, even though
526       all numbers in AWK are floating-point, integral values are always  con‐
527       verted as integers.  Thus, given
528
529              CONVFMT = "%2.2f"
530              a = 12
531              b = a ""
532
533       the variable b has a string value of "12" and not "12.00".
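
       The example above can be run as is.  This sketch adds a non-integral
       value, c, which is not in the original example, to show where CONVFMT
       does take effect.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# The integral value 12 is converted as an integer and ignores CONVFMT;
# the non-integral value 12.5 is formatted with "%2.2f".
out=$("$AWK" 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""
    c = 12.5 ""
    print b
    print c
}')
echo "$out"
# prints:
#   12
#   12.50
```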
534
535       When  operating  in  POSIX  mode (such as with the --posix command line
536       option), beware that locale settings may interfere with the way decimal
537       numbers are treated: the decimal separator of the numbers you are feed‐
538       ing to gawk must conform to what your locale  would  expect,  be  it  a
539       comma (,) or a period (.).
540
541       Gawk  performs  comparisons  as  follows: If two variables are numeric,
542       they are compared numerically.  If one value is numeric and  the  other
543       has  a  string  value  that is a “numeric string,” then comparisons are
544       also done numerically.  Otherwise, the numeric value is converted to  a
545       string and a string comparison is performed.  Two strings are compared,
546       of course, as strings.
547
       Note that string constants, such as "57", are not numeric strings; they
       are string constants.  The idea of “numeric string” only applies to
       fields, getline input, FILENAME, ARGV elements, ENVIRON elements, and
       the elements of an array created by split() that are numeric strings.
       The basic idea is that user input, and only user input, that looks
       numeric, should be treated that way.
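
       The difference between numeric strings and string constants shows up
       when the same digits are compared both ways.  In this sketch the
       variable names a and b are arbitrary.

```shell
# Prefer gawk; fall back to any POSIX awk (assumes at least one is installed).
AWK=$(command -v gawk || command -v awk)

# $1 and $2 come from input and look numeric, so they compare as numbers:
# 10 > 9 is true.  The constants "10" and "9" compare as strings, and "1"
# sorts before "9", so that comparison is false.
out=$(echo '10 9' | "$AWK" '{
    a = ($1 > $2)
    b = ("10" > "9")
    print a, b
}')
echo "$out"
# prints: 1 0
```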
554
555       Uninitialized  variables  have the numeric value 0 and the string value
556       "" (the null, or empty, string).
557
   Octal and Hexadecimal Constants
       Starting with version 3.1 of gawk, you may use C-style octal and
       hexadecimal constants in your AWK program source code.  For example,
       the octal value 011 is equal to decimal 9, and the hexadecimal value
       0x11 is equal to decimal 17.
563
564   String Constants
565       String  constants  in  AWK are sequences of characters enclosed between
566       double quotes (").  Within strings, certain escape sequences are recog‐
567       nized, as in C.  These are:
568
569       \\   A literal backslash.
570
571       \a   The “alert” character; usually the ASCII BEL character.
572
573       \b   backspace.
574
575       \f   form-feed.
576
577       \n   newline.
578
579       \r   carriage return.
580
581       \t   horizontal tab.
582
583       \v   vertical tab.
584
585       \xhex digits
586            The character represented by the string of hexadecimal digits fol‐
587            lowing the \x.  As in ANSI C, all following hexadecimal digits are
588            considered part of the escape sequence.  (This feature should tell
589            us something about language design by committee.)  E.g., "\x1B" is
590            the ASCII ESC (escape) character.
591
592       \ddd The  character  represented  by the 1-, 2-, or 3-digit sequence of
593            octal digits.  E.g., "\033" is the ASCII ESC (escape) character.
594
595       \c   The literal character c.
596
597       The escape sequences may also be used inside constant  regular  expres‐
598       sions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).
599
600       In compatibility mode, the characters represented by octal and hexadec‐
601       imal escape sequences  are  treated  literally  when  used  in  regular
602       expression constants.  Thus, /a\52b/ is equivalent to /a\*b/.
603

PATTERNS AND ACTIONS

605       AWK is a line-oriented language.  The pattern comes first, and then the
606       action.  Action statements are enclosed in { and }.  Either the pattern
607       may be missing, or the action may be missing, but, of course, not both.
608       If the pattern is missing, the action  is  executed  for  every  single
609       record of input.  A missing action is equivalent to
610
611              { print }
612
613       which prints the entire record.
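
       The two defaults can be sketched as follows.  The function names are
       illustrative only, and nothing here is specific to gawk, so the
       programs are run with plain awk.

```shell
# Hypothetical helper names, used only for this illustration.
pattern_only() { awk '/error/'; }               # no action: { print } is implied
action_only()  { awk '{ print NR ": " $0 }'; }  # no pattern: runs for every record

printf 'ok\nerror 1\n' | pattern_only   # prints: error 1
printf 'ok\nerror 1\n' | action_only    # prints: 1: ok, then 2: error 1
```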
614
615       Comments  begin  with  the “#” character, and continue until the end of
616       the line.  Blank lines may be used to separate statements.  Normally, a
617       statement ends with a newline; however, this is not the case for lines
618       ending in a “,”, {, ?, :, &&, or ||.  Lines ending in do or  else  also
619       have  their  statements  automatically continued on the following line.
620       In other cases, a line can be continued by ending it  with  a  “\”,  in
621       which case the newline will be ignored.
622
623       Multiple  statements  may  be put on one line by separating them with a
624       “;”.  This applies to both the statements within the action part  of  a
625       pattern-action  pair (the usual case), and to the pattern-action state‐
626       ments themselves.
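
       A short sketch of these continuation rules, assuming a POSIX-compatible
       awk on the PATH (the function name continuation_demo is hypothetical):

```shell
continuation_demo() {
    awk 'BEGIN {
        # Two statements on one line, separated by ";".
        x = 1; y = 2
        # A line ending in && continues on the next line automatically.
        ok = (x == 1 &&
              y == 2)
        # Elsewhere, a trailing "\" joins the next line to this one.
        total = x \
              + y
        print "ok=" ok, "total=" total
    }'
}

continuation_demo    # prints: ok=1 total=3
```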
627
628   Patterns
629       AWK patterns may be one of the following:
630
631              BEGIN
632              END
633              /regular expression/
634              relational expression
635              pattern && pattern
636              pattern || pattern
637              pattern ? pattern : pattern
638              (pattern)
639              ! pattern
640              pattern1, pattern2
641
642       BEGIN and END are two special kinds of patterns which  are  not  tested
643       against  the  input.  The action parts of all BEGIN patterns are merged
644       as if all the statements had been written  in  a  single  BEGIN  block.
645       They  are executed before any of the input is read.  Similarly, all the
646       END blocks are merged, and executed when all the input is exhausted (or
647       when  an exit statement is executed).  BEGIN and END patterns cannot be
648       combined with other patterns in pattern  expressions.   BEGIN  and  END
649       patterns cannot have missing action parts.
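
       As an illustration of when each kind of rule runs, the sketch below
       sums the first field of its input.  The function name sum_first_field
       is an assumption made for this example; only POSIX features are used.

```shell
sum_first_field() {
    awk '
        BEGIN { print "start" }        # before any input is read
        { total += $1 }                # for every input record
        END   { print "total", total } # after all input is exhausted
    '
}

printf '1\n2\n3\n' | sum_first_field   # prints: start, then total 6
```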
650
651       For /regular expression/ patterns, the associated statement is executed
652       for each input record that matches  the  regular  expression.   Regular
653       expressions  are  the  same  as  those  in egrep(1), and are summarized
654       below.
655
656       A relational expression may use any of the operators defined  below  in
657       the  section  on  actions.  These generally test whether certain fields
658       match certain regular expressions.
659
660       The &&, ||, and !  operators are logical AND, logical OR,  and  logical
661       NOT,  respectively, as in C.  They do short-circuit evaluation, also as
662       in C, and are used for combining more  primitive  pattern  expressions.
663       As  in  most  languages, parentheses may be used to change the order of
664       evaluation.
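
       A hedged example of relational expressions combined with the logical
       operators; the function names and the sample data are invented for
       illustration, and the programs run under any POSIX awk, gawk included.

```shell
# Records whose first field starts with "a" AND whose second field exceeds 10.
both_conditions() { awk '$1 ~ /^a/ && $2 > 10'; }

# Records whose second field is below 10 OR whose first field is "banana".
either_condition() { awk '$2 < 10 || $1 == "banana"'; }

printf 'apple 5\navocado 20\nbanana 30\n' | both_conditions    # avocado 20
printf 'apple 5\navocado 20\nbanana 30\n' | either_condition   # apple 5, banana 30
```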
665
666       The ?: operator is like the same operator in C.  If the  first  pattern
667       is true then the pattern used for testing is the second pattern, other‐
668       wise it is the third.  Only one of the second  and  third  patterns  is
669       evaluated.
670
671       The pattern1, pattern2 form of an expression is called a range pattern.
672       It matches all input records starting with a record that  matches  pat‐
673       tern1,  and continuing until a record that matches pattern2, inclusive.
674       It does not combine with any other sort of pattern expression.
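
       A range pattern might be used as in the following sketch, which prints
       everything between two marker lines, markers included.  The marker
       strings and the function name log_section are hypothetical.

```shell
log_section() {
    # The range opens on a record matching the first pattern and closes
    # on the next record matching the second; both ends are printed.
    awk '/^BEGIN_LOG/, /^END_LOG/'
}

printf 'x\nBEGIN_LOG\na\nEND_LOG\ny\n' | log_section
# prints: BEGIN_LOG, a, END_LOG (one per line)
```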
675
676   Regular Expressions
677       Regular expressions are the extended kind found  in  egrep.   They  are
678       composed of characters as follows:
679
680       c          matches the non-metacharacter c.
681
682       \c         matches the literal character c.
683
684       .          matches any character including newline.
685
686       ^          matches the beginning of a string.
687
688       $          matches the end of a string.
689
690       [abc...]   character list, matches any of the characters abc....
691
692       [^abc...]  negated character list, matches any character except abc....
693
694       r1|r2      alternation: matches either r1 or r2.
695
696       r1r2       concatenation: matches r1, and then r2.
697
698       r+         matches one or more r's.
699
700       r*         matches zero or more r's.
701
702       r?         matches zero or one r's.
703
704       (r)        grouping: matches r.
705
706       r{n}
707       r{n,}
708       r{n,m}     One  or two numbers inside braces denote an interval expres‐
709                  sion.  If there is one number in the braces,  the  preceding
710                  regular  expression r is repeated n times.  If there are two
711                  numbers separated by a comma, r is repeated n  to  m  times.
712                  If  there  is  one  number  followed  by  a comma, then r is
713                  repeated at least n times.
714                  Interval expressions are only available if either --posix or
715                  --re-interval is specified on the command line.
716
717       \y         matches  the empty string at either the beginning or the end
718                  of a word.
719
720       \B         matches the empty string within a word.
721
722       \<         matches the empty string at the beginning of a word.
723
724       \>         matches the empty string at the end of a word.
725
726       \w         matches any word-constituent character  (letter,  digit,  or
727                  underscore).
728
729       \W         matches any character that is not word-constituent.
730
731       \`         matches  the  empty  string  at  the  beginning  of a buffer
732                  (string).
733
734       \'         matches the empty string at the end of a buffer.
735
736       The escape sequences that are valid in string constants (see above) are
737       also valid in regular expressions.
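
       To make the basic operators concrete, here is a small sketch that
       combines anchors, grouping, repetition, and alternation.  It avoids the
       gawk-specific operators and interval expressions, so it is invoked as
       plain awk; the function name match_demo is illustrative.

```shell
match_demo() {
    # ^ and $ anchor the match; (ab)+ is one or more "ab"; c? is an optional c;
    # [0-9]* is zero or more digits; (x|y) is alternation inside a group.
    awk '/^(ab)+c?[0-9]*(x|y)$/'
}

printf 'abx\nababc42y\nabcz\nxab\n' | match_demo   # prints: abx, ababc42y
```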
738
739       Character  classes  are  a feature introduced in the POSIX standard.  A
740       character class is a special notation for describing lists  of  charac‐
741       ters  that  have  a specific attribute, but where the actual characters
742       themselves can vary from country to country and/or from  character  set
743       to  character  set.   For  example, the notion of what is an alphabetic
744       character differs in the USA and in France.
745
746       A character class is only valid in  a  regular  expression  inside  the
747       brackets  of a character list.  Character classes consist of [:, a key‐
748       word denoting the class, and :].  The character classes defined by  the
749       POSIX standard are:
750
751       [:alnum:]  Alphanumeric characters.
752
753       [:alpha:]  Alphabetic characters.
754
755       [:blank:]  Space or tab characters.
756
757       [:cntrl:]  Control characters.
758
759       [:digit:]  Numeric characters.
760
761       [:graph:]  Characters that are both printable and visible.  (A space is
762                  printable, but not visible, while an a is both.)
763
764       [:lower:]  Lower-case alphabetic characters.
765
766       [:print:]  Printable characters (characters that are not control
767                  characters).
768
769       [:punct:]  Punctuation characters (characters that are not letters,
770                  digits, control characters, or space characters).
771
772       [:space:]  Space characters (such as space, tab, and formfeed, to  name
773                  a few).
774
775       [:upper:]  Upper-case alphabetic characters.
776
777       [:xdigit:] Characters that are hexadecimal digits.
778
779       For  example,  before the POSIX standard, to match alphanumeric charac‐
780       ters, you would have had to write /[A-Za-z0-9]/.  If your character set
781       had  other  alphabetic characters in it, this would not match them, and
782       if your character set collated differently from ASCII, this  might  not
783       even match the ASCII alphanumeric characters.  With the POSIX character
784       classes, you can write /[[:alnum:]]/, and this matches  the  alphabetic
785       and numeric characters in your character set, no matter what it is.
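
       As a sketch, a pattern that accepts C-style identifiers can be written
       with character classes instead of explicit ranges.  The function name
       identifiers is made up, and the awk in use is assumed to support POSIX
       character classes (gawk does in its default mode).

```shell
identifiers() {
    # First character: a letter or underscore; the rest: letters, digits,
    # or underscores.  Each class sits inside the brackets of a list.
    awk '/^[[:alpha:]_][[:alnum:]_]*$/'
}

printf 'foo1\n1foo\n_x\nbar-baz\n' | identifiers   # prints: foo1, _x
```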
786
787       Two  additional special sequences can appear in character lists.  These
788       apply to non-ASCII  character  sets,  which  can  have  single  symbols
789       (called  collating  elements)  that  are represented with more than one
790       character, as well as several characters that are equivalent  for  col‐
791       lating,  or  sorting,  purposes.   (E.g.,  in French, a plain “e” and a
792       grave-accented “è” are equivalent.)
793
794       Collating Symbols
795              A  collating  symbol  is  a  multi-character  collating  element
796              enclosed  in [.  and .].  For example, if ch is a collating ele‐
797              ment, then [[.ch.]]  is a regular expression that  matches  this
798              collating  element,  while  [ch]  is  a  regular expression that
799              matches either c or h.
800
801       Equivalence Classes
802              An equivalence class is a locale-specific name  for  a  list  of
803              characters  that are equivalent.  The name is enclosed in [= and
804              =].  For example, the name e might be used to represent  all  of
805              “e,” “é,” and “è.”  In this case, [[=e=]] is a regular
806              expression that matches any of e, é, or è.
807
808       These features are very valuable in non-English speaking locales.   The
809       library  functions  that gawk uses for regular expression matching cur‐
810       rently only recognize POSIX character classes; they  do  not  recognize
811       collating symbols or equivalence classes.
812
813       The  \y, \B, \<, \>, \w, \W, \`, and \' operators are specific to gawk;
814       they are extensions based on facilities in the GNU  regular  expression
815       libraries.
816
817       The various command line options control how gawk interprets characters
818       in regular expressions.
819
820       No options
821              In the default case, gawk provides all the facilities of POSIX
822              regular  expressions  and  the  GNU regular expression operators
823              described above.  However, interval  expressions  are  not  sup‐
824              ported.
825
826       --posix
827              Only POSIX regular expressions are supported; the GNU operators
828              are not special (e.g., \w matches a literal w).  Interval
829              expressions are allowed.
830
831       --traditional
832              Traditional  Unix  awk regular expressions are matched.  The GNU
833              operators are not special, interval expressions are  not  avail‐
834              able,  and  neither are the POSIX character classes ([[:alnum:]]
835              and so on).   Characters  described  by  octal  and  hexadecimal
836              escape  sequences  are treated literally, even if they represent
837              regular expression metacharacters.
838
839       --re-interval
840              Allow interval  expressions  in  regular  expressions,  even  if
841              --traditional has been provided.
842
843   Actions
844       Action  statements  are enclosed in braces, { and }.  Action statements
845       consist of the usual assignment, conditional,  and  looping  statements
846       found  in  most  languages.   The  operators,  control  statements, and
847       input/output statements available are patterned after those in C.
848
849   Operators
850       The operators in AWK, in order of decreasing precedence, are:
851
852       (...)       Grouping
853
854       $           Field reference.
855
856       ++ --       Increment and decrement, both prefix and postfix.
857
858       ^           Exponentiation (** may  also  be  used,  and  **=  for  the
859                   assignment operator).
860
861       + - !       Unary plus, unary minus, and logical negation.
862
863       * / %       Multiplication, division, and modulus.
864
865       + -         Addition and subtraction.
866
867       space       String concatenation.
868
869       | |&        Piped I/O for getline, print, and printf.
870
871       < > <= >= != ==
872                   The regular relational operators.
873
874       ~ !~        Regular  expression match, negated match.  NOTE: Do not use
875                   a constant regular expression (/foo/) on the left-hand side
876                   of  a  ~  or !~.  Only use one on the right-hand side.  The
877                   expression /foo/ ~ exp has  the  same  meaning  as  (($0  ~
878                   /foo/) ~ exp).  This is usually not what was intended.
879
880       in          Array membership.
881
882       &&          Logical AND.
883
884       ||          Logical OR.
885
886       ?:          The  C  conditional  expression.  This has the form expr1 ?
887                   expr2 : expr3.  If expr1 is true, the value of the  expres‐
888                   sion  is  expr2,  otherwise it is expr3.  Only one of expr2
889                   and expr3 is evaluated.
890
891       = += -= *= /= %= ^=
892                   Assignment.  Both absolute assignment  (var  =  value)  and
893                   operator-assignment (the other forms) are supported.
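
       A few consequences of this ordering, as a minimal sketch (the function
       name precedence_demo is hypothetical; only POSIX operators appear, so
       any awk should agree with gawk here):

```shell
precedence_demo() {
    awk 'BEGIN {
        print 2 + 3 * 4                # * binds tighter than +: 14
        print 2 * 3 ^ 2                # ^ binds tighter than *: 18
        print 1 " " 2 + 3              # + binds tighter than concatenation: 1 5
        print (1 > 2 ? "yes" : "no")   # conditional expression: no
    }'
}

precedence_demo
```

       The last print is parenthesized because an unparenthesized > in a
       print statement is taken as an output redirection.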
894
895   Control Statements
896       The control statements are as follows:
897
898              if (condition) statement [ else statement ]
899              while (condition) statement
900              do statement while (condition)
901              for (expr1; expr2; expr3) statement
902              for (var in array) statement
903              break
904              continue
905              delete array[index]
906              delete array
907              exit [ expression ]
908              { statements }
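
       The sketch below uses several of these statements to count words.  The
       function name word_count is illustrative; the output is piped through
       sort(1) because for (var in array) visits indices in an unspecified
       order.

```shell
word_count() {
    awk '{
        for (i = 1; i <= NF; i++) {    # C-style loop over the fields
            if ($i ~ /^[0-9]+$/)
                continue               # skip purely numeric fields
            count[$i]++
        }
    }
    END {
        for (w in count)               # loop over the array indices
            print w, count[w]
    }' | sort
}

printf 'a b 7 a\nb a\n' | word_count   # prints: a 3, then b 2
```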
909
910   I/O Statements
911       The input/output statements are as follows:
912
913       close(file [, how])   Close file, pipe or co-process.  The optional how
914                             should only be used when closing  one  end  of  a
915                             two-way  pipe  to  a  co-process.   It  must be a
916                             string value, either "to" or "from".
917
918       getline               Set $0 from next input record; set NF, NR, FNR.
919
920       getline <file         Set $0 from next record of file; set NF.
921
922       getline var           Set var from next input record; set NR, FNR.
923
924       getline var <file     Set var from next record of file.
925
926       command | getline [var]
927                             Run command piping the output either into  $0  or
928                             var, as above.
929
930       command |& getline [var]
931                             Run  command  as  a  co-process piping the output
932                             either into $0 or var,  as  above.   Co-processes
933                             are  a  gawk  extension.   (command can also be a
934                             socket.  See the subsection Special  File  Names,
935                             below.)
936
937       next                  Stop  processing  the  current input record.  The
938                             next input record is read and  processing  starts
939                             over  with  the first pattern in the AWK program.
940                             If the end of the input data is reached, the  END
941                             block(s), if any, are executed.
942
943       nextfile              Stop processing the current input file.  The next
944                             input record read comes from the next input file.
945                             FILENAME  and ARGIND are updated, FNR is reset to
946                             1, and processing starts over with the first pat‐
947                             tern  in the AWK program. If the end of the input
948                             data is reached, the END block(s),  if  any,  are
949                             executed.
950
951       print                 Prints  the current record.  The output record is
952                             terminated with the value of the ORS variable.
953
954       print expr-list       Prints expressions.  Each expression is separated
955                             by  the  value  of  the OFS variable.  The output
956                             record is terminated with the value  of  the  ORS
957                             variable.
958
959       print expr-list >file Prints  expressions  on file.  Each expression is
960                             separated by the value of the OFS variable.   The
961                             output record is terminated with the value of the
962                             ORS variable.
963
964       printf fmt, expr-list Format and print.
965
966       printf fmt, expr-list >file
967                             Format and print on file.
968
969       system(cmd-line)      Execute the command cmd-line, and return the exit
970                             status.   (This may not be available on non-POSIX
971                             systems.)
972
973       fflush([file])        Flush any buffers associated with the open output
974                             file  or  pipe  file.   If  file is missing, then
975                             standard output is flushed.  If file is the  null
976                             string, then all open output files and pipes have
977                             their buffers flushed.
978
979       Additional output redirections are allowed for print and printf.
980
981       print ... >> file
982              Appends output to the file.
983
984       print ... | command
985              Writes on a pipe.
986
987       print ... |& command
988              Sends data to a co-process or socket.  (See also the  subsection
989              Special File Names, below.)
990
991       The  getline  command returns 1 on success, 0 on end of file, and -1 on
992       an error.  Upon an error, ERRNO contains a string describing the  prob‐
993       lem.
994
995       NOTE:  Failure  in  opening a two-way socket will result in a non-fatal
996       error being returned to the calling function.  If  using  a  pipe,  co-
997       process,  or  socket to getline, or from print or printf within a loop,
998       you must use close() to create new instances of the command or  socket.
999       AWK  does  not automatically close pipes, sockets, or co-processes when
1000       they return EOF.
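
       As an illustration of the getline return value and of the need for
       close(), the following sketch reads the output of the same command
       twice.  The command string and the function name read_command_twice
       are assumptions made for the example.

```shell
read_command_twice() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        for (pass = 1; pass <= 2; pass++) {
            # getline returns 1 per record read, 0 at end of output, -1 on error.
            while ((cmd | getline line) > 0)
                print pass ": " line
            # Closing the pipe lets the next pass start a new instance of cmd.
            close(cmd)
        }
    }'
}

read_command_twice   # prints: 1: one, 1: two, 2: one, 2: two
```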
1001
1002   The printf Statement
1003       The AWK versions of the printf statement and  sprintf()  function  (see
1004       below) accept the following conversion specification formats:
1005
1006       %c      An ASCII character.  If the argument used for %c is numeric, it
1007               is treated as a character and printed.  Otherwise, the argument
1008               is assumed to be a string, and only the first character of that
1009               string is printed.
1010
1011       %d, %i  A decimal number (the integer part).
1012
1013       %e, %E  A floating point number of the form [-]d.dddddde[+-]dd.  The %E
1014               format uses E instead of e.
1015
1016       %f, %F  A floating point number of the form [-]ddd.dddddd.  If the sys‐
1017               tem library supports it, %F is available as well. This is  like
1018               %f,  but  uses  capital  letters for special “not a number” and
1019               “infinity” values. If %F is not available, gawk uses %f.
1020
1021       %g, %G  Use %e or %f conversion, whichever is shorter, with nonsignifi‐
1022               cant zeros suppressed.  The %G format uses %E instead of %e.
1023
1024       %o      An unsigned octal number (also an integer).
1025
1026       %u      An unsigned decimal number (again, an integer).
1027
1028       %s      A character string.
1029
1030       %x, %X  An  unsigned  hexadecimal  number  (an integer).  The %X format
1031               uses ABCDEF instead of abcdef.
1032
1033       %%      A single % character; no argument is converted.
1034
1035       Optional, additional parameters may lie between the % and  the  control
1036       letter:
1037
1038       count$ Use the count'th argument at this point in the formatting.  This
1039              is called a positional specifier and is intended  primarily  for
1040              use  in translated versions of format strings, not in the origi‐
1041              nal text of an AWK program.  It is a gawk extension.
1042
1043       -      The expression should be left-justified within its field.
1044
1045       space  For numeric conversions, prefix positive values  with  a  space,
1046              and negative values with a minus sign.
1047
1048       +      The  plus sign, used before the width modifier (see below), says
1049              to always supply a sign for numeric  conversions,  even  if  the
1050              data  to  be  formatted  is positive.  The + overrides the space
1051              modifier.
1052
1053       #      Use an “alternate form” for certain control  letters.   For  %o,
1054              supply a leading zero.  For %x and %X, supply a leading 0x or
1055              0X for a nonzero result.  For %e, %E, %f, and %F, the result
1056              always contains a decimal point.  For %g and %G, trailing zeros
1057              are not removed from the result.
1058
1059       0      A leading 0 (zero) acts as a flag that indicates output should
1060              be  padded  with zeroes instead of spaces.  This applies only to
1061              the numeric output formats.  This flag only has an  effect  when
1062              the field width is wider than the value to be printed.
1063
1064       width  The field should be padded to this width.  The field is normally
1065              padded with spaces.  If the 0 flag has been used, it  is  padded
1066              with zeroes.
1067
1068       .prec  A number that specifies the precision to use when printing.  For
1069              the %e, %E, %f, and %F formats, this specifies the number of
1070              digits you want printed to the right of the decimal point.  For
1071              the %g and %G formats, it specifies the maximum number of
1072              significant digits.  For the %d, %o, %i, %u, %x, and %X formats, it
1073              specifies the minimum number of digits to  print.   For  %s,  it
1074              specifies  the maximum number of characters from the string that
1075              should be printed.
1076
1077       The dynamic width and prec capabilities of the ANSI C printf() routines
1078       are supported.  A * in place of either the width or prec specifications
1079       causes their values to be taken from the argument  list  to  printf  or
1080       sprintf().   To use a positional specifier with a dynamic width or pre‐
1081       cision, supply the count$ after the * in the format string.  For  exam‐
1082       ple, "%3$*2$.*1$s".
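
       A brief sketch of a few conversions and modifiers working together.
       The function name format_demo is hypothetical, and the example sticks
       to conversions that POSIX awk also defines.

```shell
format_demo() {
    awk 'BEGIN {
        # %-5s: left-justified in 5 columns; %5.2f: width 5, 2 decimals;
        # %03d: zero-padded to width 3; %x: hexadecimal; %c: character code 65.
        printf "%-5s|%5.2f|%03d|%x|%c\n", "ab", 3.14159, 7, 255, 65
    }'
}

format_demo    # prints: ab   | 3.14|007|ff|A
```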
1083
1084   Special File Names
1085       When  doing I/O redirection from either print or printf into a file, or
1086       via getline from a file,  gawk  recognizes  certain  special  filenames
1087       internally.   These  filenames  allow  access  to open file descriptors
1088       inherited from gawk's parent process (usually the shell).   These  file
1089       names  may  also  be  used on the command line to name data files.  The
1090       filenames are:
1091
1092       /dev/stdin  The standard input.
1093
1094       /dev/stdout The standard output.
1095
1096       /dev/stderr The standard error output.
1097
1098       /dev/fd/n   The file associated with the open file descriptor n.
1099
1100       These are particularly useful for error messages.  For example:
1101
1102              print "You blew it!" > "/dev/stderr"
1103
1104       whereas you would otherwise have to use
1105
1106              print "You blew it!" | "cat 1>&2"
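
       The effect can be checked from the shell, as in this sketch; the
       function name report and its messages are invented for illustration.

```shell
report() {
    awk 'BEGIN {
        print "result"                                  # standard output
        print "warning: check input" > "/dev/stderr"    # standard error
    }'
}

report 2>/dev/null    # prints only: result
```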
1107
1108       The following special filenames may be  used  with  the  |&  co-process
1109       operator for creating TCP/IP network connections.
1110
1111       /inet/tcp/lport/rhost/rport  File  for  TCP/IP connection on local port
1112                                    lport to remote host rhost on remote  port
1113                                    rport.  Use a port of 0 to have the system
1114                                    pick a port.
1115
1116       /inet/udp/lport/rhost/rport  Similar, but use UDP/IP instead of TCP/IP.
1117
1118       /inet/raw/lport/rhost/rport  Reserved for future use.
1119
1120       Other special filenames provide access to information about the running
1121       gawk  process.   These  filenames  are  now obsolete.  Use the PROCINFO
1122       array to obtain the information they provide.  The filenames are:
1123
1124       /dev/pid    Reading this file returns the process  ID  of  the  current
1125                   process, in decimal, terminated with a newline.
1126
1127       /dev/ppid   Reading this file returns the parent process ID of the cur‐
1128                   rent process, in decimal, terminated with a newline.
1129
1130       /dev/pgrpid Reading this file returns the process group ID of the  cur‐
1131                   rent process, in decimal, terminated with a newline.
1132
1133       /dev/user   Reading this file returns a single record terminated with a
1134                   newline.  The fields are separated with spaces.  $1 is  the
1135                   value  of the getuid(2) system call, $2 is the value of the
1136                   geteuid(2) system call, $3 is the value  of  the  getgid(2)
1137                   system  call,  and $4 is the value of the getegid(2) system
1138                   call.  If there are any additional  fields,  they  are  the
1139                   group  IDs  returned  by getgroups(2).  Multiple groups may
1140                   not be supported on all systems.
1141
1142   Numeric Functions
1143       AWK has the following built-in arithmetic functions:
1144
1145       atan2(y, x)   Returns the arctangent of y/x in radians.
1146
1147       cos(expr)     Returns the cosine of expr, which is in radians.
1148
1149       exp(expr)     The exponential function.
1150
1151       int(expr)     Truncates to integer.
1152
1153       log(expr)     The natural logarithm function.
1154
1155       rand()        Returns a random number N, between 0 and 1, such that 0 ≤
1156                     N < 1.
1157
1158       sin(expr)     Returns the sine of expr, which is in radians.
1159
1160       sqrt(expr)    The square root function.
1161
1162       srand([expr]) Uses  expr as a new seed for the random number generator.
1163                     If no expr is provided, the time of  day  is  used.   The
1164                     return  value  is the previous seed for the random number
1165                     generator.
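
       A minimal sketch exercising several of these functions (the function
       name numeric_demo is hypothetical, and the seed 42 is arbitrary):

```shell
numeric_demo() {
    awk 'BEGIN {
        printf "%.4f\n", atan2(0, -1)     # arctangent of 0/-1 is pi: 3.1416
        print int(3.9), int(-3.9)         # truncation toward zero: 3 -3
        print sqrt(16), exp(0), log(1)    # 4 1 0
        srand(42)                         # fixed seed instead of the time of day
        r = rand()
        print (r >= 0 && r < 1)           # rand() stays in [0, 1): 1
    }'
}

numeric_demo
```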
1166
1167   String Functions
1168       Gawk has the following built-in string functions:
1169
1170       asort(s [, d])          Returns the number of elements  in  the  source
1171                               array  s.   The  contents of s are sorted using
1172                               gawk's normal rules for comparing  values,  and
1173                               the  indices  of  the  sorted  values  of s are
1174                               replaced with sequential integers starting with
1175                               1. If the optional destination array d is spec‐
1176                               ified, then s is first duplicated into  d,  and
1177                               then  d  is  sorted, leaving the indices of the
1178                               source array s unchanged.
1179
1180       asorti(s [, d])         Returns the number of elements  in  the  source
1181                               array  s.   The behavior is the same as that of
1182                               asort(), except that the array indices are used
1183                               for  sorting, not the array values.  When done,
1184                               the array is indexed numerically, and the  val‐
1185                               ues  are  those  of  the original indices.  The
1186                               original values are lost; thus provide a second
1187                               array if you wish to preserve the original.
1188
1189       gensub(r, s, h [, t])   Search  the  target string t for matches of the
1190                               regular expression r.  If h is a string  begin‐
1191                               ning with g or G, then replace all matches of r
1192                               with s.  Otherwise, h is  a  number  indicating
1193                               which  match of r to replace.  If t is not sup‐
1194                               plied, $0 is used instead.  Within the replace‐
1195                               ment  text  s,  the  sequence  \n, where n is a
1196                               digit from 1 to 9, may be used to indicate just
1197                               the  text  that  matched the n'th parenthesized
1198                               subexpression.  The sequence \0 represents  the
1199                               entire  matched  text, as does the character &.
1200                               Unlike sub() and gsub(), the modified string is
1201                               returned as the result of the function, and the
1202                               original target string is not changed.
1203
1204       gsub(r, s [, t])        For each substring matching the regular expres‐
1205                               sion  r  in the string t, substitute the string
1206                               s, and return the number of substitutions.   If
1207                               t  is  not  supplied,  use  $0.   An  &  in the
1208                               replacement text is replaced with the text that
1209                               was  actually matched.  Use \& to get a literal
1210                               &.  (This must be typed  as  "\\&";  see  GAWK:
1211                               Effective  AWK Programming for a fuller discus‐
1212                               sion of the rules for &'s  and  backslashes  in
1213                               the replacement text of sub(), gsub(), and gen‐
1214                               sub().)
1215
1216       index(s, t)             Returns the index of the string t in the string
1217                               s,  or  0  if  t is not present.  (This implies
1218                               that character indices start at one.)
1219
1220       length([s])             Returns the length of  the  string  s,  or  the
1221                               length  of  $0  if s is not supplied.  Starting
1222                               with version 3.1.5, as  a  non-standard  exten‐
1223                               sion,  with an array argument, length() returns
1224                               the number of elements in the array.
1225
1226       match(s, r [, a])       Returns the position in  s  where  the  regular
1227                               expression  r occurs, or 0 if r is not present,
1228                               and sets the  values  of  RSTART  and  RLENGTH.
1229                               Note that the argument order is the same as for
1230                               the ~ operator: str ~ re.  If array a  is  pro‐
1231                               vided, a is cleared and then elements 1 through
1232                               n are filled with the portions of s that  match
1233                               the  corresponding  parenthesized subexpression
1234                               in r.  The 0'th element of a contains the
1235                               portion of s matched by the entire regular
1236                               expression r.  Subscripts a[n, "start"] and
1237                               a[n, "length"] provide the starting index in
1238                               the string and the length, respectively, of
1239                               each matching substring.
1240
1241       split(s, a [, r])       Splits  the  string  s  into the array a on the
1242                               regular expression r, and returns the number of
1243                               fields.   If  r is omitted, FS is used instead.
1244                               The  array  a  is  cleared  first.    Splitting
1245                               behaves   identically   to   field   splitting,
1246                               described above.
1247
1248       sprintf(fmt, expr-list) Formats expr-list according to fmt, and returns
1249                               the resulting string.
1250
1251       strtonum(str)           Examines  str,  and  returns its numeric value.
1252                               If str begins  with  a  leading  0,  strtonum()
1253                               assumes  that  str  is an octal number.  If str
1254                               begins with a  leading  0x  or  0X,  strtonum()
1255                               assumes that str is a hexadecimal number.
1256
1257       sub(r, s [, t])         Just  like  gsub(), but only the first matching
1258                               substring is replaced.
1259
1260       substr(s, i [, n])      Returns the substring of s starting at i, at
1261                               most n characters long.  If n is omitted, the
1262                               rest of s is used.
1263
1264       tolower(str)            Returns a copy of the string str, with all  the
1265                               upper-case  characters  in  str  translated  to
1266                               their  corresponding  lower-case  counterparts.
1267                               Non-alphabetic characters are left unchanged.
1268
1269       toupper(str)            Returns  a copy of the string str, with all the
1270                               lower-case  characters  in  str  translated  to
1271                               their  corresponding  upper-case  counterparts.
1272                               Non-alphabetic characters are left unchanged.
1273
1274       As of version 3.1.5, gawk is multibyte aware.  This means that index(),
1275       length(),  substr()  and  match()  all work in terms of characters, not
1276       bytes.
1277
1278   Time Functions
1279       Since one of the primary uses of AWK programs is processing  log  files
1280       that  contain time stamp information, gawk provides the following func‐
1281       tions for obtaining time stamps and formatting them.
1282
1283       mktime(datespec)
1284                 Turns datespec into a time stamp of the same form as returned
1285                 by  systime().   The datespec is a string of the form YYYY MM
1286                 DD HH MM SS[ DST].  The contents of the  string  are  six  or
1287                 seven numbers representing respectively the full year includ‐
1288                 ing century, the month from 1 to 12, the  day  of  the  month
1289                 from  1  to  31, the hour of the day from 0 to 23, the minute
1290                 from 0 to 59, and the second from 0 to 60,  and  an  optional
1291                 daylight  saving  flag.  The values of these numbers need not
1292                 be within the ranges specified; for example, an  hour  of  -1
1293                 means 1 hour before midnight.  The origin-zero Gregorian cal‐
1294                 endar is assumed, with year 0 preceding year 1  and  year  -1
1295                 preceding  year  0.   The  time is assumed to be in the local
1296                 timezone.  If the daylight saving flag is positive, the  time
1297                 is  assumed  to be daylight saving time; if zero, the time is
1298                 assumed to be standard time; and if negative  (the  default),
1299                 mktime()  attempts  to determine whether daylight saving time
1300                 is in effect for the specified time.  If  datespec  does  not
1301                 contain  enough  elements  or if the resulting time is out of
1302                 range, mktime() returns -1.
1303
1304       strftime([format [, timestamp[, utc-flag]]])
1305                 Formats timestamp according to the specification  in  format.
1306                 If  utc-flag  is  present  and  is  non-zero or non-null, the
1307                 result is in UTC, otherwise the result is in local time.  The
1308                 timestamp  should  be  of  the  same form as returned by sys‐
1309                 time().  If timestamp is missing, the current time of day  is
1310                 used.   If  format is missing, a default format equivalent to
1311                 the output of date(1) is used.  See the specification for the
1312                 strftime() function in ANSI C for the format conversions that
1313                 are guaranteed to be available.
1314
1315       systime() Returns the current time of day  as  the  number  of  seconds
1316                 since the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
1317
1318   Bit Manipulation Functions
1319       Starting with version 3.1 of gawk, the following bit manipulation func‐
1320       tions are available.  They work by converting double-precision floating
1321       point  values to uintmax_t integers, doing the operation, and then con‐
1322       verting the result back to floating point.  The functions are:
1323
1324       and(v1, v2)         Return the bitwise AND of the values provided by v1
1325                           and v2.
1326
1327       compl(val)          Return the bitwise complement of val.
1328
1329       lshift(val, count)  Return  the  value  of  val,  shifted left by count
1330                           bits.
1331
1332       or(v1, v2)          Return the bitwise OR of the values provided by  v1
1333                           and v2.
1334
1335       rshift(val, count)  Return  the  value  of  val, shifted right by count
1336                           bits.
1337
1338       xor(v1, v2)         Return the bitwise XOR of the values provided by v1
1339                           and v2.
1340
1341   Internationalization Functions
1342       Starting  with version 3.1 of gawk, the following functions may be used
1343       from within your AWK program for translating strings at run-time.   For
1344       full details, see GAWK: Effective AWK Programming.
1345
1346       bindtextdomain(directory [, domain])
1347              Specifies  the  directory where gawk looks for the .mo files, in
1348              case they will not or cannot be placed in the ``standard'' loca‐
1349              tions  (e.g.,  during  testing).  It returns the directory where
1350              domain is ``bound.''
1351              The default domain is the value of TEXTDOMAIN.  If directory  is
1352              the  null string (""), then bindtextdomain() returns the current
1353              binding for the given domain.
1354
1355       dcgettext(string [, domain [, category]])
1356              Returns the translation of string  in  text  domain  domain  for
1357              locale  category  category.  The default value for domain is the
1358              current value of TEXTDOMAIN.  The default value for category  is
1359              "LC_MESSAGES".
1360              If you supply a value for category, it must be a string equal to
1361              one of the known locale categories described in GAWK:  Effective
1362              AWK  Programming.   You  must  also  supply  a text domain.  Use
1363              TEXTDOMAIN if you want to use the current domain.
1364
1365       dcngettext(string1 , string2 , number [, domain [, category]])
1366              Returns the plural form used for number of  the  translation  of
1367              string1  and  string2  in text domain domain for locale category
1368              category.  The default value for domain is the current value  of
1369              TEXTDOMAIN.  The default value for category is "LC_MESSAGES".
1370              If you supply a value for category, it must be a string equal to
1371              one of the known locale categories described in GAWK:  Effective
1372              AWK  Programming.   You  must  also  supply  a text domain.  Use
1373              TEXTDOMAIN if you want to use the current domain.
1374

USER-DEFINED FUNCTIONS

1376       Functions in AWK are defined as follows:
1377
1378              function name(parameter list) { statements }
1379
1380       Functions are executed when they are called from within expressions  in
1381       either patterns or actions.  Actual parameters supplied in the function
1382       call are used to instantiate the  formal  parameters  declared  in  the
1383       function.   Arrays  are passed by reference; other variables are passed
1384       by value.
1385
1386       Since functions were not originally part of the AWK language, the  pro‐
1387       vision for local variables is rather clumsy: They are declared as extra
1388       parameters in the parameter list.  The convention is to separate  local
1389       variables  from  real parameters by extra spaces in the parameter list.
1390       For example:
1391
1392              function  f(p, q,     a, b)   # a and b are local
1393              {
1394                   ...
1395              }
1396
1397              /abc/     { ... ; f(1, 2) ; ... }
1398
1399       The left parenthesis in a function call is required to immediately fol‐
1400       low  the  function  name,  without  any  intervening white space.  This
1401       avoids a syntactic ambiguity with  the  concatenation  operator.   This
1402       restriction does not apply to the built-in functions listed above.
1403
1404       Functions  may  call each other and may be recursive.  Function parame‐
1405       ters used as local variables are initialized to the null string and the
1406       number zero upon function invocation.
1407
1408       Use return expr to return a value from a function.  The return value is
1409       undefined if no value is provided, or if the function returns by “fall‐
1410       ing off” the end.
1411
1412       If  --lint has been provided, gawk warns about calls to undefined func‐
1413       tions at parse time, instead of at  run  time.   Calling  an  undefined
1414       function at run time is a fatal error.
1415
1416       The word func may be used in place of function.
1417

DYNAMICALLY LOADING NEW FUNCTIONS

1419       Beginning  with version 3.1 of gawk, you can dynamically add new built-
1420       in functions to the running gawk interpreter.  The full details are
1421       beyond the scope of this manual page; see GAWK: Effective AWK
1422       Programming.
1423
1424       extension(object, function)
1425               Dynamically link the shared object file named  by  object,  and
1426               invoke  function  in  that  object,  to perform initialization.
1427               These should both be provided as strings.   Returns  the  value
1428               returned by function.
1429
1430       This  function  is  provided and documented in GAWK: Effective AWK Pro‐
1431       gramming, but everything about this feature is likely to change eventu‐
1432       ally.   We STRONGLY recommend that you do not use this feature for any‐
1433       thing that you aren't willing to redo.
1434

SIGNALS

1436       pgawk accepts two signals.  SIGUSR1 causes it to  dump  a  profile  and
1437       function  call  stack to the profile file, which is either awkprof.out,
1438       or whatever file was named with the --profile option.  It then  contin‐
1439       ues  to run.  SIGHUP causes pgawk to dump the profile and function call
1440       stack and then exit.
1441

EXAMPLES

1443       Print and sort the login names of all users:
1444
1445            BEGIN     { FS = ":" }
1446                 { print $1 | "sort" }
1447
1448       Count lines in a file:
1449
1450                 { nlines++ }
1451            END  { print nlines }
1452
1453       Precede each line by its number in the file:
1454
1455            { print FNR, $0 }
1456
1457       Concatenate and line number (a variation on a theme):
1458
1459            { print NR, $0 }

1460       Run an external command for particular lines of data:
1461
1462            tail -f access_log |
1463            awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
1464

INTERNATIONALIZATION

1466       String constants are sequences of characters enclosed in double quotes.
1467       In non-English speaking environments, it is possible to mark strings in
1468       the AWK program as requiring translation to  the  native  natural  lan‐
1469       guage. Such strings are marked in the AWK program with a leading under‐
1470       score (“_”).  For example,
1471
1472              gawk 'BEGIN { print "hello, world" }'
1473
1474       always prints hello, world.  But,
1475
1476              gawk 'BEGIN { print _"hello, world" }'
1477
1478       might print bonjour, monde in France.
1479
1480       There are several steps involved in producing and running a localizable
1481       AWK program.
1482
1483       1.  Add  a BEGIN action to assign a value to the TEXTDOMAIN variable to
1484           set the text domain to a name associated with your program.
1485
1486           BEGIN { TEXTDOMAIN = "myprog" }
1487
1488       This allows gawk to find the .mo file  associated  with  your  program.
1489       Without  this  step,  gawk  uses the messages text domain, which likely
1490       does not contain translations for your program.
1491
1492       2.  Mark all strings that should  be  translated  with  leading  under‐
1493           scores.
1494
1495       3.  If necessary, use the dcgettext() and/or bindtextdomain() functions
1496           in your program, as appropriate.
1497
1498       4.  Run gawk --gen-po -f myprog.awk > myprog.po to generate a .po  file
1499           for your program.
1500
1501       5.  Provide  appropriate translations, and build and install the corre‐
1502           sponding .mo files.
1503
1504       The internationalization features are described in full detail in GAWK:
1505       Effective AWK Programming.
1506

POSIX COMPATIBILITY

1508       A  primary  goal  for gawk is compatibility with the POSIX standard, as
1509       well as with the latest version of UNIX awk.  To this end, gawk  incor‐
1510       porates  the following user visible features which are not described in
1511       the AWK book, but are part of the Bell Laboratories version of awk, and
1512       are in the POSIX standard.
1513
1514       The  book  indicates that command line variable assignment happens when
1515       awk would otherwise open the argument as a file,  which  is  after  the
1516       BEGIN  block  is  executed.   However, in earlier implementations, when
1517       such an assignment appeared before any file names, the assignment would
1518       happen  before the BEGIN block was run.  Applications came to depend on
1519       this “feature.”  When awk was changed to match its  documentation,  the
1520       -v option for assigning variables before program execution was added to
1521       accommodate applications that depended upon the  old  behavior.   (This
1522       feature  was  agreed  upon  by  both  the Bell Laboratories and the GNU
1523       developers.)
1524
1525       The -W option for implementation specific features is  from  the  POSIX
1526       standard.
1527
1528       When  processing arguments, gawk uses the special option “--” to signal
1529       the end of arguments.  In compatibility mode, it warns about but other‐
1530       wise  ignores  undefined  options.  In normal operation, such arguments
1531       are passed on to the AWK program for it to process.
1532
1533       The AWK book does not define the return value of  srand().   The  POSIX
1534       standard has it return the seed it was using, to allow keeping track of
1535       random number sequences.  Therefore srand() in gawk  also  returns  its
1536       current seed.
1537
1538       Other  new features are: The use of multiple -f options (from MKS awk);
1539       the ENVIRON array; the \a and \v escape sequences (done originally  in
1540       gawk  and  fed  back into the Bell Laboratories version); the tolower()
1541       and toupper() built-in functions (from the Bell Laboratories  version);
1542       and  the  ANSI C conversion specifications in printf (done first in the
1543       Bell Laboratories version).
1544

HISTORICAL FEATURES

1546       There are two features of historical AWK implementations that gawk sup‐
1547       ports.   First,  it  is possible to call the length() built-in function
1548       not only with no argument, but even without parentheses!  Thus,
1549
1550              a = length     # Holy Algol 60, Batman!
1551
1552       is the same as either of
1553
1554              a = length()
1555              a = length($0)
1556
1557       This feature is marked as “deprecated” in the POSIX standard, and  gawk
1558       issues  a  warning  about its use if --lint is specified on the command
1559       line.
1560
1561       The other feature is the use of either the continue or the break state‐
1562       ments  outside  the  body of a while, for, or do loop.  Traditional AWK
1563       implementations have treated such  usage  as  equivalent  to  the  next
1564       statement.   Gawk  supports this usage if --traditional has been speci‐
1565       fied.
1566

GNU EXTENSIONS

1568       Gawk has a number of extensions to POSIX awk.  They  are  described  in
1569       this  section.   All  the  extensions described here can be disabled by
1570       invoking gawk with the --traditional or --posix options.
1571
1572       The following features of gawk are not available in POSIX awk.
1573
1574       · A path search for files named via the -f option, controlled by the
1575         AWKPATH environment variable.
1576
1577       · The \x escape sequence.  (Disabled with --posix.)
1578
1579       · The fflush() function.  (Disabled with --posix.)
1580
1581       · The  ability  to  continue  lines  after  ?   and  :.  (Disabled with
1582         --posix.)
1583
1584       · Octal and hexadecimal constants in AWK programs.
1585
1586       · The ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN special
1587         variables.
1588
1589       · The IGNORECASE variable and its side-effects.
1590
1591       · The FIELDWIDTHS variable and fixed-width field splitting.
1592
1593       · The PROCINFO array.
1594
1595       · The use of RS as a regular expression.
1596
1597       · The special file names available for I/O redirection.
1599
1600       · The |& operator for creating co-processes.
1601
1602       · The ability to split out individual characters using the null  string
1603         as the value of FS, and as the third argument to split().
1604
1605       · The optional second argument to the close() function.
1606
1607       · The optional third argument to the match() function.
1608
1609       · The ability to use positional specifiers with printf and sprintf().
1610
1611       · The ability to pass an array to length().
1612
1613       · The use of delete array to delete the entire contents of an array.
1614
1615       · The use of nextfile to abandon processing of the current input file.
1616
1617       · The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
1618         dcngettext(), gensub(), lshift(),  mktime(),  or(),  rshift(),  strf‐
1619         time(), strtonum(), systime() and xor() functions.
1620
1621       · Localizable strings.
1622
1623       · Adding  new built-in functions dynamically with the extension() func‐
1624         tion.
1625
1626       The AWK book does not define the return value of the close()  function.
1627       Gawk's  close()  returns  the  value from fclose(3), or pclose(3), when
1628       closing an output file or pipe, respectively.  It returns the process's
1629       exit  status when closing an input pipe.  The return value is -1 if the
1630       named file, pipe or co-process was not opened with a redirection.
1631
1632       When gawk is invoked with the --traditional option, if the fs  argument
1633       to  the  -F  option  is “t”, then FS is set to the tab character.  Note
1634       that typing gawk -F\t ...  simply causes the shell to  quote  the  “t,”
1635       and  does  not pass “\t” to the -F option.  Since this is a rather ugly
1636       special case, it is not the default behavior.  This behavior also  does
1637       not occur if --posix has been specified.  To really get a tab character
1638       as the field separator, it is best to use single  quotes:  gawk  -F'\t'
1639       ....
1640
1641       If  gawk is configured with the --enable-switch option to the configure
1642       command, then it accepts an additional control-flow statement:
1643              switch (expression) {
1644              case value|regex : statement
1645              ...
1646              [ default: statement ]
1647              }
1648
1649       If gawk is configured with the --disable-directories-fatal option, then
1650       it  will  silently  skip directories named on the command line.  Other‐
1651       wise, it will do so only if invoked with the --traditional option.
1652

ENVIRONMENT VARIABLES

1654       The AWKPATH environment variable can be  used  to  provide  a  list  of
1655       directories  that gawk searches when looking for files named via the -f
1656       and --file options.
1657
1658       For socket communication, two special environment variables can be used
1659       to  control the number of retries (GAWK_SOCK_RETRIES), and the interval
1660       between retries (GAWK_MSEC_SLEEP).  The interval is in milliseconds. On
1661       systems  that  do  not support usleep(3), the value is rounded up to an
1662       integral number of seconds.
1663
1664       If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
1665       as  if  --posix  had been specified on the command line.  If --lint has
1666       been specified, gawk issues a warning message to this effect.
1667

EXIT STATUS

1669       If the exit statement is used with a value, then gawk  exits  with  the
1670       numeric value given to it.
1671
1672       Otherwise,  if there were no problems during execution, gawk exits with
1673       the value of the C constant EXIT_SUCCESS.  This is usually zero.
1674
1675       If an error occurs, gawk  exits  with  the  value  of  the  C  constant
1676       EXIT_FAILURE.  This is usually one.
1677
1678       If  gawk exits because of a fatal error, the exit status is 2.  On non-
1679       POSIX systems, this value may be mapped to EXIT_FAILURE.
1680

SEE ALSO

1682       egrep(1), getpid(2),  getppid(2),  getpgrp(2),  getuid(2),  geteuid(2),
1683       getgid(2), getegid(2), getgroups(2)
1684
1685       The  AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter
1686       J. Weinberger, Addison-Wesley, 1988.  ISBN 0-201-07981-X.
1687
1688       GAWK: Effective AWK Programming, Edition 3.0,  published  by  the  Free
1689       Software  Foundation,  2001.   The  current version of this document is
1690       available online at http://www.gnu.org/software/gawk/manual.
1691

BUGS

1693       The -F option is not necessary given the command line variable  assign‐
1694       ment feature; it remains only for backwards compatibility.
1695
1696       Syntactically  invalid  single  character programs tend to overflow the
1697       parse stack, generating a rather unhelpful message.  Such programs  are
1698       surprisingly  difficult to diagnose in the completely general case, and
1699       the effort to do so really is not worth it.
1700

AUTHORS

1702       The original version of UNIX awk was designed and implemented by Alfred
1703       Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories.  Brian
1704       Kernighan continues to maintain and enhance it.
1705
1706       Paul Rubin and Jay Fenlason, of the  Free  Software  Foundation,  wrote
1707       gawk,  to be compatible with the original version of awk distributed in
1708       Seventh Edition UNIX.  John Woods contributed a number  of  bug  fixes.
1709       David  Trueman,  with contributions from Arnold Robbins, made gawk com‐
1710       patible with the new version of UNIX awk.  Arnold Robbins is  the  cur‐
1711       rent maintainer.
1712
1713       The  initial  DOS  port  was  done  by Conrad Kwok and Scott Garfinkle.
1714       Scott Deifik is the current DOS maintainer.  Pat Rankin did the port to
1715       VMS,  and  Michal Jaegermann did the port to the Atari ST.  The port to
1716       OS/2 was done by Kai Uwe Rommel, with contributions and help from  Dar‐
1717       rel Hankerson.  Andreas Buening now maintains the OS/2 port.  Fred Fish
1718       supplied support for the Amiga, and  Martin  Brown  provided  the  BeOS
1719       port.   Stephen  Davies  provided the original Tandem port, and Matthew
1720       Woehlke provided changes for Tandem's  POSIX-compliant  systems.   Ralf
1721       Wildenhues now maintains that port.
1722
1723       See  the  README  file in the gawk distribution for current information
1724       about maintainers and which ports are currently supported.
1725

VERSION INFORMATION

1727       This man page documents gawk, version 3.1.8.
1728

BUG REPORTS

1730       If you find a  bug  in  gawk,  please  send  electronic  mail  to  bug-
1731       gawk@gnu.org.   Please  include your operating system and its revision,
1732       the version of gawk (from gawk --version), what C compiler you used  to
1733       compile  it,  and a test program and data that are as small as possible
1734       for reproducing the problem.
1735
1736       Before sending a bug report, please do the  following  things.   First,
1737       verify  that  you  have the latest version of gawk.  Many bugs (usually
1738       subtle ones) are fixed at each release, and if yours is  out  of  date,
1739       the problem may already have been solved.  Second, please see if
1740       setting the environment variable LC_ALL to C causes things to
1741       behave  as  you  expect. If so, it's a locale issue, and may or may not
1742       really be a bug.  Finally, please read this man page and the  reference
1743       manual  carefully  to  be  sure that what you think is a bug really is,
1744       instead of just a quirk in the language.
1745
1746       Whatever you do, do NOT post a bug report in comp.lang.awk.  While  the
1747       gawk  developers  occasionally read this newsgroup, posting bug reports
1748       there is an unreliable way to report bugs.   Instead,  please  use  the
1749       electronic mail address given above.
1750
1751       If you're using a GNU/Linux system or BSD-based system, you may wish to
1752       submit a bug report to the vendor of your distribution.   That's  fine,
1753       but  please  send  a  copy to the official email address as well, since
1754       there's no guarantee that the bug will be forwarded to the  gawk  main‐
1755       tainer.
1756

ACKNOWLEDGEMENTS

1758       Brian  Kernighan of Bell Laboratories provided valuable assistance dur‐
1759       ing testing and debugging.  We thank him.
1760

COPYING PERMISSIONS

1762       Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
1763       2001,  2002,  2003,  2004, 2005, 2007, 2009, 2010 Free Software Founda‐
1764       tion, Inc.
1765
1766       Permission is granted to make and distribute verbatim  copies  of  this
1767       manual  page  provided  the copyright notice and this permission notice
1768       are preserved on all copies.
1769
1770       Permission is granted to copy and distribute modified versions of  this
1771       manual  page  under  the conditions for verbatim copying, provided that
1772       the entire resulting derived work is distributed under the terms  of  a
1773       permission notice identical to this one.
1774
1775       Permission  is granted to copy and distribute translations of this man‐
1776       ual page into another language, under the above conditions for modified
1777       versions,  except that this permission notice may be stated in a trans‐
1778       lation approved by the Foundation.
1779
1780
1781
1782Free Software Foundation          Apr 20 2010                          GAWK(1)