GAWK(1)                        Utility Commands                        GAWK(1)

NAME

6       gawk - pattern scanning and processing language
7

SYNOPSIS

9       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
10       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
11
12       pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
13       pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...
14
15       dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
16

DESCRIPTION

18       Gawk  is  the  GNU Project's implementation of the AWK programming lan‐
19       guage.  It conforms to the definition of  the  language  in  the  POSIX
20       1003.1  Standard.   This version in turn is based on the description in
21       The AWK Programming Language, by Aho, Kernighan, and Weinberger.   Gawk
22       provides  the  additional features found in the current version of UNIX
23       awk and a number of GNU-specific extensions.
24
25       The command line consists of options to gawk itself,  the  AWK  program
26       text  (if  not supplied via the -f or --file options), and values to be
27       made available in the ARGC and ARGV pre-defined AWK variables.
28
29       Pgawk is the profiling version of gawk.  It is identical in  every  way
30       to  gawk,  except  that  programs run more slowly, and it automatically
31       produces an execution profile in the file awkprof.out when  done.   See
32       the --profile option, below.
33
34       Dgawk  is  an awk debugger. Instead of running the program directly, it
35       loads the AWK source code and  then  prompts  for  debugging  commands.
36       Unlike gawk and pgawk, dgawk only processes AWK program source provided
37       with the -f option.  The debugger is documented in GAWK: Effective  AWK
38       Programming.
39

OPTION FORMAT

41       Gawk  options may be either traditional POSIX-style one letter options,
42       or GNU-style long options.  POSIX options  start  with  a  single  “-”,
43       while long options start with “--”.  Long options are provided for both
44       GNU-specific features and for POSIX-mandated features.
45
       Gawk-specific options are typically used in long-option form.
       Arguments to long options are either joined with the option by an =
       sign, with no intervening spaces, or they may be provided in the next
       command line argument.  Long options may be abbreviated, as long as the
       abbreviation remains unique.
51
52       Additionally, each long option has a  corresponding  short  option,  so
53       that  the option's functionality may be used from within #!  executable
54       scripts.
55

OPTIONS

57       Gawk accepts the following options.  Standard options are listed first,
58       followed by options for gawk extensions, listed alphabetically by short
59       option.
60
61       -f program-file
62       --file program-file
63              Read the AWK program source from the file program-file,  instead
64              of  from  the  first  command  line  argument.   Multiple -f (or
65              --file) options may be used.
66
67       -F fs
68       --field-separator fs
69              Use fs for the input field separator (the value of the FS prede‐
70              fined variable).
71
72       -v var=val
73       --assign var=val
74              Assign  the  value  val to the variable var, before execution of
75              the program begins.  Such variable values are available  to  the
76              BEGIN block of an AWK program.
77
78       -b
79       --characters-as-bytes
80              Treat  all input data as single-byte characters. In other words,
81              don't pay any attention to the locale information when  attempt‐
82              ing  to  process  strings  as multibyte characters.  The --posix
83              option overrides this one.
84
85       -c
86       --traditional
87              Run in compatibility mode.  In compatibility mode, gawk  behaves
88              identically to UNIX awk; none of the GNU-specific extensions are
89              recognized.  See GNU EXTENSIONS, below, for more information.
90
91       -C
92       --copyright
93              Print the short version of the GNU copyright information message
94              on the standard output and exit successfully.
95
96       -d[file]
97       --dump-variables[=file]
98              Print  a  sorted list of global variables, their types and final
99              values to file.  If no file is provided, gawk uses a file  named
100              awkvars.out in the current directory.
101              Having  a list of all the global variables is a good way to look
102              for typographical errors in your programs.  You would  also  use
103              this option if you have a large program with a lot of functions,
104              and you want to be sure that your functions don't  inadvertently
105              use  global  variables  that  you meant to be local.  (This is a
106              particularly easy mistake to make  with  simple  variable  names
107              like i, j, and so on.)
108
109       -e program-text
110       --source program-text
111              Use program-text as AWK program source code.  This option allows
112              the easy intermixing of library functions (used via the  -f  and
113              --file  options)  with  source code entered on the command line.
114              It is intended primarily for medium to large AWK  programs  used
115              in shell scripts.
116
117       -E file
118       --exec file
              Similar to -f; however, this option is the last one processed.
              This should be used with #! scripts, particularly for CGI
              applications, to avoid passing in options or source code (!) on
              the command line from a URL.  This option disables command-line
              variable assignments.
124
125       -g
126       --gen-pot
127              Scan  and parse the AWK program, and generate a GNU .pot (Porta‐
128              ble Object Template) format file on standard output with entries
129              for  all localizable strings in the program.  The program itself
130              is not executed.  See the  GNU  gettext  distribution  for  more
131              information on .pot files.
132
133       -h
134       --help Print a relatively short summary of the available options on the
135              standard output.  (Per the GNU Coding Standards,  these  options
136              cause an immediate, successful exit.)
137
138       -L [value]
139       --lint[=value]
140              Provide warnings about constructs that are dubious or non-porta‐
141              ble to other AWK implementations.  With an optional argument  of
142              fatal,  lint warnings become fatal errors.  This may be drastic,
143              but its use will certainly encourage the development of  cleaner
144              AWK  programs.  With an optional argument of invalid, only warn‐
145              ings about things that are actually invalid are issued. (This is
146              not fully implemented yet.)
147
148       -n
149       --non-decimal-data
150              Recognize  octal and hexadecimal values in input data.  Use this
151              option with great caution!
152
153       -N
154       --use-lc-numeric
155              This forces gawk to use the  locale's  decimal  point  character
156              when  parsing  input data.  Although the POSIX standard requires
157              this behavior, and gawk does so when --posix is in  effect,  the
158              default  is  to  follow traditional behavior and use a period as
159              the decimal point, even in locales where the period is  not  the
160              decimal  point  character.   This  option  overrides the default
161              behavior, without the full draconian strictness of  the  --posix
162              option.
163
164       -O
165       --optimize
166              Enable  optimizations  upon  the  internal representation of the
167              program.  Currently, this includes just simple constant-folding.
168              The  gawk  maintainer hopes to add additional optimizations over
169              time.
170
171       -p[prof_file]
172       --profile[=prof_file]
173              Send profiling data to prof_file.  The default  is  awkprof.out.
174              When  run with gawk, the profile is just a “pretty printed” ver‐
175              sion of the program.  When run with pgawk, the profile  contains
176              execution  counts  of  each statement in the program in the left
177              margin and function call counts for each user-defined function.
178
179       -P
180       --posix
181              This turns on compatibility mode, with the following  additional
182              restrictions:
183
184              · \x escape sequences are not recognized.
185
              · Only space and tab act as field separators when FS is set to a
                single space; newline does not.
188
189              · You cannot continue lines after ?  and :.
190
191              · The synonym func for the keyword function is not recognized.
192
193              · The operators ** and **= cannot be used in place of ^ and ^=.
194
195       -r
196       --re-interval
197              Enable the use of interval  expressions  in  regular  expression
198              matching (see Regular Expressions, below).  Interval expressions
199              were not traditionally available in the AWK language.  The POSIX
200              standard  added them, to make awk and egrep consistent with each
201              other.  They are enabled by default, but this option remains for
202              use with --traditional.
203
204       -R
205       --command file
206              Dgawk only.  Read stored debugger commands from file.
207
208       -S
209       --sandbox
210              Runs  gawk  in  sandbox  mode,  disabling the system() function,
211              input redirection with getline, output  redirection  with  print
212              and  printf,  and loading dynamic extensions.  Command execution
213              (through pipelines) is also disabled.  This effectively blocks a
214              script  from  accessing  local  resources  (except for the files
215              specified on the command line).
216
217       -t
218       --lint-old
219              Provide warnings about constructs that are not portable  to  the
220              original version of Unix awk.
221
222       -V
223       --version
224              Print  version  information  for this particular copy of gawk on
225              the standard output.  This is useful mainly for knowing  if  the
226              current  copy  of gawk on your system is up to date with respect
227              to whatever the Free Software Foundation is distributing.   This
228              is  also  useful when reporting bugs.  (Per the GNU Coding Stan‐
229              dards, these options cause an immediate, successful exit.)
230
231       --     Signal the end of options. This is useful to allow further argu‐
232              ments  to the AWK program itself to start with a “-”.  This pro‐
233              vides consistency with the argument parsing convention  used  by
234              most other POSIX programs.
235
236       In  compatibility  mode,  any other options are flagged as invalid, but
237       are otherwise ignored.  In normal operation, as long  as  program  text
238       has  been supplied, unknown options are passed on to the AWK program in
239       the ARGV array for processing.  This is particularly useful for running
240       AWK programs via the “#!” executable interpreter mechanism.
241

AWK PROGRAM EXECUTION

243       An  AWK program consists of a sequence of pattern-action statements and
244       optional function definitions.
245
              @include "filename"
              pattern   { action statements }
              function name(parameter list) { statements }
248
249       Gawk first reads the program source from the program-file(s) if  speci‐
250       fied, from arguments to --source, or from the first non-option argument
251       on the command line.  The -f and --source options may be used  multiple
252       times  on  the command line.  Gawk reads the program text as if all the
253       program-files and command  line  source  texts  had  been  concatenated
254       together.   This  is  useful  for  building libraries of AWK functions,
255       without having to include them in each new AWK program that uses  them.
256       It also provides the ability to mix library functions with command line
257       programs.
258
259       In addition, lines beginning with @include may be used to include other
260       source files into your program, making library use even easier.
261
262       The  environment  variable  AWKPATH specifies a search path to use when
263       finding source files named with the -f option.  If this  variable  does
264       not  exist,  the default path is ".:/usr/local/share/awk".  (The actual
265       directory may vary, depending upon how gawk was built  and  installed.)
266       If a file name given to the -f option contains a “/” character, no path
267       search is performed.
268
269       Gawk executes AWK programs in the following order.  First, all variable
270       assignments specified via the -v option are performed.  Next, gawk com‐
271       piles the program into an internal form.  Then, gawk executes the  code
       in the BEGIN block(s) (if any), and then proceeds to read each file
       named in the ARGV array (up to ARGV[ARGC-1]).  If there are no files
       named on the command line, gawk reads the standard input.
275
276       If a filename on the command line has the form var=val it is treated as
277       a variable assignment.  The variable var will  be  assigned  the  value
278       val.   (This  happens after any BEGIN block(s) have been run.)  Command
279       line variable assignment is most useful for dynamically assigning  val‐
280       ues  to  the  variables  AWK  uses  to control how input is broken into
281       fields and records.  It is also useful for controlling state if  multi‐
282       ple passes are needed over a single data file.
283
284       If  the value of a particular element of ARGV is empty (""), gawk skips
285       over it.
286
287       For each input file, if a BEGINFILE  rule  exists,  gawk  executes  the
288       associated  code before processing the contents of the file. Similarly,
289       gawk executes the code associated with  ENDFILE  after  processing  the
290       file.
291
292       For  each record in the input, gawk tests to see if it matches any pat‐
293       tern in the AWK program.  For each pattern that the record matches, the
294       associated  action  is  executed.  The patterns are tested in the order
295       they occur in the program.
296
297       Finally, after all the input is exhausted, gawk executes  the  code  in
298       the END block(s) (if any).
299
300   Command Line Directories
       According to POSIX, files named on the awk command line must be text
       files.  The behavior is “undefined” if they are not.  Most versions
       of awk treat a directory on the command line as a fatal error.
304
305       Starting with version 4.0 of gawk, a directory on the command line pro‐
306       duces a warning, but is otherwise skipped.  If either of the --posix or
307       --traditional  options is given, then gawk reverts to treating directo‐
308       ries on the command line as a fatal error.
309

VARIABLES, RECORDS AND FIELDS

311       AWK variables are dynamic; they come into existence when they are first
312       used.   Their  values  are either floating-point numbers or strings, or
313       both, depending upon how they are used.  AWK also has  one  dimensional
314       arrays; arrays with multiple dimensions may be simulated.  Several pre-
315       defined variables are set as a program runs;  these  are  described  as
316       needed and summarized below.
317
318   Records
319       Normally, records are separated by newline characters.  You can control
320       how records are separated by assigning values to the built-in  variable
321       RS.   If  RS is any single character, that character separates records.
322       Otherwise, RS is a regular expression.  Text in the input that  matches
323       this  regular expression separates the record.  However, in compatibil‐
324       ity mode, only the first character of its string value is used for sep‐
325       arating  records.   If  RS  is set to the null string, then records are
326       separated by blank lines.  When RS is set to the null string, the  new‐
327       line  character  always acts as a field separator, in addition to what‐
328       ever value FS may have.
329
330   Fields
331       As each input record is read, gawk splits the record into fields, using
332       the value of the FS variable as the field separator.  If FS is a single
333       character, fields are separated by that character.  If FS is  the  null
334       string,  then each individual character becomes a separate field.  Oth‐
335       erwise, FS is expected to be a full regular expression.  In the special
336       case  that FS is a single space, fields are separated by runs of spaces
337       and/or tabs and/or newlines.  (But see the section POSIX COMPATIBILITY,
338       below).   NOTE:  The  value  of IGNORECASE (see below) also affects how
339       fields are split when FS is a regular expression, and how  records  are
340       separated when RS is a regular expression.
341
342       If  the  FIELDWIDTHS  variable is set to a space separated list of num‐
343       bers, each field is expected to have fixed width, and  gawk  splits  up
344       the  record  using  the  specified widths.  The value of FS is ignored.
345       Assigning a new value to FS or FPAT overrides the use of FIELDWIDTHS.
346
347       Similarly, if the FPAT variable is set to a string representing a regu‐
348       lar expression, each field is made up of text that matches that regular
349       expression. In this case, the regular expression describes  the  fields
350       themselves, instead of the text that separates the fields.  Assigning a
351       new value to FS or FIELDWIDTHS overrides the use of FPAT.
352
353       Each field in the input record may be referenced by its  position,  $1,
354       $2,  and so on.  $0 is the whole record.  Fields need not be referenced
355       by constants:
356
357              n = 5
358              print $n
359
360       prints the fifth field in the input record.
361
362       The variable NF is set to the total  number  of  fields  in  the  input
363       record.
364
365       References  to  non-existent fields (i.e. fields after $NF) produce the
366       null-string.  However, assigning to a non-existent field (e.g., $(NF+2)
367       = 5) increases the value of NF, creates any intervening fields with the
368       null string as their value, and causes the value of  $0  to  be  recom‐
369       puted, with the fields being separated by the value of OFS.  References
370       to negative numbered fields  cause  a  fatal  error.   Decrementing  NF
371       causes  the  values  of  fields  past the new value to be lost, and the
372       value of $0 to be recomputed, with the fields being  separated  by  the
373       value of OFS.
374
375       Assigning  a  value  to an existing field causes the whole record to be
376       rebuilt when $0 is referenced.  Similarly,  assigning  a  value  to  $0
377       causes the record to be resplit, creating new values for the fields.
378
379   Built-in Variables
380       Gawk's built-in variables are:
381
382       ARGC        The  number  of  command  line  arguments (does not include
383                   options to gawk, or the program source).
384
385       ARGIND      The index in ARGV of the current file being processed.
386
387       ARGV        Array of command line arguments.  The array is indexed from
388                   0  to  ARGC - 1.  Dynamically changing the contents of ARGV
389                   can control the files used for data.
390
391       BINMODE     On non-POSIX systems, specifies use of  “binary”  mode  for
392                   all  file  I/O.  Numeric values of 1, 2, or 3, specify that
393                   input files, output  files,  or  all  files,  respectively,
394                   should  use binary I/O.  String values of "r", or "w" spec‐
395                   ify that input files, or output files, respectively, should
396                   use binary I/O.  String values of "rw" or "wr" specify that
397                   all files should use binary I/O.  Any other string value is
398                   treated as "rw", but generates a warning message.
399
400       CONVFMT     The conversion format for numbers, "%.6g", by default.
401
402       ENVIRON     An  array containing the values of the current environment.
403                   The array is indexed by  the  environment  variables,  each
404                   element  being  the  value  of  that  variable (e.g., ENVI‐
405                   RON["HOME"] might be /home/arnold).   Changing  this  array
406                   does not affect the environment seen by programs which gawk
407                   spawns via redirection or the system() function.
408
409       ERRNO       If a system error occurs either  doing  a  redirection  for
410                   getline,  during  a  read for getline, or during a close(),
411                   then ERRNO will contain a string describing the error.  The
412                   value is subject to translation in non-English locales.
413
414       FIELDWIDTHS A  whitespace  separated  list  of field widths.  When set,
415                   gawk parses the input into fields of fixed  width,  instead
416                   of  using the value of the FS variable as the field separa‐
417                   tor.  See Fields, above.
418
419       FILENAME    The name of the current input file.  If no files are speci‐
420                   fied  on  the  command  line, the value of FILENAME is “-”.
421                   However, FILENAME  is  undefined  inside  the  BEGIN  block
422                   (unless set by getline).
423
424       FNR         The input record number in the current input file.
425
426       FPAT        A  regular expression describing the contents of the fields
427                   in a record.  When set, gawk parses the input into  fields,
428                   where  the  fields match the regular expression, instead of
429                   using the value of the FS variable as the field  separator.
430                   See Fields, above.
431
432       FS          The input field separator, a space by default.  See Fields,
433                   above.
434
435       IGNORECASE  Controls the case-sensitivity of all regular expression and
436                   string  operations.   If  IGNORECASE  has a non-zero value,
437                   then string comparisons  and  pattern  matching  in  rules,
438                   field  splitting  with  FS and FPAT, record separating with
439                   RS, regular expression matching with ~ and !~, and the gen‐
440                   sub(),  gsub(),  index(), match(), patsplit(), split(), and
441                   sub() built-in functions all ignore case when doing regular
442                   expression  operations.   NOTE:  Array  subscripting is not
443                   affected.  However, the asort() and asorti() functions  are
444                   affected.
445                   Thus,  if IGNORECASE is not equal to zero, /aB/ matches all
446                   of the strings "ab", "aB", "Ab", and "AB".  As with all AWK
447                   variables,  the initial value of IGNORECASE is zero, so all
448                   regular expression and string operations are normally case-
449                   sensitive.
450
451       LINT        Provides  dynamic  control of the --lint option from within
452                   an AWK program.  When true, gawk prints lint warnings. When
453                   false,  it  does  not.   When  assigned  the  string  value
454                   "fatal", lint warnings become fatal  errors,  exactly  like
455                   --lint=fatal.  Any other true value just prints warnings.
456
457       NF          The number of fields in the current input record.
458
459       NR          The total number of input records seen so far.
460
461       OFMT        The output format for numbers, "%.6g", by default.
462
463       OFS         The output field separator, a space by default.
464
465       ORS         The output record separator, by default a newline.
466
467       PROCINFO    The  elements  of  this array provide access to information
468                   about the running AWK program.  On some systems, there  may
469                   be  elements  in  the  array, "group1" through "groupn" for
470                   some n, which is the number of  supplementary  groups  that
471                   the  process  has.   Use  the in operator to test for these
472                   elements.  The following  elements  are  guaranteed  to  be
473                   available:
474
475                   PROCINFO["egid"]    the  value  of  the  getegid(2)  system
476                                       call.
477
478                   PROCINFO["strftime"]
479                                       The  default  time  format  string  for
480                                       strftime().
481
482                   PROCINFO["euid"]    the  value  of  the  geteuid(2)  system
483                                       call.
484
485                   PROCINFO["FS"]      "FS" if field splitting with FS  is  in
486                                       effect,  "FPAT" if field splitting with
487                                       FPAT is in effect, or "FIELDWIDTHS"  if
488                                       field  splitting with FIELDWIDTHS is in
489                                       effect.
490
491                   PROCINFO["gid"]     the value of the getgid(2) system call.
492
493                   PROCINFO["pgrpid"]  the process group  ID  of  the  current
494                                       process.
495
496                   PROCINFO["pid"]     the process ID of the current process.
497
498                   PROCINFO["ppid"]    the  parent  process  ID of the current
499                                       process.
500
501                   PROCINFO["uid"]     the value of the getuid(2) system call.
502
503                   PROCINFO["sorted_in"]
504                                       If this  element  exists  in  PROCINFO,
505                                       then  its  value  controls the order in
506                                       which array elements are  traversed  in
507                                       for   loops.    Supported   values  are
508                                       "@ind_str_asc",         "@ind_num_asc",
509                                       "@val_type_asc",        "@val_str_asc",
510                                       "@val_num_asc",        "@ind_str_desc",
511                                       "@ind_num_desc",      "@val_type_desc",
512                                       "@val_str_desc",  "@val_num_desc",  and
513                                       "@unsorted".  The value can also be the
514                                       name of any comparison function defined
515                                       as follows:
516
517                          function cmp_func(i1, v1, i2, v2)
518
519                   where i1 and i2 are the indices, and v1 and v2 are the cor‐
520                   responding values of the two elements being  compared.   It
521                   should return a number less than, equal to, or greater than
522                   0, depending on how the elements of the  array  are  to  be
523                   ordered.
524
525                   PROCINFO["version"]
526                          the version of gawk.
527
528       RS          The input record separator, by default a newline.
529
530       RT          The record terminator.  Gawk sets RT to the input text that
531                   matched the character or regular  expression  specified  by
532                   RS.
533
534       RSTART      The  index  of the first character matched by match(); 0 if
535                   no match.  (This implies that character  indices  start  at
536                   one.)
537
538       RLENGTH     The  length  of  the  string  matched  by match(); -1 if no
539                   match.
540
541       SUBSEP      The character used to separate multiple subscripts in array
542                   elements, by default "\034".
543
544       TEXTDOMAIN  The text domain of the AWK program; used to find the local‐
545                   ized translations for the program's strings.
546
547   Arrays
548       Arrays are subscripted with an expression between  square  brackets  ([
549       and ]).  If the expression is an expression list (expr, expr ...)  then
550       the array subscript is a string consisting of the concatenation of  the
551       (string) value of each expression, separated by the value of the SUBSEP
552       variable.  This facility  is  used  to  simulate  multiply  dimensioned
553       arrays.  For example:
554
555              i = "A"; j = "B"; k = "C"
556              x[i, j, k] = "hello, world\n"
557
558       assigns the string "hello, world\n" to the element of the array x which
559       is indexed by the string "A\034B\034C".  All arrays in AWK are associa‐
560       tive, i.e. indexed by string values.
561
562       The  special  operator  in may be used to test if an array has an index
563       consisting of a particular value:
564
565              if (val in array)
566                   print array[val]
567
568       If the array has multiple subscripts, use (i, j) in array.
569
570       The in construct may also be used in a for loop to iterate over all the
571       elements of an array.
572
573       An  element  may  be  deleted from an array using the delete statement.
574       The delete statement may also be used to delete the entire contents  of
575       an array, just by specifying the array name without a subscript.
576
       Gawk supports true multidimensional arrays.  It does not require that
       such arrays be “rectangular” as in C or C++.  For example:

              a[1] = 5
              a[2][1] = 6
              a[2][2] = 7
582
583   Variable Typing And Conversion
584       Variables and fields may be (floating point) numbers,  or  strings,  or
585       both.  How the value of a variable is interpreted depends upon its con‐
586       text.  If used in a numeric expression, it will be treated as a number;
587       if used as a string it will be treated as a string.
588
589       To force a variable to be treated as a number, add 0 to it; to force it
590       to be treated as a string, concatenate it with the null string.
591
592       When a string must be converted to a number, the conversion  is  accom‐
593       plished  using  strtod(3).   A number is converted to a string by using
594       the value of CONVFMT as  a  format  string  for  sprintf(3),  with  the
595       numeric  value  of  the variable as the argument.  However, even though
596       all numbers in AWK are floating-point, integral values are always  con‐
597       verted as integers.  Thus, given
598
              CONVFMT = "%2.2f"
              a = 12
              b = a ""
602
603       the variable b has a string value of "12" and not "12.00".
604
605       NOTE:  When  operating  in POSIX mode (such as with the --posix command
606       line option), beware that locale settings may interfere  with  the  way
607       decimal  numbers  are treated: the decimal separator of the numbers you
608       are feeding to gawk must conform to what your locale would  expect,  be
609       it a comma (,) or a period (.).
610
611       Gawk  performs  comparisons  as  follows: If two variables are numeric,
612       they are compared numerically.  If one value is numeric and  the  other
613       has  a  string  value  that is a “numeric string,” then comparisons are
614       also done numerically.  Otherwise, the numeric value is converted to  a
615       string and a string comparison is performed.  Two strings are compared,
616       of course, as strings.
617
       Note that string constants, such as "57", are not numeric strings; they
619       are  string  constants.   The  idea of “numeric string” only applies to
620       fields, getline input, FILENAME, ARGV elements,  ENVIRON  elements  and
621       the  elements  of  an  array  created by split() or patsplit() that are
622       numeric strings.  The basic idea is that  user  input,  and  only  user
623       input, that looks numeric, should be treated that way.
624
625       Uninitialized  variables  have the numeric value 0 and the string value
626       "" (the null, or empty, string).
627
628   Octal and Hexadecimal Constants
629       You may use C-style octal and hexadecimal constants in your AWK program
630       source  code.   For example, the octal value 011 is equal to decimal 9,
631       and the hexadecimal value 0x11 is equal to decimal 17.
632
633   String Constants
634       String constants in AWK are sequences of  characters  enclosed  between
635       double quotes (like "value").  Within strings, certain escape sequences
636       are recognized, as in C.  These are:
637
638       \\   A literal backslash.
639
640       \a   The “alert” character; usually the ASCII BEL character.
641
642       \b   backspace.
643
644       \f   form-feed.
645
646       \n   newline.
647
648       \r   carriage return.
649
650       \t   horizontal tab.
651
652       \v   vertical tab.
653
654       \xhex digits
655            The character represented by the string of hexadecimal digits fol‐
656            lowing the \x.  As in ANSI C, all following hexadecimal digits are
657            considered part of the escape sequence.  (This feature should tell
658            us something about language design by committee.)  E.g., "\x1B" is
659            the ASCII ESC (escape) character.
660
661       \ddd The character represented by the 1-, 2-, or  3-digit  sequence  of
662            octal digits.  E.g., "\033" is the ASCII ESC (escape) character.
663
664       \c   The literal character c.
665
666       The  escape  sequences may also be used inside constant regular expres‐
667       sions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).
668
669       In compatibility mode, the characters represented by octal and hexadec‐
670       imal  escape  sequences  are  treated  literally  when  used in regular
671       expression constants.  Thus, /a\52b/ is equivalent to /a\*b/.
672

PATTERNS AND ACTIONS

674       AWK is a line-oriented language.  The pattern comes first, and then the
675       action.  Action statements are enclosed in { and }.  Either the pattern
676       may be missing, or the action may be missing, but, of course, not both.
677       If  the  pattern  is  missing,  the action is executed for every single
678       record of input.  A missing action is equivalent to
679
              { print }
681
682       which prints the entire record.
683
684       Comments begin with the # character, and continue until the end of  the
685       line.   Blank  lines  may  be used to separate statements.  Normally, a
       statement ends with a newline; however, this is not the case for  lines
687       ending in a comma, {, ?, :, &&, or ||.  Lines ending in do or else also
688       have their statements automatically continued on  the  following  line.
689       In  other  cases,  a  line can be continued by ending it with a “\”, in
690       which case the newline is ignored.
691
692       Multiple statements may be put on one line by separating  them  with  a
693       “;”.   This  applies to both the statements within the action part of a
694       pattern-action pair (the usual case), and to the pattern-action  state‐
695       ments themselves.
696
697   Patterns
698       AWK patterns may be one of the following:
699
              BEGIN
              END
              BEGINFILE
              ENDFILE
              /regular expression/
              relational expression
              pattern && pattern
              pattern || pattern
              pattern ? pattern : pattern
              (pattern)
              ! pattern
              pattern1, pattern2
712
713       BEGIN  and  END  are two special kinds of patterns which are not tested
714       against the input.  The action parts of all BEGIN patterns  are  merged
715       as  if  all  the  statements  had been written in a single BEGIN block.
716       They are executed before any of the input is read.  Similarly, all  the
717       END blocks are merged, and executed when all the input is exhausted (or
718       when an exit statement is executed).  BEGIN and END patterns cannot  be
719       combined  with  other  patterns  in pattern expressions.  BEGIN and END
720       patterns cannot have missing action parts.
721
722       BEGINFILE and ENDFILE are additional special patterns whose bodies  are
723       executed  before  reading  the  first record of each command line input
724       file and after reading the last record of each file.  Inside the BEGIN‐
725       FILE  rule,  the  value  of  ERRNO will be the empty string if the file
726       could be opened successfully.  Otherwise, there is  some  problem  with
727       the  file  and  the code should use nextfile to skip it. If that is not
728       done, gawk produces its usual fatal error  for  files  that  cannot  be
729       opened.
730
731       For /regular expression/ patterns, the associated statement is executed
732       for each input record that matches  the  regular  expression.   Regular
733       expressions  are  the  same  as  those  in egrep(1), and are summarized
734       below.
735
736       A relational expression may use any of the operators defined  below  in
737       the  section  on  actions.  These generally test whether certain fields
738       match certain regular expressions.
739
740       The &&, ||, and !  operators are logical AND, logical OR,  and  logical
741       NOT,  respectively, as in C.  They do short-circuit evaluation, also as
742       in C, and are used for combining more  primitive  pattern  expressions.
743       As  in  most  languages, parentheses may be used to change the order of
744       evaluation.
745
746       The ?: operator is like the same operator in C.  If the  first  pattern
747       is true then the pattern used for testing is the second pattern, other‐
748       wise it is the third.  Only one of the second  and  third  patterns  is
749       evaluated.
750
751       The pattern1, pattern2 form of an expression is called a range pattern.
752       It matches all input records starting with a record that  matches  pat‐
753       tern1,  and continuing until a record that matches pattern2, inclusive.
754       It does not combine with any other sort of pattern expression.
755
756   Regular Expressions
757       Regular expressions are the extended kind found  in  egrep.   They  are
758       composed of characters as follows:
759
760       c          matches the non-metacharacter c.
761
762       \c         matches the literal character c.
763
764       .          matches any character including newline.
765
766       ^          matches the beginning of a string.
767
768       $          matches the end of a string.
769
770       [abc...]   character list, matches any of the characters abc....
771
772       [^abc...]  negated character list, matches any character except abc....
773
774       r1|r2      alternation: matches either r1 or r2.
775
776       r1r2       concatenation: matches r1, and then r2.
777
778       r+         matches one or more r's.
779
780       r*         matches zero or more r's.
781
782       r?         matches zero or one r's.
783
784       (r)        grouping: matches r.
785
786       r{n}
787       r{n,}
788       r{n,m}     One  or two numbers inside braces denote an interval expres‐
789                  sion.  If there is one number in the braces,  the  preceding
790                  regular  expression r is repeated n times.  If there are two
791                  numbers separated by a comma, r is repeated n  to  m  times.
792                  If  there  is  one  number  followed  by  a comma, then r is
793                  repeated at least n times.
794
795       \y         matches the empty string at either the beginning or the  end
796                  of a word.
797
798       \B         matches the empty string within a word.
799
800       \<         matches the empty string at the beginning of a word.
801
802       \>         matches the empty string at the end of a word.
803
804       \s         matches any whitespace character.
805
806       \S         matches any nonwhitespace character.
807
808       \w         matches  any  word-constituent  character (letter, digit, or
809                  underscore).
810
811       \W         matches any character that is not word-constituent.
812
813       \`         matches the empty  string  at  the  beginning  of  a  buffer
814                  (string).
815
816       \'         matches the empty string at the end of a buffer.
817
       The escape sequences that are valid in string constants (see above) are
       also valid in regular expressions.
820
821       Character classes are a feature introduced in the  POSIX  standard.   A
822       character  class  is a special notation for describing lists of charac‐
823       ters that have a specific attribute, but where  the  actual  characters
824       themselves  can  vary from country to country and/or from character set
825       to character set.  For example, the notion of  what  is  an  alphabetic
826       character differs in the USA and in France.
827
828       A  character  class  is  only  valid in a regular expression inside the
829       brackets of a character list.  Character classes consist of [:, a  key‐
830       word  denoting the class, and :].  The character classes defined by the
831       POSIX standard are:
832
833       [:alnum:]  Alphanumeric characters.
834
835       [:alpha:]  Alphabetic characters.
836
837       [:blank:]  Space or tab characters.
838
839       [:cntrl:]  Control characters.
840
841       [:digit:]  Numeric characters.
842
843       [:graph:]  Characters that are both printable and visible.  (A space is
844                  printable, but not visible, while an a is both.)
845
846       [:lower:]  Lowercase alphabetic characters.
847
848       [:print:]  Printable  characters (characters that are not control char‐
849                  acters.)
850
       [:punct:]  Punctuation characters (characters that are not letters,
                  digits, control characters, or space characters).
853
854       [:space:]  Space  characters (such as space, tab, and formfeed, to name
855                  a few).
856
857       [:upper:]  Uppercase alphabetic characters.
858
859       [:xdigit:] Characters that are hexadecimal digits.
860
861       For example, before the POSIX standard, to match  alphanumeric  charac‐
862       ters, you would have had to write /[A-Za-z0-9]/.  If your character set
863       had other alphabetic characters in it, this would not match  them,  and
864       if  your  character set collated differently from ASCII, this might not
865       even match the ASCII alphanumeric characters.  With the POSIX character
866       classes,  you  can write /[[:alnum:]]/, and this matches the alphabetic
867       and numeric characters in your character set, no matter what it is.
868
869       Two additional special sequences can appear in character lists.   These
870       apply  to  non-ASCII  character  sets,  which  can  have single symbols
871       (called collating elements) that are represented  with  more  than  one
872       character,  as  well as several characters that are equivalent for col‐
873       lating, or sorting, purposes.  (E.g., in French,  a  plain  “e”  and  a
       grave-accented “è” are equivalent.)
875
876       Collating Symbols
877              A  collating  symbol  is  a  multi-character  collating  element
878              enclosed in [.  and .].  For example, if ch is a collating  ele‐
879              ment,  then  [[.ch.]]  is a regular expression that matches this
880              collating element, while  [ch]  is  a  regular  expression  that
881              matches either c or h.
882
883       Equivalence Classes
884              An  equivalence  class  is  a locale-specific name for a list of
885              characters that are equivalent.  The name is enclosed in [=  and
886              =].   For  example, the name e might be used to represent all of
              “e,” “é,” and “è.”  In this case, [[=e=]] is a  regular  expres‐
              sion that matches any of e, é, or è.
889
890       These  features are very valuable in non-English speaking locales.  The
891       library functions that gawk uses for regular expression  matching  cur‐
892       rently  only  recognize  POSIX character classes; they do not recognize
893       collating symbols or equivalence classes.
894
895       The \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators  are  specific
896       to  gawk;  they  are  extensions based on facilities in the GNU regular
897       expression libraries.
898
899       The various command line options control how gawk interprets characters
900       in regular expressions.
901
902       No options
              In  the  default case, gawk provides all the facilities of POSIX
904              regular expressions and the  GNU  regular  expression  operators
905              described above.
906
907       --posix
              Only  POSIX regular expressions are supported; the GNU operators
909              are not special.  (E.g., \w matches a literal w).
910
911       --traditional
912              Traditional Unix awk regular expressions are matched.   The  GNU
913              operators  are  not  special,  and  interval expressions are not
914              available.  Characters described by octal and hexadecimal escape
915              sequences  are treated literally, even if they represent regular
916              expression metacharacters.
917
918       --re-interval
919              Allow interval  expressions  in  regular  expressions,  even  if
920              --traditional has been provided.
921
922   Actions
923       Action  statements  are enclosed in braces, { and }.  Action statements
924       consist of the usual assignment, conditional,  and  looping  statements
925       found  in  most  languages.   The  operators,  control  statements, and
926       input/output statements available are patterned after those in C.
927
928   Operators
929       The operators in AWK, in order of decreasing precedence, are
930
931       (...)       Grouping
932
933       $           Field reference.
934
935       ++ --       Increment and decrement, both prefix and postfix.
936
937       ^           Exponentiation (** may  also  be  used,  and  **=  for  the
938                   assignment operator).
939
940       + - !       Unary plus, unary minus, and logical negation.
941
942       * / %       Multiplication, division, and modulus.
943
944       + -         Addition and subtraction.
945
946       space       String concatenation.
947
948       |   |&      Piped I/O for getline, print, and printf.
949
950       < > <= >= != ==
951                   The regular relational operators.
952
953       ~ !~        Regular  expression match, negated match.  NOTE: Do not use
954                   a constant regular expression (/foo/) on the left-hand side
955                   of  a  ~  or !~.  Only use one on the right-hand side.  The
956                   expression /foo/ ~ exp has  the  same  meaning  as  (($0  ~
957                   /foo/) ~ exp).  This is usually not what was intended.
958
959       in          Array membership.
960
961       &&          Logical AND.
962
963       ||          Logical OR.
964
965       ?:          The  C  conditional  expression.  This has the form expr1 ?
966                   expr2 : expr3.  If expr1 is true, the value of the  expres‐
967                   sion  is  expr2,  otherwise it is expr3.  Only one of expr2
968                   and expr3 is evaluated.
969
970       = += -= *= /= %= ^=
971                   Assignment.  Both absolute assignment  (var  =  value)  and
972                   operator-assignment (the other forms) are supported.
973
974   Control Statements
975       The control statements are as follows:
976
              if (condition) statement [ else statement ]
              while (condition) statement
              do statement while (condition)
              for (expr1; expr2; expr3) statement
              for (var in array) statement
              break
              continue
              delete array[index]
              delete array
              exit [ expression ]
              { statements }
              switch (expression) {
              case value|regex : statement
              ...
              [ default: statement ]
              }
993
994   I/O Statements
995       The input/output statements are as follows:
996
997       close(file [, how])   Close file, pipe or co-process.  The optional how
998                             should only be used when closing  one  end  of  a
999                             two-way  pipe  to  a  co-process.   It  must be a
1000                             string value, either "to" or "from".
1001
1002       getline               Set $0 from next input record; set NF, NR, FNR.
1003
1004       getline <file         Set $0 from next record of file; set NF.
1005
1006       getline var           Set var from next input record; set NR, FNR.
1007
1008       getline var <file     Set var from next record of file.
1009
1010       command | getline [var]
1011                             Run command piping the output either into  $0  or
1012                             var, as above.
1013
1014       command |& getline [var]
1015                             Run  command  as  a  co-process piping the output
1016                             either into $0 or var,  as  above.   Co-processes
1017                             are  a  gawk  extension.   (command can also be a
1018                             socket.  See the subsection Special  File  Names,
1019                             below.)
1020
1021       next                  Stop  processing  the  current input record.  The
1022                             next input record is read and  processing  starts
1023                             over  with  the first pattern in the AWK program.
1024                             If the end of the input data is reached, the  END
1025                             block(s), if any, are executed.
1026
1027       nextfile              Stop processing the current input file.  The next
1028                             input record read comes from the next input file.
1029                             FILENAME  and ARGIND are updated, FNR is reset to
1030                             1, and processing starts over with the first pat‐
1031                             tern  in the AWK program. If the end of the input
1032                             data is reached, the END block(s),  if  any,  are
1033                             executed.
1034
1035       print                 Print  the  current record.  The output record is
1036                             terminated with the value of the ORS variable.
1037
1038       print expr-list       Print expressions.  Each expression is  separated
1039                             by  the  value  of  the OFS variable.  The output
1040                             record is terminated with the value  of  the  ORS
1041                             variable.
1042
1043       print expr-list >file Print  expressions  on  file.  Each expression is
1044                             separated by the value of the OFS variable.   The
1045                             output record is terminated with the value of the
1046                             ORS variable.
1047
1048       printf fmt, expr-list Format and  print.   See  The  printf  Statement,
1049                             below.
1050
1051       printf fmt, expr-list >file
1052                             Format and print on file.
1053
1054       system(cmd-line)      Execute the command cmd-line, and return the exit
1055                             status.  (This may not be available on  non-POSIX
1056                             systems.)
1057
1058       fflush([file])        Flush any buffers associated with the open output
1059                             file or pipe file.  If file is missing or  if  it
1060                             is  the  null  string, then flush all open output
1061                             files and pipes.
1062
1063       Additional output redirections are allowed for print and printf.
1064
1065       print ... >> file
1066              Appends output to the file.
1067
1068       print ... | command
1069              Writes on a pipe.
1070
1071       print ... |& command
1072              Sends data to a co-process or socket.  (See also the  subsection
1073              Special File Names, below.)
1074
1075       The  getline  command returns 1 on success, 0 on end of file, and -1 on
1076       an error.  Upon an error, ERRNO contains a string describing the  prob‐
1077       lem.
1078
1079       NOTE:  Failure  in  opening a two-way socket will result in a non-fatal
1080       error being returned to the calling function.  If  using  a  pipe,  co-
1081       process,  or  socket to getline, or from print or printf within a loop,
1082       you must use close() to create new instances of the command or  socket.
1083       AWK  does  not automatically close pipes, sockets, or co-processes when
1084       they return EOF.
1085
1086   The printf Statement
1087       The AWK versions of the printf statement and  sprintf()  function  (see
1088       below) accept the following conversion specification formats:
1089
1090       %c      A single character.  If the argument used for %c is numeric, it
1091               is treated as a character and printed.  Otherwise, the argument
               is assumed to be a string, and only the first character of that
1093               string is printed.
1094
1095       %d, %i  A decimal number (the integer part).
1096
1097       %e, %E  A floating point number of the form [-]d.dddddde[+-]dd.  The %E
1098               format uses E instead of e.
1099
1100       %f, %F  A floating point number of the form [-]ddd.dddddd.  If the sys‐
1101               tem library supports it, %F is available as well. This is  like
1102               %f,  but  uses  capital  letters for special “not a number” and
1103               “infinity” values. If %F is not available, gawk uses %f.
1104
1105       %g, %G  Use %e or %f conversion, whichever is shorter, with nonsignifi‐
1106               cant zeros suppressed.  The %G format uses %E instead of %e.
1107
1108       %o      An unsigned octal number (also an integer).
1109
1110       %u      An unsigned decimal number (again, an integer).
1111
1112       %s      A character string.
1113
1114       %x, %X  An  unsigned  hexadecimal  number  (an integer).  The %X format
1115               uses ABCDEF instead of abcdef.
1116
1117       %%      A single % character; no argument is converted.
1118
1119       Optional, additional parameters may lie between the % and  the  control
1120       letter:
1121
1122       count$ Use the count'th argument at this point in the formatting.  This
1123              is called a positional specifier and is intended  primarily  for
1124              use  in translated versions of format strings, not in the origi‐
1125              nal text of an AWK program.  It is a gawk extension.
1126
1127       -      The expression should be left-justified within its field.
1128
1129       space  For numeric conversions, prefix positive values  with  a  space,
1130              and negative values with a minus sign.
1131
1132       +      The  plus sign, used before the width modifier (see below), says
1133              to always supply a sign for numeric  conversions,  even  if  the
1134              data  to  be  formatted  is positive.  The + overrides the space
1135              modifier.
1136
1137       #      Use an “alternate form” for certain control  letters.   For  %o,
              supply  a  leading zero.  For %x and %X, supply a leading 0x or
              0X for a nonzero result.  For %e, %E,  %f  and  %F,  the  result
              always contains a decimal point.  For %g and %G, trailing  zeros
              are not removed from the result.
1142
       0      A leading 0 (zero) acts as a flag that indicates  output  should
1144              be  padded  with zeroes instead of spaces.  This applies only to
1145              the numeric output formats.  This flag only has an  effect  when
1146              the field width is wider than the value to be printed.
1147
1148       width  The field should be padded to this width.  The field is normally
1149              padded with spaces.  If the 0 flag has been used, it  is  padded
1150              with zeroes.
1151
1152       .prec  A number that specifies the precision to use when printing.  For
              the %e, %E, %f, and %F formats, this specifies  the  number  of
              digits  you want printed to the right of the decimal point.  For
              the %g and %G formats, it specifies the maximum number  of  sig‐
              nificant digits.  For the %d, %i, %o, %u, %x, and %X formats, it
1157              specifies the minimum number of digits to  print.   For  %s,  it
1158              specifies  the maximum number of characters from the string that
1159              should be printed.
1160
1161       The dynamic width and prec capabilities of the ANSI C printf() routines
1162       are supported.  A * in place of either the width or prec specifications
1163       causes their values to be taken from the argument  list  to  printf  or
1164       sprintf().   To use a positional specifier with a dynamic width or pre‐
1165       cision, supply the count$ after the * in the format string.  For  exam‐
1166       ple, "%3$*2$.*1$s".
1167
1168   Special File Names
1169       When  doing I/O redirection from either print or printf into a file, or
1170       via getline from a file,  gawk  recognizes  certain  special  filenames
1171       internally.   These  filenames  allow  access  to open file descriptors
1172       inherited from gawk's parent process (usually the shell).   These  file
1173       names  may  also  be  used on the command line to name data files.  The
1174       filenames are:
1175
1176       /dev/stdin  The standard input.
1177
1178       /dev/stdout The standard output.
1179
1180       /dev/stderr The standard error output.
1181
1182       /dev/fd/n   The file associated with the open file descriptor n.
1183
1184       These are particularly useful for error messages.  For example:
1185
              print "You blew it!" > "/dev/stderr"
1187
1188       whereas you would otherwise have to use
1189
              print "You blew it!" | "cat 1>&2"
1191
1192       The following special filenames may be  used  with  the  |&  co-process
1193       operator for creating TCP/IP network connections:
1194
1195       /inet/tcp/lport/rhost/rport
1196       /inet4/tcp/lport/rhost/rport
1197       /inet6/tcp/lport/rhost/rport
1198              Files for a TCP/IP connection on local port lport to remote host
1199              rhost on remote port rport.  Use a port of 0 to have the  system
1200              pick a port.  Use /inet4 to force an IPv4 connection, and /inet6
1201              to force an  IPv6  connection.   Plain  /inet  uses  the  system
1202              default (most likely IPv4).
1203
1204       /inet/udp/lport/rhost/rport
1205       /inet4/udp/lport/rhost/rport
1206       /inet6/udp/lport/rhost/rport
1207              Similar, but use UDP/IP instead of TCP/IP.
1208
1209   Numeric Functions
1210       AWK has the following built-in arithmetic functions:
1211
1212       atan2(y, x)   Return the arctangent of y/x in radians.
1213
1214       cos(expr)     Return the cosine of expr, which is in radians.
1215
1216       exp(expr)     The exponential function.
1217
1218       int(expr)     Truncate to integer.
1219
1220       log(expr)     The natural logarithm function.
1221
1222       rand()        Return  a random number N, between 0 and 1, such that 0 ≤
1223                     N < 1.
1224
1225       sin(expr)     Return the sine of expr, which is in radians.
1226
1227       sqrt(expr)    The square root function.
1228
1229       srand([expr]) Use expr as the new seed for the random number generator.
1230                     If  no expr is provided, use the time of day.  The return
1231                     value is the previous seed for the random number  genera‐
1232                     tor.
1233
1234   String Functions
1235       Gawk has the following built-in string functions:
1236
1237       asort(s [, d [, how] ]) Return  the  number  of  elements in the source
1238                               array s.  Sort the contents of s  using  gawk's
1239                               normal  rules for comparing values, and replace
                               the indices of the sorted values of s with
                               sequential integers starting with 1.  If the
                               optional destination array d is specified, then
                               first  duplicate  s  into d, and then sort d,
                               leaving the indices of the source array s
                               unchanged.
1245                               The  optional string how controls the direction
1246                               and the comparison mode.  Valid values for  how
1247                               are    any    of    the   strings   valid   for
1248                               PROCINFO["sorted_in"].  It can also be the name
1249                               of   a   user-defined  comparison  function  as
1250                               described in PROCINFO["sorted_in"].
1251
1252       asorti(s [, d [, how] ])
1253                               Return the number of  elements  in  the  source
1254                               array  s.   The behavior is the same as that of
1255                               asort(), except that the array indices are used
1256                               for  sorting, not the array values.  When done,
1257                               the array is indexed numerically, and the  val‐
1258                               ues  are  those  of  the original indices.  The
1259                               original values are lost; thus provide a second
1260                               array  if  you  wish  to preserve the original.
1261                               The purpose of the optional string how  is  the
1262                               same as described in asort() above.
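
       The following sketch contrasts asort() and asorti() on one small
       array.  The array contents and the helper function join() are
       hypothetical; "@val_num_desc" is one of the strings accepted by
       PROCINFO["sorted_in"].

```shell
asort_demo=$(gawk '
# Hypothetical helper: join arr[1..n] with single spaces.
function join(arr, n,    i, out) {
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? " " : "") arr[i]
    return out
}
BEGIN {
    score["carol"] = 30; score["alice"] = 10; score["bob"] = 20

    n = asort(score, by_value)                  # sort the values, keep score intact
    print join(by_value, n)

    n = asorti(score, by_index)                 # sort the indices instead
    print join(by_index, n)

    n = asort(score, desc, "@val_num_desc")     # numeric values, descending
    print join(desc, n)
}')
printf '%s\n' "$asort_demo"
```

       Passing a destination array in each call leaves score itself
       unchanged, so it can be sorted three different ways.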
1263
1264       gensub(r, s, h [, t])   Search  the  target string t for matches of the
1265                               regular expression r.  If h is a string  begin‐
1266                               ning with g or G, then replace all matches of r
1267                               with s.  Otherwise, h is  a  number  indicating
1268                               which  match of r to replace.  If t is not sup‐
1269                               plied, use $0 instead.  Within the  replacement
1270                               text  s,  the  sequence  \n, where n is a digit
1271                               from 1 to 9, may be used to indicate  just  the
1272                               text that matched the n'th parenthesized subex‐
1273                               pression.   The  sequence  \0  represents   the
1274                               entire  matched  text, as does the character &.
1275                               Unlike sub() and gsub(), the modified string is
1276                               returned as the result of the function, and the
1277                               original target string is not changed.
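
       As an illustration of the rules above, the sketch below swaps two
       parenthesized groups, replaces only the second match, and shows
       that the target is untouched.  The sample string is made up.

```shell
gensub_demo=$(gawk 'BEGIN {
    s = "aabb ab"
    print gensub(/(a+)(b+)/, "<\\2\\1>", "g", s)   # swap the groups in every match
    print gensub(/b/, "B", 2, s)                   # replace only the second match
    print s                                        # the target string is unchanged
}')
printf '%s\n' "$gensub_demo"
```

       This should print <bbaa> <ba>, then aabB ab, then aabb ab.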
1278
1279       gsub(r, s [, t])        For each substring matching the regular expres‐
1280                               sion  r  in the string t, substitute the string
1281                               s, and return the number of substitutions.   If
1282                               t  is  not  supplied,  use  $0.   An  &  in the
1283                               replacement text is replaced with the text that
1284                               was  actually matched.  Use \& to get a literal
1285                               &.  (This must be typed  as  "\\&";  see  GAWK:
1286                               Effective  AWK Programming for a fuller discus‐
1287                               sion of the rules for &'s  and  backslashes  in
1288                               the replacement text of sub(), gsub(), and gen‐
1289                               sub().)
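
       A short sketch of & handling in the replacement text, with sub()
       shown alongside for comparison.  The strings are illustrative.

```shell
gsub_demo=$(gawk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)      # & stands for the text that matched
    print n, s

    t = "hello world"
    gsub(/o/, "\\&", t)          # typed as "\\&" to get a literal &
    print t

    u = "hello world"
    sub(/o/, "0", u)             # sub() stops after the first match
    print u
}')
printf '%s\n' "$gsub_demo"
```

       This should print 2 hell[o] w[o]rld, then hell& w&rld, then
       hell0 world.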
1290
1291       index(s, t)             Return the index of the string t in the  string
1292                               s,  or  0  if  t is not present.  (This implies
1293                               that character indices start at one.)
1294
1295       length([s])             Return the length  of  the  string  s,  or  the
1296                               length  of  $0 if s is not supplied.  As a non-
1297                               standard extension,  with  an  array  argument,
1298                               length()  returns the number of elements in the
1299                               array.
1300
1301       match(s, r [, a])       Return the position  in  s  where  the  regular
1302                               expression  r occurs, or 0 if r is not present,
1303                               and set the values of RSTART and RLENGTH.  Note
1304                               that  the argument order is the same as for the
1305                               ~ operator: str ~ re.  If array a is  provided,
1306                               a  is cleared and then elements 1 through n are
1307                               filled with the portions of s  that  match  the
1308                               corresponding parenthesized subexpression in r.
1309                               The 0'th element of a contains the portion of s
1310                               matched  by  the  entire  regular expression r.
1311                               Subscripts a[n, "start"] and a[n, "length"]
1312                               provide the starting index in the string and
1313                               the length, respectively, of each matching
1314                               substring.
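
       The sketch below exercises the three-argument form of match().
       The input string and the array name m are assumptions made for
       the example.

```shell
match_demo=$(gawk 'BEGIN {
    s = "key: foo=bar"
    pos = match(s, /([a-z]+)=([a-z]+)/, m)
    print pos, RSTART, RLENGTH             # where the whole match begins, and its length
    print m[0], m[1], m[2]                 # entire match, then each subexpression
    print m[2, "start"], m[2, "length"]    # position data for the second subexpression
}')
printf '%s\n' "$match_demo"
```

       This should print 6 6 7, then foo=bar foo bar, then 10 3.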
1315
1316       patsplit(s, a [, r [, seps] ])
1317                               Split  the  string  s  into the array a and the
1318                               separators array seps on the regular expression
1319                               r,  and  return  the number of fields.  Element
1320                               values are the portions of s  that  matched  r.
1321                               The  value  of  seps[i]  is  the separator that
1322                               appeared in front of a[i+1].  If r is  omitted,
1323                               FPAT  is  used  instead.  The arrays a and seps
1324                               are cleared first.  Splitting  behaves  identi‐
1325                               cally  to  field splitting with FPAT, described
1326                               above.
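
       A minimal sketch of patsplit(), where the regular expression
       describes the fields rather than the separators.  The sample
       string and array names are illustrative.

```shell
patsplit_demo=$(gawk 'BEGIN {
    n = patsplit("a1b22c333", num, /[0-9]+/, sep)
    print n
    for (i = 1; i <= n; i++)
        print sep[i-1] "|" num[i]     # sep[i-1] is the text in front of num[i]
}')
printf '%s\n' "$patsplit_demo"
```

       This should print 3 followed by a|1, b|22 and c|333.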
1327
1328       split(s, a [, r [, seps] ])
1329                               Split the string s into the  array  a  and  the
1330                               separators array seps on the regular expression
1331                               r, and return the number of fields.   If  r  is
1332                               omitted,  FS is used instead.  The arrays a and
1333                               seps are cleared first.  seps[i] is  the  field
1334                               separator matched by r between a[i] and a[i+1].
1335                               If r is a single space, then leading whitespace
1336                               in  s goes into the extra array element seps[0]
1337                               and trailing whitespace  goes  into  the  extra
1338                               array  element  seps[n],  where n is the return
1339                               value  of  split(s,  a,  r,  seps).   Splitting
1340                               behaves   identically   to   field   splitting,
1341                               described above.
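
       The following sketch shows split() with a regular expression
       separator and with a single space.  The input strings and array
       names are made up for the example.

```shell
split_demo=$(gawk 'BEGIN {
    n = split("2011-06-15T12:30", part, /[-T:]/, sep)
    print n, part[1], part[5]
    print sep[1] sep[2] sep[3] sep[4]          # the separators, in order

    m = split("  a b  ", word, " ", ws)        # single space: whitespace splitting
    print m, length(ws[0]), length(ws[m])      # leading and trailing whitespace
}')
printf '%s\n' "$split_demo"
```

       This should print 5 2011 30, then --T:, then 2 2 2.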
1342
1343       sprintf(fmt, expr-list) Format expr-list according to fmt, and return
1344                               the resulting string.  Nothing is printed.
1345
1346       strtonum(str)           Examine  str, and return its numeric value.  If
1347                               str begins with a leading 0, strtonum() assumes
1348                               that  str  is  an  octal number.  If str begins
1349                               with a leading 0x  or  0X,  strtonum()  assumes
1350                               that  str  is a hexadecimal number.  Otherwise,
1351                               decimal is assumed.
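
       A brief sketch of strtonum() on the three kinds of input it
       recognizes, with sprintf() shown for the reverse direction.  The
       values are arbitrary.

```shell
strtonum_demo=$(gawk 'BEGIN {
    print strtonum("0x1F"), strtonum("017"), strtonum("17")   # hex, octal, decimal
    s = sprintf("%04d|%x|%.2f", 42, 255, 3.14159)             # returned, not printed
    print s
}')
printf '%s\n' "$strtonum_demo"
```

       This should print 31 15 17, then 0042|ff|3.14.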
1352
1353       sub(r, s [, t])         Just like gsub(), but replace  only  the  first
1354                               matching substring.
1355
1356       substr(s, i [, n])      Return  the  at most n-character substring of s
1357                               starting at i.  If n is omitted, use  the  rest
1358                               of s.
1359
1360       tolower(str)            Return  a  copy of the string str, with all the
1361                               uppercase characters in str translated to their
1362                               corresponding   lowercase  counterparts.   Non-
1363                               alphabetic characters are left unchanged.
1364
1365       toupper(str)            Return a copy of the string str, with  all  the
1366                               lowercase characters in str translated to their
1367                               corresponding  uppercase  counterparts.    Non-
1368                               alphabetic characters are left unchanged.
1369
1370       Gawk  is  multibyte aware.  This means that index(), length(), substr()
1371       and match() all work in terms of characters, not bytes.
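
       The character-position functions can be sketched as follows.
       Positions start at one; the string and array are illustrative.

```shell
position_demo=$(gawk 'BEGIN {
    s = "foobar"
    print index(s, "bar"), index(s, "xyz"), length(s)
    print substr(s, 4), substr(s, 2, 3)     # rest of string; at most 3 characters
    arr["a"] = 1; arr["b"] = 2
    print length(arr)                       # element count (non-standard extension)
}')
printf '%s\n' "$position_demo"
```

       This should print 4 0 6, then bar oob, then 2.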
1372
1373   Time Functions
1374       Since one of the primary uses of AWK programs is processing  log  files
1375       that  contain time stamp information, gawk provides the following func‐
1376       tions for obtaining time stamps and formatting them.
1377
1378       mktime(datespec)
1379                 Turn datespec into a time stamp of the same form as  returned
1380                 by  systime(),  and  return  the  result.   The datespec is a
1381                 string of the form YYYY MM DD HH MM SS[ DST].   The  contents
1382                 of  the  string are six or seven numbers representing respec‐
1383                 tively the full year including century, the month from  1  to
1384                 12,  the  day  of the month from 1 to 31, the hour of the day
1385                 from 0 to 23, the minute from 0 to 59, the second from  0  to
1386                 60,  and  an  optional  daylight  saving flag.  The values of
1387                 these numbers need not be within the  ranges  specified;  for
1388                 example,  an  hour  of  -1 means 1 hour before midnight.  The
1389                 origin-zero Gregorian calendar is assumed, with year  0  pre‐
1390                 ceding  year  1  and  year  -1 preceding year 0.  The time is
1391                 assumed to be in the local timezone.  If the daylight  saving
1392                 flag  is  positive, the time is assumed to be daylight saving
1393                 time; if zero, the time is assumed to be standard  time;  and
1394                 if  negative  (the  default),  mktime() attempts to determine
1395                 whether daylight saving time is in effect for  the  specified
1396                 time.  If datespec does not contain enough elements or if the
1397                 resulting time is out of range, mktime() returns -1.
1398
1399       strftime([format [, timestamp[, utc-flag]]])
1400                 Format timestamp according to the  specification  in  format.
1401                 If  utc-flag  is  present  and  is  non-zero or non-null, the
1402                 result is in UTC, otherwise the result is in local time.  The
1403                 timestamp  should  be  of  the  same form as returned by sys‐
1404                 time().  If timestamp is missing, the current time of day  is
1405                 used.   If  format is missing, a default format equivalent to
1406                 the output of date(1) is used.  The default format is  avail‐
1407                 able  in PROCINFO["strftime"].  See the specification for the
1408                 strftime() function in ANSI C for the format conversions that
1409                 are guaranteed to be available.
1410
1411       systime() Return the current time of day as the number of seconds since
1412                 the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
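
       A sketch of the three time functions working together.  TZ=UTC
       is set only to make the result independent of the local
       timezone; the dates are arbitrary.

```shell
time_demo=$(TZ=UTC gawk 'BEGIN {
    fmt = "%Y-%m-%d %H:%M:%S"
    t = mktime("2011 06 15 12 30 00")
    print strftime(fmt, t, 1)                                # utc-flag set
    print strftime(fmt, mktime("2011 06 15 25 00 00"), 1)    # hour 25 rolls over
    print mktime("2011 06")                                  # too few elements
    print (systime() > t ? "now is later" : "clock is off")
}')
printf '%s\n' "$time_demo"
```

       This should print 2011-06-15 12:30:00, 2011-06-16 01:00:00, -1,
       and, on a system with a sane clock, now is later.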
1413
1414   Bit Manipulation Functions
1415       Gawk supplies the following bit manipulation functions.  They  work  by
1416       converting  double-precision  floating  point values to uintmax_t inte‐
1417       gers, doing the operation, and  then  converting  the  result  back  to
1418       floating point.  The functions are:
1419
1420       and(v1, v2)         Return the bitwise AND of the values provided by v1
1421                           and v2.
1422
1423       compl(val)          Return the bitwise complement of val.
1424
1425       lshift(val, count)  Return the value of  val,  shifted  left  by  count
1426                           bits.
1427
1428       or(v1, v2)          Return  the bitwise OR of the values provided by v1
1429                           and v2.
1430
1431       rshift(val, count)  Return the value of val,  shifted  right  by  count
1432                           bits.
1433
1434       xor(v1, v2)         Return the bitwise XOR of the values provided by v1
1435                           and v2.
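
       A small sketch of the bitwise functions on small non-negative
       integers.  compl() is left out because its result depends on the
       integer width used internally.

```shell
bits_demo=$(gawk 'BEGIN {
    print and(12, 10), or(12, 10), xor(12, 10)    # 1100 and 1010 in binary
    print lshift(1, 4), rshift(256, 4)
}')
printf '%s\n' "$bits_demo"
```

       This should print 8 14 6, then 16 16.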
1436
1437   Type Function
1438       The following function is for use with multidimensional arrays.
1439
1440       isarray(x)
1441              Return true if x is an array, false otherwise.
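
       One possible use of isarray() is walking an array whose elements
       may themselves be arrays.  The function count_leaves() and the
       array t are hypothetical names for this sketch.

```shell
isarray_demo=$(gawk '
# Count the scalar elements reachable from arr, descending into subarrays.
function count_leaves(arr,    k, n) {
    for (k in arr) {
        if (isarray(arr[k]))
            n += count_leaves(arr[k])
        else
            n++
    }
    return n + 0
}
BEGIN {
    t["a"][1] = "x"; t["a"][2] = "y"; t["b"] = "z"
    print isarray(t), isarray(t["b"]), count_leaves(t)
}')
printf '%s\n' "$isarray_demo"
```

       This should print 1 0 3.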
1442
1443   Internationalization Functions
1444       The following functions may be used from within your  AWK  program  for
1445       translating strings at run-time.  For full details, see GAWK: Effective
1446       AWK Programming.
1447
1448       bindtextdomain(directory [, domain])
1449              Specify the directory where gawk looks for  the  .mo  files,  in
1450              case they will not or cannot be placed in the ``standard'' loca‐
1451              tions (e.g., during testing).  It returns  the  directory  where
1452              domain is ``bound.''
1453              The  default domain is the value of TEXTDOMAIN.  If directory is
1454              the null string (""), then bindtextdomain() returns the  current
1455              binding for the given domain.
1456
1457       dcgettext(string [, domain [, category]])
1458              Return  the  translation  of  string  in  text domain domain for
1459              locale category category.  The default value for domain  is  the
1460              current  value of TEXTDOMAIN.  The default value for category is
1461              "LC_MESSAGES".
1462              If you supply a value for category, it must be a string equal to
1463              one  of the known locale categories described in GAWK: Effective
1464              AWK Programming.  You must  also  supply  a  text  domain.   Use
1465              TEXTDOMAIN if you want to use the current domain.
1466
1467       dcngettext(string1 , string2 , number [, domain [, category]])
1468              Return  the  plural  form  used for number of the translation of
1469              string1 and string2 in text domain domain  for  locale  category
1470              category.   The default value for domain is the current value of
1471              TEXTDOMAIN.  The default value for category is "LC_MESSAGES".
1472              If you supply a value for category, it must be a string equal to
1473              one  of the known locale categories described in GAWK: Effective
1474              AWK Programming.  You must  also  supply  a  text  domain.   Use
1475              TEXTDOMAIN if you want to use the current domain.
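
       A sketch of dcgettext() and dcngettext() when no translation
       catalog is installed, in which case the original strings come
       back.  The text domain "myprog" is a placeholder, and LC_ALL=C
       is set only to keep the result predictable.

```shell
plural_demo=$(LC_ALL=C gawk 'BEGIN {
    for (n = 1; n <= 2; n++)
        printf dcngettext("%d file\n", "%d files\n", n, "myprog"), n
    print dcgettext("hello, world", "myprog")
}')
printf '%s\n' "$plural_demo"
```

       With no catalog for the domain, this should print 1 file,
       2 files, and hello, world.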
1476

USER-DEFINED FUNCTIONS

1478       Functions in AWK are defined as follows:
1479
1480              function name(parameter list) { statements }
1481
1482       Functions  are executed when they are called from within expressions in
1483       either patterns or actions.  Actual parameters supplied in the function
1484       call  are  used  to  instantiate  the formal parameters declared in the
1485       function.  Arrays are passed by reference; other variables are passed
1486       by value.
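
       The difference between the two passing conventions can be
       sketched as follows.  The function bump() and the variable names
       are invented for the example.

```shell
param_demo=$(gawk '
function bump(arr, n) {
    arr["count"]++      # visible to the caller: arrays are passed by reference
    n++                 # not visible to the caller: scalars are passed by value
}
BEGIN {
    total = 5
    bump(stats, total); bump(stats, total)
    print stats["count"], total
}')
printf '%s\n' "$param_demo"
```

       This should print 2 5.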
1487
1488       Since  functions were not originally part of the AWK language, the pro‐
1489       vision for local variables is rather clumsy: They are declared as extra
1490       parameters  in the parameter list.  The convention is to separate local
1491       variables from real parameters by extra spaces in the  parameter  list.
1492       For example:
1493
1494              function  f(p, q,     a, b)   # a and b are local
1495              {
1496                   ...
1497              }
1498
1499              /abc/     { ... ; f(1, 2) ; ... }
1500
1501       The left parenthesis in a function call is required to immediately fol‐
1502       low the function name, without any intervening whitespace.  This avoids
1503       a  syntactic  ambiguity with the concatenation operator.  This restric‐
1504       tion does not apply to the built-in functions listed above.
1505
1506       Functions may call each other and may be recursive.   Function  parame‐
1507       ters used as local variables are initialized to the null string and the
1508       number zero upon function invocation.
1509
1510       Use return expr to return a value from a function.  The return value is
1511       undefined if no value is provided, or if the function returns by “fall‐
1512       ing off” the end.
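
       A minimal sketch of a recursive function that uses a local
       variable and an explicit return.  The function name fact() is
       illustrative.

```shell
fact_demo=$(gawk '
function fact(n,    result) {     # result is local: declared after extra spaces
    if (n <= 1)
        return 1
    result = n * fact(n - 1)      # each invocation gets its own result
    return result
}
BEGIN { print fact(5), fact(10) }')
printf '%s\n' "$fact_demo"
```

       This should print 120 3628800.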
1513
1514       As a gawk extension, functions may be called indirectly.  To  do  this,
1515       assign  the  name of the function to be called, as a string, to a vari‐
1516       able.  Then use the variable as if it were the name of a function, pre‐
1517       fixed with an @ sign, like so:
1518              function  myfunc()
1519              {
1520                   print "myfunc called"
1521                   ...
1522              }
1523
1524              {    ...
1525                   the_func = "myfunc"
1526                   @the_func()    # call through the_func to myfunc
1527                   ...
1528              }
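
       Indirect calls can also pass arguments and return values.  The
       sketch below picks a function by name at run time; add(), mul()
       and the variable op are hypothetical.

```shell
indirect_demo=$(gawk '
function add(a, b) { return a + b }
function mul(a, b) { return a * b }
BEGIN {
    op = "add"; r = @op(3, 4); print op, r
    op = "mul"; r = @op(3, 4); print op, r
}')
printf '%s\n' "$indirect_demo"
```

       This should print add 7, then mul 12.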
1529
1530       If  --lint has been provided, gawk warns about calls to undefined func‐
1531       tions at parse time, instead of at  run  time.   Calling  an  undefined
1532       function at run time is a fatal error.
1533
1534       The word func may be used in place of function.
1535

DYNAMICALLY LOADING NEW FUNCTIONS

1537       You  can  dynamically  add  new  built-in functions to the running gawk
1538       interpreter.  The full details are beyond  the  scope  of  this  manual
1539       page; see GAWK: Effective AWK Programming for the details.
1540
1541       extension(object, function)
1542               Dynamically  link  the  shared object file named by object, and
1543               invoke function in  that  object,  to  perform  initialization.
1544               These  should  both  be  provided as strings.  Return the value
1545               returned by function.
1546
1547       Using this feature at the C level is not pretty, but it is unlikely  to
1548       go away. Additional mechanisms may be added at some point.
1549

SIGNALS

1551       Pgawk  accepts  two  signals.   SIGUSR1 causes it to dump a profile and
1552       function call stack to the profile file, which is  either  awkprof.out,
1553       or  whatever file was named with the --profile option.  It then contin‐
1554       ues to run.  SIGHUP causes pgawk to dump the profile and function  call
1555       stack and then exit.
1556

INTERNATIONALIZATION

1558       String constants are sequences of characters enclosed in double quotes.
1559       In non-English speaking environments, it is possible to mark strings in
1560       the AWK program as requiring translation to the local natural language.
1561       Such strings are marked in the AWK program with  a  leading  underscore
1562       (“_”).  For example,
1563
1564              gawk 'BEGIN { print "hello, world" }'
1565
1566       always prints hello, world.  But,
1567
1568              gawk 'BEGIN { print _"hello, world" }'
1569
1570       might print bonjour, monde in France.
1571
1572       There are several steps involved in producing and running a localizable
1573       AWK program.
1574
1575       1.  Add a BEGIN action to assign a value to the TEXTDOMAIN variable  to
1576           set the text domain to a name associated with your program:
1577
1578           BEGIN { TEXTDOMAIN = "myprog" }
1579
1580       This  allows  gawk  to  find the .mo file associated with your program.
1581       Without this step, gawk uses the messages  text  domain,  which  likely
1582       does not contain translations for your program.
1583
1584       2.  Mark  all  strings  that  should  be translated with leading under‐
1585           scores.
1586
1587       3.  If necessary, use the dcgettext() and/or bindtextdomain() functions
1588           in your program, as appropriate.
1589
1590       4.  Run  gawk  --gen-pot  -f  myprog.awk > myprog.pot to generate a .pot
1591           (portable object template) file for your program.
1592
1593       5.  Provide appropriate translations, and build and install the  corre‐
1594           sponding .mo files.
1595
1596       The internationalization features are described in full detail in GAWK:
1597       Effective AWK Programming.
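
       Steps 1, 2 and 4 above can be sketched as follows.  The file
       name myprog.awk and the text domain "myprog" are placeholders,
       and the temporary directory exists only for the demonstration.

```shell
workdir=$(mktemp -d)
cat > "$workdir/myprog.awk" <<'EOF'
BEGIN {
    TEXTDOMAIN = "myprog"        # step 1: name the text domain
    print _"hello, world"        # step 2: mark the string for translation
}
EOF
# Without an installed .mo file the original string is printed.
i18n_out=$(LC_ALL=C gawk -f "$workdir/myprog.awk")
# Step 4: extract the marked strings as a template for translators.
pot_out=$(cd "$workdir" && gawk --gen-pot -f myprog.awk)
rm -rf "$workdir"
printf '%s\n' "$i18n_out"
printf '%s\n' "$pot_out"
```

       The template output should contain a msgid entry for the marked
       string, ready to be translated and compiled into a .mo file.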
1598

POSIX COMPATIBILITY

1600       A primary goal for gawk is compatibility with the  POSIX  standard,  as
1601       well  as with the latest version of UNIX awk.  To this end, gawk incor‐
1602       porates the following user visible features which are not described  in
1603       the AWK book, but are part of the Bell Laboratories version of awk, and
1604       are in the POSIX standard.
1605
1606       The book indicates that command line variable assignment  happens  when
1607       awk  would  otherwise  open  the argument as a file, which is after the
1608       BEGIN block is executed.  However,  in  earlier  implementations,  when
1609       such an assignment appeared before any file names, the assignment would
1610       happen before the BEGIN block was run.  Applications came to depend  on
1611       this  “feature.”   When awk was changed to match its documentation, the
1612       -v option for assigning variables before program execution was added to
1613       accommodate  applications  that  depended upon the old behavior.  (This
1614       feature was agreed upon by both  the  Bell  Laboratories  and  the  GNU
1615       developers.)
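
       The timing difference between -v and a command line assignment
       can be sketched like this.  The variable x is arbitrary, and
       /dev/null serves as an empty input file.

```shell
# With -v the assignment happens before the BEGIN block runs.
with_v=$(gawk -v x=1 'BEGIN { print "x=" x }')
# Without -v it happens when the argument is reached, after BEGIN.
without_v=$(gawk 'BEGIN { print "x=" x } END { print "x=" x }' x=1 /dev/null)
printf '%s\n' "$with_v" "$without_v"
```

       The first command should print x=1; the second should print x=
       from BEGIN and x=1 from END.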
1616
1617       When  processing arguments, gawk uses the special option “--” to signal
1618       the end of arguments.  In compatibility mode, it warns about but other‐
1619       wise  ignores  undefined  options.  In normal operation, such arguments
1620       are passed on to the AWK program for it to process.
1621
1622       The AWK book does not define the return value of  srand().   The  POSIX
1623       standard has it return the seed it was using, to allow keeping track of
1624       random number sequences.  Therefore srand() in gawk  also  returns  its
1625       current seed.
1626
1627       Other  new features are: The use of multiple -f options (from MKS awk);
1628       the ENVIRON array; the \a and \v escape sequences (done originally  in
1629       gawk  and  fed  back into the Bell Laboratories version); the tolower()
1630       and toupper() built-in functions (from the Bell Laboratories  version);
1631       and  the  ANSI C conversion specifications in printf (done first in the
1632       Bell Laboratories version).
1633

HISTORICAL FEATURES

1635       There is one feature of historical AWK implementations that  gawk  sup‐
1636       ports:  It  is possible to call the length() built-in function not only
1637       with no argument, but even without parentheses!  Thus,
1638
1639              a = length     # Holy Algol 60, Batman!
1640
1641       is the same as either of
1642
1643              a = length()
1644              a = length($0)
1645
1646       Using this feature is poor practice, and gawk issues  a  warning  about
1647       its use if --lint is specified on the command line.
1648

GNU EXTENSIONS

1650       Gawk  has  a  number of extensions to POSIX awk.  They are described in
1651       this section.  All the extensions described here  can  be  disabled  by
1652       invoking gawk with the --traditional or --posix options.
1653
1654       The following features of gawk are not available in POSIX awk.
1655
1656       · The path search for files named via the -f option, controlled by
1657         the AWKPATH environment variable.
1658
1659       · The @include mechanism for file inclusion.
1661
1662       · The \x escape sequence.  (Disabled with --posix.)
1663
1664       · The  ability  to  continue  lines  after  ?   and  :.  (Disabled with
1665         --posix.)
1666
1667       · Octal and hexadecimal constants in AWK programs.
1668
1669       · The special variables ARGIND, BINMODE, ERRNO, LINT, RT and
1670         TEXTDOMAIN.
1671
1672       · The IGNORECASE variable and its side-effects.
1673
1674       · The FIELDWIDTHS variable and fixed-width field splitting.
1675
1676       · The FPAT variable and field splitting based on field values.
1677
1678       · The PROCINFO array.
1679
1680       · The use of RS as a regular expression.
1681
1682       · The special file names available for I/O redirection.
1684
1685       · The |& operator for creating co-processes.
1686
1687       · The BEGINFILE and ENDFILE special patterns.
1688
1689       · The ability to split out individual characters using the null  string
1690         as the value of FS, and as the third argument to split().
1691
1692       · An  optional  fourth  argument  to  split()  to receive the separator
1693         texts.
1694
1695       · The optional second argument to the close() function.
1696
1697       · The optional third argument to the match() function.
1698
1699       · The ability to use positional specifiers with printf and sprintf().
1700
1701       · The ability to pass an array to length().
1702
1703       · The use of delete array to delete the entire contents of an array.
1704
1705       · The use of nextfile to abandon processing of the current input file.
1706
1707       · The and(), asort(), asorti(), bindtextdomain(), compl(), dcgettext(),
1708         dcngettext(),   gensub(),   lshift(),   mktime(),  or(),  patsplit(),
1709         rshift(), strftime(), strtonum(), systime() and xor() functions.
1710
1711       · Localizable strings.
1712
1713       · Adding new built-in functions dynamically with the extension()  func‐
1714         tion.
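
       A few of the extensions listed above can be sketched in one
       short program: the null string as a separator, length() on an
       array, and delete on a whole array.  The names are illustrative.

```shell
ext_demo=$(gawk 'BEGIN {
    n = split("abc", ch, "")        # null separator: one element per character
    print n, ch[1], ch[3], length(ch)
    delete ch                       # remove every element at once
    print length(ch)
}')
printf '%s\n' "$ext_demo"
```

       This should print 3 a c 3, then 0.  Run with --posix or
       --traditional, the same program is expected to behave
       differently or be rejected.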
1715
1716       The  AWK book does not define the return value of the close() function.
1717       Gawk's close() returns the value from  fclose(3),  or  pclose(3),  when
1718       closing an output file or pipe, respectively.  It returns the process's
1719       exit status when closing an input pipe.  The return value is -1 if  the
1720       named file, pipe or co-process was not opened with a redirection.
1721
1722       When  gawk is invoked with the --traditional option, if the fs argument
1723       to the -F option is “t”, then FS is set to  the  tab  character.   Note
1724       that  typing  gawk  -F\t ...  simply causes the shell to quote the “t,”
1725       and does not pass “\t” to the -F option.  Since this is a  rather  ugly
1726       special  case, it is not the default behavior.  This behavior also does
1727       not occur if --posix has been specified.  To really get a tab character
1728       as  the  field  separator, it is best to use single quotes: gawk -F'\t'
1729       ....
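
       The quoted form can be sketched as follows; the input line is
       made up for the example.

```shell
# The single quotes keep the backslash away from the shell, so gawk sees \t.
tab_demo=$(printf 'one\ttwo\tthree\n' | gawk -F'\t' '{ print $2 }')
printf '%s\n' "$tab_demo"
```

       This should print two.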
1730

ENVIRONMENT VARIABLES

1732       The AWKPATH environment variable can be  used  to  provide  a  list  of
1733       directories  that gawk searches when looking for files named via the -f
1734       and --file options.
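
       A sketch of the AWKPATH search.  The temporary directory and the
       file name hello.awk are placeholders; the program is run from /
       so that it can only be found through AWKPATH.

```shell
libdir=$(mktemp -d)
printf 'BEGIN { print "found via AWKPATH" }\n' > "$libdir/hello.awk"
# hello.awk is not in the current directory, so gawk must search AWKPATH.
awkpath_demo=$(cd / && AWKPATH="$libdir" gawk -f hello.awk)
rm -rf "$libdir"
printf '%s\n' "$awkpath_demo"
```

       This should print found via AWKPATH.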
1735
1736       For socket communication, two special environment variables can be used
1737       to  control the number of retries (GAWK_SOCK_RETRIES), and the interval
1738       between retries (GAWK_MSEC_SLEEP).  The interval is in milliseconds. On
1739       systems  that  do  not support usleep(3), the value is rounded up to an
1740       integral number of seconds.
1741
1742       If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly
1743       as  if  --posix  had been specified on the command line.  If --lint has
1744       been specified, gawk issues a warning message to this effect.
1745

EXIT STATUS

1747       If the exit statement is used with a value, then gawk  exits  with  the
1748       numeric value given to it.
1749
1750       Otherwise,  if there were no problems during execution, gawk exits with
1751       the value of the C constant EXIT_SUCCESS.  This is usually zero.
1752
1753       If an error occurs, gawk  exits  with  the  value  of  the  C  constant
1754       EXIT_FAILURE.  This is usually one.
1755
1756       If  gawk exits because of a fatal error, the exit status is 2.  On non-
1757       POSIX systems, this value may be mapped to EXIT_FAILURE.
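
       The first two cases can be checked from the shell as sketched
       below.  The value 3 is arbitrary, and the second result assumes
       a system where EXIT_SUCCESS is zero.

```shell
gawk 'BEGIN { exit 3 }'         # exit statement with a value
explicit_status=$?
gawk 'BEGIN { }'                # no exit statement, no problems
default_status=$?
echo "$explicit_status $default_status"
```

       On such a system this should print 3 0.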
1758

VERSION INFORMATION

1760       This man page documents gawk, version 4.0.
1761

AUTHORS

1763       The original version of UNIX awk was designed and implemented by Alfred
1764       Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories.  Brian
1765       Kernighan continues to maintain and enhance it.
1766
1767       Paul Rubin and Jay Fenlason, of the  Free  Software  Foundation,  wrote
1768       gawk,  to be compatible with the original version of awk distributed in
1769       Seventh Edition UNIX.  John Woods contributed a number  of  bug  fixes.
1770       David  Trueman,  with contributions from Arnold Robbins, made gawk com‐
1771       patible with the new version of UNIX awk.  Arnold Robbins is  the  cur‐
1772       rent maintainer.
1773
1774       The  initial  DOS  port  was  done  by Conrad Kwok and Scott Garfinkle.
1775       Scott Deifik maintains the port to MS-DOS using DJGPP.   Eli  Zaretskii
1776       maintains  the port to MS-Windows using MinGW.  Pat Rankin did the port
1777       to VMS, and Michal Jaegermann did the port to the Atari ST.   The  port
1778       to  OS/2  was  done by Kai Uwe Rommel, with contributions and help from
1779       Darrel Hankerson.  Andreas Buening now maintains the  OS/2  port.   The
1780       late  Fred  Fish  supplied support for the Amiga, and Martin Brown pro‐
1781       vided the BeOS port.  Stephen Davies provided the original Tandem port,
1782       and  Matthew Woehlke provided changes for Tandem's POSIX-compliant sys‐
1783       tems.  Dave Pitts provided the port to z/OS.
1784
1785       See the README file in the gawk distribution for up-to-date information
1786       about maintainers and which ports are currently supported.
1787

BUG REPORTS

1789       If  you  find  a  bug  in  gawk,  please  send  electronic mail to bug-
1790       gawk@gnu.org.  Please include your operating system and  its  revision,
1791       the version of gawk (from gawk --version), which C compiler you used to
1792       compile it, and a test program and data that are as small  as  possible
1793       for reproducing the problem.
1794
1795       Before  sending  a  bug report, please do the following things.  First,
1796       verify that you have the latest version of gawk.   Many  bugs  (usually
1797       subtle  ones)  are  fixed at each release, and if yours is out of date,
1798       the problem may already have been solved.  Second, please see if
1799       setting the environment variable LC_ALL to C causes things to
1800       behave as you expect.  If so, it's a locale issue, and may or may not
1801       really  be a bug.  Finally, please read this man page and the reference
1802       manual carefully to be sure that what you think is  a  bug  really  is,
1803       instead of just a quirk in the language.
1804
1805       Whatever  you do, do NOT post a bug report in comp.lang.awk.  While the
1806       gawk developers occasionally read this newsgroup, posting  bug  reports
1807       there  is  an  unreliable  way to report bugs.  Instead, please use the
1808       electronic mail address given above.
1809
1810       If you're using a GNU/Linux or BSD-based system, you may wish to submit
1811       a  bug  report  to  the  vendor of your distribution.  That's fine, but
1812       please send a copy to the official email address as well, since there's
1813       no  guarantee  that  the bug report will be forwarded to the gawk main‐
1814       tainer.
1815

BUGS

1817       The -F option is not necessary given the command line variable  assign‐
1818       ment feature; it remains only for backwards compatibility.
1819
1820       Syntactically  invalid  single  character programs tend to overflow the
1821       parse stack, generating a rather unhelpful message.  Such programs  are
1822       surprisingly  difficult to diagnose in the completely general case, and
1823       the effort to do so really is not worth it.
1824

SEE ALSO

       egrep(1), getpid(2),  getppid(2),  getpgrp(2),  getuid(2),  geteuid(2),
       getgid(2), getegid(2), getgroups(2), usleep(3)

       The  AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter
       J. Weinberger, Addison-Wesley, 1988.  ISBN 0-201-07981-X.

       GAWK: Effective AWK Programming, Edition 4.0,  shipped  with  the  gawk
       source.   The  current  version of this document is available online at
       http://www.gnu.org/software/gawk/manual.

EXAMPLES

       Print and sort the login names of all users:

            BEGIN { FS = ":" }
                  { print $1 | "sort" }
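
       The program above expects input in /etc/passwd format.  A sketch of
       one way to run it from the shell; the sample records are made up for
       illustration, and on a real system the input file would typically be
       /etc/passwd:

```shell
# Made-up records in /etc/passwd format (an assumption for illustration).
users=$(mktemp)
{
    printf 'root:x:0:0:root:/root:/bin/sh\n'
    printf 'daemon:x:1:1::/:/bin/false\n'
    printf 'alice:x:1000:1000::/home/alice:/bin/sh\n'
} > "$users"

# Print the first colon-separated field of each record, sorted.
logins=$(awk 'BEGIN { FS = ":" } { print $1 | "sort" }' "$users")

rm -f "$users"
printf '%s\n' "$logins"
```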
       
       Count lines in a file:

                { nlines++ }
            END { print nlines }
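
       One detail of the line counter is worth seeing in action: with empty
       input, nlines is never assigned, so print nlines prints an empty line
       rather than 0.  A sketch of the usual remedy, adding 0 to force a
       numeric value (the sample input is made up):

```shell
# Empty input: nlines + 0 yields 0 instead of an empty string.
empty_count=$(awk '{ nlines++ } END { print nlines + 0 }' < /dev/null)

# Three lines of made-up input, counted with the program as given above.
count=$(printf 'a\nb\nc\n' | awk '{ nlines++ } END { print nlines }')

echo "$empty_count $count"   # prints "0 3"
```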
       
       Precede each line by its number in the file:

            { print FNR, $0 }

       Concatenate the input files and number the lines consecutively (a
       variation on a theme):

            { print NR, $0 }
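
       The difference between the two programs only shows up with more than
       one input file: FNR restarts at 1 for each file, while NR keeps
       counting.  A short sketch with two made-up files in a temporary
       directory:

```shell
# Two small made-up input files.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n' > "$dir/two"

# FNR restarts at 1 for each file; NR keeps counting across files.
by_file=$(awk '{ print FNR, $0 }' "$dir/one" "$dir/two")
overall=$(awk '{ print NR, $0 }' "$dir/one" "$dir/two")

rm -rf "$dir"
printf '%s\n---\n%s\n' "$by_file" "$overall"
```

       Here the line c is numbered 1 by the first program and 3 by the
       second.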
       
       Run an external command for particular lines of data:

            tail -f access_log |
            awk '/myhome\.html/ { system("nmap " $1 " >> logdir/myhome.html") }'
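
       Because system() hands its argument to the shell, a field taken
       straight from the input can be interpreted as shell syntax.  One
       common precaution is to single-quote the field first.  The sketch
       below uses a hypothetical helper, shquote(), which is not part of
       gawk, and substitutes echo for nmap so that it can be run safely; the
       input line is made up:

```shell
prog='
# shquote() is a hypothetical helper, not a gawk built-in: wrap s in
# single quotes, writing each embedded single quote (\047) as the four
# characters quote, backslash, quote, quote.
function shquote(s,    n, parts, i, q) {
    n = split(s, parts, "\047")
    q = "\047" parts[1]
    for (i = 2; i <= n; i++)
        q = q "\047\\\047\047" parts[i]
    return q "\047"
}
/myhome\.html/ { system("echo " shquote($1)) }'

# The first field contains a semicolon; quoted, it reaches echo as data
# instead of ending the command and starting a new one.
out=$(printf '%s\n' 'example.org;id GET /myhome.html' | awk "$prog")

printf '%s\n' "$out"
```

       Without the quoting step, the shell would treat the text after the
       semicolon as a second command.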

ACKNOWLEDGEMENTS

       Brian Kernighan of Bell Laboratories provided valuable assistance  dur‐
       ing testing and debugging.  We thank him.

COPYING PERMISSIONS

       Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
       2001, 2002, 2003, 2004, 2005, 2007,  2009,  2010,  2011  Free  Software
       Foundation, Inc.

       Permission  is  granted  to make and distribute verbatim copies of this
       manual page provided the copyright notice and  this  permission  notice
       are preserved on all copies.

       Permission  is granted to copy and distribute modified versions of this
       manual page under the conditions for verbatim  copying,  provided  that
       the  entire  resulting derived work is distributed under the terms of a
       permission notice identical to this one.

       Permission is granted to copy and distribute translations of this  man‐
       ual page into another language, under the above conditions for modified
       versions, except that this permission notice may be stated in a  trans‐
       lation approved by the Foundation.



Free Software Foundation          Dec 07 2012                          GAWK(1)