MAWK(1)                     General Commands Manual                    MAWK(1)

NAME

       mawk - pattern scanning and text processing language

SYNOPSIS

       mawk [-W option] [-F value] [-v var=value] [--] 'program text'
            [file ...]
       mawk [-W option] [-F value] [-v var=value] [-f program-file] [--]
            [file ...]

DESCRIPTION

       mawk is an interpreter for the AWK Programming Language.  The AWK
       language is useful for manipulation of data files, text retrieval and
       processing, and for prototyping and experimenting with algorithms.
       mawk is a new awk, meaning it implements the AWK language as defined in
       Aho, Kernighan and Weinberger, The AWK Programming Language,
       Addison-Wesley Publishing, 1988 (hereafter referred to as the AWK
       book).  mawk conforms to the Posix 1003.2 (draft 11.3) definition of
       the AWK language, which contains a few features not described in the
       AWK book, and mawk provides a small number of extensions.

       An AWK program is a sequence of pattern {action} pairs and function
       definitions.  Short programs are entered on the command line, usually
       enclosed in ' ' to avoid shell interpretation.  Longer programs can be
       read in from a file with the -f option.  Data input is read from the
       list of files on the command line or from standard input when the list
       is empty.  The input is broken into records as determined by the record
       separator variable, RS.  Initially, RS = "\n" and records are
       synonymous with lines.  Each record is compared against each pattern
       and, if it matches, the program text for {action} is executed.

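       A minimal sketch of the pattern {action} model described above.  It
       assumes a POSIX-compatible awk is installed as awk (substitute mawk to
       run it under mawk); the sample input is made up for illustration.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
# Three input records (lines); the pattern /error/ selects two of them and
# the action prints the record number NR and the first field.
out=$(printf 'a error 1\nb ok 2\nc error 3\n' |
      awk '/error/ { print NR ": " $1 }')
printf '%s\n' "$out"
```

       Records 1 and 3 match the pattern, so only their actions run.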

OPTIONS

       -F value       sets the field separator, FS, to value.

       -f file        Program text is read from file instead of from the
                      command line.  Multiple -f options are allowed.

       -v var=value   assigns value to program variable var.

       --             indicates the unambiguous end of options.

       The above options will be available with any Posix compatible
       implementation of AWK, and implementation specific options are prefaced
       with -W.  mawk provides six:

       -W version     mawk writes its version and copyright to stdout and
                      compiled limits to stderr and exits 0.

       -W dump        writes an assembler-like listing of the internal
                      representation of the program to stdout and exits 0 (on
                      successful compilation).

       -W interactive sets unbuffered writes to stdout and line buffered reads
                      from stdin.  Records from stdin are lines regardless of
                      the value of RS.

       -W exec file   Program text is read from file and this is the last
                      option.  Useful on systems that support the #! "magic
                      number" convention for executable scripts.

       -W sprintf=num adjusts the size of mawk's internal sprintf buffer to
                      num bytes.  More than rare use of this option indicates
                      mawk should be recompiled.

       -W posix_space forces mawk not to consider '\n' to be space.

       The short forms -W[vdiesp] are recognized and on some systems -We is
       mandatory to avoid command line length limitations.

       mawk allows multiple -W options to be combined by separating the
       options with commas, e.g., -Wsprint=2000,posix.

THE AWK LANGUAGE

   1. Program structure
       An AWK program is a sequence of pattern {action} pairs and user
       function definitions.

       A pattern can be:
              BEGIN
              END
              expression
              expression , expression

       One, but not both, of pattern {action} can be omitted.  If {action} is
       omitted it is implicitly { print }.  If pattern is omitted, then it is
       implicitly matched.  BEGIN and END patterns require an action.

       Statements are terminated by newlines, semi-colons or both.  Groups of
       statements such as actions or loop bodies are blocked via { ... } as in
       C.  The last statement in a block doesn't need a terminator.  Blank
       lines have no meaning; an empty statement is terminated with a
       semi-colon.  Long statements can be continued with a backslash, \.  A
       statement can be broken without a backslash after a comma, left brace,
       &&, ||, do, else, the right parenthesis of an if, while or for
       statement, and the right parenthesis of a function definition.  A
       comment starts with # and extends to, but does not include, the end of
       line.

       The following statements control program flow inside blocks.

              if ( expr ) statement

              if ( expr ) statement else statement

              while ( expr ) statement

              do statement while ( expr )

              for ( opt_expr ; opt_expr ; opt_expr ) statement

              for ( var in array ) statement

              continue

              break

   2. Data types, conversion and comparison
       There are two basic data types, numeric and string.  Numeric constants
       can be integer like -2, decimal like 1.08, or in scientific notation
       like -1.1e4 or .28E-3.  All numbers are represented internally and all
       computations are done in floating point arithmetic.  So for example,
       the expression 0.2e2 == 20 is true and true is represented as 1.0.

       String constants are enclosed in double quotes.

                   "This is a string with a newline at the end.\n"

       Strings can be continued across a line by escaping (\) the newline.
       The following escape sequences are recognized.

            \\        \
            \"        "
            \a        alert, ascii 7
            \b        backspace, ascii 8
            \t        tab, ascii 9
            \n        newline, ascii 10
            \v        vertical tab, ascii 11
            \f        formfeed, ascii 12
            \r        carriage return, ascii 13
            \ddd      1, 2 or 3 octal digits for ascii ddd
            \xhh      1 or 2 hex digits for ascii hh

       If you escape any other character \c, you get \c, i.e., mawk ignores
       the escape.

       There are really three basic data types; the third is number and
       string, which has both a numeric value and a string value at the same
       time.  User defined variables come into existence when first referenced
       and are initialized to null, a number and string value which has
       numeric value 0 and string value "".  Non-trivial number and string
       typed data come from input and are typically stored in fields.  (See
       section 4).

       The type of an expression is determined by its context and automatic
       type conversion occurs if needed.  For example, in the statements

            y = x + 2  ;  z = x  "hello"

       the value stored in variable y will be typed numeric.  If x is not
       numeric, the value read from x is converted to numeric before it is
       added to 2 and stored in y.  The value stored in variable z will be
       typed string, and the value of x will be converted to string if
       necessary and concatenated with "hello".  (Of course, the value and
       type stored in x are not changed by any conversions.)  A string
       expression is converted to numeric using its longest numeric prefix as
       with atof(3).  A numeric expression is converted to string by replacing
       expr with sprintf(CONVFMT, expr), unless expr can be represented on the
       host machine as an exact integer, in which case it is converted with
       sprintf("%d", expr).  sprintf() is an AWK built-in that duplicates the
       functionality of sprintf(3), and CONVFMT is a built-in variable used
       for internal conversion from number to string and initialized to
       "%.6g".  Explicit type conversions can be forced: expr "" is string and
       expr+0 is numeric.

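       The conversion rules above can be illustrated with a short sketch.  It
       assumes a POSIX-compatible awk installed as awk (substitute mawk as
       needed); the variable names are arbitrary.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    x = "3x"                 # a string whose longest numeric prefix is 3
    y = x + 2                # numeric context: "3x" is converted to 3
    z = x "hello"            # string context: concatenation
    print y, z
    pi = 3.14159265
    print (pi "")            # not an integer: converted with CONVFMT ("%.6g")
    print (17 "")            # exact integer: converted with "%d"
    print ("0012" + 0)       # expr+0 forces a numeric value
}')
printf '%s\n' "$out"
```
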
       To evaluate expr1 rel-op expr2: if both operands are numeric or number
       and string, then the comparison is numeric; if both operands are
       string, the comparison is string; if one operand is string, the
       non-string operand is converted and the comparison is string.  The
       result is numeric, 1 or 0.

       In boolean contexts such as if ( expr ) statement, a string expression
       evaluates true if and only if it is not the empty string ""; a numeric
       expression evaluates true if and only if it is not numerically zero.

   3. Regular expressions
       In the AWK language, records, fields and strings are often tested for
       matching a regular expression.  Regular expressions are enclosed in
       slashes, and

            expr ~ /r/

       is an AWK expression that evaluates to 1 if expr "matches" r, which
       means a substring of expr is in the set of strings defined by r.  With
       no match the expression evaluates to 0; replacing ~ with the "not
       match" operator, !~ , reverses the meaning.  As pattern-action pairs,

            /r/ { action }   and   $0 ~ /r/ { action }

       are the same, and for each input record that matches r, action is
       executed.  In fact, /r/ is an AWK expression that is equivalent to
       ($0 ~ /r/) anywhere except when on the right side of a match operator
       or passed as an argument to a built-in function that expects a regular
       expression argument.

       AWK uses extended regular expressions as with egrep(1).  The regular
       expression metacharacters, i.e., those with special meaning in regular
       expressions, are

            \ ^ $ . [ ] | ( ) * + ?

       Regular expressions are built up from characters as follows:

              c            matches any non-metacharacter c.

              \c           matches a character defined by the same escape
                           sequences used in string constants or the literal
                           character c if \c is not an escape sequence.

              .            matches any character (including newline).

              ^            matches the front of a string.

              $            matches the back of a string.

              [c1c2c3...]  matches any character in the class c1c2c3... .  An
                           interval of characters is denoted c1-c2 inside a
                           class [...].

              [^c1c2c3...] matches any character not in the class c1c2c3...

       Regular expressions are built up from other regular expressions as
       follows:

              r1r2         matches r1 followed immediately by r2
                           (concatenation).

              r1 | r2      matches r1 or r2 (alternation).

              r*           matches r repeated zero or more times.

              r+           matches r repeated one or more times.

              r?           matches r zero or once.

              (r)          matches r, providing grouping.

       The increasing precedence of operators is alternation, concatenation
       and unary (*, + or ?).

       For example,

            /^[_a-zA-Z][_a-zA-Z0-9]*$/  and
            /^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/

       are matched by AWK identifiers and AWK numeric constants respectively.
       Note that . has to be escaped to be recognized as a decimal point, and
       that metacharacters are not special inside character classes.

       Any expression can be used on the right hand side of the ~ or !~
       operators or passed to a built-in that expects a regular expression.
       If needed, it is converted to string, and then interpreted as a regular
       expression.  For example,

            BEGIN { identifier = "[_a-zA-Z][_a-zA-Z0-9]*" }

            $0 ~ "^" identifier

       prints all lines that start with an AWK identifier.

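       As a sketch of using a string as a regular expression, the numeric
       constant pattern shown earlier can be stored in a variable and applied
       with ~.  This assumes a POSIX-compatible awk installed as awk; the
       variable name number and the sample inputs are illustrative only.  Note
       that inside a string constant each backslash must be doubled so that
       one backslash reaches the regular expression.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(printf '1.5e3\n.5\n12.\n1e\nabc\n' | awk '
BEGIN { number = "^[-+]?([0-9]+\\.?|\\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$" }
      { print $0, ($0 ~ number ? "yes" : "no") }')
printf '%s\n' "$out"
```

       The input 1e is rejected because an exponent marker must be followed
       by at least one digit.
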
       mawk recognizes the empty regular expression, //, which matches the
       empty string and hence is matched by any string at the front, back and
       between every character.  For example,

            echo abc | mawk '{ gsub(//, "X") ; print }'
            XaXbXcX

   4. Records and fields
       Records are read in one at a time, and stored in the field variable $0.
       The record is split into fields which are stored in $1, $2, ..., $NF.
       The built-in variable NF is set to the number of fields, and NR and FNR
       are incremented by 1.  Fields above $NF are set to "".

       Assignment to $0 causes the fields and NF to be recomputed.  Assignment
       to NF or to a field causes $0 to be reconstructed by concatenating the
       $i's separated by OFS.  Assignment to a field with index greater than
       NF increases NF and causes $0 to be reconstructed.

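       The reconstruction rules can be seen in a small sketch, assuming a
       POSIX-compatible awk installed as awk; OFS is set to "-" only to make
       the rebuilt record visible.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" }
{
    $2 = "X";   print          # $0 is rebuilt from the fields, joined by OFS
    $5 = "E";   print          # NF grows to 5; the new $4 is empty
    print NF
    $0 = "p q"; print NF, $2   # assigning $0 recomputes the fields and NF
}')
printf '%s\n' "$out"
```
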
       Data input stored in fields is string, unless the entire field has
       numeric form and then the type is number and string.  For example,

            echo 24 24E |
            mawk '{ print($1>100, $1>"100", $2>100, $2>"100") }'
            0 1 1 1

       $0 and $2 are string and $1 is number and string.  The first comparison
       is numeric, the second is string, the third is string (100 is converted
       to "100"), and the last is string.

   5. Expressions and operators
       The expression syntax is similar to C.  Primary expressions are numeric
       constants, string constants, variables, fields, arrays and function
       calls.  The identifier for a variable, array or function can be a
       sequence of letters, digits and underscores that does not start with a
       digit.  Variables are not declared; they exist when first referenced
       and are initialized to null.

       New expressions are composed with the following operators in order of
       increasing precedence.

              assignment          =  +=  -=  *=  /=  %=  ^=
              conditional         ?  :
              logical or          ||
              logical and         &&
              array membership    in
              matching            ~  !~
              relational          <  >   <=  >=  ==  !=
              concatenation       (no explicit operator)
              add ops             +  -
              mul ops             *  /  %
              unary               +  -
              logical not         !
              exponentiation      ^
              inc and dec         ++ -- (both post and pre)
              field               $

       Assignment, conditional and exponentiation associate right to left; the
       other operators associate left to right.  Any expression can be
       parenthesized.

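       A few consequences of the precedence table, as a sketch that assumes a
       POSIX-compatible awk installed as awk.  The expressions are chosen for
       illustration and are not taken from the AWK book.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    print -2^2           # exponentiation binds tighter than unary minus
    print 2^3^2          # exponentiation is right associative: 2^(3^2)
    print 1 " " 2 + 3    # add ops bind tighter than concatenation
    print 2 + 3 * 4      # mul ops bind tighter than add ops
}')
printf '%s\n' "$out"
```

       When in doubt, parenthesize; concatenation in particular has no
       visible operator and is easy to misread.
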
   6. Arrays
       AWK provides one-dimensional arrays.  Array elements are expressed as
       array[expr].  The index expr is internally converted to string type,
       so, for example, A[1] and A["1"] are the same element and the actual
       index is "1".  Arrays indexed by strings are called associative arrays.
       Initially an array is empty; elements exist when first accessed.  An
       expression, expr in array, evaluates to 1 if array[expr] exists, else
       to 0.

       There is a form of the for statement that loops over each index of an
       array.

            for ( var in array ) statement

       sets var to each index of array and executes statement.  The order in
       which var traverses the indices of array is not defined.

       The statement, delete array[expr], causes array[expr] not to exist.
       mawk supports an extension, delete array, which deletes all elements of
       array.

       Multidimensional arrays are synthesized with concatenation using the
       built-in variable SUBSEP.  array[expr1,expr2] is equivalent to
       array[expr1 SUBSEP expr2].  Testing for a multidimensional element uses
       a parenthesized index, such as

            if ( (i, j) in A )  print A[i, j]

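       The array operations above, sketched under the assumption of a
       POSIX-compatible awk installed as awk.  The array names word, count
       and grid are made up for the example.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    n = split("a b a c a b", word, " ")
    for (i = 1; i <= n; i++) count[word[i]]++   # elements appear on first use
    print count["a"], count["b"], count["c"]
    seen = ("z" in count)          # the in test does not create count["z"]
    print seen
    delete count["a"]
    gone = !("a" in count)
    print gone
    grid[1, 2] = "x"               # stored under the index 1 SUBSEP 2
    found = ((1, 2) in grid)
    same = (grid[1 SUBSEP 2] == "x")
    print found, same
}')
printf '%s\n' "$out"
```
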
   7. Built-in variables
       The following variables are built-in and initialized before program
       execution.

              ARGC      number of command line arguments.

              ARGV      array of command line arguments, 0..ARGC-1.

              CONVFMT   format for internal conversion of numbers to string,
                        initially = "%.6g".

              ENVIRON   array indexed by environment variables.  An
                        environment string, var=value, is stored as
                        ENVIRON[var] = value.

              FILENAME  name of the current input file.

              FNR       current record number in FILENAME.

              FS        splits records into fields as a regular expression.

              NF        number of fields in the current record.

              NR        current record number in the total input stream.

              OFMT      format for printing numbers; initially = "%.6g".

              OFS       inserted between fields on output, initially = " ".

              ORS       terminates each record on output, initially = "\n".

              RLENGTH   length set by the last call to the built-in function,
                        match().

              RS        input record separator, initially = "\n".

              RSTART    index set by the last call to match().

              SUBSEP    used to build multiple array subscripts, initially =
                        "\034".

   8. Built-in functions
       String functions

              gsub(r,s,t)  gsub(r,s)
                     Global substitution, every match of regular expression r
                     in variable t is replaced by string s.  The number of
                     replacements is returned.  If t is omitted, $0 is used.
                     An & in the replacement string s is replaced by the
                     matched substring of t.  \& and \\ put literal & and \,
                     respectively, in the replacement string.

              index(s,t)
                     If t is a substring of s, then the position where t
                     starts is returned, else 0 is returned.  The first
                     character of s is in position 1.

              length(s)
                     Returns the length of string s.

              match(s,r)
                     Returns the index of the first longest match of regular
                     expression r in string s.  Returns 0 if no match.  As a
                     side effect, RSTART is set to the return value.  RLENGTH
                     is set to the length of the match or -1 if no match.  If
                     the empty string is matched, RLENGTH is set to 0, and 1
                     is returned if the match is at the front, and length(s)+1
                     is returned if the match is at the back.

              split(s,A,r)  split(s,A)
                     String s is split into fields by regular expression r and
                     the fields are loaded into array A.  The number of fields
                     is returned.  See section 11 below for more detail.  If r
                     is omitted, FS is used.

              sprintf(format,expr-list)
                     Returns a string constructed from expr-list according to
                     format.  See the description of printf() below.

              sub(r,s,t)  sub(r,s)
                     Single substitution, same as gsub() except at most one
                     substitution.

              substr(s,i,n)  substr(s,i)
                     Returns the substring of string s, starting at index i,
                     of length n.  If n is omitted, the suffix of s, starting
                     at i, is returned.

              tolower(s)
                     Returns a copy of s with all upper case characters
                     converted to lower case.

              toupper(s)
                     Returns a copy of s with all lower case characters
                     converted to upper case.

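       Several of the string functions in use, as a sketch assuming a
       POSIX-compatible awk installed as awk; the sample string is arbitrary.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")               # positions are counted from 1
    pos = match(s, /o+/)                # sets RSTART and RLENGTH as side effects
    print pos, RSTART, RLENGTH
    print substr(s, 7, 3)               # 3 characters starting at index 7
    t = s
    n = gsub(/o/, "[&]", t)             # & stands for the matched text
    print n, t
    print toupper(substr(s, 1, 1)) substr(s, 2)
}')
printf '%s\n' "$out"
```
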
       Arithmetic functions

              atan2(y,x)     Arctan of y/x between -pi and pi.

              cos(x)         Cosine function, x in radians.

              exp(x)         Exponential function.

              int(x)         Returns x truncated towards zero.

              log(x)         Natural logarithm.

              rand()         Returns a random number between zero and one.

              sin(x)         Sine function, x in radians.

              sqrt(x)        Returns square root of x.

              srand(expr)  srand()
                     Seeds the random number generator, using the clock if
                     expr is omitted, and returns the value of the previous
                     seed.  mawk seeds the random number generator from the
                     clock at startup so there is no real need to call
                     srand().  Calling srand(expr) is useful for repeating
                     pseudo random sequences.

   9. Input and output
       There are two output statements, print and printf.

              print  writes $0 ORS to standard output.

              print expr1, expr2, ..., exprn
                     writes expr1 OFS expr2 OFS ... exprn ORS to standard
                     output.  Numeric expressions are converted to string with
                     OFMT.

              printf format, expr-list
                     duplicates the printf C library function writing to
                     standard output.  The complete ANSI C format
                     specifications are recognized with conversions %c, %d,
                     %e, %E, %f, %g, %G, %i, %o, %s, %u, %x, %X and %%, and
                     conversion qualifiers h and l.

       The argument list to print or printf can optionally be enclosed in
       parentheses.  The print statement formats numbers using OFMT, or "%d"
       for exact integers.  "%c" with a numeric argument prints the
       corresponding 8 bit character; with a string argument it prints the
       first character of the string.  The output of print and printf can be
       redirected to a file or command by appending > file, >> file or
       | command to the end of the print statement.  Redirection opens file or
       command only once; subsequent redirections append to the already open
       stream.  By convention, mawk associates the filename "/dev/stderr" with
       stderr, which allows print and printf to be redirected to stderr.  mawk
       also associates "-" and "/dev/stdout" with stdin and stdout, which
       allows these streams to be passed to functions.

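       A sketch of the output rules above, assuming a POSIX-compatible awk
       installed as awk; the values and the warning text are illustrative.
       The last statement writes to stderr, so it is not part of the captured
       output.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    # %c prints character 65 for a number, the first character for a string.
    printf "%5.2f|%-4s|%c|%c\n", 3.14159, "ab", 65, "xyz"
    OFMT = "%.2f"
    print 3.14159, 42      # OFMT applies to 3.14159; 42 is an exact integer
    print "warning: sample message" > "/dev/stderr"
}')
printf '%s\n' "$out"
```
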
       The input function getline has the following variations.

              getline
                     reads into $0, updates the fields, NF, NR and FNR.

              getline < file
                     reads into $0 from file, updates the fields and NF.

              getline var
                     reads the next record into var, updates NR and FNR.

              getline var < file
                     reads the next record of file into var.

              command | getline
                     pipes a record from command into $0 and updates the
                     fields and NF.

              command | getline var
                     pipes a record from command into var.

       The getline function returns 0 on end-of-file, -1 on error, otherwise
       1.

       Commands on the end of pipes are executed by /bin/sh.

       The function close(expr) closes the file or pipe associated with expr.
       It returns 0 if expr is an open file, the exit status if expr is a
       piped command, and -1 otherwise.  The close function is used to reread
       a file or command, to make sure the other end of an output pipe is
       finished, or to conserve file resources.

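       A sketch of reading from a command with getline and rereading it after
       close, assuming a POSIX-compatible awk installed as awk.  The command
       string and variable names are made up for the example.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per record, 0 at end-of-file
        n++
    close(cmd)                         # without this the next read sees EOF
    cmd | getline first                # the command runs again from the start
    print n, first
}')
printf '%s\n' "$out"
```

       Comparing the result with > 0 rather than testing it for truth keeps
       the loop from spinning if getline returns -1 on error.
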
       The function fflush(expr) flushes the output file or pipe associated
       with expr.  It returns 0 if expr is an open output stream, else -1.
       Calling fflush without an argument flushes stdout; calling it with an
       empty argument ("") flushes all open output.

       The function system(expr) uses /bin/sh to execute expr and returns the
       exit status of the command expr.  Changes made to the ENVIRON array are
       not passed to commands executed with system or pipes.

   10. User defined functions
       The syntax for a user defined function is

            function name( args ) { statements }

       The function body can contain a return statement

            return opt_expr

       A return statement is not required.  Function calls may be nested or
       recursive.  Functions are passed expressions by value and arrays by
       reference.  Extra arguments serve as local variables and are
       initialized to null.  For example, csplit(s,A) puts each character of s
       into array A and returns the length of s.

            function csplit(s, A,    n, i)
            {
              n = length(s)
              for( i = 1 ; i <= n ; i++ ) A[i] = substr(s, i, 1)
              return n
            }

       Putting extra space between passed arguments and local variables is
       conventional.  Functions can be referenced before they are defined, but
       the function name and the '(' of the arguments must touch to avoid
       confusion with concatenation.

   11. Splitting strings, records and files
       AWK programs use the same algorithm to split strings into arrays with
       split(), and records into fields on FS.  mawk uses essentially the same
       algorithm to split files into records on RS.

       The call split(expr,A,sep) works as follows:

              (1)    If sep is omitted, it is replaced by FS.  The separator
                     sep can be an expression or regular expression.  If it is
                     an expression of non-string type, it is converted to
                     string.

              (2)    If sep = " " (a single space), then <SPACE> is trimmed
                     from the front and back of expr, and sep becomes <SPACE>.
                     mawk defines <SPACE> as the regular expression
                     /[ \t\n]+/.  Otherwise sep is treated as a regular
                     expression, except that meta-characters are ignored for a
                     string of length 1, e.g., split(x, A, "*") and split(x,
                     A, /\*/) are the same.

              (3)    If expr is not string, it is converted to string.  If
                     expr is then the empty string "", split() returns 0 and A
                     is set empty.  Otherwise, all non-overlapping, non-null
                     and longest matches of sep in expr separate expr into
                     fields which are loaded into A.  The fields are placed in
                     A[1], A[2], ..., A[n] and split() returns n, the number
                     of fields, which is the number of matches plus one.  Data
                     placed in A that looks numeric is typed number and
                     string.

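       The three cases of the algorithm, sketched under the assumption of a
       POSIX-compatible awk installed as awk; the strings are illustrative.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
out=$(awk 'BEGIN {
    n = split("a::b:", A, ":+")       # regular expression separator
    print n, "[" A[1] "]", "[" A[2] "]", "[" A[3] "]"
    n = split("  x   y ", B, " ")     # single space: trim, split on <SPACE>
    print n, B[1], B[2]
    n = split("1*2*3", C, "*")        # a one-character separator is literal
    print n, C[3]
}')
printf '%s\n' "$out"
```

       In the first call the two matches of ":+" yield three fields, the last
       of which is empty.
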
       Splitting records into fields works the same except the pieces are
       loaded into $1, $2,..., $NF.  If $0 is empty, NF is set to 0 and all $i
       to "".

       mawk splits files into records by the same algorithm, but with the
       slight difference that RS is really a terminator instead of a
       separator.  (ORS is really a terminator too).

              E.g., if FS = ":+" and $0 = "a::b:" , then NF = 3 and $1 = "a",
              $2 = "b" and $3 = "", but if "a::b:" is the contents of an input
              file and RS = ":+", then there are two records "a" and "b".

       RS = " " is not special.

       If FS = "", then mawk breaks the record into individual characters,
       and, similarly, split(s,A,"") places the individual characters of s
       into A.

   12. Multi-line records
       Since mawk interprets RS as a regular expression, multi-line records
       are easy.  Setting RS = "\n\n+" makes one or more blank lines separate
       records.  If FS = " " (the default), then single newlines, by the rules
       for <SPACE> above, become space and single newlines are field
       separators.

              For example, if a file is "a b\nc\n\n", RS = "\n\n+" and FS =
              " ", then there is one record "a b\nc" with three fields "a",
              "b" and "c".  Changing FS = "\n", gives two fields "a b" and
              "c"; changing FS = "", gives one field identical to the record.

       If you want lines with spaces or tabs to be considered blank, set RS =
       "\n([ \t]*\n)+".  For compatibility with other awks, setting RS = ""
       has the same effect as if blank lines are stripped from the front and
       back of files and then records are determined as if RS = "\n\n+".
       Posix requires that "\n" always separates records when RS = ""
       regardless of the value of FS.  mawk does not support this convention,
       because defining "\n" as <SPACE> makes it unnecessary.

       Most of the time when you change RS for multi-line records, you will
       also want to change ORS to "\n\n" so the record spacing is preserved on
       output.

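       A sketch of blank-line separated records.  It uses RS = "", the
       compatibility form described above, rather than a regular expression
       RS, on the assumption that it should then run under any
       POSIX-compatible awk installed as awk.  The input text is made up.

```shell
# Assumption: "awk" is any POSIX-compatible awk (mawk can be substituted).
# Two records separated by blank lines; newlines inside a record act as
# field separators under the default FS.
out=$(printf 'a b\nc\n\n\nd e\n' | awk 'BEGIN { RS = "" }
{ print NR ": " NF " fields, first=" $1 }')
printf '%s\n' "$out"
```
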
   13. Program execution
       This section describes the order of program execution.  First ARGC is
       set to the total number of command line arguments passed to the
       execution phase of the program.  ARGV[0] is set to the name of the AWK
       interpreter and ARGV[1] ... ARGV[ARGC-1] holds the remaining command
       line arguments exclusive of options and program source.  For example
       with

            mawk  -f  prog  v=1  A  t=hello  B

       ARGC = 5 with ARGV[0] = "mawk", ARGV[1] = "v=1", ARGV[2] = "A", ARGV[3]
       = "t=hello" and ARGV[4] = "B".

664       Next,  each  BEGIN block is executed in order.  If the program consists
665       entirely of BEGIN blocks, then  execution  terminates,  else  an  input
666       stream  is opened and execution continues.  If ARGC equals 1, the input
667       stream is set to stdin, else  the command line  arguments  ARGV[1]  ...
668       ARGV[ARGC-1] are examined for a file argument.
669
670       The  command  line  arguments  divide  into three sets: file arguments,
671       assignment arguments and empty strings "".  An assignment has the  form
672       var=string.   When  an ARGV[i] is examined as a possible file argument,
673       if it is empty it is skipped; if it  is  an  assignment  argument,  the
674       assignment  to  var  takes place and i skips to the next argument; else
675       ARGV[i] is opened for input.  If it fails to open, execution terminates
676       with exit code 2.  If no command line argument is a file argument, then
677       input comes from stdin.  Getline in a BEGIN action opens input.  "-" as
678       a file argument denotes stdin.
679
680       Once  an input stream is open, each input record is tested against each
681       pattern, and if it matches, the  associated  action  is  executed.   An
682       expression  pattern  matches if it is boolean true (see the end of sec‐
683       tion 2).  A BEGIN pattern matches before any input has been  read,  and
684       an END pattern matches after all input has been read.  A range pattern,
685       expr1,expr2, matches every record from the one that matches expr1
686       through the one that matches expr2, inclusive.
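
          A short sketch of a range pattern, using made-up input: the records
          from the one matching /start/ through the one matching /stop/ are
          printed, both ends included.

```sh
# Prefer mawk; fall back to the system awk.
AWK=$(command -v mawk || command -v awk)

out=$(printf 'one\nstart\ntwo\nstop\nthree\n' |
      "$AWK" '/start/,/stop/ { print NR ": " $0 }')

printf '%s\n' "$out"
```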
687
688       When end of file occurs on the input stream, the remaining command line
689       arguments are examined for a file argument, and if there is one  it  is
690       opened,  else the END pattern is considered matched and all END actions
691       are executed.
692
693       In the example, the assignment v=1 takes place after the BEGIN  actions
694       are executed, and the data placed in v is typed both number and string.
695       Input is then read from file A.  On end of file A,  t  is  set  to  the
696       string  "hello",  and B is opened for input.  On end of file B, the END
697       actions are executed.
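
          The ordering described above can be observed with the following
          sketch.  The scratch directory and the contents of files A and B
          are assumptions made for the illustration.

```sh
# Prefer mawk; fall back to the system awk.
AWK=$(command -v mawk || command -v awk)

dir=$(mktemp -d)                 # scratch directory for the two input files
printf 'a1\n' > "$dir/A"
printf 'b1\n' > "$dir/B"

# Each line of output shows what v and t hold at that point of execution.
out=$(cd "$dir" && "$AWK" '
    BEGIN { print "BEGIN: v=[" v "] t=[" t "]" }
          { print FILENAME ": v=[" v "] t=[" t "]" }
    END   { print "END: v=[" v "] t=[" t "]" }
' v=1 A t=hello B)

rm -rf "$dir"
printf '%s\n' "$out"
```

          Neither assignment is visible in BEGIN, v is set before A is read,
          and t is set only once A has been exhausted.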
698
699       Program flow at the pattern {action} level can be changed with the
700
701            next
702            exit  opt_expr
703
704       statements.  A next statement causes the next input record to  be  read
705       and  pattern testing to restart with the first pattern {action} pair in
706       the program.  An exit statement causes immediate execution of  the  END
707       actions  or program termination if there are none or if the exit occurs
708       in an END action.  The opt_expr sets the  exit  value  of  the  program
709       unless overridden by a later exit or subsequent error.
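
          A minimal sketch of both statements, with made-up input: next skips
          the remaining pattern {action} pairs for one record, and exit stops
          reading input, runs the END action and sets the exit value.

```sh
# Prefer mawk; fall back to the system awk.
AWK=$(command -v mawk || command -v awk)

status=0
out=$(printf 'skip\n1\n2\nquit\n4\n' | "$AWK" '
    /skip/ { next }            # go straight to the next record
    /quit/ { exit 3 }          # run the END actions, then exit with status 3
           { sum += $1 }
    END    { print "sum=" sum }
') || status=$?

printf '%s (exit status %s)\n' "$out" "$status"
```

          The record "4" is never read, so the sum covers only the records
          seen before the exit.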
710

EXAMPLES

712       1. emulate cat.
713
714            { print }
715
716       2. emulate wc.
717
718            { chars += length($0) + 1  # add one for the \n
719              words += NF
720            }
721
722            END{ print NR, words, chars }
723
724       3. count the number of unique "real words".
725
726            BEGIN { FS = "[^A-Za-z]+" }
727
728            { for(i = 1 ; i <= NF ; i++)  word[$i] = "" }
729
730            END { delete word[""]
731                  for ( i in word )  cnt++
732                  print cnt
733            }
734
735       4. sum the second field of every record based on the first field.
736
737            $1 ~ /credit|gain/ { sum += $2 }
738            $1 ~ /debit|loss/  { sum -= $2 }
739
740            END { print sum }
741
742       5. sort a file, comparing as string
743
744            { line[NR] = $0 "" }  # make sure of comparison type
745                            # in case some lines look numeric
746
747            END {  isort(line, NR)
748              for(i = 1 ; i <= NR ; i++) print line[i]
749            }
750
751            #insertion sort of A[1..n]
752            function isort( A, n,    i, j, hold)
753            {
754              for( i = 2 ; i <= n ; i++)
755              {
756                hold = A[j = i]
757                while ( A[j-1] > hold )
758                { j-- ; A[j+1] = A[j] }
759                A[j] = hold
760              }
761              # sentinel A[0] = "" will be created if needed
762            }
763
764

COMPATIBILITY ISSUES

766       The  Posix 1003.2 (draft 11.3) definition of the AWK language is AWK as
767       described in the AWK book with a few extensions that appeared  in  Sys‐
768       temVR4 nawk.  The extensions are:
769
770              New functions: toupper() and tolower().
771
772              New variables: ENVIRON[] and CONVFMT.
773
774              ANSI C conversion specifications for printf() and sprintf().
775
776              New  command  options:   -v  var=value,  multiple -f options and
777              implementation options as arguments to -W.
778
779              For systems (MS-DOS or Windows) which provide  a  setmode  func‐
780              tion,  an  environment variable MAWKBINMODE and a built-in vari‐
781              able BINMODE.  The bits of the BINMODE value tell  mawk  how  to
782              modify the RS and ORS variables:
783
784                 0  set  standard input to binary mode, and if BIT-2 is unset,
785                    set RS to "\r\n" (CR/LF) rather than "\n" (LF).
786
787                 1  set standard output to binary mode, and if BIT-2 is unset,
788                    set ORS to "\r\n" (CR/LF) rather than "\n" (LF).
789
790                 2  suppress  the assignment to RS and ORS of CR/LF, making it
791                    possible to run scripts  and  generate  output  compatible
792                    with Unix line-endings.
793
794       Posix  AWK is oriented to operate on files a line at a time.  RS can be
795       changed from "\n" to another single character, but it is hard  to  find
796       any  use  for  this; there are no examples in the AWK book.  By conven‐
797       tion, RS = "" makes one or more blank lines separate records,  allowing
798       multi-line  records.   When  RS  = "", "\n" is always a field separator
799       regardless of the value in FS.
800
801       mawk, on the other hand, allows RS to be a  regular  expression.   When
802       "\n"  appears  in records, it is treated as space, and FS always deter‐
803       mines fields.
804
805       Removing the line at a time paradigm can make some programs simpler and
806       can  often  improve  performance.   For example, redoing example 3 from
807       above,
808
809            BEGIN { RS = "[^A-Za-z]+" }
810
811            { word[ $0 ] = "" }
812
813            END { delete  word[ "" ]
814              for( i in word )  cnt++
815              print cnt
816            }
817
818       counts the number of unique words by making each  word  a  record.   On
819       moderate  size  files, mawk executes twice as fast, because of the sim‐
820       plified inner loop.
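
          The two formulations can be compared side by side.  The sketch
          below runs the field-splitting version from example 3 and the
          record-splitting version on the same made-up line and checks only
          that the counts agree; it makes no claim about speed.

```sh
# Prefer mawk; fall back to the system awk (assumed to accept a regex RS).
AWK=$(command -v mawk || command -v awk)
text='the cat; the dog, THE end\n'

# Example 3: one record per line, words are fields.
by_field=$(printf "$text" | "$AWK" '
    BEGIN { FS = "[^A-Za-z]+" }
    { for (i = 1; i <= NF; i++) word[$i] = "" }
    END { delete word[""]; for (w in word) cnt++; print cnt }')

# Reworked version: every word is a record of its own.
by_record=$(printf "$text" | "$AWK" '
    BEGIN { RS = "[^A-Za-z]+" }
    { word[$0] = "" }
    END { delete word[""]; for (w in word) cnt++; print cnt }')

printf 'by field: %s, by record: %s\n' "$by_field" "$by_record"
```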
821
822       The following program replaces each comment by a single space  in  a  C
823       program file,
824
825            BEGIN {
826              RS = "/\*([^*]|\*+[^/*])*\*+/"
827                 # comment is record separator
828              ORS = " "
829              getline  hold
830              }
831
832              { print hold ; hold = $0 }
833
834              END { printf "%s" , hold }
835
836       Buffering  one  record  is  needed to avoid terminating the last record
837       with a space.
838
839       With mawk, the following are all equivalent,
840
841            x ~ /a\+b/    x ~ "a\+b"     x ~ "a\\+b"
842
843       The strings get scanned twice, once  as  string  and  once  as  regular
844       expression.   On the string scan, mawk ignores the escape on non-escape
845       characters while the AWK book advocates \c be  recognized  as  c  which
846       necessitates  the double escaping of meta-characters in strings.  Posix
847       explicitly declines to define the behavior which passively forces  pro‐
848       grams  that  must  run under a variety of awks to use the more portable
849       but less readable, double escape.
850
851       Posix AWK does not  recognize  "/dev/std{out,err}"  or  \x  hex  escape
852       sequences  in strings.  Unlike ANSI C, mawk limits the number of digits
853       that follows \x to two as the current implementation  only  supports  8
854       bit  characters.  The built-in fflush first appeared in a recent (1993)
855       AT&T awk released to netlib, and is not part  of  the  Posix  standard.
856       Aggregate deletion with delete array is not part of the Posix standard.
857
858       Posix explicitly leaves the behavior of FS = "" undefined, and mentions
859       splitting the record into characters as a possible interpretation,  but
860       currently this use is not portable across implementations.
861
862       Finally,  here  is  how mawk handles exceptional cases not discussed in
863       the AWK book or the Posix draft.  It is unsafe  to  assume  consistency
864       across awks and safe to skip to the next section.
865
866              substr(s,  i, n) returns the characters of s in the intersection
867              of the closed interval [1, length(s)] and the half-open interval
868              [i,  i+n).  When this intersection is empty, the empty string is
869              returned; so substr("ABC", 1, 0) = "" and substr("ABC", -4, 6) =
870              "A".
871
872              Every  string,  including  the  empty  string, matches the empty
873              string at the front so, s ~ // and s ~ "", are always  1  as  is
874              match(s, //) and match(s, "").  The last two set RLENGTH to 0.
875
876              index(s,  t)  is always the same as match(s, t1) where t1 is the
877              same as t with metacharacters escaped.  Hence  consistency  with
878              match  requires  that  index(s,  "") always returns 1.  Also the
879              condition, index(s,t) != 0 if and only if t is a substring of s,
880              requires index("","") = 1.
881
882              If  getline  encounters  end  of  file,  getline var, leaves var
883              unchanged.  Similarly, on entry to  the  END  actions,  $0,  the
884              fields and NF have their value unaltered from the last record.
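
          Two of the cases above, the substr() intervals and the state of $0
          and NF on entry to END, can be probed with the sketch below.  The
          results in the closing note follow the rules stated here for mawk;
          as noted, other awks are not guaranteed to agree.

```sh
# Prefer mawk; fall back to the system awk (results may then differ).
AWK=$(command -v mawk || command -v awk)

# substr(s, i, n) keeps the part of [i, i+n) that lies inside [1, length(s)].
substrs=$("$AWK" 'BEGIN {
    printf "[%s] [%s] [%s]\n", substr("ABC", 1, 0), substr("ABC", -4, 6), substr("ABC", 2, 5)
}')

# In END, $0 and NF still describe the last record that was read.
last=$(printf 'a b\nc d e\n' | "$AWK" 'END { print NF ": " $0 }')

printf '%s\n%s\n' "$substrs" "$last"
```

          By those rules the three substr() calls give "", "A" and "BC", and
          the END action reports 3 fields for the record "c d e".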
885

SEE ALSO

887       egrep(1)
888
889       Aho,  Kernighan  and Weinberger, The AWK Programming Language, Addison-
890       Wesley Publishing, 1988, (the AWK book), defines the language,  opening
891       with  a  tutorial and advancing to many interesting programs that delve
892       into issues of software design and analysis relevant to programming  in
893       any language.
894
895       The  GAWK Manual, The Free Software Foundation, 1991, is a tutorial and
896       language reference that does not attempt the depth of the AWK book  and
897       assumes  the  reader  may  be  a novice programmer.  The section on AWK
898       arrays is excellent.  It also discusses Posix requirements for AWK.
899

BUGS

901       mawk implements printf() and sprintf() using the C  library  functions,
902       printf  and  sprintf,  so  full  ANSI  compatibility requires an ANSI C
903       library.  In practice this means the h conversion qualifier may not  be
904       available.   Also  mawk inherits any bugs or limitations of the library
905       functions.
906
907       Implementors of the AWK language have shown a consistent lack of imagi‐
908       nation when naming their programs.
909

AUTHOR

911       Mike Brennan (brennan@whidbey.com).
912       Thomas E. Dickey <dickey@invisible-island.net>.
913
914
915
916                                 USER COMMANDS                         MAWK(1)