1MILLER(1)                                                            MILLER(1)
2
3
4

NAME

6       miller - like awk, sed, cut, join, and sort for name-indexed data such
7       as CSV and tabular JSON.
8

SYNOPSIS

10       Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
11       more file names}
12
13

DESCRIPTION

15       Miller operates on key-value-pair data while the familiar Unix tools
16       operate on integer-indexed fields: if the natural data structure for
17       the latter is the array, then Miller's natural data structure is the
18       insertion-ordered hash map.  This encompasses a variety of data
19       formats, including but not limited to the familiar CSV, TSV, and JSON.
20       (Miller can handle positionally-indexed data as a special case.) This
21       manpage documents Miller v5.9.1.
22

EXAMPLES

24   COMMAND-LINE SYNTAX
25       mlr --csv cut -f hostname,uptime mydata.csv
26       mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
27       mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
28       grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
29       mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
30       mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
31       mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
32       mlr stats2 -a linreg-pca -f u,v -g shape data/*
33       mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
34       mlr --from estimates.tbl put '
35         for (k,v in $*) {
36           if (is_numeric(v) && k =~ "^[t-z].*$") {
37             $sum += v; $count += 1
38           }
39         }
40         $mean = $sum / $count # no assignment if count unset'
41       mlr --from infile.dat put -f analyze.mlr
42       mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
43       mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
44       mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
45       mlr --from infile.dat put  '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
46
47   DATA FORMATS
48         DKVP: delimited key-value pairs (Miller default format)
49         +---------------------+
50         | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
51         | dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
52         +---------------------+
53
54         NIDX: implicitly numerically indexed (Unix-toolkit style)
55         +---------------------+
56         | the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
57         | fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
58         +---------------------+
59
60         CSV/CSV-lite: comma-separated values with separate header line
61         +---------------------+
62         | apple,bat,cog       |
63         | 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
64         | 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
65         +---------------------+
66
67         Tabular JSON: nested objects are supported, although arrays within them are not:
68         +---------------------+
69         | {                   |
70         |  "apple": 1,        | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
71         |  "bat": 2,          |
72         |  "cog": 3           |
73         | }                   |
74         | {                   |
75         |   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
76         |     "egg": 7,       |
77         |     "flint": 8      |
78         |   },                |
79         |   "garlic": ""      |
80         | }                   |
81         +---------------------+
82
83         PPRINT: pretty-printed tabular
84         +---------------------+
85         | apple bat cog       |
86         | 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
87         | 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
88         +---------------------+
89
90         XTAB: pretty-printed transposed tabular
91         +---------------------+
92         | apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
93         | bat   2             |
94         | cog   3             |
95         |                     |
96         | dish 7              | Record 2: "dish" => "7", "egg" => "8"
97         | egg  8              |
98         +---------------------+
99
100         Markdown tabular (supported for output only):
101         +-----------------------+
102         | | apple | bat | cog | |
103         | | ---   | --- | --- | |
104         | | 1     | 2   | 3   | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
105         | | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
106         +-----------------------+
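The DKVP and NIDX layouts shown above can be sketched in a few lines of Python. This is only an illustration of the record model (an insertion-ordered map of string keys to string values), not Miller's implementation; the function names parse_dkvp and parse_nidx are hypothetical, and the default separators are taken from the defaults table in this manpage.

```python
def parse_dkvp(line, fs=",", ps="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(fs), start=1):
        if ps in field:
            key, value = field.split(ps, 1)
        else:
            # No pair separator: the 1-based field position becomes the key,
            # as in the "3" => "flint" example above.
            key, value = str(position), field
        record[key] = value
    return record


def parse_nidx(line):
    """Parse one NIDX line: keys are 1-based positions, split on whitespace runs."""
    return {str(i): value for i, value in enumerate(line.split(), start=1)}


print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_nidx("fox jumped"))
```

Python dicts preserve insertion order, which is why they serve as a stand-in for Miller's record type here.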
107

OPTIONS

109       In the following option flags, the version with "i" designates the
110       input stream, "o" the output stream, and the version without prefix
111       sets the option for both input and output stream. For example: --irs
112       sets the input record separator, --ors the output record separator, and
113       --rs sets both the input and output separator to the given value.
114
115   HELP OPTIONS
116         -h or --help                 Show this message.
117         --version                    Show the software version.
118         {verb name} --help           Show verb-specific help.
119         --help-all-verbs             Show help on all verbs.
120         -l or --list-all-verbs       List only verb names.
121         -L                           List only verb names, one per line.
122         -f or --help-all-functions   Show help on all built-in functions.
123         -F                           Show a bare listing of built-in functions by name.
124         -k or --help-all-keywords    Show help on all keywords.
125         -K                           Show a bare listing of keywords by name.
126
127   VERB LIST
128        altkv bar bootstrap cat check clean-whitespace count count-distinct
129        count-similar cut decimate fill-down filter format-values fraction grep
130        group-by group-like having-fields head histogram join label least-frequent
131        merge-fields most-frequent nest nothing put regularize remove-empty-columns
132        rename reorder repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle
133        skip-trivial-records sort stats1 stats2 step tac tail tee top uniq unsparsify
134
135   FUNCTION LIST
136        + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
137        > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
138        substr tolower toupper capitalize lstrip rstrip strip collapse_whitespace
139        clean_whitespace system abs acos acosh asin asinh atan atan2 atanh cbrt ceil
140        cos cosh erf erfc exp expm1 floor invqnorm log log10 log1p logifit madd max
141        mexp min mmul msub pow qnorm round roundm sgn sin sinh sqrt tan tanh urand
142        urandrange urand32 urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec
143        localtime2sec hms2fsec hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate
144        sec2localtime sec2localtime sec2localdate sec2hms strftime strftime_local
145        strptime strptime_local systime is_absent is_bool is_boolean is_empty
146        is_empty_map is_float is_int is_map is_nonempty_map is_not_empty is_not_map
147        is_not_null is_null is_numeric is_present is_string asserting_absent
148        asserting_bool asserting_boolean asserting_empty asserting_empty_map
149        asserting_float asserting_int asserting_map asserting_nonempty_map
150        asserting_not_empty asserting_not_map asserting_not_null asserting_null
151        asserting_numeric asserting_present asserting_string boolean float fmtnum
152        hexfmt int string typeof depth haskey joink joinkv joinv leafcount length
153        mapdiff mapexcept mapselect mapsum splitkv splitkvx splitnv splitnvx
154
155       Please use "mlr --help-function {function name}" for function-specific help.
156
157   I/O FORMATTING
158         --idkvp   --odkvp   --dkvp      Delimited key-value pairs, e.g. "a=1,b=2"
159                                         (this is Miller's default format).
160
161         --inidx   --onidx   --nidx      Implicitly-integer-indexed fields
162                                         (Unix-toolkit style).
163         -T                              Synonymous with "--nidx --fs tab".
164
165         --icsv    --ocsv    --csv       Comma-separated value (or tab-separated
166                                         with --fs tab, etc.)
167
168         --itsv    --otsv    --tsv       Keystroke-savers for "--icsv --ifs tab",
169                                         "--ocsv --ofs tab", "--csv --fs tab".
170         --iasv    --oasv    --asv       Similar but using ASCII FS 0x1f and RS 0x1e
171         --iusv    --ousv    --usv       Similar but using Unicode FS U+241F (UTF-8 0xe2909f)
172                                         and RS U+241E (UTF-8 0xe2909e)
173
174         --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
175                                         with --fs tab, etc.). The 'lite' CSV does not handle
176                                         RFC-CSV double-quoting rules; is slightly faster;
177                                         and handles heterogeneity in the input stream via
178                                         empty newline followed by new header line. See also
179                                         http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
180
181         --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
182                                         "--ocsvlite --ofs tab", "--csvlite --fs tab".
183         -t                              Synonymous with --tsvlite.
184         --iasvlite --oasvlite --asvlite Similar to --itsvlite et al. but using ASCII FS 0x1f and RS 0x1e
185         --iusvlite --ousvlite --usvlite Similar to --itsvlite et al. but using Unicode FS U+241F (UTF-8 0xe2909f)
186                                         and RS U+241E (UTF-8 0xe2909e)
187
188         --ipprint --opprint --pprint    Pretty-printed tabular (produces no
189                                         output until all input is in).
190                             --right     Right-justifies all fields for PPRINT output.
191                             --barred    Prints a border around PPRINT output
192                                         (only available for output).
193
194                   --omd                 Markdown-tabular (only available for output).
195
196         --ixtab   --oxtab   --xtab      Pretty-printed vertical-tabular.
197                             --xvright   Right-justifies values for XTAB format.
198
199         --ijson   --ojson   --json      JSON tabular: sequence or list of one-level
200                                         maps: {...}{...} or [{...},{...}].
201           --json-map-arrays-on-input    JSON arrays are unmillerable. --json-map-arrays-on-input
202           --json-skip-arrays-on-input   is the default: arrays are converted to integer-indexed
203           --json-fatal-arrays-on-input  maps. The other two options cause them to be skipped, or
204                                         to be treated as errors.  Please use the jq tool for full
205                                         JSON (pre)processing.
206                             --jvstack   Put one key-value pair per line for JSON
207                                         output.
208                       --jsonx --ojsonx  Keystroke-savers for --json --jvstack
209                                         and --ojson --jvstack, respectively.
210                             --jlistwrap Wrap JSON output in outermost [ ].
211                           --jknquoteint Do not quote non-string map keys in JSON output.
212                            --jvquoteall Quote map values in JSON output, even if they're
213                                         numeric.
214                     --jflatsep {string} Separator for flattening multi-level JSON keys,
215                                         e.g. '{"a":{"b":3}}' becomes a:b => 3 for
216                                         non-JSON formats. Defaults to :.
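As a rough illustration of the key flattening described for --jflatsep, the following Python sketch joins nested map keys with a separator. The function name flatten is hypothetical and the code only mirrors the '{"a":{"b":3}}' example; it makes no claim about how Miller itself walks JSON input.

```python
def flatten(record, sep=":", prefix=""):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the joined key as prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


print(flatten({"a": {"b": 3}}))
print(flatten({"dish": {"egg": 7, "flint": 8}, "garlic": ""}))
```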
217
218         -p is a keystroke-saver for --nidx --fs space --repifs
219
220         Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
221         DKVP-formatted input and pretty-printed output.
222
223         Please use --iformat1 --oformat2 rather than --format1 --oformat2.
224         The latter sets up input and output flags for format1, not all of which
225         are overridden in all cases by setting output format to format2.
226
227   COMMENTS IN DATA
228         --skip-comments                 Ignore commented lines (prefixed by "#")
229                                         within the input.
230         --skip-comments-with {string}   Ignore commented lines within input, with
231                                         specified prefix.
232         --pass-comments                 Immediately print commented lines (prefixed by "#")
233                                         within the input.
234         --pass-comments-with {string}   Immediately print commented lines within input, with
235                                         specified prefix.
236       Notes:
237       * Comments are only honored at the start of a line.
238       * In the absence of any of the above four options, comments are data like
239         any other text.
240       * When pass-comments is used, comment lines are written to standard output
241         immediately upon being read; they are not part of the record stream.
242         Results may be counterintuitive. A suggestion is to place comments at the
243         start of data files.
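The notes above can be illustrated with a small Python sketch that separates comment lines from data lines. It assumes the default "#" prefix and is not Miller's code; split_comments is a hypothetical name. With skip-comments the first list would be discarded, and with pass-comments it would be printed as soon as each line is read.

```python
def split_comments(lines, prefix="#"):
    """Separate comment lines from data lines.

    Only a prefix at the very start of a line marks a comment;
    a "#" later in the line is ordinary data.
    """
    comments, data = [], []
    for line in lines:
        (comments if line.startswith(prefix) else data).append(line)
    return comments, data


comments, data = split_comments(["# units: seconds", "a=1", "b=2 # not a comment"])
print(comments)
print(data)
```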
244
245   FORMAT-CONVERSION KEYSTROKE-SAVERS
246       As keystroke-savers for format-conversion you may use the following:
247               --c2t --c2d --c2n --c2j --c2x --c2p --c2m
248         --t2c       --t2d --t2n --t2j --t2x --t2p --t2m
249         --d2c --d2t       --d2n --d2j --d2x --d2p --d2m
250         --n2c --n2t --n2d       --n2j --n2x --n2p --n2m
251         --j2c --j2t --j2d --j2n       --j2x --j2p --j2m
252         --x2c --x2t --x2d --x2n --x2j       --x2p --x2m
253         --p2c --p2t --p2d --p2n --p2j --p2x       --p2m
254       The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
255       PPRINT, and markdown, respectively. Note that markdown format is available for
256       output only.
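The naming scheme of these keystroke-savers can be sketched as a lookup from the two letters to an input flag and an output flag. This Python fragment is conceptual only: expand_saver is a hypothetical helper, and it assumes each saver is equivalent to exactly one input-format flag plus one output-format flag from the I/O FORMATTING section, which may not capture every option the real flags set.

```python
import re

# Letter-to-flag tables, following the letter key given above.
INPUT_FLAGS = {"c": "--icsv", "t": "--itsv", "d": "--idkvp", "n": "--inidx",
               "j": "--ijson", "x": "--ixtab", "p": "--ipprint"}
OUTPUT_FLAGS = {"c": "--ocsv", "t": "--otsv", "d": "--odkvp", "n": "--onidx",
                "j": "--ojson", "x": "--oxtab", "p": "--opprint", "m": "--omd"}


def expand_saver(flag):
    """Expand e.g. "--c2p" into ["--icsv", "--opprint"]."""
    match = re.fullmatch(r"--([a-z])2([a-z])", flag)
    if not match:
        raise ValueError(f"not a format-conversion flag: {flag}")
    src, dst = match.groups()
    # Markdown is output-only, so "m" is absent from INPUT_FLAGS.
    if src not in INPUT_FLAGS or dst not in OUTPUT_FLAGS or src == dst:
        raise ValueError(f"unsupported conversion: {flag}")
    return [INPUT_FLAGS[src], OUTPUT_FLAGS[dst]]


print(expand_saver("--c2p"))
```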
257
258   COMPRESSED I/O
259         --prepipe {command} This allows Miller to handle compressed inputs. You can do
260         without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
261
262         However, when multiple input files are present, between-file separations are
263         lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
264         specify an action to be taken on each input file. This pre-pipe command must
265         be able to read from standard input; it will be invoked with
266           {command} < {filename}.
267         Examples:
268           mlr --prepipe 'gunzip'
269           mlr --prepipe 'zcat -cf'
270           mlr --prepipe 'xz -cd'
271           mlr --prepipe cat
272           mlr --prepipe-gunzip
273           mlr --prepipe-zcat
274         Note that this feature is quite general and is not limited to decompression
275         utilities. You can use it to apply per-file filters of your choice.
276         For output compression (or other) utilities, simply pipe the output:
277           mlr ... | {your compression command}
278
279         There are shorthands --prepipe-zcat and --prepipe-gunzip which are
280         valid in .mlrrc files. The --prepipe flag is not valid in .mlrrc
281         files since that would put execution of the prepipe command under
282         control of the .mlrrc file.
283
284   SEPARATORS
285         --rs     --irs     --ors              Record separators, e.g. 'lf' or '\r\n'
286         --fs     --ifs     --ofs  --repifs    Field separators, e.g. comma
287         --ps     --ips     --ops              Pair separators, e.g. equals sign
288
289         Notes about line endings:
290         * Default line endings (--irs and --ors) are "auto" which means autodetect from
291           the input file format, as long as the input file(s) have lines ending in either
292           LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
293           carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
294         * If both irs and ors are auto (which is the default) then LF input will lead to LF
295           output and CRLF input will lead to CRLF output, regardless of the platform you're
296           running on.
297         * The line-ending autodetector triggers on the first line ending detected in the input
298           stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
299           LF-terminated file then autodetected line endings will be CRLF.
300         * If you use --ors {something else} with (default or explicitly specified) --irs auto
301           then line endings are autodetected on input and set to what you specify on output.
302         * If you use --irs {something else} with (default or explicitly specified) --ors auto
303           then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
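The "first line ending wins" rule described above can be sketched as follows. This is an illustration under stated assumptions, not Miller's detector: detect_line_ending is a hypothetical name, and falling back to LF when the input has no line ending at all is an assumption of the sketch.

```python
def detect_line_ending(data: bytes) -> bytes:
    """Return the first line ending found in data: CRLF or LF."""
    idx = data.find(b"\n")
    if idx > 0 and data[idx - 1:idx] == b"\r":
        return b"\r\n"
    # Either a bare LF was found or no line ending exists (assumed LF).
    return b"\n"


# A CRLF-terminated line followed by an LF-terminated one autodetects as CRLF.
print(detect_line_ending(b"a=1\r\nb=2\n"))
```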
304
305         Notes about all other separators:
306         * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
307           do key-value pairs appear juxtaposed.
308         * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
309           XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
310           a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
311           auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
312         * OFS must be single-character for PPRINT format. This is because it is used
313           with repetition for alignment; multi-character separators would make
314           alignment impossible.
315         * OPS may be multi-character for XTAB format, in which case alignment is
316           disabled.
317         * TSV is simply CSV using tab as field separator ("--fs tab").
318         * FS/PS are ignored for markdown format; RS is used.
319         * All FS and PS options are ignored for JSON format, since they are not relevant
320           to the JSON format.
321         * You can specify separators in any of the following ways, shown by example:
322           - Type them out, quoting as necessary for shell escapes, e.g.
323             "--fs '|' --ips :"
324           - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
325           - To avoid backslashing, you can use any of the following names:
326             cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
327         * Default separators by format:
328             File format  RS       FS       PS
329             gen          (N/A)    (N/A)    (N/A)
330             dkvp         auto     ,        =
331             json         auto     (N/A)    (N/A)
332             nidx         auto     space    (N/A)
333             csv          auto     ,        (N/A)
334             csvlite      auto     ,        (N/A)
335             markdown     auto     (N/A)    (N/A)
336             pprint       auto     space    (N/A)
337             xtab         (N/A)    auto     space
338
339   CSV-SPECIFIC OPTIONS
340         --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
341                            of input files. Tip: combine with "label" to recreate
342                            missing headers.
343         --allow-ragged-csv-input|--ragged If a data line has fewer fields than the header line,
344                            fill remaining keys with empty string. If a data line has more
345                            fields than the header line, use integer field labels as in
346                            the implicit-header case.
347         --headerless-csv-output   Print only CSV data lines.
348         -N                 Keystroke-saver for --implicit-csv-header --headerless-csv-output.
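The ragged-input rule described for --allow-ragged-csv-input can be sketched like this in Python. The helper name ragged_record is hypothetical, and the sketch works on already-split fields rather than on raw CSV text.

```python
def ragged_record(header, fields):
    """Build a record from a data line whose length may differ from the header."""
    record = {}
    for i, value in enumerate(fields):
        # Extra fields beyond the header get their 1-based position as key.
        key = header[i] if i < len(header) else str(i + 1)
        record[key] = value
    for key in header[len(fields):]:
        # Header keys with no data are filled with the empty string.
        record[key] = ""
    return record


print(ragged_record(["a", "b", "c"], ["1", "2"]))
print(ragged_record(["a", "b", "c"], ["1", "2", "3", "4"]))
```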
349
350   DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
351         --quote-all        Wrap all fields in double quotes
352         --quote-none       Do not wrap any fields in double quotes, even if they have
353                            OFS or ORS in them
354         --quote-minimal    Wrap fields in double quotes only if they have OFS or ORS
355                            in them (default)
356         --quote-numeric    Wrap fields in double quotes only if they have numbers
357                            in them
358         --quote-original   Wrap fields in double quotes if and only if they were
359                            quoted on input. This isn't sticky for computed fields:
360                            e.g. if fields a and b were quoted on input and you do
361                            "put '$c = $a . $b'" then field c won't inherit a or b's
362                            was-quoted-on-input flag.
363
364   NUMERICAL FORMATTING
365         --ofmt {format}    E.g. %.18lf, %.0lf. Please use sprintf-style codes for
366                            double-precision. Applies to verbs which compute new
367                            values, e.g. put, stats1, stats2. See also the fmtnum
368                            function within mlr put (mlr --help-all-functions).
369                            Defaults to %lf.
370
371   OTHER OPTIONS
372         --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
373                            urand()/urandint()/urand32().
374         --nr-progress-mod {m}, with m a positive integer: print filename and record
375                            count to stderr every m input records.
376         --from {filename}  Use this to specify an input file before the verb(s),
377                            rather than after. May be used more than once. Example:
378                            "mlr --from a.dat --from b.dat cat" is the same as
379                            "mlr cat a.dat b.dat".
380         -n                 Process no input files, nor standard input either. Useful
381                            for mlr put with begin/end statements only. (Same as --from
382                            /dev/null.) Also useful in "mlr -n put -v '...'" for
383                            analyzing abstract syntax trees (if that's your thing).
384         -I                 Process files in-place. For each file name on the command
385                            line, output is written to a temp file in the same
386                            directory, which is then renamed over the original. Each
387                            file is processed in isolation: if the output format is
388                            CSV, CSV headers will be present in each output file;
389                            statistics are only over each file's own records; and so on.
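The temp-file-then-rename pattern described for -I can be sketched in Python as below. This shows the general technique only and is not taken from Miller's source; rewrite_in_place and its transform argument are hypothetical names.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Write transform(lines) to a temp file beside path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r", encoding="utf-8") as src:
        lines = src.readlines()
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(transform(lines))
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        os.unlink(tmp_path)
        raise
```

Because the original is replaced only after the new content is fully written, a failure partway through leaves the input file as it was.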
390
391   THEN-CHAINING
392       Output of one verb may be chained as input to another using "then", e.g.
393         mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
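Conceptually, then-chaining composes verbs so that each consumes the record stream produced by the one before it. A minimal Python sketch of that idea follows; chain, sort_by and head are hypothetical stand-ins for illustration and do not reproduce the options of Miller's sort and head verbs.

```python
from functools import reduce


def chain(*verbs):
    """Compose verbs left to right, like "verb1 then verb2 then ..."."""
    return lambda records: reduce(lambda stream, verb: verb(stream), verbs, records)


def sort_by(field):
    """Stand-in verb: sort records by one field (needs all input first)."""
    return lambda records: iter(sorted(records, key=lambda r: r[field]))


def head(n):
    """Stand-in verb: pass through only the first n records."""
    def verb(records):
        for i, record in enumerate(records):
            if i >= n:
                return
            yield record
    return verb


records = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]
print(list(chain(sort_by("color"), head(2))(records)))
```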
394
395   AUXILIARY COMMANDS
396       Miller has a few otherwise-standalone executables packaged within it.
397       They do not participate in any other parts of Miller.
398       Available subcommands:
399         aux-list
400         lecat
401         termcvt
402         hex
403         unhex
404         netbsd-strptime
405       For more information, please invoke mlr {subcommand} --help
406

MLRRC

408       You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc.
409       For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file
410       and that will be the default input/output format unless otherwise specified on the command line.
411
412       The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional.
413       Hash-style comments and blank lines are ignored.
414
415       Sample .mlrrc:
416       # Input and output formats are CSV by default (unless otherwise specified
417       # on the mlr command line):
418       csv
419       # These are no-ops for CSV, but when I do use JSON output, I want these
420       # pretty-printing options to be used:
421       jvstack
422       jlistwrap
423
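The file format described above (one flag or option-value pair per line, optional leading "--", comments and blank lines ignored) can be sketched as a small parser. This is an illustration of the stated rules only; parse_mlrrc is a hypothetical name and the sketch does no validation of the flags themselves.

```python
def parse_mlrrc(text):
    """Turn .mlrrc text into a list of command-line style arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and hash-style comments are ignored
        parts = line.split(None, 1)
        flag = parts[0] if parts[0].startswith("--") else "--" + parts[0]
        args.append(flag)
        args.extend(parts[1:])  # the option value, if one is present
    return args


sample = """# CSV by default
csv
jvstack
jlistwrap
"""
print(parse_mlrrc(sample))
```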
424       How to specify location of .mlrrc:
425       * If $MLRRC is set:
426         o If its value is "__none__" then no .mlrrc files are processed.
427         o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
428           errors, they abort mlr with a usage message (as if you had mistyped something on the
429           command line). If the file can't be loaded at all, though, it is silently skipped.
430         o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
431           set in the environment.
432       * Otherwise:
433         o If $HOME/.mlrrc exists, it's then processed as above.
434         o If ./.mlrrc exists, it's then also processed as above.
435         (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
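The lookup order above can be summarized in a short Python sketch. It returns the list of files that would be consulted, in order; mlrrc_paths and its parameters are hypothetical, and the exists argument is there only so the logic can be exercised without touching the real filesystem.

```python
import os


def mlrrc_paths(environ, home, cwd, exists=os.path.exists):
    """Return the .mlrrc files to process, in order, per the rules above."""
    value = environ.get("MLRRC")
    if value is not None:
        # $MLRRC set: home and current-directory files are ignored entirely.
        return [] if value == "__none__" else [value]
    candidates = [os.path.join(home, ".mlrrc"), os.path.join(cwd, ".mlrrc")]
    # Home first, then current directory, so the latter is stacked on top.
    return [path for path in candidates if exists(path)]


print(mlrrc_paths({"MLRRC": "__none__"}, "/home/user", "/tmp/work"))
```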
436
437       See also:
438       https://johnkerl.org/miller/doc/customization.html
439

VERBS

441   altkv
442       Usage: mlr altkv [no options]
443       Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
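The pairing done by altkv can be sketched in one line of Python. The sketch covers only the even-length case given in the description; what happens to a trailing unpaired value is not stated here, so the code below simply drops it, which is an assumption.

```python
def altkv(values):
    """Pair alternating values as key, value, key, value, ..."""
    # zip stops at the shorter slice, so a trailing unpaired value is dropped here.
    return dict(zip(values[0::2], values[1::2]))


print(altkv(["a", "b", "c", "d", "e", "f"]))
```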
444
445   bar
446       Usage: mlr bar [options]
447       Replaces a numeric field with a number of asterisks, allowing for cheesy
448       bar plots. These align best with --opprint or --oxtab output format.
449       Options:
450       -f   {a,b,c}      Field names to convert to bars.
451       -c   {character}  Fill character: default '*'.
452       -x   {character}  Out-of-bounds character: default '#'.
453       -b   {character}  Blank character: default '.'.
454       --lo {lo}         Lower-limit value for min-width bar: default '0.000000'.
455       --hi {hi}         Upper-limit value for max-width bar: default '100.000000'.
456       -w   {n}          Bar-field width: default '40'.
457       --auto            Automatically computes limits, ignoring --lo and --hi.
458                         Holds all records in memory before producing any output.
459
460   bootstrap
461       Usage: mlr bootstrap [options]
462       Emits an n-sample, with replacement, of the input records.
463       Options:
464       -n {number} Number of samples to output. Defaults to number of input records.
465                   Must be non-negative.
466       See also mlr sample and mlr shuffle.
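Sampling with replacement, as bootstrap does, can be sketched with Python's standard library. This is an illustration of the technique rather than Miller's code; the function name and the rng parameter are assumptions made for the example.

```python
import random


def bootstrap(records, n=None, rng=random):
    """Draw n records with replacement; n defaults to the number of inputs."""
    if n is None:
        n = len(records)
    if n < 0:
        raise ValueError("n must be non-negative")
    if not records:
        return []
    # The same record may be drawn more than once.
    return rng.choices(records, k=n)


sample = bootstrap([{"x": 1}, {"x": 2}, {"x": 3}], rng=random.Random(0))
print(len(sample))
```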
467
468   cat
469       Usage: mlr cat [options]
470       Passes input records directly to output. Most useful for format conversion.
471       Options:
472       -n        Prepend field "n" to each record with record-counter starting at 1
473       -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
474                 keyed by specified field name(s).
475       -v        Write a low-level record-structure dump to stderr.
476       -N {name} Prepend field {name} to each record with record-counter starting at 1
477
478   check
479       Usage: mlr check
480       Consumes records without printing any output.
481       Useful for doing a well-formatted check on input data.
482
483   clean-whitespace
484       Usage: mlr clean-whitespace [options]
485       For each record, for each field in the record, whitespace-cleans the keys and
486       values. Whitespace-cleaning entails stripping leading and trailing whitespace,
487       and replacing multiple whitespace with singles. For finer-grained control,
488       please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
489       and clean_whitespace.
490
491       Options:
492       -k|--keys-only    Do not touch values.
493       -v|--values-only  Do not touch keys.
494       It is an error to specify -k as well as -v.
495
496   count
497       Usage: mlr count [options]
498       Prints number of records, optionally grouped by distinct values for specified field names.
499
500       Options:
501       -g {a,b,c}    Optional group-by-field names for counts.
502       -n            Show only the number of distinct values. Not interesting without -g.
503       -o {name}     Field name for output count. Default "count".
504
505   count-distinct
506       Usage: mlr count-distinct [options]
507       Prints number of records having distinct values for specified field names.
508       Same as uniq -c.
509
510       Options:
511       -f {a,b,c}    Field names for distinct count.
512       -n            Show only the number of distinct values. Not compatible with -u.
513       -o {name}     Field name for output count. Default "count".
514                     Ignored with -u.
515       -u            Do unlashed counts for multiple field names. With -f a,b and
516                     without -u, computes counts for distinct combinations of a
517                     and b field values. With -f a,b and with -u, computes counts
518                     for distinct a field values and counts for distinct b field
519                     values separately.
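The difference between the default (lashed) counts and the -u (unlashed) counts can be sketched with collections.Counter. The function below is hypothetical and, as an assumption of the sketch, skips records that lack a requested field.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field-value combinations, or each field separately."""
    if unlashed:
        # One independent counter per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # One counter keyed by the tuple of all requested field values.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```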
520
521   count-similar
522       Usage: mlr count-similar [options]
523       Ingests all records, then emits each record augmented by a count of
524       the number of records (itself included) having the same group-by field values.
525       Options:
526       -g {d,e,f} Group-by-field names for counts.
527       -o {name}  Field name for output count. Default "count".
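A minimal Python sketch of count-similar follows. It is a two-pass illustration with hypothetical names: it assumes every record has the group-by fields, treats the count as the size of the record's group (the record itself included), and keeps records in input order, which may differ from the order Miller emits.

```python
from collections import Counter


def count_similar(records, group_by, out="count"):
    """Append to each record the size of its group."""
    records = list(records)  # all input is ingested before any output

    def key(record):
        return tuple(record[f] for f in group_by)

    counts = Counter(key(r) for r in records)
    return [{**r, out: counts[key(r)]} for r in records]


records = [{"a": "pan", "x": 1}, {"a": "eks", "x": 2}, {"a": "pan", "x": 3}]
print(count_similar(records, ["a"]))
```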
528
529   cut
530       Usage: mlr cut [options]
531       Passes through input records with specified fields included/excluded.
532       -f {a,b,c}       Field names to include for cut.
533       -o               Retain fields in the order specified here in the argument list.
534                        Default is to retain them in the order found in the input data.
535       -x|--complement  Exclude, rather than include, field names specified by -f.
536       -r               Treat field names as regular expressions. "ab", "a.*b" will
537                        match any field name containing the substring "ab" or matching
538                        "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
539                        be used. The -o flag is ignored when -r is present.
540       Examples:
541         mlr cut -f hostname,status
542         mlr cut -x -f hostname,status
543         mlr cut -r -f '^status$,sda[0-9]'
544         mlr cut -r -f '^status$,"sda[0-9]"'
545         mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
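The interaction of -f, -o, -x and -r can be sketched on a single record in Python. The function and parameter names are hypothetical, and the sketch does not handle the quoted-regex forms with a trailing "i" shown in the last two examples.

```python
import re


def cut(record, names, complement=False, ordered=False, regex=False):
    """Select or exclude fields from one record."""
    if regex:
        patterns = [re.compile(name) for name in names]

        def selected(key):
            return any(p.search(key) for p in patterns)
    else:
        def selected(key):
            return key in names

    if complement:
        return {k: v for k, v in record.items() if not selected(k)}
    if ordered and not regex:
        # -o: keep the order given in the argument list (ignored with -r).
        return {k: record[k] for k in names if k in record}
    # Default: keep the order found in the input record.
    return {k: v for k, v in record.items() if selected(k)}


record = {"hostname": "db1", "status": "up", "sda1": "93"}
print(cut(record, ["status", "hostname"]))
print(cut(record, ["status", "hostname"], ordered=True))
```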
546
547   decimate
548       Usage: mlr decimate [options]
549       -n {count}    Decimation factor; default 10
550       -b            Decimate by printing first of every n.
551       -e            Decimate by printing last of every n (default).
552       -g {a,b,c}    Optional group-by-field names for decimate counts
553       Passes through one of every n records, optionally by category.
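Decimation with the -b and -e choices can be sketched as a per-group counter in Python. Names are hypothetical, and two behaviours are assumptions of the sketch rather than statements from this manpage: with first=True the first record of a window is emitted even if the window never fills, and records lacking a group-by field are grouped under None.

```python
from collections import defaultdict


def decimate(records, n=10, first=False, group_by=()):
    """Yield one of every n records per group: the first (-b) or the last (-e)."""
    counts = defaultdict(int)
    emit_at = 1 if first else n
    for record in records:
        key = tuple(record.get(f) for f in group_by)
        counts[key] += 1
        if counts[key] == emit_at:
            yield record
        if counts[key] == n:
            counts[key] = 0  # start the next window of n


records = [{"i": k} for k in range(1, 11)]
print([r["i"] for r in decimate(records, n=5)])
print([r["i"] for r in decimate(records, n=5, first=True)])
```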
554
555   fill-down
556       Usage: mlr fill-down [options]
557       -f {a,b,c}          Field names for fill-down
558       -a|--only-if-absent Fill down only if the field is absent from the record.
559       If a given record has a missing value for a given field, fill that from
560       the corresponding value from a previous record, if any.
561       By default, a 'missing' field either is absent, or has the empty-string value.
562       With -a, a field is 'missing' only if it is absent.
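
       A minimal Python sketch of the two meanings of 'missing' described
       above. The function name and the only_if_absent parameter are
       hypothetical stand-ins for the verb and its -a flag. A filled-in field
       that was absent is appended at the end of the record here, which is an
       assumption.

```python
def fill_down(records, fields, only_if_absent=False):
    """Copy the last seen value of each field into records where it is missing."""
    last = {}                                  # field name -> most recent value
    out = []
    for rec in records:
        rec = dict(rec)
        for f in fields:
            if only_if_absent:
                missing = f not in rec         # -a: empty strings count as present
            else:
                missing = rec.get(f, "") == "" # default: absent or empty
            if missing:
                if f in last:
                    rec[f] = last[f]
            else:
                last[f] = rec[f]
        out.append(rec)
    return out

rows = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
default_fill = fill_down(rows, ["a"])
absent_only = fill_down(rows, ["a"], only_if_absent=True)
```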
563
564   filter
565       Usage: mlr filter [options] {expression}
566       Prints records for which {expression} evaluates to true.
567       If there are multiple semicolon-delimited expressions, all of them are
568       evaluated and the last one is used as the filter criterion.
569
570       Conversion options:
571       -S: Keeps field values as strings with no type inference to int or float.
572       -F: Keeps field values as strings or floats with no inference to int.
573       All field values are type-inferred to int/float/string unless this behavior is
574       suppressed with -S or -F.
575
576       Output/formatting options:
577       --oflatsep {string}: Separator to use when flattening multi-level @-variables
578           to output records for emit. Default ":".
579       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
580       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
581       Any of the output-format command-line flags (see mlr -h). Example: using
582         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
583       the input is CSV, the output is pretty-print tabular, but the tee-file output
584       is written in JSON format.
585       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
586           record.
587
588       Expression-specification options:
589       -f {filename}: the DSL expression is taken from the specified file rather
590           than from the command line. Outer single quotes wrapping the expression
591           should not be placed in the file. If -f is specified more than once,
592           all input files specified using -f are concatenated to produce the expression.
593           (For example, you can define functions in one file and call them from another.)
594       -e {expression}: You can use this after -f to add an expression. Example use
595           case: define functions/subroutines in a file you specify with -f, then call
596           them with an expression you specify with -e.
597       (If you mix -e and -f then the expressions are evaluated in the order encountered.
598       Since the expression pieces are simply concatenated, please be sure to use intervening
599       semicolons to separate expressions.)
600
601       -s name=value: Predefines out-of-stream variable @name to have value "value".
602           Thus mlr filter -s foo=97 '$column += @foo' is like
603           mlr filter 'begin {@foo = 97} $column += @foo'.
604           The value part is subject to type-inferencing as specified by -S/-F.
605           May be specified more than once, e.g. -s name1=value1 -s name2=value2.
606           Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
607
608       Tracing options:
609       -v: Prints the expression's AST (abstract syntax tree), which gives
610           full transparency on the precedence and associativity rules of
611           Miller's grammar, to stdout.
612       -a: Prints a low-level stack-allocation trace to stdout.
613       -t: Prints a low-level parser trace to stderr.
614       -T: Prints every statement to stderr as it is executed.
615
616       Other options:
617       -x: Prints records for which {expression} evaluates to false.
618
619       Please use a dollar sign for field names and double-quotes for string
620       literals. If field names have special characters such as "." then you might
621       use braces, e.g. '${field.name}'. Miller built-in variables are
622       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
623       variables. The environment-variable name may be an expression, e.g. a field
624       value.
625
626       Use # to comment to end of line.
627
628       Examples:
629         mlr filter 'log10($count) > 4.0'
630         mlr filter 'FNR == 2'         (second record in each file)
631         mlr filter 'urand() < 0.001'  (subsampling)
632         mlr filter '$color != "blue" && $value > 4.2'
633         mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
634         mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
635         mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
636         mlr filter '
637           NR == 1 ||
638          #NR == 2 ||
639           NR == 3
640         '
641
642       Please see http://johnkerl.org/miller/doc/reference.html for more information
643       including function list. Or "mlr -f". Please also see "mlr grep" which is
644       useful when you don't yet know which field name(s) you're looking for.
645       Please see in particular:
646         http://www.johnkerl.org/miller/doc/reference-verbs.html#filter
647
648   format-values
649       Usage: mlr format-values [options]
650       Applies format strings to all field values, depending on autodetected type.
651       * If a field value is detected to be integer, applies integer format.
652       * Else, if a field value is detected to be float, applies float format.
653       * Else, applies string format.
654
655       Note: this is a low-keystroke way to apply formatting to many fields. To get
656       finer control, please see the fmtnum function within the mlr put DSL.
657
658       Note: this verb lets you apply arbitrary format strings, which can produce
659       undefined behavior and/or program crashes.  See your system's "man printf".
660
661       Options:
662       -i {integer format} Defaults to "%lld".
663                           Examples: "%06lld", "%08llx".
664                           Note that Miller integers are long long so you must use
665                           formats which apply to long long, e.g. with ll in them.
666                           Undefined behavior results otherwise.
667       -f {float format}   Defaults to "%lf".
668                           Examples: "%8.3lf", "%.6le".
669                           Note that Miller floats are double-precision so you must
670                           use formats which apply to double, e.g. with l[efg] in them.
671                           Undefined behavior results otherwise.
672       -s {string format}  Defaults to "%s".
673                           Examples: "_%s", "%08s".
674                           Note that you must use formats which apply to string, e.g.
675                           with s in them. Undefined behavior results otherwise.
676       -n                  Coerce field values autodetected as int to float, and then
677                           apply the float format.
678
679   fraction
680       Usage: mlr fraction [options]
681       For each record's value in specified fields, computes the ratio of that
682       value to the sum of values in that field over all input records.
683       E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
684       x=1,x_fraction=0.1  x=2,x_fraction=0.2  x=3,x_fraction=0.3  and  x=4,x_fraction=0.4
685
686       Note: this is internally a two-pass algorithm: on the first pass it retains
687       input records and accumulates sums; on the second pass it computes quotients
688       and emits output records. This means it produces no output until all input is read.
689
690       Options:
691       -f {a,b,c}    Field name(s) for fraction calculation
692       -g {d,e,f}    Optional group-by-field name(s) for fraction counts
693       -p            Produce percents [0..100], not fractions [0..1]. Output field names
694                     end with "_percent" rather than "_fraction"
695       -c            Produce cumulative distributions, i.e. running sums: each output
696                     value folds in the sum of the previous for the specified group
697                     E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
698                     x=1,x_cumulative_fraction=0.1  x=2,x_cumulative_fraction=0.3
699                     x=3,x_cumulative_fraction=0.6  and  x=4,x_cumulative_fraction=1.0
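
       The two-pass computation can be sketched in Python as follows. This is
       an illustration rather than Miller's code. The function name and
       parameters are hypothetical, and the "_cumulative_percent" suffix for
       -p combined with -c is an assumption.

```python
from collections import defaultdict

def fraction(records, field, group_by=(), percent=False, cumulative=False):
    """Pass 1 sums the field per group; pass 2 divides each value by that sum."""
    def group_of(rec):
        return tuple(rec.get(g) for g in group_by)

    totals = defaultdict(float)
    for rec in records:                        # pass 1: accumulate sums
        if field in rec:
            totals[group_of(rec)] += float(rec[field])

    scale = 100.0 if percent else 1.0
    suffix = ("_cumulative" if cumulative else "") + ("_percent" if percent else "_fraction")
    running = defaultdict(float)
    out = []
    for rec in records:                        # pass 2: compute quotients
        rec = dict(rec)
        if field in rec:
            key = group_of(rec)
            running[key] += float(rec[field])
            numerator = running[key] if cumulative else float(rec[field])
            # A zero group sum would raise ZeroDivisionError in this sketch.
            rec[field + suffix] = scale * numerator / totals[key]
        out.append(rec)
    return out

rows = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
plain = fraction(rows, "x")
cumul = fraction(rows, "x", cumulative=True)
```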
700
701   grep
702       Usage: mlr grep [options] {regular expression}
703       Passes through records which match {regex}.
704       Options:
705       -i    Use case-insensitive search.
706       -v    Invert: pass through records which do not match the regex.
707       Note that "mlr filter" is more powerful, but requires you to know field names.
708       By contrast, "mlr grep" allows you to regex-match the entire record. It does
709       this by formatting each record in memory as DKVP, using command-line-specified
710       ORS/OFS/OPS, and matching the resulting line against the regex specified
711       here. In particular, the regex is not applied to the input stream: if you
712       have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
713       be matched, not against either of these lines, but against the DKVP line
714       "x=1,y=2,z=3".  Furthermore, not all the options to system grep are supported,
715       and this command is intended to be merely a keystroke-saver. To get all the
716       features of system grep, you can do
717         "mlr --odkvp ... | grep ... | mlr --idkvp ..."
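
       The point about matching against the DKVP rendering rather than the
       input text can be sketched in Python. The helper names are
       hypothetical, and Python's re syntax stands in for whatever regex
       library Miller uses.

```python
import re

def to_dkvp(rec, ofs=",", ops="="):
    """Render a record as one DKVP line, e.g. x=1,y=2,z=3."""
    return ofs.join(f"{k}{ops}{v}" for k, v in rec.items())

def grep(records, pattern, ignore_case=False, invert=False):
    """Keep records whose DKVP rendering matches (or, inverted, does not match)."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [r for r in records if (rx.search(to_dkvp(r)) is not None) != invert]

rec = {"x": "1", "y": "2", "z": "3"}       # from CSV header x,y,z and data line 1,2,3
line = to_dkvp(rec)
matches_dkvp = grep([rec], "y=2")          # matches the DKVP form
matches_csv = grep([rec], "^1,2,3$")       # the raw CSV line is never seen
```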
718
719   group-by
720       Usage: mlr group-by {comma-separated field names}
721       Outputs records in batches having identical values at specified field names.
722
723   group-like
724       Usage: mlr group-like
725       Outputs records in batches having identical field names.
726
727   having-fields
728       Usage: mlr having-fields [options]
729       Conditionally passes through records depending on each record's field names.
730       Options:
731         --at-least      {comma-separated names}
732         --which-are     {comma-separated names}
733         --at-most       {comma-separated names}
734         --all-matching  {regular expression}
735         --any-matching  {regular expression}
736         --none-matching {regular expression}
737       Examples:
738         mlr having-fields --which-are amount,status,owner
739         mlr having-fields --any-matching 'sda[0-9]'
740         mlr having-fields --any-matching '"sda[0-9]"'
741         mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
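
       One plausible reading of the six criteria, as a Python predicate on a
       single record. The set and regex semantics below are inferred from the
       option names and are an assumption, not taken from Miller's source;
       the function name is hypothetical.

```python
import re

def having_fields(rec, mode, arg):
    """Return True if the record's field names satisfy the chosen criterion."""
    names = list(rec)
    if mode in ("at-least", "which-are", "at-most"):
        wanted, have = set(arg.split(",")), set(names)
        if mode == "at-least":
            return wanted <= have      # every listed name is present
        if mode == "which-are":
            return wanted == have      # exactly the listed names
        return have <= wanted          # no names outside the list
    hits = [re.search(arg, n) is not None for n in names]
    if mode == "all-matching":
        return all(hits)
    if mode == "any-matching":
        return any(hits)
    if mode == "none-matching":
        return not any(hits)
    raise ValueError(f"unknown mode: {mode}")

rec = {"amount": 1, "status": 2, "owner": 3}
exact = having_fields(rec, "which-are", "amount,status,owner")
```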
742
743   head
744       Usage: mlr head [options]
745       -n {count}    Head count to print; default 10
746       -g {a,b,c}    Optional group-by-field names for head counts
747       Passes through the first n records, optionally by category.
748       Without -g, ceases consuming more input (i.e. is fast) when n
749       records have been read.
750
751   histogram
752       Usage: mlr histogram [options]
753       -f {a,b,c}    Value-field names for histogram counts
754       --lo {lo}     Histogram low value
755       --hi {hi}     Histogram high value
756       --nbins {n}   Number of histogram bins
757       --auto        Automatically computes limits, ignoring --lo and --hi.
758                     Holds all values in memory before producing any output.
759       -o {prefix}   Prefix for output field name. Default: no prefix.
760       Just a histogram. Input values < lo or > hi are not counted.
761
762   join
763       Usage: mlr join [options]
764       Joins records from specified left file name with records from all file names
765       at the end of the Miller argument list.
766       Functionality is essentially the same as the system "join" command, but for
767       record streams.
768       Options:
769         -f {left file name}
770         -j {a,b,c}   Comma-separated join-field names for output
771         -l {a,b,c}   Comma-separated join-field names for left input file;
772                      defaults to -j values if omitted.
773         -r {a,b,c}   Comma-separated join-field names for right input file(s);
774                      defaults to -j values if omitted.
775         --lp {text}  Additional prefix for non-join output field names from
776                      the left file
777         --rp {text}  Additional prefix for non-join output field names from
778                      the right file(s)
779         --np         Do not emit paired records
780         --ul         Emit unpaired records from the left file
781         --ur         Emit unpaired records from the right file(s)
782         -s|--sorted-input  Require sorted input: records must be sorted
783                      lexically by their join-field names, else not all records will
784                      be paired. The only likely use case for this is with a left
785                      file which is too big to fit into system memory otherwise.
786         -u           Enable unsorted input. (This is the default even without -u.)
787                      In this case, the entire left file will be loaded into memory.
788         --prepipe {command} As in main input options; see mlr --help for details.
789                      If you wish to use a prepipe command for the main input as well
790                      as here, it must be specified there as well as here.
791       File-format options default to those for the right file names on the Miller
792       argument list, but may be overridden for the left file as follows. Please see
793       the main "mlr --help" for more information on syntax for these arguments.
794         -i {one of csv,dkvp,nidx,pprint,xtab}
795         --irs {record-separator character}
796         --ifs {field-separator character}
797         --ips {pair-separator character}
798         --repifs
799         --repips
800       Please use "mlr --usage-separator-options" for information on specifying separators.
801       Please see http://johnkerl.org/miller/doc/reference-verbs.html#join for more information
802       including examples.
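
       The unsorted (default) mode can be pictured as a hash join in which
       the whole left file is bucketed in memory by join-field values. The
       Python sketch below is illustrative only. It assumes the same
       join-field names on both sides (as with -j alone), that a
       right-hand value wins on a non-join name collision, and that unpaired
       left records are emitted last. The function and parameter names are
       hypothetical.

```python
from collections import defaultdict

def join(left, right, on, paired=True, left_unpaired=False, right_unpaired=False):
    """Hash join: bucket the left records, then stream the right records."""
    buckets = defaultdict(list)        # join-field values -> indexes into left
    for i, rec in enumerate(left):
        if all(f in rec for f in on):
            buckets[tuple(rec[f] for f in on)].append(i)

    matched = set()                    # indexes of left records that found a partner
    out = []
    for rec in right:
        partners = []
        if all(f in rec for f in on):
            partners = buckets.get(tuple(rec[f] for f in on), [])
        if partners:
            matched.update(partners)
            if paired:                 # paired=False corresponds to --np
                out.extend({**left[i], **rec} for i in partners)
        elif right_unpaired:           # corresponds to --ur
            out.append(dict(rec))
    if left_unpaired:                  # corresponds to --ul
        out.extend(dict(r) for i, r in enumerate(left) if i not in matched)
    return out

left = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
right = [{"id": "1", "amt": "10"}, {"id": "3", "amt": "30"}]
pairs = join(left, right, on=["id"])
```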
803
804   label
805       Usage: mlr label {new1,new2,new3,...}
806       Given n comma-separated names, renames the first n fields of each record to
807       have the respective name. (Fields past the nth are left with their original
808       names.) Particularly useful with --inidx or --implicit-csv-header, to give
809       useful names to otherwise integer-indexed fields.
810       Examples:
811         "echo 'a b c d' | mlr --inidx --odkvp cat"       gives "1=a,2=b,3=c,4=d"
812         "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
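
       The positional renaming can be sketched in a few lines of Python (the
       function name is hypothetical). What happens when a new name collides
       with a later existing field is not covered here.

```python
def label(rec, new_names):
    """Rename the first len(new_names) fields; leave the rest unchanged."""
    out = {}
    for i, (name, value) in enumerate(rec.items()):
        out[new_names[i] if i < len(new_names) else name] = value
    return out

nidx = {"1": "a", "2": "b", "3": "c", "4": "d"}   # 'a b c d' read with --inidx
labeled = label(nidx, ["s", "t"])
```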
813
814   least-frequent
815       Usage: mlr least-frequent [options]
816       Shows the least frequently occurring distinct values for specified field names.
817       The first entry is the statistical anti-mode; the remaining are runners-up.
818       Options:
819       -f {one or more comma-separated field names}. Required flag.
820       -n {count}. Optional flag defaulting to 10.
821       -b          Suppress counts; show only field values.
822       -o {name}   Field name for output count. Default "count".
823       See also "mlr most-frequent".
824
825   merge-fields
826       Usage: mlr merge-fields [options]
827       Computes univariate statistics for each input record, accumulated across
828       specified fields.
829       Options:
830       -a {sum,count,...}  Names of accumulators. One or more of:
831         count     Count instances of fields
832         mode      Find most-frequently-occurring values for fields; first-found wins tie
833         antimode  Find least-frequently-occurring values for fields; first-found wins tie
834         sum       Compute sums of specified fields
835         mean      Compute averages (sample means) of specified fields
836         stddev    Compute sample standard deviation of specified fields
837         var       Compute sample variance of specified fields
838         meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
839         skewness  Compute sample skewness of specified fields
840         kurtosis  Compute sample kurtosis of specified fields
841         min       Compute minimum values of specified fields
842         max       Compute maximum values of specified fields
843       -f {a,b,c}  Value-field names on which to compute statistics. Requires -o.
844       -r {a,b,c}  Regular expressions for value-field names on which to compute
845                   statistics. Requires -o.
846       -c {a,b,c}  Substrings for collapse mode. All fields which have the same names
847                   after removing substrings will be accumulated together. Please see
848                   examples below.
849       -i          Use interpolated percentiles, like R's type=7; default like type=1.
850                   Not meaningful for string-valued fields.
851       -o {name}   Output field basename for -f/-r.
852       -k          Keep the input fields which contributed to the output statistics;
853                   the default is to omit them.
854       -F          Computes integerable things (e.g. count) in floating point.
855
856       String-valued data make sense unless arithmetic on them is required,
857       e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
858       numbers are less than strings.
859
860       Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
861       Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
862         produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
863         summed over.
864       Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
865         produces "bar_sum=15,bar_count=4" since all four fields are summed over.
866       Example: mlr merge-fields -a sum,count -c in_,out_
867         produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
868         since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
869         "b_y", and "b_out_x" collapses to "b_x".
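
       Collapse mode (-c) is the least obvious of the three. A Python sketch
       of the idea, restricted to the sum and count accumulators, follows.
       The function name is hypothetical, and using the first listed
       substring found in a field name is an assumption.

```python
def number(text):
    """Parse an int if possible, else a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def merge_fields_collapse(rec, substrings, keep=False):
    """Accumulate sum/count over fields whose names agree once a substring is removed."""
    sums, counts, out = {}, {}, {}
    for name, value in rec.items():
        hit = next((s for s in substrings if s in name), None)
        if hit is None:
            out[name] = value                  # field takes no part in any accumulation
            continue
        short = name.replace(hit, "", 1)       # e.g. a_in_x -> a_x
        sums[short] = sums.get(short, 0) + number(value)
        counts[short] = counts.get(short, 0) + 1
        if keep:                               # like -k: retain the inputs
            out[name] = value
    for short in sums:
        out[short + "_sum"] = sums[short]
        out[short + "_count"] = counts[short]
    return out

rec = {"a_in_x": "1", "a_out_x": "2", "b_in_y": "4", "b_out_x": "8"}
collapsed = merge_fields_collapse(rec, ["in_", "out_"])
```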
870
871   most-frequent
872       Usage: mlr most-frequent [options]
873       Shows the most frequently occurring distinct values for specified field names.
874       The first entry is the statistical mode; the remaining are runners-up.
875       Options:
876       -f {one or more comma-separated field names}. Required flag.
877       -n {count}. Optional flag defaulting to 10.
878       -b          Suppress counts; show only field values.
879       -o {name}   Field name for output count. Default "count".
880       See also "mlr least-frequent".
881
882   nest
883       Usage: mlr nest [options]
884       Explodes specified field values into separate fields/records, or reverses this.
885       Options:
886         --explode,--implode   One is required.
887         --values,--pairs      One is required.
888         --across-records,--across-fields One is required.
889         -f {field name}       Required.
890         --nested-fs {string}  Defaults to ";". Field separator for nested values.
891         --nested-ps {string}  Defaults to ":". Pair separator for nested key-value pairs.
892         --evar {string}       Shorthand for --explode --values --across-records --nested-fs {string}
893         --ivar {string}       Shorthand for --implode --values --across-records --nested-fs {string}
894       Please use "mlr --usage-separator-options" for information on specifying separators.
895
896       Examples:
897
898         mlr nest --explode --values --across-records -f x
899         with input record "x=a;b;c,y=d" produces output records
900           "x=a,y=d"
901           "x=b,y=d"
902           "x=c,y=d"
903         Use --implode to do the reverse.
904
905         mlr nest --explode --values --across-fields -f x
906         with input record "x=a;b;c,y=d" produces output record
907           "x_1=a,x_2=b,x_3=c,y=d"
908         Use --implode to do the reverse.
909
910         mlr nest --explode --pairs --across-records -f x
911         with input record "x=a:1;b:2;c:3,y=d" produces output records
912           "a=1,y=d"
913           "b=2,y=d"
914           "c=3,y=d"
915
916         mlr nest --explode --pairs --across-fields -f x
917         with input record "x=a:1;b:2;c:3,y=d" produces output record
918           "a=1,b=2,c=3,y=d"
919
920       Notes:
921       * With --pairs, --implode doesn't make sense since the original field name has
922         been lost.
923       * The combination "--implode --values --across-records" is non-streaming:
924         no output records are produced until all input records have been read. In
925         particular, this means it won't work in tail -f contexts. But all other flag
926         combinations result in streaming (tail -f friendly) data processing.
927       * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
928         e.g. by default the former is semicolon and the latter is comma.
929       See also mlr reshape.
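
       The explode variants can be summarized with a Python sketch that
       reproduces the examples above. The function names are hypothetical and
       this is not Miller's implementation. A pair lacking the pair separator
       is kept under the original field name, which is an assumption.

```python
def explode_values_across_records(rec, field, fs=";"):
    """One output record per nested value; other fields are repeated."""
    if field not in rec:
        return [dict(rec)]
    return [{k: (piece if k == field else v) for k, v in rec.items()}
            for piece in rec[field].split(fs)]

def explode_values_across_fields(rec, field, fs=";"):
    """One output record, with the nested values spread over field_1, field_2, ..."""
    out = {}
    for k, v in rec.items():
        if k == field:
            for i, piece in enumerate(v.split(fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[k] = v
    return out

def explode_pairs_across_records(rec, field, fs=";", ps=":"):
    """One output record per nested key:value pair, keyed by the pair's own name."""
    results = []
    for pair in rec[field].split(fs):
        name, sep, value = pair.partition(ps)
        new = {}
        for k, v in rec.items():
            if k != field:
                new[k] = v
            elif sep:
                new[name] = value
            else:
                new[field] = pair
        results.append(new)
    return results

by_record = explode_values_across_records({"x": "a;b;c", "y": "d"}, "x")
by_field = explode_values_across_fields({"x": "a;b;c", "y": "d"}, "x")
by_pair = explode_pairs_across_records({"x": "a:1;b:2;c:3", "y": "d"}, "x")
```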
930
931   nothing
932       Usage: mlr nothing
933       Drops all input records. Useful for testing, or after tee/print/etc. have
934       produced other output.
935
936   put
937       Usage: mlr put [options] {expression}
938       Adds/updates specified field(s). Expressions are semicolon-separated and must
939       either be assignments, or evaluate to boolean.  Booleans with following
940       statements in curly braces control whether those statements are executed;
941       booleans without following curly braces do nothing except side effects (e.g.
942       regex-captures into \1, \2, etc.).
943
944       Conversion options:
945       -S: Keeps field values as strings with no type inference to int or float.
946       -F: Keeps field values as strings or floats with no inference to int.
947       All field values are type-inferred to int/float/string unless this behavior is
948       suppressed with -S or -F.
949
950       Output/formatting options:
951       --oflatsep {string}: Separator to use when flattening multi-level @-variables
952           to output records for emit. Default ":".
953       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
954       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
955       Any of the output-format command-line flags (see mlr -h). Example: using
956         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
957       the input is CSV, the output is pretty-print tabular, but the tee-file output
958       is written in JSON format.
959       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
960           record.
961
962       Expression-specification options:
963       -f {filename}: the DSL expression is taken from the specified file rather
964           than from the command line. Outer single quotes wrapping the expression
965           should not be placed in the file. If -f is specified more than once,
966           all input files specified using -f are concatenated to produce the expression.
967           (For example, you can define functions in one file and call them from another.)
968       -e {expression}: You can use this after -f to add an expression. Example use
969           case: define functions/subroutines in a file you specify with -f, then call
970           them with an expression you specify with -e.
971       (If you mix -e and -f then the expressions are evaluated in the order encountered.
972       Since the expression pieces are simply concatenated, please be sure to use intervening
973       semicolons to separate expressions.)
974
975       -s name=value: Predefines out-of-stream variable @name to have value "value".
976           Thus mlr put -s foo=97 '$column += @foo' is like
977           mlr put 'begin {@foo = 97} $column += @foo'.
978           The value part is subject to type-inferencing as specified by -S/-F.
979           May be specified more than once, e.g. -s name1=value1 -s name2=value2.
980           Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
981
982       Tracing options:
983       -v: Prints the expression's AST (abstract syntax tree), which gives
984           full transparency on the precedence and associativity rules of
985           Miller's grammar, to stdout.
986       -a: Prints a low-level stack-allocation trace to stdout.
987       -t: Prints a low-level parser trace to stderr.
988       -T: Prints every statement to stderr as it is executed.
989
990       Other options:
991       -q: Does not include the modified record in the output stream. Useful for when
992           all desired output is in begin and/or end blocks.
993
994       Please use a dollar sign for field names and double-quotes for string
995       literals. If field names have special characters such as "." then you might
996       use braces, e.g. '${field.name}'. Miller built-in variables are
997       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
998       variables. The environment-variable name may be an expression, e.g. a field
999       value.
1000
1001       Use # to comment to end of line.
1002
1003       Examples:
1004         mlr put '$y = log10($x); $z = sqrt($y)'
1005         mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
1006         mlr put '$x>0.0;  $y=log10($x); $z=sqrt($y)'   # does all three statements
1007         mlr put '$a =~ "([a-z]+)_([0-9]+)";  $b = "left_\1"; $c = "right_\2"'
1008         mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
1009         mlr put '$filename = FILENAME'
1010         mlr put '$colored_shape = $color . "_" . $shape'
1011         mlr put '$y = cos($theta); $z = atan2($y, $x)'
1012         mlr put '$name = sub($name, "http.*com"i, "")'
1013         mlr put -q '@sum += $x; end {emit @sum}'
1014         mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
1015         mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
1016         mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
1017         mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
1018         mlr put '
1019           $x = 1;
1020          #$y = 2;
1021           $z = 3
1022         '
1023
1024       Please see also 'mlr -k' for examples using redirected output.
1025
1026       Please see http://johnkerl.org/miller/doc/reference.html for more information
1027       including function list. Or "mlr -f".
1028       Please see in particular:
1029         http://www.johnkerl.org/miller/doc/reference-verbs.html#put
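
       For readers new to out-of-stream variables, the example
       mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}' behaves roughly
       like the Python sketch below: a map accumulates across records, and
       the end block turns it into one record per key. The function name is
       hypothetical, and skipping records that lack either field is an
       assumption about absent-value handling.

```python
def sum_by(records, key_field, value_field):
    """Accumulate per-key sums over the stream, then emit one record per key."""
    sums = {}                                  # plays the role of @sum
    for rec in records:                        # main block, run once per record
        if key_field in rec and value_field in rec:
            key = rec[key_field]
            sums[key] = sums.get(key, 0) + rec[value_field]
    # end block: emit @sum, "a" splits the map's first level out as field "a"
    return [{key_field: key, "sum": total} for key, total in sums.items()]

rows = [{"a": "pan", "x": 1}, {"a": "eks", "x": 2}, {"a": "pan", "x": 3}]
emitted = sum_by(rows, "a", "x")
```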
1030
1031   regularize
1032       Usage: mlr regularize
1033       For records seen earlier in the data stream with same field names in
1034       a different order, outputs them with field names in the previously
1035       encountered order.
1036       Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
1037       output as              a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
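
       A small Python sketch of this behavior (the function name is
       hypothetical): the first ordering seen for each set of field names is
       remembered and reused.

```python
def regularize(records):
    """Reorder each record's fields to the first-seen order for that set of names."""
    first_order = {}                           # sorted names -> order first encountered
    out = []
    for rec in records:
        order = first_order.setdefault(tuple(sorted(rec)), list(rec))
        out.append({k: rec[k] for k in order})
    return out

stream = [{"a": 1, "c": 2, "b": 3}, {"e": 4, "d": 5}, {"c": 7, "a": 6, "b": 8}]
regular = regularize(stream)
```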
1038
1039   remove-empty-columns
1040       Usage: mlr remove-empty-columns
1041       Omits fields which are empty on every input row. Non-streaming.
1042
1043   rename
1044       Usage: mlr rename [options] {old1,new1,old2,new2,...}
1045       Renames specified fields.
1046       Options:
1047       -r         Treat old field names as regular expressions. "ab", "a.*b"
1048                  will match any field name containing the substring "ab" or
1049                  matching "a.*b", respectively; anchors of the form "^ab$",
1050                  "^a.*b$" may be used. New field names may be plain strings,
1051                  or may contain capture groups of the form "\1" through
1052                  "\9". Wrapping the regex in double quotes is optional, but
1053                  is required if you wish to follow it with 'i' to indicate
1054                  case-insensitivity.
1055       -g         Do global replacement within each field name rather than
1056                  first-match replacement.
1057       Examples:
1058       mlr rename old_name,new_name
1059       mlr rename old_name_1,new_name_1,old_name_2,new_name_2
1060       mlr rename -r 'Date_[0-9]+,Date'   Rename all such fields to be "Date"
1061       mlr rename -r '"Date_[0-9]+",Date' Same
1062       mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
1063       mlr rename -r '"name"i,Name'       Rename "name", "Name", "NAME", etc. to "Name"
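
       The -r and -g behavior can be approximated in Python as below. This is
       a sketch under assumptions: Python's re syntax stands in for Miller's
       regex library, the first matching old/new pair wins, and the helper
       names are hypothetical.

```python
import re

def parse_regex(spec):
    """Accept  regex,  "regex"  or  "regex"i  as in the examples above."""
    flags = 0
    if len(spec) >= 3 and spec.startswith('"') and spec.endswith('"i'):
        spec, flags = spec[1:-2], re.IGNORECASE
    elif len(spec) >= 2 and spec.startswith('"') and spec.endswith('"'):
        spec = spec[1:-1]
    return re.compile(spec, flags)

def rename_regex(rec, pairs, global_replace=False):
    """Rename fields by regex substitution on each field name."""
    compiled = [(parse_regex(old), new) for old, new in pairs]
    out = {}
    for name, value in rec.items():
        for rx, new in compiled:
            if rx.search(name):
                # count=1 is first-match replacement; count=0 replaces all (like -g)
                name = rx.sub(new, name, count=0 if global_replace else 1)
                break                          # assumption: first matching pair wins
        out[name] = value
    return out

dated = rename_regex({"Date_20151015_x": "7"}, [(r"Date_([0-9]+).*", r"\1")])
cased = rename_regex({"NAME": "v"}, [('"name"i', "Name")])
```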
1064
1065   reorder
1066       Usage: mlr reorder [options]
1067       -f {a,b,c}   Field names to reorder.
1068       -e           Put specified field names at record end: default is to put
1069                    them at record start.
1070       Examples:
1071       mlr reorder    -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
1072       mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
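
       A Python sketch that reproduces the two examples above (the function
       name is hypothetical; named fields that are absent from the record
       are simply ignored here):

```python
def reorder(rec, fields, at_end=False):
    """Move the named fields to the start (default) or end of the record."""
    picked = [f for f in fields if f in rec]
    rest = [k for k in rec if k not in picked]
    order = rest + picked if at_end else picked + rest
    return {k: rec[k] for k in order}

rec = {"d": 4, "b": 2, "a": 1, "c": 3}
to_front = reorder(rec, ["a", "b"])
to_back = reorder(rec, ["a", "b"], at_end=True)
```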
1073
1074   repeat
1075       Usage: mlr repeat [options]
1076       Copies input records to output records multiple times.
1077       Options must be exactly one of the following:
1078         -n {repeat count}  Repeat each input record this many times.
1079         -f {field name}    Same, but take the repeat count from the specified
1080                            field name of each input record.
1081       Example:
1082         echo x=0 | mlr repeat -n 4 then put '$x=urand()'
1083       produces:
1084         x=0.488189
1085         x=0.484973
1086         x=0.704983
1087         x=0.147311
1088       Example:
1089         echo a=1,b=2,c=3 | mlr repeat -f b
1090       produces:
1091         a=1,b=2,c=3
1092         a=1,b=2,c=3
1093       Example:
1094         echo a=1,b=2,c=3 | mlr repeat -f c
1095       produces:
1096         a=1,b=2,c=3
1097         a=1,b=2,c=3
1098         a=1,b=2,c=3
1099
1100   reshape
1101       Usage: mlr reshape [options]
1102       Wide-to-long options:
1103         -i {input field names}   -o {key-field name,value-field name}
1104         -r {input field regexes} -o {key-field name,value-field name}
1105         These pivot/reshape the input data such that the input fields are removed
1106         and separate records are emitted for each key/value pair.
1107         Note: this works with tail -f and produces output records for each input
1108         record seen.
1109       Long-to-wide options:
1110         -s {key-field name,value-field name}
1111         These pivot/reshape the input data to undo the wide-to-long operation.
1112         Note: this does not work with tail -f; it produces output records only after
1113         all input records have been read.
1114
1115       Examples:
1116
1117         Input file "wide.txt":
1118           time       X           Y
1119           2009-01-01 0.65473572  2.4520609
1120           2009-01-02 -0.89248112 0.2154713
1121           2009-01-03 0.98012375  1.3179287
1122
1123         mlr --pprint reshape -i X,Y -o item,value wide.txt
1124           time       item value
1125           2009-01-01 X    0.65473572
1126           2009-01-01 Y    2.4520609
1127           2009-01-02 X    -0.89248112
1128           2009-01-02 Y    0.2154713
1129           2009-01-03 X    0.98012375
1130           2009-01-03 Y    1.3179287
1131
1132         mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1133           time       item value
1134           2009-01-01 X    0.65473572
1135           2009-01-01 Y    2.4520609
1136           2009-01-02 X    -0.89248112
1137           2009-01-02 Y    0.2154713
1138           2009-01-03 X    0.98012375
1139           2009-01-03 Y    1.3179287
1140
1141         Input file "long.txt":
1142           time       item value
1143           2009-01-01 X    0.65473572
1144           2009-01-01 Y    2.4520609
1145           2009-01-02 X    -0.89248112
1146           2009-01-02 Y    0.2154713
1147           2009-01-03 X    0.98012375
1148           2009-01-03 Y    1.3179287
1149
1150         mlr --pprint reshape -s item,value long.txt
1151           time       X           Y
1152           2009-01-01 0.65473572  2.4520609
1153           2009-01-02 -0.89248112 0.2154713
1154           2009-01-03 0.98012375  1.3179287
1155       See also mlr nest.
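
       The two reshape directions can be sketched in Python as below. This is
       an illustration of the pivot described above, not Miller's code: the
       function names wide_to_long and long_to_wide are hypothetical, and
       records are modeled as insertion-ordered dicts.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Emit one record per (input field, value) pair, dropping the input fields."""
    out = []
    for rec in records:
        # Fields that are not being reshaped are carried along unchanged.
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for field in input_fields:
            if field in rec:
                out.append({**others, key_name: field, value_name: rec[field]})
    return out


def long_to_wide(records, key_name, value_name):
    """Regroup long records by their other fields; all input is read first."""
    grouped = {}  # insertion-ordered, keyed by the values of the other fields
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in (key_name, value_name)}
        group_key = tuple(others.items())
        grouped.setdefault(group_key, dict(others))[rec[key_name]] = rec[value_name]
    return list(grouped.values())


wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
]
long_records = wide_to_long(wide, ["X", "Y"], "item", "value")
round_trip = long_to_wide(long_records, "item", "value")
```

       The wide-to-long direction can emit as it goes, while the long-to-wide
       direction has to hold every group until the input ends, which matches
       the tail -f notes above.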
1156
1157   sample
1158       Usage: mlr sample [options]
1159       Reservoir sampling (subsampling without replacement), optionally by category.
1160       -k {count}    Required: number of records to output, total, or by group if using -g.
1161       -g {a,b,c}    Optional: group-by-field names for samples.
1162       See also mlr bootstrap and mlr shuffle.
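
       For readers unfamiliar with reservoir sampling, here is a minimal
       Python sketch of the ungrouped case. It shows the general technique
       only; the name reservoir_sample and the optional rng parameter are
       assumptions for illustration, and Miller's implementation may differ
       in detail.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Keep k records chosen uniformly, without replacement, in one pass."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, rec in enumerate(records):
        if n < k:
            # The first k records fill the reservoir directly.
            reservoir.append(rec)
        else:
            # Record n replaces a random slot with probability k/(n+1).
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


picked = reservoir_sample(range(1000), 5, random.Random(0))
```

       Only k records are held in memory at any time, regardless of how long
       the input stream is.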
1163
1164   sec2gmt
1165       Usage: mlr sec2gmt [options] {comma-separated list of field names}
1166       Replaces a numeric field representing seconds since the epoch with the
1167       corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1168       more than a keystroke-saver for the sec2gmt function:
1169         mlr sec2gmt time1,time2
1170       is the same as
1171         mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1172       Options:
1173       -1 through -9: format the seconds using 1..9 decimal places, respectively.
1174
1175   sec2gmtdate
1176       Usage: mlr sec2gmtdate {comma-separated list of field names}
1177       Replaces a numeric field representing seconds since the epoch with the
1178       corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1179       This is nothing more than a keystroke-saver for the sec2gmtdate function:
1180         mlr sec2gmtdate time1,time2
1181       is the same as
1182         mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1183
1184   seqgen
1185       Usage: mlr seqgen [options]
1186       Produces a sequence of counters.  Discards the input record stream. Produces
1187       output as specified by the following options:
1188       -f {name} Field name for counters; default "i".
1189       --start {number} Inclusive start value; default "1".
1190       --stop  {number} Inclusive stop value; default "100".
1191       --step  {number} Step value; default "1".
1192       Start, stop, and/or step may be floating-point. Output is integer if start,
1193       stop, and step are all integers. Step may be negative. It may not be zero
1194       unless start == stop.
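
       The counting rules above can be sketched as a Python generator. The
       function name seqgen and its keyword defaults mirror the options
       listed here; emitting a single value when step is zero and start ==
       stop is an assumption, since the text only says that case is allowed.

```python
def seqgen(start=1, stop=100, step=1):
    """Yield counters from start to stop inclusive, moving by step."""
    if step == 0:
        if start == stop:
            yield start  # assumed: one value when start == stop
            return
        raise ValueError("step must not be zero unless start == stop")
    value = start
    # Ascend for a positive step, descend for a negative one.
    while (step > 0 and value <= stop) or (step < 0 and value >= stop):
        yield value
        value += step


ascending = list(seqgen(1, 5))
descending = list(seqgen(10, 1, -3))
halves = list(seqgen(1, 2, 0.5))
```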
1195
1196   shuffle
1197       Usage: mlr shuffle {no options}
1198       Outputs records randomly permuted. No output records are produced until
1199       all input records are read.
1200       See also mlr bootstrap and mlr sample.
1201
1202   skip-trivial-records
1203       Usage: mlr skip-trivial-records [options]
1204       Passes through all records except:
1205       * those with zero fields;
1206       * those for which all fields have empty value.
1207
1208   sort
1209       Usage: mlr sort {flags}
1210       Flags:
1211         -f  {comma-separated field names}  Lexical ascending
1212         -n  {comma-separated field names}  Numerical ascending; nulls sort last
1213         -nf {comma-separated field names}  Same as -n
1214         -r  {comma-separated field names}  Lexical descending
1215         -nr {comma-separated field names}  Numerical descending; nulls sort first
1216       Sorts records primarily by the first specified field, secondarily by the second
1217       field, and so on.  (Any records not having all specified sort keys will appear
1218       at the end of the output, in the order they were encountered, regardless of the
1219       specified sort order.) The sort is stable: records that compare equal will sort
1220       in the order they were encountered in the input record stream.
1221
1222       Example:
1223         mlr sort -f a,b -nr x,y,z
1224       which is the same as:
1225         mlr sort -f a -f b -nr x -nr y -nr z
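
       The multi-key ordering can be sketched in Python as below. The helper
       sort_records and its (field, flag) pairs are hypothetical names, and
       the sketch assumes every numeric sort key parses as a number; it does
       not model how nulls are placed.

```python
def sort_records(records, keys):
    """keys is a list of (field, flag) pairs; flag is "f", "r", "nf" or "nr"."""
    fields = [field for field, _ in keys]
    complete = [r for r in records if all(f in r for f in fields)]
    # Records missing any sort key go last, in encounter order.
    incomplete = [r for r in records if not all(f in r for f in fields)]
    # Sort by the least significant key first; because Python's sort is
    # stable, earlier passes survive as tie-breakers for later ones.
    for field, flag in reversed(keys):
        numeric = flag in ("nf", "nr")
        complete.sort(
            key=lambda r: float(r[field]) if numeric else str(r[field]),
            reverse=flag in ("r", "nr"),
        )
    return complete + incomplete


records = [
    {"a": "pan", "x": "3"},
    {"a": "eks", "x": "1"},
    {"a": "pan", "x": "10"},
    {"b": "no-a"},
    {"a": "eks", "x": "2"},
]
ordered = sort_records(records, [("a", "f"), ("x", "nr")])
```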
1226
1227   stats1
1228       Usage: mlr stats1 [options]
1229       Computes univariate statistics for one or more given fields, accumulated across
1230       the input record stream.
1231       Options:
1232       -a {sum,count,...}  Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1233                           one or more of:
1234          count     Count instances of fields
1235          mode      Find most-frequently-occurring values for fields; first-found wins tie
1236          antimode  Find least-frequently-occurring values for fields; first-found wins tie
1237          sum       Compute sums of specified fields
1238          mean      Compute averages (sample means) of specified fields
1239          stddev    Compute sample standard deviation of specified fields
1240          var       Compute sample variance of specified fields
1241          meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
1242          skewness  Compute sample skewness of specified fields
1243          kurtosis  Compute sample kurtosis of specified fields
1244          min       Compute minimum values of specified fields
1245          max       Compute maximum values of specified fields
1246       -f {a,b,c}   Value-field names on which to compute statistics
1247       --fr {regex} Regex for value-field names on which to compute statistics
1248                    (compute statistics on values in all field names matching regex)
1249       --fx {regex} Inverted regex for value-field names on which to compute statistics
1250                    (compute statistics on values in all field names not matching regex)
1251       -g {d,e,f}   Optional group-by-field names
1252       --gr {regex} Regex for optional group-by-field names
1253                    (group by values in field names matching regex)
1254       --gx {regex} Inverted regex for optional group-by-field names
1255                    (group by values in field names not matching regex)
1256       --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1257       -i           Use interpolated percentiles, like R's type=7; default like type=1.
1258                    Not meaningful for string-valued fields.
1259       -s           Print iterative stats. Useful in tail -f contexts (in which
1260                    case please avoid pprint-format output since end of input
1261                    stream will never be seen).
1262       -F           Computes integerable things (e.g. count) in floating point.
1263       Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1264       Example: mlr stats1 -a count,mode -f size
1265       Example: mlr stats1 -a count,mode -f size -g shape
1266       Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1267                This computes count and mode statistics on all field names beginning
1268                with a through h, grouped by all field names starting with k.
1269       Notes:
1270       * p50 and median are synonymous.
1271       * min and max output the same results as p0 and p100, respectively, but use
1272         less memory.
1273       * String-valued data make sense unless arithmetic on them is required,
1274         e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1275         numbers are less than strings.
1276       * count and mode allow text input; the rest require numeric input.
1277         In particular, 1 and 1.0 are distinct text for count and mode.
1278       * When there are mode ties, the first-encountered datum wins.
1279
1280   stats2
1281       Usage: mlr stats2 [options]
1282       Computes bivariate statistics for one or more given field-name pairs,
1283       accumulated across the input record stream.
1284       -a {linreg-ols,corr,...}  Names of accumulators: one or more of:
1285         linreg-pca   Linear regression using principal component analysis
1286         linreg-ols   Linear regression using ordinary least squares
1287         r2           Quality metric for linreg-ols (linreg-pca emits its own)
1288         logireg      Logistic regression
1289         corr         Sample correlation
1290         cov          Sample covariance
1291         covx         Sample-covariance matrix
1292       -f {a,b,c,d}   Value-field name-pairs on which to compute statistics.
1293                      There must be an even number of names.
1294       -g {e,f,g}     Optional group-by-field names.
1295       -v             Print additional output for linreg-pca.
1296       -s             Print iterative stats. Useful in tail -f contexts (in which
1297                      case please avoid pprint-format output since end of input
1298                      stream will never be seen).
1299       --fit          Rather than printing regression parameters, applies them to
1300                      the input data to compute new fit fields. All input records are
1301                      held in memory until end of input stream. Has effect only for
1302                      linreg-ols, linreg-pca, and logireg.
1303       Only one of -s or --fit may be used.
1304       Example: mlr stats2 -a linreg-pca -f x,y
1305       Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1306       Example: mlr stats2 -a corr -f x,y
1307
1308   step
1309       Usage: mlr step [options]
1310       Computes values dependent on the previous record, optionally grouped
1311       by category.
1312
1313       Options:
1314       -a {delta,rsum,...}   Names of steppers: comma-separated, one or more of:
1315         delta    Compute differences in field(s) between successive records
1316         shift    Include value(s) in field(s) from previous record, if any
1317         from-first Compute differences in field(s) from first record
1318         ratio    Compute ratios in field(s) between successive records
1319         rsum     Compute running sums of field(s) between successive records
1320         counter  Count instances of field(s) between successive records
1321         ewma     Exponentially weighted moving average over successive records
1322       -f {a,b,c} Value-field names on which to compute statistics
1323       -g {d,e,f} Optional group-by-field names
1324       -F         Computes integerable things (e.g. counter) in floating point.
1325       -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1326                  smoothing), just under 1 is light smoothing, just over 0 is
1327                  heavy smoothing. Multiple weights may be specified, e.g.
1328                  "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1329                  is "-d 0.5".
1330       -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1331                  the -d values. If supplied, the number of -o values must be the same
1332                  as the number of -d values.
1333
1334       Examples:
1335         mlr step -a rsum -f request_size
1336         mlr step -a delta -f request_size -g hostname
1337         mlr step -a ewma -d 0.1,0.9 -f x,y
1338         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1339         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1340
1341       Please see http://johnkerl.org/miller/doc/reference-verbs.html#filter or
1342       https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1343       for more information on EWMA.
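
       As a rough illustration of how the -d weight behaves, here is a
       Python sketch of an exponentially weighted moving average. The
       function name ewma is hypothetical, and seeding the average with the
       first sample is an assumption about the starting condition.

```python
def ewma(values, alpha):
    """Smooth values; alpha is the weight given to the current sample."""
    smoothed = []
    prev = None
    for x in values:
        # Assumed seeding: the first output equals the first sample.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


no_smoothing = ewma([1, 2, 3], 1)
half_weight = ewma([4, 8, 16], 0.5)
```

       With alpha equal to 1 the output tracks the input exactly; smaller
       values let earlier samples dominate for longer.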
1344
1345   tac
1346       Usage: mlr tac
1347       Prints records in reverse order from the order in which they were encountered.
1348
1349   tail
1350       Usage: mlr tail [options]
1351       -n {count}    Tail count to print; default 10
1352       -g {a,b,c}    Optional group-by-field names for tail counts
1353       Passes through the last n records, optionally by category.
1354
1355   tee
1356       Usage: mlr tee [options] {filename}
1357       Passes through input records (like mlr cat) but also writes to specified output
1358       file, using output-format flags from the command line (e.g. --ocsv). See also
1359       the "tee" keyword within mlr put, which allows data-dependent filenames.
1360       Options:
1361       -a:          append to existing file, if any, rather than overwriting.
1362       --no-fflush: don't call fflush() after every record.
1363       Any of the output-format command-line flags (see mlr -h). Example: using
1364         mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1365       the input is CSV, the output is pretty-print tabular, but the tee-file output
1366       is written in JSON format.
1367
1368   top
1369       Usage: mlr top [options]
1370       -f {a,b,c}    Value-field names for top counts.
1371       -g {d,e,f}    Optional group-by-field names for top counts.
1372       -n {count}    How many records to print per category; default 1.
1373       -a            Print all fields for top-value records; default is
1374                     to print only value and group-by fields. Requires a single
1375                     value-field name only.
1376       --min         Print top smallest values; default is top largest values.
1377       -F            Keep top values as floats even if they look like integers.
1378       -o {name}     Field name for output indices. Default "top_idx".
1379       Prints the n records with smallest/largest values at specified fields,
1380       optionally by category.
1381
1382   uniq
1383       Usage: mlr uniq [options]
1384       Prints distinct values for specified field names. With -c, same as
1385       count-distinct. For uniq, -f is a synonym for -g.
1386
1387       Options:
1388       -g {d,e,f}    Group-by-field names for uniq counts.
1389       -c            Show repeat counts in addition to unique values.
1390       -n            Show only the number of distinct values.
1391       -o {name}     Field name for output count. Default "count".
1392       -a            Output each unique record only once. Incompatible with -g.
1393                     With -c, produces unique records, with repeat counts for each.
1394                     With -n, produces only one record which is the unique-record count.
1395                     With neither -c nor -n, produces unique records.
1396
1397   unsparsify
1398       Usage: mlr unsparsify [options]
1399       Prints records with the union of field names over all input records.
1400       For field names absent in a given record but present in others, fills in
1401       a value. This verb retains all input before producing any output.
1402
1403       Options:
1404       --fill-with {filler string}  What to fill absent fields with. Defaults to
1405                                    the empty string.
1406
1407       Example: if the input is two records, one being 'a=1,b=2' and the other
1408       being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1409       'a=,b=3,c=4'.
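
       The example above can be reproduced with a short Python sketch. The
       function name unsparsify and the fill_with parameter simply mirror
       the verb and its option; records are modeled as dicts.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, in first-seen order."""
    all_keys = {}  # a dict used as an insertion-ordered set
    for rec in records:
        for k in rec:
            all_keys.setdefault(k, None)
    # Every record must be seen before the first one can be completed.
    return [{k: rec.get(k, fill_with) for k in all_keys} for rec in records]


out = unsparsify([{"a": "1", "b": "2"}, {"b": "3", "c": "4"}])
```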
1410

FUNCTIONS FOR FILTER/PUT

1412   +
1413       (class=arithmetic #args=2): Addition.
1414
1415       + (class=arithmetic #args=1): Unary plus.
1416
1417   -
1418       (class=arithmetic #args=2): Subtraction.
1419
1420       - (class=arithmetic #args=1): Unary minus.
1421
1422   *
1423       (class=arithmetic #args=2): Multiplication.
1424
1425   /
1426       (class=arithmetic #args=2): Division.
1427
1428   //
1429       (class=arithmetic #args=2): Integer division: rounds to negative (pythonic).
1430
1431   .+
1432       (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1433
1434       .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1435
1436   .-
1437       (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1438
1439       .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1440
1441   .*
1442       (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1443
1444   ./
1445       (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1446
1447   .//
1448       (class=arithmetic #args=2): Integer division: rounds to negative (pythonic), with integer-to-integer overflow.
1449
1450   %
1451       (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
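
       Since the // and % entries describe their behavior as pythonic,
       Python's own operators illustrate the convention directly. The
       variable names below are only for illustration, and the remainder
       claim is shown for positive moduli only.

```python
# Floor division rounds toward negative infinity, not toward zero.
quotients = [7 // 2, -7 // 2, 7 // -2]

# With a positive modulus the remainder is never negative.
remainders = [7 % 5, -7 % 5]

# The two operators are tied together by a == (a // b) * b + (a % b).
identity_holds = all(
    a == (a // b) * b + (a % b)
    for a in range(-10, 11)
    for b in (1, 2, 3, 5)
)
```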
1452
1453   **
1454       (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1455       operator.
1456
1457   |
1458       (class=arithmetic #args=2): Bitwise OR.
1459
1460   ^
1461       (class=arithmetic #args=2): Bitwise XOR.
1462
1463   &
1464       (class=arithmetic #args=2): Bitwise AND.
1465
1466   ~
1467       (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1468       regex-match operator: try '$y = ~$x'.
1469
1470   <<
1471       (class=arithmetic #args=2): Bitwise left-shift.
1472
1473   >>
1474       (class=arithmetic #args=2): Bitwise right-shift.
1475
1476   bitcount
1477       (class=arithmetic #args=1): Count of 1-bits
1478
1479   ==
1480       (class=boolean #args=2): String/numeric equality. Mixing number and string
1481       results in string compare.
1482
1483   !=
1484       (class=boolean #args=2): String/numeric inequality. Mixing number and string
1485       results in string compare.
1486
1487   =~
1488       (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1489       side), e.g. '$name =~ "^a.*b$"'.
1490
1491   !=~
1492       (class=boolean #args=2): String (left-hand side) does not match regex
1493       (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1494
1495   >
1496       (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1497       results in string compare.
1498
1499   >=
1500       (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1501       and string results in string compare.
1502
1503   <
1504       (class=boolean #args=2): String/numeric less-than. Mixing number and string
1505       results in string compare.
1506
1507   <=
1508       (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1509       and string results in string compare.
1510
1511   &&
1512       (class=boolean #args=2): Logical AND.
1513
1514   ||
1515       (class=boolean #args=2): Logical OR.
1516
1517   ^^
1518       (class=boolean #args=2): Logical XOR.
1519
1520   !
1521       (class=boolean #args=1): Logical negation.
1522
1523   ? :
1524       (class=boolean #args=3): Ternary operator.
1525
1526   .
1527       (class=string #args=2): String concatenation.
1528
1529   gsub
1530       (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1531       (replace all).
1532
1533   regextract
1534       (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1536
1537   regextract_or_else
1538       (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1540
1541   strlen
1542       (class=string #args=1): String length.
1543
1544   sub
1545       (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1546       (replace once).
1547
1548   ssub
1549       (class=string #args=3): Like sub but does no regexing. No characters are special.
1550
1551   substr
1552       (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1553       inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
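
       The indexing convention for substr can be sketched in Python as
       below. Returning an empty string for out-of-range indices is an
       assumption made to keep the sketch total; the text above does not
       say what happens in that case.

```python
def substr(s, m, n):
    """Substring of s from 0-up position m through n, both inclusive."""
    length = len(s)
    # Negative indices -len .. -1 alias to 0 .. len-1.
    if m < 0:
        m += length
    if n < 0:
        n += length
    if m < 0 or n >= length or m > n:
        return ""  # assumed behavior for out-of-range input
    return s[m:n + 1]  # Python slices exclude the end, hence n + 1


middle = substr("hello", 1, 3)
from_end = substr("hello", -4, -2)
```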
1554
1555   tolower
1556       (class=string #args=1): Convert string to lowercase.
1557
1558   toupper
1559       (class=string #args=1): Convert string to uppercase.
1560
1561   capitalize
1562       (class=string #args=1): Convert string's first character to uppercase.
1563
1564   lstrip
1565       (class=string #args=1): Strip leading whitespace from string.
1566
1567   rstrip
1568       (class=string #args=1): Strip trailing whitespace from string.
1569
1570   strip
1571       (class=string #args=1): Strip leading and trailing whitespace from string.
1572
1573   collapse_whitespace
1574       (class=string #args=1): Strip repeated whitespace from string.
1575
1576   clean_whitespace
1577       (class=string #args=1): Same as collapse_whitespace and strip.
1578
1579   system
1580       (class=string #args=1): Run command string, yielding its stdout minus final carriage return.
1581
1582   abs
1583       (class=math #args=1): Absolute value.
1584
1585   acos
1586       (class=math #args=1): Inverse trigonometric cosine.
1587
1588   acosh
1589       (class=math #args=1): Inverse hyperbolic cosine.
1590
1591   asin
1592       (class=math #args=1): Inverse trigonometric sine.
1593
1594   asinh
1595       (class=math #args=1): Inverse hyperbolic sine.
1596
1597   atan
1598       (class=math #args=1): One-argument arctangent.
1599
1600   atan2
1601       (class=math #args=2): Two-argument arctangent.
1602
1603   atanh
1604       (class=math #args=1): Inverse hyperbolic tangent.
1605
1606   cbrt
1607       (class=math #args=1): Cube root.
1608
1609   ceil
1610       (class=math #args=1): Ceiling: nearest integer at or above.
1611
1612   cos
1613       (class=math #args=1): Trigonometric cosine.
1614
1615   cosh
1616       (class=math #args=1): Hyperbolic cosine.
1617
1618   erf
1619       (class=math #args=1): Error function.
1620
1621   erfc
1622       (class=math #args=1): Complementary error function.
1623
1624   exp
1625       (class=math #args=1): Exponential function e**x.
1626
1627   expm1
1628       (class=math #args=1): e**x - 1.
1629
1630   floor
1631       (class=math #args=1): Floor: nearest integer at or below.
1632
1633   invqnorm
1634       (class=math #args=1): Inverse of normal cumulative distribution
1635       function. Note that invqnorm(urand()) is normally distributed.
1636
1637   log
1638       (class=math #args=1): Natural (base-e) logarithm.
1639
1640   log10
1641       (class=math #args=1): Base-10 logarithm.
1642
1643   log1p
1644       (class=math #args=1): log(1+x).
1645
1646   logifit
1647       (class=math #args=3): Given m and b from logistic regression, compute
1648       fit: $yhat=logifit($x,$m,$b).
1649
1650   madd
1651       (class=math #args=3): a + b mod m (integers)
1652
1653   max
1654       (class=math variadic): Max of n numbers; null loses
1655
1656   mexp
1657       (class=math #args=3): a ** b mod m (integers)
1658
1659   min
1660       (class=math variadic): Min of n numbers; null loses
1661
1662   mmul
1663       (class=math #args=3): a * b mod m (integers)
1664
1665   msub
1666       (class=math #args=3): a - b mod m (integers)
1667
1668   pow
1669       (class=math #args=2): Exponentiation; same as **.
1670
1671   qnorm
1672       (class=math #args=1): Normal cumulative distribution function.
1673
1674   round
1675       (class=math #args=1): Round to nearest integer.
1676
1677   roundm
1678       (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1679       the same as round($x/$m)*$m
1680
1681   sgn
1682       (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1683       negative input.
1684
1685   sin
1686       (class=math #args=1): Trigonometric sine.
1687
1688   sinh
1689       (class=math #args=1): Hyperbolic sine.
1690
1691   sqrt
1692       (class=math #args=1): Square root.
1693
1694   tan
1695       (class=math #args=1): Trigonometric tangent.
1696
1697   tanh
1698       (class=math #args=1): Hyperbolic tangent.
1699
1700   urand
1701       (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
1702       Int-valued example: '$n=floor(20+urand()*11)'.
1703
1704   urandrange
1705       (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
1706
1707   urand32
1708       (class=math #args=0): Integer uniformly distributed between 0 and
1709       2**32-1 inclusive.
1710
1711   urandint
1712       (class=math #args=2): Integer uniformly distributed between inclusive
1713       integer endpoints.
1714
1715   dhms2fsec
1716       (class=time #args=1): Recovers floating-point seconds as in
1717       dhms2fsec("5d18h53m20.250000s") = 500000.250000
1718
1719   dhms2sec
1720       (class=time #args=1): Recovers integer seconds as in
1721       dhms2sec("5d18h53m20s") = 500000
1722
1723   fsec2dhms
1724       (class=time #args=1): Formats floating-point seconds as in
1725       fsec2dhms(500000.25) = "5d18h53m20.250000s"
1726
1727   fsec2hms
1728       (class=time #args=1): Formats floating-point seconds as in
1729       fsec2hms(5000.25) = "01:23:20.250000"
1730
1731   gmt2sec
1732       (class=time #args=1): Parses GMT timestamp as integer seconds since
1733       the epoch.
1734
1735   localtime2sec
1736       (class=time #args=1): Parses local timestamp as integer seconds since
1737       the epoch. Consults $TZ environment variable.
1738
1739   hms2fsec
1740       (class=time #args=1): Recovers floating-point seconds as in
1741       hms2fsec("01:23:20.250000") = 5000.250000
1742
1743   hms2sec
1744       (class=time #args=1): Recovers integer seconds as in
1745       hms2sec("01:23:20") = 5000
1746
1747   sec2dhms
1748       (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1749       = "5d18h53m20s"
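
       The d/h/m/s notation used by sec2dhms and dhms2sec can be sketched
       in Python as below, for non-negative integer seconds only. Dropping
       leading zero-valued units (for example printing "59s" rather than
       "0d00h00m59s") is an assumption; only the documented example values
       are taken from the text.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def sec2dhms(secs):
    """Format non-negative integer seconds in d/h/m/s notation."""
    d, rem = divmod(secs, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    if d > 0:
        return f"{d}d{h:02d}h{m:02d}m{s:02d}s"
    if h > 0:
        return f"{h}h{m:02d}m{s:02d}s"  # assumed: leading zero units omitted
    if m > 0:
        return f"{m}m{s:02d}s"
    return f"{s}s"


def dhms2sec(text):
    """Recover integer seconds by summing each number times its unit."""
    pairs = re.findall(r"(\d+)([dhms])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)


formatted = sec2dhms(500000)
recovered = dhms2sec("5d18h53m20s")
```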
1750
1751   sec2gmt
1752       (class=time #args=1): Formats seconds since epoch (integer part)
1753       as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1754       Leaves non-numbers as-is.
1755
1756       sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1757       decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1758       Leaves non-numbers as-is.
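
       The two sec2gmt forms can be approximated in Python as below, for
       non-negative inputs. Truncating (rather than rounding) the
       fractional seconds is an assumption, and the function is a sketch
       that happens to share the Miller function's name.

```python
from datetime import datetime, timezone


def sec2gmt(value, ndecimals=0):
    """Format seconds since the epoch as a GMT timestamp ending in Z."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value  # non-numbers are left as-is
    whole = int(value)
    stamp = datetime.fromtimestamp(whole, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S"
    )
    if ndecimals > 0:
        # Assumed: fractional seconds are truncated to ndecimals digits.
        digits = int((value - whole) * 10 ** ndecimals)
        stamp += f".{digits:0{ndecimals}d}"
    return stamp + "Z"


whole_seconds = sec2gmt(1440768801.7)
one_decimal = sec2gmt(1440768801.7, 1)
```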
1759
1760   sec2gmtdate
1761       (class=time #args=1): Formats seconds since epoch (integer part)
1762       as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1763       Leaves non-numbers as-is.
1764
1765   sec2localtime
1766       (class=time #args=1): Formats seconds since epoch (integer part)
1767       as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1768       Consults $TZ environment variable. Leaves non-numbers as-is.
1769
1770       sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1771       decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1772       Consults $TZ environment variable. Leaves non-numbers as-is.
1773
1774   sec2localdate
1775       (class=time #args=1): Formats seconds since epoch (integer part)
1776       as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1777       Consults $TZ environment variable. Leaves non-numbers as-is.
1778
1779   sec2hms
1780       (class=time #args=1): Formats integer seconds as in
1781       sec2hms(5000) = "01:23:20"
1782
1783   strftime
1784       (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1785       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1786       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1787       Format strings are as in the C library (please see "man strftime" on your system),
1788       with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1789       with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1790       See also strftime_local.
1791
1792   strftime_local
1793       (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1794
1795   strptime
1796       (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1797       e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1798       and  strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1799       See also strptime_local.
1800
1801   strptime_local
1802       (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1803
1804   systime
1805       (class=time #args=0): Floating-point seconds since the epoch,
1806       e.g. 1440768801.748936.
1807
1808   is_absent
1809       (class=typing #args=1): False if field is present in input, true otherwise
1810
1811   is_bool
1812       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1813
1814   is_boolean
1815       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1816
1817   is_empty
1818       (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1819
1820   is_empty_map
1821       (class=typing #args=1): True if argument is a map which is empty.
1822
1823   is_float
1824       (class=typing #args=1): True if field is present with value inferred to be float
1825
1826   is_int
1827       (class=typing #args=1): True if field is present with value inferred to be int
1828
1829   is_map
1830       (class=typing #args=1): True if argument is a map.
1831
1832   is_nonempty_map
1833       (class=typing #args=1): True if argument is a map which is non-empty.
1834
1835   is_not_empty
1836       (class=typing #args=1): False if field is present in input with empty value, true otherwise
1837
1838   is_not_map
1839       (class=typing #args=1): True if argument is not a map.
1840
1841   is_not_null
1842       (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1843
1844   is_null
1845       (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1846
1847   is_numeric
1848       (class=typing #args=1): True if field is present with value inferred to be int or float
1849
1850   is_present
1851       (class=typing #args=1): True if field is present in input, false otherwise.
1852
1853   is_string
1854       (class=typing #args=1): True if field is present with string (including empty-string) value
1855
1856   asserting_absent
1857       (class=typing #args=1): Returns argument if it is absent in the input data, else
1858       throws an error.
1859
1860   asserting_bool
1861       (class=typing #args=1): Returns argument if it is present with boolean value, else
1862       throws an error.
1863
1864   asserting_boolean
1865       (class=typing #args=1): Returns argument if it is present with boolean value, else
1866       throws an error.
1867
1868   asserting_empty
1869       (class=typing #args=1): Returns argument if it is present in input with empty value,
1870       else throws an error.
1871
1872   asserting_empty_map
1873       (class=typing #args=1): Returns argument if it is a map with empty value, else
1874       throws an error.
1875
1876   asserting_float
1877       (class=typing #args=1): Returns argument if it is present with float value, else
1878       throws an error.
1879
1880   asserting_int
1881       (class=typing #args=1): Returns argument if it is present with int value, else
1882       throws an error.
1883
1884   asserting_map
1885       (class=typing #args=1): Returns argument if it is a map, else throws an error.
1886
1887   asserting_nonempty_map
1888       (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1889       an error.
1890
1891   asserting_not_empty
1892       (class=typing #args=1): Returns argument if it is present in input with non-empty
1893       value, else throws an error.
1894
1895   asserting_not_map
1896       (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1897
1898   asserting_not_null
1899       (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1900       else throws an error.
1901
1902   asserting_null
1903       (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1904       an error.
1905
1906   asserting_numeric
1907       (class=typing #args=1): Returns argument if it is present with int or float value,
1908       else throws an error.
1909
1910   asserting_present
1911       (class=typing #args=1): Returns argument if it is present in input, else throws
1912       an error.
1913
1914   asserting_string
1915       (class=typing #args=1): Returns argument if it is present with string (including
1916       empty-string) value, else throws an error.
1917
1918   boolean
1919       (class=conversion #args=1): Convert int/float/bool/string to boolean.
1920
1921   float
1922       (class=conversion #args=1): Convert int/float/bool/string to float.
1923
1924   fmtnum
1925       (class=conversion #args=2): Convert int/float/bool to string using
1926       printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1927       are all long long or double. If you use formats like %d or %f, behavior is undefined.
1928
1929   hexfmt
1930       (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1931
1932   int
1933       (class=conversion #args=1): Convert int/float/bool/string to int.
1934
1935   string
1936       (class=conversion #args=1): Convert int/float/bool/string to string.
1937
1938   typeof
1939       (class=conversion #args=1): Returns the type of the argument as a string (e.g.
1940       MT_STRING). For debugging.
1941
1942   depth
1943       (class=maps #args=1): Returns maximum depth of hashmap. Scalars have depth 0.
1944
1945   haskey
1946       (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1947       'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1948
1949   joink
1950       (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1951
1952   joinkv
1953       (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'.
1954
1955   joinv
1956       (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
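
       The three join functions above differ only in which part of each map
       entry they concatenate. The following Python sketch models that
       behaviour with ordinary insertion-ordered dicts; it is an illustration
       under that assumption, not Miller's implementation, and the sample
       record is hypothetical.

```python
def joink(m, sep):
    """Join the keys of a map into one string."""
    return sep.join(str(k) for k in m)

def joinv(m, sep):
    """Join the values of a map into one string."""
    return sep.join(str(v) for v in m.values())

def joinkv(m, pair_sep, field_sep):
    """Join key-value pairs: pair_sep inside a pair, field_sep between pairs."""
    return field_sep.join(f"{k}{pair_sep}{v}" for k, v in m.items())

# Hypothetical record standing in for $* or an out-of-stream map.
record = {"a": 1, "b": 2, "c": 3}
print(joink(record, ","))        # a,b,c
print(joinv(record, ","))        # 1,2,3
print(joinkv(record, "=", ","))  # a=1,b=2,c=3
```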
1957
1958   leafcount
1959       (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1960       same as length.
1961
1962   length
1963       (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
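
       The relationship between depth, leafcount, and length can be sketched
       in Python using nested dicts as a stand-in for hashmaps. This is a
       model only: the results for an empty map (depth 0 here) and for
       leafcount of a scalar (1 here) are assumptions of the sketch, since
       the text above does not specify them.

```python
def depth(x):
    """Maximum nesting depth; scalars have depth 0."""
    if not isinstance(x, dict) or not x:
        return 0  # empty map treated as depth 0: an assumption of this sketch
    return 1 + max(depth(v) for v in x.values())

def leafcount(x):
    """Total number of terminal (non-map) values."""
    if not isinstance(x, dict):
        return 1  # scalar counted as one leaf: an assumption of this sketch
    return sum(leafcount(v) for v in x.values())

def length(x):
    """Number of top-level entries; scalars have length 1."""
    return len(x) if isinstance(x, dict) else 1

# Hypothetical two-level map, shaped like @sums[$a][$b].
sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
print(depth(sums), leafcount(sums), length(sums))  # 2 3 2

# For a single-level map, leafcount and length agree.
flat = {"a": 1, "b": 2}
print(leafcount(flat) == length(flat))  # True
```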
1964
1965   mapdiff
1966       (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1967       With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1968
1969   mapexcept
1970       (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1971       E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1972
1973   mapselect
1974       (class=maps variadic): Returns a map with only keys from remaining arguments set.
1975       E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1976
1977   mapsum
1978       (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1979       key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
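
       The four variadic map functions above can be modelled with Python
       dicts. The sketch below reproduces the examples given in their
       descriptions; it is an approximation for illustration, and the key
       order of the results (taken from the first map) is an assumption.

```python
def mapdiff(*maps):
    """Copy of the first map minus every key found in the remaining maps."""
    if not maps:
        return {}
    out = dict(maps[0])
    for other in maps[1:]:
        for key in other:
            out.pop(key, None)
    return out

def mapsum(*maps):
    """Union of all maps; on key collisions the rightmost value wins."""
    out = {}
    for m in maps:
        out.update(m)
    return out

def mapexcept(m, *keys):
    """Copy of m with the listed keys removed (missing keys are ignored)."""
    return {k: v for k, v in m.items() if k not in keys}

def mapselect(m, *keys):
    """Copy of m keeping only the listed keys (missing keys are ignored)."""
    return {k: v for k, v in m.items() if k in keys}

print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {3: 4}
print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {1: 2, 5: 6}
print(mapsum({1: 2, 3: 4}, {1: 5}))            # {1: 5, 3: 4}
print(mapdiff({1: 2, 3: 4, 5: 6}, {3: 0}))     # {1: 2, 5: 6}
```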
1980
1981   splitkv
1982       (class=maps #args=3): Splits string by separators into map with type inference.
1983       E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1984
1985   splitkvx
1986       (class=maps #args=3): Splits string by separators into map without type inference (keys and
1987       values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
1988       '{"a" : "1", "b" : "2", "c" : "3"}'.
1989
1990   splitnv
1991       (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
1992       E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
1993
1994   splitnvx
1995       (class=maps #args=2): Splits string by separator into integer-indexed map without type
1996       inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
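
       The difference between the inferring and non-inferring split functions
       can be sketched in Python as follows. The infer() helper is a
       hypothetical, simplified stand-in for Miller's type inference (it
       tries int, then float, then falls back to string), and pairs that lack
       the pair separator are not modelled faithfully.

```python
def infer(text):
    """Simplified type inference: int, then float, else leave as string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

def splitkvx(s, pair_sep, field_sep):
    """Split 'k=v,k=v' text into a map; keys and values stay strings."""
    out = {}
    if s == "":
        return out
    for pair in s.split(field_sep):
        key, _, value = pair.partition(pair_sep)
        out[key] = value
    return out

def splitkv(s, pair_sep, field_sep):
    """Like splitkvx, but values go through type inference."""
    return {k: infer(v) for k, v in splitkvx(s, pair_sep, field_sep).items()}

def splitnvx(s, sep):
    """Split text into a map keyed 1, 2, 3, ...; values stay strings."""
    if s == "":
        return {}
    return {i: v for i, v in enumerate(s.split(sep), start=1)}

def splitnv(s, sep):
    """Like splitnvx, but values go through type inference."""
    return {i: infer(v) for i, v in splitnvx(s, sep).items()}

print(splitkv("a=1,b=2,c=3", "=", ","))   # {'a': 1, 'b': 2, 'c': 3}
print(splitkvx("a=1,b=2,c=3", "=", ","))  # {'a': '1', 'b': '2', 'c': '3'}
print(splitnv("a,b,c", ","))              # {1: 'a', 2: 'b', 3: 'c'}
print(splitnvx("4,5,6", ","))             # {1: '4', 2: '5', 3: '6'}
```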
1997

KEYWORDS FOR PUT AND FILTER

1999   all
2000       all: used in "emit", "emitp", and "unset" as a synonym for @*
2001
2002   begin
2003       begin: defines a block of statements to be executed before input records
2004       are ingested. The body statements must be wrapped in curly braces.
2005       Example: 'begin { @count = 0 }'
2006
2007   bool
2008       bool: declares a boolean local variable in the current curly-braced scope.
2009       Type-checking happens at assignment: 'bool b = 1' is an error.
2010
2011   break
2012       break: causes execution to continue after the body of the current
2013       for/while/do-while loop.
2014
2015   call
2016       call: used for invoking a user-defined subroutine.
2017       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2018
2019   continue
2020       continue: causes execution to skip the remaining statements in the body of
2021       the current for/while/do-while loop. For-loop increments are still applied.
2022
2023   do
2024       do: with "while", introduces a do-while loop. The body statements must be wrapped
2025       in curly braces.
2026
2027   dump
2028       dump: prints all currently defined out-of-stream variables immediately
2029         to stdout as JSON.
2030
2031         With >, >>, or |, the data do not become part of the output record stream but
2032         are instead redirected.
2033
2034         The > and >> are for write and append, as in the shell, but (as with awk) the
2035         file-overwrite for > is on first write, not per record. The | is for piping to
2036         a process which will process the data. There will be one open file for each
2037         distinct file name (for > and >>) or one subordinate process for each distinct
2038         value of the piped-to command (for |). Output-formatting flags are taken from
2039         the main command line.
2040
2041         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
2042         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >  "mytap.dat"}'
2043         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
2044         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
2045
2046   edump
2047       edump: prints all currently defined out-of-stream variables immediately
2048         to stderr as JSON.
2049
2050         Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
2051
2052   elif
2053       elif: the way Miller spells "else if". The body statements must be wrapped
2054       in curly braces.
2055
2056   else
2057       else: terminates an if/elif/elif chain. The body statements must be wrapped
2058       in curly braces.
2059
2060   emit
2061       emit: inserts an out-of-stream variable into the output record stream. Hashmap
2062         indices present in the data but not slotted by emit arguments are not output.
2063
2064         With >, >>, or |, the data do not become part of the output record stream but
2065         are instead redirected.
2066
2067         The > and >> are for write and append, as in the shell, but (as with awk) the
2068         file-overwrite for > is on first write, not per record. The | is for piping to
2069         a process which will process the data. There will be one open file for each
2070         distinct file name (for > and >>) or one subordinate process for each distinct
2071         value of the piped-to command (for |). Output-formatting flags are taken from
2072         the main command line.
2073
2074         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2075         etc., to control the format of the output if the output is redirected. See also mlr -h.
2076
2077         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, $*'
2078         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, mapexcept($*, "a")'
2079         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
2080         Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
2081         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
2082         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
2083         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >  "mytap.dat", @*, "index1", "index2"'
2084         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
2085         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2086         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
2087         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
2088
2089         Please see http://johnkerl.org/miller/doc for more information.
2090
2091   emitf
2092       emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
2093         output record stream.
2094
2095         With >, >>, or |, the data do not become part of the output record stream but
2096         are instead redirected.
2097
2098         The > and >> are for write and append, as in the shell, but (as with awk) the
2099         file-overwrite for > is on first write, not per record. The | is for piping to
2100         a process which will process the data. There will be one open file for each
2101         distinct file name (for > and >>) or one subordinate process for each distinct
2102         value of the piped-to command (for |). Output-formatting flags are taken from
2103         the main command line.
2104
2105         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2106         etc., to control the format of the output if the output is redirected. See also mlr -h.
2107
2108         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
2109         Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
2110         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
2111         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
2112         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
2113         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
2114         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
2115         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
2116
2117         Please see http://johnkerl.org/miller/doc for more information.
2118
2119   emitp
2120       emitp: inserts an out-of-stream variable into the output record stream.
2121         Hashmap indices present in the data but not slotted by emitp arguments are
2122         output concatenated with ":".
2123
2124         With >, >>, or |, the data do not become part of the output record stream but
2125         are instead redirected.
2126
2127         The > and >> are for write and append, as in the shell, but (as with awk) the
2128         file-overwrite for > is on first write, not per record. The | is for piping to
2129         a process which will process the data. There will be one open file for each
2130         distinct file name (for > and >>) or one subordinate process for each distinct
2131         value of the piped-to command (for |). Output-formatting flags are taken from
2132         the main command line.
2133
2134         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2135         etc., to control the format of the output if the output is redirected. See also mlr -h.
2136
2137         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2138         Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2139         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2140         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2141         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >  "mytap.dat", @*, "index1", "index2"'
2142         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2143         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2144         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2145         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2146
2147         Please see http://johnkerl.org/miller/doc for more information.
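
         The way emitp joins unslotted indices with ":" can be sketched in
         Python, with a nested dict standing in for @sums. This is a model
         under stated assumptions, not Miller's code: the function names are
         hypothetical, and prefixing each output key with the variable name
         is an assumption about emitp's output.

```python
def flatten(prefix, value, out):
    """Collapse nested maps into out, joining the remaining keys with ':'."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(f"{prefix}:{key}", child, out)
    else:
        out[prefix] = value

def emitp_records(name, data, names=()):
    """Return the records a call like emitp @name, "n1", "n2" is modelled to produce.

    Each entry in names slots one map level into its own field; levels that
    are not slotted are flattened into ':'-joined keys.
    """
    records = []

    def walk(node, level, slots):
        if level < len(names) and isinstance(node, dict):
            for key, child in node.items():
                walk(child, level + 1, {**slots, names[level]: key})
        else:
            record = dict(slots)
            flatten(name, node, record)
            records.append(record)

    walk(data, 0, {})
    return records

# Hypothetical accumulated state, shaped like @sums[$a][$b] += $x.
sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}

print(emitp_records("sums", sums))
# [{'sums:pan:x': 1, 'sums:pan:y': 2, 'sums:eks:x': 3}]
print(emitp_records("sums", sums, ("a",)))
# [{'a': 'pan', 'sums:x': 1, 'sums:y': 2}, {'a': 'eks', 'sums:x': 3}]
print(emitp_records("sums", sums, ("a", "b")))
# [{'a': 'pan', 'b': 'x', 'sums': 1}, {'a': 'pan', 'b': 'y', 'sums': 2},
#  {'a': 'eks', 'b': 'x', 'sums': 3}]
```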
2148
2149   end
2150       end: defines a block of statements to be executed after input records
2151       are ingested. The body statements must be wrapped in curly braces.
2152       Example: 'end { emit @count }'
2153       Example: 'end { eprint "Final count is " . @count }'
2154
2155   eprint
2156       eprint: prints expression immediately to stderr.
2157         Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2158         Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2159         Example: mlr --from f.dat put  '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2160
2161   eprintn
2162       eprintn: prints expression immediately to stderr, without trailing newline.
2163         Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2164
2165   false
2166       false: the boolean literal value.
2167
2168   filter
2169       filter: includes/excludes the record in the output record stream.
2170
2171         Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2172
2173         Instead of put with 'filter false' you can simply use put -q.  The following
2174         uses the input record to accumulate data but only prints the running sum
2175         without printing the input record:
2176
2177         Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2178
2179   float
2180       float: declares a floating-point local variable in the current curly-braced scope.
2181       Type-checking happens at assignment: 'float x = 0' is an error.
2182
2183   for
2184       for: defines a for-loop using one of three styles. The body statements must
2185       be wrapped in curly braces.
2186       For-loop over stream record:
2187         Example:  'for (k, v in $*) { ... }'
2188       For-loop over out-of-stream variables:
2189         Example: 'for (k, v in @counts) { ... }'
2190         Example: 'for ((k1, k2), v in @counts) { ... }'
2191         Example: 'for ((k1, k2, k3), v in @*) { ... }'
2192       C-style for-loop:
2193         Example:  'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
2194
2195   func
2196       func: used for defining a user-defined function.
2197       Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2198
2199   if
2200       if: starts an if/elif/elif chain. The body statements must be wrapped
2201       in curly braces.
2202
2203   in
2204       in: used in for-loops over stream records or out-of-stream variables.
2205
2206   int
2207       int: declares an integer local variable in the current curly-braced scope.
2208       Type-checking happens at assignment: 'int x = 0.0' is an error.
2209
2210   map
2211       map: declares a map-valued local variable in the current curly-braced scope.
2212       Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2213       always OK. map b = a is OK or not depending on whether a is a map.
2214
2215   num
2216       num: declares an int/float local variable in the current curly-braced scope.
2217       Type-checking happens at assignment: 'num b = true' is an error.
2218
2219   print
2220       print: prints expression immediately to stdout.
2221         Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2222         Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2223         Example: mlr --from f.dat put  '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2224
2225   printn
2226       printn: prints expression immediately to stdout, without trailing newline.
2227         Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2228
2229   return
2230       return: specifies the return value from a user-defined function.
2231       Omitted return statements (including via if-branches) result in an absent-null
2232       return value, which in turn results in a skipped assignment to an LHS.
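
       The skipped-assignment rule can be modelled in Python with a sentinel.
       ABSENT, double_if_positive(), and assign() are hypothetical names
       introduced only for this sketch; Miller's DSL handles absent values
       internally.

```python
ABSENT = object()  # hypothetical stand-in for Miller's absent-null value

def double_if_positive(x):
    """Models a user-defined function whose return is omitted on one branch."""
    if x > 0:
        return x * 2
    return ABSENT  # models falling through without a return statement

def assign(record, field, value):
    """Models '$field = value': the assignment is skipped when value is absent."""
    if value is not ABSENT:
        record[field] = value

first = {"x": 3}
assign(first, "y", double_if_positive(first["x"]))
print(first)   # {'x': 3, 'y': 6}

second = {"x": -1}
assign(second, "y", double_if_positive(second["x"]))
print(second)  # {'x': -1}
```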
2233
2234   stderr
2235       stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2236         to print to standard error.
2237
2238   stdout
2239       stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2240         to print to standard output.
2241
2242   str
2243       str: declares a string local variable in the current curly-braced scope.
2244       Type-checking happens at assignment.
2245
2246   subr
2247       subr: used for defining a subroutine.
2248       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2249
2250   tee
2251       tee: prints the current record to specified file.
2252         This is an immediate print to the specified file (except for pprint format
2253         which of course waits until the end of the input stream to format all output).
2254
2255         The > and >> are for write and append, as in the shell, but (as with awk) the
2256         file-overwrite for > is on first write, not per record. The | is for piping to
2257         a process which will process the data. There will be one open file for each
2258         distinct file name (for > and >>) or one subordinate process for each distinct
2259         value of the piped-to command (for |). Output-formatting flags are taken from
2260         the main command line.
2261
2262         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2263         etc., to control the format of the output. See also mlr -h.
2264
2265         emit with redirect and tee with redirect are identical, except tee can only
2266         output $*.
2267
2268         Example: mlr --from f.dat put 'tee >  "/tmp/data-".$a, $*'
2269         Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2270         Example: mlr --from f.dat put 'tee >  stderr, $*'
2271         Example: mlr --from f.dat put -q 'tee | "tr [a-z] [A-Z]", $*'
2272         Example: mlr --from f.dat put -q 'tee | "tr [a-z] [A-Z] > /tmp/data-".$a, $*'
2273         Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2274         Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
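
         The rule that > overwrites on first write rather than per record,
         with one open file per distinct name, can be sketched in Python.
         RedirectTargets and tee_by_field are hypothetical names, and the
         key=value line format is only illustrative; this models the
         behaviour described above rather than reproducing Miller's code.

```python
import os
import tempfile

class RedirectTargets:
    """Keeps one open handle per distinct file name."""

    def __init__(self):
        self._handles = {}

    def write(self, path, line, append=False):
        handle = self._handles.get(path)
        if handle is None:
            # First write to this name: ">" truncates, ">>" appends.
            handle = open(path, "a" if append else "w")
            self._handles[path] = handle
        # Later writes reuse the handle, so ">" does not truncate again.
        handle.write(line + "\n")

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()

def tee_by_field(records, directory, field="a"):
    """Model of: tee > directory/data-<field value>, $*"""
    targets = RedirectTargets()
    for record in records:
        path = os.path.join(directory, "data-" + str(record[field]))
        line = ",".join(f"{k}={v}" for k, v in record.items())
        targets.write(path, line)
    targets.close()

# Hypothetical input records; two of them share the same value of "a".
records = [{"a": "pan", "x": 1}, {"a": "eks", "x": 2}, {"a": "pan", "x": 3}]
with tempfile.TemporaryDirectory() as tmp:
    tee_by_field(records, tmp)
    with open(os.path.join(tmp, "data-pan")) as f:
        pan_lines = f.read().splitlines()
    written = sorted(os.listdir(tmp))

print(pan_lines)  # ['a=pan,x=1', 'a=pan,x=3']
print(written)    # ['data-eks', 'data-pan']
```

         Both "pan" records land in the same file because the second write
         reuses the handle opened by the first.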
2275
2276   true
2277       true: the boolean literal value.
2278
2279   unset
2280       unset: clears field(s) from the current record, or an out-of-stream or local variable.
2281
2282         Example: mlr --from f.dat put 'unset $x'
2283         Example: mlr --from f.dat put 'unset $*'
2284         Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2285         Example: mlr --from f.dat put '...; unset @sums'
2286         Example: mlr --from f.dat put '...; unset @sums["green"]'
2287         Example: mlr --from f.dat put '...; unset @*'
2288
2289   var
2290       var: declares an untyped local variable in the current curly-braced scope.
2291       Examples: 'var a=1', 'var xyz=""'
2292
2293   while
2294       while: introduces a while loop, or with "do", introduces a do-while loop.
2295       The body statements must be wrapped in curly braces.
2296
2297   ENV
2298       ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2299
2300   FILENAME
2301       FILENAME: evaluates to the name of the current file being processed.
2302
2303   FILENUM
2304       FILENUM: evaluates to the number of the current file being processed,
2305       starting with 1.
2306
2307   FNR
2308       FNR: evaluates to the number of the current record within the current file
2309       being processed, starting with 1. Resets at the start of each file.
2310
2311   IFS
2312       IFS: evaluates to the input field separator from the command line.
2313
2314   IPS
2315       IPS: evaluates to the input pair separator from the command line.
2316
2317   IRS
2318       IRS: evaluates to the input record separator from the command line,
2319       or to LF or CRLF from the input data if in autodetect mode (which is
2320       the default).
2321
2322   M_E
2323       M_E: the mathematical constant e.
2324
2325   M_PI
2326       M_PI: the mathematical constant pi.
2327
2328   NF
2329       NF: evaluates to the number of fields in the current record.
2330
2331   NR
2332       NR: evaluates to the number of the current record over all files
2333       being processed, starting with 1. Does not reset at the start of each file.
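
       How FILENAME, FILENUM, FNR, and NR advance across multiple input files
       can be sketched in Python. The function name and the sample file names
       are hypothetical; the sketch only models the counting rules stated in
       the entries above.

```python
def number_records(files):
    """Yield (FILENAME, FILENUM, FNR, NR, record) for each record.

    files is a list of (name, records) pairs, in command-line order.
    """
    nr = 0
    for filenum, (filename, records) in enumerate(files, start=1):
        # FNR restarts at 1 for each file; NR keeps counting across files.
        for fnr, record in enumerate(records, start=1):
            nr += 1
            yield filename, filenum, fnr, nr, record

# Hypothetical inputs: two files with two and one records respectively.
files = [("a.dat", ["r1", "r2"]), ("b.dat", ["r3"])]
rows = list(number_records(files))
for row in rows:
    print(row)
# ('a.dat', 1, 1, 1, 'r1')
# ('a.dat', 1, 2, 2, 'r2')
# ('b.dat', 2, 1, 3, 'r3')
```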
2334
2335   OFS
2336       OFS: evaluates to the output field separator from the command line.
2337
2338   OPS
2339       OPS: evaluates to the output pair separator from the command line.
2340
2341   ORS
2342       ORS: evaluates to the output record separator from the command line,
2343       or to LF or CRLF from the input data if in autodetect mode (which is
2344       the default).
2345

AUTHOR

2347       Miller is written by John Kerl <kerl.john.r@gmail.com>.
2348
2349       This manual page has been composed from Miller's help output by Eric
2350       MSP Veith <eveith@veith-m.de>.
2351

SEE ALSO

2353       awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2354       MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2355       http://johnkerl.org/miller/doc
2356
2357
2358
2359                                  2020-09-03                         MILLER(1)