MILLER(1)                                                            MILLER(1)

NAME

       miller - like awk, sed, cut, join, and sort for name-indexed data such
       as CSV and tabular JSON.

SYNOPSIS

       Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
       more file names}

DESCRIPTION

       Miller operates on key-value-pair data while the familiar Unix tools
       operate on integer-indexed fields: if the natural data structure for
       the latter is the array, then Miller's natural data structure is the
       insertion-ordered hash map.  This encompasses a variety of data
       formats, including but not limited to the familiar CSV, TSV, and JSON.
       (Miller can handle positionally-indexed data as a special case.) This
       manpage documents Miller v5.10.1.

EXAMPLES

   COMMAND-LINE SYNTAX
       mlr --csv cut -f hostname,uptime mydata.csv
       mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
       mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
       grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
       mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
       mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
       mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
       mlr stats2 -a linreg-pca -f u,v -g shape data/*
       mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
       mlr --from estimates.tbl put '
         for (k,v in $*) {
           if (is_numeric(v) && k =~ "^[t-z].*$") {
             $sum += v; $count += 1
           }
         }
         $mean = $sum / $count # no assignment if count unset'
       mlr --from infile.dat put -f analyze.mlr
       mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
       mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
       mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
       mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'

   DATA FORMATS
         DKVP: delimited key-value pairs (Miller default format)
         +---------------------+
         | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
         +---------------------+

         NIDX: implicitly numerically indexed (Unix-toolkit style)
         +---------------------+
         | the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
         | fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
         +---------------------+

         CSV/CSV-lite: comma-separated values with separate header line
         +---------------------+
         | apple,bat,cog       |
         | 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +---------------------+

         Tabular JSON: nested objects are supported, although arrays within them are not:
         +---------------------+
         | {                   |
         |  "apple": 1,        | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         |  "bat": 2,          |
         |  "cog": 3           |
         | }                   |
         | {                   |
         |   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
         |     "egg": 7,       |
         |     "flint": 8      |
         |   },                |
         |   "garlic": ""      |
         | }                   |
         +---------------------+

         PPRINT: pretty-printed tabular
         +---------------------+
         | apple bat cog       |
         | 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +---------------------+

         XTAB: pretty-printed transposed tabular
         +---------------------+
         | apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | bat   2             |
         | cog   3             |
         |                     |
         | dish 7              | Record 2: "dish" => "7", "egg" => "8"
         | egg  8              |
         +---------------------+

         Markdown tabular (supported for output only):
         +-----------------------+
         | | apple | bat | cog | |
         | | ---   | --- | --- | |
         | | 1     | 2   | 3   | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +-----------------------+
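       The DKVP and NIDX boxes above can be read as a mapping from one line
       of text to one insertion-ordered map. The following Python sketch
       mirrors only what those examples show; the function names parse_dkvp
       and parse_nidx are illustrative and are not part of Miller, and the
       positional-key rule is inferred from Record 2 of the DKVP example.

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            key, value = field.split(ips, 1)
        else:
            # No pair separator: key the value by its 1-based position,
            # as in Record 2 of the DKVP example ("3" => "flint").
            key, value = str(position), field
        record[key] = value
    return record


def parse_nidx(line):
    """Parse one NIDX line: whitespace-separated fields keyed "1", "2", ..."""
    return {str(i): value for i, value in enumerate(line.split(), start=1)}


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_nidx("the quick brown"))
```

       Python dicts preserve insertion order, which is why a plain dict is
       enough to stand in for Miller's record type here.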

OPTIONS

       In the following option flags, the version with "i" designates the
       input stream, "o" the output stream, and the version without prefix
       sets the option for both input and output stream. For example: --irs
       sets the input record separator, --ors the output record separator, and
       --rs sets both the input and output separator to the given value.

   HELP OPTIONS
         -h or --help                 Show this message.
         --version                    Show the software version.
         {verb name} --help           Show verb-specific help.
         --help-all-verbs             Show help on all verbs.
         -l or --list-all-verbs       List only verb names.
         -L                           List only verb names, one per line.
         -f or --help-all-functions   Show help on all built-in functions.
         -F                           Show a bare listing of built-in functions by name.
         -k or --help-all-keywords    Show help on all keywords.
         -K                           Show a bare listing of keywords by name.

   VERB LIST
        altkv bar bootstrap cat check clean-whitespace count count-distinct
        count-similar cut decimate fill-down filter format-values fraction grep
        group-by group-like having-fields head histogram join label least-frequent
        merge-fields most-frequent nest nothing put regularize remove-empty-columns
        rename reorder repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle
        skip-trivial-records sort sort-within-records stats1 stats2 step tac tail tee
        top uniq unsparsify

   FUNCTION LIST
        + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
        > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
        substr tolower toupper truncate capitalize lstrip rstrip strip
        collapse_whitespace clean_whitespace system abs acos acosh asin asinh atan
        atan2 atanh cbrt ceil cos cosh erf erfc exp expm1 floor invqnorm log log10
        log1p logifit madd max mexp min mmul msub pow qnorm round roundm sgn sin sinh
        sqrt tan tanh urand urandrange urand32 urandint dhms2fsec dhms2sec fsec2dhms
        fsec2hms gmt2sec localtime2sec hms2fsec hms2sec sec2dhms sec2gmt sec2gmt
        sec2gmtdate sec2localtime sec2localtime sec2localdate sec2hms strftime
        strftime_local strptime strptime_local systime is_absent is_bool is_boolean
        is_empty is_empty_map is_float is_int is_map is_nonempty_map is_not_empty
        is_not_map is_not_null is_null is_numeric is_present is_string
        asserting_absent asserting_bool asserting_boolean asserting_empty
        asserting_empty_map asserting_float asserting_int asserting_map
        asserting_nonempty_map asserting_not_empty asserting_not_map
        asserting_not_null asserting_null asserting_numeric asserting_present
        asserting_string boolean float fmtnum hexfmt int string typeof depth haskey
        joink joinkv joinv leafcount length mapdiff mapexcept mapselect mapsum splitkv
        splitkvx splitnv splitnvx

       Please use "mlr --help-function {function name}" for function-specific help.

   I/O FORMATTING
         --idkvp   --odkvp   --dkvp      Delimited key-value pairs, e.g. "a=1,b=2"
                                         (this is Miller's default format).

         --inidx   --onidx   --nidx      Implicitly-integer-indexed fields
                                         (Unix-toolkit style).
         -T                              Synonymous with "--nidx --fs tab".

         --icsv    --ocsv    --csv       Comma-separated value (or tab-separated
                                         with --fs tab, etc.)

         --itsv    --otsv    --tsv       Keystroke-savers for "--icsv --ifs tab",
                                         "--ocsv --ofs tab", "--csv --fs tab".
         --iasv    --oasv    --asv       Similar but using ASCII FS 0x1f and RS 0x1e
         --iusv    --ousv    --usv       Similar but using Unicode FS U+241F (UTF-8 0xe2909f)
                                         and RS U+241E (UTF-8 0xe2909e)

         --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
                                         with --fs tab, etc.). The 'lite' CSV does not handle
                                         RFC-CSV double-quoting rules; is slightly faster;
                                         and handles heterogeneity in the input stream via
                                         empty newline followed by new header line. See also
                                         http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.

         --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
                                         "--ocsvlite --ofs tab", "--csvlite --fs tab".
         -t                              Synonymous with --tsvlite.
         --iasvlite --oasvlite --asvlite Similar to --itsvlite et al. but using ASCII FS 0x1f and RS 0x1e
         --iusvlite --ousvlite --usvlite Similar to --itsvlite et al. but using Unicode FS U+241F (UTF-8 0xe2909f)
                                         and RS U+241E (UTF-8 0xe2909e)

         --ipprint --opprint --pprint    Pretty-printed tabular (produces no
                                         output until all input is in).
                             --right     Right-justifies all fields for PPRINT output.
                             --barred    Prints a border around PPRINT output
                                         (only available for output).

                   --omd                 Markdown-tabular (only available for output).

         --ixtab   --oxtab   --xtab      Pretty-printed vertical-tabular.
                             --xvright   Right-justifies values for XTAB format.

         --ijson   --ojson   --json      JSON tabular: sequence or list of one-level
                                         maps: {...}{...} or [{...},{...}].
           --json-map-arrays-on-input    JSON arrays are unmillerable. --json-map-arrays-on-input
           --json-skip-arrays-on-input   is the default: arrays are converted to integer-indexed
           --json-fatal-arrays-on-input  maps. The other two options cause them to be skipped, or
                                         to be treated as errors.  Please use the jq tool for full
                                         JSON (pre)processing.
                             --jvstack   Put one key-value pair per line for JSON
                                         output.
                             --jsonx     Keystroke-saver for --json --jvstack.
                             --ojsonx    Keystroke-saver for --ojson --jvstack.
                             --jlistwrap Wrap JSON output in outermost [ ].
                           --jknquoteint Do not quote non-string map keys in JSON output.
                            --jvquoteall Quote map values in JSON output, even if they're
                                         numeric.
                     --jflatsep {string} Separator for flattening multi-level JSON keys,
                                         e.g. '{"a":{"b":3}}' becomes a:b => 3 for
                                         non-JSON formats. Defaults to :.

         -p is a keystroke-saver for --nidx --fs space --repifs

         Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
         DKVP-formatted input and pretty-printed output.

         Please use --iformat1 --oformat2 rather than --format1 --oformat2.
         The latter sets up input and output flags for format1, not all of which
         are overridden in all cases by setting output format to format2.

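       The key flattening described under --jflatsep can be sketched as a
       small recursive function. This is an illustration of the documented
       behavior in Python, not Miller's implementation; the name flatten is
       hypothetical, and JSON arrays are deliberately left out.

```python
def flatten(record, sep=":", prefix=""):
    """Flatten nested maps into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Descend into the nested map, carrying the joined key as prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


print(flatten({"a": {"b": 3}}))
print(flatten({"dish": {"egg": 7, "flint": 8}, "garlic": ""}))
```

       The second call reproduces Record 2 of the Tabular JSON example in
       DATA FORMATS, where "dish" flattens to "dish:egg" and "dish:flint".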
   COMMENTS IN DATA
         --skip-comments                 Ignore commented lines (prefixed by "#")
                                         within the input.
         --skip-comments-with {string}   Ignore commented lines within input, with
                                         specified prefix.
         --pass-comments                 Immediately print commented lines (prefixed by "#")
                                         within the input.
         --pass-comments-with {string}   Immediately print commented lines within input, with
                                         specified prefix.
       Notes:
       * Comments are only honored at the start of a line.
       * In the absence of any of the above four options, comments are data like
         any other text.
       * When pass-comments is used, comment lines are written to standard output
         immediately upon being read; they are not part of the record stream.
         Results may be counterintuitive. A suggestion is to place comments at the
         start of data files.

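       The notes above can be illustrated with a short Python sketch. It
       models only the documented rules (the prefix is honored at the start
       of a line; passed comments bypass the record stream); the function
       name data_lines and its parameters are hypothetical.

```python
import io


def data_lines(lines, prefix="#", mode="skip", comment_out=None):
    """Yield non-comment lines; mode is "skip" or "pass"."""
    for line in lines:
        # Only a prefix at the very start of the line marks a comment.
        if line.startswith(prefix):
            if mode == "pass" and comment_out is not None:
                # Written immediately, outside the record stream.
                comment_out.write(line)
            continue
        yield line


source = ["# generated nightly\n", "a=1,b=2\n", "a=3,b=4 # trailing text\n"]
passed = io.StringIO()
print(list(data_lines(source, mode="pass", comment_out=passed)))
print(passed.getvalue(), end="")
```

       The third input line is kept intact as data, since its "#" is not at
       the start of the line.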
   FORMAT-CONVERSION KEYSTROKE-SAVERS
       As keystroke-savers for format-conversion you may use the following:
               --c2t --c2d --c2n --c2j --c2x --c2p --c2m
         --t2c       --t2d --t2n --t2j --t2x --t2p --t2m
         --d2c --d2t       --d2n --d2j --d2x --d2p --d2m
         --n2c --n2t --n2d       --n2j --n2x --n2p --n2m
         --j2c --j2t --j2d --j2n       --j2x --j2p --j2m
         --x2c --x2t --x2d --x2n --x2j       --x2p --x2m
         --p2c --p2t --p2d --p2n --p2j --p2x       --p2m
       The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
       PPRINT, and markdown, respectively. Note that markdown format is available for
       output only.

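       One way to read the table above is as a pairing of an input-format
       flag with an output-format flag. The Python sketch below spells out
       that reading using only flags listed under I/O FORMATTING. It is an
       assumption that each keystroke-saver is exactly equivalent to such a
       pair; Miller may set further options for some of them. The name
       expand is hypothetical.

```python
INPUT_FLAGS = {
    "c": "--icsv", "t": "--itsv", "d": "--idkvp", "n": "--inidx",
    "j": "--ijson", "x": "--ixtab", "p": "--ipprint",
}
# Markdown ("m") is available for output only, so it appears only here.
OUTPUT_FLAGS = {
    "c": "--ocsv", "t": "--otsv", "d": "--odkvp", "n": "--onidx",
    "j": "--ojson", "x": "--oxtab", "p": "--opprint", "m": "--omd",
}


def expand(flag):
    """Expand e.g. "--c2p" into its assumed input/output flag pair."""
    if len(flag) != 5 or not flag.startswith("--") or flag[3] != "2":
        raise ValueError("not a format-conversion flag: " + flag)
    src, dst = flag[2], flag[4]
    if src == dst or src not in INPUT_FLAGS or dst not in OUTPUT_FLAGS:
        raise ValueError("unsupported conversion: " + flag)
    return [INPUT_FLAGS[src], OUTPUT_FLAGS[dst]]


print(expand("--c2p"))
print(expand("--j2m"))
```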
   COMPRESSED I/O
         --prepipe {command} This allows Miller to handle compressed inputs. You can do
         without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".

         However, when multiple input files are present, between-file separations are
         lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
         specify an action to be taken on each input file. This pre-pipe command must
         be able to read from standard input; it will be invoked with
           {command} < {filename}.
         Examples:
           mlr --prepipe 'gunzip'
           mlr --prepipe 'zcat -cf'
           mlr --prepipe 'xz -cd'
           mlr --prepipe cat
           mlr --prepipe-gunzip
           mlr --prepipe-zcat
         Note that this feature is quite general and is not limited to decompression
         utilities. You can use it to apply per-file filters of your choice.
         For output compression (or other) utilities, simply pipe the output:
           mlr ... | {your compression command}

         There are shorthands --prepipe-zcat and --prepipe-gunzip which are
         valid in .mlrrc files. The --prepipe flag is not valid in .mlrrc
         files since that would put execution of the prepipe command under
         control of the .mlrrc file.

   SEPARATORS
         --rs     --irs     --ors              Record separators, e.g. 'lf' or '\r\n'
         --fs     --ifs     --ofs  --repifs    Field separators, e.g. comma
         --ps     --ips     --ops              Pair separators, e.g. equals sign

         Notes about line endings:
         * Default line endings (--irs and --ors) are "auto" which means autodetect from
           the input file format, as long as the input file(s) have lines ending in either
           LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
           carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
         * If both irs and ors are auto (which is the default) then LF input will lead to LF
           output and CRLF input will lead to CRLF output, regardless of the platform you're
           running on.
         * The line-ending autodetector triggers on the first line ending detected in the input
           stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
           LF-terminated file then autodetected line endings will be CRLF.
         * If you use --ors {something else} with (default or explicitly specified) --irs auto
           then line endings are autodetected on input and set to what you specify on output.
         * If you use --irs {something else} with (default or explicitly specified) --ors auto
           then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.

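         The "first line ending wins" rule in the notes above can be sketched
         in a few lines of Python. This is a model of the documented behavior
         only, not Miller's code; the name detect_line_ending and the fallback
         default are assumptions.

```python
def detect_line_ending(data: bytes, default: bytes = b"\n") -> bytes:
    """Return the first line ending found in data: LF or CRLF."""
    index = data.find(b"\n")
    if index < 0:
        # No line ending seen yet; fall back to the caller's default.
        return default
    if index > 0 and data[index - 1:index] == b"\r":
        return b"\r\n"
    return b"\n"


# A CRLF-terminated input followed by an LF-terminated one: CRLF is detected.
print(detect_line_ending(b"a=1\r\nb=2\n"))
# The reverse order: LF is detected.
print(detect_line_ending(b"a=1\nb=2\r\n"))
```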
         Notes about all other separators:
         * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
           do key-value pairs appear juxtaposed.
         * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
           XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
           a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
           auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
         * OFS must be single-character for PPRINT format. This is because it is used
           with repetition for alignment; multi-character separators would make
           alignment impossible.
         * OPS may be multi-character for XTAB format, in which case alignment is
           disabled.
         * TSV is simply CSV using tab as field separator ("--fs tab").
         * FS/PS are ignored for markdown format; RS is used.
         * All FS and PS options are ignored for JSON format, since they are not relevant
           to the JSON format.
         * You can specify separators in any of the following ways, shown by example:
           - Type them out, quoting as necessary for shell escapes, e.g.
             "--fs '|' --ips :"
           - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
           - To avoid backslashing, you can use any of the following names:
             cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
         * Default separators by format:
             File format  RS       FS       PS
             gen          (N/A)    (N/A)    (N/A)
             dkvp         auto     ,        =
             json         auto     (N/A)    (N/A)
             nidx         auto     space    (N/A)
             csv          auto     ,        (N/A)
             csvlite      auto     ,        (N/A)
             markdown     auto     (N/A)    (N/A)
             pprint       auto     space    (N/A)
             xtab         (N/A)    auto     space

   CSV-SPECIFIC OPTIONS
         --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
                            of input files. Tip: combine with "label" to recreate
                            missing headers.
         --allow-ragged-csv-input|--ragged If a data line has fewer fields than the header line,
                            fill remaining keys with empty string. If a data line has more
                            fields than the header line, use integer field labels as in
                            the implicit-header case.
         --headerless-csv-output   Print only CSV data lines.
         -N                 Keystroke-saver for --implicit-csv-header --headerless-csv-output.

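       The ragged-input rule can be sketched as follows. This Python
       function (hypothetical name ragged_record) models only the two cases
       named under --allow-ragged-csv-input; it assumes that "integer field
       labels" means the 1-based position of each surplus field.

```python
def ragged_record(header, fields):
    """Pair header names with a data line that may be short or long."""
    record = {}
    for i, key in enumerate(header):
        # Too few fields: remaining keys get the empty string.
        record[key] = fields[i] if i < len(fields) else ""
    for i in range(len(header), len(fields)):
        # Too many fields: label the surplus by 1-based position (assumed).
        record[str(i + 1)] = fields[i]
    return record


print(ragged_record(["a", "b", "c"], ["1", "2"]))
print(ragged_record(["a", "b", "c"], ["1", "2", "3", "4"]))
```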
   DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
         --quote-all        Wrap all fields in double quotes
         --quote-none       Do not wrap any fields in double quotes, even if they have
                            OFS or ORS in them
         --quote-minimal    Wrap fields in double quotes only if they have OFS or ORS
                            in them (default)
         --quote-numeric    Wrap fields in double quotes only if they have numbers
                            in them
         --quote-original   Wrap fields in double quotes if and only if they were
                            quoted on input. This isn't sticky for computed fields:
                            e.g. if fields a and b were quoted on input and you do
                            "put '$c = $a . $b'" then field c won't inherit a or b's
                            was-quoted-on-input flag.

   NUMERICAL FORMATTING
         --ofmt {format}    E.g. %.18lf, %.0lf. Please use sprintf-style codes for
                            double-precision. Applies to verbs which compute new
                            values, e.g. put, stats1, stats2. See also the fmtnum
                            function within mlr put (mlr --help-all-functions).
                            Defaults to %lf.

   OTHER OPTIONS
         --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
                            urand()/urandint()/urand32().
         --nr-progress-mod {m}, with m a positive integer: print filename and record
                            count to stderr every m input records.
         --from {filename}  Use this to specify an input file before the verb(s),
                            rather than after. May be used more than once. Example:
                            "mlr --from a.dat --from b.dat cat" is the same as
                            "mlr cat a.dat b.dat".
         -n                 Process no input files, nor standard input either. Useful
                            for mlr put with begin/end statements only. (Same as --from
                            /dev/null.) Also useful in "mlr -n put -v '...'" for
                            analyzing abstract syntax trees (if that's your thing).
         -I                 Process files in-place. For each file name on the command
                            line, output is written to a temp file in the same
                            directory, which is then renamed over the original. Each
                            file is processed in isolation: if the output format is
                            CSV, CSV headers will be present in each output file;
                            statistics are only over each file's own records; and so on.

   THEN-CHAINING
       Output of one verb may be chained as input to another using "then", e.g.
         mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color

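       Then-chaining can be pictured as function composition over a stream
       of records. The Python sketch below uses two simplified stand-ins,
       cut (keep named fields) and cat_n (prepend a counter field "n"),
       modeled on the verb descriptions in VERBS; the helper names and the
       generator-based design are illustrative, not Miller's implementation.

```python
from functools import reduce


def cut(*names):
    """Stand-in for "cut -f": keep only the named fields, in input order."""
    def verb(records):
        for record in records:
            yield {k: v for k, v in record.items() if k in names}
    return verb


def cat_n():
    """Stand-in for "cat -n": prepend field "n" counting from 1."""
    def verb(records):
        for n, record in enumerate(records, start=1):
            yield {"n": n, **record}
    return verb


def then(*verbs):
    """Feed the output of each verb into the next, left to right."""
    def pipeline(records):
        return reduce(lambda stream, verb: verb(stream), verbs, records)
    return pipeline


records = [{"a": "x", "b": "1", "c": "9"}, {"a": "y", "b": "2", "c": "8"}]
for out in then(cut("a", "b"), cat_n())(records):
    print(out)
```

       Because each stand-in is a generator, records flow through the chain
       one at a time; verbs that must see all input first (for example
       sorting) would instead have to buffer.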
   AUXILIARY COMMANDS
       Miller has a few otherwise-standalone executables packaged within it.
       They do not participate in any other parts of Miller.
       Available subcommands:
         aux-list
         lecat
         termcvt
         hex
         unhex
         netbsd-strptime
       For more information, please invoke mlr {subcommand} --help

MLRRC

       You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc.
       For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file
       and that will be the default input/output format unless otherwise specified on the command line.

       The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional.
       Hash-style comments and blank lines are ignored.

       Sample .mlrrc:
       # Input and output formats are CSV by default (unless otherwise specified
       # on the mlr command line):
       csv
       # These are no-ops for CSV, but when I do use JSON output, I want these
       # pretty-printing options to be used:
       jvstack
       jlistwrap

       How to specify location of .mlrrc:
       * If $MLRRC is set:
         o If its value is "__none__" then no .mlrrc files are processed.
         o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
           errors, they abort mlr with a usage message (as if you had mistyped something on the
           command line). If the file can't be loaded at all, though, it is silently skipped.
         o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
           set in the environment.
       * Otherwise:
         o If $HOME/.mlrrc exists, it's then processed as above.
         o If ./.mlrrc exists, it's then also processed as above.
         (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)

       See also:
       https://johnkerl.org/miller/doc/customization.html
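       The .mlrrc line format described above can be sketched as a small
       parser. This Python function (hypothetical name mlrrc_to_args)
       follows only the rules stated in this section: one flag or
       option-value pair per line, optional leading "--", hash comments and
       blank lines ignored. It does not model file lookup or error handling.

```python
def mlrrc_to_args(text):
    """Turn .mlrrc text into the equivalent list of command-line arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and hash-style comments are ignored
        if not line.startswith("--"):
            line = "--" + line  # the leading "--" is optional in the file
        # "--option value" becomes two arguments; "--flag" stays as one.
        args.extend(line.split(None, 1))
    return args


sample = """\
# Input and output formats are CSV by default
csv
# Pretty-printing options for JSON output
jvstack
--jlistwrap
ofmt %.4lf
"""
print(mlrrc_to_args(sample))
```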

VERBS

   altkv
       Usage: mlr altkv [no options]
       Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.

   bar
       Usage: mlr bar [options]
       Replaces a numeric field with a number of asterisks, allowing for cheesy
       bar plots. These align best with --opprint or --oxtab output format.
       Options:
       -f   {a,b,c}      Field names to convert to bars.
       -c   {character}  Fill character: default '*'.
       -x   {character}  Out-of-bounds character: default '#'.
       -b   {character}  Blank character: default '.'.
       --lo {lo}         Lower-limit value for min-width bar: default '0.000000'.
       --hi {hi}         Upper-limit value for max-width bar: default '100.000000'.
       -w   {n}          Bar-field width: default '40'.
       --auto            Automatically computes limits, ignoring --lo and --hi.
                         Holds all records in memory before producing any output.

   bootstrap
       Usage: mlr bootstrap [options]
       Emits an n-sample, with replacement, of the input records.
       Options:
       -n {number} Number of samples to output. Defaults to number of input records.
                   Must be non-negative.
       See also mlr sample and mlr shuffle.

   cat
       Usage: mlr cat [options]
       Passes input records directly to output. Most useful for format conversion.
       Options:
       -n        Prepend field "n" to each record with record-counter starting at 1
       -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
                 keyed by specified field name(s).
       -v        Write a low-level record-structure dump to stderr.
       -N {name} Prepend field {name} to each record with record-counter starting at 1

   check
       Usage: mlr check
       Consumes records without printing any output.
       Useful for doing a well-formedness check on input data.

   clean-whitespace
       Usage: mlr clean-whitespace [options]
       For each record, for each field in the record, whitespace-cleans the keys and
       values. Whitespace-cleaning entails stripping leading and trailing whitespace,
       and replacing multiple whitespace with singles. For finer-grained control,
       please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
       and clean_whitespace.

       Options:
       -k|--keys-only    Do not touch values.
       -v|--values-only  Do not touch keys.
       It is an error to specify -k as well as -v -- to clean keys and values,
       leave off -k as well as -v.

   count
       Usage: mlr count [options]
       Prints number of records, optionally grouped by distinct values for specified field names.

       Options:
       -g {a,b,c}    Field names for distinct count.
       -n            Show only the number of distinct values. Not interesting without -g.
       -o {name}     Field name for output count. Default "count".

   count-distinct
       Usage: mlr count-distinct [options]
       Prints number of records having distinct values for specified field names.
       Same as uniq -c.

       Options:
       -f {a,b,c}    Field names for distinct count.
       -n            Show only the number of distinct values. Not compatible with -u.
       -o {name}     Field name for output count. Default "count".
                     Ignored with -u.
       -u            Do unlashed counts for multiple field names. With -f a,b and
                     without -u, computes counts for distinct combinations of a
                     and b field values. With -f a,b and with -u, computes counts
                     for distinct a field values and counts for distinct b field
                     values separately.

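       The difference between lashed (default) and unlashed (-u) counting
       can be seen in a short Python sketch. The function name
       count_distinct and its return shapes are illustrative only; records
       lacking any of the named fields are simply skipped here, which is an
       assumption rather than documented behavior.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field values, jointly (lashed) or per field (unlashed)."""
    if unlashed:
        # One independent counter per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # One counter over combinations of all named fields.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [
    {"a": "1", "b": "x"},
    {"a": "1", "b": "y"},
    {"a": "2", "b": "x"},
    {"a": "1", "b": "x"},
]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```

       With these four records, the lashed count sees three distinct (a, b)
       combinations, while the unlashed count reports two distinct values
       for a and two for b.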
   count-similar
       Usage: mlr count-similar [options]
       Ingests all records, then emits each record augmented by a count of
       the number of other records having the same group-by field values.
       Options:
       -g {d,e,f} Group-by-field names for counts.
       -o {name}  Field name for output count. Default "count".

   cut
       Usage: mlr cut [options]
       Passes through input records with specified fields included/excluded.
       -f {a,b,c}       Field names to include for cut.
       -o               Retain fields in the order specified here in the argument list.
                        Default is to retain them in the order found in the input data.
       -x|--complement  Exclude, rather than include, field names specified by -f.
       -r               Treat field names as regular expressions. "ab", "a.*b" will
                        match any field name containing the substring "ab" or matching
                        "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
                        be used. The -o flag is ignored when -r is present.
       Examples:
         mlr cut -f hostname,status
         mlr cut -x -f hostname,status
         mlr cut -r -f '^status$,sda[0-9]'
         mlr cut -r -f '^status$,"sda[0-9]"'
         mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)

   decimate
       Usage: mlr decimate [options]
       Passes through one of every n records, optionally by category.
       -n {count}    Decimation factor; default 10
       -b            Decimate by printing first of every n.
       -e            Decimate by printing last of every n (default).
       -g {a,b,c}    Optional group-by-field names for decimate counts

   fill-down
559       Usage: mlr fill-down [options]
560       -f {a,b,c}          Field names for fill-down
561       -a|--only-if-absent Field names for fill-down
562       If a given record has a missing value for a given field, fill that from
563       the corresponding value from a previous record, if any.
564       By default, a 'missing' field either is absent, or has the empty-string value.
565       With -a, a field is 'missing' only if it is absent.
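
       The two notions of 'missing' can be sketched in Python like this. It
       is a minimal illustration of the rules above, not Miller's code; the
       function and parameter names are hypothetical, and appending a
       filled-in absent field at the end of the record is an assumption.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing values from the most recent record that had one."""
    last = {}
    for rec in records:
        out = dict(rec)
        for name in fields:
            if only_if_absent:
                missing = name not in out
            else:
                missing = name not in out or out[name] == ""
            if missing:
                if name in last:
                    out[name] = last[name]  # nothing to fill from on early records
            else:
                last[name] = out[name]
        yield out

records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(list(fill_down(records, ["a"])))
print(list(fill_down(records, ["a"], only_if_absent=True)))
```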
566
567   filter
568       Usage: mlr filter [options] {expression}
569       Prints records for which {expression} evaluates to true.
570       If there are multiple semicolon-delimited expressions, all of them are
571       evaluated and the last one is used as the filter criterion.
572
573       Conversion options:
574       -S: Keeps field values as strings with no type inference to int or float.
575       -F: Keeps field values as strings or floats with no inference to int.
576       All field values are type-inferred to int/float/string unless this behavior is
577       suppressed with -S or -F.
578
579       Output/formatting options:
580       --oflatsep {string}: Separator to use when flattening multi-level @-variables
581           to output records for emit. Default ":".
582       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
583       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
584       Any of the output-format command-line flags (see mlr -h). Example: using
585         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
586       the input is CSV, the output is pretty-print tabular, but the tee-file output
587       is written in JSON format.
588       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
589           record.
590
591       Expression-specification options:
592       -f {filename}: the DSL expression is taken from the specified file rather
593           than from the command line. Outer single quotes wrapping the expression
594           should not be placed in the file. If -f is specified more than once,
595           all input files specified using -f are concatenated to produce the expression.
596           (For example, you can define functions in one file and call them from another.)
597       -e {expression}: You can use this after -f to add an expression. Example use
598           case: define functions/subroutines in a file you specify with -f, then call
599           them with an expression you specify with -e.
600       (If you mix -e and -f then the expressions are evaluated in the order encountered.
601       Since the expression pieces are simply concatenated, please be sure to use intervening
602       semicolons to separate expressions.)
603
604       -s name=value: Predefines out-of-stream variable @name to have value "value".
605           Thus mlr filter -s foo=97 '$column += @foo' is like
606           mlr filter 'begin {@foo = 97} $column += @foo'.
607           The value part is subject to type-inferencing as specified by -S/-F.
608           May be specified more than once, e.g. -s name1=value1 -s name2=value2.
609           Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
610
611       Tracing options:
612       -v: Prints the expression's AST (abstract syntax tree), which gives
613           full transparency on the precedence and associativity rules of
614           Miller's grammar, to stdout.
615       -a: Prints a low-level stack-allocation trace to stdout.
616       -t: Prints a low-level parser trace to stderr.
617       -T: Prints every statement to stderr as it is executed.
618
619       Other options:
620       -x: Prints records for which {expression} evaluates to false.
621
622       Please use a dollar sign for field names and double-quotes for string
623       literals. If field names have special characters such as "." then you might
624       use braces, e.g. '${field.name}'. Miller built-in variables are
625       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
626       variables. The environment-variable name may be an expression, e.g. a field
627       value.
628
629       Use # to comment to end of line.
630
631       Examples:
632         mlr filter 'log10($count) > 4.0'
633         mlr filter 'FNR == 2'         (second record in each file)
634         mlr filter 'urand() < 0.001'  (subsampling)
635         mlr filter '$color != "blue" && $value > 4.2'
636         mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
637         mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
638         mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
639         mlr filter '
640           NR == 1 ||
641          #NR == 2 ||
642           NR == 3
643         '
644
645       Please see https://miller.readthedocs.io/en/latest/reference.html for more information
646       including function list. Or "mlr -f". Please also see "mlr grep" which is
647       useful when you don't yet know which field name(s) you're looking for.
648       Please see in particular:
649         http://www.johnkerl.org/miller/doc/reference-verbs.html#filter
650
651   format-values
652       Usage: mlr format-values [options]
653       Applies format strings to all field values, depending on autodetected type.
654       * If a field value is detected to be integer, applies integer format.
655       * Else, if a field value is detected to be float, applies float format.
656       * Else, applies string format.
657
658       Note: this is a low-keystroke way to apply formatting to many fields. To get
659       finer control, please see the fmtnum function within the mlr put DSL.
660
661       Note: this verb lets you apply arbitrary format strings, which can produce
662       undefined behavior and/or program crashes.  See your system's "man printf".
663
664       Options:
665       -i {integer format} Defaults to "%lld".
666                           Examples: "%06lld", "%08llx".
667                           Note that Miller integers are long long so you must use
668                           formats which apply to long long, e.g. with ll in them.
669                           Undefined behavior results otherwise.
670       -f {float format}   Defaults to "%lf".
671                           Examples: "%8.3lf", "%.6le".
672                           Note that Miller floats are double-precision so you must
673                           use formats which apply to double, e.g. with l[efg] in them.
674                           Undefined behavior results otherwise.
675       -s {string format}  Defaults to "%s".
676                           Examples: "_%s", "%08s".
677                           Note that you must use formats which apply to string, e.g.
678                           with s in them. Undefined behavior results otherwise.
679       -n                  Coerce field values autodetected as int to float, and then
680                           apply the float format.
681
682   fraction
683       Usage: mlr fraction [options]
684       For each record's value in specified fields, computes the ratio of that
685       value to the sum of values in that field over all input records.
686       E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
687       x=1,x_fraction=0.1  x=2,x_fraction=0.2  x=3,x_fraction=0.3  and  x=4,x_fraction=0.4
688
689       Note: this is internally a two-pass algorithm: on the first pass it retains
690       input records and accumulates sums; on the second pass it computes quotients
691       and emits output records. This means it produces no output until all input is read.
692
693       Options:
694       -f {a,b,c}    Field name(s) for fraction calculation
695       -g {d,e,f}    Optional group-by-field name(s) for fraction counts
696       -p            Produce percents [0..100], not fractions [0..1]. Output field names
697                     end with "_percent" rather than "_fraction"
698       -c            Produce cumulative distributions, i.e. running sums: each output
699                     value folds in the sum of the previous for the specified group
700                     E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
701                     x=1,x_cumulative_fraction=0.1  x=2,x_cumulative_fraction=0.3
702                     x=3,x_cumulative_fraction=0.6  and  x=4,x_cumulative_fraction=1.0
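
       The two-pass computation can be sketched in Python as below. This is
       an illustration under simplifying assumptions: no -g grouping, no -p,
       and every record carries a numeric value for each named field. The
       function name fraction is hypothetical.

```python
def fraction(records, fields, cumulative=False):
    """Two-pass fraction computation: sum first, then divide."""
    records = list(records)              # pass 1: retain records ...
    sums = {f: 0.0 for f in fields}
    for rec in records:
        for f in fields:
            sums[f] += rec[f]            # ... and accumulate sums

    suffix = "_cumulative_fraction" if cumulative else "_fraction"
    running = {f: 0.0 for f in fields}
    out = []
    for rec in records:                  # pass 2: compute quotients
        new = dict(rec)
        for f in fields:
            running[f] += rec[f]
            numerator = running[f] if cumulative else rec[f]
            new[f + suffix] = numerator / sums[f]
        out.append(new)
    return out

records = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
print(fraction(records, ["x"]))
print(fraction(records, ["x"], cumulative=True))
```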
703
704   grep
705       Usage: mlr grep [options] {regular expression}
706       Passes through records which match {regex}.
707       Options:
708       -i    Use case-insensitive search.
709       -v    Invert: pass through records which do not match the regex.
710       Note that "mlr filter" is more powerful, but requires you to know field names.
711       By contrast, "mlr grep" allows you to regex-match the entire record. It does
712       this by formatting each record in memory as DKVP, using command-line-specified
713       ORS/OFS/OPS, and matching the resulting line against the regex specified
714       here. In particular, the regex is not applied to the input stream: if you
715       have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
716       be matched, not against either of these lines, but against the DKVP line
717       "x=1,y=2,z=3".  Furthermore, not all the options to system grep are supported,
718       and this command is intended to be merely a keystroke-saver. To get all the
719       features of system grep, you can do
720         "mlr --odkvp ... | grep ... | mlr --idkvp ..."
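
       The point that the regex sees a DKVP rendering of the record, not the
       input line, can be illustrated in Python. This is a sketch only: it
       uses Python's re module, whose syntax is not identical to the regex
       library Miller uses, and the helper names to_dkvp and grep_records are
       hypothetical.

```python
import re

def to_dkvp(rec, ofs=",", ops="="):
    """Format one record as a DKVP line, e.g. x=1,y=2,z=3."""
    return ofs.join(f"{k}{ops}{v}" for k, v in rec.items())

def grep_records(records, pattern, ignore_case=False, invert=False):
    """Yield records whose DKVP rendering matches (or, inverted, does not)."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for rec in records:
        if bool(rx.search(to_dkvp(rec))) != invert:
            yield rec

records = [{"x": 1, "y": 2, "z": 3}]
print(to_dkvp(records[0]))
print(list(grep_records(records, r"^1,2,3$")))  # the CSV data line does not match
print(list(grep_records(records, r"y=2")))      # the DKVP line does
```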
721
722   group-by
723       Usage: mlr group-by {comma-separated field names}
724       Outputs records in batches having identical values at specified field names.
725
726   group-like
727       Usage: mlr group-like
728       Outputs records in batches having identical field names.
729
730   having-fields
731       Usage: mlr having-fields [options]
732       Conditionally passes through records depending on each record's field names.
733       Options:
734         --at-least      {comma-separated names}
735         --which-are     {comma-separated names}
736         --at-most       {comma-separated names}
737         --all-matching  {regular expression}
738         --any-matching  {regular expression}
739         --none-matching {regular expression}
740       Examples:
741         mlr having-fields --which-are amount,status,owner
742         mlr having-fields --any-matching 'sda[0-9]'
743         mlr having-fields --any-matching '"sda[0-9]"'
744         mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
745
746   head
747       Usage: mlr head [options]
748       -n {count}    Head count to print; default 10
749       -g {a,b,c}    Optional group-by-field names for head counts
750       Passes through the first n records, optionally by category.
751       Without -g, ceases consuming more input (i.e. is fast) when n
752       records have been read.
753
754   histogram
755       Usage: mlr histogram [options]
756       -f {a,b,c}    Value-field names for histogram counts
757       --lo {lo}     Histogram low value
758       --hi {hi}     Histogram high value
759       --nbins {n}   Number of histogram bins
760       --auto        Automatically computes limits, ignoring --lo and --hi.
761                     Holds all values in memory before producing any output.
762       -o {prefix}   Prefix for output field name. Default: no prefix.
763       Just a histogram. Input values < lo or > hi are not counted.
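
       A minimal Python sketch of fixed-width binning between lo and hi.
       Placing a value exactly equal to hi in the last bin is an assumption
       made here so that only values < lo or > hi are dropped, as stated
       above. The function returns plain counts rather than Miller's output
       records, and its name is hypothetical.

```python
def histogram(values, lo, hi, nbins):
    """Count values into nbins equal-width bins covering [lo, hi]."""
    counts = [0] * nbins
    mul = nbins / (hi - lo)
    for value in values:
        if lo <= value < hi:
            # min() guards against floating-point rounding up to nbins.
            index = min(int((value - lo) * mul), nbins - 1)
            counts[index] += 1
        elif value == hi:
            counts[nbins - 1] += 1  # assumption: hi itself lands in the last bin
        # values below lo or above hi are not counted
    return counts

print(histogram([0.0, 0.1, 0.5, 0.99, 1.0, 1.5, -0.2], lo=0.0, hi=1.0, nbins=2))
```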
764
765   join
766       Usage: mlr join [options]
767       Joins records from specified left file name with records from all file names
768       at the end of the Miller argument list.
769       Functionality is essentially the same as the system "join" command, but for
770       record streams.
771       Options:
772         -f {left file name}
773         -j {a,b,c}   Comma-separated join-field names for output
774         -l {a,b,c}   Comma-separated join-field names for left input file;
775                      defaults to -j values if omitted.
776         -r {a,b,c}   Comma-separated join-field names for right input file(s);
777                      defaults to -j values if omitted.
778         --lp {text}  Additional prefix for non-join output field names from
779                      the left file
780         --rp {text}  Additional prefix for non-join output field names from
781                      the right file(s)
782         --np         Do not emit paired records
783         --ul         Emit unpaired records from the left file
784         --ur         Emit unpaired records from the right file(s)
785         -s|--sorted-input  Require sorted input: records must be sorted
786                      lexically by their join-field names, else not all records will
787                      be paired. The only likely use case for this is with a left
788                      file which is too big to fit into system memory otherwise.
789         -u           Enable unsorted input. (This is the default even without -u.)
790                      In this case, the entire left file will be loaded into memory.
791         --prepipe {command} As in main input options; see mlr --help for details.
792                      If you wish to use a prepipe command for the main input as well
793                      as here, it must be specified there as well as here.
794       File-format options default to those for the right file names on the Miller
795       argument list, but may be overridden for the left file as follows. Please see
796       the main "mlr --help" for more information on syntax for these arguments.
797         -i {one of csv,dkvp,nidx,pprint,xtab}
798         --irs {record-separator character}
799         --ifs {field-separator character}
800         --ips {pair-separator character}
801         --repifs
802         --repips
803       Please use "mlr --usage-separator-options" for information on specifying separators.
804       Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#join for more information
805       including examples.
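
       The unsorted (default) mode, in which the whole left file is held in
       memory, behaves roughly like the hash join sketched below in Python.
       This is an illustration, not Miller's implementation: it assumes every
       record carries the join fields, uses one set of join-field names for
       both sides, and the field order of a paired output record is an
       assumption. All names are hypothetical.

```python
def join_unsorted(left, right, on, emit_pairs=True,
                  emit_left_unpaired=False, emit_right_unpaired=False):
    """Hash join: bucket the left records, then stream the right records."""
    buckets = {}
    for lrec in left:                       # entire left input is loaded
        buckets.setdefault(tuple(lrec[f] for f in on), []).append(lrec)

    paired_keys = set()
    out = []
    for rrec in right:
        key = tuple(rrec[f] for f in on)
        if key in buckets:
            paired_keys.add(key)
            if emit_pairs:                  # suppressed by --np
                out.extend({**lrec, **rrec} for lrec in buckets[key])
        elif emit_right_unpaired:           # --ur
            out.append(rrec)

    if emit_left_unpaired:                  # --ul: known only after all right input
        for key, lrecs in buckets.items():
            if key not in paired_keys:
                out.extend(lrecs)
    return out

left = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
right = [{"id": 1, "amt": 10}, {"id": 3, "amt": 5}]
print(join_unsorted(left, right, ["id"]))
```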
806
807   label
808       Usage: mlr label {new1,new2,new3,...}
809       Given n comma-separated names, renames the first n fields of each record to
810       have the respective name. (Fields past the nth are left with their original
811       names.) Particularly useful with --inidx or --implicit-csv-header, to give
812       useful names to otherwise integer-indexed fields.
813       Examples:
814         "echo 'a b c d' | mlr --inidx --odkvp cat"       gives "1=a,2=b,3=c,4=d"
815         "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
816
817   least-frequent
818       Usage: mlr least-frequent [options]
819       Shows the least frequently occurring distinct values for specified field names.
820       The first entry is the statistical anti-mode; the remaining are runners-up.
821       Options:
822       -f {one or more comma-separated field names}. Required flag.
823       -n {count}. Optional flag defaulting to 10.
824       -b          Suppress counts; show only field values.
825       -o {name}   Field name for output count. Default "count".
826       See also "mlr most-frequent".
827
828   merge-fields
829       Usage: mlr merge-fields [options]
830       Computes univariate statistics for each input record, accumulated across
831       specified fields.
832       Options:
833       -a {sum,count,...}  Names of accumulators. One or more of:
834         count     Count instances of fields
835         mode      Find most-frequently-occurring values for fields; first-found wins tie
836         antimode  Find least-frequently-occurring values for fields; first-found wins tie
837         sum       Compute sums of specified fields
838         mean      Compute averages (sample means) of specified fields
839         stddev    Compute sample standard deviation of specified fields
840         var       Compute sample variance of specified fields
841         meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
842         skewness  Compute sample skewness of specified fields
843         kurtosis  Compute sample kurtosis of specified fields
844         min       Compute minimum values of specified fields
845         max       Compute maximum values of specified fields
846       -f {a,b,c}  Value-field names on which to compute statistics. Requires -o.
847       -r {a,b,c}  Regular expressions for value-field names on which to compute
848                   statistics. Requires -o.
849       -c {a,b,c}  Substrings for collapse mode. All fields which have the same names
850                   after removing substrings will be accumulated together. Please see
851                   examples below.
852       -i          Use interpolated percentiles, like R's type=7; default like type=1.
853                   Not meaningful for string-valued fields.
854       -o {name}   Output field basename for -f/-r.
855       -k          Keep the input fields which contributed to the output statistics;
856                   the default is to omit them.
857       -F          Computes integerable things (e.g. count) in floating point.
858
859       String-valued data make sense unless arithmetic on them is required,
860       e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
861       numbers are less than strings.
862
863       Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
864       Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
865         produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
866         summed over.
867       Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
868         produces "bar_sum=15,bar_count=4" since all four fields are summed over.
869       Example: mlr merge-fields -a sum,count -c in_,out_
870         produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
871         since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
872         "b_y", and "b_out_x" collapses to "b_x".
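
       Collapse mode (-c) for the sum and count accumulators can be sketched
       in Python as follows. It is an illustration of the example above only:
       it removes the first matching substring from each field name, and
       passing unmatched fields through unchanged is an assumption. The
       function name is hypothetical.

```python
def merge_fields_collapse(rec, substrings):
    """Accumulate sum and count over fields sharing a name once a substring is removed."""
    sums, counts, out = {}, {}, {}
    for name, value in rec.items():
        for sub in substrings:
            if sub in name:
                short = name.replace(sub, "", 1)   # e.g. a_in_x -> a_x
                sums[short] = sums.get(short, 0) + value
                counts[short] = counts.get(short, 0) + 1
                break
        else:
            out[name] = value                      # no substring matched
    for short in sums:
        out[short + "_sum"] = sums[short]
        out[short + "_count"] = counts[short]
    return out

rec = {"a_in_x": 1, "a_out_x": 2, "b_in_y": 4, "b_out_x": 8}
print(merge_fields_collapse(rec, ["in_", "out_"]))
```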
873
874   most-frequent
875       Usage: mlr most-frequent [options]
876       Shows the most frequently occurring distinct values for specified field names.
877       The first entry is the statistical mode; the remaining are runners-up.
878       Options:
879       -f {one or more comma-separated field names}. Required flag.
880       -n {count}. Optional flag defaulting to 10.
881       -b          Suppress counts; show only field values.
882       -o {name}   Field name for output count. Default "count".
883       See also "mlr least-frequent".
884
885   nest
886       Usage: mlr nest [options]
887       Explodes specified field values into separate fields/records, or reverses this.
888       Options:
889         --explode,--implode   One is required.
890         --values,--pairs      One is required.
891         --across-records,--across-fields One is required.
892         -f {field name}       Required.
893         --nested-fs {string}  Defaults to ";". Field separator for nested values.
894         --nested-ps {string}  Defaults to ":". Pair separator for nested key-value pairs.
895         --evar {string}       Shorthand for --explode --values --across-records --nested-fs {string}
896         --ivar {string}       Shorthand for --implode --values --across-records --nested-fs {string}
897       Please use "mlr --usage-separator-options" for information on specifying separators.
898
899       Examples:
900
901         mlr nest --explode --values --across-records -f x
902         with input record "x=a;b;c,y=d" produces output records
903           "x=a,y=d"
904           "x=b,y=d"
905           "x=c,y=d"
906         Use --implode to do the reverse.
907
908         mlr nest --explode --values --across-fields -f x
909         with input record "x=a;b;c,y=d" produces output record
910           "x_1=a,x_2=b,x_3=c,y=d"
911         Use --implode to do the reverse.
912
913         mlr nest --explode --pairs --across-records -f x
914         with input record "x=a:1;b:2;c:3,y=d" produces output records
915           "a=1,y=d"
916           "b=2,y=d"
917           "c=3,y=d"
918
919         mlr nest --explode --pairs --across-fields -f x
920         with input record "x=a:1;b:2;c:3,y=d" produces output record
921           "a=1,b=2,c=3,y=d"
922
923       Notes:
924       * With --pairs, --implode doesn't make sense since the original field name has
925         been lost.
926       * The combination "--implode --values --across-records" is non-streaming:
927         no output records are produced until all input records have been read. In
928         particular, this means it won't work in tail -f contexts. But all other flag
929         combinations result in streaming (tail -f friendly) data processing.
930       * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
931         e.g. by default the former is semicolon and the latter is comma.
932       See also mlr reshape.
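
       The two --explode --values variants shown in the examples can be
       sketched in Python as below. This is an illustration of the documented
       examples, not Miller's code; the function names are hypothetical, and
       passing a record through unchanged when the field is absent is an
       assumption.

```python
def explode_values_across_records(rec, field, nested_fs=";"):
    """One output record per nested value; other fields are repeated."""
    if field not in rec:
        return [rec]
    return [{**rec, field: piece} for piece in rec[field].split(nested_fs)]

def explode_values_across_fields(rec, field, nested_fs=";"):
    """One output record, with the field replaced by field_1, field_2, ..."""
    out = {}
    for name, value in rec.items():
        if name == field:
            for i, piece in enumerate(value.split(nested_fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[name] = value
    return out

rec = {"x": "a;b;c", "y": "d"}
print(explode_values_across_records(rec, "x"))
print(explode_values_across_fields(rec, "x"))
```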
933
934   nothing
935       Usage: mlr nothing
936       Drops all input records. Useful for testing, or after tee/print/etc. have
937       produced other output.
938
939   put
940       Usage: mlr put [options] {expression}
941       Adds/updates specified field(s). Expressions are semicolon-separated and must
942       either be assignments, or evaluate to boolean.  Booleans with following
943       statements in curly braces control whether those statements are executed;
944       booleans without following curly braces do nothing except side effects (e.g.
945       regex-captures into \1, \2, etc.).
946
947       Conversion options:
948       -S: Keeps field values as strings with no type inference to int or float.
949       -F: Keeps field values as strings or floats with no inference to int.
950       All field values are type-inferred to int/float/string unless this behavior is
951       suppressed with -S or -F.
952
953       Output/formatting options:
954       --oflatsep {string}: Separator to use when flattening multi-level @-variables
955           to output records for emit. Default ":".
956       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
957       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
958       Any of the output-format command-line flags (see mlr -h). Example: using
959         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
960       the input is CSV, the output is pretty-print tabular, but the tee-file output
961       is written in JSON format.
962       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
963           record.
964
965       Expression-specification options:
966       -f {filename}: the DSL expression is taken from the specified file rather
967           than from the command line. Outer single quotes wrapping the expression
968           should not be placed in the file. If -f is specified more than once,
969           all input files specified using -f are concatenated to produce the expression.
970           (For example, you can define functions in one file and call them from another.)
971       -e {expression}: You can use this after -f to add an expression. Example use
972           case: define functions/subroutines in a file you specify with -f, then call
973           them with an expression you specify with -e.
974       (If you mix -e and -f then the expressions are evaluated in the order encountered.
975       Since the expression pieces are simply concatenated, please be sure to use intervening
976       semicolons to separate expressions.)
977
978       -s name=value: Predefines out-of-stream variable @name to have value "value".
979           Thus mlr put -s foo=97 '$column += @foo' is like
980           mlr put 'begin {@foo = 97} $column += @foo'.
981           The value part is subject to type-inferencing as specified by -S/-F.
982           May be specified more than once, e.g. -s name1=value1 -s name2=value2.
983           Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
984
985       Tracing options:
986       -v: Prints the expression's AST (abstract syntax tree), which gives
987           full transparency on the precedence and associativity rules of
988           Miller's grammar, to stdout.
989       -a: Prints a low-level stack-allocation trace to stdout.
990       -t: Prints a low-level parser trace to stderr.
991       -T: Prints every statement to stderr as it is executed.
992
993       Other options:
994       -q: Does not include the modified record in the output stream. Useful for when
995           all desired output is in begin and/or end blocks.
996
997       Please use a dollar sign for field names and double-quotes for string
998       literals. If field names have special characters such as "." then you might
999       use braces, e.g. '${field.name}'. Miller built-in variables are
1000       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
1001       variables. The environment-variable name may be an expression, e.g. a field
1002       value.
1003
1004       Use # to comment to end of line.
1005
1006       Examples:
1007         mlr put '$y = log10($x); $z = sqrt($y)'
1008         mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
1009         mlr put '$x>0.0;  $y=log10($x); $z=sqrt($y)'   # does all three statements
1010         mlr put '$a =~ "([a-z]+)_([0-9]+)";  $b = "left_\1"; $c = "right_\2"'
1011         mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
1012         mlr put '$filename = FILENAME'
1013         mlr put '$colored_shape = $color . "_" . $shape'
1014         mlr put '$y = cos($theta); $z = atan2($y, $x)'
1015         mlr put '$name = sub($name, "http.*com"i, "")'
1016         mlr put -q '@sum += $x; end {emit @sum}'
1017         mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
1018         mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
1019         mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
1020         mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
1021         mlr put '
1022           $x = 1;
1023          #$y = 2;
1024           $z = 3
1025         '
1026
1027       Please see also 'mlr -k' for examples using redirected output.
1028
1029       Please see https://miller.readthedocs.io/en/latest/reference.html for more information
1030       including function list. Or "mlr -f".
1031       Please see in particular:
1032         http://www.johnkerl.org/miller/doc/reference-verbs.html#put
1033
1034   regularize
1035       Usage: mlr regularize
1036       For records whose field names match those of a record seen earlier in
1037       the data stream, but in a different order, outputs them with field
1038       names in the previously encountered order.
1039       Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
1040       output as              a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
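
       A short Python sketch of this rule, reproducing the example above. It
       is an illustration only, keyed on the sorted set of field names; the
       function name is hypothetical.

```python
def regularize(records):
    """Reorder fields to match the first-seen order for the same set of names."""
    first_seen_order = {}
    for rec in records:
        key = tuple(sorted(rec))                       # same names, any order
        order = first_seen_order.setdefault(key, list(rec))
        yield {name: rec[name] for name in order}

records = [
    {"a": 1, "c": 2, "b": 3},
    {"e": 4, "d": 5},
    {"c": 7, "a": 6, "b": 8},
]
for rec in regularize(records):
    print(rec)
```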
1041
1042   remove-empty-columns
1043       Usage: mlr remove-empty-columns
1044       Omits fields which are empty on every input row. Non-streaming.
1045
1046   rename
1047       Usage: mlr rename [options] {old1,new1,old2,new2,...}
1048       Renames specified fields.
1049       Options:
1050       -r         Treat old field names as regular expressions. "ab", "a.*b"
1051                  will match any field name containing the substring "ab" or
1052                  matching "a.*b", respectively; anchors of the form "^ab$",
1053                  "^a.*b$" may be used. New field names may be plain strings,
1054                  or may contain capture groups of the form "\1" through
1055                  "\9". Wrapping the regex in double quotes is optional, but
1056                  is required if you wish to follow it with 'i' to indicate
1057                  case-insensitivity.
1058       -g         Do global replacement within each field name rather than
1059                  first-match replacement.
1060       Examples:
1061       mlr rename old_name,new_name
1062       mlr rename old_name_1,new_name_1,old_name_2,new_name_2
1063       mlr rename -r 'Date_[0-9]+,Date'   Rename all such fields to be "Date"
1064       mlr rename -r '"Date_[0-9]+",Date' Same
1065       mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
1066       mlr rename -r '"name"i,Name'       Rename "name", "Name", "NAME", etc. to "Name"
1067
1068   reorder
1069       Usage: mlr reorder [options]
1070       -f {a,b,c}   Field names to reorder.
1071       -e           Put specified field names at record end: default is to put
1072                    them at record start.
1073       Examples:
1074       mlr reorder    -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
1075       mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
1076
1077   repeat
1078       Usage: mlr repeat [options]
1079       Copies input records to output records multiple times.
1080       Options must be exactly one of the following:
1081         -n {repeat count}  Repeat each input record this many times.
1082         -f {field name}    Same, but take the repeat count from the specified
1083                            field name of each input record.
1084       Example:
1085         echo x=0 | mlr repeat -n 4 then put '$x=urand()'
1086       produces:
1087         x=0.488189
1088         x=0.484973
1089         x=0.704983
1090         x=0.147311
1091       Example:
1092         echo a=1,b=2,c=3 | mlr repeat -f b
1093       produces:
1094         a=1,b=2,c=3
1095         a=1,b=2,c=3
1096       Example:
1097         echo a=1,b=2,c=3 | mlr repeat -f c
1098       produces:
1099         a=1,b=2,c=3
1100         a=1,b=2,c=3
1101         a=1,b=2,c=3
1102
1103   reshape
1104       Usage: mlr reshape [options]
1105       Wide-to-long options:
1106         -i {input field names}   -o {key-field name,value-field name}
1107         -r {input field regexes} -o {key-field name,value-field name}
1108         These pivot/reshape the input data such that the input fields are removed
1109         and separate records are emitted for each key/value pair.
1110         Note: this works with tail -f and produces output records for each input
1111         record seen.
1112       Long-to-wide options:
1113         -s {key-field name,value-field name}
1114         These pivot/reshape the input data to undo the wide-to-long operation.
1115         Note: this does not work with tail -f; it produces output records only after
1116         all input records have been read.
1117
1118       Examples:
1119
1120         Input file "wide.txt":
1121           time       X           Y
1122           2009-01-01 0.65473572  2.4520609
1123           2009-01-02 -0.89248112 0.2154713
1124           2009-01-03 0.98012375  1.3179287
1125
1126         mlr --pprint reshape -i X,Y -o item,value wide.txt
1127           time       item value
1128           2009-01-01 X    0.65473572
1129           2009-01-01 Y    2.4520609
1130           2009-01-02 X    -0.89248112
1131           2009-01-02 Y    0.2154713
1132           2009-01-03 X    0.98012375
1133           2009-01-03 Y    1.3179287
1134
1135         mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1136           time       item value
1137           2009-01-01 X    0.65473572
1138           2009-01-01 Y    2.4520609
1139           2009-01-02 X    -0.89248112
1140           2009-01-02 Y    0.2154713
1141           2009-01-03 X    0.98012375
1142           2009-01-03 Y    1.3179287
1143
1144         Input file "long.txt":
1145           time       item value
1146           2009-01-01 X    0.65473572
1147           2009-01-01 Y    2.4520609
1148           2009-01-02 X    -0.89248112
1149           2009-01-02 Y    0.2154713
1150           2009-01-03 X    0.98012375
1151           2009-01-03 Y    1.3179287
1152
1153         mlr --pprint reshape -s item,value long.txt
1154           time       X           Y
1155           2009-01-01 0.65473572  2.4520609
1156           2009-01-02 -0.89248112 0.2154713
1157           2009-01-03 0.98012375  1.3179287
1158       See also mlr nest.
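
       The two pivots above can be sketched in Python. This is an illustration
       of the record transformation only, not Miller's implementation; the
       function names wide_to_long and long_to_wide are hypothetical, and
       records are modeled as insertion-ordered dicts.

```python
# Illustrative sketch only: models records as dicts, as an assumption.
def wide_to_long(records, input_fields, key_name, value_name):
    """Emit one record per (input field, value) pair, keeping other fields."""
    out = []
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for field in input_fields:
            if field in rec:
                out.append({**others, key_name: field, value_name: rec[field]})
    return out


def long_to_wide(records, key_name, value_name):
    """Regroup long records by their remaining fields; needs all input first."""
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for rec in records:
        others = tuple(
            (k, v) for k, v in rec.items() if k not in (key_name, value_name)
        )
        grouped.setdefault(others, {})[rec[key_name]] = rec[value_name]
    return [{**dict(others), **pairs} for others, pairs in grouped.items()]


wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
]
long_records = wide_to_long(wide, ["X", "Y"], "item", "value")
```

       The long-to-wide direction has to buffer every record before emitting
       anything, which is consistent with the note that it does not work
       with tail -f.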
1159
1160   sample
1161       Usage: mlr sample [options]
1162       Reservoir sampling (subsampling without replacement), optionally by category.
1163       -k {count}    Required: number of records to output, total, or by group if using -g.
1164       -g {a,b,c}    Optional: group-by-field names for samples.
1165       See also mlr bootstrap and mlr shuffle.
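
       Reservoir sampling keeps a uniform sample of fixed size from a stream
       whose length is not known in advance. A minimal Python sketch of the
       textbook technique (Algorithm R) follows; it is not taken from
       Miller's source, and the name reservoir_sample is hypothetical.

```python
import random


def reservoir_sample(records, k, rng=random):
    """Return up to k records chosen uniformly without replacement."""
    reservoir = []
    for n, rec in enumerate(records):
        if n < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability k / (n + 1).
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


picked = reservoir_sample(range(1000), 5, random.Random(0))
```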
1166
1167   sec2gmt
1168       Usage: mlr sec2gmt [options] {comma-separated list of field names}
1169       Replaces a numeric field representing seconds since the epoch with the
1170       corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1171       more than a keystroke-saver for the sec2gmt function:
1172         mlr sec2gmt time1,time2
1173       is the same as
1174         mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1175       Options:
1176       -1 through -9: format the seconds using 1..9 decimal places, respectively.
1177
1178   sec2gmtdate
1179       Usage: mlr sec2gmtdate {comma-separated list of field names}
1180       Replaces a numeric field representing seconds since the epoch with the
1181       corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1182       This is nothing more than a keystroke-saver for the sec2gmtdate function:
1183         mlr sec2gmtdate time1,time2
1184       is the same as
1185         mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1186
1187   seqgen
1188       Usage: mlr seqgen [options]
1189       Produces a sequence of counters.  Discards the input record stream. Produces
1190       output as specified by the following options:
1191       -f {name} Field name for counters; default "i".
1192       --start {number} Inclusive start value; default "1".
1193       --stop  {number} Inclusive stop value; default "100".
1194       --step  {number} Step value; default "1".
1195       Start, stop, and/or step may be floating-point. Output is integer if start,
1196       stop, and step are all integers. Step may be negative. It may not be zero
1197       unless start == stop.
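
       The start/stop/step rules above can be sketched as a Python generator.
       This is an illustration, not Miller's code; in particular, emitting a
       single value when step is zero and start == stop is an assumption.

```python
def seqgen(start=1, stop=100, step=1):
    """Yield counters from start to stop, inclusive at both ends."""
    if step == 0:
        if start == stop:
            yield start  # assumption: one value is produced in this case
            return
        raise ValueError("step may not be zero unless start == stop")
    value = start
    while (step > 0 and value <= stop) or (step < 0 and value >= stop):
        yield value
        value += step
```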
1198
1199   shuffle
1200       Usage: mlr shuffle {no options}
1201       Outputs records randomly permuted. No output records are produced until
1202       all input records are read.
1203       See also mlr bootstrap and mlr sample.
1204
1205   skip-trivial-records
1206       Usage: mlr skip-trivial-records [options]
1207       Passes through all records except:
1208       * those with zero fields;
1209       * those for which all fields have empty value.
1210
1211   sort
1212       Usage: mlr sort {flags}
1213       Flags:
1214         -f  {comma-separated field names}  Lexical ascending
1215         -n  {comma-separated field names}  Numerical ascending; nulls sort last
1216         -nf {comma-separated field names}  Same as -n
1217         -r  {comma-separated field names}  Lexical descending
1218         -nr {comma-separated field names}  Numerical descending; nulls sort first
1219       Sorts records primarily by the first specified field, secondarily by the second
1220       field, and so on.  (Any records not having all specified sort keys will appear
1221       at the end of the output, in the order they were encountered, regardless of the
1222       specified sort order.) The sort is stable: records that compare equal will sort
1223       in the order they were encountered in the input record stream.
1224
1225       Example:
1226         mlr sort -f a,b -nr x,y,z
1227       which is the same as:
1228         mlr sort -f a -f b -nr x -nr y -nr z
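
       The multi-key, stable behavior described above can be sketched in
       Python by sorting on the least-significant key first. This is an
       illustration only: sort_records is a hypothetical name, and null
       handling for the numerical flags is not modeled.

```python
def sort_records(records, keys):
    """Sort by keys: a list of (field, flag) pairs, flag in f, r, nf, nr."""
    fields = [field for field, _ in keys]
    have = [r for r in records if all(f in r for f in fields)]
    lack = [r for r in records if not all(f in r for f in fields)]
    # Python's sort is stable, so applying keys from last to first
    # yields primary ordering by the first key.
    for field, flag in reversed(keys):
        numeric = flag in ("nf", "nr")
        have.sort(
            key=lambda r: float(r[field]) if numeric else str(r[field]),
            reverse=flag in ("r", "nr"),
        )
    # Records missing any sort key go last, in encounter order.
    return have + lack
```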
1229
1230   sort-within-records
1231       Usage: mlr sort-within-records [no options]
1232       Outputs records sorted lexically ascending by keys.
1233
1234   stats1
1235       Usage: mlr stats1 [options]
1236       Computes univariate statistics for one or more given fields, accumulated across
1237       the input record stream.
1238       Options:
1239       -a {sum,count,...}  Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1240                           one or more of:
1241          count     Count instances of fields
1242          mode      Find most-frequently-occurring values for fields; first-found wins tie
1243          antimode  Find least-frequently-occurring values for fields; first-found wins tie
1244          sum       Compute sums of specified fields
1245          mean      Compute averages (sample means) of specified fields
1246          stddev    Compute sample standard deviation of specified fields
1247          var       Compute sample variance of specified fields
1248          meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
1249          skewness  Compute sample skewness of specified fields
1250          kurtosis  Compute sample kurtosis of specified fields
1251          min       Compute minimum values of specified fields
1252          max       Compute maximum values of specified fields
1253       -f {a,b,c}   Value-field names on which to compute statistics
1254       --fr {regex} Regex for value-field names on which to compute statistics
1255                    (compute statistics on values in all field names matching regex)
1256       --fx {regex} Inverted regex for value-field names on which to compute statistics
1257                    (compute statistics on values in all field names not matching regex)
1258       -g {d,e,f}   Optional group-by-field names
1259       --gr {regex} Regex for optional group-by-field names
1260                    (group by values in field names matching regex)
1261       --gx {regex} Inverted regex for optional group-by-field names
1262                    (group by values in field names not matching regex)
1263       --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1264       -i           Use interpolated percentiles, like R's type=7; default like type=1.
1265                    Not meaningful for string-valued fields.
1266       -s           Print iterative stats. Useful in tail -f contexts (in which
1267                    case please avoid pprint-format output since end of input
1268                    stream will never be seen).
1269       -F           Computes integerable things (e.g. count) in floating point.
1270       Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1271       Example: mlr stats1 -a count,mode -f size
1272       Example: mlr stats1 -a count,mode -f size -g shape
1273       Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1274                This computes count and mode statistics on all field names beginning
1275                with a through h, grouped by all field names starting with k.
1276       Notes:
1277       * p50 and median are synonymous.
1278       * min and max output the same results as p0 and p100, respectively, but use
1279         less memory.
1280       * String-valued data make sense unless arithmetic on them is required,
1281         e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1282         numbers are less than strings.
1283       * count and mode allow text input; the rest require numeric input.
1284         In particular, 1 and 1.0 are distinct text for count and mode.
1285       * When there are mode ties, the first-encountered datum wins.
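
       The notes on mode can be illustrated with a short Python sketch. It
       treats values as text, so "1" and "1.0" are counted separately, and it
       resolves ties in favor of the first-encountered value. The function
       name mode is illustrative; this is not Miller's implementation.

```python
def mode(values):
    """Return the most frequent value; the first-encountered value wins ties."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    best = None
    for v, c in counts.items():  # iteration follows first-encountered order
        if best is None or c > counts[best]:
            best = v
    return best
```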
1286
1287   stats2
1288       Usage: mlr stats2 [options]
1289       Computes bivariate statistics for one or more given field-name pairs,
1290       accumulated across the input record stream.
1291       -a {linreg-ols,corr,...}  Names of accumulators: one or more of:
1292         linreg-pca   Linear regression using principal component analysis
1293         linreg-ols   Linear regression using ordinary least squares
1294         r2           Quality metric for linreg-ols (linreg-pca emits its own)
1295         logireg      Logistic regression
1296         corr         Sample correlation
1297         cov          Sample covariance
1298         covx         Sample-covariance matrix
1299       -f {a,b,c,d}   Value-field name-pairs on which to compute statistics.
1300                      There must be an even number of names.
1301       -g {e,f,g}     Optional group-by-field names.
1302       -v             Print additional output for linreg-pca.
1303       -s             Print iterative stats. Useful in tail -f contexts (in which
1304                      case please avoid pprint-format output since end of input
1305                      stream will never be seen).
1306       --fit          Rather than printing regression parameters, applies them to
1307                      the input data to compute new fit fields. All input records are
1308                      held in memory until end of input stream. Has effect only for
1309                      linreg-ols, linreg-pca, and logireg.
1310       Only one of -s or --fit may be used.
1311       Example: mlr stats2 -a linreg-pca -f x,y
1312       Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1313       Example: mlr stats2 -a corr -f x,y
1314
1315   step
1316       Usage: mlr step [options]
1317       Computes values dependent on the previous record, optionally grouped
1318       by category.
1319
1320       Options:
1321       -a {delta,rsum,...}   Names of steppers: comma-separated, one or more of:
1322         delta    Compute differences in field(s) between successive records
1323         shift    Include value(s) in field(s) from previous record, if any
1324         from-first Compute differences in field(s) from first record
1325         ratio    Compute ratios in field(s) between successive records
1326         rsum     Compute running sums of field(s) between successive records
1327         counter  Count instances of field(s) between successive records
1328         ewma     Exponentially weighted moving average over successive records
1329       -f {a,b,c} Value-field names on which to compute statistics
1330       -g {d,e,f} Optional group-by-field names
1331       -F         Computes integerable things (e.g. counter) in floating point.
1332       -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1333                  smoothing), just under 1 is light smoothing, just over 0 is
1334                  heavy smoothing. Multiple weights may be specified, e.g.
1335                  "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1336                  is "-d 0.5".
1337       -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1338                  the -d values. If supplied, the number of -o values must be the same
1339                  as the number of -d values.
1340
1341       Examples:
1342         mlr step -a rsum -f request_size
1343         mlr step -a delta -f request_size -g hostname
1344         mlr step -a ewma -d 0.1,0.9 -f x,y
1345         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1346         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1347
1348       Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#filter or
1349       https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1350       for more information on EWMA.
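
       The role of the -d weight can be sketched in Python using the standard
       EWMA recurrence. Seeding the average with the first sample is an
       assumption here, and the name ewma is illustrative rather than
       Miller's code.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with weight alpha on new samples."""
    out = []
    prev = None
    for x in values:
        # Assumption: the first sample seeds the average unchanged.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out
```

       With alpha equal to 1 the output reproduces the input (no smoothing);
       as alpha approaches 0 each new sample moves the average less.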
1351
1352   tac
1353       Usage: mlr tac
1354       Prints records in reverse order from the order in which they were encountered.
1355
1356   tail
1357       Usage: mlr tail [options]
1358       -n {count}    Tail count to print; default 10
1359       -g {a,b,c}    Optional group-by-field names for tail counts
1360       Passes through the last n records, optionally by category.
1361
1362   tee
1363       Usage: mlr tee [options] {filename}
1364       Passes through input records (like mlr cat) but also writes to specified output
1365       file, using output-format flags from the command line (e.g. --ocsv). See also
1366       the "tee" keyword within mlr put, which allows data-dependent filenames.
1367       Options:
1368       -a:          append to existing file, if any, rather than overwriting.
1369       --no-fflush: don't call fflush() after every record.
1370       Any of the output-format command-line flags (see mlr -h). Example: using
1371         mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1372       the input is CSV, the output is pretty-print tabular, but the tee-file output
1373       is written in JSON format.
1374
1375   top
1376       Usage: mlr top [options]
1377       -f {a,b,c}    Value-field names for top counts.
1378       -g {d,e,f}    Optional group-by-field names for top counts.
1379       -n {count}    How many records to print per category; default 1.
1380       -a            Print all fields for top-value records; default is
1381                     to print only value and group-by fields. Requires a single
1382                     value-field name only.
1383       --min         Print top smallest values; default is top largest values.
1384       -F            Keep top values as floats even if they look like integers.
1385       -o {name}     Field name for output indices. Default "top_idx".
1386       Prints the n records with smallest/largest values at specified fields,
1387       optionally by category.
1388
1389   uniq
1390       Usage: mlr uniq [options]
1391       Prints distinct values for specified field names. With -c, same as
1392       count-distinct. For uniq, -f is a synonym for -g.
1393
1394       Options:
1395       -g {d,e,f}    Group-by-field names for uniq counts.
1396       -c            Show repeat counts in addition to unique values.
1397       -n            Show only the number of distinct values.
1398       -o {name}     Field name for output count. Default "count".
1399       -a            Output each unique record only once. Incompatible with -g.
1400                     With -c, produces unique records, with repeat counts for each.
1401                     With -n, produces only one record which is the unique-record count.
1402                     With neither -c nor -n, produces unique records.
1403
1404   unsparsify
1405       Usage: mlr unsparsify [options]
1406       Prints records with the union of field names over all input records.
1407       For field names absent in a given record but present in others, fills in a
1408       value. Without -f, this verb retains all input before producing any output.
1409
1410       Options:
1411       --fill-with {filler string}  What to fill absent fields with. Defaults to
1412                                    the empty string.
1413       -f {a,b,c} Specify field names to be operated on. Any other fields won't be
1414                                    modified, and operation will be streaming.
1415
1416       Example: if the input is two records, one being 'a=1,b=2' and the other
1417       being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1418       'a=,b=3,c=4'.
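
       The example above can be reproduced with a small Python sketch of the
       non-streaming case (no -f). Records are modeled as dicts and the name
       unsparsify is illustrative; this is not Miller's implementation.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, filling absent ones."""
    all_keys = {}  # used as an insertion-ordered set
    for rec in records:
        for k in rec:
            all_keys.setdefault(k, None)
    return [{k: rec.get(k, fill_with) for k in all_keys} for rec in records]
```

       The first pass must see every record before any output is produced,
       which is why the verb retains all input when -f is not given.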
1419

FUNCTIONS FOR FILTER/PUT

1421   +
1422       (class=arithmetic #args=2): Addition.
1423
1424       + (class=arithmetic #args=1): Unary plus.
1425
1426   -
1427       (class=arithmetic #args=2): Subtraction.
1428
1429       - (class=arithmetic #args=1): Unary minus.
1430
1431   *
1432       (class=arithmetic #args=2): Multiplication.
1433
1434   /
1435       (class=arithmetic #args=2): Division.
1436
1437   //
1438       (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
1439
1440   .+
1441       (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1442
1443       .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1444
1445   .-
1446       (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1447
1448       .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1449
1450   .*
1451       (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1452
1453   ./
1454       (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1455
1456   .//
1457       (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
1458
1459   %
1460       (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
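
       Since the // and % entries describe their behavior as pythonic,
       Python's own operators show the convention directly. The contrast
       with truncating division is included for comparison; this sketch
       says nothing about Miller's internals.

```python
import math

# Floor division rounds toward negative infinity.
floor_quotient = -7 // 2          # -4
truncated_quotient = int(-7 / 2)  # -3, for contrast

# The remainder takes the sign of a positive modulus, so it is never negative.
floor_remainder = -7 % 5                 # 3
truncated_remainder = math.fmod(-7, 5)   # -2.0, for contrast
```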
1461
1462   **
1463       (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1464       operator.
1465
1466   |
1467       (class=arithmetic #args=2): Bitwise OR.
1468
1469   ^
1470       (class=arithmetic #args=2): Bitwise XOR.
1471
1472   &
1473       (class=arithmetic #args=2): Bitwise AND.
1474
1475   ~
1476       (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1477       regex-match operator: try '$y = ~$x'.
1478
1479   <<
1480       (class=arithmetic #args=2): Bitwise left-shift.
1481
1482   >>
1483       (class=arithmetic #args=2): Bitwise right-shift.
1484
1485   bitcount
1486       (class=arithmetic #args=1): Count of 1-bits.
1487
1488   ==
1489       (class=boolean #args=2): String/numeric equality. Mixing number and string
1490       results in string compare.
1491
1492   !=
1493       (class=boolean #args=2): String/numeric inequality. Mixing number and string
1494       results in string compare.
1495
1496   =~
1497       (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1498       side), e.g. '$name =~ "^a.*b$"'.
1499
1500   !=~
1501       (class=boolean #args=2): String (left-hand side) does not match regex
1502       (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1503
1504   >
1505       (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1506       results in string compare.
1507
1508   >=
1509       (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1510       and string results in string compare.
1511
1512   <
1513       (class=boolean #args=2): String/numeric less-than. Mixing number and string
1514       results in string compare.
1515
1516   <=
1517       (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1518       and string results in string compare.
1519
1520   &&
1521       (class=boolean #args=2): Logical AND.
1522
1523   ||
1524       (class=boolean #args=2): Logical OR.
1525
1526   ^^
1527       (class=boolean #args=2): Logical XOR.
1528
1529   !
1530       (class=boolean #args=1): Logical negation.
1531
1532   ? :
1533       (class=boolean #args=3): Ternary operator.
1534
1535   .
1536       (class=string #args=2): String concatenation.
1537
1538   gsub
1539       (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1540       (replace all).
1541
1542   regextract
1543       (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1545
1546   regextract_or_else
1547       (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1549
1550   strlen
1551       (class=string #args=1): String length.
1552
1553   sub
1554       (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1555       (replace once).
1556
1557   ssub
1558       (class=string #args=3): Like sub but does no regexing. No characters are special.
1559
1560   substr
1561       (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1562       inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
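
       The indexing rule for substr (0-up, inclusive at both ends, negative
       indices aliased from the end) can be sketched in Python. Returning an
       empty string for out-of-range indices is an assumption made for this
       illustration, not documented behavior.

```python
def substr(s, m, n):
    """Substring of s from 0-up position m through n, both inclusive."""
    length = len(s)
    # Negative indices -len .. -1 alias to 0 .. len-1.
    if m < 0:
        m += length
    if n < 0:
        n += length
    if m < 0 or n >= length or m > n:
        return ""  # assumption: out-of-range input yields an empty string
    return s[m:n + 1]
```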
1563
1564   tolower
1565       (class=string #args=1): Convert string to lowercase.
1566
1567   toupper
1568       (class=string #args=1): Convert string to uppercase.
1569
1570   truncate
1571       (class=string #args=2): Truncates string first argument to max length of int second argument.
1572
1573   capitalize
1574       (class=string #args=1): Convert string's first character to uppercase.
1575
1576   lstrip
1577       (class=string #args=1): Strip leading whitespace from string.
1578
1579   rstrip
1580       (class=string #args=1): Strip trailing whitespace from string.
1581
1582   strip
1583       (class=string #args=1): Strip leading and trailing whitespace from string.
1584
1585   collapse_whitespace
1586       (class=string #args=1): Strip repeated whitespace from string.
1587
1588   clean_whitespace
1589       (class=string #args=1): Same as collapse_whitespace and strip.
1590
1591   system
1592       (class=string #args=1): Run command string, yielding its stdout minus final carriage return.
1593
1594   abs
1595       (class=math #args=1): Absolute value.
1596
1597   acos
1598       (class=math #args=1): Inverse trigonometric cosine.
1599
1600   acosh
1601       (class=math #args=1): Inverse hyperbolic cosine.
1602
1603   asin
1604       (class=math #args=1): Inverse trigonometric sine.
1605
1606   asinh
1607       (class=math #args=1): Inverse hyperbolic sine.
1608
1609   atan
1610       (class=math #args=1): One-argument arctangent.
1611
1612   atan2
1613       (class=math #args=2): Two-argument arctangent.
1614
1615   atanh
1616       (class=math #args=1): Inverse hyperbolic tangent.
1617
1618   cbrt
1619       (class=math #args=1): Cube root.
1620
1621   ceil
1622       (class=math #args=1): Ceiling: nearest integer at or above.
1623
1624   cos
1625       (class=math #args=1): Trigonometric cosine.
1626
1627   cosh
1628       (class=math #args=1): Hyperbolic cosine.
1629
1630   erf
1631       (class=math #args=1): Error function.
1632
1633   erfc
1634       (class=math #args=1): Complementary error function.
1635
1636   exp
1637       (class=math #args=1): Exponential function e**x.
1638
1639   expm1
1640       (class=math #args=1): e**x - 1.
1641
1642   floor
1643       (class=math #args=1): Floor: nearest integer at or below.
1644
1645   invqnorm
1646       (class=math #args=1): Inverse of normal cumulative distribution
1647       function. Note that invqnorm(urand()) is normally distributed.
1648
1649   log
1650       (class=math #args=1): Natural (base-e) logarithm.
1651
1652   log10
1653       (class=math #args=1): Base-10 logarithm.
1654
1655   log1p
1656       (class=math #args=1): log(1+x).
1657
1658   logifit
1659       (class=math #args=3): Given m and b from logistic regression, compute
1660       fit: $yhat=logifit($x,$m,$b).
1661
1662   madd
1663       (class=math #args=3): a + b mod m (integers)
1664
1665   max
1666       (class=math variadic): Max of n numbers; null loses
1667
1668   mexp
1669       (class=math #args=3): a ** b mod m (integers)
1670
1671   min
1672       (class=math variadic): Min of n numbers; null loses
1673
1674   mmul
1675       (class=math #args=3): a * b mod m (integers)
1676
1677   msub
1678       (class=math #args=3): a - b mod m (integers)
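
       The four modular functions madd, mexp, mmul, and msub read as
       "(a op b) mod m" on integers. A Python sketch under that reading
       follows; it assumes a positive modulus and is not Miller's code.

```python
def madd(a, b, m):
    return (a + b) % m


def msub(a, b, m):
    return (a - b) % m


def mmul(a, b, m):
    return (a * b) % m


def mexp(a, b, m):
    # Three-argument pow performs modular exponentiation without
    # computing the full power first.
    return pow(a, b, m)
```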
1679
1680   pow
1681       (class=math #args=2): Exponentiation; same as **.
1682
1683   qnorm
1684       (class=math #args=1): Normal cumulative distribution function.
1685
1686   round
1687       (class=math #args=1): Round to nearest integer.
1688
1689   roundm
1690       (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1691       the same as round($x/$m)*$m
1692
1693   sgn
1694       (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1695       negative input.
1696
1697   sin
1698       (class=math #args=1): Trigonometric sine.
1699
1700   sinh
1701       (class=math #args=1): Hyperbolic sine.
1702
1703   sqrt
1704       (class=math #args=1): Square root.
1705
1706   tan
1707       (class=math #args=1): Trigonometric tangent.
1708
1709   tanh
1710       (class=math #args=1): Hyperbolic tangent.
1711
1712   urand
1713       (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
1714       Int-valued example: '$n=floor(20+urand()*11)'.
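
       The int-valued example scales a unit-interval draw by 11 and shifts it
       by 20, so the floor lands on the integers 20 through 30. A Python
       sketch of the same arithmetic, assuming the draw never equals 1 (as is
       true of Python's random()):

```python
import math
import random


def rand_int_20_to_30(rng=random):
    """Mirror floor(20 + urand() * 11) using Python's random()."""
    return math.floor(20 + rng.random() * 11)


draws = [rand_int_20_to_30(random.Random(seed)) for seed in range(200)]
```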
1715
1716   urandrange
1717       (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
1718
1719   urand32
1720       (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1721       inclusive.
1722
1723   urandint
1724       (class=math #args=2): Integer uniformly distributed between inclusive
1725       integer endpoints.
1726
1727   dhms2fsec
1728       (class=time #args=1): Recovers floating-point seconds as in
1729       dhms2fsec("5d18h53m20.250000s") = 500000.250000
1730
1731   dhms2sec
1732       (class=time #args=1): Recovers integer seconds as in
1733       dhms2sec("5d18h53m20s") = 500000
1734
1735   fsec2dhms
1736       (class=time #args=1): Formats floating-point seconds as in
1737       fsec2dhms(500000.25) = "5d18h53m20.250000s"
1738
1739   fsec2hms
1740       (class=time #args=1): Formats floating-point seconds as in
1741       fsec2hms(5000.25) = "01:23:20.250000"
1742
1743   gmt2sec
1744       (class=time #args=1): Parses GMT timestamp as integer seconds since
1745       the epoch.
1746
1747   localtime2sec
1748       (class=time #args=1): Parses local timestamp as integer seconds since
1749       the epoch. Consults $TZ environment variable.
1750
1751   hms2fsec
1752       (class=time #args=1): Recovers floating-point seconds as in
1753       hms2fsec("01:23:20.250000") = 5000.250000
1754
1755   hms2sec
1756       (class=time #args=1): Recovers integer seconds as in
1757       hms2sec("01:23:20") = 5000
1758
1759   sec2dhms
1760       (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1761       = "5d18h53m20s"
1762
1763   sec2gmt
1764       (class=time #args=1): Formats seconds since epoch (integer part)
1765       as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1766       Leaves non-numbers as-is.
1767
1768       sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1769       decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1770       Leaves non-numbers as-is.
1771
1772   sec2gmtdate
1773       (class=time #args=1): Formats seconds since epoch (integer part)
1774       as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1775       Leaves non-numbers as-is.
1776
1777   sec2localtime
1778       (class=time #args=1): Formats seconds since epoch (integer part)
1779       as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1780       Consults $TZ environment variable. Leaves non-numbers as-is.
1781
1782       sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1783       decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1784       Consults $TZ environment variable. Leaves non-numbers as-is.
1785
1786   sec2localdate
1787       (class=time #args=1): Formats seconds since epoch (integer part)
1788       as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1789       Consults $TZ environment variable. Leaves non-numbers as-is.
1790
1791   sec2hms
1792       (class=time #args=1): Formats integer seconds as in
1793       sec2hms(5000) = "01:23:20"
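
       The documented values for sec2hms and sec2dhms follow from repeated
       division by 60, 3600, and 86400. A Python sketch reproducing just
       those two examples is below; how the real functions format inputs
       shorter than a day or negative inputs is not modeled and should be
       treated as unknown.

```python
def sec2hms(secs):
    """Format integer seconds as HH:MM:SS."""
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def sec2dhms(secs):
    """Format integer seconds as NdHHhMMmSSs (assumes at least one day)."""
    d, rem = divmod(secs, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}d{h:02d}h{m:02d}m{s:02d}s"
```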
1794
1795   strftime
1796       (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1797       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1798       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1799       Format strings are as in the C library (please see "man strftime" on your system),
1800       with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1801       with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1802       See also strftime_local.
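
       The first strftime example can be cross-checked with Python's
       datetime module, which accepts the same C-library format string. The
       Miller-specific "%1S" through "%9S" forms are not available in Python
       and are not shown. The helper names below are illustrative.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_gmt(secs):
    """Format the integer part of epoch seconds as a GMT timestamp."""
    return datetime.fromtimestamp(int(secs), tz=timezone.utc).strftime(FORMAT)


def parse_gmt(text):
    """Parse a GMT timestamp back to integer epoch seconds."""
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```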
1803
1804   strftime_local
1805       (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1806
1807   strptime
1808       (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1809       e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1810       and  strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1811       See also strptime_local.
1812
1813   strptime_local
1814       (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1815
1816   systime
1817       (class=time #args=0): Floating-point seconds since the epoch,
1818       e.g. 1440768801.748936.
1819
1820   is_absent
1821       (class=typing #args=1): False if field is present in input, true otherwise
1822
1823   is_bool
1824       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1825
1826   is_boolean
1827       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1828
1829   is_empty
1830       (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1831
1832   is_empty_map
1833       (class=typing #args=1): True if argument is a map which is empty.
1834
1835   is_float
1836       (class=typing #args=1): True if field is present with value inferred to be float
1837
1838   is_int
1839       (class=typing #args=1): True if field is present with value inferred to be int
1840
1841   is_map
1842       (class=typing #args=1): True if argument is a map.
1843
1844   is_nonempty_map
1845       (class=typing #args=1): True if argument is a map which is non-empty.
1846
1847   is_not_empty
1848       (class=typing #args=1): False if field is present in input with empty value, true otherwise
1849
1850   is_not_map
1851       (class=typing #args=1): True if argument is not a map.
1852
1853   is_not_null
1854       (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1855
1856   is_null
1857       (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1858
1859   is_numeric
1860       (class=typing #args=1): True if field is present with value inferred to be int or float
1861
1862   is_present
1863       (class=typing #args=1): True if field is present in input, false otherwise.
1864
1865   is_string
1866       (class=typing #args=1): True if field is present with string (including empty-string) value
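
       The distinction among absent, empty, and null in the entries above can
       be sketched in Python by modeling a record as a dict: absent means the
       key is missing, empty means the value is the empty string, and null
       means either. These helpers take (record, field) for illustration,
       which differs from the one-argument DSL functions.

```python
def is_present(rec, field):
    return field in rec


def is_absent(rec, field):
    return field not in rec


def is_empty(rec, field):
    return field in rec and rec[field] == ""


def is_not_empty(rec, field):
    # Per the entry above: false only when present with an empty value.
    return not is_empty(rec, field)


def is_null(rec, field):
    return is_absent(rec, field) or is_empty(rec, field)


def is_not_null(rec, field):
    return not is_null(rec, field)
```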
1867
1868   asserting_absent
1869       (class=typing #args=1): Returns argument if it is absent in the input data, else
1870       throws an error.
1871
1872   asserting_bool
1873       (class=typing #args=1): Returns argument if it is present with boolean value, else
1874       throws an error.
1875
1876   asserting_boolean
1877       (class=typing #args=1): Returns argument if it is present with boolean value, else
1878       throws an error.
1879
1880   asserting_empty
1881       (class=typing #args=1): Returns argument if it is present in input with empty value,
1882       else throws an error.
1883
1884   asserting_empty_map
1885       (class=typing #args=1): Returns argument if it is a map with empty value, else
1886       throws an error.
1887
1888   asserting_float
1889       (class=typing #args=1): Returns argument if it is present with float value, else
1890       throws an error.
1891
1892   asserting_int
1893       (class=typing #args=1): Returns argument if it is present with int value, else
1894       throws an error.
1895
1896   asserting_map
1897       (class=typing #args=1): Returns argument if it is a map, else throws an error.
1898
1899   asserting_nonempty_map
1900       (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1901       an error.
1902
1903   asserting_not_empty
1904       (class=typing #args=1): Returns argument if it is present in input with non-empty
1905       value, else throws an error.
1906
1907   asserting_not_map
1908       (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1909
1910   asserting_not_null
1911       (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1912       else throws an error.
1913
1914   asserting_null
1915       (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1916       an error.
1917
1918   asserting_numeric
1919       (class=typing #args=1): Returns argument if it is present with int or float value,
1920       else throws an error.
1921
1922   asserting_present
1923       (class=typing #args=1): Returns argument if it is present in input, else throws
1924       an error.
1925
1926   asserting_string
1927       (class=typing #args=1): Returns argument if it is present with string (including
1928       empty-string) value, else throws an error.
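
       The asserting_* functions above can be used as inline guards. The
       following is a minimal sketch, assuming mlr is on the PATH and input in
       the default DKVP format; the field names x and y and the sample values
       are made up for illustration.

```shell
# A record whose x is an integer passes through the assertion unchanged.
printf 'x=3\n' | mlr put '$y = asserting_int($x) + 1'

# A non-integer x is expected to abort the run with a non-zero exit status.
if ! printf 'x=abc\n' | mlr put '$y = asserting_int($x) + 1' 2>/dev/null; then
  echo "assertion failed, as intended"
fi
```

       Wrapping a field in an assertion makes a malformed input stop the
       pipeline early instead of silently producing a wrong result.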
1929
1930   boolean
1931       (class=conversion #args=1): Convert int/float/bool/string to boolean.
1932
1933   float
1934       (class=conversion #args=1): Convert int/float/bool/string to float.
1935
1936   fmtnum
1937       (class=conversion #args=2): Convert int/float/bool to string using
1938       printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1939       are all long long or double. If you use formats like %d or %f, behavior is undefined.
1940
1941   hexfmt
1942       (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1943
1944   int
1945       (class=conversion #args=1): Convert int/float/bool/string to int.
1946
1947   string
1948       (class=conversion #args=1): Convert int/float/bool/string to string.
1949
1950   typeof
1951       (class=conversion #args=1): Convert argument to type of argument (e.g.
1952       MT_STRING). For debug.
1953
1954   depth
1955       (class=maps #args=1): Returns maximum depth of hashmap. Scalars have depth 0.
1956
1957   haskey
1958       (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1959       'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1960
1961   joink
1962       (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1963
1964   joinkv
1965       (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1966
1967   joinv
1968       (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
1969
1970   leafcount
1971       (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1972       same as length.
1973
1974   length
1975       (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
1976
1977   mapdiff
1978       (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1979       With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1980
1981   mapexcept
1982       (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1983       E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1984
1985   mapselect
1986       (class=maps variadic): Returns a map with only keys from remaining arguments set.
1987       E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1988
1989   mapsum
1990       (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1991       key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
1992
1993   splitkv
1994       (class=maps #args=3): Splits string by separators into map with type inference.
1995       E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1996
1997   splitkvx
1998       (class=maps #args=3): Splits string by separators into map without type inference (keys and
1999       values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
2000       '{"a" : "1", "b" : "2", "c" : "3"}'.
2001
2002   splitnv
2003       (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
2004       E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
2005
2006   splitnvx
2007       (class=maps #args=2): Splits string by separator into integer-indexed map without type
2008       inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
2009

KEYWORDS FOR PUT AND FILTER

2011   all
2012       all: used in "emit", "emitp", and "unset" as a synonym for @*
2013
2014   begin
2015       begin: defines a block of statements to be executed before input records
2016       are ingested. The body statements must be wrapped in curly braces.
2017       Example: 'begin { @count = 0 }'
2018
2019   bool
2020       bool: declares a boolean local variable in the current curly-braced scope.
2021       Type-checking happens at assignment: 'bool b = 1' is an error.
2022
2023   break
2024       break: causes execution to continue after the body of the current
2025       for/while/do-while loop.
2026
2027   call
2028       call: used for invoking a user-defined subroutine.
2029       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2030
2031   continue
2032       continue: causes execution to skip the remaining statements in the body of
2033       the current for/while/do-while loop. For-loop increments are still applied.
2034
2035   do
2036       do: with "while", introduces a do-while loop. The body statements must be wrapped
2037       in curly braces.
2038
2039   dump
2040       dump: prints all currently defined out-of-stream variables immediately
2041         to stdout as JSON.
2042
2043         With >, >>, or |, the data do not become part of the output record stream but
2044         are instead redirected.
2045
2046         The > and >> are for write and append, as in the shell, but (as with awk) the
2047         file-overwrite for > is on first write, not per record. The | is for piping to
2048         a process which will process the data. There will be one open file for each
2049         distinct file name (for > and >>) or one subordinate process for each distinct
2050         value of the piped-to command (for |). Output-formatting flags are taken from
2051         the main command line.
2052
2053         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
2054         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >  "mytap.dat"}'
2055         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
2056         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
2057
2058   edump
2059       edump: prints all currently defined out-of-stream variables immediately
2060         to stderr as JSON.
2061
2062         Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
2063
2064   elif
2065       elif: the way Miller spells "else if". The body statements must be wrapped
2066       in curly braces.
2067
2068   else
2069       else: terminates an if/elif/elif chain. The body statements must be wrapped
2070       in curly braces.
2071
2072   emit
2073       emit: inserts an out-of-stream variable into the output record stream. Hashmap
2074         indices present in the data but not slotted by emit arguments are not output.
2075
2076         With >, >>, or |, the data do not become part of the output record stream but
2077         are instead redirected.
2078
2079         The > and >> are for write and append, as in the shell, but (as with awk) the
2080         file-overwrite for > is on first write, not per record. The | is for piping to
2081         a process which will process the data. There will be one open file for each
2082         distinct file name (for > and >>) or one subordinate process for each distinct
2083         value of the piped-to command (for |). Output-formatting flags are taken from
2084         the main command line.
2085
2086         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2087         etc., to control the format of the output if the output is redirected. See also mlr -h.
2088
2089         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, $*'
2090         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, mapexcept($*, "a")'
2091         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
2092         Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
2093         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
2094         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
2095         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >  "mytap.dat", @*, "index1", "index2"'
2096         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
2097         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2098         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
2099         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
2100
2101         Please see http://johnkerl.org/miller/doc for more information.
2102
2103   emitf
2104       emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
2105         output record stream.
2106
2107         With >, >>, or |, the data do not become part of the output record stream but
2108         are instead redirected.
2109
2110         The > and >> are for write and append, as in the shell, but (as with awk) the
2111         file-overwrite for > is on first write, not per record. The | is for piping to
2112         a process which will process the data. There will be one open file for each
2113         distinct file name (for > and >>) or one subordinate process for each distinct
2114         value of the piped-to command (for |). Output-formatting flags are taken from
2115         the main command line.
2116
2117         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2118         etc., to control the format of the output if the output is redirected. See also mlr -h.
2119
2120         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
2121         Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
2122         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
2123         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
2124         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
2125         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
2126         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
2127         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
2128
2129         Please see http://johnkerl.org/miller/doc for more information.
2130
2131   emitp
2132       emitp: inserts an out-of-stream variable into the output record stream.
2133         Hashmap indices present in the data but not slotted by emitp arguments are
2134         output concatenated with ":".
2135
2136         With >, >>, or |, the data do not become part of the output record stream but
2137         are instead redirected.
2138
2139         The > and >> are for write and append, as in the shell, but (as with awk) the
2140         file-overwrite for > is on first write, not per record. The | is for piping to
2141         a process which will process the data. There will be one open file for each
2142         distinct file name (for > and >>) or one subordinate process for each distinct
2143         value of the piped-to command (for |). Output-formatting flags are taken from
2144         the main command line.
2145
2146         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2147         etc., to control the format of the output if the output is redirected. See also mlr -h.
2148
2149         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2150         Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2151         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2152         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2153         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >  "mytap.dat", @*, "index1", "index2"'
2154         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2155         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2156         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2157         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2158
2159         Please see http://johnkerl.org/miller/doc for more information.
2160
2161   end
2162       end: defines a block of statements to be executed after input records
2163       are ingested. The body statements must be wrapped in curly braces.
2164       Example: 'end { emit @count }'
2165       Example: 'end { eprint "Final count is " . @count }'
2166
2167   eprint
2168       eprint: prints expression immediately to stderr.
2169         Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2170         Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2171         Example: mlr --from f.dat put  '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2172
2173   eprintn
2174       eprintn: prints expression immediately to stderr, without trailing newline.
2175         Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2176
2177   false
2178       false: the boolean literal value.
2179
2180   filter
2181       filter: includes/excludes the record in the output record stream.
2182
2183         Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2184
2185         Instead of put with 'filter false' you can simply use put -q.  The following
2186         uses the input record to accumulate data but only prints the running sum
2187         without printing the input record:
2188
2189         Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2190
2191   float
2192       float: declares a floating-point local variable in the current curly-braced scope.
2193       Type-checking happens at assignment: 'float x = 0' is an error.
2194
2195   for
2196       for: defines a for-loop using one of three styles. The body statements must
2197       be wrapped in curly braces.
2198       For-loop over stream record:
2199         Example:  'for (k, v in $*) { ... }'
2200       For-loop over out-of-stream variables:
2201         Example: 'for (k, v in @counts) { ... }'
2202         Example: 'for ((k1, k2), v in @counts) { ... }'
2203         Example: 'for ((k1, k2, k3), v in @*) { ... }'
2204       C-style for-loop:
2205         Example:  'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
2206
2207   func
2208       func: used for defining a user-defined function.
2209       Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2210
2211   if
2212       if: starts an if/elif/elif chain. The body statements must be wrapped
2213       in curly braces.
2214
2215   in
2216       in: used in for-loops over stream records or out-of-stream variables.
2217
2218   int
2219       int: declares an integer local variable in the current curly-braced scope.
2220       Type-checking happens at assignment: 'int x = 0.0' is an error.
2221
2222   map
2223       map: declares a map-valued local variable in the current curly-braced scope.
2224       Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2225       always OK. map b = a is OK or not depending on whether a is a map.
2226
2227   num
2228       num: declares an int/float local variable in the current curly-braced scope.
2229       Type-checking happens at assignment: 'num b = true' is an error.
2230
2231   print
2232       print: prints expression immediately to stdout.
2233         Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2234         Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2235         Example: mlr --from f.dat put  '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2236
2237   printn
2238       printn: prints expression immediately to stdout, without trailing newline.
2239         Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2240
2241   return
2242       return: specifies the return value from a user-defined function.
2243       Omitted return statements (including via if-branches) result in an absent-null
2244       return value, which in turns results in a skipped assignment to an LHS.
2245
2246   stderr
2247       stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2248         to print to standard error.
2249
2250   stdout
2251       stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2252         to print to standard output.
2253
2254   str
2255       str: declares a string local variable in the current curly-braced scope.
2256       Type-checking happens at assignment.
2257
2258   subr
2259       subr: used for defining a subroutine.
2260       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2261
2262   tee
2263       tee: prints the current record to specified file.
2264         This is an immediate print to the specified file (except for pprint format
2265         which of course waits until the end of the input stream to format all output).
2266
2267         The > and >> are for write and append, as in the shell, but (as with awk) the
2268         file-overwrite for > is on first write, not per record. The | is for piping to
2269         a process which will process the data. There will be one open file for each
2270         distinct file name (for > and >>) or one subordinate process for each distinct
2271         value of the piped-to command (for |). Output-formatting flags are taken from
2272         the main command line.
2273
2274         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2275         etc., to control the format of the output. See also mlr -h.
2276
2277         emit with redirect and tee with redirect are identical, except tee can only
2278         output $*.
2279
2280         Example: mlr --from f.dat put 'tee >  "/tmp/data-".$a, $*'
2281         Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2282         Example: mlr --from f.dat put 'tee >  stderr, $*'
2283         Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
2284         Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
2285         Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2286         Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2287
2288   true
2289       true: the boolean literal value.
2290
2291   unset
2292       unset: clears field(s) from the current record, or an out-of-stream or local variable.
2293
2294         Example: mlr --from f.dat put 'unset $x'
2295         Example: mlr --from f.dat put 'unset $*'
2296         Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2297         Example: mlr --from f.dat put '...; unset @sums'
2298         Example: mlr --from f.dat put '...; unset @sums["green"]'
2299         Example: mlr --from f.dat put '...; unset @*'
2300
2301   var
2302       var: declares an untyped local variable in the current curly-braced scope.
2303       Examples: 'var a=1', 'var xyz=""'
2304
2305   while
2306       while: introduces a while loop, or with "do", introduces a do-while loop.
2307       The body statements must be wrapped in curly braces.
2308
2309   ENV
2310       ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2311
2312   FILENAME
2313       FILENAME: evaluates to the name of the current file being processed.
2314
2315   FILENUM
2316       FILENUM: evaluates to the number of the current file being processed,
2317       starting with 1.
2318
2319   FNR
2320       FNR: evaluates to the number of the current record within the current file
2321       being processed, starting with 1. Resets at the start of each file.
2322
2323   IFS
2324       IFS: evaluates to the input field separator from the command line.
2325
2326   IPS
2327       IPS: evaluates to the input pair separator from the command line.
2328
2329   IRS
2330       IRS: evaluates to the input record separator from the command line,
2331       or to LF or CRLF from the input data if in autodetect mode (which is
2332       the default).
2333
2334   M_E
2335       M_E: the mathematical constant e.
2336
2337   M_PI
2338       M_PI: the mathematical constant pi.
2339
2340   NF
2341       NF: evaluates to the number of fields in the current record.
2342
2343   NR
2344       NR: evaluates to the number of the current record over all files
2345       being processed, starting with 1. Does not reset at the start of each file.
2346
2347   OFS
2348       OFS: evaluates to the output field separator from the command line.
2349
2350   OPS
2351       OPS: evaluates to the output pair separator from the command line.
2352
2353   ORS
2354       ORS: evaluates to the output record separator from the command line,
2355       or to LF or CRLF from the input data if in autodetect mode (which is
2356       the default).
2357

AUTHOR

2359       Miller is written by John Kerl <kerl.john.r@gmail.com>.
2360
2361       This manual page has been composed from Miller's help output by Eric
2362       MSP Veith <eveith@veith-m.de>.
2363

SEE ALSO

2365       awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2366       MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2367       http://johnkerl.org/miller/doc
2368
2369
2370
2371                                  2021-03-23                         MILLER(1)