mlr(1) - f31

MILLER(1)                                                            MILLER(1)

NAME

       Miller is like awk, sed, cut, join, and sort for name-indexed data such
       as CSV and tabular JSON.

SYNOPSIS

       Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
       more file names}

DESCRIPTION

       Miller operates on key-value-pair data while the familiar Unix tools
       operate on integer-indexed fields: if the natural data structure for
       the latter is the array, then Miller's natural data structure is the
       insertion-ordered hash map.  This encompasses a variety of data
       formats, including but not limited to the familiar CSV, TSV, and JSON.
       (Miller can handle positionally-indexed data as a special case.) This
       manpage documents Miller v5.4.0.

EXAMPLES

   COMMAND-LINE SYNTAX
       mlr --csv cut -f hostname,uptime mydata.csv
       mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
       mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
       grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
       mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
       mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
       mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
       mlr stats2 -a linreg-pca -f u,v -g shape data/*
       mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
       mlr --from estimates.tbl put '
         for (k,v in $*) {
           if (is_numeric(v) && k =~ "^[t-z].*$") {
             $sum += v; $count += 1
           }
         }
         $mean = $sum / $count # no assignment if count unset'
       mlr --from infile.dat put -f analyze.mlr
       mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
       mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
       mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
       mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'

   DATA FORMATS
         DKVP: delimited key-value pairs (Miller default format)
         +---------------------+
         | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
         +---------------------+

         NIDX: implicitly numerically indexed (Unix-toolkit style)
         +---------------------+
         | the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
         | fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
         +---------------------+

         CSV/CSV-lite: comma-separated values with separate header line
         +---------------------+
         | apple,bat,cog       |
         | 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +---------------------+

         Tabular JSON: nested objects are supported, although arrays within them are not:
         +---------------------+
         | {                   |
         |  "apple": 1,        | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         |  "bat": 2,          |
         |  "cog": 3           |
         | }                   |
         | {                   |
         |   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
         |     "egg": 7,       |
         |     "flint": 8      |
         |   },                |
         |   "garlic": ""      |
         | }                   |
         +---------------------+

         PPRINT: pretty-printed tabular
         +---------------------+
         | apple bat cog       |
         | 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +---------------------+

         XTAB: pretty-printed transposed tabular
         +---------------------+
         | apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | bat   2             |
         | cog   3             |
         |                     |
         | dish 7              | Record 2: "dish" => "7", "egg" => "8"
         | egg  8              |
         +---------------------+

         Markdown tabular (supported for output only):
         +-----------------------+
         | | apple | bat | cog | |
         | | ---   | --- | --- | |
         | | 1     | 2   | 3   | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
         | | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
         +-----------------------+
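The DKVP and NIDX mappings shown under DATA FORMATS can be modeled in a few lines. This is a minimal Python sketch, not Miller's implementation: the function names parse_dkvp and parse_nidx are hypothetical, and quoting, escaping, and repeated separators are ignored. It only reproduces the record contents listed in the boxes above, including the positional key "3" for the pair-less field "flint".

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            key, value = field.split(ips, 1)
        else:
            # No pair separator: key the value by its 1-up position,
            # as in the "flint" example above.
            key, value = str(position), field
        record[key] = value
    return record


def parse_nidx(line):
    """Parse one NIDX line: split on whitespace, key by "1", "2", "3", ..."""
    return {str(i): value for i, value in enumerate(line.split(), start=1)}


print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_nidx("fox jumped"))
```

Python dicts preserve insertion order, which matches the insertion-ordered hash map named in DESCRIPTION as Miller's natural data structure.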

OPTIONS

       In the following option flags, the version with "i" designates the
       input stream, "o" the output stream, and the version without prefix
       sets the option for both input and output stream. For example: --irs
       sets the input record separator, --ors the output record separator, and
       --rs sets both the input and output separator to the given value.

   HELP OPTIONS
         -h or --help                 Show this message.
         --version                    Show the software version.
         {verb name} --help           Show verb-specific help.
         --help-all-verbs             Show help on all verbs.
         -l or --list-all-verbs       List only verb names.
         -L                           List only verb names, one per line.
         -f or --help-all-functions   Show help on all built-in functions.
         -F                           Show a bare listing of built-in functions by name.
         -k or --help-all-keywords    Show help on all keywords.
         -K                           Show a bare listing of keywords by name.

   VERB LIST
        altkv bar bootstrap cat check clean-whitespace count-distinct count-similar
        cut decimate fill-down filter fraction grep group-by group-like having-fields
        head histogram join label least-frequent merge-fields most-frequent nest
        nothing put regularize rename reorder repeat reshape sample sec2gmt
        sec2gmtdate seqgen shuffle sort stats1 stats2 step tac tail tee top uniq
        unsparsify

   FUNCTION LIST
        + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
        > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
        substr tolower toupper lstrip rstrip strip collapse_whitespace
        clean_whitespace abs acos acosh asin asinh atan atan2 atanh cbrt ceil cos cosh
        erf erfc exp expm1 floor invqnorm log log10 log1p logifit madd max mexp min
        mmul msub pow qnorm round roundm sgn sin sinh sqrt tan tanh urand urand32
        urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec localtime2sec hms2fsec
        hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate sec2localtime sec2localtime
        sec2localdate sec2hms strftime strftime_local strptime strptime_local systime
        is_absent is_bool is_boolean is_empty is_empty_map is_float is_int is_map
        is_nonempty_map is_not_empty is_not_map is_not_null is_null is_numeric
        is_present is_string asserting_absent asserting_bool asserting_boolean
        asserting_empty asserting_empty_map asserting_float asserting_int
        asserting_map asserting_nonempty_map asserting_not_empty asserting_not_map
        asserting_not_null asserting_null asserting_numeric asserting_present
        asserting_string boolean float fmtnum hexfmt int string typeof depth haskey
        joink joinkv joinv leafcount length mapdiff mapexcept mapselect mapsum splitkv
        splitkvx splitnv splitnvx

       Please use "mlr --help-function {function name}" for function-specific help.

   I/O FORMATTING
         --idkvp   --odkvp   --dkvp      Delimited key-value pairs, e.g. "a=1,b=2"
                                         (this is Miller's default format).

         --inidx   --onidx   --nidx      Implicitly-integer-indexed fields
                                         (Unix-toolkit style).
         -T                              Synonymous with "--nidx --fs tab".

         --icsv    --ocsv    --csv       Comma-separated value (or tab-separated
                                         with --fs tab, etc.)

         --itsv    --otsv    --tsv       Keystroke-savers for "--icsv --ifs tab",
                                         "--ocsv --ofs tab", "--csv --fs tab".

         --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
                                         with --fs tab, etc.). The 'lite' CSV does not handle
                                         RFC-CSV double-quoting rules; is slightly faster;
                                         and handles heterogeneity in the input stream via
                                         empty newline followed by new header line. See also
                                         http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.

         --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
                                         "--ocsvlite --ofs tab", "--csvlite --fs tab".
         -t                              Synonymous with --tsvlite.

         --ipprint --opprint --pprint    Pretty-printed tabular (produces no
                                         output until all input is in).
                             --right     Right-justifies all fields for PPRINT output.
                             --barred    Prints a border around PPRINT output
                                         (only available for output).

                   --omd                 Markdown-tabular (only available for output).

         --ixtab   --oxtab   --xtab      Pretty-printed vertical-tabular.
                             --xvright   Right-justifies values for XTAB format.

         --ijson   --ojson   --json      JSON tabular: sequence or list of one-level
                                         maps: {...}{...} or [{...},{...}].
           --json-map-arrays-on-input    JSON arrays are unmillerable. --json-map-arrays-on-input
           --json-skip-arrays-on-input   is the default: arrays are converted to integer-indexed
           --json-fatal-arrays-on-input  maps. The other two options cause them to be skipped, or
                                         to be treated as errors.  Please use the jq tool for full
                                         JSON (pre)processing.
                             --jvstack   Put one key-value pair per line for JSON
                                         output.
                             --jlistwrap Wrap JSON output in outermost [ ].
                           --jknquoteint Do not quote non-string map keys in JSON output.
                            --jvquoteall Quote map values in JSON output, even if they're
                                         numeric.
                     --jflatsep {string} Separator for flattening multi-level JSON keys,
                                         e.g. '{"a":{"b":3}}' becomes a:b => 3 for
                                         non-JSON formats. Defaults to :.

         -p is a keystroke-saver for --nidx --fs space --repifs

         --mmap --no-mmap --mmap-below {n} Use mmap for files whenever possible, never, or
                                         for files less than n bytes in size. Default is for
                                         files less than 4294967296 bytes in size.
                                         'Whenever possible' means always except for when reading
                                         standard input which is not mmappable. If you don't know
                                         what this means, don't worry about it -- it's a minor
                                         performance optimization.

         Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
         DKVP-formatted input and pretty-printed output.

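The key flattening described for --jflatsep can be illustrated as follows. This is a sketch under stated assumptions rather than Miller's code: the function name flatten is hypothetical, only nested maps are handled, and JSON arrays are left out because the text above treats them separately.

```python
import json


def flatten(obj, sep=":", prefix=""):
    """Flatten nested maps into one level, joining key paths with sep."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested map, carrying the joined key so far.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


record = json.loads('{"dish": {"egg": 7, "flint": 8}, "garlic": ""}')
print(flatten(record))
```

With the default separator this yields the keys dish:egg, dish:flint, and garlic, matching Record 2 of the tabular-JSON example under DATA FORMATS.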
   COMMENTS IN DATA
         --skip-comments                 Ignore commented lines (prefixed by "#")
                                         within the input.
         --skip-comments-with {string}   Ignore commented lines within input, with
                                         specified prefix.
         --pass-comments                 Immediately print commented lines (prefixed by "#")
                                         within the input.
         --pass-comments-with {string}   Immediately print commented lines within input, with
                                         specified prefix.
       Notes:
       * Comments are only honored at the start of a line.
       * In the absence of any of the above four options, comments are data like
         any other text.
       * When pass-comments is used, comment lines are written to standard output
         immediately upon being read; they are not part of the record stream.
         Results may be counterintuitive. A suggestion is to place comments at the
         start of data files.

   FORMAT-CONVERSION KEYSTROKE-SAVERS
       As keystroke-savers for format-conversion you may use the following:
         --c2t --c2d --c2n --c2j --c2x --c2p --c2m
         --t2c       --t2d --t2n --t2j --t2x --t2p --t2m
         --d2c --d2t       --d2n --d2j --d2x --d2p --d2m
         --n2c --n2t --n2d       --n2j --n2x --n2p --n2m
         --j2c --j2t --j2d --j2n       --j2x --j2p --j2m
         --x2c --x2t --x2d --x2n --x2j       --x2p --x2m
         --p2c --p2t --p2d --p2n --p2j --p2x       --p2m
       The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
       PPRINT, and markdown, respectively. Note that markdown format is available for
       output only.

   COMPRESSED I/O
         --prepipe {command} This allows Miller to handle compressed inputs. You can do
         without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
         However, when multiple input files are present, between-file separations are
         lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
         specify an action to be taken on each input file. This pre-pipe command must
         be able to read from standard input; it will be invoked with
           {command} < {filename}.
         Examples:
           mlr --prepipe 'gunzip'
           mlr --prepipe 'zcat -cf'
           mlr --prepipe 'xz -cd'
           mlr --prepipe cat
         Note that this feature is quite general and is not limited to decompression
         utilities. You can use it to apply per-file filters of your choice.
         For output compression (or other) utilities, simply pipe the output:
           mlr ... | {your compression command}

   SEPARATORS
         --rs     --irs     --ors              Record separators, e.g. 'lf' or '\r\n'
         --fs     --ifs     --ofs  --repifs    Field separators, e.g. comma
         --ps     --ips     --ops              Pair separators, e.g. equals sign

         Notes about line endings:
         * Default line endings (--irs and --ors) are "auto" which means autodetect from
           the input file format, as long as the input file(s) have lines ending in either
           LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
           carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
         * If both irs and ors are auto (which is the default) then LF input will lead to LF
           output and CRLF input will lead to CRLF output, regardless of the platform you're
           running on.
         * The line-ending autodetector triggers on the first line ending detected in the input
           stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
           LF-terminated file then autodetected line endings will be CRLF.
         * If you use --ors {something else} with (default or explicitly specified) --irs auto
           then line endings are autodetected on input and set to what you specify on output.
         * If you use --irs {something else} with (default or explicitly specified) --ors auto
           then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.

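The "first line ending wins" rule in the notes above can be sketched as below. This is an illustrative model only: the function name detect_line_ending and the LF fallback for input with no line ending at all are assumptions, not documented Miller behavior.

```python
def detect_line_ending(data: bytes, default: bytes = b"\n") -> bytes:
    """Return the first line ending found in data: CRLF or LF."""
    idx = data.find(b"\n")
    if idx == -1:
        # No line ending at all; the fallback here is an assumption.
        return default
    if idx > 0 and data[idx - 1:idx] == b"\r":
        return b"\r\n"
    return b"\n"


# A CRLF-terminated file followed by an LF-terminated file, as one stream.
stream = b"a=1\r\nb=2\r\n" + b"c=3\nd=4\n"
print(detect_line_ending(stream))
```

Because only the first line ending is inspected, the mixed stream above is classified as CRLF, which is the case called out in the third note.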
         Notes about all other separators:
         * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
           do key-value pairs appear juxtaposed.
         * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
           XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
           a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
           auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
         * OFS must be single-character for PPRINT format. This is because it is used
           with repetition for alignment; multi-character separators would make
           alignment impossible.
         * OPS may be multi-character for XTAB format, in which case alignment is
           disabled.
         * TSV is simply CSV using tab as field separator ("--fs tab").
         * FS/PS are ignored for markdown format; RS is used.
         * All FS and PS options are ignored for JSON format, since they are not relevant
           to the JSON format.
         * You can specify separators in any of the following ways, shown by example:
           - Type them out, quoting as necessary for shell escapes, e.g.
             "--fs '|' --ips :"
           - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
           - To avoid backslashing, you can use any of the following names:
             cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
         * Default separators by format:
             File format  RS       FS       PS
             gen          (N/A)    (N/A)    (N/A)
             dkvp         auto     ,        =
             json         auto     (N/A)    (N/A)
             nidx         auto     space    (N/A)
             csv          auto     ,        (N/A)
             csvlite      auto     ,        (N/A)
             markdown     auto     (N/A)    (N/A)
             pprint       auto     space    (N/A)
             xtab         (N/A)    auto     space

   CSV-SPECIFIC OPTIONS
         --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
                            of input files. Tip: combine with "label" to recreate
                            missing headers.
         --headerless-csv-output   Print only CSV data lines.

   DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
         --quote-all        Wrap all fields in double quotes
         --quote-none       Do not wrap any fields in double quotes, even if they have
                            OFS or ORS in them
         --quote-minimal    Wrap fields in double quotes only if they have OFS or ORS
                            in them (default)
         --quote-numeric    Wrap fields in double quotes only if they have numbers
                            in them
         --quote-original   Wrap fields in double quotes if and only if they were
                            quoted on input. This isn't sticky for computed fields:
                            e.g. if fields a and b were quoted on input and you do
                            "put '$c = $a . $b'" then field c won't inherit a or b's
                            was-quoted-on-input flag.

   NUMERICAL FORMATTING
         --ofmt {format}    E.g. %.18lf, %.0lf. Please use sprintf-style codes for
                            double-precision. Applies to verbs which compute new
                            values, e.g. put, stats1, stats2. See also the fmtnum
                            function within mlr put (mlr --help-all-functions).
                            Defaults to %lf.

   OTHER OPTIONS
         --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
                            urand()/urandint()/urand32().
         --nr-progress-mod {m}, with m a positive integer: print filename and record
                            count to stderr every m input records.
         --from {filename}  Use this to specify an input file before the verb(s),
                            rather than after. May be used more than once. Example:
                            "mlr --from a.dat --from b.dat cat" is the same as
                            "mlr cat a.dat b.dat".
         -n                 Process no input files, nor standard input either. Useful
                            for mlr put with begin/end statements only. (Same as --from
                            /dev/null.) Also useful in "mlr -n put -v '...'" for
                            analyzing abstract syntax trees (if that's your thing).
         -I                 Process files in-place. For each file name on the command
                            line, output is written to a temp file in the same
                            directory, which is then renamed over the original. Each
                            file is processed in isolation: if the output format is
                            CSV, CSV headers will be present in each output file;
                            statistics are only over each file's own records; and so on.

   THEN-CHAINING
       Output of one verb may be chained as input to another using "then", e.g.
         mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color

   AUXILIARY COMMANDS
       Miller has a few otherwise-standalone executables packaged within it.
       They do not participate in any other parts of Miller.
       Available subcommands:
         aux-list
         lecat
         termcvt
         hex
         unhex
         netbsd-strptime
       For more information, please invoke mlr {subcommand} --help

VERBS

   altkv
       Usage: mlr altkv [no options]
       Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.

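The pairing performed by altkv can be pictured as below. This is a sketch, not Miller's code: the function name altkv_pairs is hypothetical, and rejecting an odd number of values is a choice made here because the text above does not say how that case is treated.

```python
def altkv_pairs(values):
    """Pair alternating values: a,b,c,d,e,f becomes {a: b, c: d, e: f}."""
    if len(values) % 2 != 0:
        # Assumption of this sketch: odd-length input is not handled.
        raise ValueError("expected an even number of values")
    return dict(zip(values[0::2], values[1::2]))


print(altkv_pairs(["a", "b", "c", "d", "e", "f"]))
```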
   bar
       Usage: mlr bar [options]
       Replaces a numeric field with a number of asterisks, allowing for cheesy
       bar plots. These align best with --opprint or --oxtab output format.
       Options:
       -f   {a,b,c}      Field names to convert to bars.
       -c   {character}  Fill character: default '*'.
       -x   {character}  Out-of-bounds character: default '#'.
       -b   {character}  Blank character: default '.'.
       --lo {lo}         Lower-limit value for min-width bar: default '0.000000'.
       --hi {hi}         Upper-limit value for max-width bar: default '100.000000'.
       -w   {n}          Bar-field width: default '40'.
       --auto            Automatically computes limits, ignoring --lo and --hi.
                         Holds all records in memory before producing any output.

   bootstrap
       Usage: mlr bootstrap [options]
       Emits an n-sample, with replacement, of the input records.
       Options:
       -n {number} Number of samples to output. Defaults to number of input records.
                   Must be non-negative.
       See also mlr sample and mlr shuffle.

   cat
       Usage: mlr cat [options]
       Passes input records directly to output. Most useful for format conversion.
       Options:
       -n        Prepend field "n" to each record with record-counter starting at 1
       -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
                 keyed by specified field name(s).
       -N {name} Prepend field {name} to each record with record-counter starting at 1

   check
       Usage: mlr check
       Consumes records without printing any output.
       Useful for doing a well-formatted check on input data.

   clean-whitespace
       Usage: mlr clean-whitespace [options]
       For each record, for each field in the record, whitespace-cleans the keys and
       values. Whitespace-cleaning entails stripping leading and trailing whitespace,
       and replacing multiple whitespace with singles. For finer-grained control,
       please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
       and clean_whitespace.

       Options:
       -k|--keys-only    Do not touch values.
       -v|--values-only  Do not touch keys.
       It is an error to specify -k as well as -v.

   count-distinct
       Usage: mlr count-distinct [options]
       Prints number of records having distinct values for specified field names.
       Same as uniq -c.

       Options:
       -f {a,b,c}    Field names for distinct count.
       -n            Show only the number of distinct values. Not compatible with -u.
       -o {name}     Field name for output count. Default "count".
                     Ignored with -u.
       -u            Do unlashed counts for multiple field names. With -f a,b and
                     without -u, computes counts for distinct combinations of a
                     and b field values. With -f a,b and with -u, computes counts
                     for distinct a field values and counts for distinct b field
                     values separately.

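The difference between lashed (default) and unlashed (-u) counting can be made concrete with a small model. This is a sketch and not Miller's implementation: the function name count_distinct is hypothetical, and skipping records that lack a requested field is an assumption.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field values, jointly (lashed) or per field (unlashed)."""
    if unlashed:
        # One independent tally per field name, as with -u.
        return {
            f: dict(Counter(r[f] for r in records if f in r))
            for f in fields
        }
    # One tally keyed by the combination of values, as without -u.
    # Assumption: records missing any requested field are skipped.
    return dict(Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    ))


records = [
    {"a": "1", "b": "x"},
    {"a": "1", "b": "y"},
    {"a": "2", "b": "x"},
    {"a": "1", "b": "x"},
]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```

The lashed call counts (a, b) combinations; the unlashed call reports separate tallies for a and for b.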
   count-similar
       Usage: mlr count-similar [options]
       Ingests all records, then emits each record augmented by a count of
       the number of other records having the same group-by field values.
       Options:
       -g {d,e,f} Group-by-field names for counts.
       -o {name}  Field name for output count. Default "count".

   cut
       Usage: mlr cut [options]
       Passes through input records with specified fields included/excluded.
       -f {a,b,c}       Field names to include for cut.
       -o               Retain fields in the order specified here in the argument list.
                        Default is to retain them in the order found in the input data.
       -x|--complement  Exclude, rather than include, field names specified by -f.
       -r               Treat field names as regular expressions. "ab", "a.*b" will
                        match any field name containing the substring "ab" or matching
                        "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
                        be used. The -o flag is ignored when -r is present.
       Examples:
         mlr cut -f hostname,status
         mlr cut -x -f hostname,status
         mlr cut -r -f '^status$,sda[0-9]'
         mlr cut -r -f '^status$,"sda[0-9]"'
         mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)

   decimate
       Usage: mlr decimate [options]
       Passes through one of every n records, optionally by category.
       Options:
       -n {count}    Decimation factor; default 10
       -b            Decimate by printing first of every n.
       -e            Decimate by printing last of every n (default).
       -g {a,b,c}    Optional group-by-field names for decimate counts

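A rough model of decimation with optional grouping is sketched below. It is not Miller's implementation: the function name decimate and its parameters are hypothetical, records lacking a group-by field are skipped by assumption, and whether a trailing partial group contributes a record under -b is also an assumption of this sketch (here it does).

```python
from collections import defaultdict


def decimate(records, n=10, first=False, group_by=()):
    """Yield one of every n records per group: the first or the last."""
    position = defaultdict(int)  # 1-up position within the current run of n
    for rec in records:
        if any(f not in rec for f in group_by):
            continue  # assumption: records missing a group-by field are dropped
        key = tuple(rec[f] for f in group_by)
        position[key] += 1
        if position[key] == (1 if first else n):
            yield rec
        if position[key] == n:
            position[key] = 0  # start the next run of n for this group


records = [{"i": k} for k in range(1, 8)]
print(list(decimate(records, n=3)))              # last of every 3
print(list(decimate(records, n=3, first=True)))  # first of every 3
```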
   fill-down
       Usage: mlr fill-down [options]
       -f {a,b,c}          Field names for fill-down
       -a|--only-if-absent Fill down only if the field is absent, not if it is
                           present with an empty value.
       If a given record has a missing value for a given field, fill that from
       the corresponding value from a previous record, if any.
       By default, a 'missing' field either is absent, or has the empty-string value.
       With -a, a field is 'missing' only if it is absent.

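The two meanings of 'missing' described above can be compared with a short model. This is a sketch under stated assumptions, not Miller's code: the function name fill_down is hypothetical, and a filled-in absent field is appended at the end of the record, which may not match Miller's field ordering.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing field values from the most recent record that had one."""
    last = {}
    for rec in records:
        rec = dict(rec)  # do not modify the caller's record
        for f in fields:
            if only_if_absent:
                missing = f not in rec            # -a: absent only
            else:
                missing = rec.get(f, "") == ""    # default: absent or empty
            if missing:
                if f in last:
                    rec[f] = last[f]
            else:
                last[f] = rec[f]
        yield rec


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(list(fill_down(records, ["a"])))
print(list(fill_down(records, ["a"], only_if_absent=True)))
```

In the second call the empty value in the middle record counts as present, so it is kept and is also what gets carried into the third record.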
504   filter
505       Usage: mlr filter [options] {expression}
506       Prints records for which {expression} evaluates to true.
507       If there are multiple semicolon-delimited expressions, all of them are
508       evaluated and the last one is used as the filter criterion.
509
510       Conversion options:
511       -S: Keeps field values as strings with no type inference to int or float.
512       -F: Keeps field values as strings or floats with no inference to int.
513       All field values are type-inferred to int/float/string unless this behavior is
514       suppressed with -S or -F.
515
516       Output/formatting options:
517       --oflatsep {string}: Separator to use when flattening multi-level @-variables
518           to output records for emit. Default ":".
519       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
520       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
521       Any of the output-format command-line flags (see mlr -h). Example: using
522         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
523       the input is CSV, the output is pretty-print tabular, but the tee-file output
524       is written in JSON format.
525       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
526           record.
527
528       Expression-specification options:
529       -f {filename}: the DSL expression is taken from the specified file rather
530           than from the command line. Outer single quotes wrapping the expression
531           should not be placed in the file. If -f is specified more than once,
532           all input files specified using -f are concatenated to produce the expression.
533           (For example, you can define functions in one file and call them from another.)
534       -e {expression}: You can use this after -f to add an expression. Example use
535           case: define functions/subroutines in a file you specify with -f, then call
536           them with an expression you specify with -e.
537       (If you mix -e and -f then the expressions are evaluated in the order encountered.
538       Since the expression pieces are simply concatenated, please be sure to use intervening
539       semicolons to separate expressions.)
540
541       Tracing options:
542       -v: Prints the expressions's AST (abstract syntax tree), which gives
543           full transparency on the precedence and associativity rules of
544           Miller's grammar, to stdout.
545       -a: Prints a low-level stack-allocation trace to stdout.
546       -t: Prints a low-level parser trace to stderr.
547       -T: Prints a every statement to stderr as it is executed.
548
549       Other options:
550       -x: Prints records for which {expression} evaluates to false.
551
552       Please use a dollar sign for field names and double-quotes for string
553       literals. If field names have special characters such as "." then you might
554       use braces, e.g. '${field.name}'. Miller built-in variables are
555       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
556       variables. The environment-variable name may be an expression, e.g. a field
557       value.
558
559       Use # to comment to end of line.
560
561       Examples:
562         mlr filter 'log10($count) > 4.0'
563         mlr filter 'FNR == 2'         (second record in each file)
564         mlr filter 'urand() < 0.001'  (subsampling)
565         mlr filter '$color != "blue" && $value > 4.2'
566         mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
567         mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
568         mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
569         mlr filter '
570           NR == 1 ||
571          #NR == 2 ||
572           NR == 3
573         '
574
575       Please see http://johnkerl.org/miller/doc/reference.html for more information
576       including function list. Or "mlr -f". Please see also "mlr grep" which is
577       useful when you don't yet know which field name(s) you're looking for.
578
579   fraction
580       Usage: mlr fraction [options]
581       For each record's value in specified fields, computes the ratio of that
582       value to the sum of values in that field over all input records.
583       E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
584       x=1,x_fraction=0.1  x=2,x_fraction=0.2  x=3,x_fraction=0.3  and  x=4,x_fraction=0.4
585
586       Note: this is internally a two-pass algorithm: on the first pass it retains
587       input records and accumulates sums; on the second pass it computes quotients
588       and emits output records. This means it produces no output until all input is read.
589
590       Options:
591       -f {a,b,c}    Field name(s) for fraction calculation
592       -g {d,e,f}    Optional group-by-field name(s) for fraction counts
593       -p            Produce percents [0..100], not fractions [0..1]. Output field names
594                     end with "_percent" rather than "_fraction"
595       -c            Produce cumulative distributions, i.e. running sums: each output
596                     value folds in the sum of the previous for the specified group
597                     E.g. with input records  x=1  x=2  x=3  and  x=4, emits output records
598                     x=1,x_cumulative_fraction=0.1  x=2,x_cumulative_fraction=0.3
599                     x=3,x_cumulative_fraction=0.6  and  x=4,x_cumulative_fraction=1.0
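
       The two-pass behavior described above can be sketched in Python. This
       is an illustrative model rather than Miller's implementation: the
       function name fraction and its parameters are hypothetical, records
       are plain dicts, and every record is assumed to carry the value field
       and any group-by fields.

```python
from collections import defaultdict

def fraction(records, field, group_by=(), percent=False, cumulative=False):
    """Model of 'mlr fraction -f field [-g ...] [-p] [-c]' over a list of dicts."""
    records = list(records)            # pass 1 must retain every record
    sums = defaultdict(float)
    for rec in records:
        sums[tuple(rec[g] for g in group_by)] += rec[field]

    scale = 100.0 if percent else 1.0
    # The output name for -p combined with -c is an assumption; the text
    # above only shows each option on its own.
    suffix = ("_cumulative" if cumulative else "") + \
             ("_percent" if percent else "_fraction")

    running = defaultdict(float)       # running sums, used for -c
    out = []
    for rec in records:                # pass 2: compute quotients
        key = tuple(rec[g] for g in group_by)
        running[key] += rec[field]
        numerator = running[key] if cumulative else rec[field]
        out.append({**rec, field + suffix: scale * numerator / sums[key]})
    return out
```

       Because the first pass consumes all input before the second begins,
       the model (like the verb) produces nothing until the input is exhausted.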
600
601   grep
602       Usage: mlr grep [options] {regular expression}
603       Passes through records which match {regex}.
604       Options:
605       -i    Use case-insensitive search.
606       -v    Invert: pass through records which do not match the regex.
607       Note that "mlr filter" is more powerful, but requires you to know field names.
608       By contrast, "mlr grep" allows you to regex-match the entire record. It does
609       this by formatting each record in memory as DKVP, using command-line-specified
610       ORS/OFS/OPS, and matching the resulting line against the regex specified
611       here. In particular, the regex is not applied to the input stream: if you
612       have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
613       be matched, not against either of these lines, but against the DKVP line
614       "x=1,y=2,z=3".  Furthermore, not all the options to system grep are supported,
615       and this command is intended to be merely a keystroke-saver. To get all the
616       features of system grep, you can do
617         "mlr --odkvp ... | grep ... | mlr --idkvp ..."
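
       A minimal Python sketch of the matching rule described above: each
       record is formatted as a DKVP line and the regex is applied to that
       line, not to the original input text. The names to_dkvp and mlr_grep
       are hypothetical, and Python's re syntax is assumed in place of the
       regex library Miller uses.

```python
import re

def to_dkvp(record, ofs=",", ops="="):
    """Format one record as a DKVP line, e.g. x=1,y=2,z=3."""
    return ofs.join(f"{k}{ops}{v}" for k, v in record.items())

def mlr_grep(records, pattern, ignore_case=False, invert=False):
    """Yield records whose DKVP line matches (or, with invert, does not)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for rec in records:
        matched = regex.search(to_dkvp(rec)) is not None
        if matched != invert:
            yield rec
```

       With the CSV example above, the pattern "^1,2,3$" matches nothing,
       because the line being searched is "x=1,y=2,z=3".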
618
619   group-by
620       Usage: mlr group-by {comma-separated field names}
621       Outputs records in batches having identical values at specified field names.
622
623   group-like
624       Usage: mlr group-like
625       Outputs records in batches having identical field names.
626
627   having-fields
628       Usage: mlr having-fields [options]
629       Conditionally passes through records depending on each record's field names.
630       Options:
631         --at-least      {comma-separated names}
632         --which-are     {comma-separated names}
633         --at-most       {comma-separated names}
634         --all-matching  {regular expression}
635         --any-matching  {regular expression}
636         --none-matching {regular expression}
637       Examples:
638         mlr having-fields --which-are amount,status,owner
639         mlr having-fields --any-matching 'sda[0-9]'
640         mlr having-fields --any-matching '"sda[0-9]"'
641         mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
642
643   head
644       Usage: mlr head [options]
645       -n {count}    Head count to print; default 10
646       -g {a,b,c}    Optional group-by-field names for head counts
647       Passes through the first n records, optionally by category.
648       Without -g, ceases consuming more input (i.e. is fast) when n
649       records have been read.
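
       The early-exit behavior can be modeled in Python with a generator.
       This is a sketch under assumptions: the name head and its parameters
       are hypothetical, and every record is assumed to carry the group-by
       fields.

```python
def head(records, n=10, group_by=()):
    """Yield the first n records, or the first n per group with group_by."""
    counts = {}
    for rec in records:
        key = tuple(rec[g] for g in group_by)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= n:
            yield rec
        if not group_by and counts[key] >= n:
            return  # ungrouped: stop consuming input once n records are out
```

       With grouping, a new group can appear at any point in the stream, so
       the whole input has to be read.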
650
651   histogram
652       Usage: mlr histogram [options]
653       -f {a,b,c}    Value-field names for histogram counts
654       --lo {lo}     Histogram low value
655       --hi {hi}     Histogram high value
656       --nbins {n}   Number of histogram bins
657       --auto        Automatically computes limits, ignoring --lo and --hi.
658                     Holds all values in memory before producing any output.
659       -o {prefix}   Prefix for output field name. Default: no prefix.
660       Just a histogram. Input values < lo or > hi are not counted.
661
662   join
663       Usage: mlr join [options]
664       Joins records from specified left file name with records from all file names
665       at the end of the Miller argument list.
666       Functionality is essentially the same as the system "join" command, but for
667       record streams.
668       Options:
669         -f {left file name}
670         -j {a,b,c}   Comma-separated join-field names for output
671         -l {a,b,c}   Comma-separated join-field names for left input file;
672                      defaults to -j values if omitted.
673         -r {a,b,c}   Comma-separated join-field names for right input file(s);
674                      defaults to -j values if omitted.
675         --lp {text}  Additional prefix for non-join output field names from
676                      the left file
677         --rp {text}  Additional prefix for non-join output field names from
678                      the right file(s)
679         --np         Do not emit paired records
680         --ul         Emit unpaired records from the left file
681         --ur         Emit unpaired records from the right file(s)
682         -s|--sorted-input  Require sorted input: records must be sorted
683                      lexically by their join-field names, else not all records will
684                      be paired. The only likely use case for this is with a left
685                      file which is too big to fit into system memory otherwise.
686         -u           Enable unsorted input. (This is the default even without -u.)
687                      In this case, the entire left file will be loaded into memory.
688         --prepipe {command} As in main input options; see mlr --help for details.
689                      If you wish to use a prepipe command for the main input as well
690                      as here, it must be specified there as well as here.
691       File-format options default to those for the right file names on the Miller
692       argument list, but may be overridden for the left file as follows. Please see
693       the main "mlr --help" for more information on syntax for these arguments.
694         -i {one of csv,dkvp,nidx,pprint,xtab}
695         --irs {record-separator character}
696         --ifs {field-separator character}
697         --ips {pair-separator character}
698         --repifs
699         --repips
700         --mmap
701         --no-mmap
702       Please use "mlr --usage-separator-options" for information on specifying separators.
703       Please see http://johnkerl.org/miller/doc/reference.html for more information
704       including examples.
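
       The unsorted-input mode can be pictured as a hash join: the left file
       is loaded into memory, keyed by the join fields, and the right
       records are streamed past it. The Python sketch below is only a
       model. The name join and its parameters are hypothetical, it covers
       the -j case where both sides use the same join-field names, every
       record is assumed to have those fields, and the output field order,
       the position of unpaired-left records, and the handling of colliding
       non-join field names are assumptions.

```python
def join(left, right, on, paired=True, unpaired_left=False, unpaired_right=False):
    """Model of unsorted 'mlr join -j on' with --np, --ul and --ur switches."""
    buckets = {}                        # join-key values -> left records
    for rec in left:                    # whole left side held in memory
        buckets.setdefault(tuple(rec[f] for f in on), []).append(rec)

    matched_keys = set()
    out = []
    for rec in right:                   # right side is streamed
        key = tuple(rec[f] for f in on)
        if key not in buckets:
            if unpaired_right:
                out.append(dict(rec))
            continue
        matched_keys.add(key)
        if paired:
            for left_rec in buckets[key]:
                # Assumed layout: join fields, then left fields, then right.
                out.append({**{f: rec[f] for f in on}, **left_rec, **rec})

    if unpaired_left:                   # known only after all right input
        for key, recs in buckets.items():
            if key not in matched_keys:
                out.extend(dict(r) for r in recs)
    return out
```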
705
706   label
707       Usage: mlr label {new1,new2,new3,...}
708       Given n comma-separated names, renames the first n fields of each record to
709       have the respective name. (Fields past the nth are left with their original
710       names.) Particularly useful with --inidx or --implicit-csv-header, to give
711       useful names to otherwise integer-indexed fields.
712       Examples:
713         "echo 'a b c d' | mlr --inidx --odkvp cat"       gives "1=a,2=b,3=c,4=d"
714         "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
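
       The renaming rule can be sketched in Python as follows. The function
       name label is hypothetical, records are insertion-ordered dicts, and
       the case where a new name collides with a later existing field is
       not modeled.

```python
def label(record, new_names):
    """Rename the first len(new_names) fields, keeping values and order."""
    out = {}
    for i, (old_name, value) in enumerate(record.items()):
        out[new_names[i] if i < len(new_names) else old_name] = value
    return out
```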
715
716   least-frequent
717       Usage: mlr least-frequent [options]
718       Shows the least frequently occurring distinct values for specified field names.
719       The first entry is the statistical anti-mode; the remaining are runners-up.
720       Options:
721       -f {one or more comma-separated field names}. Required flag.
722       -n {count}. Optional flag defaulting to 10.
723       -b          Suppress counts; show only field values.
724       -o {name}   Field name for output count. Default "count".
725       See also "mlr most-frequent".
726
727   merge-fields
728       Usage: mlr merge-fields [options]
729       Computes univariate statistics for each input record, accumulated across
730       specified fields.
731       Options:
732       -a {sum,count,...}  Names of accumulators. One or more of:
733         count     Count instances of fields
734         mode      Find most-frequently-occurring values for fields; first-found wins tie
735         antimode  Find least-frequently-occurring values for fields; first-found wins tie
736         sum       Compute sums of specified fields
737         mean      Compute averages (sample means) of specified fields
738         stddev    Compute sample standard deviation of specified fields
739         var       Compute sample variance of specified fields
740         meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
741         skewness  Compute sample skewness of specified fields
742         kurtosis  Compute sample kurtosis of specified fields
743         min       Compute minimum values of specified fields
744         max       Compute maximum values of specified fields
745       -f {a,b,c}  Value-field names on which to compute statistics. Requires -o.
746       -r {a,b,c}  Regular expressions for value-field names on which to compute
747                   statistics. Requires -o.
748       -c {a,b,c}  Substrings for collapse mode. All fields which have the same names
749                   after removing substrings will be accumulated together. Please see
750                   examples below.
751       -i          Use interpolated percentiles, like R's type=7; default like type=1.
752                   Not sensical for string-valued fields.
753       -o {name}   Output field basename for -f/-r.
754       -k          Keep the input fields which contributed to the output statistics;
755                   the default is to omit them.
756       -F          Computes integerable things (e.g. count) in floating point.
757
758       String-valued data make sense unless arithmetic on them is required,
759       e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
760       numbers are less than strings.
761
762       Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
763       Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
764         produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
765         summed over.
766       Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
767         produces "bar_sum=15,bar_count=4" since all four fields are summed over.
768       Example: mlr merge-fields -a sum,count -c in_,out_
769         produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
770         since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
771         "b_y", and "b_out_x" collapses to "b_x".
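
       The -f and -c examples above can be reproduced with a small Python
       model. This is a sketch, not Miller's code: the function names are
       hypothetical, only the sum and count accumulators are modeled, and
       input fields are dropped as they are without -k.

```python
def merge_by_name(record, names, out_name):
    """Model of 'merge-fields -a sum,count -f names -o out_name'."""
    picked = [record[n] for n in names if n in record]
    out = {k: v for k, v in record.items() if k not in names}
    out[out_name + "_sum"] = sum(picked)
    out[out_name + "_count"] = len(picked)
    return out

def merge_collapse(record, substrings):
    """Model of 'merge-fields -a sum,count -c substrings'."""
    out, sums, counts = {}, {}, {}
    for name, value in record.items():
        hit = next((s for s in substrings if s in name), None)
        if hit is None:
            out[name] = value           # field does not take part
            continue
        short = name.replace(hit, "", 1)  # e.g. a_in_x -> a_x
        sums[short] = sums.get(short, 0) + value
        counts[short] = counts.get(short, 0) + 1
    for short in sums:
        out[short + "_sum"] = sums[short]
        out[short + "_count"] = counts[short]
    return out
```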
772
773   most-frequent
774       Usage: mlr most-frequent [options]
775       Shows the most frequently occurring distinct values for specified field names.
776       The first entry is the statistical mode; the remaining are runners-up.
777       Options:
778       -f {one or more comma-separated field names}. Required flag.
779       -n {count}. Optional flag defaulting to 10.
780       -b          Suppress counts; show only field values.
781       -o {name}   Field name for output count. Default "count".
782       See also "mlr least-frequent".
783
784   nest
785       Usage: mlr nest [options]
786       Explodes specified field values into separate fields/records, or reverses this.
787       Options:
788         --explode,--implode   One is required.
789         --values,--pairs      One is required.
790         --across-records,--across-fields One is required.
791         -f {field name}       Required.
792         --nested-fs {string}  Defaults to ";". Field separator for nested values.
793         --nested-ps {string}  Defaults to ":". Pair separator for nested key-value pairs.
794         --evar {string}       Shorthand for --explode --values --across-records --nested-fs {string}
795       Please use "mlr --usage-separator-options" for information on specifying separators.
796
797       Examples:
798
799         mlr nest --explode --values --across-records -f x
800         with input record "x=a;b;c,y=d" produces output records
801           "x=a,y=d"
802           "x=b,y=d"
803           "x=c,y=d"
804         Use --implode to do the reverse.
805
806         mlr nest --explode --values --across-fields -f x
807         with input record "x=a;b;c,y=d" produces output records
808           "x_1=a,x_2=b,x_3=c,y=d"
809         Use --implode to do the reverse.
810
811         mlr nest --explode --pairs --across-records -f x
812         with input record "x=a:1;b:2;c:3,y=d" produces output records
813           "a=1,y=d"
814           "b=2,y=d"
815           "c=3,y=d"
816
817         mlr nest --explode --pairs --across-fields -f x
818         with input record "x=a:1;b:2;c:3,y=d" produces output records
819           "a=1,b=2,c=3,y=d"
820
821       Notes:
822       * With --pairs, --implode doesn't make sense since the original field name has
823         been lost.
824       * The combination "--implode --values --across-records" is non-streaming:
825         no output records are produced until all input records have been read. In
826         particular, this means it won't work in tail -f contexts. But all other flag
827         combinations result in streaming (tail -f friendly) data processing.
828       * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
829         e.g. by default the former is semicolon and the latter is comma.
830       See also mlr reshape.
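
       The four --explode examples above can be modeled in Python as below.
       The function names are hypothetical, field values are treated as
       strings, and pairs that lack the pair separator are not handled.

```python
def explode_values_across_records(record, field, fs=";"):
    """One output record per nested value; other fields are copied."""
    return [{k: (piece if k == field else v) for k, v in record.items()}
            for piece in record[field].split(fs)]

def explode_values_across_fields(record, field, fs=";"):
    """One output record; nested values become field_1, field_2, ..."""
    out = {}
    for k, v in record.items():
        if k == field:
            for i, piece in enumerate(v.split(fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[k] = v
    return out

def explode_pairs_across_records(record, field, fs=";", ps=":"):
    """One output record per nested key:value pair."""
    results = []
    for pair in record[field].split(fs):
        key, _, value = pair.partition(ps)
        results.append({(key if k == field else k): (value if k == field else v)
                        for k, v in record.items()})
    return results

def explode_pairs_across_fields(record, field, fs=";", ps=":"):
    """One output record; each nested pair becomes its own field."""
    out = {}
    for k, v in record.items():
        if k == field:
            for pair in v.split(fs):
                key, _, value = pair.partition(ps)
                out[key] = value
        else:
            out[k] = v
    return out
```

       In the two pairs variants the name of the original field does not
       appear in the output, which is why --implode cannot undo them.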
831
832   nothing
833       Usage: mlr nothing [options]
834       Drops all input records. Useful for testing, or after tee/print/etc. have
835       produced other output.
836
837   put
838       Usage: mlr put [options] {expression}
839       Adds/updates specified field(s). Expressions are semicolon-separated and must
840       either be assignments, or evaluate to boolean.  Booleans with following
841       statements in curly braces control whether those statements are executed;
842       booleans without following curly braces do nothing except side effects (e.g.
843       regex-captures into \1, \2, etc.).
844
845       Conversion options:
846       -S: Keeps field values as strings with no type inference to int or float.
847       -F: Keeps field values as strings or floats with no inference to int.
848       All field values are type-inferred to int/float/string unless this behavior is
849       suppressed with -S or -F.
850
851       Output/formatting options:
852       --oflatsep {string}: Separator to use when flattening multi-level @-variables
853           to output records for emit. Default ":".
854       --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
855       --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
856       Any of the output-format command-line flags (see mlr -h). Example: using
857         mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
858       the input is CSV, the output is pretty-print tabular, but the tee-file output
859       is written in JSON format.
860       --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
861           record.
862
863       Expression-specification options:
864       -f {filename}: the DSL expression is taken from the specified file rather
865           than from the command line. Outer single quotes wrapping the expression
866           should not be placed in the file. If -f is specified more than once,
867           all input files specified using -f are concatenated to produce the expression.
868           (For example, you can define functions in one file and call them from another.)
869       -e {expression}: You can use this after -f to add an expression. Example use
870           case: define functions/subroutines in a file you specify with -f, then call
871           them with an expression you specify with -e.
872       (If you mix -e and -f then the expressions are evaluated in the order encountered.
873       Since the expression pieces are simply concatenated, please be sure to use intervening
874       semicolons to separate expressions.)
875
876       Tracing options:
877       -v: Prints the expression's AST (abstract syntax tree), which gives
878           full transparency on the precedence and associativity rules of
879           Miller's grammar, to stdout.
880       -a: Prints a low-level stack-allocation trace to stdout.
881       -t: Prints a low-level parser trace to stderr.
882       -T: Prints every statement to stderr as it is executed.
883
884       Other options:
885       -q: Does not include the modified record in the output stream. Useful for when
886           all desired output is in begin and/or end blocks.
887
888       Please use a dollar sign for field names and double-quotes for string
889       literals. If field names have special characters such as "." then you might
890       use braces, e.g. '${field.name}'. Miller built-in variables are
891       NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
892       variables. The environment-variable name may be an expression, e.g. a field
893       value.
894
895       Use # to comment to end of line.
896
897       Examples:
898         mlr put '$y = log10($x); $z = sqrt($y)'
899         mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
900         mlr put '$x>0.0;  $y=log10($x); $z=sqrt($y)'   # does all three statements
901         mlr put '$a =~ "([a-z]+)_([0-9]+)";  $b = "left_\1"; $c = "right_\2"'
902         mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
903         mlr put '$filename = FILENAME'
904         mlr put '$colored_shape = $color . "_" . $shape'
905         mlr put '$y = cos($theta); $z = atan2($y, $x)'
906         mlr put '$name = sub($name, "http.*com"i, "")'
907         mlr put -q '@sum += $x; end {emit @sum}'
908         mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
909         mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
910         mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
911         mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
912         mlr put '
913           $x = 1;
914          #$y = 2;
915           $z = 3
916         '
917
918       Please see also 'mlr -k' for examples using redirected output.
919
920       Please see http://johnkerl.org/miller/doc/reference.html for more information
921       including function list. Or "mlr -f".
922       Please see in particular:
923         http://www.johnkerl.org/miller/doc/reference.html#put
924
925   regularize
926       Usage: mlr regularize
927       For records seen earlier in the data stream with same field names in
928       a different order, outputs them with field names in the previously
929       encountered order.
930       Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
931       output as              a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
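
       A minimal Python sketch of this rule, using a hypothetical function
       name: the first ordering seen for each set of field names is
       remembered and reused for later records with the same set.

```python
def regularize(records):
    """Reorder each record's fields to the first-seen order for that name set."""
    first_order = {}                    # sorted names -> first-seen ordering
    for rec in records:
        order = first_order.setdefault(tuple(sorted(rec)), list(rec))
        yield {name: rec[name] for name in order}
```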
932
933   rename
934       Usage: mlr rename [options] {old1,new1,old2,new2,...}
935       Renames specified fields.
936       Options:
937       -r         Treat old field names as regular expressions. "ab", "a.*b"
938                  will match any field name containing the substring "ab" or
939                  matching "a.*b", respectively; anchors of the form "^ab$",
940                  "^a.*b$" may be used. New field names may be plain strings,
941                  or may contain capture groups of the form "\1" through
942                  "\9". Wrapping the regex in double quotes is optional, but
943                  is required if you wish to follow it with 'i' to indicate
944                  case-insensitivity.
945       -g         Do global replacement within each field name rather than
946                  first-match replacement.
947       Examples:
948       mlr rename old_name,new_name
949       mlr rename old_name_1,new_name_1,old_name_2,new_name_2
950       mlr rename -r 'Date_[0-9]+,Date'   Rename all such fields to be "Date"
951       mlr rename -r '"Date_[0-9]+",Date' Same
952       mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
953       mlr rename -r '"name"i,Name'       Rename "name", "Name", "NAME", etc. to "Name"
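
       The -r and -g semantics can be approximated in Python. This is a
       sketch under assumptions: rename_regex and its parameters are
       hypothetical, Python's re module stands in for Miller's regex
       library, and two fields that end up with the same new name simply
       overwrite one another here.

```python
import re

def rename_regex(record, old_pattern, new_name,
                 global_replace=False, ignore_case=False):
    """Model of 'mlr rename -r [-g]' for a single old/new pair."""
    regex = re.compile(old_pattern, re.IGNORECASE if ignore_case else 0)
    out = {}
    for name, value in record.items():
        if regex.search(name):
            # count=1 is first-match replacement; count=0 replaces all (-g).
            name = regex.sub(new_name, name, count=0 if global_replace else 1)
        out[name] = value
    return out
```

       For instance, rename_regex(rec, r"Date_([0-9]+).*", r"\1") corresponds
       to the capture-group example above.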
954
955   reorder
956       Usage: mlr reorder [options]
957       -f {a,b,c}   Field names to reorder.
958       -e           Put specified field names at record end: default is to put
959                    them at record start.
960       Examples:
961       mlr reorder    -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
962       mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
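
       A Python sketch that reproduces the two examples above. The function
       name reorder is hypothetical, and placing the named fields in the
       order given to -f is inferred from those examples.

```python
def reorder(record, names, at_end=False):
    """Move the named fields to the record start, or to the end with at_end."""
    picked = {n: record[n] for n in names if n in record}
    rest = {k: v for k, v in record.items() if k not in picked}
    return {**rest, **picked} if at_end else {**picked, **rest}
```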
963
964   repeat
965       Usage: mlr repeat [options]
966       Copies input records to output records multiple times.
967       Options must be exactly one of the following:
968         -n {repeat count}  Repeat each input record this many times.
969         -f {field name}    Same, but take the repeat count from the specified
970                            field name of each input record.
971       Example:
972         echo x=0 | mlr repeat -n 4 then put '$x=urand()'
973       produces:
974         x=0.488189
975         x=0.484973
976         x=0.704983
977         x=0.147311
978       Example:
979         echo a=1,b=2,c=3 | mlr repeat -f b
980       produces:
981         a=1,b=2,c=3
982         a=1,b=2,c=3
983       Example:
984         echo a=1,b=2,c=3 | mlr repeat -f c
985       produces:
986         a=1,b=2,c=3
987         a=1,b=2,c=3
988         a=1,b=2,c=3
989
990   reshape
991       Usage: mlr reshape [options]
992       Wide-to-long options:
993         -i {input field names}   -o {key-field name,value-field name}
994         -r {input field regexes} -o {key-field name,value-field name}
995         These pivot/reshape the input data such that the input fields are removed
996         and separate records are emitted for each key/value pair.
997         Note: this works with tail -f and produces output records for each input
998         record seen.
999       Long-to-wide options:
1000         -s {key-field name,value-field name}
1001         These pivot/reshape the input data to undo the wide-to-long operation.
1002         Note: this does not work with tail -f; it produces output records only after
1003         all input records have been read.
1004
1005       Examples:
1006
1007         Input file "wide.txt":
1008           time       X           Y
1009           2009-01-01 0.65473572  2.4520609
1010           2009-01-02 -0.89248112 0.2154713
1011           2009-01-03 0.98012375  1.3179287
1012
1013         mlr --pprint reshape -i X,Y -o item,value wide.txt
1014           time       item value
1015           2009-01-01 X    0.65473572
1016           2009-01-01 Y    2.4520609
1017           2009-01-02 X    -0.89248112
1018           2009-01-02 Y    0.2154713
1019           2009-01-03 X    0.98012375
1020           2009-01-03 Y    1.3179287
1021
1022         mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1023           time       item value
1024           2009-01-01 X    0.65473572
1025           2009-01-01 Y    2.4520609
1026           2009-01-02 X    -0.89248112
1027           2009-01-02 Y    0.2154713
1028           2009-01-03 X    0.98012375
1029           2009-01-03 Y    1.3179287
1030
1031         Input file "long.txt":
1032           time       item value
1033           2009-01-01 X    0.65473572
1034           2009-01-01 Y    2.4520609
1035           2009-01-02 X    -0.89248112
1036           2009-01-02 Y    0.2154713
1037           2009-01-03 X    0.98012375
1038           2009-01-03 Y    1.3179287
1039
1040         mlr --pprint reshape -s item,value long.txt
1041           time       X           Y
1042           2009-01-01 0.65473572  2.4520609
1043           2009-01-02 -0.89248112 0.2154713
1044           2009-01-03 0.98012375  1.3179287
1045       See also mlr nest.
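
       Both directions can be modeled in Python. The names wide_to_long and
       long_to_wide are hypothetical, every record is assumed to contain the
       fields involved, and missing cells are not filled in.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Streaming: each input record yields one record per input field."""
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for name, value in rec.items():
            if name in input_fields:
                yield {**others, key_name: name, value_name: value}

def long_to_wide(records, key_name, value_name):
    """Non-streaming: output is complete only after all input is read."""
    buckets = {}                        # other-field values -> wide record
    for rec in records:
        others = {k: v for k, v in rec.items()
                  if k not in (key_name, value_name)}
        wide = buckets.setdefault(tuple(others.items()), dict(others))
        wide[rec[key_name]] = rec[value_name]
    return list(buckets.values())
```

       The first function is a generator and the second returns a list,
       mirroring the tail -f notes above: a wide record cannot be emitted
       until it is known that no more pairs for it will arrive.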
1046
1047   sample
1048       Usage: mlr sample [options]
1049       Reservoir sampling (subsampling without replacement), optionally by category.
1050       -k {count}    Required: number of records to output, total, or by group if using -g.
1051       -g {a,b,c}    Optional: group-by-field names for samples.
1052       See also mlr bootstrap and mlr shuffle.
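
       Reservoir sampling keeps at most k records in memory per group while
       giving each input record the same chance of selection. The Python
       sketch below uses the textbook "Algorithm R"; whether Miller uses
       exactly this variant is an assumption, as are the function name, the
       seed parameter, and the requirement that every record carry the
       group-by fields.

```python
import random

def sample(records, k, group_by=(), seed=None):
    """Keep a uniform random sample of up to k records per group."""
    rng = random.Random(seed)
    reservoirs, seen = {}, {}
    for rec in records:
        key = tuple(rec[g] for g in group_by)
        n = seen.get(key, 0)            # records already seen in this group
        bucket = reservoirs.setdefault(key, [])
        if n < k:
            bucket.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, n)       # inclusive on both ends
            if j < k:
                bucket[j] = rec         # replace with probability k/(n+1)
        seen[key] = n + 1
    return [rec for bucket in reservoirs.values() for rec in bucket]
```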
1053
1054   sec2gmt
1055       Usage: mlr sec2gmt [options] {comma-separated list of field names}
1056       Replaces a numeric field representing seconds since the epoch with the
1057       corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1058       more than a keystroke-saver for the sec2gmt function:
1059         mlr sec2gmt time1,time2
1060       is the same as
1061         mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1062       Options:
1063       -1 through -9: format the seconds using 1..9 decimal places, respectively.
1064
1065   sec2gmtdate
1066       Usage: mlr sec2gmtdate {comma-separated list of field names}
1067       Replaces a numeric field representing seconds since the epoch with the
1068       corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1069       This is nothing more than a keystroke-saver for the sec2gmtdate function:
1070         mlr sec2gmtdate time1,time2
1071       is the same as
1072         mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1073
1074   seqgen
1075       Usage: mlr seqgen [options]
1076       Produces a sequence of counters.  Discards the input record stream. Produces
1077       output as specified by the following options:
1078       -f {name} Field name for counters; default "i".
1079       --start {number} Inclusive start value; default "1".
1080       --stop  {number} Inclusive stop value; default "100".
1081       --step  {number} Step value; default "1".
1082       Start, stop, and/or step may be floating-point. Output is integer if start,
1083       stop, and step are all integers. Step may be negative. It may not be zero
1084       unless start == stop.
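
       The counter rules above can be sketched in Python. The function name
       and parameters are hypothetical. Emitting a single record when step
       is zero and start equals stop is an assumption, as is computing
       values by repeated addition.

```python
def seqgen(start=1, stop=100, step=1, field="i"):
    """Model of 'mlr seqgen': inclusive start and stop, signed step."""
    if step == 0:
        if start == stop:
            return [{field: start}]     # assumption: one record is emitted
        raise ValueError("step may not be zero unless start == stop")
    if any(isinstance(v, float) for v in (start, stop, step)):
        start = float(start)            # any float input gives float output
    out, value = [], start
    while (value <= stop) if step > 0 else (value >= stop):
        out.append({field: value})
        value += step
    return out
```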
1085
1086   shuffle
1087       Usage: mlr shuffle {no options}
1088       Outputs records randomly permuted. No output records are produced until
1089       all input records are read.
1090       See also mlr bootstrap and mlr sample.
1091
1092   sort
1093       Usage: mlr sort {flags}
1094       Flags:
1095         -f  {comma-separated field names}  Lexical ascending
1096         -n  {comma-separated field names}  Numerical ascending; nulls sort last
1097         -nf {comma-separated field names}  Numerical ascending; nulls sort last
1098         -r  {comma-separated field names}  Lexical descending
1099         -nr {comma-separated field names}  Numerical descending; nulls sort first
1100       Sorts records primarily by the first specified field, secondarily by the second
1101       field, and so on.  (Any records not having all specified sort keys will appear
1102       at the end of the output, in the order they were encountered, regardless of the
1103       specified sort order.) The sort is stable: records that compare equal will sort
1104       in the order they were encountered in the input record stream.
1105
1106       Example:
1107         mlr sort -f a,b -nr x,y,z
1108       which is the same as:
1109         mlr sort -f a -f b -nr x -nr y -nr z
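
       The multi-key, stable ordering can be modeled in Python by sorting on
       the keys from last to first. This is a sketch under assumptions:
       mlr_sort is a hypothetical name, keys are given as (flag, field)
       pairs, numeric fields are assumed to parse as numbers, and the
       null-ordering rules listed above are not modeled.

```python
def mlr_sort(records, keys):
    """keys: list of (flag, field) pairs, flag in 'f', 'r', 'n', 'nf', 'nr'."""
    fields = [name for _, name in keys]
    have = [r for r in records if all(f in r for f in fields)]
    lack = [r for r in records if not all(f in r for f in fields)]
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields primary/secondary/... ordering.
    for flag, name in reversed(keys):
        numeric = flag in ("n", "nf", "nr")
        have.sort(key=lambda r: float(r[name]) if numeric else str(r[name]),
                  reverse=flag in ("r", "nr"))
    return have + lack                  # records missing a key go last
```

       For example, keys=[("f", "a"), ("nr", "x")] corresponds to
       "mlr sort -f a -nr x".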
1110
1111   stats1
1112       Usage: mlr stats1 [options]
1113       Computes univariate statistics for one or more given fields, accumulated across
1114       the input record stream.
1115       Options:
1116       -a {sum,count,...}  Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1117                           one or more of:
1118          count     Count instances of fields
1119          mode      Find most-frequently-occurring values for fields; first-found wins tie
1120          antimode  Find least-frequently-occurring values for fields; first-found wins tie
1121          sum       Compute sums of specified fields
1122          mean      Compute averages (sample means) of specified fields
1123          stddev    Compute sample standard deviation of specified fields
1124          var       Compute sample variance of specified fields
1125          meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
1126          skewness  Compute sample skewness of specified fields
1127          kurtosis  Compute sample kurtosis of specified fields
1128          min       Compute minimum values of specified fields
1129          max       Compute maximum values of specified fields
1130       -f {a,b,c}   Value-field names on which to compute statistics
1131       --fr {regex} Regex for value-field names on which to compute statistics
1132                    (compute statistics on values in all field names matching regex)
1133       --fx {regex} Inverted regex for value-field names on which to compute statistics
1134                    (compute statistics on values in all field names not matching regex)
1135       -g {d,e,f}   Optional group-by-field names
1136       --gr {regex} Regex for optional group-by-field names
1137                    (group by values in field names matching regex)
1138       --gx {regex} Inverted regex for optional group-by-field names
1139                    (group by values in field names not matching regex)
1140       --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1141       -i           Use interpolated percentiles, like R's type=7; default like type=1.
1142                    Not sensical for string-valued fields.
1143       -s           Print iterative stats. Useful in tail -f contexts (in which
1144                    case please avoid pprint-format output since end of input
1145                    stream will never be seen).
1146       -F           Computes integerable things (e.g. count) in floating point.
1147       Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1148       Example: mlr stats1 -a count,mode -f size
1149       Example: mlr stats1 -a count,mode -f size -g shape
1150       Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1151                This computes count and mode statistics on all field names beginning
1152                with a through h, grouped by all field names starting with k.
1153       Notes:
1154       * p50 and median are synonymous.
1155       * min and max output the same results as p0 and p100, respectively, but use
1156         less memory.
1157       * String-valued data make sense unless arithmetic on them is required,
1158         e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1159         numbers are less than strings.
1160       * count and mode allow text input; the rest require numeric input.
1161         In particular, 1 and 1.0 are distinct text for count and mode.
1162       * When there are mode ties, the first-encountered datum wins.
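
       The difference between the default percentiles and the -i option can be
       sketched in Python using R's published type=1 and type=7 definitions. This
       is an illustration of those two definitions only; the function names are
       hypothetical, and Miller's exact index computation is not specified here
       and may differ in edge cases.

```python
import math

def percentile_type1(values, p):
    """Non-interpolated percentile (R's type=1): inverse of the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    h = n * p / 100.0
    index = max(math.ceil(h) - 1, 0)  # 0-based index; p=0 maps to the minimum
    return xs[min(index, n - 1)]

def percentile_type7(values, p):
    """Linearly interpolated percentile (R's type=7)."""
    xs = sorted(values)
    n = len(xs)
    h = (n - 1) * p / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

data = [1, 2, 3, 4]
print(percentile_type1(data, 50))  # 2   (always one of the input values)
print(percentile_type7(data, 50))  # 2.5 (interpolated between 2 and 3)
```

       The non-interpolated form always returns a datum from the input, which is
       why it remains usable on string-valued fields; the interpolated form
       requires arithmetic.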
1163
1164   stats2
1165       Usage: mlr stats2 [options]
1166       Computes bivariate statistics for one or more given field-name pairs,
1167       accumulated across the input record stream.
1168       -a {linreg-ols,corr,...}  Names of accumulators: one or more of:
1169         linreg-pca   Linear regression using principal component analysis
1170         linreg-ols   Linear regression using ordinary least squares
1171         r2           Quality metric for linreg-ols (linreg-pca emits its own)
1172         logireg      Logistic regression
1173         corr         Sample correlation
1174         cov          Sample covariance
1175         covx         Sample-covariance matrix
1176       -f {a,b,c,d}   Value-field name-pairs on which to compute statistics.
1177                      There must be an even number of names.
1178       -g {e,f,g}     Optional group-by-field names.
1179       -v             Print additional output for linreg-pca.
1180       -s             Print iterative stats. Useful in tail -f contexts (in which
1181                      case please avoid pprint-format output since end of input
1182                      stream will never be seen).
1183       --fit          Rather than printing regression parameters, applies them to
1184                      the input data to compute new fit fields. All input records are
1185                      held in memory until end of input stream. Has effect only for
1186                      linreg-ols, linreg-pca, and logireg.
1187       Only one of -s or --fit may be used.
1188       Example: mlr stats2 -a linreg-pca -f x,y
1189       Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1190       Example: mlr stats2 -a corr -f x,y
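
       As a rough guide to what linreg-ols and r2 compute, the following Python
       sketch fits y = m*x + b by ordinary least squares and reports the usual
       coefficient of determination. The function names are hypothetical, and
       treating r2 as the standard coefficient of determination is an
       assumption; this is not Miller's implementation.

```python
def linreg_ols(xs, ys):
    """Return slope m and intercept b minimizing squared error of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def r_squared(xs, ys, m, b):
    """Coefficient of determination for the fitted line (assumed meaning of r2)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]          # exactly y = 2x + 1
m, b = linreg_ols(xs, ys)
print(m, b)                # 2.0 1.0
```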
1191
1192   step
1193       Usage: mlr step [options]
1194       Computes values dependent on the previous record, optionally grouped
1195       by category.
1196
1197       Options:
1198       -a {delta,rsum,...}   Names of steppers: comma-separated, one or more of:
1199         delta    Compute differences in field(s) between successive records
1200         shift    Include value(s) in field(s) from previous record, if any
1201         from-first Compute differences in field(s) from first record
1202         ratio    Compute ratios in field(s) between successive records
1203         rsum     Compute running sums of field(s) between successive records
1204         counter  Count instances of field(s) between successive records
1205         ewma     Exponentially weighted moving average over successive records
1206       -f {a,b,c} Value-field names on which to compute statistics
1207       -g {d,e,f} Optional group-by-field names
1208       -F         Computes integerable things (e.g. counter) in floating point.
1209       -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1210                  smoothing), just under 1 is light smoothing, just over 0 is
1211                  heavy smoothing. Multiple weights may be specified, e.g.
1212                  "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1213                  is "-d 0.5".
1214       -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1215                  the -d values. If supplied, the number of -o values must be the same
1216                  as the number of -d values.
1217
1218       Examples:
1219         mlr step -a rsum -f request_size
1220         mlr step -a delta -f request_size -g hostname
1221         mlr step -a ewma -d 0.1,0.9 -f x,y
1222         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1223         mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1224
1225       Please see http://johnkerl.org/miller/doc/reference.html#filter or
1226       https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1227       for more information on EWMA.
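
       The role of the -d weight can be sketched in Python with the standard
       EWMA recurrence. The function name is hypothetical, and seeding the
       average with the first sample is an assumption made for this sketch; it
       is not taken from Miller's source.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with weight alpha on the current sample."""
    smoothed = []
    prev = None
    for x in values:
        # Assumption: the first sample seeds the average.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

print(ewma([10, 20], 0.5))     # [10, 15.0]
print(ewma([1, 2, 3], 1.0))    # [1, 2.0, 3.0]  (weight 1: no smoothing)
```

       A weight of 1 reproduces the input, while a weight close to 0 makes each
       new sample move the average only slightly.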
1228
1229   tac
1230       Usage: mlr tac
1231       Prints records in reverse order from the order in which they were encountered.
1232
1233   tail
1234       Usage: mlr tail [options]
1235       -n {count}    Tail count to print; default 10
1236       -g {a,b,c}    Optional group-by-field names for tail counts
1237       Passes through the last n records, optionally by category.
1238
1239   tee
1240       Usage: mlr tee [options] {filename}
1241       Passes through input records (like mlr cat) but also writes to specified output
1242       file, using output-format flags from the command line (e.g. --ocsv). See also
1243       the "tee" keyword within mlr put, which allows data-dependent filenames.
1244       Options:
1245       -a:          append to existing file, if any, rather than overwriting.
1246       --no-fflush: don't call fflush() after every record.
1247       Any of the output-format command-line flags (see mlr -h). Example: using
1248         mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1249       the input is CSV, the output is pretty-print tabular, but the tee-file output
1250       is written in JSON format.
1251
1252   top
1253       Usage: mlr top [options]
1254       -f {a,b,c}    Value-field names for top counts.
1255       -g {d,e,f}    Optional group-by-field names for top counts.
1256       -n {count}    How many records to print per category; default 1.
1257       -a            Print all fields for top-value records; default is
1258                     to print only value and group-by fields. Requires a single
1259                     value-field name only.
1260       --min         Print top smallest values; default is top largest values.
1261       -F            Keep top values as floats even if they look like integers.
1262       -o {name}     Field name for output indices. Default "top_idx".
1263       Prints the n records with smallest/largest values at specified fields,
1264       optionally by category.
1265
1266   uniq
1267       Usage: mlr uniq [options]
1268       Prints distinct values for specified field names. With -c, same as
1269       count-distinct. For uniq, -f is a synonym for -g.
1270
1271       Options:
1272       -g {d,e,f}    Group-by-field names for uniq counts.
1273       -c            Show repeat counts in addition to unique values.
1274       -n            Show only the number of distinct values.
1275       -o {name}     Field name for output count. Default "count".
1276       -a            Output each unique record only once. Incompatible with -g.
1277                     With -c, produces unique records, with repeat counts for each.
1278                     With -n, produces only one record which is the unique-record count.
1279                     With neither -c nor -n, produces unique records.
1280
1281   unsparsify
1282       Usage: mlr unsparsify [options]
1283       Prints records with the union of field names over all input records.
1284       For field names absent in a given record but present in others, fills in
1285       a value. This verb retains all input before producing any output.
1286
1287       Options:
1288       --fill-with {filler string}  What to fill absent fields with. Defaults to
1289                                    the empty string.
1290
1291       Example: if the input is two records, one being 'a=1,b=2' and the other
1292       being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1293       'a=,b=3,c=4'.
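
       The example above can be reproduced with a short Python sketch in which
       records are modeled as insertion-ordered dicts. The function name and the
       fill_with parameter are hypothetical stand-ins for the verb and its
       --fill-with option.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, filling absent fields."""
    all_keys = {}  # dict used as an insertion-ordered set
    for record in records:
        for key in record:
            all_keys.setdefault(key, None)
    # All input is retained before any output is produced, as the verb does.
    return [{key: record.get(key, fill_with) for key in all_keys}
            for record in records]

records = [{"a": "1", "b": "2"}, {"b": "3", "c": "4"}]
for record in unsparsify(records):
    print(",".join(f"{k}={v}" for k, v in record.items()))
# a=1,b=2,c=
# a=,b=3,c=4
```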
1294

FUNCTIONS FOR FILTER/PUT

1296   +
1297       (class=arithmetic #args=2): Addition.
1298
1299       + (class=arithmetic #args=1): Unary plus.
1300
1301   -
1302       (class=arithmetic #args=2): Subtraction.
1303
1304       - (class=arithmetic #args=1): Unary minus.
1305
1306   *
1307       (class=arithmetic #args=2): Multiplication.
1308
1309   /
1310       (class=arithmetic #args=2): Division.
1311
1312   //
1313       (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
1314
1315   .+
1316       (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1317
1318       .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1319
1320   .-
1321       (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1322
1323       .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1324
1325   .*
1326       (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1327
1328   ./
1329       (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1330
1331   .//
1332       (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
1333
1334   %
1335       (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
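
       Since // and % are described as pythonic, Python's own operators show the
       intended rounding. The wrapper names below are hypothetical, and the
       sketch assumes a positive right-hand operand for %.

```python
def floor_div(a, b):
    """Integer division rounding toward negative infinity."""
    return a // b

def floor_mod(a, b):
    """Remainder paired with floor_div; non-negative when b is positive."""
    return a % b

print(floor_div(7, 2))    # 3
print(floor_div(-7, 2))   # -4  (truncating division would give -3)
print(floor_mod(-7, 5))   # 3   (a C-style remainder would give -2)
```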
1336
1337   **
1338       (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1339       operator.
1340
1341   |
1342       (class=arithmetic #args=2): Bitwise OR.
1343
1344   ^
1345       (class=arithmetic #args=2): Bitwise XOR.
1346
1347   &
1348       (class=arithmetic #args=2): Bitwise AND.
1349
1350   ~
1351       (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1352       regex-match operator: try '$y = ~$x'.
1353
1354   <<
1355       (class=arithmetic #args=2): Bitwise left-shift.
1356
1357   >>
1358       (class=arithmetic #args=2): Bitwise right-shift.
1359
1360   bitcount
1361       (class=arithmetic #args=1): Count of 1-bits
1362
1363   ==
1364       (class=boolean #args=2): String/numeric equality. Mixing number and string
1365       results in string compare.
1366
1367   !=
1368       (class=boolean #args=2): String/numeric inequality. Mixing number and string
1369       results in string compare.
1370
1371   =~
1372       (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1373       side), e.g. '$name =~ "^a.*b$"'.
1374
1375   !=~
1376       (class=boolean #args=2): String (left-hand side) does not match regex
1377       (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1378
1379   >
1380       (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1381       results in string compare.
1382
1383   >=
1384       (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1385       and string results in string compare.
1386
1387   <
1388       (class=boolean #args=2): String/numeric less-than. Mixing number and string
1389       results in string compare.
1390
1391   <=
1392       (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1393       and string results in string compare.
1394
1395   &&
1396       (class=boolean #args=2): Logical AND.
1397
1398   ||
1399       (class=boolean #args=2): Logical OR.
1400
1401   ^^
1402       (class=boolean #args=2): Logical XOR.
1403
1404   !
1405       (class=boolean #args=1): Logical negation.
1406
1407   ? :
1408       (class=boolean #args=3): Ternary operator.
1409
1410   .
1411       (class=string #args=2): String concatenation.
1412
1413   gsub
1414       (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1415       (replace all).
1416
1417   regextract
1418       (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
1419       .
1420
1421   regextract_or_else
1422       (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
1423       .
1424
1425   strlen
1426       (class=string #args=1): String length.
1427
1428   sub
1429       (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1430       (replace once).
1431
1432   ssub
1433       (class=string #args=3): Like sub but does no regexing. No characters are special.
1434
1435   substr
1436       (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1437       inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
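
       The indexing rules for substr can be sketched in Python. The function
       below is a hypothetical model: positions count up from 0, the end
       position is inclusive, and negative indices are aliased as described.
       Raising an error for out-of-range indices is an assumption; Miller's
       behavior in that case is not specified here.

```python
def substr(s, m, n):
    """Substring of s from 0-up position m through n, inclusive."""
    length = len(s)
    if m < 0:
        m += length   # -len .. -1 alias to 0 .. len-1
    if n < 0:
        n += length
    if not (0 <= m <= n < length):
        raise ValueError("index out of range")  # assumption, see lead-in
    return s[m:n + 1]

print(substr("hello", 1, 3))    # ell
print(substr("hello", -4, -2))  # ell
```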
1438
1439   tolower
1440       (class=string #args=1): Convert string to lowercase.
1441
1442   toupper
1443       (class=string #args=1): Convert string to uppercase.
1444
1445   lstrip
1446       (class=string #args=1): Strip leading whitespace from string.
1447
1448   rstrip
1449       (class=string #args=1): Strip trailing whitespace from string.
1450
1451   strip
1452       (class=string #args=1): Strip leading and trailing whitespace from string.
1453
1454   collapse_whitespace
1455       (class=string #args=1): Strip repeated whitespace from string.
1456
1457   clean_whitespace
1458       (class=string #args=1): Same as collapse_whitespace and strip.
1459
1460   abs
1461       (class=math #args=1): Absolute value.
1462
1463   acos
1464       (class=math #args=1): Inverse trigonometric cosine.
1465
1466   acosh
1467       (class=math #args=1): Inverse hyperbolic cosine.
1468
1469   asin
1470       (class=math #args=1): Inverse trigonometric sine.
1471
1472   asinh
1473       (class=math #args=1): Inverse hyperbolic sine.
1474
1475   atan
1476       (class=math #args=1): One-argument arctangent.
1477
1478   atan2
1479       (class=math #args=2): Two-argument arctangent.
1480
1481   atanh
1482       (class=math #args=1): Inverse hyperbolic tangent.
1483
1484   cbrt
1485       (class=math #args=1): Cube root.
1486
1487   ceil
1488       (class=math #args=1): Ceiling: nearest integer at or above.
1489
1490   cos
1491       (class=math #args=1): Trigonometric cosine.
1492
1493   cosh
1494       (class=math #args=1): Hyperbolic cosine.
1495
1496   erf
1497       (class=math #args=1): Error function.
1498
1499   erfc
1500       (class=math #args=1): Complementary error function.
1501
1502   exp
1503       (class=math #args=1): Exponential function e**x.
1504
1505   expm1
1506       (class=math #args=1): e**x - 1.
1507
1508   floor
1509       (class=math #args=1): Floor: nearest integer at or below.
1510
1511   invqnorm
1512       (class=math #args=1): Inverse of normal cumulative distribution
1513       function. Note that invqnorm(urand()) is normally distributed.
1514
1515   log
1516       (class=math #args=1): Natural (base-e) logarithm.
1517
1518   log10
1519       (class=math #args=1): Base-10 logarithm.
1520
1521   log1p
1522       (class=math #args=1): log(1+x).
1523
1524   logifit
1525       (class=math #args=3): Given m and b from logistic regression, compute
1526       fit: $yhat=logifit($x,$m,$b).
1527
1528   madd
1529       (class=math #args=3): a + b mod m (integers)
1530
1531   max
1532       (class=math variadic): Max of n numbers; null loses
1533
1534   mexp
1535       (class=math #args=3): a ** b mod m (integers)
1536
1537   min
1538       (class=math variadic): Min of n numbers; null loses
1539
1540   mmul
1541       (class=math #args=3): a * b mod m (integers)
1542
1543   msub
1544       (class=math #args=3): a - b mod m (integers)
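
       The modular-arithmetic functions madd, msub, mmul, and mexp can be
       sketched in Python as follows. Reducing results into the range 0..m-1 for
       a positive modulus is an assumption of this sketch, and the Python
       function names merely mirror the Miller ones.

```python
def madd(a, b, m):
    """(a + b) mod m."""
    return (a + b) % m

def msub(a, b, m):
    """(a - b) mod m."""
    return (a - b) % m

def mmul(a, b, m):
    """(a * b) mod m."""
    return (a * b) % m

def mexp(a, b, m):
    """(a ** b) mod m, computed without forming the full power."""
    return pow(a, b, m)

print(madd(5, 3, 7))    # 1
print(msub(5, 6, 7))    # 6
print(mmul(3, 4, 7))    # 5
print(mexp(2, 10, 7))   # 2
```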
1545
1546   pow
1547       (class=math #args=2): Exponentiation; same as **.
1548
1549   qnorm
1550       (class=math #args=1): Normal cumulative distribution function.
1551
1552   round
1553       (class=math #args=1): Round to nearest integer.
1554
1555   roundm
1556       (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1557       the same as round($x/$m)*$m
1558
1559   sgn
1560       (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1561       negative input.
1562
1563   sin
1564       (class=math #args=1): Trigonometric sine.
1565
1566   sinh
1567       (class=math #args=1): Hyperbolic sine.
1568
1569   sqrt
1570       (class=math #args=1): Square root.
1571
1572   tan
1573       (class=math #args=1): Trigonometric tangent.
1574
1575   tanh
1576       (class=math #args=1): Hyperbolic tangent.
1577
1578   urand
1579       (class=math #args=0): Floating-point numbers on the unit interval.
1580       Int-valued example: '$n=floor(20+urand()*11)'.
1581
1582   urand32
1583       (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1584       inclusive.
1585
1586   urandint
1587       (class=math #args=2): Integer uniformly distributed between inclusive
1588       integer endpoints.
1589
1590   dhms2fsec
1591       (class=time #args=1): Recovers floating-point seconds as in
1592       dhms2fsec("5d18h53m20.250000s") = 500000.250000
1593
1594   dhms2sec
1595       (class=time #args=1): Recovers integer seconds as in
1596       dhms2sec("5d18h53m20s") = 500000
1597
1598   fsec2dhms
1599       (class=time #args=1): Formats floating-point seconds as in
1600       fsec2dhms(500000.25) = "5d18h53m20.250000s"
1601
1602   fsec2hms
1603       (class=time #args=1): Formats floating-point seconds as in
1604       fsec2hms(5000.25) = "01:23:20.250000"
1605
1606   gmt2sec
1607       (class=time #args=1): Parses GMT timestamp as integer seconds since
1608       the epoch.
1609
1610   localtime2sec
1611       (class=time #args=1): Parses local timestamp as integer seconds since
1612       the epoch. Consults $TZ environment variable.
1613
1614   hms2fsec
1615       (class=time #args=1): Recovers floating-point seconds as in
1616       hms2fsec("01:23:20.250000") = 5000.250000
1617
1618   hms2sec
1619       (class=time #args=1): Recovers integer seconds as in
1620       hms2sec("01:23:20") = 5000
1621
1622   sec2dhms
1623       (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1624       = "5d18h53m20s"
1625
1626   sec2gmt
1627       (class=time #args=1): Formats seconds since epoch (integer part)
1628       as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1629       Leaves non-numbers as-is.
1630
1631       sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1632       decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1633       Leaves non-numbers as-is.
1634
1635   sec2gmtdate
1636       (class=time #args=1): Formats seconds since epoch (integer part)
1637       as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1638       Leaves non-numbers as-is.
1639
1640   sec2localtime
1641       (class=time #args=1): Formats seconds since epoch (integer part)
1642       as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1643       Consults $TZ environment variable. Leaves non-numbers as-is.
1644
1645       sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1646       decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1647       Consults $TZ environment variable. Leaves non-numbers as-is.
1648
1649   sec2localdate
1650       (class=time #args=1): Formats seconds since epoch (integer part)
1651       as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1652       Consults $TZ environment variable. Leaves non-numbers as-is.
1653
1654   sec2hms
1655       (class=time #args=1): Formats integer seconds as in
1656       sec2hms(5000) = "01:23:20"
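
       The arithmetic behind sec2dhms, dhms2sec, sec2hms, and hms2sec can be
       sketched in Python. These hypothetical functions reproduce only the
       documented examples for non-negative integer input; how Miller pads
       fields or omits zero-valued units for other inputs is not specified here
       and is not modeled.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def sec2dhms(seconds):
    """Format integer seconds as days, hours, minutes, seconds."""
    d, rem = divmod(seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}d{h}h{m}m{s}s"   # assumption: all units printed, no padding

def dhms2sec(text):
    """Recover integer seconds from a string such as '5d18h53m20s'."""
    return sum(int(num) * UNIT_SECONDS[unit]
               for num, unit in re.findall(r"(\d+)([dhms])", text))

def sec2hms(seconds):
    """Format integer seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def hms2sec(text):
    """Recover integer seconds from HH:MM:SS."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s

print(sec2dhms(500000))   # 5d18h53m20s
print(sec2hms(5000))      # 01:23:20
```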
1657
1658   strftime
1659       (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1660       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1661       strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1662       Format strings are as in the C library (please see "man strftime" on your system),
1663       with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1664       with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1665       See also strftime_local.
1666
1667   strftime_local
1668       (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1669
1670   strptime
1671       (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1672       e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1673       and  strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1674       See also strptime_local.
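
       The documented GMT examples can be cross-checked with Python's standard
       library, whose format strings also follow the C library. The helper
       names below are hypothetical Python stand-ins, and the Miller-specific
       "%1S" through "%9S" formats have no Python equivalent and are not
       modeled.

```python
import calendar
import time
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def format_gmt(seconds):
    """Format the integer part of seconds since the epoch as a GMT timestamp."""
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime(ISO_FORMAT)

def parse_gmt(text):
    """Parse a GMT timestamp as integer seconds since the epoch."""
    return calendar.timegm(time.strptime(text, ISO_FORMAT))

print(format_gmt(1440768801.7))            # 2015-08-28T13:33:21Z
print(parse_gmt("2015-08-28T13:33:21Z"))   # 1440768801
```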
1675
1676   strptime_local
1677       (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1678
1679   systime
1680       (class=time #args=0): Floating-point seconds since the epoch,
1681       e.g. 1440768801.748936.
1682
1683   is_absent
1684       (class=typing #args=1): False if field is present in input, true otherwise
1685
1686   is_bool
1687       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1688
1689   is_boolean
1690       (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1691
1692   is_empty
1693       (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1694
1695   is_empty_map
1696       (class=typing #args=1): True if argument is a map which is empty.
1697
1698   is_float
1699       (class=typing #args=1): True if field is present with value inferred to be float
1700
1701   is_int
1702       (class=typing #args=1): True if field is present with value inferred to be int
1703
1704   is_map
1705       (class=typing #args=1): True if argument is a map.
1706
1707   is_nonempty_map
1708       (class=typing #args=1): True if argument is a map which is non-empty.
1709
1710   is_not_empty
1711       (class=typing #args=1): True if field is present in input with non-empty value, false otherwise
1712
1713   is_not_map
1714       (class=typing #args=1): True if argument is not a map.
1715
1716   is_not_null
1717       (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1718
1719   is_null
1720       (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1721
1722   is_numeric
1723       (class=typing #args=1): True if field is present with value inferred to be int or float
1724
1725   is_present
1726       (class=typing #args=1): True if field is present in input, false otherwise.
1727
1728   is_string
1729       (class=typing #args=1): True if field is present with string (including empty-string) value
1730
1731   asserting_absent
1732       (class=typing #args=1): Returns argument if it is absent in the input data, else
1733       throws an error.
1734
1735   asserting_bool
1736       (class=typing #args=1): Returns argument if it is present with boolean value, else
1737       throws an error.
1738
1739   asserting_boolean
1740       (class=typing #args=1): Returns argument if it is present with boolean value, else
1741       throws an error.
1742
1743   asserting_empty
1744       (class=typing #args=1): Returns argument if it is present in input with empty value,
1745       else throws an error.
1746
1747   asserting_empty_map
1748       (class=typing #args=1): Returns argument if it is a map with empty value, else
1749       throws an error.
1750
1751   asserting_float
1752       (class=typing #args=1): Returns argument if it is present with float value, else
1753       throws an error.
1754
1755   asserting_int
1756       (class=typing #args=1): Returns argument if it is present with int value, else
1757       throws an error.
1758
1759   asserting_map
1760       (class=typing #args=1): Returns argument if it is a map, else throws an error.
1761
1762   asserting_nonempty_map
1763       (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1764       an error.
1765
1766   asserting_not_empty
1767       (class=typing #args=1): Returns argument if it is present in input with non-empty
1768       value, else throws an error.
1769
1770   asserting_not_map
1771       (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1772
1773   asserting_not_null
1774       (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1775       else throws an error.
1776
1777   asserting_null
1778       (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1779       an error.
1780
1781   asserting_numeric
1782       (class=typing #args=1): Returns argument if it is present with int or float value,
1783       else throws an error.
1784
1785   asserting_present
1786       (class=typing #args=1): Returns argument if it is present in input, else throws
1787       an error.
1788
1789   asserting_string
1790       (class=typing #args=1): Returns argument if it is present with string (including
1791       empty-string) value, else throws an error.
1792
1793   boolean
1794       (class=conversion #args=1): Convert int/float/bool/string to boolean.
1795
1796   float
1797       (class=conversion #args=1): Convert int/float/bool/string to float.
1798
1799   fmtnum
1800       (class=conversion #args=2): Convert int/float/bool to string using
1801       printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1802       are all long long or double. If you use formats like %d or %f, behavior is undefined.
1803
1804   hexfmt
1805       (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1806
1807   int
1808       (class=conversion #args=1): Convert int/float/bool/string to int.
1809
1810   string
1811       (class=conversion #args=1): Convert int/float/bool/string to string.
1812
1813   typeof
1814       (class=conversion #args=1): Convert argument to type of argument (e.g.
1815       MT_STRING). For debug.
1816
1817   depth
1818       (class=maps #args=1): Prints maximum depth of hashmap: ''. Scalars have depth 0.
1819
1820   haskey
1821       (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1822       'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1823
1824   joink
1825       (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1826
1827   joinkv
1828       (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1829
1830   joinv
1831       (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
1832
1833   leafcount
1834       (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1835       same as length.
1836
1837   length
1838       (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
1839
1840   mapdiff
1841       (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1842       With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1843
1844   mapexcept
1845       (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1846       E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1847
1848   mapselect
1849       (class=maps variadic): Returns a map with only keys from remaining arguments set.
1850       E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1851
1852   mapsum
1853       (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1854       key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
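
       The semantics of mapdiff, mapexcept, mapselect, and mapsum can be
       sketched with Python dicts, which are insertion-ordered like Miller maps.
       The Python functions are hypothetical models named after the Miller
       ones; they reproduce the examples given above.

```python
def mapsum(*maps):
    """Union of all argument maps; rightmost value wins on key collisions."""
    out = {}
    for m in maps:
        out.update(m)
    return out

def mapdiff(*maps):
    """Copy of the first map with keys from all remaining maps removed."""
    if not maps:
        return {}
    out = dict(maps[0])
    for m in maps[1:]:
        for key in m:
            out.pop(key, None)
    return out

def mapexcept(m, *keys):
    """Copy of m with the given keys unset."""
    return {k: v for k, v in m.items() if k not in keys}

def mapselect(m, *keys):
    """Copy of m with only the given keys set."""
    return {k: v for k, v in m.items() if k in keys}

print(mapsum({1: 2, 3: 4}, {1: 5}))            # {1: 5, 3: 4}
print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {3: 4}
print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {1: 2, 5: 6}
```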
1855
1856   splitkv
1857       (class=maps #args=3): Splits string by separators into map with type inference.
1858       E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1859
1860   splitkvx
1861       (class=maps #args=3): Splits string by separators into map without type inference (keys and
1862       values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
1863       '{"a" : "1", "b" : "2", "c" : "3"}'.
1864
1865   splitnv
1866       (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
1867       E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
1868
1869   splitnvx
1870       (class=maps #args=2): Splits string by separator into integer-indexed map without type
1871       inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
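
       The four split functions differ only in how keys are produced and
       whether values are type-inferred, which the following Python sketch
       illustrates. The infer helper is a hypothetical simplification (int,
       then float, else string); Miller's actual inference rules may accept
       other literals.

```python
def infer(text):
    """Simplified type inference: int, then float, else leave as string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def splitkvx(text, pair_sep, field_sep):
    """Split 'k=v' fields into a map; values stay strings."""
    out = {}
    for field in text.split(field_sep):
        key, _, value = field.partition(pair_sep)
        out[key] = value
    return out

def splitkv(text, pair_sep, field_sep):
    """Like splitkvx, but with type inference on values."""
    return {k: infer(v) for k, v in splitkvx(text, pair_sep, field_sep).items()}

def splitnvx(text, sep):
    """Split into a map keyed by 1-up integers; values stay strings."""
    return {i: v for i, v in enumerate(text.split(sep), start=1)}

def splitnv(text, sep):
    """Like splitnvx, but with type inference on values."""
    return {i: infer(v) for i, v in splitnvx(text, sep).items()}

print(splitkv("a=1,b=2,c=3", "=", ","))   # {'a': 1, 'b': 2, 'c': 3}
print(splitnvx("4,5,6", ","))             # {1: '4', 2: '5', 3: '6'}
```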
1872

KEYWORDS FOR PUT AND FILTER

1874   all
1875       all: used in "emit", "emitp", and "unset" as a synonym for @*
1876
1877   begin
1878       begin: defines a block of statements to be executed before input records
1879       are ingested. The body statements must be wrapped in curly braces.
1880       Example: 'begin { @count = 0 }'
1881
1882   bool
1883       bool: declares a boolean local variable in the current curly-braced scope.
1884       Type-checking happens at assignment: 'bool b = 1' is an error.
1885
1886   break
1887       break: causes execution to continue after the body of the current
1888       for/while/do-while loop.
1889
1890   call
1891       call: used for invoking a user-defined subroutine.
1892       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
1893
1894   continue
1895       continue: causes execution to skip the remaining statements in the body of
1896       the current for/while/do-while loop. For-loop increments are still applied.
1897
1898   do
1899       do: with "while", introduces a do-while loop. The body statements must be wrapped
1900       in curly braces.
1901
1902   dump
1903       dump: prints all currently defined out-of-stream variables immediately
1904         to stdout as JSON.
1905
1906         With >, >>, or |, the data do not become part of the output record stream but
1907         are instead redirected.
1908
1909         The > and >> are for write and append, as in the shell, but (as with awk) the
1910         file-overwrite for > is on first write, not per record. The | is for piping to
1911         a process which will process the data. There will be one open file for each
1912         distinct file name (for > and >>) or one subordinate process for each distinct
1913         value of the piped-to command (for |). Output-formatting flags are taken from
1914         the main command line.
1915
1916         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
1917         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >  "mytap.dat"}'
1918         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
1919         Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
1920
1921   edump
1922       edump: prints all currently defined out-of-stream variables immediately
1923         to stderr as JSON.
1924
1925         Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
1926
1927   elif
1928       elif: the way Miller spells "else if". The body statements must be wrapped
1929       in curly braces.
1930
1931   else
1932       else: terminates an if/elif/elif chain. The body statements must be wrapped
1933       in curly braces.
1934
1935   emit
1936       emit: inserts an out-of-stream variable into the output record stream. Hashmap
1937         indices present in the data but not slotted by emit arguments are not output.
1938
1939         With >, >>, or |, the data do not become part of the output record stream but
1940         are instead redirected.
1941
1942         The > and >> are for write and append, as in the shell, but (as with awk) the
1943         file-overwrite for > is on first write, not per record. The | is for piping to
1944         a process which will process the data. There will be one open file for each
1945         distinct file name (for > and >>) or one subordinate process for each distinct
1946         value of the piped-to command (for |). Output-formatting flags are taken from
1947         the main command line.
1948
1949         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
1950         etc., to control the format of the output if the output is redirected. See also mlr -h.
1951
1952         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, $*'
1953         Example: mlr --from f.dat put 'emit >  "/tmp/data-".$a, mapexcept($*, "a")'
1954         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
1955         Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
1956         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
1957         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
1958         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >  "mytap.dat", @*, "index1", "index2"'
1959         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
1960         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
1961         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
1962         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
1963
1964         Please see http://johnkerl.org/miller/doc for more information.
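
       The way emit's name arguments "slot" hashmap indices can be pictured
       with a small Python model. This is only a sketch of the behaviour
       described above, not Miller's implementation: emit_records and its
       sample data are hypothetical names, and the model assumes that any
       levels left over after the names are used up form a single flat map.

```python
def emit_records(name, value, *names):
    """Rough model of: emit @name, "names[0]", "names[1]", ...

    Each name consumes one level of map keys and becomes a field holding
    that key. Hypothetical helper, not part of Miller.
    """
    records = []

    def walk(node, prefix, remaining):
        if remaining and isinstance(node, dict):
            # Slot this level's keys into the field called remaining[0].
            for key, child in node.items():
                walk(child, {**prefix, remaining[0]: key}, remaining[1:])
        elif isinstance(node, dict):
            # Names used up: the leftover keys become field names
            # (assumes the leftover map is only one level deep).
            records.append({**prefix, **node})
        else:
            # Reached a terminal value: it is output under the variable name.
            records.append({**prefix, name: node})

    walk(value, {}, list(names))
    return records


# State after something like '@sums[$a][$b] += $x' over a few records.
sums = {"pan": {"pan": 0.35, "wye": 0.50}, "eks": {"pan": 0.76}}

for record in emit_records("sums", sums, "index1", "index2"):
    print(record)
```

       With both names given, each terminal value becomes its own record
       with fields index1, index2, and sums. With only one name, the second
       level of keys turns into field names instead.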
1965
1966   emitf
1967       emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
1968         output record stream.
1969
1970         With >, >>, or |, the data do not become part of the output record stream but
1971         are instead redirected.
1972
1973         The > and >> are for write and append, as in the shell, but (as with awk) the
1974         file-overwrite for > is on first write, not per record. The | is for piping to
1975         a process which will process the data. There will be one open file for each
1976         distinct file name (for > and >>) or one subordinate process for each distinct
1977         value of the piped-to command (for |). Output-formatting flags are taken from
1978         the main command line.
1979
1980         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
1981         etc., to control the format of the output if the output is redirected. See also mlr -h.
1982
1983         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
1984         Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
1985         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
1986         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
1987         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
1988         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
1989         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
1990         Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
1991
1992         Please see http://johnkerl.org/miller/doc for more information.
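
       As a rough illustration of "side-by-side", the following Python
       sketch models emitf as building one record from the named non-indexed
       variables. The function name and the sample values are assumptions
       made for the example; this is not Miller code.

```python
def emitf_record(oosvars, *names):
    """Rough model of: emitf @name1, @name2, ...

    Builds a single record holding the listed non-indexed out-of-stream
    variables, in the order given. Hypothetical helper, not part of Miller.
    """
    return {name: oosvars[name] for name in names}


# State after '@a=$i; @b+=$x; @c+=$y' has run over some records.
oosvars = {"a": 3, "b": 10, "c": 20}

print(emitf_record(oosvars, "a", "b", "c"))
```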
1993
1994   emitp
1995       emitp: inserts an out-of-stream variable into the output record stream.
1996         Hashmap indices present in the data but not slotted by emitp arguments are
1997         output concatenated with ":".
1998
1999         With >, >>, or |, the data do not become part of the output record stream but
2000         are instead redirected.
2001
2002         The > and >> are for write and append, as in the shell, but (as with awk) the
2003         file-overwrite for > is on first write, not per record. The | is for piping to
2004         a process which will process the data. There will be one open file for each
2005         distinct file name (for > and >>) or one subordinate process for each distinct
2006         value of the piped-to command (for |). Output-formatting flags are taken from
2007         the main command line.
2008
2009         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2010         etc., to control the format of the output if the output is redirected. See also mlr -h.
2011
2012         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2013         Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2014         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2015         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2016         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >  "mytap.dat", @*, "index1", "index2"'
2017         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2018         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2019         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2020         Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2021
2022         Please see http://johnkerl.org/miller/doc for more information.
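
       The ":"-joined field names produced by emitp can be sketched in
       Python as follows. This is a simplified model of the behaviour
       described above under the assumption that map keys are strings;
       emitp_records and the sample data are hypothetical, not Miller code.

```python
def emitp_records(name, value, *names):
    """Rough model of: emitp @name, "names[0]", "names[1]", ...

    Levels not slotted by a name are kept, joined with ":" behind the
    variable name. Hypothetical helper, not part of Miller.
    """
    records = []

    def flatten(node, path, out):
        if isinstance(node, dict):
            for key, child in node.items():
                flatten(child, path + [key], out)
        else:
            out[":".join(path)] = node

    def walk(node, prefix, remaining):
        if remaining and isinstance(node, dict):
            # Slot this level's keys into the field called remaining[0].
            for key, child in node.items():
                walk(child, {**prefix, remaining[0]: key}, remaining[1:])
        else:
            record = dict(prefix)
            flatten(node, [name], record)  # prefix leftover keys with the name
            records.append(record)

    walk(value, {}, list(names))
    return records


sums = {"pan": {"pan": 0.35, "wye": 0.50}, "eks": {"pan": 0.76}}

for record in emitp_records("sums", sums):
    print(record)
```

       Without names, everything lands in one record whose field names carry
       the full key path. Each name supplied removes one level from the
       prefix and turns it into a field of its own.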
2023
2024   end
2025       end: defines a block of statements to be executed after input records
2026       are ingested. The body statements must be wrapped in curly braces.
2027       Example: 'end { emit @count }'
2028       Example: 'end { eprint "Final count is " . @count }'
2029
2030   eprint
2031       eprint: prints expression immediately to stderr.
2032         Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2033         Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2034         Example: mlr --from f.dat put  '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2035
2036   eprintn
2037       eprintn: prints expression immediately to stderr, without trailing newline.
2038         Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2039
2040   false
2041       false: the boolean literal value.
2042
2043   filter
2044       filter: includes/excludes the record in the output record stream.
2045
2046         Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2047
2048         Instead of put with 'filter false' you can simply use put -q.  The following
2049         uses the input record to accumulate data but only prints the running sum
2050         without printing the input record:
2051
2052         Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
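
       The two examples above can be modelled in a few lines of Python.
       This is a sketch of the record flow only; the function names and
       the sample records are assumptions made for illustration.

```python
def put_filter(records):
    """Rough model of: put 'filter (NR == 2 || $x > 5.4)'

    A record stays in the output stream only when the condition is true.
    """
    return [
        record
        for nr, record in enumerate(records, start=1)
        if nr == 2 or record["x"] > 5.4
    ]


def put_q_running_sum(records):
    """Rough model of: put -q '@running_sum += $x * $y; emit @running_sum'

    With -q the input records are not output; only emitted records are.
    """
    running_sum = 0
    emitted = []
    for record in records:
        running_sum += record["x"] * record["y"]
        emitted.append({"running_sum": running_sum})
    return emitted


records = [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 6, "y": 1}]
print(put_filter(records))
print(put_q_running_sum(records))
```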
2053
2054   float
2055       float: declares a floating-point local variable in the current curly-braced scope.
2056       Type-checking happens at assignment: 'float x = 0' is an error.
2057
2058   for
2059       for: defines a for-loop using one of three styles. The body statements must
2060       be wrapped in curly braces.
2061       For-loop over stream record:
2062         Example: 'for (k, v in $*) { ... }'
2063       For-loop over out-of-stream variables:
2064         Example: 'for (k, v in @counts) { ... }'
2065         Example: 'for ((k1, k2), v in @counts) { ... }'
2066         Example: 'for ((k1, k2, k3), v in @*) { ... }'
2067       C-style for-loop:
2068         Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
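
       The multi-key form 'for ((k1, k2), v in @counts)' walks two levels
       of a nested map at once. A Python sketch of that traversal, with
       the hypothetical helper for_multi standing in for the loop header:

```python
def for_multi(mapping, depth):
    """Rough model of: for ((k1, ..., k_depth), v in @mapping) { ... }

    Yields one (keys, value) pair for every path of `depth` keys.
    Hypothetical helper; in this model a map shallower than `depth`
    raises AttributeError.
    """

    def walk(node, keys):
        if len(keys) == depth:
            yield tuple(keys), node
        else:
            for key, child in node.items():
                yield from walk(child, keys + [key])

    yield from walk(mapping, [])


counts = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}

for (k1, k2), v in for_multi(counts, 2):
    print(k1, k2, v)
```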
2069
2070   func
2071       func: defines a user-defined function.
2072       Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2073
2074   if
2075       if: starts an if/elif/elif chain. The body statements must be wrapped
2076       in curly braces.
2077
2078   in
2079       in: used in for-loops over stream records or out-of-stream variables.
2080
2081   int
2082       int: declares an integer local variable in the current curly-braced scope.
2083       Type-checking happens at assignment: 'int x = 0.0' is an error.
2084
2085   map
2086       map: declares a map-valued local variable in the current curly-braced scope.
2087       Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2088       always OK. map b = a is OK or not depending on whether a is a map.
2089
2090   num
2091       num: declares an int/float local variable in the current curly-braced scope.
2092       Type-checking happens at assignment: 'num b = true' is an error.
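
       The assignment-time checks quoted in the float, int, map, and num
       entries can be summarised with a small Python model. It covers only
       the cases those entries state, and the declare helper is a
       hypothetical name rather than anything Miller provides.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


TYPE_CHECKS = {
    "int": _is_int,
    "float": lambda value: isinstance(value, float),
    "num": lambda value: _is_int(value) or isinstance(value, float),
    "map": lambda value: isinstance(value, dict),
}


def declare(type_name, value):
    """Rough model of a typed local declaration such as 'int x = 0'."""
    if not TYPE_CHECKS[type_name](value):
        raise TypeError(f"{type_name} cannot be assigned {value!r}")
    return value


declare("int", 0)    # OK
declare("map", {})   # OK: 'map b = {}' is always OK
for type_name, value in [("float", 0), ("int", 0.0), ("map", 0), ("num", True)]:
    try:
        declare(type_name, value)
    except TypeError as error:
        print("error:", error)
```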
2093
2094   print
2095       print: prints expression immediately to stdout.
2096         Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2097         Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2098         Example: mlr --from f.dat put  '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2099
2100   printn
2101       printn: prints expression immediately to stdout, without trailing newline.
2102         Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2103
2104   return
2105       return: specifies the return value from a user-defined function.
2106       Omitted return statements (including via if-branches) result in an absent-null
2107       return value, which in turn results in a skipped assignment to an LHS.
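
       A Python sketch of the skipped assignment: ABSENT stands in for
       Miller's absent-null, and double_if_positive and assign are
       hypothetical names used only for this illustration.

```python
ABSENT = object()  # stand-in for the absent-null value


def double_if_positive(x):
    """Models a function whose return statement sits inside an if-branch."""
    if x > 0:
        return 2 * x
    return ABSENT  # models reaching the end without a return statement


def assign(record, field, value):
    """Models '$field = value': an absent value leaves the record untouched."""
    if value is not ABSENT:
        record[field] = value


for record in [{"x": 4}, {"x": -1}]:
    assign(record, "y", double_if_positive(record["x"]))
    print(record)
```

       The first record gains a y field; the second is left exactly as it
       was, because nothing was returned for it.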
2108
2109   stderr
2110       stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2111         to print to standard error.
2112
2113   stdout
2114       stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2115         to print to standard output.
2116
2117   str
2118       str: declares a string local variable in the current curly-braced scope.
2119       Type-checking happens at assignment.
2120
2121   subr
2122       subr: defines a subroutine.
2123       Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2124
2125   tee
2126       tee: prints the current record to specified file.
2127         This is an immediate print to the specified file (except for pprint format
2128         which of course waits until the end of the input stream to format all output).
2129
2130         The > and >> are for write and append, as in the shell, but (as with awk) the
2131         file-overwrite for > is on first write, not per record. The | is for piping to
2132         a process which will process the data. There will be one open file for each
2133         distinct file name (for > and >>) or one subordinate process for each distinct
2134         value of the piped-to command (for |). Output-formatting flags are taken from
2135         the main command line.
2136
2137         You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2138         etc., to control the format of the output. See also mlr -h.
2139
2140         emit with redirect and tee with redirect are identical, except tee can only
2141         output $*.
2142
2143         Example: mlr --from f.dat put 'tee >  "/tmp/data-".$a, $*'
2144         Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2145         Example: mlr --from f.dat put 'tee >  stderr, $*'
2146         Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
2147         Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
2148         Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2149         Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
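
       The file handling described above (one open file per distinct name,
       with > overwriting on first write rather than per record) can be
       modelled in Python. This is a toy sketch, not Miller's code: the
       class name, the demo data, and the temporary directory are
       assumptions, and the | case is left out.

```python
import os
import tempfile


class RedirectTargets:
    """Toy model of > and >> redirects (hypothetical, not Miller's code)."""

    def __init__(self):
        self._handles = {}  # one open file per distinct file name

    def write(self, operator, filename, line):
        handle = self._handles.get(filename)
        if handle is None:
            # ">" truncates only here, on the first write to this name;
            # ">>" appends to whatever the file already holds.
            mode = "w" if operator == ">" else "a"
            handle = open(filename, mode)
            self._handles[filename] = handle
        handle.write(line + "\n")

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def demo():
    """Rough model of: put 'tee > "<dir>/data-".$a, $*' over three records."""
    records = [{"a": "pan", "x": 1}, {"a": "eks", "x": 2}, {"a": "pan", "x": 3}]
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "data-pan"), "w") as handle:
            handle.write("stale contents\n")  # replaced on the first write
        targets = RedirectTargets()
        for record in records:
            name = os.path.join(directory, "data-" + record["a"])
            line = ",".join(f"{key}={value}" for key, value in record.items())
            targets.write(">", name, line)
        targets.close()
        contents = {}
        for name in sorted(os.listdir(directory)):
            with open(os.path.join(directory, name)) as handle:
                contents[name] = handle.read()
        return contents


print(demo())
```

       Both pan records end up in the same file: the second write reuses
       the open handle instead of truncating the file again.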
2150
2151   true
2152       true: the boolean literal value.
2153
2154   unset
2155       unset: clears field(s) from the current record, or an out-of-stream or local variable.
2156
2157         Example: mlr --from f.dat put 'unset $x'
2158         Example: mlr --from f.dat put 'unset $*'
2159         Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2160         Example: mlr --from f.dat put '...; unset @sums'
2161         Example: mlr --from f.dat put '...; unset @sums["green"]'
2162         Example: mlr --from f.dat put '...; unset @*'
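
       The loop-based example above removes every field whose name matches
       a regular expression. A Python sketch of the same idea follows; it
       assumes that =~ performs an unanchored match, and unset_matching is
       a hypothetical name.

```python
import re


def unset_matching(record, pattern):
    """Rough model of: for (k, v in $*) { if (k =~ pattern) { unset $[k] } }"""
    for key in list(record):  # copy the keys: the dict shrinks while looping
        if re.search(pattern, key):  # assumption: =~ is an unanchored match
            del record[key]
    return record


print(unset_matching({"a": 1, "b": 2, "abc": 3, "x": 4}, "a.*"))
```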
2163
2164   var
2165       var: declares an untyped local variable in the current curly-braced scope.
2166       Examples: 'var a=1', 'var xyz=""'
2167
2168   while
2169       while: introduces a while loop, or with "do", introduces a do-while loop.
2170       The body statements must be wrapped in curly braces.
2171
2172   ENV
2173       ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2174
2175   FILENAME
2176       FILENAME: evaluates to the name of the current file being processed.
2177
2178   FILENUM
2179       FILENUM: evaluates to the number of the current file being processed,
2180       starting with 1.
2181
2182   FNR
2183       FNR: evaluates to the number of the current record within the current file
2184       being processed, starting with 1. Resets at the start of each file.
2185
2186   IFS
2187       IFS: evaluates to the input field separator from the command line.
2188
2189   IPS
2190       IPS: evaluates to the input pair separator from the command line.
2191
2192   IRS
2193       IRS: evaluates to the input record separator from the command line,
2194       or to LF or CRLF from the input data if in autodetect mode (which is
2195       the default).
2196
2197   M_E
2198       M_E: the mathematical constant e.
2199
2200   M_PI
2201       M_PI: the mathematical constant pi.
2202
2203   NF
2204       NF: evaluates to the number of fields in the current record.
2205
2206   NR
2207       NR: evaluates to the number of the current record over all files
2208       being processed, starting with 1. Does not reset at the start of each file.
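
       How FILENAME, FILENUM, FNR, and NR relate to one another can be seen
       in a short Python model. The file names and record contents are
       made up for the example, and number_records is a hypothetical
       helper.

```python
def number_records(files):
    """Yield the four built-in values for every record.

    `files` is a list of (filename, records) pairs in command-line order.
    Hypothetical helper, not part of Miller.
    """
    nr = 0
    for filenum, (filename, records) in enumerate(files, start=1):
        for fnr, _record in enumerate(records, start=1):  # FNR restarts per file
            nr += 1  # NR keeps counting across files
            yield {"FILENAME": filename, "FILENUM": filenum, "FNR": fnr, "NR": nr}


files = [("a.dat", ["r1", "r2"]), ("b.dat", ["r3"])]
for values in number_records(files):
    print(values)
```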
2209
2210   OFS
2211       OFS: evaluates to the output field separator from the command line.
2212
2213   OPS
2214       OPS: evaluates to the output pair separator from the command line.
2215
2216   ORS
2217       ORS: evaluates to the output record separator from the command line,
2218       or to LF or CRLF from the input data if in autodetect mode (which is
2219       the default).
2220

AUTHOR

2222       Miller is written by John Kerl <kerl.john.r@gmail.com>.
2223
2224       This manual page has been composed from Miller's help output by Eric
2225       MSP Veith <eveith@veith-m.de>.
2226