1MILLER(1) MILLER(1)
2
3
4
6 Miller is like awk, sed, cut, join, and sort for name-indexed data such
7 as CSV and tabular JSON.
8
10 Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
11 more file names}
12
13
15 Miller operates on key-value-pair data while the familiar Unix tools
16 operate on integer-indexed fields: if the natural data structure for
17 the latter is the array, then Miller's natural data structure is the
18 insertion-ordered hash map. This encompasses a variety of data
19 formats, including but not limited to the familiar CSV, TSV, and JSON.
20 (Miller can handle positionally-indexed data as a special case.) This
21 manpage documents Miller v5.4.0.
22
24 COMMAND-LINE SYNTAX
25 mlr --csv cut -f hostname,uptime mydata.csv
26 mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
27 mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
28 grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
29 mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
30 mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
31 mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
32 mlr stats2 -a linreg-pca -f u,v -g shape data/*
33 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
34 mlr --from estimates.tbl put '
35 for (k,v in $*) {
36 if (is_numeric(v) && k =~ "^[t-z].*$") {
37 $sum += v; $count += 1
38 }
39 }
40 $mean = $sum / $count # no assignment if count unset'
41 mlr --from infile.dat put -f analyze.mlr
42 mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
43 mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
44 mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
45 mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
46
47 DATA FORMATS
48 DKVP: delimited key-value pairs (Miller default format)
49 +---------------------+
50 | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
51 | dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
52 +---------------------+
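
The DKVP rules shown in the box above can be sketched in a few lines of Python. This is an illustration only, not Miller's own code: the function name parse_dkvp and its parameters are made up for this example. A field with no pair separator is keyed by its 1-based position, as with "flint" in Record 2.

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator found: key the value by its position,
            # mirroring the "3" => "flint" case in the box above.
            record[str(position)] = field
    return record


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
```

Python dicts preserve insertion order, which makes them a reasonable stand-in for the insertion-ordered hash map described earlier.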
53
54 NIDX: implicitly numerically indexed (Unix-toolkit style)
55 +---------------------+
56 | the quick brown | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
57 | fox jumped | Record 2: "1" => "fox", "2" => "jumped"
58 +---------------------+
59
60 CSV/CSV-lite: comma-separated values with separate header line
61 +---------------------+
62 | apple,bat,cog |
63 | 1,2,3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
64 | 4,5,6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
65 +---------------------+
66
67 Tabular JSON: nested objects are supported, although arrays within them are not:
68 +---------------------+
69 | { |
70 | "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
71 | "bat": 2, |
72 | "cog": 3 |
73 | } |
74 | { |
75 | "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
76 | "egg": 7, |
77 | "flint": 8 |
78 | }, |
79 | "garlic": "" |
80 | } |
81 +---------------------+
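
The flattening of nested JSON keys shown for Record 2 above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name flatten is hypothetical, ":" is used as the joining separator as in the box, and values are left as they are rather than converted to strings.

```python
def flatten(record, sep=":", prefix=""):
    """Flatten nested maps into a single-level map with joined keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


nested = {"dish": {"egg": 7, "flint": 8}, "garlic": ""}
print(flatten(nested))
```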
82
83 PPRINT: pretty-printed tabular
84 +---------------------+
85 | apple bat cog |
86 | 1 2 3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
87 | 4 5 6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
88 +---------------------+
89
90 XTAB: pretty-printed transposed tabular
91 +---------------------+
92 | apple 1 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
93 | bat 2 |
94 | cog 3 |
95 | |
96 | dish 7 | Record 2: "dish" => "7", "egg" => "8"
97 | egg 8 |
98 +---------------------+
99
100 Markdown tabular (supported for output only):
101 +-----------------------+
102 | | apple | bat | cog | |
103 | | --- | --- | --- | |
104 | | 1 | 2 | 3 | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
105 | | 4 | 5 | 6 | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
106 +-----------------------+
107
109 In the following option flags, the version with "i" designates the
110 input stream, "o" the output stream, and the version without prefix
111 sets the option for both input and output stream. For example: --irs
112 sets the input record separator, --ors the output record separator, and
113 --rs sets both the input and output separator to the given value.
114
115 HELP OPTIONS
116 -h or --help Show this message.
117 --version Show the software version.
118 {verb name} --help Show verb-specific help.
119 --help-all-verbs Show help on all verbs.
120 -l or --list-all-verbs List only verb names.
121 -L List only verb names, one per line.
122 -f or --help-all-functions Show help on all built-in functions.
123 -F Show a bare listing of built-in functions by name.
124 -k or --help-all-keywords Show help on all keywords.
125 -K Show a bare listing of keywords by name.
126
127 VERB LIST
128 altkv bar bootstrap cat check clean-whitespace count-distinct count-similar
129 cut decimate fill-down filter fraction grep group-by group-like having-fields
130 head histogram join label least-frequent merge-fields most-frequent nest
131 nothing put regularize rename reorder repeat reshape sample sec2gmt
132 sec2gmtdate seqgen shuffle sort stats1 stats2 step tac tail tee top uniq
133 unsparsify
134
135 FUNCTION LIST
136 + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
137 > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
138 substr tolower toupper lstrip rstrip strip collapse_whitespace
139 clean_whitespace abs acos acosh asin asinh atan atan2 atanh cbrt ceil cos cosh
140 erf erfc exp expm1 floor invqnorm log log10 log1p logifit madd max mexp min
141 mmul msub pow qnorm round roundm sgn sin sinh sqrt tan tanh urand urand32
142 urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec localtime2sec hms2fsec
143 hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate sec2localtime sec2localtime
144 sec2localdate sec2hms strftime strftime_local strptime strptime_local systime
145 is_absent is_bool is_boolean is_empty is_empty_map is_float is_int is_map
146 is_nonempty_map is_not_empty is_not_map is_not_null is_null is_numeric
147 is_present is_string asserting_absent asserting_bool asserting_boolean
148 asserting_empty asserting_empty_map asserting_float asserting_int
149 asserting_map asserting_nonempty_map asserting_not_empty asserting_not_map
150 asserting_not_null asserting_null asserting_numeric asserting_present
151 asserting_string boolean float fmtnum hexfmt int string typeof depth haskey
152 joink joinkv joinv leafcount length mapdiff mapexcept mapselect mapsum splitkv
153 splitkvx splitnv splitnvx
154
155 Please use "mlr --help-function {function name}" for function-specific help.
156
157 I/O FORMATTING
158 --idkvp --odkvp --dkvp Delimited key-value pairs, e.g. "a=1,b=2"
159 (this is Miller's default format).
160
161 --inidx --onidx --nidx Implicitly-integer-indexed fields
162 (Unix-toolkit style).
163 -T Synonymous with "--nidx --fs tab".
164
165 --icsv --ocsv --csv Comma-separated value (or tab-separated
166 with --fs tab, etc.)
167
168 --itsv --otsv --tsv Keystroke-savers for "--icsv --ifs tab",
169 "--ocsv --ofs tab", "--csv --fs tab".
170
171 --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
172 with --fs tab, etc.). The 'lite' CSV does not handle
173 RFC-CSV double-quoting rules; is slightly faster;
174 and handles heterogeneity in the input stream via
175 empty newline followed by new header line. See also
176 http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
177
178 --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
179 "--ocsvlite --ofs tab", "--csvlite --fs tab".
180 -t Synonymous with --tsvlite.
181
182 --ipprint --opprint --pprint Pretty-printed tabular (produces no
183 output until all input is in).
184 --right Right-justifies all fields for PPRINT output.
185 --barred Prints a border around PPRINT output
186 (only available for output).
187
188 --omd Markdown-tabular (only available for output).
189
190 --ixtab --oxtab --xtab Pretty-printed vertical-tabular.
191 --xvright Right-justifies values for XTAB format.
192
193 --ijson --ojson --json JSON tabular: sequence or list of one-level
194 maps: {...}{...} or [{...},{...}].
195 --json-map-arrays-on-input JSON arrays are unmillerable. --json-map-arrays-on-input
196 --json-skip-arrays-on-input is the default: arrays are converted to integer-indexed
197 --json-fatal-arrays-on-input maps. The other two options cause them to be skipped, or
198 to be treated as errors. Please use the jq tool for full
199 JSON (pre)processing.
200 --jvstack Put one key-value pair per line for JSON
201 output.
202 --jlistwrap Wrap JSON output in outermost [ ].
203 --jknquoteint Do not quote non-string map keys in JSON output.
204 --jvquoteall Quote map values in JSON output, even if they're
205 numeric.
206 --jflatsep {string} Separator for flattening multi-level JSON keys,
207 e.g. '{"a":{"b":3}}' becomes a:b => 3 for
208 non-JSON formats. Defaults to :.
209
210 -p is a keystroke-saver for --nidx --fs space --repifs
211
212 --mmap --no-mmap --mmap-below {n} Use mmap for files whenever possible, never, or
213 for files less than n bytes in size. Default is for
214 files less than 4294967296 bytes in size.
215 'Whenever possible' means always except for when reading
216 standard input which is not mmappable. If you don't know
217 what this means, don't worry about it -- it's a minor
218 performance optimization.
219
220 Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
221 DKVP-formatted input and pretty-printed output.
222
223 COMMENTS IN DATA
224 --skip-comments Ignore commented lines (prefixed by "#")
225 within the input.
226 --skip-comments-with {string} Ignore commented lines within input, with
227 specified prefix.
228 --pass-comments Immediately print commented lines (prefixed by "#")
229 within the input.
230 --pass-comments-with {string} Immediately print commented lines within input, with
231 specified prefix.
232 Notes:
233 * Comments are only honored at the start of a line.
234 * In the absence of any of the above four options, comments are data like
235 any other text.
236 * When pass-comments is used, comment lines are written to standard output
237 immediately upon being read; they are not part of the record stream.
238 Results may be counterintuitive. A suggestion is to place comments at the
239 start of data files.
240
241 FORMAT-CONVERSION KEYSTROKE-SAVERS
242 As keystroke-savers for format-conversion you may use the following:
243 --c2t --c2d --c2n --c2j --c2x --c2p --c2m
244 --t2c --t2d --t2n --t2j --t2x --t2p --t2m
245 --d2c --d2t --d2n --d2j --d2x --d2p --d2m
246 --n2c --n2t --n2d --n2j --n2x --n2p --n2m
247 --j2c --j2t --j2d --j2n --j2x --j2p --j2m
248 --x2c --x2t --x2d --x2n --x2j --x2p --x2m
249 --p2c --p2t --p2d --p2n --p2j --p2x --p2m
250 The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
251 PPRINT, and markdown, respectively. Note that markdown format is available for
252 output only.
253
254 COMPRESSED I/O
255 --prepipe {command} This allows Miller to handle compressed inputs. You can do
256 without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
257 However, when multiple input files are present, between-file separations are
258 lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
259 specify an action to be taken on each input file. This pre-pipe command must
260 be able to read from standard input; it will be invoked with
261 {command} < {filename}.
262 Examples:
263 mlr --prepipe 'gunzip'
264 mlr --prepipe 'zcat -cf'
265 mlr --prepipe 'xz -cd'
266 mlr --prepipe cat
267 Note that this feature is quite general and is not limited to decompression
268 utilities. You can use it to apply per-file filters of your choice.
269 For output compression (or other) utilities, simply pipe the output:
270 mlr ... | {your compression command}
271
272 SEPARATORS
273 --rs --irs --ors Record separators, e.g. 'lf' or '\r\n'
274 --fs --ifs --ofs --repifs Field separators, e.g. comma
275 --ps --ips --ops Pair separators, e.g. equals sign
276
277 Notes about line endings:
278 * Default line endings (--irs and --ors) are "auto" which means autodetect from
279 the input file format, as long as the input file(s) have lines ending in either
280 LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
281 carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
282 * If both irs and ors are auto (which is the default) then LF input will lead to LF
283 output and CRLF input will lead to CRLF output, regardless of the platform you're
284 running on.
285 * The line-ending autodetector triggers on the first line ending detected in the input
286 stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
287 LF-terminated file then autodetected line endings will be CRLF.
288 * If you use --ors {something else} with (default or explicitly specified) --irs auto
289 then line endings are autodetected on input and set to what you specify on output.
290 * If you use --irs {something else} with (default or explicitly specified) --ors auto
291 then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
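
The "first line ending wins" rule described above can be illustrated with a short Python sketch. The function name detect_line_ending is made up for this example, and falling back to LF when the input contains no line ending at all is an assumption of the sketch, not something stated above.

```python
def detect_line_ending(data, default="\n"):
    """Return the first line ending found in data: CRLF or LF."""
    index = data.find("\n")
    if index < 0:
        # Assumption: fall back to a default when nothing is detected.
        return default
    if index > 0 and data[index - 1] == "\r":
        return "\r\n"
    return "\n"


# A CRLF-terminated line followed by an LF-terminated one: CRLF wins.
print(repr(detect_line_ending("a=1\r\nb=2\n")))
```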
292
293 Notes about all other separators:
294 * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
295 do key-value pairs appear juxtaposed.
296 * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
297 XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
298 a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
299 auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
300 * OFS must be single-character for PPRINT format. This is because it is used
301 with repetition for alignment; multi-character separators would make
302 alignment impossible.
303 * OPS may be multi-character for XTAB format, in which case alignment is
304 disabled.
305 * TSV is simply CSV using tab as field separator ("--fs tab").
306 * FS/PS are ignored for markdown format; RS is used.
307 * All FS and PS options are ignored for JSON format, since they are not relevant
308 to the JSON format.
309 * You can specify separators in any of the following ways, shown by example:
310 - Type them out, quoting as necessary for shell escapes, e.g.
311 "--fs '|' --ips :"
312 - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
313 - To avoid backslashing, you can use any of the following names:
314 cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
315 * Default separators by format:
316 File format RS FS PS
317 gen N/A (N/A) (N/A)
318 dkvp auto , =
319 json auto (N/A) (N/A)
320 nidx auto space (N/A)
321 csv auto , (N/A)
322 csvlite auto , (N/A)
323 markdown auto (N/A) (N/A)
324 pprint auto space (N/A)
325 xtab (N/A) auto space
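
The separator names listed above can be thought of as a lookup table. The Python sketch below is illustrative only: the table and the function resolve_separator are hypothetical, and the doubled names (crcr, lflf, crlfcrlf) are assumed to mean the doubled character sequences their names suggest.

```python
# Assumed meanings of the separator names, inferred from the names themselves.
SEPARATOR_NAMES = {
    "cr": "\r", "crcr": "\r\r",
    "newline": "\n", "lf": "\n", "lflf": "\n\n",
    "crlf": "\r\n", "crlfcrlf": "\r\n\r\n",
    "tab": "\t", "space": " ", "comma": ",", "pipe": "|",
    "slash": "/", "colon": ":", "semicolon": ";", "equals": "=",
}


def resolve_separator(spec):
    """Map a separator name to its characters; pass other text through."""
    return SEPARATOR_NAMES.get(spec, spec)


print(repr(resolve_separator("tab")))
print(repr(resolve_separator("|")))
```

C-style escape sequences such as '\r\n' are not decoded by this sketch.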
326
327 CSV-SPECIFIC OPTIONS
328 --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
329 of input files. Tip: combine with "label" to recreate
330 missing headers.
331 --headerless-csv-output Print only CSV data lines.
332
333 DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
334 --quote-all Wrap all fields in double quotes
335 --quote-none Do not wrap any fields in double quotes, even if they have
336 OFS or ORS in them
337 --quote-minimal Wrap fields in double quotes only if they have OFS or ORS
338 in them (default)
339 --quote-numeric Wrap fields in double quotes only if they have numbers
340 in them
341 --quote-original Wrap fields in double quotes if and only if they were
342 quoted on input. This isn't sticky for computed fields:
343 e.g. if fields a and b were quoted on input and you do
344 "put '$c = $a . $b'" then field c won't inherit a or b's
345 was-quoted-on-input flag.
346
347 NUMERICAL FORMATTING
348 --ofmt {format} E.g. %.18lf, %.0lf. Please use sprintf-style codes for
349 double-precision. Applies to verbs which compute new
350 values, e.g. put, stats1, stats2. See also the fmtnum
351 function within mlr put (mlr --help-all-functions).
352 Defaults to %lf.
353
354 OTHER OPTIONS
355 --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
356 urand()/urandint()/urand32().
357 --nr-progress-mod {m}, with m a positive integer: print filename and record
358 count to stderr every m input records.
359 --from {filename} Use this to specify an input file before the verb(s),
360 rather than after. May be used more than once. Example:
361 "mlr --from a.dat --from b.dat cat" is the same as
362 "mlr cat a.dat b.dat".
363 -n Process no input files, nor standard input either. Useful
364 for mlr put with begin/end statements only. (Same as --from
365 /dev/null.) Also useful in "mlr -n put -v '...'" for
366 analyzing abstract syntax trees (if that's your thing).
367 -I Process files in-place. For each file name on the command
368 line, output is written to a temp file in the same
369 directory, which is then renamed over the original. Each
370 file is processed in isolation: if the output format is
371 CSV, CSV headers will be present in each output file;
372 statistics are only over each file's own records; and so on.
373
374 THEN-CHAINING
375 Output of one verb may be chained as input to another using "then", e.g.
376 mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
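
Conceptually, then-chaining is function composition over a stream of records. The Python sketch below models this with two stand-in verbs; chain, sort_by, and head are hypothetical names for this example and say nothing about how Miller is implemented internally.

```python
from itertools import islice


def chain(*verbs):
    """Compose verbs so that each one consumes the previous one's output."""
    def run(records):
        for verb in verbs:
            records = verb(records)
        return records
    return run


def sort_by(field):
    # Stand-in for a sorting verb: must read all input before emitting.
    return lambda records: iter(sorted(records, key=lambda r: r[field]))


def head(n):
    # Stand-in for a head verb: passes through the first n records.
    return lambda records: islice(records, n)


records = [
    {"color": "red", "u": 3},
    {"color": "blue", "u": 1},
    {"color": "green", "u": 2},
]
pipeline = chain(sort_by("color"), head(2))
result = list(pipeline(records))
print(result)
```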
377
378 AUXILIARY COMMANDS
379 Miller has a few otherwise-standalone executables packaged within it.
380 They do not participate in any other parts of Miller.
381 Available subcommands:
382 aux-list
383 lecat
384 termcvt
385 hex
386 unhex
387 netbsd-strptime
388 For more information, please invoke mlr {subcommand} --help
389
391 altkv
392 Usage: mlr altkv [no options]
393 Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
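
The pairing rule above can be sketched in Python as follows. The function name altkv is borrowed from the verb for readability, but the code is only an illustration; the description above does not say what happens with an odd number of values, so this sketch simply refuses that case.

```python
def altkv(values):
    """Pair alternating values as key, value, key, value, ..."""
    if len(values) % 2 != 0:
        # The odd-length case is not described above; left out on purpose.
        raise ValueError("this sketch handles an even number of values only")
    return dict(zip(values[0::2], values[1::2]))


print(altkv(["a", "b", "c", "d", "e", "f"]))
```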
394
395 bar
396 Usage: mlr bar [options]
397 Replaces a numeric field with a number of asterisks, allowing for cheesy
398 bar plots. These align best with --opprint or --oxtab output format.
399 Options:
400 -f {a,b,c} Field names to convert to bars.
401 -c {character} Fill character: default '*'.
402 -x {character} Out-of-bounds character: default '#'.
403 -b {character} Blank character: default '.'.
404 --lo {lo} Lower-limit value for min-width bar: default '0.000000'.
405 --hi {hi} Upper-limit value for max-width bar: default '100.000000'.
406 -w {n} Bar-field width: default '40'.
407 --auto Automatically computes limits, ignoring --lo and --hi.
408 Holds all records in memory before producing any output.
409
410 bootstrap
411 Usage: mlr bootstrap [options]
412 Emits an n-sample, with replacement, of the input records.
413 Options:
414 -n {number} Number of samples to output. Defaults to number of input records.
415 Must be non-negative.
416 See also mlr sample and mlr shuffle.
417
418 cat
419 Usage: mlr cat [options]
420 Passes input records directly to output. Most useful for format conversion.
421 Options:
422 -n Prepend field "n" to each record with record-counter starting at 1
423 -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
424 keyed by specified field name(s).
425 -N {name} Prepend field {name} to each record with record-counter starting at 1
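
The interaction of -n/-N with -g can be sketched in Python as one counter per distinct combination of group-by values. The function cat_numbered is a hypothetical name, and treating a record that lacks a group-by field as belonging to a group keyed by None is an assumption of this sketch.

```python
from collections import defaultdict


def cat_numbered(records, name="n", group_by=None):
    """Prepend a 1-up record counter, optionally kept per group."""
    counters = defaultdict(int)
    for record in records:
        # Without group_by, every record shares the single key ().
        key = tuple(record.get(f) for f in group_by) if group_by else ()
        counters[key] += 1
        # Building a new dict with the counter first prepends the field.
        yield {name: counters[key], **record}


records = [{"a": "x"}, {"a": "y"}, {"a": "x"}]
for out in cat_numbered(records, group_by=["a"]):
    print(out)
```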
426
427 check
428 Usage: mlr check
429 Consumes records without printing any output.
430 Useful for doing a well-formatted check on input data.
431
432 clean-whitespace
433 Usage: mlr clean-whitespace [options]
434 For each record, for each field in the record, whitespace-cleans the keys and
435 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
436 and replacing multiple whitespace with singles. For finer-grained control,
437 please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
438 and clean_whitespace.
439
440 Options:
441 -k|--keys-only Do not touch values.
442 -v|--values-only Do not touch keys.
443 It is an error to specify -k as well as -v.
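
Whitespace-cleaning as described above amounts to a strip followed by a collapse. The Python sketch below is an illustration; the function names are made up, and replacing each run of whitespace with a single space character is this sketch's reading of "replacing multiple whitespace with singles".

```python
import re


def clean(text):
    """Strip leading/trailing whitespace, then collapse inner runs."""
    return re.sub(r"\s+", " ", text.strip())


def clean_whitespace(record, keys=True, values=True):
    """Clean keys and/or values; keys=False models -v, values=False models -k."""
    return {
        (clean(k) if keys else k): (clean(v) if values else v)
        for k, v in record.items()
    }


print(clean_whitespace({"  a   b ": "  x    y  "}))
```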
444
445 count-distinct
446 Usage: mlr count-distinct [options]
447 Prints number of records having distinct values for specified field names.
448 Same as uniq -c.
449
450 Options:
451 -f {a,b,c} Field names for distinct count.
452 -n Show only the number of distinct values. Not compatible with -u.
453 -o {name} Field name for output count. Default "count".
454 Ignored with -u.
455 -u Do unlashed counts for multiple field names. With -f a,b and
456 without -u, computes counts for distinct combinations of a
457 and b field values. With -f a,b and with -u, computes counts
458 for distinct a field values and counts for distinct b field
459 values separately.
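
The difference between the default ("lashed") counts and -u can be sketched in Python with counters. The function count_distinct here is illustrative only; skipping records that lack a requested field is an assumption of the sketch.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct value combinations, or each field separately with -u."""
    if unlashed:
        # One independent counter per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # One counter keyed by the combination of all field values.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 4, "b": 2}]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```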
460
461 count-similar
462 Usage: mlr count-similar [options]
463 Ingests all records, then emits each record augmented by a count of
464 the number of other records having the same group-by field values.
465 Options:
466 -g {d,e,f} Group-by-field names for counts.
467 -o {name} Field name for output count. Default "count".
468
469 cut
470 Usage: mlr cut [options]
471 Passes through input records with specified fields included/excluded.
472 -f {a,b,c} Field names to include for cut.
473 -o Retain fields in the order specified here in the argument list.
474 Default is to retain them in the order found in the input data.
475 -x|--complement Exclude, rather than include, field names specified by -f.
476 -r Treat field names as regular expressions. "ab", "a.*b" will
477 match any field name containing the substring "ab" or matching
478 "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
479 be used. The -o flag is ignored when -r is present.
480 Examples:
481 mlr cut -f hostname,status
482 mlr cut -x -f hostname,status
483 mlr cut -r -f '^status$,sda[0-9]'
484 mlr cut -r -f '^status$,"sda[0-9]"'
485 mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
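
The effect of -f, -o, and -x on field order can be sketched in Python. This is a simplified illustration (the -r regex mode is left out), and the function cut with its keyword arguments is a hypothetical stand-in for the flags.

```python
def cut(record, fields, complement=False, ordered=False):
    """Include (or, with complement, exclude) the named fields."""
    if complement:
        # Models -x: keep everything except the named fields.
        return {k: v for k, v in record.items() if k not in fields}
    if ordered:
        # Models -o: output order follows the argument list.
        return {k: record[k] for k in fields if k in record}
    # Default: output order follows the input record.
    return {k: v for k, v in record.items() if k in fields}


record = {"a": 1, "b": 2, "c": 3}
print(cut(record, ["c", "a"]))
print(cut(record, ["c", "a"], ordered=True))
print(cut(record, ["c", "a"], complement=True))
```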
486
487 decimate
488 Usage: mlr decimate [options]
489 -n {count} Decimation factor; default 10
490 -b Decimate by printing first of every n.
491 -e Decimate by printing last of every n (default).
492 -g {a,b,c} Optional group-by-field names for decimate counts
493 Passes through one of every n records, optionally by category.
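
The -b and -e behaviors can be sketched in Python as a counter that emits once per n records. The function decimate below is an illustration without -g support; emitting nothing for a trailing group of fewer than n records is an assumption of this sketch.

```python
def decimate(records, n=10, first=False):
    """Emit one record per n inputs: the first (-b) or the last (-e)."""
    count = 0
    remembered = None
    for record in records:
        count += 1
        if not first or count == 1:
            # With first=True only the group's first record is remembered;
            # otherwise the most recent record replaces it each time.
            remembered = record
        if count == n:
            yield remembered
            count = 0


print(list(decimate(range(1, 11), n=5)))
print(list(decimate(range(1, 11), n=5, first=True)))
```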
494
495 fill-down
496 Usage: mlr fill-down [options]
497 -f {a,b,c} Field names for fill-down
498 -a|--only-if-absent Fill down only if the field is absent from the record
499 If a given record has a missing value for a given field, fill that from
500 the corresponding value from a previous record, if any.
501 By default, a 'missing' field either is absent, or has the empty-string value.
502 With -a, a field is 'missing' only if it is absent.
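
The two notions of 'missing' described above can be sketched in Python. The function fill_down is illustrative only; appending a filled-in absent field at the end of the record is an assumption of the sketch.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing field values from the most recent record that had one."""
    last = {}
    for record in records:
        record = dict(record)  # do not modify the caller's record
        for f in fields:
            if only_if_absent:
                missing = f not in record
            else:
                missing = f not in record or record[f] == ""
            if missing:
                if f in last:
                    record[f] = last[f]
            else:
                last[f] = record[f]
        yield record


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(list(fill_down(records, ["a"])))
print(list(fill_down(records, ["a"], only_if_absent=True)))
```

In the second call the empty string in the second record counts as a present value, so it is what gets filled down into the third record.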
503
504 filter
505 Usage: mlr filter [options] {expression}
506 Prints records for which {expression} evaluates to true.
507 If there are multiple semicolon-delimited expressions, all of them are
508 evaluated and the last one is used as the filter criterion.
509
510 Conversion options:
511 -S: Keeps field values as strings with no type inference to int or float.
512 -F: Keeps field values as strings or floats with no inference to int.
513 All field values are type-inferred to int/float/string unless this behavior is
514 suppressed with -S or -F.
515
516 Output/formatting options:
517 --oflatsep {string}: Separator to use when flattening multi-level @-variables
518 to output records for emit. Default ":".
519 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
520 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
521 Any of the output-format command-line flags (see mlr -h). Example: using
522 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
523 the input is CSV, the output is pretty-print tabular, but the tee-file output
524 is written in JSON format.
525 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
526 record.
527
528 Expression-specification options:
529 -f {filename}: the DSL expression is taken from the specified file rather
530 than from the command line. Outer single quotes wrapping the expression
531 should not be placed in the file. If -f is specified more than once,
532 all input files specified using -f are concatenated to produce the expression.
533 (For example, you can define functions in one file and call them from another.)
534 -e {expression}: You can use this after -f to add an expression. Example use
535 case: define functions/subroutines in a file you specify with -f, then call
536 them with an expression you specify with -e.
537 (If you mix -e and -f then the expressions are evaluated in the order encountered.
538 Since the expression pieces are simply concatenated, please be sure to use intervening
539 semicolons to separate expressions.)
540
541 Tracing options:
542 -v: Prints the expression's AST (abstract syntax tree), which gives
543 full transparency on the precedence and associativity rules of
544 Miller's grammar, to stdout.
545 -a: Prints a low-level stack-allocation trace to stdout.
546 -t: Prints a low-level parser trace to stderr.
547 -T: Prints every statement to stderr as it is executed.
548
549 Other options:
550 -x: Prints records for which {expression} evaluates to false.
551
552 Please use a dollar sign for field names and double-quotes for string
553 literals. If field names have special characters such as "." then you might
554 use braces, e.g. '${field.name}'. Miller built-in variables are
555 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
556 variables. The environment-variable name may be an expression, e.g. a field
557 value.
558
559 Use # to comment to end of line.
560
561 Examples:
562 mlr filter 'log10($count) > 4.0'
563 mlr filter 'FNR == 2' (second record in each file)
564 mlr filter 'urand() < 0.001' (subsampling)
565 mlr filter '$color != "blue" && $value > 4.2'
566 mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
567 mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
568 mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
569 mlr filter '
570 NR == 1 ||
571 #NR == 2 ||
572 NR == 3
573 '
574
575 Please see http://johnkerl.org/miller/doc/reference.html for more information
576 including function list. Or "mlr -f". Please see also "mlr grep" which is
577 useful when you don't yet know which field name(s) you're looking for.
578
579 fraction
580 Usage: mlr fraction [options]
581 For each record's value in specified fields, computes the ratio of that
582 value to the sum of values in that field over all input records.
583 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
584 x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
585
586 Note: this is internally a two-pass algorithm: on the first pass it retains
587 input records and accumulates sums; on the second pass it computes quotients
588 and emits output records. This means it produces no output until all input is read.
589
590 Options:
591 -f {a,b,c} Field name(s) for fraction calculation
592 -g {d,e,f} Optional group-by-field name(s) for fraction counts
593 -p Produce percents [0..100], not fractions [0..1]. Output field names
594 end with "_percent" rather than "_fraction"
595 -c Produce cumulative distributions, i.e. running sums: each output
596 value folds in the sum of the previous for the specified group
597 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
598 x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
599 x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
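
The two-pass algorithm described above can be sketched in Python. The function fraction is an illustration for a single field without -g; the output name used when -p and -c are combined ("_cumulative_percent") is an assumption, since only the other three names appear above.

```python
def fraction(records, field, percent=False, cumulative=False):
    """Two-pass fraction: retain records and sum, then emit quotients."""
    retained = list(records)  # pass 1: hold all records in memory
    total = sum(r[field] for r in retained if field in r)
    multiplier = 100 if percent else 1
    # Assumed naming when both options are set: "_cumulative_percent".
    name = field + ("_cumulative" if cumulative else "")
    name += "_percent" if percent else "_fraction"
    running = 0
    for r in retained:  # pass 2: emit quotients
        out = dict(r)
        if field in r:
            running += r[field]
            numerator = running if cumulative else r[field]
            out[name] = multiplier * numerator / total
        yield out


records = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
for out in fraction(records, "x"):
    print(out)
```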
600
601 grep
602 Usage: mlr grep [options] {regular expression}
603 Passes through records which match {regex}.
604 Options:
605 -i Use case-insensitive search.
606 -v Invert: pass through records which do not match the regex.
607 Note that "mlr filter" is more powerful, but requires you to know field names.
608 By contrast, "mlr grep" allows you to regex-match the entire record. It does
609 this by formatting each record in memory as DKVP, using command-line-specified
610 ORS/OFS/OPS, and matching the resulting line against the regex specified
611 here. In particular, the regex is not applied to the input stream: if you
612 have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
613 be matched, not against either of these lines, but against the DKVP line
614 "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
615 and this command is intended to be merely a keystroke-saver. To get all the
616 features of system grep, you can do
617 "mlr --odkvp ... | grep ... | mlr --idkvp ..."
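
The point that the regex is matched against a DKVP rendering of each record, not against the input text, can be sketched in Python. The names format_dkvp and grep are illustrative, and Python's re syntax stands in for whatever regex dialect Miller uses.

```python
import re


def format_dkvp(record, ofs=",", ops="="):
    """Render a record as a single DKVP line."""
    return ofs.join(f"{k}{ops}{v}" for k, v in record.items())


def grep(records, pattern, ignore_case=False, invert=False):
    """Pass through records whose DKVP rendering matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for record in records:
        matched = regex.search(format_dkvp(record)) is not None
        if matched != invert:
            yield record


# CSV header "x,y,z" with data line "1,2,3" becomes this record:
records = [{"x": "1", "y": "2", "z": "3"}]
print(list(grep(records, "y=2")))   # matches the line "x=1,y=2,z=3"
print(list(grep(records, "x,y")))   # the header text is never matched
```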
618
619 group-by
620 Usage: mlr group-by {comma-separated field names}
621 Outputs records in batches having identical values at specified field names.
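
Batching by field values can be sketched in Python with an insertion-ordered map from value combinations to lists of records. The function group_by is illustrative; dropping records that lack one of the named fields, and emitting batches in order of first appearance, are assumptions of the sketch.

```python
def group_by(records, fields):
    """Emit records batched by their values at the named fields."""
    groups = {}
    for record in records:
        if all(f in record for f in fields):
            key = tuple(record[f] for f in fields)
            groups.setdefault(key, []).append(record)
    # Batches come out in order of each group's first appearance.
    for batch in groups.values():
        yield from batch


records = [{"a": "p", "i": 1}, {"a": "q", "i": 2}, {"a": "p", "i": 3}]
print(list(group_by(records, ["a"])))
```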
622
623 group-like
624 Usage: mlr group-like
625 Outputs records in batches having identical field names.
626
627 having-fields
628 Usage: mlr having-fields [options]
629 Conditionally passes through records depending on each record's field names.
630 Options:
631 --at-least {comma-separated names}
632 --which-are {comma-separated names}
633 --at-most {comma-separated names}
634 --all-matching {regular expression}
635 --any-matching {regular expression}
636 --none-matching {regular expression}
637 Examples:
638 mlr having-fields --which-are amount,status,owner
639 mlr having-fields --any-matching 'sda[0-9]'
640 mlr having-fields --any-matching '"sda[0-9]"'
641 mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
642
643 head
644 Usage: mlr head [options]
645 -n {count} Head count to print; default 10
646 -g {a,b,c} Optional group-by-field names for head counts
647 Passes through the first n records, optionally by category.
648 Without -g, ceases consuming more input (i.e. is fast) when n
649 records have been read.
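
The reason head is fast without -g can be seen in a Python sketch: the ungrouped case can stop reading after n records, while the grouped case must scan all input because a new group may appear at any point. The function head below is illustrative; skipping records that lack a group-by field is an assumption.

```python
from collections import defaultdict
from itertools import islice


def head(records, n=10, group_by=None):
    """Pass through the first n records, optionally per group."""
    if not group_by:
        # islice stops pulling from the input once n records are taken.
        yield from islice(records, n)
        return
    seen = defaultdict(int)
    for record in records:
        if not all(f in record for f in group_by):
            continue
        key = tuple(record[f] for f in group_by)
        seen[key] += 1
        if seen[key] <= n:
            yield record


source = iter(range(100))
print(list(head(source, n=3)))
print(next(source))  # the input was not consumed past the third record
```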
650
651 histogram
652 Usage: mlr histogram [options]
653 -f {a,b,c} Value-field names for histogram counts
654 --lo {lo} Histogram low value
655 --hi {hi} Histogram high value
656 --nbins {n} Number of histogram bins
657 --auto Automatically computes limits, ignoring --lo and --hi.
658 Holds all values in memory before producing any output.
659 -o {prefix} Prefix for output field name. Default: no prefix.
660 Just a histogram. Input values < lo or > hi are not counted.
661
662 join
663 Usage: mlr join [options]
664 Joins records from specified left file name with records from all file names
665 at the end of the Miller argument list.
666 Functionality is essentially the same as the system "join" command, but for
667 record streams.
668 Options:
669 -f {left file name}
670 -j {a,b,c} Comma-separated join-field names for output
671 -l {a,b,c} Comma-separated join-field names for left input file;
672 defaults to -j values if omitted.
673 -r {a,b,c} Comma-separated join-field names for right input file(s);
674 defaults to -j values if omitted.
675 --lp {text} Additional prefix for non-join output field names from
676 the left file
677 --rp {text} Additional prefix for non-join output field names from
678 the right file(s)
679 --np Do not emit paired records
680 --ul Emit unpaired records from the left file
681 --ur Emit unpaired records from the right file(s)
682 -s|--sorted-input Require sorted input: records must be sorted
683 lexically by their join-field values, else not all records will
684 be paired. The only likely use case for this is with a left
685 file which is too big to fit into system memory otherwise.
686 -u Enable unsorted input. (This is the default even without -u.)
687 In this case, the entire left file will be loaded into memory.
688 --prepipe {command} As in main input options; see mlr --help for details.
689 If you wish to use a prepipe command for the main input as well
690 as here, it must be specified there as well as here.
691 File-format options default to those for the right file names on the Miller
692 argument list, but may be overridden for the left file as follows. Please see
693 the main "mlr --help" for more information on syntax for these arguments.
694 -i {one of csv,dkvp,nidx,pprint,xtab}
695 --irs {record-separator character}
696 --ifs {field-separator character}
697 --ips {pair-separator character}
698 --repifs
699 --repips
700 --mmap
701 --no-mmap
702 Please use "mlr --usage-separator-options" for information on specifying separators.
703 Please see http://johnkerl.org/miller/doc/reference.html for more information
704 including examples.
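As a rough illustration of the default unsorted mode described above, the following Python sketch (not Miller's implementation) holds the left records in memory, keyed by a single join field, and streams the right records past them. The function name, its parameters, and the field order of merged records are assumptions made for this example.

```python
from collections import defaultdict

def unsorted_join(left, right, key, ul=False, ur=False):
    """Yield joined records; `left` is held in memory, `right` is streamed."""
    buckets = defaultdict(list)              # join value -> left records
    for lrec in left:
        if key in lrec:                      # left records lacking the key are ignored here
            buckets[lrec[key]].append(lrec)
    paired_values = set()
    for rrec in right:
        value = rrec.get(key)
        if value in buckets:
            paired_values.add(value)
            for lrec in buckets[value]:
                merged = {key: value}        # assumed order: join field, left fields, right fields
                merged.update(lrec)
                merged.update(rrec)
                yield merged
        elif ur:                             # analogous to --ur
            yield dict(rrec)
    if ul:                                   # analogous to --ul: only known at end of input
        for value, lrecs in buckets.items():
            if value not in paired_values:
                for lrec in lrecs:
                    yield dict(lrec)
```

Because the whole left input sits in the `buckets` dictionary, memory use grows with the left file. That is the trade-off the -s option is meant to avoid.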
705
706 label
707 Usage: mlr label {new1,new2,new3,...}
708 Given n comma-separated names, renames the first n fields of each record to
709 have the respective name. (Fields past the nth are left with their original
710 names.) Particularly useful with --inidx or --implicit-csv-header, to give
711 useful names to otherwise integer-indexed fields.
712 Examples:
713 "echo 'a b c d' | mlr --inidx --odkvp cat" gives "1=a,2=b,3=c,4=d"
714 "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
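The renaming rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, with a hypothetical function name; it does not model what happens when a new name collides with an existing field.

```python
def label(record, new_names):
    """Rename the first len(new_names) fields; later fields keep their names."""
    out = {}
    for position, (name, value) in enumerate(record.items()):
        new_name = new_names[position] if position < len(new_names) else name
        out[new_name] = value
    return out
```

Applied to the record 1=a,2=b,3=c,4=d with the names s,t, this gives s=a,t=b,3=c,4=d, matching the second example above.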
715
716 least-frequent
717 Usage: mlr least-frequent [options]
718 Shows the least frequently occurring distinct values for specified field names.
719 The first entry is the statistical anti-mode; the remaining are runners-up.
720 Options:
721 -f {one or more comma-separated field names}. Required flag.
722 -n {count}. Optional flag defaulting to 10.
723 -b Suppress counts; show only field values.
724 -o {name} Field name for output count. Default "count".
725 See also "mlr most-frequent".
726
727 merge-fields
728 Usage: mlr merge-fields [options]
729 Computes univariate statistics for each input record, accumulated across
730 specified fields.
731 Options:
732 -a {sum,count,...} Names of accumulators. One or more of:
733 count Count instances of fields
734 mode Find most-frequently-occurring values for fields; first-found wins tie
735 antimode Find least-frequently-occurring values for fields; first-found wins tie
736 sum Compute sums of specified fields
737 mean Compute averages (sample means) of specified fields
738 stddev Compute sample standard deviation of specified fields
739 var Compute sample variance of specified fields
740 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
741 skewness Compute sample skewness of specified fields
742 kurtosis Compute sample kurtosis of specified fields
743 min Compute minimum values of specified fields
744 max Compute maximum values of specified fields
745 -f {a,b,c} Value-field names on which to compute statistics. Requires -o.
746 -r {a,b,c} Regular expressions for value-field names on which to compute
747 statistics. Requires -o.
748 -c {a,b,c} Substrings for collapse mode. All fields which have the same names
749 after removing substrings will be accumulated together. Please see
750 examples below.
751 -i Use interpolated percentiles, like R's type=7; default like type=1.
752 Not sensical for string-valued fields.
753 -o {name} Output field basename for -f/-r.
754 -k Keep the input fields which contributed to the output statistics;
755 the default is to omit them.
756 -F Computes integerable things (e.g. count) in floating point.
757
758 String-valued data make sense unless arithmetic on them is required,
759 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
760 numbers are less than strings.
761
762 Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
763 Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
764 produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
765 summed over.
766 Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
767 produces "bar_sum=15,bar_count=4" since all four fields are summed over.
768 Example: mlr merge-fields -a sum,count -c in_,out_
769 produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
770 since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
771 "b_y", and "b_out_x" collapses to "b_x".
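Collapse mode is the least obvious of the three, so here is a small Python sketch of it for the sum and count accumulators only. The function name and the pass-through of fields that match no substring are assumptions of this example, not a description of Miller's internals.

```python
def merge_fields_collapse(record, substrings):
    """Sum and count fields whose names agree after removing a substring."""
    totals = {}        # collapsed name -> [sum, count], in first-seen order
    passthrough = {}   # fields containing none of the substrings
    for name, value in record.items():
        hit = next((s for s in substrings if s in name), None)
        if hit is None:
            passthrough[name] = value
            continue
        short = name.replace(hit, "", 1)     # "a_in_x" -> "a_x"
        acc = totals.setdefault(short, [0, 0])
        acc[0] += int(value)
        acc[1] += 1
    out = dict(passthrough)
    for short, (total, count) in totals.items():
        out[short + "_sum"] = total
        out[short + "_count"] = count
    return out
```

With the example input data and the substrings in_ and out_, the sketch reproduces the a_x, b_y, and b_x output fields listed above.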
772
773 most-frequent
774 Usage: mlr most-frequent [options]
775 Shows the most frequently occurring distinct values for specified field names.
776 The first entry is the statistical mode; the remaining are runners-up.
777 Options:
778 -f {one or more comma-separated field names}. Required flag.
779 -n {count}. Optional flag defaulting to 10.
780 -b Suppress counts; show only field values.
781 -o {name} Field name for output count. Default "count".
782 See also "mlr least-frequent".
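Both most-frequent and least-frequent amount to counting distinct values and sorting by count. The Python sketch below shows that idea for a single field. The function name and the `descending` parameter are hypothetical, and the order of tied values is not claimed to match Miller's.

```python
from collections import Counter

def most_frequent(records, field, n=10, descending=True):
    """Return up to n distinct values of `field` with their counts."""
    counts = Counter(rec[field] for rec in records if field in rec)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=descending)
    return [{field: value, "count": count} for value, count in ranked[:n]]
```

Passing `descending=False` gives the least-frequent ordering instead.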
783
784 nest
785 Usage: mlr nest [options]
786 Explodes specified field values into separate fields/records, or reverses this.
787 Options:
788 --explode,--implode One is required.
789 --values,--pairs One is required.
790 --across-records,--across-fields One is required.
791 -f {field name} Required.
792 --nested-fs {string} Defaults to ";". Field separator for nested values.
793 --nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
794 --evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
795 Please use "mlr --usage-separator-options" for information on specifying separators.
796
797 Examples:
798
799 mlr nest --explode --values --across-records -f x
800 with input record "x=a;b;c,y=d" produces output records
801 "x=a,y=d"
802 "x=b,y=d"
803 "x=c,y=d"
804 Use --implode to do the reverse.
805
806 mlr nest --explode --values --across-fields -f x
807 with input record "x=a;b;c,y=d" produces output record
808 "x_1=a,x_2=b,x_3=c,y=d"
809 Use --implode to do the reverse.
810
811 mlr nest --explode --pairs --across-records -f x
812 with input record "x=a:1;b:2;c:3,y=d" produces output records
813 "a=1,y=d"
814 "b=2,y=d"
815 "c=3,y=d"
816
817 mlr nest --explode --pairs --across-fields -f x
818 with input record "x=a:1;b:2;c:3,y=d" produces output record
819 "a=1,b=2,c=3,y=d"
820
821 Notes:
822 * With --pairs, --implode doesn't make sense since the original field name has
823 been lost.
824 * The combination "--implode --values --across-records" is non-streaming:
825 no output records are produced until all input records have been read. In
826 particular, this means it won't work in tail -f contexts. But all other flag
827 combinations result in streaming (tail -f friendly) data processing.
828 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
829 e.g. by default the former is semicolon and the latter is comma.
830 See also mlr reshape.
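The two --explode --values variants differ only in where the pieces go. The Python sketch below is a simplified model of the examples above, with hypothetical function names and the default ";" separator.

```python
def explode_values_across_records(record, field, sep=";"):
    """One output record per nested value; other fields are copied."""
    if field not in record:
        return [dict(record)]
    out = []
    for piece in record[field].split(sep):
        new_record = dict(record)
        new_record[field] = piece
        out.append(new_record)
    return out

def explode_values_across_fields(record, field, sep=";"):
    """One output record; nested values become field_1, field_2, ..."""
    out = {}
    for name, value in record.items():
        if name == field:
            for i, piece in enumerate(value.split(sep), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[name] = value
    return out
```

Both functions need only the current record, which is consistent with the note that every combination except "--implode --values --across-records" is streaming.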
831
832 nothing
833 Usage: mlr nothing [options]
834 Drops all input records. Useful for testing, or after tee/print/etc. have
835 produced other output.
836
837 put
838 Usage: mlr put [options] {expression}
839 Adds/updates specified field(s). Expressions are semicolon-separated and must
840 either be assignments, or evaluate to boolean. Booleans with following
841 statements in curly braces control whether those statements are executed;
842 booleans without following curly braces do nothing except side effects (e.g.
843 regex-captures into \1, \2, etc.).
844
845 Conversion options:
846 -S: Keeps field values as strings with no type inference to int or float.
847 -F: Keeps field values as strings or floats with no inference to int.
848 All field values are type-inferred to int/float/string unless this behavior is
849 suppressed with -S or -F.
850
851 Output/formatting options:
852 --oflatsep {string}: Separator to use when flattening multi-level @-variables
853 to output records for emit. Default ":".
854 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
855 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
856 Any of the output-format command-line flags (see mlr -h). Example: using
857 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
858 the input is CSV, the output is pretty-print tabular, but the tee-file output
859 is written in JSON format.
860 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
861 record.
862
863 Expression-specification options:
864 -f {filename}: the DSL expression is taken from the specified file rather
865 than from the command line. Outer single quotes wrapping the expression
866 should not be placed in the file. If -f is specified more than once,
867 all input files specified using -f are concatenated to produce the expression.
868 (For example, you can define functions in one file and call them from another.)
869 -e {expression}: You can use this after -f to add an expression. Example use
870 case: define functions/subroutines in a file you specify with -f, then call
871 them with an expression you specify with -e.
872 (If you mix -e and -f then the expressions are evaluated in the order encountered.
873 Since the expression pieces are simply concatenated, please be sure to use intervening
874 semicolons to separate expressions.)
875
876 Tracing options:
877 -v: Prints the expression's AST (abstract syntax tree), which gives
878 full transparency on the precedence and associativity rules of
879 Miller's grammar, to stdout.
880 -a: Prints a low-level stack-allocation trace to stdout.
881 -t: Prints a low-level parser trace to stderr.
882 -T: Prints every statement to stderr as it is executed.
883
884 Other options:
885 -q: Does not include the modified record in the output stream. Useful for when
886 all desired output is in begin and/or end blocks.
887
888 Please use a dollar sign for field names and double-quotes for string
889 literals. If field names have special characters such as "." then you might
890 use braces, e.g. '${field.name}'. Miller built-in variables are
891 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
892 variables. The environment-variable name may be an expression, e.g. a field
893 value.
894
895 Use # to comment to end of line.
896
897 Examples:
898 mlr put '$y = log10($x); $z = sqrt($y)'
899 mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
900 mlr put '$x>0.0; $y=log10($x); $z=sqrt($y)' # does all three statements
901 mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"'
902 mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
903 mlr put '$filename = FILENAME'
904 mlr put '$colored_shape = $color . "_" . $shape'
905 mlr put '$y = cos($theta); $z = atan2($y, $x)'
906 mlr put '$name = sub($name, "http.*com"i, "")'
907 mlr put -q '@sum += $x; end {emit @sum}'
908 mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
909 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
910 mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
911 mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
912 mlr put '
913 $x = 1;
914 #$y = 2;
915 $z = 3
916 '
917
918 Please see also 'mlr -k' for examples using redirected output.
919
920 Please see http://johnkerl.org/miller/doc/reference.html for more information
921 including function list. Or "mlr -f".
922 Please see in particular:
923 http://www.johnkerl.org/miller/doc/reference.html#put
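For readers new to @-variables, the two-level accumulation in the example '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' can be pictured as a nested map that is flattened at end of stream. The Python sketch below is an analogy only. The function name is hypothetical, and the output shape (one record per a/b pair with fields a, b, and sum) is this example's assumption about what the emit produces.

```python
def sum_by_two_keys(records):
    """Accumulate x by (a, b), then flatten to one record per pair."""
    sums = {}                                 # plays the role of @sum
    for rec in records:
        inner = sums.setdefault(rec["a"], {})
        inner[rec["b"]] = inner.get(rec["b"], 0) + rec["x"]
    # Plays the role of: end {emit @sum, "a", "b"}
    return [
        {"a": a, "b": b, "sum": total}
        for a, inner in sums.items()
        for b, total in inner.items()
    ]
```

Nothing is returned until every input record has been consumed. This mirrors why such commands use put -q with an end block.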
924
925 regularize
926 Usage: mlr regularize
927 For records seen earlier in the data stream with same field names in
928 a different order, outputs them with field names in the previously
929 encountered order.
930 Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
931 output as a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
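One way to picture this is a table keyed by the sorted field names, which remembers the first ordering seen for each set of names. The Python sketch below is an assumption-laden model of the behavior described above, not Miller's code.

```python
def regularize(records):
    """Reorder fields to match the first-seen ordering of the same name set."""
    first_seen = {}                           # sorted names -> first-seen order
    for rec in records:
        key = tuple(sorted(rec))
        order = first_seen.setdefault(key, list(rec))
        yield {name: rec[name] for name in order}
```

Fed the three example records, the third comes out as a=6,c=7,b=8 because a,c,b was the first ordering seen for that set of names.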
932
933 rename
934 Usage: mlr rename [options] {old1,new1,old2,new2,...}
935 Renames specified fields.
936 Options:
937 -r Treat old field names as regular expressions. "ab", "a.*b"
938 will match any field name containing the substring "ab" or
939 matching "a.*b", respectively; anchors of the form "^ab$",
940 "^a.*b$" may be used. New field names may be plain strings,
941 or may contain capture groups of the form "\1" through
942 "\9". Wrapping the regex in double quotes is optional, but
943 is required if you wish to follow it with 'i' to indicate
944 case-insensitivity.
945 -g Do global replacement within each field name rather than
946 first-match replacement.
947 Examples:
948 mlr rename old_name,new_name
949 mlr rename old_name_1,new_name_1,old_name_2,new_name_2
950 mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date"
951 mlr rename -r '"Date_[0-9]+",Date' Same
952 mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
953 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
954
955 reorder
956 Usage: mlr reorder [options]
957 -f {a,b,c} Field names to reorder.
958 -e Put specified field names at record end: default is to put
959 them at record start.
960 Examples:
961 mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
962 mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
963
964 repeat
965 Usage: mlr repeat [options]
966 Copies input records to output records multiple times.
967 Options must be exactly one of the following:
968 -n {repeat count} Repeat each input record this many times.
969 -f {field name} Same, but take the repeat count from the specified
970 field name of each input record.
971 Example:
972 echo x=0 | mlr repeat -n 4 then put '$x=urand()'
973 produces:
974 x=0.488189
975 x=0.484973
976 x=0.704983
977 x=0.147311
978 Example:
979 echo a=1,b=2,c=3 | mlr repeat -f b
980 produces:
981 a=1,b=2,c=3
982 a=1,b=2,c=3
983 Example:
984 echo a=1,b=2,c=3 | mlr repeat -f c
985 produces:
986 a=1,b=2,c=3
987 a=1,b=2,c=3
988 a=1,b=2,c=3
989
990 reshape
991 Usage: mlr reshape [options]
992 Wide-to-long options:
993 -i {input field names} -o {key-field name,value-field name}
994 -r {input field regexes} -o {key-field name,value-field name}
995 These pivot/reshape the input data such that the input fields are removed
996 and separate records are emitted for each key/value pair.
997 Note: this works with tail -f and produces output records for each input
998 record seen.
999 Long-to-wide options:
1000 -s {key-field name,value-field name}
1001 These pivot/reshape the input data to undo the wide-to-long operation.
1002 Note: this does not work with tail -f; it produces output records only after
1003 all input records have been read.
1004
1005 Examples:
1006
1007 Input file "wide.txt":
1008 time X Y
1009 2009-01-01 0.65473572 2.4520609
1010 2009-01-02 -0.89248112 0.2154713
1011 2009-01-03 0.98012375 1.3179287
1012
1013 mlr --pprint reshape -i X,Y -o item,value wide.txt
1014 time item value
1015 2009-01-01 X 0.65473572
1016 2009-01-01 Y 2.4520609
1017 2009-01-02 X -0.89248112
1018 2009-01-02 Y 0.2154713
1019 2009-01-03 X 0.98012375
1020 2009-01-03 Y 1.3179287
1021
1022 mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1023 time item value
1024 2009-01-01 X 0.65473572
1025 2009-01-01 Y 2.4520609
1026 2009-01-02 X -0.89248112
1027 2009-01-02 Y 0.2154713
1028 2009-01-03 X 0.98012375
1029 2009-01-03 Y 1.3179287
1030
1031 Input file "long.txt":
1032 time item value
1033 2009-01-01 X 0.65473572
1034 2009-01-01 Y 2.4520609
1035 2009-01-02 X -0.89248112
1036 2009-01-02 Y 0.2154713
1037 2009-01-03 X 0.98012375
1038 2009-01-03 Y 1.3179287
1039
1040 mlr --pprint reshape -s item,value long.txt
1041 time X Y
1042 2009-01-01 0.65473572 2.4520609
1043 2009-01-02 -0.89248112 0.2154713
1044 2009-01-03 0.98012375 1.3179287
1045 See also mlr nest.
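The two directions can be sketched in Python as follows. This is a simplified model with hypothetical function names. It handles only an explicit list of input fields (as with -i) and assumes every long-format record carries both the key field and the value field.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Streaming: emits one record per (input record, input field) pair."""
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for name in input_fields:
            if name in rec:
                yield {**others, key_name: name, value_name: rec[name]}

def long_to_wide(records, key_name, value_name):
    """Non-streaming: rows are grouped by their remaining fields."""
    rows = {}
    for rec in records:
        others = tuple((k, v) for k, v in rec.items()
                       if k not in (key_name, value_name))
        row = rows.setdefault(others, dict(others))
        row[rec[key_name]] = rec[value_name]
    return list(rows.values())
```

The first function is a generator and the second must return a complete list. That difference is the same streaming distinction as in the two tail -f notes above.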
1046
1047 sample
1048 Usage: mlr sample [options]
1049 Reservoir sampling (subsampling without replacement), optionally by category.
1050 -k {count} Required: number of records to output, total, or by group if using -g.
1051 -g {a,b,c} Optional: group-by-field names for samples.
1052 See also mlr bootstrap and mlr shuffle.
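Reservoir sampling keeps a fixed-size buffer and replaces entries with decreasing probability as more records arrive. The Python sketch below shows the classic form (often called Algorithm R) without grouping. Whether Miller uses exactly this variant is not stated here, and the function and parameter names are illustrative.

```python
import random

def reservoir_sample(records, k, rng=random):
    """Return k records chosen uniformly without replacement, in one pass."""
    reservoir = []
    for seen, rec in enumerate(records, start=1):
        if len(reservoir) < k:
            reservoir.append(rec)             # fill the reservoir first
        else:
            slot = rng.randrange(seen)        # uniform in 0 .. seen-1
            if slot < k:                      # happens with probability k/seen
                reservoir[slot] = rec
    return reservoir
```

Memory use is bounded by k rather than by the input size. A grouped version (as with -g) would keep one such reservoir per distinct group value.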
1053
1054 sec2gmt
1055 Usage: mlr sec2gmt [options] {comma-separated list of field names}
1056 Replaces a numeric field representing seconds since the epoch with the
1057 corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1058 more than a keystroke-saver for the sec2gmt function:
1059 mlr sec2gmt time1,time2
1060 is the same as
1061 mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1062 Options:
1063 -1 through -9: format the seconds using 1..9 decimal places, respectively.
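For comparison, the same conversion in Python, with whole-second precision only. The output layout YYYY-MM-DDTHH:MM:SSZ is this example's assumption about the timestamp format, and the -1 through -9 options are not modeled.

```python
from datetime import datetime, timezone

def sec2gmt(value):
    """Convert epoch seconds to a GMT timestamp; leave non-numbers as-is."""
    try:
        seconds = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return value
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For instance, `sec2gmt(0)` gives "1970-01-01T00:00:00Z", while `sec2gmt("abc")` returns "abc" unchanged.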
1064
1065 sec2gmtdate
1066 Usage: mlr sec2gmtdate {comma-separated list of field names}
1067 Replaces a numeric field representing seconds since the epoch with the
1068 corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1069 This is nothing more than a keystroke-saver for the sec2gmtdate function:
1070 mlr sec2gmtdate time1,time2
1071 is the same as
1072 mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1073
1074 seqgen
1075 Usage: mlr seqgen [options]
1076 Produces a sequence of counters. Discards the input record stream. Produces
1077 output as specified by the following options:
1078 -f {name} Field name for counters; default "i".
1079 --start {number} Inclusive start value; default "1".
1080 --stop {number} Inclusive stop value; default "100".
1081 --step {number} Step value; default "1".
1082 Start, stop, and/or step may be floating-point. Output is integer if start,
1083 stop, and step are all integers. Step may be negative. It may not be zero
1084 unless start == stop.
1085
1086 shuffle
1087 Usage: mlr shuffle {no options}
1088 Outputs records randomly permuted. No output records are produced until
1089 all input records are read.
1090 See also mlr bootstrap and mlr sample.
1091
1092 sort
1093 Usage: mlr sort {flags}
1094 Flags:
1095 -f {comma-separated field names} Lexical ascending
1096 -n {comma-separated field names} Numerical ascending; nulls sort last
1097 -nf {comma-separated field names} Numerical ascending; nulls sort last
1098 -r {comma-separated field names} Lexical descending
1099 -nr {comma-separated field names} Numerical descending; nulls sort first
1100 Sorts records primarily by the first specified field, secondarily by the second
1101 field, and so on. (Any records not having all specified sort keys will appear
1102 at the end of the output, in the order they were encountered, regardless of the
1103 specified sort order.) The sort is stable: records that compare equal will sort
1104 in the order they were encountered in the input record stream.
1105
1106 Example:
1107 mlr sort -f a,b -nr x,y,z
1108 which is the same as:
1109 mlr sort -f a -f b -nr x -nr y -nr z
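The multi-key behavior described above can be modeled with repeated stable sorts, applied from the last key to the first. The Python sketch below assumes each key is given as a (field, numeric, descending) tuple, a representation invented for this example, and it does not model the null-handling rules.

```python
def sort_records(records, keys):
    """keys: list of (field, numeric, descending), most significant first."""
    def has_all_keys(rec):
        return all(field in rec for field, _, _ in keys)

    have = [rec for rec in records if has_all_keys(rec)]
    lack = [rec for rec in records if not has_all_keys(rec)]
    # Stable sorts applied from least to most significant key.
    for field, numeric, descending in reversed(keys):
        have.sort(
            key=lambda rec: float(rec[field]) if numeric else rec[field],
            reverse=descending,
        )
    return have + lack                        # incomplete records go last, in input order
```

In this representation, the keys for "mlr sort -f a -nr x" would be written as [("a", False, False), ("x", True, True)].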
1110
1111 stats1
1112 Usage: mlr stats1 [options]
1113 Computes univariate statistics for one or more given fields, accumulated across
1114 the input record stream.
1115 Options:
1116 -a {sum,count,...} Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1117 one or more of:
1118 count Count instances of fields
1119 mode Find most-frequently-occurring values for fields; first-found wins tie
1120 antimode Find least-frequently-occurring values for fields; first-found wins tie
1121 sum Compute sums of specified fields
1122 mean Compute averages (sample means) of specified fields
1123 stddev Compute sample standard deviation of specified fields
1124 var Compute sample variance of specified fields
1125 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
1126 skewness Compute sample skewness of specified fields
1127 kurtosis Compute sample kurtosis of specified fields
1128 min Compute minimum values of specified fields
1129 max Compute maximum values of specified fields
1130 -f {a,b,c} Value-field names on which to compute statistics
1131 --fr {regex} Regex for value-field names on which to compute statistics
1132 (compute statistics on values in all field names matching regex)
1133 --fx {regex} Inverted regex for value-field names on which to compute statistics
1134 (compute statistics on values in all field names not matching regex)
1135 -g {d,e,f} Optional group-by-field names
1136 --gr {regex} Regex for optional group-by-field names
1137 (group by values in field names matching regex)
1138 --gx {regex} Inverted regex for optional group-by-field names
1139 (group by values in field names not matching regex)
1140 --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1141 -i Use interpolated percentiles, like R's type=7; default like type=1.
1142 Not sensical for string-valued fields.
1143 -s Print iterative stats. Useful in tail -f contexts (in which
1144 case please avoid pprint-format output since end of input
1145 stream will never be seen).
1146 -F Computes integerable things (e.g. count) in floating point.
1147 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1148 Example: mlr stats1 -a count,mode -f size
1149 Example: mlr stats1 -a count,mode -f size -g shape
1150 Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1151 This computes count and mode statistics on all field names beginning
1152 with a through h, grouped by all field names starting with k.
1153 Notes:
1154 * p50 and median are synonymous.
1155 * min and max output the same results as p0 and p100, respectively, but use
1156 less memory.
1157 * String-valued data make sense unless arithmetic on them is required,
1158 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1159 numbers are less than strings.
1160 * count and mode allow text input; the rest require numeric input.
1161 In particular, 1 and 1.0 are distinct text for count and mode.
1162 * When there are mode ties, the first-encountered datum wins.
1163
1164 stats2
1165 Usage: mlr stats2 [options]
1166 Computes bivariate statistics for one or more given field-name pairs,
1167 accumulated across the input record stream.
1168 -a {linreg-ols,corr,...} Names of accumulators: one or more of:
1169 linreg-pca Linear regression using principal component analysis
1170 linreg-ols Linear regression using ordinary least squares
1171 r2 Quality metric for linreg-ols (linreg-pca emits its own)
1172 logireg Logistic regression
1173 corr Sample correlation
1174 cov Sample covariance
1175 covx Sample-covariance matrix
1176 -f {a,b,c,d} Value-field name-pairs on which to compute statistics.
1177 There must be an even number of names.
1178 -g {e,f,g} Optional group-by-field names.
1179 -v Print additional output for linreg-pca.
1180 -s Print iterative stats. Useful in tail -f contexts (in which
1181 case please avoid pprint-format output since end of input
1182 stream will never be seen).
1183 --fit Rather than printing regression parameters, applies them to
1184 the input data to compute new fit fields. All input records are
1185 held in memory until end of input stream. Has effect only for
1186 linreg-ols, linreg-pca, and logireg.
1187 Only one of -s or --fit may be used.
1188 Example: mlr stats2 -a linreg-pca -f x,y
1189 Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1190 Example: mlr stats2 -a corr -f x,y
1191
1192 step
1193 Usage: mlr step [options]
1194 Computes values dependent on the previous record, optionally grouped
1195 by category.
1196
1197 Options:
1198 -a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
1199 delta Compute differences in field(s) between successive records
1200 shift Include value(s) in field(s) from previous record, if any
1201 from-first Compute differences in field(s) from first record
1202 ratio Compute ratios in field(s) between successive records
1203 rsum Compute running sums of field(s) between successive records
1204 counter Count instances of field(s) between successive records
1205 ewma Exponentially weighted moving average over successive records
1206 -f {a,b,c} Value-field names on which to compute statistics
1207 -g {d,e,f} Optional group-by-field names
1208 -F Computes integerable things (e.g. counter) in floating point.
1209 -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1210 smoothing), slightly under 1 is light smoothing, slightly over 0 is
1211 heavy smoothing. Multiple weights may be specified, e.g.
1212 "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1213 is "-d 0.5".
1214 -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1215 the -d values. If supplied, the number of -o values must be the same
1216 as the number of -d values.
1217
1218 Examples:
1219 mlr step -a rsum -f request_size
1220 mlr step -a delta -f request_size -g hostname
1221 mlr step -a ewma -d 0.1,0.9 -f x,y
1222 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1223 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1224
1225 Please see http://johnkerl.org/miller/doc/reference.html#filter or
1226 https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1227 for more information on EWMA.
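The effect of the -d weight is easiest to see in the recurrence itself. The Python sketch below uses the common form of the exponentially weighted moving average, seeded with the first sample. The seeding choice and the function name are assumptions of this example.

```python
def ewma(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = []
    previous = None
    for x in values:
        if previous is None:
            previous = x                      # assumed seed: the first sample
        else:
            previous = alpha * x + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed
```

With alpha equal to 1 the output equals the input (no smoothing). With alpha close to 0 each new sample barely moves the average (heavy smoothing).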
1228
1229 tac
1230 Usage: mlr tac
1231 Prints records in reverse order from the order in which they were encountered.
1232
1233 tail
1234 Usage: mlr tail [options]
1235 -n {count} Tail count to print; default 10
1236 -g {a,b,c} Optional group-by-field names for tail counts
1237 Passes through the last n records, optionally by category.
1238
1239 tee
1240 Usage: mlr tee [options] {filename}
1241 Passes through input records (like mlr cat) but also writes to specified output
1242 file, using output-format flags from the command line (e.g. --ocsv). See also
1243 the "tee" keyword within mlr put, which allows data-dependent filenames.
1244 Options:
1245 -a: append to existing file, if any, rather than overwriting.
1246 --no-fflush: don't call fflush() after every record.
1247 Any of the output-format command-line flags (see mlr -h). Example: using
1248 mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1249 the input is CSV, the output is pretty-print tabular, but the tee-file output
1250 is written in JSON format.
1251
1252 top
1253 Usage: mlr top [options]
1254 -f {a,b,c} Value-field names for top counts.
1255 -g {d,e,f} Optional group-by-field names for top counts.
1256 -n {count} How many records to print per category; default 1.
1257 -a Print all fields for top-value records; default is
1258 to print only value and group-by fields. Requires a single
1259 value-field name only.
1260 --min Print top smallest values; default is top largest values.
1261 -F Keep top values as floats even if they look like integers.
1262 -o {name} Field name for output indices. Default "top_idx".
1263 Prints the n records with smallest/largest values at specified fields,
1264 optionally by category.
1265
1266 uniq
1267 Usage: mlr uniq [options]
1268 Prints distinct values for specified field names. With -c, same as
1269 count-distinct. For uniq, -f is a synonym for -g.
1270
1271 Options:
1272 -g {d,e,f} Group-by-field names for uniq counts.
1273 -c Show repeat counts in addition to unique values.
1274 -n Show only the number of distinct values.
1275 -o {name} Field name for output count. Default "count".
1276 -a Output each unique record only once. Incompatible with -g.
1277 With -c, produces unique records, with repeat counts for each.
1278 With -n, produces only one record which is the unique-record count.
1279 With neither -c nor -n, produces unique records.
1280
1281 unsparsify
1282 Usage: mlr unsparsify [options]
1283 Prints records with the union of field names over all input records.
1284 For field names absent in a given record but present in others, fills in
1285 a value. This verb retains all input before producing any output.
1286
1287 Options:
1288 --fill-with {filler string} What to fill absent fields with. Defaults to
1289 the empty string.
1290
1291 Example: if the input is two records, one being 'a=1,b=2' and the other
1292 being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1293 'a=,b=3,c=4'.
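A two-pass Python sketch of the same idea follows. The function name is hypothetical, and the field order (first appearance across all records) is an assumption of this example.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, filling the gaps."""
    records = list(records)                   # all input is retained first
    names = {}                                # ordered set of field names
    for rec in records:
        for name in rec:
            names.setdefault(name, None)
    return [{name: rec.get(name, fill_with) for name in names}
            for rec in records]
```

The first pass has to see every record before any output can be produced. This is why the verb is non-streaming.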
1294
1296 +
1297 (class=arithmetic #args=2): Addition.
1298
1299 + (class=arithmetic #args=1): Unary plus.
1300
1301 -
1302 (class=arithmetic #args=2): Subtraction.
1303
1304 - (class=arithmetic #args=1): Unary minus.
1305
1306 *
1307 (class=arithmetic #args=2): Multiplication.
1308
1309 /
1310 (class=arithmetic #args=2): Division.
1311
1312 //
1313 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
1314
1315 .+
1316 (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1317
1318 .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1319
1320 .-
1321 (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1322
1323 .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1324
1325 .*
1326 (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1327
1328 ./
1329 (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1330
1331 .//
1332 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
1333
1334 %
1335 (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
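Since the // and % entries both describe themselves as pythonic, Python's own operators can be used to illustrate the convention on negative operands. This shows Python's behavior, which the text says these operators follow. It has not been checked against Miller's output, and `truncating_div` is a helper defined here only for contrast.

```python
import math

def truncating_div(a, b):
    """C-style integer division for contrast: rounds toward zero."""
    return math.trunc(a / b)

floor_quotient = -7 // 2                  # -4: rounds toward negative infinity
trunc_quotient = truncating_div(-7, 2)    # -3: rounds toward zero
remainder = -7 % 5                        # 3: non-negative for a positive modulus
```

The two operators stay consistent with each other: (-7 // 5) * 5 + (-7 % 5) is again -7.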
1336
1337 **
1338 (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1339 operator.
1340
1341 |
1342 (class=arithmetic #args=2): Bitwise OR.
1343
1344 ^
1345 (class=arithmetic #args=2): Bitwise XOR.
1346
1347 &
1348 (class=arithmetic #args=2): Bitwise AND.
1349
1350 ~
1351 (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1352 regex-match operator: try '$y = ~$x'.
1353
1354 <<
1355 (class=arithmetic #args=2): Bitwise left-shift.
1356
1357 >>
1358 (class=arithmetic #args=2): Bitwise right-shift.
1359
1360 bitcount
1361 (class=arithmetic #args=1): Count of 1-bits.
1362
1363 ==
1364 (class=boolean #args=2): String/numeric equality. Mixing number and string
1365 results in string compare.
1366
1367 !=
1368 (class=boolean #args=2): String/numeric inequality. Mixing number and string
1369 results in string compare.
1370
1371 =~
1372 (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1373 side), e.g. '$name =~ "^a.*b$"'.
1374
1375 !=~
1376 (class=boolean #args=2): String (left-hand side) does not match regex
1377 (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1378
1379 >
1380 (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1381 results in string compare.
1382
1383 >=
1384 (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1385 and string results in string compare.
1386
1387 <
1388 (class=boolean #args=2): String/numeric less-than. Mixing number and string
1389 results in string compare.
1390
1391 <=
1392 (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1393 and string results in string compare.
1394
1395 &&
1396 (class=boolean #args=2): Logical AND.
1397
1398 ||
1399 (class=boolean #args=2): Logical OR.
1400
1401 ^^
1402 (class=boolean #args=2): Logical XOR.
1403
1404 !
1405 (class=boolean #args=1): Logical negation.
1406
1407 ? :
1408 (class=boolean #args=3): Ternary operator.
1409
1410 .
1411 (class=string #args=2): String concatenation.
1412
1413 gsub
1414 (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1415 (replace all).
1416
1417 regextract
1418 (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1420
1421 regextract_or_else
1422 (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1424
1425 strlen
1426 (class=string #args=1): String length.
1427
1428 sub
1429 (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1430 (replace once).
1431
1432 ssub
1433 (class=string #args=3): Like sub but does no regexing. No characters are special.
1434
1435 substr
1436 (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1437 inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
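Because both endpoints are inclusive and zero-based, substr differs from slice notation in many languages. The Python sketch below models the rule stated above. Returning an empty string for out-of-range or reversed indices is an assumption of this example, not documented behavior.

```python
def substr(s, m, n):
    """Substring from 0-up position m to n inclusive; negatives count from the end."""
    length = len(s)
    if m < 0:
        m += length                           # -len .. -1 alias to 0 .. len-1
    if n < 0:
        n += length
    if m < 0 or n >= length or m > n:
        return ""                             # assumed handling of bad indices
    return s[m:n + 1]                         # Python slices exclude the end index
```

For example, `substr("hello", 1, 3)` and `substr("hello", -4, -2)` both give "ell".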
1438
1439 tolower
1440 (class=string #args=1): Convert string to lowercase.
1441
1442 toupper
1443 (class=string #args=1): Convert string to uppercase.
1444
1445 lstrip
1446 (class=string #args=1): Strip leading whitespace from string.
1447
1448 rstrip
1449 (class=string #args=1): Strip trailing whitespace from string.
1450
1451 strip
1452 (class=string #args=1): Strip leading and trailing whitespace from string.
1453
1454 collapse_whitespace
1455 (class=string #args=1): Strip repeated whitespace from string.
1456
1457 clean_whitespace
1458 (class=string #args=1): Same as collapse_whitespace and strip.
1459
1460 abs
1461 (class=math #args=1): Absolute value.
1462
1463 acos
1464 (class=math #args=1): Inverse trigonometric cosine.
1465
1466 acosh
1467 (class=math #args=1): Inverse hyperbolic cosine.
1468
1469 asin
1470 (class=math #args=1): Inverse trigonometric sine.
1471
1472 asinh
1473 (class=math #args=1): Inverse hyperbolic sine.
1474
1475 atan
1476 (class=math #args=1): One-argument arctangent.
1477
1478 atan2
1479 (class=math #args=2): Two-argument arctangent.
1480
1481 atanh
1482 (class=math #args=1): Inverse hyperbolic tangent.
1483
1484 cbrt
1485 (class=math #args=1): Cube root.
1486
1487 ceil
1488 (class=math #args=1): Ceiling: nearest integer at or above.
1489
1490 cos
1491 (class=math #args=1): Trigonometric cosine.
1492
1493 cosh
1494 (class=math #args=1): Hyperbolic cosine.
1495
1496 erf
1497 (class=math #args=1): Error function.
1498
1499 erfc
1500 (class=math #args=1): Complementary error function.
1501
1502 exp
1503 (class=math #args=1): Exponential function e**x.
1504
1505 expm1
1506 (class=math #args=1): e**x - 1.
1507
1508 floor
1509 (class=math #args=1): Floor: nearest integer at or below.
1510
1511 invqnorm
1512 (class=math #args=1): Inverse of normal cumulative distribution
1513 function. Note that invqnorm(urand()) is normally distributed.
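The note about invqnorm(urand()) is inverse-transform sampling: feeding uniform numbers on the unit interval through the inverse normal CDF yields normally distributed numbers. A Python sketch, with statistics.NormalDist standing in for Miller's invqnorm (an assumption) and a fixed seed chosen only for repeatability:

```python
import random
from statistics import NormalDist, mean, stdev

def invqnorm(p):
    # Inverse of the standard normal CDF, for 0 < p < 1.
    return NormalDist().inv_cdf(p)

random.seed(1)  # fixed seed so the run is repeatable
samples = []
while len(samples) < 10000:
    u = random.random()
    if u > 0.0:  # the inverse CDF is undefined at exactly 0
        samples.append(invqnorm(u))

# The sample mean should be near 0 and the sample deviation near 1.
print(round(mean(samples), 2), round(stdev(samples), 2))
```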
1514
1515 log
1516 (class=math #args=1): Natural (base-e) logarithm.
1517
1518 log10
1519 (class=math #args=1): Base-10 logarithm.
1520
1521 log1p
1522 (class=math #args=1): log(1+x).
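log1p and expm1 exist because the naive forms lose all precision for very small x. A short Python check, assuming Miller's functions follow the C library routines of the same names (log(1+x) and e**x - 1), with Python's math module as the stand-in:

```python
import math

x = 1e-20

naive_log = math.log(1 + x)    # 1 + x rounds to exactly 1.0, so this is 0.0
better_log = math.log1p(x)     # keeps the leading term, approximately x

naive_exp = math.exp(x) - 1    # exp(x) rounds to exactly 1.0, so this is 0.0
better_exp = math.expm1(x)     # approximately x

print(naive_log, better_log)
print(naive_exp, better_exp)
```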
1523
1524 logifit
1525 (class=math #args=3): Given m and b from logistic regression, compute
1526 fit: $yhat=logifit($x,$m,$b).
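The text above does not spell out the formula behind logifit. A minimal Python sketch, assuming the standard logistic form 1/(1+exp(-(m*x+b))):

```python
import math

def logifit(x, m, b):
    # Assumed form: the logistic function applied to m*x + b.
    return 1.0 / (1.0 + math.exp(-(m * x + b)))

print(logifit(0.0, 1.0, 0.0))  # 0.5
```

Under this assumption the fitted value always lies strictly between 0 and 1, and equals 0.5 where m*x + b is zero.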
1527
1528 madd
1529 (class=math #args=3): a + b mod m (integers)
1530
1531 max
1532 (class=math variadic): Max of n numbers; null loses
1533
1534 mexp
1535 (class=math #args=3): a ** b mod m (integers)
1536
1537 min
1538 (class=math variadic): Min of n numbers; null loses
1539
1540 mmul
1541 (class=math #args=3): a * b mod m (integers)
1542
1543 msub
1544 (class=math #args=3): a - b mod m (integers)
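In madd, mexp, mmul, and msub the reduction applies to the whole result, that is, "a + b mod m" reads as (a + b) mod m. A Python sketch of the four operations for non-negative operands; how Miller treats negative operands is not stated above and is not modeled here.

```python
def madd(a, b, m):
    return (a + b) % m

def msub(a, b, m):
    return (a - b) % m

def mmul(a, b, m):
    return (a * b) % m

def mexp(a, b, m):
    # Three-argument pow does modular exponentiation without
    # computing the full power first.
    return pow(a, b, m)

print(madd(5, 7, 10))     # 2
print(msub(9, 4, 7))      # 5
print(mmul(6, 7, 10))     # 2
print(mexp(2, 10, 1000))  # 24
```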
1545
1546 pow
1547 (class=math #args=2): Exponentiation; same as **.
1548
1549 qnorm
1550 (class=math #args=1): Normal cumulative distribution function.
1551
1552 round
1553 (class=math #args=1): Round to nearest integer.
1554
1555 roundm
1556 (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1557 the same as round($x/$m)*$m
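The identity above, written out in Python as a sketch. Python's round sends exact ties to the nearest even integer, which may differ from Miller on exact ties; the inputs below avoid ties.

```python
def roundm(x, m):
    # Round x to the nearest multiple of m.
    return round(x / m) * m

print(roundm(17, 5))   # 15
print(roundm(18, 5))   # 20
print(roundm(7.3, 2))  # 8
```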
1558
1559 sgn
1560 (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1561 negative input.
1562
1563 sin
1564 (class=math #args=1): Trigonometric sine.
1565
1566 sinh
1567 (class=math #args=1): Hyperbolic sine.
1568
1569 sqrt
1570 (class=math #args=1): Square root.
1571
1572 tan
1573 (class=math #args=1): Trigonometric tangent.
1574
1575 tanh
1576 (class=math #args=1): Hyperbolic tangent.
1577
1578 urand
1579 (class=math #args=0): Floating-point numbers on the unit interval.
1580 Int-valued example: '$n=floor(20+urand()*11)'.
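The int-valued example yields integers from 20 through 30 inclusive: urand()*11 lies in [0, 11), so the floor of 20 plus that value takes 11 distinct values. A Python check of the same expression, with random.random() standing in for urand() (an assumption) and a fixed seed for repeatability:

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable
values = [math.floor(20 + random.random() * 11) for _ in range(10000)]

print(min(values), max(values))  # 20 30
```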
1581
1582 urand32
1583 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1584 inclusive.
1585
1586 urandint
1587 (class=math #args=2): Integer uniformly distributed between inclusive
1588 integer endpoints.
1589
1590 dhms2fsec
1591 (class=time #args=1): Recovers floating-point seconds as in
1592 dhms2fsec("5d18h53m20.250000s") = 500000.250000
1593
1594 dhms2sec
1595 (class=time #args=1): Recovers integer seconds as in
1596 dhms2sec("5d18h53m20s") = 500000
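The arithmetic behind the example is 5*86400 + 18*3600 + 53*60 + 20 = 500000. A Python sketch of the parsing, for non-negative inputs only; it models the documented example and is not Miller's parser.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def dhms2sec(text):
    # Sum each <number><unit> piece, e.g. "5d", "18h", "53m", "20s".
    return sum(int(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+)([dhms])", text))

print(dhms2sec("5d18h53m20s"))  # 500000
```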
1597
1598 fsec2dhms
1599 (class=time #args=1): Formats floating-point seconds as in
1600 fsec2dhms(500000.25) = "5d18h53m20.250000s"
1601
1602 fsec2hms
1603 (class=time #args=1): Formats floating-point seconds as in
1604 fsec2hms(5000.25) = "01:23:20.250000"
1605
1606 gmt2sec
1607 (class=time #args=1): Parses GMT timestamp as integer seconds since
1608 the epoch.
1609
1610 localtime2sec
1611 (class=time #args=1): Parses local timestamp as integer seconds since
1612 the epoch. Consults $TZ environment variable.
1613
1614 hms2fsec
1615 (class=time #args=1): Recovers floating-point seconds as in
1616 hms2fsec("01:23:20.250000") = 5000.250000
1617
1618 hms2sec
1619 (class=time #args=1): Recovers integer seconds as in
1620 hms2sec("01:23:20") = 5000
1621
1622 sec2dhms
1623 (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1624 = "5d18h53m20s"
1625
1626 sec2gmt
1627 (class=time #args=1): Formats seconds since epoch (integer part)
1628 as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1629 Leaves non-numbers as-is.
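A Python sketch of the one-argument form, with Python's datetime as a stand-in for Miller's implementation (an assumption). It shows the two documented properties: only the integer part of the seconds is used, and non-numbers pass through unchanged.

```python
from datetime import datetime, timezone

def sec2gmt(value):
    # Non-numbers are returned unchanged.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    # Only the integer part of the seconds is used.
    t = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")

print(sec2gmt(1440768801.7))  # 2015-08-28T13:33:21Z
print(sec2gmt("abc"))         # abc
```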
1630
1631 sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1632 decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1633 Leaves non-numbers as-is.
1634
1635 sec2gmtdate
1636 (class=time #args=1): Formats seconds since epoch (integer part)
1637 as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1638 Leaves non-numbers as-is.
1639
1640 sec2localtime
1641 (class=time #args=1): Formats seconds since epoch (integer part)
1642 as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1643 Consults $TZ environment variable. Leaves non-numbers as-is.
1644
1645 sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1646 decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1647 Consults $TZ environment variable. Leaves non-numbers as-is.
1648
1649 sec2localdate
1650 (class=time #args=1): Formats seconds since epoch (integer part)
1651 as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1652 Consults $TZ environment variable. Leaves non-numbers as-is.
1653
1654 sec2hms
1655 (class=time #args=1): Formats integer seconds as in
1656 sec2hms(5000) = "01:23:20"
1657
1658 strftime
1659 (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1660 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1661 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1662 Format strings are as in the C library (please see "man strftime" on your system),
1663 with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1664 with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1665 See also strftime_local.
1666
1667 strftime_local
1668 (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1669
1670 strptime
1671 (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1672 e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1673 and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1674 See also strptime_local.
1675
1676 strptime_local
1677 (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1678
1679 systime
1680 (class=time #args=0): Floating-point seconds since the epoch,
1681 e.g. 1440768801.748936.
1682
1683 is_absent
1684 (class=typing #args=1): False if field is present in input, true otherwise.
1685
1686 is_bool
1687 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1688
1689 is_boolean
1690 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1691
1692 is_empty
1693 (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1694
1695 is_empty_map
1696 (class=typing #args=1): True if argument is a map which is empty.
1697
1698 is_float
1699 (class=typing #args=1): True if field is present with value inferred to be float
1700
1701 is_int
1702 (class=typing #args=1): True if field is present with value inferred to be int
1703
1704 is_map
1705 (class=typing #args=1): True if argument is a map.
1706
1707 is_nonempty_map
1708 (class=typing #args=1): True if argument is a map which is non-empty.
1709
1710 is_not_empty
1711 (class=typing #args=1): False if field is present in input with empty value, true otherwise.
1712
1713 is_not_map
1714 (class=typing #args=1): True if argument is not a map.
1715
1716 is_not_null
1717 (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1718
1719 is_null
1720 (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1721
1722 is_numeric
1723 (class=typing #args=1): True if field is present with value inferred to be int or float
1724
1725 is_present
1726 (class=typing #args=1): True if field is present in input, false otherwise.
1727
1728 is_string
1729 (class=typing #args=1): True if field is present with string (including empty-string) value
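The distinction between absent (no such field in the record) and empty (field present with empty-string value) underlies several of the predicates above. A Python sketch using a dict as the record; the two-argument helper signatures are hypothetical and exist only for this illustration, since Miller's functions take a single value such as $x.

```python
def is_present(record, name):
    return name in record

def is_absent(record, name):
    return name not in record

def is_empty(record, name):
    return name in record and record[name] == ""

def is_null(record, name):
    # Null means empty or absent.
    return is_absent(record, name) or is_empty(record, name)

def is_not_null(record, name):
    return not is_null(record, name)

record = {"x": "3", "y": ""}  # x has a value, y is empty, z is absent
for name in ("x", "y", "z"):
    print(name, is_present(record, name), is_empty(record, name),
          is_null(record, name))
```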
1730
1731 asserting_absent
1732 (class=typing #args=1): Returns argument if it is absent in the input data, else
1733 throws an error.
1734
1735 asserting_bool
1736 (class=typing #args=1): Returns argument if it is present with boolean value, else
1737 throws an error.
1738
1739 asserting_boolean
1740 (class=typing #args=1): Returns argument if it is present with boolean value, else
1741 throws an error.
1742
1743 asserting_empty
1744 (class=typing #args=1): Returns argument if it is present in input with empty value,
1745 else throws an error.
1746
1747 asserting_empty_map
1748 (class=typing #args=1): Returns argument if it is a map with empty value, else
1749 throws an error.
1750
1751 asserting_float
1752 (class=typing #args=1): Returns argument if it is present with float value, else
1753 throws an error.
1754
1755 asserting_int
1756 (class=typing #args=1): Returns argument if it is present with int value, else
1757 throws an error.
1758
1759 asserting_map
1760 (class=typing #args=1): Returns argument if it is a map, else throws an error.
1761
1762 asserting_nonempty_map
1763 (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1764 an error.
1765
1766 asserting_not_empty
1767 (class=typing #args=1): Returns argument if it is present in input with non-empty
1768 value, else throws an error.
1769
1770 asserting_not_map
1771 (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1772
1773 asserting_not_null
1774 (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1775 else throws an error.
1776
1777 asserting_null
1778 (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1779 an error.
1780
1781 asserting_numeric
1782 (class=typing #args=1): Returns argument if it is present with int or float value,
1783 else throws an error.
1784
1785 asserting_present
1786 (class=typing #args=1): Returns argument if it is present in input, else throws
1787 an error.
1788
1789 asserting_string
1790 (class=typing #args=1): Returns argument if it is present with string (including
1791 empty-string) value, else throws an error.
1792
1793 boolean
1794 (class=conversion #args=1): Convert int/float/bool/string to boolean.
1795
1796 float
1797 (class=conversion #args=1): Convert int/float/bool/string to float.
1798
1799 fmtnum
1800 (class=conversion #args=2): Convert int/float/bool to string using
1801 printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1802 are all long long or double. If you use formats like %d or %f, behavior is undefined.
1803
1804 hexfmt
1805 (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1806
1807 int
1808 (class=conversion #args=1): Convert int/float/bool/string to int.
1809
1810 string
1811 (class=conversion #args=1): Convert int/float/bool/string to string.
1812
1813 typeof
1814 (class=conversion #args=1): Convert argument to type of argument (e.g.
1815 MT_STRING). For debug.
1816
1817 depth
1818 (class=maps #args=1): Prints maximum depth of hashmap. Scalars have depth 0.
1819
1820 haskey
1821 (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1822 'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1823
1824 joink
1825 (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1826
1827 joinkv
1828 (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1829
1830 joinv
1831 (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
1832
1833 leafcount
1834 (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1835 same as length.
1836
1837 length
1838 (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
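How depth, leafcount, and length differ on a nested map can be sketched in Python with nested dicts. The data is made up, and treating an empty map as depth 0 is an assumption of the sketch rather than something stated above.

```python
def depth(v):
    # Scalars have depth 0. An empty map is treated as depth 0 here,
    # which is an assumption of this sketch.
    if not isinstance(v, dict) or not v:
        return 0
    return 1 + max(depth(child) for child in v.values())

def leafcount(v):
    # Count terminal (non-map) values at any level.
    if not isinstance(v, dict):
        return 1
    return sum(leafcount(child) for child in v.values())

def length(v):
    # Count top-level entries. Scalars have length 1.
    return len(v) if isinstance(v, dict) else 1

sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
print(depth(sums), leafcount(sums), length(sums))  # 2 3 2
```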
1839
1840 mapdiff
1841 (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1842 With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1843
1844 mapexcept
1845 (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1846 E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1847
1848 mapselect
1849 (class=maps variadic): Returns a map with only keys from remaining arguments set.
1850 E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1851
1852 mapsum
1853 (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1854 key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
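The four map-combining functions above behave like simple dictionary operations. A Python sketch that reproduces the documented examples; Python dicts stand in for Miller maps, and the code models the descriptions rather than Miller's implementation.

```python
def mapdiff(*maps):
    # Copy of the first map minus every key found in the remaining maps.
    if not maps:
        return {}
    out = dict(maps[0])
    for other in maps[1:]:
        for key in other:
            out.pop(key, None)
    return out

def mapexcept(m, *keys):
    return {k: v for k, v in m.items() if k not in keys}

def mapselect(m, *keys):
    return {k: v for k, v in m.items() if k in keys}

def mapsum(*maps):
    # Later arguments overwrite earlier ones on key collisions.
    out = {}
    for m in maps:
        out.update(m)
    return out

print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {3: 4}
print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))  # {1: 2, 5: 6}
print(mapsum({1: 2, 3: 4}, {1: 5}))            # {1: 5, 3: 4}
print(mapdiff({1: 2, 3: 4}, {3: 9}))           # {1: 2}
```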
1855
1856 splitkv
1857 (class=maps #args=3): Splits string by separators into map with type inference.
1858 E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1859
1860 splitkvx
1861 (class=maps #args=3): Splits string by separators into map without type inference (keys and
1862 values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
1863 '{"a" : "1", "b" : "2", "c" : "3"}'.
1864
1865 splitnv
1866 (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
1867 E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
1868
1869 splitnvx
1870 (class=maps #args=2): Splits string by separator into integer-indexed map without type
1871 inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
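The four split functions differ along two axes: key-value pairs versus 1-up integer keys, and with or without type inference. A Python sketch covering all four; the typed flag is a hypothetical parameter standing in for the "x" variants, and the inference rule (int, then float, else string) is an assumption that only approximates Miller's.

```python
def infer(text):
    # Assumed inference order: int, then float, else leave as string.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

def splitkv(s, pair_sep, field_sep, typed=True):
    out = {}
    for field in s.split(field_sep):
        key, _, value = field.partition(pair_sep)
        out[key] = infer(value) if typed else value
    return out

def splitnv(s, field_sep, typed=True):
    # Keys are the 1-up positions of the pieces.
    return {i: (infer(piece) if typed else piece)
            for i, piece in enumerate(s.split(field_sep), start=1)}

print(splitkv("a=1,b=2,c=3", "=", ","))               # {'a': 1, 'b': 2, 'c': 3}
print(splitkv("a=1,b=2,c=3", "=", ",", typed=False))  # {'a': '1', 'b': '2', 'c': '3'}
print(splitnv("a,b,c", ","))                          # {1: 'a', 2: 'b', 3: 'c'}
print(splitnv("4,5,6", ",", typed=False))             # {1: '4', 2: '5', 3: '6'}
```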
1872
1874 all
1875 all: used in "emit", "emitp", and "unset" as a synonym for @*
1876
1877 begin
1878 begin: defines a block of statements to be executed before input records
1879 are ingested. The body statements must be wrapped in curly braces.
1880 Example: 'begin { @count = 0 }'
1881
1882 bool
1883 bool: declares a boolean local variable in the current curly-braced scope.
1884 Type-checking happens at assignment: 'bool b = 1' is an error.
1885
1886 break
1887 break: causes execution to continue after the body of the current
1888 for/while/do-while loop.
1889
1890 call
1891 call: used for invoking a user-defined subroutine.
1892 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
1893
1894 continue
1895 continue: causes execution to skip the remaining statements in the body of
1896 the current for/while/do-while loop. For-loop increments are still applied.
1897
1898 do
1899 do: with "while", introduces a do-while loop. The body statements must be wrapped
1900 in curly braces.
1901
1902 dump
1903 dump: prints all currently defined out-of-stream variables immediately
1904 to stdout as JSON.
1905
1906 With >, >>, or |, the data do not become part of the output record stream but
1907 are instead redirected.
1908
1909 The > and >> are for write and append, as in the shell, but (as with awk) the
1910 file-overwrite for > is on first write, not per record. The | is for piping to
1911 a process which will process the data. There will be one open file for each
1912 distinct file name (for > and >>) or one subordinate process for each distinct
1913 value of the piped-to command (for |). Output-formatting flags are taken from
1914 the main command line.
1915
1916 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
1917 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
1918 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
1919 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
1920
1921 edump
1922 edump: prints all currently defined out-of-stream variables immediately
1923 to stderr as JSON.
1924
1925 Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
1926
1927 elif
1928 elif: the way Miller spells "else if". The body statements must be wrapped
1929 in curly braces.
1930
1931 else
1932 else: terminates an if/elif/elif chain. The body statements must be wrapped
1933 in curly braces.
1934
1935 emit
1936 emit: inserts an out-of-stream variable into the output record stream. Hashmap
1937 indices present in the data but not slotted by emit arguments are not output.
1938
1939 With >, >>, or |, the data do not become part of the output record stream but
1940 are instead redirected.
1941
1942 The > and >> are for write and append, as in the shell, but (as with awk) the
1943 file-overwrite for > is on first write, not per record. The | is for piping to
1944 a process which will process the data. There will be one open file for each
1945 distinct file name (for > and >>) or one subordinate process for each distinct
1946 value of the piped-to command (for |). Output-formatting flags are taken from
1947 the main command line.
1948
1949 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
1950 etc., to control the format of the output if the output is redirected. See also mlr -h.
1951
1952 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
1953 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
1954 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
1955 Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
1956 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
1957 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
1958 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
1959 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
1960 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
1961 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
1962 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
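What "slotting" indices by emit arguments means for a two-level map such as @sums[$a][$b] can be sketched in Python. The helper name emit_by_names and the sample data are hypothetical, and the output shape (one record per leaf, with the two map keys placed in the named fields and the leaf under the variable's name) is this sketch's reading of the description above.

```python
def emit_by_names(var_name, nested, name1, name2):
    # One output record per leaf of a two-level map: the two map keys
    # land in fields name1 and name2, the leaf under the variable name.
    records = []
    for key1, inner in nested.items():
        for key2, value in inner.items():
            records.append({name1: key1, name2: key2, var_name: value})
    return records

sums = {"pan": {"wye": 0.5, "zee": 1.5}, "eks": {"wye": 2.0}}
for record in emit_by_names("sums", sums, "index1", "index2"):
    print(record)
```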
1963
1964 Please see http://johnkerl.org/miller/doc for more information.
1965
1966 emitf
1967 emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
1968 output record stream.
1969
1970 With >, >>, or |, the data do not become part of the output record stream but
1971 are instead redirected.
1972
1973 The > and >> are for write and append, as in the shell, but (as with awk) the
1974 file-overwrite for > is on first write, not per record. The | is for piping to
1975 a process which will process the data. There will be one open file for each
1976 distinct file name (for > and >>) or one subordinate process for each distinct
1977 value of the piped-to command (for |). Output-formatting flags are taken from
1978 the main command line.
1979
1980 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
1981 etc., to control the format of the output if the output is redirected. See also mlr -h.
1982
1983 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
1984 Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
1985 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
1986 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
1987 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
1988 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
1989 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
1990 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
1991
1992 Please see http://johnkerl.org/miller/doc for more information.
1993
1994 emitp
1995 emitp: inserts an out-of-stream variable into the output record stream.
1996 Hashmap indices present in the data but not slotted by emitp arguments are
1997 output concatenated with ":".
1998
1999 With >, >>, or |, the data do not become part of the output record stream but
2000 are instead redirected.
2001
2002 The > and >> are for write and append, as in the shell, but (as with awk) the
2003 file-overwrite for > is on first write, not per record. The | is for piping to
2004 a process which will process the data. There will be one open file for each
2005 distinct file name (for > and >>) or one subordinate process for each distinct
2006 value of the piped-to command (for |). Output-formatting flags are taken from
2007 the main command line.
2008
2009 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2010 etc., to control the format of the output if the output is redirected. See also mlr -h.
2011
2012 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2013 Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2014 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2015 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2016 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
2017 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2018 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2019 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2020 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2021
2022 Please see http://johnkerl.org/miller/doc for more information.
2023
2024 end
2025 end: defines a block of statements to be executed after input records
2026 are ingested. The body statements must be wrapped in curly braces.
2027 Example: 'end { emit @count }'
2028 Example: 'end { eprint "Final count is " . @count }'
2029
2030 eprint
2031 eprint: prints expression immediately to stderr.
2032 Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2033 Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2034 Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2035
2036 eprintn
2037 eprintn: prints expression immediately to stderr, without trailing newline.
2038 Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2039
2040 false
2041 false: the boolean literal value.
2042
2043 filter
2044 filter: includes/excludes the record in the output record stream.
2045
2046 Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2047
2048 Instead of put with 'filter false' you can simply use put -q. The following
2049 uses the input record to accumulate data but only prints the running sum
2050 without printing the input record:
2051
2052 Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2053
2054 float
2055 float: declares a floating-point local variable in the current curly-braced scope.
2056 Type-checking happens at assignment: 'float x = 0' is an error.
2057
2058 for
2059 for: defines a for-loop using one of three styles. The body statements must
2060 be wrapped in curly braces.
2061 For-loop over stream record:
2062 Example: 'for (k, v in $*) { ... }'
2063 For-loop over out-of-stream variables:
2064 Example: 'for (k, v in @counts) { ... }'
2065 Example: 'for ((k1, k2), v in @counts) { ... }'
2066 Example: 'for ((k1, k2, k3), v in @*) { ... }'
2067 C-style for-loop:
2068 Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
2069
2070 func
2071 func: used for defining a user-defined function.
2072 Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2073
2074 if
2075 if: starts an if/elif/elif chain. The body statements must be wrapped
2076 in curly braces.
2077
2078 in
2079 in: used in for-loops over stream records or out-of-stream variables.
2080
2081 int
2082 int: declares an integer local variable in the current curly-braced scope.
2083 Type-checking happens at assignment: 'int x = 0.0' is an error.
2084
2085 map
2086 map: declares a map-valued local variable in the current curly-braced scope.
2087 Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2088 always OK. map b = a is OK or not depending on whether a is a map.
2089
2090 num
2091 num: declares an int/float local variable in the current curly-braced scope.
2092 Type-checking happens at assignment: 'num b = true' is an error.
2093
2094 print
2095 print: prints expression immediately to stdout.
2096 Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2097 Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2098 Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2099
2100 printn
2101 printn: prints expression immediately to stdout, without trailing newline.
2102 Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2103
2104 return
2105 return: specifies the return value from a user-defined function.
2106 Omitted return statements (including via if-branches) result in an absent-null
2107 return value, which in turn results in a skipped assignment to an LHS.
2108
2109 stderr
2110 stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2111 to print to standard error.
2112
2113 stdout
2114 stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2115 to print to standard output.
2116
2117 str
2118 str: declares a string local variable in the current curly-braced scope.
2119 Type-checking happens at assignment.
2120
2121 subr
2122 subr: used for defining a subroutine.
2123 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2124
2125 tee
2126 tee: prints the current record to specified file.
2127 This is an immediate print to the specified file (except for pprint format
2128 which of course waits until the end of the input stream to format all output).
2129
2130 The > and >> are for write and append, as in the shell, but (as with awk) the
2131 file-overwrite for > is on first write, not per record. The | is for piping to
2132 a process which will process the data. There will be one open file for each
2133 distinct file name (for > and >>) or one subordinate process for each distinct
2134 value of the piped-to command (for |). Output-formatting flags are taken from
2135 the main command line.
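The redirect bookkeeping described above (one open file per distinct name, with the overwrite for > happening on first write rather than per record) can be sketched in Python. The class name RedirectTable and the sample records are hypothetical; this illustrates the idea and is not Miller's implementation.

```python
import os
import tempfile

class RedirectTable:
    """Keeps one open handle per distinct output file name."""

    def __init__(self):
        self.handles = {}

    def write(self, filename, line, append=False):
        handle = self.handles.get(filename)
        if handle is None:
            # The file is truncated (or opened for append) only on the
            # first write to this name, not once per record.
            handle = open(filename, "a" if append else "w")
            self.handles[filename] = handle
        handle.write(line + "\n")

    def close(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()

# Usage: route made-up records to one file per value of field "a".
outdir = tempfile.mkdtemp()
table = RedirectTable()
for record in [{"a": "pan", "x": 1}, {"a": "eks", "x": 2}, {"a": "pan", "x": 3}]:
    path = os.path.join(outdir, "data-" + record["a"])
    table.write(path, "a=%s,x=%d" % (record["a"], record["x"]))
table.close()

with open(os.path.join(outdir, "data-pan")) as f:
    print(f.read(), end="")  # both pan records, in input order
```

Because the handle for each name is opened once and reused, the second "pan" record is appended after the first instead of overwriting it.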
2136
2137 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2138 etc., to control the format of the output. See also mlr -h.
2139
2140 emit with redirect and tee with redirect are identical, except tee can only
2141 output $*.
2142
2143 Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
2144 Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2145 Example: mlr --from f.dat put 'tee > stderr, $*'
2146 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
2147 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
2148 Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2149 Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2150
2151 true
2152 true: the boolean literal value.
2153
2154 unset
2155 unset: clears field(s) from the current record, or an out-of-stream or local variable.
2156
2157 Example: mlr --from f.dat put 'unset $x'
2158 Example: mlr --from f.dat put 'unset $*'
2159 Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2160 Example: mlr --from f.dat put '...; unset @sums'
2161 Example: mlr --from f.dat put '...; unset @sums["green"]'
2162 Example: mlr --from f.dat put '...; unset @*'
2163
2164 var
2165 var: declares an untyped local variable in the current curly-braced scope.
2166 Examples: 'var a=1', 'var xyz=""'
2167
2168 while
2169 while: introduces a while loop, or with "do", introduces a do-while loop.
2170 The body statements must be wrapped in curly braces.
2171
2172 ENV
2173 ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2174
2175 FILENAME
2176 FILENAME: evaluates to the name of the current file being processed.
2177
2178 FILENUM
2179 FILENUM: evaluates to the number of the current file being processed,
2180 starting with 1.
2181
2182 FNR
2183 FNR: evaluates to the number of the current record within the current file
2184 being processed, starting with 1. Resets at the start of each file.
2185
2186 IFS
2187 IFS: evaluates to the input field separator from the command line.
2188
2189 IPS
2190 IPS: evaluates to the input pair separator from the command line.
2191
2192 IRS
2193 IRS: evaluates to the input record separator from the command line,
2194 or to LF or CRLF from the input data if in autodetect mode (which is
2195 the default).
2196
2197 M_E
2198 M_E: the mathematical constant e.
2199
2200 M_PI
2201 M_PI: the mathematical constant pi.
2202
2203 NF
2204 NF: evaluates to the number of fields in the current record.
2205
2206 NR
2207 NR: evaluates to the number of the current record over all files
2208 being processed, starting with 1. Does not reset at the start of each file.
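How FILENAME, FILENUM, FNR, and NR move as Miller reads several files can be sketched in Python. The file names and records are made up, and number_records is a hypothetical helper for this illustration.

```python
def number_records(files):
    # files: list of (filename, list of records), in command-line order.
    nr = 0
    for filenum, (filename, records) in enumerate(files, start=1):
        # fnr restarts at 1 for each file.
        for fnr, record in enumerate(records, start=1):
            nr += 1  # nr never resets
            yield filename, filenum, fnr, nr

files = [("a.dat", ["r1", "r2"]), ("b.dat", ["r3", "r4", "r5"])]
for row in number_records(files):
    print(row)
```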
2209
2210 OFS
2211 OFS: evaluates to the output field separator from the command line.
2212
2213 OPS
2214 OPS: evaluates to the output pair separator from the command line.
2215
2216 ORS
2217 ORS: evaluates to the output record separator from the command line,
2218 or to LF or CRLF from the input data if in autodetect mode (which is
2219 the default).
2220
2222 Miller is written by John Kerl <kerl.john.r@gmail.com>.
2223
2224 This manual page has been composed from Miller's help output by Eric
2225 MSP Veith <eveith@veith-m.de>.
2226
2228 awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2229 MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2230 http://johnkerl.org/miller/doc
2231
2232
2233
2234 2018-10-14 MILLER(1)