MILLER(1)                                                            MILLER(1)

miller - like awk, sed, cut, join, and sort for name-indexed data such
as CSV and tabular JSON.

Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
more file names}

Miller operates on key-value-pair data while the familiar Unix tools
operate on integer-indexed fields: if the natural data structure for
the latter is the array, then Miller's natural data structure is the
insertion-ordered hash map. This encompasses a variety of data
formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
manpage documents Miller v5.10.1.
22
COMMAND-LINE SYNTAX
mlr --csv cut -f hostname,uptime mydata.csv
mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
mlr stats2 -a linreg-pca -f u,v -g shape data/*
mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
mlr --from estimates.tbl put '
  for (k,v in $*) {
    if (is_numeric(v) && k =~ "^[t-z].*$") {
      $sum += v; $count += 1
    }
  }
  $mean = $sum / $count # no assignment if count unset'
mlr --from infile.dat put -f analyze.mlr
mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
46
DATA FORMATS
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
+---------------------+
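The DKVP rule illustrated above can be modeled in a few lines of Python. This is only a sketch of the behavior shown in the box, not Miller's implementation, and the function name parse_dkvp is hypothetical. A field that lacks a pair separator is keyed by its 1-up position, which is how "flint" ends up under key "3".

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator: key the value by its 1-up position.
            record[str(position)] = field
    return record


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
```

Python dicts preserve insertion order, which matches the insertion-ordered hash map that Miller uses for records.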
53
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
| fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
+---------------------+
59
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog       |
| 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
66
Tabular JSON: nested objects are supported, although arrays within them are not:
+---------------------+
| {                   |
|   "apple": 1,       | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
|   "bat": 2,         |
|   "cog": 3          |
| }                   |
| {                   |
|   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
|     "egg": 7,       |
|     "flint": 8      |
|   },                |
|   "garlic": ""      |
| }                   |
+---------------------+
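The key flattening shown for Record 2 can be sketched as follows. This is an illustrative Python model, not Miller's code; the function name flatten is hypothetical, and the separator default ":" is the one used in the box above.

```python
def flatten(record, sep=":", prefix=""):
    """Flatten nested maps into one level, joining key paths with sep."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the new prefix.
            flat.update(flatten(value, sep, name + sep))
        else:
            flat[name] = value
    return flat


print(flatten({"dish": {"egg": 7, "flint": 8}, "garlic": ""}))
```

Values are left as parsed here; the sketch does not model how Miller stringifies them.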
82
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog       |
| 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
89
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| bat   2             |
| cog   3             |
|                     |
| dish 7              | Record 2: "dish" => "7", "egg" => "8"
| egg  8              |
+---------------------+
99
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | --- | --- | --- |   |
| | 1 | 2 | 3 |         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| | 4 | 5 | 6 |         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+-----------------------+
107
109 In the following option flags, the version with "i" designates the
110 input stream, "o" the output stream, and the version without prefix
111 sets the option for both input and output stream. For example: --irs
112 sets the input record separator, --ors the output record separator, and
113 --rs sets both the input and output separator to the given value.
114
115 HELP OPTIONS
116 -h or --help Show this message.
117 --version Show the software version.
118 {verb name} --help Show verb-specific help.
119 --help-all-verbs Show help on all verbs.
120 -l or --list-all-verbs List only verb names.
121 -L List only verb names, one per line.
122 -f or --help-all-functions Show help on all built-in functions.
123 -F Show a bare listing of built-in functions by name.
124 -k or --help-all-keywords Show help on all keywords.
125 -K Show a bare listing of keywords by name.
126
127 VERB LIST
128 altkv bar bootstrap cat check clean-whitespace count count-distinct
129 count-similar cut decimate fill-down filter format-values fraction grep
130 group-by group-like having-fields head histogram join label least-frequent
131 merge-fields most-frequent nest nothing put regularize remove-empty-columns
132 rename reorder repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle
133 skip-trivial-records sort sort-within-records stats1 stats2 step tac tail tee
134 top uniq unsparsify
135
136 FUNCTION LIST
137 + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
138 > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
139 substr tolower toupper truncate capitalize lstrip rstrip strip
140 collapse_whitespace clean_whitespace system abs acos acosh asin asinh atan
141 atan2 atanh cbrt ceil cos cosh erf erfc exp expm1 floor invqnorm log log10
142 log1p logifit madd max mexp min mmul msub pow qnorm round roundm sgn sin sinh
143 sqrt tan tanh urand urandrange urand32 urandint dhms2fsec dhms2sec fsec2dhms
144 fsec2hms gmt2sec localtime2sec hms2fsec hms2sec sec2dhms sec2gmt sec2gmt
145 sec2gmtdate sec2localtime sec2localtime sec2localdate sec2hms strftime
146 strftime_local strptime strptime_local systime is_absent is_bool is_boolean
147 is_empty is_empty_map is_float is_int is_map is_nonempty_map is_not_empty
148 is_not_map is_not_null is_null is_numeric is_present is_string
149 asserting_absent asserting_bool asserting_boolean asserting_empty
150 asserting_empty_map asserting_float asserting_int asserting_map
151 asserting_nonempty_map asserting_not_empty asserting_not_map
152 asserting_not_null asserting_null asserting_numeric asserting_present
153 asserting_string boolean float fmtnum hexfmt int string typeof depth haskey
154 joink joinkv joinv leafcount length mapdiff mapexcept mapselect mapsum splitkv
155 splitkvx splitnv splitnvx
156
157 Please use "mlr --help-function {function name}" for function-specific help.
158
159 I/O FORMATTING
--idkvp --odkvp --dkvp  Delimited key-value pairs, e.g. "a=1,b=2"
                        (this is Miller's default format).
162
163 --inidx --onidx --nidx Implicitly-integer-indexed fields
164 (Unix-toolkit style).
165 -T Synonymous with "--nidx --fs tab".
166
167 --icsv --ocsv --csv Comma-separated value (or tab-separated
168 with --fs tab, etc.)
169
170 --itsv --otsv --tsv Keystroke-savers for "--icsv --ifs tab",
171 "--ocsv --ofs tab", "--csv --fs tab".
172 --iasv --oasv --asv Similar but using ASCII FS 0x1f and RS 0x1e
173 --iusv --ousv --usv Similar but using Unicode FS U+241F (UTF-8 0xe2909f)
174 and RS U+241E (UTF-8 0xe2909e)
175
176 --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
177 with --fs tab, etc.). The 'lite' CSV does not handle
178 RFC-CSV double-quoting rules; is slightly faster;
179 and handles heterogeneity in the input stream via
180 empty newline followed by new header line. See also
181 http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
182
183 --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
184 "--ocsvlite --ofs tab", "--csvlite --fs tab".
185 -t Synonymous with --tsvlite.
186 --iasvlite --oasvlite --asvlite Similar to --itsvlite et al. but using ASCII FS 0x1f and RS 0x1e
187 --iusvlite --ousvlite --usvlite Similar to --itsvlite et al. but using Unicode FS U+241F (UTF-8 0xe2909f)
188 and RS U+241E (UTF-8 0xe2909e)
189
190 --ipprint --opprint --pprint Pretty-printed tabular (produces no
191 output until all input is in).
192 --right Right-justifies all fields for PPRINT output.
193 --barred Prints a border around PPRINT output
194 (only available for output).
195
196 --omd Markdown-tabular (only available for output).
197
198 --ixtab --oxtab --xtab Pretty-printed vertical-tabular.
199 --xvright Right-justifies values for XTAB format.
200
201 --ijson --ojson --json JSON tabular: sequence or list of one-level
202 maps: {...}{...} or [{...},{...}].
203 --json-map-arrays-on-input JSON arrays are unmillerable. --json-map-arrays-on-input
204 --json-skip-arrays-on-input is the default: arrays are converted to integer-indexed
205 --json-fatal-arrays-on-input maps. The other two options cause them to be skipped, or
206 to be treated as errors. Please use the jq tool for full
207 JSON (pre)processing.
208 --jvstack Put one key-value pair per line for JSON
209 output.
--jsonx --ojsonx        Keystroke-savers for --json --jvstack
                        and --ojson --jvstack, respectively.
212 --jlistwrap Wrap JSON output in outermost [ ].
213 --jknquoteint Do not quote non-string map keys in JSON output.
214 --jvquoteall Quote map values in JSON output, even if they're
215 numeric.
216 --jflatsep {string} Separator for flattening multi-level JSON keys,
217 e.g. '{"a":{"b":3}}' becomes a:b => 3 for
218 non-JSON formats. Defaults to :.
219
220 -p is a keystroke-saver for --nidx --fs space --repifs
221
222 Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
223 DKVP-formatted input and pretty-printed output.
224
225 Please use --iformat1 --oformat2 rather than --format1 --oformat2.
226 The latter sets up input and output flags for format1, not all of which
227 are overridden in all cases by setting output format to format2.
228
229 COMMENTS IN DATA
230 --skip-comments Ignore commented lines (prefixed by "#")
231 within the input.
232 --skip-comments-with {string} Ignore commented lines within input, with
233 specified prefix.
234 --pass-comments Immediately print commented lines (prefixed by "#")
235 within the input.
236 --pass-comments-with {string} Immediately print commented lines within input, with
237 specified prefix.
238 Notes:
239 * Comments are only honored at the start of a line.
240 * In the absence of any of the above four options, comments are data like
241 any other text.
242 * When pass-comments is used, comment lines are written to standard output
243 immediately upon being read; they are not part of the record stream.
244 Results may be counterintuitive. A suggestion is to place comments at the
245 start of data files.
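The difference between skipping and passing comments can be modeled as below. This is a hedged Python sketch of the notes above, not Miller's reader; the function name data_lines and its mode argument are hypothetical. Note that only a prefix at the start of a line counts, and that passed comments go straight to the output rather than into the record stream.

```python
import io
import sys


def data_lines(lines, prefix="#", mode="skip", out=sys.stdout):
    """Yield non-comment lines; mode is "skip", "pass", or None (comments are data)."""
    for line in lines:
        if mode is not None and line.startswith(prefix):
            if mode == "pass":
                out.write(line)  # written immediately, outside the record stream
            continue
        yield line


lines = ["# header note\n", "a=1\n", "#b=2\n", "c=3 # trailing\n"]
sink = io.StringIO()
kept = list(data_lines(lines, mode="pass", out=sink))
print(kept)             # the data lines, including the one with a trailing "#"
print(sink.getvalue())  # the two comment lines, emitted as they were read
```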
246
247 FORMAT-CONVERSION KEYSTROKE-SAVERS
248 As keystroke-savers for format-conversion you may use the following:
249 --c2t --c2d --c2n --c2j --c2x --c2p --c2m
250 --t2c --t2d --t2n --t2j --t2x --t2p --t2m
251 --d2c --d2t --d2n --d2j --d2x --d2p --d2m
252 --n2c --n2t --n2d --n2j --n2x --n2p --n2m
253 --j2c --j2t --j2d --j2n --j2x --j2p --j2m
254 --x2c --x2t --x2d --x2n --x2j --x2p --x2m
255 --p2c --p2t --p2d --p2n --p2j --p2x --p2m
256 The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
257 PPRINT, and markdown, respectively. Note that markdown format is available for
258 output only.
259
260 COMPRESSED I/O
261 --prepipe {command} This allows Miller to handle compressed inputs. You can do
262 without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
263
264 However, when multiple input files are present, between-file separations are
265 lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
266 specify an action to be taken on each input file. This pre-pipe command must
267 be able to read from standard input; it will be invoked with
268 {command} < {filename}.
269 Examples:
270 mlr --prepipe 'gunzip'
271 mlr --prepipe 'zcat -cf'
272 mlr --prepipe 'xz -cd'
273 mlr --prepipe cat
274 mlr --prepipe-gunzip
275 mlr --prepipe-zcat
276 Note that this feature is quite general and is not limited to decompression
277 utilities. You can use it to apply per-file filters of your choice.
278 For output compression (or other) utilities, simply pipe the output:
279 mlr ... | {your compression command}
280
281 There are shorthands --prepipe-zcat and --prepipe-gunzip which are
282 valid in .mlrrc files. The --prepipe flag is not valid in .mlrrc
283 files since that would put execution of the prepipe command under
284 control of the .mlrrc file.
285
286 SEPARATORS
287 --rs --irs --ors Record separators, e.g. 'lf' or '\r\n'
288 --fs --ifs --ofs --repifs Field separators, e.g. comma
289 --ps --ips --ops Pair separators, e.g. equals sign
290
291 Notes about line endings:
292 * Default line endings (--irs and --ors) are "auto" which means autodetect from
293 the input file format, as long as the input file(s) have lines ending in either
294 LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
295 carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
296 * If both irs and ors are auto (which is the default) then LF input will lead to LF
297 output and CRLF input will lead to CRLF output, regardless of the platform you're
298 running on.
299 * The line-ending autodetector triggers on the first line ending detected in the input
300 stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
301 LF-terminated file then autodetected line endings will be CRLF.
302 * If you use --ors {something else} with (default or explicitly specified) --irs auto
303 then line endings are autodetected on input and set to what you specify on output.
304 * If you use --irs {something else} with (default or explicitly specified) --ors auto
305 then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
306
307 Notes about all other separators:
308 * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
309 do key-value pairs appear juxtaposed.
310 * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
311 XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
312 a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
313 auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
314 * OFS must be single-character for PPRINT format. This is because it is used
315 with repetition for alignment; multi-character separators would make
316 alignment impossible.
317 * OPS may be multi-character for XTAB format, in which case alignment is
318 disabled.
319 * TSV is simply CSV using tab as field separator ("--fs tab").
320 * FS/PS are ignored for markdown format; RS is used.
321 * All FS and PS options are ignored for JSON format, since they are not relevant
322 to the JSON format.
323 * You can specify separators in any of the following ways, shown by example:
324 - Type them out, quoting as necessary for shell escapes, e.g.
325 "--fs '|' --ips :"
326 - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
327 - To avoid backslashing, you can use any of the following names:
328 cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
329 * Default separators by format:
330 File format RS FS PS
331 gen N/A (N/A) (N/A)
332 dkvp auto , =
333 json auto (N/A) (N/A)
334 nidx auto space (N/A)
335 csv auto , (N/A)
336 csvlite auto , (N/A)
337 markdown auto (N/A) (N/A)
338 pprint auto space (N/A)
339 xtab (N/A) auto space
340
341 CSV-SPECIFIC OPTIONS
342 --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
343 of input files. Tip: combine with "label" to recreate
344 missing headers.
345 --allow-ragged-csv-input|--ragged If a data line has fewer fields than the header line,
346 fill remaining keys with empty string. If a data line has more
347 fields than the header line, use integer field labels as in
348 the implicit-header case.
349 --headerless-csv-output Print only CSV data lines.
350 -N Keystroke-saver for --implicit-csv-header --headerless-csv-output.
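The ragged-input rule can be sketched in Python as follows. This models only the description of --allow-ragged-csv-input above and is not Miller's CSV reader; the function name ragged_record is hypothetical, and labeling an extra field by its 1-up position is an assumption drawn from the "1,2,3,..." labels of the implicit-header case.

```python
def ragged_record(header, fields):
    """Pair a data line with its header, tolerating too few or too many fields."""
    record = {}
    for i, value in enumerate(fields):
        # Fields beyond the header get 1-up integer labels (assumed).
        key = header[i] if i < len(header) else str(i + 1)
        record[key] = value
    for key in header[len(fields):]:
        record[key] = ""  # missing trailing fields become empty strings
    return record


print(ragged_record(["a", "b", "c"], ["4", "5"]))
print(ragged_record(["a", "b", "c"], ["6", "7", "8", "9"]))
```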
351
352 DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
353 --quote-all Wrap all fields in double quotes
354 --quote-none Do not wrap any fields in double quotes, even if they have
355 OFS or ORS in them
356 --quote-minimal Wrap fields in double quotes only if they have OFS or ORS
357 in them (default)
358 --quote-numeric Wrap fields in double quotes only if they have numbers
359 in them
360 --quote-original Wrap fields in double quotes if and only if they were
361 quoted on input. This isn't sticky for computed fields:
362 e.g. if fields a and b were quoted on input and you do
363 "put '$c = $a . $b'" then field c won't inherit a or b's
364 was-quoted-on-input flag.
365
366 NUMERICAL FORMATTING
367 --ofmt {format} E.g. %.18lf, %.0lf. Please use sprintf-style codes for
368 double-precision. Applies to verbs which compute new
369 values, e.g. put, stats1, stats2. See also the fmtnum
370 function within mlr put (mlr --help-all-functions).
371 Defaults to %lf.
372
373 OTHER OPTIONS
374 --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
375 urand()/urandint()/urand32().
376 --nr-progress-mod {m}, with m a positive integer: print filename and record
377 count to stderr every m input records.
378 --from {filename} Use this to specify an input file before the verb(s),
379 rather than after. May be used more than once. Example:
380 "mlr --from a.dat --from b.dat cat" is the same as
381 "mlr cat a.dat b.dat".
382 -n Process no input files, nor standard input either. Useful
383 for mlr put with begin/end statements only. (Same as --from
384 /dev/null.) Also useful in "mlr -n put -v '...'" for
385 analyzing abstract syntax trees (if that's your thing).
386 -I Process files in-place. For each file name on the command
387 line, output is written to a temp file in the same
388 directory, which is then renamed over the original. Each
389 file is processed in isolation: if the output format is
390 CSV, CSV headers will be present in each output file;
391 statistics are only over each file's own records; and so on.
392
393 THEN-CHAINING
394 Output of one verb may be chained as input to another using "then", e.g.
395 mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
396
397 AUXILIARY COMMANDS
398 Miller has a few otherwise-standalone executables packaged within it.
399 They do not participate in any other parts of Miller.
400 Available subcommands:
401 aux-list
402 lecat
403 termcvt
404 hex
405 unhex
406 netbsd-strptime
407 For more information, please invoke mlr {subcommand} --help
408
410 You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc.
411 For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file
412 and that will be the default input/output format unless otherwise specified on the command line.
413
414 The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional.
415 Hash-style comments and blank lines are ignored.
416
417 Sample .mlrrc:
418 # Input and output formats are CSV by default (unless otherwise specified
419 # on the mlr command line):
420 csv
421 # These are no-ops for CSV, but when I do use JSON output, I want these
422 # pretty-printing options to be used:
423 jvstack
424 jlistwrap
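The file format described above is simple enough to model in a few lines. The following Python sketch is an illustration, not Miller's parser; the function name parse_mlrrc is hypothetical, and it assumes comments occupy whole lines and that an option is separated from its value by whitespace.

```python
def parse_mlrrc(text):
    """Return a list of (flag, value-or-None) pairs from .mlrrc text."""
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and hash-style comments are ignored
        if not line.startswith("--"):
            line = "--" + line  # the leading "--" is optional
        flag, _, value = line.partition(" ")
        options.append((flag, value.strip() or None))
    return options


sample = """# CSV by default
csv
jvstack
--ofs tab
"""
print(parse_mlrrc(sample))
```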
425
426 How to specify location of .mlrrc:
427 * If $MLRRC is set:
428 o If its value is "__none__" then no .mlrrc files are processed.
429 o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
430 errors, they abort mlr with a usage message (as if you had mistyped something on the
431 command line). If the file can't be loaded at all, though, it is silently skipped.
432 o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
433 set in the environment.
434 * Otherwise:
435 o If $HOME/.mlrrc exists, it's then processed as above.
436 o If ./.mlrrc exists, it's then also processed as above.
437 (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
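The lookup order above can be summarized as a small function. This Python sketch is a model of the rules as written, not Miller's code; the function name mlrrc_candidates is hypothetical, and checking whether each candidate file actually exists is left to the caller.

```python
import os


def mlrrc_candidates(env, cwd="."):
    """Return the .mlrrc paths to try, in processing order."""
    mlrrc = env.get("MLRRC")
    if mlrrc is not None:
        # $MLRRC set: home and current-directory files are ignored.
        return [] if mlrrc == "__none__" else [mlrrc]
    paths = []
    home = env.get("HOME")
    if home:
        paths.append(os.path.join(home, ".mlrrc"))
    # The current-directory file is processed last, so it stacks over home.
    paths.append(os.path.join(cwd, ".mlrrc"))
    return paths


print(mlrrc_candidates(dict(os.environ)))
```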
438
439 See also:
440 https://johnkerl.org/miller/doc/customization.html
441
443 altkv
444 Usage: mlr altkv [no options]
445 Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
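The pairing can be sketched in Python as below. This is an illustration of the sentence above, not Miller's implementation; the function name altkv mirrors the verb for readability, and the sketch deliberately handles only an even number of values.

```python
def altkv(values):
    """Pair alternating values as key, value, key, value, ..."""
    if len(values) % 2:
        raise ValueError("this sketch handles only an even number of values")
    return dict(zip(values[0::2], values[1::2]))


print(altkv(["a", "b", "c", "d", "e", "f"]))
```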
446
447 bar
448 Usage: mlr bar [options]
449 Replaces a numeric field with a number of asterisks, allowing for cheesy
450 bar plots. These align best with --opprint or --oxtab output format.
451 Options:
452 -f {a,b,c} Field names to convert to bars.
453 -c {character} Fill character: default '*'.
454 -x {character} Out-of-bounds character: default '#'.
455 -b {character} Blank character: default '.'.
456 --lo {lo} Lower-limit value for min-width bar: default '0.000000'.
457 --hi {hi} Upper-limit value for max-width bar: default '100.000000'.
458 -w {n} Bar-field width: default '40'.
459 --auto Automatically computes limits, ignoring --lo and --hi.
460 Holds all records in memory before producing any output.
461
462 bootstrap
463 Usage: mlr bootstrap [options]
464 Emits an n-sample, with replacement, of the input records.
465 Options:
466 -n {number} Number of samples to output. Defaults to number of input records.
467 Must be non-negative.
468 See also mlr sample and mlr shuffle.
469
470 cat
471 Usage: mlr cat [options]
472 Passes input records directly to output. Most useful for format conversion.
473 Options:
474 -n Prepend field "n" to each record with record-counter starting at 1
475 -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
476 keyed by specified field name(s).
477 -v Write a low-level record-structure dump to stderr.
478 -N {name} Prepend field {name} to each record with record-counter starting at 1
479
480 check
481 Usage: mlr check
Consumes records without printing any output.
Useful for checking that input data is well-formed.
484
485 clean-whitespace
486 Usage: mlr clean-whitespace [options]
487 For each record, for each field in the record, whitespace-cleans the keys and
488 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
489 and replacing multiple whitespace with singles. For finer-grained control,
490 please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
491 and clean_whitespace.
492
493 Options:
494 -k|--keys-only Do not touch values.
495 -v|--values-only Do not touch keys.
496 It is an error to specify -k as well as -v -- to clean keys and values,
497 leave off -k as well as -v.
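Whitespace-cleaning as described above can be modeled like this. The Python sketch is illustrative rather than Miller's code; the names clean and clean_whitespace are hypothetical, and replacing each whitespace run with a single space is an assumption about what "singles" means.

```python
import re


def clean(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"\s+", " ", text.strip())


def clean_whitespace(record, keys=True, values=True):
    """Clean keys and/or values; keys=True, values=False models -k."""
    return {
        (clean(k) if keys else k): (clean(v) if values else v)
        for k, v in record.items()
    }


print(clean_whitespace({" a  b ": "  x   y "}))
```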
498
499 count
500 Usage: mlr count [options]
501 Prints number of records, optionally grouped by distinct values for specified field names.
502
503 Options:
504 -g {a,b,c} Field names for distinct count.
505 -n Show only the number of distinct values. Not interesting without -g.
506 -o {name} Field name for output count. Default "count".
507
508 count-distinct
509 Usage: mlr count-distinct [options]
510 Prints number of records having distinct values for specified field names.
511 Same as uniq -c.
512
513 Options:
514 -f {a,b,c} Field names for distinct count.
515 -n Show only the number of distinct values. Not compatible with -u.
516 -o {name} Field name for output count. Default "count".
517 Ignored with -u.
518 -u Do unlashed counts for multiple field names. With -f a,b and
519 without -u, computes counts for distinct combinations of a
520 and b field values. With -f a,b and with -u, computes counts
521 for distinct a field values and counts for distinct b field
522 values separately.
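The difference between lashed (default) and unlashed (-u) counting can be sketched as follows. This Python model follows the description of -u above and is not Miller's implementation; the function name count_distinct is hypothetical, and skipping records that lack any of the named fields is an assumption.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field-value combinations, or per-field values if unlashed."""
    # Assumption: only records having every named field are counted.
    rows = [r for r in records if all(f in r for f in fields)]
    if unlashed:
        return {f: Counter(r[f] for r in rows) for f in fields}
    return Counter(tuple(r[f] for f in fields) for r in rows)


records = [{"a": "1", "b": "x"}, {"a": "1", "b": "y"}, {"a": "2", "b": "x"}]
print(count_distinct(records, ["a", "b"]))                 # combinations of a and b
print(count_distinct(records, ["a", "b"], unlashed=True))  # a and b counted separately
```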
523
524 count-similar
525 Usage: mlr count-similar [options]
Ingests all records, then emits each record augmented by a count of
the number of records (itself included) having the same group-by field values.
528 Options:
529 -g {d,e,f} Group-by-field names for counts.
530 -o {name} Field name for output count. Default "count".
531
532 cut
533 Usage: mlr cut [options]
534 Passes through input records with specified fields included/excluded.
535 -f {a,b,c} Field names to include for cut.
536 -o Retain fields in the order specified here in the argument list.
537 Default is to retain them in the order found in the input data.
538 -x|--complement Exclude, rather than include, field names specified by -f.
539 -r Treat field names as regular expressions. "ab", "a.*b" will
540 match any field name containing the substring "ab" or matching
541 "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
542 be used. The -o flag is ignored when -r is present.
543 Examples:
544 mlr cut -f hostname,status
545 mlr cut -x -f hostname,status
546 mlr cut -r -f '^status$,sda[0-9]'
547 mlr cut -r -f '^status$,"sda[0-9]"'
548 mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
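The three regex spellings in the examples above (bare, double-quoted, and double-quoted with a trailing i) can be modeled as below. This is a hedged Python sketch, not Miller's code: the names compile_field_regex and cut_regex are hypothetical, Python's re syntax stands in for Miller's regex library, and a regex that itself contains a comma is not handled.

```python
import re


def compile_field_regex(spec):
    """Accept sda[0-9], "sda[0-9]", or "sda[0-9]"i (case-insensitive)."""
    flags = 0
    if len(spec) >= 3 and spec.startswith('"') and spec.endswith('"i'):
        spec, flags = spec[1:-2], re.IGNORECASE
    elif len(spec) >= 2 and spec.startswith('"') and spec.endswith('"'):
        spec = spec[1:-1]
    return re.compile(spec, flags)


def cut_regex(record, specs, complement=False):
    """Keep (or, with complement, drop) fields whose names match any regex."""
    regexes = [compile_field_regex(s) for s in specs.split(",")]
    return {
        k: v for k, v in record.items()
        if any(r.search(k) for r in regexes) != complement
    }


record = {"status": "up", "SDA1": "5", "sda2": "6", "host": "h"}
print(cut_regex(record, '^status$,"sda[0-9]"i'))
```

Unanchored patterns use a substring search, matching the description of "ab" matching any field name that contains it.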
549
decimate
Usage: mlr decimate [options]
Passes through one of every n records, optionally by category.
Options:
-n {count}   Decimation factor; default 10
-b           Decimate by printing first of every n.
-e           Decimate by printing last of every n (default).
-g {a,b,c}   Optional group-by-field names for decimate counts
557
fill-down
Usage: mlr fill-down [options]
-f {a,b,c}            Field names for fill-down
-a|--only-if-absent   Fill down only if the field is absent from the record
If a given record has a missing value for a given field, fill that from
the corresponding value from a previous record, if any.
By default, a 'missing' field either is absent, or has the empty-string value.
With -a, a field is 'missing' only if it is absent.
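The two notions of 'missing' can be contrasted with a short model. This Python sketch follows the description above and is not Miller's implementation; the function name fill_down and its only_if_absent argument (standing in for -a) are hypothetical.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing field values from the most recent record that had them."""
    last = {}
    for record in records:
        out = dict(record)
        for f in fields:
            if only_if_absent:
                missing = f not in out
            else:
                missing = out.get(f, "") == ""  # absent or empty string
            if missing:
                if f in last:
                    out[f] = last[f]
            else:
                last[f] = out[f]
        yield out


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(list(fill_down(records, ["a"])))
print(list(fill_down(records, ["a"], only_if_absent=True)))
```

With only_if_absent set, the empty value in the second record counts as present, so it is what gets carried into the third record.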
566
567 filter
568 Usage: mlr filter [options] {expression}
569 Prints records for which {expression} evaluates to true.
570 If there are multiple semicolon-delimited expressions, all of them are
571 evaluated and the last one is used as the filter criterion.
572
573 Conversion options:
574 -S: Keeps field values as strings with no type inference to int or float.
575 -F: Keeps field values as strings or floats with no inference to int.
576 All field values are type-inferred to int/float/string unless this behavior is
577 suppressed with -S or -F.
578
579 Output/formatting options:
580 --oflatsep {string}: Separator to use when flattening multi-level @-variables
581 to output records for emit. Default ":".
582 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
583 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
584 Any of the output-format command-line flags (see mlr -h). Example: using
585 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
586 the input is CSV, the output is pretty-print tabular, but the tee-file output
587 is written in JSON format.
588 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
589 record.
590
591 Expression-specification options:
592 -f {filename}: the DSL expression is taken from the specified file rather
593 than from the command line. Outer single quotes wrapping the expression
594 should not be placed in the file. If -f is specified more than once,
595 all input files specified using -f are concatenated to produce the expression.
596 (For example, you can define functions in one file and call them from another.)
597 -e {expression}: You can use this after -f to add an expression. Example use
598 case: define functions/subroutines in a file you specify with -f, then call
599 them with an expression you specify with -e.
600 (If you mix -e and -f then the expressions are evaluated in the order encountered.
601 Since the expression pieces are simply concatenated, please be sure to use intervening
602 semicolons to separate expressions.)
603
604 -s name=value: Predefines out-of-stream variable @name to have value "value".
Thus mlr put -s foo=97 '$column += @foo' is like
mlr put 'begin {@foo = 97} $column += @foo'.
607 The value part is subject to type-inferencing as specified by -S/-F.
608 May be specified more than once, e.g. -s name1=value1 -s name2=value2.
609 Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
610
611 Tracing options:
-v: Prints the expression's AST (abstract syntax tree), which gives
    full transparency on the precedence and associativity rules of
    Miller's grammar, to stdout.
-a: Prints a low-level stack-allocation trace to stdout.
-t: Prints a low-level parser trace to stderr.
-T: Prints every statement to stderr as it is executed.
618
619 Other options:
620 -x: Prints records for which {expression} evaluates to false.
621
622 Please use a dollar sign for field names and double-quotes for string
623 literals. If field names have special characters such as "." then you might
624 use braces, e.g. '${field.name}'. Miller built-in variables are
625 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
626 variables. The environment-variable name may be an expression, e.g. a field
627 value.
628
629 Use # to comment to end of line.
630
631 Examples:
632 mlr filter 'log10($count) > 4.0'
633 mlr filter 'FNR == 2' (second record in each file)
634 mlr filter 'urand() < 0.001' (subsampling)
635 mlr filter '$color != "blue" && $value > 4.2'
636 mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
637 mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
638 mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
639 mlr filter '
640 NR == 1 ||
641 #NR == 2 ||
642 NR == 3
643 '
644
645 Please see https://miller.readthedocs.io/en/latest/reference.html for more information
646 including function list. Or "mlr -f". Please also see "mlr grep" which is
647 useful when you don't yet know which field name(s) you're looking for.
648 Please see in particular:
649 http://www.johnkerl.org/miller/doc/reference-verbs.html#filter
650
651 format-values
652 Usage: mlr format-values [options]
653 Applies format strings to all field values, depending on autodetected type.
654 * If a field value is detected to be integer, applies integer format.
655 * Else, if a field value is detected to be float, applies float format.
656 * Else, applies string format.
657
658 Note: this is a low-keystroke way to apply formatting to many fields. To get
659 finer control, please see the fmtnum function within the mlr put DSL.
660
661 Note: this verb lets you apply arbitrary format strings, which can produce
662 undefined behavior and/or program crashes. See your system's "man printf".
663
664 Options:
665 -i {integer format} Defaults to "%lld".
666 Examples: "%06lld", "%08llx".
667 Note that Miller integers are long long so you must use
668 formats which apply to long long, e.g. with ll in them.
669 Undefined behavior results otherwise.
670 -f {float format} Defaults to "%lf".
671 Examples: "%8.3lf", "%.6le".
672 Note that Miller floats are double-precision so you must
673 use formats which apply to double, e.g. with l[efg] in them.
674 Undefined behavior results otherwise.
675 -s {string format} Defaults to "%s".
676 Examples: "_%s", "%08s".
677 Note that you must use formats which apply to string, e.g.
678 with s in them. Undefined behavior results otherwise.
679 -n Coerce field values autodetected as int to float, and then
680 apply the float format.
681
682 fraction
683 Usage: mlr fraction [options]
684 For each record's value in specified fields, computes the ratio of that
685 value to the sum of values in that field over all input records.
686 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
687 x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
688
689 Note: this is internally a two-pass algorithm: on the first pass it retains
690 input records and accumulates sums; on the second pass it computes quotients
691 and emits output records. This means it produces no output until all input is read.
692
693 Options:
694 -f {a,b,c} Field name(s) for fraction calculation
695 -g {d,e,f} Optional group-by-field name(s) for fraction counts
696 -p Produce percents [0..100], not fractions [0..1]. Output field names
697 end with "_percent" rather than "_fraction"
698 -c Produce cumulative distributions, i.e. running sums: each output
699 value folds in the sum of the previous for the specified group
700 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
701 x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
702 x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
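The two-pass behavior described above can be sketched in Python as follows. The function name fraction and the record representation (a list of dicts) are assumptions made for illustration; the sketch covers -f with a single field, -g, and -c, and it passes records lacking the needed fields through unchanged, which is also an assumption.

```python
from collections import defaultdict

def fraction(records, field, group_by=(), cumulative=False):
    """Pass 1 retains records and accumulates sums; pass 2 emits quotients."""
    records = list(records)

    def participates(rec):
        return field in rec and all(g in rec for g in group_by)

    def group_key(rec):
        return tuple(rec[g] for g in group_by)

    # Pass 1: per-group sums.
    sums = defaultdict(float)
    for rec in records:
        if participates(rec):
            sums[group_key(rec)] += float(rec[field])

    # Pass 2: quotients (or running quotients with cumulative=True).
    suffix = "_cumulative_fraction" if cumulative else "_fraction"
    running = defaultdict(float)
    out = []
    for rec in records:
        rec = dict(rec)
        if participates(rec):
            key = group_key(rec)
            value = float(rec[field])
            if cumulative:
                running[key] += value
                value = running[key]
            rec[field + suffix] = value / sums[key]
        out.append(rec)
    return out
```

Because the sums are complete only after the last record is read, nothing can be emitted earlier, which is why the verb is non-streaming.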
703
704 grep
705 Usage: mlr grep [options] {regular expression}
706 Passes through records which match {regex}.
707 Options:
708 -i Use case-insensitive search.
709 -v Invert: pass through records which do not match the regex.
710 Note that "mlr filter" is more powerful, but requires you to know field names.
711 By contrast, "mlr grep" allows you to regex-match the entire record. It does
712 this by formatting each record in memory as DKVP, using command-line-specified
713 ORS/OFS/OPS, and matching the resulting line against the regex specified
714 here. In particular, the regex is not applied to the input stream: if you
715 have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
716 be matched, not against either of these lines, but against the DKVP line
717 "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
718 and this command is intended to be merely a keystroke-saver. To get all the
719 features of system grep, you can do
720 "mlr --odkvp ... | grep ... | mlr --idkvp ..."
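A rough Python sketch of the matching rule described above: the record is first rendered as a DKVP line and only then matched. The function name record_matches is hypothetical, and Python's re module is used as a stand-in for the regex library Miller links against, so pattern syntax may differ in details.

```python
import re

def record_matches(record, regex, ofs=",", ops="=",
                   ignore_case=False, invert=False):
    """Render the record as a DKVP line, then regex-match that line."""
    line = ofs.join(f"{key}{ops}{value}" for key, value in record.items())
    flags = re.IGNORECASE if ignore_case else 0    # analogous to -i
    matched = re.search(regex, line, flags) is not None
    return matched != invert                        # invert is analogous to -v
```

With the CSV example from the text, the record is matched as "x=1,y=2,z=3", so a pattern such as "^1,2,3$" does not match even though that is how the data line looked in the input file.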
721
722 group-by
723 Usage: mlr group-by {comma-separated field names}
724 Outputs records in batches having identical values at specified field names.
725
726 group-like
727 Usage: mlr group-like
728 Outputs records in batches having identical field names.
729
730 having-fields
731 Usage: mlr having-fields [options]
732 Conditionally passes through records depending on each record's field names.
733 Options:
734 --at-least {comma-separated names}
735 --which-are {comma-separated names}
736 --at-most {comma-separated names}
737 --all-matching {regular expression}
738 --any-matching {regular expression}
739 --none-matching {regular expression}
740 Examples:
741 mlr having-fields --which-are amount,status,owner
742 mlr having-fields --any-matching 'sda[0-9]'
743 mlr having-fields --any-matching '"sda[0-9]"'
744 mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
745
746 head
747 Usage: mlr head [options]
748 -n {count} Head count to print; default 10
749 -g {a,b,c} Optional group-by-field names for head counts
750 Passes through the first n records, optionally by category.
751 Without -g, ceases consuming more input (i.e. is fast) when n
752 records have been read.
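The difference between grouped and ungrouped head can be sketched in Python with a generator. The name head and the dict-based records are assumptions; dropping records that lack a group-by field is also an assumption of this sketch.

```python
from collections import defaultdict

def head(records, n=10, group_by=()):
    """Yield the first n records overall, or the first n per group."""
    seen = defaultdict(int)
    for rec in records:
        if any(g not in rec for g in group_by):
            continue
        key = tuple(rec[g] for g in group_by)
        if seen[key] < n:
            seen[key] += 1
            yield rec
            if not group_by and seen[key] == n:
                return  # ungrouped: stop consuming input once n records passed
```

With a group-by, later records may still belong to a group that has not yet reached n, so the whole input must be read; without one, the early return is what makes the verb fast.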
753
754 histogram
755 Usage: mlr histogram [options]
756 -f {a,b,c} Value-field names for histogram counts
757 --lo {lo} Histogram low value
758 --hi {hi} Histogram high value
759 --nbins {n} Number of histogram bins
760 --auto Automatically computes limits, ignoring --lo and --hi.
761 Holds all values in memory before producing any output.
762 -o {prefix} Prefix for output field name. Default: no prefix.
763 Just a histogram. Input values < lo or > hi are not counted.
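A minimal Python sketch of fixed-width binning with --lo, --hi, and --nbins. The function name and the output keys are illustrative only. Placing a value exactly equal to hi into the last bin is an assumption consistent with the statement that only values < lo or > hi are skipped.

```python
def histogram(values, lo, hi, nbins):
    """Count values into nbins equal-width bins spanning [lo, hi]."""
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            index = int((v - lo) * nbins / (hi - lo))
            counts[min(index, nbins - 1)] += 1  # guard against rounding at the edge
        elif v == hi:
            counts[nbins - 1] += 1              # assumption: hi falls in the last bin
        # values < lo or > hi are not counted
    width = (hi - lo) / nbins
    return [
        {"bin_lo": lo + i * width, "bin_hi": lo + (i + 1) * width, "count": counts[i]}
        for i in range(nbins)
    ]
```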
764
765 join
766 Usage: mlr join [options]
767 Joins records from specified left file name with records from all file names
768 at the end of the Miller argument list.
769 Functionality is essentially the same as the system "join" command, but for
770 record streams.
771 Options:
772 -f {left file name}
773 -j {a,b,c} Comma-separated join-field names for output
774 -l {a,b,c} Comma-separated join-field names for left input file;
775 defaults to -j values if omitted.
776 -r {a,b,c} Comma-separated join-field names for right input file(s);
777 defaults to -j values if omitted.
778 --lp {text} Additional prefix for non-join output field names from
779 the left file
780 --rp {text} Additional prefix for non-join output field names from
781 the right file(s)
782 --np Do not emit paired records
783 --ul Emit unpaired records from the left file
784 --ur Emit unpaired records from the right file(s)
785 -s|--sorted-input Require sorted input: records must be sorted
786 lexically by their join-field names, else not all records will
787 be paired. The only likely use case for this is with a left
788 file which is too big to fit into system memory otherwise.
789 -u Enable unsorted input. (This is the default even without -u.)
790 In this case, the entire left file will be loaded into memory.
791 --prepipe {command} As in main input options; see mlr --help for details.
792 If you wish to use a prepipe command for the main input as well
793 as here, it must be specified there as well as here.
794 File-format options default to those for the right file names on the Miller
795 argument list, but may be overridden for the left file as follows. Please see
796 the main "mlr --help" for more information on syntax for these arguments.
797 -i {one of csv,dkvp,nidx,pprint,xtab}
798 --irs {record-separator character}
799 --ifs {field-separator character}
800 --ips {pair-separator character}
801 --repifs
802 --repips
803 Please use "mlr --usage-separator-options" for information on specifying separators.
804 Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#join for more information
805 including examples.
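The unsorted (default) mode can be pictured as a hash join: the left file is loaded into memory keyed by the join fields, and the right records are streamed against it. The Python sketch below is an illustration under stated assumptions, not Miller's code: the function and parameter names are hypothetical, it assumes the same join-field names on both sides (the -j case), and the field order and collision handling in paired records are simplified.

```python
from collections import defaultdict

def join_unsorted(left, right, join_fields,
                  paired=True, left_unpaired=False, right_unpaired=False):
    """Hash join: bucket the left records, then stream the right records."""
    def key_of(rec):
        if all(f in rec for f in join_fields):
            return tuple(rec[f] for f in join_fields)
        return None  # record lacks a join field

    buckets = defaultdict(list)
    for rec in left:                       # entire left input held in memory
        buckets[key_of(rec)].append(rec)

    matched = set()
    for rec in right:
        key = key_of(rec)
        if key is not None and key in buckets:
            matched.add(key)
            if paired:                     # paired=False is analogous to --np
                for lrec in buckets[key]:
                    yield {**lrec, **rec}
        elif right_unpaired:               # analogous to --ur
            yield rec

    if left_unpaired:                      # analogous to --ul
        for key, lrecs in buckets.items():
            if key not in matched:
                yield from lrecs
```

Unpaired left records can only be emitted after the right input is exhausted, since until then a matching right record might still arrive.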
806
807 label
808 Usage: mlr label {new1,new2,new3,...}
809 Given n comma-separated names, renames the first n fields of each record to
810 have the respective name. (Fields past the nth are left with their original
811 names.) Particularly useful with --inidx or --implicit-csv-header, to give
812 useful names to otherwise integer-indexed fields.
813 Examples:
814 "echo 'a b c d' | mlr --inidx --odkvp cat" gives "1=a,2=b,3=c,4=d"
815 "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
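The positional renaming can be sketched in a few lines of Python. The function name label is hypothetical, records are modeled as insertion-ordered dicts, and collisions between a new name and an existing later field are not handled.

```python
def label(record, new_names):
    """Rename the first len(new_names) fields; leave the rest unchanged."""
    renamed = {}
    for position, (old_name, value) in enumerate(record.items()):
        name = new_names[position] if position < len(new_names) else old_name
        renamed[name] = value
    return renamed
```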
816
817 least-frequent
818 Usage: mlr least-frequent [options]
819 Shows the least frequently occurring distinct values for specified field names.
820 The first entry is the statistical anti-mode; the remaining are runners-up.
821 Options:
822 -f {one or more comma-separated field names}. Required flag.
823 -n {count}. Optional flag defaulting to 10.
824 -b Suppress counts; show only field values.
825 -o {name} Field name for output count. Default "count".
826 See also "mlr most-frequent".
827
828 merge-fields
829 Usage: mlr merge-fields [options]
830 Computes univariate statistics for each input record, accumulated across
831 specified fields.
832 Options:
833 -a {sum,count,...} Names of accumulators. One or more of:
834 count Count instances of fields
835 mode Find most-frequently-occurring values for fields; first-found wins tie
836 antimode Find least-frequently-occurring values for fields; first-found wins tie
837 sum Compute sums of specified fields
838 mean Compute averages (sample means) of specified fields
839 stddev Compute sample standard deviation of specified fields
840 var Compute sample variance of specified fields
841 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
842 skewness Compute sample skewness of specified fields
843 kurtosis Compute sample kurtosis of specified fields
844 min Compute minimum values of specified fields
845 max Compute maximum values of specified fields
846 -f {a,b,c} Value-field names on which to compute statistics. Requires -o.
847 -r {a,b,c} Regular expressions for value-field names on which to compute
848 statistics. Requires -o.
849 -c {a,b,c} Substrings for collapse mode. All fields which have the same names
850 after removing substrings will be accumulated together. Please see
851 examples below.
852 -i Use interpolated percentiles, like R's type=7; default like type=1.
853 Not meaningful for string-valued fields.
854 -o {name} Output field basename for -f/-r.
855 -k Keep the input fields which contributed to the output statistics;
856 the default is to omit them.
857 -F Computes integerable things (e.g. count) in floating point.
858
859 String-valued data make sense unless arithmetic on them is required,
860 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
861 numbers are less than strings.
862
863 Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
864 Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
865 produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
866 summed over.
867 Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
868 produces "bar_sum=15,bar_count=4" since all four fields are summed over.
869 Example: mlr merge-fields -a sum,count -c in_,out_
870 produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
871 since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
872 "b_y", and "b_out_x" collapses to "b_x".
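Collapse mode (-c) may be easier to follow as code. The Python sketch below handles only the sum and count accumulators; the function name is hypothetical and non-numeric values are not handled. It removes the first matching substring from each field name and accumulates fields that share the resulting short name.

```python
def _number(text):
    """Parse text as int if possible, else as float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def merge_fields_collapse(record, substrings, keep_inputs=False):
    """Accumulate sum and count over fields whose names collapse together."""
    out, sums, counts = {}, {}, {}
    for name, value in record.items():
        short = None
        for sub in substrings:
            if sub in name:
                short = name.replace(sub, "", 1)
                break
        if short is None:
            out[name] = value          # field does not participate
            continue
        if keep_inputs:                # analogous to -k
            out[name] = value
        sums[short] = sums.get(short, 0) + _number(value)
        counts[short] = counts.get(short, 0) + 1
    for short in sums:
        out[short + "_sum"] = sums[short]
        out[short + "_count"] = counts[short]
    return out
```

Applied to the example input with substrings in_ and out_, this yields the same fields as the -c example above.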
873
874 most-frequent
875 Usage: mlr most-frequent [options]
876 Shows the most frequently occurring distinct values for specified field names.
877 The first entry is the statistical mode; the remaining are runners-up.
878 Options:
879 -f {one or more comma-separated field names}. Required flag.
880 -n {count}. Optional flag defaulting to 10.
881 -b Suppress counts; show only field values.
882 -o {name} Field name for output count. Default "count".
883 See also "mlr least-frequent".
884
885 nest
886 Usage: mlr nest [options]
887 Explodes specified field values into separate fields/records, or reverses this.
888 Options:
889 --explode,--implode One is required.
890 --values,--pairs One is required.
891 --across-records,--across-fields One is required.
892 -f {field name} Required.
893 --nested-fs {string} Defaults to ";". Field separator for nested values.
894 --nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
895 --evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
896 --ivar {string} Shorthand for --implode --values --across-records --nested-fs {string}
897 Please use "mlr --usage-separator-options" for information on specifying separators.
898
899 Examples:
900
901 mlr nest --explode --values --across-records -f x
902 with input record "x=a;b;c,y=d" produces output records
903 "x=a,y=d"
904 "x=b,y=d"
905 "x=c,y=d"
906 Use --implode to do the reverse.
907
908 mlr nest --explode --values --across-fields -f x
909 with input record "x=a;b;c,y=d" produces the output record
910 "x_1=a,x_2=b,x_3=c,y=d"
911 Use --implode to do the reverse.
912
913 mlr nest --explode --pairs --across-records -f x
914 with input record "x=a:1;b:2;c:3,y=d" produces output records
915 "a=1,y=d"
916 "b=2,y=d"
917 "c=3,y=d"
918
919 mlr nest --explode --pairs --across-fields -f x
920 with input record "x=a:1;b:2;c:3,y=d" produces the output record
921 "a=1,b=2,c=3,y=d"
922
923 Notes:
924 * With --pairs, --implode doesn't make sense since the original field name has
925 been lost.
926 * The combination "--implode --values --across-records" is non-streaming:
927 no output records are produced until all input records have been read. In
928 particular, this means it won't work in tail -f contexts. But all other flag
929 combinations result in streaming (tail -f friendly) data processing.
930 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
931 e.g. by default the former is semicolon and the latter is comma.
932 See also mlr reshape.
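The two --explode --values variants can be sketched in Python as follows. The function names are hypothetical and records are modeled as insertion-ordered dicts; the --pairs and --implode variants are not covered.

```python
def explode_values_across_records(record, field, nested_fs=";"):
    """One output record per nested value, with the other fields repeated."""
    if field not in record:
        return [dict(record)]
    return [
        {key: (piece if key == field else value) for key, value in record.items()}
        for piece in record[field].split(nested_fs)
    ]

def explode_values_across_fields(record, field, nested_fs=";"):
    """One output record, with the nested values spread over numbered fields."""
    out = {}
    for key, value in record.items():
        if key == field:
            for i, piece in enumerate(value.split(nested_fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[key] = value
    return out
```

Both variants need only the current record, which is why they stream, whereas imploding across records must see every record before it can emit a combined value.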
933
934 nothing
935 Usage: mlr nothing
936 Drops all input records. Useful for testing, or after tee/print/etc. have
937 produced other output.
938
939 put
940 Usage: mlr put [options] {expression}
941 Adds/updates specified field(s). Expressions are semicolon-separated and must
942 either be assignments, or evaluate to boolean. Booleans with following
943 statements in curly braces control whether those statements are executed;
944 booleans without following curly braces do nothing except side effects (e.g.
945 regex-captures into \1, \2, etc.).
946
947 Conversion options:
948 -S: Keeps field values as strings with no type inference to int or float.
949 -F: Keeps field values as strings or floats with no inference to int.
950 All field values are type-inferred to int/float/string unless this behavior is
951 suppressed with -S or -F.
952
953 Output/formatting options:
954 --oflatsep {string}: Separator to use when flattening multi-level @-variables
955 to output records for emit. Default ":".
956 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
957 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
958 Any of the output-format command-line flags (see mlr -h). Example: using
959 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
960 the input is CSV, the output is pretty-print tabular, but the tee-file output
961 is written in JSON format.
962 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
963 record.
964
965 Expression-specification options:
966 -f {filename}: the DSL expression is taken from the specified file rather
967 than from the command line. Outer single quotes wrapping the expression
968 should not be placed in the file. If -f is specified more than once,
969 all input files specified using -f are concatenated to produce the expression.
970 (For example, you can define functions in one file and call them from another.)
971 -e {expression}: You can use this after -f to add an expression. Example use
972 case: define functions/subroutines in a file you specify with -f, then call
973 them with an expression you specify with -e.
974 (If you mix -e and -f then the expressions are evaluated in the order encountered.
975 Since the expression pieces are simply concatenated, please be sure to use intervening
976 semicolons to separate expressions.)
977
978 -s name=value: Predefines out-of-stream variable @name to have value "value".
979 Thus mlr put -s foo=97 '$column += @foo' is like
980 mlr put 'begin {@foo = 97} $column += @foo'.
981 The value part is subject to type-inferencing as specified by -S/-F.
982 May be specified more than once, e.g. -s name1=value1 -s name2=value2.
983 Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
984
985 Tracing options:
986 -v: Prints the expression's AST (abstract syntax tree), which gives
987 full transparency on the precedence and associativity rules of
988 Miller's grammar, to stdout.
989 -a: Prints a low-level stack-allocation trace to stdout.
990 -t: Prints a low-level parser trace to stderr.
991 -T: Prints every statement to stderr as it is executed.
992
993 Other options:
994 -q: Does not include the modified record in the output stream. Useful for when
995 all desired output is in begin and/or end blocks.
996
997 Please use a dollar sign for field names and double-quotes for string
998 literals. If field names have special characters such as "." then you might
999 use braces, e.g. '${field.name}'. Miller built-in variables are
1000 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
1001 variables. The environment-variable name may be an expression, e.g. a field
1002 value.
1003
1004 Use # to comment to end of line.
1005
1006 Examples:
1007 mlr put '$y = log10($x); $z = sqrt($y)'
1008 mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
1009 mlr put '$x>0.0; $y=log10($x); $z=sqrt($y)' # does all three statements
1010 mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"'
1011 mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
1012 mlr put '$filename = FILENAME'
1013 mlr put '$colored_shape = $color . "_" . $shape'
1014 mlr put '$y = cos($theta); $z = atan2($y, $x)'
1015 mlr put '$name = sub($name, "http.*com"i, "")'
1016 mlr put -q '@sum += $x; end {emit @sum}'
1017 mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
1018 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
1019 mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
1020 mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
1021 mlr put '
1022 $x = 1;
1023 #$y = 2;
1024 $z = 3
1025 '
1026
1027 Please see also 'mlr -k' for examples using redirected output.
1028
1029 Please see https://miller.readthedocs.io/en/latest/reference.html for more information
1030 including function list. Or "mlr -f".
1031 Please see in particular:
1032 http://www.johnkerl.org/miller/doc/reference-verbs.html#put
1033
1034 regularize
1035 Usage: mlr regularize
1036 For records whose field names match those of an earlier record in the
1037 data stream but in a different order, outputs them with field names in
1038 the previously encountered order.
1039 Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
1040 output as a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
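One way to picture this is to remember, for each distinct set of field names, the order in which those names first appeared. A Python sketch, with a hypothetical function name and dict-based records:

```python
def regularize(records):
    """Emit each record using the first-seen ordering of its field-name set."""
    first_order = {}
    for rec in records:
        key = tuple(sorted(rec))                       # order-independent identity
        order = first_order.setdefault(key, list(rec))
        yield {name: rec[name] for name in order}
```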
1041
1042 remove-empty-columns
1043 Usage: mlr remove-empty-columns
1044 Omits fields which are empty on every input row. Non-streaming.
1045
1046 rename
1047 Usage: mlr rename [options] {old1,new1,old2,new2,...}
1048 Renames specified fields.
1049 Options:
1050 -r Treat old field names as regular expressions. "ab", "a.*b"
1051 will match any field name containing the substring "ab" or
1052 matching "a.*b", respectively; anchors of the form "^ab$",
1053 "^a.*b$" may be used. New field names may be plain strings,
1054 or may contain capture groups of the form "\1" through
1055 "\9". Wrapping the regex in double quotes is optional, but
1056 is required if you wish to follow it with 'i' to indicate
1057 case-insensitivity.
1058 -g Do global replacement within each field name rather than
1059 first-match replacement.
1060 Examples:
1061 mlr rename old_name,new_name
1062 mlr rename old_name_1,new_name_1,old_name_2,new_name_2
1063 mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date"
1064 mlr rename -r '"Date_[0-9]+",Date' Same
1065 mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
1066 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
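The -r and -g behavior amounts to a regex substitution on each field name. The Python sketch below handles a single old/new pair; the function name is hypothetical, Python's re module stands in for Miller's regex library, and collisions between renamed fields are not handled.

```python
import re

def rename_regex(record, pattern, replacement,
                 ignore_case=False, global_replace=False):
    """Substitute within each field name; values are left untouched."""
    flags = re.IGNORECASE if ignore_case else 0      # analogous to the "..."i suffix
    count = 0 if global_replace else 1               # count=0 means replace all (-g)
    return {
        re.sub(pattern, replacement, name, count=count, flags=flags): value
        for name, value in record.items()
    }
```

Only the matched portion of a name is replaced, which is why the capture-group example above needs the trailing ".*" to consume the rest of the name.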
1067
1068 reorder
1069 Usage: mlr reorder [options]
1070 -f {a,b,c} Field names to reorder.
1071 -e Put specified field names at record end: default is to put
1072 them at record start.
1073 Examples:
1074 mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
1075 mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
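A Python sketch that reproduces the two examples above. The function name reorder is hypothetical and records are modeled as insertion-ordered dicts.

```python
def reorder(record, names, at_end=False):
    """Move the named fields to the record start, or to the end with at_end."""
    picked = {name: record[name] for name in names if name in record}
    rest = {key: value for key, value in record.items() if key not in picked}
    return {**rest, **picked} if at_end else {**picked, **rest}
```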
1076
1077 repeat
1078 Usage: mlr repeat [options]
1079 Copies input records to output records multiple times.
1080 Options must be exactly one of the following:
1081 -n {repeat count} Repeat each input record this many times.
1082 -f {field name} Same, but take the repeat count from the specified
1083 field name of each input record.
1084 Example:
1085 echo x=0 | mlr repeat -n 4 then put '$x=urand()'
1086 produces:
1087 x=0.488189
1088 x=0.484973
1089 x=0.704983
1090 x=0.147311
1091 Example:
1092 echo a=1,b=2,c=3 | mlr repeat -f b
1093 produces:
1094 a=1,b=2,c=3
1095 a=1,b=2,c=3
1096 Example:
1097 echo a=1,b=2,c=3 | mlr repeat -f c
1098 produces:
1099 a=1,b=2,c=3
1100 a=1,b=2,c=3
1101 a=1,b=2,c=3
1102
1103 reshape
1104 Usage: mlr reshape [options]
1105 Wide-to-long options:
1106 -i {input field names} -o {key-field name,value-field name}
1107 -r {input field regexes} -o {key-field name,value-field name}
1108 These pivot/reshape the input data such that the input fields are removed
1109 and separate records are emitted for each key/value pair.
1110 Note: this works with tail -f and produces output records for each input
1111 record seen.
1112 Long-to-wide options:
1113 -s {key-field name,value-field name}
1114 These pivot/reshape the input data to undo the wide-to-long operation.
1115 Note: this does not work with tail -f; it produces output records only after
1116 all input records have been read.
1117
1118 Examples:
1119
1120 Input file "wide.txt":
1121 time X Y
1122 2009-01-01 0.65473572 2.4520609
1123 2009-01-02 -0.89248112 0.2154713
1124 2009-01-03 0.98012375 1.3179287
1125
1126 mlr --pprint reshape -i X,Y -o item,value wide.txt
1127 time item value
1128 2009-01-01 X 0.65473572
1129 2009-01-01 Y 2.4520609
1130 2009-01-02 X -0.89248112
1131 2009-01-02 Y 0.2154713
1132 2009-01-03 X 0.98012375
1133 2009-01-03 Y 1.3179287
1134
1135 mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1136 time item value
1137 2009-01-01 X 0.65473572
1138 2009-01-01 Y 2.4520609
1139 2009-01-02 X -0.89248112
1140 2009-01-02 Y 0.2154713
1141 2009-01-03 X 0.98012375
1142 2009-01-03 Y 1.3179287
1143
1144 Input file "long.txt":
1145 time item value
1146 2009-01-01 X 0.65473572
1147 2009-01-01 Y 2.4520609
1148 2009-01-02 X -0.89248112
1149 2009-01-02 Y 0.2154713
1150 2009-01-03 X 0.98012375
1151 2009-01-03 Y 1.3179287
1152
1153 mlr --pprint reshape -s item,value long.txt
1154 time X Y
1155 2009-01-01 0.65473572 2.4520609
1156 2009-01-02 -0.89248112 0.2154713
1157 2009-01-03 0.98012375 1.3179287
1158 See also mlr nest.
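The two directions can be sketched in Python as follows. The function names are hypothetical, records are dicts, and the long-to-wide sketch assumes every record carries both the key field and the value field.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Streaming: each input record yields one output record per input field."""
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in input_fields}
        for name in rec:
            if name in input_fields:
                yield {**rest, key_name: name, value_name: rec[name]}

def long_to_wide(records, key_name, value_name):
    """Non-streaming: rows sharing the same other fields are merged into one."""
    wide = {}
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in (key_name, value_name)}
        row = wide.setdefault(tuple(rest.items()), dict(rest))
        row[rec[key_name]] = rec[value_name]
    return list(wide.values())
```

The long-to-wide direction cannot emit a row until it knows no further key/value pair for that row will arrive, which is why it only produces output at end of input.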
1159
1160 sample
1161 Usage: mlr sample [options]
1162 Reservoir sampling (subsampling without replacement), optionally by category.
1163 -k {count} Required: number of records to output, total, or by group if using -g.
1164 -g {a,b,c} Optional: group-by-field names for samples.
1165 See also mlr bootstrap and mlr shuffle.
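Reservoir sampling keeps a fixed-size sample while reading the input once. The Python sketch below shows the classic form of the algorithm (often called Algorithm R) for the ungrouped case; whether Miller uses exactly this variant is not stated here, and the function name and rng parameter are assumptions.

```python
import random

def reservoir_sample(records, k, rng=None):
    """Return up to k records sampled uniformly without replacement."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, rec in enumerate(records):
        if n < k:
            reservoir.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, n)          # inclusive on both ends
            if j < k:
                reservoir[j] = rec         # replace with probability k/(n+1)
    return reservoir
```

With -g, one such reservoir would be kept per distinct group-by value.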
1166
1167 sec2gmt
1168 Usage: mlr sec2gmt [options] {comma-separated list of field names}
1169 Replaces a numeric field representing seconds since the epoch with the
1170 corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1171 more than a keystroke-saver for the sec2gmt function:
1172 mlr sec2gmt time1,time2
1173 is the same as
1174 mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1175 Options:
1176 -1 through -9: format the seconds using 1..9 decimal places, respectively.
1177
1178 sec2gmtdate
1179 Usage: mlr sec2gmtdate {comma-separated list of field names}
1180 Replaces a numeric field representing seconds since the epoch with the
1181 corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1182 This is nothing more than a keystroke-saver for the sec2gmtdate function:
1183 mlr sec2gmtdate time1,time2
1184 is the same as
1185 mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
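For orientation, the conversions performed by sec2gmt and sec2gmtdate correspond roughly to the following Python sketch. The function names mirror the Miller functions but the code is only an approximation: it truncates fractional seconds and does not cover the -1 through -9 options.

```python
from datetime import datetime, timezone

def _epoch_seconds(value):
    """Return whole seconds for numeric input, else None."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None

def sec2gmt(value):
    """Seconds since the epoch to an ISO-8601 GMT timestamp."""
    seconds = _epoch_seconds(value)
    if seconds is None:
        return value                       # non-numbers are left as-is
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")

def sec2gmtdate(value):
    """Seconds since the epoch to a GMT year-month-day string."""
    seconds = _epoch_seconds(value)
    if seconds is None:
        return value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")
```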
1186
1187 seqgen
1188 Usage: mlr seqgen [options]
1189 Produces a sequence of counters. Discards the input record stream. Produces
1190 output as specified by the following options:
1191 -f {name} Field name for counters; default "i".
1192 --start {number} Inclusive start value; default "1".
1193 --stop {number} Inclusive stop value; default "100".
1194 --step {number} Step value; default "1".
1195 Start, stop, and/or step may be floating-point. Output is integer if start,
1196 stop, and step are all integers. Step may be negative. It may not be zero
1197 unless start == stop.
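The start/stop/step rules above can be sketched as a Python generator. The function name and dict output are illustrative; emitting exactly one record when step is zero and start equals stop is an assumption of this sketch.

```python
def seqgen(start=1, stop=100, step=1, field="i"):
    """Yield records counting from start to stop inclusive."""
    if step == 0:
        if start != stop:
            raise ValueError("step may be zero only when start == stop")
        yield {field: start}               # assumption: a single record
        return
    value = start
    while (value <= stop) if step > 0 else (value >= stop):
        yield {field: value}
        value += step
```

Values stay integers as long as start, stop, and step are all integers, matching the note above; any floating-point argument makes the later values floats.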
1198
1199 shuffle
1200 Usage: mlr shuffle {no options}
1201 Outputs records randomly permuted. No output records are produced until
1202 all input records are read.
1203 See also mlr bootstrap and mlr sample.
1204
1205 skip-trivial-records
1206 Usage: mlr skip-trivial-records [options]
1207 Passes through all records except:
1208 * those with zero fields;
1209 * those for which all fields have empty value.
1210
1211 sort
1212 Usage: mlr sort {flags}
1213 Flags:
1214 -f {comma-separated field names} Lexical ascending
1215 -n {comma-separated field names} Numerical ascending; nulls sort last
1216 -nf {comma-separated field names} Same as -n
1217 -r {comma-separated field names} Lexical descending
1218 -nr {comma-separated field names} Numerical descending; nulls sort first
1219 Sorts records primarily by the first specified field, secondarily by the second
1220 field, and so on. (Any records not having all specified sort keys will appear
1221 at the end of the output, in the order they were encountered, regardless of the
1222 specified sort order.) The sort is stable: records that compare equal will sort
1223 in the order they were encountered in the input record stream.
1224
1225 Example:
1226 mlr sort -f a,b -nr x,y,z
1227 which is the same as:
1228 mlr sort -f a -f b -nr x -nr y -nr z
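A multi-key stable sort of this kind can be built from successive single-key stable sorts, applied from the least significant key to the most significant. The Python sketch below illustrates that idea; the function name and the (field, flag) key representation are assumptions, and non-numeric values under numeric flags are treated as nulls.

```python
def sort_records(records, keys):
    """keys: list of (field, flag) with flag in "f", "r", "n", "nf", "nr"."""
    records = list(records)
    names = [name for name, _ in keys]
    complete = [r for r in records if all(n in r for n in names)]
    incomplete = [r for r in records if not all(n in r for n in names)]

    def numeric_key(name):
        def key(rec):
            try:
                return (0, float(rec[name]))
            except ValueError:
                return (1, 0.0)            # nulls: last ascending, first descending
        return key

    # Python's sort is stable, so sorting by the last key first gives the
    # first key the highest priority.
    for name, flag in reversed(keys):
        descending = flag in ("r", "nr")
        if flag in ("n", "nf", "nr"):
            complete.sort(key=numeric_key(name), reverse=descending)
        else:
            complete.sort(key=lambda rec, n=name: rec[n], reverse=descending)

    # Records lacking any sort key go last, in encounter order.
    return complete + incomplete
```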
1229
1230 sort-within-records
1231 Usage: mlr sort-within-records [no options]
1232 Outputs records sorted lexically ascending by keys.
1233
1234 stats1
1235 Usage: mlr stats1 [options]
1236 Computes univariate statistics for one or more given fields, accumulated across
1237 the input record stream.
1238 Options:
1239 -a {sum,count,...} Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1240 one or more of:
1241 count Count instances of fields
1242 mode Find most-frequently-occurring values for fields; first-found wins tie
1243 antimode Find least-frequently-occurring values for fields; first-found wins tie
1244 sum Compute sums of specified fields
1245 mean Compute averages (sample means) of specified fields
1246 stddev Compute sample standard deviation of specified fields
1247 var Compute sample variance of specified fields
1248 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
1249 skewness Compute sample skewness of specified fields
1250 kurtosis Compute sample kurtosis of specified fields
1251 min Compute minimum values of specified fields
1252 max Compute maximum values of specified fields
1253 -f {a,b,c} Value-field names on which to compute statistics
1254 --fr {regex} Regex for value-field names on which to compute statistics
1255 (compute statistics on values in all field names matching regex)
1256 --fx {regex} Inverted regex for value-field names on which to compute statistics
1257 (compute statistics on values in all field names not matching regex)
1258 -g {d,e,f} Optional group-by-field names
1259 --gr {regex} Regex for optional group-by-field names
1260 (group by values in field names matching regex)
1261 --gx {regex} Inverted regex for optional group-by-field names
1262 (group by values in field names not matching regex)
1263 --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1264 -i Use interpolated percentiles, like R's type=7; default like type=1.
1265 Not meaningful for string-valued fields.
1266 -s Print iterative stats. Useful in tail -f contexts (in which
1267 case please avoid pprint-format output since end of input
1268 stream will never be seen).
1269 -F Computes integerable things (e.g. count) in floating point.
1270 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1271 Example: mlr stats1 -a count,mode -f size
1272 Example: mlr stats1 -a count,mode -f size -g shape
1273 Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1274 This computes count and mode statistics on all field names beginning
1275 with a through h, grouped by all field names starting with k.
1276 Notes:
1277 * p50 and median are synonymous.
1278 * min and max output the same results as p0 and p100, respectively, but use
1279 less memory.
1280 * String-valued data make sense unless arithmetic on them is required,
1281 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1282 numbers are less than strings.
1283 * count and mode allow text input; the rest require numeric input.
1284 In particular, 1 and 1.0 are distinct text for count and mode.
1285 * When there are mode ties, the first-encountered datum wins.
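Grouped accumulation for a few of the simpler accumulators (count, sum, mean, min, max) can be sketched in Python as below. The function name is hypothetical, only one value field is handled, and records lacking the value field or a group-by field are skipped, which is an assumption of the sketch.

```python
def stats1(records, value_field, group_by=()):
    """Accumulate count/sum/min/max per group, then emit one record per group."""
    acc = {}
    for rec in records:
        if value_field not in rec or any(g not in rec for g in group_by):
            continue
        key = tuple(rec[g] for g in group_by)
        x = float(rec[value_field])
        s = acc.setdefault(key, {"count": 0, "sum": 0.0, "min": x, "max": x})
        s["count"] += 1
        s["sum"] += x
        s["min"] = min(s["min"], x)
        s["max"] = max(s["max"], x)

    for key, s in acc.items():
        out = dict(zip(group_by, key))
        out[value_field + "_count"] = s["count"]
        out[value_field + "_sum"] = s["sum"]
        out[value_field + "_mean"] = s["sum"] / s["count"]
        out[value_field + "_min"] = s["min"]
        out[value_field + "_max"] = s["max"]
        yield out
```

These accumulators need only constant state per group, in contrast to percentiles, which is consistent with the note that min and max use less memory than p0 and p100.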
1286
1287 stats2
1288 Usage: mlr stats2 [options]
1289 Computes bivariate statistics for one or more given field-name pairs,
1290 accumulated across the input record stream.
1291 -a {linreg-ols,corr,...} Names of accumulators: one or more of:
1292 linreg-pca Linear regression using principal component analysis
1293 linreg-ols Linear regression using ordinary least squares
1294 r2 Quality metric for linreg-ols (linreg-pca emits its own)
1295 logireg Logistic regression
1296 corr Sample correlation
1297 cov Sample covariance
1298 covx Sample-covariance matrix
1299 -f {a,b,c,d} Value-field name-pairs on which to compute statistics.
1300 There must be an even number of names.
1301 -g {e,f,g} Optional group-by-field names.
1302 -v Print additional output for linreg-pca.
1303 -s Print iterative stats. Useful in tail -f contexts (in which
1304 case please avoid pprint-format output since end of input
1305 stream will never be seen).
1306 --fit Rather than printing regression parameters, applies them to
1307 the input data to compute new fit fields. All input records are
1308 held in memory until end of input stream. Has effect only for
1309 linreg-ols, linreg-pca, and logireg.
1310 Only one of -s or --fit may be used.
1311 Example: mlr stats2 -a linreg-pca -f x,y
1312 Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1313 Example: mlr stats2 -a corr -f x,y
1314
1315 step
1316 Usage: mlr step [options]
1317 Computes values dependent on the previous record, optionally grouped
1318 by category.
1319
1320 Options:
1321 -a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
1322 delta Compute differences in field(s) between successive records
1323 shift Include value(s) in field(s) from previous record, if any
1324 from-first Compute differences in field(s) from first record
1325 ratio Compute ratios in field(s) between successive records
1326 rsum Compute running sums of field(s) between successive records
1327 counter Count instances of field(s) between successive records
1328 ewma Exponentially weighted moving average over successive records
1329 -f {a,b,c} Value-field names on which to compute statistics
1330 -g {d,e,f} Optional group-by-field names
1331 -F Computes integerable things (e.g. counter) in floating point.
1332 -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1333 smoothing), just under 1 is light smoothing, just over 0 is
1334 heavy smoothing. Multiple weights may be specified, e.g.
1335 "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1336 is "-d 0.5".
1337 -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1338 the -d values. If supplied, the number of -o values must be the same
1339 as the number of -d values.
1340
1341 Examples:
1342 mlr step -a rsum -f request_size
1343 mlr step -a delta -f request_size -g hostname
1344 mlr step -a ewma -d 0.1,0.9 -f x,y
1345 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1346 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1347
1348 Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#step or
1349 https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1350 for more information on EWMA.
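As an illustration of two of the steppers, the Python sketch below computes a running sum and an EWMA per group. The function name is hypothetical; the recurrence used is the standard one, new = alpha * x + (1 - alpha) * previous, seeded with the first sample, and the output field suffixes follow the -d strings as described above.

```python
def step_rsum_ewma(records, field, alphas=("0.5",), group_by=()):
    """Add <field>_rsum and <field>_ewma_<alpha> to each record."""
    state = {}
    for rec in records:
        rec = dict(rec)
        if field in rec and all(g in rec for g in group_by):
            key = tuple(rec[g] for g in group_by)
            x = float(rec[field])
            st = state.setdefault(key, {"rsum": 0.0, "ewma": {}})

            st["rsum"] += x
            rec[field + "_rsum"] = st["rsum"]

            for a in alphas:
                alpha = float(a)
                previous = st["ewma"].get(a, x)   # first sample seeds the average
                st["ewma"][a] = alpha * x + (1 - alpha) * previous
                rec[field + "_ewma_" + a] = st["ewma"][a]
        yield rec
```

With alpha equal to 1 the previous average is ignored entirely (no smoothing), and as alpha approaches 0 each new sample moves the average less (heavy smoothing).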
1351
1352 tac
1353 Usage: mlr tac
1354 Prints records in reverse order from the order in which they were encountered.
1355
1356 tail
1357 Usage: mlr tail [options]
1358 -n {count} Tail count to print; default 10
1359 -g {a,b,c} Optional group-by-field names for tail counts
1360 Passes through the last n records, optionally by category.
1361
1362 tee
1363 Usage: mlr tee [options] {filename}
1364 Passes through input records (like mlr cat) but also writes to specified output
1365 file, using output-format flags from the command line (e.g. --ocsv). See also
1366 the "tee" keyword within mlr put, which allows data-dependent filenames.
1367 Options:
1368 -a: append to existing file, if any, rather than overwriting.
1369 --no-fflush: don't call fflush() after every record.
1370 Any of the output-format command-line flags (see mlr -h). Example: using
1371 mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1372 the input is CSV, the output is pretty-print tabular, but the tee-file output
1373 is written in JSON format.
1374
1375 top
1376 Usage: mlr top [options]
1377 -f {a,b,c} Value-field names for top counts.
1378 -g {d,e,f} Optional group-by-field names for top counts.
1379 -n {count} How many records to print per category; default 1.
1380 -a Print all fields for top-value records; default is
1381 to print only value and group-by fields. Requires a single
1382 value-field name only.
1383 --min Print top smallest values; default is top largest values.
1384 -F Keep top values as floats even if they look like integers.
1385 -o {name} Field name for output indices. Default "top_idx".
1386 Prints the n records with smallest/largest values at specified fields,
1387 optionally by category.
1388
1389 uniq
1390 Usage: mlr uniq [options]
1391 Prints distinct values for specified field names. With -c, same as
1392 count-distinct. For uniq, -f is a synonym for -g.
1393
1394 Options:
1395 -g {d,e,f} Group-by-field names for uniq counts.
1396 -c Show repeat counts in addition to unique values.
1397 -n Show only the number of distinct values.
1398 -o {name} Field name for output count. Default "count".
1399 -a Output each unique record only once. Incompatible with -g.
1400 With -c, produces unique records, with repeat counts for each.
1401 With -n, produces only one record which is the unique-record count.
1402 With neither -c nor -n, produces unique records.
1403
1404 unsparsify
1405 Usage: mlr unsparsify [options]
1406 Prints records with the union of field names over all input records.
1407 For field names absent in a given record but present in others, fills in a
1408 value. Without -f, this verb retains all input before producing any output.
1409
1410 Options:
1411 --fill-with {filler string} What to fill absent fields with. Defaults to
1412 the empty string.
1413 -f {a,b,c} Specify field names to be operated on. Any other fields won't be
1414 modified, and operation will be streaming.
1415
1416 Example: if the input is two records, one being 'a=1,b=2' and the other
1417 being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1418 'a=,b=3,c=4'.
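
The non-streaming case (no -f) can be modeled in Python as below. This is a sketch of the described behavior only; the function name unsparsify and the dict-based records are illustrative assumptions.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of field names, filling absent fields."""
    # All input is retained first: the union of keys is known only at end of stream.
    records = list(records)
    all_keys = []
    for rec in records:
        for key in rec:
            if key not in all_keys:
                all_keys.append(key)
    return [{key: rec.get(key, fill_with) for key in all_keys} for rec in records]


print(unsparsify([{"a": "1", "b": "2"}, {"b": "3", "c": "4"}]))
```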
1419
1420 FUNCTIONS FOR FILTER/PUT
1421 +
1422 (class=arithmetic #args=2): Addition.
1423
1424 + (class=arithmetic #args=1): Unary plus.
1425
1426 -
1427 (class=arithmetic #args=2): Subtraction.
1428
1429 - (class=arithmetic #args=1): Unary minus.
1430
1431 *
1432 (class=arithmetic #args=2): Multiplication.
1433
1434 /
1435 (class=arithmetic #args=2): Division.
1436
1437 //
1438 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
1439
1440 .+
1441 (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1442
1443 .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1444
1445 .-
1446 (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1447
1448 .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1449
1450 .*
1451 (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1452
1453 ./
1454 (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1455
1456 .//
1457 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
1458
1459 %
1460 (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
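
Since // and % are described as pythonic, Python itself shows the rounding convention they refer to. The snippet covers positive divisors only and says nothing about how Miller handles other cases.

```python
# Floor division rounds toward negative infinity rather than toward zero.
quotients = [7 // 2, -7 // 2]

# With a positive modulus the remainder is never negative.
remainders = [7 % 5, -7 % 5]

print(quotients, remainders)
```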
1461
1462 **
1463 (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1464 operator.
1465
1466 |
1467 (class=arithmetic #args=2): Bitwise OR.
1468
1469 ^
1470 (class=arithmetic #args=2): Bitwise XOR.
1471
1472 &
1473 (class=arithmetic #args=2): Bitwise AND.
1474
1475 ~
1476 (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1477 regex-match operator: try '$y = ~$x'.
1478
1479 <<
1480 (class=arithmetic #args=2): Bitwise left-shift.
1481
1482 >>
1483 (class=arithmetic #args=2): Bitwise right-shift.
1484
1485 bitcount
1486 (class=arithmetic #args=1): Count of 1-bits
1487
1488 ==
1489 (class=boolean #args=2): String/numeric equality. Mixing number and string
1490 results in string compare.
1491
1492 !=
1493 (class=boolean #args=2): String/numeric inequality. Mixing number and string
1494 results in string compare.
1495
1496 =~
1497 (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1498 side), e.g. '$name =~ "^a.*b$"'.
1499
1500 !=~
1501 (class=boolean #args=2): String (left-hand side) does not match regex
1502 (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1503
1504 >
1505 (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1506 results in string compare.
1507
1508 >=
1509 (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1510 and string results in string compare.
1511
1512 <
1513 (class=boolean #args=2): String/numeric less-than. Mixing number and string
1514 results in string compare.
1515
1516 <=
1517 (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1518 and string results in string compare.
1519
1520 &&
1521 (class=boolean #args=2): Logical AND.
1522
1523 ||
1524 (class=boolean #args=2): Logical OR.
1525
1526 ^^
1527 (class=boolean #args=2): Logical XOR.
1528
1529 !
1530 (class=boolean #args=1): Logical negation.
1531
1532 ? :
1533 (class=boolean #args=3): Ternary operator.
1534
1535 .
1536 (class=string #args=2): String concatenation.
1537
1538 gsub
1539 (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1540 (replace all).
1541
1542 regextract
1543 (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1545
1546 regextract_or_else
1547 (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1549
1550 strlen
1551 (class=string #args=1): String length.
1552
1553 sub
1554 (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1555 (replace once).
1556
1557 ssub
1558 (class=string #args=3): Like sub but does no regexing. No characters are special.
1559
1560 substr
1561 (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1562 inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
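
The indexing rule for substr (0-up, inclusive at both ends, negative indices aliased from the end) can be sketched in Python. The sketch does not model out-of-range indices, and the function name simply mirrors the Miller one.

```python
def substr(s, m, n):
    """Substring from 0-up position m through n inclusive."""
    length = len(s)
    # Negative indices -len .. -1 alias to 0 .. len-1.
    if m < 0:
        m += length
    if n < 0:
        n += length
    return s[m:n + 1]


print(substr("hello", 1, 3), substr("hello", -4, -2))
```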
1563
1564 tolower
1565 (class=string #args=1): Convert string to lowercase.
1566
1567 toupper
1568 (class=string #args=1): Convert string to uppercase.
1569
1570 truncate
1571 (class=string #args=2): Truncates string first argument to max length of int second argument.
1572
1573 capitalize
1574 (class=string #args=1): Convert string's first character to uppercase.
1575
1576 lstrip
1577 (class=string #args=1): Strip leading whitespace from string.
1578
1579 rstrip
1580 (class=string #args=1): Strip trailing whitespace from string.
1581
1582 strip
1583 (class=string #args=1): Strip leading and trailing whitespace from string.
1584
1585 collapse_whitespace
1586 (class=string #args=1): Strip repeated whitespace from string.
1587
1588 clean_whitespace
1589 (class=string #args=1): Same as collapse_whitespace and strip.
1590
1591 system
1592 (class=string #args=1): Run command string, yielding its stdout minus the final newline.
1593
1594 abs
1595 (class=math #args=1): Absolute value.
1596
1597 acos
1598 (class=math #args=1): Inverse trigonometric cosine.
1599
1600 acosh
1601 (class=math #args=1): Inverse hyperbolic cosine.
1602
1603 asin
1604 (class=math #args=1): Inverse trigonometric sine.
1605
1606 asinh
1607 (class=math #args=1): Inverse hyperbolic sine.
1608
1609 atan
1610 (class=math #args=1): One-argument arctangent.
1611
1612 atan2
1613 (class=math #args=2): Two-argument arctangent.
1614
1615 atanh
1616 (class=math #args=1): Inverse hyperbolic tangent.
1617
1618 cbrt
1619 (class=math #args=1): Cube root.
1620
1621 ceil
1622 (class=math #args=1): Ceiling: nearest integer at or above.
1623
1624 cos
1625 (class=math #args=1): Trigonometric cosine.
1626
1627 cosh
1628 (class=math #args=1): Hyperbolic cosine.
1629
1630 erf
1631 (class=math #args=1): Error function.
1632
1633 erfc
1634 (class=math #args=1): Complementary error function.
1635
1636 exp
1637 (class=math #args=1): Exponential function e**x.
1638
1639 expm1
1640 (class=math #args=1): e**x - 1.
1641
1642 floor
1643 (class=math #args=1): Floor: nearest integer at or below.
1644
1645 invqnorm
1646 (class=math #args=1): Inverse of normal cumulative distribution
1647 function. Note that invqnorm(urand()) is normally distributed.
1648
1649 log
1650 (class=math #args=1): Natural (base-e) logarithm.
1651
1652 log10
1653 (class=math #args=1): Base-10 logarithm.
1654
1655 log1p
1656 (class=math #args=1): log(1+x).
1657
1658 logifit
1659 (class=math #args=3): Given m and b from logistic regression, compute
1660 fit: $yhat=logifit($x,$m,$b).
1661
1662 madd
1663 (class=math #args=3): a + b mod m (integers)
1664
1665 max
1666 (class=math variadic): Max of n numbers; null loses
1667
1668 mexp
1669 (class=math #args=3): a ** b mod m (integers)
1670
1671 min
1672 (class=math variadic): Min of n numbers; null loses
1673
1674 mmul
1675 (class=math #args=3): a * b mod m (integers)
1676
1677 msub
1678 (class=math #args=3): a - b mod m (integers)
1679
1680 pow
1681 (class=math #args=2): Exponentiation; same as **.
1682
1683 qnorm
1684 (class=math #args=1): Normal cumulative distribution function.
1685
1686 round
1687 (class=math #args=1): Round to nearest integer.
1688
1689 roundm
1690 (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1691 the same as round($x/$m)*$m
1692
1693 sgn
1694 (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1695 negative input.
1696
1697 sin
1698 (class=math #args=1): Trigonometric sine.
1699
1700 sinh
1701 (class=math #args=1): Hyperbolic sine.
1702
1703 sqrt
1704 (class=math #args=1): Square root.
1705
1706 tan
1707 (class=math #args=1): Trigonometric tangent.
1708
1709 tanh
1710 (class=math #args=1): Hyperbolic tangent.
1711
1712 urand
1713 (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
1714 Int-valued example: '$n=floor(20+urand()*11)'.
1715
1716 urandrange
1717 (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
1718
1719 urand32
1720 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1721 inclusive.
1722
1723 urandint
1724 (class=math #args=2): Integer uniformly distributed between inclusive
1725 integer endpoints.
1726
1727 dhms2fsec
1728 (class=time #args=1): Recovers floating-point seconds as in
1729 dhms2fsec("5d18h53m20.250000s") = 500000.250000
1730
1731 dhms2sec
1732 (class=time #args=1): Recovers integer seconds as in
1733 dhms2sec("5d18h53m20s") = 500000
1734
1735 fsec2dhms
1736 (class=time #args=1): Formats floating-point seconds as in
1737 fsec2dhms(500000.25) = "5d18h53m20.250000s"
1738
1739 fsec2hms
1740 (class=time #args=1): Formats floating-point seconds as in
1741 fsec2hms(5000.25) = "01:23:20.250000"
1742
1743 gmt2sec
1744 (class=time #args=1): Parses GMT timestamp as integer seconds since
1745 the epoch.
1746
1747 localtime2sec
1748 (class=time #args=1): Parses local timestamp as integer seconds since
1749 the epoch. Consults $TZ environment variable.
1750
1751 hms2fsec
1752 (class=time #args=1): Recovers floating-point seconds as in
1753 hms2fsec("01:23:20.250000") = 5000.250000
1754
1755 hms2sec
1756 (class=time #args=1): Recovers integer seconds as in
1757 hms2sec("01:23:20") = 5000
1758
1759 sec2dhms
1760 (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1761 = "5d18h53m20s"
1762
1763 sec2gmt
1764 (class=time #args=1): Formats seconds since epoch (integer part)
1765 as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1766 Leaves non-numbers as-is.
1767
1768 sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1769 decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1770 Leaves non-numbers as-is.
1771
1772 sec2gmtdate
1773 (class=time #args=1): Formats seconds since epoch (integer part)
1774 as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1775 Leaves non-numbers as-is.
1776
1777 sec2localtime
1778 (class=time #args=1): Formats seconds since epoch (integer part)
1779 as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1780 Consults $TZ environment variable. Leaves non-numbers as-is.
1781
1782 sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1783 decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1784 Consults $TZ environment variable. Leaves non-numbers as-is.
1785
1786 sec2localdate
1787 (class=time #args=1): Formats seconds since epoch (integer part)
1788 as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1789 Consults $TZ environment variable. Leaves non-numbers as-is.
1790
1791 sec2hms
1792 (class=time #args=1): Formats integer seconds as in
1793 sec2hms(5000) = "01:23:20"
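
The arithmetic behind the integer-seconds conversions above can be checked with a short Python sketch. The functions reuse the Miller names for readability but are illustrative re-implementations; negative inputs and malformed strings are not modeled.

```python
import re


def dhms2sec(text):
    """Parse a string such as "5d18h53m20s" into integer seconds."""
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError("unrecognized d/h/m/s string: " + text)
    d, h, m, s = (int(group) if group else 0 for group in match.groups())
    return ((d * 24 + h) * 60 + m) * 60 + s


def hms2sec(text):
    """Parse "HH:MM:SS" into integer seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return (h * 60 + m) * 60 + s


def sec2hms(seconds):
    """Format non-negative integer seconds as "HH:MM:SS"."""
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


print(dhms2sec("5d18h53m20s"), hms2sec("01:23:20"), sec2hms(5000))
```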
1794
1795 strftime
1796 (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1797 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1798 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1799 Format strings are as in the C library (please see "man strftime" on your system),
1800 with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1801 with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1802 See also strftime_local.
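
For comparison, the two strftime examples can be reproduced with Python's datetime module. Python has no "%3S" directive, so the millisecond variant below is assembled by hand; both helper names are hypothetical.

```python
from datetime import datetime, timezone


def strftime_utc(seconds, fmt):
    """Format seconds since the epoch as a UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


def strftime_utc_millis(seconds):
    """Same, with three decimal places on the seconds (like "%3S")."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (dt.microsecond // 1000)


print(strftime_utc(1440768801.7, "%Y-%m-%dT%H:%M:%SZ"))
print(strftime_utc_millis(1440768801.7))
```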
1803
1804 strftime_local
1805 (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1806
1807 strptime
1808 (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1809 e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1810 and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1811 See also strptime_local.
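
The first strptime example can be cross-checked in Python. This sketch handles whole seconds only and does not model the fractional-seconds case; the helper name strptime_utc is hypothetical.

```python
import calendar
import time


def strptime_utc(text, fmt):
    """Parse a UTC timestamp into floating-point seconds since the epoch."""
    # calendar.timegm interprets the parsed fields as UTC, not local time.
    return float(calendar.timegm(time.strptime(text, fmt)))


print(strptime_utc("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ"))
```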
1812
1813 strptime_local
1814 (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1815
1816 systime
1817 (class=time #args=0): Floating-point seconds since the epoch,
1818 e.g. 1440768801.748936.
1819
1820 is_absent
1821 (class=typing #args=1): False if field is present in input, true otherwise
1822
1823 is_bool
1824 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1825
1826 is_boolean
1827 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1828
1829 is_empty
1830 (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1831
1832 is_empty_map
1833 (class=typing #args=1): True if argument is a map which is empty.
1834
1835 is_float
1836 (class=typing #args=1): True if field is present with value inferred to be float
1837
1838 is_int
1839 (class=typing #args=1): True if field is present with value inferred to be int
1840
1841 is_map
1842 (class=typing #args=1): True if argument is a map.
1843
1844 is_nonempty_map
1845 (class=typing #args=1): True if argument is a map which is non-empty.
1846
1847 is_not_empty
1848 (class=typing #args=1): False if field is present in input with empty value, true otherwise
1849
1850 is_not_map
1851 (class=typing #args=1): True if argument is not a map.
1852
1853 is_not_null
1854 (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1855
1856 is_null
1857 (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1858
1859 is_numeric
1860 (class=typing #args=1): True if field is present with value inferred to be int or float
1861
1862 is_present
1863 (class=typing #args=1): True if field is present in input, false otherwise.
1864
1865 is_string
1866 (class=typing #args=1): True if field is present with string (including empty-string) value
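
The distinction between absent, empty, and null can be modeled with a Python dict standing in for a record. This is only a sketch of the definitions above: absent means the key is missing, empty means the value is the empty string, and null means either. The two-argument signatures are an assumption of the sketch, not Miller's calling convention.

```python
def is_present(record, name):
    return name in record


def is_absent(record, name):
    return name not in record


def is_empty(record, name):
    return name in record and record[name] == ""


def is_null(record, name):
    # Null covers both cases: empty or absent.
    return is_absent(record, name) or is_empty(record, name)


def is_not_null(record, name):
    return not is_null(record, name)


record = {"x": "", "y": "3"}
for name in ("x", "y", "z"):
    print(name, is_present(record, name), is_empty(record, name), is_null(record, name))
```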
1867
1868 asserting_absent
1869 (class=typing #args=1): Returns argument if it is absent in the input data, else
1870 throws an error.
1871
1872 asserting_bool
1873 (class=typing #args=1): Returns argument if it is present with boolean value, else
1874 throws an error.
1875
1876 asserting_boolean
1877 (class=typing #args=1): Returns argument if it is present with boolean value, else
1878 throws an error.
1879
1880 asserting_empty
1881 (class=typing #args=1): Returns argument if it is present in input with empty value,
1882 else throws an error.
1883
1884 asserting_empty_map
1885 (class=typing #args=1): Returns argument if it is a map with empty value, else
1886 throws an error.
1887
1888 asserting_float
1889 (class=typing #args=1): Returns argument if it is present with float value, else
1890 throws an error.
1891
1892 asserting_int
1893 (class=typing #args=1): Returns argument if it is present with int value, else
1894 throws an error.
1895
1896 asserting_map
1897 (class=typing #args=1): Returns argument if it is a map, else throws an error.
1898
1899 asserting_nonempty_map
1900 (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1901 an error.
1902
1903 asserting_not_empty
1904 (class=typing #args=1): Returns argument if it is present in input with non-empty
1905 value, else throws an error.
1906
1907 asserting_not_map
1908 (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1909
1910 asserting_not_null
1911 (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1912 else throws an error.
1913
1914 asserting_null
1915 (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1916 an error.
1917
1918 asserting_numeric
1919 (class=typing #args=1): Returns argument if it is present with int or float value,
1920 else throws an error.
1921
1922 asserting_present
1923 (class=typing #args=1): Returns argument if it is present in input, else throws
1924 an error.
1925
1926 asserting_string
1927 (class=typing #args=1): Returns argument if it is present with string (including
1928 empty-string) value, else throws an error.
1929
1930 boolean
1931 (class=conversion #args=1): Convert int/float/bool/string to boolean.
1932
1933 float
1934 (class=conversion #args=1): Convert int/float/bool/string to float.
1935
1936 fmtnum
1937 (class=conversion #args=2): Convert int/float/bool to string using
1938 printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1939 are all long long or double. If you use formats like %d or %f, behavior is undefined.
1940
1941 hexfmt
1942 (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1943
1944 int
1945 (class=conversion #args=1): Convert int/float/bool/string to int.
1946
1947 string
1948 (class=conversion #args=1): Convert int/float/bool/string to string.
1949
1950 typeof
1951 (class=conversion #args=1): Convert argument to type of argument (e.g.
1952 MT_STRING). For debug.
1953
1954 depth
1955 (class=maps #args=1): Returns maximum depth of hashmap. Scalars have depth 0.
1956
1957 haskey
1958 (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1959 'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1960
1961 joink
1962 (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1963
1964 joinkv
1965 (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1966
1967 joinv
1968 (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
1969
1970 leafcount
1971 (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1972 same as length.
1973
1974 length
1975 (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
1976
1977 mapdiff
1978 (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1979 With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1980
1981 mapexcept
1982 (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1983 E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1984
1985 mapselect
1986 (class=maps variadic): Returns a map with only keys from remaining arguments set.
1987 E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1988
1989 mapsum
1990 (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1991 key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
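
The four map-combining functions above behave like simple dict operations. The Python sketch below mirrors the documented examples; it is an illustration rather than Miller's implementation and ignores any key-type conversions Miller may apply.

```python
def mapdiff(*maps):
    """Copy of the first map with keys of all remaining maps removed."""
    if not maps:
        return {}
    result = dict(maps[0])
    for other in maps[1:]:
        for key in other:
            result.pop(key, None)
    return result


def mapsum(*maps):
    """Union of all maps; on key collisions the rightmost value wins."""
    result = {}
    for m in maps:
        result.update(m)
    return result


def mapexcept(m, *keys):
    return {k: v for k, v in m.items() if k not in keys}


def mapselect(m, *keys):
    return {k: v for k, v in m.items() if k in keys}


print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))
print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))
print(mapsum({1: 2, 3: 4}, {1: 5}))
```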
1992
1993 splitkv
1994 (class=maps #args=3): Splits string by separators into map with type inference.
1995 E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1996
1997 splitkvx
1998 (class=maps #args=3): Splits string by separators into map without type inference (keys and
1999 values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
2000 '{"a" : "1", "b" : "2", "c" : "3"}'.
2001
2002 splitnv
2003 (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
2004 E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
2005
2006 splitnvx
2007 (class=maps #args=2): Splits string by separator into integer-indexed map without type
2008 inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
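
The difference between the inferring and non-inferring split functions can be sketched in Python. The infer helper is a deliberately simplified stand-in (int, then float, else string) and is not Miller's actual inference logic; empty or malformed input strings are not handled.

```python
def infer(text):
    """Simplified type inference: try int, then float, else keep the string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def splitkvx(s, pair_sep, field_sep):
    return dict(pair.split(pair_sep, 1) for pair in s.split(field_sep))


def splitkv(s, pair_sep, field_sep):
    return {k: infer(v) for k, v in splitkvx(s, pair_sep, field_sep).items()}


def splitnvx(s, sep):
    # Keys are integer positions counted from 1.
    return {i: part for i, part in enumerate(s.split(sep), start=1)}


def splitnv(s, sep):
    return {i: infer(part) for i, part in splitnvx(s, sep).items()}


print(splitkv("a=1,b=2,c=3", "=", ","))
print(splitkvx("a=1,b=2,c=3", "=", ","))
print(splitnv("a,b,c", ","))
print(splitnvx("4,5,6", ","))
```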
2009
2010 KEYWORDS FOR PUT AND FILTER
2011 all
2012 all: used in "emit", "emitp", and "unset" as a synonym for @*
2013
2014 begin
2015 begin: defines a block of statements to be executed before input records
2016 are ingested. The body statements must be wrapped in curly braces.
2017 Example: 'begin { @count = 0 }'
2018
2019 bool
2020 bool: declares a boolean local variable in the current curly-braced scope.
2021 Type-checking happens at assignment: 'bool b = 1' is an error.
2022
2023 break
2024 break: causes execution to continue after the body of the current
2025 for/while/do-while loop.
2026
2027 call
2028 call: used for invoking a user-defined subroutine.
2029 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2030
2031 continue
2032 continue: causes execution to skip the remaining statements in the body of
2033 the current for/while/do-while loop. For-loop increments are still applied.
2034
2035 do
2036 do: with "while", introduces a do-while loop. The body statements must be wrapped
2037 in curly braces.
2038
2039 dump
2040 dump: prints all currently defined out-of-stream variables immediately
2041 to stdout as JSON.
2042
2043 With >, >>, or |, the data do not become part of the output record stream but
2044 are instead redirected.
2045
2046 The > and >> are for write and append, as in the shell, but (as with awk) the
2047 file-overwrite for > is on first write, not per record. The | is for piping to
2048 a process which will process the data. There will be one open file for each
2049 distinct file name (for > and >>) or one subordinate process for each distinct
2050 value of the piped-to command (for |). Output-formatting flags are taken from
2051 the main command line.
2052
2053 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
2054 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
2055 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
2056 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
2057
2058 edump
2059 edump: prints all currently defined out-of-stream variables immediately
2060 to stderr as JSON.
2061
2062 Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
2063
2064 elif
2065 elif: the way Miller spells "else if". The body statements must be wrapped
2066 in curly braces.
2067
2068 else
2069 else: terminates an if/elif/else chain. The body statements must be wrapped
2070 in curly braces.
2071
2072 emit
2073 emit: inserts an out-of-stream variable into the output record stream. Hashmap
2074 indices present in the data but not slotted by emit arguments are not output.
2075
2076 With >, >>, or |, the data do not become part of the output record stream but
2077 are instead redirected.
2078
2079 The > and >> are for write and append, as in the shell, but (as with awk) the
2080 file-overwrite for > is on first write, not per record. The | is for piping to
2081 a process which will process the data. There will be one open file for each
2082 distinct file name (for > and >>) or one subordinate process for each distinct
2083 value of the piped-to command (for |). Output-formatting flags are taken from
2084 the main command line.
2085
2086 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2087 etc., to control the format of the output if the output is redirected. See also mlr -h.
2088
2089 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
2090 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
2091 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
2092 Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
2093 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
2094 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
2095 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
2096 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
2097 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2098 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
2099 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
2100
2101 Please see http://johnkerl.org/miller/doc for more information.
2102
2103 emitf
2104 emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
2105 output record stream.
2106
2107 With >, >>, or |, the data do not become part of the output record stream but
2108 are instead redirected.
2109
2110 The > and >> are for write and append, as in the shell, but (as with awk) the
2111 file-overwrite for > is on first write, not per record. The | is for piping to
2112 a process which will process the data. There will be one open file for each
2113 distinct file name (for > and >>) or one subordinate process for each distinct
2114 value of the piped-to command (for |). Output-formatting flags are taken from
2115 the main command line.
2116
2117 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2118 etc., to control the format of the output if the output is redirected. See also mlr -h.
2119
2120 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
2121 Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
2122 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
2123 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
2124 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
2125 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
2126 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
2127 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
2128
2129 Please see http://johnkerl.org/miller/doc for more information.
2130
2131 emitp
2132 emitp: inserts an out-of-stream variable into the output record stream.
2133 Hashmap indices present in the data but not slotted by emitp arguments are
2134 output concatenated with ":".
2135
2136 With >, >>, or |, the data do not become part of the output record stream but
2137 are instead redirected.
2138
2139 The > and >> are for write and append, as in the shell, but (as with awk) the
2140 file-overwrite for > is on first write, not per record. The | is for piping to
2141 a process which will process the data. There will be one open file for each
2142 distinct file name (for > and >>) or one subordinate process for each distinct
2143 value of the piped-to command (for |). Output-formatting flags are taken from
2144 the main command line.
2145
2146 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2147 etc., to control the format of the output if the output is redirected. See also mlr -h.
2148
2149 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2150 Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2151 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2152 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2153 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
2154 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2155 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2156 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2157 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2158
2159 Please see http://johnkerl.org/miller/doc for more information.
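
The ":"-joined naming that emitp uses for indices not slotted by its arguments can be illustrated in Python. The sketch models only the case with no index names given (as in 'emitp @sums'), where the whole nested map becomes one flat record; the helper name flatten and the sample data are hypothetical.

```python
def flatten(name, value, sep=":"):
    """Flatten a nested map into one record with prefixed, sep-joined keys."""
    if not isinstance(value, dict):
        return {name: value}
    flat = {}
    for key, sub in value.items():
        flat.update(flatten(f"{name}{sep}{key}", sub, sep))
    return flat


sums = {"pan": {"pan": 1, "xyz": 2}, "eks": {"pan": 3}}
print(flatten("sums", sums))
```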
2160
2161 end
2162 end: defines a block of statements to be executed after input records
2163 are ingested. The body statements must be wrapped in curly braces.
2164 Example: 'end { emit @count }'
2165 Example: 'end { eprint "Final count is " . @count }'
2166
2167 eprint
2168 eprint: prints expression immediately to stderr.
2169 Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2170 Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2171 Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2172
2173 eprintn
2174 eprintn: prints expression immediately to stderr, without trailing newline.
2175 Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2176
2177 false
2178 false: the boolean literal value.
2179
2180 filter
2181 filter: includes/excludes the record in the output record stream.
2182
2183 Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2184
2185 Instead of put with 'filter false' you can simply use put -q. The following
2186 uses the input record to accumulate data but only prints the running sum
2187 without printing the input record:
2188
2189 Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2190
2191 float
2192 float: declares a floating-point local variable in the current curly-braced scope.
2193 Type-checking happens at assignment: 'float x = 0' is an error.
2194
2195 for
2196 for: defines a for-loop using one of three styles. The body statements must
2197 be wrapped in curly braces.
2198 For-loop over stream record:
2199 Example: 'for (k, v in $*) { ... }'
2200 For-loop over out-of-stream variables:
2201 Example: 'for (k, v in @counts) { ... }'
2202 Example: 'for ((k1, k2), v in @counts) { ... }'
2203 Example: 'for ((k1, k2, k3), v in @*) { ... }'
2204 C-style for-loop:
2205 Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
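
The multi-key form 'for ((k1, k2), v in @counts)' walks a two-level map and binds both keys at once. A Python analogue, with the hypothetical helper two_level_items, looks like this:

```python
def two_level_items(nested):
    """Yield ((k1, k2), v) for every leaf of a two-level map."""
    for k1, inner in nested.items():
        for k2, v in inner.items():
            yield (k1, k2), v


counts = {"a": {"x": 1, "y": 2}, "b": {"x": 3}}
for (k1, k2), v in two_level_items(counts):
    print(k1, k2, v)
```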
2206
2207 func
2208 func: used for defining a user-defined function.
2209 Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2210
2211 if
2212 if: starts an if/elif/else chain. The body statements must be wrapped
2213 in curly braces.
2214
2215 in
2216 in: used in for-loops over stream records or out-of-stream variables.
2217
2218 int
2219 int: declares an integer local variable in the current curly-braced scope.
2220 Type-checking happens at assignment: 'int x = 0.0' is an error.
2221
2222 map
2223 map: declares a map-valued local variable in the current curly-braced scope.
2224 Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2225 always OK. map b = a is OK or not depending on whether a is a map.
2226
2227 num
2228 num: declares an int/float local variable in the current curly-braced scope.
2229 Type-checking happens at assignment: 'num b = true' is an error.
2230
2231 print
2232 print: prints expression immediately to stdout.
2233 Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2234 Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2235 Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2236
2237 printn
2238 printn: prints expression immediately to stdout, without trailing newline.
2239 Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2240
2241 return
2242 return: specifies the return value from a user-defined function.
2243 Omitted return statements (including via if-branches) result in an absent-null
2244 return value, which in turn results in a skipped assignment to an LHS.
2245
2246 stderr
2247 stderr: used with tee, emit, emitf, emitp, print, and dump in place of a
2248 filename to write to standard error.
2249
2250 stdout
2251 stdout: used with tee, emit, emitf, emitp, print, and dump in place of a
2252 filename to write to standard output.
2253
2254 str
2255 str: declares a string local variable in the current curly-braced scope.
2256 Type-checking happens at assignment: 'str x = 1' is an error.
2257
2258 subr
2259 subr: used for defining a subroutine.
2260 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2261
2262 tee
2263 tee: prints the current record to the specified file.
2264 This is an immediate print to the specified file (except for pprint format
2265 which of course waits until the end of the input stream to format all output).
2266
2267 The > and >> are for write and append, as in the shell, but (as with awk) the
2268 file-overwrite for > is on first write, not per record. The | is for piping to
2269 a process which will process the data. There will be one open file for each
2270 distinct file name (for > and >>) or one subordinate process for each distinct
2271 value of the piped-to command (for |). Output-formatting flags are taken from
2272 the main command line.
2273
2274 You can also pass any of the output-format command-line flags, e.g. --ocsv, --ofs,
2275 etc., after the verb name to control the format of the redirected output (as in
2275 the --ojson example below). See also mlr -h.
2276
2277 emit with redirect and tee with redirect are identical, except tee can only
2278 output $*.
2279
2280 Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
2281 Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2282 Example: mlr --from f.dat put 'tee > stderr, $*'
2283 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
2284 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
2285 Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2286 Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2287
2288 true
2289 true: the boolean literal value.
2290
2291 unset
2292 unset: clears field(s) from the current record, or an out-of-stream or local variable.
2293
2294 Example: mlr --from f.dat put 'unset $x'
2295 Example: mlr --from f.dat put 'unset $*'
2296 Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2297 Example: mlr --from f.dat put '...; unset @sums'
2298 Example: mlr --from f.dat put '...; unset @sums["green"]'
2299 Example: mlr --from f.dat put '...; unset @*'
2300
2301 var
2302 var: declares an untyped local variable in the current curly-braced scope.
2303 Examples: 'var a=1', 'var xyz=""'
2304
2305 while
2306 while: introduces a while loop, or with "do", introduces a do-while loop.
2307 The body statements must be wrapped in curly braces.
2308
2309 ENV
2310 ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2311
2312 FILENAME
2313 FILENAME: evaluates to the name of the current file being processed.
2314
2315 FILENUM
2316 FILENUM: evaluates to the number of the current file being processed,
2317 starting with 1.
2318
2319 FNR
2320 FNR: evaluates to the number of the current record within the current file
2321 being processed, starting with 1. Resets at the start of each file.
2322
2323 IFS
2324 IFS: evaluates to the input field separator from the command line.
2325
2326 IPS
2327 IPS: evaluates to the input pair separator from the command line.
2328
2329 IRS
2330 IRS: evaluates to the input record separator from the command line,
2331 or to LF or CRLF from the input data if in autodetect mode (which is
2332 the default).
2333
2334 M_E
2335 M_E: the mathematical constant e.
2336
2337 M_PI
2338 M_PI: the mathematical constant pi.
2339
2340 NF
2341 NF: evaluates to the number of fields in the current record.
2342
2343 NR
2344 NR: evaluates to the number of the current record over all files
2345 being processed, starting with 1. Does not reset at the start of each file.
2346
2347 OFS
2348 OFS: evaluates to the output field separator from the command line.
2349
2350 OPS
2351 OPS: evaluates to the output pair separator from the command line.
2352
2353 ORS
2354 ORS: evaluates to the output record separator from the command line,
2355 or to LF or CRLF from the input data if in autodetect mode (which is
2356 the default).
2357
2359 Miller is written by John Kerl <kerl.john.r@gmail.com>.
2360
2361 This manual page has been composed from Miller's help output by Eric
2362 MSP Veith <eveith@veith-m.de>.
2363
2365 awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2366 MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2367 http://johnkerl.org/miller/doc
2368
2369
2370
2371 2021-03-23 MILLER(1)