MILLER(1) MILLER(1)

6 miller - like awk, sed, cut, join, and sort for name-indexed data such
7 as CSV and tabular JSON.
8
10 Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
11 more file names}
12
13
15 Miller operates on key-value-pair data while the familiar Unix tools
16 operate on integer-indexed fields: if the natural data structure for
17 the latter is the array, then Miller's natural data structure is the
18 insertion-ordered hash map. This encompasses a variety of data
19 formats, including but not limited to the familiar CSV, TSV, and JSON.
20 (Miller can handle positionally-indexed data as a special case.) This
21 manpage documents Miller v5.9.1.
22
COMMAND-LINE SYNTAX
mlr --csv cut -f hostname,uptime mydata.csv
mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
mlr stats2 -a linreg-pca -f u,v -g shape data/*
mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
mlr --from estimates.tbl put '
  for (k,v in $*) {
    if (is_numeric(v) && k =~ "^[t-z].*$") {
      $sum += v; $count += 1
    }
  }
  $mean = $sum / $count # no assignment if count unset'
mlr --from infile.dat put -f analyze.mlr
mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
46
47 DATA FORMATS
48 DKVP: delimited key-value pairs (Miller default format)
49 +---------------------+
50 | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
51 | dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
52 +---------------------+
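
The DKVP mapping shown above can be modeled in a few lines. This is a minimal Python sketch for illustration only, not Miller's implementation; the function name parse_dkvp_line and its parameters are hypothetical. It shows how a field lacking the pair separator is keyed by its 1-based position, as in Record 2.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator found: key the value by its 1-based position.
            record[str(position)] = field
    return record


print(parse_dkvp_line("apple=1,bat=2,cog=3"))
print(parse_dkvp_line("dish=7,egg=8,flint"))
```

Python dicts preserve insertion order, which matches the insertion-ordered hash map described at the top of this page.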
53
54 NIDX: implicitly numerically indexed (Unix-toolkit style)
55 +---------------------+
56 | the quick brown | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
57 | fox jumped | Record 2: "1" => "fox", "2" => "jumped"
58 +---------------------+
59
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog       |
| 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
66
67 Tabular JSON: nested objects are supported, although arrays within them are not:
68 +---------------------+
69 | { |
70 | "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
71 | "bat": 2, |
72 | "cog": 3 |
73 | } |
74 | { |
75 | "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
76 | "egg": 7, |
77 | "flint": 8 |
78 | }, |
79 | "garlic": "" |
80 | } |
81 +---------------------+
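
The key flattening shown for Record 2 can be sketched as follows. This is an illustrative Python model, not Miller's code; the function name flatten is hypothetical, and the ":" separator is the default flatten separator described under --jflatsep. Unlike the table above, this sketch leaves JSON numbers as numbers rather than rendering them as strings.

```python
import json


def flatten(obj, sep=":", prefix=""):
    """Flatten nested maps into a single-level map with joined keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


record = json.loads('{"dish": {"egg": 7, "flint": 8}, "garlic": ""}')
print(flatten(record))
```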
82
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog       |
| 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
89
90 XTAB: pretty-printed transposed tabular
91 +---------------------+
92 | apple 1 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
93 | bat 2 |
94 | cog 3 |
95 | |
96 | dish 7 | Record 2: "dish" => "7", "egg" => "8"
97 | egg 8 |
98 +---------------------+
99
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | ---   | --- | --- | |
| | 1     | 2   | 3   | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+-----------------------+
107
109 In the following option flags, the version with "i" designates the
110 input stream, "o" the output stream, and the version without prefix
111 sets the option for both input and output stream. For example: --irs
112 sets the input record separator, --ors the output record separator, and
113 --rs sets both the input and output separator to the given value.
114
115 HELP OPTIONS
116 -h or --help Show this message.
117 --version Show the software version.
118 {verb name} --help Show verb-specific help.
119 --help-all-verbs Show help on all verbs.
120 -l or --list-all-verbs List only verb names.
121 -L List only verb names, one per line.
122 -f or --help-all-functions Show help on all built-in functions.
123 -F Show a bare listing of built-in functions by name.
124 -k or --help-all-keywords Show help on all keywords.
125 -K Show a bare listing of keywords by name.
126
127 VERB LIST
128 altkv bar bootstrap cat check clean-whitespace count count-distinct
129 count-similar cut decimate fill-down filter format-values fraction grep
130 group-by group-like having-fields head histogram join label least-frequent
131 merge-fields most-frequent nest nothing put regularize remove-empty-columns
132 rename reorder repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle
133 skip-trivial-records sort stats1 stats2 step tac tail tee top uniq unsparsify
134
135 FUNCTION LIST
136 + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
137 > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
138 substr tolower toupper capitalize lstrip rstrip strip collapse_whitespace
139 clean_whitespace system abs acos acosh asin asinh atan atan2 atanh cbrt ceil
140 cos cosh erf erfc exp expm1 floor invqnorm log log10 log1p logifit madd max
141 mexp min mmul msub pow qnorm round roundm sgn sin sinh sqrt tan tanh urand
142 urandrange urand32 urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec
143 localtime2sec hms2fsec hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate
144 sec2localtime sec2localtime sec2localdate sec2hms strftime strftime_local
145 strptime strptime_local systime is_absent is_bool is_boolean is_empty
146 is_empty_map is_float is_int is_map is_nonempty_map is_not_empty is_not_map
147 is_not_null is_null is_numeric is_present is_string asserting_absent
148 asserting_bool asserting_boolean asserting_empty asserting_empty_map
149 asserting_float asserting_int asserting_map asserting_nonempty_map
150 asserting_not_empty asserting_not_map asserting_not_null asserting_null
151 asserting_numeric asserting_present asserting_string boolean float fmtnum
152 hexfmt int string typeof depth haskey joink joinkv joinv leafcount length
153 mapdiff mapexcept mapselect mapsum splitkv splitkvx splitnv splitnvx
154
155 Please use "mlr --help-function {function name}" for function-specific help.
156
157 I/O FORMATTING
--idkvp --odkvp --dkvp Delimited key-value pairs, e.g. "a=1,b=2"
159 (this is Miller's default format).
160
161 --inidx --onidx --nidx Implicitly-integer-indexed fields
162 (Unix-toolkit style).
163 -T Synonymous with "--nidx --fs tab".
164
165 --icsv --ocsv --csv Comma-separated value (or tab-separated
166 with --fs tab, etc.)
167
168 --itsv --otsv --tsv Keystroke-savers for "--icsv --ifs tab",
169 "--ocsv --ofs tab", "--csv --fs tab".
170 --iasv --oasv --asv Similar but using ASCII FS 0x1f and RS 0x1e
171 --iusv --ousv --usv Similar but using Unicode FS U+241F (UTF-8 0xe2909f)
172 and RS U+241E (UTF-8 0xe2909e)
173
174 --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
175 with --fs tab, etc.). The 'lite' CSV does not handle
176 RFC-CSV double-quoting rules; is slightly faster;
177 and handles heterogeneity in the input stream via
178 empty newline followed by new header line. See also
179 http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
180
181 --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
182 "--ocsvlite --ofs tab", "--csvlite --fs tab".
183 -t Synonymous with --tsvlite.
184 --iasvlite --oasvlite --asvlite Similar to --itsvlite et al. but using ASCII FS 0x1f and RS 0x1e
185 --iusvlite --ousvlite --usvlite Similar to --itsvlite et al. but using Unicode FS U+241F (UTF-8 0xe2909f)
186 and RS U+241E (UTF-8 0xe2909e)
187
188 --ipprint --opprint --pprint Pretty-printed tabular (produces no
189 output until all input is in).
190 --right Right-justifies all fields for PPRINT output.
191 --barred Prints a border around PPRINT output
192 (only available for output).
193
194 --omd Markdown-tabular (only available for output).
195
196 --ixtab --oxtab --xtab Pretty-printed vertical-tabular.
197 --xvright Right-justifies values for XTAB format.
198
199 --ijson --ojson --json JSON tabular: sequence or list of one-level
200 maps: {...}{...} or [{...},{...}].
--json-map-arrays-on-input
--json-skip-arrays-on-input
--json-fatal-arrays-on-input JSON arrays are unmillerable. --json-map-arrays-on-input
 is the default: arrays are converted to integer-indexed
 maps. The other two options cause them to be skipped, or
 to be treated as errors. Please use the jq tool for full
 JSON (pre)processing.
--jvstack Put one key-value pair per line for JSON
 output.
--jsonx --ojsonx Keystroke-savers for --json --jvstack
 and --ojson --jvstack, respectively.
210 --jlistwrap Wrap JSON output in outermost [ ].
211 --jknquoteint Do not quote non-string map keys in JSON output.
212 --jvquoteall Quote map values in JSON output, even if they're
213 numeric.
214 --jflatsep {string} Separator for flattening multi-level JSON keys,
215 e.g. '{"a":{"b":3}}' becomes a:b => 3 for
216 non-JSON formats. Defaults to :.
217
218 -p is a keystroke-saver for --nidx --fs space --repifs
219
220 Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
221 DKVP-formatted input and pretty-printed output.
222
223 Please use --iformat1 --oformat2 rather than --format1 --oformat2.
224 The latter sets up input and output flags for format1, not all of which
225 are overridden in all cases by setting output format to format2.
226
227 COMMENTS IN DATA
228 --skip-comments Ignore commented lines (prefixed by "#")
229 within the input.
230 --skip-comments-with {string} Ignore commented lines within input, with
231 specified prefix.
232 --pass-comments Immediately print commented lines (prefixed by "#")
233 within the input.
234 --pass-comments-with {string} Immediately print commented lines within input, with
235 specified prefix.
236 Notes:
237 * Comments are only honored at the start of a line.
238 * In the absence of any of the above four options, comments are data like
239 any other text.
240 * When pass-comments is used, comment lines are written to standard output
241 immediately upon being read; they are not part of the record stream.
242 Results may be counterintuitive. A suggestion is to place comments at the
243 start of data files.
244
245 FORMAT-CONVERSION KEYSTROKE-SAVERS
246 As keystroke-savers for format-conversion you may use the following:
247 --c2t --c2d --c2n --c2j --c2x --c2p --c2m
248 --t2c --t2d --t2n --t2j --t2x --t2p --t2m
249 --d2c --d2t --d2n --d2j --d2x --d2p --d2m
250 --n2c --n2t --n2d --n2j --n2x --n2p --n2m
251 --j2c --j2t --j2d --j2n --j2x --j2p --j2m
252 --x2c --x2t --x2d --x2n --x2j --x2p --x2m
253 --p2c --p2t --p2d --p2n --p2j --p2x --p2m
254 The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
255 PPRINT, and markdown, respectively. Note that markdown format is available for
256 output only.
257
258 COMPRESSED I/O
259 --prepipe {command} This allows Miller to handle compressed inputs. You can do
260 without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
261
262 However, when multiple input files are present, between-file separations are
263 lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
264 specify an action to be taken on each input file. This pre-pipe command must
265 be able to read from standard input; it will be invoked with
266 {command} < {filename}.
267 Examples:
268 mlr --prepipe 'gunzip'
269 mlr --prepipe 'zcat -cf'
270 mlr --prepipe 'xz -cd'
271 mlr --prepipe cat
272 mlr --prepipe-gunzip
273 mlr --prepipe-zcat
274 Note that this feature is quite general and is not limited to decompression
275 utilities. You can use it to apply per-file filters of your choice.
276 For output compression (or other) utilities, simply pipe the output:
277 mlr ... | {your compression command}
278
279 There are shorthands --prepipe-zcat and --prepipe-gunzip which are
280 valid in .mlrrc files. The --prepipe flag is not valid in .mlrrc
281 files since that would put execution of the prepipe command under
282 control of the .mlrrc file.
283
284 SEPARATORS
285 --rs --irs --ors Record separators, e.g. 'lf' or '\r\n'
286 --fs --ifs --ofs --repifs Field separators, e.g. comma
287 --ps --ips --ops Pair separators, e.g. equals sign
288
289 Notes about line endings:
290 * Default line endings (--irs and --ors) are "auto" which means autodetect from
291 the input file format, as long as the input file(s) have lines ending in either
292 LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
293 carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
294 * If both irs and ors are auto (which is the default) then LF input will lead to LF
295 output and CRLF input will lead to CRLF output, regardless of the platform you're
296 running on.
297 * The line-ending autodetector triggers on the first line ending detected in the input
298 stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
299 LF-terminated file then autodetected line endings will be CRLF.
300 * If you use --ors {something else} with (default or explicitly specified) --irs auto
301 then line endings are autodetected on input and set to what you specify on output.
302 * If you use --irs {something else} with (default or explicitly specified) --ors auto
303 then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
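
The "first line ending detected" rule in the notes above can be modeled as follows. This is a minimal Python sketch, not Miller's implementation; the function name detect_line_ending and the fallback used when the input contains no line ending are assumptions made for illustration.

```python
def detect_line_ending(data: bytes, default: bytes = b"\n") -> bytes:
    """Return the first line ending found in data: CRLF or LF."""
    idx = data.find(b"\n")
    if idx < 0:
        # No line ending at all: fall back to a caller-supplied default.
        return default
    if idx > 0 and data[idx - 1:idx] == b"\r":
        return b"\r\n"
    return b"\n"


# A CRLF-terminated line followed by LF-terminated lines is detected as CRLF.
print(detect_line_ending(b"a=1\r\nb=2\n"))
```

Only the first line ending is inspected, which is why a CRLF file followed by an LF file yields CRLF output in the case described above.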
304
305 Notes about all other separators:
306 * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
307 do key-value pairs appear juxtaposed.
308 * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
309 XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
310 a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
311 auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
312 * OFS must be single-character for PPRINT format. This is because it is used
313 with repetition for alignment; multi-character separators would make
314 alignment impossible.
315 * OPS may be multi-character for XTAB format, in which case alignment is
316 disabled.
317 * TSV is simply CSV using tab as field separator ("--fs tab").
318 * FS/PS are ignored for markdown format; RS is used.
319 * All FS and PS options are ignored for JSON format, since they are not relevant
320 to the JSON format.
321 * You can specify separators in any of the following ways, shown by example:
322 - Type them out, quoting as necessary for shell escapes, e.g.
323 "--fs '|' --ips :"
324 - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
325 - To avoid backslashing, you can use any of the following names:
326 cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
327 * Default separators by format:
328 File format RS FS PS
329 gen N/A (N/A) (N/A)
330 dkvp auto , =
331 json auto (N/A) (N/A)
332 nidx auto space (N/A)
333 csv auto , (N/A)
334 csvlite auto , (N/A)
335 markdown auto (N/A) (N/A)
336 pprint auto space (N/A)
337 xtab (N/A) auto space
338
339 CSV-SPECIFIC OPTIONS
340 --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
341 of input files. Tip: combine with "label" to recreate
342 missing headers.
343 --allow-ragged-csv-input|--ragged If a data line has fewer fields than the header line,
344 fill remaining keys with empty string. If a data line has more
345 fields than the header line, use integer field labels as in
346 the implicit-header case.
347 --headerless-csv-output Print only CSV data lines.
348 -N Keystroke-saver for --implicit-csv-header --headerless-csv-output.
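
The ragged-input rule described for --allow-ragged-csv-input can be sketched as below. This Python model is illustrative only; the function name ragged_record is hypothetical, and keying surplus fields by their 1-based position is an assumption based on the phrase "integer field labels as in the implicit-header case".

```python
def ragged_record(header, fields):
    """Pair data fields with header names, tolerating length mismatches."""
    record = {}
    for i, value in enumerate(fields):
        # Surplus fields (beyond the header) get their 1-based position as key.
        key = header[i] if i < len(header) else str(i + 1)
        record[key] = value
    # Header names with no corresponding data field are filled with "".
    for key in header[len(fields):]:
        record[key] = ""
    return record


print(ragged_record(["a", "b", "c"], ["1", "2"]))
print(ragged_record(["a", "b", "c"], ["1", "2", "3", "4"]))
```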
349
350 DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
351 --quote-all Wrap all fields in double quotes
352 --quote-none Do not wrap any fields in double quotes, even if they have
353 OFS or ORS in them
354 --quote-minimal Wrap fields in double quotes only if they have OFS or ORS
355 in them (default)
356 --quote-numeric Wrap fields in double quotes only if they have numbers
357 in them
358 --quote-original Wrap fields in double quotes if and only if they were
359 quoted on input. This isn't sticky for computed fields:
360 e.g. if fields a and b were quoted on input and you do
361 "put '$c = $a . $b'" then field c won't inherit a or b's
362 was-quoted-on-input flag.
363
364 NUMERICAL FORMATTING
365 --ofmt {format} E.g. %.18lf, %.0lf. Please use sprintf-style codes for
366 double-precision. Applies to verbs which compute new
367 values, e.g. put, stats1, stats2. See also the fmtnum
368 function within mlr put (mlr --help-all-functions).
369 Defaults to %lf.
370
371 OTHER OPTIONS
372 --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
373 urand()/urandint()/urand32().
374 --nr-progress-mod {m}, with m a positive integer: print filename and record
375 count to stderr every m input records.
376 --from {filename} Use this to specify an input file before the verb(s),
377 rather than after. May be used more than once. Example:
378 "mlr --from a.dat --from b.dat cat" is the same as
379 "mlr cat a.dat b.dat".
380 -n Process no input files, nor standard input either. Useful
381 for mlr put with begin/end statements only. (Same as --from
382 /dev/null.) Also useful in "mlr -n put -v '...'" for
383 analyzing abstract syntax trees (if that's your thing).
384 -I Process files in-place. For each file name on the command
385 line, output is written to a temp file in the same
386 directory, which is then renamed over the original. Each
387 file is processed in isolation: if the output format is
388 CSV, CSV headers will be present in each output file;
389 statistics are only over each file's own records; and so on.
390
391 THEN-CHAINING
392 Output of one verb may be chained as input to another using "then", e.g.
393 mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
394
395 AUXILIARY COMMANDS
396 Miller has a few otherwise-standalone executables packaged within it.
397 They do not participate in any other parts of Miller.
398 Available subcommands:
399 aux-list
400 lecat
401 termcvt
402 hex
403 unhex
404 netbsd-strptime
405 For more information, please invoke mlr {subcommand} --help
406
408 You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc.
409 For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file
410 and that will be the default input/output format unless otherwise specified on the command line.
411
412 The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional.
413 Hash-style comments and blank lines are ignored.
414
415 Sample .mlrrc:
416 # Input and output formats are CSV by default (unless otherwise specified
417 # on the mlr command line):
418 csv
419 # These are no-ops for CSV, but when I do use JSON output, I want these
420 # pretty-printing options to be used:
421 jvstack
422 jlistwrap
423
424 How to specify location of .mlrrc:
425 * If $MLRRC is set:
426 o If its value is "__none__" then no .mlrrc files are processed.
427 o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
428 errors, they abort mlr with a usage message (as if you had mistyped something on the
429 command line). If the file can't be loaded at all, though, it is silently skipped.
430 o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
431 set in the environment.
432 * Otherwise:
433 o If $HOME/.mlrrc exists, it's then processed as above.
434 o If ./.mlrrc exists, it's then also processed as above.
435 (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
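
The lookup order above can be summarized in a short Python sketch. It is a model of the rules as stated here, not Miller's code; the function name mlrrc_candidates is hypothetical. It returns candidate paths in processing order and leaves it to the caller to skip files that do not exist.

```python
def mlrrc_candidates(env, cwd="."):
    """Return the .mlrrc paths to try, in the order they are processed."""
    mlrrc = env.get("MLRRC")
    if mlrrc is not None:
        # $MLRRC set: home and current-directory files are ignored.
        if mlrrc == "__none__":
            return []
        return [mlrrc]
    paths = []
    home = env.get("HOME")
    if home:
        paths.append(f"{home}/.mlrrc")
    # The current-directory file is processed last, so it stacks over home.
    paths.append(f"{cwd}/.mlrrc")
    return paths


print(mlrrc_candidates({"HOME": "/home/alice"}))
print(mlrrc_candidates({"HOME": "/home/alice", "MLRRC": "__none__"}))
```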
436
437 See also:
438 https://johnkerl.org/miller/doc/customization.html
439
441 altkv
442 Usage: mlr altkv [no options]
443 Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
444
445 bar
446 Usage: mlr bar [options]
447 Replaces a numeric field with a number of asterisks, allowing for cheesy
448 bar plots. These align best with --opprint or --oxtab output format.
449 Options:
450 -f {a,b,c} Field names to convert to bars.
451 -c {character} Fill character: default '*'.
452 -x {character} Out-of-bounds character: default '#'.
453 -b {character} Blank character: default '.'.
454 --lo {lo} Lower-limit value for min-width bar: default '0.000000'.
455 --hi {hi} Upper-limit value for max-width bar: default '100.000000'.
456 -w {n} Bar-field width: default '40'.
457 --auto Automatically computes limits, ignoring --lo and --hi.
458 Holds all records in memory before producing any output.
459
460 bootstrap
461 Usage: mlr bootstrap [options]
462 Emits an n-sample, with replacement, of the input records.
463 Options:
464 -n {number} Number of samples to output. Defaults to number of input records.
465 Must be non-negative.
466 See also mlr sample and mlr shuffle.
467
468 cat
469 Usage: mlr cat [options]
470 Passes input records directly to output. Most useful for format conversion.
471 Options:
472 -n Prepend field "n" to each record with record-counter starting at 1
473 -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
474 keyed by specified field name(s).
475 -v Write a low-level record-structure dump to stderr.
476 -N {name} Prepend field {name} to each record with record-counter starting at 1
477
478 check
479 Usage: mlr check
480 Consumes records without printing any output.
481 Useful for doing a well-formatted check on input data.
482
483 clean-whitespace
Usage: mlr clean-whitespace [options]
485 For each record, for each field in the record, whitespace-cleans the keys and
486 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
487 and replacing multiple whitespace with singles. For finer-grained control,
488 please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
489 and clean_whitespace.
490
491 Options:
492 -k|--keys-only Do not touch values.
493 -v|--values-only Do not touch keys.
494 It is an error to specify -k as well as -v.
495
496 count
497 Usage: mlr count [options]
498 Prints number of records, optionally grouped by distinct values for specified field names.
499
500 Options:
501 -g {a,b,c} Field names for distinct count.
-n Show only the number of distinct values.
503 -o {name} Field name for output count. Default "count".
504
505 count-distinct
506 Usage: mlr count-distinct [options]
507 Prints number of records having distinct values for specified field names.
508 Same as uniq -c.
509
510 Options:
511 -f {a,b,c} Field names for distinct count.
512 -n Show only the number of distinct values. Not compatible with -u.
513 -o {name} Field name for output count. Default "count".
514 Ignored with -u.
515 -u Do unlashed counts for multiple field names. With -f a,b and
516 without -u, computes counts for distinct combinations of a
517 and b field values. With -f a,b and with -u, computes counts
518 for distinct a field values and counts for distinct b field
519 values separately.
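
The difference between lashed (default) and unlashed (-u) counting can be illustrated with a small Python model. This is a sketch, not Miller's implementation; the function name count_distinct is hypothetical, and skipping records that lack a requested field is an assumption.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field values, jointly (lashed) or per field (unlashed)."""
    if unlashed:
        # One independent counter per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # One counter over combinations of values across all the fields.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [
    {"a": "1", "b": "x"},
    {"a": "1", "b": "y"},
    {"a": "2", "b": "x"},
    {"a": "1", "b": "x"},
]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```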
520
521 count-similar
522 Usage: mlr count-similar [options]
523 Ingests all records, then emits each record augmented by a count of
524 the number of other records having the same group-by field values.
525 Options:
526 -g {d,e,f} Group-by-field names for counts.
527 -o {name} Field name for output count. Default "count".
528
529 cut
530 Usage: mlr cut [options]
531 Passes through input records with specified fields included/excluded.
532 -f {a,b,c} Field names to include for cut.
533 -o Retain fields in the order specified here in the argument list.
534 Default is to retain them in the order found in the input data.
535 -x|--complement Exclude, rather than include, field names specified by -f.
536 -r Treat field names as regular expressions. "ab", "a.*b" will
537 match any field name containing the substring "ab" or matching
538 "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
539 be used. The -o flag is ignored when -r is present.
540 Examples:
541 mlr cut -f hostname,status
542 mlr cut -x -f hostname,status
543 mlr cut -r -f '^status$,sda[0-9]'
544 mlr cut -r -f '^status$,"sda[0-9]"'
545 mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
546
547 decimate
548 Usage: mlr decimate [options]
549 -n {count} Decimation factor; default 10
550 -b Decimate by printing first of every n.
551 -e Decimate by printing last of every n (default).
552 -g {a,b,c} Optional group-by-field names for decimate counts
553 Passes through one of every n records, optionally by category.
554
555 fill-down
556 Usage: mlr fill-down [options]
557 -f {a,b,c} Field names for fill-down
-a|--only-if-absent Treat a field as missing only if it is absent.
559 If a given record has a missing value for a given field, fill that from
560 the corresponding value from a previous record, if any.
561 By default, a 'missing' field either is absent, or has the empty-string value.
562 With -a, a field is 'missing' only if it is absent.
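
The two notions of "missing" can be compared with a short Python model. This is an illustrative sketch of the behavior described above, not Miller's code; the function name fill_down and its parameters are hypothetical.

```python
def fill_down(records, fields, only_if_absent=False):
    """Yield copies of records with missing fields filled from earlier ones."""
    last = {}
    for record in records:
        out = dict(record)
        for f in fields:
            if only_if_absent:
                missing = f not in out
            else:
                missing = f not in out or out[f] == ""
            if missing:
                if f in last:
                    out[f] = last[f]
            else:
                # Remember the most recent non-missing value for this field.
                last[f] = out[f]
        yield out


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(list(fill_down(records, ["a"])))
print(list(fill_down(records, ["a"], only_if_absent=True)))
```

In the second call the empty string in the second record counts as a present value, so it is what gets carried into the third record.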
563
564 filter
565 Usage: mlr filter [options] {expression}
566 Prints records for which {expression} evaluates to true.
567 If there are multiple semicolon-delimited expressions, all of them are
568 evaluated and the last one is used as the filter criterion.
569
570 Conversion options:
571 -S: Keeps field values as strings with no type inference to int or float.
572 -F: Keeps field values as strings or floats with no inference to int.
573 All field values are type-inferred to int/float/string unless this behavior is
574 suppressed with -S or -F.
575
576 Output/formatting options:
577 --oflatsep {string}: Separator to use when flattening multi-level @-variables
578 to output records for emit. Default ":".
579 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
580 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
581 Any of the output-format command-line flags (see mlr -h). Example: using
582 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
583 the input is CSV, the output is pretty-print tabular, but the tee-file output
584 is written in JSON format.
585 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
586 record.
587
588 Expression-specification options:
589 -f {filename}: the DSL expression is taken from the specified file rather
590 than from the command line. Outer single quotes wrapping the expression
591 should not be placed in the file. If -f is specified more than once,
592 all input files specified using -f are concatenated to produce the expression.
593 (For example, you can define functions in one file and call them from another.)
594 -e {expression}: You can use this after -f to add an expression. Example use
595 case: define functions/subroutines in a file you specify with -f, then call
596 them with an expression you specify with -e.
597 (If you mix -e and -f then the expressions are evaluated in the order encountered.
598 Since the expression pieces are simply concatenated, please be sure to use intervening
599 semicolons to separate expressions.)
600
-s name=value: Predefines out-of-stream variable @name to have value "value".
 Thus mlr filter -s foo=97 '$column > @foo' is like
 mlr filter 'begin {@foo = 97} $column > @foo'.
604 The value part is subject to type-inferencing as specified by -S/-F.
605 May be specified more than once, e.g. -s name1=value1 -s name2=value2.
606 Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
607
608 Tracing options:
-v: Prints the expression's AST (abstract syntax tree), which gives
 full transparency on the precedence and associativity rules of
 Miller's grammar, to stdout.
-a: Prints a low-level stack-allocation trace to stdout.
-t: Prints a low-level parser trace to stderr.
-T: Prints every statement to stderr as it is executed.
615
616 Other options:
617 -x: Prints records for which {expression} evaluates to false.
618
619 Please use a dollar sign for field names and double-quotes for string
620 literals. If field names have special characters such as "." then you might
621 use braces, e.g. '${field.name}'. Miller built-in variables are
622 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
623 variables. The environment-variable name may be an expression, e.g. a field
624 value.
625
626 Use # to comment to end of line.
627
Examples:
 mlr filter 'log10($count) > 4.0'
 mlr filter 'FNR == 2' (second record in each file)
 mlr filter 'urand() < 0.001' (subsampling)
 mlr filter '$color != "blue" && $value > 4.2'
 mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
 mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
 mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
 mlr filter '
   NR == 1 ||
   #NR == 2 ||
   NR == 3
 '
641
642 Please see http://johnkerl.org/miller/doc/reference.html for more information
643 including function list. Or "mlr -f". Please also see "mlr grep" which is
644 useful when you don't yet know which field name(s) you're looking for.
645 Please see in particular:
646 http://www.johnkerl.org/miller/doc/reference-verbs.html#filter
647
648 format-values
649 Usage: mlr format-values [options]
650 Applies format strings to all field values, depending on autodetected type.
651 * If a field value is detected to be integer, applies integer format.
652 * Else, if a field value is detected to be float, applies float format.
653 * Else, applies string format.
654
655 Note: this is a low-keystroke way to apply formatting to many fields. To get
656 finer control, please see the fmtnum function within the mlr put DSL.
657
658 Note: this verb lets you apply arbitrary format strings, which can produce
659 undefined behavior and/or program crashes. See your system's "man printf".
660
661 Options:
662 -i {integer format} Defaults to "%lld".
663 Examples: "%06lld", "%08llx".
664 Note that Miller integers are long long so you must use
665 formats which apply to long long, e.g. with ll in them.
666 Undefined behavior results otherwise.
667 -f {float format} Defaults to "%lf".
668 Examples: "%8.3lf", "%.6le".
669 Note that Miller floats are double-precision so you must
670 use formats which apply to double, e.g. with l[efg] in them.
671 Undefined behavior results otherwise.
672 -s {string format} Defaults to "%s".
673 Examples: "_%s", "%08s".
674 Note that you must use formats which apply to string, e.g.
675 with s in them. Undefined behavior results otherwise.
676 -n Coerce field values autodetected as int to float, and then
677 apply the float format.
678
679 fraction
680 Usage: mlr fraction [options]
681 For each record's value in specified fields, computes the ratio of that
682 value to the sum of values in that field over all input records.
683 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
684 x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
685
686 Note: this is internally a two-pass algorithm: on the first pass it retains
687 input records and accumulates sums; on the second pass it computes quotients
688 and emits output records. This means it produces no output until all input is read.
689
690 Options:
691 -f {a,b,c} Field name(s) for fraction calculation
692 -g {d,e,f} Optional group-by-field name(s) for fraction counts
693 -p Produce percents [0..100], not fractions [0..1]. Output field names
694 end with "_percent" rather than "_fraction"
695 -c Produce cumulative distributions, i.e. running sums: each output
696 value folds in the sum of the previous for the specified group
697 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
698 x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
699 x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
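The two-pass behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics for one field and one group, not Miller's implementation; the function name fraction and its parameters are hypothetical, and -g grouping is not modeled.

```python
from itertools import accumulate


def fraction(values, percent=False, cumulative=False):
    """Hypothetical model of 'mlr fraction' for one field, one group."""
    # First pass: retain the inputs and accumulate their sum.
    values = list(values)
    total = sum(values)
    # Second pass: compute quotients, or running quotients as with -c.
    numerators = accumulate(values) if cumulative else values
    scale = 100.0 if percent else 1.0  # -p produces percents
    return [scale * n / total for n in numerators]


plain = fraction([1, 2, 3, 4])
running = fraction([1, 2, 3, 4], cumulative=True)
```

Because the sum is needed before any quotient can be computed, nothing can be emitted until the input is exhausted, which is why the verb is non-streaming.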
700
701 grep
702 Usage: mlr grep [options] {regular expression}
703 Passes through records which match {regex}.
704 Options:
705 -i Use case-insensitive search.
706 -v Invert: pass through records which do not match the regex.
707 Note that "mlr filter" is more powerful, but requires you to know field names.
708 By contrast, "mlr grep" allows you to regex-match the entire record. It does
709 this by formatting each record in memory as DKVP, using command-line-specified
710 ORS/OFS/OPS, and matching the resulting line against the regex specified
711 here. In particular, the regex is not applied to the input stream: if you
712 have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
713 be matched, not against either of these lines, but against the DKVP line
714 "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
715 and this command is intended to be merely a keystroke-saver. To get all the
716 features of system grep, you can do
717 "mlr --odkvp ... | grep ... | mlr --idkvp ..."
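The point that the regex is applied to a DKVP rendering of each record, not to the input line, can be sketched as follows. This is a hedged model, not Miller's code: the names to_dkvp and mlr_grep are hypothetical, and Python's re module stands in for whatever regex library Miller links against, so syntax details may differ.

```python
import re


def to_dkvp(record, ofs=",", ops="="):
    """Format a record (insertion-ordered dict) as a DKVP line."""
    return ofs.join(f"{key}{ops}{value}" for key, value in record.items())


def mlr_grep(records, regex, invert=False, ignore_case=False):
    """Hypothetical model of 'mlr grep': match against the DKVP text."""
    pattern = re.compile(regex, re.IGNORECASE if ignore_case else 0)
    for record in records:
        matched = pattern.search(to_dkvp(record)) is not None
        if matched != invert:  # invert models the -v flag
            yield record


# A CSV data line "1,2,3" under header "x,y,z" becomes this record:
records = [{"x": "1", "y": "2", "z": "3"}]
hits_dkvp = list(mlr_grep(records, r"^x=1,y=2"))
hits_csv = list(mlr_grep(records, r"^1,2,3$"))
```

Here the first pattern matches and the second does not, since the text being searched is "x=1,y=2,z=3".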
718
719 group-by
720 Usage: mlr group-by {comma-separated field names}
721 Outputs records in batches having identical values at specified field names.
722
723 group-like
724 Usage: mlr group-like
725 Outputs records in batches having identical field names.
726
727 having-fields
728 Usage: mlr having-fields [options]
729 Conditionally passes through records depending on each record's field names.
730 Options:
731 --at-least {comma-separated names}
732 --which-are {comma-separated names}
733 --at-most {comma-separated names}
734 --all-matching {regular expression}
735 --any-matching {regular expression}
736 --none-matching {regular expression}
737 Examples:
738 mlr having-fields --which-are amount,status,owner
739 mlr having-fields --any-matching 'sda[0-9]'
740 mlr having-fields --any-matching '"sda[0-9]"'
741 mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
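One plausible reading of the six predicates is sketched below as set and regex tests over a record's field names. This is an assumption-laden model, not Miller's implementation: the function name having_fields is hypothetical, the quoted-regex syntax with a trailing i is not handled, and the behavior for records with no fields is not modeled.

```python
import re


def having_fields(record, mode, arg):
    """Return True if the record passes; 'mode' is the flag name without '--'."""
    have = set(record)
    if mode in ("at-least", "which-are", "at-most"):
        wanted = set(arg.split(","))
        if mode == "at-least":
            return wanted <= have   # every listed name is present
        if mode == "at-most":
            return have <= wanted   # no field outside the list
        return have == wanted       # exactly the listed names
    pattern = re.compile(arg)
    hits = [pattern.search(name) is not None for name in record]
    if mode == "all-matching":
        return all(hits)
    if mode == "any-matching":
        return any(hits)
    if mode == "none-matching":
        return not any(hits)
    raise ValueError(f"unknown mode: {mode}")
```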
742
743 head
744 Usage: mlr head [options]
745 -n {count} Head count to print; default 10
746 -g {a,b,c} Optional group-by-field names for head counts
747 Passes through the first n records, optionally by category.
748 Without -g, ceases consuming more input (i.e. is fast) when n
749 records have been read.
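The difference between the ungrouped and grouped cases can be sketched in Python. This is an illustrative model with a hypothetical function name; skipping records that lack a group-by field is an assumption made here for simplicity.

```python
from collections import defaultdict
from itertools import islice


def head(records, n=10, group_by=None):
    """Hypothetical model of 'mlr head' over an iterable of dict records."""
    if not group_by:
        # Without -g: stop pulling from the input after n records.
        yield from islice(records, n)
        return
    # With -g: every record must be examined, since a new group may
    # appear at any point in the stream.
    seen = defaultdict(int)
    for rec in records:
        if not all(name in rec for name in group_by):
            continue  # assumption: records lacking a group-by field are skipped
        key = tuple(rec[name] for name in group_by)
        if seen[key] < n:
            seen[key] += 1
            yield rec
```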
750
751 histogram
752 Usage: mlr histogram [options]
753 -f {a,b,c} Value-field names for histogram counts
754 --lo {lo} Histogram low value
755 --hi {hi} Histogram high value
756 --nbins {n} Number of histogram bins
757 --auto Automatically computes limits, ignoring --lo and --hi.
758 Holds all values in memory before producing any output.
759 -o {prefix} Prefix for output field name. Default: no prefix.
760 Just a histogram. Input values < lo or > hi are not counted.
761
762 join
763 Usage: mlr join [options]
764 Joins records from specified left file name with records from all file names
765 at the end of the Miller argument list.
766 Functionality is essentially the same as the system "join" command, but for
767 record streams.
768 Options:
769 -f {left file name}
770 -j {a,b,c} Comma-separated join-field names for output
771 -l {a,b,c} Comma-separated join-field names for left input file;
772 defaults to -j values if omitted.
773 -r {a,b,c} Comma-separated join-field names for right input file(s);
774 defaults to -j values if omitted.
775 --lp {text} Additional prefix for non-join output field names from
776 the left file
777 --rp {text} Additional prefix for non-join output field names from
778 the right file(s)
779 --np Do not emit paired records
780 --ul Emit unpaired records from the left file
781 --ur Emit unpaired records from the right file(s)
782 -s|--sorted-input Require sorted input: records must be sorted
783 lexically by their join-field names, else not all records will
784 be paired. The only likely use case for this is with a left
785 file which is too big to fit into system memory otherwise.
786 -u Enable unsorted input. (This is the default even without -u.)
787 In this case, the entire left file will be loaded into memory.
788 --prepipe {command} As in main input options; see mlr --help for details.
789 If you wish to use a prepipe command for the main input as well
790 as here, it must be specified there as well as here.
791 File-format options default to those for the right file names on the Miller
792 argument list, but may be overridden for the left file as follows. Please see
793 the main "mlr --help" for more information on syntax for these arguments.
794 -i {one of csv,dkvp,nidx,pprint,xtab}
795 --irs {record-separator character}
796 --ifs {field-separator character}
797 --ips {pair-separator character}
798 --repifs
799 --repips
800 Please use "mlr --usage-separator-options" for information on specifying separators.
801 Please see http://johnkerl.org/miller/doc/reference-verbs.html#join for more information
802 including examples.
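The default (unsorted) mode, in which the whole left file is held in memory and the right input is streamed past it, can be modeled roughly as below. This is a sketch under stated assumptions, not Miller's implementation: the function name is hypothetical, the same join-field names are used on both sides (as with -j alone), and output field order and the --lp/--rp prefixes are not modeled.

```python
from collections import defaultdict


def join_unsorted(left, right, join_fields, np=False, ul=False, ur=False):
    """Hypothetical model of 'mlr join -u' on iterables of dict records."""

    def key_of(rec):
        # None when a join field is missing, so the record can never pair.
        if all(f in rec for f in join_fields):
            return tuple(rec[f] for f in join_fields)
        return None

    # Load the entire left input into buckets keyed by join-field values.
    buckets = defaultdict(list)
    left_unkeyed = []
    for rec in left:
        key = key_of(rec)
        (buckets[key] if key is not None else left_unkeyed).append(rec)

    # Stream the right input, pairing against the buckets.
    paired = set()
    for rrec in right:
        key = key_of(rrec)
        if key is not None and key in buckets:
            paired.add(key)
            if not np:
                for lrec in buckets[key]:
                    yield {**lrec, **rrec}
        elif ur:
            yield rrec

    # Left-unpaired records are only known once the right input is done.
    if ul:
        for key, recs in buckets.items():
            if key not in paired:
                yield from recs
        yield from left_unkeyed
```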
803
804 label
805 Usage: mlr label {new1,new2,new3,...}
806 Given n comma-separated names, renames the first n fields of each record to
807 have the respective name. (Fields past the nth are left with their original
808 names.) Particularly useful with --inidx or --implicit-csv-header, to give
809 useful names to otherwise integer-indexed fields.
810 Examples:
811 "echo 'a b c d' | mlr --inidx --odkvp cat" gives "1=a,2=b,3=c,4=d"
812 "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
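The positional renaming can be sketched in Python using an insertion-ordered dict as the record. The function name label is hypothetical, and collisions between a new name and an existing later field are not modeled.

```python
def label(record, new_names):
    """Rename the first len(new_names) fields, keeping values and order."""
    out = {}
    for index, (old_name, value) in enumerate(record.items()):
        name = new_names[index] if index < len(new_names) else old_name
        out[name] = value
    return out


# Mirrors the second example above: 1=a,2=b,3=c,4=d with "label s,t".
labeled = label({"1": "a", "2": "b", "3": "c", "4": "d"}, ["s", "t"])
```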
813
814 least-frequent
815 Usage: mlr least-frequent [options]
816 Shows the least frequently occurring distinct values for specified field names.
817 The first entry is the statistical anti-mode; the remaining are runners-up.
818 Options:
819 -f {one or more comma-separated field names}. Required flag.
820 -n {count}. Optional flag defaulting to 10.
821 -b Suppress counts; show only field values.
822 -o {name} Field name for output count. Default "count".
823 See also "mlr most-frequent".
824
825 merge-fields
826 Usage: mlr merge-fields [options]
827 Computes univariate statistics for each input record, accumulated across
828 specified fields.
829 Options:
830 -a {sum,count,...} Names of accumulators. One or more of:
831 count Count instances of fields
832 mode Find most-frequently-occurring values for fields; first-found wins tie
833 antimode Find least-frequently-occurring values for fields; first-found wins tie
834 sum Compute sums of specified fields
835 mean Compute averages (sample means) of specified fields
836 stddev Compute sample standard deviation of specified fields
837 var Compute sample variance of specified fields
838 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
839 skewness Compute sample skewness of specified fields
840 kurtosis Compute sample kurtosis of specified fields
841 min Compute minimum values of specified fields
842 max Compute maximum values of specified fields
843 -f {a,b,c} Value-field names on which to compute statistics. Requires -o.
844 -r {a,b,c} Regular expressions for value-field names on which to compute
845 statistics. Requires -o.
846 -c {a,b,c} Substrings for collapse mode. All fields which have the same names
847 after removing substrings will be accumulated together. Please see
848 examples below.
849 -i Use interpolated percentiles, like R's type=7; default like type=1.
850 Not meaningful for string-valued fields.
851 -o {name} Output field basename for -f/-r.
852 -k Keep the input fields which contributed to the output statistics;
853 the default is to omit them.
854 -F Computes integerable things (e.g. count) in floating point.
855
856 String-valued data make sense unless arithmetic on them is required,
857 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
858 numbers are less than strings.
859
860 Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
861 Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
862 produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
863 summed over.
864 Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
865 produces "bar_sum=15,bar_count=4" since all four fields are summed over.
866 Example: mlr merge-fields -a sum,count -c in_,out_
867 produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
868 since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
869 "b_y", and "b_out_x" collapses to "b_x".
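The collapse-mode example can be reproduced with a small Python model. This is a sketch only: the function name is hypothetical, just the sum and count accumulators are implemented, and it assumes the first listed substring found in a field name is the one removed.

```python
def merge_fields_collapse(record, substrings):
    """Hypothetical model of 'mlr merge-fields -a sum,count -c ...'."""
    sums, counts, kept = {}, {}, {}
    for name, value in record.items():
        for sub in substrings:
            if sub in name:
                short = name.replace(sub, "", 1)  # e.g. a_in_x -> a_x
                sums[short] = sums.get(short, 0) + value
                counts[short] = counts.get(short, 0) + 1
                break
        else:
            kept[name] = value  # field matched no substring; pass through
    out = dict(kept)
    for short in sums:
        out[f"{short}_sum"] = sums[short]
        out[f"{short}_count"] = counts[short]
    return out


merged = merge_fields_collapse(
    {"a_in_x": 1, "a_out_x": 2, "b_in_y": 4, "b_out_x": 8}, ["in_", "out_"]
)
```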
870
871 most-frequent
872 Usage: mlr most-frequent [options]
873 Shows the most frequently occurring distinct values for specified field names.
874 The first entry is the statistical mode; the remaining are runners-up.
875 Options:
876 -f {one or more comma-separated field names}. Required flag.
877 -n {count}. Optional flag defaulting to 10.
878 -b Suppress counts; show only field values.
879 -o {name} Field name for output count. Default "count".
880 See also "mlr least-frequent".
881
882 nest
883 Usage: mlr nest [options]
884 Explodes specified field values into separate fields/records, or reverses this.
885 Options:
886 --explode,--implode One is required.
887 --values,--pairs One is required.
888 --across-records,--across-fields One is required.
889 -f {field name} Required.
890 --nested-fs {string} Defaults to ";". Field separator for nested values.
891 --nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
892 --evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
893 --ivar {string} Shorthand for --implode --values --across-records --nested-fs {string}
894 Please use "mlr --usage-separator-options" for information on specifying separators.
895
896 Examples:
897
898 mlr nest --explode --values --across-records -f x
899 with input record "x=a;b;c,y=d" produces output records
900 "x=a,y=d"
901 "x=b,y=d"
902 "x=c,y=d"
903 Use --implode to do the reverse.
904
905 mlr nest --explode --values --across-fields -f x
906 with input record "x=a;b;c,y=d" produces output record
907 "x_1=a,x_2=b,x_3=c,y=d"
908 Use --implode to do the reverse.
909
910 mlr nest --explode --pairs --across-records -f x
911 with input record "x=a:1;b:2;c:3,y=d" produces output records
912 "a=1,y=d"
913 "b=2,y=d"
914 "c=3,y=d"
915
916 mlr nest --explode --pairs --across-fields -f x
917 with input record "x=a:1;b:2;c:3,y=d" produces output record
918 "a=1,b=2,c=3,y=d"
919
920 Notes:
921 * With --pairs, --implode doesn't make sense since the original field name has
922 been lost.
923 * The combination "--implode --values --across-records" is non-streaming:
924 no output records are produced until all input records have been read. In
925 particular, this means it won't work in tail -f contexts. But all other flag
926 combinations result in streaming (tail -f friendly) data processing.
927 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
928 e.g. by default the former is semicolon and the latter is comma.
929 See also mlr reshape.
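The explode examples above can be modeled in Python as follows. These are illustrative sketches with hypothetical function names, using the default separators; pairs that lack the pair separator, and the --implode direction, are not modeled.

```python
def explode_values_across_records(record, field, fs=";"):
    """One output record per nested value; other fields are copied."""
    if field not in record:
        return [dict(record)]
    return [{**record, field: part} for part in record[field].split(fs)]


def explode_values_across_fields(record, field, fs=";"):
    """One output record with field_1, field_2, ... in place of the field."""
    out = {}
    for name, value in record.items():
        if name == field:
            for index, part in enumerate(value.split(fs), start=1):
                out[f"{field}_{index}"] = part
        else:
            out[name] = value
    return out


def explode_pairs_across_records(record, field, fs=";", ps=":"):
    """One output record per nested key:value pair, replacing the field."""
    results = []
    for pair in record[field].split(fs):
        key, _, value = pair.partition(ps)
        results.append(
            {(key if name == field else name): (value if name == field else old)
             for name, old in record.items()}
        )
    return results
```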
930
931 nothing
932 Usage: mlr nothing
933 Drops all input records. Useful for testing, or after tee/print/etc. have
934 produced other output.
935
936 put
937 Usage: mlr put [options] {expression}
938 Adds/updates specified field(s). Expressions are semicolon-separated and must
939 either be assignments, or evaluate to boolean. Booleans with following
940 statements in curly braces control whether those statements are executed;
941 booleans without following curly braces do nothing except side effects (e.g.
942 regex-captures into \1, \2, etc.).
943
944 Conversion options:
945 -S: Keeps field values as strings with no type inference to int or float.
946 -F: Keeps field values as strings or floats with no inference to int.
947 All field values are type-inferred to int/float/string unless this behavior is
948 suppressed with -S or -F.
949
950 Output/formatting options:
951 --oflatsep {string}: Separator to use when flattening multi-level @-variables
952 to output records for emit. Default ":".
953 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
954 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
955 Any of the output-format command-line flags (see mlr -h). Example: using
956 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
957 the input is CSV, the output is pretty-print tabular, but the tee-file output
958 is written in JSON format.
959 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
960 record.
961
962 Expression-specification options:
963 -f {filename}: the DSL expression is taken from the specified file rather
964 than from the command line. Outer single quotes wrapping the expression
965 should not be placed in the file. If -f is specified more than once,
966 all input files specified using -f are concatenated to produce the expression.
967 (For example, you can define functions in one file and call them from another.)
968 -e {expression}: You can use this after -f to add an expression. Example use
969 case: define functions/subroutines in a file you specify with -f, then call
970 them with an expression you specify with -e.
971 (If you mix -e and -f then the expressions are evaluated in the order encountered.
972 Since the expression pieces are simply concatenated, please be sure to use intervening
973 semicolons to separate expressions.)
974
975 -s name=value: Predefines out-of-stream variable @name to have value "value".
976 Thus mlr put -s foo=97 '$column += @foo' is like
977 mlr put 'begin {@foo = 97} $column += @foo'.
978 The value part is subject to type-inferencing as specified by -S/-F.
979 May be specified more than once, e.g. -s name1=value1 -s name2=value2.
980 Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
981
982 Tracing options:
983 -v: Prints the expression's AST (abstract syntax tree), which gives
984 full transparency on the precedence and associativity rules of
985 Miller's grammar, to stdout.
986 -a: Prints a low-level stack-allocation trace to stdout.
987 -t: Prints a low-level parser trace to stderr.
988 -T: Prints every statement to stderr as it is executed.
989
990 Other options:
991 -q: Does not include the modified record in the output stream. Useful for when
992 all desired output is in begin and/or end blocks.
993
994 Please use a dollar sign for field names and double-quotes for string
995 literals. If field names have special characters such as "." then you might
996 use braces, e.g. '${field.name}'. Miller built-in variables are
997 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
998 variables. The environment-variable name may be an expression, e.g. a field
999 value.
1000
1001 Use # to comment to end of line.
1002
1003 Examples:
1004 mlr put '$y = log10($x); $z = sqrt($y)'
1005 mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
1006 mlr put '$x>0.0; $y=log10($x); $z=sqrt($y)' # does all three statements
1007 mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"'
1008 mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
1009 mlr put '$filename = FILENAME'
1010 mlr put '$colored_shape = $color . "_" . $shape'
1011 mlr put '$y = cos($theta); $z = atan2($y, $x)'
1012 mlr put '$name = sub($name, "http.*com"i, "")'
1013 mlr put -q '@sum += $x; end {emit @sum}'
1014 mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
1015 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
1016 mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
1017 mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
1018 mlr put '
1019 $x = 1;
1020 #$y = 2;
1021 $z = 3
1022 '
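For readers new to out-of-stream variables, the two-level example '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' can be modeled in Python as a nested dict that is flattened at end of stream. This is a sketch of the semantics only: the function name is hypothetical, and it assumes every record has fields a, b, and x.

```python
def sum_by_a_b(records):
    """Accumulate x into a two-level map, then emit one record per leaf."""
    sums = {}
    for rec in records:
        inner = sums.setdefault(rec["a"], {})
        inner[rec["b"]] = inner.get(rec["b"], 0) + rec["x"]
    # Models the end block: the names "a" and "b" given to emit become
    # output fields holding the first-level and second-level map keys.
    return [
        {"a": a, "b": b, "sum": total}
        for a, inner in sums.items()
        for b, total in inner.items()
    ]
```

Since the modified records themselves are not wanted in this pattern, the corresponding Miller command uses put -q.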
1023
1024 Please see also 'mlr -k' for examples using redirected output.
1025
1026 Please see http://johnkerl.org/miller/doc/reference.html for more information
1027 including function list. Or "mlr -f".
1028 Please see in particular:
1029 http://www.johnkerl.org/miller/doc/reference-verbs.html#put
1030
1031 regularize
1032 Usage: mlr regularize
1033 For each record whose field names match those of a record seen earlier in
1034 the data stream, but in a different order, outputs it with its fields in
1035 the previously encountered order.
1036 Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
1037 output as a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
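A minimal Python sketch of this behavior keys each record by its sorted field names and remembers the first ordering seen for that key. The function name regularize is used here for illustration only; this is not Miller's implementation.

```python
def regularize(records):
    """Reorder fields to match the first record seen with the same names."""
    first_order = {}
    for rec in records:
        key = tuple(sorted(rec))
        order = first_order.setdefault(key, list(rec))
        yield {name: rec[name] for name in order}


stream = [
    {"a": 1, "c": 2, "b": 3},
    {"e": 4, "d": 5},
    {"c": 7, "a": 6, "b": 8},
]
regularized = list(regularize(stream))
```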
1038
1039 remove-empty-columns
1040 Usage: mlr remove-empty-columns
1041 Omits fields which are empty on every input row. Non-streaming.
1042
1043 rename
1044 Usage: mlr rename [options] {old1,new1,old2,new2,...}
1045 Renames specified fields.
1046 Options:
1047 -r Treat old field names as regular expressions. "ab", "a.*b"
1048 will match any field name containing the substring "ab" or
1049 matching "a.*b", respectively; anchors of the form "^ab$",
1050 "^a.*b$" may be used. New field names may be plain strings,
1051 or may contain capture groups of the form "\1" through
1052 "\9". Wrapping the regex in double quotes is optional, but
1053 is required if you wish to follow it with 'i' to indicate
1054 case-insensitivity.
1055 -g Do global replacement within each field name rather than
1056 first-match replacement.
1057 Examples:
1058 mlr rename old_name,new_name
1059 mlr rename old_name_1,new_name_1,old_name_2,new_name_2
1060 mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date"
1061 mlr rename -r '"Date_[0-9]+",Date' Same
1062 mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
1063 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
1064
1065 reorder
1066 Usage: mlr reorder [options]
1067 -f {a,b,c} Field names to reorder.
1068 -e Put specified field names at record end: default is to put
1069 them at record start.
1070 Examples:
1071 mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
1072 mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
1073
1074 repeat
1075 Usage: mlr repeat [options]
1076 Copies input records to output records multiple times.
1077 Options must be exactly one of the following:
1078 -n {repeat count} Repeat each input record this many times.
1079 -f {field name} Same, but take the repeat count from the specified
1080 field name of each input record.
1081 Example:
1082 echo x=0 | mlr repeat -n 4 then put '$x=urand()'
1083 produces:
1084 x=0.488189
1085 x=0.484973
1086 x=0.704983
1087 x=0.147311
1088 Example:
1089 echo a=1,b=2,c=3 | mlr repeat -f b
1090 produces:
1091 a=1,b=2,c=3
1092 a=1,b=2,c=3
1093 Example:
1094 echo a=1,b=2,c=3 | mlr repeat -f c
1095 produces:
1096 a=1,b=2,c=3
1097 a=1,b=2,c=3
1098 a=1,b=2,c=3
1099
1100 reshape
1101 Usage: mlr reshape [options]
1102 Wide-to-long options:
1103 -i {input field names} -o {key-field name,value-field name}
1104 -r {input field regexes} -o {key-field name,value-field name}
1105 These pivot/reshape the input data such that the input fields are removed
1106 and separate records are emitted for each key/value pair.
1107 Note: this works with tail -f and produces output records for each input
1108 record seen.
1109 Long-to-wide options:
1110 -s {key-field name,value-field name}
1111 These pivot/reshape the input data to undo the wide-to-long operation.
1112 Note: this does not work with tail -f; it produces output records only after
1113 all input records have been read.
1114
1115 Examples:
1116
1117 Input file "wide.txt":
1118 time X Y
1119 2009-01-01 0.65473572 2.4520609
1120 2009-01-02 -0.89248112 0.2154713
1121 2009-01-03 0.98012375 1.3179287
1122
1123 mlr --pprint reshape -i X,Y -o item,value wide.txt
1124 time item value
1125 2009-01-01 X 0.65473572
1126 2009-01-01 Y 2.4520609
1127 2009-01-02 X -0.89248112
1128 2009-01-02 Y 0.2154713
1129 2009-01-03 X 0.98012375
1130 2009-01-03 Y 1.3179287
1131
1132 mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1133 time item value
1134 2009-01-01 X 0.65473572
1135 2009-01-01 Y 2.4520609
1136 2009-01-02 X -0.89248112
1137 2009-01-02 Y 0.2154713
1138 2009-01-03 X 0.98012375
1139 2009-01-03 Y 1.3179287
1140
1141 Input file "long.txt":
1142 time item value
1143 2009-01-01 X 0.65473572
1144 2009-01-01 Y 2.4520609
1145 2009-01-02 X -0.89248112
1146 2009-01-02 Y 0.2154713
1147 2009-01-03 X 0.98012375
1148 2009-01-03 Y 1.3179287
1149
1150 mlr --pprint reshape -s item,value long.txt
1151 time X Y
1152 2009-01-01 0.65473572 2.4520609
1153 2009-01-02 -0.89248112 0.2154713
1154 2009-01-03 0.98012375 1.3179287
1155 See also mlr nest.
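The two directions can be modeled in Python as below, which also shows why wide-to-long streams while long-to-wide must hold all input. This is an illustrative sketch with hypothetical function names; it assumes every long-format record carries both the key field and the value field.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Streaming: each input record yields one record per input field."""
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        pairs = [(k, v) for k, v in rec.items() if k in input_fields]
        if not pairs:
            yield rec  # nothing to pivot in this record
        for key, value in pairs:
            yield {**others, key_name: key, value_name: value}


def long_to_wide(records, key_name, value_name):
    """Non-streaming: rows are completed only once all input is read."""
    rows = {}
    for rec in records:
        others = tuple(
            (k, v) for k, v in rec.items() if k not in (key_name, value_name)
        )
        rows.setdefault(others, dict(others))[rec[key_name]] = rec[value_name]
    return list(rows.values())


wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
]
long_records = list(wide_to_long(wide, ["X", "Y"], "item", "value"))
round_trip = long_to_wide(long_records, "item", "value")
```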
1156
1157 sample
1158 Usage: mlr sample [options]
1159 Reservoir sampling (subsampling without replacement), optionally by category.
1160 -k {count} Required: number of records to output, total, or by group if using -g.
1161 -g {a,b,c} Optional: group-by-field names for samples.
1162 See also mlr bootstrap and mlr shuffle.
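Reservoir sampling keeps a fixed-size buffer and replaces entries with decreasing probability as more records arrive. The sketch below is the textbook "Algorithm R" in Python; whether Miller uses exactly this variant is an assumption, and the function name and rng parameter are hypothetical.

```python
import random


def reservoir_sample(records, k, rng=random):
    """Return up to k records chosen uniformly without replacement."""
    reservoir = []
    for count, rec in enumerate(records, start=1):
        if count <= k:
            reservoir.append(rec)        # fill the reservoir first
        else:
            slot = rng.randrange(count)  # uniform in [0, count)
            if slot < k:
                reservoir[slot] = rec    # replace with probability k/count
    return reservoir
```

Only k records are held at any time, so memory use does not grow with the input, although no output can be produced until the input ends.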
1163
1164 sec2gmt
1165 Usage: mlr sec2gmt [options] {comma-separated list of field names}
1166 Replaces a numeric field representing seconds since the epoch with the
1167 corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1168 more than a keystroke-saver for the sec2gmt function:
1169 mlr sec2gmt time1,time2
1170 is the same as
1171 mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1172 Options:
1173 -1 through -9: format the seconds using 1..9 decimal places, respectively.
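The basic conversion can be sketched in Python for integer seconds. This is a hedged model rather than Miller's code: the Python function below is a stand-in, and floating-point input and the -1 through -9 options are not modeled.

```python
from datetime import datetime, timezone


def sec2gmt(value):
    """Format integer epoch seconds as an ISO-8601 GMT timestamp."""
    if isinstance(value, bool) or not isinstance(value, int):
        return value  # non-numbers are left as-is
    stamp = datetime.fromtimestamp(value, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


epoch = sec2gmt(0)
untouched = sec2gmt("abc")
```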
1174
1175 sec2gmtdate
1176 Usage: mlr sec2gmtdate {comma-separated list of field names}
1177 Replaces a numeric field representing seconds since the epoch with the
1178 corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1179 This is nothing more than a keystroke-saver for the sec2gmtdate function:
1180 mlr sec2gmtdate time1,time2
1181 is the same as
1182 mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1183
1184 seqgen
1185 Usage: mlr seqgen [options]
1186 Produces a sequence of counters. Discards the input record stream. Produces
1187 output as specified by the following options:
1188 -f {name} Field name for counters; default "i".
1189 --start {number} Inclusive start value; default "1".
1190 --stop {number} Inclusive stop value; default "100".
1191 --step {number} Step value; default "1".
1192 Start, stop, and/or step may be floating-point. Output is integer if start,
1193 stop, and step are all integers. Step may be negative. It may not be zero
1194 unless start == stop.
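The stopping rules can be sketched as a Python generator. The function below is a hypothetical model: emitting a single value when step is zero and start equals stop is an assumption, as is raising an error for any other zero step.

```python
def seqgen(start=1, stop=100, step=1):
    """Yield counters from start to stop inclusive, in increments of step."""
    if step == 0:
        if start == stop:
            yield start  # assumption: a single value is produced
            return
        raise ValueError("step must not be zero unless start == stop")
    counter = start
    # The comparison direction depends on the sign of the step.
    while (counter <= stop) if step > 0 else (counter >= stop):
        yield counter
        counter += step
```

With all-integer arguments the counters stay integers; a floating-point start or step makes the subsequent counters floating-point.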
1195
1196 shuffle
1197 Usage: mlr shuffle {no options}
1198 Outputs records randomly permuted. No output records are produced until
1199 all input records are read.
1200 See also mlr bootstrap and mlr sample.
1201
1202 skip-trivial-records
1203 Usage: mlr skip-trivial-records [options]
1204 Passes through all records except:
1205 * those with zero fields;
1206 * those for which all fields have empty value.
1207
1208 sort
1209 Usage: mlr sort {flags}
1210 Flags:
1211 -f {comma-separated field names} Lexical ascending
1212 -n {comma-separated field names} Numerical ascending; nulls sort last
1213 -nf {comma-separated field names} Same as -n
1214 -r {comma-separated field names} Lexical descending
1215 -nr {comma-separated field names} Numerical descending; nulls sort first
1216 Sorts records primarily by the first specified field, secondarily by the second
1217 field, and so on. (Any records not having all specified sort keys will appear
1218 at the end of the output, in the order they were encountered, regardless of the
1219 specified sort order.) The sort is stable: records that compare equal will sort
1220 in the order they were encountered in the input record stream.
1221
1222 Example:
1223 mlr sort -f a,b -nr x,y,z
1224 which is the same as:
1225 mlr sort -f a -f b -nr x -nr y -nr z
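The multi-key ordering and the stability guarantee can be modeled in Python by applying stable sorts from the last key to the first. This is a sketch with a hypothetical function name; null and non-numeric handling for the numerical flags is not modeled.

```python
def mlr_sort(records, keys):
    """keys is a list of (flag, field), flag being one of f, r, n, nf, nr."""
    fields = [field for _, field in keys]
    complete = [r for r in records if all(f in r for f in fields)]
    # Records missing any sort key go to the end, in encounter order.
    incomplete = [r for r in records if not all(f in r for f in fields)]
    # Stable sorts applied from least to most significant key.
    for flag, field in reversed(keys):
        numeric = flag in ("n", "nf", "nr")
        complete.sort(
            key=lambda r: float(r[field]) if numeric else str(r[field]),
            reverse=flag in ("r", "nr"),
        )
    return complete + incomplete


rows = [{"a": "x", "x": 1}, {"a": "y", "x": 5}, {"a": "x", "x": 3}, {"b": 0}]
ordered = mlr_sort(rows, [("f", "a"), ("nr", "x")])
```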
1226
1227 stats1
1228 Usage: mlr stats1 [options]
1229 Computes univariate statistics for one or more given fields, accumulated across
1230 the input record stream.
1231 Options:
1232 -a {sum,count,...} Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1233 one or more of:
1234 count Count instances of fields
1235 mode Find most-frequently-occurring values for fields; first-found wins tie
1236 antimode Find least-frequently-occurring values for fields; first-found wins tie
1237 sum Compute sums of specified fields
1238 mean Compute averages (sample means) of specified fields
1239 stddev Compute sample standard deviation of specified fields
1240 var Compute sample variance of specified fields
1241 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
1242 skewness Compute sample skewness of specified fields
1243 kurtosis Compute sample kurtosis of specified fields
1244 min Compute minimum values of specified fields
1245 max Compute maximum values of specified fields
1246 -f {a,b,c} Value-field names on which to compute statistics
1247 --fr {regex} Regex for value-field names on which to compute statistics
1248 (compute statistics on values in all field names matching regex)
1249 --fx {regex} Inverted regex for value-field names on which to compute statistics
1250 (compute statistics on values in all field names not matching regex)
1251 -g {d,e,f} Optional group-by-field names
1252 --gr {regex} Regex for optional group-by-field names
1253 (group by values in field names matching regex)
1254 --gx {regex} Inverted regex for optional group-by-field names
1255 (group by values in field names not matching regex)
1256 --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1257 -i Use interpolated percentiles, like R's type=7; default like type=1.
1258 Not meaningful for string-valued fields.
1259 -s Print iterative stats. Useful in tail -f contexts (in which
1260 case please avoid pprint-format output since end of input
1261 stream will never be seen).
1262 -F Computes integerable things (e.g. count) in floating point.
1263 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1264 Example: mlr stats1 -a count,mode -f size
1265 Example: mlr stats1 -a count,mode -f size -g shape
1266 Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1267 This computes count and mode statistics on all field names beginning
1268 with a through h, grouped by all field names starting with k.
1269 Notes:
1270 * p50 and median are synonymous.
1271 * min and max output the same results as p0 and p100, respectively, but use
1272 less memory.
1273 * String-valued data make sense unless arithmetic on them is required,
1274 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1275 numbers are less than strings.
1276 * count and mode allow text input; the rest require numeric input.
1277 In particular, 1 and 1.0 are distinct text for count and mode.
1278 * When there are mode ties, the first-encountered datum wins.
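A few of the accumulators can be modeled in Python to make the "sample" qualifiers concrete. This is a sketch, not Miller's implementation: the function name is hypothetical, the variance uses the n-1 denominator, and computing meaneb as sqrt(var/n) is an assumption about the formula.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def stats1(records, field, group_by=()):
    """Compute count, sum, mean, var, stddev, meaneb per group."""
    groups = defaultdict(list)
    for rec in records:
        if field in rec and all(g in rec for g in group_by):
            groups[tuple(rec[g] for g in group_by)].append(rec[field])
    out = []
    for key, values in groups.items():
        n = len(values)
        row = dict(zip(group_by, key))
        row[f"{field}_count"] = n
        row[f"{field}_sum"] = sum(values)
        row[f"{field}_mean"] = mean(values)
        if n >= 2:  # sample variance needs at least two values
            var = variance(values)  # n-1 denominator
            row[f"{field}_var"] = var
            row[f"{field}_stddev"] = sqrt(var)
            row[f"{field}_meaneb"] = sqrt(var / n)  # assumed formula
        out.append(row)
    return out
```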
1279
1280 stats2
1281 Usage: mlr stats2 [options]
1282 Computes bivariate statistics for one or more given field-name pairs,
1283 accumulated across the input record stream.
1284 -a {linreg-ols,corr,...} Names of accumulators: one or more of:
1285 linreg-pca Linear regression using principal component analysis
1286 linreg-ols Linear regression using ordinary least squares
1287 r2 Quality metric for linreg-ols (linreg-pca emits its own)
1288 logireg Logistic regression
1289 corr Sample correlation
1290 cov Sample covariance
1291 covx Sample-covariance matrix
1292 -f {a,b,c,d} Value-field name-pairs on which to compute statistics.
1293 There must be an even number of names.
1294 -g {e,f,g} Optional group-by-field names.
1295 -v Print additional output for linreg-pca.
1296 -s Print iterative stats. Useful in tail -f contexts (in which
1297 case please avoid pprint-format output since end of input
1298 stream will never be seen).
1299 --fit Rather than printing regression parameters, applies them to
1300 the input data to compute new fit fields. All input records are
1301 held in memory until end of input stream. Has effect only for
1302 linreg-ols, linreg-pca, and logireg.
1303 Only one of -s or --fit may be used.
1304 Example: mlr stats2 -a linreg-pca -f x,y
1305 Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1306 Example: mlr stats2 -a corr -f x,y
1307
1308 step
1309 Usage: mlr step [options]
1310 Computes values dependent on the previous record, optionally grouped
1311 by category.
1312
1313 Options:
1314 -a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
1315 delta Compute differences in field(s) between successive records
1316 shift Include value(s) in field(s) from previous record, if any
1317 from-first Compute differences in field(s) from first record
1318 ratio Compute ratios in field(s) between successive records
1319 rsum Compute running sums of field(s) between successive records
1320 counter Count instances of field(s) between successive records
1321 ewma Exponentially weighted moving average over successive records
1322 -f {a,b,c} Value-field names on which to compute statistics
1323 -g {d,e,f} Optional group-by-field names
1324 -F Computes integerable things (e.g. counter) in floating point.
1325 -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1326 smoothing), just under 1 is light smoothing, just over 0 is
1327 heavy smoothing. Multiple weights may be specified, e.g.
1328 "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1329 is "-d 0.5".
1330 -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1331 the -d values. If supplied, the number of -o values must be the same
1332 as the number of -d values.
1333
1334 Examples:
1335 mlr step -a rsum -f request_size
1336 mlr step -a delta -f request_size -g hostname
1337 mlr step -a ewma -d 0.1,0.9 -f x,y
1338 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1339 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
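The delta, rsum, and ewma steppers can be modeled in Python for a single field named x. This is a sketch with a hypothetical function name; it assumes the first record gets a delta of 0 and an EWMA equal to its own value, and that later EWMA values are alpha times the current sample plus (1 - alpha) times the previous EWMA.

```python
def step(values, alpha=0.5):
    """Return one output record per input value of field x."""
    out = []
    prev = None
    rsum = 0
    ewma = None
    for x in values:
        delta = 0 if prev is None else x - prev
        rsum += x
        # alpha = 1 gives the current sample all the weight (no smoothing).
        ewma = x if ewma is None else alpha * x + (1 - alpha) * ewma
        out.append(
            {"x": x, "x_delta": delta, "x_rsum": rsum, f"x_ewma_{alpha}": ewma}
        )
        prev = x
    return out
```

With -g, the same state (previous value, running sum, previous EWMA) would be kept separately per group.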
1340
1341 Please see http://johnkerl.org/miller/doc/reference-verbs.html#step or
1342 https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1343 for more information on EWMA.
1344
1345 tac
1346 Usage: mlr tac
1347 Prints records in reverse order from the order in which they were encountered.
1348
1349 tail
1350 Usage: mlr tail [options]
1351 -n {count} Tail count to print; default 10
1352 -g {a,b,c} Optional group-by-field names for tail counts
1353 Passes through the last n records, optionally by category.
1354
1355 tee
1356 Usage: mlr tee [options] {filename}
1357 Passes through input records (like mlr cat) but also writes to specified output
1358 file, using output-format flags from the command line (e.g. --ocsv). See also
1359 the "tee" keyword within mlr put, which allows data-dependent filenames.
1360 Options:
1361 -a: append to existing file, if any, rather than overwriting.
1362 --no-fflush: don't call fflush() after every record.
1363 Any of the output-format command-line flags (see mlr -h). Example: using
1364 mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1365 the input is CSV, the output is pretty-print tabular, but the tee-file output
1366 is written in JSON format.
1367
1368 top
1369 Usage: mlr top [options]
1370 -f {a,b,c} Value-field names for top counts.
1371 -g {d,e,f} Optional group-by-field names for top counts.
1372 -n {count} How many records to print per category; default 1.
1373 -a Print all fields for top-value records; default is
1374 to print only value and group-by fields. Requires a single
1375 value-field name only.
1376 --min Print top smallest values; default is top largest values.
1377 -F Keep top values as floats even if they look like integers.
1378 -o {name} Field name for output indices. Default "top_idx".
1379 Prints the n records with smallest/largest values at specified fields,
1380 optionally by category.
1381
1382 uniq
1383 Usage: mlr uniq [options]
1384 Prints distinct values for specified field names. With -c, same as
1385 count-distinct. For uniq, -f is a synonym for -g.
1386
1387 Options:
1388 -g {d,e,f} Group-by-field names for uniq counts.
1389 -c Show repeat counts in addition to unique values.
1390 -n Show only the number of distinct values.
1391 -o {name} Field name for output count. Default "count".
1392 -a Output each unique record only once. Incompatible with -g.
1393 With -c, produces unique records, with repeat counts for each.
1394 With -n, produces only one record which is the unique-record count.
1395 With neither -c nor -n, produces unique records.
1396
1397 unsparsify
1398 Usage: mlr unsparsify [options]
1399 Prints records with the union of field names over all input records.
1400 For field names absent in a given record but present in others, fills in
1401 a value. This verb retains all input before producing any output.
1402
1403 Options:
1404 --fill-with {filler string} What to fill absent fields with. Defaults to
1405 the empty string.
1406
1407 Example: if the input is two records, one being 'a=1,b=2' and the other
1408 being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1409 'a=,b=3,c=4'.
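The two-pass nature of this verb can be sketched in Python as follows. The function name unsparsify and the fill_with parameter mirror the verb and its --fill-with option, but the code is an illustrative model rather than Miller's implementation; the output field order (order of first appearance) is an assumption consistent with the example above.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, filling absent fields."""
    # First pass: collect field names in order of first appearance.
    names = []
    seen = set()
    for rec in records:
        for key in rec:
            if key not in seen:
                seen.add(key)
                names.append(key)
    # Second pass: rebuild each record over the full set of names.
    return [{key: rec.get(key, fill_with) for key in names} for rec in records]


if __name__ == "__main__":
    for rec in unsparsify([{"a": "1", "b": "2"}, {"b": "3", "c": "4"}]):
        print(",".join(f"{k}={v}" for k, v in rec.items()))
```

The first pass is why all input must be retained before any output is produced.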
1410
1412 +
1413 (class=arithmetic #args=2): Addition.
1414
1415 + (class=arithmetic #args=1): Unary plus.
1416
1417 -
1418 (class=arithmetic #args=2): Subtraction.
1419
1420 - (class=arithmetic #args=1): Unary minus.
1421
1422 *
1423 (class=arithmetic #args=2): Multiplication.
1424
1425 /
1426 (class=arithmetic #args=2): Division.
1427
1428 //
1429 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
1430
1431 .+
1432 (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1433
1434 .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1435
1436 .-
1437 (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1438
1439 .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1440
1441 .*
1442 (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1443
1444 ./
1445 (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1446
1447 .//
1448 (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
1449
1450 %
1451 (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
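Since // and % are described as pythonic, Python itself can show the difference from C-style truncation. The helper names below are hypothetical and only wrap Python's own operators; they illustrate the rounding convention and say nothing about Miller's overflow handling.

```python
import math


def floor_div(a, b):
    """Quotient rounded toward negative infinity (Python's //)."""
    return a // b


def floor_mod(a, m):
    """Remainder with the sign of the modulus (Python's %)."""
    return a % m


def trunc_div(a, b):
    """Quotient rounded toward zero, as C integer division does."""
    return math.trunc(a / b)


if __name__ == "__main__":
    for a, b in [(7, 2), (-7, 2), (-7, 5)]:
        print(a, b, floor_div(a, b), trunc_div(a, b), floor_mod(a, b), math.fmod(a, b))
```

For a positive modulus the floored remainder is never negative, whereas the truncating remainder (math.fmod here) takes the sign of the dividend.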
1452
1453 **
1454 (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1455 operator.
1456
1457 |
1458 (class=arithmetic #args=2): Bitwise OR.
1459
1460 ^
1461 (class=arithmetic #args=2): Bitwise XOR.
1462
1463 &
1464 (class=arithmetic #args=2): Bitwise AND.
1465
1466 ~
1467 (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1468 regex-match operator: try '$y = ~$x'.
1469
1470 <<
1471 (class=arithmetic #args=2): Bitwise left-shift.
1472
1473 >>
1474 (class=arithmetic #args=2): Bitwise right-shift.
1475
1476 bitcount
1477 (class=arithmetic #args=1): Count of 1-bits.
1478
1479 ==
1480 (class=boolean #args=2): String/numeric equality. Mixing number and string
1481 results in string compare.
1482
1483 !=
1484 (class=boolean #args=2): String/numeric inequality. Mixing number and string
1485 results in string compare.
1486
1487 =~
1488 (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1489 side), e.g. '$name =~ "^a.*b$"'.
1490
1491 !=~
1492 (class=boolean #args=2): String (left-hand side) does not match regex
1493 (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1494
1495 >
1496 (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1497 results in string compare.
1498
1499 >=
1500 (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1501 and string results in string compare.
1502
1503 <
1504 (class=boolean #args=2): String/numeric less-than. Mixing number and string
1505 results in string compare.
1506
1507 <=
1508 (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1509 and string results in string compare.
1510
1511 &&
1512 (class=boolean #args=2): Logical AND.
1513
1514 ||
1515 (class=boolean #args=2): Logical OR.
1516
1517 ^^
1518 (class=boolean #args=2): Logical XOR.
1519
1520 !
1521 (class=boolean #args=1): Logical negation.
1522
1523 ? :
1524 (class=boolean #args=3): Ternary operator.
1525
1526 .
1527 (class=string #args=2): String concatenation.
1528
1529 gsub
1530 (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1531 (replace all).
1532
1533 regextract
1534 (class=string #args=2): Example:
1535 '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1536
1537 regextract_or_else
1538 (class=string #args=3): Example:
1539 '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1540
1541 strlen
1542 (class=string #args=1): String length.
1543
1544 sub
1545 (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1546 (replace once).
1547
1548 ssub
1549 (class=string #args=3): Like sub but does no regexing. No characters are special.
1550
1551 substr
1552 (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1553 inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
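The indexing convention (0-up, inclusive end, negative aliases) can be modeled in a few lines of Python. This sketch does not model what happens for out-of-range indices, which the text above does not specify.

```python
def substr(s, m, n):
    """Substring of s from 0-up position m to n inclusive.

    Negative indices -len .. -1 alias to 0 .. len-1.
    Out-of-range indices are not modeled in this sketch.
    """
    length = len(s)
    if m < 0:
        m += length
    if n < 0:
        n += length
    # Python slices exclude the end position, so add 1 to include n.
    return s[m:n + 1]


if __name__ == "__main__":
    print(substr("hello", 1, 3))
    print(substr("hello", -4, -2))
```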
1554
1555 tolower
1556 (class=string #args=1): Convert string to lowercase.
1557
1558 toupper
1559 (class=string #args=1): Convert string to uppercase.
1560
1561 capitalize
1562 (class=string #args=1): Convert string's first character to uppercase.
1563
1564 lstrip
1565 (class=string #args=1): Strip leading whitespace from string.
1566
1567 rstrip
1568 (class=string #args=1): Strip trailing whitespace from string.
1569
1570 strip
1571 (class=string #args=1): Strip leading and trailing whitespace from string.
1572
1573 collapse_whitespace
1574 (class=string #args=1): Strip repeated whitespace from string.
1575
1576 clean_whitespace
1577 (class=string #args=1): Same as collapse_whitespace and strip.
1578
1579 system
1580 (class=string #args=1): Run command string, yielding its stdout minus final newline.
1581
1582 abs
1583 (class=math #args=1): Absolute value.
1584
1585 acos
1586 (class=math #args=1): Inverse trigonometric cosine.
1587
1588 acosh
1589 (class=math #args=1): Inverse hyperbolic cosine.
1590
1591 asin
1592 (class=math #args=1): Inverse trigonometric sine.
1593
1594 asinh
1595 (class=math #args=1): Inverse hyperbolic sine.
1596
1597 atan
1598 (class=math #args=1): One-argument arctangent.
1599
1600 atan2
1601 (class=math #args=2): Two-argument arctangent.
1602
1603 atanh
1604 (class=math #args=1): Inverse hyperbolic tangent.
1605
1606 cbrt
1607 (class=math #args=1): Cube root.
1608
1609 ceil
1610 (class=math #args=1): Ceiling: nearest integer at or above.
1611
1612 cos
1613 (class=math #args=1): Trigonometric cosine.
1614
1615 cosh
1616 (class=math #args=1): Hyperbolic cosine.
1617
1618 erf
1619 (class=math #args=1): Error function.
1620
1621 erfc
1622 (class=math #args=1): Complementary error function.
1623
1624 exp
1625 (class=math #args=1): Exponential function e**x.
1626
1627 expm1
1628 (class=math #args=1): e**x - 1.
1629
1630 floor
1631 (class=math #args=1): Floor: nearest integer at or below.
1632
1633 invqnorm
1634 (class=math #args=1): Inverse of normal cumulative distribution
1635 function. Note that invqnorm(urand()) is normally distributed.
1636
1637 log
1638 (class=math #args=1): Natural (base-e) logarithm.
1639
1640 log10
1641 (class=math #args=1): Base-10 logarithm.
1642
1643 log1p
1644 (class=math #args=1): log(1+x).
1645
1646 logifit
1647 (class=math #args=3): Given m and b from logistic regression, compute
1648 fit: $yhat=logifit($x,$m,$b).
1649
1650 madd
1651 (class=math #args=3): a + b mod m (integers)
1652
1653 max
1654 (class=math variadic): Max of n numbers; null loses.
1655
1656 mexp
1657 (class=math #args=3): a ** b mod m (integers)
1658
1659 min
1660 (class=math variadic): Min of n numbers; null loses.
1661
1662 mmul
1663 (class=math #args=3): a * b mod m (integers)
1664
1665 msub
1666 (class=math #args=3): a - b mod m (integers)
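The four modular functions madd, mexp, mmul, and msub can be modeled in Python as below. The sketch assumes a positive modulus and results reduced to the range 0 .. m-1; that range is an assumption, since the entries above give only the formulas.

```python
def madd(a, b, m):
    """(a + b) mod m for integers."""
    return (a + b) % m


def msub(a, b, m):
    """(a - b) mod m for integers."""
    return (a - b) % m


def mmul(a, b, m):
    """(a * b) mod m for integers."""
    return (a * b) % m


def mexp(a, b, m):
    """(a ** b) mod m for integers, via Python's three-argument pow."""
    return pow(a, b, m)


if __name__ == "__main__":
    print(madd(5, 3, 7), msub(5, 6, 7), mmul(3, 4, 7), mexp(2, 10, 7))
```

Three-argument pow reduces at each step, so large exponents do not require computing a ** b in full.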
1667
1668 pow
1669 (class=math #args=2): Exponentiation; same as **.
1670
1671 qnorm
1672 (class=math #args=1): Normal cumulative distribution function.
1673
1674 round
1675 (class=math #args=1): Round to nearest integer.
1676
1677 roundm
1678 (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1679 the same as round($x/$m)*$m
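The identity above translates directly into Python. One caveat, stated as an assumption: Python's built-in round resolves exact ties to the nearest even integer, which may differ from Miller's behavior on ties, so the examples avoid tie cases.

```python
def roundm(x, m):
    """Round x to the nearest multiple of m, as round(x / m) * m."""
    # Python's round() uses ties-to-even; tie handling may differ in Miller.
    return round(x / m) * m


if __name__ == "__main__":
    print(roundm(7.3, 2))
    print(roundm(17, 5))
```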
1680
1681 sgn
1682 (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1683 negative input.
1684
1685 sin
1686 (class=math #args=1): Trigonometric sine.
1687
1688 sinh
1689 (class=math #args=1): Hyperbolic sine.
1690
1691 sqrt
1692 (class=math #args=1): Square root.
1693
1694 tan
1695 (class=math #args=1): Trigonometric tangent.
1696
1697 tanh
1698 (class=math #args=1): Hyperbolic tangent.
1699
1700 urand
1701 (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
1702 Int-valued example: '$n=floor(20+urand()*11)'.
1703
1704 urandrange
1705 (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
1706
1707 urand32
1708 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1709 inclusive.
1710
1711 urandint
1712 (class=math #args=2): Integer uniformly distributed between inclusive
1713 integer endpoints.
1714
1715 dhms2fsec
1716 (class=time #args=1): Recovers floating-point seconds as in
1717 dhms2fsec("5d18h53m20.250000s") = 500000.250000
1718
1719 dhms2sec
1720 (class=time #args=1): Recovers integer seconds as in
1721 dhms2sec("5d18h53m20s") = 500000
1722
1723 fsec2dhms
1724 (class=time #args=1): Formats floating-point seconds as in
1725 fsec2dhms(500000.25) = "5d18h53m20.250000s"
1726
1727 fsec2hms
1728 (class=time #args=1): Formats floating-point seconds as in
1729 fsec2hms(5000.25) = "01:23:20.250000"
1730
1731 gmt2sec
1732 (class=time #args=1): Parses GMT timestamp as integer seconds since
1733 the epoch.
1734
1735 localtime2sec
1736 (class=time #args=1): Parses local timestamp as integer seconds since
1737 the epoch. Consults $TZ environment variable.
1738
1739 hms2fsec
1740 (class=time #args=1): Recovers floating-point seconds as in
1741 hms2fsec("01:23:20.250000") = 5000.250000
1742
1743 hms2sec
1744 (class=time #args=1): Recovers integer seconds as in
1745 hms2sec("01:23:20") = 5000
1746
1747 sec2dhms
1748 (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1749 = "5d18h53m20s"
1750
1751 sec2gmt
1752 (class=time #args=1): Formats seconds since epoch (integer part)
1753 as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1754 Leaves non-numbers as-is.
1755
1756 sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1757 decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1758 Leaves non-numbers as-is.
1759
1760 sec2gmtdate
1761 (class=time #args=1): Formats seconds since epoch (integer part)
1762 as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1763 Leaves non-numbers as-is.
1764
1765 sec2localtime
1766 (class=time #args=1): Formats seconds since epoch (integer part)
1767 as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1768 Consults $TZ environment variable. Leaves non-numbers as-is.
1769
1770 sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1771 decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1772 Consults $TZ environment variable. Leaves non-numbers as-is.
1773
1774 sec2localdate
1775 (class=time #args=1): Formats seconds since epoch (integer part)
1776 as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1777 Consults $TZ environment variable. Leaves non-numbers as-is.
1778
1779 sec2hms
1780 (class=time #args=1): Formats integer seconds as in
1781 sec2hms(5000) = "01:23:20"
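The arithmetic behind the integer-seconds conversions can be sketched in Python. The function names match the Miller functions they model, but the code is an illustration only: it handles non-negative integers, expects all three hh:mm:ss fields, and does not attempt the floating-point variants.

```python
import re

# Seconds per unit for the d/h/m/s suffixes.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def sec2hms(sec):
    """Format non-negative integer seconds as hh:mm:ss."""
    hours, rem = divmod(sec, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def hms2sec(text):
    """Recover integer seconds from hh:mm:ss."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def dhms2sec(text):
    """Recover integer seconds from a string such as 5d18h53m20s."""
    return sum(int(num) * _UNIT_SECONDS[unit]
               for num, unit in re.findall(r"(\d+)([dhms])", text))


if __name__ == "__main__":
    print(sec2hms(5000), hms2sec("01:23:20"), dhms2sec("5d18h53m20s"))
```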
1782
1783 strftime
1784 (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1785 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1786 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1787 Format strings are as in the C library (please see "man strftime" on your system),
1788 with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1789 with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1790 See also strftime_local.
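The "%1S" through "%9S" extension can be modeled in Python by expanding those directives before handing the rest of the format to the standard library. The name strftime_gmt is hypothetical, and the sketch truncates to the microsecond resolution of Python's datetime, padding with zeros beyond six places; Miller's own rounding is not modeled.

```python
import re
from datetime import datetime, timezone


def strftime_gmt(seconds, fmt):
    """Format seconds since the epoch in GMT, supporting %1S through %9S."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)

    def expand(match):
        places = int(match.group(1))
        # Six digits of microseconds, zero-padded out to nine places.
        digits = f"{moment.microsecond:06d}000"[:places]
        return f"{moment.second:02d}.{digits}"

    # Replace %nS with literal text, then let strftime handle the rest.
    fmt = re.sub(r"%([1-9])S", expand, fmt)
    return moment.strftime(fmt)


if __name__ == "__main__":
    print(strftime_gmt(1440768801.7, "%Y-%m-%dT%H:%M:%SZ"))
    print(strftime_gmt(1440768801.7, "%Y-%m-%dT%H:%M:%3SZ"))
```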
1791
1792 strftime_local
1793 (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1794
1795 strptime
1796 (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1797 e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1798 and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1799 See also strptime_local.
1800
1801 strptime_local
1802 (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1803
1804 systime
1805 (class=time #args=0): Floating-point seconds since the epoch,
1806 e.g. 1440768801.748936.
1807
1808 is_absent
1809 (class=typing #args=1): False if field is present in input, true otherwise.
1810
1811 is_bool
1812 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1813
1814 is_boolean
1815 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1816
1817 is_empty
1818 (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1819
1820 is_empty_map
1821 (class=typing #args=1): True if argument is a map which is empty.
1822
1823 is_float
1824 (class=typing #args=1): True if field is present with value inferred to be float.
1825
1826 is_int
1827 (class=typing #args=1): True if field is present with value inferred to be int.
1828
1829 is_map
1830 (class=typing #args=1): True if argument is a map.
1831
1832 is_nonempty_map
1833 (class=typing #args=1): True if argument is a map which is non-empty.
1834
1835 is_not_empty
1836 (class=typing #args=1): False if field is present in input with empty value, true otherwise.
1837
1838 is_not_map
1839 (class=typing #args=1): True if argument is not a map.
1840
1841 is_not_null
1842 (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1843
1844 is_null
1845 (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1846
1847 is_numeric
1848 (class=typing #args=1): True if field is present with value inferred to be int or float.
1849
1850 is_present
1851 (class=typing #args=1): True if field is present in input, false otherwise.
1852
1853 is_string
1854 (class=typing #args=1): True if field is present with string (including empty-string) value.
1855
1856 asserting_absent
1857 (class=typing #args=1): Returns argument if it is absent in the input data, else
1858 throws an error.
1859
1860 asserting_bool
1861 (class=typing #args=1): Returns argument if it is present with boolean value, else
1862 throws an error.
1863
1864 asserting_boolean
1865 (class=typing #args=1): Returns argument if it is present with boolean value, else
1866 throws an error.
1867
1868 asserting_empty
1869 (class=typing #args=1): Returns argument if it is present in input with empty value,
1870 else throws an error.
1871
1872 asserting_empty_map
1873 (class=typing #args=1): Returns argument if it is a map with empty value, else
1874 throws an error.
1875
1876 asserting_float
1877 (class=typing #args=1): Returns argument if it is present with float value, else
1878 throws an error.
1879
1880 asserting_int
1881 (class=typing #args=1): Returns argument if it is present with int value, else
1882 throws an error.
1883
1884 asserting_map
1885 (class=typing #args=1): Returns argument if it is a map, else throws an error.
1886
1887 asserting_nonempty_map
1888 (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1889 an error.
1890
1891 asserting_not_empty
1892 (class=typing #args=1): Returns argument if it is present in input with non-empty
1893 value, else throws an error.
1894
1895 asserting_not_map
1896 (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1897
1898 asserting_not_null
1899 (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1900 else throws an error.
1901
1902 asserting_null
1903 (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1904 an error.
1905
1906 asserting_numeric
1907 (class=typing #args=1): Returns argument if it is present with int or float value,
1908 else throws an error.
1909
1910 asserting_present
1911 (class=typing #args=1): Returns argument if it is present in input, else throws
1912 an error.
1913
1914 asserting_string
1915 (class=typing #args=1): Returns argument if it is present with string (including
1916 empty-string) value, else throws an error.
1917
1918 boolean
1919 (class=conversion #args=1): Convert int/float/bool/string to boolean.
1920
1921 float
1922 (class=conversion #args=1): Convert int/float/bool/string to float.
1923
1924 fmtnum
1925 (class=conversion #args=2): Convert int/float/bool to string using
1926 printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1927 are all long long or double. If you use formats like %d or %f, behavior is undefined.
1928
1929 hexfmt
1930 (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1931
1932 int
1933 (class=conversion #args=1): Convert int/float/bool/string to int.
1934
1935 string
1936 (class=conversion #args=1): Convert int/float/bool/string to string.
1937
1938 typeof
1939 (class=conversion #args=1): Convert argument to type of argument (e.g.
1940 MT_STRING). For debug.
1941
1942 depth
1943 (class=maps #args=1): Returns maximum depth of hashmap. Scalars have depth 0.
1944
1945 haskey
1946 (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1947 'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1948
1949 joink
1950 (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1951
1952 joinkv
1953 (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1954
1955 joinv
1956 (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
1957
1958 leafcount
1959 (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1960 same as length.
1961
1962 length
1963 (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
1964
1965 mapdiff
1966 (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1967 With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1968
1969 mapexcept
1970 (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1971 E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1972
1973 mapselect
1974 (class=maps variadic): Returns a map with only keys from remaining arguments set.
1975 E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1976
1977 mapsum
1978 (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1979 key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
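The four map-combining functions above behave much like simple dictionary operations. The Python sketch below models them with ordinary dicts and reproduces the examples given in their entries; it is an illustration under that analogy, not Miller's implementation, and does not model errors for non-map arguments.

```python
def mapdiff(*maps):
    """Copy of the first map with keys found in any later map removed."""
    if not maps:
        return {}
    result = dict(maps[0])
    for other in maps[1:]:
        for key in other:
            result.pop(key, None)
    return result


def mapexcept(m, *keys):
    """Copy of m without the listed keys; keys not present are ignored."""
    return {k: v for k, v in m.items() if k not in keys}


def mapselect(m, *keys):
    """Copy of m with only the listed keys; keys not present are ignored."""
    return {k: v for k, v in m.items() if k in keys}


def mapsum(*maps):
    """Union of all maps; on key collisions the rightmost value wins."""
    result = {}
    for m in maps:
        result.update(m)
    return result


if __name__ == "__main__":
    print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))
    print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))
    print(mapsum({1: 2, 3: 4}, {1: 5}))
```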
1980
1981 splitkv
1982 (class=maps #args=3): Splits string by separators into map with type inference.
1983 E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1984
1985 splitkvx
1986 (class=maps #args=3): Splits string by separators into map without type inference (keys and
1987 values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
1988 '{"a" : "1", "b" : "2", "c" : "3"}'.
1989
1990 splitnv
1991 (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
1992 E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
1993
1994 splitnvx
1995 (class=maps #args=2): Splits string by separator into integer-indexed map without type
1996 inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
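The difference between the inferring and non-inferring split functions can be modeled in Python. The helper infer is a crude, hypothetical stand-in for Miller's type inference (it accepts whatever Python's int and float accept, which is not identical to Miller's rules); the four split functions mirror the entries above.

```python
def infer(text):
    """Crude stand-in for type inference: int, then float, else string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def splitkvx(s, pair_sep, field_sep):
    """Split into a map; keys and values stay strings."""
    result = {}
    for piece in s.split(field_sep):
        key, _, value = piece.partition(pair_sep)
        result[key] = value
    return result


def splitkv(s, pair_sep, field_sep):
    """Split into a map, inferring value types."""
    return {k: infer(v) for k, v in splitkvx(s, pair_sep, field_sep).items()}


def splitnvx(s, sep):
    """Split into a map keyed 1, 2, 3, ...; values stay strings."""
    return {i: piece for i, piece in enumerate(s.split(sep), start=1)}


def splitnv(s, sep):
    """Split into a map keyed 1, 2, 3, ..., inferring value types."""
    return {i: infer(v) for i, v in splitnvx(s, sep).items()}


if __name__ == "__main__":
    print(splitkv("a=1,b=2,c=3", "=", ","))
    print(splitnvx("4,5,6", ","))
```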
1997
1999 all
2000 all: used in "emit", "emitp", and "unset" as a synonym for @*
2001
2002 begin
2003 begin: defines a block of statements to be executed before input records
2004 are ingested. The body statements must be wrapped in curly braces.
2005 Example: 'begin { @count = 0 }'
2006
2007 bool
2008 bool: declares a boolean local variable in the current curly-braced scope.
2009 Type-checking happens at assignment: 'bool b = 1' is an error.
2010
2011 break
2012 break: causes execution to continue after the body of the current
2013 for/while/do-while loop.
2014
2015 call
2016 call: used for invoking a user-defined subroutine.
2017 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2018
2019 continue
2020 continue: causes execution to skip the remaining statements in the body of
2021 the current for/while/do-while loop. For-loop increments are still applied.
2022
2023 do
2024 do: with "while", introduces a do-while loop. The body statements must be wrapped
2025 in curly braces.
2026
2027 dump
2028 dump: prints all currently defined out-of-stream variables immediately
2029 to stdout as JSON.
2030
2031 With >, >>, or |, the data do not become part of the output record stream but
2032 are instead redirected.
2033
2034 The > and >> are for write and append, as in the shell, but (as with awk) the
2035 file-overwrite for > is on first write, not per record. The | is for piping to
2036 a process which will process the data. There will be one open file for each
2037 distinct file name (for > and >>) or one subordinate process for each distinct
2038 value of the piped-to command (for |). Output-formatting flags are taken from
2039 the main command line.
2040
2041 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
2042 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
2043 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
2044 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
2045
2046 edump
2047 edump: prints all currently defined out-of-stream variables immediately
2048 to stderr as JSON.
2049
2050 Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
2051
2052 elif
2053 elif: the way Miller spells "else if". The body statements must be wrapped
2054 in curly braces.
2055
2056 else
2057 else: terminates an if/elif/elif chain. The body statements must be wrapped
2058 in curly braces.
2059
2060 emit
2061 emit: inserts an out-of-stream variable into the output record stream. Hashmap
2062 indices present in the data but not slotted by emit arguments are not output.
2063
2064 With >, >>, or |, the data do not become part of the output record stream but
2065 are instead redirected.
2066
2067 The > and >> are for write and append, as in the shell, but (as with awk) the
2068 file-overwrite for > is on first write, not per record. The | is for piping to
2069 a process which will process the data. There will be one open file for each
2070 distinct file name (for > and >>) or one subordinate process for each distinct
2071 value of the piped-to command (for |). Output-formatting flags are taken from
2072 the main command line.
2073
2074 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2075 etc., to control the format of the output if the output is redirected. See also mlr -h.
2076
2077 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
2078 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
2079 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
2080 Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
2081 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
2082 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
2083 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
2084 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
2085 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2086 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
2087 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
2088
2089 Please see http://johnkerl.org/miller/doc for more information.
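To make the role of the names in 'emit @sums, "index1", "index2"' more concrete, here is a Python sketch of how a nested map might be split into flat records. It models only the cases where the supplied names reach either terminal values or a map of terminal values; deeper nesting, emit @*, redirects, and emitp's ":"-joined prefixes are not modeled. The function name emit and its signature are assumptions for illustration.

```python
def emit(name, value, names=()):
    """Split a nested map into flat records, one level per supplied name."""
    records = []

    def walk(node, prefix, remaining):
        if remaining and isinstance(node, dict):
            # Each key at this level becomes the value of the next name.
            for key, child in node.items():
                walk(child, {**prefix, remaining[0]: key}, remaining[1:])
        elif isinstance(node, dict):
            # Names exhausted: remaining keys are laid side by side.
            records.append({**prefix, **node})
        else:
            # Terminal value: keyed by the variable's own name.
            records.append({**prefix, name: node})

    walk(value, {}, list(names))
    return records


if __name__ == "__main__":
    sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
    for rec in emit("sums", sums, ["a", "b"]):
        print(rec)
    for rec in emit("sums", sums, ["a"]):
        print(rec)
```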
2090
2091 emitf
2092 emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
2093 output record stream.
2094
2095 With >, >>, or |, the data do not become part of the output record stream but
2096 are instead redirected.
2097
2098 The > and >> are for write and append, as in the shell, but (as with awk) the
2099 file-overwrite for > is on first write, not per record. The | is for piping to
2100 a process which will process the data. There will be one open file for each
2101 distinct file name (for > and >>) or one subordinate process for each distinct
2102 value of the piped-to command (for |). Output-formatting flags are taken from
2103 the main command line.
2104
2105 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2106 etc., to control the format of the output if the output is redirected. See also mlr -h.
2107
2108 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
2109 Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
2110 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
2111 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
2112 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
2113 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
2114 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
2115 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
2116
2117 Please see http://johnkerl.org/miller/doc for more information.
2118
2119 emitp
2120 emitp: inserts an out-of-stream variable into the output record stream.
2121 Hashmap indices present in the data but not slotted by emitp arguments are
2122 output concatenated with ":".
2123
2124 With >, >>, or |, the data do not become part of the output record stream but
2125 are instead redirected.
2126
2127 The > and >> are for write and append, as in the shell, but (as with awk) the
2128 file-overwrite for > is on first write, not per record. The | is for piping to
2129 a process which will process the data. There will be one open file for each
2130 distinct file name (for > and >>) or one subordinate process for each distinct
2131 value of the piped-to command (for |). Output-formatting flags are taken from
2132 the main command line.
2133
2134 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2135 etc., to control the format of the output if the output is redirected. See also mlr -h.
2136
2137 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2138 Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2139 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2140 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2141 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
2142 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2143 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2144 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2145 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2146
2147 Please see http://johnkerl.org/miller/doc for more information.
2148
2149 end
2150 end: defines a block of statements to be executed after input records
2151 are ingested. The body statements must be wrapped in curly braces.
2152 Example: 'end { emit @count }'
2153 Example: 'end { eprint "Final count is " . @count }'
2154
2155 eprint
2156 eprint: prints expression immediately to stderr.
2157 Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2158 Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2159 Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2160
2161 eprintn
2162 eprintn: prints expression immediately to stderr, without trailing newline.
2163 Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2164
2165 false
2166 false: the boolean literal value.
2167
2168 filter
2169 filter: includes/excludes the record in the output record stream.
2170
2171 Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2172
2173 Instead of put with 'filter false' you can simply use put -q. The following
2174 uses the input record to accumulate data but only prints the running sum
2175 without printing the input record:
2176
2177 Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2178
2179 float
2180 float: declares a floating-point local variable in the current curly-braced scope.
2181 Type-checking happens at assignment: 'float x = 0' is an error.
2182
2183 for
2184 for: defines a for-loop using one of three styles. The body statements must
2185 be wrapped in curly braces.
2186 For-loop over stream record:
2187 Example: 'for (k, v in $*) { ... }'
2188 For-loop over out-of-stream variables:
2189 Example: 'for (k, v in @counts) { ... }'
2190 Example: 'for ((k1, k2), v in @counts) { ... }'
2191 Example: 'for ((k1, k2, k3), v in @*) { ... }'
2192 C-style for-loop:
2193 Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
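The multi-key form 'for ((k1, k2), v in @counts)' walks a two-level map and binds both keys at once. A Python model of that traversal, with the hypothetical name iter_two_level, might look like this:

```python
def iter_two_level(nested):
    """Yield ((k1, k2), v) for every terminal value of a two-level map."""
    for k1, inner in nested.items():
        for k2, v in inner.items():
            yield (k1, k2), v


if __name__ == "__main__":
    counts = {"pan": {"x": 1}, "eks": {"x": 2, "y": 3}}
    for (k1, k2), v in iter_two_level(counts):
        print(k1, k2, v)
```

The three-key form over @* follows the same pattern with one more level, the outermost key being the variable name.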
2194
2195 func
2196 func: used for defining a user-defined function.
2197 Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2198
2199 if
2200 if: starts an if/elif/elif chain. The body statements must be wrapped
2201 in curly braces.
2202
2203 in
2204 in: used in for-loops over stream records or out-of-stream variables.
2205
2206 int
2207 int: declares an integer local variable in the current curly-braced scope.
2208 Type-checking happens at assignment: 'int x = 0.0' is an error.
2209
2210 map
2211 map: declares a map-valued local variable in the current curly-braced scope.
2212 Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2213 always OK. map b = a is OK or not depending on whether a is a map.
2214
2215 num
2216 num: declares an int/float local variable in the current curly-braced scope.
2217 Type-checking happens at assignment: 'num b = true' is an error.
2218
2219 print
2220 print: prints expression immediately to stdout.
2221 Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2222 Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2223 Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2224
2225 printn
2226 printn: prints expression immediately to stdout, without trailing newline.
2227 Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2228
2229 return
2230 return: specifies the return value from a user-defined function.
2231 Omitted return statements (including via if-branches) result in an absent-null
2232 return value, which in turn results in a skipped assignment to an LHS.
2233
2234 stderr
2235 stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2236 to print to standard error.
2237
2238 stdout
2239 stdout: used with tee, emit, emitf, emitp, print, and dump in place of a
2240 filename to print to standard output.
2241
2242 str
2243 str: declares a string local variable in the current curly-braced scope.
2244 Type-checking happens at assignment.
2245
2246 subr
2247 subr: used for defining a subroutine.
2248 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2249
2250 tee
2251 tee: prints the current record to specified file.
2252 This is an immediate print to the specified file (except for pprint format
2253 which of course waits until the end of the input stream to format all output).
2254
2255 The > and >> are for write and append, as in the shell, but (as with awk) the
2256 file-overwrite for > is on first write, not per record. The | is for piping to
2257 a process which will process the data. There will be one open file for each
2258 distinct file name (for > and >>) or one subordinate process for each distinct
2259 value of the piped-to command (for |). By default, output-formatting flags
2260 are taken from the main command line.
2261
2262 To override them for the redirected output, pass any of the output-format
2263 command-line flags, e.g. --ocsv, --ofs, etc., to put itself. See also mlr -h.
2264
2265 emit with redirect and tee with redirect are identical, except tee can only
2266 output $*.
2267
2268 Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
2269 Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2270 Example: mlr --from f.dat put 'tee > stderr, $*'
2271 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
2272 Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
2273 Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2274 Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
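
As a rough, self-contained sketch of splitting a stream into one file per key with tee, assuming a scratch directory from mktemp and hypothetical field names a and x:

```shell
dir=$(mktemp -d)
# Run in a subshell so the relative output paths land in the scratch directory.
(
  cd "$dir" &&
  printf 'a=pan,x=1\na=eks,x=2\na=pan,x=3\n' |
    mlr put -q 'tee > "data-".$a.".dat", $*'
)
cat "$dir/data-pan.dat"   # a=pan,x=1 then a=pan,x=3
cat "$dir/data-eks.dat"   # a=eks,x=2
```

Because the overwrite for > happens on first write rather than per record, both pan records end up in the same file.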
2275
2276 true
2277 true: the boolean literal value.
2278
2279 unset
2280 unset: clears field(s) from the current record, or an out-of-stream or local variable.
2281
2282 Example: mlr --from f.dat put 'unset $x'
2283 Example: mlr --from f.dat put 'unset $*'
2284 Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2285 Example: mlr --from f.dat put '...; unset @sums'
2286 Example: mlr --from f.dat put '...; unset @sums["green"]'
2287 Example: mlr --from f.dat put '...; unset @*'
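
A short sketch of the field-level forms on a one-record input; the field names and values are assumptions chosen for illustration.

```shell
# Remove a single named field.
out1=$(printf 'a=1,b=2,c=3\n' | mlr put 'unset $b')
printf '%s\n' "$out1"   # a=1,c=3

# Remove every field whose name matches a regular expression.
out2=$(printf 'a=1,b=2,c=3\n' |
  mlr put 'for (k, v in $*) { if (k =~ "^a") { unset $[k] } }')
printf '%s\n' "$out2"   # b=2,c=3
```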
2288
2289 var
2290 var: declares an untyped local variable in the current curly-braced scope.
2291 Examples: 'var a=1', 'var xyz=""'
2292
2293 while
2294 while: introduces a while loop, or with "do", introduces a do-while loop.
2295 The body statements must be wrapped in curly braces.
2296
2297 ENV
2298 ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
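
For instance, a minimal sketch that copies an environment variable into each record; GREETING is a hypothetical variable set only for this one command.

```shell
out=$(printf 'a=1\n' | GREETING=hello mlr put '$g = ENV["GREETING"]')
printf '%s\n' "$out"   # a=1,g=hello
```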
2299
2300 FILENAME
2301 FILENAME: evaluates to the name of the current file being processed.
2302
2303 FILENUM
2304 FILENUM: evaluates to the number of the current file being processed,
2305 starting with 1.
2306
2307 FNR
2308 FNR: evaluates to the number of the current record within the current file
2309 being processed, starting with 1. Resets at the start of each file.
2310
2311 IFS
2312 IFS: evaluates to the input field separator from the command line.
2313
2314 IPS
2315 IPS: evaluates to the input pair separator from the command line.
2316
2317 IRS
2318 IRS: evaluates to the input record separator from the command line,
2319 or to LF or CRLF from the input data if in autodetect mode (which is
2320 the default).
2321
2322 M_E
2323 M_E: the mathematical constant e.
2324
2325 M_PI
2326 M_PI: the mathematical constant pi.
2327
2328 NF
2329 NF: evaluates to the number of fields in the current record.
2330
2331 NR
2332 NR: evaluates to the number of the current record over all files
2333 being processed, starting with 1. Does not reset at the start of each file.
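
The difference between NR, FNR, and FILENUM shows up only with more than one input file. The sketch below assumes two small scratch files whose names and contents are made up for illustration.

```shell
dir=$(mktemp -d)
printf 'a=1\na=2\n' > "$dir/one.dat"
printf 'a=3\n'      > "$dir/two.dat"

# NR keeps counting across files; FNR restarts at 1 in the second file.
out=$(mlr put '$nr = NR; $fnr = FNR; $filenum = FILENUM' \
  "$dir/one.dat" "$dir/two.dat")
printf '%s\n' "$out"
# Expected with these inputs:
# a=1,nr=1,fnr=1,filenum=1
# a=2,nr=2,fnr=2,filenum=1
# a=3,nr=3,fnr=1,filenum=2
```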
2334
2335 OFS
2336 OFS: evaluates to the output field separator from the command line.
2337
2338 OPS
2339 OPS: evaluates to the output pair separator from the command line.
2340
2341 ORS
2342 ORS: evaluates to the output record separator from the command line,
2343 or to LF or CRLF from the input data if in autodetect mode (which is
2344 the default).
2345
2347 Miller is written by John Kerl <kerl.john.r@gmail.com>.
2348
2349 This manual page has been composed from Miller's help output by Eric
2350 MSP Veith <eveith@veith-m.de>.
2351
2353 awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2354 MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2355 http://johnkerl.org/miller/doc
2356
2357
2358
2359 2020-09-03 MILLER(1)