MILLER(1) MILLER(1)

6 miller - like awk, sed, cut, join, and sort for name-indexed data such
7 as CSV and tabular JSON.
8
10 Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or
11 more file names}
12
13
15 Miller operates on key-value-pair data while the familiar Unix tools
16 operate on integer-indexed fields: if the natural data structure for
17 the latter is the array, then Miller's natural data structure is the
18 insertion-ordered hash map. This encompasses a variety of data
19 formats, including but not limited to the familiar CSV, TSV, and JSON.
20 (Miller can handle positionally-indexed data as a special case.) This
21 manpage documents Miller v5.6.2.
22
24 COMMAND-LINE SYNTAX
25 mlr --csv cut -f hostname,uptime mydata.csv
26 mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
27 mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
28 grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
29 mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
30 mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
31 mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
32 mlr stats2 -a linreg-pca -f u,v -g shape data/*
33 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
mlr --from estimates.tbl put '
  for (k,v in $*) {
    if (is_numeric(v) && k =~ "^[t-z].*$") {
      $sum += v; $count += 1
    }
  }
  $mean = $sum / $count # no assignment if count unset'
41 mlr --from infile.dat put -f analyze.mlr
42 mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
43 mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
44 mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
45 mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
46
47 DATA FORMATS
48 DKVP: delimited key-value pairs (Miller default format)
49 +---------------------+
50 | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
51 | dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
52 +---------------------+
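
As a rough illustration of the DKVP mapping shown in the diagram above, the following Python sketch parses one line into an insertion-ordered map. The function name parse_dkvp_line is hypothetical. The rule that a field lacking a pair separator is keyed by its 1-based position is inferred from Record 2 above. This is not Miller's implementation.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            # Split on the first pair separator only, so values may contain it.
            key, value = field.split(ips, 1)
        else:
            # Assumption drawn from Record 2 in the diagram: a field with no
            # pair separator is keyed by its 1-based position in the line.
            key, value = str(position), field
        record[key] = value
    return record


print(parse_dkvp_line("apple=1,bat=2,cog=3"))
print(parse_dkvp_line("dish=7,egg=8,flint"))
```

Python dicts preserve insertion order, which matches the insertion-ordered hash map described earlier as Miller's natural data structure.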
53
54 NIDX: implicitly numerically indexed (Unix-toolkit style)
55 +---------------------+
56 | the quick brown | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
57 | fox jumped | Record 2: "1" => "fox", "2" => "jumped"
58 +---------------------+
59
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog       |
| 1,2,3               | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
66
67 Tabular JSON: nested objects are supported, although arrays within them are not:
68 +---------------------+
69 | { |
70 | "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
71 | "bat": 2, |
72 | "cog": 3 |
73 | } |
74 | { |
75 | "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
76 | "egg": 7, |
77 | "flint": 8 |
78 | }, |
79 | "garlic": "" |
80 | } |
81 +---------------------+
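
The flattening of nested JSON objects shown for Record 2 above can be sketched in Python as follows. The function name flatten is hypothetical, and the ":" separator is the default shown in the diagram. Unlike the diagram, the sketch leaves leaf values untouched rather than converting them to strings, and it does not attempt to handle arrays.

```python
def flatten(record, sep=":", prefix=""):
    """Flatten nested maps into one level, joining key paths with sep."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


print(flatten({"dish": {"egg": 7, "flint": 8}, "garlic": ""}))
```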
82
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog       |
| 1     2   3         | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
89
90 XTAB: pretty-printed transposed tabular
91 +---------------------+
92 | apple 1 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
93 | bat 2 |
94 | cog 3 |
95 | |
96 | dish 7 | Record 2: "dish" => "7", "egg" => "8"
97 | egg 8 |
98 +---------------------+
99
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | ---   | --- | --- | |
| | 1     | 2   | 3   | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+-----------------------+
107
109 In the following option flags, the version with "i" designates the
110 input stream, "o" the output stream, and the version without prefix
111 sets the option for both input and output stream. For example: --irs
112 sets the input record separator, --ors the output record separator, and
113 --rs sets both the input and output separator to the given value.
114
115 HELP OPTIONS
116 -h or --help Show this message.
117 --version Show the software version.
118 {verb name} --help Show verb-specific help.
119 --help-all-verbs Show help on all verbs.
120 -l or --list-all-verbs List only verb names.
121 -L List only verb names, one per line.
122 -f or --help-all-functions Show help on all built-in functions.
123 -F Show a bare listing of built-in functions by name.
124 -k or --help-all-keywords Show help on all keywords.
125 -K Show a bare listing of keywords by name.
126
127 VERB LIST
128 altkv bar bootstrap cat check clean-whitespace count-distinct count-similar
129 cut decimate fill-down filter format-values fraction grep group-by group-like
130 having-fields head histogram join label least-frequent merge-fields
131 most-frequent nest nothing put regularize remove-empty-columns rename reorder
132 repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle skip-trivial-records
133 sort stats1 stats2 step tac tail tee top uniq unsparsify
134
135 FUNCTION LIST
136 + + - - * / // .+ .+ .- .- .* ./ .// % ** | ^ & ~ << >> bitcount == != =~ !=~
137 > >= < <= && || ^^ ! ? : . gsub regextract regextract_or_else strlen sub ssub
138 substr tolower toupper capitalize lstrip rstrip strip collapse_whitespace
139 clean_whitespace system abs acos acosh asin asinh atan atan2 atanh cbrt ceil
140 cos cosh erf erfc exp expm1 floor invqnorm log log10 log1p logifit madd max
141 mexp min mmul msub pow qnorm round roundm sgn sin sinh sqrt tan tanh urand
142 urandrange urand32 urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec
143 localtime2sec hms2fsec hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate
144 sec2localtime sec2localtime sec2localdate sec2hms strftime strftime_local
145 strptime strptime_local systime is_absent is_bool is_boolean is_empty
146 is_empty_map is_float is_int is_map is_nonempty_map is_not_empty is_not_map
147 is_not_null is_null is_numeric is_present is_string asserting_absent
148 asserting_bool asserting_boolean asserting_empty asserting_empty_map
149 asserting_float asserting_int asserting_map asserting_nonempty_map
150 asserting_not_empty asserting_not_map asserting_not_null asserting_null
151 asserting_numeric asserting_present asserting_string boolean float fmtnum
152 hexfmt int string typeof depth haskey joink joinkv joinv leafcount length
153 mapdiff mapexcept mapselect mapsum splitkv splitkvx splitnv splitnvx
154
155 Please use "mlr --help-function {function name}" for function-specific help.
156
157 I/O FORMATTING
--idkvp --odkvp --dkvp  Delimited key-value pairs, e.g. "a=1,b=2"
                        (this is Miller's default format).
160
161 --inidx --onidx --nidx Implicitly-integer-indexed fields
162 (Unix-toolkit style).
163 -T Synonymous with "--nidx --fs tab".
164
165 --icsv --ocsv --csv Comma-separated value (or tab-separated
166 with --fs tab, etc.)
167
168 --itsv --otsv --tsv Keystroke-savers for "--icsv --ifs tab",
169 "--ocsv --ofs tab", "--csv --fs tab".
170 --iasv --oasv --asv Similar but using ASCII FS 0x1f and RS 0x1e
171 --iusv --ousv --usv Similar but using Unicode FS U+241F (UTF-8 0xe2909f)
172 and RS U+241E (UTF-8 0xe2909e)
173
174 --icsvlite --ocsvlite --csvlite Comma-separated value (or tab-separated
175 with --fs tab, etc.). The 'lite' CSV does not handle
176 RFC-CSV double-quoting rules; is slightly faster;
177 and handles heterogeneity in the input stream via
178 empty newline followed by new header line. See also
179 http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc.
180
181 --itsvlite --otsvlite --tsvlite Keystroke-savers for "--icsvlite --ifs tab",
182 "--ocsvlite --ofs tab", "--csvlite --fs tab".
183 -t Synonymous with --tsvlite.
184 --iasvlite --oasvlite --asvlite Similar to --itsvlite et al. but using ASCII FS 0x1f and RS 0x1e
185 --iusvlite --ousvlite --usvlite Similar to --itsvlite et al. but using Unicode FS U+241F (UTF-8 0xe2909f)
186 and RS U+241E (UTF-8 0xe2909e)
187
188 --ipprint --opprint --pprint Pretty-printed tabular (produces no
189 output until all input is in).
190 --right Right-justifies all fields for PPRINT output.
191 --barred Prints a border around PPRINT output
192 (only available for output).
193
194 --omd Markdown-tabular (only available for output).
195
196 --ixtab --oxtab --xtab Pretty-printed vertical-tabular.
197 --xvright Right-justifies values for XTAB format.
198
199 --ijson --ojson --json JSON tabular: sequence or list of one-level
200 maps: {...}{...} or [{...},{...}].
--json-map-arrays-on-input
--json-skip-arrays-on-input
--json-fatal-arrays-on-input
                        JSON arrays are unmillerable. --json-map-arrays-on-input
                        is the default: arrays are converted to integer-indexed
                        maps. The other two options cause them to be skipped, or
                        to be treated as errors. Please use the jq tool for full
                        JSON (pre)processing.
206 --jvstack Put one key-value pair per line for JSON
207 output.
208 --jlistwrap Wrap JSON output in outermost [ ].
209 --jknquoteint Do not quote non-string map keys in JSON output.
210 --jvquoteall Quote map values in JSON output, even if they're
211 numeric.
212 --jflatsep {string} Separator for flattening multi-level JSON keys,
213 e.g. '{"a":{"b":3}}' becomes a:b => 3 for
214 non-JSON formats. Defaults to :.
215
216 -p is a keystroke-saver for --nidx --fs space --repifs
217
218 --mmap --no-mmap --mmap-below {n} Use mmap for files whenever possible, never, or
219 for files less than n bytes in size. Default is for
220 files less than 4294967296 bytes in size.
221 'Whenever possible' means always except for when reading
222 standard input which is not mmappable. If you don't know
223 what this means, don't worry about it -- it's a minor
224 performance optimization.
225
226 Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
227 DKVP-formatted input and pretty-printed output.
228
229 Please use --iformat1 --oformat2 rather than --format1 --oformat2.
230 The latter sets up input and output flags for format1, not all of which
231 are overridden in all cases by setting output format to format2.
232
233 COMMENTS IN DATA
234 --skip-comments Ignore commented lines (prefixed by "#")
235 within the input.
236 --skip-comments-with {string} Ignore commented lines within input, with
237 specified prefix.
238 --pass-comments Immediately print commented lines (prefixed by "#")
239 within the input.
240 --pass-comments-with {string} Immediately print commented lines within input, with
241 specified prefix.
242 Notes:
243 * Comments are only honored at the start of a line.
244 * In the absence of any of the above four options, comments are data like
245 any other text.
246 * When pass-comments is used, comment lines are written to standard output
247 immediately upon being read; they are not part of the record stream.
248 Results may be counterintuitive. A suggestion is to place comments at the
249 start of data files.
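
The three comment-handling behaviors described in the notes above can be sketched in Python. The function name handle_comments and its mode argument are hypothetical stand-ins for the command-line flags. The sketch returns the passed-through comment lines in a separate list, whereas the notes say Miller writes them to standard output as soon as they are read.

```python
def handle_comments(lines, mode=None, prefix="#"):
    """Split input lines into (data lines, passed-through comment lines).

    mode=None   : no comment option given, so comments are ordinary data
    mode="skip" : comment lines are dropped
    mode="pass" : comment lines bypass the record stream
    """
    data, passed_through = [], []
    for line in lines:
        # Comments are only honored at the start of a line.
        if mode is not None and line.startswith(prefix):
            if mode == "pass":
                passed_through.append(line)
            continue
        data.append(line)
    return data, passed_through


lines = ["# header note", "a=1", "b=2 # trailing", "# end"]
print(handle_comments(lines, mode="skip"))
```

Note that "b=2 # trailing" stays in the data in every mode, since its "#" is not at the start of the line.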
250
251 FORMAT-CONVERSION KEYSTROKE-SAVERS
252 As keystroke-savers for format-conversion you may use the following:
253 --c2t --c2d --c2n --c2j --c2x --c2p --c2m
254 --t2c --t2d --t2n --t2j --t2x --t2p --t2m
255 --d2c --d2t --d2n --d2j --d2x --d2p --d2m
256 --n2c --n2t --n2d --n2j --n2x --n2p --n2m
257 --j2c --j2t --j2d --j2n --j2x --j2p --j2m
258 --x2c --x2t --x2d --x2n --x2j --x2p --x2m
259 --p2c --p2t --p2d --p2n --p2j --p2x --p2m
260 The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
261 PPRINT, and markdown, respectively. Note that markdown format is available for
262 output only.
263
264 COMPRESSED I/O
265 --prepipe {command} This allows Miller to handle compressed inputs. You can do
266 without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
267 However, when multiple input files are present, between-file separations are
268 lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
269 specify an action to be taken on each input file. This pre-pipe command must
270 be able to read from standard input; it will be invoked with
271 {command} < {filename}.
272 Examples:
273 mlr --prepipe 'gunzip'
274 mlr --prepipe 'zcat -cf'
275 mlr --prepipe 'xz -cd'
276 mlr --prepipe cat
277 Note that this feature is quite general and is not limited to decompression
278 utilities. You can use it to apply per-file filters of your choice.
279 For output compression (or other) utilities, simply pipe the output:
280 mlr ... | {your compression command}
281
282 SEPARATORS
283 --rs --irs --ors Record separators, e.g. 'lf' or '\r\n'
284 --fs --ifs --ofs --repifs Field separators, e.g. comma
285 --ps --ips --ops Pair separators, e.g. equals sign
286
287 Notes about line endings:
288 * Default line endings (--irs and --ors) are "auto" which means autodetect from
289 the input file format, as long as the input file(s) have lines ending in either
290 LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
291 carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
292 * If both irs and ors are auto (which is the default) then LF input will lead to LF
293 output and CRLF input will lead to CRLF output, regardless of the platform you're
294 running on.
295 * The line-ending autodetector triggers on the first line ending detected in the input
296 stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
297 LF-terminated file then autodetected line endings will be CRLF.
298 * If you use --ors {something else} with (default or explicitly specified) --irs auto
299 then line endings are autodetected on input and set to what you specify on output.
300 * If you use --irs {something else} with (default or explicitly specified) --ors auto
301 then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
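
The "first line ending wins" rule in the notes above can be sketched in Python. The function name detect_line_ending is hypothetical, and the fallback used when the input contains no line ending at all is an assumption of this sketch, not something the notes specify.

```python
def detect_line_ending(data, default="\n"):
    """Return the first line ending found in data: CRLF or LF."""
    lf = data.find("\n")
    if lf == -1:
        # Assumption: with no line ending in the input, use the default.
        return default
    if lf > 0 and data[lf - 1] == "\r":
        return "\r\n"
    return "\n"


# A CRLF-terminated file followed by an LF-terminated file.
stream = "a=1\r\nb=2\r\n" + "c=3\nd=4\n"
print(repr(detect_line_ending(stream)))
```

Only the first line ending is inspected, so the later LF-terminated lines do not change the result.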
302
303 Notes about all other separators:
304 * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
305 do key-value pairs appear juxtaposed.
306 * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
307 XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
308 a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
309 auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
310 * OFS must be single-character for PPRINT format. This is because it is used
311 with repetition for alignment; multi-character separators would make
312 alignment impossible.
313 * OPS may be multi-character for XTAB format, in which case alignment is
314 disabled.
315 * TSV is simply CSV using tab as field separator ("--fs tab").
316 * FS/PS are ignored for markdown format; RS is used.
317 * All FS and PS options are ignored for JSON format, since they are not relevant
318 to the JSON format.
319 * You can specify separators in any of the following ways, shown by example:
320 - Type them out, quoting as necessary for shell escapes, e.g.
321 "--fs '|' --ips :"
322 - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
323 - To avoid backslashing, you can use any of the following names:
324 cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
325 * Default separators by format:
File format   RS      FS      PS
gen           N/A     (N/A)   (N/A)
dkvp          auto    ,       =
json          auto    (N/A)   (N/A)
nidx          auto    space   (N/A)
csv           auto    ,       (N/A)
csvlite       auto    ,       (N/A)
markdown      auto    (N/A)   (N/A)
pprint        auto    space   (N/A)
xtab          (N/A)   auto    space
336
337 CSV-SPECIFIC OPTIONS
338 --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
339 of input files. Tip: combine with "label" to recreate
340 missing headers.
341 --allow-ragged-csv-input|--ragged If a data line has fewer fields than the header line,
342 fill remaining keys with empty string. If a data line has more
343 fields than the header line, use integer field labels as in
344 the implicit-header case.
345 --headerless-csv-output Print only CSV data lines.
346 -N Keystroke-saver for --implicit-csv-header --headerless-csv-output.
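
The ragged-input rule described for --allow-ragged-csv-input can be sketched in Python. The function name ragged_record is hypothetical. Labelling each surplus field with its 1-based position is this sketch's reading of "integer field labels as in the implicit-header case".

```python
def ragged_record(header, fields):
    """Build one record from a data line whose length may differ from the header."""
    record = {}
    for index, value in enumerate(fields):
        if index < len(header):
            key = header[index]
        else:
            # Surplus field: assumed to be labelled by its 1-based position.
            key = str(index + 1)
        record[key] = value
    # Short line: the remaining header keys get the empty string.
    for key in header[len(fields):]:
        record[key] = ""
    return record


print(ragged_record(["a", "b", "c"], ["1", "2"]))
print(ragged_record(["a", "b", "c"], ["1", "2", "3", "4"]))
```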
347
348 DOUBLE-QUOTING FOR CSV/CSVLITE OUTPUT
349 --quote-all Wrap all fields in double quotes
350 --quote-none Do not wrap any fields in double quotes, even if they have
351 OFS or ORS in them
352 --quote-minimal Wrap fields in double quotes only if they have OFS or ORS
353 in them (default)
354 --quote-numeric Wrap fields in double quotes only if they have numbers
355 in them
356 --quote-original Wrap fields in double quotes if and only if they were
357 quoted on input. This isn't sticky for computed fields:
358 e.g. if fields a and b were quoted on input and you do
359 "put '$c = $a . $b'" then field c won't inherit a or b's
360 was-quoted-on-input flag.
361
362 NUMERICAL FORMATTING
363 --ofmt {format} E.g. %.18lf, %.0lf. Please use sprintf-style codes for
364 double-precision. Applies to verbs which compute new
365 values, e.g. put, stats1, stats2. See also the fmtnum
366 function within mlr put (mlr --help-all-functions).
367 Defaults to %lf.
368
369 OTHER OPTIONS
370 --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
371 urand()/urandint()/urand32().
372 --nr-progress-mod {m}, with m a positive integer: print filename and record
373 count to stderr every m input records.
374 --from {filename} Use this to specify an input file before the verb(s),
375 rather than after. May be used more than once. Example:
376 "mlr --from a.dat --from b.dat cat" is the same as
377 "mlr cat a.dat b.dat".
378 -n Process no input files, nor standard input either. Useful
379 for mlr put with begin/end statements only. (Same as --from
380 /dev/null.) Also useful in "mlr -n put -v '...'" for
381 analyzing abstract syntax trees (if that's your thing).
382 -I Process files in-place. For each file name on the command
383 line, output is written to a temp file in the same
384 directory, which is then renamed over the original. Each
385 file is processed in isolation: if the output format is
386 CSV, CSV headers will be present in each output file;
387 statistics are only over each file's own records; and so on.
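
The temp-file-then-rename approach described for -I can be sketched in Python. The names rewrite_in_place and transform are hypothetical, and the sketch makes no claim about how Miller itself names its temp file or handles permissions.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Replace the contents of path with transform(old contents)."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as source:
        text = source.read()
    # Write to a temp file in the same directory, so the final rename
    # stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as temp:
        temp.write(transform(text))
    # Rename the finished temp file over the original.
    os.replace(temp.name, path)
```

Calling the function once per file name mirrors the "each file is processed in isolation" behavior described above: nothing is carried over from one file to the next.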
388
389 THEN-CHAINING
390 Output of one verb may be chained as input to another using "then", e.g.
391 mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
392
393 AUXILIARY COMMANDS
394 Miller has a few otherwise-standalone executables packaged within it.
395 They do not participate in any other parts of Miller.
396 Available subcommands:
397 aux-list
398 lecat
399 termcvt
400 hex
401 unhex
402 netbsd-strptime
403 For more information, please invoke mlr {subcommand} --help
404
406 altkv
407 Usage: mlr altkv [no options]
408 Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
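
A minimal Python sketch of the pairing described above follows. The function name altkv mirrors the verb but is otherwise hypothetical. The text above does not say what happens with an odd number of values, so the sketch rejects that case rather than guess.

```python
def altkv(values):
    """Pair alternating values as keys and values: a,b,c,d -> {a: b, c: d}."""
    if len(values) % 2 != 0:
        # Not covered by the description above; this sketch refuses to guess.
        raise ValueError("expected an even number of values")
    # Even positions become keys, odd positions become their values.
    return dict(zip(values[0::2], values[1::2]))


print(altkv(["a", "b", "c", "d", "e", "f"]))
```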
409
410 bar
411 Usage: mlr bar [options]
412 Replaces a numeric field with a number of asterisks, allowing for cheesy
413 bar plots. These align best with --opprint or --oxtab output format.
414 Options:
415 -f {a,b,c} Field names to convert to bars.
416 -c {character} Fill character: default '*'.
417 -x {character} Out-of-bounds character: default '#'.
418 -b {character} Blank character: default '.'.
419 --lo {lo} Lower-limit value for min-width bar: default '0.000000'.
420 --hi {hi} Upper-limit value for max-width bar: default '100.000000'.
421 -w {n} Bar-field width: default '40'.
422 --auto Automatically computes limits, ignoring --lo and --hi.
423 Holds all records in memory before producing any output.
424
425 bootstrap
426 Usage: mlr bootstrap [options]
427 Emits an n-sample, with replacement, of the input records.
428 Options:
429 -n {number} Number of samples to output. Defaults to number of input records.
430 Must be non-negative.
431 See also mlr sample and mlr shuffle.
432
433 cat
434 Usage: mlr cat [options]
435 Passes input records directly to output. Most useful for format conversion.
436 Options:
437 -n Prepend field "n" to each record with record-counter starting at 1
438 -g {comma-separated field name(s)} When used with -n/-N, writes record-counters
439 keyed by specified field name(s).
440 -v Write a low-level record-structure dump to stderr.
441 -N {name} Prepend field {name} to each record with record-counter starting at 1
442
443 check
444 Usage: mlr check
445 Consumes records without printing any output.
446 Useful for doing a well-formatted check on input data.
447
448 clean-whitespace
Usage: mlr clean-whitespace [options]
450 For each record, for each field in the record, whitespace-cleans the keys and
451 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
452 and replacing multiple whitespace with singles. For finer-grained control,
453 please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
454 and clean_whitespace.
455
456 Options:
457 -k|--keys-only Do not touch values.
458 -v|--values-only Do not touch keys.
459 It is an error to specify -k as well as -v.
460
461 count-distinct
462 Usage: mlr count-distinct [options]
463 Prints number of records having distinct values for specified field names.
464 Same as uniq -c.
465
466 Options:
467 -f {a,b,c} Field names for distinct count.
468 -n Show only the number of distinct values. Not compatible with -u.
469 -o {name} Field name for output count. Default "count".
470 Ignored with -u.
471 -u Do unlashed counts for multiple field names. With -f a,b and
472 without -u, computes counts for distinct combinations of a
473 and b field values. With -f a,b and with -u, computes counts
474 for distinct a field values and counts for distinct b field
475 values separately.
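
The difference between lashed counts (the default) and unlashed counts (-u) can be sketched in Python. The function name count_distinct mirrors the verb but is hypothetical, as is the choice to ignore records that lack a requested field.

```python
def count_distinct(records, fields, unlashed=False):
    """Count distinct field values, jointly (lashed) or per field (unlashed)."""
    if unlashed:
        # One independent tally per field name.
        counts = {field: {} for field in fields}
        for record in records:
            for field in fields:
                if field in record:
                    value = record[field]
                    counts[field][value] = counts[field].get(value, 0) + 1
        return counts
    # Lashed: tally distinct combinations of all the named fields.
    counts = {}
    for record in records:
        if all(field in record for field in fields):
            key = tuple(record[field] for field in fields)
            counts[key] = counts.get(key, 0) + 1
    return counts


records = [
    {"a": "1", "b": "x"},
    {"a": "1", "b": "y"},
    {"a": "2", "b": "x"},
    {"a": "1", "b": "x"},
]
print(count_distinct(records, ["a", "b"]))
print(count_distinct(records, ["a", "b"], unlashed=True))
```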
476
477 count-similar
478 Usage: mlr count-similar [options]
479 Ingests all records, then emits each record augmented by a count of
480 the number of other records having the same group-by field values.
481 Options:
482 -g {d,e,f} Group-by-field names for counts.
483 -o {name} Field name for output count. Default "count".
484
485 cut
486 Usage: mlr cut [options]
487 Passes through input records with specified fields included/excluded.
488 -f {a,b,c} Field names to include for cut.
489 -o Retain fields in the order specified here in the argument list.
490 Default is to retain them in the order found in the input data.
491 -x|--complement Exclude, rather than include, field names specified by -f.
492 -r Treat field names as regular expressions. "ab", "a.*b" will
493 match any field name containing the substring "ab" or matching
494 "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
495 be used. The -o flag is ignored when -r is present.
496 Examples:
497 mlr cut -f hostname,status
498 mlr cut -x -f hostname,status
499 mlr cut -r -f '^status$,sda[0-9]'
500 mlr cut -r -f '^status$,"sda[0-9]"'
501 mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
502
503 decimate
504 Usage: mlr decimate [options]
505 -n {count} Decimation factor; default 10
506 -b Decimate by printing first of every n.
507 -e Decimate by printing last of every n (default).
508 -g {a,b,c} Optional group-by-field names for decimate counts
509 Passes through one of every n records, optionally by category.
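
The -b and -e choices can be sketched in Python as picking one position within each run of n records per group. The function name decimate mirrors the verb but is hypothetical. How a trailing partial batch is treated is an assumption of this sketch: with first=True its first record is still emitted, and with first=False nothing is emitted for it.

```python
def decimate(records, n=10, first=False, group_by=()):
    """Pass through one of every n records, optionally counted per group."""
    seen = {}  # group key -> number of records seen so far in that group
    kept = []
    # Position within each batch of n that is emitted (0-based).
    target = 0 if first else n - 1
    for record in records:
        key = tuple(record.get(field) for field in group_by)
        index = seen.get(key, 0)
        if index % n == target:
            kept.append(record)
        seen[key] = index + 1
    return kept


records = [{"i": k} for k in range(1, 11)]
print(decimate(records, n=5))              # last of every 5
print(decimate(records, n=5, first=True))  # first of every 5
```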
510
511 fill-down
512 Usage: mlr fill-down [options]
513 -f {a,b,c} Field names for fill-down
-a|--only-if-absent Fill down a field only if it is absent from the record.
515 If a given record has a missing value for a given field, fill that from
516 the corresponding value from a previous record, if any.
517 By default, a 'missing' field either is absent, or has the empty-string value.
518 With -a, a field is 'missing' only if it is absent.
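
The two meanings of "missing" described above can be sketched in Python. The function name fill_down mirrors the verb but is hypothetical, and the sketch follows the wording above rather than Miller's source.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing field values from the most recent record that had one."""
    last = {}  # field name -> most recent non-missing value
    filled = []
    for record in records:
        record = dict(record)  # leave the caller's records unmodified
        for field in fields:
            if only_if_absent:
                missing = field not in record
            else:
                # Default: absent or empty-string both count as missing.
                missing = record.get(field, "") == ""
            if missing:
                if field in last:
                    record[field] = last[field]
            else:
                last[field] = record[field]
        filled.append(record)
    return filled


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(fill_down(records, ["a"]))
print(fill_down(records, ["a"], only_if_absent=True))
```

With only_if_absent=True the empty string in the second record counts as a real value, so it is what gets carried into the third record.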
519
520 filter
521 Usage: mlr filter [options] {expression}
522 Prints records for which {expression} evaluates to true.
523 If there are multiple semicolon-delimited expressions, all of them are
524 evaluated and the last one is used as the filter criterion.
525
526 Conversion options:
527 -S: Keeps field values as strings with no type inference to int or float.
528 -F: Keeps field values as strings or floats with no inference to int.
529 All field values are type-inferred to int/float/string unless this behavior is
530 suppressed with -S or -F.
531
532 Output/formatting options:
533 --oflatsep {string}: Separator to use when flattening multi-level @-variables
534 to output records for emit. Default ":".
535 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
536 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
537 Any of the output-format command-line flags (see mlr -h). Example: using
538 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
539 the input is CSV, the output is pretty-print tabular, but the tee-file output
540 is written in JSON format.
541 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
542 record.
543
544 Expression-specification options:
545 -f {filename}: the DSL expression is taken from the specified file rather
546 than from the command line. Outer single quotes wrapping the expression
547 should not be placed in the file. If -f is specified more than once,
548 all input files specified using -f are concatenated to produce the expression.
549 (For example, you can define functions in one file and call them from another.)
550 -e {expression}: You can use this after -f to add an expression. Example use
551 case: define functions/subroutines in a file you specify with -f, then call
552 them with an expression you specify with -e.
553 (If you mix -e and -f then the expressions are evaluated in the order encountered.
554 Since the expression pieces are simply concatenated, please be sure to use intervening
555 semicolons to separate expressions.)
556
557 Tracing options:
-v: Prints the expression's AST (abstract syntax tree), which gives
full transparency on the precedence and associativity rules of
Miller's grammar, to stdout.
-a: Prints a low-level stack-allocation trace to stdout.
-t: Prints a low-level parser trace to stderr.
-T: Prints every statement to stderr as it is executed.
564
565 Other options:
566 -x: Prints records for which {expression} evaluates to false.
567
568 Please use a dollar sign for field names and double-quotes for string
569 literals. If field names have special characters such as "." then you might
570 use braces, e.g. '${field.name}'. Miller built-in variables are
571 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
572 variables. The environment-variable name may be an expression, e.g. a field
573 value.
574
575 Use # to comment to end of line.
576
577 Examples:
578 mlr filter 'log10($count) > 4.0'
mlr filter 'FNR == 2' (second record in each file)
580 mlr filter 'urand() < 0.001' (subsampling)
581 mlr filter '$color != "blue" && $value > 4.2'
582 mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
583 mlr filter '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
584 mlr filter '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
mlr filter '
  NR == 1 ||
  #NR == 2 ||
  NR == 3
'
590
Please see http://johnkerl.org/miller/doc/reference.html for more information
including function list. Or "mlr -f". Please also see "mlr grep" which is
useful when you don't yet know which field name(s) you're looking for.
594
595 format-values
596 Usage: mlr format-values [options]
597 Applies format strings to all field values, depending on autodetected type.
598 * If a field value is detected to be integer, applies integer format.
599 * Else, if a field value is detected to be float, applies float format.
600 * Else, applies string format.
601
602 Note: this is a low-keystroke way to apply formatting to many fields. To get
603 finer control, please see the fmtnum function within the mlr put DSL.
604
605 Note: this verb lets you apply arbitrary format strings, which can produce
606 undefined behavior and/or program crashes. See your system's "man printf".
607
608 Options:
609 -i {integer format} Defaults to "%lld".
610 Examples: "%06lld", "%08llx".
611 Note that Miller integers are long long so you must use
612 formats which apply to long long, e.g. with ll in them.
613 Undefined behavior results otherwise.
614 -f {float format} Defaults to "%lf".
615 Examples: "%8.3lf", "%.6le".
616 Note that Miller floats are double-precision so you must
617 use formats which apply to double, e.g. with l[efg] in them.
618 Undefined behavior results otherwise.
619 -s {string format} Defaults to "%s".
620 Examples: "_%s", "%08s".
621 Note that you must use formats which apply to string, e.g.
622 with s in them. Undefined behavior results otherwise.
623 -n Coerce field values autodetected as int to float, and then
624 apply the float format.
625
626 fraction
627 Usage: mlr fraction [options]
628 For each record's value in specified fields, computes the ratio of that
629 value to the sum of values in that field over all input records.
630 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
631 x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
632
633 Note: this is internally a two-pass algorithm: on the first pass it retains
634 input records and accumulates sums; on the second pass it computes quotients
635 and emits output records. This means it produces no output until all input is read.
636
637 Options:
638 -f {a,b,c} Field name(s) for fraction calculation
639 -g {d,e,f} Optional group-by-field name(s) for fraction counts
640 -p Produce percents [0..100], not fractions [0..1]. Output field names
641 end with "_percent" rather than "_fraction"
642 -c Produce cumulative distributions, i.e. running sums: each output
643 value folds in the sum of the previous for the specified group
644 E.g. with input records x=1 x=2 x=3 and x=4, emits output records
645 x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
646 x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
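
The two-pass computation described above, including the -c running sums, can be sketched in Python. The function name fraction mirrors the verb but is hypothetical. The sketch assumes numeric field values and handles a single field, and it omits the -p option.

```python
def fraction(records, field, group_by=(), cumulative=False):
    """Add <field>_fraction (or <field>_cumulative_fraction) to each record."""
    records = list(records)

    # First pass: retain the records and accumulate per-group sums.
    sums = {}
    for record in records:
        if field in record:
            key = tuple(record.get(name) for name in group_by)
            sums[key] = sums.get(key, 0) + record[field]

    # Second pass: compute quotients and emit the augmented records.
    suffix = "_cumulative_fraction" if cumulative else "_fraction"
    running = {}
    out = []
    for record in records:
        record = dict(record)
        if field in record:
            key = tuple(record.get(name) for name in group_by)
            numerator = record[field]
            if cumulative:
                # Fold in the sum of the previous values for this group.
                running[key] = running.get(key, 0) + record[field]
                numerator = running[key]
            record[field + suffix] = numerator / sums[key]
        out.append(record)
    return out


records = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
print(fraction(records, "x"))
print(fraction(records, "x", cumulative=True))
```

Because the sums must be complete before any quotient can be computed, nothing can be emitted until the whole input has been read, as the note above says.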
647
648 grep
649 Usage: mlr grep [options] {regular expression}
650 Passes through records which match {regex}.
651 Options:
652 -i Use case-insensitive search.
653 -v Invert: pass through records which do not match the regex.
654 Note that "mlr filter" is more powerful, but requires you to know field names.
655 By contrast, "mlr grep" allows you to regex-match the entire record. It does
656 this by formatting each record in memory as DKVP, using command-line-specified
657 ORS/OFS/OPS, and matching the resulting line against the regex specified
658 here. In particular, the regex is not applied to the input stream: if you
659 have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
660 be matched, not against either of these lines, but against the DKVP line
661 "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
662 and this command is intended to be merely a keystroke-saver. To get all the
663 features of system grep, you can do
664 "mlr --odkvp ... | grep ... | mlr --idkvp ..."
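
The matching strategy described above, formatting each record as a DKVP line and testing the regex against that line, can be sketched in Python. The function name record_matches is hypothetical. Python's re module stands in for whatever regex library Miller uses, so pattern syntax may differ in detail.

```python
import re


def record_matches(record, pattern, ofs=",", ops="=",
                   ignore_case=False, invert=False):
    """Return True if the record, formatted as a DKVP line, passes the regex test."""
    # Format the record in memory as DKVP using the output separators.
    line = ofs.join(f"{key}{ops}{value}" for key, value in record.items())
    flags = re.IGNORECASE if ignore_case else 0
    matched = re.search(pattern, line, flags) is not None
    # With invert (like -v), pass records that do NOT match.
    return matched != invert


record = {"x": "1", "y": "2", "z": "3"}   # from CSV header x,y,z and data 1,2,3
print(record_matches(record, "y=2"))        # matches the DKVP line x=1,y=2,z=3
print(record_matches(record, "^1,2,3$"))    # the raw CSV data line is never seen
```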
665
666 group-by
667 Usage: mlr group-by {comma-separated field names}
668 Outputs records in batches having identical values at specified field names.
669
670 group-like
671 Usage: mlr group-like
672 Outputs records in batches having identical field names.
673
674 having-fields
675 Usage: mlr having-fields [options]
676 Conditionally passes through records depending on each record's field names.
677 Options:
678 --at-least {comma-separated names}
679 --which-are {comma-separated names}
680 --at-most {comma-separated names}
681 --all-matching {regular expression}
682 --any-matching {regular expression}
683 --none-matching {regular expression}
684 Examples:
685 mlr having-fields --which-are amount,status,owner
686 mlr having-fields --any-matching 'sda[0-9]'
687 mlr having-fields --any-matching '"sda[0-9]"'
688 mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
689
690 head
691 Usage: mlr head [options]
692 -n {count} Head count to print; default 10
693 -g {a,b,c} Optional group-by-field names for head counts
694 Passes through the first n records, optionally by category.
695 Without -g, ceases consuming more input (i.e. is fast) when n
696 records have been read.
697
698 histogram
699 Usage: mlr histogram [options]
700 -f {a,b,c} Value-field names for histogram counts
701 --lo {lo} Histogram low value
702 --hi {hi} Histogram high value
703 --nbins {n} Number of histogram bins
704 --auto Automatically computes limits, ignoring --lo and --hi.
705 Holds all values in memory before producing any output.
706 -o {prefix} Prefix for output field name. Default: no prefix.
707 Just a histogram. Input values < lo or > hi are not counted.
708
709 join
710 Usage: mlr join [options]
711 Joins records from specified left file name with records from all file names
712 at the end of the Miller argument list.
713 Functionality is essentially the same as the system "join" command, but for
714 record streams.
715 Options:
716 -f {left file name}
717 -j {a,b,c} Comma-separated join-field names for output
718 -l {a,b,c} Comma-separated join-field names for left input file;
719 defaults to -j values if omitted.
720 -r {a,b,c} Comma-separated join-field names for right input file(s);
721 defaults to -j values if omitted.
722 --lp {text} Additional prefix for non-join output field names from
723 the left file
724 --rp {text} Additional prefix for non-join output field names from
725 the right file(s)
726 --np Do not emit paired records
727 --ul Emit unpaired records from the left file
728 --ur Emit unpaired records from the right file(s)
729 -s|--sorted-input Require sorted input: records must be sorted
730 lexically by their join-field names, else not all records will
731 be paired. The only likely use case for this is with a left
732 file which is too big to fit into system memory otherwise.
733 -u Enable unsorted input. (This is the default even without -u.)
734 In this case, the entire left file will be loaded into memory.
735 --prepipe {command} As in main input options; see mlr --help for details.
736 If you wish to use a prepipe command for the main input as well
737 as here, it must be specified there as well as here.
738 File-format options default to those for the right file names on the Miller
739 argument list, but may be overridden for the left file as follows. Please see
740 the main "mlr --help" for more information on syntax for these arguments.
741 -i {one of csv,dkvp,nidx,pprint,xtab}
742 --irs {record-separator character}
743 --ifs {field-separator character}
744 --ips {pair-separator character}
745 --repifs
746 --repips
747 --mmap
748 --no-mmap
749 Please use "mlr --usage-separator-options" for information on specifying separators.
750 Please see http://johnkerl.org/miller/doc/reference.html for more information
751 including examples.
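The unsorted-input mode described above (left file held in memory, right records streamed) can be sketched in Python as follows. This is a simplified model, not Miller's code: the function name join_unsorted is hypothetical, only a shared -j field list is handled (no -l, -r, --lp, or --rp), and the field order of paired records is an assumption.

```python
def join_unsorted(left, right, join_fields, np=False, ul=False, ur=False):
    # Load the entire left input into memory, bucketed by join-field values.
    buckets = {}
    left_unkeyed = []
    for rec in left:
        if all(f in rec for f in join_fields):
            key = tuple(rec[f] for f in join_fields)
            buckets.setdefault(key, []).append(rec)
        else:
            left_unkeyed.append(rec)

    paired_keys = set()
    out = []
    # Stream the right input, pairing each record against the left buckets.
    for rec in right:
        has_key = all(f in rec for f in join_fields)
        key = tuple(rec[f] for f in join_fields) if has_key else None
        if has_key and key in buckets:
            paired_keys.add(key)
            if not np:
                out.extend({**lrec, **rec} for lrec in buckets[key])
        elif ur:
            out.append(rec)

    # Left records that never found a partner are known only at end of input.
    if ul:
        out.extend(left_unkeyed)
        for key, recs in buckets.items():
            if key not in paired_keys:
                out.extend(recs)
    return out

left = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
right = [{"id": "1", "balance": "10"}, {"id": "3", "balance": "7"}]
paired = join_unsorted(left, right, ["id"])
unpaired = join_unsorted(left, right, ["id"], np=True, ul=True, ur=True)
```

Because every left record must be available when a right record arrives, memory use grows with the size of the left file, which is why -s exists for very large left inputs.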
752
753 label
754 Usage: mlr label {new1,new2,new3,...}
755 Given n comma-separated names, renames the first n fields of each record to
756 have the respective name. (Fields past the nth are left with their original
757 names.) Particularly useful with --inidx or --implicit-csv-header, to give
758 useful names to otherwise integer-indexed fields.
759 Examples:
760 "echo 'a b c d' | mlr --inidx --odkvp cat" gives "1=a,2=b,3=c,4=d"
761 "echo 'a b c d' | mlr --inidx --odkvp label s,t" gives "s=a,t=b,3=c,4=d"
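A minimal Python sketch of the renaming rule, using an insertion-ordered dict as the record. The function name label is chosen for illustration, and collisions between a new name and an existing later field are not modeled.

```python
def label(record, new_names):
    # Rename the first len(new_names) fields; later fields keep their names.
    out = {}
    for i, (key, value) in enumerate(record.items()):
        out[new_names[i] if i < len(new_names) else key] = value
    return out

record = {"1": "a", "2": "b", "3": "c", "4": "d"}
labeled = label(record, ["s", "t"])
```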
762
763 least-frequent
764 Usage: mlr least-frequent [options]
765 Shows the least frequently occurring distinct values for specified field names.
766 The first entry is the statistical anti-mode; the remaining are runners-up.
767 Options:
768 -f {one or more comma-separated field names}. Required flag.
769 -n {count}. Optional flag defaulting to 10.
770 -b Suppress counts; show only field values.
771 -o {name} Field name for output count. Default "count".
772 See also "mlr most-frequent".
773
774 merge-fields
775 Usage: mlr merge-fields [options]
776 Computes univariate statistics for each input record, accumulated across
777 specified fields.
778 Options:
779 -a {sum,count,...} Names of accumulators. One or more of:
780 count Count instances of fields
781 mode Find most-frequently-occurring values for fields; first-found wins tie
782 antimode Find least-frequently-occurring values for fields; first-found wins tie
783 sum Compute sums of specified fields
784 mean Compute averages (sample means) of specified fields
785 stddev Compute sample standard deviation of specified fields
786 var Compute sample variance of specified fields
787 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
788 skewness Compute sample skewness of specified fields
789 kurtosis Compute sample kurtosis of specified fields
790 min Compute minimum values of specified fields
791 max Compute maximum values of specified fields
792 -f {a,b,c} Value-field names on which to compute statistics. Requires -o.
793 -r {a,b,c} Regular expressions for value-field names on which to compute
794 statistics. Requires -o.
795 -c {a,b,c} Substrings for collapse mode. All fields which have the same names
796 after removing substrings will be accumulated together. Please see
797 examples below.
798 -i Use interpolated percentiles, like R's type=7; default like type=1.
799 Not sensical for string-valued fields.
800 -o {name} Output field basename for -f/-r.
801 -k Keep the input fields which contributed to the output statistics;
802 the default is to omit them.
803 -F Computes integerable things (e.g. count) in floating point.
804
805 String-valued data make sense unless arithmetic on them is required,
806 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
807 numbers are less than strings.
808
809 Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
810 Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
811 produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
812 summed over.
813 Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
814 produces "bar_sum=15,bar_count=4" since all four fields are summed over.
815 Example: mlr merge-fields -a sum,count -c in_,out_
816 produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
817 since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
818 "b_y", and "b_out_x" collapses to "b_x".
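The collapse rule in the last example can be modeled in Python as below. This is a sketch under stated assumptions, not Miller's implementation: the function name is hypothetical, only the sum and count accumulators are covered, values are assumed to be integers, and fields containing none of the substrings are assumed to pass through unchanged.

```python
def merge_fields_collapse(record, substrings):
    passthrough, sums, counts = {}, {}, {}
    for name, value in record.items():
        # Find the first collapse substring that occurs in this field name.
        hit = next((s for s in substrings if s in name), None)
        if hit is None:
            passthrough[name] = value
            continue
        short = name.replace(hit, "", 1)  # "a_in_x" -> "a_x"
        sums[short] = sums.get(short, 0) + int(value)
        counts[short] = counts.get(short, 0) + 1
    out = dict(passthrough)
    for short in sums:
        out[short + "_sum"] = sums[short]
        out[short + "_count"] = counts[short]
    return out

record = {"a_in_x": "1", "a_out_x": "2", "b_in_y": "4", "b_out_x": "8"}
merged = merge_fields_collapse(record, ["in_", "out_"])
line = ",".join(f"{k}={v}" for k, v in merged.items())
```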
819
820 most-frequent
821 Usage: mlr most-frequent [options]
822 Shows the most frequently occurring distinct values for specified field names.
823 The first entry is the statistical mode; the remaining are runners-up.
824 Options:
825 -f {one or more comma-separated field names}. Required flag.
826 -n {count}. Optional flag defaulting to 10.
827 -b Suppress counts; show only field values.
828 -o {name} Field name for output count. Default "count".
829 See also "mlr least-frequent".
830
831 nest
832 Usage: mlr nest [options]
833 Explodes specified field values into separate fields/records, or reverses this.
834 Options:
835 --explode,--implode One is required.
836 --values,--pairs One is required.
837 --across-records,--across-fields One is required.
838 -f {field name} Required.
839 --nested-fs {string} Defaults to ";". Field separator for nested values.
840 --nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
841 --evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
842 --ivar {string} Shorthand for --implode --values --across-records --nested-fs {string}
843 Please use "mlr --usage-separator-options" for information on specifying separators.
844
845 Examples:
846
847 mlr nest --explode --values --across-records -f x
848 with input record "x=a;b;c,y=d" produces output records
849 "x=a,y=d"
850 "x=b,y=d"
851 "x=c,y=d"
852 Use --implode to do the reverse.
853
854 mlr nest --explode --values --across-fields -f x
855 with input record "x=a;b;c,y=d" produces output records
856 "x_1=a,x_2=b,x_3=c,y=d"
857 Use --implode to do the reverse.
858
859 mlr nest --explode --pairs --across-records -f x
860 with input record "x=a:1;b:2;c:3,y=d" produces output records
861 "a=1,y=d"
862 "b=2,y=d"
863 "c=3,y=d"
864
865 mlr nest --explode --pairs --across-fields -f x
866 with input record "x=a:1;b:2;c:3,y=d" produces output records
867 "a=1,b=2,c=3,y=d"
868
869 Notes:
870 * With --pairs, --implode doesn't make sense since the original field name has
871 been lost.
872 * The combination "--implode --values --across-records" is non-streaming:
873 no output records are produced until all input records have been read. In
874 particular, this means it won't work in tail -f contexts. But all other flag
875 combinations result in streaming (tail -f friendly) data processing.
876 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
877 e.g. by default the former is semicolon and the latter is comma.
878 See also mlr reshape.
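Three of the explode modes shown in the examples can be modeled in Python as follows. The function names are hypothetical and the code is an illustration of the documented input/output pairs, not Miller's implementation; --implode is not covered.

```python
def explode_values_across_records(record, field, nested_fs=";"):
    # One output record per nested value; the other fields are copied.
    return [{**record, field: piece} for piece in record[field].split(nested_fs)]

def explode_values_across_fields(record, field, nested_fs=";"):
    # One output record; nested values become field_1, field_2, ...
    out = {}
    for key, value in record.items():
        if key == field:
            for i, piece in enumerate(value.split(nested_fs), start=1):
                out[f"{key}_{i}"] = piece
        else:
            out[key] = value
    return out

def explode_pairs_across_records(record, field, nested_fs=";", nested_ps=":"):
    # One output record per nested pair; the pair's key replaces the field name.
    out = []
    for pair in record[field].split(nested_fs):
        name, _, value = pair.partition(nested_ps)
        out.append({(name if k == field else k): (value if k == field else v)
                    for k, v in record.items()})
    return out

def dkvp(record):
    return ",".join(f"{k}={v}" for k, v in record.items())

values_rec = {"x": "a;b;c", "y": "d"}
pairs_rec = {"x": "a:1;b:2;c:3", "y": "d"}
across_records = [dkvp(r) for r in explode_values_across_records(values_rec, "x")]
across_fields = dkvp(explode_values_across_fields(values_rec, "x"))
pairs_records = [dkvp(r) for r in explode_pairs_across_records(pairs_rec, "x")]
```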
879
880 nothing
881 Usage: mlr nothing [options]
882 Drops all input records. Useful for testing, or after tee/print/etc. have
883 produced other output.
884
885 put
886 Usage: mlr put [options] {expression}
887 Adds/updates specified field(s). Expressions are semicolon-separated and must
888 either be assignments, or evaluate to boolean. Booleans with following
889 statements in curly braces control whether those statements are executed;
890 booleans without following curly braces do nothing except side effects (e.g.
891 regex-captures into \1, \2, etc.).
892
893 Conversion options:
894 -S: Keeps field values as strings with no type inference to int or float.
895 -F: Keeps field values as strings or floats with no inference to int.
896 All field values are type-inferred to int/float/string unless this behavior is
897 suppressed with -S or -F.
898
899 Output/formatting options:
900 --oflatsep {string}: Separator to use when flattening multi-level @-variables
901 to output records for emit. Default ":".
902 --jknquoteint: For dump output (JSON-formatted), do not quote map keys if non-string.
903 --jvquoteall: For dump output (JSON-formatted), quote map values even if non-string.
904 Any of the output-format command-line flags (see mlr -h). Example: using
905 mlr --icsv --opprint ... then put --ojson 'tee > "mytap-".$a.".dat", $*' then ...
906 the input is CSV, the output is pretty-print tabular, but the tee-file output
907 is written in JSON format.
908 --no-fflush: for emit, tee, print, and dump, don't call fflush() after every
909 record.
910
911 Expression-specification options:
912 -f {filename}: the DSL expression is taken from the specified file rather
913 than from the command line. Outer single quotes wrapping the expression
914 should not be placed in the file. If -f is specified more than once,
915 all input files specified using -f are concatenated to produce the expression.
916 (For example, you can define functions in one file and call them from another.)
917 -e {expression}: You can use this after -f to add an expression. Example use
918 case: define functions/subroutines in a file you specify with -f, then call
919 them with an expression you specify with -e.
920 (If you mix -e and -f then the expressions are evaluated in the order encountered.
921 Since the expression pieces are simply concatenated, please be sure to use intervening
922 semicolons to separate expressions.)
923
924 Tracing options:
925 -v: Prints the expression's AST (abstract syntax tree), which gives
926 full transparency on the precedence and associativity rules of
927 Miller's grammar, to stdout.
928 -a: Prints a low-level stack-allocation trace to stdout.
929 -t: Prints a low-level parser trace to stderr.
930 -T: Prints every statement to stderr as it is executed.
931
932 Other options:
933 -q: Does not include the modified record in the output stream. Useful for when
934 all desired output is in begin and/or end blocks.
935
936 Please use a dollar sign for field names and double-quotes for string
937 literals. If field names have special characters such as "." then you might
938 use braces, e.g. '${field.name}'. Miller built-in variables are
939 NF NR FNR FILENUM FILENAME M_PI M_E, and ENV["namegoeshere"] to access environment
940 variables. The environment-variable name may be an expression, e.g. a field
941 value.
942
943 Use # to comment to end of line.
944
945 Examples:
946 mlr put '$y = log10($x); $z = sqrt($y)'
947 mlr put '$x>0.0 { $y=log10($x); $z=sqrt($y) }' # does {...} only if $x > 0.0
948 mlr put '$x>0.0; $y=log10($x); $z=sqrt($y)' # does all three statements
949 mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"'
950 mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }'
951 mlr put '$filename = FILENAME'
952 mlr put '$colored_shape = $color . "_" . $shape'
953 mlr put '$y = cos($theta); $z = atan2($y, $x)'
954 mlr put '$name = sub($name, "http.*com"i, "")'
955 mlr put -q '@sum += $x; end {emit @sum}'
956 mlr put -q '@sum[$a] += $x; end {emit @sum, "a"}'
957 mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}'
958 mlr put -q '@min=min(@min,$x);@max=max(@max,$x); end{emitf @min, @max}'
959 mlr put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}'
960 mlr put '
961 $x = 1;
962 #$y = 2;
963 $z = 3
964 '
965
966 Please see also 'mlr -k' for examples using redirected output.
967
968 Please see http://johnkerl.org/miller/doc/reference.html for more information
969 including function list. Or "mlr -f".
970 Please see in particular:
971 http://www.johnkerl.org/miller/doc/reference.html#put
972
973 regularize
974 Usage: mlr regularize
975 For records seen earlier in the data stream with same field names in
976 a different order, outputs them with field names in the previously
977 encountered order.
978 Example: input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
979 output as a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
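The example above can be reproduced with a short Python model. It is a sketch of the documented behavior, not Miller's code; the function name regularize is used here only for illustration. Records are keyed by their sorted field names, and the first ordering seen for each key is reused.

```python
def regularize(records):
    first_seen = {}  # sorted field names -> order in which they first appeared
    for rec in records:
        key = tuple(sorted(rec))
        order = first_seen.setdefault(key, list(rec))
        yield {name: rec[name] for name in order}

records = [
    {"a": "1", "c": "2", "b": "3"},
    {"e": "4", "d": "5"},
    {"c": "7", "a": "6", "b": "8"},
]
lines = [",".join(f"{k}={v}" for k, v in r.items()) for r in regularize(records)]
```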
980
981 remove-empty-columns
982 Usage: mlr remove-empty-columns
983 Omits fields which are empty on every input row. Non-streaming.
984
985 rename
986 Usage: mlr rename [options] {old1,new1,old2,new2,...}
987 Renames specified fields.
988 Options:
989 -r Treat old field names as regular expressions. "ab", "a.*b"
990 will match any field name containing the substring "ab" or
991 matching "a.*b", respectively; anchors of the form "^ab$",
992 "^a.*b$" may be used. New field names may be plain strings,
993 or may contain capture groups of the form "\1" through
994 "\9". Wrapping the regex in double quotes is optional, but
995 is required if you wish to follow it with 'i' to indicate
996 case-insensitivity.
997 -g Do global replacement within each field name rather than
998 first-match replacement.
999 Examples:
1000 mlr rename old_name,new_name
1001 mlr rename old_name_1,new_name_1,old_name_2,new_name_2
1002 mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date"
1003 mlr rename -r '"Date_[0-9]+",Date' Same
1004 mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
1005 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
1006
1007 reorder
1008 Usage: mlr reorder [options]
1009 -f {a,b,c} Field names to reorder.
1010 -e Put specified field names at record end: default is to put
1011 them at record start.
1012 Examples:
1013 mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
1014 mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
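A Python model that reproduces both examples above. It follows the documented outputs only; the function name reorder and its at_end parameter are illustrative stand-ins for the verb and its -e flag.

```python
def reorder(record, names, at_end=False):
    # Pull out the named fields (in the order given), keep the rest in place.
    moved = {n: record[n] for n in names if n in record}
    rest = {k: v for k, v in record.items() if k not in moved}
    return {**rest, **moved} if at_end else {**moved, **rest}

def dkvp(record):
    return ",".join(f"{k}={v}" for k, v in record.items())

record = {"d": "4", "b": "2", "a": "1", "c": "3"}
at_start = dkvp(reorder(record, ["a", "b"]))
at_finish = dkvp(reorder(record, ["a", "b"], at_end=True))
```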
1015
1016 repeat
1017 Usage: mlr repeat [options]
1018 Copies input records to output records multiple times.
1019 Options must be exactly one of the following:
1020 -n {repeat count} Repeat each input record this many times.
1021 -f {field name} Same, but take the repeat count from the specified
1022 field name of each input record.
1023 Example:
1024 echo x=0 | mlr repeat -n 4 then put '$x=urand()'
1025 produces:
1026 x=0.488189
1027 x=0.484973
1028 x=0.704983
1029 x=0.147311
1030 Example:
1031 echo a=1,b=2,c=3 | mlr repeat -f b
1032 produces:
1033 a=1,b=2,c=3
1034 a=1,b=2,c=3
1035 Example:
1036 echo a=1,b=2,c=3 | mlr repeat -f c
1037 produces:
1038 a=1,b=2,c=3
1039 a=1,b=2,c=3
1040 a=1,b=2,c=3
1041
1042 reshape
1043 Usage: mlr reshape [options]
1044 Wide-to-long options:
1045 -i {input field names} -o {key-field name,value-field name}
1046 -r {input field regexes} -o {key-field name,value-field name}
1047 These pivot/reshape the input data such that the input fields are removed
1048 and separate records are emitted for each key/value pair.
1049 Note: this works with tail -f and produces output records for each input
1050 record seen.
1051 Long-to-wide options:
1052 -s {key-field name,value-field name}
1053 These pivot/reshape the input data to undo the wide-to-long operation.
1054 Note: this does not work with tail -f; it produces output records only after
1055 all input records have been read.
1056
1057 Examples:
1058
1059 Input file "wide.txt":
1060 time X Y
1061 2009-01-01 0.65473572 2.4520609
1062 2009-01-02 -0.89248112 0.2154713
1063 2009-01-03 0.98012375 1.3179287
1064
1065 mlr --pprint reshape -i X,Y -o item,value wide.txt
1066 time item value
1067 2009-01-01 X 0.65473572
1068 2009-01-01 Y 2.4520609
1069 2009-01-02 X -0.89248112
1070 2009-01-02 Y 0.2154713
1071 2009-01-03 X 0.98012375
1072 2009-01-03 Y 1.3179287
1073
1074 mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
1075 time item value
1076 2009-01-01 X 0.65473572
1077 2009-01-01 Y 2.4520609
1078 2009-01-02 X -0.89248112
1079 2009-01-02 Y 0.2154713
1080 2009-01-03 X 0.98012375
1081 2009-01-03 Y 1.3179287
1082
1083 Input file "long.txt":
1084 time item value
1085 2009-01-01 X 0.65473572
1086 2009-01-01 Y 2.4520609
1087 2009-01-02 X -0.89248112
1088 2009-01-02 Y 0.2154713
1089 2009-01-03 X 0.98012375
1090 2009-01-03 Y 1.3179287
1091
1092 mlr --pprint reshape -s item,value long.txt
1093 time X Y
1094 2009-01-01 0.65473572 2.4520609
1095 2009-01-02 -0.89248112 0.2154713
1096 2009-01-03 0.98012375 1.3179287
1097 See also mlr nest.
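The two pivots can be modeled in Python as below, using the wide.txt data from the examples. This is an illustration, not Miller's implementation: the function names are hypothetical, and records lacking the input fields are not modeled. The sketch also shows why long-to-wide cannot stream: a wide record is complete only once all of its long records have been seen.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    # Streaming: each input record yields one output record per input field.
    for rec in records:
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for field in input_fields:
            if field in rec:
                yield {**others, key_name: field, value_name: rec[field]}

def long_to_wide(records, key_name, value_name):
    # Non-streaming: group by the remaining fields, then widen each group.
    groups = {}
    for rec in records:
        others = tuple((k, v) for k, v in rec.items()
                       if k not in (key_name, value_name))
        groups.setdefault(others, {})[rec[key_name]] = rec[value_name]
    return [{**dict(others), **pairs} for others, pairs in groups.items()]

wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
    {"time": "2009-01-03", "X": "0.98012375", "Y": "1.3179287"},
]
long_form = list(wide_to_long(wide, ["X", "Y"], "item", "value"))
round_trip = long_to_wide(long_form, "item", "value")
```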
1098
1099 sample
1100 Usage: mlr sample [options]
1101 Reservoir sampling (subsampling without replacement), optionally by category.
1102 -k {count} Required: number of records to output, total, or by group if using -g.
1103 -g {a,b,c} Optional: group-by-field names for samples.
1104 See also mlr bootstrap and mlr shuffle.
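Reservoir sampling keeps a fixed-size sample from a stream of unknown length. The sketch below uses Algorithm R, one standard variant; whether Miller uses exactly this variant is an assumption, and the function name is hypothetical. Grouping with -g would amount to keeping one such reservoir per group.

```python
import random

def reservoir_sample(records, k, rng=random):
    reservoir = []
    for n, rec in enumerate(records, start=1):
        if n <= k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Keep the n-th record with probability k/n, evicting a random slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = rec
    return reservoir

sample = reservoir_sample(range(100), 5, random.Random(0))
short_input = reservoir_sample(range(3), 5)
```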
1105
1106 sec2gmt
1107 Usage: mlr sec2gmt [options] {comma-separated list of field names}
1108 Replaces a numeric field representing seconds since the epoch with the
1109 corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
1110 more than a keystroke-saver for the sec2gmt function:
1111 mlr sec2gmt time1,time2
1112 is the same as
1113 mlr put '$time1=sec2gmt($time1);$time2=sec2gmt($time2)'
1114 Options:
1115 -1 through -9: format the seconds using 1..9 decimal places, respectively.
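The conversion can be modeled in Python as below. This is a sketch, not Miller's code: the -1 through -9 options are not modeled, and fractional seconds are simply truncated here, which is an assumption.

```python
from datetime import datetime, timezone

def sec2gmt(value):
    # Non-numeric input is returned unchanged, as the verb description states.
    try:
        seconds = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return value
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")

record = {"time1": "0", "time2": "1500000000", "note": "abc"}
converted = {k: sec2gmt(v) for k, v in record.items()}
```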
1116
1117 sec2gmtdate
1118 Usage: mlr sec2gmtdate {comma-separated list of field names}
1119 Replaces a numeric field representing seconds since the epoch with the
1120 corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
1121 This is nothing more than a keystroke-saver for the sec2gmtdate function:
1122 mlr sec2gmtdate time1,time2
1123 is the same as
1124 mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
1125
1126 seqgen
1127 Usage: mlr seqgen [options]
1128 Produces a sequence of counters. Discards the input record stream. Produces
1129 output as specified by the following options:
1130 -f {name} Field name for counters; default "i".
1131 --start {number} Inclusive start value; default "1".
1132 --stop {number} Inclusive stop value; default "100".
1133 --step {number} Step value; default "1".
1134 Start, stop, and/or step may be floating-point. Output is integer if start,
1135 stop, and step are all integers. Step may be negative. It may not be zero
1136 unless start == stop.
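The counter rules above can be sketched as a Python generator. The function name and parameter names mirror the options for readability only. Emitting a single record when step is zero and start equals stop is an assumption; the text above says only that this case is permitted.

```python
def seqgen(start=1, stop=100, step=1, name="i"):
    if step == 0:
        if start != stop:
            raise ValueError("step may not be zero unless start == stop")
        yield {name: start}  # assumed behavior for the permitted zero-step case
        return
    value = start
    # Both endpoints are inclusive; direction depends on the sign of step.
    while (value <= stop) if step > 0 else (value >= stop):
        yield {name: value}
        value += step

ascending = [r["i"] for r in seqgen(1, 5, 2)]
descending = [r["i"] for r in seqgen(5, 1, -2)]
```

With floating-point steps, repeated addition accumulates rounding error, so the final value may fall just short of or just past the stop value.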
1137
1138 shuffle
1139 Usage: mlr shuffle {no options}
1140 Outputs records randomly permuted. No output records are produced until
1141 all input records are read.
1142 See also mlr bootstrap and mlr sample.
1143
1144 skip-trivial-records
1145 Usage: mlr skip-trivial-records [options]
1146 Passes through all records except:
1147 * those with zero fields;
1148 * those for which all fields have empty value.
1149
1150 sort
1151 Usage: mlr sort {flags}
1152 Flags:
1153 -f {comma-separated field names} Lexical ascending
1154 -n {comma-separated field names} Numerical ascending; nulls sort last
1155 -nf {comma-separated field names} Same as -n
1156 -r {comma-separated field names} Lexical descending
1157 -nr {comma-separated field names} Numerical descending; nulls sort first
1158 Sorts records primarily by the first specified field, secondarily by the second
1159 field, and so on. (Any records not having all specified sort keys will appear
1160 at the end of the output, in the order they were encountered, regardless of the
1161 specified sort order.) The sort is stable: records that compare equal will sort
1162 in the order they were encountered in the input record stream.
1163
1164 Example:
1165 mlr sort -f a,b -nr x,y,z
1166 which is the same as:
1167 mlr sort -f a -f b -nr x -nr y -nr z
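Two of the rules above, stability and the placement of records that lack a sort key, can be illustrated in Python. The sketch covers lexical sorting only (-f and -r); mixing lexical and numerical flags is not modeled, and the function name is hypothetical.

```python
def sort_lexical(records, fields, descending=False):
    # Records missing any sort key go to the end, in encounter order.
    keyed = [r for r in records if all(f in r for f in fields)]
    unkeyed = [r for r in records if not all(f in r for f in fields)]
    # list.sort is stable, so ties keep their input order.
    keyed.sort(key=lambda r: tuple(r[f] for f in fields), reverse=descending)
    return keyed + unkeyed

records = [
    {"a": "pan", "x": "1"},
    {"x": "2"},
    {"a": "eks", "x": "3"},
    {"a": "pan", "x": "4"},
]
ascending_x = [r["x"] for r in sort_lexical(records, ["a"])]
descending_x = [r["x"] for r in sort_lexical(records, ["a"], descending=True)]
```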
1168
1169 stats1
1170 Usage: mlr stats1 [options]
1171 Computes univariate statistics for one or more given fields, accumulated across
1172 the input record stream.
1173 Options:
1174 -a {sum,count,...} Names of accumulators: p10 p25.2 p50 p98 p100 etc. and/or
1175 one or more of:
1176 count Count instances of fields
1177 mode Find most-frequently-occurring values for fields; first-found wins tie
1178 antimode Find least-frequently-occurring values for fields; first-found wins tie
1179 sum Compute sums of specified fields
1180 mean Compute averages (sample means) of specified fields
1181 stddev Compute sample standard deviation of specified fields
1182 var Compute sample variance of specified fields
1183 meaneb Estimate error bars for averages (assuming no sample autocorrelation)
1184 skewness Compute sample skewness of specified fields
1185 kurtosis Compute sample kurtosis of specified fields
1186 min Compute minimum values of specified fields
1187 max Compute maximum values of specified fields
1188 -f {a,b,c} Value-field names on which to compute statistics
1189 --fr {regex} Regex for value-field names on which to compute statistics
1190 (compute statistics on values in all field names matching regex)
1191 --fx {regex} Inverted regex for value-field names on which to compute statistics
1192 (compute statistics on values in all field names not matching regex)
1193 -g {d,e,f} Optional group-by-field names
1194 --gr {regex} Regex for optional group-by-field names
1195 (group by values in field names matching regex)
1196 --gx {regex} Inverted regex for optional group-by-field names
1197 (group by values in field names not matching regex)
1198 --grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
1199 -i Use interpolated percentiles, like R's type=7; default like type=1.
1200 Not sensical for string-valued fields.
1201 -s Print iterative stats. Useful in tail -f contexts (in which
1202 case please avoid pprint-format output since end of input
1203 stream will never be seen).
1204 -F Computes integerable things (e.g. count) in floating point.
1205 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
1206 Example: mlr stats1 -a count,mode -f size
1207 Example: mlr stats1 -a count,mode -f size -g shape
1208 Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
1209 This computes count and mode statistics on all field names beginning
1210 with a through h, grouped by all field names starting with k.
1211 Notes:
1212 * p50 and median are synonymous.
1213 * min and max output the same results as p0 and p100, respectively, but use
1214 less memory.
1215 * String-valued data make sense unless arithmetic on them is required,
1216 e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
1217 numbers are less than strings.
1218 * count and mode allow text input; the rest require numeric input.
1219 In particular, 1 and 1.0 are distinct text for count and mode.
1220 * When there are mode ties, the first-encountered datum wins.
1221
1222 stats2
1223 Usage: mlr stats2 [options]
1224 Computes bivariate statistics for one or more given field-name pairs,
1225 accumulated across the input record stream.
1226 -a {linreg-ols,corr,...} Names of accumulators: one or more of:
1227 linreg-pca Linear regression using principal component analysis
1228 linreg-ols Linear regression using ordinary least squares
1229 r2 Quality metric for linreg-ols (linreg-pca emits its own)
1230 logireg Logistic regression
1231 corr Sample correlation
1232 cov Sample covariance
1233 covx Sample-covariance matrix
1234 -f {a,b,c,d} Value-field name-pairs on which to compute statistics.
1235 There must be an even number of names.
1236 -g {e,f,g} Optional group-by-field names.
1237 -v Print additional output for linreg-pca.
1238 -s Print iterative stats. Useful in tail -f contexts (in which
1239 case please avoid pprint-format output since end of input
1240 stream will never be seen).
1241 --fit Rather than printing regression parameters, applies them to
1242 the input data to compute new fit fields. All input records are
1243 held in memory until end of input stream. Has effect only for
1244 linreg-ols, linreg-pca, and logireg.
1245 Only one of -s or --fit may be used.
1246 Example: mlr stats2 -a linreg-pca -f x,y
1247 Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
1248 Example: mlr stats2 -a corr -f x,y
1249
1250 step
1251 Usage: mlr step [options]
1252 Computes values dependent on the previous record, optionally grouped
1253 by category.
1254
1255 Options:
1256 -a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
1257 delta Compute differences in field(s) between successive records
1258 shift Include value(s) in field(s) from previous record, if any
1259 from-first Compute differences in field(s) from first record
1260 ratio Compute ratios in field(s) between successive records
1261 rsum Compute running sums of field(s) between successive records
1262 counter Count instances of field(s) between successive records
1263 ewma Exponentially weighted moving average over successive records
1264 -f {a,b,c} Value-field names on which to compute statistics
1265 -g {d,e,f} Optional group-by-field names
1266 -F Computes integerable things (e.g. counter) in floating point.
1267 -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no
1268 smoothing), slightly under 1 is light smoothing, slightly over 0 is
1269 heavy smoothing. Multiple weights may be specified, e.g.
1270 "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
1271 is "-d 0.5".
1272 -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
1273 the -d values. If supplied, the number of -o values must be the same
1274 as the number of -d values.
1275
1276 Examples:
1277 mlr step -a rsum -f request_size
1278 mlr step -a delta -f request_size -g hostname
1279 mlr step -a ewma -d 0.1,0.9 -f x,y
1280 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
1281 mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
1282
1283 Please see http://johnkerl.org/miller/doc/reference.html#filter or
1284 https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
1285 for more information on EWMA.
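The role of the -d weight can be seen in a short Python sketch of an exponentially weighted moving average. Seeding the average with the first sample is a common convention and an assumption here, not something this page states; the function name is hypothetical.

```python
def ewma(values, alpha):
    # alpha is the weight given to the current sample.
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

half = ewma([1.0, 2.0, 3.0], 0.5)
unsmoothed = ewma([1.0, 2.0, 3.0], 1.0)  # weight 1: output equals input
```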
1286
1287 tac
1288 Usage: mlr tac
1289 Prints records in reverse order from the order in which they were encountered.
1290
1291 tail
1292 Usage: mlr tail [options]
1293 -n {count} Tail count to print; default 10
1294 -g {a,b,c} Optional group-by-field names for tail counts
1295 Passes through the last n records, optionally by category.
1296
1297 tee
1298 Usage: mlr tee [options] {filename}
1299 Passes through input records (like mlr cat) but also writes to specified output
1300 file, using output-format flags from the command line (e.g. --ocsv). See also
1301 the "tee" keyword within mlr put, which allows data-dependent filenames.
1302 Options:
1303 -a: append to existing file, if any, rather than overwriting.
1304 --no-fflush: don't call fflush() after every record.
1305 Any of the output-format command-line flags (see mlr -h). Example: using
1306 mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
1307 the input is CSV, the output is pretty-print tabular, but the tee-file output
1308 is written in JSON format.
1309
1310 top
1311 Usage: mlr top [options]
1312 -f {a,b,c} Value-field names for top counts.
1313 -g {d,e,f} Optional group-by-field names for top counts.
1314 -n {count} How many records to print per category; default 1.
1315 -a Print all fields for top-value records; default is
1316 to print only value and group-by fields. Requires a single
1317 value-field name only.
1318 --min Print top smallest values; default is top largest values.
1319 -F Keep top values as floats even if they look like integers.
1320 -o {name} Field name for output indices. Default "top_idx".
1321 Prints the n records with smallest/largest values at specified fields,
1322 optionally by category.
1323
1324 uniq
1325 Usage: mlr uniq [options]
1326 Prints distinct values for specified field names. With -c, same as
1327 count-distinct. For uniq, -f is a synonym for -g.
1328
1329 Options:
1330 -g {d,e,f} Group-by-field names for uniq counts.
1331 -c Show repeat counts in addition to unique values.
1332 -n Show only the number of distinct values.
1333 -o {name} Field name for output count. Default "count".
1334 -a Output each unique record only once. Incompatible with -g.
1335 With -c, produces unique records, with repeat counts for each.
1336 With -n, produces only one record which is the unique-record count.
1337 With neither -c nor -n, produces unique records.
1338
1339 unsparsify
1340 Usage: mlr unsparsify [options]
1341 Prints records with the union of field names over all input records.
1342 For field names absent in a given record but present in others, fills in
1343 a value. This verb retains all input before producing any output.
1344
1345 Options:
1346 --fill-with {filler string} What to fill absent fields with. Defaults to
1347 the empty string.
1348
1349 Example: if the input is two records, one being 'a=1,b=2' and the other
1350 being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
1351 'a=,b=3,c=4'.
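A Python model of the example above. It is a sketch rather than Miller's implementation; ordering the output fields by first appearance across the input is an assumption that happens to match the example. All records are collected first because the full set of field names is known only at end of input.

```python
def unsparsify(records, fill_with=""):
    records = list(records)  # must retain all input before producing output
    names = {}
    for rec in records:
        for name in rec:
            names.setdefault(name, None)  # dict keeps first-seen order
    return [{n: rec.get(n, fill_with) for n in names} for rec in records]

filled = unsparsify([{"a": "1", "b": "2"}, {"b": "3", "c": "4"}])
lines = [",".join(f"{k}={v}" for k, v in r.items()) for r in filled]
```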
1352
1354 +
1355 (class=arithmetic #args=2): Addition.
1356
1357 + (class=arithmetic #args=1): Unary plus.
1358
1359 -
1360 (class=arithmetic #args=2): Subtraction.
1361
1362 - (class=arithmetic #args=1): Unary minus.
1363
1364 *
1365 (class=arithmetic #args=2): Multiplication.
1366
1367 /
1368 (class=arithmetic #args=2): Division.
1369
1370 //
1371 (class=arithmetic #args=2): Integer division: rounds to negative (pythonic).
1372
1373 .+
1374 (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
1375
1376 .+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
1377
1378 .-
1379 (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
1380
1381 .- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
1382
1383 .*
1384 (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
1385
1386 ./
1387 (class=arithmetic #args=2): Division, with integer-to-integer overflow.
1388
1389 .//
1390 (class=arithmetic #args=2): Integer division: rounds to negative (pythonic), with integer-to-integer overflow.
1391
1392 %
1393 (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
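The "pythonic" conventions named in the // and % entries can be checked directly in Python, whose own operators follow the same rule for integers. A C-style truncating quotient is included for contrast. The variable names are illustrative only.

```python
import math

floor_quotient = -7 // 2             # rounds toward negative infinity
trunc_quotient = math.trunc(-7 / 2)  # C-style: rounds toward zero
remainder = -7 % 5                   # non-negative for a positive divisor

# The two operators are consistent with each other:
identity_holds = (-7 // 5) * 5 + (-7 % 5) == -7
```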
1394
1395 **
1396 (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
1397 operator.
1398
1399 |
1400 (class=arithmetic #args=2): Bitwise OR.
1401
1402 ^
1403 (class=arithmetic #args=2): Bitwise XOR.
1404
1405 &
1406 (class=arithmetic #args=2): Bitwise AND.
1407
1408 ~
1409 (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
1410 regex-match operator: try '$y = ~$x'.
1411
1412 <<
1413 (class=arithmetic #args=2): Bitwise left-shift.
1414
1415 >>
1416 (class=arithmetic #args=2): Bitwise right-shift.
1417
1418 bitcount
1419 (class=arithmetic #args=1): Count of 1-bits.
1420
1421 ==
1422 (class=boolean #args=2): String/numeric equality. Mixing number and string
1423 results in string compare.
1424
1425 !=
1426 (class=boolean #args=2): String/numeric inequality. Mixing number and string
1427 results in string compare.
1428
1429 =~
1430 (class=boolean #args=2): String (left-hand side) matches regex (right-hand
1431 side), e.g. '$name =~ "^a.*b$"'.
1432
1433 !=~
1434 (class=boolean #args=2): String (left-hand side) does not match regex
1435 (right-hand side), e.g. '$name !=~ "^a.*b$"'.
1436
1437 >
1438 (class=boolean #args=2): String/numeric greater-than. Mixing number and string
1439 results in string compare.
1440
1441 >=
1442 (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
1443 and string results in string compare.
1444
1445 <
1446 (class=boolean #args=2): String/numeric less-than. Mixing number and string
1447 results in string compare.
1448
1449 <=
1450 (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
1451 and string results in string compare.
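
The rule "mixing number and string results in string compare" can be sketched as follows. This is a Python model only: the helper names are hypothetical, and rendering the number with str() before comparing is an assumption about formatting.

```python
def is_number(value):
    # bool is excluded because Python treats it as a kind of int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def less_than(a, b):
    """Numeric compare when both sides are numbers, else string compare."""
    if is_number(a) and is_number(b):
        return a < b
    # Assumption: the numeric side is rendered as text before comparing.
    return str(a) < str(b)
```

Under this model 9 < 10 is true numerically, while 9 < "10" is false because "9" sorts after "1" as text.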
1452
1453 &&
1454 (class=boolean #args=2): Logical AND.
1455
1456 ||
1457 (class=boolean #args=2): Logical OR.
1458
1459 ^^
1460 (class=boolean #args=2): Logical XOR.
1461
1462 !
1463 (class=boolean #args=1): Logical negation.
1464
1465 ? :
1466 (class=boolean #args=3): Ternary operator.
1467
1468 .
1469 (class=string #args=2): String concatenation.
1470
1471 gsub
1472 (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
1473 (replace all).
1474
1475 regextract
1476 (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'.
1478
1479 regextract_or_else
1480 (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'.
1482
1483 strlen
1484 (class=string #args=1): String length.
1485
1486 sub
1487 (class=string #args=3): Example: '$name=sub($name, "old", "new")'
1488 (replace once).
1489
1490 ssub
1491 (class=string #args=3): Like sub but does no regexing. No characters are special.
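
The difference between sub, gsub, and ssub can be sketched in Python. This is a model only: Python's regex dialect is not guaranteed to match Miller's, and the function bodies are assumptions based on the one-line descriptions.

```python
import re


def sub(s, regex, replacement):
    # Replace only the first regex match.
    return re.sub(regex, replacement, s, count=1)


def gsub(s, regex, replacement):
    # Replace every regex match.
    return re.sub(regex, replacement, s)


def ssub(s, old, new):
    # Replace the first literal occurrence; no characters are special.
    return s.replace(old, new, 1)
```

With the input "a.b.c", the pattern "." matches any character under sub but only a literal dot under ssub.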
1492
1493 substr
1494 (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
1495 inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
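
The indexing rule can be sketched in Python as below. Returning an empty string for out-of-range requests is an assumption; the description does not say what happens in that case.

```python
def substr(s, m, n):
    """Substring of s from 0-up position m through n, inclusive."""
    length = len(s)
    # Negative indices -len .. -1 alias to 0 .. len-1.
    if m < 0:
        m += length
    if n < 0:
        n += length
    if m < 0 or n >= length or m > n:
        return ""  # assumption: out-of-range requests yield empty
    return s[m:n + 1]
```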
1496
1497 tolower
1498 (class=string #args=1): Convert string to lowercase.
1499
1500 toupper
1501 (class=string #args=1): Convert string to uppercase.
1502
1503 capitalize
1504 (class=string #args=1): Convert string's first character to uppercase.
1505
1506 lstrip
1507 (class=string #args=1): Strip leading whitespace from string.
1508
1509 rstrip
1510 (class=string #args=1): Strip trailing whitespace from string.
1511
1512 strip
1513 (class=string #args=1): Strip leading and trailing whitespace from string.
1514
1515 collapse_whitespace
1516 (class=string #args=1): Strip repeated whitespace from string.
1517
1518 clean_whitespace
1519 (class=string #args=1): Same as collapse_whitespace and strip.
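
A Python sketch of how the two whitespace cleaners relate. It assumes "strip repeated whitespace" means each run of whitespace becomes a single space; the exact character class Miller uses is not stated here.

```python
import re


def collapse_whitespace(s):
    # Assumption: every run of whitespace becomes one space.
    return re.sub(r"\s+", " ", s)


def clean_whitespace(s):
    # Collapse first, then strip the leading and trailing remainder.
    return collapse_whitespace(s).strip()
```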
1520
1521 system
1522 (class=string #args=1): Run command string, yielding its stdout minus final carriage return.
1523
1524 abs
1525 (class=math #args=1): Absolute value.
1526
1527 acos
1528 (class=math #args=1): Inverse trigonometric cosine.
1529
1530 acosh
1531 (class=math #args=1): Inverse hyperbolic cosine.
1532
1533 asin
1534 (class=math #args=1): Inverse trigonometric sine.
1535
1536 asinh
1537 (class=math #args=1): Inverse hyperbolic sine.
1538
1539 atan
1540 (class=math #args=1): One-argument arctangent.
1541
1542 atan2
1543 (class=math #args=2): Two-argument arctangent.
1544
1545 atanh
1546 (class=math #args=1): Inverse hyperbolic tangent.
1547
1548 cbrt
1549 (class=math #args=1): Cube root.
1550
1551 ceil
1552 (class=math #args=1): Ceiling: nearest integer at or above.
1553
1554 cos
1555 (class=math #args=1): Trigonometric cosine.
1556
1557 cosh
1558 (class=math #args=1): Hyperbolic cosine.
1559
1560 erf
1561 (class=math #args=1): Error function.
1562
1563 erfc
1564 (class=math #args=1): Complementary error function.
1565
1566 exp
1567 (class=math #args=1): Exponential function e**x.
1568
1569 expm1
1570 (class=math #args=1): e**x - 1.
1571
1572 floor
1573 (class=math #args=1): Floor: nearest integer at or below.
1574
1575 invqnorm
1576 (class=math #args=1): Inverse of normal cumulative distribution
1577 function. Note that invqnorm(urand()) is normally distributed.
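
The note about feeding a uniform random number through the inverse CDF is inverse-transform sampling. The Python sketch below uses the standard library's NormalDist in place of Miller's own implementation, so the names and the guard against a zero draw are assumptions of the sketch.

```python
import random
from statistics import NormalDist


def invqnorm(p):
    # Inverse of the standard normal cumulative distribution function.
    return NormalDist().inv_cdf(p)


def normal_sample(rng):
    u = rng.random()
    while u == 0.0:  # the inverse CDF is undefined at exactly 0
        u = rng.random()
    return invqnorm(u)


rng = random.Random(1)
samples = [normal_sample(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```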
1578
1579 log
1580 (class=math #args=1): Natural (base-e) logarithm.
1581
1582 log10
1583 (class=math #args=1): Base-10 logarithm.
1584
1585 log1p
1586 (class=math #args=1): log(1+x).
1587
1588 logifit
1589 (class=math #args=3): Given m and b from logistic regression, compute
1590 fit: $yhat=logifit($x,$m,$b).
1591
1592 madd
1593 (class=math #args=3): a + b mod m (integers)
1594
1595 max
1596 (class=math variadic): Max of n numbers; null loses
1597
1598 mexp
1599 (class=math #args=3): a ** b mod m (integers)
1600
1601 min
1602 (class=math variadic): Min of n numbers; null loses
1603
1604 mmul
1605 (class=math #args=3): a * b mod m (integers)
1606
1607 msub
1608 (class=math #args=3): a - b mod m (integers)
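
The four modular functions (madd, mexp, mmul, msub) can be modeled in Python as below. The sketch reads "a + b mod m" as (a + b) mod m and assumes a positive modulus, for which Python's % gives a non-negative result; how Miller handles other cases is not stated here.

```python
def madd(a, b, m):
    return (a + b) % m


def msub(a, b, m):
    return (a - b) % m


def mmul(a, b, m):
    return (a * b) % m


def mexp(a, b, m):
    # Three-argument pow reduces as it goes, so the intermediate
    # values stay small even for large exponents.
    return pow(a, b, m)
```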
1609
1610 pow
1611 (class=math #args=2): Exponentiation; same as **.
1612
1613 qnorm
1614 (class=math #args=1): Normal cumulative distribution function.
1615
1616 round
1617 (class=math #args=1): Round to nearest integer.
1618
1619 roundm
1620 (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
1621 the same as round($x/$m)*$m
1622
1623 sgn
1624 (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
1625 negative input.
1626
1627 sin
1628 (class=math #args=1): Trigonometric sine.
1629
1630 sinh
1631 (class=math #args=1): Hyperbolic sine.
1632
1633 sqrt
1634 (class=math #args=1): Square root.
1635
1636 tan
1637 (class=math #args=1): Trigonometric tangent.
1638
1639 tanh
1640 (class=math #args=1): Hyperbolic tangent.
1641
1642 urand
1643 (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
1644 Int-valued example: '$n=floor(20+urand()*11)'.
1645
1646 urandrange
1647 (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
1648
1649 urand32
1650 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
1651 inclusive.
1652
1653 urandint
1654 (class=math #args=2): Integer uniformly distributed between inclusive
1655 integer endpoints.
1656
1657 dhms2fsec
1658 (class=time #args=1): Recovers floating-point seconds as in
1659 dhms2fsec("5d18h53m20.250000s") = 500000.250000
1660
1661 dhms2sec
1662 (class=time #args=1): Recovers integer seconds as in
1663 dhms2sec("5d18h53m20s") = 500000
1664
1665 fsec2dhms
1666 (class=time #args=1): Formats floating-point seconds as in
1667 fsec2dhms(500000.25) = "5d18h53m20.250000s"
1668
1669 fsec2hms
1670 (class=time #args=1): Formats floating-point seconds as in
1671 fsec2hms(5000.25) = "01:23:20.250000"
1672
1673 gmt2sec
1674 (class=time #args=1): Parses GMT timestamp as integer seconds since
1675 the epoch.
1676
1677 localtime2sec
1678 (class=time #args=1): Parses local timestamp as integer seconds since
1679 the epoch. Consults $TZ environment variable.
1680
1681 hms2fsec
1682 (class=time #args=1): Recovers floating-point seconds as in
1683 hms2fsec("01:23:20.250000") = 5000.250000
1684
1685 hms2sec
1686 (class=time #args=1): Recovers integer seconds as in
1687 hms2sec("01:23:20") = 5000
1688
1689 sec2dhms
1690 (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
1691 = "5d18h53m20s"
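
The dhms conversions can be sketched in Python as a round trip. Two details are assumptions rather than documented behavior: leading units that are zero are dropped, and units after the first are zero-padded to two digits. Only non-negative inputs are handled.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def sec2dhms(secs):
    d, rem = divmod(secs, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    # Assumption: omit leading zero-valued units, pad the rest.
    if d > 0:
        return f"{d}d{h:02d}h{m:02d}m{s:02d}s"
    if h > 0:
        return f"{h}h{m:02d}m{s:02d}s"
    if m > 0:
        return f"{m}m{s:02d}s"
    return f"{s}s"


def dhms2sec(text):
    # Sum each number multiplied by the size of its unit.
    parts = re.findall(r"(\d+)([dhms])", text)
    return sum(int(num) * UNIT_SECONDS[unit] for num, unit in parts)
```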
1692
1693 sec2gmt
1694 (class=time #args=1): Formats seconds since epoch (integer part)
1695 as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
1696 Leaves non-numbers as-is.
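
The one-argument form can be modeled in Python as below. The sketch covers non-negative inputs only and uses Python's datetime in place of Miller's own time code; that substitution is an assumption of the sketch.

```python
import math
from datetime import datetime, timezone


def sec2gmt(value):
    # Non-numbers are returned unchanged.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    whole = math.floor(value)  # keep only the integer part
    moment = datetime.fromtimestamp(whole, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```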
1697
1698 sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
1699 decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1700 Leaves non-numbers as-is.
1701
1702 sec2gmtdate
1703 (class=time #args=1): Formats seconds since epoch (integer part)
1704 as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
1705 Leaves non-numbers as-is.
1706
1707 sec2localtime
1708 (class=time #args=1): Formats seconds since epoch (integer part)
1709 as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
1710 Consults $TZ environment variable. Leaves non-numbers as-is.
1711
1712 sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
1713 decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
1714 Consults $TZ environment variable. Leaves non-numbers as-is.
1715
1716 sec2localdate
1717 (class=time #args=1): Formats seconds since epoch (integer part)
1718 as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
1719 Consults $TZ environment variable. Leaves non-numbers as-is.
1720
1721 sec2hms
1722 (class=time #args=1): Formats integer seconds as in
1723 sec2hms(5000) = "01:23:20"
1724
1725 strftime
1726 (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
1727 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
1728 strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
1729 Format strings are as in the C library (please see "man strftime" on your system),
1730 with the Miller-specific addition of "%1S" through "%9S" which format the seconds
1731 with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
1732 See also strftime_local.
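
The "%1S" through "%9S" extension can be sketched in Python by substituting the fractional seconds before handing the rest of the format to the ordinary strftime. This is a model under stated assumptions: the function name is hypothetical, the fraction is rounded rather than truncated, at most one such directive is expected in the format, and a literal "%%" immediately before a digit and "S" is not handled.

```python
import math
import re
from datetime import datetime, timezone


def strftime_gmt(t, fmt):
    match = re.search(r"%([1-9])S", fmt)
    if match:
        places = int(match.group(1))
        # Work in integer units of 10**-places seconds.
        scaled = round(t * 10 ** places)
        whole, frac = divmod(scaled, 10 ** places)
    else:
        whole = math.floor(t)  # plain %S uses no decimal places
    moment = datetime.fromtimestamp(whole, tz=timezone.utc)
    if match:
        seconds = f"{moment.second:02d}.{frac:0{places}d}"
        fmt = fmt.replace(match.group(0), seconds)
    return moment.strftime(fmt)
```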
1733
1734 strftime_local
1735 (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
1736
1737 strptime
1738 (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
1739 e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
1740 and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
1741 See also strptime_local.
1742
1743 strptime_local
1744 (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
1745
1746 systime
1747 (class=time #args=0): Floating-point seconds since the epoch,
1748 e.g. 1440768801.748936.
1749
1750 is_absent
1751 (class=typing #args=1): False if field is present in input, true otherwise.
1752
1753 is_bool
1754 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
1755
1756 is_boolean
1757 (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
1758
1759 is_empty
1760 (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
1761
1762 is_empty_map
1763 (class=typing #args=1): True if argument is a map which is empty.
1764
1765 is_float
1766 (class=typing #args=1): True if field is present with value inferred to be float
1767
1768 is_int
1769 (class=typing #args=1): True if field is present with value inferred to be int
1770
1771 is_map
1772 (class=typing #args=1): True if argument is a map.
1773
1774 is_nonempty_map
1775 (class=typing #args=1): True if argument is a map which is non-empty.
1776
1777 is_not_empty
1778 (class=typing #args=1): False if field is present in input with empty value, true otherwise
1779
1780 is_not_map
1781 (class=typing #args=1): True if argument is not a map.
1782
1783 is_not_null
1784 (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
1785
1786 is_null
1787 (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
1788
1789 is_numeric
1790 (class=typing #args=1): True if field is present with value inferred to be int or float
1791
1792 is_present
1793 (class=typing #args=1): True if field is present in input, false otherwise.
1794
1795 is_string
1796 (class=typing #args=1): True if field is present with string (including empty-string) value
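
The distinction between absent, empty, and null can be sketched in Python by modeling a record as a dict. The ABSENT sentinel and the lookup helper are hypothetical names introduced for illustration; the predicates follow the one-line descriptions above, with is_absent read as the opposite of is_present.

```python
ABSENT = object()  # hypothetical sentinel for "no such field"


def lookup(record, name):
    return record.get(name, ABSENT)


def is_absent(value):
    return value is ABSENT


def is_present(value):
    return value is not ABSENT


def is_empty(value):
    # Present with the empty string as its value.
    return value == ""


def is_not_empty(value):
    return not is_empty(value)


def is_null(value):
    # Null means empty or absent.
    return is_absent(value) or is_empty(value)


def is_not_null(value):
    return not is_null(value)


record = {"x": "", "y": "3"}
```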
1797
1798 asserting_absent
1799 (class=typing #args=1): Returns argument if it is absent in the input data, else
1800 throws an error.
1801
1802 asserting_bool
1803 (class=typing #args=1): Returns argument if it is present with boolean value, else
1804 throws an error.
1805
1806 asserting_boolean
1807 (class=typing #args=1): Returns argument if it is present with boolean value, else
1808 throws an error.
1809
1810 asserting_empty
1811 (class=typing #args=1): Returns argument if it is present in input with empty value,
1812 else throws an error.
1813
1814 asserting_empty_map
1815 (class=typing #args=1): Returns argument if it is a map with empty value, else
1816 throws an error.
1817
1818 asserting_float
1819 (class=typing #args=1): Returns argument if it is present with float value, else
1820 throws an error.
1821
1822 asserting_int
1823 (class=typing #args=1): Returns argument if it is present with int value, else
1824 throws an error.
1825
1826 asserting_map
1827 (class=typing #args=1): Returns argument if it is a map, else throws an error.
1828
1829 asserting_nonempty_map
1830 (class=typing #args=1): Returns argument if it is a non-empty map, else throws
1831 an error.
1832
1833 asserting_not_empty
1834 (class=typing #args=1): Returns argument if it is present in input with non-empty
1835 value, else throws an error.
1836
1837 asserting_not_map
1838 (class=typing #args=1): Returns argument if it is not a map, else throws an error.
1839
1840 asserting_not_null
1841 (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
1842 else throws an error.
1843
1844 asserting_null
1845 (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
1846 an error.
1847
1848 asserting_numeric
1849 (class=typing #args=1): Returns argument if it is present with int or float value,
1850 else throws an error.
1851
1852 asserting_present
1853 (class=typing #args=1): Returns argument if it is present in input, else throws
1854 an error.
1855
1856 asserting_string
1857 (class=typing #args=1): Returns argument if it is present with string (including
1858 empty-string) value, else throws an error.
1859
1860 boolean
1861 (class=conversion #args=1): Convert int/float/bool/string to boolean.
1862
1863 float
1864 (class=conversion #args=1): Convert int/float/bool/string to float.
1865
1866 fmtnum
1867 (class=conversion #args=2): Convert int/float/bool to string using
1868 printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
1869 are all long long or double. If you use formats like %d or %f, behavior is undefined.
1870
1871 hexfmt
1872 (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
1873
1874 int
1875 (class=conversion #args=1): Convert int/float/bool/string to int.
1876
1877 string
1878 (class=conversion #args=1): Convert int/float/bool/string to string.
1879
1880 typeof
1881 (class=conversion #args=1): Convert argument to type of argument (e.g.
1882 MT_STRING). For debug.
1883
1884 depth
1885 (class=maps #args=1): Prints maximum depth of hashmap. Scalars have depth 0.
1886
1887 haskey
1888 (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
1889 'haskey(mymap, mykey)'. Error if 1st argument is not a map.
1890
1891 joink
1892 (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
1893
1894 joinkv
1895 (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
1896
1897 joinv
1898 (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
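
The three join functions can be modeled in Python over an insertion-ordered dict. In this sketch joinv is taken to join the values, as its name suggests, and the argument order of joinkv (pair separator, then field separator) is inferred from the example above; both are assumptions.

```python
def joink(m, sep):
    return sep.join(str(k) for k in m)


def joinv(m, sep):
    return sep.join(str(v) for v in m.values())


def joinkv(m, pair_sep, field_sep):
    # Join each key to its value, then join the pairs.
    return field_sep.join(f"{k}{pair_sep}{v}" for k, v in m.items())


record = {"a": 1, "b": 2, "c": 3}
```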
1899
1900 leafcount
1901 (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
1902 same as length.
1903
1904 length
1905 (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
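
The relationship between depth, leafcount, and length can be sketched in Python over nested dicts. Treating an empty map as depth 0 and a scalar as one leaf are assumptions of the sketch; the descriptions above do not cover those cases.

```python
def depth(x):
    if not isinstance(x, dict) or not x:
        # Scalars have depth 0; doing the same for {} is an assumption.
        return 0
    return 1 + max(depth(v) for v in x.values())


def leafcount(x):
    if not isinstance(x, dict):
        return 1  # assumption: a scalar counts as one terminal value
    return sum(leafcount(v) for v in x.values())


def length(x):
    # Top-level entries only; scalars have length 1.
    return len(x) if isinstance(x, dict) else 1


sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
```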
1906
1907 mapdiff
1908 (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
1909 With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
1910
1911 mapexcept
1912 (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
1913 E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
1914
1915 mapselect
1916 (class=maps variadic): Returns a map with only keys from remaining arguments set.
1917 E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
1918
1919 mapsum
1920 (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
1921 key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
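
The four map-combining functions can be modeled in Python with dicts. This is a sketch of the described behavior, not Miller's implementation; the assertions one would check against it are the examples given in the descriptions above.

```python
def mapsum(*maps):
    out = {}
    for m in maps:
        out.update(m)  # rightmost collisions win
    return out


def mapdiff(*maps):
    if not maps:
        return {}
    out = dict(maps[0])
    for m in maps[1:]:
        for key in m:
            out.pop(key, None)  # drop keys found in later maps
    return out


def mapexcept(m, *keys):
    return {k: v for k, v in m.items() if k not in keys}


def mapselect(m, *keys):
    return {k: v for k, v in m.items() if k in keys}
```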
1922
1923 splitkv
1924 (class=maps #args=3): Splits string by separators into map with type inference.
1925 E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
1926
1927 splitkvx
1928 (class=maps #args=3): Splits string by separators into map without type inference (keys and
1929 values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
1930 '{"a" : "1", "b" : "2", "c" : "3"}'.
1931
1932 splitnv
1933 (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
1934 E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
1935
1936 splitnvx
1937 (class=maps #args=2): Splits string by separator into integer-indexed map without type
1938 inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
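
The split functions differ only in whether values go through type inference and whether keys come from the data or from position. A Python sketch follows; the infer helper is hypothetical, and Python's int() and float() accept some spellings that Miller's inference may not, so treat it as an approximation.

```python
def infer(text):
    # Try int, then float, else keep the string (approximation).
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def splitkvx(s, pair_sep, field_sep):
    out = {}
    for pair in s.split(field_sep):
        key, _, value = pair.partition(pair_sep)
        out[key] = value
    return out


def splitkv(s, pair_sep, field_sep):
    return {k: infer(v) for k, v in splitkvx(s, pair_sep, field_sep).items()}


def splitnvx(s, sep):
    # Keys are positions, counted from 1.
    return {i: piece for i, piece in enumerate(s.split(sep), start=1)}


def splitnv(s, sep):
    return {i: infer(piece) for i, piece in splitnvx(s, sep).items()}
```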
1939
1941 all
1942 all: used in "emit", "emitp", and "unset" as a synonym for @*
1943
1944 begin
1945 begin: defines a block of statements to be executed before input records
1946 are ingested. The body statements must be wrapped in curly braces.
1947 Example: 'begin { @count = 0 }'
1948
1949 bool
1950 bool: declares a boolean local variable in the current curly-braced scope.
1951 Type-checking happens at assignment: 'bool b = 1' is an error.
1952
1953 break
1954 break: causes execution to continue after the body of the current
1955 for/while/do-while loop.
1956
1957 call
1958 call: used for invoking a user-defined subroutine.
1959 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
1960
1961 continue
1962 continue: causes execution to skip the remaining statements in the body of
1963 the current for/while/do-while loop. For-loop increments are still applied.
1964
1965 do
1966 do: with "while", introduces a do-while loop. The body statements must be wrapped
1967 in curly braces.
1968
1969 dump
1970 dump: prints all currently defined out-of-stream variables immediately
1971 to stdout as JSON.
1972
1973 With >, >>, or |, the data do not become part of the output record stream but
1974 are instead redirected.
1975
1976 The > and >> are for write and append, as in the shell, but (as with awk) the
1977 file-overwrite for > is on first write, not per record. The | is for piping to
1978 a process which will process the data. There will be one open file for each
1979 distinct file name (for > and >>) or one subordinate process for each distinct
1980 value of the piped-to command (for |). Output-formatting flags are taken from
1981 the main command line.
1982
1983 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
1984 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
1985 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
1986 Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
1987
1988 edump
1989 edump: prints all currently defined out-of-stream variables immediately
1990 to stderr as JSON.
1991
1992 Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
1993
1994 elif
1995 elif: the way Miller spells "else if". The body statements must be wrapped
1996 in curly braces.
1997
1998 else
1999 else: terminates an if/elif/elif chain. The body statements must be wrapped
2000 in curly braces.
2001
2002 emit
2003 emit: inserts an out-of-stream variable into the output record stream. Hashmap
2004 indices present in the data but not slotted by emit arguments are not output.
2005
2006 With >, >>, or |, the data do not become part of the output record stream but
2007 are instead redirected.
2008
2009 The > and >> are for write and append, as in the shell, but (as with awk) the
2010 file-overwrite for > is on first write, not per record. The | is for piping to
2011 a process which will process the data. There will be one open file for each
2012 distinct file name (for > and >>) or one subordinate process for each distinct
2013 value of the piped-to command (for |). Output-formatting flags are taken from
2014 the main command line.
2015
2016 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2017 etc., to control the format of the output if the output is redirected. See also mlr -h.
2018
2019 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
2020 Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
2021 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
2022 Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
2023 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
2024 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
2025 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
2026 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
2027 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2028 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
2029 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
2030
2031 Please see http://johnkerl.org/miller/doc for more information.
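
How emit with index names turns a nested out-of-stream variable into records can be sketched in Python. This is a simplified model, not Miller's implementation: the function name is hypothetical, and the sketch assumes that at most one map level is left without an index name, in which case that level's keys become field names.

```python
def emit_by_names(name, nested, *index_names):
    """Flatten nested[k1][k2]... into a list of flat records."""
    records = []

    def walk(node, prefix, remaining):
        if remaining and isinstance(node, dict):
            # Slot this level's key under the next index name.
            for key, child in node.items():
                walk(child, {**prefix, remaining[0]: key}, remaining[1:])
        elif isinstance(node, dict):
            # Unslotted final level: its keys become field names.
            records.append({**prefix, **node})
        else:
            # Fully slotted: the value is stored under the variable name.
            records.append({**prefix, name: node})

    walk(nested, {}, index_names)
    return records


sums = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
by_a_and_b = emit_by_names("sums", sums, "a", "b")
by_a = emit_by_names("sums", sums, "a")
```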
2032
2033 emitf
2034 emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
2035 output record stream.
2036
2037 With >, >>, or |, the data do not become part of the output record stream but
2038 are instead redirected.
2039
2040 The > and >> are for write and append, as in the shell, but (as with awk) the
2041 file-overwrite for > is on first write, not per record. The | is for piping to
2042 a process which will process the data. There will be one open file for each
2043 distinct file name (for > and >>) or one subordinate process for each distinct
2044 value of the piped-to command (for |). Output-formatting flags are taken from
2045 the main command line.
2046
2047 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2048 etc., to control the format of the output if the output is redirected. See also mlr -h.
2049
2050 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
2051 Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
2052 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
2053 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
2054 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
2055 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
2056 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
2057 Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
2058
2059 Please see http://johnkerl.org/miller/doc for more information.
2060
2061 emitp
2062 emitp: inserts an out-of-stream variable into the output record stream.
2063 Hashmap indices present in the data but not slotted by emitp arguments are
2064 output concatenated with ":".
2065
2066 With >, >>, or |, the data do not become part of the output record stream but
2067 are instead redirected.
2068
2069 The > and >> are for write and append, as in the shell, but (as with awk) the
2070 file-overwrite for > is on first write, not per record. The | is for piping to
2071 a process which will process the data. There will be one open file for each
2072 distinct file name (for > and >>) or one subordinate process for each distinct
2073 value of the piped-to command (for |). Output-formatting flags are taken from
2074 the main command line.
2075
2076 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2077 etc., to control the format of the output if the output is redirected. See also mlr -h.
2078
2079 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
2080 Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
2081 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
2082 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
2083 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
2084 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
2085 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
2086 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
2087 Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
2088
2089 Please see http://johnkerl.org/miller/doc for more information.
2090
2091 end
2092 end: defines a block of statements to be executed after input records
2093 are ingested. The body statements must be wrapped in curly braces.
2094 Example: 'end { emit @count }'
2095 Example: 'end { eprint "Final count is " . @count }'
2096
2097 eprint
2098 eprint: prints expression immediately to stderr.
2099 Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
2100 Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
2101 Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
2102
2103 eprintn
2104 eprintn: prints expression immediately to stderr, without trailing newline.
2105 Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
2106
2107 false
2108 false: the boolean literal value.
2109
2110 filter
2111 filter: includes/excludes the record in the output record stream.
2112
2113 Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
2114
2115 Instead of put with 'filter false' you can simply use put -q. The following
2116 uses the input record to accumulate data but only prints the running sum
2117 without printing the input record:
2118
2119 Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
2120
2121 float
2122 float: declares a floating-point local variable in the current curly-braced scope.
2123 Type-checking happens at assignment: 'float x = 0' is an error.
2124
2125 for
2126 for: defines a for-loop using one of three styles. The body statements must
2127 be wrapped in curly braces.
2128 For-loop over stream record:
2129 Example: 'for (k, v in $*) { ... }'
2130 For-loop over out-of-stream variables:
2131 Example: 'for (k, v in @counts) { ... }'
2132 Example: 'for ((k1, k2), v in @counts) { ... }'
2133 Example: 'for ((k1, k2, k3), v in @*) { ... }'
2134 C-style for-loop:
2135 Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
2136
2137 func
2138 func: used for defining a user-defined function.
2139 Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
2140
2141 if
2142 if: starts an if/elif/elif chain. The body statements must be wrapped
2143 in curly braces.
2144
2145 in
2146 in: used in for-loops over stream records or out-of-stream variables.
2147
2148 int
2149 int: declares an integer local variable in the current curly-braced scope.
2150 Type-checking happens at assignment: 'int x = 0.0' is an error.
2151
2152 map
2153 map: declares a map-valued local variable in the current curly-braced scope.
2154 Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
2155 always OK. map b = a is OK or not depending on whether a is a map.
2156
2157 num
2158 num: declares an int/float local variable in the current curly-braced scope.
2159 Type-checking happens at assignment: 'num b = true' is an error.
2160
2161 print
2162 print: prints expression immediately to stdout.
2163 Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
2164 Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
2165 Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
2166
2167 printn
2168 printn: prints expression immediately to stdout, without trailing newline.
2169 Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
2170
2171 return
2172 return: specifies the return value from a user-defined function.
2173 Omitted return statements (including via if-branches) result in an absent-null
2174 return value, which in turn results in a skipped assignment to an LHS.
2175
2176 stderr
2177 stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2178 to print to standard error.
2179
2180 stdout
2181 stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
2182 to print to standard output.
2183
2184 str
2185 str: declares a string local variable in the current curly-braced scope.
2186 Type-checking happens at assignment.
2187
2188 subr
2189 subr: used for defining a subroutine.
2190 Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
2191
2192 tee
2193 tee: prints the current record to specified file.
2194 This is an immediate print to the specified file (except for pprint format
2195 which of course waits until the end of the input stream to format all output).
2196
2197 The > and >> are for write and append, as in the shell, but (as with awk) the
2198 file-overwrite for > is on first write, not per record. The | is for piping to
2199 a process which will process the data. There will be one open file for each
2200 distinct file name (for > and >>) or one subordinate process for each distinct
2201 value of the piped-to command (for |). Output-formatting flags are taken from
2202 the main command line.
2203
2204 You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
2205 etc., to control the format of the output. See also mlr -h.
2206
2207 emit with redirect and tee with redirect are identical, except tee can only
2208 output $*.
2209
2210 Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
2211 Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
2212 Example: mlr --from f.dat put 'tee > stderr, $*'
2213 Example: mlr --from f.dat put -q 'tee | "tr [a-z] [A-Z]", $*'
2214 Example: mlr --from f.dat put -q 'tee | "tr [a-z] [A-Z] > /tmp/data-".$a, $*'
2215 Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
2216 Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
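
The redirect rules above (one open file per distinct name, with > overwriting on first write rather than per record) can be sketched in Python with a cache of file handles. The function name, the "data-" prefix, and the key=value line format are assumptions made for the example.

```python
import os
import tempfile


def tee_by_field(records, directory, field):
    handles = {}
    try:
        for rec in records:
            path = os.path.join(directory, "data-" + rec[field])
            if path not in handles:
                # Mode "w" truncates once, on the first write to this name.
                handles[path] = open(path, "w")
            line = ",".join(f"{k}={v}" for k, v in rec.items())
            handles[path].write(line + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)


with tempfile.TemporaryDirectory() as tmp:
    recs = [{"a": "pan", "x": "1"}, {"a": "eks", "x": "2"}, {"a": "pan", "x": "3"}]
    paths = tee_by_field(recs, tmp, "a")
    with open(os.path.join(tmp, "data-pan")) as f:
        pan_lines = f.read().splitlines()
```

Because each handle is opened once and reused, the second "pan" record is added after the first instead of replacing it.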
2217
2218 true
2219 true: the boolean literal value.
2220
2221 unset
2222 unset: clears field(s) from the current record, or an out-of-stream or local variable.
2223
2224 Example: mlr --from f.dat put 'unset $x'
2225 Example: mlr --from f.dat put 'unset $*'
2226 Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
2227 Example: mlr --from f.dat put '...; unset @sums'
2228 Example: mlr --from f.dat put '...; unset @sums["green"]'
2229 Example: mlr --from f.dat put '...; unset @*'
2230
2231 var
2232 var: declares an untyped local variable in the current curly-braced scope.
2233 Examples: 'var a=1', 'var xyz=""'
2234
2235 while
2236 while: introduces a while loop, or with "do", introduces a do-while loop.
2237 The body statements must be wrapped in curly braces.
2238
2239 ENV
2240 ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
2241
2242 FILENAME
2243 FILENAME: evaluates to the name of the current file being processed.
2244
2245 FILENUM
2246 FILENUM: evaluates to the number of the current file being processed,
2247 starting with 1.
2248
2249 FNR
2250 FNR: evaluates to the number of the current record within the current file
2251 being processed, starting with 1. Resets at the start of each file.
2252
2253 IFS
2254 IFS: evaluates to the input field separator from the command line.
2255
2256 IPS
2257 IPS: evaluates to the input pair separator from the command line.
2258
2259 IRS
2260 IRS: evaluates to the input record separator from the command line,
2261 or to LF or CRLF from the input data if in autodetect mode (which is
2262 the default).
2263
2264 M_E
2265 M_E: the mathematical constant e.
2266
2267 M_PI
2268 M_PI: the mathematical constant pi.
2269
2270 NF
2271 NF: evaluates to the number of fields in the current record.
2272
2273 NR
2274 NR: evaluates to the number of the current record over all files
2275 being processed, starting with 1. Does not reset at the start of each file.
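
The difference between NR and FNR, along with FILENAME and FILENUM, can be sketched in Python by numbering records across several inputs. The function name and the list-of-pairs input model are assumptions made for illustration.

```python
def number_records(files):
    """files is a list of (filename, list_of_records) pairs."""
    nr = 0
    out = []
    for filenum, (filename, records) in enumerate(files, start=1):
        # FNR restarts at 1 for each file; NR keeps counting.
        for fnr, _record in enumerate(records, start=1):
            nr += 1
            out.append(
                {"FILENAME": filename, "FILENUM": filenum, "FNR": fnr, "NR": nr}
            )
    return out


files = [("a.dat", ["r1", "r2"]), ("b.dat", ["r3"])]
numbered = number_records(files)
```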
2276
2277 OFS
2278 OFS: evaluates to the output field separator from the command line.
2279
2280 OPS
2281 OPS: evaluates to the output pair separator from the command line.
2282
2283 ORS
2284 ORS: evaluates to the output record separator from the command line,
2285 or to LF or CRLF from the input data if in autodetect mode (which is
2286 the default).
2287
2289 Miller is written by John Kerl <kerl.john.r@gmail.com>.
2290
2291 This manual page has been composed from Miller's help output by Eric
2292 MSP Veith <eveith@veith-m.de>.
2293
2295 awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and
2296 MIME Type for Comma-Separated Values (CSV) Files, the Miller website
2297 http://johnkerl.org/miller/doc
2298
2299
2300
2301 2019-09-22 MILLER(1)