PARALLEL(1)                        parallel                        PARALLEL(1)

NAME

       parallel - build and execute shell command lines from standard input in
       parallel

SYNOPSIS

       parallel [options] [command [arguments]] < list_of_arguments

       parallel [options] [command [arguments]] ( ::: arguments | :::+
       arguments | :::: argfile(s) | ::::+ argfile(s) ) ...

       parallel --semaphore [options] command

       #!/usr/bin/parallel --shebang [options] [command [arguments]]

       #!/usr/bin/parallel --shebang-wrap [options] [command [arguments]]

DESCRIPTION

       STOP!

       Read the Reader's guide below if you are new to GNU parallel.

       GNU parallel is a shell tool for executing jobs in parallel using one
       or more computers. A job can be a single command or a small script that
       has to be run for each of the lines in the input. The typical input is
       a list of files, a list of hosts, a list of users, a list of URLs, or a
       list of tables. A job can also be a command that reads from a pipe. GNU
       parallel can then split the input into blocks and pipe a block into
       each command in parallel.

       If you use xargs and tee today, you will find GNU parallel very easy to
       use, as GNU parallel is written to have the same options as xargs. If
       you write loops in shell, you will find GNU parallel may be able to
       replace most of the loops and make them run faster by running several
       jobs in parallel.

       GNU parallel makes sure output from the commands is the same output as
       you would get had you run the commands sequentially. This makes it
       possible to use output from GNU parallel as input for other programs.

       For each line of input GNU parallel will execute command with the line
       as arguments. If no command is given, the line of input is executed.
       Several lines will be run in parallel. GNU parallel can often be used
       as a substitute for xargs or cat | bash.

   Reader's guide
       If you prefer reading a book, buy GNU Parallel 2018 at
       http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014

       Otherwise, start by watching the intro videos for a quick introduction:
       http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       If you need a one-page printable cheat sheet, you can find it at:
       https://www.gnu.org/software/parallel/parallel_cheat.pdf

       You can find a lot of EXAMPLEs of use after the list of OPTIONS in man
       parallel (use LESS=+/EXAMPLE: man parallel). That will give you an idea
       of what GNU parallel is capable of, and you may find a solution you can
       simply adapt to your situation.

       If you want to dive even deeper: spend a couple of hours walking
       through the tutorial (man parallel_tutorial). Your command line will
       love you for it.

       Finally, you may want to look at the rest of the manual (man parallel)
       if you have special needs not already covered.

       If you want to know the design decisions behind GNU parallel, try: man
       parallel_design. This is also a good intro if you intend to change GNU
       parallel.

OPTIONS

       command
           Command to execute.  If command or the following arguments contain
           replacement strings (such as {}), every instance will be substituted
           with the input.

           If command is given, GNU parallel solves the same tasks as xargs. If
           command is not given, GNU parallel will behave similarly to cat | sh.

           The command must be an executable, a script, a composed command, an
           alias, or a function.

           Bash functions: export -f the function first or use env_parallel.

           Bash, Csh, or Tcsh aliases: Use env_parallel.

           Zsh, Fish, Ksh, and Pdksh functions and aliases: Use env_parallel.

       {} (beta testing)
           Input line. This replacement string will be replaced by a full line
           read from the input source. The input source is normally stdin
           (standard input), but can also be given with -a, :::, or ::::.

           The replacement string {} can be changed with -I.

           If the command line contains no replacement strings then {} will be
           appended to the command line.

           Replacement strings are normally quoted, so special characters are
           not parsed by the shell. The exception is if the command starts
           with a replacement string; then the string is not quoted.

       {.} Input line without extension. This replacement string will be
           replaced by the input with the extension removed. If the input line
           contains . after the last /, the last . until the end of the string
           will be removed and {.} will be replaced with the remainder. E.g.
           foo.jpg becomes foo, subdir/foo.jpg becomes subdir/foo,
           sub.dir/foo.jpg becomes sub.dir/foo, and sub.dir/bar remains
           sub.dir/bar. If the input line does not contain . it will remain
           unchanged.

           The replacement string {.} can be changed with --er.

           To understand replacement strings see {}.
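
           A minimal POSIX sh sketch of the {.} rule, for illustration only:
           strip_ext is a hypothetical helper, not part of GNU parallel, and
           it assumes input lines without embedded newlines.

```shell
# strip_ext: hypothetical helper mimicking the documented {.} rule.
strip_ext() {
  base=${1##*/}                       # the part after the last /
  case $base in
    *.*) printf '%s\n' "${1%.*}" ;;   # remove the last . and what follows
    *)   printf '%s\n' "$1" ;;        # no . after the last /: unchanged
  esac
}

strip_ext foo.jpg           # foo
strip_ext subdir/foo.jpg    # subdir/foo
strip_ext sub.dir/foo.jpg   # sub.dir/foo
strip_ext sub.dir/bar       # sub.dir/bar
```

           With GNU parallel itself, parallel echo {.} ::: sub.dir/foo.jpg
           prints sub.dir/foo.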

       {/} Basename of input line. This replacement string will be replaced by
           the input with the directory part removed.

           The replacement string {/} can be changed with --basenamereplace.

           To understand replacement strings see {}.

       {//}
           Dirname of input line. This replacement string will be replaced by
           the directory of the input line. See dirname(1).

           The replacement string {//} can be changed with --dirnamereplace.

           To understand replacement strings see {}.

       {/.}
           Basename of input line without extension. This replacement string
           will be replaced by the input with the directory and extension part
           removed. It is a combination of {/} and {.}.

           The replacement string {/.} can be changed with
           --basenameextensionreplace.

           To understand replacement strings see {}.
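
           The basename and dirname strings can be approximated with POSIX
           parameter expansion and dirname(1). The helpers below are
           hypothetical illustrations that assume paths without a trailing /;
           GNU parallel's own handling of such edge cases may differ.

```shell
# Hypothetical helpers approximating {/}, {//} and {/.}.
base_of() { printf '%s\n' "${1##*/}"; }   # like {/}: drop the directory part
dir_of()  { dirname "$1"; }               # like {//}: see dirname(1)
base_noext_of() {                         # like {/.}: {/} combined with {.}
  b=${1##*/}
  case $b in
    *.*) printf '%s\n' "${b%.*}" ;;
    *)   printf '%s\n' "$b" ;;
  esac
}

base_of sub.dir/foo.jpg         # foo.jpg
dir_of sub.dir/foo.jpg          # sub.dir
base_noext_of sub.dir/foo.jpg   # foo
```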

       {#} Sequence number of the job to run. This replacement string will be
           replaced by the sequence number of the job being run. It contains
           the same number as $PARALLEL_SEQ.

           The replacement string {#} can be changed with --seqreplace.

           To understand replacement strings see {}.

       {%} Job slot number. This replacement string will be replaced by the
           job's slot number between 1 and the number of jobs to run in
           parallel. There will never be 2 jobs running at the same time with
           the same job slot number.

           The replacement string {%} can be changed with --slotreplace.

           To understand replacement strings see {}.

       {n} Argument from input source n or the n'th argument. This positional
           replacement string will be replaced by the input from input source
           n (when used with -a or ::::) or with the n'th argument (when used
           with -N). If n is negative it refers to the n'th last argument.

           To understand replacement strings see {}.
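
           Negative positions count from the end. As a sketch of that indexing
           only, the hypothetical helper pick_arg prints the n'th argument, or
           the n'th last one when n is negative. For comparison,
           parallel -N3 echo {-1} ::: a b c should print c.

```shell
# pick_arg: hypothetical helper. pick_arg N ARG... prints the N'th
# argument, or the N'th last one when N is negative (N must be in range).
pick_arg() {
  n=$1
  shift
  if [ "$n" -lt 0 ]; then
    n=$(( $# + n + 1 ))   # -1 -> last, -2 -> second last, ...
  fi
  eval "printf '%s\n' \"\${$n}\""
}

pick_arg 1 a b c    # a
pick_arg -1 a b c   # c
pick_arg -2 a b c   # b
```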

       {n.}
           Argument from input source n or the n'th argument without
           extension. It is a combination of {n} and {.}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the extension
           removed.

           To understand positional replacement strings see {n}.

       {n/}
           Basename of argument from input source n or the n'th argument.  It
           is a combination of {n} and {/}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the directory (if
           any) removed.

           To understand positional replacement strings see {n}.

       {n//}
           Dirname of argument from input source n or the n'th argument.  It
           is a combination of {n} and {//}.

           This positional replacement string will be replaced by the
           directory of the input from input source n (when used with -a or
           ::::) or with the n'th argument (when used with -N). See
           dirname(1).

           To understand positional replacement strings see {n}.

       {n/.}
           Basename of argument from input source n or the n'th argument
           without extension.  It is a combination of {n}, {/}, and {.}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the directory (if
           any) and extension removed.

           To understand positional replacement strings see {n}.

       {=perl expression=}
           Replace with calculated perl expression. $_ will contain the same
           as {}. After evaluating the perl expression, $_ will be used as the
           value. It is recommended to only change $_, but you have full
           access to all of GNU parallel's internal functions and data
           structures. A few convenience functions and data structures are
           provided:

            Q(string)     shell quote a string

            pQ(string)    perl quote a string

            uq() (or uq)  (beta testing) do not quote current replacement
                          string

            total_jobs()  number of jobs in total

            slot()        slot number of job

            seq()         sequence number of job

            @arg          the arguments

           Example:

             seq 10 | parallel echo {} + 1 is {= '$_++' =}
             parallel csh -c {= '$_="mkdir ".Q($_)' =} ::: '12" dir'
             seq 50 | parallel echo job {#} of {= '$_=total_jobs()' =}

           See also: --rpl --parens
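
           For the first example, a sequential shell loop producing the same
           lines could look like this. It is a sketch; without -k, GNU parallel
           may print the lines in a different order. '$_++' increments $_, and
           the incremented $_ is what replaces the expression.

```shell
# plus_one_lines: sequential equivalent of
#   seq 10 | parallel echo {} + 1 is {= '$_++' =}
plus_one_lines() {
  seq 10 | while read -r n; do
    echo "$n + 1 is $((n + 1))"
  done
}

plus_one_lines   # "1 + 1 is 2" through "10 + 1 is 11"
```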

       {=n perl expression=}
           Positional equivalent to {=perl expression=}. To understand
           positional replacement strings see {n}.

           See also: {=perl expression=} {n}.

       ::: arguments
           Use arguments from the command line as input source instead of
           stdin (standard input). Unlike other options for GNU parallel, :::
           is placed after the command and before the arguments.

           The following are equivalent:

             (echo file1; echo file2) | parallel gzip
             parallel gzip ::: file1 file2
             parallel gzip {} ::: file1 file2
             parallel --arg-sep ,, gzip {} ,, file1 file2
             parallel --arg-sep ,, gzip ,, file1 file2
             parallel ::: "gzip file1" "gzip file2"

           To avoid treating ::: as special, use --arg-sep to set the argument
           separator to something else. See also --arg-sep.

           If multiple ::: are given, each group will be treated as an input
           source, and all combinations of input sources will be generated.
           E.g. ::: 1 2 ::: a b c will result in the combinations (1,a) (1,b)
           (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested
           for-loops.
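
           As a sketch of what "all combinations" means, these nested loops
           print the same argument pairs, in the same order, as
           parallel -k echo {1} {2} ::: 1 2 ::: a b c. The function name
           cross_product is hypothetical.

```shell
# cross_product: hypothetical name; the nested for-loops that
#   parallel -k echo {1} {2} ::: 1 2 ::: a b c
# replaces, run sequentially here.
cross_product() {
  for x in 1 2; do          # first input source
    for y in a b c; do      # second input source
      echo "$x $y"
    done
  done
}

cross_product   # 1 a, 1 b, 1 c, 2 a, 2 b, 2 c (one pair per line)
```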

           ::: and :::: can be mixed. So these are equivalent:

             parallel echo {1} {2} {3} ::: 6 7 ::: 4 5 ::: 1 2 3
             parallel echo {1} {2} {3} :::: <(seq 6 7) <(seq 4 5) \
               :::: <(seq 1 3)
             parallel -a <(seq 6 7) echo {1} {2} {3} :::: <(seq 4 5) \
               :::: <(seq 1 3)
             parallel -a <(seq 6 7) -a <(seq 4 5) echo {1} {2} {3} \
               ::: 1 2 3
             seq 6 7 | parallel -a - -a <(seq 4 5) echo {1} {2} {3} \
               ::: 1 2 3
             seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \
               ::: 1 2 3

       :::+ arguments
           Like ::: but linked like --link to the previous input source.

           Contrary to --link, values do not wrap: The shortest input source
           determines the length.

           Example:

             parallel echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22
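
           In the example, a b c is linked with 1 2 3, and X Y is linked with
           11 22. The pairing rule itself can be sketched with the
           hypothetical helper link_shortest, which assumes list items without
           whitespace. Items are paired by position, and pairing stops when
           the shorter list runs out.

```shell
# link_shortest: hypothetical helper pairing two whitespace-separated
# lists element by element, stopping at the shorter list (no wrapping).
link_shortest() {
  second=$2
  set -f            # no globbing while splitting the lists
  set -- $1         # split the first list into $1 $2 ...
  for y in $second; do
    [ $# -gt 0 ] || break   # first list exhausted
    printf '%s %s\n' "$1" "$y"
    shift
  done
  set +f
}

link_shortest "a b c" "1 2 3"   # a 1 / b 2 / c 3
link_shortest "a b c" "1 2"     # a 1 / b 2 (c is dropped)
```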

       :::: argfiles
           Another way to write -a argfile1 -a argfile2 ...

           ::: and :::: can be mixed.

           See -a, ::: and --link.

       ::::+ argfiles
           Like :::: but linked like --link to the previous input source.

           Contrary to --link, values do not wrap: The shortest input source
           determines the length.

       --null
       -0  Use NUL as delimiter.  Normally input lines will end in \n
           (newline). If they end in \0 (NUL), then use this option. It is
           useful for processing arguments that may contain \n (newline).

       --arg-file input-file
       -a input-file
           Use input-file as input source. If you use this option, stdin
           (standard input) is given to the first process run.  Otherwise,
           stdin (standard input) is redirected from /dev/null.

           If multiple -a are given, each input-file will be treated as an
           input source, and all combinations of input sources will be
           generated. E.g. if the file foo contains 1 2 and the file bar
           contains a b c, then -a foo -a bar will result in the combinations
           (1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing
           nested for-loops.

           See also --link and {n}.

       --arg-file-sep sep-str
           Use sep-str instead of :::: as separator string between command and
           argument files. Useful if :::: is used for something else by the
           command.

           See also: ::::.

       --arg-sep sep-str
           Use sep-str instead of ::: as separator string. Useful if ::: is
           used for something else by the command.

           Also useful if your command uses ::: but you still want to read
           arguments from stdin (standard input): Simply change --arg-sep to a
           string that is not in the command line.

           See also: :::.

       --bar
           Show progress as a progress bar. The bar shows: % of jobs
           completed, estimated seconds left, and number of jobs started.

           It is compatible with zenity:

             seq 1000 | parallel -j30 --bar '(echo {};sleep 0.1)' \
               2> >(zenity --progress --auto-kill) | wc

       --basefile file
       --bf file
           file will be transferred to each sshlogin before a job is started.
           It will be removed if --cleanup is active. The file may be a script
           to run or some common base data needed for the job.  Multiple --bf
           can be specified to transfer more basefiles. The file will be
           transferred the same way as --transferfile.

       --basenamereplace replace-str
       --bnr replace-str
           Use the replacement string replace-str instead of {/} for basename
           of input line.

       --basenameextensionreplace replace-str
       --bner replace-str
           Use the replacement string replace-str instead of {/.} for basename
           of input line without extension.

       --bin binexpr (alpha testing)
           Use binexpr as binning key and bin input to the jobs.

           binexpr is [column number|column name] [perlexpression] e.g. 3,
           Address, 3 $_%=100, Address s/\D//g.

           Each input line is split using --colsep. The value of the column is
           put into $_, the perl expression is executed, and the resulting
           value is the job slot that will be given the line. If the value is
           bigger than the number of jobslots, the value will be taken modulo
           the number of jobslots.

           This is similar to --shard but the hashing algorithm is a simple
           modulo, which makes it predictable which jobslot will receive which
           value.

           The performance is in the order of 100K rows per second. Faster if
           the bincol is small (<10), slower if it is big (>100).

           --bin requires --pipe and a fixed numeric value for --jobs.

           See also --shard, --group-by, --roundrobin.

       --bg
           Run command in background, so GNU parallel will not wait for
           completion of the command before exiting. This is the default if
           --semaphore is set.

           See also: --fg, man sem.

           Implies --semaphore.

       --bibtex
       --citation
           Print the citation notice and BibTeX entry for GNU parallel,
           silence citation notice for all future runs, and exit. It will not
           run any commands.

           If it is impossible for you to run --citation you can instead use
           --will-cite, which will run commands, but which will only silence
           the citation notice for this single run.

           If you use --will-cite in scripts to be run by others you are
           making it harder for others to see the citation notice.  The
           development of GNU parallel is indirectly financed through
           citations, so if your users do not know they should cite then you
           are making it harder to finance development. However, if you pay
           10000 EUR, you have done your part to finance future development
           and should feel free to use --will-cite in scripts.

           If you do not want to help finance future development by letting
           other users see the citation notice or by paying, then please use
           another tool instead of GNU parallel. You can find some of the
           alternatives in man parallel_alternatives.

       --block size
       --block-size size
           Size of block in bytes to read at a time. The size can be postfixed
           with K, M, G, T, P, E, k, m, g, t, p, or e which would multiply the
           size with 1024, 1048576, 1073741824, 1099511627776,
           1125899906842624, 1152921504606846976, 1000, 1000000, 1000000000,
           1000000000000, 1000000000000000, or 1000000000000000000
           respectively.
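
           A sketch of the multiplier table as a POSIX sh function. The helper
           name block_size_bytes is hypothetical, it handles integer sizes
           only, and it assumes 64-bit shell arithmetic (as in bash and dash).

```shell
# block_size_bytes: hypothetical helper applying the multipliers above
# to an integer size such as 1M or 10k. Fractional sizes are not handled.
block_size_bytes() {
  num=${1%[KMGTPEkmgtpe]}   # strip one trailing unit letter, if any
  case $1 in
    *K) mult=1024 ;;
    *M) mult=$((1024 * 1024)) ;;
    *G) mult=$((1024 * 1024 * 1024)) ;;
    *T) mult=$((1024 * 1024 * 1024 * 1024)) ;;
    *P) mult=$((1024 * 1024 * 1024 * 1024 * 1024)) ;;
    *E) mult=$((1024 * 1024 * 1024 * 1024 * 1024 * 1024)) ;;
    *k) mult=1000 ;;
    *m) mult=1000000 ;;
    *g) mult=1000000000 ;;
    *t) mult=1000000000000 ;;
    *p) mult=1000000000000000 ;;
    *e) mult=1000000000000000000 ;;
    *)  mult=1 ;;            # no unit: plain bytes
  esac
  echo $((num * mult))
}

block_size_bytes 1M    # 1048576 (the default block size)
block_size_bytes 10k   # 10000
```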

           GNU parallel tries to meet the block size but can be off by the
           length of one record. For performance reasons size should be bigger
           than two records. GNU parallel will warn you and automatically
           increase the size if you choose a size that is too small.

           If you use -N, --block-size should be bigger than N+1 records.

           size defaults to 1M.

           When using --pipepart a negative block size is not interpreted as a
           blocksize but as the number of blocks each jobslot should have. So
           this will run 10*5 = 50 jobs in total:

             parallel --pipepart -a myfile --block -10 -j5 wc

           This is an efficient alternative to --roundrobin because data is
           never read by GNU parallel, but you can still have very few
           jobslots process a large amount of data.

           See --pipe and --pipepart for use of this.

       --cat
           Create a temporary file with content. Normally --pipe/--pipepart
           will give data to the program on stdin (standard input). With --cat
           GNU parallel will create a temporary file with the name in {}, so
           you can do: parallel --pipe --cat wc {}.

           Implies --pipe unless --pipepart is used.

           See also --fifo.

       --cleanup
           Remove transferred files. --cleanup will remove the transferred
           files on the remote computer after processing is done.

             find log -name '*gz' | parallel \
               --sshlogin server.example.com --transferfile {} \
               --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

           With --transferfile {} the file transferred to the remote computer
           will be removed on the remote computer.  Directories created will
           not be removed, even if they are empty.

           With --return the file transferred from the remote computer will be
           removed on the remote computer.  Directories created will not be
           removed, even if they are empty.

           --cleanup is ignored when not used with --transferfile or --return.

       --colsep regexp
       -C regexp
           Column separator. The input will be treated as a table with regexp
           separating the columns. The n'th column can be accessed using {n}
           or {n.}. E.g. {3} is the 3rd column.

           If there are more input sources, each input source will be
           separated, but the columns from each input source will be linked
           (see --link).

             parallel --colsep '-' echo {4} {3} {2} {1} \
               ::: A-B C-D ::: e-f g-h

           --colsep implies --trim rl, which can be overridden with --trim n.

           regexp is a Perl Regular Expression:
           http://perldoc.perl.org/perlre.html
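
           A sequential sketch of the jobs the example above runs. It assumes
           the two input sources are combined into all combinations, as with
           plain :::, and that each argument is split on - with its columns
           appended in source order, so {1} {2} come from the first source and
           {3} {4} from the second. colsep_demo is a hypothetical name.

```shell
# colsep_demo: hypothetical sequential sketch of
#   parallel --colsep '-' echo {4} {3} {2} {1} ::: A-B C-D ::: e-f g-h
colsep_demo() {
  for s1 in A-B C-D; do
    for s2 in e-f g-h; do
      old_ifs=$IFS
      IFS=-            # split each argument on '-'
      set -- $s1 $s2   # e.g. A-B e-f -> $1=A $2=B $3=e $4=f
      IFS=$old_ifs
      echo "$4 $3 $2 $1"
    done
  done
}

colsep_demo   # f e B A / h g B A / f e D C / h g D C
```

           Under these assumptions, running the example with -k should print
           the same four lines in the same order.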

       --compress
           Compress temporary files. If the output is big and very
           compressible this will take up less disk space in $TMPDIR and
           possibly be faster due to less disk I/O.

           GNU parallel will try pzstd, lbzip2, pbzip2, zstd, pigz, lz4, lzop,
           plzip, lzip, lrz, gzip, pxz, lzma, bzip2, xz, clzip, in that order,
           and use the first available.
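
           The "first available" rule can be sketched with command -v. The
           helper first_available is hypothetical, and GNU parallel's own
           detection may differ in detail.

```shell
# first_available: hypothetical helper printing the first program from
# its arguments that is found in $PATH; returns 1 if none is found.
first_available() {
  for prg in "$@"; do
    if command -v "$prg" >/dev/null 2>&1; then
      printf '%s\n' "$prg"
      return 0
    fi
  done
  return 1
}

# The candidate list in the order given above.
first_available pzstd lbzip2 pbzip2 zstd pigz lz4 lzop plzip lzip \
  lrz gzip pxz lzma bzip2 xz clzip || echo "no compressor found"
```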

       --compress-program prg
       --decompress-program prg
           Use prg for (de)compressing temporary files. It is assumed that prg
           -dc will decompress stdin (standard input) to stdout (standard
           output) unless --decompress-program is given.

       --csv
           Treat input as CSV-format. --colsep sets the field delimiter. It
           works very much like --colsep except it deals correctly with
           quoting:

              echo '"1 big, 2 small","2""x4"" plank",12.34' |
                parallel --csv echo {1} of {2} at {3}

           Even quoted newlines are parsed correctly:

              (echo '"Start of field 1 with newline'
               echo 'Line 2 in field 1";value 2') |
                parallel --csv --colsep ';' echo Field 1: {1} Field 2: {2}

           When used with --pipe, only full CSV-records are passed.

       --delimiter delim
       -d delim
           Input items are terminated by delim.  Quotes and backslash are not
           special; every character in the input is taken literally.  Disables
           the end-of-file string, which is treated like any other argument.
           The specified delimiter may be characters, C-style character
           escapes such as \n, or octal or hexadecimal escape codes.  Octal
           and hexadecimal escape codes are understood as for the printf
           command.  Multibyte characters are not supported.

       --dirnamereplace replace-str
       --dnr replace-str
           Use the replacement string replace-str instead of {//} for dirname
           of input line.

       -E eof-str
           Set the end of file string to eof-str.  If the end of file string
           occurs as a line of input, the rest of the input is not read.  If
           neither -E nor -e is used, no end of file string is used.
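
           The -E rule can be sketched with awk, for illustration only; the
           helper stop_at_eof is hypothetical and compares whole lines. For
           comparison, this should run jobs only for a and b:

             printf 'a\nb\nSTOP\nc\n' | parallel -E STOP echo

```shell
# stop_at_eof: hypothetical helper showing the -E rule on stdin:
# print input lines until a line equal to EOF_STR, then stop reading.
stop_at_eof() {
  awk -v eof="$1" '$0 == eof { exit } { print }'
}

printf 'a\nb\nSTOP\nc\n' | stop_at_eof STOP   # prints a and b
```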

       --delay mytime
           Delay starting next job by mytime. GNU parallel will pause mytime
           after starting each job. mytime is normally in seconds, but can be
           floats postfixed with s, m, h, or d which would multiply the float
           by 1, 60, 3600, or 86400. Thus these are equivalent: --delay 100000
           and --delay 1d3.5h16.6m4s.
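
           The unit arithmetic can be checked with a small awk sketch:
           86400 + 3.5*3600 + 16.6*60 + 4 = 100000. The helper delay_seconds
           is hypothetical and only accepts a plain number or a sequence of
           number+unit parts.

```shell
# delay_seconds: hypothetical helper converting a --delay style value
# (e.g. 100000 or 1d3.5h16.6m4s) to seconds using s=1 m=60 h=3600 d=86400.
delay_seconds() {
  printf '%s\n' "$1" | awk '
    {
      s = $0; total = 0
      if (s ~ /^[0-9.]+$/) { print s + 0; exit }   # plain seconds
      while (match(s, /^[0-9.]+[smhd]/)) {
        unit = substr(s, RLENGTH, 1)               # last char of the part
        val  = substr(s, 1, RLENGTH - 1) + 0       # number before the unit
        mult = (unit == "s") ? 1 : (unit == "m") ? 60 : (unit == "h") ? 3600 : 86400
        total += val * mult
        s = substr(s, RLENGTH + 1)                 # continue after the part
      }
      print total
    }'
}

delay_seconds 1d3.5h16.6m4s   # 100000
delay_seconds 100000          # 100000
```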

       --dry-run
           Print the job to run on stdout (standard output), but do not run
           the job. Use -v -v to include the wrapping that GNU parallel
           generates (for remote jobs, --tmux, --nice, --pipe, --pipepart,
           --fifo and --cat). Do not count on this literally, though, as the
           job may be scheduled on another computer or the local computer if :
           is in the list.

       --eof[=eof-str]
       -e[eof-str]
           This option is a synonym for the -E option.  Use -E instead,
           because it is POSIX compliant for xargs while this option is not.
           If eof-str is omitted, there is no end of file string.  If neither
           -E nor -e is used, no end of file string is used.

       --embed
           Embed GNU parallel in a shell script. If you need to distribute
           your script to someone who does not want to install GNU parallel,
           you can embed GNU parallel in your own shell script:

             parallel --embed > new_script

           Then add your code at the end of new_script. This is tested on
           ash, bash, dash, ksh, sh, and zsh.

       --env var
           Copy environment variable var. This will copy var to the
           environment that the command is run in. This is especially useful
           for remote execution.

           In Bash, var can also be a Bash function; just remember to export
           -f the function (see command).

           The variable '_' is special. It will copy all exported environment
           variables except for the ones mentioned in
           ~/.parallel/ignored_vars.

           To copy the full environment (both exported and not exported
           variables, arrays, and functions) use env_parallel.

           See also: --record-env, --session.

       --eta
           Show the estimated number of seconds before finishing. This forces
           GNU parallel to read all jobs before starting to find the number of
           jobs. GNU parallel normally only reads the next job to run.

           The estimate is based on the runtime of finished jobs, so the first
           estimate will only be shown when the first job has finished.

           Implies --progress.

           See also: --bar, --progress.

       --fg
           Run command in foreground.

           With --tmux and --tmuxpane GNU parallel will start tmux in the
           foreground.

           With --semaphore GNU parallel will run the command in the
           foreground (opposite --bg), and wait for completion of the command
           before exiting.

           See also --bg, man sem.

       --fifo
           Create a temporary fifo with content. Normally --pipe and
           --pipepart will give data to the program on stdin (standard input).
           With --fifo GNU parallel will create a temporary fifo with the name
           in {}, so you can do: parallel --pipe --fifo wc {}.

           Beware: If data is not read from the fifo, the job will block
           forever.

           Implies --pipe unless --pipepart is used.

           See also --cat.

       --filter-hosts
           Remove down hosts. For each remote host: check that login through
           ssh works. If not: do not use this host.

           For performance reasons, this check is performed only at the start
           and every time --sshloginfile is changed. If a host goes down
           after the first check, it will go undetected until --sshloginfile
           is changed; --retries can be used to mitigate this.

           Currently you cannot put --filter-hosts in a profile, $PARALLEL,
           /etc/parallel/config or similar. This is because GNU parallel uses
           GNU parallel to compute this, so you will get an infinite loop.
           This will likely be fixed in a later release.

       --gnu
           Behave like GNU parallel. This option historically took precedence
           over --tollef. The --tollef option is now retired, and therefore
           may not be used. --gnu is kept for compatibility.

       --group
           Group output. Output from each job is grouped together and is only
           printed when the command is finished. Stdout (standard output) is
           printed first, followed by stderr (standard error).

           This takes in the order of 0.5ms per job and depends on the speed
           of your disk for larger output. It can be disabled with -u, but
           this means output from different commands can get mixed.

           --group is the default. Can be reversed with -u.

           See also: --line-buffer --ungroup

       --group-by val
           Group input by value. Combined with --pipe/--pipepart, --group-by
           groups lines with the same value into a record.

           The value can be computed from the full line or from a single
           column.

           val can be:

            column number Use the value in the column numbered.

            column name   Treat the first line as a header and use the value
                          in the column named.

                          (Not supported with --pipepart).

            perl expression
                          Run the perl expression and use $_ as the value.

            column number perl expression
                          Put the value of the column in $_, run the perl
                          expression, and use $_ as the value.

            column name perl expression
                          Put the value of the column in $_, run the perl
                          expression, and use $_ as the value.

                          (Not supported with --pipepart).

           Example:

             UserID, Consumption
             123,    1
             123,    2
             12-3,   1
             221,    3
             221,    1
             2/21,   5

           If you want to group 123, 12-3, 221, and 2/21 into 4 records and
           pass one record at a time to wc:

             tail -n +2 table.csv | \
               parallel --pipe --colsep , --group-by 1 -kN1 wc

           Make GNU parallel treat the first line as a header:

             cat table.csv | \
               parallel --pipe --colsep , --header : --group-by 1 -kN1 wc

           Address column by column name:

             cat table.csv | \
               parallel --pipe --colsep , --header : --group-by UserID -kN1 wc

           If 12-3 and 123 are really the same UserID, remove non-digits in
           UserID when grouping:

             cat table.csv | parallel --pipe --colsep , --header : \
               --group-by 'UserID s/\D//g' -kN1 wc

           See also --shard, --roundrobin.
735
736       --help
737       -h  Print a summary of the options to GNU parallel and exit.
738
739       --halt-on-error val
740       --halt val
741           When should GNU parallel terminate? In some situations it makes no
742           sense to run all jobs. GNU parallel should simply give up as soon
743           as a condition is met.
744
745           val defaults to never, which runs all jobs no matter what.
746
747           val can also take the form when,why.
748
749           when can be 'now' which means kill all running jobs and halt
750           immediately, or it can be 'soon' which means wait for all running
751           jobs to complete, but start no new jobs.
752
753           why can be 'fail=X', 'fail=Y%', 'success=X', 'success=Y%',
754           'done=X', or 'done=Y%' where X is the number of jobs that have to
755           fail, succeed, or be done before halting, and Y is the percentage
756           of jobs that have to fail, succeed, or be done before halting.
757
758           Example:
759
760            --halt now,fail=1     exit when the first job fails. Kill running
761                                  jobs.
762
763            --halt soon,fail=3    exit when 3 jobs fail, but wait for running
764                                  jobs to complete.
765
766            --halt soon,fail=3%   exit when 3% of the jobs have failed, but
767                                  wait for running jobs to complete.
768
769            --halt now,success=1  exit when a job succeeds. Kill running jobs.
770
771            --halt soon,success=3 exit when 3 jobs succeed, but wait for
772                                  running jobs to complete.
773
774            --halt now,success=3% exit when 3% of the jobs have succeeded.
775                                  Kill running jobs.
776
777            --halt now,done=1     exit when one of the jobs finishes. Kill
778                                  running jobs.
779
780            --halt soon,done=3    exit when 3 jobs finish, but wait for
781                                  running jobs to complete.
782
783            --halt now,done=3%    exit when 3% of the jobs have finished. Kill
784                                  running jobs.
785
786           For backwards compatibility these also work:
787
788           0           never
789
790           1           soon,fail=1
791
792           2           now,fail=1
793
794           -1          soon,success=1
795
796           -2          now,success=1
797
798           1-99%       soon,fail=1-99%
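           The following is a minimal sketch of soon,fail=1. It assumes GNU
           parallel is installed. The command 'test {} -ne 3' is a stand-in
           that fails only for the argument 3. -j1 runs one job at a time, so
           the point where parallel stops is predictable:

```shell
# Run one job at a time and stop starting new jobs after the first failure.
# 'test {} -ne 3' is a stand-in command that fails only for the argument 3.
status=0
out=$(parallel -j1 --halt soon,fail=1 'echo start {}; test {} -ne 3' \
  ::: 1 2 3 4 5 2>/dev/null) || status=$?
printf '%s\n' "$out"
echo "exit status: $status"
```

           Jobs 4 and 5 should never be started, and the exit status should be
           non-zero.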
799
800       --header regexp
801           Use regexp as header. For normal usage the matched header
802           (typically the first line: --header '.*\n') will be split using
803           --colsep (which will default to '\t') and column names can be used
804           as replacement variables: {column name}, {column name/}, {column
805           name//}, {column name/.}, {column name.}, {=column name perl
806           expression =}, ..
807
808           For --pipe, the matched header is prepended to each block passed to a job.
809
810           --header : is an alias for --header '.*\n'.
811
812           If regexp is a number, it is taken as a fixed number of header lines.
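           The following is a small sketch that assumes GNU parallel is
           installed. With --header : and ::: input sources, the first value
           of each input source is used as its column name. Below, {x} names
           the first input source and {y} the second:

```shell
# With --header : the first value of each ::: input source is its column
# name, so {x} refers to the first input source and {y} to the second.
out=$(parallel -k --header : echo x={x} y={y} ::: x 1 2 ::: y a b)
printf '%s\n' "$out"
```

           With -k this should print x=1 y=a, x=1 y=b, x=2 y=a, and x=2 y=b
           on separate lines.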
813
814       --hostgroups
815       --hgrp
816           Enable hostgroups on arguments. If an argument contains '@' the
817           string after '@' will be removed and treated as a list of
818           hostgroups on which this job is allowed to run. If there is no
819           --sshlogin with a corresponding group, the job will run on any
820           hostgroup.
821
822           Example:
823
824             parallel --hostgroups \
825               --sshlogin @grp1/myserver1 -S @grp1+grp2/myserver2 \
826               --sshlogin @grp3/myserver3 \
827               echo ::: my_grp1_arg@grp1 arg_for_grp2@grp2 third@grp1+grp3
828
829           my_grp1_arg may be run on either myserver1 or myserver2, third may
830           be run on either myserver1 or myserver3, but arg_for_grp2 will only
831           be run on myserver2.
832
833           See also: --sshlogin.
834
835       -I replace-str
836           Use the replacement string replace-str instead of {}.
837
838       --replace[=replace-str]
839       -i[replace-str]
840           This option is a synonym for -Ireplace-str if replace-str is
841           specified, and for -I {} otherwise.  This option is deprecated; use
842           -I instead.
843
844       --joblog logfile
845           Logfile for executed jobs. Save a list of the executed jobs to
846           logfile in the following TAB separated format: sequence number,
847           sshlogin, start time as seconds since epoch, run time in seconds,
848           bytes in files transferred, bytes in files returned, exit status,
849           signal, and command run.
850
851           For --pipe, bytes transferred and bytes returned are the number of
852           bytes of input and output, respectively.
853
854           If logfile is prefixed with '+', log lines will be appended to the
855           logfile.
856
857           To convert the times into ISO-8601 strict do:
858
859             cat logfile | perl -a -F"\t" -ne \
860               'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'
861
862           If the host names are long, you can use column -t to pretty-print the log:
863
864             cat joblog | column -t
865
866           See also --resume --resume-failed.
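           The joblog is plain TAB separated text, so standard tools can query
           it. This sketch lists the jobs that failed, meaning a non-zero exit
           status (7th column) or signal (8th column). The log below is
           hand-written with made-up values. It assumes that the log starts
           with one header line:

```shell
# Hand-written sample joblog in the TAB separated format described above.
# The values are made up for illustration.
log=$(mktemp)
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$log"
printf '1\t:\t1500000000.000\t0.010\t0\t2\t0\t0\techo a\n' >> "$log"
printf '2\t:\t1500000000.010\t0.011\t0\t0\t1\t0\tfalse b\n' >> "$log"
printf '3\t:\t1500000000.020\t0.500\t0\t0\t0\t15\tsleep 9\n' >> "$log"

# Print sequence number and command of every job with a non-zero exit status
# (column 7) or signal (column 8). NR > 1 skips the assumed header line.
failed=$(awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 ": " $9 }' "$log")
printf '%s\n' "$failed"
rm -f "$log"
```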
867
868       --jobs N
869       -j N
870       --max-procs N
871       -P N
872           Number of jobslots on each machine. Run up to N jobs in parallel.
873           0 means as many as possible. Default is 100% which will run one job
874           per CPU on each machine.
875
876           If --semaphore is set, the default is 1 thus making a mutex.
877
878       --jobs +N
879       -j +N
880       --max-procs +N
881       -P +N
882           Add N to the number of CPUs.  Run this many jobs in parallel.  See
883           also --use-cores-instead-of-threads and
884           --use-sockets-instead-of-threads.
885
886       --jobs -N
887       -j -N
888       --max-procs -N
889       -P -N
890           Subtract N from the number of CPUs.  Run this many jobs in
891           parallel.  If the evaluated number is less than 1 then 1 will be
892           used.  See also --use-cores-instead-of-threads and
893           --use-sockets-instead-of-threads.
894
895       --jobs N%
896       -j N%
897       --max-procs N%
898       -P N%
899           Multiply N% with the number of CPUs.  Run this many jobs in
900           parallel. See also --use-cores-instead-of-threads and
901           --use-sockets-instead-of-threads.
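           The rules above amount to arithmetic on the CPU count. The jobslots
           function below is a hypothetical helper, not part of GNU parallel.
           Two of its behaviours are assumptions: it truncates N% to an
           integer, and it never returns less than 1 for N%. This page only
           states the floor of 1 for -N:

```shell
# Hypothetical helper, not part of GNU parallel: jobslots SPEC NCPUS
# prints the number of jobslots for a --jobs value, following the rules above.
jobslots() {
  spec=$1
  ncpus=$2
  case $spec in
    +*) n=$((ncpus + ${spec#+})) ;;          # +N: add N to the CPU count
    -*) n=$((ncpus - ${spec#-})) ;;          # -N: subtract N from the CPU count
    *%) n=$((ncpus * ${spec%\%} / 100)) ;;   # N%: truncating is an assumption
    *)  echo "$spec"; return ;;              # N: used as is (0 = as many as possible)
  esac
  if [ "$n" -lt 1 ]; then n=1; fi            # documented for -N, assumed for N%
  echo "$n"
}

jobslots +2 8     # prints 10
jobslots -2 8     # prints 6
jobslots -10 8    # prints 1
jobslots 50% 8    # prints 4
```

           In GNU parallel itself, the CPU count depends on detection and on
           --use-cores-instead-of-threads and --use-sockets-instead-of-threads.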
902
903       --jobs procfile
904       -j procfile
905       --max-procs procfile
906       -P procfile
907           Read parameter from file. Use the content of procfile as parameter
908           for -j. E.g. procfile could contain the string 100% or +2 or 10. If
909           procfile is changed when a job completes, procfile is read again
910           and the new number of jobs is computed. If the number is lower than
911           before, running jobs will be allowed to finish but new jobs will
912           not be started until the wanted number of jobs has been reached.
913           This makes it possible to change the number of simultaneous running
914           jobs while GNU parallel is running.
915
916       --keep-order
917       -k  Keep sequence of output same as the order of input. Normally the
918           output of a job will be printed as soon as the job completes. Try
919           this to see the difference:
920
921             parallel -j4 sleep {}\; echo {} ::: 2 1 4 3
922             parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
923
924           If used with --onall or --nonall the output will be grouped by
925           sshlogin in sorted order.
926
927           If used with --pipe --roundrobin and the same input, the jobslots
928           will get the same blocks in the same order in every run.
929
930           -k only affects the order in which the output is printed - not the
931           order in which jobs are run.
932
933       -L recsize
934           When used with --pipe: Read records of recsize.
935
936           When used otherwise: Use at most recsize nonblank input lines per
937           command line.  Trailing blanks cause an input line to be logically
938           continued on the next input line.
939
940           -L 0 means read one line, but insert 0 arguments on the command
941           line.
942
943           Implies -X unless -m, --xargs, or --pipe is set.
944
945       --max-lines[=recsize]
946       -l[recsize]
947           When used with --pipe: Read records of recsize lines.
948
949           When used otherwise: Synonym for the -L option.  Unlike -L, the
950           recsize argument is optional.  If recsize is not specified, it
951           defaults to one.  The -l option is deprecated since the POSIX
952           standard specifies -L instead.
953
954           -l 0 is an alias for -l 1.
955
956           Implies -X unless -m, --xargs, or --pipe is set.
957
958       --limit "command args"
959           Dynamic job limit. Before starting a new job run command with args.
960           The exit value of command determines what GNU parallel will do:
961
962           0   Below limit. Start another job.
963
964           1   Over limit. Start no jobs.
965
966           2   Way over limit. Kill the youngest job.
967
968           You can use any shell command. There are 3 predefined commands:
969
970           "io n"    Limit for I/O. The amount of disk I/O will be computed as
971                     a value 0-100, where 0 means no I/O and 100 means at least
972                     one disk is 100% saturated.
973
974           "load n"  Similar to --load.
975
976           "mem n"   Similar to --memfree.
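           Any command that follows the exit value convention above can be
           used. The loadlimit script below is a hypothetical example, not one
           of the predefined commands. It compares the 1-minute load average
           with a maximum. Reading /proc/loadavg makes it Linux-specific:

```shell
# Hypothetical limit command (not shipped with GNU parallel): loadlimit MAX
# Exit 0 if the 1-minute load is below MAX (start another job), 1 if it is
# below 2*MAX (start no jobs), and 2 otherwise (kill the youngest job).
cat > ./loadlimit <<'EOF'
#!/bin/sh
max=$1
load=$(cut -d ' ' -f 1 /proc/loadavg)   # Linux-specific
awk -v l="$load" -v m="$max" 'BEGIN { l += 0; m += 0
  if (l < m) exit 0; if (l < 2 * m) exit 1; exit 2 }'
EOF
chmod +x ./loadlimit
```

           It could then be used as: parallel --limit "./loadlimit 4" gzip ::: *.log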
977
978       --line-buffer (beta testing)
979       --lb (beta testing)
980           Buffer output on line basis. --group will keep the output together
981           for a whole job. --ungroup allows output to mix up with half a line
982           coming from one job and half a line coming from another job.
983           --line-buffer fits between these two: GNU parallel will print a
984           full line, but will allow for mixing lines of different jobs.
985
986           --line-buffer takes more CPU power than both --group and --ungroup,
987           but can be much faster than --group if the CPU is not the limiting
988           factor.
989
990           Normally --line-buffer does not buffer on disk, and can thus
991           process an infinite amount of data, but it will buffer on disk when
992           combined with: --keep-order, --results, --compress, and --files.
993           This will make it as slow as --group and will limit output to the
994           available disk space.
995
996           With --keep-order, --line-buffer will output lines from the first
997           job continuously while it is running, then lines from the second
998           job while that is running. It will buffer full lines, but jobs will
999           not mix. Compare:
1000
1001             parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
1002             parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1003             parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1004
1005           See also: --group --ungroup
1006
1007       --xapply
1008       --link
1009           Link input sources. Read multiple input sources like xapply. If
1010           multiple input sources are given, one argument will be read from
1011           each of the input sources. The arguments can be accessed in the
1012           command as {1} .. {n}, so {1} will be a line from the first input
1013           source, and {6} will refer to the line with the same line number
1014           from the 6th input source.
1015
1016           Compare these two:
1017
1018             parallel echo {1} {2} ::: 1 2 3 ::: a b c
1019             parallel --link echo {1} {2} ::: 1 2 3 ::: a b c
1020
1021           Arguments will be recycled if one input source has more arguments
1022           than the others:
1023
1024             parallel --link echo {1} {2} {3} \
1025               ::: 1 2 ::: I II III ::: a b c d e f g
1026
1027           See also --header, :::+, ::::+.
1028
1029       --load max-load
1030           Do not start new jobs on a given computer unless the number of
1031           running processes on the computer is less than max-load. max-load
1032           uses the same syntax as --jobs, so 100% for one per CPU is a valid
1033           setting. The only difference is 0, which is interpreted as 0.01.
1034
1035       --controlmaster
1036       -M  Use ssh's ControlMaster to make ssh connections faster. Useful if
1037           jobs run remotely and are very fast to run. This is disabled for
1038           sshlogins that specify their own ssh command.
1039
1040       --xargs
1041           Multiple arguments. Insert as many arguments as the command line
1042           length permits.
1043
1044           If {} is not used the arguments will be appended to the line.  If
1045           {} is used multiple times each {} will be replaced with all the
1046           arguments.
1047
1048           Support for --xargs with --sshlogin is limited and may fail.
1049
1050           See also -X for context replace. If in doubt use -X as that will
1051           most likely do what is needed.
1052
1053       -m  Multiple arguments. Insert as many arguments as the command line
1054           length permits. If multiple jobs are being run in parallel:
1055           distribute the arguments evenly among the jobs. Use -j1 or --xargs
1056           to avoid this.
1057
1058           If {} is not used the arguments will be appended to the line.  If
1059           {} is used multiple times each {} will be replaced with all the
1060           arguments.
1061
1062           Support for -m with --sshlogin is limited and may fail.
1063
1064           See also -X for context replace. If in doubt use -X as that will
1065           most likely do what is needed.
1066
1067       --memfree size
1068           Minimum memory free when starting another job. The size can be
1069           postfixed with K, M, G, T, P, k, m, g, t, or p which would multiply
1070           the size with 1024, 1048576, 1073741824, 1099511627776,
1071           1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or
1072           1000000000000000, respectively.
1073
1074           If the jobs take up very different amounts of RAM, GNU parallel will
1075           only start as many as there is memory for. If less than size bytes
1076           are free, no more jobs will be started. If less than 50% of size bytes
1077           are free, the youngest job will be killed, and put back on the
1078           queue to be run later.
1079
1080           --retries must be set to determine how many times GNU parallel
1081           should retry a given job.
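           Here is a worked example of the size suffixes and the 50% rule. The
           to_bytes function below is a hypothetical helper, not part of GNU
           parallel. It only handles integer sizes with at most one suffix.
           For --memfree 2G, no new jobs are started below 2147483648 free
           bytes, and the youngest job is killed below 1073741824 free bytes:

```shell
# Hypothetical helper (not part of GNU parallel): convert an integer size
# with one of the suffixes listed above into bytes.
to_bytes() {
  case $1 in
    *K) m=1024 ;;
    *M) m=1048576 ;;
    *G) m=1073741824 ;;
    *T) m=1099511627776 ;;
    *P) m=1125899906842624 ;;
    *k) m=1000 ;;
    *m) m=1000000 ;;
    *g) m=1000000000 ;;
    *t) m=1000000000000 ;;
    *p) m=1000000000000000 ;;
    *)  echo "$1"; return ;;   # no suffix: already bytes
  esac
  echo $(( ${1%?} * m ))       # ${1%?} strips the one-character suffix
}

size=$(to_bytes 2G)
echo "no new jobs below $size free bytes"               # 2147483648
echo "kill youngest job below $((size / 2)) free bytes" # 1073741824
```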
1082
1083       --minversion version
1084           Print the version of GNU parallel and exit.  If the current version of
1085           GNU parallel is less than version, the exit code is 255. Otherwise
1086           it is 0.
1087
1088           This is useful for scripts that depend on features only available
1089           from a certain version of GNU parallel.
1090
1091       --nonall
1092           --onall with no arguments. Run the command on all computers given
1093           with --sshlogin but take no arguments. GNU parallel will log into
1094           --jobs number of computers in parallel and run the job on the
1095           computer. -j adjusts how many computers to log into in parallel.
1096
1097           This is useful for running the same command (e.g. uptime) on a list
1098           of servers.
1099
1100       --onall
1101           Run all the jobs on all computers given with --sshlogin. GNU
1102           parallel will log into --jobs number of computers in parallel and
1103           run one job at a time on the computer. The order of the jobs will
1104           not be changed, but some computers may finish before others.
1105
1106           When using --group the output will be grouped by each server, so
1107           all the output from one server will be grouped together.
1108
1109           --joblog will contain an entry for each job on each server, so
1110           there will be several entries with job sequence number 1.
1111
1112       --output-as-files
1113       --outputasfiles
1114       --files
1115           Instead of printing the output to stdout (standard output) the
1116           output of each job is saved in a file and the filename is then
1117           printed.
1118
1119           See also: --results
1120
1121       --pipe
1122       --spreadstdin
1123           Spread input to jobs on stdin (standard input). Read a block of
1124           data from stdin (standard input) and give one block of data as
1125           input to one job.
1126
1127           The block size is determined by --block. The strings --recstart and
1128           --recend tell GNU parallel how a record starts and/or ends. The
1129           block read will have the final partial record removed before the
1130           block is passed on to the job. The partial record will be prepended
1131           to the next block.
1132
1133           If --recstart is given this will be used to split at record start.
1134
1135           If --recend is given this will be used to split at record end.
1136
1137           If both --recstart and --recend are given both will have to match
1138           to find a split position.
1139
1140           If neither --recstart nor --recend are given --recend defaults to
1141           '\n'. To have no record separator use --recend "".
1142
1143           --files is often used with --pipe.
1144
1145           --pipe maxes out at around 1 GB/s input, and 100 MB/s output. If
1146           performance is important use --pipepart.
1147
1148           See also: --recstart, --recend, --fifo, --cat, --pipepart, --files.
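           The following is a minimal sketch that assumes GNU parallel is
           installed. It splits one million lines into blocks of about 1 MB
           and counts the lines in each block. Blocks are only cut after a
           full record (here the default --recend '\n'), so the per-block
           counts should add up to 1000000:

```shell
# --block 1M: blocks of about 1 MB; each block is given to its own wc -l.
# awk adds up the per-block line counts.
total=$(seq 1000000 | parallel --pipe --block 1M wc -l |
  awk '{ s += $1 } END { print s }')
echo "$total"
```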
1149
1150       --pipepart
1151           Pipe parts of a physical file. --pipepart works similarly to --pipe,
1152           but is much faster.
1153
1154           --pipepart has a few limitations:
1155
1156           ·  The file must be a normal file or a block device (technically it
1157              must be seekable) and must be given using -a or ::::. The file
1158              cannot be a pipe or a fifo as they are not seekable.
1159
1160              If using a block device with a lot of NUL bytes, remember to set
1161              --recend ''.
1162
1163           ·  Record counting (-N) and line counting (-L/-l) do not work.
1164
1165       --plain
1166           Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full
1167           control on the command line (used by GNU parallel internally when
1168           called with --sshlogin).
1169
1170       --plus
1171           Activate additional replacement strings: {+/} {+.} {+..} {+...}
1172           {..} {...} {/..} {/...} {##}. The idea is that '{+foo}' matches
1173           the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} =
1174           {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
1175           {+/}/{/...}.{+...}
1176
1177           {##} is the number of jobs to be run. It is incompatible with
1178           -X/-m/--xargs.
1179
1180           {choose_k} is inspired by n choose k: Given a list of n elements,
1181           choose k. k is the number of input sources and n is the number of
1182           arguments in an input source.  The content of the input sources
1183           must be the same and the arguments must be unique.
1184
1185           The following dynamic replacement strings are also activated. They
1186           are inspired by bash's parameter expansion:
1187
1188             {:-str}       str if the value is empty
1189             {:num}        remove the first num characters
1190             {:num1:num2}  characters from num1 to num2
1191             {#str}        remove prefix str
1192             {%str}        remove postfix str
1193             {/str1/str2}  replace str1 with str2
1194             {^str}        uppercase str if found at the start
1195             {^^str}       uppercase str
1196             {,str}        lowercase str if found at the start
1197             {,,str}       lowercase str
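           Several of these replacement strings behave like POSIX shell
           parameter expansions. The sketch below uses those expansions as an
           analogy to check the identities above for one sample path. It is
           not how GNU parallel computes them, and corner cases (such as names
           without an extension) may differ:

```shell
# POSIX shell analogues of some --plus replacement strings (analogy only).
f=dir/sub/file.tar.gz

dir=${f%/*}           # like {+/}   dir/sub
base=${f##*/}         # like {/}    file.tar.gz
noext=${f%.*}         # like {.}    dir/sub/file.tar
ext=${f##*.}          # like {+.}   gz
noext2=${f%.*.*}      # like {..}   dir/sub/file
ext2=${f#"$noext2".}  # like {+..}  tar.gz

# The identities {} = {+/}/{/} = {.}.{+.} = {..}.{+..} hold for this path:
[ "$dir/$base" = "$f" ] && [ "$noext.$ext" = "$f" ] && [ "$noext2.$ext2" = "$f" ] &&
  echo "identities hold"

# Prefix/postfix removal and defaults, like {#str}, {%str} and {:-str}:
echo "${f#dir/}"      # sub/file.tar.gz
echo "${f%.gz}"       # dir/sub/file.tar
empty=
echo "${empty:-none}" # none
```

           With GNU parallel itself, parallel --plus echo {+.} ::: dir/sub/file.tar.gz
           should print gz.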
1198
1199       --progress
1200           Show progress of computations. List the computers involved in the
1201           task with number of CPUs detected and the max number of jobs to
1202           run. After that show progress for each computer: number of running
1203           jobs, number of completed jobs, and percentage of all jobs done by
1204           this computer. The percentage will only be available after all jobs
1205           have been scheduled, as GNU parallel only reads the next job when
1206           ready to schedule it - this is to avoid wasting time and memory by
1207           reading everything at startup.
1208
1209           By sending SIGUSR2 to a running GNU parallel process you can toggle
1210           --progress on and off.
1211
1212           See also --eta and --bar.
1213
1214       --max-args=max-args
1215       -n max-args
1216           Use at most max-args arguments per command line.  Fewer than
1217           max-args arguments will be used if the size (see the -s option) is
1218           exceeded, unless the -x option is given, in which case GNU parallel
1219           will exit.
1220
1221           -n 0 means read one argument, but insert 0 arguments on the command
1222           line.
1223
1224           Implies -X unless -m is set.
1225
1226       --max-replace-args=max-args
1227       -N max-args
1228           Use at most max-args arguments per command line. Like -n but also
1229           makes replacement strings {1} .. {max-args} that represent
1230           arguments 1 .. max-args. If there are too few arguments, {n} is empty.
1231
1232           -N 0 means read one argument, but insert 0 arguments on the command
1233           line.
1234
1235           This will set the owner of the homedir to the user:
1236
1237             tr ':' '\n' < /etc/passwd | parallel -N7 chown {1} {6}
1238
1239           Implies -X unless -m or --pipe is set.
1240
1241           When used with --pipe, -N is the number of records to read. This is
1242           somewhat slower than --block.
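           The following is a small sketch of -N that assumes GNU parallel is
           installed. Five arguments are consumed two at a time. The last job
           therefore gets only one argument, and {2} is empty there:

```shell
# -N2: two arguments per job, available as {1} and {2}. Five arguments do
# not divide evenly, so in the last job {2} is empty.
out=$(parallel -k -N2 echo {1}-{2} ::: a b c d e)
printf '%s\n' "$out"
```

           With -k this should print a-b, c-d, and e- on separate lines.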
1243
1244       --max-line-length-allowed
1245           Print the maximal number of characters allowed on the command line
1246           and exit (used by GNU parallel itself to determine the line length
1247           on remote computers).
1248
1249       --number-of-cpus (obsolete)
1250           Print the number of physical CPU cores and exit.
1251
1252       --number-of-cores (beta testing)
1253           Print the number of physical CPU cores and exit (used by GNU
1254           parallel itself to determine the number of physical CPU cores on
1255           remote computers).
1256
1257       --number-of-sockets (beta testing)
1258           Print the number of filled CPU sockets and exit (used by GNU
1259           parallel itself to determine the number of filled CPU sockets on
1260           remote computers).
1261
1262       --number-of-threads (beta testing)
1263           Print the number of hyperthreaded CPU cores and exit (used by GNU
1264           parallel itself to determine the number of hyperthreaded CPU cores
1265           on remote computers).
1266
1267       --no-keep-order
1268           Overrides an earlier --keep-order (e.g. if set in
1269           ~/.parallel/config).
1270
1271       --nice niceness (alpha testing)
1272           Run the command at this niceness.
1273
1274           By default GNU parallel will run jobs at the same nice level as GNU
1275           parallel is started - both on the local machine and remote servers,
1276           so you are unlikely to ever use this option.
1277
1278           Setting --nice will override this nice level. If the nice level is
1279           smaller than the current nice level, it will only affect remote
1280           jobs (e.g. current level is 10 and --nice 5 will cause local jobs
1281           to be run at level 10, but remote jobs run at nice level 5).
1282
1283       --interactive
1284       -p  Prompt the user about whether to run each command line and read a
1285           line from the terminal.  Only run the command line if the response
1286           starts with 'y' or 'Y'.  Implies -t.
1287
1288       --parens parensstring
1289           Define start and end parentheses for {= perl expression =}. The
1290           left and right parentheses can be multiple characters and are
1291           assumed to be the same length. The default is {==} giving {= as the
1292           start parenthesis and =} as the end parenthesis.
1293
1294           Another useful setting is ,,,, which would make both parentheses
1295           ,,:
1296
1297             parallel --parens ,,,, echo foo is ,,s/I/O/g,, ::: FII
1298
1299           See also: --rpl {= perl expression =}
1300
1301       --profile profilename (beta testing)
1302       -J profilename (beta testing)
1303           Use profile profilename for options. This is useful if you want to
1304           have multiple profiles. You could have one profile for running jobs
1305           in parallel on the local computer and a different profile for
1306           running jobs on remote computers. See the section PROFILE FILES for
1307           examples.
1308
1309           profilename corresponds to the file ~/.parallel/profilename.
1310
1311           You can give multiple profiles by repeating --profile. If parts of
1312           the profiles conflict, the later ones will be used.
1313
1314           Default: config
1315
1316       --quote
1317       -q  Quote command. The command must be a simple command (see man bash)
1318           without redirections and without variable assignments. This will
1319           quote the command line and arguments so special characters are not
1320           interpreted by the shell. See the section QUOTING. Most people will
1321           never need this.  Quoting is disabled by default.
1322
1323       --no-run-if-empty
1324       -r  If the stdin (standard input) only contains whitespace, do not run
1325           the command.
1326
1327           If used with --pipe this is slow.
1328
1329       --noswap
1330           Do not start new jobs on a given computer if there is both swap-in
1331           and swap-out activity.
1332
1333           The swap activity is only sampled every 10 seconds as the sampling
1334           takes 1 second to do.
1335
1336           Swap activity is computed as (swap-in)*(swap-out) which in practice
1337           is a good value: swapping out is not a problem, swapping in is not
1338           a problem, but both swapping in and out usually indicates a
1339           problem.
1340
1341           --memfree may give better results, so try using that first.
1342
1343       --record-env
1344           Record current environment variables in ~/.parallel/ignored_vars.
1345           This is useful before using --env _.
1346
1347           See also --env, --session.
1348
1349       --recstart startstring
1350       --recend endstring
1351           If --recstart is given startstring will be used to split at record
1352           start.
1353
1354           If --recend is given endstring will be used to split at record end.
1355
1356           If both --recstart and --recend are given the combined string
1357           endstringstartstring will have to match to find a split position.
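           The following sketch assumes GNU parallel is installed. It splits
           FASTA-like records, each starting with '>', using --recstart, and
           -N1 gives every job exactly one record. grep -c . counts the
           non-empty lines of each record:

```shell
# Each record starts with '>'. -N1 hands exactly one record to each job,
# and grep -c . counts the non-empty lines it receives.
out=$(printf '>a\nACGT\n>b\nGG\nTT\n>c\nA\n' |
  parallel -k --pipe --recstart '>' -N1 'grep -c .')
printf '%s\n' "$out"
```

           This should print 2, 3, and 2 on separate lines.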
1358           This is useful if either startstring or endstring match in the
1359           middle of a record.
1360
1361           If neither --recstart nor --recend are given then --recend defaults
1362           to '\n'. To have no record separator use --recend "".
1363
1364           --recstart and --recend are used with --pipe.
1365
1366           Use --regexp to interpret --recstart and --recend as regular
1367           expressions. This is slow, however.
1368
1369       --regexp
1370           Use --regexp to interpret --recstart and --recend as regular
1371           expressions. This is slow, however.
1372
1373       --remove-rec-sep
1374       --removerecsep
1375       --rrs
1376           Remove the text matched by --recstart and --recend before piping it
1377           to the command.
1378
1379           Only used with --pipe.
1380
1381       --results name
1382       --res name
1383           Save the output into files.
1384
1385           Simple string output dir
1386
1387           If name does not contain replacement strings and does not end in
1388           .csv/.tsv, the output will be stored in a directory tree rooted at
1389           name.  Within this directory tree, each command will result in
1390           three files: name/<ARGS>/stdout, name/<ARGS>/stderr, and
1391           name/<ARGS>/seq, where <ARGS> is a sequence of directories
1392           representing the header of the input source (if using --header :)
1393           or the number of the input source and corresponding values.
1394
1395           E.g.
1396
1397             parallel --header : --results foo echo {a} {b} \
1398               ::: a I II ::: b III IIII
1399
1400           will generate the files:
1401
1402             foo/a/II/b/III/seq
1403             foo/a/II/b/III/stderr
1404             foo/a/II/b/III/stdout
1405             foo/a/II/b/IIII/seq
1406             foo/a/II/b/IIII/stderr
1407             foo/a/II/b/IIII/stdout
1408             foo/a/I/b/III/seq
1409             foo/a/I/b/III/stderr
1410             foo/a/I/b/III/stdout
1411             foo/a/I/b/IIII/seq
1412             foo/a/I/b/IIII/stderr
1413             foo/a/I/b/IIII/stdout
1414
1415           and
1416
1417             parallel --results foo echo {1} {2} ::: I II ::: III IIII
1418
1419           will generate the files:
1420
1421             foo/1/II/2/III/seq
1422             foo/1/II/2/III/stderr
1423             foo/1/II/2/III/stdout
1424             foo/1/II/2/IIII/seq
1425             foo/1/II/2/IIII/stderr
1426             foo/1/II/2/IIII/stdout
1427             foo/1/I/2/III/seq
1428             foo/1/I/2/III/stderr
1429             foo/1/I/2/III/stdout
1430             foo/1/I/2/IIII/seq
1431             foo/1/I/2/IIII/stderr
1432             foo/1/I/2/IIII/stdout
1433
1434           CSV file output
1435
1436           If name ends in .csv/.tsv the output will be a CSV-file named name.
1437
1438           .csv gives a comma separated value file. .tsv gives a TAB separated
1439           value file.
1440
1441           -.csv/-.tsv are special: they write the file to stdout (standard
1442           output).
1443
1444           Replacement string output file
1445
1446           If name contains a replacement string and the replaced result does
1447           not end in /, then the standard output will be stored in a file
1448           named by this result. Standard error will be stored in the same
1449           file name with '.err' added, and the sequence number will be stored
1450           in the same file name with '.seq' added.
1451
1452           E.g.
1453
1454             parallel --results my_{} echo ::: foo bar baz
1455
1456           will generate the files:
1457
1458             my_bar
1459             my_bar.err
1460             my_bar.seq
1461             my_baz
1462             my_baz.err
1463             my_baz.seq
1464             my_foo
1465             my_foo.err
1466             my_foo.seq
1467
1468           Replacement string output dir
1469
1470           If name contains a replacement string and the replaced result ends
1471           in /, then output files will be stored in the resulting dir.
1472
1473           E.g.
1474
1475             parallel --results my_{}/ echo ::: foo bar baz
1476
1477           will generate the files:
1478
1479             my_bar/seq
1480             my_bar/stderr
1481             my_bar/stdout
1482             my_baz/seq
1483             my_baz/stderr
1484             my_baz/stdout
1485             my_foo/seq
1486             my_foo/stderr
1487             my_foo/stdout
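The naming rule above (a replaced result ending in / gives a directory, anything else gives NAME, NAME.err and NAME.seq) can be imitated in plain bash. This is a sketch, not GNU parallel's code. The helper name results_like and its `TEMPLATE CMD [CMD_ARGS...] -- INPUT...` calling convention are hypothetical, only {} is substituted, and the jobs run one at a time with sequence numbers counting from 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emulate the --results naming rule sequentially.
# Usage: results_like TEMPLATE CMD [CMD_ARGS...] -- INPUT...
results_like() {
  local template=$1; shift
  local -a cmd=()
  while [ $# -gt 0 ] && [ "$1" != "--" ]; do
    cmd+=("$1"); shift
  done
  shift                                    # drop the "--" separator
  local seq=0 arg name
  for arg in "$@"; do
    seq=$((seq + 1))
    name=${template//"{}"/"$arg"}          # replace every {} with the input
    if [[ $name == */ ]]; then
      # Replaced result ends in /: a directory holding stdout, stderr, seq
      mkdir -p "$name"
      "${cmd[@]}" "$arg" >"${name}stdout" 2>"${name}stderr"
      echo "$seq" >"${name}seq"
    else
      # Otherwise: stdout in NAME, stderr in NAME.err, seq in NAME.seq
      "${cmd[@]}" "$arg" >"$name" 2>"$name.err"
      echo "$seq" >"$name.seq"
    fi
  done
}
```

Run in an empty directory, `results_like 'my_{}' echo -- foo bar baz` creates my_foo, my_foo.err, my_foo.seq and the same three files for bar and baz; with the template 'my_{}/' it would instead create my_foo/stdout, my_foo/stderr, my_foo/seq, and so on.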
1488
1489           See also --files, --tag, --header, --joblog.
1490
1491       --resume
1492           Resumes from the last unfinished job. By reading --joblog or the
1493           --results dir, GNU parallel will figure out the last unfinished job
1494           and continue from there. Because GNU parallel only looks at the
1495           sequence numbers in --joblog, the input, the command, and --joblog
1496           must all remain unchanged; otherwise GNU parallel may run the wrong
1497           commands.
1498
1499           See also --joblog, --results, --resume-failed, --retries.
1500
1501       --resume-failed
1502           Retry all failed jobs and resume from the last unfinished job. By
1503           reading --joblog, GNU parallel will figure out the failed jobs and
1504           run those again. After that it will resume from the last unfinished
1505           job and continue from there. Because GNU parallel only looks at the
1506           sequence numbers in --joblog, the input, the command, and --joblog
1507           must all remain unchanged; otherwise GNU parallel may run the wrong
1508           commands.
1509
1510           See also --joblog, --resume, --retry-failed, --retries.
1511
1512       --retry-failed
1513           Retry all failed jobs in the joblog. By reading --joblog, GNU
1514           parallel will figure out the failed jobs and run those again.
1515
1516           --retry-failed ignores the command and arguments on the command
1517           line: it only looks at the joblog.
1518
1519           Differences between --resume, --resume-failed, --retry-failed
1520
1521           In this example exit {= $_%=2 =} will cause every other job to
1522           fail.
1523
1524             timeout -k 1 4 parallel --joblog log -j10 \
1525               'sleep {}; exit {= $_%=2 =}' ::: {10..1}
1526
1527           4 jobs completed, of which 2 failed:
1528
1529             Seq   [...]   Exitval Signal  Command
1530             10    [...]   1       0       sleep 1; exit 1
1531             9     [...]   0       0       sleep 2; exit 0
1532             8     [...]   1       0       sleep 3; exit 1
1533             7     [...]   0       0       sleep 4; exit 0
1534
1535           --resume does not care about the Exitval, but only looks at Seq. If
1536           a Seq has already been run, it will not be run again. So if needed,
1537           you can change the command for the seqs not run yet:
1538
1539             parallel --resume --joblog log -j10 \
1540               'sleep .{}; exit {= $_%=2 =}' ::: {10..1}
1541
1542             Seq   [...]   Exitval Signal  Command
1543             [... as above ...]
1544             1     [...]   0       0       sleep .10; exit 0
1545             6     [...]   1       0       sleep .5; exit 1
1546             5     [...]   0       0       sleep .6; exit 0
1547             4     [...]   1       0       sleep .7; exit 1
1548             3     [...]   0       0       sleep .8; exit 0
1549             2     [...]   1       0       sleep .9; exit 1
1550
1551           --resume-failed cares about the Exitval, but also only looks at Seq
1552           to figure out which commands to run. Again this means you can
1553           change the command, but not the arguments. It will run the failed
1554           seqs and the seqs not yet run:
1555
1556             parallel --resume-failed --joblog log -j10 \
1557               'echo {};sleep .{}; exit {= $_%=3 =}' ::: {10..1}
1558
1559             Seq   [...]   Exitval Signal  Command
1560             [... as above ...]
1561             10    [...]   1       0       echo 1;sleep .1; exit 1
1562             8     [...]   0       0       echo 3;sleep .3; exit 0
1563             6     [...]   2       0       echo 5;sleep .5; exit 2
1564             4     [...]   1       0       echo 7;sleep .7; exit 1
1565             2     [...]   0       0       echo 9;sleep .9; exit 0
1566
1567           --retry-failed cares about the Exitval, but takes the command from
1568           the joblog. It ignores any arguments or commands given on the
1569           command line:
1570
1571             parallel --retry-failed --joblog log -j10 this part is ignored
1572
1573             Seq   [...]   Exitval Signal  Command
1574             [... as above ...]
1575             10    [...]   1       0       echo 1;sleep .1; exit 1
1576             6     [...]   2       0       echo 5;sleep .5; exit 2
1577             4     [...]   1       0       echo 7;sleep .7; exit 1
1578
1579           See also --joblog, --resume, --resume-failed, --retries.
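To see which jobs --retry-failed (or the retry phase of --resume-failed) would pick up, the joblog can be inspected directly. A minimal awk sketch, assuming the usual tab-separated joblog with one header line and the columns Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command, and assuming that the most recent line logged for a Seq is the one that counts (as the example above suggests). The name failed_jobs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "Seq<TAB>Command" for every job whose most
# recent joblog line has a non-zero Exitval or Signal.
failed_jobs() {
  awk -F'\t' '
    NR == 1 { next }                       # skip the header line
    {
      # A later line for the same Seq overwrites an earlier one
      exitval[$1] = $7 + 0; sig[$1] = $8 + 0; cmd[$1] = $9
    }
    END {
      for (s in cmd)
        if (exitval[s] != 0 || sig[s] != 0)
          print s "\t" cmd[s]
    }' "$1" | sort -n
}
```

For example, `failed_jobs log` run after the --resume-failed step above should list Seq 10, 6 and 4, the same jobs --retry-failed reruns.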
1580
1581       --retries n
1582           If a job fails, retry it on another computer on which it has not
1583           failed. Do this n times. If there are fewer than n computers in
1584           --sshlogin, GNU parallel will re-use all the computers. This is
1585           useful if some jobs fail for no apparent reason (such as network
1586           failure).
1587
1588       --return filename
1589           Transfer files from remote computers. --return is used with
1590           --sshlogin when the arguments are files on the remote computers.
1591           When processing is done the file filename will be transferred from
1592           the remote computer using rsync and will be put relative to the
1593           default login dir. E.g.
1594
1595             echo foo/bar.txt | parallel --return {.}.out \
1596               --sshlogin server.example.com touch {.}.out
1597
1598           This will transfer the file $HOME/foo/bar.out from the computer
1599           server.example.com to the file foo/bar.out after running touch
1600           foo/bar.out on server.example.com.
1601
1602             parallel -S server --trc out/./{}.out touch {}.out ::: in/file
1603
1604           This will transfer the file in/file.out from the computer server to
1605           the file out/in/file.out after running touch in/file.out on
1606           server.
1607
1608             echo /tmp/foo/bar.txt | parallel --return {.}.out \
1609               --sshlogin server.example.com touch {.}.out
1610
1611           This will transfer the file /tmp/foo/bar.out from the computer
1612           server.example.com to the file /tmp/foo/bar.out after running touch
1613           /tmp/foo/bar.out on server.example.com.
1614
1615           Multiple files can be transferred by repeating the option once
1616           per file:
1617
1618             echo /tmp/foo/bar.txt | parallel \
1619               --sshlogin server.example.com \
1620               --return {.}.out --return {.}.out2 touch {.}.out {.}.out2
1621
1622           --return is often used with --transferfile and --cleanup.
1623
1624           --return is ignored when used with --sshlogin : or when not used
1625           with --sshlogin.
1626
1627       --round-robin
1628       --round
1629           Normally --pipe will give a single block to each instance of the
1630           command. With --roundrobin, each block will instead be written to
1631           one of the commands already running, chosen at random. This is
1632           useful if the command takes a long time to initialize.
1633
1634           --keep-order will not work with --roundrobin as it is impossible to
1635           track which input block corresponds to which output.
1636
1637           --roundrobin implies --pipe, except if --pipepart is given.
1638
1639           See also --group-by, --shard.
1640
1641       --rpl 'tag perl expression'
1642           Use tag as a replacement string for perl expression. This makes it
1643           possible to define your own replacement strings. GNU parallel's 7
1644           replacement strings are implemented as:
1645
1646             --rpl '{} '
1647             --rpl '{#} 1 $_=$job->seq()'
1648             --rpl '{%} 1 $_=$job->slot()'
1649             --rpl '{/} s:.*/::'
1650             --rpl '{//} $Global::use{"File::Basename"} ||=
1651               eval "use File::Basename; 1;"; $_ = dirname($_);'
1652             --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
1653             --rpl '{.} s:\.[^/.]+$::'
1654
1655           The --plus replacement strings are implemented as:
1656
1657             --rpl '{+/} s:/[^/]*$::'
1658             --rpl '{+.} s:.*\.::'
1659             --rpl '{+..} s:.*\.([^.]*\.):$1:'
1660             --rpl '{+...} s:.*\.([^.]*\.[^.]*\.):$1:'
1661             --rpl '{..} s:\.[^/.]+$::; s:\.[^/.]+$::'
1662             --rpl '{...} s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1663             --rpl '{/..} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1664             --rpl '{/...} s:.*/::;s:\.[^/.]+$::;s:\.[^/.]+$::;s:\.[^/.]+$::'
1665             --rpl '{##} $_=total_jobs()'
1666             --rpl '{:-(.+?)} $_ ||= $$1'
1667             --rpl '{:(\d+?)} substr($_,0,$$1) = ""'
1668             --rpl '{:(\d+?):(\d+?)} $_ = substr($_,$$1,$$2);'
1669             --rpl '{#([^#].*?)} s/^$$1//;'
1670             --rpl '{%(.+?)} s/$$1$//;'
1671             --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1672             --rpl '{^(.+?)} s/^($$1)/uc($1)/e;'
1673             --rpl '{^^(.+?)} s/($$1)/uc($1)/eg;'
1674             --rpl '{,(.+?)} s/^($$1)/lc($1)/e;'
1675             --rpl '{,,(.+?)} s/($$1)/lc($1)/eg;'
1676
1677           If the user defined replacement string starts with '{' it can also
1678           be used as a positional replacement string (like {2.}).
1679
1680           It is recommended to only change $_ but you have full access to all
1681           of GNU parallel's internal functions and data structures.
1682
1683           Here are a few examples:
1684
1685             Is the job sequence even or odd?
1686             --rpl '{odd} $_ = seq() % 2 ? "odd" : "even"'
1687             Pad job sequence with leading zeros to get equal width
1688             --rpl '{0#} $f=1+int("".(log(total_jobs())/log(10)));
1689               $_=sprintf("%0${f}d",seq())'
1690             Job sequence counting from 0
1691             --rpl '{#0} $_ = seq() - 1'
1692             Job slot counting from 2
1693             --rpl '{%1} $_ = slot() + 1'
1694             Remove all extensions
1695             --rpl '{:} s:(\.[^/]+)*$::'
1696
1697           You can have dynamic replacement strings by including parentheses
1698           in the replacement string and adding a regular expression between
1699           the parentheses. The matching string will be inserted as $$1:
1700
1701             parallel --rpl '{%(.*?)} s/$$1//' echo {%.tar.gz} ::: my.tar.gz
1702             parallel --rpl '{:%(.+?)} s:$$1(\.[^/]+)*$::' \
1703               echo {:%_file} ::: my_file.tar.gz
1704             parallel -n3 --rpl '{/:%(.*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:' \
1705               echo job {#}: {2} {2.} {3/:%_1} ::: a/b.c c/d.e f/g_1.h.i
1706
1707           You can even use multiple matches:
1708
1709             parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' \
1710               echo {/replacethis/withthis} {/b/C} ::: a_replacethis_b
1711
1712             parallel --rpl '{(.*?)/(.*?)} $_="$$2$_$$1"' \
1713               echo {swap/these} ::: -middle-
1714
1715           See also {= perl expression =}, --parens.
1716
1717       --rsync-opts options
1718           Options to pass on to rsync. Setting --rsync-opts takes precedence
1719           over setting the environment variable $PARALLEL_RSYNC_OPTS.
1720
1721       --max-chars=max-chars
1722       -s max-chars
1723           Use at most max-chars characters per command line, including the
1724           command and initial-arguments and the terminating nulls at the ends
1725           of the argument strings.  The largest allowed value is system-
1726           dependent, and is calculated as the argument length limit for exec,
1727           less the size of your environment.  The default value is the
1728           maximum.
1729
1730           Implies -X unless -m is set.
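The default can be estimated from the shell. A rough sketch, assuming `getconf ARG_MAX` reports the exec argument length limit and approximating the size of the environment by the byte count of `env` output (each newline standing in for a terminating NUL). The kernel also counts pointer space, and GNU parallel's own computation may differ, so treat this only as an estimate; --show-limits prints the values GNU parallel actually uses. The name max_chars_estimate is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate of the default --max-chars: the exec argument length
# limit minus the space taken by the environment.
max_chars_estimate() {
  local arg_max env_size
  arg_max=$(getconf ARG_MAX)        # limit for exec arguments + environment
  env_size=$(env | wc -c)           # approximate size of the environment
  echo $((arg_max - env_size))
}
max_chars_estimate
```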
1731
1732       --show-limits
1733           Display the limits on the command-line length which are imposed by
1734           the operating system and the -s option.  Pipe the input from
1735           /dev/null (and perhaps specify --no-run-if-empty) if you don't want
1736           GNU parallel to do anything.
1737
1738       --semaphore
1739           Work as a counting semaphore. --semaphore will cause GNU parallel
1740           to start command in the background. When the number of jobs given
1741           by --jobs is reached, GNU parallel will wait for one of these to
1742           complete before starting another command.
1743
1744           --semaphore implies --bg unless --fg is specified.
1745
1746           --semaphore implies --semaphorename `tty` unless --semaphorename is
1747           specified.
1748
1749           Used with --fg, --wait, and --semaphorename.
1750
1751           The command sem is an alias for parallel --semaphore.
1752
1753           See also man sem.
1754
1755       --semaphorename name
1756       --id name
1757           Use name as the name of the semaphore. Default is the name of the
1758           controlling tty (output from tty).
1759
1760           The default normally works as expected when used interactively, but
1761           when used in a script, name should be set explicitly. $$ or
1762           my_task_name are often good values.
1763
1764           The semaphore is stored in ~/.parallel/semaphores/
1765
1766           Implies --semaphore.
1767
1768           See also man sem.
1769
1770       --semaphoretimeout secs
1771       --st secs
1772           If secs > 0: If the semaphore is not released within secs seconds,
1773           take it anyway.
1774
1775           If secs < 0: If the semaphore is not released within -secs seconds
1776           (the absolute value of secs), exit.
1777
1778           Implies --semaphore.
1779
1780           See also man sem.
1781
1782       --seqreplace replace-str
1783           Use the replacement string replace-str instead of {#} for job
1784           sequence number.
1785
1786       --session
1787           Record names in the current environment in $PARALLEL_IGNORED_NAMES
1788           and exit. Only used with env_parallel. Aliases, functions, and
1789           variables with names in $PARALLEL_IGNORED_NAMES will not be copied.
1790
1791           Only supported in Ash, Bash, Dash, Ksh, Sh, and Zsh.
1792
1793           See also --env, --record-env.
1794
1795       --shard shardexpr (alpha testing)
1796           Use shardexpr as shard key and shard input to the jobs.
1797
1798           shardexpr is [column number|column name] [perlexpression] e.g. 3,
1799           Address, 3 $_%=100, Address s/\d//g.
1800
1801           Each input line is split using --colsep. The value of the column is
1802           put into $_, the perl expression is executed, and the resulting
1803           value is hashed so that all lines with a given value are given to
1804           the same job slot.
1805
1806           This is similar to sharding in databases.
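The idea of assigning lines to job slots by hashing the shard value can be sketched in awk. This is not GNU parallel's hash function: the helper name shard_lines, the simple polynomial string hash, and the TAB column separator are assumptions for illustration, and the optional perl expression step is left out. What it shows is the property described above: lines with equal values always land in the same slot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shard_lines NSLOTS COLUMN < tab_separated_input
# Prefixes each line with a slot number 1..NSLOTS computed from a hash of
# the given column, so equal column values always get the same slot.
shard_lines() {
  LC_ALL=C awk -F'\t' -v n="$1" -v col="$2" '
    BEGIN { for (i = 0; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
      key = $col; h = 0
      for (i = 1; i <= length(key); i++)
        h = (h * 31 + ord[substr(key, i, 1)]) % 1000003   # simple string hash
      slot = h % n + 1
      print slot "\t" $0
    }'
}

printf '1\tAddress A\n2\tAddress B\n3\tAddress A\n' | shard_lines 4 2
```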
1807
1808           The performance is on the order of 100K rows per second. Faster if
1809           the shardcol is small (<10), slower if it is big (>100).
1810
1811           --shard requires --pipe and a fixed numeric value for --jobs.
1812
1813           See also --bin, --group-by, --roundrobin.
1814
1815       --shebang
1816       --hashbang
1817           GNU parallel can be called as a shebang (#!) command as the first
1818           line of a script. The content of the file will be treated as
1819           input source.
1820
1821           Like this:
1822
1823             #!/usr/bin/parallel --shebang -r wget
1824
1825             https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1826             https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1827             https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1828
1829           --shebang must be set as the first option.
1830
1831           On FreeBSD, env is needed:
1832
1833             #!/usr/bin/env -S parallel --shebang -r wget
1834
1835             https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1836             https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1837             https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1838
1839           There are many limitations of shebang (#!) depending on your
1840           operating system. See details on
1841           http://www.in-ulm.de/~mascheck/various/shebang/
1842
1843       --shebang-wrap
1844           GNU parallel can parallelize scripts by wrapping the shebang line.
1845           If the program can be run like this:
1846
1847             cat arguments | parallel the_program
1848
1849           then the script can be changed to:
1850
1851             #!/usr/bin/parallel --shebang-wrap /original/parser --options
1852
1853           E.g.
1854
1855             #!/usr/bin/parallel --shebang-wrap /usr/bin/python
1856
1857           If the program can be run like this:
1858
1859             cat data | parallel --pipe the_program
1860
1861           then the script can be changed to:
1862
1863             #!/usr/bin/parallel --shebang-wrap --pipe /orig/parser --opts
1864
1865           E.g.
1866
1867             #!/usr/bin/parallel --shebang-wrap --pipe /usr/bin/perl -w
1868
1869           --shebang-wrap must be set as the first option.
1870
1871       --shellquote
1872           Does not run the command but quotes it. Useful for making quoted
1873           composed commands for GNU parallel.
1874
1875           Multiple --shellquote will quote the string multiple times, so
1876           parallel --shellquote | parallel --shellquote can be written as
1877           parallel --shellquote --shellquote.
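What quoting means here can be illustrated in plain bash. The sketch uses the common POSIX single-quote technique (wrap the string in single quotes and turn each embedded ' into '\''); it is not GNU parallel's implementation, and --shellquote may produce differently formatted but equivalent output. The name shquote is hypothetical. Applying it twice corresponds to quoting the string twice as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ARG quoted so that a POSIX shell reads it back
# as exactly one word with the original contents.
shquote() {
  local sq="'" q="'\\''"             # q holds the four characters '\''
  printf "'%s'\n" "${1//"$sq"/"$q"}"
}

shquote "it's \$HOME"                 # prints 'it'\''s $HOME'
shquote "$(shquote "it's \$HOME")"    # quoted twice
```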
1878
1879       --shuf
1880           Shuffle jobs. When having multiple input sources it is hard to
1881           randomize jobs. --shuf will generate all jobs, and shuffle them
1882           before running them. This is useful to get a quick preview of the
1883           results before running the full batch.
1884
1885       --skip-first-line
1886           Do not use the first line of input (used by GNU parallel itself
1887           when called with --shebang).
1888
1889       --sql DBURL (obsolete)
1890           Use --sqlmaster instead.
1891
1892       --sqlmaster DBURL
1893           Submit jobs via SQL server. DBURL must point to a table, which will
1894           contain the same information as --joblog, the values from the input
1895           sources (stored in columns V1 .. Vn), and the output (stored in
1896           columns Stdout and Stderr).
1897
1898           If DBURL is prepended with '+' GNU parallel assumes the table is
1899           already made with the correct columns and appends the jobs to it.
1900
1901           If DBURL is not prepended with '+' the table will be dropped and
1902           created with the correct amount of V-columns unless
1903
1904           --sqlmaster does not run any jobs, but it creates the values for
1905           the jobs to be run. One or more --sqlworker must be run to actually
1906           execute the jobs.
1907
1908           If --wait is set, GNU parallel will wait for the jobs to complete.
1909
1910           The format of a DBURL is:
1911
1912             [sql:]vendor://[[user][:pwd]@][host][:port]/[db]/table
1913
1914           E.g.
1915
1916             sql:mysql://hr:hr@localhost:3306/hrdb/jobs
1917             mysql://scott:tiger@my.example.com/pardb/paralleljobs
1918             sql:oracle://scott:tiger@ora.example.com/xe/parjob
1919             postgresql://scott:tiger@pg.example.com/pgdb/parjob
1920             pg:///parjob
1921             sqlite3:///pardb/parjob
1922
1923           It can also be an alias from ~/.sql/aliases:
1924
1925             :myalias mysql:///mydb/paralleljobs
1926
1927       --sqlandworker DBURL
1928           Shorthand for: --sqlmaster DBURL --sqlworker DBURL.
1929
1930       --sqlworker DBURL
1931           Execute jobs via SQL server. Read the input source variables from
1932           the table pointed to by DBURL. The command on the command line
1933           should be the same as given by --sqlmaster.
1934
1935           If you have more than one --sqlworker, jobs may be run more than
1936           once.
1937
1938           If --sqlworker runs on the local machine, the hostname in the SQL
1939           table will not be ':' but instead the hostname of the machine.
1940
1941       --ssh sshcommand
1942           GNU parallel defaults to using ssh for remote access. This can be
1943           overridden with --ssh. It can also be set on a per server basis
1944           (see --sshlogin).
1945
1946       --sshdelay secs
1947           Delay starting next ssh by secs seconds. GNU parallel will pause
1948           secs seconds after starting each ssh. secs can be less than 1
1949           second.
1950
1951       -S
1952       [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
1953       -S @hostgroup
1954       --sshlogin
1955       [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
1956       --sshlogin @hostgroup
1957           Distribute jobs to remote computers. The jobs will be run on a list
1958           of remote computers.
1959
1960           If hostgroups is given, the sshlogin will be added to that
1961           hostgroup. Multiple hostgroups are separated by '+'. The sshlogin
1962           will always be added to a hostgroup named the same as sshlogin.
1963
1964           If only the @hostgroup is given, only the sshlogins in that
1965           hostgroup will be used. Multiple @hostgroup can be given.
1966
1967           GNU parallel will determine the number of CPUs on the remote
1968           computers and run the number of jobs as specified by -j.  If the
1969           number ncpus is given GNU parallel will use this number for number
1970           of CPUs on the host. Normally ncpus will not be needed.
1971
1972           An sshlogin is of the form:
1973
1974             [sshcommand [options]] [username@]hostname
1975
1976           The sshlogin must not require a password (ssh-agent, ssh-copy-id,
1977           and sshpass may help with that).
1978
1979           The sshlogin ':' is special: it means 'no ssh' and will therefore
1980           run on the local computer.
1981
1982           The sshlogin '..' is special: it reads sshlogins from
1983           ~/.parallel/sshloginfile or $XDG_CONFIG_HOME/parallel/sshloginfile
1984
1985           The sshlogin '-' is special, too: it reads sshlogins from stdin
1986           (standard input).
1987
1988           To specify more sshlogins separate the sshlogins by comma, newline
1989           (in the same string), or repeat the options multiple times.
1990
1991           For examples: see --sshloginfile.
1992
1993           The remote host must have GNU parallel installed.
1994
1995           --sshlogin is known to cause problems with -m and -X.
1996
1997           --sshlogin is often used with --transferfile, --return, --cleanup,
1998           and --trc.
1999
2000       --sshloginfile filename
2001       --slf filename
2002           File with sshlogins. The file consists of sshlogins on separate
2003           lines. Empty lines and lines starting with '#' are ignored.
2004           Example:
2005
2006             server.example.com
2007             username@server2.example.com
2008             8/my-8-cpu-server.example.com
2009             2/my_other_username@my-dualcore.example.net
2010             # This server has SSH running on port 2222
2011             ssh -p 2222 server.example.net
2012             4/ssh -p 2222 quadserver.example.net
2013             # Use a different ssh program
2014             myssh -p 2222 -l myusername hexacpu.example.net
2015             # Use a different ssh program with default number of CPUs
2016             //usr/local/bin/myssh -p 2222 -l myusername hexacpu
2017             # Use a different ssh program with 6 CPUs
2018             6//usr/local/bin/myssh -p 2222 -l myusername hexacpu
2019             # Assume 16 CPUs on the local computer
2020             16/:
2021             # Put server1 in hostgroup1
2022             @hostgroup1/server1
2023             # Put myusername@server2 in hostgroup1+hostgroup2
2024             @hostgroup1+hostgroup2/myusername@server2
2025             # Force 4 CPUs and put 'ssh -p 2222 server3' in hostgroup1
2026             @hostgroup1/4/ssh -p 2222 server3
2027
2028           When using a different ssh program the last argument must be the
2029           hostname.
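The line format can be made concrete with a small parser. This is a hypothetical sketch, not how GNU parallel reads the file: it assumes an optional leading @hostgroups/ prefix, then an optional ncpus/ prefix (empty in lines such as //usr/local/bin/myssh ..., which use the default number of CPUs), and takes the hostname to be the last word, per the rule above. The name parse_sshlogin is made up.

```shell
#!/usr/bin/env bash
# Hypothetical parser for one sshloginfile line: [@hostgroups/][ncpus/]sshlogin
# Prints hostgroups|ncpus|sshlogin|last-word; fields not given are empty.
parse_sshlogin() {
  local line=$1 groups='' ncpus=''
  case $line in ''|'#'*) return 1 ;; esac     # empty lines and comments are skipped
  if [[ $line =~ ^@([^/]+)/(.*)$ ]]; then     # @group1+group2/ prefix
    groups=${BASH_REMATCH[1]} line=${BASH_REMATCH[2]}
  fi
  if [[ $line =~ ^([0-9]*)/(.*)$ ]]; then     # ncpus/ prefix; empty means default
    ncpus=${BASH_REMATCH[1]} line=${BASH_REMATCH[2]}
  fi
  # With a custom ssh command, the hostname must be the last word
  printf '%s|%s|%s|%s\n' "$groups" "$ncpus" "$line" "${line##* }"
}

parse_sshlogin '@hostgroup1/4/ssh -p 2222 server3'
parse_sshlogin '6//usr/local/bin/myssh -p 2222 -l myusername hexacpu'
```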
2030
2031           Multiple --sshloginfile are allowed.
2032
2033           GNU parallel will first look for the file in the current dir; if
2034           that fails, it looks for the file in ~/.parallel.
2035
2036           The sshloginfile '..' is special: it reads sshlogins from
2037           ~/.parallel/sshloginfile
2038
2039           The sshloginfile '.' is special: it reads sshlogins from
2040           /etc/parallel/sshloginfile
2041
2042           The sshloginfile '-' is special, too: it reads sshlogins from stdin
2043           (standard input).
2044
2045           If the sshloginfile is changed, it will be re-read when a job
2046           finishes, though at most once per second. This makes it possible to
2047           add and remove hosts while running.
2048
2049           This can be used to have a daemon that updates the sshloginfile to
2050           only contain servers that are up:
2051
2052               cp original.slf tmp2.slf
2053               while true ; do
2054                 nice parallel --nonall -j0 -k --slf original.slf \
2055                   --tag echo | perl -pe 's/\t$//' > tmp.slf
2056                 if ! cmp -s tmp.slf tmp2.slf; then
2057                   mv tmp.slf tmp2.slf
2058                 fi
2059                 sleep 10
2060               done &
2061               parallel --slf tmp2.slf ...
2062
2063       --slotreplace replace-str
2064           Use the replacement string replace-str instead of {%} for job slot
2065           number.
2066
2067       --silent
2068           Silent.  The job to be run will not be printed. This is the
2069           default.  Can be reversed with -v.
2070
2071       --tty
2072           Open terminal tty. If GNU parallel is used for starting a program
2073           that accesses the tty (such as an interactive program) then this
2074           option may be needed. It will default to starting only one job at a
2075           time (i.e. -j1), not buffer the output (i.e. -u), and it will open
2076           a tty for the job.
2077
2078           You can of course override -j1 and -u.
2079
2080           Using --tty unfortunately means that GNU parallel cannot kill the
2081           jobs (with --timeout, --memfree, or --halt). This is due to GNU
2082           parallel giving each child its own process group, which is then
2083           killed. Process groups are dependent on the tty.
2084
2085       --tag (beta testing)
2086           Tag lines with arguments. Each output line will be prepended with
2087           the arguments and TAB (\t). When combined with --onall or --nonall
2088           the lines will be prepended with the sshlogin instead.
2089
2090           --tag is ignored when using -u.
2091
2092       --tagstring str (beta testing)
2093           Tag lines with a string. Each output line will be prepended with
2094           str and TAB (\t). str can contain replacement strings such as {}.
2095
2096           --tagstring is ignored when using -u, --onall, and --nonall.
2097
2098       --tee
2099           Pipe all data to all jobs. Used with --pipe/--pipepart and :::.
2100
2101             seq 1000 | parallel --pipe --tee -v wc {} ::: -w -l -c
2102
2103           How many numbers in 1..1000 contain 0..9, and how many bytes do
2104           they fill?
2105
2106             seq 1000 | parallel --pipe --tee --tag \
2107               'grep {1} | wc {2}' ::: {0..9} ::: -l -c
2108
2109           How many words contain a..z and how many bytes do they fill?
2110
2111             parallel -a /usr/share/dict/words --pipepart --tee --tag \
2112               'grep {1} | wc {2}' ::: {a..z} ::: -l -c
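Since --tee hands the complete input to every job, the digit-counting example with seq 1000 is equivalent to running each grep | wc pipeline on its own full copy of the data. A sequential bash sketch of that equivalence; the name tee_equivalent is hypothetical, and the tag here (arguments joined by a space, then a TAB) only approximates what --tag prints.

```shell
#!/usr/bin/env bash
# Sequential equivalent of:
#   seq 1000 | parallel --pipe --tee --tag 'grep {1} | wc {2}' ::: {0..9} ::: -l -c
# Every (digit, option) job reads the whole input, not one block of it.
tee_equivalent() {
  local input d opt
  input=$(seq 1000)                      # the data each job receives in full
  for d in 0 1 2 3 4 5 6 7 8 9; do
    for opt in -l -c; do
      # Prefix the result with the arguments, roughly like --tag does
      printf '%s %s\t%s\n' "$d" "$opt" \
        "$(printf '%s\n' "$input" | grep "$d" | wc "$opt" | tr -d ' ')"
    done
  done
}
tee_equivalent
```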
2113
2114       --termseq sequence
2115           Termination sequence. When a job is killed due to --timeout,
2116           --memfree, --halt, or abnormal termination of GNU parallel,
2117           sequence determines how the job is killed. The default is:
2118
2119               TERM,200,TERM,100,TERM,50,KILL,25
2120
2121           which sends a TERM signal, waits 200 ms, sends another TERM signal,
2122           waits 100 ms, sends another TERM signal, waits 50 ms, sends a KILL
2123           signal, waits 25 ms, and exits. GNU parallel detects if a process
2124           dies before the waiting time is up.
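The sequence is a comma-separated list of signal,milliseconds pairs. A small bash sketch that expands a sequence into its steps and adds up the total waiting time if the job never dies early (GNU parallel stops as soon as the process is gone, so this is an upper bound). The name describe_termseq is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a --termseq value into its steps and print
# the total time spent waiting if the job never dies early.
describe_termseq() {
  local IFS=, total=0
  set -- $1                              # split SIGNAL,MS,SIGNAL,MS,... on commas
  while [ $# -ge 2 ]; do
    printf 'send %s, then wait up to %s ms\n' "$1" "$2"
    total=$((total + $2))
    shift 2
  done
  printf 'worst case: %s ms\n' "$total"
}
describe_termseq TERM,200,TERM,100,TERM,50,KILL,25   # worst case: 375 ms
```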
2125
2126       --tmpdir dirname
2127           Directory for temporary files. GNU parallel normally buffers output
2128           into temporary files in /tmp. By setting --tmpdir you can use a
2129           different dir for the files. Setting --tmpdir is equivalent to
2130           setting $TMPDIR.
2131
2132       --tmux (Long beta testing)
2133           Use tmux for output. Start a tmux session and run each job in a
2134           window in that session. No other output will be produced.
2135
2136       --tmuxpane (Long beta testing)
2137           Use tmux for output but put output into panes in the first window.
2138           Useful if you want to monitor the progress of fewer than 100
2139           concurrent jobs.
2140
2141       --timeout duration
2142           Time out for command. If the command runs for longer than duration
2143           seconds it will get killed as per --termseq.
2144
2145           If duration is followed by a %, then the timeout will dynamically be
2146           computed as a percentage of the median runtime of successful
2147           jobs. Only values > 100% make sense.
2148
2149           duration is normally in seconds, but can be floats postfixed with
2150           s, m, h, or d, which multiply the float by 1, 60, 3600, or
2151           86400. Thus these are equivalent: --timeout 100000 and --timeout
2152           1d3.5h16.6m4s.
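The arithmetic behind that equivalence is 86400 + 3.5*3600 + 16.6*60 + 4 = 86400 + 12600 + 996 + 4 = 100000. A minimal parser for the suffix syntax, as a sketch: the name duration_to_seconds is hypothetical, and the % form is not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a --timeout duration such as 1d3.5h16.6m4s
# to seconds. Each number may be a float; s=1, m=60, h=3600, d=86400.
duration_to_seconds() {
  LC_ALL=C awk -v d="$1" 'BEGIN {
    total = 0
    while (match(d, /^[0-9]*\.?[0-9]+[smhd]?/)) {
      tok = substr(d, 1, RLENGTH)            # e.g. "3.5h"
      d = substr(d, RLENGTH + 1)             # rest of the string
      unit = substr(tok, length(tok), 1)
      mult = 1
      if (unit == "m") mult = 60
      else if (unit == "h") mult = 3600
      else if (unit == "d") mult = 86400
      if (unit ~ /[smhd]/) tok = substr(tok, 1, length(tok) - 1)
      total += tok * mult
    }
    if (d != "") { print "invalid duration" > "/dev/stderr"; exit 1 }
    printf "%.10g\n", total
  }'
}
duration_to_seconds 1d3.5h16.6m4s    # prints 100000
```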
2153
2154       --verbose
2155       -t  Print the job to be run on stderr (standard error).
2156
2157           See also -v, -p.
2158
2159       --transfer
2160           Transfer files to remote computers. Shorthand for: --transferfile
2161           {}.
2162
2163       --transferfile filename
2164       --tf filename
2165           --transferfile is used with --sshlogin to transfer files to the
2166           remote computers. The files will be transferred using rsync and
2167           will be put relative to the default work dir. If the path contains
2168           /./ the remaining path will be relative to the work dir. E.g.
2169
2170             echo foo/bar.txt | parallel --transferfile {} \
2171               --sshlogin server.example.com wc
2172
2173           This will transfer the file foo/bar.txt to the computer
2174           server.example.com to the file $HOME/foo/bar.txt before running wc
2175           foo/bar.txt on server.example.com.
2176
2177             echo /tmp/foo/bar.txt | parallel --transferfile {} \
2178               --sshlogin server.example.com wc
2179
2180           This will transfer the file /tmp/foo/bar.txt to the computer
2181           server.example.com to the file /tmp/foo/bar.txt before running wc
2182           /tmp/foo/bar.txt on server.example.com.
2183
2184             echo /tmp/./foo/bar.txt | parallel --transferfile {} \
2185               --sshlogin server.example.com wc {= s:.*/./:./: =}
2186
2187           This will transfer the file /tmp/foo/bar.txt to the computer
2188           server.example.com to the file foo/bar.txt before running wc
2189           ./foo/bar.txt on server.example.com.
2190
2191           --transferfile is often used with --return and --cleanup. A
2192           shorthand for --transferfile {} is --transfer.
2193
2194           --transferfile is ignored when used with --sshlogin : or when not
2195           used with --sshlogin.
2196
2197       --trc filename
2198           Transfer, Return, Cleanup. Shorthand for:
2199
2200           --transferfile {} --return filename --cleanup
2201
2202       --trim <n|l|r|lr|rl>
2203           Trim white space in input.
2204
2205           n   No trim. Input is not modified. This is the default.
2206
2207           l   Left trim. Remove white space from start of input. E.g. " a bc
2208               " -> "a bc ".
2209
2210           r   Right trim. Remove white space from end of input. E.g. " a bc "
2211               -> " a bc".
2212
2213           lr
2214           rl  Both trim. Remove white space from both start and end of input.
2215               E.g. " a bc " -> "a bc". This is the default if --colsep is
2216               used.
2217
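           The sed helpers below roughly illustrate what each mode does to the
           example line. The names trim_l, trim_r and trim_lr are hypothetical.
           White space is assumed to mean the [:space:] character class. This
           is not how GNU parallel itself implements --trim.

```shell
             # Hypothetical helpers approximating --trim l, r and lr per line
             trim_l()  { sed 's/^[[:space:]]*//'; }   # like --trim l
             trim_r()  { sed 's/[[:space:]]*$//'; }   # like --trim r
             trim_lr() { trim_l | trim_r; }           # like --trim lr or rl

             printf ' a bc \n' | trim_l  | sed 's/.*/"&"/'   # "a bc "
             printf ' a bc \n' | trim_r  | sed 's/.*/"&"/'   # " a bc"
             printf ' a bc \n' | trim_lr | sed 's/.*/"&"/'   # "a bc"
```
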
2218       --ungroup
2219       -u  Ungroup output.  Output is printed as soon as possible and bypasses
2220           GNU parallel internal processing. This may cause output from
2221           different commands to be mixed, so it should only be used if you
2222           do not care about the output. Compare these:
2223
2224             seq 4 | parallel -j0 \
2225               'sleep {};echo -n start{};sleep {};echo {}end'
2226             seq 4 | parallel -u -j0 \
2227               'sleep {};echo -n start{};sleep {};echo {}end'
2228
2229           It also disables --tag. GNU parallel outputs faster with -u.
2230           Compare the speeds of these:
2231
2232             parallel seq ::: 300000000 >/dev/null
2233             parallel -u seq ::: 300000000 >/dev/null
2234             parallel --line-buffer seq ::: 300000000 >/dev/null
2235
2236           Can be reversed with --group.
2237
2238           See also: --line-buffer --group
2239
2240       --extensionreplace replace-str
2241       --er replace-str
2242           Use the replacement string replace-str instead of {.} for input
2243           line without extension.
2244
2245       --use-sockets-instead-of-threads
2246       --use-cores-instead-of-threads
2247       --use-cpus-instead-of-cores (obsolete)
2248           Determine how GNU parallel counts the number of CPUs. GNU parallel
2249           uses this number when the number of jobslots is computed relative
2250           to the number of CPUs (e.g. 100% or +1).
2251
2252           CPUs can be counted in three different ways:
2253
2254           sockets The number of filled CPU sockets (i.e. the number of
2255                   physical chips).
2256
2257           cores   The number of physical cores (i.e. the number of physical
2258                   compute cores).
2259
2260           threads The number of hyperthreaded cores (i.e. the number of
2261                   virtual cores - with some of them possibly being
2262                   hyperthreaded)
2263
2264           Normally the number of CPUs is computed as the number of CPU
2265           threads. With --use-sockets-instead-of-threads or
2266           --use-cores-instead-of-threads you can force it to be computed as
2267           the number of filled sockets or number of cores instead.
2268
2269           Most users will not need these options.
2270
2271           --use-cpus-instead-of-cores is a (misleading) alias for
2272           --use-sockets-instead-of-threads and is kept for backwards
2273           compatibility.
2274
2275       -v  Verbose.  Print the job to be run on stdout (standard output). Can
2276           be reversed with --silent. See also -t.
2277
2278           Use -v -v to print the wrapping ssh command when running remotely.
2279
2280       --version
2281       -V  Print the version of GNU parallel and exit.
2282
2283       --workdir mydir
2284       --wd mydir
2285           Files transferred using --transferfile and --return will be
2286           relative to mydir on remote computers, and the command will be
2287           executed in the dir mydir.
2288
2289           The special mydir value ... will create working dirs under
2290           ~/.parallel/tmp/ on the remote computers. If --cleanup is given
2291           these dirs will be removed.
2292
2293           The special mydir value . uses the current working dir.  If the
2294           current working dir is beneath your home dir, the value . is
2295           treated as the relative path to your home dir. This means that if
2296           your home dir is different on remote computers (e.g. if your login
2297           is different) the relative path will still be relative to your home
2298           dir.
2299
2300           To see the difference try:
2301
2302             parallel -S server pwd ::: ""
2303             parallel --wd . -S server pwd ::: ""
2304             parallel --wd ... -S server pwd ::: ""
2305
2306           mydir can contain GNU parallel's replacement strings.
2307
2308       --wait
2309           Wait for all commands to complete.
2310
2311           Used with --semaphore or --sqlmaster.
2312
2313           See also man sem.
2314
2315       -X  Multiple arguments with context replace. Insert as many arguments
2316           as the command line length permits. If multiple jobs are being run
2317           in parallel: distribute the arguments evenly among the jobs. Use
2318           -j1 to avoid this.
2319
2320           If {} is not used the arguments will be appended to the line.  If
2321           {} is used as part of a word (like pic{}.jpg) then the whole word
2322           will be repeated for each argument. If {} is used multiple times,
2323           each {} will be replaced with the arguments.
2324
2325           Normally -X will do the right thing, whereas -m can give unexpected
2326           results if {} is used as part of a word.
2327
2328           Support for -X with --sshlogin is limited and may fail.
2329
2330           See also -m.
2331
2332       --exit
2333       -x  Exit if the size (see the -s option) is exceeded.
2334

EXAMPLE: Working as xargs -n1. Argument appending

2336       GNU parallel can work similarly to xargs -n1.
2337
2338       To compress all html files using gzip run:
2339
2340         find . -name '*.html' | parallel gzip --best
2341
2342       If the file names may contain a newline use -0. Substitute FOO BAR with
2343       FUBAR in all files in this dir and subdirs:
2344
2345         find . -type f -print0 | \
2346           parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
2347
2348       Note -q is needed because of the space in 'FOO BAR'.
2349

EXAMPLE: Simple network scanner

2351       prips can generate IP-addresses from CIDR notation. With GNU parallel
2352       you can build a simple network scanner to see which addresses respond
2353       to ping:
2354
2355         prips 130.229.16.0/20 | \
2356           parallel --timeout 2 -j0 \
2357             'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
2358

EXAMPLE: Reading arguments from command line

2360       GNU parallel can take the arguments from command line instead of stdin
2361       (standard input). To compress all html files in the current dir using
2362       gzip run:
2363
2364         parallel gzip --best ::: *.html
2365
2366       To convert *.wav to *.mp3 using LAME running one process per CPU run:
2367
2368         parallel lame {} -o {.}.mp3 ::: *.wav
2369

EXAMPLE: Inserting multiple arguments

2371       When moving a lot of files like this: mv *.log destdir you will
2372       sometimes get the error:
2373
2374         bash: /bin/mv: Argument list too long
2375
2376       because there are too many files. You can instead do:
2377
2378         ls | grep -E '\.log$' | parallel mv {} destdir
2379
2380       This will run mv for each file. It can be done faster if mv gets as
2381       many arguments as will fit on the line:
2382
2383         ls | grep -E '\.log$' | parallel -m mv {} destdir
2384
2385       In many shells you can also use printf:
2386
2387         printf '%s\0' *.log | parallel -0 -m mv {} destdir
2388

EXAMPLE: Context replace

2390       To remove the files pict0000.jpg .. pict9999.jpg you could do:
2391
2392         seq -w 0 9999 | parallel rm pict{}.jpg
2393
2394       You could also do:
2395
2396         seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
2397
2398       The first will run rm 10000 times, while the last will only run rm as
2399       many times as needed to keep the command line length short enough to avoid
2400       Argument list too long (it typically runs 1-2 times).
2401
2402       You could also run:
2403
2404         seq -w 0 9999 | parallel -X rm pict{}.jpg
2405
2406       This will also only run rm as many times as needed to keep the command
2407       line length short enough.
2408

EXAMPLE: Compute intensive jobs and substitution

2410       If ImageMagick is installed this will generate a thumbnail of a jpg
2411       file:
2412
2413         convert -geometry 120 foo.jpg thumb_foo.jpg
2414
2415       This will run with number-of-cpus jobs in parallel for all jpg files in
2416       a directory:
2417
2418         ls *.jpg | parallel convert -geometry 120 {} thumb_{}
2419
2420       To do it recursively use find:
2421
2422         find . -name '*.jpg' | \
2423           parallel convert -geometry 120 {} {}_thumb.jpg
2424
2425       Notice how the argument has to start with {}, as {} will include the path
2426       (e.g. running convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg
2427       would clearly be wrong). The command will generate files like
2428       ./foo/bar.jpg_thumb.jpg.
2429
2430       Use {.} to avoid the extra .jpg in the file name. This command will
2431       make files like ./foo/bar_thumb.jpg:
2432
2433         find . -name '*.jpg' | \
2434           parallel convert -geometry 120 {} {.}_thumb.jpg
2435

EXAMPLE: Substitution and redirection

2437       This will generate an uncompressed version of .gz-files next to the
2438       .gz-file:
2439
2440         parallel zcat {} ">"{.} ::: *.gz
2441
2442       Quoting of > is necessary to postpone the redirection. Another solution
2443       is to quote the whole command:
2444
2445         parallel "zcat {} >{.}" ::: *.gz
2446
2447       Other special shell characters (such as * ; $ > < | >> <<) also need to
2448       be put in quotes, as they may otherwise be interpreted by the shell and
2449       not given to GNU parallel.
2450

EXAMPLE: Composed commands

2452       A job can consist of several commands. This will print the number of
2453       files in each directory:
2454
2455         ls | parallel 'echo -n {}" "; ls {}|wc -l'
2456
2457       To put the output in a file called <name>.dir:
2458
2459         ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
2460
2461       Even small shell scripts can be run by GNU parallel:
2462
2463         find . | parallel 'a={}; name=${a##*/};' \
2464           'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
2465           'echo "$name - $upper"'
2466
2467         ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
2468
2469       Given a list of URLs, list all URLs that fail to download. Print the
2470       line number and the URL.
2471
2472         cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
2473
2474       Create a mirror directory with the same filenames, except that all files and
2475       symlinks are empty files.
2476
2477         cp -rs /the/source/dir mirror_dir
2478         find mirror_dir -type l | parallel -m rm {} '&&' touch {}
2479
2480       Find the files in a list that do not exist:
2481
2482         cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
2483

EXAMPLE: Composed command with perl replacement string

2485       You have a bunch of files. You want them sorted into dirs. The dir of
2486       each file should be named the first letter of the file name.
2487
2488         parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
2489

EXAMPLE: Composed command with multiple input sources

2491       You have a dir with files named as 24 hours in 5 minute intervals:
2492       00:00, 00:05, 00:10 .. 23:55. You want to find the missing files:
2493
2494         parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
2495           ::: {00..23} ::: {00..55..5}
2496

EXAMPLE: Calling Bash functions

2498       If the composed command is longer than a line, it becomes hard to read.
2499       In Bash you can use functions. Just remember to export -f the function.
2500
2501         doit() {
2502           echo Doing it for $1
2503           sleep 2
2504           echo Done with $1
2505         }
2506         export -f doit
2507         parallel doit ::: 1 2 3
2508
2509         doubleit() {
2510           echo Doing it for $1 $2
2511           sleep 2
2512           echo Done with $1 $2
2513         }
2514         export -f doubleit
2515         parallel doubleit ::: 1 2 3 ::: a b
2516
2517       To do this on remote servers you need to transfer the function using
2518       --env:
2519
2520         parallel --env doit -S server doit ::: 1 2 3
2521         parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
2522
2523       If your environment (aliases, variables, and functions) is small you
2524       can copy the full environment without having to export -f anything. See
2525       env_parallel.
2526

EXAMPLE: Function tester

2528       To test a program with different parameters:
2529
2530         tester() {
2531           if (eval "$@") >&/dev/null; then
2532             perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
2533           else
2534             perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
2535           fi
2536         }
2537         export -f tester
2538         parallel tester my_program ::: arg1 arg2
2539         parallel tester exit ::: 1 0 2 0
2540
2541       If my_program fails, a red FAIL will be printed, followed by the failing
2542       command; otherwise a green OK will be printed followed by the command.
2543

EXAMPLE: Log rotate

2545       Log rotation renames a logfile to an extension with a higher number:
2546       log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
2547       removed. To avoid overwriting files the process starts backwards from
2548       the high number to the low number.  This will keep 10 old versions of
2549       the log:
2550
2551         seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
2552         mv log log.1
2553

EXAMPLE: Removing file extension when processing files

2555       When processing files, removing the file extension using {.} is often
2556       useful.
2557
2558       Create a directory for each zip-file and unzip it in that dir:
2559
2560         parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
2561
2562       Recompress all .gz files in current directory using bzip2 running 1 job
2563       per CPU in parallel:
2564
2565         parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
2566
2567       Convert all WAV files to MP3 using LAME:
2568
2569         find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
2570
2571       Put all converted files in the same directory:
2572
2573         find sounddir -type f -name '*.wav' | \
2574           parallel lame {} -o mydir/{/.}.mp3
2575

EXAMPLE: Removing strings from the argument

2577       If you have a directory with tar.gz files and want these extracted in
2578       the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
2579       foo), you can do:
2580
2581         parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
2582
2583       If you want to remove a different ending, you can use {%string}:
2584
2585         parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
2586
2587       You can also remove a starting string with {#string}:
2588
2589         parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
2590
2591       To remove a string anywhere you can use regular expressions with
2592       {/regexp/replacement} and leave the replacement empty:
2593
2594         parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
2595

EXAMPLE: Download 24 images for each of the past 30 days

2597       Let us assume a website stores images like:
2598
2599         http://www.example.com/path/to/YYYYMMDD_##.jpg
2600
2601       where YYYYMMDD is the date and ## is the number 01-24. This will print
2602       the wget commands for the past 30 days (remove echo to download):
2603
2604         getit() {
2605           date=$(date -d "today -$1 days" +%Y%m%d)
2606           num=$2
2607           echo wget http://www.example.com/path/to/${date}_${num}.jpg
2608         }
2609         export -f getit
2610
2611         parallel getit ::: $(seq 30) ::: $(seq -w 24)
2612
2613       $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
2614       with $1 days subtracted.
2615

EXAMPLE: Download world map from NASA

2617       NASA provides tiles to download on earthdata.nasa.gov. Download tiles
2618       for Blue Marble world map and create a 10240x20480 map.
2619
2620         base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
2621         service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
2622         layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
2623         set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
2624         tile="TILEROW={1}&TILECOL={2}"
2625         format="FORMAT=image%2Fjpeg"
2626         url="$base?$service&$layer&$set&$tile&$format"
2627
2628         parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
2629         parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
2630         convert -append line{0..19}.jpg world.jpg
2631

EXAMPLE: Download Apollo-11 images from NASA using jq

2633       Search NASA using their API to get JSON for images related to 'apollo
2634       11' and that have 'moon landing' in the description.
2635
2636       The search query returns JSON containing URLs to JSON containing
2637       collections of pictures. One of the pictures in each of these
2638       collections is large.
2639
2640       wget is used to get the JSON for the search query. jq is then used to
2641       extract the URLs of the collections. parallel then calls wget to get
2642       each collection, which is passed to jq to extract the URLs of all
2643       images. grep filters out the large images, and parallel finally uses
2644       wget to fetch the images.
2645
2646         base="https://images-api.nasa.gov/search"
2647         q="q=apollo 11"
2648         description="description=moon landing"
2649         media_type="media_type=image"
2650         wget -O - "$base?$q&$description&$media_type" |
2651           jq -r '.collection.items[].href' |
2652           parallel wget -O - |
2653           jq -r '.[]' |
2654           grep large |
2655           parallel wget
2656

EXAMPLE: Download video playlist in parallel

2658       youtube-dl is an excellent tool to download videos. It cannot,
2659       however, download videos in parallel. This takes a playlist and downloads 10
2660       videos in parallel.
2661
2662         url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
2663         export url
2664         youtube-dl --flat-playlist "https://$url" |
2665           parallel --tagstring {#} --lb -j10 \
2666             youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'
2667

EXAMPLE: Prepend last modified date (ISO8601) to file name

2669         parallel mv {} '{= $a=pQ($_); $b=$_;' \
2670           '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *
2671
2672       {= and =} mark a perl expression. pQ perl-quotes the string. date
2673       +%FT%T is the date in ISO8601 with time.
2674

EXAMPLE: Save output in ISO8601 dirs

2676       Save output from ps aux every second into dirs named
2677       yyyy-mm-ddThh:mm:ss+zz:zz.
2678
2679         seq 1000 | parallel -N0 -j1 --delay 1 \
2680           --results '{= $_=`date -Isec`; chomp=}/' ps aux
2681

EXAMPLE: Digital clock with "blinking" :

2683       The : in a digital clock blinks. To make every other line have a ':'
2684       and the rest a ' ', a perl expression is used to look at the 3rd input
2685       source. If the value modulo 2 is 1, use ":"; otherwise use " ":
2686
2687         parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
2688           ::: {0..12} ::: {0..5} ::: {0..9}
2689

EXAMPLE: Aggregating content of files

2691       This:
2692
2693         parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
2694         ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
2695
2696       will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
2697       the output grouping on x and z you can do this:
2698
2699         parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
2700
2701       For all values of x and z it runs commands like:
2702
2703         cat x1y*z1 > x1z1
2704
2705       So you end up with x1z1 .. x5z5 each containing the content of all
2706       values of y.
2707

EXAMPLE: Breadth first parallel web crawler/mirrorer

2709       The script below will crawl and mirror a URL in parallel. It
2710       first downloads pages that are 1 click down, then 2 clicks down, then
2711       3, instead of the normal depth first, where the first link on each
2712       page is fetched first.
2713
2714       Run like this:
2715
2716         PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/
2717
2718       Remove the wget part if you only want a web crawler.
2719
2720       It works by fetching a page from a list of URLs and looking for links
2721       in that page that are within the same starting URL and that have not
2722       already been seen. These links are added to a new queue. When all the
2723       pages from the list are done, the new queue is moved to the list of URLs
2724       and the process is started over until no unseen links are found.
2725
2726         #!/bin/bash
2727
2728         # E.g. http://gatt.org.yeslab.org/
2729         URL=$1
2730         # Stay inside the start dir
2731         BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
2732         URLLIST=$(mktemp urllist.XXXX)
2733         URLLIST2=$(mktemp urllist.XXXX)
2734         SEEN=$(mktemp seen.XXXX)
2735
2736         # Spider to get the URLs
2737         echo $URL >$URLLIST
2738         cp $URLLIST $SEEN
2739
2740         while [ -s $URLLIST ] ; do
2741           cat $URLLIST |
2742             parallel lynx -listonly -image_links -dump {} \; \
2743               wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
2744               perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
2745                 do { $seen{$1}++ or print }' |
2746             grep -F $BASEURL |
2747             grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
2748           mv $URLLIST2 $URLLIST
2749         done
2750
2751         rm -f $URLLIST $URLLIST2 $SEEN
2752

EXAMPLE: Process files from a tar file while unpacking

2754       If the files to be processed are in a tar file then unpacking one file
2755       and processing it immediately may be faster than first unpacking all
2756       files.
2757
2758         tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
2759           parallel echo
2760
2761       The Perl one-liner is needed to make sure the file is complete before
2762       handing it to GNU parallel.
2763

EXAMPLE: Rewriting a for-loop and a while-read-loop

2765       for-loops like this:
2766
2767         (for x in `cat list` ; do
2768           do_something $x
2769         done) | process_output
2770
2771       and while-read-loops like this:
2772
2773         cat list | (while read x ; do
2774           do_something $x
2775         done) | process_output
2776
2777       can be written like this:
2778
2779         cat list | parallel do_something | process_output
2780
2781       For example: Find which host name in a list has IP address 1.2.3.4:
2782
2783         cat hosts.txt | parallel -P 100 host | grep 1.2.3.4
2784
2785       If the processing requires more steps, for-loops like this:
2786
2787         (for x in `cat list` ; do
2788           no_extension=${x%.*};
2789           do_step1 $x scale $no_extension.jpg
2790           do_step2 <$x $no_extension
2791         done) | process_output
2792
2793       and while-loops like this:
2794
2795         cat list | (while read x ; do
2796           no_extension=${x%.*};
2797           do_step1 $x scale $no_extension.jpg
2798           do_step2 <$x $no_extension
2799         done) | process_output
2800
2801       can be written like this:
2802
2803         cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
2804           process_output
2805
2806       If the body of the loop is bigger, it improves readability to use a
2807       function:
2808
2809         (for x in `cat list` ; do
2810           do_something $x
2811           [... 100 lines that do something with $x ...]
2812         done) | process_output
2813
2814         cat list | (while read x ; do
2815           do_something $x
2816           [... 100 lines that do something with $x ...]
2817         done) | process_output
2818
2819       can both be rewritten as:
2820
2821         doit() {
2822           x=$1
2823           do_something $x
2824           [... 100 lines that do something with $x ...]
2825         }
2826         export -f doit
2827         cat list | parallel doit
2828

EXAMPLE: Rewriting nested for-loops

2830       Nested for-loops like this:
2831
2832         (for x in `cat xlist` ; do
2833           for y in `cat ylist` ; do
2834             do_something $x $y
2835           done
2836         done) | process_output
2837
2838       can be written like this:
2839
2840         parallel do_something {1} {2} :::: xlist ylist | process_output
2841
2842       Nested for-loops like this:
2843
2844         (for colour in red green blue ; do
2845           for size in S M L XL XXL ; do
2846             echo $colour $size
2847           done
2848         done) | sort
2849
2850       can be written like this:
2851
2852         parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
2853

EXAMPLE: Finding the lowest difference between files

2855       diff is good for finding differences in text files. diff | wc -l gives
2856       an indication of the size of the difference. To find the differences
2857       between all files in the current dir do:
2858
2859         parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3
2860
2861       This way it is possible to see if some files are closer to other files.
2862

EXAMPLE: for-loops with column names

2864       When doing multiple nested for-loops, it can be easier to keep track of
2865       the loop variable if it is named instead of just having a number. Use
2866       --header : to let the first argument be a named alias for the
2867       positional replacement string:
2868
2869         parallel --header : echo {colour} {size} \
2870           ::: colour red green blue ::: size S M L XL XXL
2871
2872       This also works if the input is a file with columns:
2873
2874         cat addressbook.tsv | \
2875           parallel --colsep '\t' --header : echo {Name} {E-mail address}
2876

EXAMPLE: All combinations in a list

2878       GNU parallel makes all combinations when given two lists.
2879
2880       To make all combinations in a single list with unique values, you
2881       repeat the list and use the replacement string {choose_k}:
2882
2883         parallel --plus echo {choose_k} ::: A B C D ::: A B C D
2884
2885         parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D
2886
2887       {choose_k} works for any number of input sources:
2888
2889         parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D
2890
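       The plain shell sketch below lists the pairs {choose_k} is expected
       to produce from A B C D. It assumes the list is sorted and has no
       duplicates, and keeps a pair only when the second position comes
       after the first. The helper name pairs is hypothetical, and this is an
       illustration, not how GNU parallel implements {choose_k}:

```shell
         # pairs: hypothetical helper printing each unordered pair once
         pairs() {
           i=0
           for a in "$@"; do
             i=$((i + 1)); j=0
             for b in "$@"; do
               j=$((j + 1))
               # keep the pair only if b comes later in the list than a
               if [ "$j" -gt "$i" ]; then echo "$a $b"; fi
             done
           done
         }
         pairs A B C D
         # A B
         # A C
         # A D
         # B C
         # B D
         # C D
```
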

EXAMPLE: From a to b and b to c

2892       Assume you have input like:
2893
2894         aardvark
2895         babble
2896         cab
2897         dab
2898         each
2899
2900       and want to run combinations like:
2901
2902         aardvark babble
2903         babble cab
2904         cab dab
2905         dab each
2906
2907       If the input is in the file in.txt:
2908
2909         parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)
2910
2911       If the input is in the array $a, here are two solutions:
2912
2913         seq $((${#a[@]}-1)) | \
2914           env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
2915         parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"
2916

EXAMPLE: Count the differences between all files in a dir

2918       Using --results the results are saved in /tmp/diffcount*.
2919
2920         parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
2921           tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
2922
2923       To see the difference between file A and file B look at the file
2924       '/tmp/diffcount/1/A/2/B'.
2925

EXAMPLE: Speeding up fast jobs

2927       Starting a job on the local machine takes around 10 ms. This can be a
2928       big overhead if the job takes very few ms to run. Often you can group
2929       small jobs together using -X which will make the overhead less
2930       significant. Compare the speed of these:
2931
2932         seq -w 0 9999 | parallel touch pict{}.jpg
2933         seq -w 0 9999 | parallel -X touch pict{}.jpg
2934
2935       If your program cannot take multiple arguments, then you can use GNU
2936       parallel to spawn multiple GNU parallels:
2937
2938         seq -w 0 9999999 | \
2939           parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg
2940
2941       If -j0 normally spawns 252 jobs, then the above will try to spawn 2520
2942       jobs. On a normal GNU/Linux system you can spawn 32000 jobs using this
2943       technique with no problems. To raise the 32000 jobs limit raise
2944       /proc/sys/kernel/pid_max to 4194303.
2945
2946       If you do not need GNU parallel to have control over each job (so no
2947       need for --retries or --joblog or similar), then it can be even faster
2948       if you can generate the command lines and pipe those to a shell. So if
2949       you can do this:
2950
2951         mygenerator | sh
2952
2953       Then that can be parallelized like this:
2954
2955         mygenerator | parallel --pipe --block 10M sh
2956
2957       E.g.
2958
2959         mygenerator() {
2960           seq 10000000 | perl -pe 'print "echo This is fast job number "';
2961         }
2962         mygenerator | parallel --pipe --block 10M sh
2963
2964       The overhead is 100000 times smaller, namely around 100 nanoseconds per
2965       job.
2966

EXAMPLE: Using shell variables

2968       When using shell variables you need to quote them correctly as they may
2969       otherwise be interpreted by the shell.
2970
2971       Notice the difference between:
2972
2973         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
2974         parallel echo ::: ${ARR[@]} # This is probably not what you want
2975
2976       and:
2977
2978         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
2979         parallel echo ::: "${ARR[@]}"
2980
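       The difference comes from the calling shell, which word-splits the
       unquoted expansion before GNU parallel sees any arguments. A bash
       sketch (count_args is a hypothetical helper, and the array is a
       shortened version of ARR above) that counts the arguments each form
       produces:

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

ARR=("My brother's 12\" records" Foo Bar)

count_args ${ARR[@]}     # unquoted: word splitting breaks up the first element
count_args "${ARR[@]}"   # quoted: one argument per array element
```

       With the default IFS the unquoted form yields 6 arguments, so
       parallel echo ::: ${ARR[@]} would run 6 jobs instead of 3.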
2981       When the command itself uses a variable that contains special
2982       characters (e.g. space), you can either use double quotes and -q, or
2983       export the variable and quote it as '"$VAR"':
2984
2985         VAR="My brother's 12\" records are worth <\$\$\$>"
2986         parallel -q echo "$VAR" ::: '!'
2987         export VAR
2988         parallel echo '"$VAR"' ::: '!'
2989
2990       If $VAR does not contain ', then "'$VAR'" will also work (and does not
2991       need export):
2992
2993         VAR="My 12\" records are worth <\$\$\$>"
2994         parallel echo "'$VAR'" ::: '!'
2995
2996       If you use them in a function, quote them as you normally would:
2997
2998         VAR="My brother's 12\" records are worth <\$\$\$>"
2999         export VAR
3000         myfunc() { echo "$VAR" "$1"; }
3001         export -f myfunc
3002         parallel myfunc ::: '!'
3003

EXAMPLE: Group output lines

3005       When running jobs that output data, you often do not want the output of
3006       multiple jobs to run together. GNU parallel defaults to grouping the
3007       output of each job, so the output is printed when the job finishes. If
3008       you want full lines to be printed while the job is running you can use
3009       --line-buffer. If you want output to be printed as soon as possible you
3010       can use -u.
3011
3012       Compare the output of:
3013
3014         parallel wget --limit-rate=100k \
3015           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3016           ::: {12..16}
3017         parallel --line-buffer wget --limit-rate=100k \
3018           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3019           ::: {12..16}
3020         parallel -u wget --limit-rate=100k \
3021           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3022           ::: {12..16}
3023

EXAMPLE: Tag output lines

3025       GNU parallel groups the output lines, but it can be hard to see where
3026       the different jobs begin. --tag prepends the argument to make that more
3027       visible:
3028
3029         parallel --tag wget --limit-rate=100k \
3030           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3031           ::: {12..16}
3032
3033       --tag works with --line-buffer but not with -u:
3034
3035         parallel --tag --line-buffer wget --limit-rate=100k \
3036           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3037           ::: {12..16}
3038
3039       Check the uptime of the servers in ~/.parallel/sshloginfile:
3040
3041         parallel --tag -S .. --nonall uptime
3042

EXAMPLE: Colorize output

3044       Give each job a new color. Most terminals support ANSI colors with the
3045       escape code "\033[30;3Xm" where 0 <= X <= 7:
3046
3047           seq 10 | \
3048             parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
3049           parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
3050             --tagstring {color} seq {} ::: {1..10}
3051
3052       To get rid of the initial \t (which comes from --tagstring):
3053
3054           ... | perl -pe 's/\t//'
3055
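       The escape code can be tried without GNU parallel. This bash sketch
       (colorize is a hypothetical helper) cycles X through 0..7 using %8
       like the examples above; unlike them, it also resets the color with
       \033[0m after each line:

```shell
# colorize is a hypothetical helper: it wraps "job N" in the escape code
# \033[30;3Xm with X = N % 8, then resets the color with \033[0m.
colorize() {
  local i
  for i in "$@"; do
    printf '\033[30;3%dm%s\033[0m\n' "$(( i % 8 ))" "job $i"
  done
}

colorize $(seq 10)
```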

EXAMPLE: Keep order of output same as order of input

3057       Normally the output of a job will be printed as soon as it completes.
3058       Sometimes you want the order of the output to remain the same as the
3059       order of the input. This is often important if the output is used as
3060       input for another system. -k will make sure the order of output will be
3061       in the same order as input even if later jobs end before earlier jobs.
3062
3063       Append a string to every line in a text file:
3064
3065         cat textfile | parallel -k echo {} append_string
3066
3067       If you remove -k some of the lines may come out in the wrong order.
3068
3069       Another example is traceroute:
3070
3071         parallel traceroute ::: qubes-os.org debian.org freenetproject.org
3072
3073       will give traceroute of qubes-os.org, debian.org and
3074       freenetproject.org, but it will be sorted according to which job
3075       completed first.
3076
3077       To keep the order the same as input run:
3078
3079         parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org
3080
3081       This will make sure the traceroute to qubes-os.org will be printed
3082       first.
3083
3084       A slightly more complex example is downloading a huge file in chunks in
3085       parallel. Some internet connections will deliver more data if you
3086       download files in parallel. For downloading files in parallel see:
3087       "EXAMPLE: Download 10 images for each of the past 30 days". But if you
3088       are downloading a big file you can download the file in chunks in
3089       parallel.
3090
3091       To download bytes 10000000-19999999 you can use curl:
3092
3093         curl -r 10000000-19999999 http://example.com/the/big/file >file.part
3094
3095       To download a 1 GB file we need 100 10MB chunks downloaded and combined
3096       in the correct order.
3097
3098         seq 0 99 | parallel -k curl -r \
3099           {}0000000-{}9999999 http://example.com/the/big/file > file
3100
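       The {}0000000-{}9999999 trick assumes the file is exactly 100 chunks
       of 10 MB. For other sizes the ranges can be computed instead. A bash
       sketch, assuming the size in bytes is already known (byte_ranges is
       a hypothetical helper; curl -r ranges are inclusive):

```shell
# byte_ranges SIZE CHUNK (hypothetical helper): print inclusive
# "start-end" byte ranges covering SIZE bytes in chunks of CHUNK bytes.
byte_ranges() {
  local size=$1 chunk=$2 start end
  for (( start = 0; start < size; start += chunk )); do
    end=$(( start + chunk - 1 ))
    # The last chunk may be shorter than CHUNK.
    if (( end >= size )); then
      end=$(( size - 1 ))
    fi
    echo "$start-$end"
  done
}

byte_ranges 25 10
```

       Its output could then replace seq 0 99, e.g. byte_ranges "$size"
       10000000 | parallel -k curl -r {} http://example.com/the/big/file >
       file, where $size is assumed to hold the file size.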

EXAMPLE: Parallel grep

3102       grep -r greps recursively through directories. On multicore CPUs GNU
3103       parallel can often speed this up.
3104
3105         find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
3106
3107       This will run 1.5 jobs per CPU and give 1000 arguments to each grep.
3108

EXAMPLE: Grepping n lines for m regular expressions.

3110       The simplest solution to grep a big file for a lot of regexps is:
3111
3112         grep -f regexps.txt bigfile
3113
3114       Or if the regexps are fixed strings:
3115
3116         grep -F -f regexps.txt bigfile
3117
3118       There are 3 limiting factors: CPU, RAM, and disk I/O.
3119
3120       RAM is easy to measure: If the grep process takes up most of your free
3121       memory (e.g. when running top), then RAM is a limiting factor.
3122
3123       CPU is also easy to measure: If the grep takes >90% CPU in top, then
3124       the CPU is a limiting factor, and parallelization will speed this up.
3125
3126       It is harder to see if disk I/O is the limiting factor, and depending
3127       on the disk system it may be faster or slower to parallelize. The only
3128       way to know for certain is to test and measure.
3129
3130   Limiting factor: RAM
3131       The normal grep -f regexps.txt bigfile works no matter the size of
3132       bigfile, but if regexps.txt is so big it cannot fit into memory, then
3133       you need to split this.
3134
3135       grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of
3136       RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it
3137       may be too big.
3138
3139       If you can convert your regexps into fixed strings do that. E.g. if the
3140       lines you are looking for in bigfile all looks like:
3141
3142         ID1 foo bar baz Identifier1 quux
3143         fubar ID2 foo bar baz Identifier2
3144
3145       then your regexps.txt can be converted from:
3146
3147         ID1.*Identifier1
3148         ID2.*Identifier2
3149
3150       into:
3151
3152         ID1 foo bar baz Identifier1
3153         ID2 foo bar baz Identifier2
3154
3155       This way you can use grep -F which takes around 80% less memory and is
3156       much faster.
3157
3158       If it still does not fit in memory you can do this:
3159
3160         parallel --pipepart -a regexps.txt --block 1M grep -Ff - -n bigfile | \
3161           sort -un | perl -pe 's/^\d+://'
3162
3163       The 1M should be your free memory divided by the number of CPU threads
3164       and divided by 200 for grep -F and by 1000 for normal grep. On
3165       GNU/Linux you can do:
3166
3167         free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
3168                     END { print sum }' /proc/meminfo)
3169         percpu=$((free / 200 / $(parallel --number-of-threads)))k
3170
3171         parallel --pipepart -a regexps.txt --block $percpu --compress \
3172           grep -F -f - -n bigfile | \
3173           sort -un | perl -pe 's/^\d+://'
3174
3175       If you can live with duplicated lines and wrong order, it is faster to
3176       do:
3177
3178         parallel --pipepart -a regexps.txt --block $percpu --compress \
3179           grep -F -f - bigfile
3180
3181   Limiting factor: CPU
3182       If the CPU is the limiting factor, parallelization should be done on
3183       the regexps:
3184
3185         cat regexps.txt | parallel --pipe -L1000 --roundrobin --compress \
3186           grep -f - -n bigfile | \
3187           sort -un | perl -pe 's/^\d+://'
3188
3189       The command will start one grep per CPU and read bigfile once per
3190       CPU, but as that is done in parallel, all reads except the first will
3191       be cached in RAM. Depending on the size of regexps.txt it may be faster
3192       to use --block 10m instead of -L1000.
3193
3194       Some storage systems perform better when reading multiple chunks in
3195       parallel. This is true for some RAID systems and for some network file
3196       systems. To parallelize the reading of bigfile:
3197
3198         parallel --pipepart --block 100M -a bigfile -k --compress \
3199           grep -f regexps.txt
3200
3201       This will split bigfile into 100MB chunks and run grep on each of these
3202       chunks. To parallelize the reading of both bigfile and regexps.txt,
3203       combine the two using --fifo:
3204
3205         parallel --pipepart --block 100M -a bigfile --fifo cat regexps.txt \
3206           \| parallel --pipe -L1000 --roundrobin grep -f - {}
3207
3208       If a line matches multiple regexps, the line may be duplicated.
3209
3210   Bigger problem
3211       If the problem is too big to be solved by this, you are probably ready
3212       for Lucene.
3213

EXAMPLE: Using remote computers

3215       To run commands on a remote computer, SSH needs to be set up and you
3216       must be able to log in without entering a password (the commands
3217       ssh-copy-id, ssh-agent, and sshpass may help you do that).
3218
3219       If you need to log in to a whole cluster, you typically do not want to
3220       accept the host key for every host. You want to accept them the first
3221       time and be warned if they are ever changed. To do that:
3222
3223         # Add the servers to the sshloginfile
3224         (echo servera; echo serverb) > ~/.parallel/my_cluster
3225         # Make sure ~/.ssh/config exists
3226         touch ~/.ssh/config
3227         cp ~/.ssh/config ~/.ssh/config.backup
3228         # Disable StrictHostKeyChecking temporarily
3229         (echo 'Host *'; echo StrictHostKeyChecking no) >> ~/.ssh/config
3230         parallel --slf my_cluster --nonall true
3231         # Remove the disabling of StrictHostKeyChecking
3232         mv ~/.ssh/config.backup ~/.ssh/config
3233
3234       The servers in ~/.parallel/my_cluster are now added to ~/.ssh/known_hosts.
3235
3236       To run echo on server.example.com:
3237
3238         seq 10 | parallel --sshlogin server.example.com echo
3239
3240       To run commands on more than one remote computer run:
3241
3242         seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo
3243
3244       Or:
3245
3246         seq 10 | parallel --sshlogin server.example.com \
3247           --sshlogin server2.example.net echo
3248
3249       If the login username is foo on server2.example.net use:
3250
3251         seq 10 | parallel --sshlogin server.example.com \
3252           --sshlogin foo@server2.example.net echo
3253
3254       If your list of hosts is server1-88.example.net with login foo:
3255
3256         seq 10 | parallel -Sfoo@server{1..88}.example.net echo
3257
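       The {1..88} is expanded by bash, not by GNU parallel, into 88
       separate -S options. The expansion can be checked without contacting
       any host (3 hosts here to keep the output short):

```shell
# printf prints each argument bash passes on a separate line.
printf '%s\n' -Sfoo@server{1..3}.example.net
```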
3258       To distribute the commands to a list of computers, make a file
3259       mycomputers with all the computers:
3260
3261         server.example.com
3262         foo@server2.example.com
3263         server3.example.com
3264
3265       Then run:
3266
3267         seq 10 | parallel --sshloginfile mycomputers echo
3268
3269       To include the local computer add the special sshlogin ':' to the list:
3270
3271         server.example.com
3272         foo@server2.example.com
3273         server3.example.com
3274         :
3275
3276       GNU parallel will try to determine the number of CPUs on each of the
3277       remote computers, and run one job per CPU - even if the remote
3278       computers do not have the same number of CPUs.
3279
3280       If the number of CPUs on the remote computers is not identified
3281       correctly, the number of CPUs can be added in front. Here the computer
3282       has 8 CPUs.
3283
3284         seq 10 | parallel --sshlogin 8/server.example.com echo
3285

EXAMPLE: Transferring of files

3287       To recompress gzipped files with bzip2 using a remote computer run:
3288
3289         find logs/ -name '*.gz' | \
3290           parallel --sshlogin server.example.com \
3291           --transfer "zcat {} | bzip2 -9 >{.}.bz2"
3292
3293       This will list the .gz-files in the logs directory and all directories
3294       below. Then it will transfer the files to server.example.com to the
3295       corresponding directory in $HOME/logs. On server.example.com the file
3296       will be recompressed using zcat and bzip2 resulting in the
3297       corresponding file with .gz replaced with .bz2.
3298
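       {.} is the input with its extension removed, so logs/a/b.gz becomes
       logs/a/b and the job writes logs/a/b.bz2. The rule (remove the last
       . and what follows, but only if that . comes after the last /) can
       be approximated with sed; no_ext is a hypothetical helper, not part
       of GNU parallel:

```shell
# no_ext (hypothetical): drop a final ".something" that contains no "/",
# approximating what {.} does to an input path.
no_ext() { printf '%s\n' "$1" | sed 's:\.[^./]*$::'; }

no_ext logs/2019/app.log.gz   # prints: logs/2019/app.log
no_ext logs/dir.v2/file       # prints: logs/dir.v2/file (no extension)
```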
3299       If you want the resulting bz2-file to be transferred back to the local
3300       computer add --return {.}.bz2:
3301
3302         find logs/ -name '*.gz' | \
3303           parallel --sshlogin server.example.com \
3304           --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3305
3306       After the recompressing is done the .bz2-file is transferred back to
3307       the local computer and put next to the original .gz-file.
3308
3309       If you want to delete the transferred files on the remote computer add
3310       --cleanup. This will remove both the file transferred to the remote
3311       computer and the files transferred from the remote computer:
3312
3313         find logs/ -name '*.gz' | \
3314           parallel --sshlogin server.example.com \
3315           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3316
3317       If you want to run on several computers, add the computers to --sshlogin
3318       either using ',' or multiple --sshlogin:
3319
3320         find logs/ -name '*.gz' | \
3321           parallel --sshlogin server.example.com,server2.example.com \
3322           --sshlogin server3.example.com \
3323           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3324
3325       You can add the local computer using --sshlogin :. This will disable
3326       the removing and transferring for the local computer only:
3327
3328         find logs/ -name '*.gz' | \
3329           parallel --sshlogin server.example.com,server2.example.com \
3330           --sshlogin server3.example.com \
3331           --sshlogin : \
3332           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3333
3334       Often --transfer, --return and --cleanup are used together. They can be
3335       shortened to --trc:
3336
3337         find logs/ -name '*.gz' | \
3338           parallel --sshlogin server.example.com,server2.example.com \
3339           --sshlogin server3.example.com \
3340           --sshlogin : \
3341           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3342
3343       With the file mycomputers containing the list of computers it becomes:
3344
3345         find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
3346           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3347
3348       If the file ~/.parallel/sshloginfile contains the list of computers,
3349       the special shorthand -S .. can be used:
3350
3351         find logs/ -name '*.gz' | parallel -S .. \
3352           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3353

EXAMPLE: Distributing work to local and remote computers

3355       Convert *.mp3 to *.ogg running one process per CPU on local computer
3356       and server2:
3357
3358         parallel --trc {.}.ogg -S server2,: \
3359           'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
3360

EXAMPLE: Running the same command on remote computers

3362       To run the command uptime on remote computers you can do:
3363
3364         parallel --tag --nonall -S server1,server2 uptime
3365
3366       --nonall reads no arguments. If you have a list of jobs you want to run
3367       on each computer you can do:
3368
3369         parallel --tag --onall -S server1,server2 echo ::: 1 2 3
3370
3371       Remove --tag if you do not want the sshlogin added before the output.
3372
3373       If you have a lot of hosts use '-j0' to access more hosts in parallel.
3374

EXAMPLE: Using remote computers behind NAT wall

3376       If the workers are behind a NAT wall, you need some trickery to get to
3377       them.
3378
3379       If you can ssh to a jumphost, and reach the workers from there, then
3380       the obvious solution would be this, but it does not work:
3381
3382         parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
3383
3384       It does not work because the command is dequoted by ssh twice, whereas
3385       GNU parallel only expects it to be dequoted once.
3386
3387       So instead put this in ~/.ssh/config:
3388
3389         Host host1 host2 host3
3390           ProxyCommand ssh jumphost.domain nc -w 1 %h 22
3391
3392       It requires nc (netcat) to be installed on jumphost. With this you can
3393       simply run:
3394
3395         parallel -S host1,host2,host3 echo ::: This does work
3396
3397   No jumphost, but port forwards
3398       If there is no jumphost but each server has port 22 forwarded from the
3399       firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 =
3400       host2, 22003 = host3) then you can use ~/.ssh/config:
3401
3402         Host host1.v
3403           Port 22001
3404         Host host2.v
3405           Port 22002
3406         Host host3.v
3407           Port 22003
3408         Host *.v
3409           Hostname firewall
3410
3411       And then use host{1..3}.v as normal hosts:
3412
3413         parallel -S host1.v,host2.v,host3.v echo ::: a b c
3414
3415   No jumphost, no port forwards
3416       If ports cannot be forwarded, you need some sort of VPN to traverse the
3417       NAT-wall. TOR is one option for that, as it is very easy to get
3418       working.
3419
3420       You need to install TOR and set up a hidden service. In torrc put:
3421
3422         HiddenServiceDir /var/lib/tor/hidden_service/
3423         HiddenServicePort 22 127.0.0.1:22
3424
3425       Then start TOR: /etc/init.d/tor restart
3426
3427       The TOR hostname is now in /var/lib/tor/hidden_service/hostname and is
3428       something similar to izjafdceobowklhz.onion. Now you simply prepend
3429       torsocks to ssh:
3430
3431         parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
3432           -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
3433
3434       If not all hosts are accessible through TOR:
3435
3436         parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
3437           echo ::: a b c
3438
3439       See more ssh tricks on
3440       https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
3441

EXAMPLE: Parallelizing rsync

3443       rsync is a great tool, but sometimes it will not fill up the available
3444       bandwidth. Running multiple rsyncs in parallel can fix this.
3445
3446         cd src-dir
3447         find . -type f |
3448           parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
3449
3450       Adjust -j10 until you find the optimal number.
3451
3452       rsync -R will create the needed subdirectories, so that not all files
3453       end up in a single dir. The ./ is needed so the resulting command looks
3454       similar to:
3455
3456         rsync -zR ././sub/dir/file fooserver:/dest-dir/
3457
3458       The /./ is what rsync -R works on.
3459
3460       If you are unable to push data but need to pull it, and the files are
3461       called digits.png (e.g. 000000.png), you might be able to do:
3462
3463         seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
3464

EXAMPLE: Use multiple inputs in one command

3466       Copy files like foo.es.ext to foo.ext:
3467
3468         ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
3469
3470       The perl command spits out 2 lines for each input. GNU parallel takes 2
3471       inputs (using -N2) and replaces {1} and {2} with the inputs.
3472
3473       Count in binary:
3474
3475         parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
3476
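       The lines come out in the same order as bash brace expansion (minus
       the spaces): the last input source varies fastest. With three
       sources:

```shell
# Same combinations and order as: parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1
echo {0,1}{0,1}{0,1}   # prints: 000 001 010 011 100 101 110 111
```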
3477       Print the numbers on opposite sides of a six-sided die:
3478
3479         parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
3480         parallel --link echo :::: <(seq 6) <(seq 6 -1 1)
3481
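       --link pairs the Nth argument of the first source with the Nth
       argument of the second, much like paste does for lines. A check
       without GNU parallel:

```shell
# Each output line holds one pair of opposite faces; every pair sums to 7.
paste -d ' ' <(seq 6) <(seq 6 -1 1)
```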
3482       Convert files from all subdirs to PNG files with consecutive numbers
3483       (useful for making input PNGs for ffmpeg):
3484
3485         parallel --link -a <(find . -type f | sort) \
3486           -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
3487
3488       Alternative version:
3489
3490         find . -type f | sort | parallel convert {} {#}.png
3491

EXAMPLE: Use a table as input

3493       Content of table_file.tsv:
3494
3495         foo<TAB>bar
3496         baz <TAB> quux
3497
3498       To run:
3499
3500         cmd -o bar -i foo
3501         cmd -o quux -i baz
3502
3503       you can run:
3504
3505         parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
3506
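       The column mapping can be previewed with awk, where $1 plays the
       role of {1} and $2 of {2}. A sketch on the sample rows (without the
       extra spaces of the second row; show_cmds is a hypothetical helper):

```shell
# show_cmds (hypothetical): print the command built from each tab-separated row.
show_cmds() {
  awk -F '\t' '{ print "cmd -o " $2 " -i " $1 }'
}

printf 'foo\tbar\nbaz\tquux\n' | show_cmds
```

       GNU parallel's own --dry-run option also prints the commands it
       would run without running them.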
3507       Note: The default for GNU parallel is to remove the spaces around the
3508       columns. To keep the spaces:
3509
3510         parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
3511

EXAMPLE: Output to database

3513       GNU parallel can output to a database table and a CSV-file:
3514
3515         DBURL=csv:///%2Ftmp%2Fmy.csv
3516         DBTABLEURL=$DBURL/mytable
3517         parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3518
3519       It is rather slow and takes up a lot of CPU time because GNU parallel
3520       parses the whole CSV file for each update.
3521
3522       A better approach is to use an SQLite database and then convert that to
3523       CSV:
3524
3525         DBURL=sqlite3:///%2Ftmp%2Fmy.sqlite
3526         DBTABLEURL=$DBURL/mytable
3527         parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3528         sql $DBURL '.headers on' '.mode csv' 'SELECT * FROM mytable;'
3529
3530       This takes around a second per job.
3531
3532       If you have access to a real database system, such as PostgreSQL, it is
3533       even faster:
3534
3535         DBURL=pg://user:pass@host/mydb
3536         DBTABLEURL=$DBURL/mytable
3537         parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3538         sql $DBURL \
3539           "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
3540
3541       Or MySQL:
3542
3543         DBURL=mysql://user:pass@host/mydb
3544         DBTABLEURL=$DBURL/mytable
3545         parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3546         sql -p -B $DBURL "SELECT * FROM mytable;" > mytable.tsv
3547         perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/; s/\\\\/\\/g;
3548           s/\\t/\t/g; s/\\n/\n/g;' mytable.tsv
3549

EXAMPLE: Output to CSV-file for R

3551       If you have no need for the advanced job distribution control that a
3552       database provides, but you simply want output into a CSV file that you
3553       can read into R or LibreOffice Calc, then you can use --results:
3554
3555         parallel --results my.csv seq ::: 10 20 30
3556         R
3557         > mydf <- read.csv("my.csv");
3558         > print(mydf[2,])
3559         > write(as.character(mydf[2,c("Stdout")]),'')
3560

EXAMPLE: Use XML as input

3562       The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
3563       podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
3564
3565       Using xpath you can extract the URLs for 2019 and download them using
3566       GNU parallel:
3567
3568         wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
3569           xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
3570           parallel -u wget '{= s/ url="//; s/"//; =}'
3571

EXAMPLE: Run the same command 10 times

3573       If you want to run the same command with the same arguments 10 times in
3574       parallel you can do:
3575
3576         seq 10 | parallel -n0 my_command my_args
3577

EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation

3579       GNU parallel can work similar to cat | sh.
3580
3581       A resource inexpensive job is a job that takes very little CPU, disk
3582       I/O and network I/O. Ping is an example of a resource inexpensive job.
3583       wget is too - if the webpages are small.
3584
3585       The content of the file jobs_to_run:
3586
3587         ping -c 1 10.0.0.1
3588         wget http://example.com/status.cgi?ip=10.0.0.1
3589         ping -c 1 10.0.0.2
3590         wget http://example.com/status.cgi?ip=10.0.0.2
3591         ...
3592         ping -c 1 10.0.0.255
3593         wget http://example.com/status.cgi?ip=10.0.0.255
3594
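       Rather than typing jobs_to_run by hand, it can be generated with a
       loop. A bash sketch producing the 510 lines listed above:

```shell
# Two jobs (ping and wget) for each address 10.0.0.1 .. 10.0.0.255.
for i in $(seq 1 255); do
  echo "ping -c 1 10.0.0.$i"
  echo "wget http://example.com/status.cgi?ip=10.0.0.$i"
done > jobs_to_run
```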
3595       To run 100 processes simultaneously do:
3596
3597         parallel -j 100 < jobs_to_run
3598
3599       As no command is given, each line is run as a job by the shell.
3600

EXAMPLE: Processing a big file using more CPUs

3602       To process a big file or some output you can use --pipe to split up the
3603       data into blocks and pipe the blocks into the processing program.
3604
3605       If the program is gzip -9 you can do:
3606
3607         cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
3608
3609       This will split bigfile into blocks of 1 MB and pass that to gzip -9 in
3610       parallel. One gzip will be run per CPU. The output of gzip -9 will be
3611       kept in order and saved to bigfile.gz
3612
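       Concatenating the outputs works because the gzip format allows
       several compressed members in one file, and decompressing it yields
       the concatenated data. A small check:

```shell
# Compress two chunks independently, concatenate, then decompress all members.
{
  printf 'hello ' | gzip -9
  printf 'world\n' | gzip -9
} | gzip -dc   # prints: hello world
```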
3613       gzip works fine if the output is appended, but some processing does not
3614       work like that - for example sorting. For this GNU parallel can put the
3615       output of each command into a file. This will sort a big file in
3616       parallel:
3617
3618         cat bigfile | parallel --pipe --files sort |\
3619           parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3620
3621       Here bigfile is split into blocks of around 1MB, each block ending in
3622       '\n' (which is the default for --recend). Each block is passed to sort
3623       and the output from sort is saved into files. These files are passed to
3624       the second parallel that runs sort -m on the files before it removes
3625       the files. The output is saved to bigfile.sort.
3626
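       The split, sort, merge strategy can be tried without GNU parallel.
       This sequential bash sketch (chunked_sort is a hypothetical helper;
       the chunks are cut by line count instead of by size) gives the same
       result as sorting the whole file:

```shell
# chunked_sort FILE LINES (hypothetical): sort FILE in chunks of LINES
# lines, then merge the already sorted chunks with sort -m.
chunked_sort() {
  local file=$1 lines=$2 tmp part
  tmp=$(mktemp -d) || return
  split -l "$lines" "$file" "$tmp/part."
  for part in "$tmp"/part.*; do
    sort "$part" -o "$part"   # the step GNU parallel runs concurrently
  done
  sort -m "$tmp"/part.*       # merge only; no full re-sort needed
  rm -r "$tmp"
}

# Usage: chunked_sort bigfile 100000 > bigfile.sort
```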
3627       GNU parallel's --pipe maxes out at around 100 MB/s because every byte
3628       has to be copied through GNU parallel. But if bigfile is a real
3629       (seekable) file, GNU parallel can bypass the copying and send the parts
3630       directly to the program:
3631
3632         parallel --pipepart --block 100m -a bigfile --files sort |\
3633           parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3634

EXAMPLE: Grouping input lines

3636       When processing with --pipe you may have lines grouped by a value. Here
3637       is my.csv:
3638
3639          Transaction Customer Item
3640               1       a       53
3641               2       b       65
3642               3       b       82
3643               4       c       96
3644               5       c       67
3645               6       c       13
3646               7       d       90
3647               8       d       43
3648               9       d       91
3649               10      d       84
3650               11      e       72
3651               12      e       102
3652               13      e       63
3653               14      e       56
3654               15      e       74
3655
3656       Let us assume you want GNU parallel to process each customer. In other
3657       words: You want all the transactions for a single customer to be
3658       treated as a single record.
3659
3660       To do this we preprocess the data with a program that inserts a record
3661       separator before each customer (column 2 = $F[1]). Here we first make a
3662       50-character random string, which we then use as the separator:
3663
3664         sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
3665         cat my.csv | \
3666            perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
3667            parallel --recend $sep --rrs --pipe -N1 wc
3668
3669       If your program can process multiple customers, replace -N1 with a
3670       reasonable --blocksize.
3671

EXAMPLE: Running more than 250 jobs workaround

3673       If you need to run a massive number of jobs in parallel, then you will
3674       likely hit the filehandle limit which is often around 250 jobs. If you
3675       are super user you can raise the limit in /etc/security/limits.conf but
3676       you can also use this workaround. The filehandle limit is per process.
3677       That means that if you just spawn more GNU parallels then each of them
3678       can run 250 jobs. This will spawn up to 2500 jobs:
3679
3680         cat myinput |\
3681           parallel --pipe -N 50 --roundrobin -j50 parallel -j50 your_prg
3682
3683       This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
3684       to do this, and you may need to increase /proc/sys/kernel/pid_max):
3685
3686         cat myinput |\
3687           parallel --pipe -N 250 --roundrobin -j250 parallel -j250 your_prg
3688

EXAMPLE: Working as mutex and counting semaphore

3690       The command sem is an alias for parallel --semaphore.
3691
3692       A counting semaphore will allow a given number of jobs to be started in
3693       the background. When that many jobs are running in the background,
3694       GNU sem will wait for one of these to complete before starting another
3695       command. sem --wait will wait for all jobs to complete.
3696
3697       Run 10 jobs concurrently in the background:
3698
3699         for i in *.log ; do
3700           echo $i
3701           sem -j10 gzip $i ";" echo done
3702         done
3703         sem --wait
3704
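       For comparison, the "at most N jobs in the background" idea can be
       sketched in plain bash with wait -n (bash 4.3 or later). This is not
       how sem is implemented, and it only limits jobs started from the same
       shell (limited_gzip is a hypothetical helper):

```shell
# limited_gzip MAX FILE... (hypothetical): gzip each FILE in the
# background, with at most MAX gzip processes running at a time.
limited_gzip() {
  local max=$1 running=0 f
  shift
  for f in "$@"; do
    if (( running >= max )); then
      wait -n                 # block until any one background job finishes
      running=$(( running - 1 ))
    fi
    gzip -- "$f" &
    running=$(( running + 1 ))
  done
  wait                        # like sem --wait: wait for the remaining jobs
}

# Usage: limited_gzip 10 *.log
```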
3705       A mutex is a counting semaphore allowing only one job to run. This will
3706       edit the file myfile by prepending three lines containing the numbers
3707       1 to 3.
3708
3709         seq 3 | parallel sem sed -i -e '1i{}' myfile
3710
3711       As myfile can be very big it is important only one process edits the
3712       file at the same time.
3713
3714       Name the semaphore to have multiple different semaphores active at the
3715       same time:
3716
3717         seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
3718

EXAMPLE: Mutex for a script

       Assume a script is called from cron or from a web service, but only one
       instance can be run at a time. With sem and --shebang-wrap the script
       can be made to wait for other instances to finish. Here in bash:

         #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash

         echo This will run
         sleep 5
         echo exclusively

       Here in perl:

         #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl

         print "This will run ";
         sleep 5;
         print "exclusively\n";

       Here in python (Python 3):

         #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python3

         import time
         print("This will run")
         time.sleep(5)
         print("exclusively")


EXAMPLE: Start editor with filenames from stdin (standard input)

       You can use GNU parallel to start interactive programs like emacs or
       vi:

         cat filelist | parallel --tty -X emacs
         cat filelist | parallel --tty -X vi

       If there are more files than will fit on a single command line, the
       editor will be started again with the remaining files.


EXAMPLE: Running sudo

       sudo requires a password to run a command as root. It caches the
       access, so you only need to enter the password again if you have not
       used sudo for a while.

       The command:

         parallel sudo echo ::: This is a bad idea

       is no good, as you would be prompted for the sudo password for each of
       the jobs. You can either do:

         sudo echo This
         parallel sudo echo ::: is a good idea

       or:

         sudo parallel echo ::: This is a good idea

       This way you only have to enter the sudo password once.


EXAMPLE: GNU Parallel as queue system/batch manager

       GNU parallel can work as a simple job queue system or batch manager.
       The idea is to put the jobs into a file and have GNU parallel read from
       that continuously. As GNU parallel will stop at end of file we use tail
       to continue reading:

         true >jobqueue; tail -n+0 -f jobqueue | parallel

       To submit your jobs to the queue:

         echo my_command my_arg >> jobqueue

       You can of course use -S to distribute the jobs to remote computers:

         true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..

       If you keep this running for a long time, jobqueue will grow. One way
       to remove the jobs already run is to make GNU parallel stop when it
       hits a special value and then restart it. To use --eof to make GNU
       parallel exit, tail also needs to be forced to exit:

         true >jobqueue;
         while true; do
           tail -n+0 -f jobqueue |
             (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
              perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
              (seq 1000 >> jobqueue &);
              echo Done appending dummy data forcing tail to exit)
           echo tail exited;
           mv j2 jobqueue
         done

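       The -E mechanism the loop relies on can be tried on its own. In the
       sketch below the helper name run_until_stop is hypothetical; it runs
       the commands in a file, one per line, and stops reading at the first
       line that is exactly StOpHeRe, leaving the lines after it unrun:

```shell
# Hypothetical helper: run the commands in file $1 (one per line) with
# GNU parallel, but stop reading at the end-of-file marker StOpHeRe.
# -k prints the output in the order of the input lines.
run_until_stop() {
  parallel -k -E StOpHeRe < "$1"
}
```

       With a file containing the lines echo job1, echo job2, StOpHeRe and
       echo job3, run_until_stop prints job1 and job2 only.
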
       In some cases you can run on more CPUs and computers during the night:

         # Day time
         echo 50% > jobfile
         cp day_server_list ~/.parallel/sshloginfile
         # Night time
         echo 100% > jobfile
         cp night_server_list ~/.parallel/sshloginfile
         tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..

       GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.

       There is a small issue when using GNU parallel as queue system/batch
       manager: You have to submit as many jobs as there are job slots before
       they will start, and after that you can submit one at a time, and each
       job will start immediately if free slots are available. Output from
       the running or completed jobs is held back and will only be printed
       when JobSlots more jobs have been started (unless you use --ungroup or
       --line-buffer, in which case the output from the jobs is printed
       immediately). E.g. if you have 10 jobslots then the output from the
       first completed job will only be printed when job 11 has started, and
       the output of the second completed job will only be printed when job
       12 has started.


EXAMPLE: GNU Parallel as dir processor

       If you have a dir in which users drop files that need to be processed,
       you can do this on GNU/Linux (if you know what inotifywait is called on
       other platforms, file a bug report):

         inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
           parallel -u echo

       This will run the command echo on each file put into my_dir or subdirs
       of my_dir.

       You can of course use -S to distribute the jobs to remote computers:

         inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
           parallel -S .. -u echo

       If the files to be processed are in a tar file then unpacking one file
       and processing it immediately may be faster than first unpacking all
       files. Set up the dir processor as above and unpack into the dir.

       Using GNU parallel as dir processor has the same limitations as using
       GNU parallel as queue system/batch manager.


EXAMPLE: Locate the missing package

       If you have downloaded source and tried compiling it, you may have
       seen:

         $ ./configure
         [...]
         checking for something.h... no
         configure: error: "libsomething not found"

       Often it is not obvious which package you should install to get that
       file. Debian has `apt-file` to search for a file. `tracefile` from
       https://gitlab.com/ole.tange/tangetools can tell which files a program
       tried to access. In this case we are interested in one of the last
       files:

         $ tracefile -un ./configure | tail | parallel -j0 apt-file search


SPREADING BLOCKS OF DATA

       --round-robin, --pipe-part, --shard, --bin and --group-by are all
       specialized versions of --pipe.

       In the following n is the number of jobslots given by --jobs. A record
       starts with --recstart and ends with --recend. It is typically a full
       line. A chunk is a number of full records that is approximately the
       size of a block. A block can contain half records, a chunk cannot.

       --pipe starts one job per chunk. It reads blocks from stdin (standard
       input). It finds a record end near a block border and passes a chunk to
       the program.

       --pipe-part starts one job per chunk, just like normal --pipe. It
       first finds record endings near all block borders in the file and then
       starts the jobs. By using --block -1 it will set the block size to
       1/n * size-of-file. Used this way it will start n jobs in total.

       --round-robin starts n jobs in total. It reads a block and passes a
       chunk to whichever job is ready to read. It does not parse the content
       except for identifying where a record ends to make sure it only passes
       full records.

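       A sketch contrasting --pipe and --round-robin. The helper names
       count_per_chunk and count_per_job are hypothetical. Every job runs
       wc -l, so each printed number is the number of lines one job
       received, and the numbers add up to the number of input lines:

```shell
# Hypothetical helper: --pipe starts one job per chunk, so the number of
# counts printed grows with the size of the input.
count_per_chunk() {
  parallel --pipe wc -l
}

# Hypothetical helper: --round-robin starts exactly $1 jobs; chunks go to
# whichever job is ready, and each job prints one total when its input ends.
count_per_job() {
  parallel --pipe --round-robin -j"$1" wc -l
}
```

       seq 1000000 produces about 6.9 MB, so with the default block size of
       1 MB seq 1000000 | count_per_chunk prints several counts, while
       seq 1000000 | count_per_job 4 prints exactly four.
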
       --shard starts n jobs in total. It parses each line to read the value
       in the given column. Based on this value the line is passed to one of
       the n jobs. All lines having this value will be given to the same
       jobslot.

       --bin works like --shard but the value of the column is the jobslot
       number it will be passed to. If the value is bigger than n, then n will
       be subtracted from the value until the value is smaller than or equal
       to n.

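       The slot rule for --bin can be written as a small shell function.
       This is only a sketch of the rule as stated above, not GNU parallel's
       own code, and the name bin_slot is hypothetical. For values of 1 or
       more it gives the same result as ((value - 1) % n) + 1:

```shell
# Hypothetical helper: the jobslot (numbered 1..n) that receives a line
# whose column value is $1 when there are n=$2 jobslots.
bin_slot() {
  v=$1
  n=$2
  # Subtract n until the value is smaller than or equal to n
  while [ "$v" -gt "$n" ]; do
    v=$((v - n))
  done
  echo "$v"
}
```

       With n=4, bin_slot 5 4 prints 1 and bin_slot 10 4 prints 2.
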
       --group-by starts one job per chunk. Record borders are not given by
       --recend/--recstart. Instead a record is defined by a number of lines
       having the same value in a given column. So a chunk border only falls
       where the value in the given column changes. With --pipe every line is
       parsed, with --pipe-part only a few lines are parsed to find the chunk
       border.

       --group-by can be combined with --round-robin or --pipe-part.


QUOTING

       GNU parallel is very liberal in quoting. You only need to quote
       characters that have special meaning in shell:

         ( ) $ ` ' " < > ; | \

       and depending on context these need to be quoted, too:

         ~ & # ! ? space * {

       Therefore most people will never need more quoting than putting '\' in
       front of the special characters.

       Often you can simply put \' around every ':

         perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file

       can be quoted like this:

         parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\' ::: file

       However, when you want to use a shell variable you need to quote the
       $-sign. Here is an example using $PARALLEL_SEQ. This variable is set by
       GNU parallel itself, so the evaluation of the $ must be done by the sub
       shell started by GNU parallel:

         seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}

       If the variable is set before GNU parallel starts you can do this:

         VAR=this_is_set_before_starting
         echo test | parallel echo {} $VAR

       Prints: test this_is_set_before_starting

       It is a little more tricky if the variable contains more than one space
       in a row:

         VAR="two  spaces  between  each  word"
         echo test | parallel echo {} \'"$VAR"\'

       Prints: test two  spaces  between  each  word

       If the variable should not be evaluated by the shell starting GNU
       parallel but be evaluated by the sub shell started by GNU parallel,
       then you need to quote it:

         echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR

       Prints: test this_is_set_after_starting

       It is a little more tricky if the variable contains spaces. The
       double quotes around \$VAR must reach the sub shell, so they are
       quoted with \ as well:

         echo test |\
           parallel VAR='"two  spaces  between  each  word"' \; echo {} \""\$VAR"\"

       Prints: test two  spaces  between  each  word

       $$ is the shell variable containing the process id of the shell. This
       will print the process id of the shell running GNU parallel:

         seq 10 | parallel echo $$

       And this will print the process ids of the sub shells started by GNU
       parallel:

         seq 10 | parallel echo \$\$

       If the special characters should not be evaluated by the sub shell then
       you need to protect them against evaluation from both the shell
       starting GNU parallel and the sub shell:

         echo test | parallel echo {} \\\$VAR

       Prints: test $VAR

       GNU parallel can protect against evaluation by the sub shell by using
       -q:

         echo test | parallel -q echo {} \$VAR

       Prints: test $VAR

       This is particularly useful if you have lots of quoting. If you want to
       run a perl script like this:

         perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file

       it needs to be quoted like one of these:

         ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'
         ls | parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\'

       Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU parallel
       can do the quoting by using option -q:

         ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'

       However, this means you cannot make the sub shell interpret special
       characters. For example because of -q this WILL NOT WORK:

         ls *.gz | parallel -q "zcat {} >{.}"
         ls *.gz | parallel -q "zcat {} | bzip2 >{.}.bz2"

       because > and | need to be interpreted by the sub shell.

       If you get errors like:

         sh: -c: line 0: syntax error near unexpected token
         sh: Syntax error: Unterminated quoted string
         sh: -c: line 0: unexpected EOF while looking for matching `''
         sh: -c: line 1: syntax error: unexpected end of file
         zsh:1: no matches found:

       then you might try using -q.

       If you are using bash process substitution like <(cat foo) then you may
       try -q and prepending the command with bash -c:

         ls | parallel -q bash -c 'wc -c <(echo {})'

       Or for substituting output:

         ls | parallel -q bash -c \
           'tar c {} | tee >(gzip >{}.tar.gz) | bzip2 >{}.tar.bz2'

       Conclusion: To avoid dealing with the quoting problems it may be easier
       just to write a small script or a function (remember to export -f the
       function) and have GNU parallel call that.

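       As a sketch of the script approach, the perl one-liner from above can
       be stored in a small script. The path ${TMPDIR:-/tmp}/two_words is a
       hypothetical choice. Because the here-document delimiter is quoted,
       the shell copies $ARGV, the backslashes and the quotes unchanged:

```shell
# Hypothetical location of the helper script
script=${TMPDIR:-/tmp}/two_words

# 'EOF' is quoted, so nothing inside the here-document is expanded
cat > "$script" <<'EOF'
#!/bin/sh
# Print the file name once for every line that consists of exactly two words
exec perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' "$@"
EOF
chmod +x "$script"
```

       GNU parallel then only has to pass file names to the script, so no
       extra quoting is needed:

         ls | parallel "$script"
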

LIST RUNNING JOBS

       If you want a list of the jobs currently running you can run:

         killall -USR1 parallel

       GNU parallel will then print the currently running jobs on stderr
       (standard error).


COMPLETE RUNNING JOBS BUT DO NOT START NEW JOBS

       If you regret starting a lot of jobs you can simply break GNU parallel,
       but if you want to make sure you do not have half-completed jobs you
       should send the signal SIGHUP to GNU parallel:

         killall -HUP parallel

       This will tell GNU parallel to not start any new jobs, but wait until
       the currently running jobs are finished before exiting.


ENVIRONMENT VARIABLES

       $PARALLEL_HOME
                Dir where GNU parallel stores config files, semaphores, and
                caches information between invocations. Default:
                $HOME/.parallel.

       $PARALLEL_PID
                The environment variable $PARALLEL_PID is set by GNU parallel
                and is visible to the jobs started from GNU parallel. This
                makes it possible for the jobs to communicate directly with
                GNU parallel. Remember to quote the $, so it gets evaluated by
                the correct shell.

                Example: If each of the jobs tests a solution and one of the
                jobs finds the solution, the job can tell GNU parallel not to
                start more jobs by: kill -HUP $PARALLEL_PID. This only works on
                the local computer.

       $PARALLEL_RSYNC_OPTS
                Options to pass on to rsync. Defaults to: -rlDzR.

       $PARALLEL_SHELL
                Use this shell for the commands run by GNU parallel:

                · $PARALLEL_SHELL. If undefined use:

                · The shell that started GNU parallel. If that cannot be
                  determined:

                · $SHELL. If undefined use:

                · /bin/sh

       $PARALLEL_SSH
                GNU parallel defaults to using ssh for remote access. This can
                be overridden with $PARALLEL_SSH, which again can be
                overridden with --ssh. It can also be set on a per server
                basis (see --sshlogin).

       $PARALLEL_SSHLOGIN (beta testing)
                The environment variable $PARALLEL_SSHLOGIN is set by GNU
                parallel and is visible to the jobs started from GNU parallel.
                The value is the sshlogin line with the number of cores
                removed. E.g.

                  4//usr/bin/specialssh user@host

                becomes:

                  /usr/bin/specialssh user@host

       $PARALLEL_SEQ
                $PARALLEL_SEQ will be set to the sequence number of the job
                running. Remember to quote the $, so it gets evaluated by the
                correct shell.

                Example:

                  seq 10 | parallel -N2 \
                    echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}

       $PARALLEL_TMUX
                Path to tmux. If unset the tmux in $PATH is used.

       $TMPDIR  Directory for temporary files. See: --tmpdir.

       $PARALLEL
                The environment variable $PARALLEL will be used as default
                options for GNU parallel. If the variable contains special
                shell characters (e.g. $, *, or space) then these need to be
                escaped with \.

                Example:

                  cat list | parallel -j1 -k -v ls
                  cat list | parallel -j1 -k -v -S"myssh user@server" ls

                can be written as:

                  cat list | PARALLEL="-kvj1" parallel ls
                  cat list | PARALLEL='-kvj1 -S myssh\ user@server' \
                    parallel ls

                Notice the \ in the middle is needed because 'myssh' and
                'user@server' must be one argument.

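       A sketch of the $PARALLEL_PID example above. The helper name
       find_first and the stand-in check [ {} -eq N ] are hypothetical; a
       real job would run its own test. The command is in double quotes, so
       $1 is expanded by the calling shell, while \$PARALLEL_PID is escaped
       so it is evaluated by the sub shell started by GNU parallel:

```shell
# Hypothetical helper: try the candidates 1..100, at most 4 at a time.
# The candidate equal to $1 plays the role of the solution: the job that
# finds it sends SIGHUP to GNU parallel, which then starts no new jobs
# and exits when the running jobs are done.
find_first() {
  seq 100 | parallel -j4 \
    "if [ {} -eq $1 ]; then echo found {}; kill -HUP \$PARALLEL_PID; fi"
}
```

       find_first 7 prints found 7. Jobs that were already running when the
       signal arrived still finish, but they print nothing here.
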

DEFAULT PROFILE (CONFIG FILE)

       The global configuration file /etc/parallel/config, followed by the
       user configuration file ~/.parallel/config (formerly known as
       .parallelrc), will be read in turn if they exist. Lines starting with
       '#' will be ignored. The format can follow that of the environment
       variable $PARALLEL, but it is often easier to simply put each option
       on its own line.

       Options on the command line take precedence, followed by the
       environment variable $PARALLEL, user configuration file
       ~/.parallel/config, and finally the global configuration file
       /etc/parallel/config.

       Note that no file that is read for options, nor the environment
       variable $PARALLEL, may contain retired options such as --tollef.


PROFILE FILES

       If --profile is set, GNU parallel will read the profile from that file
       rather than the global or user configuration files. You can have
       multiple --profiles.

       Profiles are searched for in ~/.parallel. If the name starts with / it
       is seen as an absolute path. If the name starts with ./ it is seen as a
       relative path from the current dir.

       Example: Profile for running a command on every sshlogin in
       ~/.parallel/sshloginfile and prepending the output with the sshlogin:

         echo --tag -S .. --nonall > ~/.parallel/n
         parallel -Jn uptime

       Example: Profile for running every command with -j-1 and nice

         echo -j-1 nice > ~/.parallel/nice_profile
         parallel -J nice_profile bzip2 -9 ::: *

       Example: Profile for running a perl script before every command:

         echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" \
           > ~/.parallel/pre_perl
         parallel -J pre_perl echo ::: *

       Note how the $ and " need to be quoted using \.

       Example: Profile for running distributed jobs with nice on the remote
       computers:

         echo -S .. nice > ~/.parallel/dist
         parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *

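       A sketch of a profile given by absolute path, assuming ${TMPDIR:-/tmp}
       is an absolute path; the name keep_order_profile is hypothetical:

```shell
# Hypothetical profile location outside ~/.parallel
profile=${TMPDIR:-/tmp}/keep_order_profile

# Same syntax as $PARALLEL: -k keeps the output in input order and
# -j2 runs at most two jobs at a time
printf '%s\n' '-k -j2' > "$profile"
```

       parallel -J "$profile" echo ::: c b a then behaves like
       parallel -k -j2 echo ::: c b a and prints c, b and a in that order.
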

EXIT STATUS

       Exit status depends on --halt-on-error if one of these is used:
       success=X, success=Y%, fail=Y%.

       0     All jobs ran without error. If success=X is used: X jobs ran
             without error. If success=Y% is used: Y% of the jobs ran without
             error.

       1-100 Some of the jobs failed. The exit status gives the number of
             failed jobs. If Y% is used the exit status is the percentage of
             jobs that failed.

       101   More than 100 jobs failed.

       255   Other error.

       -1 (In joblog and SQL table)
             Killed by Ctrl-C, timeout, not enough memory or similar.

       -2 (In joblog and SQL table)
             skip() was called in {= =}.

       -1000 (In SQL table)
             Job is ready to run (set by --sqlmaster).

       -1220 (In SQL table)
             Job is taken by worker (set by --sqlworker).

       If fail=1 is used, the exit status will be the exit status of the
       failing job.

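       A sketch of how a calling script might interpret the exit status,
       assuming none of the --halt-on-error settings listed above is used.
       The function name describe_parallel_status is hypothetical, and
       values the table does not list are treated as other errors:

```shell
# Hypothetical helper: turn GNU parallel's exit status into a message,
# following the table above for the default (no --halt) behaviour.
describe_parallel_status() {
  s=$1
  if [ "$s" -eq 0 ]; then
    echo "all jobs succeeded"
  elif [ "$s" -le 100 ]; then
    echo "$s job(s) failed"
  elif [ "$s" -eq 101 ]; then
    echo "more than 100 jobs failed"
  else
    echo "other error ($s)"    # 255, or any value not in the table
  fi
}
```

       Example:

         parallel gzip ::: *.log
         describe_parallel_status $?
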

DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES

       See: man parallel_alternatives


BUGS

   Quoting of newline
       Because of the way newline is quoted this will not work:

         echo 1,2,3 | parallel -vkd, "echo 'a{}b'"

       However, these will all work:

         echo 1,2,3 | parallel -vkd, echo a{}b
         echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
         echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"

   Speed
       Startup

       GNU parallel is slow at starting up: around 250 ms the first time and
       150 ms after that.

       Job startup

       Starting a job on the local machine takes around 10 ms. This can be a
       big overhead if the job takes very few ms to run. Often you can group
       small jobs together using -X which will make the overhead less
       significant. Or you can run multiple GNU parallels as described in
       EXAMPLE: Speeding up fast jobs.

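       A sketch that makes the effect of -X visible. echo prints one line per
       job, so counting lines counts the jobs started. The helper names are
       hypothetical, and the exact number of jobs with -X depends on the
       number of jobslots and the maximal command line length:

```shell
# Hypothetical helpers: count the jobs GNU parallel starts for 1000
# arguments. Every echo job prints exactly one line.
jobs_without_x() { seq 1000 | parallel echo | wc -l; }

# -X puts as many arguments as fit on each command line
jobs_with_x() { seq 1000 | parallel -X echo | wc -l; }
```

       jobs_without_x prints 1000; jobs_with_x prints a much smaller number,
       so far fewer job startups are paid for the same work.
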
       SSH

       When using multiple computers GNU parallel opens ssh connections to
       them to figure out how many connections can be used reliably
       simultaneously (namely SSHD's MaxStartups). This test is done for each
       host in serial, so if your --sshloginfile contains many hosts it may be
       slow.

       If your jobs are short you may see that there are fewer jobs running on
       the remote systems than expected. This is due to time spent logging in
       and out. -M may help here.

       Disk access

       A single disk can normally read data faster if it reads one file at a
       time instead of reading a lot of files in parallel, as this will avoid
       disk seeks. However, newer disk systems with multiple drives can read
       faster if reading from multiple files in parallel.

       If the jobs are of the form read-all-compute-all-write-all, so
       everything is read before anything is written, it may be faster to
       force only one disk access at a time:

         sem --id diskio cat file | compute | sem --id diskio cat > file

       If the jobs are of the form read-compute-write, so writing starts
       before all reading is done, it may be faster to force only one reader
       and one writer at a time:

         sem --id read cat file | compute | sem --id write cat > file

       If the jobs are of the form read-compute-read-compute, it may be faster
       to run more jobs in parallel than the system has CPUs, as some of the
       jobs will be stuck waiting for disk access.

   --nice limits command length
       The current implementation of --nice is too pessimistic in the max
       allowed command length. It only uses a little more than half of what it
       could. This affects -X and -m. If this becomes a real problem for you,
       file a bug report.

   Aliases and functions do not work
       If you get:

         Can't exec "command": No such file or directory

       or:

         open3: exec of by command failed

       or:

         /bin/bash: command: command not found

       it may be because command is not known, but it could also be because
       command is an alias or a function. If it is a function you need to
       export -f the function first or use env_parallel. An alias will only
       work if you use env_parallel.

   Database with MySQL fails randomly
       The --sql* options may fail randomly with MySQL. This problem does not
       exist with PostgreSQL.


REPORTING BUGS

       Report bugs to <bug-parallel@gnu.org> or
       https://savannah.gnu.org/bugs/?func=additem&group=parallel

       See a perfect bug report on
       https://lists.gnu.org/archive/html/bug-parallel/2015-01/msg00000.html

       Your bug report should always include:

       · The error message you get (if any). If the error message is not from
         GNU parallel you need to show why you think GNU parallel caused
         it.

       · The complete output of parallel --version. If you are not running the
         latest released version (see http://ftp.gnu.org/gnu/parallel/) you
         should specify why you believe the problem is not fixed in that
         version.

       · A minimal, complete, and verifiable example (see the description on
         http://stackoverflow.com/help/mcve).

         It should be a complete example that others can run that shows the
         problem including all files needed to run the example. This should
         preferably be small and simple, so try to remove as many options as
         possible. A combination of yes, seq, cat, echo, and sleep can
         reproduce most errors. If your example requires large files, see if
         you can make them by something like seq 1000000 > file or
         yes | head -n 10000000 > file.

         If your example requires remote execution, see if you can use
         localhost, maybe using another login.

         If you have access to a different system, test if the MCVE shows the
         problem on that system.

       · The output of your example. If your problem is not easily reproduced
         by others, the output might help them figure out the problem.

       · Whether you have watched the intro videos
         (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
         through the tutorial (man parallel_tutorial), and read the EXAMPLE
         section in the man page (man parallel - search for EXAMPLE:).

       If you suspect the error is dependent on your environment or
       distribution, please see if you can reproduce the error on one of these
       VirtualBox images:
       http://sourceforge.net/projects/virtualboximage/files/
       http://www.osboxes.org/virtualbox-images/

       Specifying the name of your distribution is not enough as you may have
       installed software that is not in the VirtualBox images.

       If you cannot reproduce the error on any of the VirtualBox images
       above, see if you can build a VirtualBox image on which you can
       reproduce the error. If not you should assume the debugging will be
       done through you. That will put more burden on you and it is extra
       important you give any information that helps. In general the problem
       will be fixed faster and with less work for you if you can reproduce
       the error on a VirtualBox.


AUTHOR

       When using GNU parallel for a publication please cite:

       O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
       The USENIX Magazine, February 2011:42-47.

       This helps fund further development, and it won't cost you a cent.
       If you pay 10000 EUR you should feel free to use GNU Parallel without
       citing.

       Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk

       Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk

       Copyright (C) 2010-2019 Ole Tange, http://ole.tange.dk and Free
       Software Foundation, Inc.

       Parts of the manual concerning xargs compatibility are inspired by the
       manual of xargs from GNU findutils 4.4.2.


LICENSE

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 3 of the License, or at your
       option any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program.  If not, see <http://www.gnu.org/licenses/>.

   Documentation license I
       Permission is granted to copy, distribute and/or modify this
       documentation under the terms of the GNU Free Documentation License,
       Version 1.3 or any later version published by the Free Software
       Foundation; with no Invariant Sections, with no Front-Cover Texts, and
       with no Back-Cover Texts.  A copy of the license is included in the
       file fdl.txt.

   Documentation license II
       You are free:

       to Share to copy, distribute and transmit the work

       to Remix to adapt the work

       Under the following conditions:

       Attribution
                You must attribute the work in the manner specified by the
                author or licensor (but not in any way that suggests that they
                endorse you or your use of the work).

       Share Alike
                If you alter, transform, or build upon this work, you may
                distribute the resulting work only under the same, similar or
                a compatible license.

       With the understanding that:

       Waiver   Any of the above conditions can be waived if you get
                permission from the copyright holder.

       Public Domain
                Where the work or any of its elements is in the public domain
                under applicable law, that status is in no way affected by the
                license.

       Other Rights
                In no way are any of the following rights affected by the
                license:

                · Your fair dealing or fair use rights, or other applicable
                  copyright exceptions and limitations;

                · The author's moral rights;

                · Rights other persons may have either in the work itself or
                  in how the work is used, such as publicity or privacy
                  rights.

       Notice   For any reuse or distribution, you must make clear to others
                the license terms of this work.

       A copy of the full license is included in the file as cc-by-sa.txt.


DEPENDENCIES

       GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
       Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
       rsync with ssh.
