PARALLEL(1)                        parallel                       PARALLEL(1)

NAME
       parallel - build and execute shell command lines from standard input
       in parallel

SYNOPSIS
       parallel [options] [command [arguments]] < list_of_arguments

       parallel [options] [command [arguments]] ( ::: arguments | :::+
       arguments | :::: argfile(s) | ::::+ argfile(s) ) ...

       parallel --semaphore [options] command

       #!/usr/bin/parallel --shebang [options] [command [arguments]]

       #!/usr/bin/parallel --shebang-wrap [options] [command [arguments]]
DESCRIPTION
   STOP!
       Read the Reader's guide below if you are new to GNU parallel.

       GNU parallel is a shell tool for executing jobs in parallel using one
       or more computers. A job can be a single command or a small script
       that has to be run for each of the lines in the input. The typical
       input is a list of files, a list of hosts, a list of users, a list of
       URLs, or a list of tables. A job can also be a command that reads
       from a pipe. GNU parallel can then split the input into blocks and
       pipe a block into each command in parallel.

       If you use xargs and tee today you will find GNU parallel very easy
       to use, as GNU parallel is written to have the same options as xargs.
       If you write loops in shell, you will find GNU parallel may be able
       to replace most of the loops and make them run faster by running
       several jobs in parallel.

       GNU parallel makes sure output from the commands is the same output
       as you would get had you run the commands sequentially. This makes it
       possible to use output from GNU parallel as input for other programs.

       For each line of input GNU parallel will execute command with the
       line as arguments. If no command is given, the line of input is
       executed. Several lines will be run in parallel. GNU parallel can
       often be used as a substitute for xargs or cat | bash.

   Reader's guide
       GNU parallel includes the 4 types of documentation: Tutorial, how-to,
       reference and explanation.

       Tutorial

       If you prefer reading a book, buy GNU Parallel 2018 at
       http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014 Read at
       least chapter 1+2. It should take you less than 20 minutes.

       Otherwise start by watching the intro videos for a quick
       introduction: http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       If you want to dive deeper: spend a couple of hours walking through
       the tutorial (man parallel_tutorial). Your command line will love you
       for it.

       How-to

       You can find a lot of EXAMPLEs of use after the list of OPTIONS in
       man parallel (use LESS=+/EXAMPLE: man parallel). That will give you
       an idea of what GNU parallel is capable of, and you may find a
       solution you can simply adapt to your situation.

       Reference

       If you need a one page printable cheat sheet you can find it on:
       https://www.gnu.org/software/parallel/parallel_cheat.pdf

       The man page is the reference for all options.

       Design discussion

       If you want to know the design decisions behind GNU parallel, try:
       man parallel_design. This is also a good intro if you intend to
       change GNU parallel.

OPTIONS
       command
           Command to execute. If command or the following arguments
           contain replacement strings (such as {}) every instance will be
           substituted with the input.

           If command is given, GNU parallel solves the same tasks as
           xargs. If command is not given, GNU parallel will behave
           similarly to cat | sh.

           The command must be an executable, a script, a composed command,
           an alias, or a function.

           Bash functions: export -f the function first or use
           env_parallel.

           Bash, Csh, or Tcsh aliases: Use env_parallel.

           Zsh, Fish, Ksh, and Pdksh functions and aliases: Use
           env_parallel.

       {}  Input line. This replacement string will be replaced by a full
           line read from the input source. The input source is normally
           stdin (standard input), but can also be given with -a, :::, or
           ::::.

           The replacement string {} can be changed with -I.

           If the command line contains no replacement strings then {} will
           be appended to the command line.

           Replacement strings are normally quoted, so special characters
           are not parsed by the shell. The exception is if the command
           starts with a replacement string; then the string is not quoted.

       {.} Input line without extension. This replacement string will be
           replaced by the input with the extension removed. If the input
           line contains . after the last /, the last . until the end of
           the string will be removed and {.} will be replaced with the
           remaining. E.g. foo.jpg becomes foo, subdir/foo.jpg becomes
           subdir/foo, sub.dir/foo.jpg becomes sub.dir/foo, sub.dir/bar
           remains sub.dir/bar. If the input line does not contain . it
           will remain unchanged.

           The replacement string {.} can be changed with --er.

           To understand replacement strings see {}.

       {/} Basename of input line. This replacement string will be replaced
           by the input with the directory part removed.

           The replacement string {/} can be changed with
           --basenamereplace.

           To understand replacement strings see {}.

       {//}
           Dirname of input line. This replacement string will be replaced
           by the dir of the input line. See dirname(1).

           The replacement string {//} can be changed with
           --dirnamereplace.

           To understand replacement strings see {}.

       {/.}
           Basename of input line without extension. This replacement
           string will be replaced by the input with the directory and
           extension part removed. It is a combination of {/} and {.}.

           The replacement string {/.} can be changed with
           --basenameextensionreplace.

           To understand replacement strings see {}.
155
156 {#} Sequence number of the job to run. This replacement string will be
157 replaced by the sequence number of the job being run. It contains
158 the same number as $PARALLEL_SEQ.
159
160 The replacement string {#} can be changed with --seqreplace.
161
162 To understand replacement strings see {}.
163
164 {%} Job slot number. This replacement string will be replaced by the
165 job's slot number between 1 and number of jobs to run in parallel.
166 There will never be 2 jobs running at the same time with the same
167 job slot number.
168
169 The replacement string {%} can be changed with --slotreplace.
170
171 If the job needs to be retried (e.g using --retries or
172 --retry-failed) the job slot is not automatically updated. You
173 should then instead use $PARALLEL_JOBSLOT:
174
175 $ do_test() {
176 id="$3 {%}=$1 PARALLEL_JOBSLOT=$2"
177 echo run "$id";
178 sleep 1
179 # fail if {%} is odd
180 return `echo $1%2 | bc`
181 }
182 $ export -f do_test
183 $ parallel -j3 --jl mylog do_test {%} \$PARALLEL_JOBSLOT {} ::: A B C D
184 run A {%}=1 PARALLEL_JOBSLOT=1
185 run B {%}=2 PARALLEL_JOBSLOT=2
186 run C {%}=3 PARALLEL_JOBSLOT=3
187 run D {%}=1 PARALLEL_JOBSLOT=1
188 $ parallel --retry-failed -j3 --jl mylog do_test {%} \$PARALLEL_JOBSLOT {} ::: A B C D
189 run A {%}=1 PARALLEL_JOBSLOT=1
190 run C {%}=3 PARALLEL_JOBSLOT=2
191 run D {%}=1 PARALLEL_JOBSLOT=3
192
193 Notice how {%} and $PARALLEL_JOBSLOT differ in the retry run of C
194 and D.
195
196 To understand replacement strings see {}.
197
       {n} Argument from input source n or the n'th argument. This
           positional replacement string will be replaced by the input from
           input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). If n is negative it refers to the
           n'th last argument.

           To understand replacement strings see {}.

       {n.}
           Argument from input source n or the n'th argument without
           extension. It is a combination of {n} and {.}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the extension
           removed.

           To understand positional replacement strings see {n}.

       {n/}
           Basename of argument from input source n or the n'th argument.
           It is a combination of {n} and {/}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the directory
           (if any) removed.

           To understand positional replacement strings see {n}.

       {n//}
           Dirname of argument from input source n or the n'th argument. It
           is a combination of {n} and {//}.

           This positional replacement string will be replaced by the dir
           of the input from input source n (when used with -a or ::::) or
           with the n'th argument (when used with -N). See dirname(1).

           To understand positional replacement strings see {n}.

       {n/.}
           Basename of argument from input source n or the n'th argument
           without extension. It is a combination of {n}, {/}, and {.}.

           This positional replacement string will be replaced by the input
           from input source n (when used with -a or ::::) or with the n'th
           argument (when used with -N). The input will have the directory
           (if any) and extension removed.

           To understand positional replacement strings see {n}.

       {=perl expression=}
           Replace with calculated perl expression. $_ will contain the
           same as {}. After evaluating perl expression $_ will be used as
           the value. It is recommended to only change $_ but you have full
           access to all of GNU parallel's internal functions and data
           structures. A few convenience functions and data structures have
           been made:

             Q(string)     shell quote a string

             pQ(string)    perl quote a string

             uq() (or uq)  do not quote current replacement string

             total_jobs()  number of jobs in total

             slot()        slot number of job

             seq()         sequence number of job

             @arg          the arguments

           Example:

             seq 10 | parallel echo {} + 1 is {= '$_++' =}
             parallel csh -c {= '$_="mkdir ".Q($_)' =} ::: '12" dir'
             seq 50 | parallel echo job {#} of {= '$_=total_jobs()' =}

           See also: --rpl --parens

       {=n perl expression=}
           Positional equivalent to {=perl expression=}. To understand
           positional replacement strings see {n}.

           See also: {=perl expression=} {n}.

       ::: arguments
           Use arguments from the command line as input source instead of
           stdin (standard input). Unlike other options for GNU parallel
           ::: is placed after the command and before the arguments.

           The following are equivalent:

             (echo file1; echo file2) | parallel gzip
             parallel gzip ::: file1 file2
             parallel gzip {} ::: file1 file2
             parallel --arg-sep ,, gzip {} ,, file1 file2
             parallel --arg-sep ,, gzip ,, file1 file2
             parallel ::: "gzip file1" "gzip file2"

           To avoid treating ::: as special use --arg-sep to set the
           argument separator to something else. See also --arg-sep.

           If multiple ::: are given, each group will be treated as an
           input source, and all combinations of input sources will be
           generated. E.g. ::: 1 2 ::: a b c will result in the
           combinations (1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful
           for replacing nested for-loops.

           ::: and :::: can be mixed. So these are equivalent:

             parallel echo {1} {2} {3} ::: 6 7 ::: 4 5 ::: 1 2 3
             parallel echo {1} {2} {3} :::: <(seq 6 7) <(seq 4 5) \
               :::: <(seq 1 3)
             parallel -a <(seq 6 7) echo {1} {2} {3} :::: <(seq 4 5) \
               :::: <(seq 1 3)
             parallel -a <(seq 6 7) -a <(seq 4 5) echo {1} {2} {3} \
               ::: 1 2 3
             seq 6 7 | parallel -a - -a <(seq 4 5) echo {1} {2} {3} \
               ::: 1 2 3
             seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \
               ::: 1 2 3

       :::+ arguments
           Like ::: but linked like --link to the previous input source.

           Contrary to --link, values do not wrap: The shortest input
           source determines the length.

           Example:

             parallel echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22

       :::: argfiles
           Another way to write -a argfile1 -a argfile2 ...

           ::: and :::: can be mixed.

           See -a, ::: and --link.

       ::::+ argfiles
           Like :::: but linked like --link to the previous input source.

           Contrary to --link, values do not wrap: The shortest input
           source determines the length.

       --null
       -0  Use NUL as delimiter. Normally input lines will end in \n
           (newline). If they end in \0 (NUL), then use this option. It is
           useful for processing arguments that may contain \n (newline).

       --arg-file input-file
       -a input-file
           Use input-file as input source. If you use this option, stdin
           (standard input) is given to the first process run. Otherwise,
           stdin (standard input) is redirected from /dev/null.

           If multiple -a are given, each input-file will be treated as an
           input source, and all combinations of input sources will be
           generated. E.g. the file foo contains 1 2, the file bar contains
           a b c. -a foo -a bar will result in the combinations (1,a) (1,b)
           (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested
           for-loops.

           See also --link and {n}.

       --arg-file-sep sep-str
           Use sep-str instead of :::: as separator string between command
           and argument files. Useful if :::: is used for something else by
           the command.

           See also: ::::.

       --arg-sep sep-str
           Use sep-str instead of ::: as separator string. Useful if ::: is
           used for something else by the command.

           Also useful if your command uses ::: but you still want to read
           arguments from stdin (standard input): Simply change --arg-sep
           to a string that is not in the command line.

           See also: :::.

       --bar
           Show progress as a progress bar. In the bar is shown: % of jobs
           completed, estimated seconds left, and number of jobs started.

           It is compatible with zenity:

             seq 1000 | parallel -j30 --bar '(echo {};sleep 0.1)' \
               2> >(zenity --progress --auto-kill) | wc

       --basefile file
       --bf file
           file will be transferred to each sshlogin before a job is
           started. It will be removed if --cleanup is active. The file may
           be a script to run or some common base data needed for the job.
           Multiple --bf can be specified to transfer more basefiles. The
           file will be transferred the same way as --transferfile.

       --basenamereplace replace-str
       --bnr replace-str
           Use the replacement string replace-str instead of {/} for
           basename of input line.

       --basenameextensionreplace replace-str
       --bner replace-str
           Use the replacement string replace-str instead of {/.} for
           basename of input line without extension.

       --bin binexpr (beta testing)
           Use binexpr as binning key and bin input to the jobs.

           binexpr is [column number|column name] [perlexpression] e.g. 3,
           Address, 3 $_%=100, Address s/\D//g.

           Each input line is split using --colsep. The value of the column
           is put into $_, the perl expression is executed, and the
           resulting value is the job slot that will be given the line. If
           the value is bigger than the number of jobslots the value will
           be modulo number of jobslots.

           This is similar to --shard but the hashing algorithm is a simple
           modulo, which makes it predictable which jobslot will receive
           which value.

           The performance is in the order of 100K rows per second. Faster
           if the bincol is small (<10), slower if it is big (>100).

           --bin requires --pipe and a fixed numeric value for --jobs.

           See also --shard, --group-by, --roundrobin.

       --bg
           Run command in background, thus GNU parallel will not wait for
           completion of the command before exiting. This is the default if
           --semaphore is set.

           See also: --fg, man sem.

           Implies --semaphore.

       --bibtex
       --citation
           Print the citation notice and BibTeX entry for GNU parallel,
           silence citation notice for all future runs, and exit. It will
           not run any commands.

           If it is impossible for you to run --citation you can instead
           use --will-cite, which will run commands, but which will only
           silence the citation notice for this single run.

           If you use --will-cite in scripts to be run by others you are
           making it harder for others to see the citation notice. The
           development of GNU parallel is indirectly financed through
           citations, so if your users do not know they should cite then
           you are making it harder to finance development. However, if you
           pay 10000 EUR, you have done your part to finance future
           development and should feel free to use --will-cite in scripts.

           If you do not want to help financing future development by
           letting other users see the citation notice or by paying, then
           please use another tool instead of GNU parallel. You can find
           some of the alternatives in man parallel_alternatives.

       --block size
       --block-size size
           Size of block in bytes to read at a time. The size can be
           postfixed with K, M, G, T, P, E, k, m, g, t, p, or e which would
           multiply the size with 1024, 1048576, 1073741824, 1099511627776,
           1125899906842624, 1152921504606846976, 1000, 1000000,
           1000000000, 1000000000000, 1000000000000000, or
           1000000000000000000 respectively.

           GNU parallel tries to meet the block size but can be off by the
           length of one record. For performance reasons size should be
           bigger than two records. GNU parallel will warn you and
           automatically increase the size if you choose a size that is too
           small.

           If you use -N, --block-size should be bigger than N+1 records.

           size defaults to 1M.

           When using --pipepart a negative block size is not interpreted
           as a blocksize but as the number of blocks each jobslot should
           have. So this will run 10*5 = 50 jobs in total:

             parallel --pipepart -a myfile --block -10 -j5 wc

           This is an efficient alternative to --roundrobin because data is
           never read by GNU parallel, but you can still have very few
           jobslots process a large amount of data.

           See --pipe and --pipepart for use of this.

       --blocktimeout duration
       --bt duration
           Time out for reading block when using --pipe. If it takes longer
           than duration to read a full block, use the partial block read
           so far.

           duration must be in whole seconds, but can be expressed as
           floats postfixed with s, m, h, or d which would multiply the
           float by 1, 60, 3600, or 86400. Thus these are equivalent:
           --blocktimeout 100000 and --blocktimeout 1d3.5h16.6m4s.

       --cat
           Create a temporary file with content. Normally --pipe/--pipepart
           will give data to the program on stdin (standard input). With
           --cat GNU parallel will create a temporary file with the name in
           {}, so you can do: parallel --pipe --cat wc {}.

           Implies --pipe unless --pipepart is used.

           See also --fifo.

       --cleanup
           Remove transferred files. --cleanup will remove the transferred
           files on the remote computer after processing is done.

             find log -name '*gz' | parallel \
               --sshlogin server.example.com --transferfile {} \
               --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

           With --transferfile {} the file transferred to the remote
           computer will be removed on the remote computer. Directories
           created will not be removed - even if they are empty.

           With --return the file transferred from the remote computer will
           be removed on the remote computer. Directories created will not
           be removed - even if they are empty.

           --cleanup is ignored when not used with --transferfile or
           --return.

       --colsep regexp
       -C regexp
           Column separator. The input will be treated as a table with
           regexp separating the columns. The n'th column can be accessed
           using {n} or {n.}. E.g. {3} is the 3rd column.

           If there are more input sources, each input source will be
           separated, but the columns from each input source will be linked
           (see --link).

             parallel --colsep '-' echo {4} {3} {2} {1} \
               ::: A-B C-D ::: e-f g-h

           --colsep implies --trim rl, which can be overridden with --trim
           n.

           regexp is a Perl Regular Expression:
           http://perldoc.perl.org/perlre.html

       --compress
           Compress temporary files. If the output is big and very
           compressible this will take up less disk space in $TMPDIR and
           possibly be faster due to less disk I/O.

           GNU parallel will try pzstd, lbzip2, pbzip2, zstd, pigz, lz4,
           lzop, plzip, lzip, lrz, gzip, pxz, lzma, bzip2, xz, clzip, in
           that order, and use the first available.

       --compress-program prg
       --decompress-program prg
           Use prg for (de)compressing temporary files. It is assumed that
           prg -dc will decompress stdin (standard input) to stdout
           (standard output) unless --decompress-program is given.

       --csv
           Treat input as CSV-format. --colsep sets the field delimiter. It
           works very much like --colsep except it deals correctly with
           quoting:

             echo '"1 big, 2 small","2""x4"" plank",12.34' |
               parallel --csv echo {1} of {2} at {3}

           Even quoted newlines are parsed correctly:

             (echo '"Start of field 1 with newline'
              echo 'Line 2 in field 1";value 2') |
               parallel --csv --colsep ';' echo Field 1: {1} Field 2: {2}

           When used with --pipe only pass full CSV-records.

       --delay mytime
           Delay starting next job by mytime. GNU parallel will pause
           mytime after starting each job. mytime is normally in seconds,
           but can be floats postfixed with s, m, h, or d which would
           multiply the float by 1, 60, 3600, or 86400. Thus these are
           equivalent: --delay 100000 and --delay 1d3.5h16.6m4s.

       --delimiter delim
       -d delim
           Input items are terminated by delim. Quotes and backslash are
           not special; every character in the input is taken literally.
           Disables the end-of-file string, which is treated like any other
           argument. The specified delimiter may be characters, C-style
           character escapes such as \n, or octal or hexadecimal escape
           codes. Octal and hexadecimal escape codes are understood as for
           the printf command. Multibyte characters are not supported.

       --dirnamereplace replace-str
       --dnr replace-str
           Use the replacement string replace-str instead of {//} for
           dirname of input line.

       --dry-run
           Print the job to run on stdout (standard output), but do not run
           the job. Use -v -v to include the wrapping that GNU parallel
           generates (for remote jobs, --tmux, --nice, --pipe, --pipepart,
           --fifo and --cat). Do not count on this literally, though, as
           the job may be scheduled on another computer or the local
           computer if : is in the list.

       -E eof-str
           Set the end of file string to eof-str. If the end of file string
           occurs as a line of input, the rest of the input is not read. If
           neither -E nor -e is used, no end of file string is used.

       --eof[=eof-str]
       -e[eof-str]
           This option is a synonym for the -E option. Use -E instead,
           because it is POSIX compliant for xargs while this option is
           not. If eof-str is omitted, there is no end of file string. If
           neither -E nor -e is used, no end of file string is used.

       --embed
           Embed GNU parallel in a shell script. If you need to distribute
           your script to someone who does not want to install GNU parallel
           you can embed GNU parallel in your own shell script:

             parallel --embed > new_script

           After which you add your code at the end of new_script. This is
           tested on ash, bash, dash, ksh, sh, and zsh.

       --env var
           Copy environment variable var. This will copy var to the
           environment that the command is run in. This is especially
           useful for remote execution.

           In Bash var can also be a Bash function - just remember to
           export -f the function, see command.

           The variable '_' is special. It will copy all exported
           environment variables except for the ones mentioned in
           ~/.parallel/ignored_vars.

           To copy the full environment (both exported and not exported
           variables, arrays, and functions) use env_parallel.

           See also: --record-env, --session.

       --eta
           Show the estimated number of seconds before finishing. This
           forces GNU parallel to read all jobs before starting to find the
           number of jobs. GNU parallel normally only reads the next job to
           run.

           The estimate is based on the runtime of finished jobs, so the
           first estimate will only be shown when the first job has
           finished.

           Implies --progress.

           See also: --bar, --progress.

       --fg
           Run command in foreground.

           With --tmux and --tmuxpane GNU parallel will start tmux in the
           foreground.

           With --semaphore GNU parallel will run the command in the
           foreground (opposite --bg), and wait for completion of the
           command before exiting.

           See also --bg, man sem.

       --fifo
           Create a temporary fifo with content. Normally --pipe and
           --pipepart will give data to the program on stdin (standard
           input). With --fifo GNU parallel will create a temporary fifo
           with the name in {}, so you can do: parallel --pipe --fifo wc
           {}.

           Beware: If data is not read from the fifo, the job will block
           forever.

           Implies --pipe unless --pipepart is used.

           See also --cat.

       --filter-hosts
           Remove down hosts. For each remote host: check that login
           through ssh works. If not: do not use this host.

           For performance reasons, this check is performed only at the
           start and every time --sshloginfile is changed. If a host goes
           down after the first check, it will go undetected until
           --sshloginfile is changed; --retries can be used to mitigate
           this.

           Currently you can not put --filter-hosts in a profile,
           $PARALLEL, /etc/parallel/config or similar. This is because GNU
           parallel uses GNU parallel to compute this, so you will get an
           infinite loop. This will likely be fixed in a later release.

       --gnu
           Behave like GNU parallel. This option historically took
           precedence over --tollef. The --tollef option is now retired,
           and therefore may not be used. --gnu is kept for compatibility.

       --group
           Group output. Output from each job is grouped together and is
           only printed when the command is finished. Stdout (standard
           output) first followed by stderr (standard error).

           This takes in the order of 0.5ms per job and depends on the
           speed of your disk for larger output. It can be disabled with
           -u, but this means output from different commands can get mixed.

           --group is the default. Can be reversed with -u.

           See also: --line-buffer --ungroup

       --group-by val
           Group input by value. Combined with --pipe/--pipepart --group-by
           groups lines with the same value into a record.

           The value can be computed from the full line or from a single
           column.

           val can be:

             column number  Use the value in the column numbered.

             column name    Treat the first line as a header and use the
                            value in the column named.

                            (Not supported with --pipepart).

             perl expression
                            Run the perl expression and use $_ as the
                            value.

             column number perl expression
                            Put the value of the column in $_, run the
                            perl expression, and use $_ as the value.

             column name perl expression
                            Put the value of the column in $_, run the
                            perl expression, and use $_ as the value.

                            (Not supported with --pipepart).

           Example:

             UserID, Consumption
             123, 1
             123, 2
             12-3, 1
             221, 3
             221, 1
             2/21, 5

           If you want to group 123, 12-3, 221, and 2/21 into 4 records
           and pass one record at a time to wc:

             tail -n +2 table.csv | \
               parallel --pipe --colsep , --group-by 1 -kN1 wc

           Make GNU parallel treat the first line as a header:

             cat table.csv | \
               parallel --pipe --colsep , --header : --group-by 1 -kN1 wc

           Address column by column name:

             cat table.csv | \
               parallel --pipe --colsep , --header : --group-by UserID -kN1 wc

           If 12-3 and 123 are really the same UserID, remove non-digits
           in UserID when grouping:

             cat table.csv | parallel --pipe --colsep , --header : \
               --group-by 'UserID s/\D//g' -kN1 wc

           See also --shard, --roundrobin.

       --help
       -h  Print a summary of the options to GNU parallel and exit.

       --halt-on-error val
       --halt val
           When should GNU parallel terminate? In some situations it makes
           no sense to run all jobs. GNU parallel should simply give up as
           soon as a condition is met.

           val defaults to never, which runs all jobs no matter what.

           val can also take on the form of when,why.

           when can be 'now' which means kill all running jobs and halt
           immediately, or it can be 'soon' which means wait for all
           running jobs to complete, but start no new jobs.

           why can be 'fail=X', 'fail=Y%', 'success=X', 'success=Y%',
           'done=X', or 'done=Y%' where X is the number of jobs that have
           to fail, succeed, or be done before halting, and Y is the
           percentage of jobs that have to fail, succeed, or be done
           before halting.

           Example:

             --halt now,fail=1     exit when the first job fails. Kill
                                   running jobs.

             --halt soon,fail=3    exit when 3 jobs fail, but wait for
                                   running jobs to complete.

             --halt soon,fail=3%   exit when 3% of the jobs have failed,
                                   but wait for running jobs to complete.

             --halt now,success=1  exit when a job succeeds. Kill running
                                   jobs.

             --halt soon,success=3 exit when 3 jobs succeed, but wait for
                                   running jobs to complete.

             --halt now,success=3% exit when 3% of the jobs have
                                   succeeded. Kill running jobs.

             --halt now,done=1     exit when one of the jobs finishes.
                                   Kill running jobs.

             --halt soon,done=3    exit when 3 jobs finish, but wait for
                                   running jobs to complete.

             --halt now,done=3%    exit when 3% of the jobs have finished.
                                   Kill running jobs.

           For backwards compatibility these also work:

             0      never

             1      soon,fail=1

             2      now,fail=1

             -1     soon,success=1

             -2     now,success=1

             1-99%  soon,fail=1-99%

       --header regexp
           Use regexp as header. For normal usage the matched header
           (typically the first line: --header '.*\n') will be split using
           --colsep (which will default to '\t') and column names can be
           used as replacement variables: {column name}, {column name/},
           {column name//}, {column name/.}, {column name.}, {=column name
           perl expression =}, ..

           For --pipe the matched header will be prepended to each output.

           --header : is an alias for --header '.*\n'.

           If regexp is a number, it is a fixed number of lines.

       --hostgroups
       --hgrp
           Enable hostgroups on arguments. If an argument contains '@' the
           string after '@' will be removed and treated as a list of
           hostgroups on which this job is allowed to run. If there is no
           --sshlogin with a corresponding group, the job will run on any
           hostgroup.

           Example:

             parallel --hostgroups \
               --sshlogin @grp1/myserver1 -S @grp1+grp2/myserver2 \
               --sshlogin @grp3/myserver3 \
               echo ::: my_grp1_arg@grp1 arg_for_grp2@grp2 third@grp1+grp3

           my_grp1_arg may be run on either myserver1 or myserver2, third
           may be run on either myserver1 or myserver3, but arg_for_grp2
           will only be run on myserver2.

           See also: --sshlogin.

880 -I replace-str
881 Use the replacement string replace-str instead of {}.
882
883 --replace[=replace-str]
884 -i[replace-str]
885 This option is a synonym for -Ireplace-str if replace-str is
886 specified, and for -I {} otherwise. This option is deprecated; use
887 -I instead.
888
889 --joblog logfile
890 Logfile for executed jobs. Save a list of the executed jobs to
891 logfile in the following TAB separated format: sequence number,
892 sshlogin, start time as seconds since epoch, run time in seconds,
893 bytes in files transferred, bytes in files returned, exit status,
894 signal, and command run.
895
896 For --pipe, bytes transferred and bytes returned are the number
897 of bytes of input and output, respectively.
898
899 If logfile is prepended with '+' log lines will be appended to the
900 logfile.
901
902 To convert the times into strict ISO-8601 format, do:
903
904 cat logfile | perl -a -F"\t" -ne \
905 'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'
906
907 If the host names are long, you can use column -t to pretty print it:
908
909 cat joblog | column -t
910
911 See also --resume --resume-failed.
912
913 --jobs N
914 -j N
915 --max-procs N
916 -P N
917 Number of jobslots on each machine. Run up to N jobs in parallel.
918 0 means as many as possible. Default is 100% which will run one job
919 per CPU on each machine.
920
921 If --semaphore is set, the default is 1 thus making a mutex.
922
923 --jobs +N
924 -j +N
925 --max-procs +N
926 -P +N
927 Add N to the number of CPUs. Run this many jobs in parallel. See
928 also --use-cores-instead-of-threads and
929 --use-sockets-instead-of-threads.
930
931 --jobs -N
932 -j -N
933 --max-procs -N
934 -P -N
935 Subtract N from the number of CPUs. Run this many jobs in
936 parallel. If the evaluated number is less than 1 then 1 will be
937 used. See also --use-cores-instead-of-threads and
938 --use-sockets-instead-of-threads.
939
940 --jobs N%
941 -j N%
942 --max-procs N%
943 -P N%
944 Multiply N% with the number of CPUs. Run this many jobs in
945 parallel. See also --use-cores-instead-of-threads and
946 --use-sockets-instead-of-threads.
947
948 --jobs procfile
949 -j procfile
950 --max-procs procfile
951 -P procfile
952 Read parameter from file. Use the content of procfile as parameter
953 for -j. E.g. procfile could contain the string 100% or +2 or 10. If
954 procfile is changed when a job completes, procfile is read again
955 and the new number of jobs is computed. If the number is lower than
956 before, running jobs will be allowed to finish but new jobs will
957 not be started until the wanted number of jobs has been reached.
958 This makes it possible to change the number of simultaneous running
959 jobs while GNU parallel is running.
960
961 --keep-order
962 -k Keep the order of output the same as the order of input. Normally the
963 output of a job will be printed as soon as the job completes. Try
964 this to see the difference:
965
966 parallel -j4 sleep {}\; echo {} ::: 2 1 4 3
967 parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
968
969 If used with --onall or --nonall the output will be grouped by
970 sshlogin in sorted order.
971
972 If used with --pipe --roundrobin and the same input, the jobslots
973 will get the same blocks in the same order in every run.
974
975 -k only affects the order in which the output is printed - not the
976 order in which jobs are run.
977
978 -L recsize
979 When used with --pipe: Read records of recsize.
980
981 When used otherwise: Use at most recsize nonblank input lines per
982 command line. Trailing blanks cause an input line to be logically
983 continued on the next input line.
984
985 -L 0 means read one line, but insert 0 arguments on the command
986 line.
987
988 Implies -X unless -m, --xargs, or --pipe is set.
989
990 --max-lines[=recsize]
991 -l[recsize]
992 When used with --pipe: Read records of recsize lines.
993
994 When used otherwise: Synonym for the -L option. Unlike -L, the
995 recsize argument is optional. If recsize is not specified, it
996 defaults to one. The -l option is deprecated since the POSIX
997 standard specifies -L instead.
998
999 -l 0 is an alias for -l 1.
1000
1001 Implies -X unless -m, --xargs, or --pipe is set.
1002
1003 --limit "command args"
1004 Dynamic job limit. Before starting a new job run command with args.
1005 The exit value of command determines what GNU parallel will do:
1006
1007 0 Below limit. Start another job.
1008
1009 1 Over limit. Start no jobs.
1010
1011 2 Way over limit. Kill the youngest job.
1012
1013 You can use any shell command. There are 3 predefined commands:
1014
1015 "io n" Limit for I/O. The amount of disk I/O will be computed as
1016 a value 0-100, where 0 is no I/O and 100 is at least one
1017 disk is 100% saturated.
1018
1019 "load n" Similar to --load.
1020
1021 "mem n" Similar to --memfree.
1022
1023 --line-buffer
1024 --lb
1025 Buffer output on a line basis. --group will keep the output together
1026 for a whole job. --ungroup allows output to mix, with half a line
1027 coming from one job and half a line coming from another job.
1028 --line-buffer fits between these two: GNU parallel will print a
1029 full line, but will allow for mixing lines of different jobs.
1030
1031 --line-buffer takes more CPU power than both --group and --ungroup,
1032 but can be much faster than --group if the CPU is not the limiting
1033 factor.
1034
1035 Normally --line-buffer does not buffer on disk, and can thus
1036 process an infinite amount of data, but it will buffer on disk when
1037 combined with: --keep-order, --results, --compress, and --files.
1038 This will make it as slow as --group and will limit output to the
1039 available disk space.
1040
1041 With --keep-order --line-buffer will output lines from the first
1042 job continuously while it is running, then lines from the second
1043 job while that is running. It will buffer full lines, but jobs will
1044 not mix. Compare:
1045
1046 parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
1047 parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1048 parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1049
1050 See also: --group --ungroup
1051
1052 --xapply
1053 --link
1054 Link input sources. Read multiple input sources like xapply. If
1055 multiple input sources are given, one argument will be read from
1056 each of the input sources. The arguments can be accessed in the
1057 command as {1} .. {n}, so {1} will be a line from the first input
1058 source, and {6} will refer to the line with the same line number
1059 from the 6th input source.
1060
1061 Compare these two:
1062
1063 parallel echo {1} {2} ::: 1 2 3 ::: a b c
1064 parallel --link echo {1} {2} ::: 1 2 3 ::: a b c
1065
1066 Arguments will be recycled if one input source has more arguments
1067 than the others:
1068
1069 parallel --link echo {1} {2} {3} \
1070 ::: 1 2 ::: I II III ::: a b c d e f g
1071
1072 See also --header, :::+, ::::+.
1073
1074 --load max-load
1075 Do not start new jobs on a given computer unless the number of
1076 running processes on the computer is less than max-load. max-load
1077 uses the same syntax as --jobs, so 100% for one per CPU is a valid
1078 setting. The only difference is 0, which is interpreted as 0.01.
1079
1080 --controlmaster
1081 -M Use ssh's ControlMaster to make ssh connections faster. Useful if
1082 jobs run remotely and are very fast to run. This is disabled for
1083 sshlogins that specify their own ssh command.
1084
1085 -m Multiple arguments. Insert as many arguments as the command line
1086 length permits. If multiple jobs are being run in parallel:
1087 distribute the arguments evenly among the jobs. Use -j1 or --xargs
1088 to avoid this.
1089
1090 If {} is not used the arguments will be appended to the line. If
1091 {} is used multiple times each {} will be replaced with all the
1092 arguments.
1093
1094 Support for -m with --sshlogin is limited and may fail.
1095
1096 See also -X for context replace. If in doubt use -X as that will
1097 most likely do what is needed.
1098
1099 --memfree size
1100 Minimum memory free when starting another job. The size can be
1101 postfixed with K, M, G, T, P, k, m, g, t, or p which would multiply
1102 the size with 1024, 1048576, 1073741824, 1099511627776,
1103 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or
1104 1000000000000000, respectively.
1105
1106 If the jobs take up very different amounts of RAM, GNU parallel will
1107 only start as many as there is memory for. If less than size bytes
1108 are free, no more jobs will be started. If less than 50% of size
1109 bytes are free, the youngest job will be killed, and put back on the
1110 queue to be run later.
1111
1112 --retries must be set to determine how many times GNU parallel
1113 should retry a given job.
1114
1115 --minversion version
1116 Print the version of GNU parallel and exit. If the current version of
1117 GNU parallel is less than version the exit code is 255. Otherwise
1118 it is 0.
1119
1120 This is useful for scripts that depend on features only available
1121 from a certain version of GNU parallel.
1122
1123 --max-args=max-args
1124 -n max-args
1125 Use at most max-args arguments per command line. Fewer than max-
1126 args arguments will be used if the size (see the -s option) is
1127 exceeded, unless the -x option is given, in which case GNU parallel
1128 will exit.
1129
1130 -n 0 means read one argument, but insert 0 arguments on the command
1131 line.
1132
1133 Implies -X unless -m is set.
1134
1135 --max-replace-args=max-args
1136 -N max-args
1137 Use at most max-args arguments per command line. Like -n but also
1138 makes replacement strings {1} .. {max-args} that represent
1139 arguments 1 .. max-args. With too few arguments, the remaining {n} will be empty.
1140
1141 -N 0 means read one argument, but insert 0 arguments on the command
1142 line.
1143
1144 This will set the owner of the homedir to the user:
1145
1146 tr ':' '\n' < /etc/passwd | parallel -N7 chown {1} {6}
1147
1148 Implies -X unless -m or --pipe is set.
1149
1150 When used with --pipe -N is the number of records to read. This is
1151 somewhat slower than --block.
1152
1153 --nonall
1154 --onall with no arguments. Run the command on all computers given
1155 with --sshlogin but take no arguments. GNU parallel will log into
1156 --jobs number of computers in parallel and run the job on each
1157 computer. -j adjusts how many computers to log into in parallel.
1158
1159 This is useful for running the same command (e.g. uptime) on a list
1160 of servers.
1161
1162 --onall
1163 Run all the jobs on all computers given with --sshlogin. GNU
1164 parallel will log into --jobs number of computers in parallel and
1165 run one job at a time on the computer. The order of the jobs will
1166 not be changed, but some computers may finish before others.
1167
1168 When using --group the output will be grouped by each server, so
1169 all the output from one server will be grouped together.
1170
1171 --joblog will contain an entry for each job on each server, so
1172 there will be several jobs with sequence number 1.
1173
1174 --output-as-files
1175 --outputasfiles
1176 --files
1177 Instead of printing the output to stdout (standard output) the
1178 output of each job is saved in a file and the filename is then
1179 printed.
1180
1181 See also: --results
1182
1183 --pipe
1184 --spreadstdin
1185 Spread input to jobs on stdin (standard input). Read a block of
1186 data from stdin (standard input) and give one block of data as
1187 input to one job.
1188
1189 The block size is determined by --block. The strings --recstart and
1190 --recend tell GNU parallel how a record starts and/or ends. The
1191 block read will have the final partial record removed before the
1192 block is passed on to the job. The partial record will be prepended
1193 to the next block.
1194
1195 If --recstart is given this will be used to split at record start.
1196
1197 If --recend is given this will be used to split at record end.
1198
1199 If both --recstart and --recend are given both will have to match
1200 to find a split position.
1201
1202 If neither --recstart nor --recend are given --recend defaults to
1203 '\n'. To have no record separator use --recend "".
1204
1205 --files is often used with --pipe.
1206
1207 --pipe maxes out at around 1 GB/s input, and 100 MB/s output. If
1208 performance is important use --pipepart.
1209
1210 See also: --recstart, --recend, --fifo, --cat, --pipepart, --files.
1211
1212 --pipepart
1213 Pipe parts of a physical file. --pipepart works similar to --pipe,
1214 but is much faster.
1215
1216 --pipepart has a few limitations:
1217
1218 · The file must be a normal file or a block device (technically it
1219 must be seekable) and must be given using -a or ::::. The file
1220 cannot be a pipe or a fifo as they are not seekable.
1221
1222 If using a block device with a lot of NUL bytes, remember to set
1223 --recend ''.
1224
1225 · Record counting (-N) and line counting (-L/-l) do not work.
1226
1227 --plain
1228 Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full
1229 control on the command line (used by GNU parallel internally when
1230 called with --sshlogin).
1231
1232 --plus
1233 Activate additional replacement strings: {+/} {+.} {+..} {+...}
1234 {..} {...} {/..} {/...} {##}. The idea being that '{+foo}' matches
1235 the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} =
1236 {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
1237 {+/}/{/...}.{+...}
1238
1239 {##} is the total number of jobs to be run. It is incompatible with
1240 -X/-m/--xargs.
1241
1242 {choose_k} is inspired by n choose k: Given a list of n elements,
1243 choose k. k is the number of input sources and n is the number of
1244 arguments in an input source. The content of the input sources
1245 must be the same and the arguments must be unique.
1246
1247 Shorthands for variables:
1248
1249 {slot} $PARALLEL_JOBSLOT (see {%})
1250 {sshlogin} $PARALLEL_SSHLOGIN
1251 {host} $PARALLEL_SSHHOST
1252
1253 The following dynamic replacement strings are also activated. They
1254 are inspired by bash's parameter expansion:
1255
1256 {:-str} str if the value is empty
1257 {:num} remove the first num characters
1258 {:num1:num2} characters from num1 to num2
1259 {#str} remove prefix str
1260 {%str} remove postfix str
1261 {/str1/str2} replace str1 with str2
1262 {^str} uppercase str if found at the start
1263 {^^str} uppercase str
1264 {,str} lowercase str if found at the start
1265 {,,str} lowercase str
1266
1267 --progress
1268 Show progress of computations. List the computers involved in the
1269 task with number of CPUs detected and the max number of jobs to
1270 run. After that show progress for each computer: number of running
1271 jobs, number of completed jobs, and percentage of all jobs done by
1272 this computer. The percentage will only be available after all jobs
1273 have been scheduled as GNU parallel only reads the next job when
1274 ready to schedule it - this is to avoid wasting time and memory by
1275 reading everything at startup.
1276
1277 By sending GNU parallel SIGUSR2 you can toggle turning on/off
1278 --progress on a running GNU parallel process.
1279
1280 See also --eta and --bar.
1281
1282 --max-line-length-allowed
1283 Print the maximal number of characters allowed on the command line
1284 and exit (used by GNU parallel itself to determine the line length
1285 on remote computers).
1286
1287 --number-of-cpus (obsolete)
1288 Print the number of physical CPU cores and exit.
1289
1290 --number-of-cores
1291 Print the number of physical CPU cores and exit (used by GNU
1292 parallel itself to determine the number of physical CPU cores on
1293 remote computers).
1294
1295 --number-of-sockets
1296 Print the number of filled CPU sockets and exit (used by GNU
1297 parallel itself to determine the number of filled CPU sockets on
1298 remote computers).
1299
1300 --number-of-threads
1301 Print the number of hyperthreaded CPU cores and exit (used by GNU
1302 parallel itself to determine the number of hyperthreaded CPU cores
1303 on remote computers).
1304
1305 --no-keep-order
1306 Overrides an earlier --keep-order (e.g. if set in
1307 ~/.parallel/config).
1308
1309 --nice niceness
1310 Run the command at this niceness.
1311
1312 By default GNU parallel will run jobs at the same nice level as GNU
1313 parallel is started - both on the local machine and remote servers,
1314 so you are unlikely to ever use this option.
1315
1316 Setting --nice will override this nice level. If the nice level is
1317 smaller than the current nice level, it will only affect remote
1318 jobs (e.g. if current level is 10 then --nice 5 will cause local
1319 jobs to be run at level 10, but remote jobs run at nice level 5).
1320
1321 --interactive
1322 -p Prompt the user about whether to run each command line and read a
1323 line from the terminal. Only run the command line if the response
1324 starts with 'y' or 'Y'. Implies -t.
1325
1326 --parens parensstring
1327 Define start and end parenthesis for {= perl expression =}. The
1328 left and the right parenthesis can be multiple characters and are
1329 assumed to be the same length. The default is {==} giving {= as the
1330 start parenthesis and =} as the end parenthesis.
1331
1332 Another useful setting is ,,,, which would make both parentheses
1333 ,,:
1334
1335 parallel --parens ,,,, echo foo is ,,s/I/O/g,, ::: FII
1336
1337 See also: --rpl {= perl expression =}
1338
1339 --profile profilename
1340 -J profilename
1341 Use profile profilename for options. This is useful if you want to
1342 have multiple profiles. You could have one profile for running jobs
1343 in parallel on the local computer and a different profile for
1344 running jobs on remote computers. See the section PROFILE FILES for
1345 examples.
1346
1347 profilename corresponds to the file ~/.parallel/profilename.
1348
1349 You can give multiple profiles by repeating --profile. If parts of
1350 the profiles conflict, the later ones will be used.
1351
1352 Default: config
1353
1354 --quote
1355 -q Quote command. If your command contains special characters that
1356 should not be interpreted by the shell (e.g. ; \ | *), use --quote
1357 to escape these. The command must be a simple command (see man
1358 bash) without redirections and without variable assignments.
1359
1360 See the section QUOTING. Most people will not need this. Quoting
1361 is disabled by default.
1362
1363 --no-run-if-empty
1364 -r If the stdin (standard input) only contains whitespace, do not run
1365 the command.
1366
1367 If used with --pipe this is slow.
1368
1369 --noswap
1370 Do not start new jobs on a given computer if there is both swap-in
1371 and swap-out activity.
1372
1373 The swap activity is only sampled every 10 seconds as the sampling
1374 takes 1 second to do.
1375
1376 Swap activity is computed as (swap-in)*(swap-out) which in practice
1377 is a good value: swapping out is not a problem, swapping in is not
1378 a problem, but both swapping in and out usually indicates a
1379 problem.
1380
1381 --memfree may give better results, so try using that first.
1382
1383 --record-env
1384 Record current environment variables in ~/.parallel/ignored_vars.
1385 This is useful before using --env _.
1386
1387 See also --env, --session.
1388
1389 --recstart startstring
1390 --recend endstring
1391 If --recstart is given startstring will be used to split at record
1392 start.
1393
1394 If --recend is given endstring will be used to split at record end.
1395
1396 If both --recstart and --recend are given the combined string
1397 endstringstartstring will have to match to find a split position.
1398 This is useful if either startstring or endstring match in the
1399 middle of a record.
1400
1401 If neither --recstart nor --recend are given then --recend defaults
1402 to '\n'. To have no record separator use --recend "".
1403
1404 --recstart and --recend are used with --pipe.
1405
1406 Use --regexp to interpret --recstart and --recend as regular
1407 expressions. This is slow, however.
1408
1409 --regexp
1410 Use --regexp to interpret --recstart and --recend as regular
1411 expressions. This is slow, however.
1412
1413 --remove-rec-sep
1414 --removerecsep
1415 --rrs
1416 Remove the text matched by --recstart and --recend before piping it
1417 to the command.
1418
1419 Only used with --pipe.
1420
1421 --results name
1422 --res name
1423 Save the output into files.
1424
1425 Simple string output dir
1426
1427 If name does not contain replacement strings and does not end in
1428 .csv/.tsv, the output will be stored in a directory tree rooted at
1429 name. Within this directory tree, each command will result in
1430 three files: name/<ARGS>/stdout, name/<ARGS>/stderr, and
1431 name/<ARGS>/seq, where <ARGS> is a sequence of directories
1432 representing the header of the input source (if using --header :)
1433 or the number of the input source and corresponding values.
1434
1435 E.g:
1436
1437 parallel --header : --results foo echo {a} {b} \
1438 ::: a I II ::: b III IIII
1439
1440 will generate the files:
1441
1442 foo/a/II/b/III/seq
1443 foo/a/II/b/III/stderr
1444 foo/a/II/b/III/stdout
1445 foo/a/II/b/IIII/seq
1446 foo/a/II/b/IIII/stderr
1447 foo/a/II/b/IIII/stdout
1448 foo/a/I/b/III/seq
1449 foo/a/I/b/III/stderr
1450 foo/a/I/b/III/stdout
1451 foo/a/I/b/IIII/seq
1452 foo/a/I/b/IIII/stderr
1453 foo/a/I/b/IIII/stdout
1454
1455 and
1456
1457 parallel --results foo echo {1} {2} ::: I II ::: III IIII
1458
1459 will generate the files:
1460
1461 foo/1/II/2/III/seq
1462 foo/1/II/2/III/stderr
1463 foo/1/II/2/III/stdout
1464 foo/1/II/2/IIII/seq
1465 foo/1/II/2/IIII/stderr
1466 foo/1/II/2/IIII/stdout
1467 foo/1/I/2/III/seq
1468 foo/1/I/2/III/stderr
1469 foo/1/I/2/III/stdout
1470 foo/1/I/2/IIII/seq
1471 foo/1/I/2/IIII/stderr
1472 foo/1/I/2/IIII/stdout
1473
1474 CSV file output
1475
1476 If name ends in .csv/.tsv the output will be a CSV-file named name.
1477
1478 .csv gives a comma separated value file. .tsv gives a TAB separated
1479 value file.
1480
1481 -.csv/-.tsv are special: the output will be sent to stdout (standard
1482 output).
1483
1484 Replacement string output file
1485
1486 If name contains a replacement string and the replaced result does
1487 not end in /, then the standard output will be stored in a file
1488 named by this result. Standard error will be stored in the same
1489 file name with '.err' added, and the sequence number will be stored
1490 in the same file name with '.seq' added.
1491
1492 E.g.
1493
1494 parallel --results my_{} echo ::: foo bar baz
1495
1496 will generate the files:
1497
1498 my_bar
1499 my_bar.err
1500 my_bar.seq
1501 my_baz
1502 my_baz.err
1503 my_baz.seq
1504 my_foo
1505 my_foo.err
1506 my_foo.seq
1507
1508 Replacement string output dir
1509
1510 If name contains a replacement string and the replaced result ends
1511 in /, then output files will be stored in the resulting dir.
1512
1513 E.g.
1514
1515 parallel --results my_{}/ echo ::: foo bar baz
1516
1517 will generate the files:
1518
1519 my_bar/seq
1520 my_bar/stderr
1521 my_bar/stdout
1522 my_baz/seq
1523 my_baz/stderr
1524 my_baz/stdout
1525 my_foo/seq
1526 my_foo/stderr
1527 my_foo/stdout
1528
1529 See also --files, --tag, --header, --joblog.
1530
1531 --resume
1532 Resumes from the last unfinished job. By reading --joblog or the
1533 --results dir GNU parallel will figure out the last unfinished job
1534 and continue from there. As GNU parallel only looks at the sequence
1535 numbers in --joblog, the input, the command, and --joblog all
1536 have to remain unchanged; otherwise GNU parallel may run wrong
1537 commands.
1538
1539 See also --joblog, --results, --resume-failed, --retries.
1540
1541 --resume-failed
1542 Retry all failed and resume from the last unfinished job. By
1543 reading --joblog GNU parallel will figure out the failed jobs and
1544 run those again. After that it will resume the last unfinished job and
1545 continue from there. As GNU parallel only looks at the sequence
1546 numbers in --joblog, the input, the command, and --joblog all
1547 have to remain unchanged; otherwise GNU parallel may run wrong
1548 commands.
1549
1550 See also --joblog, --resume, --retry-failed, --retries.
1551
1552 --retry-failed
1553 Retry all failed jobs in joblog. By reading --joblog GNU parallel
1554 will figure out the failed jobs and run those again.
1555
1556 --retry-failed ignores the command and arguments on the command
1557 line: It only looks at the joblog.
1558
1559 Differences between --resume, --resume-failed, --retry-failed
1560
1561 In this example exit {= $_%=2 =} will cause every other job to
1562 fail.
1563
1564 timeout -k 1 4 parallel --joblog log -j10 \
1565 'sleep {}; exit {= $_%=2 =}' ::: {10..1}
1566
1567 4 jobs completed. 2 failed:
1568
1569 Seq [...] Exitval Signal Command
1570 10 [...] 1 0 sleep 1; exit 1
1571 9 [...] 0 0 sleep 2; exit 0
1572 8 [...] 1 0 sleep 3; exit 1
1573 7 [...] 0 0 sleep 4; exit 0
1574
1575 --resume does not care about the Exitval, but only looks at Seq. If
1576 the Seq is run, it will not be run again. So if needed, you can
1577 change the command for the seqs not run yet:
1578
1579 parallel --resume --joblog log -j10 \
1580 'sleep .{}; exit {= $_%=2 =}' ::: {10..1}
1581
1582 Seq [...] Exitval Signal Command
1583 [... as above ...]
1584 1 [...] 0 0 sleep .10; exit 0
1585 6 [...] 1 0 sleep .5; exit 1
1586 5 [...] 0 0 sleep .6; exit 0
1587 4 [...] 1 0 sleep .7; exit 1
1588 3 [...] 0 0 sleep .8; exit 0
1589 2 [...] 1 0 sleep .9; exit 1
1590
1591 --resume-failed cares about the Exitval, but also only looks at Seq
1592 to figure out which commands to run. Again this means you can
1593 change the command, but not the arguments. It will run the failed
1594 seqs and the seqs not yet run:
1595
1596 parallel --resume-failed --joblog log -j10 \
1597 'echo {};sleep .{}; exit {= $_%=3 =}' ::: {10..1}
1598
1599 Seq [...] Exitval Signal Command
1600 [... as above ...]
1601 10 [...] 1 0 echo 1;sleep .1; exit 1
1602 8 [...] 0 0 echo 3;sleep .3; exit 0
1603 6 [...] 2 0 echo 5;sleep .5; exit 2
1604 4 [...] 1 0 echo 7;sleep .7; exit 1
1605 2 [...] 0 0 echo 9;sleep .9; exit 0
1606
1607 --retry-failed cares about the Exitval, but takes the command from
1608 the joblog. It ignores any arguments or commands given on the
1609 command line:
1610
1611 parallel --retry-failed --joblog log -j10 this part is ignored
1612
1613 Seq [...] Exitval Signal Command
1614 [... as above ...]
1615 10 [...] 1 0 echo 1;sleep .1; exit 1
1616 6 [...] 2 0 echo 5;sleep .5; exit 2
1617 4 [...] 1 0 echo 7;sleep .7; exit 1
1618
1619 See also --joblog, --resume, --resume-failed, --retries.
1620
1621 --retries n
1622 If a job fails, retry it on another computer on which it has not
1623 failed. Do this n times. If there are fewer than n computers in
1624 --sshlogin GNU parallel will re-use all the computers. This is
1625 useful if some jobs fail for no apparent reason (such as network
1626 failure).
1627
1628 --return filename
1629 Transfer files from remote computers. --return is used with
1630 --sshlogin when the arguments are files on the remote computers.
1631 When processing is done the file filename will be transferred from
1632 the remote computer using rsync and will be put relative to the
1633 default login dir. E.g.
1634
1635 echo foo/bar.txt | parallel --return {.}.out \
1636 --sshlogin server.example.com touch {.}.out
1637
1638 This will transfer the file $HOME/foo/bar.out from the computer
1639 server.example.com to the file foo/bar.out after running touch
1640 foo/bar.out on server.example.com.
1641
1642 parallel -S server --trc out/./{}.out touch {}.out ::: in/file
1643
1644 This will transfer the file in/file.out from the computer
1645 server.example.com to the files out/in/file.out after running touch
1646 in/file.out on server.
1647
1648 echo /tmp/foo/bar.txt | parallel --return {.}.out \
1649 --sshlogin server.example.com touch {.}.out
1650
1651 This will transfer the file /tmp/foo/bar.out from the computer
1652 server.example.com to the file /tmp/foo/bar.out after running touch
1653 /tmp/foo/bar.out on server.example.com.
1654
1655 Multiple files can be transferred by repeating the option multiple
1656 times:
1657
1658 echo /tmp/foo/bar.txt | parallel \
1659 --sshlogin server.example.com \
1660 --return {.}.out --return {.}.out2 touch {.}.out {.}.out2
1661
1662 --return is often used with --transferfile and --cleanup.
1663
1664 --return is ignored when used with --sshlogin : or when not used
1665 with --sshlogin.
1666
1667 --round-robin
1668 --round
1669 Normally --pipe will give a single block to each instance of the
1670 command. With --roundrobin all blocks will be written at random to
1671 commands already running. This is useful if the command takes a
1672 long time to initialize.
1673
1674 --keep-order will not work with --roundrobin as it is impossible to
1675 track which input block corresponds to which output.
1676
1677 --roundrobin implies --pipe, except if --pipepart is given.
1678
1679 See also --group-by, --shard.
1680
1681 --rpl 'tag perl expression'
1682 Use tag as a replacement string for perl expression. This makes it
1683 possible to define your own replacement strings. GNU parallel's 7
1684 replacement strings are implemented as:
1685
1686 --rpl '{} '
1687 --rpl '{#} 1 $_=$job->seq()'
1688 --rpl '{%} 1 $_=$job->slot()'
1689 --rpl '{/} s:.*/::'
1690 --rpl '{//} $Global::use{"File::Basename"} ||=
1691 eval "use File::Basename; 1;"; $_ = dirname($_);'
1692 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
1693 --rpl '{.} s:\.[^/.]+$::'
1694
1695 The --plus replacement strings are implemented as:
1696
1697 --rpl '{+/} s:/[^/]*$::'
1698 --rpl '{+.} s:.*\.::'
1699 --rpl '{+..} s:.*\.([^.]*\.):$1:'
1700 --rpl '{+...} s:.*\.([^.]*\.[^.]*\.):$1:'
1701 --rpl '{..} s:\.[^/.]+$::; s:\.[^/.]+$::'
1702 --rpl '{...} s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1703 --rpl '{/..} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1704 --rpl '{/...} s:.*/::;s:\.[^/.]+$::;s:\.[^/.]+$::;s:\.[^/.]+$::'
1705 --rpl '{##} $_=total_jobs()'
1706 --rpl '{:-(.+?)} $_ ||= $$1'
1707 --rpl '{:(\d+?)} substr($_,0,$$1) = ""'
1708 --rpl '{:(\d+?):(\d+?)} $_ = substr($_,$$1,$$2);'
1709 --rpl '{#([^#].*?)} s/^$$1//;'
1710 --rpl '{%(.+?)} s/$$1$//;'
1711 --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1712 --rpl '{^(.+?)} s/^($$1)/uc($1)/e;'
1713 --rpl '{^^(.+?)} s/($$1)/uc($1)/eg;'
1714 --rpl '{,(.+?)} s/^($$1)/lc($1)/e;'
1715 --rpl '{,,(.+?)} s/($$1)/lc($1)/eg;'
1716
1717 If the user-defined replacement string starts with '{' it can also
1718 be used as a positional replacement string (like {2.}).
1719
1720 It is recommended to only change $_ but you have full access to all
1721 of GNU parallel's internal functions and data structures.
1722
1723 Here are a few examples:
1724
1725 Is the job sequence even or odd?
1726 --rpl '{odd} $_ = seq() % 2 ? "odd" : "even"'
1727 Pad job sequence with leading zeros to get equal width
1728 --rpl '{0#} $f=1+int("".(log(total_jobs())/log(10)));
1729 $_=sprintf("%0${f}d",seq())'
1730 Job sequence counting from 0
1731 --rpl '{#0} $_ = seq() - 1'
1732 Job slot counting from 2
1733 --rpl '{%1} $_ = slot() + 1'
1734 Remove all extensions
1735 --rpl '{:} s:(\.[^/]+)*$::'
1736
1737 You can have dynamic replacement strings by including parenthesis
1738 in the replacement string and adding a regular expression between
1739 the parenthesis. The matching string will be inserted as $$1:
1740
1741 parallel --rpl '{%(.*?)} s/$$1//' echo {%.tar.gz} ::: my.tar.gz
1742 parallel --rpl '{:%(.+?)} s:$$1(\.[^/]+)*$::' \
1743 echo {:%_file} ::: my_file.tar.gz
1744 parallel -n3 --rpl '{/:%(.*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:' \
1745 echo job {#}: {2} {2.} {3/:%_1} ::: a/b.c c/d.e f/g_1.h.i
1746
1747 You can even use multiple matches:
1748
1749 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1750 echo {/replacethis/withthis} {/b/C} ::: a_replacethis_b
1751
1752 parallel --rpl '{(.*?)/(.*?)} $_="$$2$_$$1"' \
1753 echo {swap/these} ::: -middle-
1754
1755 See also: {= perl expression =} --parens
1756
1757 --rsync-opts options
1758 Options to pass on to rsync. Setting --rsync-opts takes precedence
1759 over setting the environment variable $PARALLEL_RSYNC_OPTS.
1760
1761 --max-chars=max-chars
1762 -s max-chars
1763 Use at most max-chars characters per command line, including the
1764 command and initial-arguments and the terminating nulls at the ends
1765 of the argument strings. The largest allowed value is system-
1766 dependent, and is calculated as the argument length limit for exec,
1767 less the size of your environment. The default value is the
1768 maximum.
1769
1770 Implies -X unless -m is set.
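Since GNU parallel shares its options with xargs, the effect of a small -s is easiest to demonstrate with xargs -s, which packs as many arguments per command line as the byte limit allows (the limit 30 here is arbitrary):

```shell
# Pack as many arguments as fit within ~30 bytes per command line.
# xargs -s behaves like parallel -s for this purpose.
seq 10 | xargs -s 30 echo
```

The ten numbers come out spread over several echo invocations instead of one.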
1771
1772 --show-limits
1773 Display the limits on the command-line length which are imposed by
1774 the operating system and the -s option. Pipe the input from
1775 /dev/null (and perhaps specify --no-run-if-empty) if you don't want
1776 GNU parallel to do anything.
1777
1778 --semaphore
1779 Work as a counting semaphore. --semaphore will cause GNU parallel
1780 to start command in the background. When the number of jobs given
1781 by --jobs is reached, GNU parallel will wait for one of these to
1782 complete before starting another command.
1783
1784 --semaphore implies --bg unless --fg is specified.
1785
1786 --semaphore implies --semaphorename `tty` unless --semaphorename is
1787 specified.
1788
1789 Used with --fg, --wait, and --semaphorename.
1790
1791 The command sem is an alias for parallel --semaphore.
1792
1793 See also man sem.
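sem is the real tool for this; the token idea behind a counting semaphore can be sketched in plain shell with a FIFO holding N tokens (an illustration of the concept only, not how GNU parallel implements it):

```shell
# A counting semaphore as a FIFO of tokens: a job must read (acquire)
# a token before starting and write it back (release) when done.
fifo=$(mktemp -u) && mkfifo "$fifo"
exec 3<>"$fifo" && rm "$fifo"
printf 'x\nx\n' >&3               # 2 tokens => at most 2 jobs at once
for i in 1 2 3 4; do
  read -r _ <&3                   # acquire: blocks when no token is free
  { echo "job $i"; printf 'x\n' >&3; } &   # run job, then release token
done
wait
exec 3>&-
```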
1794
1795 --semaphorename name
1796 --id name
1797 Use name as the name of the semaphore. Default is the name of the
1798 controlling tty (output from tty).
1799
1800 The default normally works as expected when used interactively, but
1801 when used in a script name should be set. $$ or my_task_name are
1802 often a good value.
1803
1804 The semaphore is stored in ~/.parallel/semaphores/
1805
1806 Implies --semaphore.
1807
1808 See also man sem.
1809
1810 --semaphoretimeout secs
1811 --st secs
1812 If secs > 0: If the semaphore is not released within secs seconds,
1813 take it anyway.
1814
1815 If secs < 0: If the semaphore is not released within secs seconds,
1816 exit.
1817
1818 Implies --semaphore.
1819
1820 See also man sem.
1821
1822 --seqreplace replace-str
1823 Use the replacement string replace-str instead of {#} for job
1824 sequence number.
1825
1826 --session
1827 Record names in current environment in $PARALLEL_IGNORED_NAMES and
1828 exit. Only used with env_parallel. Aliases, functions, and
1829 variables with names in $PARALLEL_IGNORED_NAMES will not be copied.
1830
1831 Only supported in Ash, Bash, Dash, Ksh, Sh, and Zsh.
1832
1833 See also --env, --record-env.
1834
1835 --shard shardexpr
1836 Use shardexpr as shard key and shard input to the jobs.
1837
1838 shardexpr is [column number|column name] [perlexpression] e.g. 3,
1839 Address, 3 $_%=100, Address s/\d//g.
1840
1841 Each input line is split using --colsep. The value of the column is
1842      put into $_, the perl expression is executed, and the resulting
1843      value is hashed so that all lines with a given value are given to
1844      the same job slot.
1845
1846 This is similar to sharding in databases.
1847
1848      The performance is on the order of 100K rows per second. Faster if
1849 the shardcol is small (<10), slower if it is big (>100).
1850
1851 --shard requires --pipe and a fixed numeric value for --jobs.
1852
1853 See also --bin, --group-by, --roundrobin.
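The value-to-slot mapping can be pictured with any stable hash; this sketch uses cksum as a stand-in (GNU parallel's actual hash function may differ):

```shell
# Map a column value onto one of 4 job slots with a stable hash.
# cksum stands in for whatever hash GNU parallel really uses.
slot_of() {
  printf '%s' "$1" | cksum | awk '{ print ($1 % 4) + 1 }'
}
slot_of Alice    # always the same slot for the same value
slot_of Bob
slot_of Alice    # identical to the first call
```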
1854
1855 --shebang
1856 --hashbang
1857 GNU parallel can be called as a shebang (#!) command as the first
1858      line of a script. The content of the file will be treated as the
1859      input source.
1860
1861 Like this:
1862
1863 #!/usr/bin/parallel --shebang -r wget
1864
1865 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1866 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1867 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1868
1869 --shebang must be set as the first option.
1870
1871 On FreeBSD env is needed:
1872
1873 #!/usr/bin/env -S parallel --shebang -r wget
1874
1875 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1876 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1877 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1878
1879 There are many limitations of shebang (#!) depending on your
1880 operating system. See details on
1881 http://www.in-ulm.de/~mascheck/various/shebang/
1882
1883 --shebang-wrap
1884 GNU parallel can parallelize scripts by wrapping the shebang line.
1885 If the program can be run like this:
1886
1887 cat arguments | parallel the_program
1888
1889 then the script can be changed to:
1890
1891 #!/usr/bin/parallel --shebang-wrap /original/parser --options
1892
1893 E.g.
1894
1895 #!/usr/bin/parallel --shebang-wrap /usr/bin/python
1896
1897 If the program can be run like this:
1898
1899 cat data | parallel --pipe the_program
1900
1901 then the script can be changed to:
1902
1903 #!/usr/bin/parallel --shebang-wrap --pipe /orig/parser --opts
1904
1905 E.g.
1906
1907 #!/usr/bin/parallel --shebang-wrap --pipe /usr/bin/perl -w
1908
1909 --shebang-wrap must be set as the first option.
1910
1911 --shellquote
1912 Does not run the command but quotes it. Useful for making quoted
1913 composed commands for GNU parallel.
1914
1915      Multiple --shellquote will quote the string multiple times, so
1916 parallel --shellquote | parallel --shellquote can be written as
1917 parallel --shellquote --shellquote.
1918
1919 --shuf
1920      Shuffle jobs. With multiple input sources it is hard to randomize
1921      jobs. --shuf will generate all jobs, and shuffle them
1922 before running them. This is useful to get a quick preview of the
1923 results before running the full batch.
1924
1925 --skip-first-line
1926 Do not use the first line of input (used by GNU parallel itself
1927 when called with --shebang).
1928
1929 --sql DBURL (obsolete)
1930 Use --sqlmaster instead.
1931
1932 --sqlmaster DBURL (beta testing)
1933 Submit jobs via SQL server. DBURL must point to a table, which will
1934 contain the same information as --joblog, the values from the input
1935 sources (stored in columns V1 .. Vn), and the output (stored in
1936 columns Stdout and Stderr).
1937
1938 If DBURL is prepended with '+' GNU parallel assumes the table is
1939 already made with the correct columns and appends the jobs to it.
1940
1941      If DBURL is not prepended with '+' the table will be dropped and
1942      created with the correct number of V-columns.
1943
1944 --sqlmaster does not run any jobs, but it creates the values for
1945 the jobs to be run. One or more --sqlworker must be run to actually
1946 execute the jobs.
1947
1948 If --wait is set, GNU parallel will wait for the jobs to complete.
1949
1950 The format of a DBURL is:
1951
1952 [sql:]vendor://[[user][:pwd]@][host][:port]/[db]/table
1953
1954 E.g.
1955
1956 sql:mysql://hr:hr@localhost:3306/hrdb/jobs
1957 mysql://scott:tiger@my.example.com/pardb/paralleljobs
1958 sql:oracle://scott:tiger@ora.example.com/xe/parjob
1959 postgresql://scott:tiger@pg.example.com/pgdb/parjob
1960 pg:///parjob
1961 sqlite3:///%2Ftmp%2Fpardb.sqlite/parjob
1962 csv:///%2Ftmp%2Fpardb/parjob
1963
1964      Notice how / in the path of sqlite and CSV must be encoded as %2F,
1965      except the last / in CSV which must be a /.
1966
1967 It can also be an alias from ~/.sql/aliases:
1968
1969 :myalias mysql:///mydb/paralleljobs
1970
1971 --sqlandworker DBURL (beta testing)
1972 Shorthand for: --sqlmaster DBURL --sqlworker DBURL.
1973
1974 --sqlworker DBURL (beta testing)
1975 Execute jobs via SQL server. Read the input sources variables from
1976 the table pointed to by DBURL. The command on the command line
1977 should be the same as given by --sqlmaster.
1978
1979      If you have more than one --sqlworker, jobs may be run more than
1980      once.
1981
1982 If --sqlworker runs on the local machine, the hostname in the SQL
1983 table will not be ':' but instead the hostname of the machine.
1984
1985 --ssh sshcommand
1986 GNU parallel defaults to using ssh for remote access. This can be
1987 overridden with --ssh. It can also be set on a per server basis
1988 (see --sshlogin).
1989
1990 --sshdelay secs
1991 Delay starting next ssh by secs seconds. GNU parallel will pause
1992      secs seconds after starting each ssh. secs can be less than 1
1993      second.
1994
1995 -S
1996 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
1997 -S @hostgroup
1998 --sshlogin
1999 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
2000 --sshlogin @hostgroup
2001 Distribute jobs to remote computers. The jobs will be run on a list
2002 of remote computers.
2003
2004 If hostgroups is given, the sshlogin will be added to that
2005 hostgroup. Multiple hostgroups are separated by '+'. The sshlogin
2006 will always be added to a hostgroup named the same as sshlogin.
2007
2008 If only the @hostgroup is given, only the sshlogins in that
2009 hostgroup will be used. Multiple @hostgroup can be given.
2010
2011 GNU parallel will determine the number of CPUs on the remote
2012 computers and run the number of jobs as specified by -j. If the
2013 number ncpus is given GNU parallel will use this number for number
2014 of CPUs on the host. Normally ncpus will not be needed.
2015
2016 An sshlogin is of the form:
2017
2018 [sshcommand [options]] [username@]hostname
2019
2020 The sshlogin must not require a password (ssh-agent, ssh-copy-id,
2021 and sshpass may help with that).
2022
2023 The sshlogin ':' is special, it means 'no ssh' and will therefore
2024 run on the local computer.
2025
2026      The sshlogin '..' is special, it reads sshlogins from
2027 ~/.parallel/sshloginfile or $XDG_CONFIG_HOME/parallel/sshloginfile
2028
2029      The sshlogin '-' is special, too, it reads sshlogins from stdin
2030 (standard input).
2031
2032 To specify more sshlogins separate the sshlogins by comma, newline
2033 (in the same string), or repeat the options multiple times.
2034
2035 For examples: see --sshloginfile.
2036
2037 The remote host must have GNU parallel installed.
2038
2039 --sshlogin is known to cause problems with -m and -X.
2040
2041 --sshlogin is often used with --transferfile, --return, --cleanup,
2042 and --trc.
2043
2044 --sshloginfile filename
2045 --slf filename
2046 File with sshlogins. The file consists of sshlogins on separate
2047 lines. Empty lines and lines starting with '#' are ignored.
2048 Example:
2049
2050 server.example.com
2051 username@server2.example.com
2052 8/my-8-cpu-server.example.com
2053 2/my_other_username@my-dualcore.example.net
2054 # This server has SSH running on port 2222
2055 ssh -p 2222 server.example.net
2056 4/ssh -p 2222 quadserver.example.net
2057 # Use a different ssh program
2058 myssh -p 2222 -l myusername hexacpu.example.net
2059 # Use a different ssh program with default number of CPUs
2060 //usr/local/bin/myssh -p 2222 -l myusername hexacpu
2061 # Use a different ssh program with 6 CPUs
2062 6//usr/local/bin/myssh -p 2222 -l myusername hexacpu
2063 # Assume 16 CPUs on the local computer
2064 16/:
2065 # Put server1 in hostgroup1
2066 @hostgroup1/server1
2067 # Put myusername@server2 in hostgroup1+hostgroup2
2068 @hostgroup1+hostgroup2/myusername@server2
2069 # Force 4 CPUs and put 'ssh -p 2222 server3' in hostgroup1
2070 @hostgroup1/4/ssh -p 2222 server3
2071
2072 When using a different ssh program the last argument must be the
2073 hostname.
2074
2075 Multiple --sshloginfile are allowed.
2076
2077      GNU parallel will first look for the file in the current dir; if
2078      that fails it looks for the file in ~/.parallel.
2079
2080      The sshloginfile '..' is special, it reads sshlogins from
2081      ~/.parallel/sshloginfile
2082
2083      The sshloginfile '.' is special, it reads sshlogins from
2084      /etc/parallel/sshloginfile
2085
2086      The sshloginfile '-' is special, too, it reads sshlogins from stdin
2087      (standard input).
2088
2089      If the sshloginfile is changed it will be re-read when a job
2090      finishes, though at most once per second. This makes it possible to
2091 add and remove hosts while running.
2092
2093 This can be used to have a daemon that updates the sshloginfile to
2094 only contain servers that are up:
2095
2096 cp original.slf tmp2.slf
2097 while [ 1 ] ; do
2098        nice parallel --nonall -j0 -k --slf original.slf \
2099          --tag echo | perl -pe 's/\t$//' > tmp.slf
2100        if ! diff tmp.slf tmp2.slf >/dev/null; then
2101          mv tmp.slf tmp2.slf
2102        fi
2103 sleep 10
2104 done &
2105 parallel --slf tmp2.slf ...
2106
2107 --slotreplace replace-str
2108 Use the replacement string replace-str instead of {%} for job slot
2109 number.
2110
2111 --silent
2112 Silent. The job to be run will not be printed. This is the
2113 default. Can be reversed with -v.
2114
2115 --tty
2116 Open terminal tty. If GNU parallel is used for starting a program
2117 that accesses the tty (such as an interactive program) then this
2118 option may be needed. It will default to starting only one job at a
2119 time (i.e. -j1), not buffer the output (i.e. -u), and it will open
2120 a tty for the job.
2121
2122 You can of course override -j1 and -u.
2123
2124 Using --tty unfortunately means that GNU parallel cannot kill the
2125 jobs (with --timeout, --memfree, or --halt). This is due to GNU
2126 parallel giving each child its own process group, which is then
2127      killed. Process groups are dependent on the tty.
2128
2129 --tag
2130 Tag lines with arguments. Each output line will be prepended with
2131 the arguments and TAB (\t). When combined with --onall or --nonall
2132 the lines will be prepended with the sshlogin instead.
2133
2134 --tag is ignored when using -u.
2135
2136 --tagstring str
2137 Tag lines with a string. Each output line will be prepended with
2138 str and TAB (\t). str can contain replacement strings such as {}.
2139
2140 --tagstring is ignored when using -u, --onall, and --nonall.
2141
2142 --tee
2143 Pipe all data to all jobs. Used with --pipe/--pipepart and :::.
2144
2145 seq 1000 | parallel --pipe --tee -v wc {} ::: -w -l -c
2146
2147 How many numbers in 1..1000 contain 0..9, and how many bytes do
2148 they fill:
2149
2150 seq 1000 | parallel --pipe --tee --tag \
2151 'grep {1} | wc {2}' ::: {0..9} ::: -l -c
2152
2153 How many words contain a..z and how many bytes do they fill?
2154
2155 parallel -a /usr/share/dict/words --pipepart --tee --tag \
2156 'grep {1} | wc {2}' ::: {a..z} ::: -l -c
2157
2158 --termseq sequence
2159 Termination sequence. When a job is killed due to --timeout,
2160 --memfree, --halt, or abnormal termination of GNU parallel,
2161 sequence determines how the job is killed. The default is:
2162
2163 TERM,200,TERM,100,TERM,50,KILL,25
2164
2165 which sends a TERM signal, waits 200 ms, sends another TERM signal,
2166 waits 100 ms, sends another TERM signal, waits 50 ms, sends a KILL
2167 signal, waits 25 ms, and exits. GNU parallel detects if a process
2168 dies before the waiting time is up.
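The sequence can be pictured as a loop over signal/wait pairs. This is a plain-shell sketch for a single PID (an illustration, not GNU parallel's actual code):

```shell
# TERM,200,TERM,100,KILL,25 as a signal/wait loop for one PID.
terminate() {
  pid=$1; shift
  while [ "$#" -ge 2 ]; do
    sig=$1; ms=$2; shift 2
    kill -s "$sig" "$pid" 2>/dev/null || return 0  # already gone
    sleep "$(awk "BEGIN { print $ms / 1000 }")"    # wait ms milliseconds
    kill -0 "$pid" 2>/dev/null || return 0         # died during the wait
  done
}
sleep 60 &
terminate $! TERM 200 TERM 100 KILL 25
wait 2>/dev/null
```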
2169
2170 --tmpdir dirname
2171 Directory for temporary files. GNU parallel normally buffers output
2172 into temporary files in /tmp. By setting --tmpdir you can use a
2173 different dir for the files. Setting --tmpdir is equivalent to
2174 setting $TMPDIR.
2175
2176 --tmux (Long beta testing)
2177 Use tmux for output. Start a tmux session and run each job in a
2178 window in that session. No other output will be produced.
2179
2180 --tmuxpane (Long beta testing)
2181 Use tmux for output but put output into panes in the first window.
2182 Useful if you want to monitor the progress of less than 100
2183 concurrent jobs.
2184
2185 --timeout duration
2186 Time out for command. If the command runs for longer than duration
2187 seconds it will get killed as per --termseq.
2188
2189 If duration is followed by a % then the timeout will dynamically be
2190 computed as a percentage of the median average runtime of
2191 successful jobs. Only values > 100% will make sense.
2192
2193 duration is normally in seconds, but can be floats postfixed with
2194 s, m, h, or d which would multiply the float by 1, 60, 3600, or
2195 86400. Thus these are equivalent: --timeout 100000 and --timeout
2196 1d3.5h16.6m4s.
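The equivalence can be checked by expanding each unit into seconds:

```shell
# 1d3.5h16.6m4s = 1*86400 + 3.5*3600 + 16.6*60 + 4 seconds
awk 'BEGIN { print 1*86400 + 3.5*3600 + 16.6*60 + 4 }'   # prints 100000
```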
2197
2198 --verbose
2199 -t Print the job to be run on stderr (standard error).
2200
2201 See also -v, -p.
2202
2203 --transfer
2204 Transfer files to remote computers. Shorthand for: --transferfile
2205 {}.
2206
2207 --transferfile filename
2208 --tf filename
2209 --transferfile is used with --sshlogin to transfer files to the
2210 remote computers. The files will be transferred using rsync and
2211 will be put relative to the default work dir. If the path contains
2212 /./ the remaining path will be relative to the work dir. E.g.
2213
2214 echo foo/bar.txt | parallel --transferfile {} \
2215 --sshlogin server.example.com wc
2216
2217 This will transfer the file foo/bar.txt to the computer
2218 server.example.com to the file $HOME/foo/bar.txt before running wc
2219 foo/bar.txt on server.example.com.
2220
2221 echo /tmp/foo/bar.txt | parallel --transferfile {} \
2222 --sshlogin server.example.com wc
2223
2224 This will transfer the file /tmp/foo/bar.txt to the computer
2225 server.example.com to the file /tmp/foo/bar.txt before running wc
2226 /tmp/foo/bar.txt on server.example.com.
2227
2228 echo /tmp/./foo/bar.txt | parallel --transferfile {} \
2229 --sshlogin server.example.com wc {= s:.*/./:./: =}
2230
2231 This will transfer the file /tmp/foo/bar.txt to the computer
2232 server.example.com to the file foo/bar.txt before running wc
2233 ./foo/bar.txt on server.example.com.
2234
2235 --transferfile is often used with --return and --cleanup. A
2236 shorthand for --transferfile {} is --transfer.
2237
2238 --transferfile is ignored when used with --sshlogin : or when not
2239 used with --sshlogin.
2240
2241 --trc filename
2242 Transfer, Return, Cleanup. Shorthand for:
2243
2244 --transferfile {} --return filename --cleanup
2245
2246 --trim <n|l|r|lr|rl>
2247 Trim white space in input.
2248
2249 n No trim. Input is not modified. This is the default.
2250
2251 l Left trim. Remove white space from start of input. E.g. " a bc
2252 " -> "a bc ".
2253
2254 r Right trim. Remove white space from end of input. E.g. " a bc "
2255 -> " a bc".
2256
2257 lr
2258 rl Both trim. Remove white space from both start and end of input.
2259 E.g. " a bc " -> "a bc". This is the default if --colsep is
2260 used.
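What the l, r, and lr modes do can be reproduced with sed (an illustration of the trimming, not how GNU parallel implements it):

```shell
s='  a bc  '
printf '[%s]\n' "$(printf '%s' "$s" | sed 's/^[[:space:]]*//')"   # l  -> [a bc  ]
printf '[%s]\n' "$(printf '%s' "$s" | sed 's/[[:space:]]*$//')"   # r  -> [  a bc]
printf '[%s]\n' \
  "$(printf '%s' "$s" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"  # lr -> [a bc]
```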
2261
2262 --ungroup
2263 -u Ungroup output. Output is printed as soon as possible and bypasses
2264      GNU parallel's internal processing. This may cause output from
2265      different commands to be mixed, so it should only be used if you do
2266      not care about the output. Compare these:
2267
2268 seq 4 | parallel -j0 \
2269 'sleep {};echo -n start{};sleep {};echo {}end'
2270 seq 4 | parallel -u -j0 \
2271 'sleep {};echo -n start{};sleep {};echo {}end'
2272
2273 It also disables --tag. GNU parallel outputs faster with -u.
2274 Compare the speeds of these:
2275
2276 parallel seq ::: 300000000 >/dev/null
2277 parallel -u seq ::: 300000000 >/dev/null
2278 parallel --line-buffer seq ::: 300000000 >/dev/null
2279
2280 Can be reversed with --group.
2281
2282 See also: --line-buffer --group
2283
2284 --extensionreplace replace-str
2285 --er replace-str
2286 Use the replacement string replace-str instead of {.} for input
2287 line without extension.
2288
2289 --use-sockets-instead-of-threads
2290 --use-cores-instead-of-threads
2291 --use-cpus-instead-of-cores (obsolete)
2292 Determine how GNU parallel counts the number of CPUs. GNU parallel
2293 uses this number when the number of jobslots is computed relative
2294 to the number of CPUs (e.g. 100% or +1).
2295
2296 CPUs can be counted in three different ways:
2297
2298 sockets The number of filled CPU sockets (i.e. the number of
2299 physical chips).
2300
2301 cores The number of physical cores (i.e. the number of physical
2302 compute cores).
2303
2304 threads The number of hyperthreaded cores (i.e. the number of
2305 virtual cores - with some of them possibly being
2306                  hyperthreaded).
2307
2308 Normally the number of CPUs is computed as the number of CPU
2309 threads. With --use-sockets-instead-of-threads or
2310 --use-cores-instead-of-threads you can force it to be computed as
2311 the number of filled sockets or number of cores instead.
2312
2313 Most users will not need these options.
2314
2315 --use-cpus-instead-of-cores is a (misleading) alias for
2316 --use-sockets-instead-of-threads and is kept for backwards
2317 compatibility.
2318
2319 -v Verbose. Print the job to be run on stdout (standard output). Can
2320 be reversed with --silent. See also -t.
2321
2322 Use -v -v to print the wrapping ssh command when running remotely.
2323
2324 --version
2325  -V  Print the version of GNU parallel and exit.
2326
2327 --workdir mydir
2328 --wd mydir
2329 Jobs will be run in the dir mydir.
2330
2331 Files transferred using --transferfile and --return will be
2332 relative to mydir on remote computers.
2333
2334 The special mydir value ... will create working dirs under
2335 ~/.parallel/tmp/. If --cleanup is given these dirs will be removed.
2336
2337 The special mydir value . uses the current working dir. If the
2338 current working dir is beneath your home dir, the value . is
2339 treated as the relative path to your home dir. This means that if
2340 your home dir is different on remote computers (e.g. if your login
2341 is different) the relative path will still be relative to your home
2342 dir.
2343
2344 To see the difference try:
2345
2346 parallel -S server pwd ::: ""
2347 parallel --wd . -S server pwd ::: ""
2348 parallel --wd ... -S server pwd ::: ""
2349
2350 mydir can contain GNU parallel's replacement strings.
2351
2352 --wait
2353 Wait for all commands to complete.
2354
2355 Used with --semaphore or --sqlmaster.
2356
2357 See also man sem.
2358
2359 -X Multiple arguments with context replace. Insert as many arguments
2360 as the command line length permits. If multiple jobs are being run
2361 in parallel: distribute the arguments evenly among the jobs. Use
2362 -j1 to avoid this.
2363
2364 If {} is not used the arguments will be appended to the line. If
2365 {} is used as part of a word (like pic{}.jpg) then the whole word
2366 will be repeated. If {} is used multiple times each {} will be
2367 replaced with the arguments.
2368
2369 Normally -X will do the right thing, whereas -m can give unexpected
2370 results if {} is used as part of a word.
2371
2372 Support for -X with --sshlogin is limited and may fail.
2373
2374 See also -m.
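The repetition of a word containing {} can be pictured in plain shell: with arguments 1 2 3 and the template pic{}.jpg, a single -X command line would see:

```shell
# Emulate one -X command line for template pic{}.jpg and args 1 2 3:
set -- 1 2 3
for a in "$@"; do
  printf 'pic%s.jpg ' "$a"
done
echo
# prints: pic1.jpg pic2.jpg pic3.jpg
```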
2375
2376 --exit
2377 -x Exit if the size (see the -s option) is exceeded.
2378
2379 --xargs
2380 Multiple arguments. Insert as many arguments as the command line
2381 length permits.
2382
2383 If {} is not used the arguments will be appended to the line. If
2384 {} is used multiple times each {} will be replaced with all the
2385 arguments.
2386
2387 Support for --xargs with --sshlogin is limited and may fail.
2388
2389 See also -X for context replace. If in doubt use -X as that will
2390 most likely do what is needed.
2391
2393  GNU parallel can work similarly to xargs -n1.
2394
2395 To compress all html files using gzip run:
2396
2397 find . -name '*.html' | parallel gzip --best
2398
2399 If the file names may contain a newline use -0. Substitute FOO BAR with
2400 FUBAR in all files in this dir and subdirs:
2401
2402 find . -type f -print0 | \
2403 parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
2404
2405 Note -q is needed because of the space in 'FOO BAR'.
2406
2408 prips can generate IP-addresses from CIDR notation. With GNU parallel
2409 you can build a simple network scanner to see which addresses respond
2410 to ping:
2411
2412 prips 130.229.16.0/20 | \
2413 parallel --timeout 2 -j0 \
2414 'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
2415
2417 GNU parallel can take the arguments from command line instead of stdin
2418 (standard input). To compress all html files in the current dir using
2419 gzip run:
2420
2421 parallel gzip --best ::: *.html
2422
2423 To convert *.wav to *.mp3 using LAME running one process per CPU run:
2424
2425 parallel lame {} -o {.}.mp3 ::: *.wav
2426
2428 When moving a lot of files like this: mv *.log destdir you will
2429 sometimes get the error:
2430
2431 bash: /bin/mv: Argument list too long
2432
2433 because there are too many files. You can instead do:
2434
2435 ls | grep -E '\.log$' | parallel mv {} destdir
2436
2437  This will run mv for each file. It can be done faster if mv gets as
2438  many arguments as will fit on the line:
2439
2440 ls | grep -E '\.log$' | parallel -m mv {} destdir
2441
2442 In many shells you can also use printf:
2443
2444 printf '%s\0' *.log | parallel -0 -m mv {} destdir
2445
2447 To remove the files pict0000.jpg .. pict9999.jpg you could do:
2448
2449 seq -w 0 9999 | parallel rm pict{}.jpg
2450
2451 You could also do:
2452
2453 seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
2454
2455  The first will run rm 10000 times, while the last will only run rm as
2456  many times as needed to keep the command line length short enough to
2457  avoid Argument list too long (it typically runs 1-2 times).
2458
2459 You could also run:
2460
2461 seq -w 0 9999 | parallel -X rm pict{}.jpg
2462
2463  This will also only run rm as many times as needed to keep the command
2464 line length short enough.
2465
2467 If ImageMagick is installed this will generate a thumbnail of a jpg
2468 file:
2469
2470 convert -geometry 120 foo.jpg thumb_foo.jpg
2471
2472 This will run with number-of-cpus jobs in parallel for all jpg files in
2473 a directory:
2474
2475 ls *.jpg | parallel convert -geometry 120 {} thumb_{}
2476
2477 To do it recursively use find:
2478
2479 find . -name '*.jpg' | \
2480 parallel convert -geometry 120 {} {}_thumb.jpg
2481
2482  Notice how the argument has to start with {} as {} will include the path
2483 (e.g. running convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg
2484 would clearly be wrong). The command will generate files like
2485 ./foo/bar.jpg_thumb.jpg.
2486
2487 Use {.} to avoid the extra .jpg in the file name. This command will
2488 make files like ./foo/bar_thumb.jpg:
2489
2490 find . -name '*.jpg' | \
2491 parallel convert -geometry 120 {} {.}_thumb.jpg
2492
2494 This will generate an uncompressed version of .gz-files next to the
2495 .gz-file:
2496
2497 parallel zcat {} ">"{.} ::: *.gz
2498
2499 Quoting of > is necessary to postpone the redirection. Another solution
2500 is to quote the whole command:
2501
2502 parallel "zcat {} >{.}" ::: *.gz
2503
2504 Other special shell characters (such as * ; $ > < | >> <<) also need to
2505 be put in quotes, as they may otherwise be interpreted by the shell and
2506 not given to GNU parallel.
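The difference quoting makes is easy to see with echo alone:

```shell
echo a ">" b     # the quoted > is just an argument: prints a > b
echo a \> b      # same
# echo a > b     # unquoted: the shell redirects and creates a file named b
```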
2507
2509 A job can consist of several commands. This will print the number of
2510 files in each directory:
2511
2512 ls | parallel 'echo -n {}" "; ls {}|wc -l'
2513
2514 To put the output in a file called <name>.dir:
2515
2516 ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
2517
2518 Even small shell scripts can be run by GNU parallel:
2519
2520 find . | parallel 'a={}; name=${a##*/};' \
2521 'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
2522 'echo "$name - $upper"'
2523
2524 ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
2525
2526 Given a list of URLs, list all URLs that fail to download. Print the
2527 line number and the URL.
2528
2529 cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
2530
2531 Create a mirror directory with the same filenames except all files and
2532 symlinks are empty files.
2533
2534 cp -rs /the/source/dir mirror_dir
2535 find mirror_dir -type l | parallel -m rm {} '&&' touch {}
2536
2537 Find the files in a list that do not exist
2538
2539 cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
2540
2542  You have a bunch of files. You want them sorted into dirs. The dir of
2543 each file should be named the first letter of the file name.
2544
2545 parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
2546
2548  You have a dir with files named for the 24 hours in 5-minute intervals:
2549 00:00, 00:05, 00:10 .. 23:55. You want to find the files missing:
2550
2551 parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
2552 ::: {00..23} ::: {00..55..5}
2553
2555 If the composed command is longer than a line, it becomes hard to read.
2556 In Bash you can use functions. Just remember to export -f the function.
2557
2558 doit() {
2559 echo Doing it for $1
2560 sleep 2
2561 echo Done with $1
2562 }
2563 export -f doit
2564 parallel doit ::: 1 2 3
2565
2566 doubleit() {
2567 echo Doing it for $1 $2
2568 sleep 2
2569 echo Done with $1 $2
2570 }
2571 export -f doubleit
2572 parallel doubleit ::: 1 2 3 ::: a b
2573
2574 To do this on remote servers you need to transfer the function using
2575 --env:
2576
2577 parallel --env doit -S server doit ::: 1 2 3
2578 parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
2579
2580 If your environment (aliases, variables, and functions) is small you
2581 can copy the full environment without having to export -f anything. See
2582 env_parallel.
2583
2585 To test a program with different parameters:
2586
2587 tester() {
2588 if (eval "$@") >&/dev/null; then
2589 perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
2590 else
2591 perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
2592 fi
2593 }
2594 export -f tester
2595 parallel tester my_program ::: arg1 arg2
2596 parallel tester exit ::: 1 0 2 0
2597
2598 If my_program fails a red FAIL will be printed followed by the failing
2599 command; otherwise a green OK will be printed followed by the command.
2600
2602 It can be useful to monitor the output of running jobs.
2603
2604  This shows the most recent output line until a job finishes, after
2605  which the output of the job is printed in full:
2606
2607 parallel '{} | tee >(cat >&3)' ::: 'command 1' 'command 2' \
2608 3> >(perl -ne '$|=1;chomp;printf"%.'$COLUMNS's\r",$_." "x100')
2609
2611 Log rotation renames a logfile to an extension with a higher number:
2612 log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
2613 removed. To avoid overwriting files the process starts backwards from
2614 the high number to the low number. This will keep 10 old versions of
2615 the log:
2616
2617 seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
2618 mv log log.1
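Because {= $_++ =} increments the argument, each generated command renames one number upward. Spelled out in plain shell, the sequence of commands is:

```shell
# The commands generated by:
#   seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
for i in 9 8 7 6 5 4 3 2 1; do
  echo mv "log.$i" "log.$((i + 1))"
done
echo mv log log.1
```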
2619
2621 When processing files removing the file extension using {.} is often
2622 useful.
2623
2624 Create a directory for each zip-file and unzip it in that dir:
2625
2626 parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
2627
2628 Recompress all .gz files in current directory using bzip2 running 1 job
2629 per CPU in parallel:
2630
2631 parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
2632
2633 Convert all WAV files to MP3 using LAME:
2634
2635 find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
2636
2637  Put all converted files in the same directory:
2638
2639 find sounddir -type f -name '*.wav' | \
2640 parallel lame {} -o mydir/{/.}.mp3
2641
2643  If you have a directory with tar.gz files and want these extracted in
2644  the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
2645  foo) you can do:
2646
2647 parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
2648
2649 If you want to remove a different ending, you can use {%string}:
2650
2651 parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
2652
2653  You can also remove a starting string with {#string}:
2654
2655 parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
2656
2657 To remove a string anywhere you can use regular expressions with
2658 {/regexp/replacement} and leave the replacement empty:
2659
2660 parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
2661
2663 Let us assume a website stores images like:
2664
2665 http://www.example.com/path/to/YYYYMMDD_##.jpg
2666
2667 where YYYYMMDD is the date and ## is the number 01-24. This will
2668 download images for the past 30 days:
2669
2670 getit() {
2671 date=$(date -d "today -$1 days" +%Y%m%d)
2672 num=$2
2673 echo wget http://www.example.com/path/to/${date}_${num}.jpg
2674 }
2675 export -f getit
2676
2677 parallel getit ::: $(seq 30) ::: $(seq -w 24)
2678
2679 $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
2680 with $1 days subtracted.
2681
2683 NASA provides tiles to download on earthdata.nasa.gov. Download tiles
2684 for Blue Marble world map and create a 10240x20480 map.
2685
2686 base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
2687 service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
2688 layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
2689 set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
2690 tile="TILEROW={1}&TILECOL={2}"
2691 format="FORMAT=image%2Fjpeg"
2692 url="$base?$service&$layer&$set&$tile&$format"
2693
2694 parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
2695 parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
2696 convert -append line{0..19}.jpg world.jpg
2697
2699 Search NASA using their API to get JSON for images that are related to
2700 'apollo 11' and have 'moon landing' in the description.
2701
2702 The search query returns JSON containing URLs to JSON containing
2703 collections of pictures. One of the pictures in each of these
2704 collections is large.
2705
2706 wget is used to get the JSON for the search query. jq is then used to
2707 extract the URLs of the collections. parallel then calls wget to get
2708 each collection, which is passed to jq to extract the URLs of all
2709 images. grep filters out the large images, and parallel finally uses
2710 wget to fetch the images.
2711
2712 base="https://images-api.nasa.gov/search"
2713 q="q=apollo 11"
2714 description="description=moon landing"
2715 media_type="media_type=image"
2716 wget -O - "$base?$q&$description&$media_type" |
2717 jq -r .collection.items[].href |
2718 parallel wget -O - |
2719 jq -r .[] |
2720 grep large |
2721 parallel wget
2722
2724 youtube-dl is an excellent tool to download videos. It cannot,
2725 however, download videos in parallel. This takes a playlist and
2726 downloads 10 videos in parallel.
2727
2728 url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
2729 export url
2730 youtube-dl --flat-playlist "https://$url" |
2731 parallel --tagstring {#} --lb -j10 \
2732 youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'
2733
2735 parallel mv {} '{= $a=pQ($_); $b=$_;' \
2736 '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *
2737
2738 {= and =} mark a perl expression. pQ perl-quotes the string. date
2739 +%FT%T is the date in ISO8601 with time.
2740
2742 Save output from ps aux every second into dirs named
2743 yyyy-mm-ddThh:mm:ss+zz:zz.
2744
2745 seq 1000 | parallel -N0 -j1 --delay 1 \
2746 --results '{= $_=`date -Isec`; chomp=}/' ps aux
2747
2749 The : in a digital clock blinks. To make every other line have a ':'
2750 and the rest a ' ', a perl expression is used to look at the 3rd input
2751 source. If the value modulo 2 is 1, use ":"; otherwise use " ":
2752
2753 parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
2754 ::: {0..12} ::: {0..5} ::: {0..9}
2755
2757 This:
2758
2759 parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
2760 ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
2761
2762 will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
2763 the output grouping on x and z you can do this:
2764
2765 parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
2766
2767 For all values of x and z it runs commands like:
2768
2769 cat x1y*z1 > x1z1
2770
2771 So you end up with x1z1 .. x5z5 each containing the content of all
2772 values of y.
2773
2775 The script below will crawl and mirror a URL in parallel. It
2776 downloads first the pages that are 1 click down, then 2 clicks down,
2777 then 3, instead of the normal depth-first order, where the first link
2778 on each page is fetched first.
2779
2780 Run like this:
2781
2782 PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/
2783
2784 Remove the wget part if you only want a web crawler.
2785
2786 It works by fetching a page from a list of URLs and looking for links
2787 in that page that are within the same starting URL and that have not
2788 already been seen. These links are added to a new queue. When all the
2789 pages from the list are done, the new queue is moved to the list of URLs
2790 and the process is started over until no unseen links are found.
2791
2792 #!/bin/bash
2793
2794 # E.g. http://gatt.org.yeslab.org/
2795 URL=$1
2796 # Stay inside the start dir
2797 BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
2798 URLLIST=$(mktemp urllist.XXXX)
2799 URLLIST2=$(mktemp urllist.XXXX)
2800 SEEN=$(mktemp seen.XXXX)
2801
2802 # Spider to get the URLs
2803 echo $URL >$URLLIST
2804 cp $URLLIST $SEEN
2805
2806 while [ -s $URLLIST ] ; do
2807 cat $URLLIST |
2808 parallel lynx -listonly -image_links -dump {} \; \
2809 wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
2810 perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
2811 do { $seen{$1}++ or print }' |
2812 grep -F $BASEURL |
2813 grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
2814 mv $URLLIST2 $URLLIST
2815 done
2816
2817 rm -f $URLLIST $URLLIST2 $SEEN
2818
2820 If the files to be processed are in a tar file then unpacking one file
2821 and processing it immediately may be faster than first unpacking all
2822 files.
2823
2824 tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
2825 parallel echo
2826
2827 The Perl one-liner is needed to make sure the file is complete before
2828 handing it to GNU parallel.
2829
2831 for-loops like this:
2832
2833 (for x in `cat list` ; do
2834 do_something $x
2835 done) | process_output
2836
2837 and while-read-loops like this:
2838
2839 cat list | (while read x ; do
2840 do_something $x
2841 done) | process_output
2842
2843 can be written like this:
2844
2845 cat list | parallel do_something | process_output
2846
2847 For example: Find which host name in a list has IP address 1.2.3.4:
2848
2849 cat hosts.txt | parallel -P 100 host | grep 1.2.3.4
2850
2851 If the processing requires more steps, a for-loop like this:
2852
2853 (for x in `cat list` ; do
2854 no_extension=${x%.*};
2855 do_step1 $x scale $no_extension.jpg
2856 do_step2 <$x $no_extension
2857 done) | process_output
2858
2859 and while-loops like this:
2860
2861 cat list | (while read x ; do
2862 no_extension=${x%.*};
2863 do_step1 $x scale $no_extension.jpg
2864 do_step2 <$x $no_extension
2865 done) | process_output
2866
2867 can be written like this:
2868
2869 cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
2870 process_output
2871
2872 If the body of the loop is bigger, it improves readability to use a
2873 function:
2874
2875 (for x in `cat list` ; do
2876 do_something $x
2877 [... 100 lines that do something with $x ...]
2878 done) | process_output
2879
2880 cat list | (while read x ; do
2881 do_something $x
2882 [... 100 lines that do something with $x ...]
2883 done) | process_output
2884
2885 can both be rewritten as:
2886
2887 doit() {
2888 x=$1
2889 do_something $x
2890 [... 100 lines that do something with $x ...]
2891 }
2892 export -f doit
2893 cat list | parallel doit
2894
2896 Nested for-loops like this:
2897
2898 (for x in `cat xlist` ; do
2899 for y in `cat ylist` ; do
2900 do_something $x $y
2901 done
2902 done) | process_output
2903
2904 can be written like this:
2905
2906 parallel do_something {1} {2} :::: xlist ylist | process_output
2907
2908 Nested for-loops like this:
2909
2910 (for colour in red green blue ; do
2911 for size in S M L XL XXL ; do
2912 echo $colour $size
2913 done
2914 done) | sort
2915
2916 can be written like this:
2917
2918 parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
2919
2921 diff is good for finding differences in text files. diff | wc -l gives
2922 an indication of the size of the difference. To find the differences
2923 between all files in the current dir do:
2924
2925 parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3
2926
2927 This way it is possible to see if some files are closer to other files.
2928
2930 When doing multiple nested for-loops it can be easier to keep track of
2931 the loop variable if it is named instead of just having a number. Use
2932 --header : to let the first argument be a named alias for the
2933 positional replacement string:
2934
2935 parallel --header : echo {colour} {size} \
2936 ::: colour red green blue ::: size S M L XL XXL
2937
2938 This also works if the input file is a file with columns:
2939
2940 cat addressbook.tsv | \
2941 parallel --colsep '\t' --header : echo {Name} {E-mail address}
2942
2944 GNU parallel makes all combinations when given two lists.
2945
2946 To make all combinations in a single list with unique values, you
2947 repeat the list and use replacement string {choose_k}:
2948
2949 parallel --plus echo {choose_k} ::: A B C D ::: A B C D
2950
2951 parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D
2952
2953 {choose_k} works for any number of input sources:
2954
2955 parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D
2956
2958 Assume you have input like:
2959
2960 aardvark
2961 babble
2962 cab
2963 dab
2964 each
2965
2966 and want to run combinations like:
2967
2968 aardvark babble
2969 babble cab
2970 cab dab
2971 dab each
2972
2973 If the input is in the file in.txt:
2974
2975 parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)
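The two process substitutions simply offset the same list by one line: head -n -1 drops the last line and tail -n +2 drops the first (GNU head is assumed for the negative count):

```shell
# All but the last line, and all but the first line, of the same input
printf 'aardvark\nbabble\ncab\n' | head -n -1    # aardvark, babble
printf 'aardvark\nbabble\ncab\n' | tail -n +2    # babble, cab
```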
2976
2977 If the input is in the array $a here are two solutions:
2978
2979 seq $((${#a[@]}-1)) | \
2980 env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
2981 parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"
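The second solution relies on bash array slicing: ${a[@]::n} takes the first n elements and ${a[@]:1} drops the first. With a three-element array (bash assumed):

```shell
# bash array slicing as used above
a=(aardvark babble cab)
echo "${a[@]::${#a[@]}-1}"    # all but the last element
echo "${a[@]:1}"              # all but the first element
```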
2982
2984 Using --results the results are saved in /tmp/diffcount*.
2985
2986 parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
2987 tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
2988
2989 To see the difference between file A and file B look at the file
2990 '/tmp/diffcount/1/A/2/B'.
2991
2993 Starting a job on the local machine takes around 10 ms. This can be a
2994 big overhead if the job takes very few ms to run. Often you can group
2995 small jobs together using -X which will make the overhead less
2996 significant. Compare the speed of these:
2997
2998 seq -w 0 9999 | parallel touch pict{}.jpg
2999 seq -w 0 9999 | parallel -X touch pict{}.jpg
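-X fits as many arguments as possible onto each command line, in the same spirit as xargs, so one touch process handles many files instead of one file each. The grouping effect can be previewed by echoing the generated command (xargs shown here as a stand-in, since it groups the same way):

```shell
# One invocation receives all three arguments instead of three
# invocations receiving one argument each
seq 3 | xargs echo touch    # touch 1 2 3
```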
3000
3001 If your program cannot take multiple arguments, then you can use GNU
3002 parallel to spawn multiple GNU parallels:
3003
3004 seq -w 0 9999999 | \
3005 parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg
3006
3007 If -j0 normally spawns 252 jobs, then the above will try to spawn 2520
3008 jobs. On a normal GNU/Linux system you can spawn 32000 jobs using this
3009 technique with no problems. To raise the 32000 jobs limit raise
3010 /proc/sys/kernel/pid_max to 4194303.
3011
3012 If you do not need GNU parallel to have control over each job (so no
3013 need for --retries or --joblog or similar), then it can be even faster
3014 if you can generate the command lines and pipe those to a shell. So if
3015 you can do this:
3016
3017 mygenerator | sh
3018
3019 Then that can be parallelized like this:
3020
3021 mygenerator | parallel --pipe --block 10M sh
3022
3023 E.g.
3024
3025 mygenerator() {
3026 seq 10000000 | perl -pe 'print "echo This is fast job number "';
3027 }
3028 mygenerator | parallel --pipe --block 10M sh
3029
3030 The overhead is 100000 times smaller, namely around 100 nanoseconds
3031 per job.
3032
3034 When using shell variables you need to quote them correctly as they may
3035 otherwise be interpreted by the shell.
3036
3037 Notice the difference between:
3038
3039 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
3040 parallel echo ::: ${ARR[@]} # This is probably not what you want
3041
3042 and:
3043
3044 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
3045 parallel echo ::: "${ARR[@]}"
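The difference is ordinary shell word splitting, not anything GNU parallel does; it can be seen with set -- alone (bash assumed):

```shell
ARR=("My brother's records" Foo Bar)
set -- ${ARR[@]};   echo $#    # 5 -- the first element was split into words
set -- "${ARR[@]}"; echo $#    # 3 -- elements kept intact
```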
3046
3047 When using variables in the actual command that contains special
3048 characters (e.g. space) you can quote them using '"$VAR"' or using "'s
3049 and -q:
3050
3051 VAR="My brother's 12\" records are worth <\$\$\$>"
3052 parallel -q echo "$VAR" ::: '!'
3053 export VAR
3054 parallel echo '"$VAR"' ::: '!'
3055
3056 If $VAR does not contain ' then "'$VAR'" will also work (and does not
3057 need export):
3058
3059 VAR="My 12\" records are worth <\$\$\$>"
3060 parallel echo "'$VAR'" ::: '!'
3061
3062 If you use them in a function you just quote as you normally would do:
3063
3064 VAR="My brother's 12\" records are worth <\$\$\$>"
3065 export VAR
3066 myfunc() { echo "$VAR" "$1"; }
3067 export -f myfunc
3068 parallel myfunc ::: '!'
3069
3071 When running jobs that output data, you often do not want the output of
3072 multiple jobs to run together. GNU parallel defaults to grouping the
3073 output of each job, so the output is printed when the job finishes. If
3074 you want full lines to be printed while the job is running you can use
3075 --line-buffer. If you want output to be printed as soon as possible you
3076 can use -u.
3077
3078 Compare the output of:
3079
3080 parallel wget --limit-rate=100k \
3081 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3082 ::: {12..16}
3083 parallel --line-buffer wget --limit-rate=100k \
3084 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3085 ::: {12..16}
3086 parallel -u wget --limit-rate=100k \
3087 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3088 ::: {12..16}
3089
3091 GNU parallel groups the output lines, but it can be hard to see where
3092 the different jobs begin. --tag prepends the argument to make that more
3093 visible:
3094
3095 parallel --tag wget --limit-rate=100k \
3096 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3097 ::: {12..16}
3098
3099 --tag works with --line-buffer but not with -u:
3100
3101 parallel --tag --line-buffer wget --limit-rate=100k \
3102 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3103 ::: {12..16}
3104
3105 Check the uptime of the servers in ~/.parallel/sshloginfile:
3106
3107 parallel --tag -S .. --nonall uptime
3108
3110 Give each job a new color. Most terminals support ANSI colors with the
3111 escape code "\033[30;3Xm" where 0 <= X <= 7:
3112
3113 seq 10 | \
3114 parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
3115 parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
3116 --tagstring {color} seq {} ::: {1..10}
3117
3118 To get rid of the initial \t (which comes from --tagstring):
3119
3120 ... | perl -pe 's/\t//'
3121
3123 Normally the output of a job will be printed as soon as it completes.
3124 Sometimes you want the order of the output to remain the same as the
3125 order of the input. This is often important if the output is used as
3126 input for another system. -k will make sure the order of output will be
3127 in the same order as input even if later jobs end before earlier jobs.
3128
3129 Append a string to every line in a text file:
3130
3131 cat textfile | parallel -k echo {} append_string
3132
3133 If you remove -k some of the lines may come out in the wrong order.
3134
3135 Another example is traceroute:
3136
3137 parallel traceroute ::: qubes-os.org debian.org freenetproject.org
3138
3139 will give traceroute of qubes-os.org, debian.org and
3140 freenetproject.org, but it will be sorted according to which job
3141 completed first.
3142
3143 To keep the order the same as input run:
3144
3145 parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org
3146
3147 This will make sure the traceroute to qubes-os.org will be printed
3148 first.
3149
3150 A bit more complex example is downloading a huge file in chunks in
3151 parallel: Some internet connections will deliver more data if you
3152 download files in parallel. For downloading files in parallel see:
3153 "EXAMPLE: Download 10 images for each of the past 30 days". But if you
3154 are downloading a big file you can download the file in chunks in
3155 parallel.
3156
3157 To download byte 10000000-19999999 you can use curl:
3158
3159 curl -r 10000000-19999999 http://example.com/the/big/file >file.part
3160
3161 To download a 1 GB file we need 100 10MB chunks downloaded and combined
3162 in the correct order.
3163
3164 seq 0 99 | parallel -k curl -r \
3165 {}0000000-{}9999999 http://example.com/the/big/file > file
3166
3168 grep -r greps recursively through directories. On multicore CPUs GNU
3169 parallel can often speed this up.
3170
3171 find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
3172
3173 This will run 1.5 jobs per CPU, and give 1000 arguments to grep.
3174
3176 The simplest solution to grep a big file for a lot of regexps is:
3177
3178 grep -f regexps.txt bigfile
3179
3180 Or if the regexps are fixed strings:
3181
3182 grep -F -f regexps.txt bigfile
3183
3184 There are 3 limiting factors: CPU, RAM, and disk I/O.
3185
3186 RAM is easy to measure: If the grep process takes up most of your free
3187 memory (e.g. when running top), then RAM is a limiting factor.
3188
3189 CPU is also easy to measure: If the grep takes >90% CPU in top, then
3190 the CPU is a limiting factor, and parallelization will speed this up.
3191
3192 It is harder to see if disk I/O is the limiting factor, and depending
3193 on the disk system it may be faster or slower to parallelize. The only
3194 way to know for certain is to test and measure.
3195
3196 Limiting factor: RAM
3197 The normal grep -f regexps.txt bigfile works no matter the size of
3198 bigfile, but if regexps.txt is so big it cannot fit into memory, then
3199 you need to split this.
3200
3201 grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of
3202 RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it
3203 may be too big.
3204
3205 If you can convert your regexps into fixed strings, do that. E.g. if
3206 the lines you are looking for in bigfile all look like:
3207
3208 ID1 foo bar baz Identifier1 quux
3209 fubar ID2 foo bar baz Identifier2
3210
3211 then your regexps.txt can be converted from:
3212
3213 ID1.*Identifier1
3214 ID2.*Identifier2
3215
3216 into:
3217
3218 ID1 foo bar baz Identifier1
3219 ID2 foo bar baz Identifier2
3220
3221 This way you can use grep -F which takes around 80% less memory and is
3222 much faster.
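grep -F treats the pattern as a literal string rather than a regexp, which is why the converted lines above match directly; a quick check:

```shell
# The fixed string matches the line literally -- no regexp engine involved
printf 'ID1 foo bar baz Identifier1 quux\n' |
    grep -F 'ID1 foo bar baz Identifier1'
```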
3223
3224 If it still does not fit in memory you can do this:
3225
3226 parallel --pipepart -a regexps.txt --block 1M grep -Ff - -n bigfile | \
3227 sort -un | perl -pe 's/^\d+://'
3228
3229 The 1M should be your free memory divided by the number of CPU threads
3230 and divided by 200 for grep -F and by 1000 for normal grep. On
3231 GNU/Linux you can do:
3232
3233 free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
3234 END { print sum }' /proc/meminfo)
3235 percpu=$((free / 200 / $(parallel --number-of-threads)))k
3236
3237 parallel --pipepart -a regexps.txt --block $percpu --compress \
3238 grep -F -f - -n bigfile | \
3239 sort -un | perl -pe 's/^\d+://'
3240
3241 If you can live with duplicated lines and wrong order, it is faster to
3242 do:
3243
3244 parallel --pipepart -a regexps.txt --block $percpu --compress \
3245 grep -F -f - bigfile
3246
3247 Limiting factor: CPU
3248 If the CPU is the limiting factor, parallelization should be done on
3249 the regexps:
3250
3251 cat regexp.txt | parallel --pipe -L1000 --roundrobin --compress \
3252 grep -f - -n bigfile | \
3253 sort -un | perl -pe 's/^\d+://'
3254
3255 The command will start one grep per CPU and read bigfile one time per
3256 CPU, but as that is done in parallel, all reads except the first will
3257 be cached in RAM. Depending on the size of regexp.txt it may be faster
3258 to use --block 10m instead of -L1000.
3259
3260 Some storage systems perform better when reading multiple chunks in
3261 parallel. This is true for some RAID systems and for some network file
3262 systems. To parallelize the reading of bigfile:
3263
3264 parallel --pipepart --block 100M -a bigfile -k --compress \
3265 grep -f regexp.txt
3266
3267 This will split bigfile into 100MB chunks and run grep on each of these
3268 chunks. To parallelize both reading of bigfile and regexp.txt combine
3269 the two using --fifo:
3270
3271 parallel --pipepart --block 100M -a bigfile --fifo cat regexp.txt \
3272 \| parallel --pipe -L1000 --roundrobin grep -f - {}
3273
3274 If a line matches multiple regexps, the line may be duplicated.
3275
3276 Bigger problem
3277 If the problem is too big to be solved by this, you are probably ready
3278 for Lucene.
3279
3281 To run commands on a remote computer SSH needs to be set up and you
3282 must be able to login without entering a password (The commands ssh-
3283 copy-id, ssh-agent, and sshpass may help you do that).
3284
3285 If you need to login to a whole cluster, you typically do not want to
3286 accept the host key for every host. You want to accept them the first
3287 time and be warned if they are ever changed. To do that:
3288
3289 # Add the servers to the sshloginfile
3290 (echo servera; echo serverb) > .parallel/my_cluster
3291 # Make sure .ssh/config exist
3292 touch .ssh/config
3293 cp .ssh/config .ssh/config.backup
3294 # Disable StrictHostKeyChecking temporarily
3295 (echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
3296 parallel --slf my_cluster --nonall true
3297 # Remove the disabling of StrictHostKeyChecking
3298 mv .ssh/config.backup .ssh/config
3299
3300 The servers in .parallel/my_cluster are now added in .ssh/known_hosts.
3301
3302 To run echo on server.example.com:
3303
3304 seq 10 | parallel --sshlogin server.example.com echo
3305
3306 To run commands on more than one remote computer run:
3307
3308 seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo
3309
3310 Or:
3311
3312 seq 10 | parallel --sshlogin server.example.com \
3313 --sshlogin server2.example.net echo
3314
3315 If the login username is foo on server2.example.net use:
3316
3317 seq 10 | parallel --sshlogin server.example.com \
3318 --sshlogin foo@server2.example.net echo
3319
3320 If your list of hosts is server1-88.example.net with login foo:
3321
3322 seq 10 | parallel -Sfoo@server{1..88}.example.net echo
3323
3324 To distribute the commands to a list of computers, make a file
3325 mycomputers with all the computers:
3326
3327 server.example.com
3328 foo@server2.example.com
3329 server3.example.com
3330
3331 Then run:
3332
3333 seq 10 | parallel --sshloginfile mycomputers echo
3334
3335 To include the local computer add the special sshlogin ':' to the list:
3336
3337 server.example.com
3338 foo@server2.example.com
3339 server3.example.com
3340 :
3341
3342 GNU parallel will try to determine the number of CPUs on each of the
3343 remote computers, and run one job per CPU - even if the remote
3344 computers do not have the same number of CPUs.
3345
3346 If the number of CPUs on the remote computers is not identified
3347 correctly, the number of CPUs can be added in front. Here the computer
3348 has 8 CPUs.
3349
3350 seq 10 | parallel --sshlogin 8/server.example.com echo
3351
3353 To recompress gzipped files with bzip2 using a remote computer run:
3354
3355 find logs/ -name '*.gz' | \
3356 parallel --sshlogin server.example.com \
3357 --transfer "zcat {} | bzip2 -9 >{.}.bz2"
3358
3359 This will list the .gz-files in the logs directory and all directories
3360 below. Then it will transfer the files to server.example.com to the
3361 corresponding directory in $HOME/logs. On server.example.com the file
3362 will be recompressed using zcat and bzip2 resulting in the
3363 corresponding file with .gz replaced with .bz2.
3364
3365 If you want the resulting bz2-file to be transferred back to the local
3366 computer add --return {.}.bz2:
3367
3368 find logs/ -name '*.gz' | \
3369 parallel --sshlogin server.example.com \
3370 --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3371
3372 After the recompressing is done the .bz2-file is transferred back to
3373 the local computer and put next to the original .gz-file.
3374
3375 If you want to delete the transferred files on the remote computer add
3376 --cleanup. This will remove both the file transferred to the remote
3377 computer and the files transferred from the remote computer:
3378
3379 find logs/ -name '*.gz' | \
3380 parallel --sshlogin server.example.com \
3381 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3382
3383 If you want to run on several computers, add the computers to
3384 --sshlogin either using ',' or multiple --sshlogin:
3385
3386 find logs/ -name '*.gz' | \
3387 parallel --sshlogin server.example.com,server2.example.com \
3388 --sshlogin server3.example.com \
3389 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3390
3391 You can add the local computer using --sshlogin :. This will disable
3392 the removing and transferring for the local computer only:
3393
3394 find logs/ -name '*.gz' | \
3395 parallel --sshlogin server.example.com,server2.example.com \
3396 --sshlogin server3.example.com \
3397 --sshlogin : \
3398 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3399
3400 Often --transfer, --return and --cleanup are used together. They can be
3401 shortened to --trc:
3402
3403 find logs/ -name '*.gz' | \
3404 parallel --sshlogin server.example.com,server2.example.com \
3405 --sshlogin server3.example.com \
3406 --sshlogin : \
3407 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3408
3409 With the file mycomputers containing the list of computers it becomes:
3410
3411 find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
3412 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3413
3414 If the file ~/.parallel/sshloginfile contains the list of computers the
3415 special short hand -S .. can be used:
3416
3417 find logs/ -name '*.gz' | parallel -S .. \
3418 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3419
3421 Convert *.mp3 to *.ogg running one process per CPU on local computer
3422 and server2:
3423
3424 parallel --trc {.}.ogg -S server2,: \
3425 'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
3426
3428 To run the command uptime on remote computers you can do:
3429
3430 parallel --tag --nonall -S server1,server2 uptime
3431
3432 --nonall reads no arguments. If you have a list of jobs you want to run
3433 on each computer you can do:
3434
3435 parallel --tag --onall -S server1,server2 echo ::: 1 2 3
3436
3437 Remove --tag if you do not want the sshlogin added before the output.
3438
3439 If you have a lot of hosts use '-j0' to access more hosts in parallel.
3440
3442 If the workers are behind a NAT wall, you need some trickery to get to
3443 them.
3444
3445 If you can ssh to a jumphost, and reach the workers from there, then
3446 the obvious solution would be this, but it does not work:
3447
3448 parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
3449
3450 It does not work because the command is dequoted by ssh twice, whereas
3451 GNU parallel only expects it to be dequoted once.
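Each ssh hop strips one layer of quoting, so a command that must survive an extra hop needs an extra layer added first. bash's printf %q adds one such layer; a rough illustration of the round-trip (dequoting here is simulated with eval, not a real ssh hop):

```shell
cmd='echo a b'
q=$(printf '%q' "$cmd")    # e.g. echo\ a\ b -- survives one extra dequoting
eval "set -- $q"           # one dequoting pass restores the original string
echo "$1"
```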
3452
3453 You can use a bash function and have GNU parallel quote the command:
3454
3455 jumpssh() { ssh -A jumphost ssh $(parallel --shellquote ::: "$@"); }
3456 export -f jumpssh
3457 parallel --ssh jumpssh -S host1 echo ::: this works
3458
3459 Or you can instead put this in ~/.ssh/config:
3460
3461 Host host1 host2 host3
3462 ProxyCommand ssh jumphost.domain nc -w 1 %h 22
3463
3464 It requires nc (netcat) to be installed on jumphost. With this you can
3465 simply:
3466
3467 parallel -S host1,host2,host3 echo ::: This does work
3468
3469 No jumphost, but port forwards
3470 If there is no jumphost but each server has port 22 forwarded from the
3471 firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 =
3472 host2, 22003 = host3) then you can use ~/.ssh/config:
3473
3474 Host host1.v
3475 Port 22001
3476 Host host2.v
3477 Port 22002
3478 Host host3.v
3479 Port 22003
3480 Host *.v
3481 Hostname firewall
3482
3483 And then use host{1..3}.v as normal hosts:
3484
3485 parallel -S host1.v,host2.v,host3.v echo ::: a b c
3486
3487 No jumphost, no port forwards
3488 If ports cannot be forwarded, you need some sort of VPN to traverse the
3489 NAT-wall. TOR is one option for that, as it is very easy to get
3490 working.
3491
3492 You need to install TOR and setup a hidden service. In torrc put:
3493
3494 HiddenServiceDir /var/lib/tor/hidden_service/
3495 HiddenServicePort 22 127.0.0.1:22
3496
3497 Then start TOR: /etc/init.d/tor restart
3498
3499 The TOR hostname is now in /var/lib/tor/hidden_service/hostname and is
3500 something similar to izjafdceobowklhz.onion. Now you simply prepend
3501 torsocks to ssh:
3502
3503 parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
3504 -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
3505
3506 If not all hosts are accessible through TOR:
3507
3508 parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
3509 echo ::: a b c
3510
3511 See more ssh tricks on
3512 https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
3513
3515 rsync is a great tool, but sometimes it will not fill up the available
3516 bandwidth. Running multiple rsync in parallel can fix this.
3517
3518 cd src-dir
3519 find . -type f |
3520 parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
3521
3522 Adjust -j10 until you find the optimal number.
3523
3524 rsync -R will create the needed subdirectories, so all files are not
3525 put into a single dir. The ./ is needed so the resulting command looks
3526 similar to:
3527
3528 rsync -zR ././sub/dir/file fooserver:/dest-dir/
3529
3530 The /./ is what rsync -R works on.
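find prints paths with a leading ./, so prepending another ./ in the command yields the ././ marker; rsync -R then recreates everything after the /./ on the receiver. The find side of this can be checked on its own:

```shell
# find emits relative paths starting with ./ -- prefixing ./ once more
# gives the ././ form that rsync -R uses as its path anchor
d=$(mktemp -d)
mkdir -p "$d/sub/dir" && touch "$d/sub/dir/file"
cd "$d" && find . -type f    # ./sub/dir/file
```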
3531
3532 If you are unable to push data, but need to pull them, and the files
3533 are called digits.png (e.g. 000000.png) you might be able to do:
3534
3535 seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
3536
3538 Copy files like foo.es.ext to foo.ext:
3539
3540 ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
3541
3542 The perl command spits out 2 lines for each input. GNU parallel takes 2
3543 inputs (using -N2) and replaces {1} and {2} with the inputs.
3544
3545 Count in binary:
3546
3547 parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
3548
3549 Print the number on the opposing sides of a six sided die:
3550
3551 parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
3552 parallel --link echo :::: <(seq 6) <(seq 6 -1 1)
3553
3554 Convert files from all subdirs to PNG-files with consecutive numbers
3555 (useful for making input PNG's for ffmpeg):
3556
3557 parallel --link -a <(find . -type f | sort) \
3558 -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
3559
3560 Alternative version:
3561
3562 find . -type f | sort | parallel convert {} {#}.png
3563
3565 Content of table_file.tsv:
3566
3567 foo<TAB>bar
3568 baz <TAB> quux
3569
3570 To run:
3571
3572 cmd -o bar -i foo
3573 cmd -o quux -i baz
3574
3575 you can run:
3576
3577 parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
3578
3579 Note: The default for GNU parallel is to remove the spaces around the
3580 columns. To keep the spaces:
3581
3582 parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
3583
3585 GNU parallel can output to a database table and a CSV-file:
3586
3587 dburl=csv:///%2Ftmp%2Fmydir
3588 dbtableurl=$dburl/mytable.csv
3589 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3590
3591 It is rather slow and takes up a lot of CPU time because GNU parallel
3592 parses the whole CSV file for each update.
3593
3594 A better approach is to use an SQLite database and then convert that to
3595 CSV:
3596
3597 dburl=sqlite3:///%2Ftmp%2Fmy.sqlite
3598 dbtableurl=$dburl/mytable
3599 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3600 sql $dburl '.headers on' '.mode csv' 'SELECT * FROM mytable;'
3601
3602 This takes around a second per job.
3603
3604 If you have access to a real database system, such as PostgreSQL, it is
3605 even faster:
3606
3607 dburl=pg://user:pass@host/mydb
3608 dbtableurl=$dburl/mytable
3609 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3610 sql $dburl \
3611 "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
3612
3613 Or MySQL:
3614
3615 dburl=mysql://user:pass@host/mydb
3616 dbtableurl=$dburl/mytable
3617 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3618 sql -p -B $dburl "SELECT * FROM mytable;" > mytable.tsv
3619 perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/;
3620 %s=("\\" => "\\", "t" => "\t", "n" => "\n");
3621 s/\\([\\tn])/$s{$1}/g;' mytable.tsv
3622
3623EXAMPLE: Output to CSV-file for R
3624 If you have no need for the advanced job distribution control that a
3625 database provides, but you simply want output into a CSV file that you
3626 can read into R or LibreCalc, then you can use --results:
3627
3628 parallel --results my.csv seq ::: 10 20 30
3629 R
3630 > mydf <- read.csv("my.csv");
3631 > print(mydf[2,])
3632 > write(as.character(mydf[2,c("Stdout")]),'')
3633
3634EXAMPLE: Use XML as input
3635 The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
3636 podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
3637
3638 Using xpath you can extract the URLs for 2019 and download them using
3639 GNU parallel:
3640
3641 wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
3642 xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
3643 parallel -u wget '{= s/ url="//; s/"//; =}'
3644
3645EXAMPLE: Run the same command 10 times
3646 If you want to run the same command with the same arguments 10 times in
3647 parallel you can do:
3648
3649 seq 10 | parallel -n0 my_command my_args
3650
3651EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
3652 GNU parallel can work similarly to cat | sh.
3653
3654 A resource inexpensive job is a job that takes very little CPU, disk
3655 I/O and network I/O. Ping is an example of a resource inexpensive job.
3656 wget is too - if the webpages are small.
3657
3658 The content of the file jobs_to_run:
3659
3660 ping -c 1 10.0.0.1
3661 wget http://example.com/status.cgi?ip=10.0.0.1
3662 ping -c 1 10.0.0.2
3663 wget http://example.com/status.cgi?ip=10.0.0.2
3664 ...
3665 ping -c 1 10.0.0.255
3666 wget http://example.com/status.cgi?ip=10.0.0.255
3667
3668 To run 100 processes simultaneously do:
3669
3670 parallel -j 100 < jobs_to_run
3671
3672 As no command is given, the lines will be evaluated by the shell.
3673
3674EXAMPLE: Call program with FASTA sequence
3675 FASTA files have the format:
3676
3677 >Sequence name1
3678 sequence
3679 sequence continued
3680 >Sequence name2
3681 sequence
3682 sequence continued
3683 more sequence
3684
3685 To call myprog with the sequence as argument run:
3686
3687 cat file.fasta |
3688 parallel --pipe -N1 --recstart '>' --rrs \
3689 'read a; echo Name: "$a"; myprog $(tr -d "\n")'
3690
3691EXAMPLE: Processing a big file using more CPUs
3692 To process a big file or some output you can use --pipe to split up the
3693 data into blocks and pipe the blocks into the processing program.
3694
3695 If the program is gzip -9 you can do:
3696
3697 cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
3698
3699 This will split bigfile into blocks of 1 MB and pass that to gzip -9 in
3700 parallel. One gzip will be run per CPU. The output of gzip -9 will be
3701 kept in order and saved to bigfile.gz
3702
3703 gzip works fine if the output is appended, but some processing does not
3704 work like that - for example sorting. For this GNU parallel can put the
3705 output of each command into a file. This will sort a big file in
3706 parallel:
3707
3708 cat bigfile | parallel --pipe --files sort |\
3709 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3710
3711 Here bigfile is split into blocks of around 1MB, each block ending in
3712 '\n' (which is the default for --recend). Each block is passed to sort
3713 and the output from sort is saved into files. These files are passed to
3714 the second parallel that runs sort -m on the files before it removes
3715 the files. The output is saved to bigfile.sort.
3716
3717 GNU parallel's --pipe maxes out at around 100 MB/s because every byte
3718 has to be copied through GNU parallel. But if bigfile is a real
3719 (seekable) file GNU parallel can bypass the copying and send the parts
3720 directly to the program:
3721
3722 parallel --pipepart --block 100m -a bigfile --files sort |\
3723 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3724
3725EXAMPLE: Grouping input lines
3726 When processing with --pipe you may have lines grouped by a value. Here
3727 is my.csv:
3728
3729 Transaction Customer Item
3730 1 a 53
3731 2 b 65
3732 3 b 82
3733 4 c 96
3734 5 c 67
3735 6 c 13
3736 7 d 90
3737 8 d 43
3738 9 d 91
3739 10 d 84
3740 11 e 72
3741 12 e 102
3742 13 e 63
3743 14 e 56
3744 15 e 74
3745
3746 Let us assume you want GNU parallel to process each customer. In other
3747 words: You want all the transactions for a single customer to be
3748 treated as a single record.
3749
3750 To do this we preprocess the data with a program that inserts a record
3751 separator before each customer (column 2 = $F[1]). Here we first make a
3752 50 character random string, which we then use as the separator:
3753
3754 sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
3755 cat my.csv | \
3756 perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
3757 parallel --recend $sep --rrs --pipe -N1 wc
3758
3759 If your program can process multiple customers, replace -N1 with a
3760 reasonable --blocksize.
3761
3762EXAMPLE: Running more than 250 jobs workaround
3763 If you need to run a massive number of jobs in parallel, you will
3764 likely hit the filehandle limit, which is often around 250 jobs. If
3765 you are superuser you can raise the limit in /etc/security/limits.conf
3766 but you can also use this workaround. The filehandle limit is per process.
3767 That means that if you just spawn more GNU parallels then each of them
3768 can run 250 jobs. This will spawn up to 2500 jobs:
3769
3770 cat myinput |\
3771 parallel --pipe -N 50 --roundrobin -j50 parallel -j50 your_prg
3772
3773 This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
3774 to do this, and you may need to increase /proc/sys/kernel/pid_max):
3775
3776 cat myinput |\
3777 parallel --pipe -N 250 --roundrobin -j250 parallel -j250 your_prg
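
The limit that caps the job count is the shell's soft limit on open filehandles, which you can inspect before choosing the nesting factors (the numbers below mirror the first example above; your limit will differ):

```shell
# Print the per-process soft limit on open filehandles; each running
# job held by GNU parallel consumes several of these.
ulimit -n

# Two nested levels of GNU parallel multiply their jobslots:
outer=50
inner=50
echo "jobs spawned at most: $((outer * inner))"
```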
3778
3779EXAMPLE: Working as mutex and counting semaphore
3780 The command sem is an alias for parallel --semaphore.
3781
3782 A counting semaphore will allow a given number of jobs to be started in
3783 the background. When that number of jobs is running in the background,
3784 GNU sem will wait for one of these to complete before starting another
3785 command. sem --wait will wait for all jobs to complete.
3786
3787 Run 10 jobs concurrently in the background:
3788
3789 for i in *.log ; do
3790 echo $i
3791 sem -j10 gzip $i ";" echo done
3792 done
3793 sem --wait
3794
3795 A mutex is a counting semaphore allowing only one job to run. This
3796 will edit the file myfile, prepending it with the numbers 1 to 3, one
3797 per line:
3798
3799 seq 3 | parallel sem sed -i -e '1i{}' myfile
3800
3801 As myfile can be very big it is important that only one process
3802 edits the file at a time.
3803
3804 Name the semaphore to have multiple different semaphores active at the
3805 same time:
3806
3807 seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
3808
3809EXAMPLE: Mutex for a script
3810 Assume a script is called from cron or from a web service, but only one
3811 instance can be run at a time. With sem and --shebang-wrap the script
3812 can be made to wait for other instances to finish. Here in bash:
3813
3814 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
3815
3816 echo This will run
3817 sleep 5
3818 echo exclusively
3819
3820 Here perl:
3821
3822 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl
3823
3824 print "This will run ";
3825 sleep 5;
3826 print "exclusively\n";
3827
3828 Here python:
3829
3830 #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python
3831
3832 import time
3833 print("This will run ")
3834 time.sleep(5)
3835 print("exclusively")
3836
3837EXAMPLE: Start editor with filenames from stdin (standard input)
3838 You can use GNU parallel to start interactive programs like emacs or
3839 vi:
3840
3841 cat filelist | parallel --tty -X emacs
3842 cat filelist | parallel --tty -X vi
3843
3844 If there are more files than will fit on a single command line, the
3845 editor will be started again with the remaining files.
3846
3847EXAMPLE: Running sudo
3848 sudo requires a password to run a command as root. It caches the
3849 access, so you only need to enter the password again if you have not
3850 used sudo for a while.
3851
3852 The command:
3853
3854 parallel sudo echo ::: This is a bad idea
3855
3856 is no good, as you would be prompted for the sudo password for each of
3857 the jobs. You can either do:
3858
3859 sudo echo This
3860 parallel sudo echo ::: is a good idea
3861
3862 or:
3863
3864 sudo parallel echo ::: This is a good idea
3865
3866 This way you only have to enter the sudo password once.
3867
3868EXAMPLE: GNU Parallel as queue system/batch manager
3869 GNU parallel can work as a simple job queue system or batch manager.
3870 The idea is to put the jobs into a file and have GNU parallel read from
3871 that continuously. As GNU parallel will stop at end of file we use tail
3872 to continue reading:
3873
3874 true >jobqueue; tail -n+0 -f jobqueue | parallel
3875
3876 To submit your jobs to the queue:
3877
3878 echo my_command my_arg >> jobqueue
3879
3880 You can of course use -S to distribute the jobs to remote computers:
3881
3882 true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
3883
3884 If you keep this running for a long time, jobqueue will grow. A way of
3885 removing the jobs already run is by making GNU parallel stop when it
3886 hits a special value and then restart. To use --eof to make GNU
3887 parallel exit, tail also needs to be forced to exit:
3888
3889 true >jobqueue;
3890 while true; do
3891 tail -n+0 -f jobqueue |
3892 (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
3893 perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
3894 (seq 1000 >> jobqueue &);
3895 echo Done appending dummy data forcing tail to exit)
3896 echo tail exited;
3897 mv j2 jobqueue
3898 done
3899
3900 In some cases you can run on more CPUs and computers during the night:
3901
3902 # Day time
3903 echo 50% > jobfile
3904 cp day_server_list ~/.parallel/sshloginfile
3905 # Night time
3906 echo 100% > jobfile
3907 cp night_server_list ~/.parallel/sshloginfile
3908 tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..
3909
3910 GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
3911
3912 There is a small issue when using GNU parallel as queue system/batch
3913 manager: You have to submit as many jobs as there are jobslots before
3914 they will start, and after that you can submit one at a time; a job
3915 will start immediately if a free slot is available. Output from the
3916 running or completed jobs is held back and will only be printed when
3917 that many more jobs have been started (unless you use --ungroup or
3918 --line-buffer, in which case the output from the jobs is printed
3919 immediately). E.g. if you have 10 jobslots then the output from the
3920 first completed job will only be printed when job 11 has started, and
3921 the output of the second completed job only when job 12 has started.
3922
3923EXAMPLE: GNU Parallel as dir processor
3924 If you have a dir in which users drop files that need to be
3925 processed you can do this on GNU/Linux (if you know what inotifywait
3926 is called on other platforms, file a bug report):
3927
3928 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3929 parallel -u echo
3930
3931 This will run the command echo on each file put into my_dir or subdirs
3932 of my_dir.
3933
3934 You can of course use -S to distribute the jobs to remote computers:
3935
3936 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3937 parallel -S .. -u echo
3938
3939 If the files to be processed are in a tar file then unpacking one file
3940 and processing it immediately may be faster than first unpacking all
3941 files. Set up the dir processor as above and unpack into the dir.
3942
3943 Using GNU parallel as dir processor has the same limitations as using
3944 GNU parallel as queue system/batch manager.
3945
3946EXAMPLE: Locate the missing package
3947 If you have downloaded source and tried compiling it, you may have
3948 seen:
3949
3950 $ ./configure
3951 [...]
3952 checking for something.h... no
3953 configure: error: "libsomething not found"
3954
3955 Often it is not obvious which package you should install to get that
3956 file. Debian has `apt-file` to search for a file. `tracefile` from
3957 https://gitlab.com/ole.tange/tangetools can tell which files a program
3958 tried to access. In this case we are interested in one of the last
3959 files:
3960
3961 $ tracefile -un ./configure | tail | parallel -j0 apt-file search
3962
3963SPREADING BLOCKS OF DATA
3964 --round-robin, --pipe-part, --shard, --bin and --group-by are all
3965 specialized versions of --pipe.
3966
3967 In the following n is the number of jobslots given by --jobs. A record
3968 starts with --recstart and ends with --recend. It is typically a full
3969 line. A chunk is a number of full records that is approximately the
3970 size of a block. A block can contain half records, a chunk cannot.
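
The chunk idea, a byte-size cap that still respects record boundaries, is also what GNU coreutils' split -C implements for line records, which makes for a quick standalone illustration (the file names and sizes are arbitrary):

```shell
# A file of full-line records.
seq 1 500 > /tmp/records.txt

# Pieces of at most 1000 bytes that never split a line: each piece is
# a "chunk"; a raw 1000-byte "block" could end mid-record.
rm -f /tmp/chunk.*
split -C 1000 /tmp/records.txt /tmp/chunk.

# The chunks concatenate back to the original - no half records.
cat /tmp/chunk.* | cmp - /tmp/records.txt && echo "only full records"
```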
3971
3972 --pipe starts one job per chunk. It reads blocks from stdin (standard
3973 input). It finds a record end near a block border and passes a chunk to
3974 the program.
3975
3976 --pipe-part starts one job per chunk - just like normal --pipe. It
3977 first finds record endings near all block borders in the file and then
3978 starts the jobs. By using --block -1 it will set the block size to 1/n
3979 * size-of-file. Used this way it will start n jobs in total.
3980
3981 --round-robin starts n jobs in total. It reads a block and passes a
3982 chunk to whichever job is ready to read. It does not parse the content
3983 except for identifying where a record ends to make sure it only passes
3984 full records.
3985
3986 --shard starts n jobs in total. It parses each line to read the value
3987 in the given column. Based on this value the line is passed to one of
3988 the n jobs. All lines having this value will be given to the same
3989 jobslot.
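
The guarantee --shard makes can be sketched in a few lines of awk: derive a jobslot from the column value in a stable way, so identical values always land in the same slot. The 3 slots and the toy hash below are illustrations, not GNU parallel's actual hash function:

```shell
# Send each line to one of 3 slots based on column 1; identical
# values always map to the same slot, which is the --shard guarantee.
# 'a' lines go to slot 2, 'b' to slot 3, 'c' to slot 1.
printf 'a 1\nb 2\na 3\nc 4\nb 5\n' |
  awk '{
    # toy hash: position of each letter in the alphabet, summed
    n = 0
    for (i = 1; i <= length($1); i++)
      n += index("abcdefghijklmnopqrstuvwxyz", substr($1, i, 1))
    print "slot", n % 3 + 1, ":", $0
  }'
```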
3990
3991 --bin works like --shard but the value of the column is the jobslot
3992 number it will be passed to. If the value is bigger than n, then n will
3993 be subtracted from the value until the value is smaller than or equal
3994 to n.
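
The repeated subtraction is just a 1-based modulo, easy to check with shell arithmetic (n=4 jobslots is a made-up example):

```shell
# --bin maps a column value v to a jobslot in 1..n: subtract n until
# the value is <= n, which is the same as ((v - 1) % n) + 1.
# value 5 wraps to jobslot 1, value 7 to jobslot 3.
n=4
for v in 1 4 5 7 9; do
  echo "value $v -> jobslot $(( (v - 1) % n + 1 ))"
done
```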
3995
3996 --group-by starts one job per chunk. Record borders are not given by
3997 --recend/--recstart. Instead a record is defined by a number of lines
3998 having the same value in a given column. So the value of a given column
3999 changes at a chunk border. With --pipe every line is parsed, with
4000 --pipe-part only a few lines are parsed to find the chunk border.
4001
4002 --group-by can be combined with --round-robin or --pipe-part.
4003
4004QUOTING
4005 GNU parallel is very liberal in quoting. You only need to quote
4006 characters that have special meaning in shell:
4007
4008 ( ) $ ` ' " < > ; | \
4009
4010 and depending on context these need to be quoted, too:
4011
4012 ~ & # ! ? space * {
4013
4014 Therefore most people will never need more quoting than putting '\' in
4015 front of the special characters.
4016
4017 Often you can simply put \' around every ':
4018
4019 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
4020
4021 can be quoted:
4022
4023 parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\' ::: file
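
If hand-placing the \' pairs gets error-prone, bash (assumed here; this is not a GNU parallel feature) can produce the quoting mechanically with printf %q. The result survives exactly one extra shell evaluation, which is what the sub shell adds:

```shell
# The string we want the sub shell to see, stored unquoted.
wanted='/^\S+\s+\S+$/ and print $ARGV,"\n"'

# Ask bash to add one layer of shell quoting.
quoted=$(printf '%q' "$wanted")

# Round trip: one eval (one shell level) must give the original back.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$wanted" ] && echo "one quoting level added"
```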
4024
4025 However, when you want to use a shell variable you need to quote the
4026 $-sign. Here is an example using $PARALLEL_SEQ. This variable is set by
4027 GNU parallel itself, so the evaluation of the $ must be done by the sub
4028 shell started by GNU parallel:
4029
4030 seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}
4031
4032 If the variable is set before GNU parallel starts you can do this:
4033
4034 VAR=this_is_set_before_starting
4035 echo test | parallel echo {} $VAR
4036
4037 Prints: test this_is_set_before_starting
4038
4039 It is a little more tricky if the variable contains more than one space
4040 in a row:
4041
4042 VAR="two spaces between each word"
4043 echo test | parallel echo {} \'"$VAR"\'
4044
4045 Prints: test two spaces between each word
4046
4047 If the variable should not be evaluated by the shell starting GNU
4048 parallel but be evaluated by the sub shell started by GNU parallel,
4049 then you need to quote it:
4050
4051 echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR
4052
4053 Prints: test this_is_set_after_starting
4054
4055 It is a little more tricky if the variable contains space:
4056
4057 echo test |\
4058 parallel VAR='"two spaces between each word"' echo {} \'"$VAR"\'
4059
4060 Prints: test two spaces between each word
4061
4062 $$ is the shell variable containing the process id of the shell. This
4063 will print the process id of the shell running GNU parallel:
4064
4065 seq 10 | parallel echo $$
4066
4067 And this will print the process ids of the sub shells started by GNU
4068 parallel.
4069
4070 seq 10 | parallel echo \$\$
4071
4072 If the special characters should not be evaluated by the sub shell then
4073 you need to protect it against evaluation from both the shell starting
4074 GNU parallel and the sub shell:
4075
4076 echo test | parallel echo {} \\\$VAR
4077
4078 Prints: test $VAR
4079
4080 GNU parallel can protect against evaluation by the sub shell by using
4081 -q:
4082
4083 echo test | parallel -q echo {} \$VAR
4084
4085 Prints: test $VAR
4086
4087 This is particularly useful if you have lots of quoting. If you want to
4088 run a perl script like this:
4089
4090 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
4091
4092 It needs to be quoted like one of these:
4093
4094 ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'
4095 ls | parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\'
4096
4097 Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU parallel
4098 can do the quoting by using option -q:
4099
4100 ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'
4101
4102 However, this means you cannot make the sub shell interpret special
4103 characters. For example because of -q this WILL NOT WORK:
4104
4105 ls *.gz | parallel -q "zcat {} >{.}"
4106 ls *.gz | parallel -q "zcat {} | bzip2 >{.}.bz2"
4107
4108 because > and | need to be interpreted by the sub shell.
4109
4110 If you get errors like:
4111
4112 sh: -c: line 0: syntax error near unexpected token
4113 sh: Syntax error: Unterminated quoted string
4114 sh: -c: line 0: unexpected EOF while looking for matching `''
4115 sh: -c: line 1: syntax error: unexpected end of file
4116 zsh:1: no matches found:
4117
4118 then you might try using -q.
4119
4120 If you are using bash process substitution like <(cat foo) then you may
4121 try -q and prepend the command with bash -c:
4122
4123 ls | parallel -q bash -c 'wc -c <(echo {})'
4124
4125 Or for substituting output:
4126
4127 ls | parallel -q bash -c \
4128 'tar c {} | tee >(gzip >{}.tar.gz) | bzip2 >{}.tar.bz2'
4129
4130 Conclusion: To avoid dealing with the quoting problems it may be easier
4131 just to write a small script or a function (remember to export -f the
4132 function) and have GNU parallel call that.
4133
4134LIST RUNNING JOBS
4135 If you want a list of the jobs currently running you can run:
4136
4137 killall -USR1 parallel
4138
4139 GNU parallel will then print the currently running jobs on stderr
4140 (standard error).
4141
4142COMPLETE RUNNING JOBS BUT DO NOT START NEW JOBS
4143 If you regret starting a lot of jobs you can simply break GNU parallel,
4144 but if you want to make sure you do not have half-completed jobs you
4145 should send the signal SIGHUP to GNU parallel:
4146
4147 killall -HUP parallel
4148
4149 This will tell GNU parallel to not start any new jobs, but wait until
4150 the currently running jobs are finished before exiting.
4151
4152ENVIRONMENT VARIABLES
4153 $PARALLEL_HOME
4154 Dir where GNU parallel stores config files, semaphores, and
4155 caches information between invocations. Default:
4156 $HOME/.parallel.
4157
4158 $PARALLEL_JOBSLOT
4159 Set by GNU parallel and can be used in jobs run by GNU
4160 parallel. Remember to quote the $, so it gets evaluated by
4161 the correct shell. Or use --plus and {slot}.
4162
4163 $PARALLEL_JOBSLOT is the jobslot of the job. It is equal to
4164 {%} unless the job is being retried. See {%} for details.
4165
4166 $PARALLEL_PID
4167 Set by GNU parallel and can be used in jobs run by GNU
4168 parallel. Remember to quote the $, so it gets evaluated by
4169 the correct shell.
4170
4171 This makes it possible for the jobs to communicate directly to
4172 GNU parallel.
4173
4174 Example: If each of the jobs tests a solution and one of the jobs
4175 finds the solution the job can tell GNU parallel not to start
4176 more jobs by: kill -HUP $PARALLEL_PID. This only works on the
4177 local computer.
4178
4179 $PARALLEL_RSYNC_OPTS
4180 Options to pass on to rsync. Defaults to: -rlDzR.
4181
4182 $PARALLEL_SHELL
4183 Use this shell for the commands run by GNU parallel:
4184
4185 · $PARALLEL_SHELL. If undefined use:
4186
4187 · The shell that started GNU parallel. If that cannot be
4188 determined:
4189
4190 · $SHELL. If undefined use:
4191
4192 · /bin/sh
4193
4194 $PARALLEL_SSH
4195 GNU parallel defaults to using the ssh command for remote
4196 access. This can be overridden with $PARALLEL_SSH, which again
4197 can be overridden with --ssh. It can also be set on a per
4198 server basis (see --sshlogin).
4199
4200 $PARALLEL_SSHHOST
4201 Set by GNU parallel and can be used in jobs run by GNU
4202 parallel. Remember to quote the $, so it gets evaluated by
4203 the correct shell. Or use --plus and {host}.
4204
4205 $PARALLEL_SSHHOST is the host part of an sshlogin line. E.g.
4206
4207 4//usr/bin/specialssh user@host
4208
4209 becomes:
4210
4211 host
4212
4213 $PARALLEL_SSHLOGIN
4214 Set by GNU parallel and can be used in jobs run by GNU
4215 parallel. Remember to quote the $, so it gets evaluated by
4216 the correct shell. Or use --plus and {sshlogin}.
4217
4218 The value is the sshlogin line with number of cores removed.
4219 E.g.
4220
4221 4//usr/bin/specialssh user@host
4222
4223 becomes:
4224
4225 /usr/bin/specialssh user@host
4226
4227 $PARALLEL_SEQ
4228 Set by GNU parallel and can be used in jobs run by GNU
4229 parallel. Remember to quote the $, so it gets evaluated by
4230 the correct shell.
4231
4232 $PARALLEL_SEQ is the sequence number of the job running.
4233
4234 Example:
4235
4236 seq 10 | parallel -N2 \
4237 echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}
4238
4239 {#} is a shorthand for $PARALLEL_SEQ.
4240
4241 $PARALLEL_TMUX
4242 Path to tmux. If unset the tmux in $PATH is used.
4243
4244 $TMPDIR Directory for temporary files. See: --tmpdir.
4245
4246 $PARALLEL
4247 The environment variable $PARALLEL will be used as default
4248 options for GNU parallel. If the variable contains special
4249 shell characters (e.g. $, *, or space) then these need to be
4250 escaped with \.
4251
4252 Example:
4253
4254 cat list | parallel -j1 -k -v ls
4255 cat list | parallel -j1 -k -v -S"myssh user@server" ls
4256
4257 can be written as:
4258
4259 cat list | PARALLEL="-kvj1" parallel ls
4260 cat list | PARALLEL='-kvj1 -S myssh\ user@server' \
4261 parallel echo
4262
4263 Notice the \ after 'myssh' is needed because 'myssh' and
4264 'user@server' must be one argument.
4265
4266DEFAULT PROFILE (CONFIG FILE)
4267 The global configuration file /etc/parallel/config, followed by user
4268 configuration file ~/.parallel/config (formerly known as .parallelrc)
4269 will be read in turn if they exist. Lines starting with '#' will be
4270 ignored. The format can follow that of the environment variable
4271 $PARALLEL, but it is often easier to simply put each option on its own
4272 line.
4273
4274 Options on the command line take precedence, followed by the
4275 environment variable $PARALLEL, user configuration file
4276 ~/.parallel/config, and finally the global configuration file
4277 /etc/parallel/config.
4278
4279 Note that no file that is read for options, nor the environment
4280 variable $PARALLEL, may contain retired options such as --tollef.
4281
4282PROFILE FILES
4283 If --profile is set, GNU parallel will read the profile from that file
4284 rather than the global or user configuration files. You can have
4285 multiple --profiles.
4286
4287 Profiles are searched for in ~/.parallel. If the name starts with / it
4288 is seen as an absolute path. If the name starts with ./ it is seen as a
4289 relative path from current dir.
4290
4291 Example: Profile for running a command on every sshlogin in
4292 ~/.ssh/sshlogins and prepend the output with the sshlogin:
4293
4294 echo --tag -S .. --nonall > ~/.parallel/n
4295 parallel -Jn uptime
4296
4297 Example: Profile for running every command with -j-1 and nice
4298
4299 echo -j-1 nice > ~/.parallel/nice_profile
4300 parallel -J nice_profile bzip2 -9 ::: *
4301
4302 Example: Profile for running a perl script before every command:
4303
4304 echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" \
4305 > ~/.parallel/pre_perl
4306 parallel -J pre_perl echo ::: *
4307
4308 Note how the $ and " need to be quoted using \.
4309
4310 Example: Profile for running distributed jobs with nice on the remote
4311 computers:
4312
4313 echo -S .. nice > ~/.parallel/dist
4314 parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *
4315
4316EXIT STATUS
4317 Exit status depends on --halt-on-error if one of these is used:
4318 success=X, success=Y%, fail=Y%.
4319
4320 0 All jobs ran without error. If success=X is used: X jobs ran
4321 without error. If success=Y% is used: Y% of the jobs ran without
4322 error.
4323
4324 1-100 Some of the jobs failed. The exit status gives the number of
4325 failed jobs. If Y% is used the exit status is the percentage of
4326 jobs that failed.
4327
4328 101 More than 100 jobs failed.
4329
4330 255 Other error.
4331
4332 -1 (In joblog and SQL table)
4333 Killed by Ctrl-C, timeout, not enough memory or similar.
4334
4335 -2 (In joblog and SQL table)
4336 skip() was called in {= =}.
4337
4338 -1000 (In SQL table)
4339 Job is ready to run (set by --sqlmaster).
4340
4341 -1220 (In SQL table)
4342 Job is taken by worker (set by --sqlworker).
4343
4344 If fail=1 is used, the exit status will be the exit status of the
4345 failing job.
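
The 0-101 convention is easy to mimic in plain shell, which also shows why the status caps at 101 (this is an illustration of the convention, not GNU parallel's code):

```shell
# Count failed jobs and use the count, capped at 101, as exit status -
# the same convention GNU parallel's default mode follows.
fails=0
for job in true false true false true; do
  "$job" || fails=$((fails + 1))
done
[ "$fails" -gt 101 ] && fails=101
echo "exit status: $fails"   # two of the five jobs failed: 2
```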
4346
4347DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
4348 See: man parallel_alternatives
4349
4350BUGS
4351 Quoting of newline
4352 Because of the way newline is quoted this will not work:
4353
4354 echo 1,2,3 | parallel -vkd, "echo 'a{}b'"
4355
4356 However, these will all work:
4357
4358 echo 1,2,3 | parallel -vkd, echo a{}b
4359 echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
4360 echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"
4361
4362 Speed
4363 Startup
4364
4365 GNU parallel is slow at starting up - around 250 ms the first time and
4366 150 ms after that.
4367
4368 Job startup
4369
4370 Starting a job on the local machine takes around 10 ms. This can be a
4371 big overhead if the job takes very few ms to run. Often you can group
4372 small jobs together using -X which will make the overhead less
4373 significant. Or you can run multiple GNU parallels as described in
4374 EXAMPLE: Speeding up fast jobs.
4375
4376 SSH
4377
4378 When using multiple computers GNU parallel opens ssh connections to
4379 them to figure out how many connections can be used reliably
4380 simultaneously (namely sshd's MaxStartups). This test is done
4381 serially for each host, so if your --sshloginfile contains many hosts
4382 it may be slow.
4383
4384 If your jobs are short you may see that there are fewer jobs running on
4385 the remote systems than expected. This is due to time spent logging in
4386 and out. -M may help here.
4387
4388 Disk access
4389
4390 A single disk can normally read data faster if it reads one file at a
4391 time instead of reading a lot of files in parallel, as this will avoid
4392 disk seeks. However, newer disk systems with multiple drives can read
4393 faster if reading from multiple files in parallel.
4394
4395 If the jobs are of the form read-all-compute-all-write-all, so
4396 everything is read before anything is written, it may be faster to
4397 force only one disk access at a time:
4398
4399 sem --id diskio cat file | compute | sem --id diskio cat > file
4400
4401 If the jobs are of the form read-compute-write, so writing starts
4402 before all reading is done, it may be faster to force only one reader
4403 and writer at a time:
4404
4405 sem --id read cat file | compute | sem --id write cat > file
4406
4407 If the jobs are of the form read-compute-read-compute, it may be faster
4408 to run more jobs in parallel than the system has CPUs, as some of the
4409 jobs will be stuck waiting for disk access.
4410
4411 --nice limits command length
4412 The current implementation of --nice is too pessimistic in the max
4413 allowed command length. It only uses a little more than half of what it
4414 could. This affects -X and -m. If this becomes a real problem for you,
4415 file a bug-report.
4416
4417 Aliases and functions do not work
4418 If you get:
4419
4420 Can't exec "command": No such file or directory
4421
4422 or:
4423
4424 open3: exec of by command failed
4425
4426 or:
4427
4428 /bin/bash: command: command not found
4429
4430 it may be because command is not known, but it could also be because
4431 command is an alias or a function. If it is a function you need to
4432 export -f the function first or use env_parallel. An alias will only
4433 work if you use env_parallel.
4434
4435 Database with MySQL fails randomly
4436 The --sql* options may fail randomly with MySQL. This problem does not
4437 exist with PostgreSQL.
4438
4439REPORTING BUGS
4440 Report bugs to <bug-parallel@gnu.org> or
4441 https://savannah.gnu.org/bugs/?func=additem&group=parallel
4442
4443 See a perfect bug report on
4444 https://lists.gnu.org/archive/html/bug-parallel/2015-01/msg00000.html
4445
4446 Your bug report should always include:
4447
4448 · The error message you get (if any). If the error message is not from
4449 GNU parallel you need to show why you think GNU parallel caused
4450 it.
4451
4452 · The complete output of parallel --version. If you are not running the
4453 latest released version (see http://ftp.gnu.org/gnu/parallel/) you
4454 should specify why you believe the problem is not fixed in that
4455 version.
4456
4457 · A minimal, complete, and verifiable example (See description on
4458 http://stackoverflow.com/help/mcve).
4459
4460 It should be a complete example that others can run that shows the
4461 problem including all files needed to run the example. This should
4462 preferably be small and simple, so try to remove as many options as
4463 possible. A combination of yes, seq, cat, echo, wc, and sleep can
4464 reproduce most errors. If your example requires large files, see if
4465 you can make them with something like seq 100000000 > bigfile or yes
4466 | head -n 1000000000 > file.
4467
4468 If your example requires remote execution, see if you can use
4469 localhost - maybe using another login.
4470
4471 If you have access to a different system (maybe a VirtualBox on your
4472 own machine), test if the MCVE shows the problem on that system.
4473
4474 · The output of your example. If your problem is not easily reproduced
4475 by others, the output might help them figure out the problem.
4476
4477 · Whether you have watched the intro videos
4478 (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
4479 through the tutorial (man parallel_tutorial), and read the EXAMPLE
4480 section in the man page (man parallel - search for EXAMPLE:).
4481
4482 If you suspect the error is dependent on your environment or
4483 distribution, please see if you can reproduce the error on one of these
4484 VirtualBox images:
4485 http://sourceforge.net/projects/virtualboximage/files/
4486 http://www.osboxes.org/virtualbox-images/
4487
4488 Specifying the name of your distribution is not enough as you may have
4489 installed software that is not in the VirtualBox images.
4490
4491 If you cannot reproduce the error on any of the VirtualBox images
4492 above, see if you can build a VirtualBox image on which you can
4493 reproduce the error. If not you should assume the debugging will be
4494 done through you. That will put more burden on you, and it is extra
4495 important that you give any information that may help. In general
4496 the problem will be fixed faster and with less work for you if you
4497 can reproduce the error on a VirtualBox.
4498
4499AUTHOR
4500 When using GNU parallel for a publication please cite:
4501
4502 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
4503 The USENIX Magazine, February 2011:42-47.
4504
4505 This helps funding further development; and it won't cost you a cent.
4506 If you pay 10000 EUR you should feel free to use GNU Parallel without
4507 citing.
4508
       Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk

       Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk

       Copyright (C) 2010-2020 Ole Tange, http://ole.tange.dk and Free
       Software Foundation, Inc.

       Parts of the manual concerning xargs compatibility are inspired by
       the manual of xargs from GNU findutils 4.4.2.

LICENSE
       This program is free software; you can redistribute it and/or
       modify it under the terms of the GNU General Public License as
       published by the Free Software Foundation; either version 3 of the
       License, or at your option any later version.

       This program is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program. If not, see
       <http://www.gnu.org/licenses/>.

   Documentation license I
       Permission is granted to copy, distribute and/or modify this
       documentation under the terms of the GNU Free Documentation
       License, Version 1.3 or any later version published by the Free
       Software Foundation; with no Invariant Sections, with no
       Front-Cover Texts, and with no Back-Cover Texts. A copy of the
       license is included in the file fdl.txt.

   Documentation license II
       You are free:

       to Share to copy, distribute and transmit the work

       to Remix to adapt the work

       Under the following conditions:

       Attribution
               You must attribute the work in the manner specified by the
               author or licensor (but not in any way that suggests that
               they endorse you or your use of the work).

       Share Alike
               If you alter, transform, or build upon this work, you may
               distribute the resulting work only under the same, similar
               or a compatible license.

       With the understanding that:

       Waiver  Any of the above conditions can be waived if you get
               permission from the copyright holder.

       Public Domain
               Where the work or any of its elements is in the public
               domain under applicable law, that status is in no way
               affected by the license.

       Other Rights
               In no way are any of the following rights affected by the
               license:

               · Your fair dealing or fair use rights, or other
                 applicable copyright exceptions and limitations;

               · The author's moral rights;

               · Rights other persons may have either in the work itself
                 or in how the work is used, such as publicity or privacy
                 rights.

       Notice  For any reuse or distribution, you must make clear to
               others the license terms of this work.

       A copy of the full license is included in the file cc-by-sa.txt.

DEPENDENCIES
       GNU parallel uses Perl, and the Perl modules Getopt::Long,
       IPC::Open3, Symbol, IO::File, POSIX, and File::Temp.
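       A quick way to confirm these modules are available (all of them
       ship with core Perl, so this should succeed on any stock perl):

           # perl -M<module> -e1 exits non-zero if the module fails to
           # load, so a missing dependency shows up immediately.
           for m in Getopt::Long IPC::Open3 Symbol IO::File POSIX File::Temp; do
             perl -M"$m" -e1 && echo "$m ok"
           done
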

       For --csv it uses the Perl module Text::CSV.

       For remote usage it uses rsync with ssh.

SEE ALSO
       ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1), rsync(1),
       find(1), xargs(1), dirname(1), make(1), pexec(1), ppss(1),
       xjobs(1), prll(1), dxargs(1), mdm(1)
4600
4601
4602
460320200522 2020-06-06 PARALLEL(1)