PARALLEL(1)                        parallel                       PARALLEL(1)


NAME
    parallel - build and execute shell command lines from standard input in
    parallel

SYNOPSIS
    parallel [options] [command [arguments]] < list_of_arguments

    parallel [options] [command [arguments]] ( ::: arguments | :::+
    arguments | :::: argfile(s) | ::::+ argfile(s) ) ...

    parallel --semaphore [options] command

    #!/usr/bin/parallel --shebang [options] [command [arguments]]

    #!/usr/bin/parallel --shebang-wrap [options] [command [arguments]]

DESCRIPTION
    STOP!

    Read the Reader's guide below if you are new to GNU parallel.

    GNU parallel is a shell tool for executing jobs in parallel using one
    or more computers. A job can be a single command or a small script that
    has to be run for each of the lines in the input. The typical input is
    a list of files, a list of hosts, a list of users, a list of URLs, or a
    list of tables. A job can also be a command that reads from a pipe. GNU
    parallel can then split the input into blocks and pipe a block into
    each command in parallel.

    If you use xargs and tee today you will find GNU parallel very easy to
    use, as GNU parallel is written to have the same options as xargs. If
    you write loops in shell, you will find GNU parallel may be able to
    replace most of the loops and make them run faster by running several
    jobs in parallel.

    GNU parallel makes sure output from the commands is the same output as
    you would get had you run the commands sequentially. This makes it
    possible to use output from GNU parallel as input for other programs.

    For each line of input GNU parallel will execute command with the line
    as arguments. If no command is given, the line of input is executed.
    Several lines will be run in parallel. GNU parallel can often be used
    as a substitute for xargs or cat | bash.

  Reader's guide
    GNU parallel includes 4 types of documentation: tutorial, how-to,
    reference, and explanation.

    Tutorial

    If you prefer reading a book, buy GNU Parallel 2018 at
    http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
    or download it at: https://doi.org/10.5281/zenodo.1146014 Read at least
    chapters 1 and 2. It should take you less than 20 minutes.

    Otherwise start by watching the intro videos for a quick introduction:
    http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

    If you want to dive deeper: spend a couple of hours walking through the
    tutorial (man parallel_tutorial). Your command line will love you for
    it.

    How-to

    You can find a lot of EXAMPLEs of use after the list of OPTIONS in man
    parallel (use LESS=+/EXAMPLE: man parallel). That will give you an idea
    of what GNU parallel is capable of, and you may find a solution you can
    simply adapt to your situation.

    Reference

    If you need a one-page printable cheat sheet you can find it at:
    https://www.gnu.org/software/parallel/parallel_cheat.pdf

    The man page is the reference for all options.

    Design discussion

    If you want to know the design decisions behind GNU parallel, try: man
    parallel_design. This is also a good intro if you intend to change GNU
    parallel.

OPTIONS
    command
        Command to execute. If command or the following arguments contain
        replacement strings (such as {}) every instance will be substituted
        with the input.

        If command is given, GNU parallel solves the same tasks as xargs.
        If command is not given, GNU parallel will behave similarly to
        cat | sh.

        The command must be an executable, a script, a composed command, an
        alias, or a function.

        Bash functions: export -f the function first or use env_parallel.

        Bash, Csh, or Tcsh aliases: Use env_parallel.

        Zsh, Fish, Ksh, and Pdksh functions and aliases: Use env_parallel.

    {}  Input line. This replacement string will be replaced by a full line
        read from the input source. The input source is normally stdin
        (standard input), but can also be given with -a, :::, or ::::.

        The replacement string {} can be changed with -I.

        If the command line contains no replacement strings then {} will be
        appended to the command line.

        Replacement strings are normally quoted, so special characters are
        not parsed by the shell. The exception is if the command starts
        with a replacement string; then the string is not quoted.

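As a minimal sketch (not part of the reference text itself; -k is added only to keep the output in input order), each argument is substituted for {}, and {} is appended automatically when the command line contains no replacement string:

```shell
# Each argument replaces {}; -k keeps output in input order.
parallel -k echo {} ::: A B C
# prints A, B, C on separate lines

# No replacement string given, so {} is appended - same result:
parallel -k echo ::: A B C
```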
    {.} Input line without extension. This replacement string will be
        replaced by the input with the extension removed. If the input line
        contains . after the last /, the last . until the end of the string
        will be removed and {.} will be replaced with the remainder. E.g.
        foo.jpg becomes foo, subdir/foo.jpg becomes subdir/foo,
        sub.dir/foo.jpg becomes sub.dir/foo, sub.dir/bar remains
        sub.dir/bar. If the input line does not contain . it will remain
        unchanged.

        The replacement string {.} can be changed with --er.

        To understand replacement strings see {}.

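The extension-stripping rule above can be sketched directly (a quick illustration, with -k only for stable output order):

```shell
# Only a '.' after the last '/' counts as an extension separator.
parallel -k echo {.} ::: foo.jpg subdir/foo.jpg sub.dir/bar
# prints: foo / subdir/foo / sub.dir/bar (one per line)
```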
    {/} Basename of input line. This replacement string will be replaced by
        the input with the directory part removed.

        The replacement string {/} can be changed with --basenamereplace.

        To understand replacement strings see {}.

    {//}
        Dirname of input line. This replacement string will be replaced by
        the dir of the input line. See dirname(1).

        The replacement string {//} can be changed with --dirnamereplace.

        To understand replacement strings see {}.

    {/.}
        Basename of input line without extension. This replacement string
        will be replaced by the input with the directory and extension part
        removed. It is a combination of {/} and {.}.

        The replacement string {/.} can be changed with
        --basenameextensionreplace.

        To understand replacement strings see {}.

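A compact comparison of the three path-oriented replacement strings (illustrative sketch only):

```shell
# For the input sub.dir/foo.jpg:
#   {/}  -> basename             (foo.jpg)
#   {//} -> dirname              (sub.dir)
#   {/.} -> basename, no ext.    (foo)
parallel -k echo '{/} {//} {/.}' ::: sub.dir/foo.jpg
# prints: foo.jpg sub.dir foo
```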
    {#} Sequence number of the job to run. This replacement string will be
        replaced by the sequence number of the job being run. It contains
        the same number as $PARALLEL_SEQ.

        The replacement string {#} can be changed with --seqreplace.

        To understand replacement strings see {}.

    {%} Job slot number. This replacement string will be replaced by the
        job's slot number between 1 and the number of jobs to run in
        parallel. There will never be 2 jobs running at the same time with
        the same job slot number.

        The replacement string {%} can be changed with --slotreplace.

        If the job needs to be retried (e.g. using --retries or
        --retry-failed) the job slot is not automatically updated. You
        should then instead use $PARALLEL_JOBSLOT:

          $ do_test() {
              id="$3 {%}=$1 PARALLEL_JOBSLOT=$2"
              echo run "$id";
              sleep 1
              # fail if {%} is odd
              return `echo $1%2 | bc`
            }
          $ export -f do_test
          $ parallel -j3 --jl mylog do_test {%} \$PARALLEL_JOBSLOT {} ::: A B C D
          run A {%}=1 PARALLEL_JOBSLOT=1
          run B {%}=2 PARALLEL_JOBSLOT=2
          run C {%}=3 PARALLEL_JOBSLOT=3
          run D {%}=1 PARALLEL_JOBSLOT=1
          $ parallel --retry-failed -j3 --jl mylog do_test {%} \$PARALLEL_JOBSLOT {} ::: A B C D
          run A {%}=1 PARALLEL_JOBSLOT=1
          run C {%}=3 PARALLEL_JOBSLOT=2
          run D {%}=1 PARALLEL_JOBSLOT=3

        Notice how {%} and $PARALLEL_JOBSLOT differ in the retry run of C
        and D.

        To understand replacement strings see {}.

    {n} Argument from input source n or the n'th argument. This positional
        replacement string will be replaced by the input from input source
        n (when used with -a or ::::) or with the n'th argument (when used
        with -N). If n is negative it refers to the n'th last argument.

        To understand replacement strings see {}.

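Both meanings of {n} can be sketched in two lines (illustrative only; -k keeps the combination order stable):

```shell
# Two input sources: {1} is from the first source, {2} from the second.
parallel -k echo '{2} {1}' ::: a b ::: 1 2
# prints: 1 a / 2 a / 1 b / 2 b

# With -N2 each job gets 2 arguments; {1} is the 1st, {-1} the last.
parallel -k -N2 echo '{-1} {1}' ::: a b c d
# prints: b a / d c
```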
    {n.}
        Argument from input source n or the n'th argument without
        extension. It is a combination of {n} and {.}.

        This positional replacement string will be replaced by the input
        from input source n (when used with -a or ::::) or with the n'th
        argument (when used with -N). The input will have the extension
        removed.

        To understand positional replacement strings see {n}.

    {n/}
        Basename of argument from input source n or the n'th argument. It
        is a combination of {n} and {/}.

        This positional replacement string will be replaced by the input
        from input source n (when used with -a or ::::) or with the n'th
        argument (when used with -N). The input will have the directory (if
        any) removed.

        To understand positional replacement strings see {n}.

    {n//}
        Dirname of argument from input source n or the n'th argument. It
        is a combination of {n} and {//}.

        This positional replacement string will be replaced by the dir of
        the input from input source n (when used with -a or ::::) or with
        the n'th argument (when used with -N). See dirname(1).

        To understand positional replacement strings see {n}.

    {n/.}
        Basename of argument from input source n or the n'th argument
        without extension. It is a combination of {n}, {/}, and {.}.

        This positional replacement string will be replaced by the input
        from input source n (when used with -a or ::::) or with the n'th
        argument (when used with -N). The input will have the directory (if
        any) and extension removed.

        To understand positional replacement strings see {n}.

    {=perl expression=}
        Replace with calculated perl expression. $_ will contain the same
        as {}. After evaluating perl expression $_ will be used as the
        value. It is recommended to only change $_ but you have full access
        to all of GNU parallel's internal functions and data structures. A
        few convenience functions and data structures have been made:

          Q(string)      shell quote a string

          pQ(string)     perl quote a string

          uq() (or uq)   do not quote current replacement string

          total_jobs()   number of jobs in total

          slot()         slot number of job

          seq()          sequence number of job

          @arg           the arguments

        Example:

          seq 10 | parallel echo {} + 1 is {= '$_++' =}
          parallel csh -c {= '$_="mkdir ".Q($_)' =} ::: '12" dir'
          seq 50 | parallel echo job {#} of {= '$_=total_jobs()' =}

        See also: --rpl --parens

    {=n perl expression=}
        Positional equivalent of {=perl expression=}. To understand
        positional replacement strings see {n}.

        See also: {=perl expression=}, {n}.

    ::: arguments
        Use arguments from the command line as input source instead of
        stdin (standard input). Unlike other options for GNU parallel :::
        is placed after the command and before the arguments.

        The following are equivalent:

          (echo file1; echo file2) | parallel gzip
          parallel gzip ::: file1 file2
          parallel gzip {} ::: file1 file2
          parallel --arg-sep ,, gzip {} ,, file1 file2
          parallel --arg-sep ,, gzip ,, file1 file2
          parallel ::: "gzip file1" "gzip file2"

        To avoid treating ::: as special, use --arg-sep to set the argument
        separator to something else. See also --arg-sep.

        If multiple ::: are given, each group will be treated as an input
        source, and all combinations of input sources will be generated.
        E.g. ::: 1 2 ::: a b c will result in the combinations (1,a) (1,b)
        (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested for-
        loops.

        ::: and :::: can be mixed. So these are equivalent:

          parallel echo {1} {2} {3} ::: 6 7 ::: 4 5 ::: 1 2 3
          parallel echo {1} {2} {3} :::: <(seq 6 7) <(seq 4 5) \
            :::: <(seq 1 3)
          parallel -a <(seq 6 7) echo {1} {2} {3} :::: <(seq 4 5) \
            :::: <(seq 1 3)
          parallel -a <(seq 6 7) -a <(seq 4 5) echo {1} {2} {3} \
            ::: 1 2 3
          seq 6 7 | parallel -a - -a <(seq 4 5) echo {1} {2} {3} \
            ::: 1 2 3
          seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \
            ::: 1 2 3

    :::+ arguments
        Like ::: but linked like --link to the previous input source.

        Contrary to --link, values do not wrap: the shortest input source
        determines the length.

        Example:

          parallel echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22

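To make the linking concrete (an illustrative run of the same example, with -k added for stable ordering): a/1, b/2, c/3 form one linked source and X/11, Y/22 another, and only the two groups are combined, giving 3 x 2 = 6 jobs rather than 3 x 3 x 2 x 2:

```shell
# (a,1) (b,2) (c,3) linked; (X,11) (Y,22) linked; groups combined -> 6 lines
parallel -k echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22
```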
    :::: argfiles
        Another way to write -a argfile1 -a argfile2 ...

        ::: and :::: can be mixed.

        See -a, ::: and --link.

    ::::+ argfiles
        Like :::: but linked like --link to the previous input source.

        Contrary to --link, values do not wrap: the shortest input source
        determines the length.

    --null
    -0  Use NUL as delimiter. Normally input lines will end in \n
        (newline). If they end in \0 (NUL), then use this option. It is
        useful for processing arguments that may contain \n (newline).

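A short sketch of why NUL delimiting matters (illustrative; the find pipeline is the typical real-world use):

```shell
# An argument containing a newline survives intact with -0:
printf 'file one\0file\ntwo\0' | parallel -0 -k echo "got: {}"

# Typical pairing with find -print0:
# find . -name '*.log' -print0 | parallel -0 gzip
```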
    --arg-file input-file
    -a input-file
        Use input-file as input source. If you use this option, stdin
        (standard input) is given to the first process run. Otherwise,
        stdin (standard input) is redirected from /dev/null.

        If multiple -a are given, each input-file will be treated as an
        input source, and all combinations of input sources will be
        generated. E.g. if the file foo contains 1 2 and the file bar
        contains a b c, then -a foo -a bar will result in the combinations
        (1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing
        nested for-loops.

        See also --link and {n}.

    --arg-file-sep sep-str
        Use sep-str instead of :::: as separator string between command and
        argument files. Useful if :::: is used for something else by the
        command.

        See also: ::::.

    --arg-sep sep-str
        Use sep-str instead of ::: as separator string. Useful if ::: is
        used for something else by the command.

        Also useful if your command uses ::: but you still want to read
        arguments from stdin (standard input): simply change --arg-sep to a
        string that is not in the command line.

        See also: :::.

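A minimal sketch of --arg-sep (illustrative; -k only keeps output order stable):

```shell
# ',,' now plays the role of ':::', so ':::' could be passed through
# to the command as an ordinary argument.
parallel --arg-sep ,, -k echo {} ,, a b
```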
    --bar
        Show progress as a progress bar. The bar shows: % of jobs
        completed, estimated seconds left, and number of jobs started.

        It is compatible with zenity:

          seq 1000 | parallel -j30 --bar '(echo {};sleep 0.1)' \
            2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
                 zenity --progress --auto-kill) | wc

    --basefile file
    --bf file
        file will be transferred to each sshlogin before a job is started.
        It will be removed if --cleanup is active. The file may be a script
        to run or some common base data needed for the job. Multiple --bf
        can be specified to transfer more basefiles. The file will be
        transferred the same way as --transferfile.

    --basenamereplace replace-str
    --bnr replace-str
        Use the replacement string replace-str instead of {/} for basename
        of input line.

    --basenameextensionreplace replace-str
    --bner replace-str
        Use the replacement string replace-str instead of {/.} for basename
        of input line without extension.

    --bin binexpr
        Use binexpr as binning key and bin input to the jobs.

        binexpr is [column number|column name] [perlexpression] e.g. 3,
        Address, 3 $_%=100, Address s/\D//g.

        Each input line is split using --colsep. The value of the column is
        put into $_, the perl expression is executed, and the resulting
        value is the job slot that will be given the line. If the value is
        bigger than the number of jobslots the value will be taken modulo
        the number of jobslots.

        This is similar to --shard but the hashing algorithm is a simple
        modulo, which makes it predictable which jobslot will receive which
        value.

        The performance is in the order of 100K rows per second. Faster if
        the bincol is small (<10), slower if it is big (>100).

        --bin requires --pipe and a fixed numeric value for --jobs.

        See also --shard, --group-by, --roundrobin.

    --bg
        Run command in background, so GNU parallel will not wait for
        completion of the command before exiting. This is the default if
        --semaphore is set.

        See also: --fg, man sem.

        Implies --semaphore.

    --bibtex
    --citation
        Print the citation notice and BibTeX entry for GNU parallel,
        silence the citation notice for all future runs, and exit. It will
        not run any commands.

        If it is impossible for you to run --citation you can instead use
        --will-cite, which will run commands, but which will only silence
        the citation notice for this single run.

        If you use --will-cite in scripts to be run by others you are
        making it harder for others to see the citation notice. The
        development of GNU parallel is indirectly financed through
        citations, so if your users do not know they should cite then you
        are making it harder to finance development. However, if you pay
        10000 EUR, you have done your part to finance future development
        and should feel free to use --will-cite in scripts.

        If you do not want to help finance future development by letting
        other users see the citation notice or by paying, then please use
        another tool instead of GNU parallel. You can find some of the
        alternatives in man parallel_alternatives.

    --block size
    --block-size size
        Size of block in bytes to read at a time. The size can be postfixed
        with K, M, G, T, P, E, k, m, g, t, p, or e which would multiply the
        size by 1024, 1048576, 1073741824, 1099511627776,
        1125899906842624, 1152921504606846976, 1000, 1000000, 1000000000,
        1000000000000, 1000000000000000, or 1000000000000000000
        respectively.

        GNU parallel tries to meet the block size but can be off by the
        length of one record. For performance reasons size should be bigger
        than two records. GNU parallel will warn you and automatically
        increase the size if you choose a size that is too small.

        If you use -N, --block-size should be bigger than N+1 records.

        size defaults to 1M.

        When using --pipepart a negative block size is not interpreted as a
        blocksize but as the number of blocks each jobslot should have. So
        this will run 10*5 = 50 jobs in total:

          parallel --pipepart -a myfile --block -10 -j5 wc

        This is an efficient alternative to --roundrobin because data is
        never read by GNU parallel, but you can still have very few
        jobslots process a large amount of data.

        See --pipe and --pipepart for use of this.

    --blocktimeout duration
    --bt duration
        Time out for reading a block when using --pipe. If it takes longer
        than duration to read a full block, use the partial block read so
        far.

        duration must be in whole seconds, but can be expressed as floats
        postfixed with s, m, h, or d which would multiply the float by 1,
        60, 3600, or 86400. Thus these are equivalent: --blocktimeout
        100000 and --blocktimeout 1d3.5h16.6m4s.

    --cat
        Create a temporary file with content. Normally --pipe/--pipepart
        will give data to the program on stdin (standard input). With --cat
        GNU parallel will create a temporary file with the name in {}, so
        you can do: parallel --pipe --cat wc {}.

        Implies --pipe unless --pipepart is used.

        See also --fifo.

    --cleanup
        Remove transferred files. --cleanup will remove the transferred
        files on the remote computer after processing is done.

          find log -name '*gz' | parallel \
            --sshlogin server.example.com --transferfile {} \
            --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

        With --transferfile {} the file transferred to the remote computer
        will be removed on the remote computer. Directories created will
        not be removed - even if they are empty.

        With --return the file transferred from the remote computer will be
        removed on the remote computer. Directories created will not be
        removed - even if they are empty.

        --cleanup is ignored when not used with --transferfile or --return.

    --colsep regexp
    -C regexp
        Column separator. The input will be treated as a table with regexp
        separating the columns. The n'th column can be accessed using {n}
        or {n.}. E.g. {3} is the 3rd column.

        If there are more input sources, each input source will be
        separated, but the columns from each input source will be linked
        (see --link).

          parallel --colsep '-' echo {4} {3} {2} {1} \
            ::: A-B C-D ::: e-f g-h

        --colsep implies --trim rl, which can be overridden with --trim n.

        regexp is a Perl Regular Expression:
        http://perldoc.perl.org/perlre.html

    --compress
        Compress temporary files. If the output is big and very
        compressible this will take up less disk space in $TMPDIR and
        possibly be faster due to less disk I/O.

        GNU parallel will try pzstd, lbzip2, pbzip2, zstd, pigz, lz4, lzop,
        plzip, lzip, lrz, gzip, pxz, lzma, bzip2, xz, clzip, in that order,
        and use the first available.

    --compress-program prg
    --decompress-program prg
        Use prg for (de)compressing temporary files. It is assumed that prg
        -dc will decompress stdin (standard input) to stdout (standard
        output) unless --decompress-program is given.

    --csv
        Treat input as CSV-format. --colsep sets the field delimiter. It
        works very much like --colsep except it deals correctly with
        quoting:

          echo '"1 big, 2 small","2""x4"" plank",12.34' |
            parallel --csv echo {1} of {2} at {3}

        Even quoted newlines are parsed correctly:

          (echo '"Start of field 1 with newline'
           echo 'Line 2 in field 1";value 2') |
            parallel --csv --colsep ';' echo Field 1: {1} Field 2: {2}

        When used with --pipe, only full CSV records are passed.

    --delay mytime (alpha testing)
        Delay starting the next job by mytime. GNU parallel will pause
        mytime after starting each job. mytime is normally in seconds, but
        can be floats postfixed with s, m, h, or d which would multiply the
        float by 1, 60, 3600, or 86400. Thus these are equivalent:
        --delay 100000 and --delay 1d3.5h16.6m4s.

        If you append 'auto' to mytime (e.g. 13m3sauto) GNU parallel will
        automatically try to find the optimal value: if a job fails, mytime
        is doubled; if a job succeeds, mytime is decreased by 10%.

    --delimiter delim
    -d delim
        Input items are terminated by delim. Quotes and backslash are not
        special; every character in the input is taken literally. Disables
        the end-of-file string, which is treated like any other argument.
        The specified delimiter may be characters, C-style character
        escapes such as \n, or octal or hexadecimal escape codes. Octal
        and hexadecimal escape codes are understood as for the printf
        command. Multibyte characters are not supported.

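A minimal sketch of a custom delimiter (illustrative; -k keeps output order stable):

```shell
# Split input records on ':' instead of newline
printf 'a:b:c' | parallel -k -d : echo
# prints a, b, c on separate lines
```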
    --dirnamereplace replace-str
    --dnr replace-str
        Use the replacement string replace-str instead of {//} for dirname
        of input line.

    --dry-run
        Print the job to run on stdout (standard output), but do not run
        the job. Use -v -v to include the wrapping that GNU parallel
        generates (for remote jobs, --tmux, --nice, --pipe, --pipepart,
        --fifo and --cat). Do not count on this literally, though, as the
        job may be scheduled on another computer or the local computer if :
        is in the list.

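A quick sketch: --dry-run prints the composed command lines instead of executing them, which is handy for checking replacement-string expansion before a long run:

```shell
# Nothing is executed; the would-be command lines are printed
parallel -k --dry-run echo {} ::: A B
# prints: echo A / echo B
```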
    -E eof-str
        Set the end of file string to eof-str. If the end of file string
        occurs as a line of input, the rest of the input is not read. If
        neither -E nor -e is used, no end of file string is used.

    --eof[=eof-str]
    -e[eof-str]
        This option is a synonym for the -E option. Use -E instead,
        because it is POSIX compliant for xargs while this option is not.
        If eof-str is omitted, there is no end of file string. If neither
        -E nor -e is used, no end of file string is used.

    --embed
        Embed GNU parallel in a shell script. If you need to distribute
        your script to someone who does not want to install GNU parallel
        you can embed GNU parallel in your own shell script:

          parallel --embed > new_script

        After which you add your code at the end of new_script. This is
        tested on ash, bash, dash, ksh, sh, and zsh.

    --env var
        Copy environment variable var. This will copy var to the
        environment that the command is run in. This is especially useful
        for remote execution.

        In Bash var can also be a Bash function - just remember to export
        -f the function, see command.

        The variable '_' is special. It will copy all exported environment
        variables except for the ones mentioned in
        ~/.parallel/ignored_vars.

        To copy the full environment (both exported and not exported
        variables, arrays, and functions) use env_parallel.

        See also: --record-env, --session.

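A minimal local sketch (illustrative; note that for purely local jobs the exported variable would usually be inherited anyway - the option matters most with remote --sshlogin execution):

```shell
# MYVAR is explicitly copied into the environment the job runs in
export MYVAR="hello"
parallel --env MYVAR 'echo "$MYVAR {}"' ::: world
# prints: hello world
```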
    --eta
        Show the estimated number of seconds before finishing. This forces
        GNU parallel to read all jobs before starting to find the number of
        jobs. GNU parallel normally only reads the next job to run.

        The estimate is based on the runtime of finished jobs, so the first
        estimate will only be shown when the first job has finished.

        Implies --progress.

        See also: --bar, --progress.

    --fg
        Run command in foreground.

        With --tmux and --tmuxpane GNU parallel will start tmux in the
        foreground.

        With --semaphore GNU parallel will run the command in the
        foreground (opposite --bg), and wait for completion of the command
        before exiting.

        See also --bg, man sem.

    --fifo
        Create a temporary fifo with content. Normally --pipe and
        --pipepart will give data to the program on stdin (standard input).
        With --fifo GNU parallel will create a temporary fifo with the name
        in {}, so you can do: parallel --pipe --fifo wc {}.

        Beware: if data is not read from the fifo, the job will block
        forever.

        Implies --pipe unless --pipepart is used.

        See also --cat.

    --filter-hosts
        Remove down hosts. For each remote host: check that login through
        ssh works. If not: do not use this host.

        For performance reasons, this check is performed only at the start
        and every time --sshloginfile is changed. If a host goes down
        after the first check, it will go undetected until --sshloginfile
        is changed; --retries can be used to mitigate this.

        Currently you can not put --filter-hosts in a profile, $PARALLEL,
        /etc/parallel/config or similar. This is because GNU parallel uses
        GNU parallel to compute this, so you will get an infinite loop.
        This will likely be fixed in a later release.

    --gnu
        Behave like GNU parallel. This option historically took precedence
        over --tollef. The --tollef option is now retired, and therefore
        may not be used. --gnu is kept for compatibility.

    --group
        Group output. Output from each job is grouped together and is only
        printed when the command is finished. Stdout (standard output)
        first, followed by stderr (standard error).

        This takes in the order of 0.5ms per job and depends on the speed
        of your disk for larger output. It can be disabled with -u, but
        this means output from different commands can get mixed.

        --group is the default. Can be reversed with -u.

        See also: --line-buffer --ungroup

    --group-by val (alpha testing)
        Group input by value. Combined with --pipe/--pipepart, --group-by
        groups lines with the same value into a record.

        The value can be computed from the full line or from a single
        column.

        val can be:

          column number   Use the value in the column numbered.

          column name     Treat the first line as a header and use the
                          value in the column named.

                          (Not supported with --pipepart).

          perl expression
                          Run the perl expression and use $_ as the value.

          column number perl expression
                          Put the value of the column in $_, run the perl
                          expression, and use $_ as the value.

          column name perl expression
                          Put the value of the column in $_, run the perl
                          expression, and use $_ as the value.

                          (Not supported with --pipepart).

        Example:

          UserID, Consumption
          123, 1
          123, 2
          12-3, 1
          221, 3
          221, 1
          2/21, 5

        If you want to group 123, 12-3, 221, and 2/21 into 4 records and
        pass one record at a time to wc:

          tail -n +2 table.csv | \
            parallel --pipe --colsep , --group-by 1 -kN1 wc

        Make GNU parallel treat the first line as a header:

          cat table.csv | \
            parallel --pipe --colsep , --header : --group-by 1 -kN1 wc

        Address the column by column name:

          cat table.csv | \
            parallel --pipe --colsep , --header : --group-by UserID -kN1 wc

        If 12-3 and 123 are really the same UserID, remove non-digits in
        UserID when grouping:

          cat table.csv | parallel --pipe --colsep , --header : \
            --group-by 'UserID s/\D//g' -kN1 wc

        See also --shard, --roundrobin.

    --help
    -h  Print a summary of the options to GNU parallel and exit.

    --halt-on-error val
    --halt val
        When should GNU parallel terminate? In some situations it makes no
        sense to run all jobs. GNU parallel should simply give up as soon
        as a condition is met.

        val defaults to never, which runs all jobs no matter what.

        val can also take on the form of when,why.

        when can be 'now', which means kill all running jobs and halt
        immediately, or it can be 'soon', which means wait for all running
        jobs to complete, but start no new jobs.

        why can be 'fail=X', 'fail=Y%', 'success=X', 'success=Y%',
        'done=X', or 'done=Y%' where X is the number of jobs that have to
        fail, succeed, or be done before halting, and Y is the percentage
        of jobs that have to fail, succeed, or be done before halting.

        Example:

          --halt now,fail=1      exit when the first job fails. Kill
                                 running jobs.

          --halt soon,fail=3     exit when 3 jobs have failed, but wait
                                 for running jobs to complete.

          --halt soon,fail=3%    exit when 3% of the jobs have failed, but
                                 wait for running jobs to complete.

          --halt now,success=1   exit when a job succeeds. Kill running
                                 jobs.

          --halt soon,success=3  exit when 3 jobs have succeeded, but wait
                                 for running jobs to complete.

          --halt now,success=3%  exit when 3% of the jobs have succeeded.
                                 Kill running jobs.

          --halt now,done=1      exit when one of the jobs finishes. Kill
                                 running jobs.

          --halt soon,done=3     exit when 3 jobs have finished, but wait
                                 for running jobs to complete.

          --halt now,done=3%     exit when 3% of the jobs have finished.
                                 Kill running jobs.

        For backwards compatibility these also work:

          0      never

          1      soon,fail=1

          2      now,fail=1

          -1     soon,success=1

          -2     now,success=1

          1-99%  soon,fail=1-99%

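A minimal sketch of the halt-on-first-failure case (illustrative; -j1 serializes the jobs so the effect is deterministic):

```shell
# Job 2 exits non-zero, so the run halts and job 3 is never started.
# parallel itself then exits with a non-zero status.
parallel -j1 -k --halt now,fail=1 'echo {}; exit {}' ::: 0 1 0
# prints 0 and 1, but not the final 0
```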
    --header regexp
        Use regexp as header. For normal usage the matched header
        (typically the first line: --header '.*\n') will be split using
        --colsep (which will default to '\t') and column names can be used
        as replacement variables: {column name}, {column name/}, {column
        name//}, {column name/.}, {column name.}, {=column name perl
        expression =}, ..

        For --pipe the matched header will be prepended to each output.

        --header : is an alias for --header '.*\n'.

        If regexp is a number, it is a fixed number of lines.

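A minimal sketch of header-named replacement strings (illustrative; a single column named "name", with -k only for stable order):

```shell
# The first line 'name' becomes the column name, usable as {name}
printf 'name\nfoo\nbar\n' | parallel -k --header : echo "hello {name}"
# prints: hello foo / hello bar
```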
    --hostgroups (alpha testing)
    --hgrp (alpha testing)
        Enable hostgroups on arguments. If an argument contains '@' the
        string after '@' will be removed and treated as a list of
        hostgroups on which this job is allowed to run. If there is no
        --sshlogin with a corresponding group, the job will run on any
        hostgroup.

        Example:

          parallel --hostgroups \
            --sshlogin @grp1/myserver1 -S @grp1+grp2/myserver2 \
            --sshlogin @grp3/myserver3 \
            echo ::: my_grp1_arg@grp1 arg_for_grp2@grp2 third@grp1+grp3

        my_grp1_arg may be run on either myserver1 or myserver2, third may
        be run on either myserver1 or myserver3, but arg_for_grp2 will only
        be run on myserver2.

        See also: --sshlogin, $PARALLEL_HOSTGROUPS.

    -I replace-str
        Use the replacement string replace-str instead of {}.

    --replace[=replace-str]
    -i[replace-str]
        This option is a synonym for -Ireplace-str if replace-str is
        specified, and for -I {} otherwise. This option is deprecated; use
        -I instead.

894 --joblog logfile
895 Logfile for executed jobs. Save a list of the executed jobs to
896 logfile in the following TAB separated format: sequence number,
897 sshlogin, start time as seconds since epoch, run time in seconds,
898 bytes in files transferred, bytes in files returned, exit status,
899 signal, and command run.
900
901 For --pipe, bytes transferred and bytes returned are the number of
902 bytes of input and output.
903
904 If logfile is prepended with '+' log lines will be appended to the
905 logfile.
906
907 To convert the times into ISO-8601 strict do:
908
909 cat logfile | perl -a -F"\t" -ne \
910 'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'
911
912 If the sshlogin column is long, you can use column -t to pretty
913 print it:
913
914 cat joblog | column -t
915
916 See also --resume --resume-failed.
917
918 --jobs N
919 -j N
920 --max-procs N
921 -P N
922 Number of jobslots on each machine. Run up to N jobs in parallel.
923 0 means as many as possible. Default is 100% which will run one job
924 per CPU on each machine.
925
926 If --semaphore is set, the default is 1 thus making a mutex.
927
928 --jobs +N
929 -j +N
930 --max-procs +N
931 -P +N
932 Add N to the number of CPUs. Run this many jobs in parallel. See
933 also --use-cores-instead-of-threads and
934 --use-sockets-instead-of-threads.
935
936 --jobs -N
937 -j -N
938 --max-procs -N
939 -P -N
940 Subtract N from the number of CPUs. Run this many jobs in
941 parallel. If the evaluated number is less than 1 then 1 will be
942 used. See also --use-cores-instead-of-threads and
943 --use-sockets-instead-of-threads.
944
945 --jobs N%
946 -j N%
947 --max-procs N%
948 -P N%
949 Multiply N% by the number of CPUs. Run this many jobs in
950 parallel. See also --use-cores-instead-of-threads and
951 --use-sockets-instead-of-threads.
952
953 --jobs procfile
954 -j procfile
955 --max-procs procfile
956 -P procfile
957 Read parameter from file. Use the content of procfile as parameter
958 for -j. E.g. procfile could contain the string 100% or +2 or 10. If
959 procfile is changed when a job completes, procfile is read again
960 and the new number of jobs is computed. If the number is lower than
961 before, running jobs will be allowed to finish but new jobs will
962 not be started until the wanted number of jobs has been reached.
963 This makes it possible to change the number of simultaneous running
964 jobs while GNU parallel is running.
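For example, the number of simultaneous jobs can be lowered while GNU parallel runs by overwriting the file (a sketch; /tmp/procfile is an arbitrary name):

```shell
echo 4 > /tmp/procfile
parallel -j /tmp/procfile 'sleep .2; echo {}' ::: 1 2 3 4 5 6 7 8 &
sleep .3
echo 1 > /tmp/procfile    # takes effect as running jobs complete
wait
rm /tmp/procfile
```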
965
966 --keep-order
967 -k Keep sequence of output same as the order of input. Normally the
968 output of a job will be printed as soon as the job completes. Try
969 this to see the difference:
970
971 parallel -j4 sleep {}\; echo {} ::: 2 1 4 3
972 parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
973
974 If used with --onall or --nonall the output will be grouped by
975 sshlogin in sorted order.
976
977 If used with --pipe --roundrobin and the same input, the jobslots
978 will get the same blocks in the same order in every run.
979
980 -k only affects the order in which the output is printed - not the
981 order in which jobs are run.
982
983 -L recsize
984 When used with --pipe: Read records of recsize.
985
986 When used otherwise: Use at most recsize nonblank input lines per
987 command line. Trailing blanks cause an input line to be logically
988 continued on the next input line.
989
990 -L 0 means read one line, but insert 0 arguments on the command
991 line.
992
993 Implies -X unless -m, --xargs, or --pipe is set.
994
995 --max-lines[=recsize]
996 -l[recsize]
997 When used with --pipe: Read records of recsize lines.
998
999 When used otherwise: Synonym for the -L option. Unlike -L, the
1000 recsize argument is optional. If recsize is not specified, it
1001 defaults to one. The -l option is deprecated since the POSIX
1002 standard specifies -L instead.
1003
1004 -l 0 is an alias for -l 1.
1005
1006 Implies -X unless -m, --xargs, or --pipe is set.
1007
1008 --limit "command args"
1009 Dynamic job limit. Before starting a new job run command with args.
1010 The exit value of command determines what GNU parallel will do:
1011
1012 0 Below limit. Start another job.
1013
1014 1 Over limit. Start no jobs.
1015
1016 2 Way over limit. Kill the youngest job.
1017
1018 You can use any shell command. There are 3 predefined commands:
1019
1020 "io n" Limit for I/O. The amount of disk I/O will be computed as
1021 a value 0-100, where 0 is no I/O and 100 is at least one
1022 disk is 100% saturated.
1023
1024 "load n" Similar to --load.
1025
1026 "mem n" Similar to --memfree.
1027
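A sketch of a custom limit command (the load threshold 8 is an arbitrary example value; /proc/loadavg is Linux-specific, and jobs pause while the limit command reports over limit):

```shell
# The limit command exits 0 (below limit) while the 1-minute load
# average is under 8, and 1 (over limit) otherwise.
parallel -k -j8 --limit 'test "$(cut -d. -f1 /proc/loadavg)" -lt 8' \
  echo ::: a b c
```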
1028 --line-buffer
1029 --lb
1030 Buffer output on line basis. --group will keep the output together
1031 for a whole job. --ungroup allows output to mix, with half a line
1032 coming from one job and half a line coming from another job.
1033 --line-buffer fits between these two: GNU parallel will print a
1034 full line, but will allow for mixing lines of different jobs.
1035
1036 --line-buffer takes more CPU power than both --group and --ungroup,
1037 but can be much faster than --group if the CPU is not the limiting
1038 factor.
1039
1040 Normally --line-buffer does not buffer on disk, and can thus
1041 process an infinite amount of data, but it will buffer on disk when
1042 combined with: --keep-order, --results, --compress, and --files.
1043 This will make it as slow as --group and will limit output to the
1044 available disk space.
1045
1046 With --keep-order --line-buffer will output lines from the first
1047 job continuously while it is running, then lines from the second
1048 job while that is running. It will buffer full lines, but jobs will
1049 not mix. Compare:
1050
1051 parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
1052 parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1053 parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1054
1055 See also: --group --ungroup
1056
1057 --xapply
1058 --link
1059 Link input sources. Read multiple input sources like xapply. If
1060 multiple input sources are given, one argument will be read from
1061 each of the input sources. The arguments can be accessed in the
1062 command as {1} .. {n}, so {1} will be a line from the first input
1063 source, and {6} will refer to the line with the same line number
1064 from the 6th input source.
1065
1066 Compare these two:
1067
1068 parallel echo {1} {2} ::: 1 2 3 ::: a b c
1069 parallel --link echo {1} {2} ::: 1 2 3 ::: a b c
1070
1071 Arguments will be recycled if one input source has more arguments
1072 than the others:
1073
1074 parallel --link echo {1} {2} {3} \
1075 ::: 1 2 ::: I II III ::: a b c d e f g
1076
1077 See also --header, :::+, ::::+.
1078
1079 --load max-load
1080 Do not start new jobs on a given computer unless the number of
1081 running processes on the computer is less than max-load. max-load
1082 uses the same syntax as --jobs, so 100% for one per CPU is a valid
1083 setting. The only difference is 0, which is interpreted as 0.01.
1084
1085 --controlmaster
1086 -M Use ssh's ControlMaster to make ssh connections faster. Useful if
1087 jobs run remote and are very fast to run. This is disabled for
1088 sshlogins that specify their own ssh command.
1089
1090 -m Multiple arguments. Insert as many arguments as the command line
1091 length permits. If multiple jobs are being run in parallel:
1092 distribute the arguments evenly among the jobs. Use -j1 or --xargs
1093 to avoid this.
1094
1095 If {} is not used the arguments will be appended to the line. If
1096 {} is used multiple times each {} will be replaced with all the
1097 arguments.
1098
1099 Support for -m with --sshlogin is limited and may fail.
1100
1101 See also -X for context replace. If in doubt use -X as that will
1102 most likely do what is needed.
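A sketch of the difference (with 4 arguments and 2 jobslots, -m splits the arguments evenly between the two command lines, while -X repeats the surrounding context for every argument):

```shell
parallel -k -j2 -m echo ::: a b c d     # two command lines: "a b", "c d"
parallel -k -j1 -X echo pre-{} ::: a b  # one command line: "pre-a pre-b"
```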
1103
1104 --memfree size
1105 Minimum memory free when starting another job. The size can be
1106 postfixed with K, M, G, T, P, k, m, g, t, or p which would multiply
1107 the size with 1024, 1048576, 1073741824, 1099511627776,
1108 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or
1109 1000000000000000, respectively.
1110
1111 If the jobs take up very different amounts of RAM, GNU parallel will
1112 only start as many as there is memory for. If less than size bytes
1113 are free, no more jobs will be started. If less than 50% of size
1114 bytes are free, the youngest job will be killed, and put back on the
1115 queue to be run later.
1116
1117 --retries must be set to determine how many times GNU parallel
1118 should retry a given job.
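A minimal sketch (1G is an example threshold; the echo stands in for a memory-hungry command):

```shell
# Only start a new job while at least 1 GB RAM is free; a job killed
# for lack of memory is put back on the queue and retried up to 5 times.
parallel -k --memfree 1G --retries 5 'echo processing {}' ::: a b c
```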
1119
1120 --minversion version
1121 Print the version of GNU parallel and exit. If the current version of
1122 GNU parallel is less than version the exit code is 255. Otherwise
1123 it is 0.
1124
1125 This is useful for scripts that depend on features only available
1126 from a certain version of GNU parallel.
1127
1128 --max-args=max-args
1129 -n max-args
1130 Use at most max-args arguments per command line. Fewer than max-
1131 args arguments will be used if the size (see the -s option) is
1132 exceeded, unless the -x option is given, in which case GNU parallel
1133 will exit.
1134
1135 -n 0 means read one argument, but insert 0 arguments on the command
1136 line.
1137
1138 Implies -X unless -m is set.
1139
1140 --max-replace-args=max-args
1141 -N max-args
1142 Use at most max-args arguments per command line. Like -n but also
1143 makes replacement strings {1} .. {max-args} representing arguments
1144 1 .. max-args. If there are too few arguments, the {n} will be empty.
1145
1146 -N 0 means read one argument, but insert 0 arguments on the command
1147 line.
1148
1149 This will set the owner of the homedir to the user:
1150
1151 tr ':' '\n' < /etc/passwd | parallel -N7 chown {1} {6}
1152
1153 Implies -X unless -m or --pipe is set.
1154
1155 When used with --pipe -N is the number of records to read. This is
1156 somewhat slower than --block.
1157
1158 --nonall
1159 --onall with no arguments. Run the command on all computers given
1160 with --sshlogin but take no arguments. GNU parallel will log into
1161 --jobs number of computers in parallel and run the job on the
1162 computer. -j adjusts how many computers to log into in parallel.
1163
1164 This is useful for running the same command (e.g. uptime) on a list
1165 of servers.
1166
1167 --onall
1168 Run all the jobs on all computers given with --sshlogin. GNU
1169 parallel will log into --jobs number of computers in parallel and
1170 run one job at a time on the computer. The order of the jobs will
1171 not be changed, but some computers may finish before others.
1172
1173 When using --group the output will be grouped by each server, so
1174 all the output from one server will be grouped together.
1175
1176 --joblog will contain an entry for each job on each server, so
1177 there will be several job sequence 1.
1178
1179 --output-as-files
1180 --outputasfiles
1181 --files
1182 Instead of printing the output to stdout (standard output) the
1183 output of each job is saved in a file and the filename is then
1184 printed.
1185
1186 See also: --results
1187
1188 --pipe (alpha testing)
1189 --spreadstdin (alpha testing)
1190 Spread input to jobs on stdin (standard input). Read a block of
1191 data from stdin (standard input) and give one block of data as
1192 input to one job.
1193
1194 The block size is determined by --block. The strings --recstart and
1195 --recend tell GNU parallel how a record starts and/or ends. The
1196 block read will have the final partial record removed before the
1197 block is passed on to the job. The partial record will be prepended
1198 to the next block.
1199
1200 If --recstart is given this will be used to split at record start.
1201
1202 If --recend is given this will be used to split at record end.
1203
1204 If both --recstart and --recend are given both will have to match
1205 to find a split position.
1206
1207 If neither --recstart nor --recend are given --recend defaults to
1208 '\n'. To have no record separator use --recend "".
1209
1210 --files is often used with --pipe.
1211
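A minimal sketch: split a stream into roughly 1 MB blocks on newline boundaries and count the lines in each block:

```shell
# Each output line is the line count of one block; the counts sum
# to 1000000 since blocks are split at record (newline) boundaries.
seq 1000000 | parallel --pipe --block 1M wc -l
```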
1212 --pipe maxes out at around 1 GB/s input, and 100 MB/s output. If
1213 performance is important use --pipepart.
1214
1215 See also: --recstart, --recend, --fifo, --cat, --pipepart, --files.
1216
1217 --pipepart
1218 Pipe parts of a physical file. --pipepart works similar to --pipe,
1219 but is much faster.
1220
1221 --pipepart has a few limitations:
1222
1223 • The file must be a normal file or a block device (technically it
1224 must be seekable) and must be given using -a or ::::. The file
1225 cannot be a pipe or a fifo as they are not seekable.
1226
1227 If using a block device with a lot of NUL bytes, remember to set
1228 --recend ''.
1229
1230 • Record counting (-N) and line counting (-L/-l) do not work.
1231
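A minimal sketch using a temporary file (any seekable file works; a pipe would not):

```shell
seq 100000 > /tmp/pipepart-demo
# The per-chunk line counts sum to 100000.
parallel -a /tmp/pipepart-demo --pipepart --block 1M wc -l
rm /tmp/pipepart-demo
```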
1232 --plain
1233 Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full
1234 control on the command line (used by GNU parallel internally when
1235 called with --sshlogin).
1236
1237 --plus
1238 Activate additional replacement strings: {+/} {+.} {+..} {+...}
1239 {..} {...} {/..} {/...} {##}. The idea being that '{+foo}' matches
1240 the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} =
1241 {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
1242 {+/}/{/...}.{+...}
1243
1244 {##} is the total number of jobs to be run. It is incompatible with
1245 -X/-m/--xargs.
1246
1247 {choose_k} is inspired by n choose k: Given a list of n elements,
1248 choose k. k is the number of input sources and n is the number of
1249 arguments in an input source. The content of the input sources
1250 must be the same and the arguments must be unique.
1251
1252 Shorthands for variables:
1253
1254 {slot} $PARALLEL_JOBSLOT (see {%})
1255 {sshlogin} $PARALLEL_SSHLOGIN
1256 {host} $PARALLEL_SSHHOST
1257 {hgrp} $PARALLEL_HOSTGROUPS
1258
1259 The following dynamic replacement strings are also activated. They
1260 are inspired by bash's parameter expansion:
1261
1262 {:-str} str if the value is empty
1263 {:num} remove the first num characters
1264 {:num1:num2} characters from num1 to num2
1265 {#str} remove prefix str
1266 {%str} remove postfix str
1267 {/str1/str2} replace str1 with str2
1268 {^str} uppercase str if found at the start
1269 {^^str} uppercase str
1270 {,str} lowercase str if found at the start
1271 {,,str} lowercase str
1272
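A sketch of a few of the dynamic replacement strings:

```shell
parallel --plus echo {%.tar.gz} ::: my.tar.gz      # remove postfix: my
parallel --plus echo {#my.} ::: my.file.txt        # remove prefix: file.txt
parallel --plus echo {/file/data} ::: my_file.txt  # replace: my_data.txt
```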
1273 --progress
1274 Show progress of computations. List the computers involved in the
1275 task with number of CPUs detected and the max number of jobs to
1276 run. After that show progress for each computer: number of running
1277 jobs, number of completed jobs, and percentage of all jobs done by
1278 this computer. The percentage will only be available after all jobs
1279 have been scheduled as GNU parallel only reads the next job when
1280 ready to schedule it - this is to avoid wasting time and memory by
1281 reading everything at startup.
1282
1283 By sending GNU parallel SIGUSR2 you can toggle turning on/off
1284 --progress on a running GNU parallel process.
1285
1286 See also --eta and --bar.
1287
1288 --max-line-length-allowed
1289 Print the maximal number of characters allowed on the command line
1290 and exit (used by GNU parallel itself to determine the line length
1291 on remote computers).
1292
1293 --number-of-cpus (obsolete)
1294 Print the number of physical CPU cores and exit.
1295
1296 --number-of-cores
1297 Print the number of physical CPU cores and exit (used by GNU
1298 parallel itself to determine the number of physical CPU cores on
1299 remote computers).
1300
1301 --number-of-sockets
1302 Print the number of filled CPU sockets and exit (used by GNU
1303 parallel itself to determine the number of filled CPU sockets on
1304 remote computers).
1305
1306 --number-of-threads
1307 Print the number of hyperthreaded CPU cores and exit (used by GNU
1308 parallel itself to determine the number of hyperthreaded CPU cores
1309 on remote computers).
1310
1311 --no-keep-order
1312 Overrides an earlier --keep-order (e.g. if set in
1313 ~/.parallel/config).
1314
1315 --nice niceness
1316 Run the command at this niceness.
1317
1318 By default GNU parallel will run jobs at the same nice level as GNU
1319 parallel is started - both on the local machine and remote servers,
1320 so you are unlikely to ever use this option.
1321
1322 Setting --nice will override this nice level. If the nice level is
1323 smaller than the current nice level, it will only affect remote
1324 jobs (e.g. if current level is 10 then --nice 5 will cause local
1325 jobs to be run at level 10, but remote jobs run at nice level 5).
1326
1327 --interactive
1328 -p Prompt the user about whether to run each command line and read a
1329 line from the terminal. Only run the command line if the response
1330 starts with 'y' or 'Y'. Implies -t.
1331
1332 --parens parensstring
1333 Define start and end parenthesis for {= perl expression =}. The
1334 left and the right parenthesis can be multiple characters and are
1335 assumed to be the same length. The default is {==} giving {= as the
1336 start parenthesis and =} as the end parenthesis.
1337
1338 Another useful setting is ,,,, which would make both parentheses
1339 ,,:
1340
1341 parallel --parens ,,,, echo foo is ,,s/I/O/g,, ::: FII
1342
1343 See also: --rpl {= perl expression =}
1344
1345 --profile profilename
1346 -J profilename
1347 Use profile profilename for options. This is useful if you want to
1348 have multiple profiles. You could have one profile for running jobs
1349 in parallel on the local computer and a different profile for
1350 running jobs on remote computers. See the section PROFILE FILES for
1351 examples.
1352
1353 profilename corresponds to the file ~/.parallel/profilename.
1354
1355 You can give multiple profiles by repeating --profile. If parts of
1356 the profiles conflict, the later ones will be used.
1357
1358 Default: config
1359
1360 --quote
1361 -q Quote command. If your command contains special characters that
1362 should not be interpreted by the shell (e.g. ; \ | *), use --quote
1363 to escape these. The command must be a simple command (see man
1364 bash) without redirections and without variable assignments.
1365
1366 See the section QUOTING. Most people will not need this. Quoting
1367 is disabled by default.
1368
1369 --no-run-if-empty
1370 -r If the stdin (standard input) only contains whitespace, do not run
1371 the command.
1372
1373 If used with --pipe this is slow.
1374
1375 --noswap
1376 Do not start new jobs on a given computer if there is both swap-in
1377 and swap-out activity.
1378
1379 The swap activity is only sampled every 10 seconds as the sampling
1380 takes 1 second to do.
1381
1382 Swap activity is computed as (swap-in)*(swap-out) which in practice
1383 is a good value: swapping out is not a problem, swapping in is not
1384 a problem, but both swapping in and out usually indicates a
1385 problem.
1386
1387 --memfree may give better results, so try using that first.
1388
1389 --record-env
1390 Record current environment variables in ~/.parallel/ignored_vars.
1391 This is useful before using --env _.
1392
1393 See also --env, --session.
1394
1395 --recstart startstring
1396 --recend endstring
1397 If --recstart is given startstring will be used to split at record
1398 start.
1399
1400 If --recend is given endstring will be used to split at record end.
1401
1402 If both --recstart and --recend are given the combined string
1403 endstringstartstring will have to match to find a split position.
1404 This is useful if either startstring or endstring match in the
1405 middle of a record.
1406
1407 If neither --recstart nor --recend are given then --recend defaults
1408 to '\n'. To have no record separator use --recend "".
1409
1410 --recstart and --recend are used with --pipe.
1411
1412 Use --regexp to interpret --recstart and --recend as regular
1413 expressions. This is slow, however.
1414
1415 --regexp
1416 Use --regexp to interpret --recstart and --recend as regular
1417 expressions. This is slow, however.
1418
1419 --remove-rec-sep
1420 --removerecsep
1421 --rrs
1422 Remove the text matched by --recstart and --recend before piping it
1423 to the command.
1424
1425 Only used with --pipe.
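For example (a sketch; records end in ';', and --rrs strips the separator before each record is piped to cat):

```shell
printf 'a;b;c;' | parallel -k --pipe --recend ';' --rrs -N1 cat
```

Without --rrs the separators would be passed through unchanged.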
1426
1427 --results name
1428 --res name
1429 Save the output into files.
1430
1431 Simple string output dir
1432
1433 If name does not contain replacement strings and does not end in
1434 .csv/.tsv, the output will be stored in a directory tree rooted at
1435 name. Within this directory tree, each command will result in
1436 three files: name/<ARGS>/stdout, name/<ARGS>/stderr, and
1437 name/<ARGS>/seq, where <ARGS> is a sequence of directories
1438 representing the header of the input source (if using --header :)
1439 or the number of the input source and corresponding values.
1440
1441 E.g:
1442
1443 parallel --header : --results foo echo {a} {b} \
1444 ::: a I II ::: b III IIII
1445
1446 will generate the files:
1447
1448 foo/a/II/b/III/seq
1449 foo/a/II/b/III/stderr
1450 foo/a/II/b/III/stdout
1451 foo/a/II/b/IIII/seq
1452 foo/a/II/b/IIII/stderr
1453 foo/a/II/b/IIII/stdout
1454 foo/a/I/b/III/seq
1455 foo/a/I/b/III/stderr
1456 foo/a/I/b/III/stdout
1457 foo/a/I/b/IIII/seq
1458 foo/a/I/b/IIII/stderr
1459 foo/a/I/b/IIII/stdout
1460
1461 and
1462
1463 parallel --results foo echo {1} {2} ::: I II ::: III IIII
1464
1465 will generate the files:
1466
1467 foo/1/II/2/III/seq
1468 foo/1/II/2/III/stderr
1469 foo/1/II/2/III/stdout
1470 foo/1/II/2/IIII/seq
1471 foo/1/II/2/IIII/stderr
1472 foo/1/II/2/IIII/stdout
1473 foo/1/I/2/III/seq
1474 foo/1/I/2/III/stderr
1475 foo/1/I/2/III/stdout
1476 foo/1/I/2/IIII/seq
1477 foo/1/I/2/IIII/stderr
1478 foo/1/I/2/IIII/stdout
1479
1480 CSV file output
1481
1482 If name ends in .csv/.tsv the output will be a CSV-file named name.
1483
1484 .csv gives a comma separated value file. .tsv gives a TAB separated
1485 value file.
1486
1487 -.csv/-.tsv are special: It will give the file on stdout (standard
1488 output).
1489
1490 JSON file output (alpha testing)
1491
1492 If name ends in .json the output will be a JSON-file named name.
1493
1494 -.json is special: It will give the file on stdout (standard
1495 output).
1496
1497 Replacement string output file (alpha testing)
1498
1499 If name contains a replacement string and the replaced result does
1500 not end in /, then the standard output will be stored in a file
1501 named by this result. Standard error will be stored in the same
1502 file name with '.err' added, and the sequence number will be stored
1503 in the same file name with '.seq' added.
1504
1505 E.g.
1506
1507 parallel --results my_{} echo ::: foo bar baz
1508
1509 will generate the files:
1510
1511 my_bar
1512 my_bar.err
1513 my_bar.seq
1514 my_baz
1515 my_baz.err
1516 my_baz.seq
1517 my_foo
1518 my_foo.err
1519 my_foo.seq
1520
1521 Replacement string output dir
1522
1523 If name contains a replacement string and the replaced result ends
1524 in /, then output files will be stored in the resulting dir.
1525
1526 E.g.
1527
1528 parallel --results my_{}/ echo ::: foo bar baz
1529
1530 will generate the files:
1531
1532 my_bar/seq
1533 my_bar/stderr
1534 my_bar/stdout
1535 my_baz/seq
1536 my_baz/stderr
1537 my_baz/stdout
1538 my_foo/seq
1539 my_foo/stderr
1540 my_foo/stdout
1541
1542 See also --files, --tag, --header, --joblog.
1543
1544 --resume
1545 Resumes from the last unfinished job. By reading --joblog or the
1546 --results dir GNU parallel will figure out the last unfinished job
1547 and continue from there. As GNU parallel only looks at the sequence
1548 numbers in --joblog, the input, the command, and --joblog all
1549 have to remain unchanged; otherwise GNU parallel may run wrong
1550 commands.
1551
1552 See also --joblog, --results, --resume-failed, --retries.
1553
1554 --resume-failed
1555 Retry all failed and resume from the last unfinished job. By
1556 reading --joblog GNU parallel will figure out the failed jobs and
1557 run those again. After that it will resume the last unfinished job and
1558 continue from there. As GNU parallel only looks at the sequence
1559 numbers in --joblog, the input, the command, and --joblog all
1560 have to remain unchanged; otherwise GNU parallel may run wrong
1561 commands.
1562
1563 See also --joblog, --resume, --retry-failed, --retries.
1564
1565 --retry-failed
1566 Retry all failed jobs in joblog. By reading --joblog GNU parallel
1567 will figure out the failed jobs and run those again.
1568
1569 --retry-failed ignores the command and arguments on the command
1570 line: It only looks at the joblog.
1571
1572 Differences between --resume, --resume-failed, --retry-failed
1573
1574 In this example exit {= $_%=2 =} will cause every other job to
1575 fail.
1576
1577 timeout -k 1 4 parallel --joblog log -j10 \
1578 'sleep {}; exit {= $_%=2 =}' ::: {10..1}
1579
1580 4 jobs completed. 2 failed:
1581
1582 Seq [...] Exitval Signal Command
1583 10 [...] 1 0 sleep 1; exit 1
1584 9 [...] 0 0 sleep 2; exit 0
1585 8 [...] 1 0 sleep 3; exit 1
1586 7 [...] 0 0 sleep 4; exit 0
1587
1588 --resume does not care about the Exitval, but only looks at Seq. If
1589 the Seq is run, it will not be run again. So if needed, you can
1590 change the command for the seqs not run yet:
1591
1592 parallel --resume --joblog log -j10 \
1593 'sleep .{}; exit {= $_%=2 =}' ::: {10..1}
1594
1595 Seq [...] Exitval Signal Command
1596 [... as above ...]
1597 1 [...] 0 0 sleep .10; exit 0
1598 6 [...] 1 0 sleep .5; exit 1
1599 5 [...] 0 0 sleep .6; exit 0
1600 4 [...] 1 0 sleep .7; exit 1
1601 3 [...] 0 0 sleep .8; exit 0
1602 2 [...] 1 0 sleep .9; exit 1
1603
1604 --resume-failed cares about the Exitval, but also only looks at Seq
1605 to figure out which commands to run. Again this means you can
1606 change the command, but not the arguments. It will run the failed
1607 seqs and the seqs not yet run:
1608
1609 parallel --resume-failed --joblog log -j10 \
1610 'echo {};sleep .{}; exit {= $_%=3 =}' ::: {10..1}
1611
1612 Seq [...] Exitval Signal Command
1613 [... as above ...]
1614 10 [...] 1 0 echo 1;sleep .1; exit 1
1615 8 [...] 0 0 echo 3;sleep .3; exit 0
1616 6 [...] 2 0 echo 5;sleep .5; exit 2
1617 4 [...] 1 0 echo 7;sleep .7; exit 1
1618 2 [...] 0 0 echo 9;sleep .9; exit 0
1619
1620 --retry-failed cares about the Exitval, but takes the command from
1621 the joblog. It ignores any arguments or commands given on the
1622 command line:
1623
1624 parallel --retry-failed --joblog log -j10 this part is ignored
1625
1626 Seq [...] Exitval Signal Command
1627 [... as above ...]
1628 10 [...] 1 0 echo 1;sleep .1; exit 1
1629 6 [...] 2 0 echo 5;sleep .5; exit 2
1630 4 [...] 1 0 echo 7;sleep .7; exit 1
1631
1632 See also --joblog, --resume, --resume-failed, --retries.
1633
1634 --retries n
1635 If a job fails, retry it on another computer on which it has not
1636 failed. Do this n times. If there are fewer than n computers in
1637 --sshlogin GNU parallel will re-use all the computers. This is
1638 useful if some jobs fail for no apparent reason (such as network
1639 failure).
1640
1641 --return filename
1642 Transfer files from remote computers. --return is used with
1643 --sshlogin when the arguments are files on the remote computers.
1644 When processing is done the file filename will be transferred from
1645 the remote computer using rsync and will be put relative to the
1646 default login dir. E.g.
1647
1648 echo foo/bar.txt | parallel --return {.}.out \
1649 --sshlogin server.example.com touch {.}.out
1650
1651 This will transfer the file $HOME/foo/bar.out from the computer
1652 server.example.com to the file foo/bar.out after running touch
1653 foo/bar.out on server.example.com.
1654
1655 parallel -S server --trc out/./{}.out touch {}.out ::: in/file
1656
1657 This will transfer the file in/file.out from the computer server to
1658 the file out/in/file.out after running touch in/file.out on
1659 server.
1660
1661 echo /tmp/foo/bar.txt | parallel --return {.}.out \
1662 --sshlogin server.example.com touch {.}.out
1663
1664 This will transfer the file /tmp/foo/bar.out from the computer
1665 server.example.com to the file /tmp/foo/bar.out after running touch
1666 /tmp/foo/bar.out on server.example.com.
1667
1668 Multiple files can be transferred by repeating the option multiple
1669 times:
1670
1671 echo /tmp/foo/bar.txt | parallel \
1672 --sshlogin server.example.com \
1673 --return {.}.out --return {.}.out2 touch {.}.out {.}.out2
1674
1675 --return is often used with --transferfile and --cleanup.
1676
1677 --return is ignored when used with --sshlogin : or when not used
1678 with --sshlogin.
1679
1680 --round-robin
1681 --round
1682 Normally --pipe will give a single block to each instance of the
1683 command. With --roundrobin all blocks will be written at random to
1684 commands already running. This is useful if the command takes a
1685 long time to initialize.
1686
1687 --keep-order will not work with --roundrobin as it is impossible to
1688 track which input block corresponds to which output.
1689
1690 --roundrobin implies --pipe, except if --pipepart is given.
1691
1692 See also --group-by, --shard.
1693
1694 --rpl 'tag perl expression'
1695 Use tag as a replacement string for perl expression. This makes it
1696 possible to define your own replacement strings. GNU parallel's 7
1697 replacement strings are implemented as:
1698
1699 --rpl '{} '
1700 --rpl '{#} 1 $_=$job->seq()'
1701 --rpl '{%} 1 $_=$job->slot()'
1702 --rpl '{/} s:.*/::'
1703 --rpl '{//} $Global::use{"File::Basename"} ||=
1704 eval "use File::Basename; 1;"; $_ = dirname($_);'
1705 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
1706 --rpl '{.} s:\.[^/.]+$::'
1707
1708 The --plus replacement strings are implemented as:
1709
1710 --rpl '{+/} s:/[^/]*$::'
1711 --rpl '{+.} s:.*\.::'
1712 --rpl '{+..} s:.*\.([^.]*\.):$1:'
1713 --rpl '{+...} s:.*\.([^.]*\.[^.]*\.):$1:'
1714 --rpl '{..} s:\.[^/.]+$::; s:\.[^/.]+$::'
1715 --rpl '{...} s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1716 --rpl '{/..} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1717 --rpl '{/...} s:.*/::;s:\.[^/.]+$::;s:\.[^/.]+$::;s:\.[^/.]+$::'
1718 --rpl '{##} $_=total_jobs()'
1719 --rpl '{:-(.+?)} $_ ||= $$1'
1720 --rpl '{:(\d+?)} substr($_,0,$$1) = ""'
1721 --rpl '{:(\d+?):(\d+?)} $_ = substr($_,$$1,$$2);'
1722 --rpl '{#([^#].*?)} s/^$$1//;'
1723 --rpl '{%(.+?)} s/$$1$//;'
1724 --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1725 --rpl '{^(.+?)} s/^($$1)/uc($1)/e;'
1726 --rpl '{^^(.+?)} s/($$1)/uc($1)/eg;'
1727 --rpl '{,(.+?)} s/^($$1)/lc($1)/e;'
1728 --rpl '{,,(.+?)} s/($$1)/lc($1)/eg;'
1729
1730 If the user defined replacement string starts with '{' it can also
1731 be used as a positional replacement string (like {2.}).
1732
1733 It is recommended to only change $_ but you have full access to all
1734 of GNU parallel's internal functions and data structures.
1735
1736 Here are a few examples:
1737
1738 Is the job sequence even or odd?
1739 --rpl '{odd} $_ = seq() % 2 ? "odd" : "even"'
1740 Pad job sequence with leading zeros to get equal width
1741 --rpl '{0#} $f=1+int("".(log(total_jobs())/log(10)));
1742 $_=sprintf("%0${f}d",seq())'
1743 Job sequence counting from 0
1744 --rpl '{#0} $_ = seq() - 1'
1745 Job slot counting from 2
1746 --rpl '{%1} $_ = slot() + 1'
1747 Remove all extensions
1748 --rpl '{:} s:(\.[^/]+)*$::'
1749
1750 You can have dynamic replacement strings by including parenthesis
1751 in the replacement string and adding a regular expression between
1752 the parenthesis. The matching string will be inserted as $$1:
1753
1754 parallel --rpl '{%(.*?)} s/$$1//' echo {%.tar.gz} ::: my.tar.gz
1755 parallel --rpl '{:%(.+?)} s:$$1(\.[^/]+)*$::' \
1756 echo {:%_file} ::: my_file.tar.gz
1757 parallel -n3 --rpl '{/:%(.*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:' \
1758 echo job {#}: {2} {2.} {3/:%_1} ::: a/b.c c/d.e f/g_1.h.i
1759
1760 You can even use multiple matches:
1761
1762 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1763 echo {/replacethis/withthis} {/b/C} ::: a_replacethis_b
1764
1765 parallel --rpl '{(.*?)/(.*?)} $_="$$2$_$$1"' \
1766 echo {swap/these} ::: -middle-
1767
1768 See also: {= perl expression =} --parens
1769
1770 --rsync-opts options
1771 Options to pass on to rsync. Setting --rsync-opts takes precedence
1772 over setting the environment variable $PARALLEL_RSYNC_OPTS.
1773
1774 --max-chars=max-chars
1775 -s max-chars
1776 Use at most max-chars characters per command line, including the
1777 command and initial-arguments and the terminating nulls at the ends
1778 of the argument strings. The largest allowed value is system-
1779 dependent, and is calculated as the argument length limit for exec,
1780 less the size of your environment. The default value is the
1781 maximum.
1782
1783 Implies -X unless -m is set.
1784
1785 --show-limits
1786 Display the limits on the command-line length which are imposed by
1787 the operating system and the -s option. Pipe the input from
1788 /dev/null (and perhaps specify --no-run-if-empty) if you don't want
1789 GNU parallel to do anything.
1790
1791 --semaphore
1792 Work as a counting semaphore. --semaphore will cause GNU parallel
1793 to start command in the background. When the number of jobs given
1794 by --jobs is reached, GNU parallel will wait for one of these to
1795 complete before starting another command.
1796
1797 --semaphore implies --bg unless --fg is specified.
1798
1799 --semaphore implies --semaphorename `tty` unless --semaphorename is
1800 specified.
1801
1802 Used with --fg, --wait, and --semaphorename.
1803
1804 The command sem is an alias for parallel --semaphore.
1805
1806 See also man sem.
1807
1808 --semaphorename name
1809 --id name
1810 Use name as the name of the semaphore. Default is the name of the
1811 controlling tty (output from tty).
1812
1813 The default normally works as expected when used interactively, but
1814      when used in a script name should be set. $$ or my_task_name is
1815      often a good value.
1816
1817 The semaphore is stored in ~/.parallel/semaphores/
1818
1819 Implies --semaphore.
1820
1821 See also man sem.
1822
1823 --semaphoretimeout secs
1824 --st secs
1825 If secs > 0: If the semaphore is not released within secs seconds,
1826 take it anyway.
1827
1828 If secs < 0: If the semaphore is not released within secs seconds,
1829 exit.
1830
1831 Implies --semaphore.
1832
1833 See also man sem.
1834
1835 --seqreplace replace-str
1836 Use the replacement string replace-str instead of {#} for job
1837 sequence number.
1838
1839 --session
1840      Record names in the current environment in $PARALLEL_IGNORED_NAMES
1841 exit. Only used with env_parallel. Aliases, functions, and
1842 variables with names in $PARALLEL_IGNORED_NAMES will not be copied.
1843
1844 Only supported in Ash, Bash, Dash, Ksh, Sh, and Zsh.
1845
1846 See also --env, --record-env.
1847
1848 --shard shardexpr
1849 Use shardexpr as shard key and shard input to the jobs.
1850
1851 shardexpr is [column number|column name] [perlexpression] e.g. 3,
1852 Address, 3 $_%=100, Address s/\d//g.
1853
1854 Each input line is split using --colsep. The value of the column is
1855      put into $_, the perl expression is executed, the resulting value
1856      is hashed so that all lines with a given value are sent to the same
1857      job slot.
1858
1859 This is similar to sharding in databases.
1860
1861      The performance is on the order of 100K rows per second. Faster if
1862 the shardcol is small (<10), slower if it is big (>100).
1863
1864 --shard requires --pipe and a fixed numeric value for --jobs.
1865
1866 See also --bin, --group-by, --roundrobin.
1867
1868 --shebang
1869 --hashbang
1870 GNU parallel can be called as a shebang (#!) command as the first
1871      line of a script. The content of the file will be treated as the
1872      input source.
1873
1874 Like this:
1875
1876 #!/usr/bin/parallel --shebang -r wget
1877
1878 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1879 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1880 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1881
1882 --shebang must be set as the first option.
1883
1884 On FreeBSD env is needed:
1885
1886 #!/usr/bin/env -S parallel --shebang -r wget
1887
1888 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1889 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1890 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1891
1892 There are many limitations of shebang (#!) depending on your
1893 operating system. See details on
1894 http://www.in-ulm.de/~mascheck/various/shebang/
1895
1896 --shebang-wrap
1897 GNU parallel can parallelize scripts by wrapping the shebang line.
1898 If the program can be run like this:
1899
1900 cat arguments | parallel the_program
1901
1902 then the script can be changed to:
1903
1904 #!/usr/bin/parallel --shebang-wrap /original/parser --options
1905
1906 E.g.
1907
1908 #!/usr/bin/parallel --shebang-wrap /usr/bin/python
1909
1910 If the program can be run like this:
1911
1912 cat data | parallel --pipe the_program
1913
1914 then the script can be changed to:
1915
1916 #!/usr/bin/parallel --shebang-wrap --pipe /orig/parser --opts
1917
1918 E.g.
1919
1920 #!/usr/bin/parallel --shebang-wrap --pipe /usr/bin/perl -w
1921
1922 --shebang-wrap must be set as the first option.
1923
1924 --shellquote
1925 Does not run the command but quotes it. Useful for making quoted
1926 composed commands for GNU parallel.
1927
1928      Multiple --shellquote will quote the string multiple times, so
1929 parallel --shellquote | parallel --shellquote can be written as
1930 parallel --shellquote --shellquote.
1931
1932 --shuf
1933      Shuffle jobs. With multiple input sources it is hard to randomize
1934      the job order. --shuf will generate all jobs, and shuffle them
1935 before running them. This is useful to get a quick preview of the
1936 results before running the full batch.
1937
1938 --skip-first-line
1939 Do not use the first line of input (used by GNU parallel itself
1940 when called with --shebang).
1941
1942 --sql DBURL (obsolete)
1943 Use --sqlmaster instead.
1944
1945 --sqlmaster DBURL
1946 Submit jobs via SQL server. DBURL must point to a table, which will
1947 contain the same information as --joblog, the values from the input
1948 sources (stored in columns V1 .. Vn), and the output (stored in
1949 columns Stdout and Stderr).
1950
1951 If DBURL is prepended with '+' GNU parallel assumes the table is
1952 already made with the correct columns and appends the jobs to it.
1953
1954      If DBURL is not prepended with '+' the table will be dropped and
1955      created with the correct number of V-columns.
1956
1957 --sqlmaster does not run any jobs, but it creates the values for
1958 the jobs to be run. One or more --sqlworker must be run to actually
1959 execute the jobs.
1960
1961 If --wait is set, GNU parallel will wait for the jobs to complete.
1962
1963 The format of a DBURL is:
1964
1965 [sql:]vendor://[[user][:pwd]@][host][:port]/[db]/table
1966
1967 E.g.
1968
1969 sql:mysql://hr:hr@localhost:3306/hrdb/jobs
1970 mysql://scott:tiger@my.example.com/pardb/paralleljobs
1971 sql:oracle://scott:tiger@ora.example.com/xe/parjob
1972 postgresql://scott:tiger@pg.example.com/pgdb/parjob
1973 pg:///parjob
1974 sqlite3:///%2Ftmp%2Fpardb.sqlite/parjob
1975 csv:///%2Ftmp%2Fpardb/parjob
1976
1977      Notice how / in the path of sqlite and CSV must be encoded as %2F,
1978      except the last / in CSV which must be a literal /.
1979
1980 It can also be an alias from ~/.sql/aliases:
1981
1982 :myalias mysql:///mydb/paralleljobs
1983
1984 --sqlandworker DBURL
1985 Shorthand for: --sqlmaster DBURL --sqlworker DBURL.
1986
1987 --sqlworker DBURL
1988 Execute jobs via SQL server. Read the input sources variables from
1989 the table pointed to by DBURL. The command on the command line
1990 should be the same as given by --sqlmaster.
1991
1992      If you have more than one --sqlworker, jobs may be run more than
1993      once.
1994
1995 If --sqlworker runs on the local machine, the hostname in the SQL
1996 table will not be ':' but instead the hostname of the machine.
1997
1998 --ssh sshcommand
1999 GNU parallel defaults to using ssh for remote access. This can be
2000 overridden with --ssh. It can also be set on a per server basis
2001 (see --sshlogin).
2002
2003 --sshdelay mytime (alpha testing)
2004      Delay starting the next ssh by mytime. GNU parallel will not start
2005 another ssh for the next mytime.
2006
2007 For details on mytime see --delay.
2008
2009 -S
2010 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
2011 -S @hostgroup
2012 --sshlogin
2013 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
2014 --sshlogin @hostgroup
2015 Distribute jobs to remote computers. The jobs will be run on a list
2016 of remote computers.
2017
2018 If hostgroups is given, the sshlogin will be added to that
2019 hostgroup. Multiple hostgroups are separated by '+'. The sshlogin
2020 will always be added to a hostgroup named the same as sshlogin.
2021
2022 If only the @hostgroup is given, only the sshlogins in that
2023 hostgroup will be used. Multiple @hostgroup can be given.
2024
2025 GNU parallel will determine the number of CPUs on the remote
2026 computers and run the number of jobs as specified by -j. If the
2027      number ncpus is given GNU parallel will use this number as the
2028      number of CPUs on the host. Normally ncpus will not be needed.
2029
2030 An sshlogin is of the form:
2031
2032 [sshcommand [options]] [username@]hostname
2033
2034 The sshlogin must not require a password (ssh-agent, ssh-copy-id,
2035 and sshpass may help with that).
2036
2037      The sshlogin ':' is special: it means 'no ssh' and will therefore
2038 run on the local computer.
2039
2040      The sshlogin '..' is special: it reads sshlogins from
2041      ~/.parallel/sshloginfile or $XDG_CONFIG_HOME/parallel/sshloginfile
2042
2043      The sshlogin '-' is special, too: it reads sshlogins from stdin
2044      (standard input).
2045
2046 To specify more sshlogins separate the sshlogins by comma, newline
2047 (in the same string), or repeat the options multiple times.
2048
2049 For examples: see --sshloginfile.
2050
2051 The remote host must have GNU parallel installed.
2052
2053 --sshlogin is known to cause problems with -m and -X.
2054
2055 --sshlogin is often used with --transferfile, --return, --cleanup,
2056 and --trc.
2057
2058 --sshloginfile filename
2059 --slf filename
2060 File with sshlogins. The file consists of sshlogins on separate
2061 lines. Empty lines and lines starting with '#' are ignored.
2062 Example:
2063
2064 server.example.com
2065 username@server2.example.com
2066 8/my-8-cpu-server.example.com
2067 2/my_other_username@my-dualcore.example.net
2068 # This server has SSH running on port 2222
2069 ssh -p 2222 server.example.net
2070 4/ssh -p 2222 quadserver.example.net
2071 # Use a different ssh program
2072 myssh -p 2222 -l myusername hexacpu.example.net
2073 # Use a different ssh program with default number of CPUs
2074 //usr/local/bin/myssh -p 2222 -l myusername hexacpu
2075 # Use a different ssh program with 6 CPUs
2076 6//usr/local/bin/myssh -p 2222 -l myusername hexacpu
2077 # Assume 16 CPUs on the local computer
2078 16/:
2079 # Put server1 in hostgroup1
2080 @hostgroup1/server1
2081 # Put myusername@server2 in hostgroup1+hostgroup2
2082 @hostgroup1+hostgroup2/myusername@server2
2083 # Force 4 CPUs and put 'ssh -p 2222 server3' in hostgroup1
2084 @hostgroup1/4/ssh -p 2222 server3
2085
2086 When using a different ssh program the last argument must be the
2087 hostname.
2088
2089 Multiple --sshloginfile are allowed.
2090
2091      GNU parallel will first look for the file in the current dir; if
2092      that fails it looks for the file in ~/.parallel.
2093
2094      The sshloginfile '..' is special: it reads sshlogins from
2095      ~/.parallel/sshloginfile
2096
2097      The sshloginfile '.' is special: it reads sshlogins from
2098      /etc/parallel/sshloginfile
2099
2100      The sshloginfile '-' is special, too: it reads sshlogins from stdin
2101      (standard input).
2102
2103 If the sshloginfile is changed it will be re-read when a job
2104 finishes though at most once per second. This makes it possible to
2105 add and remove hosts while running.
2106
2107 This can be used to have a daemon that updates the sshloginfile to
2108 only contain servers that are up:
2109
2110        cp original.slf tmp2.slf
2111        while true ; do
2112          nice parallel --nonall -j0 -k --slf original.slf \
2113            --tag echo | perl -pe 's/\t$//' > tmp.slf
2114          if ! diff tmp.slf tmp2.slf >/dev/null; then
2115            mv tmp.slf tmp2.slf
2116          fi
2117          sleep 10
2118        done &
2119        parallel --slf tmp2.slf ...
2120
2121 --slotreplace replace-str
2122 Use the replacement string replace-str instead of {%} for job slot
2123 number.
2124
2125 --silent
2126 Silent. The job to be run will not be printed. This is the
2127 default. Can be reversed with -v.
2128
2129 --tty
2130 Open terminal tty. If GNU parallel is used for starting a program
2131 that accesses the tty (such as an interactive program) then this
2132 option may be needed. It will default to starting only one job at a
2133 time (i.e. -j1), not buffer the output (i.e. -u), and it will open
2134 a tty for the job.
2135
2136 You can of course override -j1 and -u.
2137
2138 Using --tty unfortunately means that GNU parallel cannot kill the
2139 jobs (with --timeout, --memfree, or --halt). This is due to GNU
2140 parallel giving each child its own process group, which is then
2141      killed. Process groups are dependent on the tty.
2142
2143 --tag
2144 Tag lines with arguments. Each output line will be prepended with
2145 the arguments and TAB (\t). When combined with --onall or --nonall
2146 the lines will be prepended with the sshlogin instead.
2147
2148 --tag is ignored when using -u.
2149
2150 --tagstring str
2151 Tag lines with a string. Each output line will be prepended with
2152 str and TAB (\t). str can contain replacement strings such as {}.
2153
2154 --tagstring is ignored when using -u, --onall, and --nonall.
2155
2156 --tee
2157 Pipe all data to all jobs. Used with --pipe/--pipepart and :::.
2158
2159 seq 1000 | parallel --pipe --tee -v wc {} ::: -w -l -c
2160
2161 How many numbers in 1..1000 contain 0..9, and how many bytes do
2162 they fill:
2163
2164 seq 1000 | parallel --pipe --tee --tag \
2165 'grep {1} | wc {2}' ::: {0..9} ::: -l -c
2166
2167 How many words contain a..z and how many bytes do they fill?
2168
2169 parallel -a /usr/share/dict/words --pipepart --tee --tag \
2170 'grep {1} | wc {2}' ::: {a..z} ::: -l -c
2171
2172 --termseq sequence
2173 Termination sequence. When a job is killed due to --timeout,
2174 --memfree, --halt, or abnormal termination of GNU parallel,
2175 sequence determines how the job is killed. The default is:
2176
2177 TERM,200,TERM,100,TERM,50,KILL,25
2178
2179 which sends a TERM signal, waits 200 ms, sends another TERM signal,
2180 waits 100 ms, sends another TERM signal, waits 50 ms, sends a KILL
2181 signal, waits 25 ms, and exits. GNU parallel detects if a process
2182 dies before the waiting time is up.
2183
2184 --tmpdir dirname
2185 Directory for temporary files. GNU parallel normally buffers output
2186 into temporary files in /tmp. By setting --tmpdir you can use a
2187 different dir for the files. Setting --tmpdir is equivalent to
2188 setting $TMPDIR.
2189
2190 --tmux (Long beta testing)
2191 Use tmux for output. Start a tmux session and run each job in a
2192 window in that session. No other output will be produced.
2193
2194 --tmuxpane (Long beta testing)
2195 Use tmux for output but put output into panes in the first window.
2196 Useful if you want to monitor the progress of less than 100
2197 concurrent jobs.
2198
2199 --timeout duration
2200 Time out for command. If the command runs for longer than duration
2201 seconds it will get killed as per --termseq.
2202
2203 If duration is followed by a % then the timeout will dynamically be
2204      computed as a percentage of the median runtime of successful
2205      jobs. Only values > 100% make sense.
2206
2207 duration is normally in seconds, but can be floats postfixed with
2208 s, m, h, or d which would multiply the float by 1, 60, 3600, or
2209 86400. Thus these are equivalent: --timeout 100000 and --timeout
2210 1d3.5h16.6m4s.
2211
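      The duration arithmetic behind that equivalence can be checked
      directly:

```shell
# 1d3.5h16.6m4s = 1*86400 + 3.5*3600 + 16.6*60 + 4 seconds:
awk 'BEGIN { print 1*86400 + 3.5*3600 + 16.6*60 + 4 }'
# prints 100000
```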
2212 --verbose
2213 -t Print the job to be run on stderr (standard error).
2214
2215 See also -v, -p.
2216
2217 --transfer
2218 Transfer files to remote computers. Shorthand for: --transferfile
2219 {}.
2220
2221 --transferfile filename
2222 --tf filename
2223 --transferfile is used with --sshlogin to transfer files to the
2224 remote computers. The files will be transferred using rsync and
2225 will be put relative to the default work dir. If the path contains
2226 /./ the remaining path will be relative to the work dir. E.g.
2227
2228 echo foo/bar.txt | parallel --transferfile {} \
2229 --sshlogin server.example.com wc
2230
2231 This will transfer the file foo/bar.txt to the computer
2232 server.example.com to the file $HOME/foo/bar.txt before running wc
2233 foo/bar.txt on server.example.com.
2234
2235 echo /tmp/foo/bar.txt | parallel --transferfile {} \
2236 --sshlogin server.example.com wc
2237
2238 This will transfer the file /tmp/foo/bar.txt to the computer
2239 server.example.com to the file /tmp/foo/bar.txt before running wc
2240 /tmp/foo/bar.txt on server.example.com.
2241
2242 echo /tmp/./foo/bar.txt | parallel --transferfile {} \
2243 --sshlogin server.example.com wc {= s:.*/./:./: =}
2244
2245 This will transfer the file /tmp/foo/bar.txt to the computer
2246 server.example.com to the file foo/bar.txt before running wc
2247 ./foo/bar.txt on server.example.com.
2248
2249 --transferfile is often used with --return and --cleanup. A
2250 shorthand for --transferfile {} is --transfer.
2251
2252 --transferfile is ignored when used with --sshlogin : or when not
2253 used with --sshlogin.
2254
2255 --trc filename
2256 Transfer, Return, Cleanup. Shorthand for:
2257
2258 --transferfile {} --return filename --cleanup
2259
2260 --trim <n|l|r|lr|rl>
2261 Trim white space in input.
2262
2263 n No trim. Input is not modified. This is the default.
2264
2265 l Left trim. Remove white space from start of input. E.g. " a bc
2266 " -> "a bc ".
2267
2268 r Right trim. Remove white space from end of input. E.g. " a bc "
2269 -> " a bc".
2270
2271 lr
2272 rl Both trim. Remove white space from both start and end of input.
2273 E.g. " a bc " -> "a bc". This is the default if --colsep is
2274 used.
2275
2276 --ungroup
2277 -u Ungroup output. Output is printed as soon as possible and bypasses
2278      GNU parallel's internal processing. This may cause output from
2279      different commands to be mixed and should therefore only be used
2280      if you do not care about the output. Compare these:
2281
2282 seq 4 | parallel -j0 \
2283 'sleep {};echo -n start{};sleep {};echo {}end'
2284 seq 4 | parallel -u -j0 \
2285 'sleep {};echo -n start{};sleep {};echo {}end'
2286
2287 It also disables --tag. GNU parallel outputs faster with -u.
2288 Compare the speeds of these:
2289
2290 parallel seq ::: 300000000 >/dev/null
2291 parallel -u seq ::: 300000000 >/dev/null
2292 parallel --line-buffer seq ::: 300000000 >/dev/null
2293
2294 Can be reversed with --group.
2295
2296 See also: --line-buffer --group
2297
2298 --extensionreplace replace-str
2299 --er replace-str
2300 Use the replacement string replace-str instead of {.} for input
2301 line without extension.
2302
2303 --use-sockets-instead-of-threads
2304 --use-cores-instead-of-threads
2305 --use-cpus-instead-of-cores (obsolete)
2306 Determine how GNU parallel counts the number of CPUs. GNU parallel
2307 uses this number when the number of jobslots is computed relative
2308 to the number of CPUs (e.g. 100% or +1).
2309
2310 CPUs can be counted in three different ways:
2311
2312 sockets The number of filled CPU sockets (i.e. the number of
2313 physical chips).
2314
2315 cores The number of physical cores (i.e. the number of physical
2316 compute cores).
2317
2318 threads The number of hyperthreaded cores (i.e. the number of
2319 virtual cores - with some of them possibly being
2320                 hyperthreaded).
2321
2322 Normally the number of CPUs is computed as the number of CPU
2323 threads. With --use-sockets-instead-of-threads or
2324 --use-cores-instead-of-threads you can force it to be computed as
2325 the number of filled sockets or number of cores instead.
2326
2327 Most users will not need these options.
2328
2329 --use-cpus-instead-of-cores is a (misleading) alias for
2330 --use-sockets-instead-of-threads and is kept for backwards
2331 compatibility.
2332
2333 -v Verbose. Print the job to be run on stdout (standard output). Can
2334 be reversed with --silent. See also -t.
2335
2336 Use -v -v to print the wrapping ssh command when running remotely.
2337
2338 --version
2339   -V Print the version of GNU parallel and exit.
2340
2341 --workdir mydir
2342 --wd mydir
2343 Jobs will be run in the dir mydir.
2344
2345 Files transferred using --transferfile and --return will be
2346 relative to mydir on remote computers.
2347
2348 The special mydir value ... will create working dirs under
2349 ~/.parallel/tmp/. If --cleanup is given these dirs will be removed.
2350
2351 The special mydir value . uses the current working dir. If the
2352 current working dir is beneath your home dir, the value . is
2353 treated as the relative path to your home dir. This means that if
2354 your home dir is different on remote computers (e.g. if your login
2355 is different) the relative path will still be relative to your home
2356 dir.
2357
2358 To see the difference try:
2359
2360 parallel -S server pwd ::: ""
2361 parallel --wd . -S server pwd ::: ""
2362 parallel --wd ... -S server pwd ::: ""
2363
2364 mydir can contain GNU parallel's replacement strings.
2365
2366 --wait
2367 Wait for all commands to complete.
2368
2369 Used with --semaphore or --sqlmaster.
2370
2371 See also man sem.
2372
2373 -X Multiple arguments with context replace. Insert as many arguments
2374 as the command line length permits. If multiple jobs are being run
2375 in parallel: distribute the arguments evenly among the jobs. Use
2376 -j1 to avoid this.
2377
2378 If {} is not used the arguments will be appended to the line. If
2379 {} is used as part of a word (like pic{}.jpg) then the whole word
2380 will be repeated. If {} is used multiple times each {} will be
2381 replaced with the arguments.
2382
2383 Normally -X will do the right thing, whereas -m can give unexpected
2384 results if {} is used as part of a word.
2385
2386 Support for -X with --sshlogin is limited and may fail.
2387
2388 See also -m.
2389
2390 --exit
2391 -x Exit if the size (see the -s option) is exceeded.
2392
2393 --xargs
2394 Multiple arguments. Insert as many arguments as the command line
2395 length permits.
2396
2397 If {} is not used the arguments will be appended to the line. If
2398 {} is used multiple times each {} will be replaced with all the
2399 arguments.
2400
2401 Support for --xargs with --sshlogin is limited and may fail.
2402
2403 See also -X for context replace. If in doubt use -X as that will
2404 most likely do what is needed.
2405
2407   GNU parallel can work similarly to xargs -n1.
2408
2409 To compress all html files using gzip run:
2410
2411 find . -name '*.html' | parallel gzip --best
2412
2413   If the file names may contain a newline, use -0. Substitute FOO BAR with
2414 FUBAR in all files in this dir and subdirs:
2415
2416 find . -type f -print0 | \
2417 parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
2418
2419 Note -q is needed because of the space in 'FOO BAR'.
2420
2422 prips can generate IP-addresses from CIDR notation. With GNU parallel
2423 you can build a simple network scanner to see which addresses respond
2424 to ping:
2425
2426 prips 130.229.16.0/20 | \
2427 parallel --timeout 2 -j0 \
2428 'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
2429
2431 GNU parallel can take the arguments from command line instead of stdin
2432 (standard input). To compress all html files in the current dir using
2433 gzip run:
2434
2435 parallel gzip --best ::: *.html
2436
2437 To convert *.wav to *.mp3 using LAME running one process per CPU run:
2438
2439 parallel lame {} -o {.}.mp3 ::: *.wav
2440
2442 When moving a lot of files like this: mv *.log destdir you will
2443 sometimes get the error:
2444
2445 bash: /bin/mv: Argument list too long
2446
2447 because there are too many files. You can instead do:
2448
2449 ls | grep -E '\.log$' | parallel mv {} destdir
2450
2451   This will run mv for each file. It can be done faster if mv gets as
2452   many arguments as will fit on the line:
2453
2454 ls | grep -E '\.log$' | parallel -m mv {} destdir
2455
2456 In many shells you can also use printf:
2457
2458 printf '%s\0' *.log | parallel -0 -m mv {} destdir
2459
2461 To remove the files pict0000.jpg .. pict9999.jpg you could do:
2462
2463 seq -w 0 9999 | parallel rm pict{}.jpg
2464
2465 You could also do:
2466
2467 seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
2468
2469   The first will run rm 10000 times, while the last will only run rm as
2470   many times as needed to keep the command line length short enough to avoid
2471 Argument list too long (it typically runs 1-2 times).
2472
2473 You could also run:
2474
2475 seq -w 0 9999 | parallel -X rm pict{}.jpg
2476
2477   This will also only run rm as many times as needed to keep the command
2478 line length short enough.
2479
2481 If ImageMagick is installed this will generate a thumbnail of a jpg
2482 file:
2483
2484 convert -geometry 120 foo.jpg thumb_foo.jpg
2485
2486 This will run with number-of-cpus jobs in parallel for all jpg files in
2487 a directory:
2488
2489 ls *.jpg | parallel convert -geometry 120 {} thumb_{}
2490
2491 To do it recursively use find:
2492
2493 find . -name '*.jpg' | \
2494 parallel convert -geometry 120 {} {}_thumb.jpg
2495
2496   Notice how the argument has to start with {} as {} will include the path
2497 (e.g. running convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg
2498 would clearly be wrong). The command will generate files like
2499 ./foo/bar.jpg_thumb.jpg.
2500
2501 Use {.} to avoid the extra .jpg in the file name. This command will
2502 make files like ./foo/bar_thumb.jpg:
2503
2504 find . -name '*.jpg' | \
2505 parallel convert -geometry 120 {} {.}_thumb.jpg
2506
2508 This will generate an uncompressed version of .gz-files next to the
2509 .gz-file:
2510
2511 parallel zcat {} ">"{.} ::: *.gz
2512
2513 Quoting of > is necessary to postpone the redirection. Another solution
2514 is to quote the whole command:
2515
2516 parallel "zcat {} >{.}" ::: *.gz
2517
2518 Other special shell characters (such as * ; $ > < | >> <<) also need to
2519 be put in quotes, as they may otherwise be interpreted by the shell and
2520 not given to GNU parallel.
2521
2523 A job can consist of several commands. This will print the number of
2524 files in each directory:
2525
2526 ls | parallel 'echo -n {}" "; ls {}|wc -l'
2527
2528 To put the output in a file called <name>.dir:
2529
2530 ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
2531
2532 Even small shell scripts can be run by GNU parallel:
2533
2534 find . | parallel 'a={}; name=${a##*/};' \
2535 'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
2536 'echo "$name - $upper"'
2537
2538 ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
2539
2540 Given a list of URLs, list all URLs that fail to download. Print the
2541 line number and the URL.
2542
2543 cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
2544
2545 Create a mirror directory with the same filenames except all files and
2546 symlinks are empty files.
2547
2548 cp -rs /the/source/dir mirror_dir
2549 find mirror_dir -type l | parallel -m rm {} '&&' touch {}
2550
2551 Find the files in a list that do not exist
2552
2553 cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
2554
2556   You have a bunch of files. You want them sorted into dirs. The dir of
2557   each file should be named after the first letter of the file name.
2558
2559 parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
2560
2562 You have a dir with files named as 24 hours in 5 minute intervals:
2563 00:00, 00:05, 00:10 .. 23:55. You want to find the files missing:
2564
2565 parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
2566 ::: {00..23} ::: {00..55..5}
2567
2569 If the composed command is longer than a line, it becomes hard to read.
2570 In Bash you can use functions. Just remember to export -f the function.
2571
2572 doit() {
2573 echo Doing it for $1
2574 sleep 2
2575 echo Done with $1
2576 }
2577 export -f doit
2578 parallel doit ::: 1 2 3
2579
2580 doubleit() {
2581 echo Doing it for $1 $2
2582 sleep 2
2583 echo Done with $1 $2
2584 }
2585 export -f doubleit
2586 parallel doubleit ::: 1 2 3 ::: a b
2587
2588 To do this on remote servers you need to transfer the function using
2589 --env:
2590
2591 parallel --env doit -S server doit ::: 1 2 3
2592 parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
2593
2594 If your environment (aliases, variables, and functions) is small you
2595 can copy the full environment without having to export -f anything. See
2596 env_parallel.
2597
2599 To test a program with different parameters:
2600
2601 tester() {
2602 if (eval "$@") >&/dev/null; then
2603 perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
2604 else
2605 perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
2606 fi
2607 }
2608 export -f tester
2609 parallel tester my_program ::: arg1 arg2
2610 parallel tester exit ::: 1 0 2 0
2611
2612 If my_program fails a red FAIL will be printed followed by the failing
2613 command; otherwise a green OK will be printed followed by the command.
2614
2616 It can be useful to monitor the output of running jobs.
2617
2618   This shows the most recent output line until a job finishes, after
2619   which the output of the job is printed in full:
2620
2621 parallel '{} | tee >(cat >&3)' ::: 'command 1' 'command 2' \
2622 3> >(perl -ne '$|=1;chomp;printf"%.'$COLUMNS's\r",$_." "x100')
2623
2625 Log rotation renames a logfile to an extension with a higher number:
2626 log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
2627 removed. To avoid overwriting files the process starts backwards from
2628 the high number to the low number. This will keep 10 old versions of
2629 the log:
2630
2631 seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
2632 mv log log.1
2633
2635   When processing files, removing the file extension using {.} is often
2636 useful.
2637
2638 Create a directory for each zip-file and unzip it in that dir:
2639
2640 parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
2641
2642 Recompress all .gz files in current directory using bzip2 running 1 job
2643 per CPU in parallel:
2644
2645 parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
2646
2647 Convert all WAV files to MP3 using LAME:
2648
2649 find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
2650
2651   Put all converted files in the same directory:
2652
2653 find sounddir -type f -name '*.wav' | \
2654 parallel lame {} -o mydir/{/.}.mp3
2655
2657   If you have a directory with tar.gz files and want these extracted in
2658   the corresponding dir (e.g. foo.tar.gz will be extracted in the dir foo) you
2659 can do:
2660
2661 parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
2662
2663 If you want to remove a different ending, you can use {%string}:
2664
2665 parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
2666
2667   You can also remove a starting string with {#string}:
2668
2669 parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
2670
2671 To remove a string anywhere you can use regular expressions with
2672 {/regexp/replacement} and leave the replacement empty:
2673
2674 parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
2675
EXAMPLE: Download 24 images for each of the past 30 days
       Let us assume a website stores images like:

         http://www.example.com/path/to/YYYYMMDD_##.jpg

       where YYYYMMDD is the date and ## is the number 01-24. This will
       download images for the past 30 days:

         getit() {
           date=$(date -d "today -$1 days" +%Y%m%d)
           num=$2
           echo wget http://www.example.com/path/to/${date}_${num}.jpg
         }
         export -f getit

         parallel getit ::: $(seq 30) ::: $(seq -w 24)

       $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
       format with $1 days subtracted.

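The date arithmetic can be checked on its own (this assumes GNU date, as the example above already does; BSD date uses a different syntax):

```shell
# Print yesterday's date as YYYYMMDD, exactly as the getit function does
date -d "today -1 days" +%Y%m%d
```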
EXAMPLE: Download world map from NASA
       NASA provides tiles to download on earthdata.nasa.gov. Download
       tiles for the Blue Marble world map and create a 10240x20480 map.

         base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
         service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
         layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
         set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
         tile="TILEROW={1}&TILECOL={2}"
         format="FORMAT=image%2Fjpeg"
         url="$base?$service&$layer&$set&$tile&$format"

         parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
         parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
         convert -append line{0..19}.jpg world.jpg

EXAMPLE: Download Apollo-11 images from NASA using jq
       Search NASA using their API to get JSON for images related to
       'apollo 11' that have 'moon landing' in the description.

       The search query returns JSON containing URLs to JSON containing
       collections of pictures. One of the pictures in each of these
       collections is large.

       wget is used to get the JSON for the search query. jq is then used
       to extract the URLs of the collections. parallel then calls wget to
       get each collection, which is passed to jq to extract the URLs of
       all images. grep selects the large images, and parallel finally
       uses wget to fetch the images.

         base="https://images-api.nasa.gov/search"
         q="q=apollo 11"
         description="description=moon landing"
         media_type="media_type=image"
         wget -O - "$base?$q&$description&$media_type" |
           jq -r .collection.items[].href |
           parallel wget -O - |
           jq -r .[] |
           grep large |
           parallel wget

EXAMPLE: Download video playlist in parallel
       youtube-dl is an excellent tool to download videos. It cannot,
       however, download videos in parallel. This takes a playlist and
       downloads 10 videos in parallel:

         url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
         export url
         youtube-dl --flat-playlist "https://$url" |
           parallel --tagstring {#} --lb -j10 \
             youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'

EXAMPLE: Prepend last modified date (ISO8601) to file name
         parallel mv {} '{= $a=pQ($_); $b=$_;' \
           '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *

       {= and =} mark a perl expression. pQ perl-quotes the string. date
       +%FT%T is the date in ISO8601 with time.

EXAMPLE: Save output in ISO8601 dirs
       Save output from ps aux every second into dirs named
       yyyy-mm-ddThh:mm:ss+zz:zz.

         seq 1000 | parallel -N0 -j1 --delay 1 \
           --results '{= $_=`date -Isec`; chomp=}/' ps aux

EXAMPLE: Digital clock with "blinking" :
       The : in a digital clock blinks. To make every other line have a
       ':' and the rest a ' ', a perl expression is used to look at the
       3rd input source. If the value modulo 2 is 1, use ":"; otherwise
       use " ":

         parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
           ::: {0..12} ::: {0..5} ::: {0..9}

EXAMPLE: Aggregating content of files
       This:

         parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
           ::: X {1..5} ::: Y {01..10} ::: Z {1..5}

       will generate the files x1y01z1 .. x5y10z5. If you want to
       aggregate the output grouping on x and z you can do this:

         parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*

       For all values of x and z it runs commands like:

         cat x1y*z1 > x1z1

       So you end up with x1z1 .. x5z5, each containing the content of all
       values of y.

EXAMPLE: Breadth first parallel web crawler mirrorer
       The script below will crawl and mirror a URL in parallel. It first
       downloads pages that are 1 click down, then 2 clicks down, then 3;
       instead of the normal depth-first order, where the first link on
       each page is fetched first.

       Run like this:

         PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/

       Remove the wget part if you only want a web crawler.

       It works by fetching a page from a list of URLs and looking for
       links in that page that are within the same starting URL and that
       have not already been seen. These links are added to a new queue.
       When all the pages from the list are done, the new queue is moved
       to the list of URLs and the process starts over until no unseen
       links are found.

         #!/bin/bash

         # E.g. http://gatt.org.yeslab.org/
         URL=$1
         # Stay inside the start dir
         BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
         URLLIST=$(mktemp urllist.XXXX)
         URLLIST2=$(mktemp urllist.XXXX)
         SEEN=$(mktemp seen.XXXX)

         # Spider to get the URLs
         echo $URL >$URLLIST
         cp $URLLIST $SEEN

         while [ -s $URLLIST ] ; do
           cat $URLLIST |
             parallel lynx -listonly -image_links -dump {} \; \
               wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
             perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
               do { $seen{$1}++ or print }' |
             grep -F $BASEURL |
             grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
           mv $URLLIST2 $URLLIST
         done

         rm -f $URLLIST $URLLIST2 $SEEN

EXAMPLE: Process files from a tar file while unpacking
       If the files to be processed are in a tar file then unpacking one
       file and processing it immediately may be faster than first
       unpacking all files.

         tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
           parallel echo

       The Perl one-liner is needed to make sure the file is complete
       before handing it to GNU parallel.

EXAMPLE: Rewriting a for-loop and a while-read-loop
       for-loops like this:

         (for x in `cat list` ; do
           do_something $x
         done) | process_output

       and while-read-loops like this:

         cat list | (while read x ; do
           do_something $x
         done) | process_output

       can be written like this:

         cat list | parallel do_something | process_output

       For example: Find which host name in a list has IP address
       1.2.3.4:

         cat hosts.txt | parallel -P 100 host | grep 1.2.3.4

       If the processing requires more steps, a for-loop like this:

         (for x in `cat list` ; do
           no_extension=${x%.*};
           do_step1 $x scale $no_extension.jpg
           do_step2 <$x $no_extension
         done) | process_output

       and a while-loop like this:

         cat list | (while read x ; do
           no_extension=${x%.*};
           do_step1 $x scale $no_extension.jpg
           do_step2 <$x $no_extension
         done) | process_output

       can be written like this:

         cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
           process_output

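GNU parallel's {.} corresponds to the shell's ${x%.*} expansion used in the loops above; both strip the last extension:

```shell
# ${x%.*} removes the shortest suffix matching ".*",
# i.e. only the last extension
x=picture.scan.jpg
echo "${x%.*}"    # -> picture.scan
```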
       If the body of the loop is bigger, it improves readability to use
       a function:

         (for x in `cat list` ; do
           do_something $x
           [... 100 lines that do something with $x ...]
         done) | process_output

         cat list | (while read x ; do
           do_something $x
           [... 100 lines that do something with $x ...]
         done) | process_output

       can both be rewritten as:

         doit() {
           x=$1
           do_something $x
           [... 100 lines that do something with $x ...]
         }
         export -f doit
         cat list | parallel doit

EXAMPLE: Rewriting nested for-loops
       Nested for-loops like this:

         (for x in `cat xlist` ; do
           for y in `cat ylist` ; do
             do_something $x $y
           done
         done) | process_output

       can be written like this:

         parallel do_something {1} {2} :::: xlist ylist | process_output

       Nested for-loops like this:

         (for colour in red green blue ; do
           for size in S M L XL XXL ; do
             echo $colour $size
           done
         done) | sort

       can be written like this:

         parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort

EXAMPLE: Finding the lowest difference between files
       diff is good for finding differences in text files. diff | wc -l
       gives an indication of the size of the difference. To find the
       differences between all files in the current dir do:

         parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3

       This way it is possible to see if some files are closer to other
       files.

EXAMPLE: for-loops with column names
       When doing multiple nested for-loops it can be easier to keep track
       of the loop variable if it is named instead of just having a
       number. Use --header : to let the first argument be a named alias
       for the positional replacement string:

         parallel --header : echo {colour} {size} \
           ::: colour red green blue ::: size S M L XL XXL

       This also works if the input is a file with columns:

         cat addressbook.tsv | \
           parallel --colsep '\t' --header : echo {Name} {E-mail address}

EXAMPLE: All combinations in a list
       GNU parallel makes all combinations when given two lists.

       To make all combinations in a single list with unique values, you
       repeat the list and use the replacement string {choose_k}:

         parallel --plus echo {choose_k} ::: A B C D ::: A B C D

         parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D

       {choose_k} works for any number of input sources:

         parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D

EXAMPLE: From a to b and b to c
       Assume you have input like:

         aardvark
         babble
         cab
         dab
         each

       and want to run combinations like:

         aardvark babble
         babble cab
         cab dab
         dab each

       If the input is in the file in.txt:

         parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)

       If the input is in the array $a, here are two solutions:

         seq $((${#a[@]}-1)) | \
           env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
         parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"

EXAMPLE: Count the differences between all files in a dir
       Using --results, the results are saved in /tmp/diffcount*.

         parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
           tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *

       To see the difference between file A and file B look at the file
       '/tmp/diffcount/1/A/2/B'.

EXAMPLE: Speeding up fast jobs
       Starting a job on the local machine takes around 10 ms. This can be
       a big overhead if the job takes very few ms to run. Often you can
       group small jobs together using -X, which will make the overhead
       less significant. Compare the speed of these:

         seq -w 0 9999 | parallel touch pict{}.jpg
         seq -w 0 9999 | parallel -X touch pict{}.jpg

       If your program cannot take multiple arguments, then you can use
       GNU parallel to spawn multiple GNU parallels:

         seq -w 0 9999999 | \
           parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg

       If -j0 normally spawns 252 jobs, then the above will try to spawn
       2520 jobs. On a normal GNU/Linux system you can spawn 32000 jobs
       using this technique with no problems. To raise the 32000 jobs
       limit raise /proc/sys/kernel/pid_max to 4194303.

       If you do not need GNU parallel to have control over each job (so
       no need for --retries or --joblog or similar), then it can be even
       faster if you can generate the command lines and pipe those to a
       shell. So if you can do this:

         mygenerator | sh

       then that can be parallelized like this:

         mygenerator | parallel --pipe --block 10M sh

       E.g.

         mygenerator() {
           seq 10000000 | perl -pe 'print "echo This is fast job number "';
         }
         mygenerator | parallel --pipe --block 10M sh

       The overhead is 100000 times smaller, namely around 100 nanoseconds
       per job.

EXAMPLE: Using shell variables
       When using shell variables you need to quote them correctly, as
       they may otherwise be interpreted by the shell.

       Notice the difference between:

         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
         parallel echo ::: ${ARR[@]} # This is probably not what you want

       and:

         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
         parallel echo ::: "${ARR[@]}"

       When using variables that contain special characters (e.g. space)
       in the actual command, you can quote them using '"$VAR"' or using
       "'s and -q:

         VAR="My brother's 12\" records are worth <\$\$\$>"
         parallel -q echo "$VAR" ::: '!'
         export VAR
         parallel echo '"$VAR"' ::: '!'

       If $VAR does not contain ' then "'$VAR'" will also work (and does
       not need export):

         VAR="My 12\" records are worth <\$\$\$>"
         parallel echo "'$VAR'" ::: '!'

       If you use them in a function you just quote as you normally would
       do:

         VAR="My brother's 12\" records are worth <\$\$\$>"
         export VAR
         myfunc() { echo "$VAR" "$1"; }
         export -f myfunc
         parallel myfunc ::: '!'

EXAMPLE: Group output lines
       When running jobs that output data, you often do not want the
       output of multiple jobs to run together. GNU parallel defaults to
       grouping the output of each job, so the output is printed when the
       job finishes. If you want full lines to be printed while the job is
       running you can use --line-buffer. If you want output to be printed
       as soon as possible you can use -u.

       Compare the output of:

         parallel wget --limit-rate=100k \
           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
           ::: {12..16}
         parallel --line-buffer wget --limit-rate=100k \
           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
           ::: {12..16}
         parallel -u wget --limit-rate=100k \
           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
           ::: {12..16}

EXAMPLE: Tag output lines
       GNU parallel groups the output lines, but it can be hard to see
       where the different jobs begin. --tag prepends the argument to make
       that more visible:

         parallel --tag wget --limit-rate=100k \
           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
           ::: {12..16}

       --tag works with --line-buffer but not with -u:

         parallel --tag --line-buffer wget --limit-rate=100k \
           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
           ::: {12..16}

       Check the uptime of the servers in ~/.parallel/sshloginfile:

         parallel --tag -S .. --nonall uptime

EXAMPLE: Colorize output
       Give each job a new color. Most terminals support ANSI colors with
       the escape code "\033[30;3Xm" where 0 <= X <= 7:

         seq 10 | \
           parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
         parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
           --tagstring {color} seq {} ::: {1..10}

       To get rid of the initial \t (which comes from --tagstring):

         ... | perl -pe 's/\t//'

EXAMPLE: Keep order of output same as order of input
       Normally the output of a job will be printed as soon as it
       completes. Sometimes you want the order of the output to remain the
       same as the order of the input. This is often important if the
       output is used as input for another system. -k will make sure the
       order of output will be the same as the order of input, even if
       later jobs end before earlier jobs.

       Append a string to every line in a text file:

         cat textfile | parallel -k echo {} append_string

       If you remove -k some of the lines may come out in the wrong order.

       Another example is traceroute:

         parallel traceroute ::: qubes-os.org debian.org freenetproject.org

       will give the traceroute of qubes-os.org, debian.org, and
       freenetproject.org, but the output will be sorted according to
       which job completed first.

       To keep the order the same as the input, run:

         parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org

       This will make sure the traceroute to qubes-os.org will be printed
       first.

       A bit more complex example is downloading a huge file in chunks in
       parallel: Some internet connections will deliver more data if you
       download files in parallel. For downloading files in parallel see:
       "EXAMPLE: Download 24 images for each of the past 30 days". But if
       you are downloading a big file you can download the file in chunks
       in parallel.

       To download bytes 10000000-19999999 you can use curl:

         curl -r 10000000-19999999 http://example.com/the/big/file >file.part

       To download a 1 GB file we need 100 10MB chunks downloaded and
       combined in the correct order.

         seq 0 99 | parallel -k curl -r \
           {}0000000-{}9999999 http://example.com/the/big/file > file

EXAMPLE: Parallel grep
       grep -r greps recursively through directories. On multicore CPUs
       GNU parallel can often speed this up.

         find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}

       This will run 1.5 jobs per CPU and give 1000 arguments to grep.

EXAMPLE: Grepping n lines for m regular expressions
       The simplest solution to grep a big file for a lot of regexps is:

         grep -f regexps.txt bigfile

       Or if the regexps are fixed strings:

         grep -F -f regexps.txt bigfile

       There are 3 limiting factors: CPU, RAM, and disk I/O.

       RAM is easy to measure: If the grep process takes up most of your
       free memory (e.g. when running top), then RAM is a limiting factor.

       CPU is also easy to measure: If the grep takes >90% CPU in top,
       then the CPU is a limiting factor, and parallelization will speed
       this up.

       It is harder to see if disk I/O is the limiting factor, and
       depending on the disk system it may be faster or slower to
       parallelize. The only way to know for certain is to test and
       measure.

   Limiting factor: RAM
       The normal grep -f regexps.txt bigfile works no matter the size of
       bigfile, but if regexps.txt is so big it cannot fit into memory,
       then you need to split this.

       grep -F takes around 100 bytes of RAM and grep takes about 500
       bytes of RAM per 1 byte of regexp. So if regexps.txt is 1% of your
       RAM, then it may be too big.

       If you can convert your regexps into fixed strings do that. E.g. if
       the lines you are looking for in bigfile all look like:

         ID1 foo bar baz Identifier1 quux
         fubar ID2 foo bar baz Identifier2

       then your regexps.txt can be converted from:

         ID1.*Identifier1
         ID2.*Identifier2

       into:

         ID1 foo bar baz Identifier1
         ID2 foo bar baz Identifier2

       This way you can use grep -F which takes around 80% less memory and
       is much faster.

       If it still does not fit in memory you can do this:

         parallel --pipepart -a regexps.txt --block 1M grep -Ff - -n bigfile | \
           sort -un | perl -pe 's/^\d+://'

       The 1M should be your free memory divided by the number of CPU
       threads and divided by 200 for grep -F and by 1000 for normal grep.
       On GNU/Linux you can do:

         free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
                     END { print sum }' /proc/meminfo)
         percpu=$((free / 200 / $(parallel --number-of-threads)))k

         parallel --pipepart -a regexps.txt --block $percpu --compress \
           grep -F -f - -n bigfile | \
           sort -un | perl -pe 's/^\d+://'

       If you can live with duplicated lines and wrong order, it is faster
       to do:

         parallel --pipepart -a regexps.txt --block $percpu --compress \
           grep -F -f - bigfile

   Limiting factor: CPU
       If the CPU is the limiting factor, parallelization should be done
       on the regexps:

         cat regexps.txt | parallel --pipe -L1000 --roundrobin --compress \
           grep -f - -n bigfile | \
           sort -un | perl -pe 's/^\d+://'

       The command will start one grep per CPU and read bigfile one time
       per CPU, but as that is done in parallel, all reads except the
       first will be cached in RAM. Depending on the size of regexps.txt
       it may be faster to use --block 10m instead of -L1000.

       Some storage systems perform better when reading multiple chunks in
       parallel. This is true for some RAID systems and for some network
       file systems. To parallelize the reading of bigfile:

         parallel --pipepart --block 100M -a bigfile -k --compress \
           grep -f regexps.txt

       This will split bigfile into 100MB chunks and run grep on each of
       these chunks. To parallelize both the reading of bigfile and of
       regexps.txt combine the two using --cat:

         parallel --pipepart --block 100M -a bigfile --cat cat regexps.txt \
           \| parallel --pipe -L1000 --roundrobin grep -f - {}

       If a line matches multiple regexps, the line may be duplicated.

   Bigger problem
       If the problem is too big to be solved by this, you are probably
       ready for Lucene.

EXAMPLE: Using remote computers
       To run commands on a remote computer SSH needs to be set up and you
       must be able to log in without entering a password (the commands
       ssh-copy-id, ssh-agent, and sshpass may help you do that).

       If you need to log in to a whole cluster, you typically do not want
       to accept the host key for every host. You want to accept them the
       first time and be warned if they are ever changed. To do that:

         # Add the servers to the sshloginfile
         (echo servera; echo serverb) > .parallel/my_cluster
         # Make sure .ssh/config exists
         touch .ssh/config
         cp .ssh/config .ssh/config.backup
         # Disable StrictHostKeyChecking temporarily
         (echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
         parallel --slf my_cluster --nonall true
         # Remove the disabling of StrictHostKeyChecking
         mv .ssh/config.backup .ssh/config

       The servers in .parallel/my_cluster are now added in
       .ssh/known_hosts.

       To run echo on server.example.com:

         seq 10 | parallel --sshlogin server.example.com echo

       To run commands on more than one remote computer run:

         seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo

       Or:

         seq 10 | parallel --sshlogin server.example.com \
           --sshlogin server2.example.net echo

       If the login username is foo on server2.example.net use:

         seq 10 | parallel --sshlogin server.example.com \
           --sshlogin foo@server2.example.net echo

       If your list of hosts is server1-88.example.net with login foo:

         seq 10 | parallel -Sfoo@server{1..88}.example.net echo

       To distribute the commands to a list of computers, make a file
       mycomputers with all the computers:

         server.example.com
         foo@server2.example.com
         server3.example.com

       Then run:

         seq 10 | parallel --sshloginfile mycomputers echo

       To include the local computer add the special sshlogin ':' to the
       list:

         server.example.com
         foo@server2.example.com
         server3.example.com
         :

       GNU parallel will try to determine the number of CPUs on each of
       the remote computers, and run one job per CPU - even if the remote
       computers do not have the same number of CPUs.

       If the number of CPUs on the remote computers is not identified
       correctly, the number of CPUs can be added in front. Here the
       computer has 8 CPUs:

         seq 10 | parallel --sshlogin 8/server.example.com echo

EXAMPLE: Transferring of files
       To recompress gzipped files with bzip2 using a remote computer run:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com \
             --transfer "zcat {} | bzip2 -9 >{.}.bz2"

       This will list the .gz-files in the logs directory and all
       directories below. Then it will transfer the files to
       server.example.com to the corresponding directory in $HOME/logs. On
       server.example.com the file will be recompressed using zcat and
       bzip2, resulting in the corresponding file with .gz replaced with
       .bz2.

       If you want the resulting bz2-file to be transferred back to the
       local computer add --return {.}.bz2:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com \
             --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"

       After the recompressing is done the .bz2-file is transferred back
       to the local computer and put next to the original .gz-file.

       If you want to delete the transferred files on the remote computer
       add --cleanup. This will remove both the file transferred to the
       remote computer and the files transferred from the remote computer:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com \
             --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

       If you want to run on several computers add the computers to
       --sshlogin either using ',' or multiple --sshlogin:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com,server2.example.com \
             --sshlogin server3.example.com \
             --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

       You can add the local computer using --sshlogin :. This will
       disable the removing and transferring for the local computer only:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com,server2.example.com \
             --sshlogin server3.example.com \
             --sshlogin : \
             --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

       Often --transfer, --return and --cleanup are used together. They
       can be shortened to --trc:

         find logs/ -name '*.gz' | \
           parallel --sshlogin server.example.com,server2.example.com \
             --sshlogin server3.example.com \
             --sshlogin : \
             --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"

       With the file mycomputers containing the list of computers it
       becomes:

         find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"

       If the file ~/.parallel/sshloginfile contains the list of computers
       the special shorthand -S .. can be used:

         find logs/ -name '*.gz' | parallel -S .. \
           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"

EXAMPLE: Distributing work to local and remote computers
       Convert *.mp3 to *.ogg running one process per CPU on the local
       computer and server2:

         parallel --trc {.}.ogg -S server2,: \
           'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3

EXAMPLE: Running the same command on remote computers
       To run the command uptime on remote computers you can do:

         parallel --tag --nonall -S server1,server2 uptime

       --nonall reads no arguments. If you have a list of jobs you want to
       run on each computer you can do:

         parallel --tag --onall -S server1,server2 echo ::: 1 2 3

       Remove --tag if you do not want the sshlogin added before the
       output.

       If you have a lot of hosts, use '-j0' to access more hosts in
       parallel.

EXAMPLE: Running sudo on remote computers
       Put the password into passwordfile then run:

         parallel --ssh 'cat passwordfile | ssh' --nonall \
           -S user@server1,user@server2 sudo -S ls -l /root

EXAMPLE: Using remote computers behind NAT wall
       If the workers are behind a NAT wall, you need some trickery to get
       to them.

       If you can ssh to a jumphost, and reach the workers from there,
       then the obvious solution would be this, but it does not work:

         parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK

       It does not work because the command is dequoted by ssh twice,
       whereas GNU parallel only expects it to be dequoted once.

       You can use a bash function and have GNU parallel quote the
       command:

         jumpssh() { ssh -A jumphost ssh $(parallel --shellquote ::: "$@"); }
         export -f jumpssh
         parallel --ssh jumpssh -S host1 echo ::: this works

       Or you can instead put this in ~/.ssh/config:

         Host host1 host2 host3
           ProxyCommand ssh jumphost.domain nc -w 1 %h 22

       It requires nc (netcat) to be installed on jumphost. With this you
       can simply:

         parallel -S host1,host2,host3 echo ::: This does work

   No jumphost, but port forwards
       If there is no jumphost but each server has port 22 forwarded from
       the firewall (e.g. the firewall's port 22001 = port 22 on host1,
       22002 = host2, 22003 = host3) then you can use ~/.ssh/config:

         Host host1.v
           Port 22001
         Host host2.v
           Port 22002
         Host host3.v
           Port 22003
         Host *.v
           Hostname firewall

       And then use host{1..3}.v as normal hosts:

         parallel -S host1.v,host2.v,host3.v echo ::: a b c

   No jumphost, no port forwards
       If ports cannot be forwarded, you need some sort of VPN to traverse
       the NAT-wall. TOR is one option for that, as it is very easy to get
       working.

       You need to install TOR and set up a hidden service. In torrc put:

         HiddenServiceDir /var/lib/tor/hidden_service/
         HiddenServicePort 22 127.0.0.1:22

       Then start TOR: /etc/init.d/tor restart

       The TOR hostname is now in /var/lib/tor/hidden_service/hostname and
       is something similar to izjafdceobowklhz.onion. Now you simply
       prepend torsocks to ssh:

         parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
           -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c

       If not all hosts are accessible through TOR:

         parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
           echo ::: a b c

       See more ssh tricks on
       https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts

EXAMPLE: Parallelizing rsync
       rsync is a great tool, but sometimes it will not fill up the
       available bandwidth. Running multiple rsync in parallel can fix
       this.

         cd src-dir
         find . -type f |
           parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/

       Adjust -j10 until you find the optimal number.

       rsync -R will create the needed subdirectories, so all files are
       not put into a single dir. The ./ is needed so the resulting
       command looks similar to:

         rsync -zR ././sub/dir/file fooserver:/dest-dir/

       The /./ is what rsync -R works on.

       If you are unable to push data, but need to pull them, and the
       files are called digits.png (e.g. 000000.png), you might be able to
       do:

         seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/

EXAMPLE: Use multiple inputs in one command
       Copy files like foo.es.ext to foo.ext:

         ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}

       The perl command spits out 2 lines for each input. GNU parallel
       takes 2 inputs (using -N2) and replaces {1} and {2} with the
       inputs.

       Count in binary:

         parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1

       Print the number on the opposing sides of a six sided die:

         parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
         parallel --link echo :::: <(seq 6) <(seq 6 -1 1)

       Convert files from all subdirs to PNG-files with consecutive
       numbers (useful for making input PNGs for ffmpeg):

         parallel --link -a <(find . -type f | sort) \
           -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png

       Alternative version:

         find . -type f | sort | parallel convert {} {#}.png

EXAMPLE: Use a table as input
       Content of table_file.tsv:

         foo<TAB>bar
         baz <TAB> quux

       To run:

         cmd -o bar -i foo
         cmd -o quux -i baz

       you can run:

         parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}

       Note: The default for GNU parallel is to remove the spaces around
       the columns. To keep the spaces:

         parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}

3605 GNU parallel can output to a database table and a CSV-file:
3606
3607 dburl=csv:///%2Ftmp%2Fmydir
3608 dbtableurl=$dburl/mytable.csv
3609 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3610
3611 It is rather slow and takes up a lot of CPU time because GNU parallel
3612 parses the whole CSV file for each update.
3613
3614 A better approach is to use an SQLite database and then convert that
3615 to CSV:
3616
3617 dburl=sqlite3:///%2Ftmp%2Fmy.sqlite
3618 dbtableurl=$dburl/mytable
3619 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3620 sql $dburl '.headers on' '.mode csv' 'SELECT * FROM mytable;'
3621
3622 This takes around a second per job.
3623
3624 If you have access to a real database system, such as PostgreSQL, it is
3625 even faster:
3626
3627 dburl=pg://user:pass@host/mydb
3628 dbtableurl=$dburl/mytable
3629 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3630 sql $dburl \
3631 "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
3632
3633 Or MySQL:
3634
3635 dburl=mysql://user:pass@host/mydb
3636 dbtableurl=$dburl/mytable
3637 parallel --sqlandworker $dbtableurl seq ::: {1..10}
3638 sql -p -B $dburl "SELECT * FROM mytable;" > mytable.tsv
3639 perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/;
3640 %s=("\\" => "\\", "t" => "\t", "n" => "\n");
3641 s/\\([\\tn])/$s{$1}/g;' mytable.tsv
3642
3644 If you have no need for the advanced job distribution control that a
3645 database provides, but you simply want output into a CSV file that you
3646 can read into R or LibreCalc, then you can use --results:
3647
3648 parallel --results my.csv seq ::: 10 20 30
3649 R
3650 > mydf <- read.csv("my.csv");
3651 > print(mydf[2,])
3652 > write(as.character(mydf[2,c("Stdout")]),'')
3653
3655 The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
3656 podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
3657
3658 Using xpath you can extract the URLs for 2019 and download them using
3659 GNU parallel:
3660
3661 wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
3662 xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
3663 parallel -u wget '{= s/ url="//; s/"//; =}'
3664
3666 If you want to run the same command with the same arguments 10 times in
3667 parallel you can do:
3668
3669 seq 10 | parallel -n0 my_command my_args
3670
3672 GNU parallel can work similarly to cat | sh.
3673
3674 A resource inexpensive job is a job that takes very little CPU, disk
3675 I/O and network I/O. Ping is an example of a resource inexpensive job.
3676 wget is too - if the webpages are small.
3677
3678 The content of the file jobs_to_run:
3679
3680 ping -c 1 10.0.0.1
3681 wget http://example.com/status.cgi?ip=10.0.0.1
3682 ping -c 1 10.0.0.2
3683 wget http://example.com/status.cgi?ip=10.0.0.2
3684 ...
3685 ping -c 1 10.0.0.255
3686 wget http://example.com/status.cgi?ip=10.0.0.255
3687
3688 To run 100 processes simultaneously do:
3689
3690 parallel -j 100 < jobs_to_run
3691
3692 As no command is given, each line will be evaluated by the shell.
3693
3695 FASTA files have the format:
3696
3697 >Sequence name1
3698 sequence
3699 sequence continued
3700 >Sequence name2
3701 sequence
3702 sequence continued
3703 more sequence
3704
3705 To call myprog with the sequence as argument run:
3706
3707 cat file.fasta |
3708 parallel --pipe -N1 --recstart '>' --rrs \
3709 'read a; echo Name: "$a"; myprog $(tr -d "\n")'
3710
3712 To process a big file or some output you can use --pipe to split up the
3713 data into blocks and pipe the blocks into the processing program.
3714
3715 If the program is gzip -9 you can do:
3716
3717 cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
3718
3719 This will split bigfile into blocks of 1 MB and pass that to gzip -9 in
3720 parallel. One gzip will be run per CPU. The output of gzip -9 will be
3721 kept in order and saved to bigfile.gz
3722
3723 gzip works fine if the output is appended, but some processing does not
3724 work like that - for example sorting. For this GNU parallel can put the
3725 output of each command into a file. This will sort a big file in
3726 parallel:
3727
3728 cat bigfile | parallel --pipe --files sort |\
3729 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3730
3731 Here bigfile is split into blocks of around 1MB, each block ending in
3732 '\n' (which is the default for --recend). Each block is passed to sort
3733 and the output from sort is saved into files. These files are passed to
3734 the second parallel that runs sort -m on the files before it removes
3735 the files. The output is saved to bigfile.sort.
3736
3737 GNU parallel's --pipe maxes out at around 100 MB/s because every byte
3738 has to be copied through GNU parallel. But if bigfile is a real
3739 (seekable) file GNU parallel can bypass the copying and send the
3740 parts directly to the program:
3741
3742 parallel --pipepart --block 100m -a bigfile --files sort |\
3743 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3744
3746 When processing with --pipe you may have lines grouped by a value. Here
3747 is my.csv:
3748
3749 Transaction Customer Item
3750 1 a 53
3751 2 b 65
3752 3 b 82
3753 4 c 96
3754 5 c 67
3755 6 c 13
3756 7 d 90
3757 8 d 43
3758 9 d 91
3759 10 d 84
3760 11 e 72
3761 12 e 102
3762 13 e 63
3763 14 e 56
3764 15 e 74
3765
3766 Let us assume you want GNU parallel to process each customer. In other
3767 words: You want all the transactions for a single customer to be
3768 treated as a single record.
3769
3770 To do this we preprocess the data with a program that inserts a record
3771 separator before each customer (column 2 = $F[1]). Here we first make a
3772 50 character random string, which we then use as the separator:
3773
3774 sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
3775 cat my.csv | \
3776 perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
3777 parallel --recend $sep --rrs --pipe -N1 wc
3778
3779 If your program can process multiple customers replace -N1 with a
3780 reasonable --blocksize.
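As a sketch, the multi-customer variant changes only the last command: drop -N1 and pick a block size (10M here is an arbitrary choice; wc stands in for the real program):

```shell
# Same preprocessing as before: a random 50-char record separator is
# inserted before each new customer, then blocks of whole customer
# records are piped to the processing program.
sep=$(perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);')
cat my.csv | \
  perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
  parallel --recend "$sep" --rrs --pipe --block 10M wc
```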
3781
3783 If you need to run a massive amount of jobs in parallel, then you will
3784 likely hit the filehandle limit which is often around 250 jobs. If you
3785 are super user you can raise the limit in /etc/security/limits.conf but
3786 you can also use this workaround. The filehandle limit is per process.
3787 That means that if you just spawn more GNU parallels then each of them
3788 can run 250 jobs. This will spawn up to 2500 jobs:
3789
3790 cat myinput |\
3791 parallel --pipe -N 50 --roundrobin -j50 parallel -j50 your_prg
3792
3793 This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
3794 to do this, and you may need to increase /proc/sys/kernel/pid_max):
3795
3796 cat myinput |\
3797 parallel --pipe -N 250 --roundrobin -j250 parallel -j250 your_prg
3798
3800 The command sem is an alias for parallel --semaphore.
3801
3802 A counting semaphore will allow a given number of jobs to be started
3803 in the background. When that number of jobs is running, GNU sem will
3804 wait for one of these to complete before starting another command.
3805 sem --wait will wait for all jobs to complete.
3806
3807 Run 10 jobs concurrently in the background:
3808
3809 for i in *.log ; do
3810 echo $i
3811 sem -j10 gzip $i ";" echo done
3812 done
3813 sem --wait
3814
3815 A mutex is a counting semaphore allowing only one job to run. This
3816 will edit the file myfile and prepend lines with the numbers 1 to 3
3817 to the file.
3818
3819 seq 3 | parallel sem sed -i -e '1i{}' myfile
3820
3821 As myfile can be very big it is important that only one process edits
3822 the file at a time.
3823
3824 Name the semaphore to have multiple different semaphores active at the
3825 same time:
3826
3827 seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
3828
3830 Assume a script is called from cron or from a web service, but only one
3831 instance can be run at a time. With sem and --shebang-wrap the script
3832 can be made to wait for other instances to finish. Here in bash:
3833
3834 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
3835
3836 echo This will run
3837 sleep 5
3838 echo exclusively
3839
3840 Here perl:
3841
3842 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl
3843
3844 print "This will run ";
3845 sleep 5;
3846 print "exclusively\n";
3847
3848 Here python:
3849
3850 #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python
3851
3852 import time
3853 print "This will run ";
3854 time.sleep(5)
3855 print "exclusively";
3856
3858 You can use GNU parallel to start interactive programs like emacs or
3859 vi:
3860
3861 cat filelist | parallel --tty -X emacs
3862 cat filelist | parallel --tty -X vi
3863
3864 If there are more files than will fit on a single command line, the
3865 editor will be started again with the remaining files.
3866
3868 sudo requires a password to run a command as root. It caches the
3869 access, so you only need to enter the password again if you have not
3870 used sudo for a while.
3871
3872 The command:
3873
3874 parallel sudo echo ::: This is a bad idea
3875
3876 is no good, as you would be prompted for the sudo password for each of
3877 the jobs. You can either do:
3878
3879 sudo echo This
3880 parallel sudo echo ::: is a good idea
3881
3882 or:
3883
3884 sudo parallel echo ::: This is a good idea
3885
3886 This way you only have to enter the sudo password once.
3887
3889 GNU parallel can work as a simple job queue system or batch manager.
3890 The idea is to put the jobs into a file and have GNU parallel read from
3891 that continuously. As GNU parallel will stop at end of file we use tail
3892 to continue reading:
3893
3894 true >jobqueue; tail -n+0 -f jobqueue | parallel
3895
3896 To submit your jobs to the queue:
3897
3898 echo my_command my_arg >> jobqueue
3899
3900 You can of course use -S to distribute the jobs to remote computers:
3901
3902 true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
3903
3904 If you keep this running for a long time, jobqueue will grow. A way of
3905 removing the jobs already run is by making GNU parallel stop when it
3906 hits a special value and then restart. To use --eof to make GNU
3907 parallel exit, tail also needs to be forced to exit:
3908
3909 true >jobqueue;
3910 while true; do
3911 tail -n+0 -f jobqueue |
3912 (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
3913 perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
3914 (seq 1000 >> jobqueue &);
3915 echo Done appending dummy data forcing tail to exit)
3916 echo tail exited;
3917 mv j2 jobqueue
3918 done
3919
3920 In some cases you can run on more CPUs and computers during the night:
3921
3922 # Day time
3923 echo 50% > jobfile
3924 cp day_server_list ~/.parallel/sshloginfile
3925 # Night time
3926 echo 100% > jobfile
3927 cp night_server_list ~/.parallel/sshloginfile
3928 tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..
3929
3930 GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
3931
3932 There is a small issue when using GNU parallel as queue system/batch
3933 manager: You have to submit JobSlot number of jobs before they will
3934 start, and after that you can submit one at a time, and the job will
3935 start immediately if a free slot is available. Output from the
3936 running or completed jobs is held back and will only be printed when
3937 JobSlots more jobs have been started (unless you use --ungroup or
3938 --line-buffer, in which case the output from the jobs is printed
3939 immediately). E.g. if you have 10 jobslots then the output from the
3940 first completed job will only be printed when job 11 has started, and
3941 the output of the second will only be printed when job 12 has started.
3942
3944 If you have a dir in which users drop files that need to be processed
3945 you can do this on GNU/Linux (if you know what inotifywait is called
3946 on other platforms, file a bug report):
3947
3948 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3949 parallel -u echo
3950
3951 This will run the command echo on each file put into my_dir or subdirs
3952 of my_dir.
3953
3954 You can of course use -S to distribute the jobs to remote computers:
3955
3956 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3957 parallel -S .. -u echo
3958
3959 If the files to be processed are in a tar file then unpacking one file
3960 and processing it immediately may be faster than first unpacking all
3961 files. Set up the dir processor as above and unpack into the dir.
3962
3963 Using GNU parallel as dir processor has the same limitations as using
3964 GNU parallel as queue system/batch manager.
3965
3967 If you have downloaded source and tried compiling it, you may have
3968 seen:
3969
3970 $ ./configure
3971 [...]
3972 checking for something.h... no
3973 configure: error: "libsomething not found"
3974
3975 Often it is not obvious which package you should install to get that
3976 file. Debian has apt-file to search for a file. tracefile from
3977 https://gitlab.com/ole.tange/tangetools can tell which files a program
3978 tried to access. In this case we are interested in one of the last
3979 files:
3980
3981 $ tracefile -un ./configure | tail | parallel -j0 apt-file search
3982
3984 --round-robin, --pipe-part, --shard, --bin and --group-by are all
3985 specialized versions of --pipe.
3986
3987 In the following n is the number of jobslots given by --jobs. A record
3988 starts with --recstart and ends with --recend. It is typically a full
3989 line. A chunk is a number of full records that is approximately the
3990 size of a block. A block can contain half records, a chunk cannot.
3991
3992 --pipe starts one job per chunk. It reads blocks from stdin (standard
3993 input). It finds a record end near a block border and passes a chunk to
3994 the program.
3995
3996 --pipe-part starts one job per chunk - just like normal --pipe. It
3997 first finds record endings near all block borders in the file and then
3998 starts the jobs. By using --block -1 it will set the block size to 1/n
3999 * size-of-file. Used this way it will start n jobs in total.
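As a sketch, with 4 jobslots and --block -1 each jobslot gets roughly a quarter of the file in one chunk:

```shell
seq 1000 > numbers.txt
# --block -1 sets the block size to size-of-file / number-of-jobslots,
# so -j4 starts 4 jobs, each reading its part of the file directly.
parallel -j4 --pipe-part --block -1 -a numbers.txt wc -l
```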
4000
4001 --round-robin starts n jobs in total. It reads a block and passes a
4002 chunk to whichever job is ready to read. It does not parse the content
4003 except for identifying where a record ends to make sure it only passes
4004 full records.
4005
4006 --shard starts n jobs in total. It parses each line to read the value
4007 in the given column. Based on this value the line is passed to one of
4008 the n jobs. All lines having this value will be given to the same
4009 jobslot.
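As a sketch (assuming a space-separated input where column 1 holds the key), --shard 1 sends all lines with the same key to the same jobslot; wc -l stands in for a real per-shard program:

```shell
# All lines with the same value in column 1 go to the same jobslot.
printf 'a 1\nb 2\na 3\nb 4\n' |
  parallel --pipe --colsep ' ' --shard 1 -j2 wc -l
```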
4010
4011 --bin works like --shard but the value of the column is the jobslot
4012 number it will be passed to. If the value is bigger than n, then n
4013 will be subtracted from the value until the value is smaller than or
4014 equal to n.
4015
4016 --group-by starts one job per chunk. Record borders are not given by
4017 --recend/--recstart. Instead a record is defined by a number of lines
4018 having the same value in a given column. So the value of a given column
4019 changes at a chunk border. With --pipe every line is parsed, with
4020 --pipe-part only a few lines are parsed to find the chunk border.
4021
4022 --group-by can be combined with --round-robin or --pipe-part.
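As a sketch (assuming a whitespace-separated table like my.csv above, with the grouping value in column 2), each job gets all consecutive lines sharing the column-2 value; wc -l again stands in for the real program:

```shell
# Skip the header line, then hand each group of lines that share the
# value in column 2 to one wc -l call; -k keeps the output in order.
tail -n +2 my.csv |
  parallel --pipe --colsep '\s+' --group-by 2 -kN1 wc -l
```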
4023
4025 GNU parallel is very liberal in quoting. You only need to quote
4026 characters that have special meaning in shell:
4027
4028 ( ) $ ` ' " < > ; | \
4029
4030 and depending on context these need to be quoted, too:
4031
4032 ~ & # ! ? space * {
4033
4034 Therefore most people will never need more quoting than putting '\' in
4035 front of the special characters.
4036
4037 Often you can simply put \' around every ':
4038
4039 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
4040
4041 can be quoted:
4042
4043 parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\' ::: file
4044
4045 However, when you want to use a shell variable you need to quote the
4046 $-sign. Here is an example using $PARALLEL_SEQ. This variable is set by
4047 GNU parallel itself, so the evaluation of the $ must be done by the sub
4048 shell started by GNU parallel:
4049
4050 seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}
4051
4052 If the variable is set before GNU parallel starts you can do this:
4053
4054 VAR=this_is_set_before_starting
4055 echo test | parallel echo {} $VAR
4056
4057 Prints: test this_is_set_before_starting
4058
4059 It is a little more tricky if the variable contains more than one space
4060 in a row:
4061
4062 VAR="two spaces between each word"
4063 echo test | parallel echo {} \'"$VAR"\'
4064
4065 Prints: test two spaces between each word
4066
4067 If the variable should not be evaluated by the shell starting GNU
4068 parallel but be evaluated by the sub shell started by GNU parallel,
4069 then you need to quote it:
4070
4071 echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR
4072
4073 Prints: test this_is_set_after_starting
4074
4075 It is a little more tricky if the variable contains space:
4076
4077 echo test |\
4078 parallel VAR='"two spaces between each word"' echo {} \'"$VAR"\'
4079
4080 Prints: test two spaces between each word
4081
4082 $$ is the shell variable containing the process id of the shell. This
4083 will print the process id of the shell running GNU parallel:
4084
4085 seq 10 | parallel echo $$
4086
4087 And this will print the process ids of the sub shells started by GNU
4088 parallel.
4089
4090 seq 10 | parallel echo \$\$
4091
4092 If the special characters should not be evaluated by the sub shell then
4093 you need to protect it against evaluation from both the shell starting
4094 GNU parallel and the sub shell:
4095
4096 echo test | parallel echo {} \\\$VAR
4097
4098 Prints: test $VAR
4099
4100 GNU parallel can protect against evaluation by the sub shell by using
4101 -q:
4102
4103 echo test | parallel -q echo {} \$VAR
4104
4105 Prints: test $VAR
4106
4107 This is particularly useful if you have lots of quoting. If you want to
4108 run a perl script like this:
4109
4110 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
4111
4112 It needs to be quoted like one of these:
4113
4114 ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'
4115 ls | parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\'
4116
4117 Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU parallel
4118 can do the quoting by using option -q:
4119
4120 ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'
4121
4122 However, this means you cannot make the sub shell interpret special
4123 characters. For example because of -q this WILL NOT WORK:
4124
4125 ls *.gz | parallel -q "zcat {} >{.}"
4126 ls *.gz | parallel -q "zcat {} | bzip2 >{.}.bz2"
4127
4128 because > and | need to be interpreted by the sub shell.
4129
4130 If you get errors like:
4131
4132 sh: -c: line 0: syntax error near unexpected token
4133 sh: Syntax error: Unterminated quoted string
4134 sh: -c: line 0: unexpected EOF while looking for matching `''
4135 sh: -c: line 1: syntax error: unexpected end of file
4136 zsh:1: no matches found:
4137
4138 then you might try using -q.
4139
4140 If you are using bash process substitution like <(cat foo) then you may
4141 try -q and prepending command with bash -c:
4142
4143 ls | parallel -q bash -c 'wc -c <(echo {})'
4144
4145 Or for substituting output:
4146
4147 ls | parallel -q bash -c \
4148 'tar c {} | tee >(gzip >{}.tar.gz) | bzip2 >{}.tar.bz2'
4149
4150 Conclusion: To avoid dealing with the quoting problems it may be easier
4151 just to write a small script or a function (remember to export -f the
4152 function) and have GNU parallel call that.
4153
4155 If you want a list of the jobs currently running you can run:
4156
4157 killall -USR1 parallel
4158
4159 GNU parallel will then print the currently running jobs on stderr
4160 (standard error).
4161
4163 If you regret starting a lot of jobs you can simply break GNU parallel,
4164 but if you want to make sure you do not have half-completed jobs you
4165 should send the signal SIGHUP to GNU parallel:
4166
4167 killall -HUP parallel
4168
4169 This will tell GNU parallel to not start any new jobs, but wait until
4170 the currently running jobs are finished before exiting.
4171
4173 $PARALLEL_HOME
4174 Dir where GNU parallel stores config files, semaphores, and
4175 caches information between invocations. Default:
4176 $HOME/.parallel.
4177
4178 $PARALLEL_HOSTGROUPS
4179 When using --hostgroups GNU parallel sets this to the
4180 intersection of the hostgroups of the job and the sshlogin
4181 that the job is run on.
4182
4183 Remember to quote the $, so it gets evaluated by the correct
4184 shell. Or use --plus and {hgrp}.
4185
4186 $PARALLEL_JOBSLOT
4187 Set by GNU parallel and can be used in jobs run by GNU
4188 parallel. Remember to quote the $, so it gets evaluated by
4189 the correct shell. Or use --plus and {slot}.
4190
4191 $PARALLEL_JOBSLOT is the jobslot of the job. It is equal to
4192 {%} unless the job is being retried. See {%} for details.
4193
4194 $PARALLEL_PID
4195 Set by GNU parallel and can be used in jobs run by GNU
4196 parallel. Remember to quote the $, so it gets evaluated by
4197 the correct shell.
4198
4199 This makes it possible for the jobs to communicate directly to
4200 GNU parallel.
4201
4202 Example: If each of the jobs tests a solution and one of jobs
4203 finds the solution the job can tell GNU parallel not to start
4204 more jobs by: kill -HUP $PARALLEL_PID. This only works on the
4205 local computer.
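A sketch of that idea: each job checks one candidate, and the job that finds the answer (here: 3) tells GNU parallel to stop starting new jobs. The single quotes make $PARALLEL_PID be evaluated by the sub shell:

```shell
# Jobs already running are allowed to finish; no new jobs start
# after kill -HUP has been sent to GNU parallel.
seq 10 | parallel -j2 'test {} = 3 && kill -HUP $PARALLEL_PID; echo {}'
```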
4206
4207 $PARALLEL_RSYNC_OPTS
4208 Options to pass on to rsync. Defaults to: -rlDzR.
4209
4210 $PARALLEL_SHELL
4211 Use this shell for the commands run by GNU parallel:
4212
4213 • $PARALLEL_SHELL. If undefined use:
4214
4215 • The shell that started GNU parallel. If that cannot be
4216 determined:
4217
4218 • $SHELL. If undefined use:
4219
4220 • /bin/sh
4221
4222 $PARALLEL_SSH
4223 GNU parallel defaults to using the ssh command for remote
4224 access. This can be overridden with $PARALLEL_SSH, which in
4225 turn can be overridden with --ssh. It can also be set on a
4226 per server basis (see --sshlogin).
4227
4228 $PARALLEL_SSHHOST
4229 Set by GNU parallel and can be used in jobs run by GNU
4230 parallel. Remember to quote the $, so it gets evaluated by
4231 the correct shell. Or use --plus and {host}.
4232
4233 $PARALLEL_SSHHOST is the host part of an sshlogin line. E.g.
4234
4235 4//usr/bin/specialssh user@host
4236
4237 becomes:
4238
4239 host
4240
4241 $PARALLEL_SSHLOGIN
4242 Set by GNU parallel and can be used in jobs run by GNU
4243 parallel. Remember to quote the $, so it gets evaluated by
4244 the correct shell. Or use --plus and {sshlogin}.
4245
4246 The value is the sshlogin line with number of cores removed.
4247 E.g.
4248
4249 4//usr/bin/specialssh user@host
4250
4251 becomes:
4252
4253 /usr/bin/specialssh user@host
4254
4255 $PARALLEL_SEQ
4256 Set by GNU parallel and can be used in jobs run by GNU
4257 parallel. Remember to quote the $, so it gets evaluated by
4258 the correct shell.
4259
4260 $PARALLEL_SEQ is the sequence number of the job running.
4261
4262 Example:
4263
4264 seq 10 | parallel -N2 \
4265 echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}
4266
4267 {#} is a shorthand for $PARALLEL_SEQ.
4268
4269 $PARALLEL_TMUX
4270 Path to tmux. If unset the tmux in $PATH is used.
4271
4272 $TMPDIR Directory for temporary files. See: --tmpdir.
4273
4274 $PARALLEL
4275 The environment variable $PARALLEL will be used as default
4276 options for GNU parallel. If the variable contains special
4277 shell characters (e.g. $, *, or space) then these need to
4278 be escaped with \.
4279
4280 Example:
4281
4282 cat list | parallel -j1 -k -v ls
4283 cat list | parallel -j1 -k -v -S"myssh user@server" ls
4284
4285 can be written as:
4286
4287 cat list | PARALLEL="-kvj1" parallel ls
4288 cat list | PARALLEL='-kvj1 -S myssh\ user@server' \
4289 parallel echo
4290
4291 Notice the \ after 'myssh' is needed because 'myssh' and
4292 'user@server' must be one argument.
4293
4295 The global configuration file /etc/parallel/config, followed by user
4296 configuration file ~/.parallel/config (formerly known as .parallelrc)
4297 will be read in turn if they exist. Lines starting with '#' will be
4298 ignored. The format can follow that of the environment variable
4299 $PARALLEL, but it is often easier to simply put each option on its own
4300 line.
4301
4302 Options on the command line take precedence, followed by the
4303 environment variable $PARALLEL, user configuration file
4304 ~/.parallel/config, and finally the global configuration file
4305 /etc/parallel/config.
4306
4307 Note that no file that is read for options, nor the environment
4308 variable $PARALLEL, may contain retired options such as --tollef.
4309
4311 If --profile is set, GNU parallel will read the profile from that
4312 file rather than the global or user configuration files. You can have
4313 multiple --profiles.
4314
4315 Profiles are searched for in ~/.parallel. If the name starts with / it
4316 is seen as an absolute path. If the name starts with ./ it is seen as a
4317 relative path from current dir.
4318
4319 Example: Profile for running a command on every sshlogin in
4320 ~/.ssh/sshlogins and prepend the output with the sshlogin:
4321
4322 echo --tag -S .. --nonall > ~/.parallel/n
4323 parallel -Jn uptime
4324
4325 Example: Profile for running every command with -j-1 and nice
4326
4327 echo -j-1 nice > ~/.parallel/nice_profile
4328 parallel -J nice_profile bzip2 -9 ::: *
4329
4330 Example: Profile for running a perl script before every command:
4331
4332 echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" \
4333 > ~/.parallel/pre_perl
4334 parallel -J pre_perl echo ::: *
4335
4336 Note how the $ and " need to be quoted using \.
4337
4338 Example: Profile for running distributed jobs with nice on the remote
4339 computers:
4340
4341 echo -S .. nice > ~/.parallel/dist
4342 parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *
4343
4345 Exit status depends on --halt-on-error if one of these is used:
4346 success=X, success=Y%, fail=Y%.
4347
4348 0 All jobs ran without error. If success=X is used: X jobs ran
4349 without error. If success=Y% is used: Y% of the jobs ran without
4350 error.
4351
4352 1-100 Some of the jobs failed. The exit status gives the number of
4353 failed jobs. If Y% is used the exit status is the percentage of
4354 jobs that failed.
4355
4356 101 More than 100 jobs failed.
4357
4358 255 Other error.
4359
4360 -1 (In joblog and SQL table)
4361 Killed by Ctrl-C, timeout, not enough memory or similar.
4362
4363 -2 (In joblog and SQL table)
4364 skip() was called in {= =}.
4365
4366 -1000 (In SQL table)
4367 Job is ready to run (set by --sqlmaster).
4368
4369 -1220 (In SQL table)
4370 Job is taken by worker (set by --sqlworker).
4371
4372 If fail=1 is used, the exit status will be the exit status of the
4373 failing job.
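A quick way to see the rules above in action: two of the four jobs below fail, so GNU parallel exits with status 2:

```shell
parallel exit ::: 0 0 1 2
echo $?    # two jobs failed (exit 1 and exit 2), so this prints 2
```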
4374
4376 See: man parallel_alternatives
4377
4379 Quoting of newline
4380 Because of the way newline is quoted this will not work:
4381
4382 echo 1,2,3 | parallel -vkd, "echo 'a{}b'"
4383
4384 However, these will all work:
4385
4386 echo 1,2,3 | parallel -vkd, echo a{}b
4387 echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
4388 echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"
4389
4390 Speed
4391 Startup
4392
4393 GNU parallel is slow at starting up - around 250 ms the first time and
4394 150 ms after that.
4395
4396 Job startup
4397
4398 Starting a job on the local machine takes around 10 ms. This can be a
4399 big overhead if the job takes very few ms to run. Often you can group
4400 small jobs together using -X which will make the overhead less
4401 significant. Or you can run multiple GNU parallels as described in
4402 EXAMPLE: Speeding up fast jobs.
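As a sketch: -X packs as many arguments onto each command line as will fit, so 1000 arguments result in only a handful of echo invocations instead of 1000 job startups:

```shell
# Compare the number of echo invocations with and without -X.
seq 1000 | parallel -X echo | wc -l
seq 1000 | parallel echo | wc -l    # 1000 separate jobs
```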
4403
4404 SSH
4405
4406 When using multiple computers GNU parallel opens ssh connections to
4407 them to figure out how many connections can be used reliably
4408 simultaneously (Namely SSHD's MaxStartups). This test is done for each
4409 host in serial, so if your --sshloginfile contains many hosts it may be
4410 slow.
4411
4412 If your jobs are short you may see that there are fewer jobs running
4413 on the remote systems than expected. This is due to time spent
4414 logging in and out. -M (--controlmaster) may help here.
4415
4416 Disk access
4417
4418 A single disk can normally read data faster if it reads one file at a
4419 time instead of reading a lot of files in parallel, as this will avoid
4420 disk seeks. However, newer disk systems with multiple drives can read
4421 faster if reading from multiple files in parallel.
4422
4423 If the jobs are of the form read-all-compute-all-write-all, so
4424 everything is read before anything is written, it may be faster to
4425 force only one disk access at the time:
4426
4427 sem --id diskio cat file | compute | sem --id diskio cat > file
4428
4429 If the jobs are of the form read-compute-write, so writing starts
4430 before all reading is done, it may be faster to force only one reader
4431 and writer at the time:
4432
4433 sem --id read cat file | compute | sem --id write cat > file
4434
4435 If the jobs are of the form read-compute-read-compute, it may be faster
4436 to run more jobs in parallel than the system has CPUs, as some of the
4437 jobs will be stuck waiting for disk access.
4438
4439 --nice limits command length
4440 The current implementation of --nice is too pessimistic in the max
4441 allowed command length. It only uses a little more than half of what it
4442 could. This affects -X and -m. If this becomes a real problem for you,
4443 file a bug-report.
4444
4445 Aliases and functions do not work
4446 If you get:
4447
4448 Can't exec "command": No such file or directory
4449
4450 or:
4451
4452 open3: exec of by command failed
4453
4454 or:
4455
4456 /bin/bash: command: command not found
4457
4458 it may be because command is not known, but it could also be because
4459 command is an alias or a function. If it is a function you need to
4460 export -f the function first or use env_parallel. An alias will only
4461 work if you use env_parallel.
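A sketch of both workarounds, using a made-up function name (doubleit) and assuming a bash shell:

```shell
doubleit() { echo $(( $1 * 2 )); }

# Workaround 1: export the function so bash sub shells can see it.
export -f doubleit
parallel -k doubleit ::: 1 2 3

# Workaround 2: env_parallel copies functions (and aliases) for you.
# . "$(which env_parallel.bash)"   # load once in your shell
# env_parallel -k doubleit ::: 1 2 3
```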
4462
4463 Database with MySQL fails randomly
4464 The --sql* options may fail randomly with MySQL. This problem does not
4465 exist with PostgreSQL.
4466
4468 Report bugs to <bug-parallel@gnu.org> or
4469 https://savannah.gnu.org/bugs/?func=additem&group=parallel
4470
4471 See a perfect bug report on
4472 https://lists.gnu.org/archive/html/bug-parallel/2015-01/msg00000.html
4473
4474 Your bug report should always include:
4475
4476 • The error message you get (if any). If the error message is not from
4477 GNU parallel you need to show why you think GNU parallel caused this.
4478
4479 • The complete output of parallel --version. If you are not running the
4480 latest released version (see http://ftp.gnu.org/gnu/parallel/) you
4481 should specify why you believe the problem is not fixed in that
4482 version.
4483
4484 • A minimal, complete, and verifiable example (See description on
4485 https://stackoverflow.com/help/mcve).
4486
4487 It should be a complete example that others can run which shows the
4488 problem including all files needed to run the example. This should
4489 preferably be small and simple, so try to remove as many options as
4490 possible. A combination of yes, seq, cat, echo, wc, and sleep can
4491 reproduce most errors. If your example requires large files, see if
4492 you can make them with something like seq 100000000 > bigfile or yes
4493 | head -n 1000000000 > file. If you need multiple columns: paste
4494 <(seq 1000) <(seq 1000 1999)
4495
4496 If your example requires remote execution, see if you can use
4497 localhost - maybe using another login.
4498
4499 If you have access to a different system (maybe a VirtualBox on your
4500 own machine), test if the MCVE shows the problem on that system.
4501
       • The output of your example. If your problem is not easily
         reproduced by others, the output might help them figure out
         the problem.

       • Whether you have watched the intro videos
         (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1),
         walked through the tutorial (man parallel_tutorial), and read
         the EXAMPLE section in the man page (man parallel - search
         for EXAMPLE:).

       If you suspect the error depends on your environment or
       distribution, please see if you can reproduce it on one of
       these VirtualBox images:
       http://sourceforge.net/projects/virtualboximage/files/
       http://www.osboxes.org/virtualbox-images/

       Specifying the name of your distribution is not enough, as you
       may have installed software that is not in the VirtualBox
       images.

       If you cannot reproduce the error on any of the VirtualBox
       images above, see if you can build a VirtualBox image on which
       you can reproduce it. If not, you should assume the debugging
       will be done through you. That will put more burden on you, and
       it is extra important that you give any information that helps.
       In general the problem will be fixed faster and with less work
       for you if you can reproduce the error on a VirtualBox.

AUTHOR
       When using GNU parallel for a publication please cite:

       O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
       ;login: The USENIX Magazine, February 2011:42-47.

       This helps fund further development, and it won't cost you a
       cent. If you pay 10000 EUR you should feel free to use GNU
       Parallel without citing.

       Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk

       Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk

       Copyright (C) 2010-2020 Ole Tange, http://ole.tange.dk and Free
       Software Foundation, Inc.

       Parts of the manual concerning xargs compatibility are inspired
       by the manual of xargs from GNU findutils 4.4.2.

LICENSE
       This program is free software; you can redistribute it and/or
       modify it under the terms of the GNU General Public License as
       published by the Free Software Foundation; either version 3 of
       the License, or (at your option) any later version.

       This program is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

       You should have received a copy of the GNU General Public
       License along with this program. If not, see
       <http://www.gnu.org/licenses/>.

   Documentation license I
       Permission is granted to copy, distribute and/or modify this
       documentation under the terms of the GNU Free Documentation
       License, Version 1.3 or any later version published by the Free
       Software Foundation; with no Invariant Sections, with no
       Front-Cover Texts, and with no Back-Cover Texts. A copy of the
       license is included in the file fdl.txt.

   Documentation license II
       You are free:

       to Share to copy, distribute and transmit the work

       to Remix to adapt the work

       Under the following conditions:

       Attribution
              You must attribute the work in the manner specified by
              the author or licensor (but not in any way that suggests
              that they endorse you or your use of the work).

       Share Alike
              If you alter, transform, or build upon this work, you
              may distribute the resulting work only under the same,
              similar or a compatible license.

       With the understanding that:

       Waiver Any of the above conditions can be waived if you get
              permission from the copyright holder.

       Public Domain
              Where the work or any of its elements is in the public
              domain under applicable law, that status is in no way
              affected by the license.

       Other Rights
              In no way are any of the following rights affected by
              the license:

              • Your fair dealing or fair use rights, or other
                applicable copyright exceptions and limitations;

              • The author's moral rights;

              • Rights other persons may have either in the work
                itself or in how the work is used, such as publicity
                or privacy rights.

       Notice For any reuse or distribution, you must make clear to
              others the license terms of this work.

       A copy of the full license is included in the file
       cc-by-sa.txt.

DEPENDENCIES
       GNU parallel uses Perl, and the Perl modules Getopt::Long,
       IPC::Open3, Symbol, IO::File, POSIX, and File::Temp.

       For --csv it uses the Perl module Text::CSV.

       For remote usage it uses rsync with ssh.

SEE ALSO
       parallel_tutorial(1), env_parallel(1), parset(1), parsort(1),
       parallel_alternatives(1), parallel_design(7), niceload(1),
       sql(1), ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1),
       rsync(1)



20201122                         2020-12-21                     PARALLEL(1)