1PARALLEL(1) parallel PARALLEL(1)
2
3
4
6 parallel - build and execute shell command lines from standard input in
7 parallel
8
10 parallel [options] [command [arguments]] < list_of_arguments
11
12 parallel [options] [command [arguments]] ( ::: arguments | :::+
13 arguments | :::: argfile(s) | ::::+ argfile(s) ) ...
14
15 parallel --semaphore [options] command
16
17 #!/usr/bin/parallel --shebang [options] [command [arguments]]
18
19 #!/usr/bin/parallel --shebang-wrap [options] [command [arguments]]
20
22 STOP!
23
24 Read the Reader's guide below if you are new to GNU parallel.
25
26 GNU parallel is a shell tool for executing jobs in parallel using one
27 or more computers. A job can be a single command or a small script that
28 has to be run for each of the lines in the input. The typical input is
29 a list of files, a list of hosts, a list of users, a list of URLs, or a
30 list of tables. A job can also be a command that reads from a pipe. GNU
31 parallel can then split the input into blocks and pipe a block into
32 each command in parallel.
33
34 If you use xargs and tee today you will find GNU parallel very easy to
35 use as GNU parallel is written to have the same options as xargs. If
36 you write loops in shell, you will find GNU parallel may be able to
37 replace most of the loops and make them run faster by running several
38 jobs in parallel.
39
40 GNU parallel makes sure output from the commands is the same output as
41 you would get had you run the commands sequentially. This makes it
42 possible to use output from GNU parallel as input for other programs.
43
44 For each line of input GNU parallel will execute command with the line
45 as arguments. If no command is given, the line of input is executed.
46 Several lines will be run in parallel. GNU parallel can often be used
47 as a substitute for xargs or cat | bash.
48
49 Reader's guide
50 If you prefer reading a book buy GNU Parallel 2018 at
51 http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
52 or download it at: https://doi.org/10.5281/zenodo.1146014
53
54 Otherwise start by watching the intro videos for a quick introduction:
55 http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
56
57 If you need a one page printable cheat sheet you can find it on:
58 https://www.gnu.org/software/parallel/parallel_cheat.pdf
59
60 You can find a lot of EXAMPLEs of use after the list of OPTIONS in man
61 parallel (Use LESS=+/EXAMPLE: man parallel). That will give you an idea
62 of what GNU parallel is capable of, and you may find a solution you can
63 simply adapt to your situation.
64
65 If you want to dive even deeper: spend a couple of hours walking
66 through the tutorial (man parallel_tutorial). Your command line will
67 love you for it.
68
69 Finally you may want to look at the rest of the manual (man parallel)
70 if you have special needs not already covered.
71
72 If you want to know the design decisions behind GNU parallel, try: man
73 parallel_design. This is also a good intro if you intend to change GNU
74 parallel.
75
77 command
78 Command to execute. If command or the following arguments contain
79 replacement strings (such as {}) every instance will be substituted
80 with the input.
81
82 If command is given, GNU parallel solves the same tasks as xargs. If
83 command is not given GNU parallel will behave similarly to cat | sh.
84
85 The command must be an executable, a script, a composed command, an
86 alias, or a function.
87
88 Bash functions: export -f the function first or use env_parallel.
89
90 Bash, Csh, or Tcsh aliases: Use env_parallel.
91
92 Zsh, Fish, Ksh, and Pdksh functions and aliases: Use env_parallel.
93
94 {} (beta testing)
95 Input line. This replacement string will be replaced by a full line
96 read from the input source. The input source is normally stdin
97 (standard input), but can also be given with -a, :::, or ::::.
98
99 The replacement string {} can be changed with -I.
100
101 If the command line contains no replacement strings then {} will be
102 appended to the command line.
103
104 Replacement strings are normally quoted, so special characters are
105 not parsed by the shell. The exception is if the command starts
106 with a replacement string; then the string is not quoted.
107
108 {.} Input line without extension. This replacement string will be
109 replaced by the input with the extension removed. If the input line
110 contains . after the last /, the last . until the end of the string
111 will be removed and {.} will be replaced with the remainder. E.g.
112 foo.jpg becomes foo, subdir/foo.jpg becomes subdir/foo,
113 sub.dir/foo.jpg becomes sub.dir/foo, sub.dir/bar remains
114 sub.dir/bar. If the input line does not contain . it will remain
115 unchanged.
116
117 The replacement string {.} can be changed with --er.
118
119 To understand replacement strings see {}.
120
121 {/} Basename of input line. This replacement string will be replaced by
122 the input with the directory part removed.
123
124 The replacement string {/} can be changed with --basenamereplace.
125
126 To understand replacement strings see {}.
127
128 {//}
129 Dirname of input line. This replacement string will be replaced by
130 the dir of the input line. See dirname(1).
131
132 The replacement string {//} can be changed with --dirnamereplace.
133
134 To understand replacement strings see {}.
135
136 {/.}
137 Basename of input line without extension. This replacement string
138 will be replaced by the input with the directory and extension part
139 removed. It is a combination of {/} and {.}.
140
141 The replacement string {/.} can be changed with
142 --basenameextensionreplace.
143
144 To understand replacement strings see {}.
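The path-based replacement strings {.}, {/}, {//} and {/.} can be approximated in plain shell. The following is only a sketch of the rules described above, not GNU parallel's own implementation; the function names strip_ext, base_name, dir_name and base_no_ext are hypothetical helpers introduced for illustration.

```shell
# Plain-shell approximations of the path replacement strings.
# All function names are hypothetical; they are not part of GNU parallel.

# {.}  remove the extension, but only if the "." comes after the last "/"
strip_ext() {
  case "${1##*/}" in
    *.*) printf '%s\n' "${1%.*}" ;;   # basename has a dot: cut from the last "."
    *)   printf '%s\n' "$1" ;;        # no dot after the last "/": unchanged
  esac
}

# {/}  remove the directory part
base_name() { printf '%s\n' "${1##*/}"; }

# {//} directory part, as reported by dirname(1)
dir_name() { dirname -- "$1"; }

# {/.} combination of {/} and {.}
base_no_ext() { strip_ext "${1##*/}"; }

# The inputs used in the description of {.}
for path in foo.jpg subdir/foo.jpg sub.dir/foo.jpg sub.dir/bar; do
  printf '%s -> %s\n' "$path" "$(strip_ext "$path")"
done
```

The case statement inspects only the part after the last /, which is why sub.dir/bar is left unchanged even though the directory name contains a dot.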
145
146 {#} Sequence number of the job to run. This replacement string will be
147 replaced by the sequence number of the job being run. It contains
148 the same number as $PARALLEL_SEQ.
149
150 The replacement string {#} can be changed with --seqreplace.
151
152 To understand replacement strings see {}.
153
154 {%} Job slot number. This replacement string will be replaced by the
155 job's slot number between 1 and number of jobs to run in parallel.
156 There will never be 2 jobs running at the same time with the same
157 job slot number.
158
159 The replacement string {%} can be changed with --slotreplace.
160
161 To understand replacement strings see {}.
162
163 {n} Argument from input source n or the n'th argument. This positional
164 replacement string will be replaced by the input from input source
165 n (when used with -a or ::::) or with the n'th argument (when used
166 with -N). If n is negative it refers to the n'th last argument.
167
168 To understand replacement strings see {}.
169
170 {n.}
171 Argument from input source n or the n'th argument without
172 extension. It is a combination of {n} and {.}.
173
174 This positional replacement string will be replaced by the input
175 from input source n (when used with -a or ::::) or with the n'th
176 argument (when used with -N). The input will have the extension
177 removed.
178
179 To understand positional replacement strings see {n}.
180
181 {n/}
182 Basename of argument from input source n or the n'th argument. It
183 is a combination of {n} and {/}.
184
185 This positional replacement string will be replaced by the input
186 from input source n (when used with -a or ::::) or with the n'th
187 argument (when used with -N). The input will have the directory (if
188 any) removed.
189
190 To understand positional replacement strings see {n}.
191
192 {n//}
193 Dirname of argument from input source n or the n'th argument. It
194 is a combination of {n} and {//}.
195
196 This positional replacement string will be replaced by the dir of
197 the input from input source n (when used with -a or ::::) or with
198 the n'th argument (when used with -N). See dirname(1).
199
200 To understand positional replacement strings see {n}.
201
202 {n/.}
203 Basename of argument from input source n or the n'th argument
204 without extension. It is a combination of {n}, {/}, and {.}.
205
206 This positional replacement string will be replaced by the input
207 from input source n (when used with -a or ::::) or with the n'th
208 argument (when used with -N). The input will have the directory (if
209 any) and extension removed.
210
211 To understand positional replacement strings see {n}.
212
213 {=perl expression=}
214 Replace with calculated perl expression. $_ will contain the same
215 as {}. After evaluating perl expression $_ will be used as the
216 value. It is recommended to only change $_ but you have full access
217 to all of GNU parallel's internal functions and data structures. A
218 few convenience functions and data structures have been made:
219
220 Q(string) shell quote a string
221
222 pQ(string) perl quote a string
223
224 uq() (or uq) (beta testing) do not quote current replacement
225 string
226
227 total_jobs() number of jobs in total
228
229 slot() slot number of job
230
231 seq() sequence number of job
232
233 @arg the arguments
234
235 Example:
236
237 seq 10 | parallel echo {} + 1 is {= '$_++' =}
238 parallel csh -c {= '$_="mkdir ".Q($_)' =} ::: '12" dir'
239 seq 50 | parallel echo job {#} of {= '$_=total_jobs()' =}
240
241 See also: --rpl --parens
242
243 {=n perl expression=}
244 Positional equivalent to {=perl expression=}. To understand
245 positional replacement strings see {n}.
246
247 See also: {=perl expression=} {n}.
248
249 ::: arguments
250 Use arguments from the command line as input source instead of
251 stdin (standard input). Unlike other options for GNU parallel :::
252 is placed after the command and before the arguments.
253
254 The following are equivalent:
255
256 (echo file1; echo file2) | parallel gzip
257 parallel gzip ::: file1 file2
258 parallel gzip {} ::: file1 file2
259 parallel --arg-sep ,, gzip {} ,, file1 file2
260 parallel --arg-sep ,, gzip ,, file1 file2
261 parallel ::: "gzip file1" "gzip file2"
262
263 To avoid treating ::: as special use --arg-sep to set the argument
264 separator to something else. See also --arg-sep.
265
266 If multiple ::: are given, each group will be treated as an input
267 source, and all combinations of input sources will be generated.
268 E.g. ::: 1 2 ::: a b c will result in the combinations (1,a) (1,b)
269 (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested for-
270 loops.
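For comparison, this is the kind of nested loop that two ::: input sources stand in for. It is a sequential sketch: the function name combinations is hypothetical, and GNU parallel would run the jobs concurrently, so without -k its output order may differ.

```shell
# Nested loop visiting the same combinations as:
#   parallel echo {1} {2} ::: 1 2 ::: a b c
# The loop runs the jobs one at a time; GNU parallel runs several at once.
combinations() {
  for first in 1 2; do          # plays the role of input source 1, i.e. {1}
    for second in a b c; do     # plays the role of input source 2, i.e. {2}
      echo "$first $second"
    done
  done
}

combinations
```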
271
272 ::: and :::: can be mixed. So these are equivalent:
273
274 parallel echo {1} {2} {3} ::: 6 7 ::: 4 5 ::: 1 2 3
275 parallel echo {1} {2} {3} :::: <(seq 6 7) <(seq 4 5) \
276 :::: <(seq 1 3)
277 parallel -a <(seq 6 7) echo {1} {2} {3} :::: <(seq 4 5) \
278 :::: <(seq 1 3)
279 parallel -a <(seq 6 7) -a <(seq 4 5) echo {1} {2} {3} \
280 ::: 1 2 3
281 seq 6 7 | parallel -a - -a <(seq 4 5) echo {1} {2} {3} \
282 ::: 1 2 3
283 seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \
284 ::: 1 2 3
285
286 :::+ arguments
287 Like ::: but linked like --link to the previous input source.
288
289 Contrary to --link, values do not wrap: The shortest input source
290 determines the length.
291
292 Example:
293
294 parallel echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22
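One way to read the example: each :::+ glues its values to the previous input source position by position, and the resulting linked groups are then combined like ordinary ::: sources. The sequential loop below is a sketch of that reading only; the function name linked_combinations is hypothetical, and the order in which GNU parallel prints the lines may differ unless -k is used.

```shell
# Sequential sketch of the argument sets produced by:
#   parallel echo ::: a b c :::+ 1 2 3 ::: X Y :::+ 11 22
linked_combinations() {
  # "a b c :::+ 1 2 3" acts as one source of pairs: (a,1) (b,2) (c,3)
  for pair1 in "a 1" "b 2" "c 3"; do
    # "X Y :::+ 11 22" acts as a second source of pairs: (X,11) (Y,22)
    for pair2 in "X 11" "Y 22"; do
      echo "$pair1 $pair2"
    done
  done
}

linked_combinations
```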
295
296 :::: argfiles
297 Another way to write -a argfile1 -a argfile2 ...
298
299 ::: and :::: can be mixed.
300
301 See -a, ::: and --link.
302
303 ::::+ argfiles
304 Like :::: but linked like --link to the previous input source.
305
306 Contrary to --link, values do not wrap: The shortest input source
307 determines the length.
308
309 --null
310 -0 Use NUL as delimiter. Normally input lines will end in \n
311 (newline). If they end in \0 (NUL), then use this option. It is
312 useful for processing arguments that may contain \n (newline).
313
314 --arg-file input-file
315 -a input-file
316 Use input-file as input source. If you use this option, stdin
317 (standard input) is given to the first process run. Otherwise,
318 stdin (standard input) is redirected from /dev/null.
319
320 If multiple -a are given, each input-file will be treated as an
321 input source, and all combinations of input sources will be
322 generated. E.g. The file foo contains 1 2, the file bar contains a
323 b c. -a foo -a bar will result in the combinations (1,a) (1,b)
324 (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested for-
325 loops.
326
327 See also --link and {n}.
328
329 --arg-file-sep sep-str
330 Use sep-str instead of :::: as separator string between command and
331 argument files. Useful if :::: is used for something else by the
332 command.
333
334 See also: ::::.
335
336 --arg-sep sep-str
337 Use sep-str instead of ::: as separator string. Useful if ::: is
338 used for something else by the command.
339
340 Also useful if your command uses ::: but you still want to read
341 arguments from stdin (standard input): Simply change --arg-sep to a
342 string that is not in the command line.
343
344 See also: :::.
345
346 --bar
347 Show progress as a progress bar. In the bar is shown: % of jobs
348 completed, estimated seconds left, and number of jobs started.
349
350 It is compatible with zenity:
351
352 seq 1000 | parallel -j30 --bar '(echo {};sleep 0.1)' \
353 2> >(zenity --progress --auto-kill) | wc
354
355 --basefile file
356 --bf file
357 file will be transferred to each sshlogin before a job is started.
358 It will be removed if --cleanup is active. The file may be a script
359 to run or some common base data needed for the job. Multiple --bf
360 can be specified to transfer more basefiles. The file will be
361 transferred the same way as --transferfile.
362
363 --basenamereplace replace-str
364 --bnr replace-str
365 Use the replacement string replace-str instead of {/} for basename
366 of input line.
367
368 --basenameextensionreplace replace-str
369 --bner replace-str
370 Use the replacement string replace-str instead of {/.} for basename
371 of input line without extension.
372
373 --bin binexpr (alpha testing)
374 Use binexpr as binning key and bin input to the jobs.
375
376 binexpr is [column number|column name] [perlexpression] e.g. 3,
377 Address, 3 $_%=100, Address s/\D//g.
378
379 Each input line is split using --colsep. The value of the column is
380 put into $_, the perl expression is executed, the resulting value
381 is the job slot that will be given the line. If the value is
382 bigger than the number of jobslots the value will be modulo number
383 of jobslots.
384
385 This is similar to --shard but the hashing algorithm is a simple
386 modulo, which makes it predictable which jobslot will receive which
387 value.
388
389 The performance is in the order of 100K rows per second. Faster if
390 the bincol is small (<10), slower if it is big (>100).
391
392 --bin requires --pipe and a fixed numeric value for --jobs.
393
394 See also --shard, --group-by, --roundrobin.
395
396 --bg
397 Run command in background thus GNU parallel will not wait for
398 completion of the command before exiting. This is the default if
399 --semaphore is set.
400
401 See also: --fg, man sem.
402
403 Implies --semaphore.
404
405 --bibtex
406 --citation
407 Print the citation notice and BibTeX entry for GNU parallel,
408 silence citation notice for all future runs, and exit. It will not
409 run any commands.
410
411 If it is impossible for you to run --citation you can instead use
412 --will-cite, which will run commands, but which will only silence
413 the citation notice for this single run.
414
415 If you use --will-cite in scripts to be run by others you are
416 making it harder for others to see the citation notice. The
417 development of GNU parallel is indirectly financed through
418 citations, so if your users do not know they should cite then you
419 are making it harder to finance development. However, if you pay
420 10000 EUR, you have done your part to finance future development
421 and should feel free to use --will-cite in scripts.
422
423 If you do not want to help finance future development by letting
424 other users see the citation notice or by paying, then please use
425 another tool instead of GNU parallel. You can find some of the
426 alternatives in man parallel_alternatives.
427
428 --block size
429 --block-size size
430 Size of block in bytes to read at a time. The size can be postfixed
431 with K, M, G, T, P, E, k, m, g, t, p, or e which would multiply the
432 size with 1024, 1048576, 1073741824, 1099511627776,
433 1125899906842624, 1152921504606846976, 1000, 1000000, 1000000000,
434 1000000000000, 1000000000000000, or 1000000000000000000
435 respectively.
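The suffix table above can be written out as a small shell function. This is a sketch for illustration, not GNU parallel's parser: the function name block_bytes is hypothetical, it accepts whole numbers only, it does not handle the negative sizes used with --pipepart, and results above the shell's 64-bit integer range would overflow.

```shell
# Convert a size such as 10M or 2k to bytes, following the suffix table
# for --block. block_bytes is a hypothetical helper, whole numbers only.
block_bytes() {
  size=$1
  num=${size%[KMGTPEkmgtpe]}     # digits before the optional suffix
  suffix=${size#"$num"}          # the suffix itself, possibly empty
  case "$num" in
    ''|*[!0-9]*) echo "not a whole number: $size" >&2; return 1 ;;
  esac
  case "$suffix" in
    K) mult=1024 ;;
    M) mult=1048576 ;;
    G) mult=1073741824 ;;
    T) mult=1099511627776 ;;
    P) mult=1125899906842624 ;;
    E) mult=1152921504606846976 ;;
    k) mult=1000 ;;
    m) mult=1000000 ;;
    g) mult=1000000000 ;;
    t) mult=1000000000000 ;;
    p) mult=1000000000000000 ;;
    e) mult=1000000000000000000 ;;
    *) mult=1 ;;                 # no suffix: plain bytes
  esac
  echo $((num * mult))
}

block_bytes 1M    # the default block size
block_bytes 10k
```

Upper-case suffixes are powers of 1024 and lower-case suffixes are powers of 1000, so 1M and 1m differ by 48576 bytes.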
436
437 GNU parallel tries to meet the block size but can be off by the
438 length of one record. For performance reasons size should be bigger
439 than two records. GNU parallel will warn you and automatically
440 increase the size if you choose a size that is too small.
441
442 If you use -N, --block-size should be bigger than N+1 records.
443
444 size defaults to 1M.
445
446 When using --pipepart a negative block size is not interpreted as a
447 blocksize but as the number of blocks each jobslot should have. So
448 this will run 10*5 = 50 jobs in total:
449
450 parallel --pipepart -a myfile --block -10 -j5 wc
451
452 This is an efficient alternative to --roundrobin because data is
453 never read by GNU parallel, but you can still have very few
454 jobslots process a large amount of data.
455
456 See --pipe and --pipepart for use of this.
457
458 --cat
459 Create a temporary file with content. Normally --pipe/--pipepart
460 will give data to the program on stdin (standard input). With --cat
461 GNU parallel will create a temporary file with the name in {}, so
462 you can do: parallel --pipe --cat wc {}.
463
464 Implies --pipe unless --pipepart is used.
465
466 See also --fifo.
467
468 --cleanup
469 Remove transferred files. --cleanup will remove the transferred
470 files on the remote computer after processing is done.
471
472 find log -name '*gz' | parallel \
473 --sshlogin server.example.com --transferfile {} \
474 --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
475
476 With --transferfile {} the file transferred to the remote computer
477 will be removed on the remote computer. Directories created will
478 not be removed - even if they are empty.
479
480 With --return the file transferred from the remote computer will be
481 removed on the remote computer. Directories created will not be
482 removed - even if they are empty.
483
484 --cleanup is ignored when not used with --transferfile or --return.
485
486 --colsep regexp
487 -C regexp
488 Column separator. The input will be treated as a table with regexp
489 separating the columns. The n'th column can be accessed using {n}
490 or {n.}. E.g. {3} is the 3rd column.
491
492 If there are more input sources, each input source will be
493 separated, but the columns from each input source will be linked
494 (see --link).
495
496 parallel --colsep '-' echo {4} {3} {2} {1} \
497 ::: A-B C-D ::: e-f g-h
498
499 --colsep implies --trim rl, which can be overridden with --trim n.
500
501 regexp is a Perl Regular Expression:
502 http://perldoc.perl.org/perlre.html
503
504 --compress
505 Compress temporary files. If the output is big and very
506 compressible this will take up less disk space in $TMPDIR and
507 possibly be faster due to less disk I/O.
508
509 GNU parallel will try pzstd, lbzip2, pbzip2, zstd, pigz, lz4, lzop,
510 plzip, lzip, lrz, gzip, pxz, lzma, bzip2, xz, clzip, in that order,
511 and use the first available.
512
513 --compress-program prg
514 --decompress-program prg
515 Use prg for (de)compressing temporary files. It is assumed that prg
516 -dc will decompress stdin (standard input) to stdout (standard
517 output) unless --decompress-program is given.
518
519 --csv
520 Treat input as CSV-format. --colsep sets the field delimiter. It
521 works very much like --colsep except it deals correctly with
522 quoting:
523
524 echo '"1 big, 2 small","2""x4"" plank",12.34' |
525 parallel --csv echo {1} of {2} at {3}
526
527 Even quoted newlines are parsed correctly:
528
529 (echo '"Start of field 1 with newline'
530 echo 'Line 2 in field 1";value 2') |
531 parallel --csv --colsep ';' echo Field 1: {1} Field 2: {2}
532
533 When used with --pipe only pass full CSV-records.
534
535 --delimiter delim
536 -d delim
537 Input items are terminated by delim. Quotes and backslash are not
538 special; every character in the input is taken literally. Disables
539 the end-of-file string, which is treated like any other argument.
540 The specified delimiter may be characters, C-style character
541 escapes such as \n, or octal or hexadecimal escape codes. Octal
542 and hexadecimal escape codes are understood as for the printf
543 command. Multibyte characters are not supported.
544
545 --dirnamereplace replace-str
546 --dnr replace-str
547 Use the replacement string replace-str instead of {//} for dirname
548 of input line.
549
550 -E eof-str
551 Set the end of file string to eof-str. If the end of file string
552 occurs as a line of input, the rest of the input is not read. If
553 neither -E nor -e is used, no end of file string is used.
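The effect of an end of file string can be pictured with a plain shell loop. This is a sketch of the behaviour described above, not of how GNU parallel reads its input; the function name read_until_eof_str and the marker STOP are made up for the example.

```shell
# Pass input lines through until a line equal to the end of file string
# is seen; that line and everything after it are dropped.
# read_until_eof_str is a hypothetical helper.
read_until_eof_str() {
  eof_str=$1
  while IFS= read -r line; do
    if [ "$line" = "$eof_str" ]; then
      break                      # the rest of the input is not passed on
    fi
    printf '%s\n' "$line"
  done
}

# Only "one" and "two" come before the marker.
printf '%s\n' one two STOP three | read_until_eof_str STOP
```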
554
555 --delay mytime
556 Delay starting next job by mytime. GNU parallel will pause mytime
557 after starting each job. mytime is normally in seconds, but can be
558 floats postfixed with s, m, h, or d which would multiply the float
559 by 1, 60, 3600, or 86400. Thus these are equivalent: --delay 100000
560 and --delay 1d3.5h16.6m4s.
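The arithmetic behind that equivalence is 1*86400 + 3.5*3600 + 16.6*60 + 4 = 100000 seconds. The awk-based function below is a sketch of such a conversion under the suffix rules stated above; to_seconds is a hypothetical name, and GNU parallel's own parsing may accept or reject inputs differently.

```shell
# Convert a time specification such as 1d3.5h16.6m4s to seconds.
# to_seconds is a hypothetical helper, not part of GNU parallel.
to_seconds() {
  printf '%s\n' "$1" | awk '{
    rest = $0
    total = 0
    # Consume one "<float><optional unit>" token at a time from the left.
    while (match(rest, /^[0-9.]+[smhd]?/)) {
      tok = substr(rest, 1, RLENGTH)
      unit = substr(tok, RLENGTH, 1)
      mult = 1                              # bare number or "s": seconds
      if (unit == "m") mult = 60
      else if (unit == "h") mult = 3600
      else if (unit == "d") mult = 86400
      total += (tok + 0) * mult             # tok + 0 keeps the numeric prefix
      rest = substr(rest, RLENGTH + 1)
    }
    print total
  }'
}

to_seconds 1d3.5h16.6m4s   # the specification from the text above
```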
561
562 --dry-run
563 Print the job to run on stdout (standard output), but do not run
564 the job. Use -v -v to include the wrapping that GNU parallel
565 generates (for remote jobs, --tmux, --nice, --pipe, --pipepart,
566 --fifo and --cat). Do not count on this literally, though, as the
567 job may be scheduled on another computer or the local computer if :
568 is in the list.
569
570 --eof[=eof-str]
571 -e[eof-str]
572 This option is a synonym for the -E option. Use -E instead,
573 because it is POSIX compliant for xargs while this option is not.
574 If eof-str is omitted, there is no end of file string. If neither
575 -E nor -e is used, no end of file string is used.
576
577 --embed
578 Embed GNU parallel in a shell script. If you need to distribute
579 your script to someone who does not want to install GNU parallel
580 you can embed GNU parallel in your own shell script:
581
582 parallel --embed > new_script
583
584 After which you add your code at the end of new_script. This is
585 tested on ash, bash, dash, ksh, sh, and zsh.
586
587 --env var
588 Copy environment variable var. This will copy var to the
589 environment that the command is run in. This is especially useful
590 for remote execution.
591
592 In Bash var can also be a Bash function - just remember to export
593 -f the function, see command.
594
595 The variable '_' is special. It will copy all exported environment
596 variables except for the ones mentioned in
597 ~/.parallel/ignored_vars.
598
599 To copy the full environment (both exported and not exported
600 variables, arrays, and functions) use env_parallel.
601
602 See also: --record-env, --session.
603
604 --eta
605 Show the estimated number of seconds before finishing. This forces
606 GNU parallel to read all jobs before starting to find the number of
607 jobs. GNU parallel normally only reads the next job to run.
608
609 The estimate is based on the runtime of finished jobs, so the first
610 estimate will only be shown when the first job has finished.
611
612 Implies --progress.
613
614 See also: --bar, --progress.
615
616 --fg
617 Run command in foreground.
618
619 With --tmux and --tmuxpane GNU parallel will start tmux in the
620 foreground.
621
622 With --semaphore GNU parallel will run the command in the
623 foreground (opposite --bg), and wait for completion of the command
624 before exiting.
625
626 See also --bg, man sem.
627
628 --fifo
629 Create a temporary fifo with content. Normally --pipe and
630 --pipepart will give data to the program on stdin (standard input).
631 With --fifo GNU parallel will create a temporary fifo with the name
632 in {}, so you can do: parallel --pipe --fifo wc {}.
633
634 Beware: If data is not read from the fifo, the job will block
635 forever.
636
637 Implies --pipe unless --pipepart is used.
638
639 See also --cat.
640
641 --filter-hosts
642 Remove down hosts. For each remote host: check that login through
643 ssh works. If not: do not use this host.
644
645 For performance reasons, this check is performed only at the start
646 and every time --sshloginfile is changed. If a host goes down
647 after the first check, it will go undetected until --sshloginfile
648 is changed; --retries can be used to mitigate this.
649
650 Currently you cannot put --filter-hosts in a profile, $PARALLEL,
651 /etc/parallel/config or similar. This is because GNU parallel uses
652 GNU parallel to compute this, so you will get an infinite loop.
653 This will likely be fixed in a later release.
654
655 --gnu
656 Behave like GNU parallel. This option historically took precedence
657 over --tollef. The --tollef option is now retired, and therefore
658 may not be used. --gnu is kept for compatibility.
659
660 --group
661 Group output. Output from each job is grouped together and is only
662 printed when the command is finished. Stdout (standard output)
663 first followed by stderr (standard error).
664
665 This takes in the order of 0.5ms per job and depends on the speed
666 of your disk for larger output. It can be disabled with -u, but
667 this means output from different commands can get mixed.
668
669 --group is the default. Can be reversed with -u.
670
671 See also: --line-buffer --ungroup
672
673 --group-by val
674 Group input by value. Combined with --pipe/--pipepart --group-by
675 groups lines with the same value into a record.
676
677 The value can be computed from the full line or from a single
678 column.
679
680 val can be:
681
682 column number Use the value in the column numbered.
683
684 column name Treat the first line as a header and use the value
685 in the column named.
686
687 (Not supported with --pipepart).
688
689 perl expression
690 Run the perl expression and use $_ as the value.
691
692 column number perl expression
693 Put the value of the column in $_, run the perl
694 expression, and use $_ as the value.
695
696 column name perl expression
697 Put the value of the column in $_, run the perl
698 expression, and use $_ as the value.
699
700 (Not supported with --pipepart).
701
702 Example:
703
704 UserID, Consumption
705 123, 1
706 123, 2
707 12-3, 1
708 221, 3
709 221, 1
710 2/21, 5
711
712 If you want to group 123, 12-3, 221, and 2/21 into 4 records and
713 pass one record at a time to wc:
714
715 tail -n +2 table.csv | \
716 parallel --pipe --colsep , --group-by 1 -kN1 wc
717
718 Make GNU parallel treat the first line as a header:
719
720 cat table.csv | \
721 parallel --pipe --colsep , --header : --group-by 1 -kN1 wc
722
723 Address column by column name:
724
725 cat table.csv | \
726 parallel --pipe --colsep , --header : --group-by UserID -kN1 wc
727
728 If 12-3 and 123 are really the same UserID, remove non-digits in
729 UserID when grouping:
730
731 cat table.csv | parallel --pipe --colsep , --header : \
732 --group-by 'UserID s/\D//g' -kN1 wc
733
734 See also --shard, --roundrobin.
735
736 --help
737 -h Print a summary of the options to GNU parallel and exit.
738
739 --halt-on-error val
740 --halt val
741 When should GNU parallel terminate? In some situations it makes no
742 sense to run all jobs. GNU parallel should simply give up as soon
743 as a condition is met.
744
745 val defaults to never, which runs all jobs no matter what.
746
747 val can also take on the form of when,why.
748
749 when can be 'now' which means kill all running jobs and halt
750 immediately, or it can be 'soon' which means wait for all running
751 jobs to complete, but start no new jobs.
752
753 why can be 'fail=X', 'fail=Y%', 'success=X', 'success=Y%',
754 'done=X', or 'done=Y%' where X is the number of jobs that have to
755 fail, succeed, or be done before halting, and Y is the percentage
756 of jobs that have to fail, succeed, or be done before halting.
757
758 Example:
759
760 --halt now,fail=1 exit when the first job fails. Kill running
761 jobs.
762
763 --halt soon,fail=3 exit when 3 jobs fail, but wait for running
764 jobs to complete.
765
766 --halt soon,fail=3% exit when 3% of the jobs have failed, but
767 wait for running jobs to complete.
768
769 --halt now,success=1 exit when a job succeeds. Kill running jobs.
770
771 --halt soon,success=3 exit when 3 jobs succeed, but wait for
772 running jobs to complete.
773
774 --halt now,success=3% exit when 3% of the jobs have succeeded.
775 Kill running jobs.
776
777 --halt now,done=1 exit when one of the jobs finishes. Kill
778 running jobs.
779
780 --halt soon,done=3 exit when 3 jobs finish, but wait for
781 running jobs to complete.
782
783 --halt now,done=3% exit when 3% of the jobs have finished. Kill
784 running jobs.
785
786 For backwards compatibility these also work:
787
788 0 never
789
790 1 soon,fail=1
791
792 2 now,fail=1
793
794 -1 soon,success=1
795
796 -2 now,success=1
797
798 1-99% soon,fail=1-99%
799
800 --header regexp
801 Use regexp as header. For normal usage the matched header
802 (typically the first line: --header '.*\n') will be split using
803 --colsep (which will default to '\t') and column names can be used
804 as replacement variables: {column name}, {column name/}, {column
805 name//}, {column name/.}, {column name.}, {=column name perl
806 expression =}, ..
807
808 For --pipe the matched header will be prepended to each output.
809
810 --header : is an alias for --header '.*\n'.
811
812 If regexp is a number, it is a fixed number of lines.
813
814 --hostgroups
815 --hgrp
816 Enable hostgroups on arguments. If an argument contains '@' the
817 string after '@' will be removed and treated as a list of
818 hostgroups on which this job is allowed to run. If there is no
819 --sshlogin with a corresponding group, the job will run on any
820 hostgroup.
821
822 Example:
823
824 parallel --hostgroups \
825 --sshlogin @grp1/myserver1 -S @grp1+grp2/myserver2 \
826 --sshlogin @grp3/myserver3 \
827 echo ::: my_grp1_arg@grp1 arg_for_grp2@grp2 third@grp1+grp3
828
829 my_grp1_arg may be run on either myserver1 or myserver2, third may
830 be run on either myserver1 or myserver3, but arg_for_grp2 will only
831 be run on myserver2.
832
833 See also: --sshlogin.
834
835 -I replace-str
836 Use the replacement string replace-str instead of {}.
837
838 --replace[=replace-str]
839 -i[replace-str]
840 This option is a synonym for -Ireplace-str if replace-str is
841 specified, and for -I {} otherwise. This option is deprecated; use
842 -I instead.
843
844 --joblog logfile
845 Logfile for executed jobs. Save a list of the executed jobs to
846 logfile in the following TAB separated format: sequence number,
847 sshlogin, start time as seconds since epoch, run time in seconds,
848 bytes in files transferred, bytes in files returned, exit status,
849 signal, and command run.
850
851 For --pipe bytes transferred and bytes returned are the number of
852 input and output bytes.
853
854 If logfile is prepended with '+' log lines will be appended to the
855 logfile.
856
857 To convert the times into ISO-8601 strict do:
858
859 cat logfile | perl -a -F"\t" -ne \
860 'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'
861
862 If the host is long, you can use column -t to pretty print it:
863
864 cat joblog | column -t
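Because the log is TAB separated, awk can pick out individual columns. The sketch below lists jobs whose exit status (column 7) is non-zero. It assumes the first line of the log is a header line and that the command contains no TAB characters; failed_jobs is a hypothetical helper, and the sample log lines are made up for the example.

```shell
# Print sequence number, exit status and command for every job in a
# joblog that exited non-zero. failed_jobs is a hypothetical helper.
failed_jobs() {
  # NR > 1 skips the header line; $7 is the exit status, $9 the command.
  awk -F '\t' 'NR > 1 && $7 != 0 { print $1 "\t" $7 "\t" $9 }' "$@"
}

# Made-up sample log: job 1 succeeded, job 2 exited with status 1.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Seq Host Starttime JobRuntime Send Receive Exitval Signal Command \
  1 : 1700000000.000 0.010 0 0 0 0 true \
  2 : 1700000000.100 0.020 0 0 1 0 false |
  failed_jobs
```

With a real log the function can be given the file name instead, for example failed_jobs logfile.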
865
866 See also --resume --resume-failed.
867
868 --jobs N
869 -j N
870 --max-procs N
871 -P N
872 Number of jobslots on each machine. Run up to N jobs in parallel.
873 0 means as many as possible. Default is 100% which will run one job
874 per CPU on each machine.
875
876 If --semaphore is set, the default is 1 thus making a mutex.
877
878 --jobs +N
879 -j +N
880 --max-procs +N
881 -P +N
882 Add N to the number of CPUs. Run this many jobs in parallel. See
883 also --use-cores-instead-of-threads and
884 --use-sockets-instead-of-threads.
885
886 --jobs -N
887 -j -N
888 --max-procs -N
889 -P -N
890 Subtract N from the number of CPUs. Run this many jobs in
891 parallel. If the evaluated number is less than 1 then 1 will be
892 used. See also --use-cores-instead-of-threads and
893 --use-sockets-instead-of-threads.
894
895 --jobs N%
896 -j N%
897 --max-procs N%
898 -P N%
899 Multiply N% with the number of CPUs. Run this many jobs in
900 parallel. See also --use-cores-instead-of-threads and
901 --use-sockets-instead-of-threads.
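
The arithmetic behind the N, +N, -N and N% forms can be sketched as a
small shell function. This is an illustration only, not code from GNU
parallel: the name jobslots is made up, N is assumed to be an integer,
and rounding N% up to a whole job is an assumption.

```shell
# jobslots CPUS SPEC
# Hypothetical helper: evaluate a --jobs value (N, +N, -N or N%) for a
# machine with CPUS CPUs.
# Assumptions: N is an integer, and N% is rounded up to a whole job.
jobslots() {
    cpus=$1
    spec=$2
    case $spec in
        0)  echo "as many as possible"; return ;;
        +*) n=$((cpus + ${spec#?})) ;;
        -*) n=$((cpus - ${spec#?})) ;;
        *%) pct=${spec%?}; n=$(( (cpus * pct + 99) / 100 )) ;;
        *)  n=$spec ;;
    esac
    # A result below 1 is raised to 1.
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}
```

With 8 CPUs, jobslots 8 +2 prints 10, jobslots 8 50% prints 4, and
jobslots 8 -10 prints 1.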
902
903 --jobs procfile
904 -j procfile
905 --max-procs procfile
906 -P procfile
907 Read parameter from file. Use the content of procfile as parameter
908 for -j. E.g. procfile could contain the string 100% or +2 or 10. If
909 procfile is changed when a job completes, procfile is read again
910 and the new number of jobs is computed. If the number is lower than
911 before, running jobs will be allowed to finish but new jobs will
912 not be started until the wanted number of jobs has been reached.
913 This makes it possible to change the number of simultaneous running
914 jobs while GNU parallel is running.
915
916 --keep-order
917 -k Keep sequence of output same as the order of input. Normally the
918 output of a job will be printed as soon as the job completes. Try
919 this to see the difference:
920
921 parallel -j4 sleep {}\; echo {} ::: 2 1 4 3
922 parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3
923
924 If used with --onall or --nonall the output will be grouped by
925 sshlogin in sorted order.
926
927 If used with --pipe --roundrobin and the same input, the jobslots
928 will get the same blocks in the same order in every run.
929
930 -k only affects the order in which the output is printed - not the
931 order in which jobs are run.
932
933 -L recsize
934 When used with --pipe: Read records of recsize.
935
936 When used otherwise: Use at most recsize nonblank input lines per
937 command line. Trailing blanks cause an input line to be logically
938 continued on the next input line.
939
940 -L 0 means read one line, but insert 0 arguments on the command
941 line.
942
943 Implies -X unless -m, --xargs, or --pipe is set.
944
945 --max-lines[=recsize]
946 -l[recsize]
947 When used with --pipe: Read records of recsize lines.
948
949 When used otherwise: Synonym for the -L option. Unlike -L, the
950 recsize argument is optional. If recsize is not specified, it
951 defaults to one. The -l option is deprecated since the POSIX
952 standard specifies -L instead.
953
954 -l 0 is an alias for -l 1.
955
956 Implies -X unless -m, --xargs, or --pipe is set.
957
958 --limit "command args"
959 Dynamic job limit. Before starting a new job run command with args.
960 The exit value of command determines what GNU parallel will do:
961
962 0 Below limit. Start another job.
963
964 1 Over limit. Start no jobs.
965
966 2 Way over limit. Kill the youngest job.
967
968 You can use any shell command. There are 3 predefined commands:
969
970 "io n" Limit for I/O. The amount of disk I/O will be computed as
971 a value 0-100, where 0 is no I/O and 100 means at least one
972 disk is 100% saturated.
973
974 "load n" Similar to --load.
975
976 "mem n" Similar to --memfree.
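
A command of your own only has to report its verdict through the exit
value. The following is a minimal sketch of that mapping; the function
name limit_status is made up, and treating more than twice the limit as
"way over limit" is an assumption chosen for this example.

```shell
# limit_status CURRENT LIMIT
# Hypothetical helper for a --limit command: compare a measured integer
# value with a limit and return the exit values described above:
#   0  below limit
#   1  over limit
#   2  way over limit (assumed here: more than twice the limit)
limit_status() {
    current=$1
    limit=$2
    if [ "$current" -gt $((limit * 2)) ]; then
        return 2
    elif [ "$current" -gt "$limit" ]; then
        return 1
    fi
    return 0
}
```

A script built around this function would measure some value (for
example the number of files in a spool directory), call limit_status
with it, and exit with the resulting status. The script name and its
arguments would then be given as the "command args" of --limit.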
977
978 --line-buffer (beta testing)
979 --lb (beta testing)
980 Buffer output on line basis. --group will keep the output together
981 for a whole job. --ungroup allows output to mix up, with half a line
982 coming from one job and half a line coming from another job.
983 --line-buffer fits between these two: GNU parallel will print a
984 full line, but will allow for mixing lines of different jobs.
985
986 --line-buffer takes more CPU power than both --group and --ungroup,
987 but can be much faster than --group if the CPU is not the limiting
988 factor.
989
990 Normally --line-buffer does not buffer on disk, and can thus
991 process an infinite amount of data, but it will buffer on disk when
992 combined with: --keep-order, --results, --compress, and --files.
993 This will make it as slow as --group and will limit output to the
994 available disk space.
995
996 With --keep-order --line-buffer will output lines from the first
997 job continuously while it is running, then lines from the second
998 job while that is running. It will buffer full lines, but jobs will
999 not mix. Compare:
1000
1001 parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
1002 parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1003 parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
1004
1005 See also: --group --ungroup
1006
1007 --xapply
1008 --link
1009 Link input sources. Read multiple input sources like xapply. If
1010 multiple input sources are given, one argument will be read from
1011 each of the input sources. The arguments can be accessed in the
1012 command as {1} .. {n}, so {1} will be a line from the first input
1013 source, and {6} will refer to the line with the same line number
1014 from the 6th input source.
1015
1016 Compare these two:
1017
1018 parallel echo {1} {2} ::: 1 2 3 ::: a b c
1019 parallel --link echo {1} {2} ::: 1 2 3 ::: a b c
1020
1021 Arguments will be recycled if one input source has more arguments
1022 than the others:
1023
1024 parallel --link echo {1} {2} {3} \
1025 ::: 1 2 ::: I II III ::: a b c d e f g
1026
1027 See also --header, :::+, ::::+.
1028
1029 --load max-load
1030 Do not start new jobs on a given computer unless the number of
1031 running processes on the computer is less than max-load. max-load
1032 uses the same syntax as --jobs, so 100% for one per CPU is a valid
1033 setting. The only difference is 0, which is interpreted as 0.01.
1034
1035 --controlmaster
1036 -M Use ssh's ControlMaster to make ssh connections faster. Useful if
1037 jobs run remotely and are very fast to run. This is disabled for
1038 sshlogins that specify their own ssh command.
1039
1040 --xargs
1041 Multiple arguments. Insert as many arguments as the command line
1042 length permits.
1043
1044 If {} is not used the arguments will be appended to the line. If
1045 {} is used multiple times each {} will be replaced with all the
1046 arguments.
1047
1048 Support for --xargs with --sshlogin is limited and may fail.
1049
1050 See also -X for context replace. If in doubt use -X as that will
1051 most likely do what is needed.
1052
1053 -m Multiple arguments. Insert as many arguments as the command line
1054 length permits. If multiple jobs are being run in parallel:
1055 distribute the arguments evenly among the jobs. Use -j1 or --xargs
1056 to avoid this.
1057
1058 If {} is not used the arguments will be appended to the line. If
1059 {} is used multiple times each {} will be replaced with all the
1060 arguments.
1061
1062 Support for -m with --sshlogin is limited and may fail.
1063
1064 See also -X for context replace. If in doubt use -X as that will
1065 most likely do what is needed.
1066
1067 --memfree size
1068 Minimum memory free when starting another job. The size can be
1069 postfixed with K, M, G, T, P, k, m, g, t, or p which would multiply
1070 the size with 1024, 1048576, 1073741824, 1099511627776,
1071 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or
1072 1000000000000000, respectively.
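
The suffix table can be written out as a shell function. This is only
an illustration of the multipliers listed above, not code from GNU
parallel: the name to_bytes is made up, and the sketch assumes an
integer size and a shell with 64-bit arithmetic.

```shell
# to_bytes SIZE
# Hypothetical helper: expand a size such as 2G or 500k to bytes using
# the multipliers listed for --memfree. Integer sizes only.
to_bytes() {
    num=${1%[KMGTPkmgtp]}      # the size without its suffix
    suffix=${1#"$num"}         # the suffix, or empty if none was given
    case $suffix in
        K) mult=1024 ;;
        M) mult=1048576 ;;
        G) mult=1073741824 ;;
        T) mult=1099511627776 ;;
        P) mult=1125899906842624 ;;
        k) mult=1000 ;;
        m) mult=1000000 ;;
        g) mult=1000000000 ;;
        t) mult=1000000000000 ;;
        p) mult=1000000000000000 ;;
        *) mult=1 ;;
    esac
    echo $((num * mult))
}
```

So to_bytes 1K prints 1024, while to_bytes 1k prints 1000.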
1073
1074 If the jobs take up very different amounts of RAM, GNU parallel will
1075 only start as many as there is memory for. If less than size bytes
1076 are free, no more jobs will be started. If less than 50% size bytes
1077 are free, the youngest job will be killed, and put back on the
1078 queue to be run later.
1079
1080 --retries must be set to determine how many times GNU parallel
1081 should retry a given job.
1082
1083 --minversion version
1084 Print the version of GNU parallel and exit. If the current version of
1085 GNU parallel is less than version the exit code is 255. Otherwise
1086 it is 0.
1087
1088 This is useful for scripts that depend on features only available
1089 from a certain version of GNU parallel.
1090
1091 --nonall
1092 --onall with no arguments. Run the command on all computers given
1093 with --sshlogin but take no arguments. GNU parallel will log into
1094 --jobs number of computers in parallel and run the job on the
1095 computer. -j adjusts how many computers to log into in parallel.
1096
1097 This is useful for running the same command (e.g. uptime) on a list
1098 of servers.
1099
1100 --onall
1101 Run all the jobs on all computers given with --sshlogin. GNU
1102 parallel will log into --jobs number of computers in parallel and
1103 run one job at a time on the computer. The order of the jobs will
1104 not be changed, but some computers may finish before others.
1105
1106 When using --group the output will be grouped by each server, so
1107 all the output from one server will be grouped together.
1108
1109 --joblog will contain an entry for each job on each server, so
1110 there will be several job sequence 1.
1111
1112 --output-as-files
1113 --outputasfiles
1114 --files
1115 Instead of printing the output to stdout (standard output) the
1116 output of each job is saved in a file and the filename is then
1117 printed.
1118
1119 See also: --results
1120
1121 --pipe
1122 --spreadstdin
1123 Spread input to jobs on stdin (standard input). Read a block of
1124 data from stdin (standard input) and give one block of data as
1125 input to one job.
1126
1127 The block size is determined by --block. The strings --recstart and
1128 --recend tell GNU parallel how a record starts and/or ends. The
1129 block read will have the final partial record removed before the
1130 block is passed on to the job. The partial record will be prepended
1131 to next block.
1132
1133 If --recstart is given this will be used to split at record start.
1134
1135 If --recend is given this will be used to split at record end.
1136
1137 If both --recstart and --recend are given both will have to match
1138 to find a split position.
1139
1140 If neither --recstart nor --recend are given --recend defaults to
1141 '\n'. To have no record separator use --recend "".
1142
1143 --files is often used with --pipe.
1144
1145 --pipe maxes out at around 1 GB/s input, and 100 MB/s output. If
1146 performance is important use --pipepart.
1147
1148 See also: --recstart, --recend, --fifo, --cat, --pipepart, --files.
1149
1150 --pipepart
1151 Pipe parts of a physical file. --pipepart works similarly to --pipe,
1152 but is much faster.
1153
1154 --pipepart has a few limitations:
1155
1156 · The file must be a normal file or a block device (technically it
1157 must be seekable) and must be given using -a or ::::. The file
1158 cannot be a pipe or a fifo as they are not seekable.
1159
1160 If using a block device with a lot of NUL bytes, remember to set
1161 --recend ''.
1162
1163 · Record counting (-N) and line counting (-L/-l) do not work.
1164
1165 --plain
1166 Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full
1167 control on the command line (used by GNU parallel internally when
1168 called with --sshlogin).
1169
1170 --plus
1171 Activate additional replacement strings: {+/} {+.} {+..} {+...}
1172 {..} {...} {/..} {/...} {##}. The idea being that '{+foo}' matches
1173 the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} =
1174 {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
1175 {+/}/{/...}.{+...}
1176
1177 {##} is the number of jobs to be run. It is incompatible with
1178 -X/-m/--xargs.
1179
1180 {choose_k} is inspired by n choose k: Given a list of n elements,
1181 choose k. k is the number of input sources and n is the number of
1182 arguments in an input source. The content of the input sources
1183 must be the same and the arguments must be unique.
1184
1185 The following dynamic replacement strings are also activated. They
1186 are inspired by bash's parameter expansion:
1187
1188 {:-str} str if the value is empty
1189 {:num} remove the first num characters
1190 {:num1:num2} num2 characters starting at offset num1
1191 {#str} remove prefix str
1192 {%str} remove postfix str
1193 {/str1/str2} replace str1 with str2
1194 {^str} uppercase str if found at the start
1195 {^^str} uppercase str
1196 {,str} lowercase str if found at the start
1197 {,,str} lowercase str
1198
1199 --progress
1200 Show progress of computations. List the computers involved in the
1201 task with number of CPUs detected and the max number of jobs to
1202 run. After that show progress for each computer: number of running
1203 jobs, number of completed jobs, and percentage of all jobs done by
1204 this computer. The percentage will only be available after all jobs
1205 have been scheduled as GNU parallel only reads the next job when
1206 ready to schedule it - this is to avoid wasting time and memory by
1207 reading everything at startup.
1208
1209 By sending GNU parallel SIGUSR2 you can toggle turning on/off
1210 --progress on a running GNU parallel process.
1211
1212 See also --eta and --bar.
1213
1214 --max-args=max-args
1215 -n max-args
1216 Use at most max-args arguments per command line. Fewer than max-
1217 args arguments will be used if the size (see the -s option) is
1218 exceeded, unless the -x option is given, in which case GNU parallel
1219 will exit.
1220
1221 -n 0 means read one argument, but insert 0 arguments on the command
1222 line.
1223
1224 Implies -X unless -m is set.
1225
1226 --max-replace-args=max-args
1227 -N max-args
1228 Use at most max-args arguments per command line. Like -n but also
1229 makes replacement strings {1} .. {max-args} that represent
1230 argument 1 .. max-args. If too few args the {n} will be empty.
1231
1232 -N 0 means read one argument, but insert 0 arguments on the command
1233 line.
1234
1235 This will set the owner of the homedir to the user:
1236
1237 tr ':' '\n' < /etc/passwd | parallel -N7 chown {1} {6}
1238
1239 Implies -X unless -m or --pipe is set.
1240
1241 When used with --pipe -N is the number of records to read. This is
1242 somewhat slower than --block.
1243
1244 --max-line-length-allowed
1245 Print the maximal number of characters allowed on the command line
1246 and exit (used by GNU parallel itself to determine the line length
1247 on remote computers).
1248
1249 --number-of-cpus (obsolete)
1250 Print the number of physical CPU cores and exit.
1251
1252 --number-of-cores (beta testing)
1253 Print the number of physical CPU cores and exit (used by GNU
1254 parallel itself to determine the number of physical CPU cores on
1255 remote computers).
1256
1257 --number-of-sockets (beta testing)
1258 Print the number of filled CPU sockets and exit (used by GNU
1259 parallel itself to determine the number of filled CPU sockets on
1260 remote computers).
1261
1262 --number-of-threads (beta testing)
1263 Print the number of hyperthreaded CPU cores and exit (used by GNU
1264 parallel itself to determine the number of hyperthreaded CPU cores
1265 on remote computers).
1266
1267 --no-keep-order
1268 Overrides an earlier --keep-order (e.g. if set in
1269 ~/.parallel/config).
1270
1271 --nice niceness (alpha testing)
1272 Run the command at this niceness.
1273
1274 By default GNU parallel will run jobs at the same nice level as GNU
1275 parallel is started - both on the local machine and remote servers,
1276 so you are unlikely to ever use this option.
1277
1278 Setting --nice will override this nice level. If the nice level is
1279 smaller than the current nice level, it will only affect remote
1280 jobs (e.g. current level is 10 and --nice 5 will cause local jobs
1281 to be run at level 10, but remote jobs run at nice level 5).
1282
1283 --interactive
1284 -p Prompt the user about whether to run each command line and read a
1285 line from the terminal. Only run the command line if the response
1286 starts with 'y' or 'Y'. Implies -t.
1287
1288 --parens parensstring
1289 Define start and end parentheses for {= perl expression =}. The
1290 left and the right parenthesis can be multiple characters and are
1291 assumed to be the same length. The default is {==} giving {= as the
1292 start parenthesis and =} as the end parenthesis.
1293
1294 Another useful setting is ,,,, which would make both parentheses
1295 ,,:
1296
1297 parallel --parens ,,,, echo foo is ,,s/I/O/g,, ::: FII
1298
1299 See also: --rpl {= perl expression =}
1300
1301 --profile profilename (beta testing)
1302 -J profilename (beta testing)
1303 Use profile profilename for options. This is useful if you want to
1304 have multiple profiles. You could have one profile for running jobs
1305 in parallel on the local computer and a different profile for
1306 running jobs on remote computers. See the section PROFILE FILES for
1307 examples.
1308
1309 profilename corresponds to the file ~/.parallel/profilename.
1310
1311 You can give multiple profiles by repeating --profile. If parts of
1312 the profiles conflict, the later ones will be used.
1313
1314 Default: config
1315
1316 --quote
1317 -q Quote command. The command must be a simple command (see man bash)
1318 without redirections and without variable assignments. This will
1319 quote the command line and arguments so special characters are not
1320 interpreted by the shell. See the section QUOTING. Most people will
1321 never need this. Quoting is disabled by default.
1322
1323 --no-run-if-empty
1324 -r If the stdin (standard input) only contains whitespace, do not run
1325 the command.
1326
1327 If used with --pipe this is slow.
1328
1329 --noswap
1330 Do not start new jobs on a given computer if there is both swap-in
1331 and swap-out activity.
1332
1333 The swap activity is only sampled every 10 seconds as the sampling
1334 takes 1 second to do.
1335
1336 Swap activity is computed as (swap-in)*(swap-out) which in practice
1337 is a good value: swapping out is not a problem, swapping in is not
1338 a problem, but both swapping in and out usually indicates a
1339 problem.
1340
1341 --memfree may give better results, so try using that first.
1342
1343 --record-env
1344 Record current environment variables in ~/.parallel/ignored_vars.
1345 This is useful before using --env _.
1346
1347 See also --env, --session.
1348
1349 --recstart startstring
1350 --recend endstring
1351 If --recstart is given startstring will be used to split at record
1352 start.
1353
1354 If --recend is given endstring will be used to split at record end.
1355
1356 If both --recstart and --recend are given the combined string
1357 endstringstartstring will have to match to find a split position.
1358 This is useful if either startstring or endstring match in the
1359 middle of a record.
1360
1361 If neither --recstart nor --recend are given then --recend defaults
1362 to '\n'. To have no record separator use --recend "".
1363
1364 --recstart and --recend are used with --pipe.
1365
1366 Use --regexp to interpret --recstart and --recend as regular
1367 expressions. This is slow, however.
1368
1369 --regexp
1370 Use --regexp to interpret --recstart and --recend as regular
1371 expressions. This is slow, however.
1372
1373 --remove-rec-sep
1374 --removerecsep
1375 --rrs
1376 Remove the text matched by --recstart and --recend before piping it
1377 to the command.
1378
1379 Only used with --pipe.
1380
1381 --results name
1382 --res name
1383 Save the output into files.
1384
1385 Simple string output dir
1386
1387 If name does not contain replacement strings and does not end in
1388 .csv/.tsv, the output will be stored in a directory tree rooted at
1389 name. Within this directory tree, each command will result in
1390 three files: name/<ARGS>/stdout, name/<ARGS>/stderr, and
1391 name/<ARGS>/seq, where <ARGS> is a sequence of directories
1392 representing the header of the input source (if using --header :)
1393 or the number of the input source and corresponding values.
1394
1395 E.g:
1396
1397 parallel --header : --results foo echo {a} {b} \
1398 ::: a I II ::: b III IIII
1399
1400 will generate the files:
1401
1402 foo/a/II/b/III/seq
1403 foo/a/II/b/III/stderr
1404 foo/a/II/b/III/stdout
1405 foo/a/II/b/IIII/seq
1406 foo/a/II/b/IIII/stderr
1407 foo/a/II/b/IIII/stdout
1408 foo/a/I/b/III/seq
1409 foo/a/I/b/III/stderr
1410 foo/a/I/b/III/stdout
1411 foo/a/I/b/IIII/seq
1412 foo/a/I/b/IIII/stderr
1413 foo/a/I/b/IIII/stdout
1414
1415 and
1416
1417 parallel --results foo echo {1} {2} ::: I II ::: III IIII
1418
1419 will generate the files:
1420
1421 foo/1/II/2/III/seq
1422 foo/1/II/2/III/stderr
1423 foo/1/II/2/III/stdout
1424 foo/1/II/2/IIII/seq
1425 foo/1/II/2/IIII/stderr
1426 foo/1/II/2/IIII/stdout
1427 foo/1/I/2/III/seq
1428 foo/1/I/2/III/stderr
1429 foo/1/I/2/III/stdout
1430 foo/1/I/2/IIII/seq
1431 foo/1/I/2/IIII/stderr
1432 foo/1/I/2/IIII/stdout
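
A directory tree like the ones above can be walked with find. The
following is a minimal sketch that relies only on the layout described
here (one stdout file per job below name); the function name
show_results is made up for this example.

```shell
# show_results DIR
# Hypothetical helper: for every job stored below DIR, print the job's
# directory and the content of its stdout file, separated by a TAB.
show_results() {
    find "$1" -type f -name stdout | LC_ALL=C sort |
        while IFS= read -r f; do
            printf '%s\t%s\n' "${f%/stdout}" "$(cat "$f")"
        done
}
```

Called as show_results foo after one of the runs above, it prints one
line per job whose output is a single line, such as the echo jobs in
these examples.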
1433
1434 CSV file output
1435
1436 If name ends in .csv/.tsv the output will be a CSV-file named name.
1437
1438 .csv gives a comma separated value file. .tsv gives a TAB separated
1439 value file.
1440
1441 -.csv/-.tsv are special: It will give the file on stdout (standard
1442 output).
1443
1444 Replacement string output file
1445
1446 If name contains a replacement string and the replaced result does
1447 not end in /, then the standard output will be stored in a file
1448 named by this result. Standard error will be stored in the same
1449 file name with '.err' added, and the sequence number will be stored
1450 in the same file name with '.seq' added.
1451
1452 E.g.
1453
1454 parallel --results my_{} echo ::: foo bar baz
1455
1456 will generate the files:
1457
1458 my_bar
1459 my_bar.err
1460 my_bar.seq
1461 my_baz
1462 my_baz.err
1463 my_baz.seq
1464 my_foo
1465 my_foo.err
1466 my_foo.seq
1467
1468 Replacement string output dir
1469
1470 If name contains a replacement string and the replaced result ends
1471 in /, then output files will be stored in the resulting dir.
1472
1473 E.g.
1474
1475 parallel --results my_{}/ echo ::: foo bar baz
1476
1477 will generate the files:
1478
1479 my_bar/seq
1480 my_bar/stderr
1481 my_bar/stdout
1482 my_baz/seq
1483 my_baz/stderr
1484 my_baz/stdout
1485 my_foo/seq
1486 my_foo/stderr
1487 my_foo/stdout
1488
1489 See also --files, --tag, --header, --joblog.
1490
1491 --resume
1492 Resumes from the last unfinished job. By reading --joblog or the
1493 --results dir GNU parallel will figure out the last unfinished job
1494 and continue from there. As GNU parallel only looks at the sequence
1495 numbers in --joblog then the input, the command, and --joblog all
1496 have to remain unchanged; otherwise GNU parallel may run wrong
1497 commands.
1498
1499 See also --joblog, --results, --resume-failed, --retries.
1500
1501 --resume-failed
1502 Retry all failed and resume from the last unfinished job. By
1503 reading --joblog GNU parallel will figure out the failed jobs and
1504 run those again. After that it will resume from the last unfinished
1505 job and continue from there. As GNU parallel only looks at the
1506 sequence numbers in --joblog then the input, the command, and
1507 --joblog all have to remain unchanged; otherwise GNU parallel may
1508 run wrong commands.
1509
1510 See also --joblog, --resume, --retry-failed, --retries.
1511
1512 --retry-failed
1513 Retry all failed jobs in joblog. By reading --joblog GNU parallel
1514 will figure out the failed jobs and run those again.
1515
1516 --retry-failed ignores the command and arguments on the command
1517 line: It only looks at the joblog.
1518
1519 Differences between --resume, --resume-failed, --retry-failed
1520
1521 In this example exit {= $_%=2 =} will cause every other job to
1522 fail.
1523
1524 timeout -k 1 4 parallel --joblog log -j10 \
1525 'sleep {}; exit {= $_%=2 =}' ::: {10..1}
1526
1527 4 jobs completed. 2 failed:
1528
1529 Seq [...] Exitval Signal Command
1530 10 [...] 1 0 sleep 1; exit 1
1531 9 [...] 0 0 sleep 2; exit 0
1532 8 [...] 1 0 sleep 3; exit 1
1533 7 [...] 0 0 sleep 4; exit 0
1534
1535 --resume does not care about the Exitval, but only looks at Seq. If
1536 the Seq is run, it will not be run again. So if needed, you can
1537 change the command for the seqs not run yet:
1538
1539 parallel --resume --joblog log -j10 \
1540 'sleep .{}; exit {= $_%=2 =}' ::: {10..1}
1541
1542 Seq [...] Exitval Signal Command
1543 [... as above ...]
1544 1 [...] 0 0 sleep .10; exit 0
1545 6 [...] 1 0 sleep .5; exit 1
1546 5 [...] 0 0 sleep .6; exit 0
1547 4 [...] 1 0 sleep .7; exit 1
1548 3 [...] 0 0 sleep .8; exit 0
1549 2 [...] 1 0 sleep .9; exit 1
1550
1551 --resume-failed cares about the Exitval, but also only looks at Seq
1552 to figure out which commands to run. Again this means you can
1553 change the command, but not the arguments. It will run the failed
1554 seqs and the seqs not yet run:
1555
1556 parallel --resume-failed --joblog log -j10 \
1557 'echo {};sleep .{}; exit {= $_%=3 =}' ::: {10..1}
1558
1559 Seq [...] Exitval Signal Command
1560 [... as above ...]
1561 10 [...] 1 0 echo 1;sleep .1; exit 1
1562 8 [...] 0 0 echo 3;sleep .3; exit 0
1563 6 [...] 2 0 echo 5;sleep .5; exit 2
1564 4 [...] 1 0 echo 7;sleep .7; exit 1
1565 2 [...] 0 0 echo 9;sleep .9; exit 0
1566
1567 --retry-failed cares about the Exitval, but takes the command from
1568 the joblog. It ignores any arguments or commands given on the
1569 command line:
1570
1571 parallel --retry-failed --joblog log -j10 this part is ignored
1572
1573 Seq [...] Exitval Signal Command
1574 [... as above ...]
1575 10 [...] 1 0 echo 1;sleep .1; exit 1
1576 6 [...] 2 0 echo 5;sleep .5; exit 2
1577 4 [...] 1 0 echo 7;sleep .7; exit 1
1578
1579 See also --joblog, --resume, --resume-failed, --retries.
1580
1581 --retries n
1582 If a job fails, retry it on another computer on which it has not
1583 failed. Do this n times. If there are fewer than n computers in
1584 --sshlogin GNU parallel will re-use all the computers. This is
1585 useful if some jobs fail for no apparent reason (such as network
1586 failure).
1587
1588 --return filename
1589 Transfer files from remote computers. --return is used with
1590 --sshlogin when the arguments are files on the remote computers.
1591 When processing is done the file filename will be transferred from
1592 the remote computer using rsync and will be put relative to the
1593 default login dir. E.g.
1594
1595 echo foo/bar.txt | parallel --return {.}.out \
1596 --sshlogin server.example.com touch {.}.out
1597
1598 This will transfer the file $HOME/foo/bar.out from the computer
1599 server.example.com to the file foo/bar.out after running touch
1600 foo/bar.out on server.example.com.
1601
1602 parallel -S server --trc out/./{}.out touch {}.out ::: in/file
1603
1604 This will transfer the file in/file.out from the computer server
1605 to the file out/in/file.out after running touch in/file.out on
1606 server.
1607
1608 echo /tmp/foo/bar.txt | parallel --return {.}.out \
1609 --sshlogin server.example.com touch {.}.out
1610
1611 This will transfer the file /tmp/foo/bar.out from the computer
1612 server.example.com to the file /tmp/foo/bar.out after running touch
1613 /tmp/foo/bar.out on server.example.com.
1614
1615 Multiple files can be transferred by repeating the option multiple
1616 times:
1617
1618 echo /tmp/foo/bar.txt | parallel \
1619 --sshlogin server.example.com \
1620 --return {.}.out --return {.}.out2 touch {.}.out {.}.out2
1621
1622 --return is often used with --transferfile and --cleanup.
1623
1624 --return is ignored when used with --sshlogin : or when not used
1625 with --sshlogin.
1626
1627 --round-robin
1628 --round
1629 Normally --pipe will give a single block to each instance of the
1630 command. With --roundrobin all blocks will at random be written to
1631 commands already running. This is useful if the command takes a
1632 long time to initialize.
1633
1634 --keep-order will not work with --roundrobin as it is impossible to
1635 track which input block corresponds to which output.
1636
1637 --roundrobin implies --pipe, except if --pipepart is given.
1638
1639 See also --group-by, --shard.
1640
1641 --rpl 'tag perl expression'
1642 Use tag as a replacement string for perl expression. This makes it
1643 possible to define your own replacement strings. GNU parallel's 7
1644 replacement strings are implemented as:
1645
1646 --rpl '{} '
1647 --rpl '{#} 1 $_=$job->seq()'
1648 --rpl '{%} 1 $_=$job->slot()'
1649 --rpl '{/} s:.*/::'
1650 --rpl '{//} $Global::use{"File::Basename"} ||=
1651 eval "use File::Basename; 1;"; $_ = dirname($_);'
1652 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
1653 --rpl '{.} s:\.[^/.]+$::'
1654
1655 The --plus replacement strings are implemented as:
1656
1657 --rpl '{+/} s:/[^/]*$::'
1658 --rpl '{+.} s:.*\.::'
1659 --rpl '{+..} s:.*\.([^.]*\.):$1:'
1660 --rpl '{+...} s:.*\.([^.]*\.[^.]*\.):$1:'
1661 --rpl '{..} s:\.[^/.]+$::; s:\.[^/.]+$::'
1662 --rpl '{...} s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1663 --rpl '{/..} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::'
1664 --rpl '{/...} s:.*/::;s:\.[^/.]+$::;s:\.[^/.]+$::;s:\.[^/.]+$::'
1665 --rpl '{##} $_=total_jobs()'
1666 --rpl '{:-(.+?)} $_ ||= $$1'
1667 --rpl '{:(\d+?)} substr($_,0,$$1) = ""'
1668 --rpl '{:(\d+?):(\d+?)} $_ = substr($_,$$1,$$2);'
1669 --rpl '{#([^#].*?)} s/^$$1//;'
1670 --rpl '{%(.+?)} s/$$1$//;'
1671 --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;'
1672 --rpl '{^(.+?)} s/^($$1)/uc($1)/e;'
1673 --rpl '{^^(.+?)} s/($$1)/uc($1)/eg;'
1674 --rpl '{,(.+?)} s/^($$1)/lc($1)/e;'
1675 --rpl '{,,(.+?)} s/($$1)/lc($1)/eg;'
1676
1677 If the user defined replacement string starts with '{' it can also
1678 be used as a positional replacement string (like {2.}).
1679
1680 It is recommended to only change $_ but you have full access to all
1681 of GNU parallel's internal functions and data structures.
1682
1683 Here are a few examples:
1684
1685 Is the job sequence even or odd?
1686 --rpl '{odd} $_ = seq() % 2 ? "odd" : "even"'
1687 Pad job sequence with leading zeros to get equal width
1688 --rpl '{0#} $f=1+int("".(log(total_jobs())/log(10)));
1689 $_=sprintf("%0${f}d",seq())'
1690 Job sequence counting from 0
1691 --rpl '{#0} $_ = seq() - 1'
1692 Job slot counting from 2
1693 --rpl '{%1} $_ = slot() + 1'
1694 Remove all extensions
1695 --rpl '{:} s:(\.[^/]+)*$::'
1696
1697 You can have dynamic replacement strings by including parentheses
1698 in the replacement string and adding a regular expression between
1699 the parentheses. The matching string will be inserted as $$1:
1700
1701 parallel --rpl '{%(.*?)} s/$$1//' echo {%.tar.gz} ::: my.tar.gz
1702 parallel --rpl '{:%(.+?)} s:$$1(\.[^/]+)*$::' \
1703 echo {:%_file} ::: my_file.tar.gz
1704 parallel -n3 --rpl '{/:%(.*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:' \
1705 echo job {#}: {2} {2.} {3/:%_1} ::: a/b.c c/d.e f/g_1.h.i
1706
1707 You can even use multiple matches:
1708
1709 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' \
1710 echo {/replacethis/withthis} {/b/C} ::: a_replacethis_b
1711
1712 parallel --rpl '{(.*?)/(.*?)} $_="$$2$_$$1"' \
1713 echo {swap/these} ::: -middle-
1714
1715 See also: {= perl expression =} --parens
1716
1717 --rsync-opts options
1718 Options to pass on to rsync. Setting --rsync-opts takes precedence
1719 over setting the environment variable $PARALLEL_RSYNC_OPTS.
1720
1721 --max-chars=max-chars
1722 -s max-chars
1723 Use at most max-chars characters per command line, including the
1724 command and initial-arguments and the terminating nulls at the ends
1725 of the argument strings. The largest allowed value is system-
1726 dependent, and is calculated as the argument length limit for exec,
1727 less the size of your environment. The default value is the
1728 maximum.
1729
1730 Implies -X unless -m is set.
1731
1732 --show-limits
1733 Display the limits on the command-line length which are imposed by
1734 the operating system and the -s option. Pipe the input from
1735 /dev/null (and perhaps specify --no-run-if-empty) if you don't want
1736 GNU parallel to do anything.
1737
1738 --semaphore
1739 Work as a counting semaphore. --semaphore will cause GNU parallel
1740 to start command in the background. When the number of jobs given
1741 by --jobs is reached, GNU parallel will wait for one of these to
1742 complete before starting another command.
1743
1744 --semaphore implies --bg unless --fg is specified.
1745
1746 --semaphore implies --semaphorename `tty` unless --semaphorename is
1747 specified.
1748
1749 Used with --fg, --wait, and --semaphorename.
1750
1751 The command sem is an alias for parallel --semaphore.
1752
1753 See also man sem.
1754
1755 --semaphorename name
1756 --id name
1757 Use name as the name of the semaphore. Default is the name of the
1758 controlling tty (output from tty).
1759
The default normally works as expected when used interactively, but
when used in a script, name should be set. $$ or my_task_name is
often a good value.
1763
1764 The semaphore is stored in ~/.parallel/semaphores/
1765
1766 Implies --semaphore.
1767
1768 See also man sem.
1769
1770 --semaphoretimeout secs
1771 --st secs
1772 If secs > 0: If the semaphore is not released within secs seconds,
1773 take it anyway.
1774
1775 If secs < 0: If the semaphore is not released within secs seconds,
1776 exit.
1777
1778 Implies --semaphore.
1779
1780 See also man sem.
1781
1782 --seqreplace replace-str
1783 Use the replacement string replace-str instead of {#} for job
1784 sequence number.
1785
1786 --session
1787 Record names in current environment in $PARALLEL_IGNORED_NAMES and
1788 exit. Only used with env_parallel. Aliases, functions, and
1789 variables with names in $PARALLEL_IGNORED_NAMES will not be copied.
1790
1791 Only supported in Ash, Bash, Dash, Ksh, Sh, and Zsh.
1792
1793 See also --env, --record-env.
1794
1795 --shard shardexpr (alpha testing)
1796 Use shardexpr as shard key and shard input to the jobs.
1797
1798 shardexpr is [column number|column name] [perlexpression] e.g. 3,
1799 Address, 3 $_%=100, Address s/\d//g.
1800
Each input line is split using --colsep. The value of the column is
put into $_, the perl expression is executed, and the resulting value
is hashed so that all lines with a given value are given to the same
job slot.
1805
1806 This is similar to sharding in databases.
1807
1808 The performance is in the order of 100K rows per second. Faster if
1809 the shardcol is small (<10), slower if it is big (>100).
1810
1811 --shard requires --pipe and a fixed numeric value for --jobs.
1812
1813 See also --bin, --group-by, --roundrobin.
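The idea of hashing a shard key to a job slot can be sketched in plain shell. The helper shard_slot is hypothetical, and cksum is only a stand-in hash: the hash function GNU parallel uses is not stated here. The point is that equal keys always map to the same slot.

```shell
# shard_slot KEY SLOTS: map a shard key to a job slot number in 1..SLOTS.
# cksum is an illustrative stand-in, not GNU parallel's own hash.
shard_slot() {
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( crc % $2 + 1 ))
}

for key in Alice Bob Alice Carol Bob; do
  printf '%s -> slot %s\n' "$key" "$(shard_slot "$key" 4)"
done
```

Because the slot is computed from the key modulo the number of slots, the number of slots has to be fixed in advance, which matches the requirement of a fixed numeric value for --jobs.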
1814
1815 --shebang
1816 --hashbang
1817 GNU parallel can be called as a shebang (#!) command as the first
1818 line of a script. The content of the file will be treated as
1819 inputsource.
1820
1821 Like this:
1822
1823 #!/usr/bin/parallel --shebang -r wget
1824
1825 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1826 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1827 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1828
1829 --shebang must be set as the first option.
1830
1831 On FreeBSD env is needed:
1832
1833 #!/usr/bin/env -S parallel --shebang -r wget
1834
1835 https://ftpmirror.gnu.org/parallel/parallel-20120822.tar.bz2
1836 https://ftpmirror.gnu.org/parallel/parallel-20130822.tar.bz2
1837 https://ftpmirror.gnu.org/parallel/parallel-20140822.tar.bz2
1838
1839 There are many limitations of shebang (#!) depending on your
1840 operating system. See details on
1841 http://www.in-ulm.de/~mascheck/various/shebang/
1842
1843 --shebang-wrap
1844 GNU parallel can parallelize scripts by wrapping the shebang line.
1845 If the program can be run like this:
1846
1847 cat arguments | parallel the_program
1848
1849 then the script can be changed to:
1850
1851 #!/usr/bin/parallel --shebang-wrap /original/parser --options
1852
1853 E.g.
1854
1855 #!/usr/bin/parallel --shebang-wrap /usr/bin/python
1856
1857 If the program can be run like this:
1858
1859 cat data | parallel --pipe the_program
1860
1861 then the script can be changed to:
1862
1863 #!/usr/bin/parallel --shebang-wrap --pipe /orig/parser --opts
1864
1865 E.g.
1866
1867 #!/usr/bin/parallel --shebang-wrap --pipe /usr/bin/perl -w
1868
1869 --shebang-wrap must be set as the first option.
1870
1871 --shellquote
1872 Does not run the command but quotes it. Useful for making quoted
1873 composed commands for GNU parallel.
1874
Multiple --shellquote will quote the string multiple times, so
parallel --shellquote | parallel --shellquote can be written as
parallel --shellquote --shellquote.
1878
1879 --shuf
1880 Shuffle jobs. When having multiple input sources it is hard to
1881 randomize jobs. --shuf will generate all jobs, and shuffle them
1882 before running them. This is useful to get a quick preview of the
1883 results before running the full batch.
1884
1885 --skip-first-line
1886 Do not use the first line of input (used by GNU parallel itself
1887 when called with --shebang).
1888
1889 --sql DBURL (obsolete)
1890 Use --sqlmaster instead.
1891
1892 --sqlmaster DBURL
1893 Submit jobs via SQL server. DBURL must point to a table, which will
1894 contain the same information as --joblog, the values from the input
1895 sources (stored in columns V1 .. Vn), and the output (stored in
1896 columns Stdout and Stderr).
1897
1898 If DBURL is prepended with '+' GNU parallel assumes the table is
1899 already made with the correct columns and appends the jobs to it.
1900
1901 If DBURL is not prepended with '+' the table will be dropped and
1902 created with the correct amount of V-columns unless
1903
1904 --sqlmaster does not run any jobs, but it creates the values for
1905 the jobs to be run. One or more --sqlworker must be run to actually
1906 execute the jobs.
1907
1908 If --wait is set, GNU parallel will wait for the jobs to complete.
1909
1910 The format of a DBURL is:
1911
1912 [sql:]vendor://[[user][:pwd]@][host][:port]/[db]/table
1913
1914 E.g.
1915
1916 sql:mysql://hr:hr@localhost:3306/hrdb/jobs
1917 mysql://scott:tiger@my.example.com/pardb/paralleljobs
1918 sql:oracle://scott:tiger@ora.example.com/xe/parjob
1919 postgresql://scott:tiger@pg.example.com/pgdb/parjob
1920 pg:///parjob
1921 sqlite3:///pardb/parjob
1922
1923 It can also be an alias from ~/.sql/aliases:
1924
1925 :myalias mysql:///mydb/paralleljobs
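The DBURL format can be taken apart with shell parameter expansion. This is a sketch under stated assumptions: parse_dburl is a hypothetical helper that follows only the format line given above, it does not resolve aliases from ~/.sql/aliases, and it deliberately does not print the password.

```shell
# parse_dburl DBURL: print the parts of a DBURL on one line.
# Hypothetical helper for illustration; not part of GNU parallel.
parse_dburl() {
  url=${1#sql:}                  # drop the optional "sql:" prefix
  vendor=${url%%://*}
  rest=${url#*://}               # [[user][:pwd]@][host][:port]/[db]/table
  table=${rest##*/}
  rest=${rest%/*}                # [[user][:pwd]@][host][:port][/db]
  case $rest in
    */*) db=${rest#*/}; authority=${rest%%/*} ;;
    *)   db=; authority=$rest ;;
  esac
  case $authority in
    *@*) userinfo=${authority%@*}; hostport=${authority##*@} ;;
    *)   userinfo=; hostport=$authority ;;
  esac
  user=${userinfo%%:*}
  host=${hostport%%:*}
  case $hostport in
    *:*) port=${hostport##*:} ;;
    *)   port= ;;
  esac
  printf 'vendor=%s user=%s host=%s port=%s db=%s table=%s\n' \
    "$vendor" "$user" "$host" "$port" "$db" "$table"
}

parse_dburl sql:mysql://hr:hr@localhost:3306/hrdb/jobs
parse_dburl sqlite3:///pardb/parjob
parse_dburl pg:///parjob
```

Empty fields in the output correspond to the optional parts of the format that were left out of the DBURL.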
1926
1927 --sqlandworker DBURL
1928 Shorthand for: --sqlmaster DBURL --sqlworker DBURL.
1929
1930 --sqlworker DBURL
1931 Execute jobs via SQL server. Read the input sources variables from
1932 the table pointed to by DBURL. The command on the command line
1933 should be the same as given by --sqlmaster.
1934
If you have more than one --sqlworker, jobs may be run more than
once.
1937
1938 If --sqlworker runs on the local machine, the hostname in the SQL
1939 table will not be ':' but instead the hostname of the machine.
1940
1941 --ssh sshcommand
1942 GNU parallel defaults to using ssh for remote access. This can be
1943 overridden with --ssh. It can also be set on a per server basis
1944 (see --sshlogin).
1945
1946 --sshdelay secs
Delay starting next ssh by secs seconds. GNU parallel will pause
secs seconds after starting each ssh. secs can be less than 1
second.
1950
1951 -S
1952 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
1953 -S @hostgroup
1954 --sshlogin
1955 [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
1956 --sshlogin @hostgroup
1957 Distribute jobs to remote computers. The jobs will be run on a list
1958 of remote computers.
1959
1960 If hostgroups is given, the sshlogin will be added to that
1961 hostgroup. Multiple hostgroups are separated by '+'. The sshlogin
1962 will always be added to a hostgroup named the same as sshlogin.
1963
1964 If only the @hostgroup is given, only the sshlogins in that
1965 hostgroup will be used. Multiple @hostgroup can be given.
1966
1967 GNU parallel will determine the number of CPUs on the remote
1968 computers and run the number of jobs as specified by -j. If the
1969 number ncpus is given GNU parallel will use this number for number
1970 of CPUs on the host. Normally ncpus will not be needed.
1971
1972 An sshlogin is of the form:
1973
1974 [sshcommand [options]] [username@]hostname
1975
1976 The sshlogin must not require a password (ssh-agent, ssh-copy-id,
1977 and sshpass may help with that).
1978
The sshlogin ':' is special: it means 'no ssh' and will therefore
run on the local computer.

The sshlogin '..' is special: it reads sshlogins from
~/.parallel/sshloginfile or $XDG_CONFIG_HOME/parallel/sshloginfile

The sshlogin '-' is special, too: it reads sshlogins from stdin
(standard input).
1987
1988 To specify more sshlogins separate the sshlogins by comma, newline
1989 (in the same string), or repeat the options multiple times.
1990
1991 For examples: see --sshloginfile.
1992
1993 The remote host must have GNU parallel installed.
1994
1995 --sshlogin is known to cause problems with -m and -X.
1996
1997 --sshlogin is often used with --transferfile, --return, --cleanup,
1998 and --trc.
1999
2000 --sshloginfile filename
2001 --slf filename
2002 File with sshlogins. The file consists of sshlogins on separate
2003 lines. Empty lines and lines starting with '#' are ignored.
2004 Example:
2005
2006 server.example.com
2007 username@server2.example.com
2008 8/my-8-cpu-server.example.com
2009 2/my_other_username@my-dualcore.example.net
2010 # This server has SSH running on port 2222
2011 ssh -p 2222 server.example.net
2012 4/ssh -p 2222 quadserver.example.net
2013 # Use a different ssh program
2014 myssh -p 2222 -l myusername hexacpu.example.net
2015 # Use a different ssh program with default number of CPUs
2016 //usr/local/bin/myssh -p 2222 -l myusername hexacpu
2017 # Use a different ssh program with 6 CPUs
2018 6//usr/local/bin/myssh -p 2222 -l myusername hexacpu
2019 # Assume 16 CPUs on the local computer
2020 16/:
2021 # Put server1 in hostgroup1
2022 @hostgroup1/server1
2023 # Put myusername@server2 in hostgroup1+hostgroup2
2024 @hostgroup1+hostgroup2/myusername@server2
2025 # Force 4 CPUs and put 'ssh -p 2222 server3' in hostgroup1
2026 @hostgroup1/4/ssh -p 2222 server3
2027
2028 When using a different ssh program the last argument must be the
2029 hostname.
2030
2031 Multiple --sshloginfile are allowed.
2032
GNU parallel will first look for the file in current dir; if that
fails it looks for the file in ~/.parallel.

The sshloginfile '..' is special: it reads sshlogins from
~/.parallel/sshloginfile

The sshloginfile '.' is special: it reads sshlogins from
/etc/parallel/sshloginfile

The sshloginfile '-' is special, too: it reads sshlogins from stdin
(standard input).
2044
2045 If the sshloginfile is changed it will be re-read when a job
2046 finishes though at most once per second. This makes it possible to
2047 add and remove hosts while running.
2048
2049 This can be used to have a daemon that updates the sshloginfile to
2050 only contain servers that are up:
2051
cp original.slf tmp2.slf
while true; do
  # Ask every server to echo; --tag prefixes each reply with its sshlogin
  nice parallel --nonall -j0 -k --slf original.slf \
    --tag echo | perl -pe 's/\t$//' > tmp.slf
  # Replace the live file only when the list of reachable servers changed
  if ! diff tmp.slf tmp2.slf >/dev/null; then
    mv tmp.slf tmp2.slf
  fi
  sleep 10
done &
parallel --slf tmp2.slf ...
2062
2063 --slotreplace replace-str
2064 Use the replacement string replace-str instead of {%} for job slot
2065 number.
2066
2067 --silent
2068 Silent. The job to be run will not be printed. This is the
2069 default. Can be reversed with -v.
2070
2071 --tty
2072 Open terminal tty. If GNU parallel is used for starting a program
2073 that accesses the tty (such as an interactive program) then this
2074 option may be needed. It will default to starting only one job at a
2075 time (i.e. -j1), not buffer the output (i.e. -u), and it will open
2076 a tty for the job.
2077
2078 You can of course override -j1 and -u.
2079
Using --tty unfortunately means that GNU parallel cannot kill the
jobs (with --timeout, --memfree, or --halt). This is due to GNU
parallel giving each child its own process group, which is then
killed. Process groups are dependent on the tty.
2084
2085 --tag (beta testing)
2086 Tag lines with arguments. Each output line will be prepended with
2087 the arguments and TAB (\t). When combined with --onall or --nonall
2088 the lines will be prepended with the sshlogin instead.
2089
2090 --tag is ignored when using -u.
2091
2092 --tagstring str (beta testing)
2093 Tag lines with a string. Each output line will be prepended with
2094 str and TAB (\t). str can contain replacement strings such as {}.
2095
2096 --tagstring is ignored when using -u, --onall, and --nonall.
2097
2098 --tee
2099 Pipe all data to all jobs. Used with --pipe/--pipepart and :::.
2100
2101 seq 1000 | parallel --pipe --tee -v wc {} ::: -w -l -c
2102
2103 How many numbers in 1..1000 contain 0..9, and how many bytes do
2104 they fill:
2105
2106 seq 1000 | parallel --pipe --tee --tag \
2107 'grep {1} | wc {2}' ::: {0..9} ::: -l -c
2108
2109 How many words contain a..z and how many bytes do they fill?
2110
2111 parallel -a /usr/share/dict/words --pipepart --tee --tag \
2112 'grep {1} | wc {2}' ::: {a..z} ::: -l -c
2113
2114 --termseq sequence
2115 Termination sequence. When a job is killed due to --timeout,
2116 --memfree, --halt, or abnormal termination of GNU parallel,
2117 sequence determines how the job is killed. The default is:
2118
2119 TERM,200,TERM,100,TERM,50,KILL,25
2120
2121 which sends a TERM signal, waits 200 ms, sends another TERM signal,
2122 waits 100 ms, sends another TERM signal, waits 50 ms, sends a KILL
2123 signal, waits 25 ms, and exits. GNU parallel detects if a process
2124 dies before the waiting time is up.
2125
2126 --tmpdir dirname
2127 Directory for temporary files. GNU parallel normally buffers output
2128 into temporary files in /tmp. By setting --tmpdir you can use a
2129 different dir for the files. Setting --tmpdir is equivalent to
2130 setting $TMPDIR.
2131
2132 --tmux (Long beta testing)
2133 Use tmux for output. Start a tmux session and run each job in a
2134 window in that session. No other output will be produced.
2135
2136 --tmuxpane (Long beta testing)
Use tmux for output but put output into panes in the first window.
Useful if you want to monitor the progress of fewer than 100
concurrent jobs.
2140
2141 --timeout duration
2142 Time out for command. If the command runs for longer than duration
2143 seconds it will get killed as per --termseq.
2144
If duration is followed by a % then the timeout will dynamically be
computed as a percentage of the median runtime of successful jobs.
Only values > 100% will make sense.
2148
2149 duration is normally in seconds, but can be floats postfixed with
2150 s, m, h, or d which would multiply the float by 1, 60, 3600, or
2151 86400. Thus these are equivalent: --timeout 100000 and --timeout
2152 1d3.5h16.6m4s.
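The equivalence of the two durations can be checked with a few lines of awk. The helper to_seconds is hypothetical and only implements the multipliers listed above (s=1, m=60, h=3600, d=86400). It is a sketch, not GNU parallel's own parser.

```shell
# to_seconds DURATION: convert a duration such as 1d3.5h16.6m4s to seconds.
# A number without a unit is taken as seconds.
to_seconds() {
  printf '%s\n' "$1" | awk '
    BEGIN { mult["s"] = 1; mult["m"] = 60; mult["h"] = 3600; mult["d"] = 86400 }
    {
      rest = $0; total = 0
      # Consume one "number[unit]" token at a time from the front.
      while (match(rest, /^[0-9.]+[smhd]?/)) {
        tok = substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
        unit = substr(tok, length(tok), 1)
        if (unit in mult) { num = substr(tok, 1, length(tok) - 1); f = mult[unit] }
        else              { num = tok; f = 1 }
        total += num * f
      }
      printf "%.0f\n", total
    }'
}

to_seconds 1d3.5h16.6m4s
to_seconds 100000
```

Term by term: 1d is 86400, 3.5h is 12600, 16.6m is 996, and 4s is 4, which add up to 100000.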
2153
2154 --verbose
2155 -t Print the job to be run on stderr (standard error).
2156
2157 See also -v, -p.
2158
2159 --transfer
2160 Transfer files to remote computers. Shorthand for: --transferfile
2161 {}.
2162
2163 --transferfile filename
2164 --tf filename
2165 --transferfile is used with --sshlogin to transfer files to the
2166 remote computers. The files will be transferred using rsync and
2167 will be put relative to the default work dir. If the path contains
2168 /./ the remaining path will be relative to the work dir. E.g.
2169
2170 echo foo/bar.txt | parallel --transferfile {} \
2171 --sshlogin server.example.com wc
2172
2173 This will transfer the file foo/bar.txt to the computer
2174 server.example.com to the file $HOME/foo/bar.txt before running wc
2175 foo/bar.txt on server.example.com.
2176
2177 echo /tmp/foo/bar.txt | parallel --transferfile {} \
2178 --sshlogin server.example.com wc
2179
2180 This will transfer the file /tmp/foo/bar.txt to the computer
2181 server.example.com to the file /tmp/foo/bar.txt before running wc
2182 /tmp/foo/bar.txt on server.example.com.
2183
2184 echo /tmp/./foo/bar.txt | parallel --transferfile {} \
2185 --sshlogin server.example.com wc {= s:.*/./:./: =}
2186
2187 This will transfer the file /tmp/foo/bar.txt to the computer
2188 server.example.com to the file foo/bar.txt before running wc
2189 ./foo/bar.txt on server.example.com.
2190
2191 --transferfile is often used with --return and --cleanup. A
2192 shorthand for --transferfile {} is --transfer.
2193
2194 --transferfile is ignored when used with --sshlogin : or when not
2195 used with --sshlogin.
2196
2197 --trc filename
2198 Transfer, Return, Cleanup. Shorthand for:
2199
2200 --transferfile {} --return filename --cleanup
2201
2202 --trim <n|l|r|lr|rl>
2203 Trim white space in input.
2204
2205 n No trim. Input is not modified. This is the default.
2206
2207 l Left trim. Remove white space from start of input. E.g. " a bc
2208 " -> "a bc ".
2209
2210 r Right trim. Remove white space from end of input. E.g. " a bc "
2211 -> " a bc".
2212
2213 lr
2214 rl Both trim. Remove white space from both start and end of input.
2215 E.g. " a bc " -> "a bc". This is the default if --colsep is
2216 used.
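The four modes can be emulated with sed to see exactly which white space goes away. trim_like is a hypothetical helper that mirrors the descriptions above. It does not call GNU parallel.

```shell
# trim_like MODE: read a value on stdin and print it trimmed the way
# --trim MODE is described (n, l, r, lr or rl). Emulation with sed.
trim_like() {
  case $1 in
    n)     cat ;;
    l)     sed 's/^[[:space:]]*//' ;;
    r)     sed 's/[[:space:]]*$//' ;;
    lr|rl) sed 's/^[[:space:]]*//; s/[[:space:]]*$//' ;;
  esac
}

# The brackets make the remaining white space visible.
for mode in n l r lr; do
  printf '%-2s [%s]\n' "$mode" "$(printf '%s\n' '  a bc  ' | trim_like "$mode")"
done
```

White space inside the value (between a and bc) is kept in every mode.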
2217
2218 --ungroup
-u Ungroup output. Output is printed as soon as possible and bypasses
GNU parallel internal processing. This may cause output from
different commands to be mixed, so it should only be used if you do
not care about the output. Compare these:
2223
2224 seq 4 | parallel -j0 \
2225 'sleep {};echo -n start{};sleep {};echo {}end'
2226 seq 4 | parallel -u -j0 \
2227 'sleep {};echo -n start{};sleep {};echo {}end'
2228
2229 It also disables --tag. GNU parallel outputs faster with -u.
2230 Compare the speeds of these:
2231
2232 parallel seq ::: 300000000 >/dev/null
2233 parallel -u seq ::: 300000000 >/dev/null
2234 parallel --line-buffer seq ::: 300000000 >/dev/null
2235
2236 Can be reversed with --group.
2237
2238 See also: --line-buffer --group
2239
2240 --extensionreplace replace-str
2241 --er replace-str
2242 Use the replacement string replace-str instead of {.} for input
2243 line without extension.
2244
2245 --use-sockets-instead-of-threads
2246 --use-cores-instead-of-threads
2247 --use-cpus-instead-of-cores (obsolete)
2248 Determine how GNU parallel counts the number of CPUs. GNU parallel
2249 uses this number when the number of jobslots is computed relative
2250 to the number of CPUs (e.g. 100% or +1).
2251
2252 CPUs can be counted in three different ways:
2253
2254 sockets The number of filled CPU sockets (i.e. the number of
2255 physical chips).
2256
2257 cores The number of physical cores (i.e. the number of physical
2258 compute cores).
2259
2260 threads The number of hyperthreaded cores (i.e. the number of
2261 virtual cores - with some of them possibly being
2262 hyperthreaded)
2263
2264 Normally the number of CPUs is computed as the number of CPU
2265 threads. With --use-sockets-instead-of-threads or
2266 --use-cores-instead-of-threads you can force it to be computed as
2267 the number of filled sockets or number of cores instead.
2268
2269 Most users will not need these options.
2270
2271 --use-cpus-instead-of-cores is a (misleading) alias for
2272 --use-sockets-instead-of-threads and is kept for backwards
2273 compatibility.
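To get a feeling for the number that relative job slot values start from, the thread count reported by the operating system can be used. This is only an approximation: it assumes getconf _NPROCESSORS_ONLN is available (as on Linux and macOS), and GNU parallel does its own detection, which may give a different number.

```shell
# Number of online CPU threads as reported by the operating system.
threads=$(getconf _NPROCESSORS_ONLN)
echo "100% of CPU threads -> $threads job slots"
echo "CPU threads +1      -> $((threads + 1)) job slots"
```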
2274
2275 -v Verbose. Print the job to be run on stdout (standard output). Can
2276 be reversed with --silent. See also -t.
2277
2278 Use -v -v to print the wrapping ssh command when running remotely.
2279
2280 --version
-V Print the version of GNU parallel and exit.
2282
2283 --workdir mydir
2284 --wd mydir
2285 Files transferred using --transferfile and --return will be
2286 relative to mydir on remote computers, and the command will be
2287 executed in the dir mydir.
2288
2289 The special mydir value ... will create working dirs under
2290 ~/.parallel/tmp/ on the remote computers. If --cleanup is given
2291 these dirs will be removed.
2292
2293 The special mydir value . uses the current working dir. If the
2294 current working dir is beneath your home dir, the value . is
2295 treated as the relative path to your home dir. This means that if
2296 your home dir is different on remote computers (e.g. if your login
2297 is different) the relative path will still be relative to your home
2298 dir.
2299
2300 To see the difference try:
2301
2302 parallel -S server pwd ::: ""
2303 parallel --wd . -S server pwd ::: ""
2304 parallel --wd ... -S server pwd ::: ""
2305
2306 mydir can contain GNU parallel's replacement strings.
2307
2308 --wait
2309 Wait for all commands to complete.
2310
2311 Used with --semaphore or --sqlmaster.
2312
2313 See also man sem.
2314
2315 -X Multiple arguments with context replace. Insert as many arguments
2316 as the command line length permits. If multiple jobs are being run
2317 in parallel: distribute the arguments evenly among the jobs. Use
2318 -j1 to avoid this.
2319
2320 If {} is not used the arguments will be appended to the line. If
2321 {} is used as part of a word (like pic{}.jpg) then the whole word
2322 will be repeated. If {} is used multiple times each {} will be
2323 replaced with the arguments.
2324
2325 Normally -X will do the right thing, whereas -m can give unexpected
2326 results if {} is used as part of a word.
2327
2328 Support for -X with --sshlogin is limited and may fail.
2329
2330 See also -m.
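The difference between repeating the whole word (-X) and inserting all arguments at once (-m) can be emulated in plain shell. The sketch below builds the two command lines for the word pic{}.jpg and the arguments 1 2 3. It does not run GNU parallel, and it assumes all arguments end up in a single job.

```shell
args="1 2 3"

# -X: the whole word is repeated once per argument.
x_line="rm"
for arg in $args; do
  x_line="$x_line pic$arg.jpg"
done

# -m: the arguments are inserted together at the place of {}.
m_line="rm pic$args.jpg"

printf '%s\n' "-X: $x_line" "-m: $m_line"
```

The -m line names the files pic1, 2 and 3.jpg, which is rarely what is wanted.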
2331
2332 --exit
2333 -x Exit if the size (see the -s option) is exceeded.
2334
GNU parallel can work similarly to xargs -n1.
2337
2338 To compress all html files using gzip run:
2339
2340 find . -name '*.html' | parallel gzip --best
2341
2342 If the file names may contain a newline use -0. Substitute FOO BAR with
2343 FUBAR in all files in this dir and subdirs:
2344
2345 find . -type f -print0 | \
2346 parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
2347
2348 Note -q is needed because of the space in 'FOO BAR'.
2349
2351 prips can generate IP-addresses from CIDR notation. With GNU parallel
2352 you can build a simple network scanner to see which addresses respond
2353 to ping:
2354
2355 prips 130.229.16.0/20 | \
2356 parallel --timeout 2 -j0 \
2357 'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
2358
2360 GNU parallel can take the arguments from command line instead of stdin
2361 (standard input). To compress all html files in the current dir using
2362 gzip run:
2363
2364 parallel gzip --best ::: *.html
2365
2366 To convert *.wav to *.mp3 using LAME running one process per CPU run:
2367
2368 parallel lame {} -o {.}.mp3 ::: *.wav
2369
2371 When moving a lot of files like this: mv *.log destdir you will
2372 sometimes get the error:
2373
2374 bash: /bin/mv: Argument list too long
2375
2376 because there are too many files. You can instead do:
2377
2378 ls | grep -E '\.log$' | parallel mv {} destdir
2379
This will run mv for each file. It can be done faster if mv gets as
many arguments as will fit on the line:
2382
2383 ls | grep -E '\.log$' | parallel -m mv {} destdir
2384
2385 In many shells you can also use printf:
2386
2387 printf '%s\0' *.log | parallel -0 -m mv {} destdir
2388
2390 To remove the files pict0000.jpg .. pict9999.jpg you could do:
2391
2392 seq -w 0 9999 | parallel rm pict{}.jpg
2393
2394 You could also do:
2395
2396 seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
2397
The first will run rm 10000 times, while the last will only run rm as
many times as needed to keep the command line length short enough to
avoid Argument list too long (it typically runs 1-2 times).
2401
2402 You could also run:
2403
2404 seq -w 0 9999 | parallel -X rm pict{}.jpg
2405
This will also only run rm as many times as needed to keep the command
line length short enough.
2408
2410 If ImageMagick is installed this will generate a thumbnail of a jpg
2411 file:
2412
2413 convert -geometry 120 foo.jpg thumb_foo.jpg
2414
2415 This will run with number-of-cpus jobs in parallel for all jpg files in
2416 a directory:
2417
2418 ls *.jpg | parallel convert -geometry 120 {} thumb_{}
2419
2420 To do it recursively use find:
2421
2422 find . -name '*.jpg' | \
2423 parallel convert -geometry 120 {} {}_thumb.jpg
2424
2425 Notice how the argument has to start with {} as {} will include path
2426 (e.g. running convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg
2427 would clearly be wrong). The command will generate files like
2428 ./foo/bar.jpg_thumb.jpg.
2429
2430 Use {.} to avoid the extra .jpg in the file name. This command will
2431 make files like ./foo/bar_thumb.jpg:
2432
2433 find . -name '*.jpg' | \
2434 parallel convert -geometry 120 {} {.}_thumb.jpg
2435
2437 This will generate an uncompressed version of .gz-files next to the
2438 .gz-file:
2439
2440 parallel zcat {} ">"{.} ::: *.gz
2441
2442 Quoting of > is necessary to postpone the redirection. Another solution
2443 is to quote the whole command:
2444
2445 parallel "zcat {} >{.}" ::: *.gz
2446
2447 Other special shell characters (such as * ; $ > < | >> <<) also need to
2448 be put in quotes, as they may otherwise be interpreted by the shell and
2449 not given to GNU parallel.
2450
2452 A job can consist of several commands. This will print the number of
2453 files in each directory:
2454
2455 ls | parallel 'echo -n {}" "; ls {}|wc -l'
2456
2457 To put the output in a file called <name>.dir:
2458
2459 ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
2460
2461 Even small shell scripts can be run by GNU parallel:
2462
2463 find . | parallel 'a={}; name=${a##*/};' \
2464 'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
2465 'echo "$name - $upper"'
2466
2467 ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
2468
2469 Given a list of URLs, list all URLs that fail to download. Print the
2470 line number and the URL.
2471
2472 cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
2473
2474 Create a mirror directory with the same filenames except all files and
2475 symlinks are empty files.
2476
2477 cp -rs /the/source/dir mirror_dir
2478 find mirror_dir -type l | parallel -m rm {} '&&' touch {}
2479
2480 Find the files in a list that do not exist
2481
2482 cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
2483
You have a bunch of files. You want them sorted into dirs. Each dir
should be named after the first letter of the file names it contains.
2487
2488 parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
2489
2491 You have a dir with files named as 24 hours in 5 minute intervals:
2492 00:00, 00:05, 00:10 .. 23:55. You want to find the files missing:
2493
2494 parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
2495 ::: {00..23} ::: {00..55..5}
2496
2498 If the composed command is longer than a line, it becomes hard to read.
2499 In Bash you can use functions. Just remember to export -f the function.
2500
2501 doit() {
2502 echo Doing it for $1
2503 sleep 2
2504 echo Done with $1
2505 }
2506 export -f doit
2507 parallel doit ::: 1 2 3
2508
2509 doubleit() {
2510 echo Doing it for $1 $2
2511 sleep 2
2512 echo Done with $1 $2
2513 }
2514 export -f doubleit
2515 parallel doubleit ::: 1 2 3 ::: a b
2516
2517 To do this on remote servers you need to transfer the function using
2518 --env:
2519
2520 parallel --env doit -S server doit ::: 1 2 3
2521 parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
2522
2523 If your environment (aliases, variables, and functions) is small you
2524 can copy the full environment without having to export -f anything. See
2525 env_parallel.
2526
2528 To test a program with different parameters:
2529
2530 tester() {
2531 if (eval "$@") >&/dev/null; then
2532 perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
2533 else
2534 perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
2535 fi
2536 }
2537 export -f tester
2538 parallel tester my_program ::: arg1 arg2
2539 parallel tester exit ::: 1 0 2 0
2540
2541 If my_program fails a red FAIL will be printed followed by the failing
2542 command; otherwise a green OK will be printed followed by the command.
2543
2545 Log rotation renames a logfile to an extension with a higher number:
2546 log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
2547 removed. To avoid overwriting files the process starts backwards from
2548 the high number to the low number. This will keep 10 old versions of
2549 the log:
2550
2551 seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
2552 mv log log.1
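Why the order matters can be tried safely in a scratch directory. The sketch below is a plain-shell equivalent of the rotation, written as a loop for comparison. The file contents are made up, and missing log.N files are skipped.

```shell
# Work in a scratch directory so no real logs are touched.
dir=$(mktemp -d)
echo current  > "$dir/log"
echo previous > "$dir/log.1"
echo older    > "$dir/log.2"

# Highest number first, so log.2 is moved away before log.1 takes its name.
for i in 9 8 7 6 5 4 3 2 1; do
  if [ -e "$dir/log.$i" ]; then
    mv "$dir/log.$i" "$dir/log.$((i + 1))"
  fi
done
mv "$dir/log" "$dir/log.1"

ls "$dir"
```

Running the loop from 1 upwards instead would overwrite log.2 with log.1 in the first step, and the older content would be lost.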
2553
2555 When processing files removing the file extension using {.} is often
2556 useful.
2557
2558 Create a directory for each zip-file and unzip it in that dir:
2559
2560 parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
2561
2562 Recompress all .gz files in current directory using bzip2 running 1 job
2563 per CPU in parallel:
2564
2565 parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
2566
2567 Convert all WAV files to MP3 using LAME:
2568
2569 find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
2570
2571 Put all converted in the same directory:
2572
2573 find sounddir -type f -name '*.wav' | \
2574 parallel lame {} -o mydir/{/.}.mp3
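What {.} and {/.} produce for a given path can be approximated with sed and basename. The path below is made up for illustration, and the sed expression is an approximation of the replacement strings, not GNU parallel's own code.

```shell
path=sounddir/album/track.wav

# {.}  : the path without its last extension
no_ext=$(printf '%s\n' "$path" | sed 's:\.[^/.]*$::')

# {/.} : the basename without its last extension
base_no_ext=$(basename "$path" | sed 's:\.[^/.]*$::')

printf '%s\n' "{.}  -> $no_ext" "{/.} -> $base_no_ext"
```

With these values {.}.mp3 keeps the file next to the original, while mydir/{/.}.mp3 drops the directory part and collects everything in mydir.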
2575
If you have a directory with tar.gz files and want these extracted in
the corresponding dir (e.g. foo.tar.gz will be extracted in the dir foo)
you can do:
2580
2581 parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
2582
2583 If you want to remove a different ending, you can use {%string}:
2584
2585 parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
2586
2587 You can also remove a starting string with {#string}
2588
2589 parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
2590
2591 To remove a string anywhere you can use regular expressions with
2592 {/regexp/replacement} and leave the replacement empty:
2593
2594 parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
2595
2597 Let us assume a website stores images like:
2598
2599 http://www.example.com/path/to/YYYYMMDD_##.jpg
2600
2601 where YYYYMMDD is the date and ## is the number 01-24. This will
2602 download images for the past 30 days:
2603
2604 getit() {
2605 date=$(date -d "today -$1 days" +%Y%m%d)
2606 num=$2
2607 echo wget http://www.example.com/path/to/${date}_${num}.jpg
2608 }
2609 export -f getit
2610
2611 parallel getit ::: $(seq 30) ::: $(seq -w 24)
2612
$(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
with $1 days subtracted. As written, getit only prints the wget
commands; remove echo to run them.
2615
2617 NASA provides tiles to download on earthdata.nasa.gov. Download tiles
2618 for Blue Marble world map and create a 10240x20480 map.
2619
2620 base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
2621 service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
2622 layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
2623 set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
2624 tile="TILEROW={1}&TILECOL={2}"
2625 format="FORMAT=image%2Fjpeg"
2626 url="$base?$service&$layer&$set&$tile&$format"
2627
2628 parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
2629 parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
2630 convert -append line{0..19}.jpg world.jpg
2631
Search NASA using their API to get JSON for images that are related to
'apollo 11' and have 'moon landing' in the description.

The search query returns JSON containing URLs to JSON containing
collections of pictures. One of the pictures in each of these
collections is large.
2639
wget is used to get the JSON for the search query. jq is then used to
extract the URLs of the collections. parallel then calls wget to get
each collection, which is passed to jq to extract the URLs of all
images. grep keeps only the URLs of the large images, and parallel
finally uses wget to fetch the images.
2645
2646 base="https://images-api.nasa.gov/search"
2647 q="q=apollo 11"
2648 description="description=moon landing"
2649 media_type="media_type=image"
2650 wget -O - "$base?$q&$description&$media_type" |
2651 jq -r .collection.items[].href |
2652 parallel wget -O - |
2653 jq -r .[] |
2654 grep large |
2655 parallel wget
2656
youtube-dl is an excellent tool to download videos. It cannot, however,
download videos in parallel. This takes a playlist and downloads 10
videos in parallel.
2661
2662 url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
2663 export url
2664 youtube-dl --flat-playlist "https://$url" |
2665 parallel --tagstring {#} --lb -j10 \
2666 youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'
2667
2669 parallel mv {} '{= $a=pQ($_); $b=$_;' \
2670 '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *
2671
2672 {= and =} mark a perl expression. pQ perl-quotes the string. date
2673 +%FT%T is the date in ISO8601 with time.
2674
2676 Save output from ps aux every second into dirs named
2677 yyyy-mm-ddThh:mm:ss+zz:zz.
2678
2679 seq 1000 | parallel -N0 -j1 --delay 1 \
2680 --results '{= $_=`date -Isec`; chomp=}/' ps aux
2681
The : in a digital clock blinks. To make every other line have a ':'
and the rest a ' ', a perl expression is used to look at the 3rd input
source. If the value modulo 2 is 1, use ":"; otherwise use " ":
2686
2687 parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
2688 ::: {0..12} ::: {0..5} ::: {0..9}
2689
2691 This:
2692
2693 parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
2694 ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
2695
2696 will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
2697 the output grouping on x and z you can do this:
2698
2699 parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
2700
2701 For all values of x and z it runs commands like:
2702
2703 cat x1y*z1 > x1z1
2704
2705 So you end up with x1z1 .. x5z5 each containing the content of all
2706 values of y.
2707
The script below will crawl and mirror a URL in parallel. It first
downloads pages that are 1 click down, then 2 clicks down, then 3;
instead of the normal depth first, where the first link on each page
is fetched first.
2713
2714 Run like this:
2715
2716 PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/
2717
2718 Remove the wget part if you only want a web crawler.
2719
It works by fetching a page from a list of URLs and looking for links
in that page that are within the same starting URL and that have not
already been seen. These links are added to a new queue. When all the
pages from the list are done, the new queue is moved to the list of
URLs and the process is started over until no unseen links are found.
2725
2726 #!/bin/bash
2727
2728 # E.g. http://gatt.org.yeslab.org/
2729 URL=$1
2730 # Stay inside the start dir
2731 BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
2732 URLLIST=$(mktemp urllist.XXXX)
2733 URLLIST2=$(mktemp urllist.XXXX)
2734 SEEN=$(mktemp seen.XXXX)
2735
2736 # Spider to get the URLs
2737 echo $URL >$URLLIST
2738 cp $URLLIST $SEEN
2739
2740 while [ -s $URLLIST ] ; do
2741 cat $URLLIST |
2742 parallel lynx -listonly -image_links -dump {} \; \
2743 wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
2744 perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
2745 do { $seen{$1}++ or print }' |
2746 grep -F $BASEURL |
2747 grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
2748 mv $URLLIST2 $URLLIST
2749 done
2750
2751 rm -f $URLLIST $URLLIST2 $SEEN
2752
2754 If the files to be processed are in a tar file then unpacking one file
2755 and processing it immediately may be faster than first unpacking all
2756 files.
2757
2758 tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
2759 parallel echo
2760
2761 The Perl one-liner is needed to make sure the file is complete before
2762 handing it to GNU parallel.
2763
2765 for-loops like this:
2766
2767 (for x in `cat list` ; do
2768 do_something $x
2769 done) | process_output
2770
2771 and while-read-loops like this:
2772
2773 cat list | (while read x ; do
2774 do_something $x
2775 done) | process_output
2776
2777 can be written like this:
2778
2779 cat list | parallel do_something | process_output
2780
For example: Find which host name in a list has IP address 1.2.3.4:
2782
2783 cat hosts.txt | parallel -P 100 host | grep 1.2.3.4
2784
2785 If the processing requires more steps the for-loop like this:
2786
2787 (for x in `cat list` ; do
2788 no_extension=${x%.*};
2789 do_step1 $x scale $no_extension.jpg
2790 do_step2 <$x $no_extension
2791 done) | process_output
2792
2793 and while-loops like this:
2794
2795 cat list | (while read x ; do
2796 no_extension=${x%.*};
2797 do_step1 $x scale $no_extension.jpg
2798 do_step2 <$x $no_extension
2799 done) | process_output
2800
2801 can be written like this:
2802
2803 cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
2804 process_output
2805
2806 If the body of the loop is bigger, it improves readability to use a
2807 function:
2808
2809 (for x in `cat list` ; do
2810 do_something $x
2811 [... 100 lines that do something with $x ...]
2812 done) | process_output
2813
2814 cat list | (while read x ; do
2815 do_something $x
2816 [... 100 lines that do something with $x ...]
2817 done) | process_output
2818
2819 can both be rewritten as:
2820
2821 doit() {
2822 x=$1
2823 do_something $x
2824 [... 100 lines that do something with $x ...]
2825 }
2826 export -f doit
2827 cat list | parallel doit
2828
2830 Nested for-loops like this:
2831
2832 (for x in `cat xlist` ; do
2833 for y in `cat ylist` ; do
2834 do_something $x $y
2835 done
2836 done) | process_output
2837
2838 can be written like this:
2839
2840 parallel do_something {1} {2} :::: xlist ylist | process_output
2841
2842 Nested for-loops like this:
2843
2844 (for colour in red green blue ; do
2845 for size in S M L XL XXL ; do
2846 echo $colour $size
2847 done
2848 done) | sort
2849
2850 can be written like this:
2851
2852 parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
2853
2855 diff is good for finding differences in text files. diff | wc -l gives
2856 an indication of the size of the difference. To find the differences
2857 between all files in the current dir do:
2858
2859 parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3
2860
2861 This way it is possible to see if some files are closer to other files.
2862
When doing multiple nested for-loops it can be easier to keep track of
the loop variable if it is named instead of just having a number. Use
--header : to let the first argument be a named alias for the
positional replacement string:
2868
2869 parallel --header : echo {colour} {size} \
2870 ::: colour red green blue ::: size S M L XL XXL
2871
2872 This also works if the input file is a file with columns:
2873
2874 cat addressbook.tsv | \
2875 parallel --colsep '\t' --header : echo {Name} {E-mail address}
2876
2878 GNU parallel makes all combinations when given two lists.
2879
2880 To make all combinations in a single list with unique values, you
2881 repeat the list and use replacement string {choose_k}:
2882
2883 parallel --plus echo {choose_k} ::: A B C D ::: A B C D
2884
2885 parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D
2886
2887 {choose_k} works for any number of input sources:
2888
2889 parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D
2890
2892 Assume you have input like:
2893
2894 aardvark
2895 babble
2896 cab
2897 dab
2898 each
2899
2900 and want to run combinations like:
2901
2902 aardvark babble
2903 babble cab
2904 cab dab
2905 dab each
2906
2907 If the input is in the file in.txt:
2908
2909 parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)
2910
2911 If the input is in the array $a here are two solutions:
2912
2913 seq $((${#a[@]}-1)) | \
2914 env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
2915 parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"
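
The two array slices in the last command can be inspected without GNU
parallel. The sketch below assumes bash (for arrays) and pairs the
slices by index, which is the pairing that :::+ asks for; the variable
names first and second are made up for this example.

```bash
a=(aardvark babble cab dab each)

# All elements except the last: offset 0, length = array size - 1.
first=("${a[@]::${#a[@]}-1}")
# All elements except the first: start at offset 1.
second=("${a[@]:1}")

# Pair the two lists element by element.
for i in "${!first[@]}"; do
  echo "${first[i]} - ${second[i]}"
done
```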
2916
2918 Using --results the results are saved in /tmp/diffcount*.
2919
2920 parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
2921 tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
2922
2923 To see the difference between file A and file B look at the file
2924 '/tmp/diffcount/1/A/2/B'.
2925
2927 Starting a job on the local machine takes around 10 ms. This can be a
2928 big overhead if the job takes very few ms to run. Often you can group
2929 small jobs together using -X which will make the overhead less
2930 significant. Compare the speed of these:
2931
2932 seq -w 0 9999 | parallel touch pict{}.jpg
2933 seq -w 0 9999 | parallel -X touch pict{}.jpg
2934
2935 If your program cannot take multiple arguments, then you can use GNU
2936 parallel to spawn multiple GNU parallels:
2937
2938 seq -w 0 9999999 | \
2939 parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg
2940
2941 If -j0 normally spawns 252 jobs, then the above will try to spawn 2520
2942 jobs. On a normal GNU/Linux system you can spawn 32000 jobs using this
2943 technique with no problems. To raise the 32000 jobs limit raise
2944 /proc/sys/kernel/pid_max to 4194303.
2945
2946 If you do not need GNU parallel to have control over each job (so no
2947 need for --retries or --joblog or similar), then it can be even faster
2948 if you can generate the command lines and pipe those to a shell. So if
2949 you can do this:
2950
2951 mygenerator | sh
2952
2953 Then that can be parallelized like this:
2954
2955 mygenerator | parallel --pipe --block 10M sh
2956
2957 E.g.
2958
2959 mygenerator() {
2960 seq 10000000 | perl -pe 'print "echo This is fast job number "';
2961 }
2962 mygenerator | parallel --pipe --block 10M sh
2963
The overhead is 100000 times smaller, namely around 100 nanoseconds per
job.
2966
2968 When using shell variables you need to quote them correctly as they may
2969 otherwise be interpreted by the shell.
2970
2971 Notice the difference between:
2972
2973 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
2974 parallel echo ::: ${ARR[@]} # This is probably not what you want
2975
2976 and:
2977
2978 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
2979 parallel echo ::: "${ARR[@]}"
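
The difference is made by the shell before GNU parallel is even
started. The sketch below counts the arguments a command receives in
both cases; it assumes bash, uses a shortened array, and the helper
name count_args is made up for this example.

```bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

ARR=("My brother's 12\" records" Foo Bar)

count_args ${ARR[@]}     # unquoted: the first element is split into 4 words
count_args "${ARR[@]}"   # quoted: each element stays one argument
```

The unquoted form passes 6 arguments, the quoted form passes 3.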
2980
When using variables in the actual command that contain special
characters (e.g. space) you can quote them using '"$VAR"' or using "'s
and -q:
2984
2985 VAR="My brother's 12\" records are worth <\$\$\$>"
2986 parallel -q echo "$VAR" ::: '!'
2987 export VAR
2988 parallel echo '"$VAR"' ::: '!'
2989
2990 If $VAR does not contain ' then "'$VAR'" will also work (and does not
2991 need export):
2992
2993 VAR="My 12\" records are worth <\$\$\$>"
2994 parallel echo "'$VAR'" ::: '!'
2995
2996 If you use them in a function you just quote as you normally would do:
2997
2998 VAR="My brother's 12\" records are worth <\$\$\$>"
2999 export VAR
3000 myfunc() { echo "$VAR" "$1"; }
3001 export -f myfunc
3002 parallel myfunc ::: '!'
3003
3005 When running jobs that output data, you often do not want the output of
3006 multiple jobs to run together. GNU parallel defaults to grouping the
3007 output of each job, so the output is printed when the job finishes. If
3008 you want full lines to be printed while the job is running you can use
3009 --line-buffer. If you want output to be printed as soon as possible you
3010 can use -u.
3011
3012 Compare the output of:
3013
3014 parallel wget --limit-rate=100k \
3015 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3016 ::: {12..16}
3017 parallel --line-buffer wget --limit-rate=100k \
3018 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3019 ::: {12..16}
3020 parallel -u wget --limit-rate=100k \
3021 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3022 ::: {12..16}
3023
3025 GNU parallel groups the output lines, but it can be hard to see where
3026 the different jobs begin. --tag prepends the argument to make that more
3027 visible:
3028
3029 parallel --tag wget --limit-rate=100k \
3030 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3031 ::: {12..16}
3032
3033 --tag works with --line-buffer but not with -u:
3034
3035 parallel --tag --line-buffer wget --limit-rate=100k \
3036 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
3037 ::: {12..16}
3038
3039 Check the uptime of the servers in ~/.parallel/sshloginfile:
3040
3041 parallel --tag -S .. --nonall uptime
3042
3044 Give each job a new color. Most terminals support ANSI colors with the
3045 escape code "\033[30;3Xm" where 0 <= X <= 7:
3046
3047 seq 10 | \
3048 parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
3049 parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
3050 --tagstring {color} seq {} ::: {1..10}
3051
3052 To get rid of the initial \t (which comes from --tagstring):
3053
3054 ... | perl -pe 's/\t//'
3055
3057 Normally the output of a job will be printed as soon as it completes.
3058 Sometimes you want the order of the output to remain the same as the
3059 order of the input. This is often important, if the output is used as
3060 input for another system. -k will make sure the order of output will be
3061 in the same order as input even if later jobs end before earlier jobs.
3062
3063 Append a string to every line in a text file:
3064
3065 cat textfile | parallel -k echo {} append_string
3066
3067 If you remove -k some of the lines may come out in the wrong order.
3068
3069 Another example is traceroute:
3070
3071 parallel traceroute ::: qubes-os.org debian.org freenetproject.org
3072
3073 will give traceroute of qubes-os.org, debian.org and
3074 freenetproject.org, but it will be sorted according to which job
3075 completed first.
3076
3077 To keep the order the same as input run:
3078
3079 parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org
3080
3081 This will make sure the traceroute to qubes-os.org will be printed
3082 first.
3083
3084 A bit more complex example is downloading a huge file in chunks in
3085 parallel: Some internet connections will deliver more data if you
3086 download files in parallel. For downloading files in parallel see:
3087 "EXAMPLE: Download 10 images for each of the past 30 days". But if you
3088 are downloading a big file you can download the file in chunks in
3089 parallel.
3090
3091 To download byte 10000000-19999999 you can use curl:
3092
3093 curl -r 10000000-19999999 http://example.com/the/big/file >file.part
3094
3095 To download a 1 GB file we need 100 10MB chunks downloaded and combined
3096 in the correct order.
3097
3098 seq 0 99 | parallel -k curl -r \
3099 {}0000000-{}9999999 http://example.com/the/big/file > file
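
The byte ranges can also be computed with arithmetic instead of string
concatenation. The sketch below prints the range for a given chunk
number, assuming chunks of 10,000,000 bytes and a file of
1,000,000,000 bytes; the function name chunk_range is made up for this
example.

```bash
# Hypothetical helper: byte range of chunk number $1 when every chunk
# is 10,000,000 bytes, e.g. chunk 1 covers bytes 10000000-19999999.
chunk_range() {
  local size=10000000
  echo "$(( $1 * size ))-$(( ($1 + 1) * size - 1 ))"
}

chunk_range 0    # first chunk
chunk_range 1
chunk_range 99   # last chunk of a 1,000,000,000-byte file
```

The ranges are the same numbers as in the parallel command, only
without the leading zero that {}0000000 gives for chunk 0.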
3100
3102 grep -r greps recursively through directories. On multicore CPUs GNU
3103 parallel can often speed this up.
3104
3105 find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
3106
This will run 1.5 jobs per CPU, and give 1000 arguments to grep.
3108
3110 The simplest solution to grep a big file for a lot of regexps is:
3111
3112 grep -f regexps.txt bigfile
3113
3114 Or if the regexps are fixed strings:
3115
3116 grep -F -f regexps.txt bigfile
3117
3118 There are 3 limiting factors: CPU, RAM, and disk I/O.
3119
3120 RAM is easy to measure: If the grep process takes up most of your free
3121 memory (e.g. when running top), then RAM is a limiting factor.
3122
3123 CPU is also easy to measure: If the grep takes >90% CPU in top, then
3124 the CPU is a limiting factor, and parallelization will speed this up.
3125
3126 It is harder to see if disk I/O is the limiting factor, and depending
3127 on the disk system it may be faster or slower to parallelize. The only
3128 way to know for certain is to test and measure.
3129
3130 Limiting factor: RAM
The normal grep -f regexps.txt bigfile works no matter the size of
bigfile, but if regexps.txt is so big it cannot fit into memory, then
you need to split this.
3134
3135 grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of
3136 RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it
3137 may be too big.
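
As a rough worked example of these figures, the sketch below multiplies
the size of the pattern file by the per-byte estimates. The numbers are
only the approximations given above, and the function name
estimate_grep_ram is made up for this example.

```bash
# Hypothetical helper: estimate grep's memory use in bytes from the
# size of the pattern file, using the rough per-byte figures above.
#   $1 = size of regexps.txt in bytes
#   $2 = "fixed" for grep -F, anything else for normal grep
estimate_grep_ram() {
  if [ "$2" = fixed ]; then
    echo $(( $1 * 100 ))
  else
    echo $(( $1 * 500 ))
  fi
}

# A 10 MB pattern file:
estimate_grep_ram 10000000 fixed    # about 1 GB for grep -F
estimate_grep_ram 10000000 regexp   # about 5 GB for normal grep
```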
3138
If you can convert your regexps into fixed strings do that. E.g. if the
lines you are looking for in bigfile all look like:
3141
3142 ID1 foo bar baz Identifier1 quux
3143 fubar ID2 foo bar baz Identifier2
3144
3145 then your regexps.txt can be converted from:
3146
3147 ID1.*Identifier1
3148 ID2.*Identifier2
3149
3150 into:
3151
3152 ID1 foo bar baz Identifier1
3153 ID2 foo bar baz Identifier2
3154
3155 This way you can use grep -F which takes around 80% less memory and is
3156 much faster.
3157
3158 If it still does not fit in memory you can do this:
3159
3160 parallel --pipepart -a regexps.txt --block 1M grep -Ff - -n bigfile | \
3161 sort -un | perl -pe 's/^\d+://'
3162
3163 The 1M should be your free memory divided by the number of CPU threads
3164 and divided by 200 for grep -F and by 1000 for normal grep. On
3165 GNU/Linux you can do:
3166
3167 free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
3168 END { print sum }' /proc/meminfo)
3169 percpu=$((free / 200 / $(parallel --number-of-threads)))k
3170
3171 parallel --pipepart -a regexps.txt --block $percpu --compress \
3172 grep -F -f - -n bigfile | \
3173 sort -un | perl -pe 's/^\d+://'
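
The block size arithmetic can be checked with fixed numbers. The
sketch below runs the same awk program on made-up /proc/meminfo
content and assumes 4 CPU threads instead of calling
parallel --number-of-threads; the function name free_kb is made up for
this example.

```bash
# Hypothetical helper: sum the MemFree, Buffers, Cached and SwapCached
# fields (in kB) from /proc/meminfo-style text read on stdin.
free_kb() {
  awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
       END { print sum }'
}

# Made-up sample values so the arithmetic is easy to follow.
sample='MemTotal:       16000000 kB
MemFree:         8000000 kB
Buffers:          400000 kB
Cached:          1600000 kB
SwapCached:            0 kB'

free=$(printf '%s\n' "$sample" | free_kb)
threads=4    # stand-in for: parallel --number-of-threads
percpu=$(( free / 200 / threads ))k
echo "$free kB free -> --block $percpu"
```

With these sample values 10000000 kB are counted as free, which gives
a block size of 12500k per job for grep -F.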
3174
3175 If you can live with duplicated lines and wrong order, it is faster to
3176 do:
3177
3178 parallel --pipepart -a regexps.txt --block $percpu --compress \
3179 grep -F -f - bigfile
3180
3181 Limiting factor: CPU
3182 If the CPU is the limiting factor parallelization should be done on the
3183 regexps:
3184
3185 cat regexp.txt | parallel --pipe -L1000 --roundrobin --compress \
3186 grep -f - -n bigfile | \
3187 sort -un | perl -pe 's/^\d+://'
3188
3189 The command will start one grep per CPU and read bigfile one time per
3190 CPU, but as that is done in parallel, all reads except the first will
3191 be cached in RAM. Depending on the size of regexp.txt it may be faster
3192 to use --block 10m instead of -L1000.
3193
3194 Some storage systems perform better when reading multiple chunks in
3195 parallel. This is true for some RAID systems and for some network file
3196 systems. To parallelize the reading of bigfile:
3197
3198 parallel --pipepart --block 100M -a bigfile -k --compress \
3199 grep -f regexp.txt
3200
3201 This will split bigfile into 100MB chunks and run grep on each of these
3202 chunks. To parallelize both reading of bigfile and regexp.txt combine
3203 the two using --fifo:
3204
3205 parallel --pipepart --block 100M -a bigfile --fifo cat regexp.txt \
3206 \| parallel --pipe -L1000 --roundrobin grep -f - {}
3207
3208 If a line matches multiple regexps, the line may be duplicated.
3209
3210 Bigger problem
3211 If the problem is too big to be solved by this, you are probably ready
3212 for Lucene.
3213
To run commands on a remote computer SSH needs to be set up and you
must be able to log in without entering a password (the commands
ssh-copy-id, ssh-agent, and sshpass may help you do that).
3218
If you need to log in to a whole cluster, you typically do not want to
accept the host key for every host. You want to accept them the first
time and be warned if they are ever changed. To do that:
3222
3223 # Add the servers to the sshloginfile
3224 (echo servera; echo serverb) > .parallel/my_cluster
# Make sure .ssh/config exists
3226 touch .ssh/config
3227 cp .ssh/config .ssh/config.backup
3228 # Disable StrictHostKeyChecking temporarily
3229 (echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
3230 parallel --slf my_cluster --nonall true
3231 # Remove the disabling of StrictHostKeyChecking
3232 mv .ssh/config.backup .ssh/config
3233
3234 The servers in .parallel/my_cluster are now added in .ssh/known_hosts.
3235
3236 To run echo on server.example.com:
3237
3238 seq 10 | parallel --sshlogin server.example.com echo
3239
3240 To run commands on more than one remote computer run:
3241
3242 seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo
3243
3244 Or:
3245
3246 seq 10 | parallel --sshlogin server.example.com \
3247 --sshlogin server2.example.net echo
3248
3249 If the login username is foo on server2.example.net use:
3250
3251 seq 10 | parallel --sshlogin server.example.com \
3252 --sshlogin foo@server2.example.net echo
3253
3254 If your list of hosts is server1-88.example.net with login foo:
3255
3256 seq 10 | parallel -Sfoo@server{1..88}.example.net echo
3257
3258 To distribute the commands to a list of computers, make a file
3259 mycomputers with all the computers:
3260
3261 server.example.com
3262 foo@server2.example.com
3263 server3.example.com
3264
3265 Then run:
3266
3267 seq 10 | parallel --sshloginfile mycomputers echo
3268
3269 To include the local computer add the special sshlogin ':' to the list:
3270
3271 server.example.com
3272 foo@server2.example.com
3273 server3.example.com
3274 :
3275
3276 GNU parallel will try to determine the number of CPUs on each of the
3277 remote computers, and run one job per CPU - even if the remote
3278 computers do not have the same number of CPUs.
3279
3280 If the number of CPUs on the remote computers is not identified
3281 correctly the number of CPUs can be added in front. Here the computer
3282 has 8 CPUs.
3283
3284 seq 10 | parallel --sshlogin 8/server.example.com echo
3285
3287 To recompress gzipped files with bzip2 using a remote computer run:
3288
3289 find logs/ -name '*.gz' | \
3290 parallel --sshlogin server.example.com \
3291 --transfer "zcat {} | bzip2 -9 >{.}.bz2"
3292
3293 This will list the .gz-files in the logs directory and all directories
3294 below. Then it will transfer the files to server.example.com to the
3295 corresponding directory in $HOME/logs. On server.example.com the file
3296 will be recompressed using zcat and bzip2 resulting in the
3297 corresponding file with .gz replaced with .bz2.
3298
3299 If you want the resulting bz2-file to be transferred back to the local
3300 computer add --return {.}.bz2:
3301
3302 find logs/ -name '*.gz' | \
3303 parallel --sshlogin server.example.com \
3304 --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3305
3306 After the recompressing is done the .bz2-file is transferred back to
3307 the local computer and put next to the original .gz-file.
3308
3309 If you want to delete the transferred files on the remote computer add
3310 --cleanup. This will remove both the file transferred to the remote
3311 computer and the files transferred from the remote computer:
3312
3313 find logs/ -name '*.gz' | \
3314 parallel --sshlogin server.example.com \
3315 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3316
If you want to run on several computers add the computers to --sshlogin
either using ',' or multiple --sshlogin:
3319
3320 find logs/ -name '*.gz' | \
3321 parallel --sshlogin server.example.com,server2.example.com \
3322 --sshlogin server3.example.com \
3323 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3324
3325 You can add the local computer using --sshlogin :. This will disable
3326 the removing and transferring for the local computer only:
3327
3328 find logs/ -name '*.gz' | \
3329 parallel --sshlogin server.example.com,server2.example.com \
3330 --sshlogin server3.example.com \
3331 --sshlogin : \
3332 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
3333
3334 Often --transfer, --return and --cleanup are used together. They can be
3335 shortened to --trc:
3336
3337 find logs/ -name '*.gz' | \
3338 parallel --sshlogin server.example.com,server2.example.com \
3339 --sshlogin server3.example.com \
3340 --sshlogin : \
3341 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3342
3343 With the file mycomputers containing the list of computers it becomes:
3344
3345 find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
3346 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3347
3348 If the file ~/.parallel/sshloginfile contains the list of computers the
3349 special short hand -S .. can be used:
3350
3351 find logs/ -name '*.gz' | parallel -S .. \
3352 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
3353
3355 Convert *.mp3 to *.ogg running one process per CPU on local computer
3356 and server2:
3357
3358 parallel --trc {.}.ogg -S server2,: \
3359 'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
3360
3362 To run the command uptime on remote computers you can do:
3363
3364 parallel --tag --nonall -S server1,server2 uptime
3365
3366 --nonall reads no arguments. If you have a list of jobs you want to run
3367 on each computer you can do:
3368
3369 parallel --tag --onall -S server1,server2 echo ::: 1 2 3
3370
3371 Remove --tag if you do not want the sshlogin added before the output.
3372
3373 If you have a lot of hosts use '-j0' to access more hosts in parallel.
3374
3376 If the workers are behind a NAT wall, you need some trickery to get to
3377 them.
3378
3379 If you can ssh to a jumphost, and reach the workers from there, then
3380 the obvious solution would be this, but it does not work:
3381
3382 parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
3383
It does not work because the command is dequoted by ssh twice whereas
GNU parallel only expects it to be dequoted once.
3386
3387 So instead put this in ~/.ssh/config:
3388
3389 Host host1 host2 host3
3390 ProxyCommand ssh jumphost.domain nc -w 1 %h 22
3391
It requires nc (netcat) to be installed on jumphost. With this you can
simply run:
3394
3395 parallel -S host1,host2,host3 echo ::: This does work
3396
3397 No jumphost, but port forwards
3398 If there is no jumphost but each server has port 22 forwarded from the
3399 firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 =
3400 host2, 22003 = host3) then you can use ~/.ssh/config:
3401
3402 Host host1.v
3403 Port 22001
3404 Host host2.v
3405 Port 22002
3406 Host host3.v
3407 Port 22003
3408 Host *.v
3409 Hostname firewall
3410
3411 And then use host{1..3}.v as normal hosts:
3412
3413 parallel -S host1.v,host2.v,host3.v echo ::: a b c
3414
3415 No jumphost, no port forwards
If ports cannot be forwarded, you need some sort of VPN to traverse the
NAT-wall. TOR is one option for that, as it is very easy to get
working.
3419
You need to install TOR and set up a hidden service. In torrc put:
3421
3422 HiddenServiceDir /var/lib/tor/hidden_service/
3423 HiddenServicePort 22 127.0.0.1:22
3424
3425 Then start TOR: /etc/init.d/tor restart
3426
3427 The TOR hostname is now in /var/lib/tor/hidden_service/hostname and is
3428 something similar to izjafdceobowklhz.onion. Now you simply prepend
3429 torsocks to ssh:
3430
3431 parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
3432 -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
3433
3434 If not all hosts are accessible through TOR:
3435
3436 parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
3437 echo ::: a b c
3438
3439 See more ssh tricks on
3440 https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
3441
3443 rsync is a great tool, but sometimes it will not fill up the available
3444 bandwidth. Running multiple rsync in parallel can fix this.
3445
3446 cd src-dir
3447 find . -type f |
3448 parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
3449
3450 Adjust -j10 until you find the optimal number.
3451
rsync -R will create the needed subdirectories, so the files are not
all put into a single dir. The ./ is needed so the resulting command
looks similar to:
3455
3456 rsync -zR ././sub/dir/file fooserver:/dest-dir/
3457
3458 The /./ is what rsync -R works on.
3459
3460 If you are unable to push data, but need to pull them and the files are
3461 called digits.png (e.g. 000000.png) you might be able to do:
3462
3463 seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
3464
3466 Copy files like foo.es.ext to foo.ext:
3467
3468 ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
3469
3470 The perl command spits out 2 lines for each input. GNU parallel takes 2
3471 inputs (using -N2) and replaces {1} and {2} with the inputs.
3472
3473 Count in binary:
3474
3475 parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
3476
3477 Print the number on the opposing sides of a six sided die:
3478
3479 parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
3480 parallel --link echo :::: <(seq 6) <(seq 6 -1 1)
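
For comparison, the same pairing can be produced with paste, which
also joins line N of one input with line N of the other. This is only
a sketch of the pairing and not of how GNU parallel implements --link;
it assumes bash (for process substitution), and the function name
opposite_sides is made up for this example.

```bash
# Pair line N of the first list with line N of the second list.
opposite_sides() {
  paste -d' ' <(seq 6) <(seq 6 -1 1)
}

opposite_sides
```

Each output line holds two numbers that sum to 7.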
3481
3482 Convert files from all subdirs to PNG-files with consecutive numbers
3483 (useful for making input PNG's for ffmpeg):
3484
3485 parallel --link -a <(find . -type f | sort) \
3486 -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
3487
3488 Alternative version:
3489
3490 find . -type f | sort | parallel convert {} {#}.png
3491
3493 Content of table_file.tsv:
3494
3495 foo<TAB>bar
3496 baz <TAB> quux
3497
3498 To run:
3499
3500 cmd -o bar -i foo
3501 cmd -o quux -i baz
3502
3503 you can run:
3504
3505 parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
3506
3507 Note: The default for GNU parallel is to remove the spaces around the
3508 columns. To keep the spaces:
3509
3510 parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
3511
3513 GNU parallel can output to a database table and a CSV-file:
3514
3515 DBURL=csv:///%2Ftmp%2Fmy.csv
3516 DBTABLEURL=$DBURL/mytable
3517 parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3518
3519 It is rather slow and takes up a lot of CPU time because GNU parallel
3520 parses the whole CSV file for each update.
3521
A better approach is to use an SQLite database and then convert that
to CSV:
3524
3525 DBURL=sqlite3:///%2Ftmp%2Fmy.sqlite
3526 DBTABLEURL=$DBURL/mytable
3527 parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3528 sql $DBURL '.headers on' '.mode csv' 'SELECT * FROM mytable;'
3529
3530 This takes around a second per job.
3531
3532 If you have access to a real database system, such as PostgreSQL, it is
3533 even faster:
3534
3535 DBURL=pg://user:pass@host/mydb
3536 DBTABLEURL=$DBURL/mytable
3537 parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3538 sql $DBURL \
3539 "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
3540
3541 Or MySQL:
3542
3543 DBURL=mysql://user:pass@host/mydb
3544 DBTABLEURL=$DBURL/mytable
3545 parallel --sqlandworker $DBTABLEURL seq ::: {1..10}
3546 sql -p -B $DBURL "SELECT * FROM mytable;" > mytable.tsv
3547 perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/; s/\\\\/\\/g;
3548 s/\\t/\t/g; s/\\n/\n/g;' mytable.tsv
3549
If you have no need for the advanced job distribution control that a
database provides, but you simply want output into a CSV file that you
can read into R or LibreOffice Calc, then you can use --results:
3554
3555 parallel --results my.csv seq ::: 10 20 30
3556 R
3557 > mydf <- read.csv("my.csv");
3558 > print(mydf[2,])
3559 > write(as.character(mydf[2,c("Stdout")]),'')
3560
3562 The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
3563 podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
3564
3565 Using xpath you can extract the URLs for 2019 and download them using
3566 GNU parallel:
3567
3568 wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
3569 xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
3570 parallel -u wget '{= s/ url="//; s/"//; =}'
3571
3573 If you want to run the same command with the same arguments 10 times in
3574 parallel you can do:
3575
3576 seq 10 | parallel -n0 my_command my_args
3577
3579 GNU parallel can work similar to cat | sh.
3580
3581 A resource inexpensive job is a job that takes very little CPU, disk
3582 I/O and network I/O. Ping is an example of a resource inexpensive job.
3583 wget is too - if the webpages are small.
3584
3585 The content of the file jobs_to_run:
3586
3587 ping -c 1 10.0.0.1
3588 wget http://example.com/status.cgi?ip=10.0.0.1
3589 ping -c 1 10.0.0.2
3590 wget http://example.com/status.cgi?ip=10.0.0.2
3591 ...
3592 ping -c 1 10.0.0.255
3593 wget http://example.com/status.cgi?ip=10.0.0.255
3594
3595 To run 100 processes simultaneously do:
3596
3597 parallel -j 100 < jobs_to_run
3598
As no command is given, the jobs will be evaluated by the shell.
3600
3602 To process a big file or some output you can use --pipe to split up the
3603 data into blocks and pipe the blocks into the processing program.
3604
3605 If the program is gzip -9 you can do:
3606
3607 cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
3608
This will split bigfile into blocks of 1 MB and pass each block to
gzip -9 in parallel. One gzip will be run per CPU. The output of
gzip -9 will be kept in order and saved to bigfile.gz.
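
The reason the gzip output can simply be appended can be checked without GNU parallel. The following is a minimal sketch, assuming seq, gzip and cmp are installed; the two halves stand in for two blocks and the file names are made up for the illustration:

```shell
tmp=$(mktemp -d)
seq 1 2000 > "$tmp/bigfile"

# Stand-ins for two blocks that would be handed to separate gzip jobs
head -n 1000 "$tmp/bigfile" | gzip -9 > "$tmp/part1.gz"
tail -n 1000 "$tmp/bigfile" | gzip -9 > "$tmp/part2.gz"

# Appending the compressed blocks in input order gives one valid gzip file
cat "$tmp/part1.gz" "$tmp/part2.gz" > "$tmp/bigfile.gz"

# Decompress and compare with the original
if gzip -dc "$tmp/bigfile.gz" | cmp -s - "$tmp/bigfile"; then
  roundtrip=identical
else
  roundtrip=different
fi
echo "$roundtrip"
rm -rf "$tmp"
```

The order of the parts matters, which is why the GNU parallel command uses -k.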
3612
3613 gzip works fine if the output is appended, but some processing does not
3614 work like that - for example sorting. For this GNU parallel can put the
3615 output of each command into a file. This will sort a big file in
3616 parallel:
3617
3618 cat bigfile | parallel --pipe --files sort |\
3619 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3620
3621 Here bigfile is split into blocks of around 1MB, each block ending in
3622 '\n' (which is the default for --recend). Each block is passed to sort
3623 and the output from sort is saved into files. These files are passed to
3624 the second parallel that runs sort -m on the files before it removes
3625 the files. The output is saved to bigfile.sort.
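
The split, sort and merge steps can be sketched with standard tools only. This is an illustration of the idea, not of how GNU parallel does it: the blocks are split by line count instead of size, and the data and file names are hypothetical:

```shell
tmp=$(mktemp -d)

# Hypothetical test data: 1000 pseudo-random integers
awk 'BEGIN { srand(1); for (i = 0; i < 1000; i++) print int(rand() * 100000) }' \
  > "$tmp/bigfile"

# Split into blocks of 250 lines each
split -l 250 "$tmp/bigfile" "$tmp/chunk."

# Sort every block in the background, then wait for all of them
for f in "$tmp"/chunk.*; do
  sort "$f" > "$f.sorted" &
done
wait

# Merge the already sorted files; sort -m does not sort them again
sort -m "$tmp"/chunk.*.sorted > "$tmp/bigfile.sort"

# Compare with sorting the whole file in one go
if sort "$tmp/bigfile" | cmp -s - "$tmp/bigfile.sort"; then
  merged=matches
else
  merged=differs
fi
echo "$merged"
rm -rf "$tmp"
```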
3626
3627 GNU parallel's --pipe maxes out at around 100 MB/s because every byte
3628 has to be copied through GNU parallel. But if bigfile is a real
3629 (seekable) file GNU parallel can by-pass the copying and send the parts
3630 directly to the program:
3631
3632 parallel --pipepart --block 100m -a bigfile --files sort |\
3633 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
3634
3636 When processing with --pipe you may have lines grouped by a value. Here
3637 is my.csv:
3638
3639 Transaction Customer Item
3640 1 a 53
3641 2 b 65
3642 3 b 82
3643 4 c 96
3644 5 c 67
3645 6 c 13
3646 7 d 90
3647 8 d 43
3648 9 d 91
3649 10 d 84
3650 11 e 72
3651 12 e 102
3652 13 e 63
3653 14 e 56
3654 15 e 74
3655
3656 Let us assume you want GNU parallel to process each customer. In other
3657 words: You want all the transactions for a single customer to be
3658 treated as a single record.
3659
3660 To do this we preprocess the data with a program that inserts a record
3661 separator before each customer (column 2 = $F[1]). Here we first make a
3662 50 character random string, which we then use as the separator:
3663
3664 sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
3665 cat my.csv | \
3666 perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
3667 parallel --recend $sep --rrs --pipe -N1 wc
3668
3669 If your program can process multiple customers replace -N1 with a
3670 reasonable --blocksize.
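
To see what the preprocessing step does, here is a small sketch using awk instead of perl and a visible separator instead of a random string. It puts the separator between customers rather than before each one, and it uses only the first six transactions from the table:

```shell
# Print "----" whenever the customer (column 2) changes
marked=$(printf '%s\n' '1 a 53' '2 b 65' '3 b 82' '4 c 96' '5 c 67' '6 c 13' |
  awk 'NR > 1 && $2 != last { print "----" } { last = $2; print }')
echo "$marked"

# Number of records = number of separators + 1
records=$(( $(echo "$marked" | grep -c '^----$') + 1 ))
echo "records: $records"
```

Each of the three records (customers a, b and c) would be passed to its own job.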
3671
If you need to run a massive number of jobs in parallel, then you will
likely hit the filehandle limit which is often around 250 jobs. If you
are the superuser you can raise the limit in /etc/security/limits.conf,
but you can also use this workaround. The filehandle limit is per
process. That means that if you just spawn more GNU parallels then each
of them can run 250 jobs. This will spawn up to 2500 jobs:
3679
3680 cat myinput |\
3681 parallel --pipe -N 50 --roundrobin -j50 parallel -j50 your_prg
3682
3683 This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
3684 to do this, and you may need to increase /proc/sys/kernel/pid_max):
3685
3686 cat myinput |\
3687 parallel --pipe -N 250 --roundrobin -j250 parallel -j250 your_prg
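
The numbers follow from multiplying the jobslots of the outer GNU parallel by the jobslots of each inner one. A small sketch (the function name max_jobs is made up) that also shows the filehandle limit of the current shell:

```shell
# Per-process limit on open files for the current shell
fd_limit=$(ulimit -n)
echo "open file limit: $fd_limit"

# max_jobs OUTER INNER: upper bound on simultaneously running jobs
max_jobs() {
  echo $(( $1 * $2 ))
}

max_jobs 50 50      # 2500
max_jobs 250 250    # 62500
```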
3688
3690 The command sem is an alias for parallel --semaphore.
3691
A counting semaphore will allow a given number of jobs to be started in
the background. When that number of jobs is running in the background,
GNU sem will wait for one of these to complete before starting another
command. sem --wait will wait for all jobs to complete.
3696
3697 Run 10 jobs concurrently in the background:
3698
3699 for i in *.log ; do
3700 echo $i
3701 sem -j10 gzip $i ";" echo done
3702 done
3703 sem --wait
3704
A mutex is a counting semaphore allowing only one job to run. This will
edit the file myfile and prepend the file with lines with the numbers
1 to 3.
3708
3709 seq 3 | parallel sem sed -i -e '1i{}' myfile
3710
3711 As myfile can be very big it is important only one process edits the
3712 file at the same time.
3713
3714 Name the semaphore to have multiple different semaphores active at the
3715 same time:
3716
3717 seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
3718
3720 Assume a script is called from cron or from a web service, but only one
3721 instance can be run at a time. With sem and --shebang-wrap the script
3722 can be made to wait for other instances to finish. Here in bash:
3723
3724 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
3725
3726 echo This will run
3727 sleep 5
3728 echo exclusively
3729
3730 Here perl:
3731
3732 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl
3733
3734 print "This will run ";
3735 sleep 5;
3736 print "exclusively\n";
3737
3738 Here python:
3739
3740 #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python
3741
3742 import time
print("This will run")
time.sleep(5)
print("exclusively")
3746
3748 You can use GNU parallel to start interactive programs like emacs or
3749 vi:
3750
3751 cat filelist | parallel --tty -X emacs
3752 cat filelist | parallel --tty -X vi
3753
3754 If there are more files than will fit on a single command line, the
3755 editor will be started again with the remaining files.
3756
3758 sudo requires a password to run a command as root. It caches the
3759 access, so you only need to enter the password again if you have not
3760 used sudo for a while.
3761
3762 The command:
3763
3764 parallel sudo echo ::: This is a bad idea
3765
3766 is no good, as you would be prompted for the sudo password for each of
3767 the jobs. You can either do:
3768
3769 sudo echo This
3770 parallel sudo echo ::: is a good idea
3771
3772 or:
3773
3774 sudo parallel echo ::: This is a good idea
3775
3776 This way you only have to enter the sudo password once.
3777
3779 GNU parallel can work as a simple job queue system or batch manager.
3780 The idea is to put the jobs into a file and have GNU parallel read from
3781 that continuously. As GNU parallel will stop at end of file we use tail
3782 to continue reading:
3783
3784 true >jobqueue; tail -n+0 -f jobqueue | parallel
3785
3786 To submit your jobs to the queue:
3787
3788 echo my_command my_arg >> jobqueue
3789
3790 You can of course use -S to distribute the jobs to remote computers:
3791
3792 true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
3793
3794 If you keep this running for a long time, jobqueue will grow. A way of
3795 removing the jobs already run is by making GNU parallel stop when it
3796 hits a special value and then restart. To use --eof to make GNU
3797 parallel exit, tail also needs to be forced to exit:
3798
3799 true >jobqueue;
3800 while true; do
3801 tail -n+0 -f jobqueue |
3802 (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
3803 perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
3804 (seq 1000 >> jobqueue &);
3805 echo Done appending dummy data forcing tail to exit)
3806 echo tail exited;
3807 mv j2 jobqueue
3808 done
3809
3810 In some cases you can run on more CPUs and computers during the night:
3811
3812 # Day time
3813 echo 50% > jobfile
3814 cp day_server_list ~/.parallel/sshloginfile
3815 # Night time
3816 echo 100% > jobfile
3817 cp night_server_list ~/.parallel/sshloginfile
3818 tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..
3819
3820 GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
3821
There is a small issue when using GNU parallel as queue system/batch
manager: You have to submit JobSlot number of jobs before they will
start, and after that you can submit one at a time, and the job will
start immediately if free slots are available. Output from the running
or completed jobs is held back and will only be printed when JobSlots
more jobs have been started (unless you use --ungroup or --line-buffer,
in which case the output from the jobs is printed immediately). E.g.
if you have 10 jobslots then the output from the first completed job
will only be printed when job 11 has started, and the output of the
second completed job will only be printed when job 12 has started.
3832
If you have a dir in which users drop files that need to be processed
you can do this on GNU/Linux (if you know what inotifywait is called on
other platforms, file a bug report):
3837
3838 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3839 parallel -u echo
3840
3841 This will run the command echo on each file put into my_dir or subdirs
3842 of my_dir.
3843
3844 You can of course use -S to distribute the jobs to remote computers:
3845
3846 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
3847 parallel -S .. -u echo
3848
3849 If the files to be processed are in a tar file then unpacking one file
3850 and processing it immediately may be faster than first unpacking all
3851 files. Set up the dir processor as above and unpack into the dir.
3852
3853 Using GNU parallel as dir processor has the same limitations as using
3854 GNU parallel as queue system/batch manager.
3855
3857 If you have downloaded source and tried compiling it, you may have
3858 seen:
3859
3860 $ ./configure
3861 [...]
3862 checking for something.h... no
3863 configure: error: "libsomething not found"
3864
3865 Often it is not obvious which package you should install to get that
3866 file. Debian has `apt-file` to search for a file. `tracefile` from
3867 https://gitlab.com/ole.tange/tangetools can tell which files a program
3868 tried to access. In this case we are interested in one of the last
3869 files:
3870
3871 $ tracefile -un ./configure | tail | parallel -j0 apt-file search
3872
3874 --round-robin, --pipe-part, --shard, --bin and --group-by are all
3875 specialized versions of --pipe.
3876
3877 In the following n is the number of jobslots given by --jobs. A record
3878 starts with --recstart and ends with --recend. It is typically a full
3879 line. A chunk is a number of full records that is approximately the
3880 size of a block. A block can contain half records, a chunk cannot.
3881
3882 --pipe starts one job per chunk. It reads blocks from stdin (standard
3883 input). It finds a record end near a block border and passes a chunk to
3884 the program.
3885
3886 --pipe-part starts one job per chunk - just like normal --pipe. It
3887 first finds record endings near all block borders in the file and then
3888 starts the jobs. By using --block -1 it will set the block size to 1/n
3889 * size-of-file. Used this way it will start n jobs in total.
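
The block size that --block -1 results in can be worked out by hand. A minimal sketch, assuming a hypothetical 1000-byte file and 4 jobslots:

```shell
tmp=$(mktemp -d)

# Hypothetical input: 100 lines of 10 bytes each (9 digits + newline)
awk 'BEGIN { for (i = 0; i < 100; i++) print "123456789" }' > "$tmp/bigfile"

n=4                                  # number of jobslots given by --jobs
size=$(wc -c < "$tmp/bigfile")       # size of the file in bytes
block=$((size / n))                  # 1/n * size-of-file
echo "block size: $block bytes"
rm -rf "$tmp"
```

The chunks actually passed to the jobs will differ slightly from this size, because a chunk only contains full records.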
3890
3891 --round-robin starts n jobs in total. It reads a block and passes a
3892 chunk to whichever job is ready to read. It does not parse the content
3893 except for identifying where a record ends to make sure it only passes
3894 full records.
3895
3896 --shard starts n jobs in total. It parses each line to read the value
3897 in the given column. Based on this value the line is passed to one of
3898 the n jobs. All lines having this value will be given to the same
3899 jobslot.
3900
--bin works like --shard but the value of the column is the jobslot
number it will be passed to. If the value is bigger than n, then n will
be subtracted from the value until the value is smaller than or equal
to n.
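
The subtraction rule for --bin can be written as a small shell function. This is a sketch of the rule as described here, not code from GNU parallel; the name bin_slot is made up and the value is assumed to be a positive integer:

```shell
# bin_slot VALUE N: jobslot for a record whose column value is VALUE
# when there are N jobslots
bin_slot() {
  value=$1
  n=$2
  # Subtract n until the value is smaller than or equal to n
  while [ "$value" -gt "$n" ]; do
    value=$((value - n))
  done
  echo "$value"
}

bin_slot 2 3    # 2: already within 1..3
bin_slot 6 3    # 3: 6 - 3
bin_slot 7 3    # 1: 7 - 3 - 3
```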
3905
3906 --group-by starts one job per chunk. Record borders are not given by
3907 --recend/--recstart. Instead a record is defined by a number of lines
3908 having the same value in a given column. So the value of a given column
3909 changes at a chunk border. With --pipe every line is parsed, with
3910 --pipe-part only a few lines are parsed to find the chunk border.
3911
3912 --group-by can be combined with --round-robin or --pipe-part.
3913
3915 GNU parallel is very liberal in quoting. You only need to quote
3916 characters that have special meaning in shell:
3917
3918 ( ) $ ` ' " < > ; | \
3919
and depending on context these need to be quoted, too:
3921
3922 ~ & # ! ? space * {
3923
3924 Therefore most people will never need more quoting than putting '\' in
3925 front of the special characters.
3926
3927 Often you can simply put \' around every ':
3928
3929 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
3930
3931 can be quoted:
3932
3933 parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\' ::: file
3934
3935 However, when you want to use a shell variable you need to quote the
3936 $-sign. Here is an example using $PARALLEL_SEQ. This variable is set by
3937 GNU parallel itself, so the evaluation of the $ must be done by the sub
3938 shell started by GNU parallel:
3939
3940 seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}
3941
3942 If the variable is set before GNU parallel starts you can do this:
3943
3944 VAR=this_is_set_before_starting
3945 echo test | parallel echo {} $VAR
3946
3947 Prints: test this_is_set_before_starting
3948
3949 It is a little more tricky if the variable contains more than one space
3950 in a row:
3951
VAR="two  spaces  between  each  word"
echo test | parallel echo {} \'"$VAR"\'

Prints: test two  spaces  between  each  word
3956
3957 If the variable should not be evaluated by the shell starting GNU
3958 parallel but be evaluated by the sub shell started by GNU parallel,
3959 then you need to quote it:
3960
3961 echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR
3962
3963 Prints: test this_is_set_after_starting
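
The difference between the two cases can be tried out without GNU parallel. In this sketch sh -c stands in for the sub shell that GNU parallel starts; the variable values are made up:

```shell
VAR=set_by_outer_shell

# Unquoted $: the outer shell expands $VAR before the child shell starts
outer_result=$(sh -c "VAR=set_by_child_shell; echo $VAR")

# Quoted \$: the child shell receives a literal $VAR and expands it itself
child_result=$(sh -c "VAR=set_by_child_shell; echo \$VAR")

echo "unquoted: $outer_result"
echo "quoted:   $child_result"
```

The first line shows the value set before the child shell started, the second the value set inside it.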
3964
3965 It is a little more tricky if the variable contains space:
3966
echo test |\
parallel VAR='"two  spaces  between  each  word"' echo {} \'"$VAR"\'

Prints: test two  spaces  between  each  word
3971
3972 $$ is the shell variable containing the process id of the shell. This
3973 will print the process id of the shell running GNU parallel:
3974
3975 seq 10 | parallel echo $$
3976
3977 And this will print the process ids of the sub shells started by GNU
3978 parallel.
3979
3980 seq 10 | parallel echo \$\$
3981
If the special characters should not be evaluated by the sub shell then
you need to protect them against evaluation from both the shell
starting GNU parallel and the sub shell:
3985
3986 echo test | parallel echo {} \\\$VAR
3987
3988 Prints: test $VAR
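
The two layers of evaluation can be followed step by step with sh -c as a stand-in for the sub shell. This is only a sketch of the shell mechanics. Here the command is inside double quotes, which follow the same backslash rules for \\ and \$:

```shell
# The outer shell turns \\\$VAR into \$VAR;
# the child shell then turns \$VAR into the literal text $VAR
literal=$(sh -c "echo \\\$VAR")
echo "$literal"

# $$ evaluated by the outer shell and by the child shell
outer_pid=$$
child_pid=$(sh -c 'echo $$')
echo "outer shell: $outer_pid, child shell: $child_pid"
```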
3989
3990 GNU parallel can protect against evaluation by the sub shell by using
3991 -q:
3992
3993 echo test | parallel -q echo {} \$VAR
3994
3995 Prints: test $VAR
3996
3997 This is particularly useful if you have lots of quoting. If you want to
3998 run a perl script like this:
3999
4000 perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
4001
4002 It needs to be quoted like one of these:
4003
4004 ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'
4005 ls | parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\'
4006
4007 Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU parallel
4008 can do the quoting by using option -q:
4009
4010 ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'
4011
4012 However, this means you cannot make the sub shell interpret special
4013 characters. For example because of -q this WILL NOT WORK:
4014
4015 ls *.gz | parallel -q "zcat {} >{.}"
4016 ls *.gz | parallel -q "zcat {} | bzip2 >{.}.bz2"
4017
4018 because > and | need to be interpreted by the sub shell.
4019
4020 If you get errors like:
4021
4022 sh: -c: line 0: syntax error near unexpected token
4023 sh: Syntax error: Unterminated quoted string
4024 sh: -c: line 0: unexpected EOF while looking for matching `''
4025 sh: -c: line 1: syntax error: unexpected end of file
4026 zsh:1: no matches found:
4027
4028 then you might try using -q.
4029
4030 If you are using bash process substitution like <(cat foo) then you may
4031 try -q and prepending command with bash -c:
4032
4033 ls | parallel -q bash -c 'wc -c <(echo {})'
4034
4035 Or for substituting output:
4036
4037 ls | parallel -q bash -c \
4038 'tar c {} | tee >(gzip >{}.tar.gz) | bzip2 >{}.tar.bz2'
4039
4040 Conclusion: To avoid dealing with the quoting problems it may be easier
4041 just to write a small script or a function (remember to export -f the
4042 function) and have GNU parallel call that.
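
A minimal sketch of the small-script approach follows. The script name two_fields.sh and the test files are made up, awk is used as a rough equivalent of the perl one-liner, and a plain loop stands in for GNU parallel so the sketch runs anywhere:

```shell
tmp=$(mktemp -d)

# The job logic lives in its own file, so it needs no extra quoting layer.
# It prints the file name for every line with exactly two fields.
cat > "$tmp/two_fields.sh" <<'EOF'
#!/bin/sh
awk 'NF == 2 { print FILENAME }' "$1"
EOF
chmod +x "$tmp/two_fields.sh"

printf 'a b\n' > "$tmp/match.txt"
printf 'a b c\n' > "$tmp/nomatch.txt"

# With GNU parallel the call would be:
#   ls "$tmp"/*.txt | parallel "$tmp/two_fields.sh"
matches=$(for f in "$tmp"/*.txt; do sh "$tmp/two_fields.sh" "$f"; done)
echo "$matches"
rm -rf "$tmp"
```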
4043
4045 If you want a list of the jobs currently running you can run:
4046
4047 killall -USR1 parallel
4048
4049 GNU parallel will then print the currently running jobs on stderr
4050 (standard error).
4051
4053 If you regret starting a lot of jobs you can simply break GNU parallel,
4054 but if you want to make sure you do not have half-completed jobs you
4055 should send the signal SIGHUP to GNU parallel:
4056
4057 killall -HUP parallel
4058
4059 This will tell GNU parallel to not start any new jobs, but wait until
4060 the currently running jobs are finished before exiting.
4061
4063 $PARALLEL_HOME
4064 Dir where GNU parallel stores config files, semaphores, and
4065 caches information between invocations. Default:
4066 $HOME/.parallel.
4067
4068 $PARALLEL_PID
4069 The environment variable $PARALLEL_PID is set by GNU parallel
4070 and is visible to the jobs started from GNU parallel. This
4071 makes it possible for the jobs to communicate directly to GNU
4072 parallel. Remember to quote the $, so it gets evaluated by
4073 the correct shell.
4074
Example: If each of the jobs tests a solution and one of the jobs
finds the solution the job can tell GNU parallel not to start
more jobs by: kill -HUP $PARALLEL_PID. This only works on the
local computer.
4079
4080 $PARALLEL_RSYNC_OPTS
4081 Options to pass on to rsync. Defaults to: -rlDzR.
4082
4083 $PARALLEL_SHELL
4084 Use this shell for the commands run by GNU parallel:
4085
4086 · $PARALLEL_SHELL. If undefined use:
4087
4088 · The shell that started GNU parallel. If that cannot be
4089 determined:
4090
4091 · $SHELL. If undefined use:
4092
4093 · /bin/sh
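
The fallback order can be sketched as a shell function. This is an illustration only: the name pick_shell is made up, an empty variable is treated like an undefined one, and the step "the shell that started GNU parallel" is left out because it needs process inspection:

```shell
# pick_shell: print the shell that the fallback order would select
pick_shell() {
  if [ -n "${PARALLEL_SHELL:-}" ]; then
    echo "$PARALLEL_SHELL"
  elif [ -n "${SHELL:-}" ]; then
    echo "$SHELL"
  else
    echo /bin/sh
  fi
}

pick_shell
```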
4094
4095 $PARALLEL_SSH
4096 GNU parallel defaults to using ssh for remote access. This can
4097 be overridden with $PARALLEL_SSH, which again can be
4098 overridden with --ssh. It can also be set on a per server
4099 basis (see --sshlogin).
4100
4101 $PARALLEL_SSHLOGIN (beta testing)
4102 The environment variable $PARALLEL_SSHLOGIN is set by GNU
4103 parallel and is visible to the jobs started from GNU parallel.
4104 The value is the sshlogin line with number of cores removed.
4105 E.g.
4106
4107 4//usr/bin/specialssh user@host
4108
4109 becomes:
4110
4111 /usr/bin/specialssh user@host
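
The same transformation can be sketched with sed. This is an illustration of the result, not of how GNU parallel computes it:

```shell
sshlogin_line='4//usr/bin/specialssh user@host'

# Strip a leading "<number>/" (the number of cores) if present
parallel_sshlogin=$(printf '%s\n' "$sshlogin_line" | sed 's|^[0-9][0-9]*/||')
echo "$parallel_sshlogin"
```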
4112
4113 $PARALLEL_SEQ
4114 $PARALLEL_SEQ will be set to the sequence number of the job
4115 running. Remember to quote the $, so it gets evaluated by the
4116 correct shell.
4117
4118 Example:
4119
4120 seq 10 | parallel -N2 \
4121 echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}
4122
4123 $PARALLEL_TMUX
4124 Path to tmux. If unset the tmux in $PATH is used.
4125
4126 $TMPDIR Directory for temporary files. See: --tmpdir.
4127
4128 $PARALLEL
The environment variable $PARALLEL will be used as default
options for GNU parallel. If the variable contains special
shell characters (e.g. $, *, or space) then these need to be
escaped with \.
4133
4134 Example:
4135
4136 cat list | parallel -j1 -k -v ls
4137 cat list | parallel -j1 -k -v -S"myssh user@server" ls
4138
4139 can be written as:
4140
4141 cat list | PARALLEL="-kvj1" parallel ls
4142 cat list | PARALLEL='-kvj1 -S myssh\ user@server' \
4143 parallel echo
4144
4145 Notice the \ in the middle is needed because 'myssh' and
4146 'user@server' must be one argument.
4147
4149 The global configuration file /etc/parallel/config, followed by user
4150 configuration file ~/.parallel/config (formerly known as .parallelrc)
4151 will be read in turn if they exist. Lines starting with '#' will be
4152 ignored. The format can follow that of the environment variable
4153 $PARALLEL, but it is often easier to simply put each option on its own
4154 line.
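
A config file with one option per line could look like the one written below. The sketch writes it to a temporary file instead of ~/.parallel/config, and the grep is only a stand-in to show which lines remain once the '#' lines are ignored:

```shell
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Keep the output in the same order as the input
--keep-order
# The same -j-1 as used in the profile examples
-j-1
EOF

# Drop the comment lines and join the remaining options
opts=$(grep -v '^#' "$cfg" | tr '\n' ' ')
echo "$opts"
rm -f "$cfg"
```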
4155
4156 Options on the command line take precedence, followed by the
4157 environment variable $PARALLEL, user configuration file
4158 ~/.parallel/config, and finally the global configuration file
4159 /etc/parallel/config.
4160
4161 Note that no file that is read for options, nor the environment
4162 variable $PARALLEL, may contain retired options such as --tollef.
4163
If --profile is set, GNU parallel will read the profile from that file
rather than the global or user configuration files. You can have
multiple --profiles.
4168
4169 Profiles are searched for in ~/.parallel. If the name starts with / it
4170 is seen as an absolute path. If the name starts with ./ it is seen as a
4171 relative path from current dir.
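
The lookup rule can be sketched as a shell function. The name profile_path and the example profile names are made up, and $HOME/.parallel is assumed for ~/.parallel:

```shell
# profile_path NAME: file that a profile called NAME refers to
profile_path() {
  case $1 in
    /*)  echo "$1" ;;                    # absolute path
    ./*) echo "$1" ;;                    # relative to the current dir
    *)   echo "$HOME/.parallel/$1" ;;    # searched for in ~/.parallel
  esac
}

profile_path nice_profile
profile_path /etc/parallel/myprofile
profile_path ./local_profile
```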
4172
4173 Example: Profile for running a command on every sshlogin in
4174 ~/.ssh/sshlogins and prepend the output with the sshlogin:
4175
4176 echo --tag -S .. --nonall > ~/.parallel/n
4177 parallel -Jn uptime
4178
Example: Profile for running every command with -j-1 and nice:
4180
4181 echo -j-1 nice > ~/.parallel/nice_profile
4182 parallel -J nice_profile bzip2 -9 ::: *
4183
4184 Example: Profile for running a perl script before every command:
4185
4186 echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" \
4187 > ~/.parallel/pre_perl
4188 parallel -J pre_perl echo ::: *
4189
4190 Note how the $ and " need to be quoted using \.
4191
4192 Example: Profile for running distributed jobs with nice on the remote
4193 computers:
4194
4195 echo -S .. nice > ~/.parallel/dist
4196 parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *
4197
4199 Exit status depends on --halt-on-error if one of these is used:
4200 success=X, success=Y%, fail=Y%.
4201
4202 0 All jobs ran without error. If success=X is used: X jobs ran
4203 without error. If success=Y% is used: Y% of the jobs ran without
4204 error.
4205
4206 1-100 Some of the jobs failed. The exit status gives the number of
4207 failed jobs. If Y% is used the exit status is the percentage of
4208 jobs that failed.
4209
4210 101 More than 100 jobs failed.
4211
4212 255 Other error.
4213
4214 -1 (In joblog and SQL table)
4215 Killed by Ctrl-C, timeout, not enough memory or similar.
4216
4217 -2 (In joblog and SQL table)
4218 skip() was called in {= =}.
4219
4220 -1000 (In SQL table)
4221 Job is ready to run (set by --sqlmaster).
4222
4223 -1220 (In SQL table)
4224 Job is taken by worker (set by --sqlworker).
4225
4226 If fail=1 is used, the exit status will be the exit status of the
4227 failing job.
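
For the default case, where none of the --halt-on-error variants above are used, the mapping from the number of failed jobs to the exit status can be sketched as a function (the name exit_status is made up):

```shell
# exit_status FAILED: exit status when FAILED jobs have failed
exit_status() {
  failed=$1
  if [ "$failed" -gt 100 ]; then
    echo 101            # more than 100 jobs failed
  else
    echo "$failed"      # 0 = all jobs ran without error, 1-100 = failed jobs
  fi
}

exit_status 0      # 0
exit_status 7      # 7
exit_status 250    # 101
```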
4228
4230 See: man parallel_alternatives
4231
4233 Quoting of newline
4234 Because of the way newline is quoted this will not work:
4235
4236 echo 1,2,3 | parallel -vkd, "echo 'a{}b'"
4237
4238 However, these will all work:
4239
4240 echo 1,2,3 | parallel -vkd, echo a{}b
4241 echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
4242 echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"
4243
4244 Speed
4245 Startup
4246
4247 GNU parallel is slow at starting up - around 250 ms the first time and
4248 150 ms after that.
4249
4250 Job startup
4251
4252 Starting a job on the local machine takes around 10 ms. This can be a
4253 big overhead if the job takes very few ms to run. Often you can group
4254 small jobs together using -X which will make the overhead less
4255 significant. Or you can run multiple GNU parallels as described in
4256 EXAMPLE: Speeding up fast jobs.
4257
4258 SSH
4259
4260 When using multiple computers GNU parallel opens ssh connections to
4261 them to figure out how many connections can be used reliably
4262 simultaneously (Namely SSHD's MaxStartups). This test is done for each
4263 host in serial, so if your --sshloginfile contains many hosts it may be
4264 slow.
4265
4266 If your jobs are short you may see that there are fewer jobs running on
4267 the remote systems than expected. This is due to time spent logging in
4268 and out. -M may help here.
4269
4270 Disk access
4271
4272 A single disk can normally read data faster if it reads one file at a
4273 time instead of reading a lot of files in parallel, as this will avoid
4274 disk seeks. However, newer disk systems with multiple drives can read
4275 faster if reading from multiple files in parallel.
4276
If the jobs are of the form read-all-compute-all-write-all, so
everything is read before anything is written, it may be faster to
force only one disk access at a time:
4280
4281 sem --id diskio cat file | compute | sem --id diskio cat > file
4282
If the jobs are of the form read-compute-write, so writing starts
before all reading is done, it may be faster to force only one reader
and writer at a time:
4286
4287 sem --id read cat file | compute | sem --id write cat > file
4288
4289 If the jobs are of the form read-compute-read-compute, it may be faster
4290 to run more jobs in parallel than the system has CPUs, as some of the
4291 jobs will be stuck waiting for disk access.
4292
4293 --nice limits command length
4294 The current implementation of --nice is too pessimistic in the max
4295 allowed command length. It only uses a little more than half of what it
4296 could. This affects -X and -m. If this becomes a real problem for you,
4297 file a bug-report.
4298
4299 Aliases and functions do not work
4300 If you get:
4301
4302 Can't exec "command": No such file or directory
4303
4304 or:
4305
4306 open3: exec of by command failed
4307
4308 or:
4309
4310 /bin/bash: command: command not found
4311
4312 it may be because command is not known, but it could also be because
4313 command is an alias or a function. If it is a function you need to
4314 export -f the function first or use env_parallel. An alias will only
4315 work if you use env_parallel.
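
Why a function is not known to the jobs can be seen with a child shell as a stand-in for the shell GNU parallel starts. A minimal sketch, with a made-up function name:

```shell
my_greet_fn() { echo "hello $1"; }

# Works in the shell that defined the function
direct=$(my_greet_fn world)

# A child shell does not know the function and fails with
# "command not found" (exit status 127)
child_status=0
sh -c 'my_greet_fn world' 2>/dev/null || child_status=$?

echo "direct: $direct, child status: $child_status"
```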
4316
4317 Database with MySQL fails randomly
4318 The --sql* options may fail randomly with MySQL. This problem does not
4319 exist with PostgreSQL.
4320
4322 Report bugs to <bug-parallel@gnu.org> or
4323 https://savannah.gnu.org/bugs/?func=additem&group=parallel
4324
4325 See a perfect bug report on
4326 https://lists.gnu.org/archive/html/bug-parallel/2015-01/msg00000.html
4327
4328 Your bug report should always include:
4329
· The error message you get (if any). If the error message is not from
GNU parallel you need to show why you think GNU parallel caused
it.
4333
4334 · The complete output of parallel --version. If you are not running the
4335 latest released version (see http://ftp.gnu.org/gnu/parallel/) you
4336 should specify why you believe the problem is not fixed in that
4337 version.
4338
4339 · A minimal, complete, and verifiable example (See description on
4340 http://stackoverflow.com/help/mcve).
4341
4342 It should be a complete example that others can run that shows the
4343 problem including all files needed to run the example. This should
4344 preferably be small and simple, so try to remove as many options as
4345 possible. A combination of yes, seq, cat, echo, and sleep can
4346 reproduce most errors. If your example requires large files, see if
4347 you can make them by something like seq 1000000 > file or yes | head
4348 -n 10000000 > file.
4349
4350 If your example requires remote execution, see if you can use
4351 localhost - maybe using another login.
4352
4353 If you have access to a different system, test if the MCVE shows the
4354 problem on that system.
4355
4356 · The output of your example. If your problem is not easily reproduced
4357 by others, the output might help them figure out the problem.
4358
4359 · Whether you have watched the intro videos
4360 (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
4361 through the tutorial (man parallel_tutorial), and read the EXAMPLE
4362 section in the man page (man parallel - search for EXAMPLE:).
4363
4364 If you suspect the error is dependent on your environment or
4365 distribution, please see if you can reproduce the error on one of these
4366 VirtualBox images:
4367 http://sourceforge.net/projects/virtualboximage/files/
4368 http://www.osboxes.org/virtualbox-images/
4369
4370 Specifying the name of your distribution is not enough as you may have
4371 installed software that is not in the VirtualBox images.
4372
If you cannot reproduce the error on any of the VirtualBox images
above, see if you can build a VirtualBox image on which you can
reproduce the error. If not you should assume the debugging will be
done through you. That will put more burden on you and it is extra
important you give any information that helps. In general the problem
will be fixed faster and with less work for you if you can reproduce
the error on a VirtualBox.
4380
4382 When using GNU parallel for a publication please cite:
4383
4384 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
4385 The USENIX Magazine, February 2011:42-47.
4386
4387 This helps funding further development; and it won't cost you a cent.
4388 If you pay 10000 EUR you should feel free to use GNU Parallel without
4389 citing.
4390
4391 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
4392
4393 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
4394
4395 Copyright (C) 2010-2019 Ole Tange, http://ole.tange.dk and Free
4396 Software Foundation, Inc.
4397
Parts of the manual concerning xargs compatibility are inspired by the
manual of xargs from GNU findutils 4.4.2.
4400
4402 This program is free software; you can redistribute it and/or modify it
4403 under the terms of the GNU General Public License as published by the
4404 Free Software Foundation; either version 3 of the License, or at your
4405 option any later version.
4406
4407 This program is distributed in the hope that it will be useful, but
4408 WITHOUT ANY WARRANTY; without even the implied warranty of
4409 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
4410 General Public License for more details.
4411
4412 You should have received a copy of the GNU General Public License along
4413 with this program. If not, see <http://www.gnu.org/licenses/>.
4414
4415 Documentation license I
4416 Permission is granted to copy, distribute and/or modify this
4417 documentation under the terms of the GNU Free Documentation License,
4418 Version 1.3 or any later version published by the Free Software
4419 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
4420 with no Back-Cover Texts. A copy of the license is included in the
4421 file fdl.txt.
4422
4423 Documentation license II
4424 You are free:
4425
4426 to Share to copy, distribute and transmit the work
4427
4428 to Remix to adapt the work
4429
4430 Under the following conditions:
4431
4432 Attribution
4433 You must attribute the work in the manner specified by the
4434 author or licensor (but not in any way that suggests that they
4435 endorse you or your use of the work).
4436
4437 Share Alike
4438 If you alter, transform, or build upon this work, you may
4439 distribute the resulting work only under the same, similar or
4440 a compatible license.
4441
4442 With the understanding that:
4443
4444 Waiver Any of the above conditions can be waived if you get
4445 permission from the copyright holder.
4446
4447 Public Domain
4448 Where the work or any of its elements is in the public domain
4449 under applicable law, that status is in no way affected by the
4450 license.
4451
4452 Other Rights
4453 In no way are any of the following rights affected by the
4454 license:
4455
4456 · Your fair dealing or fair use rights, or other applicable
4457 copyright exceptions and limitations;
4458
4459 · The author's moral rights;
4460
4461 · Rights other persons may have either in the work itself or
4462 in how the work is used, such as publicity or privacy
4463 rights.
4464
4465 Notice For any reuse or distribution, you must make clear to others
4466 the license terms of this work.
4467
4468 A copy of the full license is included in the file as cc-by-sa.txt.
4469
4471 GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
4472 Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
4473 rsync with ssh.
4474
4476 ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1), rsync(1), find(1),
4477 xargs(1), dirname(1), make(1), pexec(1), ppss(1), xjobs(1), prll(1),
4478 dxargs(1), mdm(1)
4479
4480
4481
20190822 2019-09-16 PARALLEL(1)