PARALLEL_ALTERNATIVES(7)           parallel           PARALLEL_ALTERNATIVES(7)

NAME
   parallel_alternatives - Alternatives to GNU parallel

DESCRIPTION
   There are a lot of programs that share functionality with GNU
   parallel. Some of these are specialized tools, and while GNU
   parallel can emulate many of them, a specialized tool can be better
   at a given task. GNU parallel strives to include the best of the
   general functionality without sacrificing ease of use.

   parallel has existed since 2002-01-06 and as GNU parallel since
   2010. A lot of the alternatives have not had the vitality to survive
   that long, but have come and gone during that time.

   GNU parallel is actively maintained with a new release every month
   since 2010. Most other alternatives are fleeting interests of the
   developers with irregular releases and only maintained for a few
   years.

SUMMARY LEGEND
   The following features are in some of the comparable tools:

   Inputs

   I1. Arguments can be read from stdin
   I2. Arguments can be read from a file
   I3. Arguments can be read from multiple files
   I4. Arguments can be read from command line
   I5. Arguments can be read from a table
   I6. Arguments can be read from the same file using #! (shebang)
   I7. Line oriented input as default (quoting of special chars not
       needed)

   Manipulation of input

   M1. Composed command
   M2. Multiple arguments can fill up an execution line
   M3. Arguments can be put anywhere in the execution line
   M4. Multiple arguments can be put anywhere in the execution line
   M5. Arguments can be replaced with context
   M6. Input can be treated as the complete command line

   Outputs

   O1. Grouping output so output from different jobs do not mix
   O2. Send stderr (standard error) to stderr (standard error)
   O3. Send stdout (standard output) to stdout (standard output)
   O4. Order of output can be same as order of input
   O5. Stdout only contains stdout (standard output) from the command
   O6. Stderr only contains stderr (standard error) from the command
   O7. Buffering on disk
   O8. Cleanup of temporary files if killed
   O9. Test if disk runs full during run
   O10. Output of a line bigger than 4 GB

   Execution

   E1. Running jobs in parallel
   E2. List running jobs
   E3. Finish running jobs, but do not start new jobs
   E4. Number of running jobs can depend on number of cpus
   E5. Finish running jobs, but do not start new jobs after first
       failure
   E6. Number of running jobs can be adjusted while running
   E7. Only spawn new jobs if load is less than a limit

   Remote execution

   R1. Jobs can be run on remote computers
   R2. Basefiles can be transferred
   R3. Argument files can be transferred
   R4. Result files can be transferred
   R5. Cleanup of transferred files
   R6. No config files needed
   R7. Do not run more than SSHD's MaxStartups can handle
   R8. Configurable SSH command
   R9. Retry if connection breaks occasionally

   Semaphore

   S1. Possibility to work as a mutex
   S2. Possibility to work as a counting semaphore

   Legend

     -  = no
     x  = not applicable
     ID = yes

   As not every new version of the programs is tested, the table may be
   outdated. Please file a bug report if you find errors (see REPORTING
   BUGS).

   parallel:

     I1 I2 I3 I4 I5 I6 I7
     M1 M2 M3 M4 M5 M6
     O1 O2 O3 O4 O5 O6 O7 O8 O9 O10
     E1 E2 E3 E4 E5 E6 E7
     R1 R2 R3 R4 R5 R6 R7 R8 R9
     S1 S2

DIFFERENCES BETWEEN xargs AND GNU Parallel
   Summary (see legend above):

     I1 I2 - - - - -
     - M2 M3 - - -
     - O2 O3 - O5 O6
     E1 - - - - - -
     - - - - - x - - -
     - -

   xargs offers some of the same possibilities as GNU parallel.

   xargs deals badly with special characters (such as space, \, ' and
   "). To see the problem try this:

      touch important_file
      touch 'not important_file'
      ls not* | xargs rm
      mkdir -p "My brother's 12\" records"
      ls | xargs rmdir
      touch 'c:\windows\system32\clfs.sys'
      echo 'c:\windows\system32\clfs.sys' | xargs ls -l

   You can specify -0, but many input generators are not optimized for
   using NUL as separator, but are optimized for newline as separator:
   e.g. awk, ls, echo, tar -v, head (requires using -z), tail (requires
   using -z), sed (requires using -z), perl (-0 and \0 instead of \n),
   locate (requires using -0), find (requires using -print0), grep
   (requires using -z or -Z), and sort (requires using -z).
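   When every producer and consumer in the pipeline speaks NUL, any
   filename is safe. A minimal sketch of such a pipeline, using a
   throwaway directory (the directory and file names are illustrative):

```shell
# A filename with a space survives the pipeline when both ends
# agree on NUL as the separator:
dir=$(mktemp -d)
touch "$dir/a file" "$dir/b file"
find "$dir" -type f -print0 | xargs -0 -n1 echo found
rm -rf "$dir"
```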

   GNU parallel's newline separation can be emulated with:

      cat | xargs -d "\n" -n1 command

   xargs can run a given number of jobs in parallel, but has no support
   for running number-of-cpu-cores jobs in parallel.
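   The core count that GNU parallel uses by default can be supplied to
   xargs by hand with nproc from GNU coreutils. A sketch, assuming
   nproc is available (this is not a full equivalent of parallel's -j
   option):

```shell
# Ask coreutils' nproc for the core count and hand it to xargs;
# GNU parallel does the equivalent of this by default:
cores=$(nproc)
seq 8 | xargs -P "$cores" -n1 echo job
```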

   xargs has no support for grouping the output, therefore output may
   run together, e.g. the first half of a line is from one process and
   the last half of the line is from another process. The example
   Parallel grep cannot be done reliably with xargs because of this. To
   see this in action try:

      parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
        '>' {} ::: a b c d e f g h
      # Serial = no mixing = the wanted result
      # 'tr -s a-z' squeezes repeating letters into a single letter
      echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
      # Compare to 8 jobs in parallel
      parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
      echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
      echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
        tr -s a-z

   Or try this:

      slow_seq() {
        echo Count to "$@"
        seq "$@" |
          perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
      }
      export -f slow_seq
      # Serial = no mixing = the wanted result
      seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
      # Compare to 8 jobs in parallel
      seq 8 | parallel -P8 slow_seq {}
      seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'

   xargs has no support for keeping the order of the output, therefore
   if running jobs in parallel using xargs the output of the second job
   cannot be postponed till the first job is done.
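   The effect is easy to provoke by giving the earliest jobs the
   longest sleeps; the sketch below shows only the xargs side (GNU
   parallel's -k option is what restores input order):

```shell
# Job 1 sleeps longest and job 4 shortest, so with -P4 the output
# typically arrives in roughly reverse order of the input:
seq 4 | xargs -P4 -n1 -I{} sh -c 'sleep "0.$((5 - {}))"; echo {}'
```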

   xargs has no support for running jobs on remote computers.

   xargs has no support for context replace, so you will have to create
   the arguments.

   If you use a replace string in xargs (-I) you cannot force xargs to
   use more than one argument.
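   This is easy to demonstrate with GNU xargs, where -I silently
   overrides an earlier -n (a sketch; the exact warning behaviour may
   differ between xargs versions):

```shell
# -I takes effect and -n2 is ignored (GNU xargs warns about the
# conflict), so each input line becomes its own command:
printf 'a\nb\nc\n' | xargs -n2 -I{} echo "[{}]" 2>/dev/null
```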

   Quoting in xargs works like -q in GNU parallel. This means composed
   commands and redirection require using bash -c.

      ls | parallel "wc {} >{}.wc"
      ls | parallel "echo {}; ls {}|wc"

   becomes (assuming you have 8 cores and that none of the filenames
   contain space, " or '):

      ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
      ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

   A more extreme example can be found at:
   https://unix.stackexchange.com/q/405552/

   https://www.gnu.org/software/findutils/

DIFFERENCES BETWEEN find -exec AND GNU Parallel
   Summary (see legend above):

     - - - x - x -
     - M2 M3 - - - -
     - O2 O3 O4 O5 O6
     - - - - - - -
     - - - - - - - - -
     x x

   find -exec offers some of the same possibilities as GNU parallel.

   find -exec only works on files. Processing other input (such as
   hosts or URLs) will require creating these inputs as files. find
   -exec has no support for running commands in parallel.

   https://www.gnu.org/software/findutils/ (Last checked: 2019-01)

DIFFERENCES BETWEEN make -j AND GNU Parallel
   Summary (see legend above):

     - - - - - - -
     - - - - - -
     O1 O2 O3 - x O6
     E1 - - - E5 -
     - - - - - - - - -
     - -

   make -j can run jobs in parallel, but requires a crafted Makefile to
   do this. That results in extra quoting to get filenames containing
   newlines to work correctly.

   make -j computes a dependency graph before running jobs. Jobs run by
   GNU parallel do not depend on each other.

   (Very early versions of GNU parallel were coincidentally implemented
   using make -j.)

   https://www.gnu.org/software/make/ (Last checked: 2019-01)

DIFFERENCES BETWEEN ppss AND GNU Parallel
   Summary (see legend above):

     I1 I2 - - - - I7
     M1 - M3 - - M6
     O1 - - x - -
     E1 E2 ?E3 E4 - - -
     R1 R2 R3 R4 - - ?R7 ? ?
     - -

   ppss is also a tool for running jobs in parallel.

   The output of ppss is status information and thus not useful as
   input for another command. The output from the jobs is put into
   files.

   The argument replace string ($ITEM) cannot be changed. Arguments
   must be quoted - thus arguments containing special characters (space
   '"&!*) may cause problems. More than one argument is not supported.
   Filenames containing newlines are not processed correctly. When
   reading input from a file null cannot be used as a terminator. ppss
   needs to read the whole input file before starting any jobs.

   Output and status information is stored in ppss_dir and thus
   requires cleanup when completed. If the dir is not removed before
   running ppss again it may cause nothing to happen as ppss thinks the
   task is already done. GNU parallel will normally not need cleaning
   up if running locally and will only need cleaning up if stopped
   abnormally and running remotely (--cleanup may not complete if
   stopped abnormally). The example Parallel grep would require extra
   postprocessing if written using ppss.

   For remote systems PPSS requires 3 steps: config, deploy, and start.
   GNU parallel only requires one step.

  EXAMPLES FROM ppss MANUAL

   Here are the examples from ppss's manual page with the equivalent
   using GNU parallel:

     1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '

     1$ find /path/to/files -type f | parallel gzip

     2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '

     2$ find /path/to/files -type f | parallel cp {} /destination/dir

     3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

     3$ parallel -a list-of-urls.txt wget -q

     4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

     4$ parallel -a list-of-urls.txt wget -q {}

     5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
          -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
          -n nodes.txt -o /some/output/dir --upload --download;
        ./ppss deploy -C config.cfg
        ./ppss start -C config

     5$ # parallel does not use configs. If you want a different
        # username put it in nodes.txt: user@hostname
        find source/dir -type f |
          parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} \
            -o {.}.mp3 --preset standard --quiet

     6$ ./ppss stop -C config.cfg

     6$ killall -TERM parallel

     7$ ./ppss pause -C config.cfg

     7$ Press: CTRL-Z or killall -SIGTSTP parallel

     8$ ./ppss continue -C config.cfg

     8$ Enter: fg or killall -SIGCONT parallel

     9$ ./ppss.sh status -C config.cfg

     9$ killall -SIGUSR2 parallel

   https://github.com/louwrentius/PPSS

DIFFERENCES BETWEEN pexec AND GNU Parallel
   Summary (see legend above):

     I1 I2 - I4 I5 - -
     M1 - M3 - - M6
     O1 O2 O3 - O5 O6
     E1 - - E4 - E6 -
     R1 - - - - R6 - - -
     S1 -

   pexec is also a tool for running jobs in parallel.

  EXAMPLES FROM pexec MANUAL

   Here are the examples from pexec's info page with the equivalent
   using GNU parallel:

     1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
          'echo "scale=10000;sqrt($NUM)" | bc'

     1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \
          bc > sqrt-{}.dat'

     2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort

     2$ ls myfiles*.ext | parallel sort {} ">{}.sort"

     3$ pexec -f image.list -n auto -e B -u star.log -c -- \
          'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'

     3$ parallel -a image.list \
          'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log

     4$ pexec -r *.png -e IMG -c -o - -- \
          'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'

     4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'

     5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'

     6$ for p in *.png ; do echo ${p%.png} ; done | \
          pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

     7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done)
        pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

     8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \
          'pexec -j -m blockread -d $IMG | \
           jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
           pexec -j -m blockwrite -s th_$IMG'

     8$ # Combining GNU parallel and GNU sem.
        ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
          'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'

        # If reading and writing is done to the same disk, this may be
        # faster as only one process will be either reading or writing:
        ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
          'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'

   https://www.gnu.org/software/pexec/

DIFFERENCES BETWEEN xjobs AND GNU Parallel
   xjobs is also a tool for running jobs in parallel. It only supports
   running jobs on your local computer.

   xjobs deals badly with special characters just like xargs. See the
   section DIFFERENCES BETWEEN xargs AND GNU Parallel.

  EXAMPLES FROM xjobs MANUAL

   Here are the examples from xjobs's man page with the equivalent
   using GNU parallel:

     1$ ls -1 *.zip | xjobs unzip

     1$ ls *.zip | parallel unzip

     2$ ls -1 *.zip | xjobs -n unzip

     2$ ls *.zip | parallel unzip >/dev/null

     3$ find . -name '*.bak' | xjobs gzip

     3$ find . -name '*.bak' | parallel gzip

     4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf

     4$ ls *.jar | parallel jar tf {} '>' {}.idx

     5$ xjobs -s script

     5$ cat script | parallel

     6$ mkfifo /var/run/my_named_pipe;
        xjobs -s /var/run/my_named_pipe &
        echo unzip 1.zip >> /var/run/my_named_pipe;
        echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

     6$ mkfifo /var/run/my_named_pipe;
        cat /var/run/my_named_pipe | parallel &
        echo unzip 1.zip >> /var/run/my_named_pipe;
        echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

   https://www.maier-komor.de/xjobs.html (Last checked: 2019-01)

DIFFERENCES BETWEEN prll AND GNU Parallel
   prll is also a tool for running jobs in parallel. It does not
   support running jobs on remote computers.

   prll encourages using BASH aliases and BASH functions instead of
   scripts. GNU parallel supports scripts directly, functions if they
   are exported using export -f, and aliases if using env_parallel.
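   A minimal sketch of the export -f mechanism mentioned above (bash
   only; the parallel invocation is shown as a comment because the
   exported function is picked up the same way):

```shell
# Everything runs under bash, since export -f is a bash feature.
bash -c '
  myfunc() { echo "got $1"; }
  export -f myfunc
  # With GNU parallel this would be:  seq 3 | parallel myfunc
  # Any bash child process sees the exported function:
  seq 3 | xargs -n1 -I{} bash -c "myfunc {}"
'
```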

   prll generates a lot of status information on stderr (standard
   error) which makes it harder to use the stderr (standard error)
   output of the job directly as input for another program.

  EXAMPLES FROM prll's MANUAL

   Here is the example from prll's man page with the equivalent using
   GNU parallel:

     1$ prll -s 'mogrify -flip $1' *.jpg

     1$ parallel mogrify -flip ::: *.jpg

   https://github.com/exzombie/prll (Last checked: 2019-01)

DIFFERENCES BETWEEN dxargs AND GNU Parallel
   dxargs is also a tool for running jobs in parallel.

   dxargs does not deal well with more simultaneous jobs than SSHD's
   MaxStartups. dxargs is built only for running jobs remotely, but it
   does not support transferring files.

   https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html
   (Last checked: 2019-01)

DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
   middleman (mdm) is also a tool for running jobs in parallel.

  EXAMPLES FROM middleman's WEBSITE

   Here are the shell scripts of
   https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html
   ported to GNU parallel:

     1$ seq 19 | parallel buffon -o - | sort -n > result
        cat files | parallel cmd
        find dir -execdir sem cmd {} \;

   https://github.com/cklin/mdm (Last checked: 2019-01)

DIFFERENCES BETWEEN xapply AND GNU Parallel
   xapply can run jobs in parallel on the local computer.

  EXAMPLES FROM xapply's MANUAL

   Here are the examples from xapply's man page with the equivalent
   using GNU parallel:

     1$ xapply '(cd %1 && make all)' */

     1$ parallel 'cd {} && make all' ::: */

     2$ xapply -f 'diff %1 ../version5/%1' manifest | more

     2$ parallel diff {} ../version5/{} < manifest | more

     3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1

     3$ parallel --link diff {1} {2} :::: manifest1 checklist1

     4$ xapply 'indent' *.c

     4$ parallel indent ::: *.c

     5$ find ~ksb/bin -type f ! -perm -111 -print | \
          xapply -f -v 'chmod a+x' -

     5$ find ~ksb/bin -type f ! -perm -111 -print | \
          parallel -v chmod a+x

     6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -

     6$ sh <(find */ -... | parallel -s 1024 echo vi)

     6$ find */ -... | parallel -s 1024 -Xuj1 vi

     7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -

     7$ sh <(find ... | parallel -n5 echo vi)

     7$ find ... | parallel -n5 -uj1 vi

     8$ xapply -fn "" /etc/passwd

     8$ parallel -k echo < /etc/passwd

     9$ tr ':' '\012' < /etc/passwd | \
          xapply -7 -nf 'chown %1 %6' - - - - - - -

     9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}

     10$ xapply '[ -d %1/RCS ] || echo %1' */

     10$ parallel '[ -d {}/RCS ] || echo {}' ::: */

     11$ xapply -f '[ -f %1 ] && echo %1' List | ...

     11$ parallel '[ -f {} ] && echo {}' < List | ...

   https://www.databits.net/~ksb/msrc/local/bin/xapply/xapply.html

DIFFERENCES BETWEEN AIX apply AND GNU Parallel
   apply can build command lines based on a template and arguments -
   very much like GNU parallel. apply does not run jobs in parallel.
   apply does not use an argument separator (like :::); instead the
   template must be the first argument.

  EXAMPLES FROM IBM's KNOWLEDGE CENTER

   Here are the examples from IBM's Knowledge Center and the
   corresponding command using GNU parallel:

   To obtain results similar to those of the ls command, enter:

     1$ apply echo *
     1$ parallel echo ::: *

   To compare the file named a1 to the file named b1, and the file
   named a2 to the file named b2, enter:

     2$ apply -2 cmp a1 b1 a2 b2
     2$ parallel -N2 cmp ::: a1 b1 a2 b2

   To run the who command five times, enter:

     3$ apply -0 who 1 2 3 4 5
     3$ parallel -N0 who ::: 1 2 3 4 5

   To link all files in the current directory to the directory
   /usr/joe, enter:

     4$ apply 'ln %1 /usr/joe' *
     4$ parallel ln {} /usr/joe ::: *

   https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm
   (Last checked: 2019-01)

DIFFERENCES BETWEEN paexec AND GNU Parallel
   paexec can run jobs in parallel on both the local and remote
   computers.

   paexec requires commands to print a blank line as the last output.
   This means you will have to write a wrapper for most programs.

   paexec has a job dependency facility so a job can depend on another
   job to be executed successfully. Sort of a poor-man's make.

  EXAMPLES FROM paexec's EXAMPLE CATALOG

   Here are the examples from paexec's example catalog with the
   equivalent using GNU parallel:

   1_div_X_run

     1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]

     1$ parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]

   all_substr_run

     2$ ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]

     2$ parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]

   cc_wrapper_run

     3$ ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
          -n 'host1 host2' \
          -t '/usr/bin/ssh -x' <<EOF [...]

     3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
          -S host1,host2 <<EOF [...]

        # This is not exactly the same, but avoids the wrapper
        parallel gcc -O2 -c -o {.}.o {} \
          -S host1,host2 <<EOF [...]

   toupper_run

     4$ ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]

     4$ parallel echo {} '|' ./toupper_cmd <<EOF [...]

        # Without the wrapper:
        parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]

   https://github.com/cheusov/paexec

DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
   Summary (see legend above):

     I1 - - I4 - - (I7)
     M1 (M2) M3 (M4) M5 M6
     - O2 O3 - O5 - - N/A N/A O10
     E1 - - - - - -
     - - - - - - - - -
     - -

   (I7): Only under special circumstances. See below.

   (M2+M4): Only if there is a single replacement string.

   map rejects input with special characters:

     echo "The Cure" > My\ brother\'s\ 12\"\ records

     ls | map 'echo %; wc %'

   It works with GNU parallel:

     ls | parallel 'echo {}; wc {}'

   Under some circumstances it also works with map:

     ls | map 'echo % works %'

   But tiny changes make it reject the input with special characters:

     ls | map 'echo % does not work "%"'

   This means that many UTF-8 characters will be rejected. This is by
   design. From the web page: "As such, programs that quietly handle
   them, with no warnings at all, are doing their users a disservice."

   map delays each job by 0.01 s. This can be emulated by using
   parallel --delay 0.01.

   map prints '+' on stderr when a job starts, and '-' when a job
   finishes. This cannot be disabled. parallel has --bar if you need to
   see progress.

   map's replacement strings (% %D %B %E) can be simulated in GNU
   parallel by putting this in ~/.parallel/config:

     --rpl '%'
     --rpl '%D $_=Q(::dirname($_));'
     --rpl '%B s:.*/::;s:\.[^/.]+$::;'
     --rpl '%E s:.*\.::'

   map does not have an argument separator on the command line, but
   uses the first argument as command. This makes quoting harder which
   again may affect readability. Compare:

     map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *

     parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *

   map can do multiple arguments with context replace, but not without
   context replace:

     parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3

     map "echo 'BEGIN{'%'}END'" 1 2 3

   map has no support for grouping. So this gives the wrong results:

     parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
       ::: a b c d e f
     ls -l a b c d e f
     parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par
     map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf
     map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
     map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
     ls -l out*
     md5sum out*

  EXAMPLES FROM map's WEBSITE

   Here are the examples from map's web page with the equivalent using
   GNU parallel:

     1$ ls *.gif | map convert % %B.png   # default max-args: 1

     1$ ls *.gif | parallel convert {} {.}.png

     2$ map "mkdir %B; tar -C %B -xf %" *.tgz   # default max-args: 1

     2$ parallel 'mkdir {.}; tar -C {.} -xf {}' ::: *.tgz

     3$ ls *.gif | map cp % /tmp   # default max-args: 100

     3$ ls *.gif | parallel -X cp {} /tmp

     4$ ls *.tar | map -n 1 tar -xf %

     4$ ls *.tar | parallel tar -xf

     5$ map "cp % /tmp" *.tgz

     5$ parallel cp {} /tmp ::: *.tgz

     6$ map "du -sm /home/%/mail" alice bob carol

     6$ parallel "du -sm /home/{}/mail" ::: alice bob carol
        or if you prefer running a single job with multiple args:
     6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol

     7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7'

     7$ cat /etc/passwd | parallel --colsep : 'echo user {1} has shell {7}'

     8$ export MAP_MAX_PROCS=$(( `nproc` / 2 ))

     8$ export PARALLEL=-j50%

   https://github.com/sitaramc/map (Last checked: 2020-05)

DIFFERENCES BETWEEN ladon AND GNU Parallel
   ladon can run multiple jobs on files in parallel.

   ladon only works on files, and the only way to specify files is
   using a quoted glob string (such as \*.jpg). It is not possible to
   list the files manually.

   As replacement strings it uses FULLPATH, DIRNAME, BASENAME, EXT,
   RELDIR, and RELPATH.

   These can be simulated using GNU parallel by putting this in
   ~/.parallel/config:

     --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
     --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
     --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
     --rpl 'EXT s:.*\.::'
     --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
            s:\Q$c/\E::;$_=::dirname($_);'
     --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
            s:\Q$c/\E::;'

   ladon deals badly with filenames containing " and newline, and it
   fails for output larger than 200k:

     ladon '*' -- seq 36000 | wc

  EXAMPLES FROM ladon MANUAL

   It is assumed that the '--rpl's above are put in ~/.parallel/config
   and that it is run under a shell that supports '**' globbing (such
   as zsh):

     1$ ladon "**/*.txt" -- echo RELPATH

     1$ parallel echo RELPATH ::: **/*.txt

     2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt

     2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt

     3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \
          -thumbnail 100x100^ -gravity center -extent 100x100 \
          thumbs/RELPATH

     3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH \
          -thumbnail 100x100^ -gravity center -extent 100x100 \
          thumbs/RELPATH ::: **/*.jpg

     4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3

     4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav

   https://github.com/danielgtaylor/ladon (Last checked: 2019-01)

DIFFERENCES BETWEEN jobflow AND GNU Parallel
   jobflow can run multiple jobs in parallel.

   Just like with xargs, the output from jobflow jobs running in
   parallel mixes together by default. jobflow can buffer into files
   (placed in /run/shm), but these are not cleaned up if jobflow dies
   unexpectedly (e.g. by Ctrl-C). If the total output is big (in the
   order of RAM+swap) it can cause the system to slow to a crawl and
   eventually run out of memory.

   jobflow gives no error if the command is unknown, and as with xargs,
   redirection and composed commands require wrapping with bash -c.

   Input lines can at most be 4096 bytes. You can at most have 16 {}'s
   in the command template. More than that either crashes the program
   or simply does not execute the command.

   jobflow has no equivalent for --pipe, or --sshlogin.

   jobflow makes it possible to set resource limits on the running
   jobs. This can be emulated by GNU parallel using bash's ulimit:

     jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

     parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300; myjob'
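   Since the ulimit call runs inside the subshell that runs the job,
   each job gets its own limits and the invoking shell is untouched. A
   quick illustration (the -v value is the 100 MB from the example
   above, in KB):

```shell
# The limit set inside one shell is invisible outside it:
bash -c 'ulimit -v 102400; ulimit -v'   # the job's shell reports 102400
bash -c 'ulimit -v'                     # a fresh shell keeps its own limit
```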

  EXAMPLES FROM jobflow README

     1$ cat things.list | jobflow -threads=8 -exec ./mytask {}

     1$ cat things.list | parallel -j8 ./mytask {}

     2$ seq 100 | jobflow -threads=100 -exec echo {}

     2$ seq 100 | parallel -j100 echo {}

     3$ cat urls.txt | jobflow -threads=32 -exec wget {}

     3$ cat urls.txt | parallel -j32 wget {}

     4$ find . -name '*.bmp' | \
          jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

     4$ find . -name '*.bmp' | \
          parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

   https://github.com/rofl0r/jobflow

DIFFERENCES BETWEEN gargs AND GNU Parallel
   gargs can run multiple jobs in parallel.

   Older versions cache output in memory. This causes it to be
   extremely slow when the output is larger than the physical RAM, and
   can cause the system to run out of memory.

   See more details on this in man parallel_design.

   Newer versions cache output in files, but leave the files in
   $TMPDIR if gargs is killed.

   Output to stderr (standard error) is changed if the command fails.

  EXAMPLES FROM gargs WEBSITE

     1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

     1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

     2$ cat t.txt | gargs --sep "\s+" \
          -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"

     2$ cat t.txt | parallel --colsep "\\s+" \
          -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"

   https://github.com/brentp/gargs

DIFFERENCES BETWEEN orgalorg AND GNU Parallel
   orgalorg can run the same job on multiple machines. This is related
   to --onall and --nonall.

   orgalorg supports entering the SSH password - provided it is the
   same for all servers. GNU parallel advocates using ssh-agent
   instead, but it is possible to emulate orgalorg's behavior by
   setting SSHPASS and by using --ssh "sshpass ssh".

   To make the emulation easier, make a simple alias:

     alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"

   If you want to supply a password run:

     SSHPASS=`ssh-askpass`

   or set the password directly:

     SSHPASS=P4$$w0rd!

   If the above is set up you can then do:

     orgalorg -o frontend1 -o frontend2 -p -C uptime
     par_emul -S frontend1 -S frontend2 uptime

     orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
     par_emul -S frontend1 -S frontend2 top -bid 1

     orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
       'md5sum /tmp/bigfile' -S bigfile
     par_emul -S frontend1 -S frontend2 --basefile bigfile \
       --workdir /tmp md5sum /tmp/bigfile

   orgalorg has a progress indicator for the transferring of a file.
   GNU parallel does not.

   https://github.com/reconquest/orgalorg
921
922 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
923 Rust parallel focuses on speed. It is almost as fast as xargs, but not
924 as fast as parallel-bash. It implements a few features from GNU
925 parallel, but lacks many functions. All these fail:
926
927 # Read arguments from file
928 parallel -a file echo
929 # Changing the delimiter
930 parallel -d _ echo ::: a_b_c_
931
932 These do something different from GNU parallel
933
934 # -q to protect quoted $ and space
935 parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
936 # Generation of combination of inputs
937 parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
938 # {= perl expression =} replacement string
939 parallel echo '{= s/new/old/ =}' ::: my.new your.new
940 # --pipe
941 seq 100000 | parallel --pipe wc
942 # linked arguments
943 parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
944 # Run different shell dialects
945 zsh -c 'parallel echo \={} ::: zsh && true'
946 csh -c 'parallel echo \$\{\} ::: shell && true'
947 bash -c 'parallel echo \$\({}\) ::: pwd && true'
948 # Rust parallel does not start before the last argument is read
949 (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
950 tail -f /var/log/syslog | parallel echo
951
952 Most of the examples from the book GNU Parallel 2018 do not work, thus
953 Rust parallel is not close to being a compatible replacement.
954
955 Rust parallel has no remote facilities.
956
957 It uses /tmp/parallel for tmp files and does not clean up if terminated
958 abruptly. If another user on the system uses Rust parallel, then
959 /tmp/parallel will have the wrong permissions and Rust parallel will
960 fail. A malicious user can set up the right permissions, symlink the
961 output file to one of the user's files, and the next time the user runs
962 Rust parallel it will overwrite this file.
963
964 attacker$ mkdir /tmp/parallel
965 attacker$ chmod a+rwX /tmp/parallel
966 # Symlink to the file the attacker wants to zero out
967 attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
968 victim$ seq 1000 | parallel echo
969 # This file is now overwritten with stderr from 'echo'
970 victim$ cat ~victim/.important-file
971
972 If /tmp/parallel runs full during the run, Rust parallel does not
973 report this, but finishes with success - thereby risking data loss.
974
975 https://github.com/mmstick/parallel
976
977 DIFFERENCES BETWEEN Rush AND GNU Parallel
978 rush (https://github.com/shenwei356/rush) is written in Go and based on
979 gargs.
980
981 Just like GNU parallel, rush buffers output in temporary files. But
982 unlike GNU parallel, rush does not clean up if the process dies abnormally.
983
984 rush has some string manipulations that can be emulated by putting this
985 into ~/.parallel/config (/ is used instead of %, and % is used instead
986 of ^ as that is closer to bash's ${var%postfix}):
987
988 --rpl '{:} s:(\.[^/]+)*$::'
989 --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
990 --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
991 --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
992 --rpl '{@(.*?)} /$$1/ and $_=$1;'
993
994 EXAMPLES FROM rush's WEBSITE
995
996 Here are the examples from rush's website with the equivalent command
997 in GNU parallel.
998
999 1. Simple run, quoting is not necessary
1000
1001 1$ seq 1 3 | rush echo {}
1002
1003 1$ seq 1 3 | parallel echo {}
1004
1005 2. Read data from file (`-i`)
1006
1007 2$ rush echo {} -i data1.txt -i data2.txt
1008
1009 2$ cat data1.txt data2.txt | parallel echo {}
1010
1011 3. Keep output order (`-k`)
1012
1013 3$ seq 1 3 | rush 'echo {}' -k
1014
1015 3$ seq 1 3 | parallel -k echo {}
1016
1017 4. Timeout (`-t`)
1018
1019 4$ time seq 1 | rush 'sleep 2; echo {}' -t 1
1020
1021 4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'
1022
1023 5. Retry (`-r`)
1024
1025 5$ seq 1 | rush 'python unexisted_script.py' -r 1
1026
1027 5$ seq 1 | parallel --retries 2 'python unexisted_script.py'
1028
1029 Use -u to see it is really run twice:
1030
1031 5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py'
1032
1033 6. Dirname (`{/}`) and basename (`{%}`) and remove custom suffix
1034 (`{^suffix}`)
1035
1036 6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
1037
1038 6$ echo dir/file_1.txt.gz |
1039 parallel --plus echo {//} {/} {%_1.txt.gz}
1040
1041 7. Get basename, and remove last (`{.}`) or any (`{:}`) extension
1042
1043 7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
1044
1045 7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'
1046
1047 8. Job ID, combine fields index and other replacement strings
1048
1049 8$ echo 12 file.txt dir/s_1.fq.gz |
1050 rush 'echo job {#}: {2} {2.} {3%:^_1}'
1051
1052 8$ echo 12 file.txt dir/s_1.fq.gz |
1053 parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'
1054
1055 9. Capture submatch using regular expression (`{@regexp}`)
1056
1057 9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
1058
1059 9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'
1060
1061 10. Custom field delimiter (`-d`)
1062
1063 10$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
1064
1065 10$ echo a=b=c | parallel -d = echo {1} {2} {3}
1066
1067 11. Send multi-lines to every command (`-n`)
1068
1069 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1070
1071 11$ seq 5 |
1072 parallel -n 2 -k \
1073 'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'
1074
1075 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1076
1077 11$ seq 5 | parallel -n 2 -k 'echo {}; echo'
1078
1079 12. Custom record delimiter (`-D`), note that empty records are not
1080 used.
1081
1082 12$ echo a b c d | rush -D " " -k 'echo {}'
1083
1084 12$ echo a b c d | parallel -d " " -k 'echo {}'
1085
1086 12$ echo abcd | rush -D "" -k 'echo {}'
1087
1088 Cannot be done by GNU Parallel
1089
1090 12$ cat fasta.fa
1091 >seq1
1092 tag
1093 >seq2
1094 cat
1095 gat
1096 >seq3
1097 attac
1098 a
1099 cat
1100
1101 12$ cat fasta.fa | rush -D ">" \
1102 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
1103 # rush fails to join the multiline sequences
1104
1105 12$ cat fasta.fa | (read -n1 ignore_first_char;
1106 parallel -d '>' --colsep '\n' echo FASTA record {#}: \
1107 name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
1108 )
1109
1110 13. Assign value to variable, like `awk -v` (`-v`)
1111
1112 13$ seq 1 |
1113 rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
1114
1115 13$ seq 1 |
1116 parallel -N0 \
1117 'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
1118
1119 13$ for var in a b; do \
1120 13$ seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
1121 13$ done
1122
1123 In GNU parallel you would typically do:
1124
1125 13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -
1126
1127 If you really want the var:
1128
1129 13$ seq 1 3 |
1130 parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -
1131
1132 If you really want the for-loop:
1133
1134 13$ for var in a b; do
1135 export var;
1136 seq 1 3 | parallel -k 'echo var: $var, data: {}';
1137 done
1138
1139 Unlike rush, this also works if the value is complex, like:
1140
1141 My brother's 12" records
1142
1143 14. Preset variable (`-v`), avoid repeatedly writing verbose
1144 replacement strings
1145
1146 14$ # naive way
1147 echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
1148
1149 14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'
1150
1151 14$ # macro + removing suffix
1152 echo read_1.fq.gz |
1153 rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
1154
1155 14$ echo read_1.fq.gz |
1156 parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'
1157
1158 14$ # macro + regular expression
1159 echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
1160
1161 14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1162
1163 Unlike rush, GNU parallel works with complex values:
1164
1165 14$ echo "My brother's 12\"read_1.fq.gz" |
1166 parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1167
1168 15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and
1169 exit.
1170
1171 15$ seq 1 20 | rush 'sleep 1; echo {}'
1172 ^C
1173
1174 15$ seq 1 20 | parallel 'sleep 1; echo {}'
1175 ^C
1176
1177 16. Continue/resume jobs (`-c`). When some jobs failed (by execution
1178 failure, timeout, or canceling by user with `Ctrl + C`), please switch
1179 flag `-c/--continue` on and run again, so that `rush` can save
1180 successful commands and ignore them in NEXT run.
1181
1182 16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1183 cat successful_cmds.rush
1184 seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1185
1186 16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
1187 'sleep {}; echo {}'
1188 cat mylog
1189 seq 1 3 | parallel --joblog mylog --retry-failed \
1190 'sleep {}; echo {}'
1191
1192 Multi-line jobs:
1193
1194 16$ seq 1 3 | rush 'sleep {}; echo {}; \
1195 echo finish {}' -t 3 -c -C finished.rush
1196 cat finished.rush
1197 seq 1 3 | rush 'sleep {}; echo {}; \
1198 echo finish {}' -t 3 -c -C finished.rush
1199
1200 16$ seq 1 3 |
1201 parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
1202 echo finish {}'
1203 cat mylog
1204 seq 1 3 |
1205 parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
1206 echo finish {}'
1207
1208 17. A comprehensive example: downloading 1K+ pages given by three URL
1209 list files using `phantomjs save_page.js` (some page contents are
1210 dynamically generated by JavaScript, so `wget` does not work). Here I
1211 set the max number of jobs (`-j`) to `20`; each job has a max running
1212 time (`-t`) of `60` seconds and `3` retry chances (`-r`). Continue flag `-c`
1213 is also switched on, so we can continue unfinished jobs. Luckily, it's
1214 accomplished in one run :)
1215
1216 17$ for f in $(seq 2014 2016); do \
1217 /bin/rm -rf $f; mkdir -p $f; \
1218 cat $f.html.txt | rush -v d=$f -d = \
1219 'phantomjs save_page.js "{}" > {d}/{3}.html' \
1220 -j 20 -t 60 -r 3 -c; \
1221 done
1222
1223 GNU parallel can append to an existing joblog with '+':
1224
1225 17$ rm mylog
1226 for f in $(seq 2014 2016); do
1227 /bin/rm -rf $f; mkdir -p $f;
1228 cat $f.html.txt |
1229 parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
1230 --colsep = \
1231 phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
1232 done
1233
1234 18. A bioinformatics example: mapping with `bwa`, and processing result
1235 with `samtools`:
1236
1237 18$ ref=ref/xxx.fa
1238 threads=25
1239 ls -d raw.cluster.clean.mapping/* \
1240 | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
1241 'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
1242 samtools view -bS {p}.sam > {p}.bam; \
1243 samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
1244 samtools index {p}.sorted.bam; \
1245 samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
1246 /bin/rm {p}.bam {p}.sam;' \
1247 -j 2 --verbose -c -C mapping.rush
1248
1249 GNU parallel would use a function:
1250
1251 18$ ref=ref/xxx.fa
1252 export ref
1253 thr=25
1254 export thr
1255 bwa_sam() {
1256 p="$1"
1257 bam="$p".bam
1258 sam="$p".sam
1259 sortbam="$p".sorted.bam
1260 bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
1261 samtools view -bS "$sam" > "$bam"
1262 samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
1263 samtools index "$sortbam"
1264 samtools flagstat "$sortbam" > "$sortbam".flagstat
1265 /bin/rm "$bam" "$sam"
1266 }
1267 export -f bwa_sam
1268 ls -d raw.cluster.clean.mapping/* |
1269 parallel -j 2 --verbose --joblog mylog bwa_sam
1270
1271 Other rush features
1272
1273 rush has:
1274
1275 • awk -v like custom defined variables (-v)
1276
1277 With GNU parallel you would simply set a shell variable:
1278
1279 parallel 'v={}; echo "$v"' ::: foo
1280 echo foo | rush -v v={} 'echo {v}'
1281
1282 Also rush does not like special chars. So these do not work:
1283
1284 echo does not work | rush -v v=\" 'echo {v}'
1285 echo "My brother's 12\" records" | rush -v v={} 'echo {v}'
1286
1287 Whereas the corresponding GNU parallel version works:
1288
1289 parallel 'v=\"; echo "$v"' ::: works
1290 parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"
1291
1292 • Exit on first error(s) (-e)
1293
1294 This is called --halt now,fail=1 (or shorter: --halt 2) when used
1295 with GNU parallel.
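
A minimal sketch of that behaviour (assuming 3 jobslots so all jobs start at once): the failing job makes GNU parallel kill the still-running jobs immediately:

```shell
# 'false' fails right away; --halt now,fail=1 then kills the
# running 'sleep 10' jobs instead of letting them finish.
parallel -j3 --halt now,fail=1 ::: 'sleep 10' false 'sleep 10'
```

With fail=1, the exit value of GNU parallel is the exit value of the failing job.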
1296
1297 • Settable number of records sent to every command (-n, default 1)
1298
1299 This is also called -n in GNU parallel.
1300
1301 • Practical replacement strings
1302
1303 {:} remove any extension
1304 With GNU parallel this can be emulated by:
1305
1306 parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz
1307
1308 {^suffix}, remove suffix
1309 With GNU parallel this can be emulated by:
1310
1311 parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz
1312
1313 {@regexp}, capture submatch using regular expression
1314 With GNU parallel this can be emulated by:
1315
1316 parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
1317 echo '{@\d_(.*).gz}' ::: 1_foo.gz
1318
1319 {%.}, {%:}, basename without extension
1320 With GNU parallel this can be emulated by:
1321
1322 parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz
1323
1324 And if you need it often, you define a --rpl in
1325 $HOME/.parallel/config:
1326
1327 --rpl '{%.} s:.*/::;s/\..*//'
1328 --rpl '{%:} s:.*/::;s/\..*//'
1329
1330 Then you can use them as:
1331
1332 parallel echo {%.} {%:} ::: dir/foo.bar.gz
1333
1334 • Preset variable (macro)
1335
1336 E.g.
1337
1338 echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'
1339
1340 With GNU parallel this can be emulated by:
1341
1342 echo foosuffix |
1343 parallel --plus 'p={%suffix}; echo ${p}_new_suffix'
1344
1345 Unlike rush, GNU parallel works fine if the input contains double
1346 spaces, ' and ":
1347
1348 echo "1'6\" foosuffix" |
1349 parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
1350
1351 • Commands of multi-lines
1352
1353 While you can use multi-line commands in GNU parallel, GNU parallel
1354 discourages them, as functions improve readability. For example, this
1355 multi-line command:
1356
1357 seq 1 3 |
1358 parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
1359 echo finish {}'
1360
1361 Could be written as:
1362
1363 doit() {
1364 sleep "$1"
1365 echo "$1"
1366 echo finish "$1"
1367 }
1368 export -f doit
1369 seq 1 3 | parallel --timeout 2 --joblog my.log doit
1370
1371 The failed commands can be resumed with:
1372
1373 seq 1 3 |
1374 parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
1375 echo finish {}'
1376
1377 https://github.com/shenwei356/rush
1378
1379 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
1380 ClusterSSH solves a different problem than GNU parallel.
1381
1382 ClusterSSH opens a terminal window for each computer, and using a
1383 master window you can run the same command on all the computers. This
1384 is typically used for administering several computers that are almost
1385 identical.
1386
1387 GNU parallel runs the same (or different) commands with different
1388 arguments in parallel possibly using remote computers to help
1389 computing. If more than one computer is listed in -S GNU parallel may
1390 only use one of these (e.g. if there are 8 jobs to be run and one
1391 computer has 8 cores).
1392
1393 GNU parallel can be used as a poor-man's version of ClusterSSH:
1394
1395 parallel --nonall -S server-a,server-b do_stuff foo bar
1396
1397 https://github.com/duncs/clusterssh
1398
1399 DIFFERENCES BETWEEN coshell AND GNU Parallel
1400 coshell only accepts full commands on standard input. Any quoting needs
1401 to be done by the user.
1402
1403 Commands are run in sh so any bash/tcsh/zsh specific syntax will not
1404 work.
1405
1406 Output can be buffered by using -d. Output is buffered in memory, so
1407 big output can cause swapping and therefore be terribly slow, or even
1408 make the system run out of memory.
1409
1410 https://github.com/gdm85/coshell (Last checked: 2019-01)
1411
1412 DIFFERENCES BETWEEN spread AND GNU Parallel
1413 spread runs commands on all directories.
1414
1415 It can be emulated with GNU parallel using this Bash function:
1416
1417 spread() {
1418 _cmds() {
1419 perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
1420 }
1421 parallel $(_cmds "$@")'|| echo exit status $?' ::: */
1422 }
1423
1424 This works except for the --exclude option.
1425
1426 (Last checked: 2017-11)
1427
1428 DIFFERENCES BETWEEN pyargs AND GNU Parallel
1429 pyargs deals badly with input containing spaces. It buffers stdout, but
1430 not stderr. It buffers in RAM. {} does not work as replacement string.
1431 It does not support running functions.
1432
1433 pyargs does not support composed commands if run with --lines, and
1434 fails on pyargs traceroute gnu.org fsf.org.
1435
1436 Examples
1437
1438 seq 5 | pyargs -P50 -L seq
1439 seq 5 | parallel -P50 --lb seq
1440
1441 seq 5 | pyargs -P50 --mark -L seq
1442 seq 5 | parallel -P50 --lb \
1443 --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
1444 # Similar, but not precisely the same
1445 seq 5 | parallel -P50 --lb --tag seq
1446
1447 seq 5 | pyargs -P50 --mark command
1448 # Somewhat longer with GNU Parallel due to the special
1449 # --mark formatting
1450 cmd="$(echo "command" | parallel --shellquote)"
1451 wrap_cmd() {
1452 echo "MARK $cmd $@================================" >&3
1453 echo "OUTPUT START[$cmd $@]:"
1454 eval $cmd "$@"
1455 echo "OUTPUT END[$cmd $@]"
1456 }
1457 (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
1458 # Similar, but not exactly the same
1459 seq 5 | parallel -t --tag command
1460
1461 (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
1462 (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
1463 parallel -r -d' ' seq
1464 # Similar, but not exactly the same
1465 parallel seq ::: 1 2 3 4 5 6
1466
1467 https://github.com/robertblackwell/pyargs (Last checked: 2019-01)
1468
1469 DIFFERENCES BETWEEN concurrently AND GNU Parallel
1470 concurrently runs jobs in parallel.
1471
1472 The output is prepended with the job number, and may be incomplete:
1473
1474 $ concurrently 'seq 100000' | (sleep 3;wc -l)
1475 7165
1476
1477 When pretty printing, it caches output in memory. Output from
1478 different jobs mixes (see test MIX) whether or not output is cached.
1479
1480 There seems to be no way of making a template command and having
1481 concurrently fill it in with different args. The full commands must be
1482 given on the command line.
1483
1484 There is also no way of controlling how many jobs should be run in
1485 parallel at a time - i.e. "number of jobslots". Instead all jobs are
1486 simply started in parallel.
1487
1488 https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)
1489
1490 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
1491 map does not run jobs in parallel by default. The README suggests
1492 using:
1493
1494 ... | map t 'sleep $t && say done &'
1495
1496 But this fails if more jobs are run in parallel than the number of
1497 available processes. Since there is no support for parallelization in
1498 map itself, the output also mixes:
1499
1500 seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'
1501
1502 The major difference is that GNU parallel is built for parallelization
1503 and map is not. So GNU parallel has lots of ways of dealing with the
1504 issues that parallelization raises:
1505
1506 • Keep the number of processes manageable
1507
1508 • Make sure output does not mix
1509
1510 • Make Ctrl-C kill all running processes
1511
1512 EXAMPLES FROM map's WEBSITE
1513
1514 Here are the 5 examples converted to GNU Parallel:
1515
1516 1$ ls *.c | map f 'foo $f'
1517 1$ ls *.c | parallel foo
1518
1519 2$ ls *.c | map f 'foo $f; bar $f'
1520 2$ ls *.c | parallel 'foo {}; bar {}'
1521
1522 3$ cat urls | map u 'curl -O $u'
1523 3$ cat urls | parallel curl -O
1524
1525 4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
1526 4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
1527 4$ parallel 'sleep {} && say done' ::: 1 1 1
1528
1529 5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
1530 5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
1531 5$ parallel -j0 'sleep {} && say done' ::: 1 1 1
1532
1533 https://github.com/soveran/map (Last checked: 2019-01)
1534
1535 DIFFERENCES BETWEEN loop AND GNU Parallel
1536 loop mixes stdout and stderr:
1537
1538 loop 'ls /no-such-file' >/dev/null
1539
1540 loop's replacement string $ITEM does not quote strings:
1541
1542 echo 'two spaces' | loop 'echo $ITEM'
1543
1544 loop cannot run functions:
1545
1546 myfunc() { echo joe; }
1547 export -f myfunc
1548 loop 'myfunc this fails'
1549
1550 EXAMPLES FROM loop's WEBSITE
1551
1552 Some of the examples from https://github.com/Miserlou/Loop/ can be
1553 emulated with GNU parallel:
1554
1555 # A couple of functions will make the code easier to read
1556 $ loopy() {
1557 yes | parallel -uN0 -j1 "$@"
1558 }
1559 $ export -f loopy
1560 $ time_out() {
1561 parallel -uN0 -q --timeout "$@" ::: 1
1562 }
1563 $ match() {
1564 perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
1565 }
1566 $ export -f match
1567
1568 $ loop 'ls' --every 10s
1569 $ loopy --delay 10s ls
1570
1571 $ loop 'touch $COUNT.txt' --count-by 5
1572 $ loopy touch '{= $_=seq()*5 =}'.txt
1573
1574 $ loop --until-contains 200 -- \
./get_response_code.sh --site mysite.biz
1576 $ loopy --halt now,success=1 \
1577 './get_response_code.sh --site mysite.biz | match 200'
1578
1579 $ loop './poke_server' --for-duration 8h
1580 $ time_out 8h loopy ./poke_server
1581
1582 $ loop './poke_server' --until-success
1583 $ loopy --halt now,success=1 ./poke_server
1584
1585 $ cat files_to_create.txt | loop 'touch $ITEM'
1586 $ cat files_to_create.txt | parallel touch {}
1587
1588 $ loop 'ls' --for-duration 10min --summary
1589 # --joblog is somewhat more verbose than --summary
1590 $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log
1591
1592 $ loop 'echo hello'
1593 $ loopy echo hello
1594
1595 $ loop 'echo $COUNT'
1596 # GNU Parallel counts from 1
1597 $ loopy echo {#}
1598 # Counting from 0 can be forced
1599 $ loopy echo '{= $_=seq()-1 =}'
1600
1601 $ loop 'echo $COUNT' --count-by 2
1602 $ loopy echo '{= $_=2*(seq()-1) =}'
1603
1604 $ loop 'echo $COUNT' --count-by 2 --offset 10
1605 $ loopy echo '{= $_=10+2*(seq()-1) =}'
1606
1607 $ loop 'echo $COUNT' --count-by 1.1
1608 # GNU Parallel rounds 3.3000000000000003 to 3.3
1609 $ loopy echo '{= $_=1.1*(seq()-1) =}'
1610
1611 $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
1612 $ loopy echo '{= $_=2*(seq()-1) =} {#}'
1613
1614 $ loop 'echo $COUNT' --num 3 --summary
1615 # --joblog is somewhat more verbose than --summary
1616 $ seq 3 | parallel --joblog my.log echo; cat my.log
1617
1618 $ loop 'ls -foobarbatz' --num 3 --summary
1619 # --joblog is somewhat more verbose than --summary
1620 $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log
1621
1622 $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
1623 # Can be emulated by running 2 jobs
1624 $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
1625 $ echo 50| parallel echo '{= $_=2*(seq()-1) =}'
1626
1627 $ loop 'date' --every 5s
1628 $ loopy --delay 5s date
1629
1630 $ loop 'date' --for-duration 8s --every 2s
1631 $ time_out 8s loopy --delay 2s date
1632
1633 $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
1634 $ seconds=$((`date -d 2018-05-25T20:50:00 +%s` - `date +%s`))s
1635 $ time_out $seconds loopy --delay 5s date -u
1636
1637 $ loop 'echo $RANDOM' --until-contains "666"
1638 $ loopy --halt now,success=1 'echo $RANDOM | match 666'
1639
1640 $ loop 'if (( RANDOM % 2 )); then
1641 (echo "TRUE"; true);
1642 else
1643 (echo "FALSE"; false);
1644 fi' --until-success
1645 $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
1646 (echo "TRUE"; true);
1647 else
1648 (echo "FALSE"; false);
1649 fi'
1650
1651 $ loop 'if (( RANDOM % 2 )); then
1652 (echo "TRUE"; true);
1653 else
1654 (echo "FALSE"; false);
1655 fi' --until-error
1656 $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
1657 (echo "TRUE"; true);
1658 else
1659 (echo "FALSE"; false);
1660 fi'
1661
1662 $ loop 'date' --until-match "(\d{4})"
1663 $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
1664
1665 $ loop 'echo $ITEM' --for red,green,blue
1666 $ parallel echo ::: red green blue
1667
1668 $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
1669 $ cat /tmp/my-list-of-files-to-create.txt | parallel touch
1670
1671 $ ls | loop 'cp $ITEM $ITEM.bak'; ls
1672 $ ls | parallel cp {} {}.bak; ls
1673
1674 $ loop 'echo $ITEM | tr a-z A-Z' -i
1675 $ parallel 'echo {} | tr a-z A-Z'
1676 # Or more efficiently:
1677 $ parallel --pipe tr a-z A-Z
1678
1679 $ loop 'echo $ITEM' --for "`ls`"
1680 $ parallel echo {} ::: "`ls`"
1681
1682 $ ls | loop './my_program $ITEM' --until-success;
1683 $ ls | parallel --halt now,success=1 ./my_program {}
1684
1685 $ ls | loop './my_program $ITEM' --until-fail;
1686 $ ls | parallel --halt now,fail=1 ./my_program {}
1687
1688 $ ./deploy.sh;
1689 loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
1690 --every 5s --until-contains 200;
1691 ./announce_to_slack.sh
1692 $ ./deploy.sh;
1693 loopy --delay 5s --halt now,success=1 \
1694 'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
1695 ./announce_to_slack.sh
1696
1697 $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
1698 $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing
1699
1700 $ ./create_big_file -o my_big_file.bin;
1701 loop 'ls' --until-contains 'my_big_file.bin';
1702 ./upload_big_file my_big_file.bin
1703 # inotifywait is a better tool to detect file system changes.
1704 # It can even make sure the file is complete
1705 # so you are not uploading an incomplete file
1706 $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
1707 grep my_big_file.bin
1708
1709 $ ls | loop 'cp $ITEM $ITEM.bak'
1710 $ ls | parallel cp {} {}.bak
1711
1712 $ loop './do_thing.sh' --every 15s --until-success --num 5
1713 $ parallel --retries 5 --delay 15s ::: ./do_thing.sh
1714
1715 https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
1716
1717 DIFFERENCES BETWEEN lorikeet AND GNU Parallel
1718 lorikeet can run jobs in parallel. It does this based on a dependency
1719 graph described in a file, so this is similar to make.
1720
1721 https://github.com/cetra3/lorikeet (Last checked: 2018-10)
1722
1723 DIFFERENCES BETWEEN spp AND GNU Parallel
1724 spp can run jobs in parallel. spp does not use a command template to
1725 generate the jobs, but requires jobs to be in a file. Output from the
1726 jobs mix.
1727
1728 https://github.com/john01dav/spp (Last checked: 2019-01)
1729
1730 DIFFERENCES BETWEEN paral AND GNU Parallel
1731 paral prints a lot of status information and stores the output from the
1732 commands run into files. This means it cannot be used in the middle of
1733 a pipe like this:
1734
1735 paral "echo this" "echo does not" "echo work" | wc
1736
1737 Instead it puts the output into files named like out_#_command.out.log.
1738 To get a very similar behaviour with GNU parallel use --results
1739 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta
1740
1741 paral only takes arguments on the command line and each argument should
1742 be a full command. Thus it does not use command templates.
1743
1744 This limits how many jobs it can run in total, because they all need to
1745 fit on a single command line.
1746
1747 paral has no support for running jobs remotely.
1748
1749 EXAMPLES FROM README.markdown
1750
1751 The examples from README.markdown and the corresponding command run
1752 with GNU parallel (--results
1753 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta is omitted from the
1754 GNU parallel command):
1755
1756 1$ paral "command 1" "command 2 --flag" "command arg1 arg2"
1757 1$ parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
1758
1759 2$ paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
1760 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1761 2$ parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
1762 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1763 # Or shorter:
1764 parallel "sleep {} && echo c{}" ::: {1..5}
1765
1766 3$ paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1767 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1768 3$ parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
1769 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1770 # Or shorter:
1771 parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
1772
1773 4$ paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1774 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1775 4$ parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
1776
1777 5$ paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1778 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1779 5$ parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
1780
1781 6$ paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1782 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1783 6$ parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
1784
1785 7$ paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1786 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1787 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1788 echo g && sleep 0.5 && echo h"
1789 7$ parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1790 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1791 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1792 echo g && sleep 0.5 && echo h"
1793
1794 https://github.com/amattn/paral (Last checked: 2019-01)
1795
1796 DIFFERENCES BETWEEN concurr AND GNU Parallel
1797 concurr is built to run jobs in parallel using a client/server model.
1798
1799 EXAMPLES FROM README.md
1800
1801 The examples from README.md:
1802
1803 1$ concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
1804 1$ parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
1805
1806 2$ concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
1807 2$ parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
1808
1809 3$ concurr 'echo {}' < input_file
1810 3$ parallel 'echo {}' < input_file
1811
1812 4$ cat file | concurr 'echo {}'
1813 4$ cat file | parallel 'echo {}'
1814
1815 concurr deals badly with empty input files and with output larger than
1816 64 KB.
1817
1818 https://github.com/mmstick/concurr (Last checked: 2019-01)
1819
1820 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
1821 lesser-parallel is the inspiration for parallel --embed. Both lesser-
1822 parallel and parallel --embed define bash functions that can be
1823 included as part of a bash script to run jobs in parallel.
1824
1825 lesser-parallel implements a few of the replacement strings, but hardly
1826 any options, whereas parallel --embed gives you the full GNU parallel
1827 experience.
1828
1829 https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
1830
1831 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
1832 npm-parallel can run npm tasks in parallel.
1833
1834 There are no examples and very little documentation, so it is hard to
1835 compare to GNU parallel.
1836
1837 https://github.com/spion/npm-parallel (Last checked: 2019-01)
1838
1839 DIFFERENCES BETWEEN machma AND GNU Parallel
1840 machma runs tasks in parallel. It gives time stamped output. It buffers
1841 in RAM.
1842
1843 EXAMPLES FROM README.md
1844
1845 The examples from README.md:
1846
1847 1$ # Put shorthand for timestamp in config for the examples
1848 echo '--rpl '\
1849 \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
1850 > ~/.parallel/machma
1851 echo '--line-buffer --tagstring "{#} {time} {}"' \
1852 >> ~/.parallel/machma
1853
1854 2$ find . -iname '*.jpg' |
1855 machma -- mogrify -resize 1200x1200 -filter Lanczos {}
1856 find . -iname '*.jpg' |
1857 parallel --bar -Jmachma mogrify -resize 1200x1200 \
1858 -filter Lanczos {}
1859
1860 3$ cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
1861 3$ cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}
1862
1863 4$ cat /tmp/ips |
1864 machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
1865 4$ cat /tmp/ips |
1866 parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'
1867
1868 5$ find . -iname '*.jpg' |
1869 machma --timeout 5s -- mogrify -resize 1200x1200 \
1870 -filter Lanczos {}
1871 5$ find . -iname '*.jpg' |
1872 parallel --timeout 5s --bar mogrify -resize 1200x1200 \
1873 -filter Lanczos {}
1874
1875 6$ find . -iname '*.jpg' -print0 |
1876 machma --null -- mogrify -resize 1200x1200 -filter Lanczos {}
1877 6$ find . -iname '*.jpg' -print0 |
1878 parallel --null --bar mogrify -resize 1200x1200 \
1879 -filter Lanczos {}
1880
1881 https://github.com/fd0/machma (Last checked: 2019-06)
1882
1883 DIFFERENCES BETWEEN interlace AND GNU Parallel
1884 Summary (see legend above):
1885
1886 - I2 I3 I4 - - -
1887 M1 - M3 - - M6
1888 - O2 O3 - - - - x x
1889 E1 E2 - - - - -
1890 - - - - - - - - -
1891 - -
1892
1893 interlace is built for network analysis to run network tools in
1894 parallel.
1895
1896 interlace does not buffer output, so output from different jobs mixes.
1897
1898 The overhead for each target is O(n*n), so with 1000 targets it becomes
1899 very slow, with an overhead on the order of 500 ms/target.
1900
1901 EXAMPLES FROM interlace's WEBSITE
1902
1903 Using prips most of the examples from
1904 https://github.com/codingo/Interlace can be run with GNU parallel:
1905
1906 Blocker
1907
1908 commands.txt:
1909 mkdir -p _output_/_target_/scans/
1910 _blocker_
1911 nmap _target_ -oA _output_/_target_/scans/_target_-nmap
1912 interlace -tL ./targets.txt -cL commands.txt -o $output
1913
1914 parallel -a targets.txt \
1915 mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap
1916
1917 Blocks
1918
1919 commands.txt:
1920 _block:nmap_
1921 mkdir -p _target_/output/scans/
1922 nmap _target_ -oN _target_/output/scans/_target_-nmap
1923 _block:nmap_
1924 nikto --host _target_
1925 interlace -tL ./targets.txt -cL commands.txt
1926
1927 _nmap() {
1928 mkdir -p $1/output/scans/
1929 nmap $1 -oN $1/output/scans/$1-nmap
1930 }
1931 export -f _nmap
1932 parallel ::: _nmap "nikto --host" :::: targets.txt
1933
1934 Run Nikto Over Multiple Sites
1935
1936 interlace -tL ./targets.txt -threads 5 \
1937 -c "nikto --host _target_ > ./_target_-nikto.txt" -v
1938
1939 parallel -a targets.txt -P5 nikto --host {} \> ./{}-nikto.txt
1940
1941 Run Nikto Over Multiple Sites and Ports
1942
1943 interlace -tL ./targets.txt -threads 5 -c \
1944 "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
1945 -p 80,443 -v
1946
1947 parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
1948 :::: targets.txt ::: 80 443
1949
1950 Run a List of Commands against Target Hosts
1951
1952 commands.txt:
1953 nikto --host _target_:_port_ > _output_/_target_-nikto.txt
1954 sslscan _target_:_port_ > _output_/_target_-sslscan.txt
1955 testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
1956 interlace -t example.com -o ~/Engagements/example/ \
1957 -cL ./commands.txt -p 80,443
1958
1959 parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
1960 ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443
1961
1962 CIDR notation with an application that doesn't support it
1963
1964 interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
1965 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1966
1967 prips 192.168.12.0/24 |
1968 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1969
1970 Glob notation with an application that doesn't support it
1971
1972 interlace -t 192.168.12.* -c "vhostscan _target_ \
1973 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1974
1975 # Glob is not supported in prips
1976 prips 192.168.12.0/24 |
1977 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1978
1979 Dash (-) notation with an application that doesn't support it
1980
1981 interlace -t 192.168.12.1-15 -c \
1982 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
1983 -o ~/scans/ -threads 50
1984
1985 # Dash notation is not supported in prips
1986 prips 192.168.12.1 192.168.12.15 |
1987 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1988
1989 Threading Support for an application that doesn't support it
1990
1991 interlace -tL ./target-list.txt -c \
1992 "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
1993 -o ~/scans/ -threads 50
1994
1995 cat ./target-list.txt |
1996 parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
1997
1998 alternatively
1999
2000 ./vhosts-commands.txt:
2001 vhostscan -t $target -oN _output_/_target_-vhosts.txt
2002 interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
2003 -threads 50 -o ~/scans
2004
2005 ./vhosts-commands.txt:
2006 vhostscan -t "$1" -oN "$2"
2007 parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
2008 :::: ./target-list.txt
2009
2010 Exclusions
2011
2012 interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
2013 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
2014 -o ~/scans/ -threads 50
2015
2016 prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
2017 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2018
2019 Run Nikto Using Multiple Proxies
2020
2021 interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
2022 "nikto --host _target_:_port_ -useproxy _proxy_ > \
2023 ./_target_-_port_-nikto.txt" -p 80,443 -v
2024
2025 parallel -j5 \
2026 "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
2027 :::: ./targets.txt ::: 80 443 :::: ./proxies.txt
2028
2029 https://github.com/codingo/Interlace (Last checked: 2019-09)
2030
2031 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel
2032 I have been unable to get the code to run at all. It seems unfinished.
2033
2034 https://github.com/otonvm/Parallel (Last checked: 2019-02)
2035
2036 DIFFERENCES BETWEEN k-bx par AND GNU Parallel
2037 par requires Haskell to work. This limits the number of platforms on
2038 which it can run.
2039
2040 par does line buffering in memory. The memory usage is 3x the longest
2041 line (compared to 1x for parallel --lb). Commands must be given as
2042 arguments. There is no template.
2043
2044 These are the examples from https://github.com/k-bx/par with the
2045 corresponding GNU parallel command.
2046
2047 par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2048 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2049 parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2050 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2051
2052 par "echo foo; sleep 1; foofoo" \
2053 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2054 parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
2055 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2056
2057 par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
2058 parallel --lb --colsep , --tagstring {1} {2} \
2059 ::: "[fooechoer],echo foo" "[bar],echo bar"
2060
2061 par --succeed "foo" "bar" && echo 'wow'
2062 parallel "foo" "bar"; true && echo 'wow'
2063
2064 https://github.com/k-bx/par (Last checked: 2019-02)
2065
2066 DIFFERENCES BETWEEN parallelshell AND GNU Parallel
2067 parallelshell does not allow for composed commands:
2068
2069 # This does not work
2070 parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'
2071
2072 Instead you have to wrap that in a shell:
2073
2074 parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2075
2076 It buffers output in RAM. All commands must be given on the command
2077 line and all commands are started in parallel at the same time. This
2078 will cause the system to freeze if there are so many jobs that there is
2079 not enough memory to run them all at the same time.
2080
2081 https://github.com/keithamus/parallelshell (Last checked: 2019-02)
2082
2083 https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
2084
2085 DIFFERENCES BETWEEN shell-executor AND GNU Parallel
2086 shell-executor does not allow for composed commands:
2087
2088 # This does not work
2089 sx 'echo foo;echo bar' 'echo baz;echo quuz'
2090
2091 Instead you have to wrap that in a shell:
2092
2093 sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2094
2095 It buffers output in RAM. All commands must be given on the command
2096 line and all commands are started in parallel at the same time. This
2097 will cause the system to freeze if there are so many jobs that there is
2098 not enough memory to run them all at the same time.
2099
2100 https://github.com/royriojas/shell-executor (Last checked: 2019-02)
2101
2102 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel
2103 par buffers in memory to avoid mixing of jobs. It takes 1s per 1
2104 million output lines.
2105
2106 par needs to have all commands before starting the first job. The jobs
2107 are read from stdin (standard input) so any quoting will have to be
2108 done by the user.
2109
2110 Stdout (standard output) is prepended with o:. Stderr (standard error)
2111 is sent to stdout (standard output) and prepended with e:.
2112
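The o:/e: labelling can be sketched in plain shell. This is an assumed
reconstruction of the output format only, not par's implementation;
label_output and job are hypothetical names, and unlike par this buffers
stderr until the job finishes:

```shell
# label_output CMD...: run CMD, prefix stdout lines with "o:" and stderr
# lines with "e:", and merge both onto stdout (the format described above).
# Unlike par, this buffers stderr in a temp file until the command exits.
label_output() {
  t=$(mktemp)
  "$@" 2>"$t" | sed 's/^/o:/'
  sed 's/^/e:/' "$t"
  rm -f "$t"
}

# a job writing to both streams
job() { echo result; echo warning >&2; }
label_output job
```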
2113 For short jobs with little output par is 20% faster than GNU parallel
2114 and 60% slower than xargs.
2115
2116 https://github.com/UnixJunkie/PAR
2117
2118 https://savannah.nongnu.org/projects/par (Last checked: 2019-02)
2119
2120 DIFFERENCES BETWEEN fd AND GNU Parallel
2121 fd does not support composed commands, so commands must be wrapped in
2122 sh -c.
2123
2124 It buffers output in RAM.
2125
2126 It only takes file names from the filesystem as input (similar to
2127 find).
2128
2129 https://github.com/sharkdp/fd (Last checked: 2019-02)
2130
2131 DIFFERENCES BETWEEN lateral AND GNU Parallel
2132 lateral is very similar to sem: It takes a single command and runs it
2133 in the background. The design means that output from parallel running
2134 jobs may mix. If it dies unexpectedly it leaves a socket in
2135 ~/.lateral/socket.PID.
2136
2137 lateral deals badly with too long command lines. This makes the lateral
2138 server crash:
2139
2140 lateral run echo `seq 100000| head -c 1000k`
2141
2142 Any options will be read by lateral so this does not work (lateral
2143 interprets the -l):
2144
2145 lateral run ls -l
2146
2147 Composed commands do not work:
2148
2149 lateral run pwd ';' ls
2150
2151 Functions do not work:
2152
2153 myfunc() { echo a; }
2154 export -f myfunc
2155 lateral run myfunc
2156
2157 Running emacs in the terminal causes the parent shell to die:
2158
2159 echo '#!/bin/bash' > mycmd
2160 echo emacs -nw >> mycmd
2161 chmod +x mycmd
2162 lateral start
2163 lateral run ./mycmd
2164
2165 Here are the examples from https://github.com/akramer/lateral with the
2166 corresponding GNU sem and GNU parallel commands:
2167
2168 1$ lateral start
2169 for i in $(cat /tmp/names); do
2170 lateral run -- some_command $i
2171 done
2172 lateral wait
2173
2174 1$ for i in $(cat /tmp/names); do
2175 sem some_command $i
2176 done
2177 sem --wait
2178
2179 1$ parallel some_command :::: /tmp/names
2180
2181 2$ lateral start
2182 for i in $(seq 1 100); do
2183 lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
2184 done
2185 lateral wait
2186
2187 2$ for i in $(seq 1 100); do
2188 sem my_slow_command < workfile$i > /tmp/logfile$i
2189 done
2190 sem --wait
2191
2192 2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
2193 ::: {1..100}
2194
2195 3$ lateral start -p 0 # yup, it will just queue tasks
2196 for i in $(seq 1 100); do
2197 lateral run -- command_still_outputs_but_wont_spam inputfile$i
2198 done
2199 # command output spam can commence
2200 lateral config -p 10; lateral wait
2201
2202 3$ for i in $(seq 1 100); do
2203 echo "command inputfile$i" >> joblist
2204 done
2205 parallel -j 10 :::: joblist
2206
2207 3$ echo 1 > /tmp/njobs
2208 parallel -j /tmp/njobs command inputfile{} \
2209 ::: {1..100} &
2210 echo 10 >/tmp/njobs
2211 wait
2212
2213 https://github.com/akramer/lateral (Last checked: 2019-03)
2214
2215 DIFFERENCES BETWEEN with-this AND GNU Parallel
2216 The examples from https://github.com/amritb/with-this.git and the
2217 corresponding GNU parallel command:
2218
2219 with -v "$(cat myurls.txt)" "curl -L this"
2220 parallel curl -L :::: myurls.txt
2221
2222 with -v "$(cat myregions.txt)" \
2223 "aws --region=this ec2 describe-instance-status"
2224 parallel aws --region={} ec2 describe-instance-status \
2225 :::: myregions.txt
2226
2227 with -v "$(ls)" "kubectl --kubeconfig=this get pods"
2228 ls | parallel kubectl --kubeconfig={} get pods
2229
2230 with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
2231 ls | grep config | parallel kubectl --kubeconfig={} get pods
2232
2233 with -v "$(echo {1..10})" "echo 123"
2234 parallel -N0 echo 123 ::: {1..10}
2235
2236 Stderr is merged with stdout. with-this buffers in RAM. It uses 3x the
2237 output size, so you cannot have output larger than 1/3rd the amount of
2238 RAM. The input values cannot contain spaces. Composed commands do not
2239 work.
2240
2241 with-this gives some additional information, so the output has to be
2242 cleaned before piping it to the next command.
2243
2244 https://github.com/amritb/with-this.git (Last checked: 2019-03)
2245
2246 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel
2247 Summary (see legend above):
2248
2249 - - - I4 - - I7
2250 - - M3 - - M6
2251 - O2 O3 - O5 O6 - x x
2252 E1 - - - - - E7
2253 - x x x x x x x x
2254 - -
2255
2256 EXAMPLES FROM Tollef's parallel MANUAL
2257
2258 Tollef parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3
2259
2260 GNU parallel "echo hi; sleep 2; echo bye" ::: 1 2 3
2261
2262 Tollef parallel -j 3 ufraw -o processed -- *.NEF
2263
2264 GNU parallel -j 3 ufraw -o processed ::: *.NEF
2265
2266 Tollef parallel -j 3 -- ls df "echo hi"
2267
2268 GNU parallel -j 3 ::: ls df "echo hi"
2269
2270 (Last checked: 2019-08)
2271
2272 DIFFERENCES BETWEEN rargs AND GNU Parallel
2273 Summary (see legend above):
2274
2275 I1 - - - - - I7
2276 - - M3 M4 - -
2277 - O2 O3 - O5 O6 - O8 -
2278 E1 - - E4 - - -
2279 - - - - - - - - -
2280 - -
2281
2282 rargs has elegant ways of doing named regexp capture and field ranges.
2283
2284 With GNU parallel you can use --rpl to get a similar functionality as
2285 regexp capture gives, and use join and @arg to get the field ranges.
2286 But the syntax is longer. This:
2287
2288 --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]'
2289
2290 would make it possible to use:
2291
2292 {1r3..6}
2293
2294 for field 3..6.
2295
2296 For full support of {n..m:s} including negative numbers use a dynamic
2297 replacement string like this:
2298
2299 PARALLEL=--rpl\ \''{r((-?\d+)?)\.\.((-?\d+)?)((:([^}]*))?)}
2300 $a = defined $$2 ? $$2 < 0 ? 1+$#arg+$$2 : $$2 : 1;
2301 $b = defined $$4 ? $$4 < 0 ? 1+$#arg+$$4 : $$4 : $#arg+1;
2302 $s = defined $$6 ? $$7 : " ";
2303 $_ = join $s,@arg[$a..$b]'\'
2304 export PARALLEL
2305
2306 You can then do:
2307
2308 head /etc/passwd | parallel --colsep : echo ..={1r..} ..3={1r..3} \
2309 4..={1r4..} 2..4={1r2..4} 3..3={1r3..3} ..3:-={1r..3:-} \
2310 ..3:/={1r..3:/} -1={-1} -5={-5} -6={-6} -3..={1r-3..}
2311
2312 EXAMPLES FROM rargs MANUAL
2313
2314 ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}
2315 ls *.bak | parallel mv {} {.}
2316
2317 cat download-list.csv | rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}
2318 cat download-list.csv | parallel --csv wget {1} -O {2}
2319 # or use regexps:
2320 cat download-list.csv |
2321 parallel --rpl '{url} s/,.*//' --rpl '{filename} s/.*?,//' wget {url} -O {filename}
2322
2323 cat /etc/passwd | rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
2324 cat /etc/passwd |
2325 parallel -q --colsep : echo -e 'id: "{1}"\t name: "{5}"\t rest: "{=6 $_=join":",@arg[6..$#arg]=}"'
2326
2327 https://github.com/lotabout/rargs (Last checked: 2020-01)
2328
2329 DIFFERENCES BETWEEN threader AND GNU Parallel
2330 Summary (see legend above):
2331
2332 I1 - - - - - -
2333 M1 - M3 - - M6
2334 O1 - O3 - O5 - - N/A N/A
2335 E1 - - E4 - - -
2336 - - - - - - - - -
2337 - -
2338
2339 Newline separates arguments, but a newline at the end of the file is
2340 treated as an extra, empty argument. So this runs 2 jobs:
2341
2342 echo two_jobs | threader -run 'echo "$THREADID"'
2343
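The splitting behaviour can be sketched with awk. count_args is a
hypothetical helper that mimics the splitting described above (it is not
threader's code) and assumes the input contains no \001 bytes:

```shell
# count_args: slurp stdin as one record and split it on '\n', keeping a
# trailing empty field -- mimics the splitting described above.
# Assumes the input contains no \001 bytes (used as a never-matching RS).
count_args() {
  awk 'BEGIN { RS = "\001" }      # read the whole input as one record
       { print split($0, a, "\n") }'
}

printf 'two_jobs\n' | count_args
```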
2344 threader ignores stderr, so any output to stderr is lost. threader
2345 buffers in RAM, so output bigger than the machine's virtual memory will
2346 cause the machine to crash.
2347
2348 https://github.com/voodooEntity/threader (Last checked: 2020-04)
2349
2350 DIFFERENCES BETWEEN runp AND GNU Parallel
2351 Summary (see legend above):
2352
2353 I1 I2 - - - - -
2354 M1 - (M3) - - M6
2355 O1 O2 O3 - O5 O6 - N/A N/A -
2356 E1 - - - - - -
2357 - - - - - - - - -
2358 - -
2359
2360 (M3): You can add a prefix and a postfix to the input, so the argument
2361 can only be inserted into the command line once.
2362
2363 runp runs 10 jobs in parallel by default. runp blocks if output of a
2364 command is > 64 Kbytes. Quoting of input is needed. It adds output to
2365 stderr (this can be prevented with -q).
2366
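The prefix/postfix idea can be sketched in shell. run_wrapped is a
hypothetical helper, not runp's code, and it splices each input line into
a shell command, so the input must be trusted:

```shell
# run_wrapped PREFIX POSTFIX: build "PREFIX <line> POSTFIX" from each
# stdin line and run it -- a sketch of the (M3) prefix/postfix scheme
# described above. Input is spliced into a shell command, so it must
# be trusted.
run_wrapped() {
  while IFS= read -r line; do
    sh -c "$1 $line $2"
  done
}

printf 'a\nb\n' | run_wrapped 'echo got' 'ok'
```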
2367 Examples as GNU Parallel
2368
2369 base='https://images-api.nasa.gov/search'
2370 query='jupiter'
2371 desc='planet'
2372 type='image'
2373 url="$base?q=$query&description=$desc&media_type=$type"
2374
2375 # Download the images in parallel using runp
2376 curl -s $url | jq -r .collection.items[].href | \
2377 runp -p 'curl -s' | jq -r .[] | grep large | \
2378 runp -p 'curl -s -L -O'
2379
2380 time curl -s $url | jq -r .collection.items[].href | \
2381 runp -g 1 -q -p 'curl -s' | jq -r .[] | grep large | \
2382 runp -g 1 -q -p 'curl -s -L -O'
2383
2384 # Download the images in parallel
2385 curl -s $url | jq -r .collection.items[].href | \
2386 parallel curl -s | jq -r .[] | grep large | \
2387 parallel curl -s -L -O
2388
2389 time curl -s $url | jq -r .collection.items[].href | \
2390 parallel -j 1 curl -s | jq -r .[] | grep large | \
2391 parallel -j 1 curl -s -L -O
2392
2393 Run some test commands (read from file)
2394
2395 # Create a file containing commands to run in parallel.
2396 cat << EOF > /tmp/test-commands.txt
2397 sleep 5
2398 sleep 3
2399 blah # this will fail
2400 ls $PWD # PWD shell variable is used here
2401 EOF
2402
2403 # Run commands from the file.
2404 runp /tmp/test-commands.txt > /dev/null
2405
2406 parallel -a /tmp/test-commands.txt > /dev/null
2407
2408 Ping several hosts and see packet loss (read from stdin)
2409
2410 # First copy this line and press Enter
2411 runp -p 'ping -c 5 -W 2' -s '| grep loss'
2412 localhost
2413 1.1.1.1
2414 8.8.8.8
2415 # Press Enter and Ctrl-D when done entering the hosts
2416
2417 # First copy this line and press Enter
2418 parallel ping -c 5 -W 2 {} '| grep loss'
2419 localhost
2420 1.1.1.1
2421 8.8.8.8
2422 # Press Enter and Ctrl-D when done entering the hosts
2423
2424 Get directories' sizes (read from stdin)
2425
2426 echo -e "$HOME\n/etc\n/tmp" | runp -q -p 'sudo du -sh'
2427
2428 echo -e "$HOME\n/etc\n/tmp" | parallel sudo du -sh
2429 # or:
2430 parallel sudo du -sh ::: "$HOME" /etc /tmp
2431
2432 Compress files
2433
2434 find . -iname '*.txt' | runp -p 'gzip --best'
2435
2436 find . -iname '*.txt' | parallel gzip --best
2437
2438 Measure HTTP request + response time
2439
2440 export CURL="curl -w 'time_total: %{time_total}\n'"
2441 CURL="$CURL -o /dev/null -s https://golang.org/"
2442 perl -wE 'for (1..10) { say $ENV{CURL} }' |
2443 runp -q # Make 10 requests
2444
2445 perl -wE 'for (1..10) { say $ENV{CURL} }' | parallel
2446 # or:
2447 parallel -N0 "$CURL" ::: {1..10}
2448
2449 Find open TCP ports
2450
2451 cat << EOF > /tmp/host-port.txt
2452 localhost 22
2453 localhost 80
2454 localhost 81
2455 127.0.0.1 443
2456 127.0.0.1 444
2457 scanme.nmap.org 22
2458 scanme.nmap.org 23
2459 scanme.nmap.org 443
2460 EOF
2461
2462 1$ cat /tmp/host-port.txt |
2463 runp -q -p 'netcat -v -w2 -z' 2>&1 | egrep '(succeeded!|open)$'
2464
2465 # --colsep is needed to split the line
2466 1$ cat /tmp/host-port.txt |
2467 parallel --colsep ' ' netcat -v -w2 -z 2>&1 |
2468 egrep '(succeeded!|open)$'
2469 # or use uq for unquoted:
2470 1$ cat /tmp/host-port.txt |
2471 parallel netcat -v -w2 -z {=uq=} 2>&1 |
2472 egrep '(succeeded!|open)$'
2473
2474 https://github.com/jreisinger/runp (Last checked: 2020-04)
2475
2476 DIFFERENCES BETWEEN papply AND GNU Parallel
2477 Summary (see legend above):
2478
2479 - - - I4 - - -
2480 M1 - M3 - - M6
2481 - - O3 - O5 - - N/A N/A O10
2482 E1 - - E4 - - -
2483 - - - - - - - - -
2484 - -
2485
2486 papply does not print the output if the command fails:
2487
2488 $ papply 'echo %F; false' foo
2489 "echo foo; false" did not succeed
2490
2491 papply's replacement strings (%F %d %f %n %e %z) can be simulated in
2492 GNU parallel by putting this in ~/.parallel/config:
2493
2494 --rpl '%F'
2495 --rpl '%d $_=Q(::dirname($_));'
2496 --rpl '%f s:.*/::;'
2497 --rpl '%n s:.*/::;s:\.[^/.]+$::;'
2498 --rpl '%e s:.*\.:.:'
2499 --rpl '%z $_=""'
2500
2501 papply buffers in RAM, using twice as much RAM as there is output: 5 GB
2502 of output takes 10 GB of RAM.
2503
2504 The buffering is very CPU intensive: Buffering a line of 5 GB takes 40
2505 seconds (compared to 10 seconds with GNU parallel).
2506
2507 Examples as GNU Parallel
2508
2509 1$ papply gzip *.txt
2510
2511 1$ parallel gzip ::: *.txt
2512
2513 2$ papply "convert %F %n.jpg" *.png
2514
2515 2$ parallel convert {} {.}.jpg ::: *.png
2516
2517 https://pypi.org/project/papply/ (Last checked: 2020-04)
2518
2519 DIFFERENCES BETWEEN async AND GNU Parallel
2520 Summary (see legend above):
2521
2522 - - - I4 - - I7
2523 - - - - - M6
2524 - O2 O3 - O5 O6 - N/A N/A O10
2525 E1 - - E4 - E6 -
2526 - - - - - - - - -
2527 S1 S2
2528
2529 async is very similar to GNU parallel's --semaphore mode (aka sem).
2530 async requires the user to start a server process.
2531
2532 The input is quoted like -q so you need bash -c "...;..." to run
2533 composed commands.
2534
2535 Examples as GNU Parallel
2536
2537 1$ S="/tmp/example_socket"
2538
2539 1$ ID=myid
2540
2541 2$ async -s="$S" server --start
2542
2543 2$ # GNU Parallel does not need a server to run
2544
2545 3$ for i in {1..20}; do
2546 # prints command output to stdout
2547 async -s="$S" cmd -- bash -c "sleep 1 && echo test $i"
2548 done
2549
2550 3$ for i in {1..20}; do
2551 # prints command output to stdout
2552 sem --id "$ID" -j100% "sleep 1 && echo test $i"
2553 # GNU Parallel will only print job when it is done
2554 # If you need output from different jobs to mix
2555 # use -u or --line-buffer
2556 sem --id "$ID" -j100% --line-buffer "sleep 1 && echo test $i"
2557 done
2558
2559 4$ # wait until all commands are finished
2560 async -s="$S" wait
2561
2562 4$ sem --id "$ID" --wait
2563
2564 5$ # configure the server to run four commands in parallel
2565 async -s="$S" server -j4
2566
2567 5$ export PARALLEL=-j4
2568
2569 6$ mkdir "/tmp/ex_dir"
2570 for i in {21..40}; do
2571 # redirects command output to /tmp/ex_dir/file*
2572 async -s="$S" cmd -o "/tmp/ex_dir/file$i" -- \
2573 bash -c "sleep 1 && echo test $i"
2574 done
2575
2576 6$ mkdir "/tmp/ex_dir"
2577 for i in {21..40}; do
2578 # redirects command output to /tmp/ex_dir/file*
2579 sem --id "$ID" --result '/tmp/ex_dir/file{=$_=""=}'"$i" \
2580 "sleep 1 && echo test $i"
2581 done
2582
2583 7$ sem --id "$ID" --wait
2584
2585 7$ async -s="$S" wait
2586
2587 8$ # stops server
2588 async -s="$S" server --stop
2589
2590 8$ # GNU Parallel does not need to stop a server
2591
2592 https://github.com/ctbur/async/ (Last checked: 2020-11)
2593
2594 DIFFERENCES BETWEEN pardi AND GNU Parallel
2595 Summary (see legend above):
2596
2597 I1 I2 - - - - I7
2598 M1 - - - - M6
2599 O1 O2 O3 O4 O5 - O7 - - O10
2600 E1 - - E4 - - -
2601 - - - - - - - - -
2602 - -
2603
2604 pardi is very similar to parallel --pipe --cat: It reads blocks of data
2605 and not arguments. So it cannot insert an argument in the command line.
2606 It puts the block into a temporary file, and this file name (%IN) can
2607 be put in the command line. You can only use %IN once.
2608
2609 It can also run full command lines in parallel (like: cat file |
2610 parallel).
2611
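The block/tempfile scheme can be sketched in plain shell. chunk_run is a
hypothetical helper; wc -l stands in for the real command, and blocks are
counted in lines rather than pardi's records:

```shell
# chunk_run N: read stdin in blocks of N lines, write each block to a
# temporary file and run a command on that file name -- a sketch of the
# %IN scheme described above ('wc -l' stands in for the real tool).
chunk_run() {
  n=$1; tmp=$(mktemp); count=0
  while IFS= read -r line; do
    printf '%s\n' "$line" >> "$tmp"
    count=$((count + 1))
    if [ "$count" -eq "$n" ]; then
      wc -l < "$tmp"              # the command sees a file name, not args
      : > "$tmp"; count=0
    fi
  done
  [ "$count" -gt 0 ] && wc -l < "$tmp"
  rm -f "$tmp"
}

printf 'a\nb\nc\n' | chunk_run 2
```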
2612 EXAMPLES FROM pardi test.sh
2613
2614 1$ time pardi -v -c 100 -i data/decoys.smi -ie .smi -oe .smi \
2615 -o data/decoys_std_pardi.smi \
2616 -w '(standardiser -i %IN -o %OUT 2>&1) > /dev/null'
2617
2618 1$ cat data/decoys.smi |
2619 time parallel -N 100 --pipe --cat \
2620 '(standardiser -i {} -o {#} 2>&1) > /dev/null; cat {#}; rm {#}' \
2621 > data/decoys_std_pardi.smi
2622
2623 2$ pardi -n 1 -i data/test_in.types -o data/test_out.types \
2624 -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2625
2626 2$ cat data/test_in.types | parallel -n 1 -k --pipe --cat \
2627 --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2628
2629 3$ pardi -c 6 -i data/test_in.types -o data/test_out.types \
2630 -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2631
2632 3$ cat data/test_in.types | parallel -n 6 -k --pipe --cat \
2633 --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2634
2635 4$ pardi -i data/decoys.mol2 -o data/still_decoys.mol2 \
2636 -d 's:@<TRIPOS>MOLECULE' -w 'cp %IN %OUT'
2637
2638 4$ cat data/decoys.mol2 |
2639 parallel -n 1 --pipe --cat --recstart '@<TRIPOS>MOLECULE' \
2640 'cp {} {#}; cat {#}; rm {#}' > data/still_decoys.mol2
2641
2642 5$ pardi -i data/decoys.mol2 -o data/decoys2.mol2 \
2643 -d b:10000 -w 'cp %IN %OUT' --preserve
2644
2645 5$ cat data/decoys.mol2 |
2646 parallel -k --pipe --block 10k --recend '' --cat \
2647 'cat {} > {#}; cat {#}; rm {#}' > data/decoys2.mol2
2648
2649 https://github.com/UnixJunkie/pardi (Last checked: 2021-01)
2650
2651 DIFFERENCES BETWEEN bthread AND GNU Parallel
2652 Summary (see legend above):
2653
2654 - - - I4 - - -
2655 - - - - - M6
2656 O1 - O3 - - - O7 O8 - -
2657 E1 - - - - - -
2658 - - - - - - - - -
2659 - -
2660
2661 bthread takes around 1 sec per MB of output. The maximal output line
2662 length is 1073741759.
2663
2664 You cannot quote space in the command, so you cannot run composed
2665 commands like sh -c "echo a; echo b".
2666
2667 https://gitlab.com/netikras/bthread (Last checked: 2021-01)
2668
2669 DIFFERENCES BETWEEN simple_gpu_scheduler AND GNU Parallel
2670 Summary (see legend above):
2671
2672 I1 - - - - - I7
2673 M1 - - - - M6
2674 - O2 O3 - - O6 - x x O10
2675 E1 - - - - - -
2676 - - - - - - - - -
2677 - -
2678
2679 EXAMPLES FROM simple_gpu_scheduler MANUAL
2680
2681 1$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
2682
2683 1$ parallel -j3 --shuf \
2684 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt
2685
2686 2$ simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
2687 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
2688 simple_gpu_scheduler --gpus 0,1,2
2689
2690 2$ parallel --header : --shuf -j3 -v \
2691 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' \
2692 python3 train_dnn.py --lr {lr} --batch_size {bs} \
2693 ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2694
2695 3$ simple_hypersearch \
2696 "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
2697 --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
2698 simple_gpu_scheduler --gpus 0,1,2
2699
2700 3$ parallel --header : --shuf \
2701 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' \
2702 python3 train_dnn.py --lr {lr} --batch_size {bs} \
2703 ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2704
2705 4$ touch gpu.queue
2706 tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
2707 echo "my_command_with | and stuff > logfile" >> gpu.queue
2708
2709 4$ touch gpu.queue
2710 tail -f -n 0 gpu.queue |
2711 parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
2712 # Needed to fill job slots once
2713 seq 3 | parallel echo true >> gpu.queue
2714 # Add jobs
2715 echo "my_command_with | and stuff > logfile" >> gpu.queue
2716 # Needed to flush output from completed jobs
2717 seq 3 | parallel echo true >> gpu.queue
2718
2719 https://github.com/ExpectationMax/simple_gpu_scheduler (Last checked:
2720 2021-01)
2721
2722 DIFFERENCES BETWEEN parasweep AND GNU Parallel
2723 parasweep is a Python module for facilitating parallel parameter
2724 sweeps.
2725
2726 A parasweep job will normally take a text file as input. The text file
2727 contains arguments for the job. Some of these arguments will be fixed
2728 and some of them will be changed by parasweep.
2729
2730 It does this by having a template file such as template.txt:
2731
2732 Xval: {x}
2733 Yval: {y}
2734 FixedValue: 9
2735 # x with 2 decimals
2736 DecimalX: {x:.2f}
2737 TenX: ${x*10}
2738 RandomVal: {r}
2739
2740 and from this template it generates the file to be used by the job by
2741 replacing the replacement strings.
2742
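The substitution step can be sketched with sed, using hypothetical values
for x and y. This only handles plain {x}/{y} placeholders; parasweep's
format specifiers such as {x:.2f} need real format support:

```shell
# Fill a template by replacing the {x} and {y} placeholders (a simplified
# sketch of the substitution step described above).
cat > template.txt << 'EOF'
Xval: {x}
Yval: {y}
FixedValue: 9
EOF

x=1.5
y=2
sed -e "s/{x}/$x/g" -e "s/{y}/$y/g" template.txt > job1.txt
cat job1.txt
```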
2743 Being a Python module parasweep integrates tighter with Python than GNU
2744 parallel. You get the parameters directly in a Python data structure.
2745 With GNU parallel you can use the JSON or CSV output format to get
2746 something similar, but you would have to read the output.
2747
2748 parasweep has a filtering method to ignore parameter combinations you
2749 do not need.
2750
2751 Instead of calling the jobs directly, parasweep can use Python's
2752 Distributed Resource Management Application API to make jobs run with
2753 different cluster software.
2754
2755 GNU parallel --tmpl supports templates with replacement strings. Such
2756 as:
2757
2758 Xval: {x}
2759 Yval: {y}
2760 FixedValue: 9
2761 # x with 2 decimals
2762 DecimalX: {=x $_=sprintf("%.2f",$_) =}
2763 TenX: {=x $_=$_*10 =}
2764 RandomVal: {=1 $_=rand() =}
2765
2766 that can be used like:
2767
2768 parallel --header : --tmpl my.tmpl={#}.t myprog {#}.t \
2769 ::: x 1 2 3 ::: y 1 2 3
2770
2771 Filtering is supported as:
2772
2773 parallel --filter '{1} > {2}' echo ::: 1 2 3 ::: 1 2 3
2774
2775 https://github.com/eviatarbach/parasweep (Last checked: 2021-01)
2776
2777 DIFFERENCES BETWEEN parallel-bash AND GNU Parallel
2778 Summary (see legend above):
2779
2780 I1 I2 - - - - -
2781 - - M3 - - M6
2782 - O2 O3 - O5 O6 - O8 x O10
2783 E1 - - - - - -
2784 - - - - - - - - -
2785 - -
2786
2787 parallel-bash is written in pure bash. It is really fast (overhead of
2788 ~0.05 ms/job compared to GNU parallel's ~3 ms/job). So if your jobs are
2789 extremely short lived, and you can live with the quite limited command
2790 syntax, this may be useful.
2791
2792 It works by making a queue for each process. Then the jobs are
2793 distributed to the queues in a round robin fashion. Finally the queues
2794 are started in parallel. This works fine, if you are lucky, but if not,
2795 all the long jobs may end up in the same queue, so you may see:
2796
2797 $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
2798 time parallel -P4 sleep {}
2799 (7 seconds)
2800 $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
2801 time ./parallel-bash.bash -p 4 -c sleep {}
2802 (12 seconds)
2803
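The round-robin distribution described above can be sketched in plain
shell (an assumed simplification, not parallel-bash's actual code):

```shell
# Distribute jobs round robin into one queue file per worker, then run
# each queue sequentially, with all queues in parallel (a simplified
# model of the scheme described above).
N=2
rm -f queue.0 queue.1
i=0
printf 'echo a\necho b\necho c\necho d\n' |
while IFS= read -r job; do
  echo "$job" >> "queue.$((i % N))"
  i=$((i + 1))
done
for q in queue.0 queue.1; do
  sh "$q" &                      # each worker runs its queue in order
done
wait
```

This shows why unlucky distribution hurts: the queues are fixed up front,
so a worker cannot steal jobs from a slower queue.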
2804 Because it uses bash lists, the total number of jobs is limited to
2805 167000..265000 depending on your environment. You get a segmentation
2806 fault, when you reach the limit.
2807
2808 Ctrl-C does not stop spawning new jobs. Ctrl-Z does not suspend running
2809 jobs.
2810
2811 EXAMPLES FROM parallel-bash
2812
2813 1$ some_input | parallel-bash -p 5 -c echo
2814
2815 1$ some_input | parallel -j 5 echo
2816
2817 2$ parallel-bash -p 5 -c echo < some_file
2818
2819 2$ parallel -j 5 echo < some_file
2820
2821 3$ parallel-bash -p 5 -c echo <<< 'some string'
2822
2823 3$ parallel -j 5 echo <<< 'some string'
2824
2825 4$ something | parallel-bash -p 5 -c echo {} {}
2826
2827 4$ something | parallel -j 5 echo {} {}
2828
2829 https://reposhub.com/python/command-line-tools/Akianonymus-parallel-bash.html
2830 (Last checked: 2021-06)
2831
2832 DIFFERENCES BETWEEN bash-concurrent AND GNU Parallel
2833 bash-concurrent is more an alternative to make than to GNU parallel.
2834 Its input is very similar to a Makefile, where jobs depend on other
2835 jobs.
2836
2837 It has a nice progress indicator where you can see which jobs completed
2838 successfully, which jobs are currently running, which jobs failed, and
2839 which jobs were skipped due to a depending job failed. The indicator
2840 does not deal well with resizing the window.
2841
2842 Output is cached in tempfiles on disk, but is only shown if there is an
2843 error, so it is not meant to be part of a UNIX pipeline. If bash-
2844 concurrent crashes these tempfiles are not removed.
2845
2846 It uses an O(n*n) algorithm, so if you have 1000 independent jobs it
2847 takes 22 seconds to start it.
2848
2849 https://github.com/themattrix/bash-concurrent (Last checked: 2021-02)
2850
2851 DIFFERENCES BETWEEN spawntool AND GNU Parallel
2852 Summary (see legend above):
2853
2854 I1 - - - - - -
2855 M1 - - - - M6
2856 - O2 O3 - O5 O6 - x x O10
2857 E1 - - - - - -
2858 - - - - - - - - -
2859 - -
2860
2861 spawn reads full command lines from stdin and executes them in
2862 parallel.
2863
2864 http://code.google.com/p/spawntool/ (Last checked: 2021-07)
2865
DIFFERENCES BETWEEN go-pssh AND GNU Parallel
Summary (see legend above):

- - - - - - -
M1 - - - - -
O1 - - - - - - x x O10
E1 - - - - - -
R1 R2 - - - R6 - - -
- -

go-pssh does ssh in parallel to multiple machines. It runs the same
command on multiple machines, similar to --nonall.

Hosts must be given as IP addresses, not as hostnames.

Output is sent to stdout (standard output) if the command is
successful, and to stderr (standard error) if the command fails.

EXAMPLES FROM go-pssh

1$ go-pssh -l <ip>,<ip> -u <user> -p <port> -P <passwd> -c "<command>"

1$ parallel -S 'sshpass -p <passwd> ssh -p <port> <user>@<ip>' \
     --nonall "<command>"

2$ go-pssh scp -f host.txt -u <user> -p <port> -P <password> \
     -s /local/file_or_directory -d /remote/directory

2$ parallel --nonall --slf host.txt \
     --basefile /local/file_or_directory/./ --wd /remote/directory \
     --ssh 'sshpass -p <password> ssh -p <port> -l <user>' true

3$ go-pssh scp -l <ip>,<ip> -u <user> -p <port> -P <password> \
     -s /local/file_or_directory -d /remote/directory

3$ parallel --nonall -S <ip>,<ip> \
     --basefile /local/file_or_directory/./ --wd /remote/directory \
     --ssh 'sshpass -p <password> ssh -p <port> -l <user>' true

https://github.com/xuchenCN/go-pssh (Last checked: 2021-07)

DIFFERENCES BETWEEN go-parallel AND GNU Parallel
Summary (see legend above):

I1 I2 - - - - I7
- - M3 - - M6
- O2 O3 - O5 - - x x - O10
E1 - - E4 - - -
- - - - - - - - -
- -

go-parallel uses Go templates for replacement strings, quite similar
to the {= perl expr =} replacement string.

EXAMPLES FROM go-parallel

1$ go-parallel -a ./files.txt -t 'cp {{.Input}} {{.Input | dirname | dirname}}'

1$ parallel -a ./files.txt cp {} '{= $_=::dirname(::dirname($_)) =}'

2$ go-parallel -a ./files.txt -t 'mkdir -p {{.Input}} {{noExt .Input}}'

2$ parallel -a ./files.txt echo mkdir -p {} {.}

3$ go-parallel -a ./files.txt -t 'mkdir -p {{.Input}} {{.Input | basename | noExt}}'

3$ parallel -a ./files.txt echo mkdir -p {} {/.}

https://github.com/mylanconnolly/parallel (Last checked: 2021-07)

Todo
http://code.google.com/p/push/ (cannot compile)

https://github.com/krashanoff/parallel

https://github.com/Nukesor/pueue

https://arxiv.org/pdf/2012.15443.pdf KumQuat

https://arxiv.org/pdf/2007.09436.pdf PaSH: Light-touch Data-Parallel
Shell Processing

https://github.com/JeiKeiLim/simple_distribute_job

https://github.com/reggi/pkgrun - not obvious how to use

https://github.com/benoror/better-npm-run - not obvious how to use

https://github.com/bahmutov/with-package

https://github.com/flesler/parallel

https://github.com/Julian/Verge

https://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html

https://vicerveza.homeunix.net/~viric/soft/ts/

https://github.com/chapmanjacobd/que

TESTING OTHER TOOLS
There are certain issues that are very common in parallelizing tools.
Here are a few stress tests. Be warned: if the tool is badly coded it
may overload your machine.

MIX: Output mixes
Output from 2 jobs should not mix. If the output is not used, this does
not matter; but if the output is used then it is important that you do
not get half a line from one job followed by half a line from another
job.

If the tool does not buffer, output will most likely mix now and then.

This test stresses whether output mixes.

#!/bin/bash

paralleltool="parallel -j0"

cat <<-EOF > mycommand
#!/bin/bash

# If a, b, c, d, e, and f mix: Very bad
perl -e 'print STDOUT "a"x3000_000," "'
perl -e 'print STDERR "b"x3000_000," "'
perl -e 'print STDOUT "c"x3000_000," "'
perl -e 'print STDERR "d"x3000_000," "'
perl -e 'print STDOUT "e"x3000_000," "'
perl -e 'print STDERR "f"x3000_000," "'
echo
echo >&2
EOF
chmod +x mycommand

# Run 30 jobs in parallel
seq 30 |
$paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)

# 'a c e' and 'b d f' should always stay together
# and there should only be a single line per job

STDERRMERGE: Stderr is merged with stdout
Output from stdout and stderr should not be merged, but kept separated.

This test shows whether stdout is mixed with stderr.

#!/bin/bash

paralleltool="parallel -j0"

cat <<-EOF > mycommand
#!/bin/bash

echo stdout
echo stderr >&2
echo stdout
echo stderr >&2
EOF
chmod +x mycommand

# Run one job
echo |
$paralleltool ./mycommand > stdout 2> stderr
cat stdout
cat stderr

RAM: Output limited by RAM
Some tools cache output in RAM. This makes them extremely slow if the
output is bigger than physical memory, and makes them crash if the
output is bigger than the virtual memory.

#!/bin/bash

paralleltool="parallel -j0"

cat <<'EOF' > mycommand
#!/bin/bash

# Generate 1 GB output
yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
EOF
chmod +x mycommand

# Run 20 jobs in parallel
# Adjust 20 to be > physical RAM and < free space on /tmp
seq 20 | time $paralleltool ./mycommand | wc -c

DISKFULL: Incomplete data if /tmp runs full
If caching is done on disk, the disk can run full during the run. Not
all programs discover this. GNU Parallel discovers it if the disk stays
full for at least 2 seconds.

#!/bin/bash

paralleltool="parallel -j0"

# This should be a dir with less than 100 GB free space
smalldisk=/tmp/shm/parallel

TMPDIR="$smalldisk"
export TMPDIR

max_output() {
    # Force worst case scenario:
    # Make GNU Parallel only check once per second
    sleep 10
    # Generate 100 GB to fill $TMPDIR
    # Adjust if /tmp is bigger than 100 GB
    yes | head -c 100G >$TMPDIR/$$
    # Generate 10 MB output that will not be buffered due to full disk
    perl -e 'print "X"x10_000_000' | head -c 10M
    echo This part is missing from incomplete output
    sleep 2
    rm $TMPDIR/$$
    echo Final output
}

export -f max_output
seq 10 | $paralleltool max_output | tr -s X

CLEANUP: Leaving tmp files at unexpected death
Some tools do not clean up their tmp files if they are killed. Tools
that buffer on disk may leave these files behind when killed.

#!/bin/bash

paralleltool=parallel

ls /tmp >/tmp/before
seq 10 | $paralleltool sleep &
pid=$!
# Give the tool time to start up
sleep 1
# Kill it without giving it a chance to clean up
kill -9 $pid
# Should be empty: No files should be left behind
diff <(ls /tmp) /tmp/before

SPCCHAR: Dealing badly with special file names
It is not uncommon for users to create files like:

My brother's 12" *** record (costs $$$).jpg

Some tools break on this.

#!/bin/bash

paralleltool=parallel

touch "My brother's 12\" *** record (costs \$\$\$).jpg"
ls My*jpg | $paralleltool ls -l

COMPOSED: Composed commands do not work
Some tools require you to wrap composed commands in bash -c.

echo bar | $paralleltool echo foo';' echo {}

ONEREP: Only one replacement string allowed
Some tools can only insert the argument once.

echo bar | $paralleltool echo {} foo {}

INPUTSIZE: Length of input should not be limited
Some tools limit the length of the input lines artificially with no
good reason. GNU parallel does not:

perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}

GNU parallel limits the command to run to 128 KB due to execve(2):

perl -e 'print "x"x131_000' | parallel echo {} | wc

NUMWORDS: Speed depends on number of words
Some tools become very slow if output lines have many words.

#!/bin/bash

paralleltool=parallel

cat <<-EOF > mycommand
#!/bin/bash

# 10 MB of lines with 1000 words
yes "`seq 1000`" | head -c 10M
EOF
chmod +x mycommand

# Run 30 jobs in parallel
seq 30 | time $paralleltool -j0 ./mycommand > /dev/null

4GB: Output with a line > 4GB should be OK
#!/bin/bash

paralleltool="parallel -j0"

cat <<-EOF > mycommand
#!/bin/bash

perl -e '\$a="a"x1000_000; for(1..5000) { print \$a }'
EOF
chmod +x mycommand

# Run 1 job
seq 1 | $paralleltool ./mycommand | LC_ALL=C wc

AUTHOR
When using GNU parallel for a publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without
citing.

Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk

Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk

Copyright (C) 2010-2022 Ole Tange, http://ole.tange.dk and Free
Software Foundation, Inc.

Parts of the manual concerning xargs compatibility are inspired by the
manual of xargs from GNU findutils 4.4.2.

LICENSE
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3 of the License, or at your
option any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program. If not, see <https://www.gnu.org/licenses/>.

Documentation license I
Permission is granted to copy, distribute and/or modify this
documentation under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts, and
with no Back-Cover Texts. A copy of the license is included in the
file LICENSES/GFDL-1.3-or-later.txt.

Documentation license II
You are free:

to Share to copy, distribute and transmit the work

to Remix to adapt the work

Under the following conditions:

Attribution
You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they
endorse you or your use of the work).

Share Alike
If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same, similar or
a compatible license.

With the understanding that:

Waiver Any of the above conditions can be waived if you get
permission from the copyright holder.

Public Domain
Where the work or any of its elements is in the public domain
under applicable law, that status is in no way affected by the
license.

Other Rights
In no way are any of the following rights affected by the
license:

• Your fair dealing or fair use rights, or other applicable
  copyright exceptions and limitations;

• The author's moral rights;

• Rights other persons may have either in the work itself or
  in how the work is used, such as publicity or privacy
  rights.

Notice For any reuse or distribution, you must make clear to others
the license terms of this work.

A copy of the full license is included in the file
LICENCES/CC-BY-SA-4.0.txt.

DEPENDENCIES
GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
rsync with ssh.

SEE ALSO
find(1), xargs(1), make(1), pexec(1), ppss(1), xjobs(1), prll(1),
dxargs(1), mdm(1)



20211222                         2022-01-22         PARALLEL_ALTERNATIVES(7)