PARALLEL_ALTERNATIVES(7)            parallel           PARALLEL_ALTERNATIVES(7)

NAME
    parallel_alternatives - Alternatives to GNU parallel

DESCRIPTION
    There are a lot of programs that share functionality with GNU
    parallel. Some of these are specialized tools, and while GNU parallel
    can emulate many of them, a specialized tool can be better at a given
    task. GNU parallel strives to include the best of the general
    functionality without sacrificing ease of use.

    parallel has existed since 2002-01-06 and as GNU parallel since 2010.
    A lot of the alternatives have not had the vitality to survive that
    long, but have come and gone during that time.

    GNU parallel has been actively maintained, with a new release every
    month since 2010. Most other alternatives are fleeting interests of
    their developers, with irregular releases, and are only maintained
    for a few years.
22
SUMMARY LEGEND
    The following features are found in some of the comparable tools:

    Inputs

      I1. Arguments can be read from stdin
      I2. Arguments can be read from a file
      I3. Arguments can be read from multiple files
      I4. Arguments can be read from the command line
      I5. Arguments can be read from a table
      I6. Arguments can be read from the same file using #! (shebang)
      I7. Line-oriented input as default (quoting of special chars not
          needed)

    Manipulation of input

      M1. Composed command
      M2. Multiple arguments can fill up an execution line
      M3. Arguments can be put anywhere in the execution line
      M4. Multiple arguments can be put anywhere in the execution line
      M5. Arguments can be replaced with context
      M6. Input can be treated as the complete command line

    Outputs

      O1. Grouping output so output from different jobs do not mix
      O2. Send stderr (standard error) to stderr (standard error)
      O3. Send stdout (standard output) to stdout (standard output)
      O4. Order of output can be same as order of input
      O5. Stdout only contains stdout (standard output) from the command
      O6. Stderr only contains stderr (standard error) from the command
      O7. Buffering on disk
      O8. No temporary files left if killed
      O9. Test if disk runs full during run
      O10. Output of a line bigger than 4 GB

    Execution

      E1. Running jobs in parallel
      E2. List running jobs
      E3. Finish running jobs, but do not start new jobs
      E4. Number of running jobs can depend on number of cpus
      E5. Finish running jobs, but do not start new jobs after first
          failure
      E6. Number of running jobs can be adjusted while running
      E7. Only spawn new jobs if load is less than a limit

    Remote execution

      R1. Jobs can be run on remote computers
      R2. Basefiles can be transferred
      R3. Argument files can be transferred
      R4. Result files can be transferred
      R5. Cleanup of transferred files
      R6. No config files needed
      R7. Do not run more than SSHD's MaxStartups can handle
      R8. Configurable SSH command
      R9. Retry if connection breaks occasionally

    Semaphore

      S1. Possibility to work as a mutex
      S2. Possibility to work as a counting semaphore

    Legend

      -  = no
      x  = not applicable
      ID = yes

    As not every new version of every program is tested, the table may be
    outdated. Please file a bug report if you find errors (see REPORTING
    BUGS).

    parallel:

      I1 I2 I3 I4 I5 I6 I7
      M1 M2 M3 M4 M5 M6
      O1 O2 O3 O4 O5 O6 O7 O8 O9 O10
      E1 E2 E3 E4 E5 E6 E7
      R1 R2 R3 R4 R5 R6 R7 R8 R9
      S1 S2
104
DIFFERENCES BETWEEN xargs AND GNU Parallel
    Summary (see legend above):

      I1 I2 - - - - -
      - M2 M3 - - -
      - O2 O3 - O5 O6
      E1 - - - - - -
      - - - - - x - - -
      - -

    xargs offers some of the same possibilities as GNU parallel.

    xargs deals badly with special characters (such as space, \, ' and
    "). To see the problem try this:

      touch important_file
      touch 'not important_file'
      ls not* | xargs rm
      mkdir -p "My brother's 12\" records"
      ls | xargs rmdir
      touch 'c:\windows\system32\clfs.sys'
      echo 'c:\windows\system32\clfs.sys' | xargs ls -l

    You can specify -0, but many input generators are optimized for
    newline as separator, not for NUL. E.g. awk, ls, echo, tar -v, head
    (requires using -z), tail (requires using -z), sed (requires using
    -z), perl (-0 and \0 instead of \n), locate (requires using -0), find
    (requires using -print0), grep (requires using -z or -Z), sort
    (requires using -z).
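
    A fully NUL-separated pipeline only works when every stage agrees on
    the separator. A small sketch (with hypothetical throwaway
    filenames) using the find, sort and xargs flags listed above:

```shell
# Create two throwaway files whose names would break a newline-based
# pipeline (a space and a single quote):
dir=$(mktemp -d)
touch "$dir/a file" "$dir/b'file"
# Every stage passes NUL-terminated names: find emits them (-print0),
# sort reorders them (-z), and xargs consumes them (-0):
find "$dir" -type f -print0 | sort -z | xargs -0 ls -l
```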

    GNU parallel's newline separation can be emulated with:

      cat | xargs -d "\n" -n1 command

    xargs can run a given number of jobs in parallel, but has no support
    for running as many jobs in parallel as there are CPU cores.
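
    A common workaround (a sketch, not a built-in xargs feature) is to
    compute the core count explicitly, e.g. with nproc from GNU
    coreutils:

```shell
# One job per CPU core with xargs, by asking nproc for the core count
# (falls back to 2 if nproc is unavailable):
seq 8 | xargs -P "$(nproc 2>/dev/null || echo 2)" -n 1 echo job
# GNU parallel runs one job per core by default:
# seq 8 | parallel echo job
```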

    xargs has no support for grouping the output, therefore output may
    run together, e.g. the first half of a line is from one process and
    the last half of the line is from another process. The example
    Parallel grep cannot be done reliably with xargs because of this. To
    see this in action try:

      parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
        '>' {} ::: a b c d e f g h
      # Serial = no mixing = the wanted result
      # 'tr -s a-z' squeezes repeating letters into a single letter
      echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
      # Compare to 8 jobs in parallel
      parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
      echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
      echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
        tr -s a-z

    Or try this:

      slow_seq() {
        echo Count to "$@"
        seq "$@" |
          perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
      }
      export -f slow_seq
      # Serial = no mixing = the wanted result
      seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
      # Compare to 8 jobs in parallel
      seq 8 | parallel -P8 slow_seq {}
      seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'

    xargs has no support for keeping the order of the output, therefore
    when running jobs in parallel using xargs the output of the second
    job cannot be postponed until the first job is done.
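
    For contrast, GNU parallel's -k (--keep-order) holds back output
    until it can be printed in input order; a sketch (requires GNU
    parallel to be installed):

```shell
# Job 3 sleeps the shortest and finishes first, but -k still prints
# the results in input order: 1 2 3
seq 3 | parallel -k 'sleep 0.$((4-{})); echo {}'
```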

    xargs has no support for running jobs on remote computers.

    xargs has no support for context replace, so you will have to create
    the arguments yourself.

    If you use a replace string in xargs (-I) you cannot force xargs to
    use more than one argument.
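
    The difference is easy to demonstrate (a small sketch):

```shell
# With -I, xargs runs the command once per input line:
printf '%s\n' a b c | xargs -I {} echo got {}
# Without -I, up to -n arguments are packed into one invocation:
printf '%s\n' a b c | xargs -n 3 echo got
```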

    Quoting in xargs works like -q in GNU parallel. This means composed
    commands and redirection require using bash -c.

      ls | parallel "wc {} >{}.wc"
      ls | parallel "echo {}; ls {}|wc"

    becomes (assuming you have 8 cores and that none of the filenames
    contain space, " or '):

      ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
      ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

    A more extreme example can be found on:
    https://unix.stackexchange.com/q/405552/

    https://www.gnu.org/software/findutils/
201
DIFFERENCES BETWEEN find -exec AND GNU Parallel
    Summary (see legend above):

      - - - x - x -
      - M2 M3 - - - -
      - O2 O3 O4 O5 O6
      - - - - - - -
      - - - - - - - - -
      x x

    find -exec offers some of the same possibilities as GNU parallel.

    find -exec only works on files. Processing other input (such as
    hosts or URLs) will require creating these inputs as files. find
    -exec has no support for running commands in parallel.

    https://www.gnu.org/software/findutils/ (Last checked: 2019-01)

DIFFERENCES BETWEEN make -j AND GNU Parallel
    Summary (see legend above):

      - - - - - - -
      - - - - - -
      O1 O2 O3 - x O6
      E1 - - - E5 -
      - - - - - - - - -
      - -

    make -j can run jobs in parallel, but requires a crafted Makefile to
    do this. That results in extra quoting to get filenames containing
    newlines to work correctly.

    make -j computes a dependency graph before running jobs. Jobs run by
    GNU parallel do not depend on each other.
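
    A minimal sketch of such a crafted Makefile: two targets with no
    dependency between them, which make -j2 is free to build
    concurrently (hypothetical target names):

```shell
# Write a Makefile with two independent targets, then build them in
# parallel; the recipe lines must be tab-indented (printf's \t):
dir=$(mktemp -d)
printf 'all: a b\na:\n\ttouch a\nb:\n\ttouch b\n' > "$dir/Makefile"
make -C "$dir" -j2
```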

    (Very early versions of GNU parallel were coincidentally implemented
    using make -j.)

    https://www.gnu.org/software/make/ (Last checked: 2019-01)

DIFFERENCES BETWEEN ppss AND GNU Parallel
    Summary (see legend above):

      I1 I2 - - - - I7
      M1 - M3 - - M6
      O1 - - x - -
      E1 E2 ?E3 E4 - - -
      R1 R2 R3 R4 - - ?R7 ? ?
      - -

    ppss is also a tool for running jobs in parallel.

    The output of ppss is status information and is thus not useful as
    input for another command. The output from the jobs is put into
    files.

    The argument replace string ($ITEM) cannot be changed. Arguments
    must be quoted - thus arguments containing special characters (space
    '"&!*) may cause problems. More than one argument is not supported.
    Filenames containing newlines are not processed correctly. When
    reading input from a file, NUL cannot be used as a terminator. ppss
    needs to read the whole input file before starting any jobs.

    Output and status information is stored in ppss_dir and thus
    requires cleanup when completed. If the dir is not removed before
    running ppss again, nothing may happen as ppss thinks the task is
    already done. GNU parallel will normally not need cleaning up if
    running locally and will only need cleaning up if stopped abnormally
    while running remotely (--cleanup may not complete if stopped
    abnormally). The example Parallel grep would require extra
    postprocessing if written using ppss.

    For remote systems PPSS requires 3 steps: config, deploy, and start.
    GNU parallel only requires one step.

  EXAMPLES FROM ppss MANUAL

    Here are the examples from ppss's manual page with the equivalent
    using GNU parallel:

      1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '

      1$ find /path/to/files -type f | parallel gzip

      2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '

      2$ find /path/to/files -type f | parallel cp {} /destination/dir

      3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

      3$ parallel -a list-of-urls.txt wget -q

      4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

      4$ parallel -a list-of-urls.txt wget -q {}

      5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
           -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
           -n nodes.txt -o /some/output/dir --upload --download;
         ./ppss deploy -C config.cfg
         ./ppss start -C config

      5$ # parallel does not use configs. If you want a different
         # username put it in nodes.txt: user@hostname
         find source/dir -type f |
           parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} \
             -o {.}.mp3 --preset standard --quiet

      6$ ./ppss stop -C config.cfg

      6$ killall -TERM parallel

      7$ ./ppss pause -C config.cfg

      7$ Press: CTRL-Z or killall -SIGTSTP parallel

      8$ ./ppss continue -C config.cfg

      8$ Enter: fg or killall -SIGCONT parallel

      9$ ./ppss.sh status -C config.cfg

      9$ killall -SIGUSR2 parallel

    https://github.com/louwrentius/PPSS
325
DIFFERENCES BETWEEN pexec AND GNU Parallel
    Summary (see legend above):

      I1 I2 - I4 I5 - -
      M1 - M3 - - M6
      O1 O2 O3 - O5 O6
      E1 - - E4 - E6 -
      R1 - - - - R6 - - -
      S1 -

    pexec is also a tool for running jobs in parallel.

  EXAMPLES FROM pexec MANUAL

    Here are the examples from pexec's info page with the equivalent
    using GNU parallel:

      1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
           'echo "scale=10000;sqrt($NUM)" | bc'

      1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \
           bc > sqrt-{}.dat'

      2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort

      2$ ls myfiles*.ext | parallel sort {} ">{}.sort"

      3$ pexec -f image.list -n auto -e B -u star.log -c -- \
           'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'

      3$ parallel -a image.list \
           'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log

      4$ pexec -r *.png -e IMG -c -o - -- \
           'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'

      4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'

      5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'

      5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'

      6$ for p in *.png ; do echo ${p%.png} ; done | \
           pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

      6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

      7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done)
         pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

      7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

      8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \
           'pexec -j -m blockread -d $IMG | \
            jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
            pexec -j -m blockwrite -s th_$IMG'

      8$ # Combining GNU parallel and GNU sem.
         ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
           'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'

         # If reading and writing is done to the same disk, this may be
         # faster as only one process will be either reading or writing:
         ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
           'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'

    https://www.gnu.org/software/pexec/

DIFFERENCES BETWEEN xjobs AND GNU Parallel
    xjobs is also a tool for running jobs in parallel. It only supports
    running jobs on your local computer.

    xjobs deals badly with special characters just like xargs. See the
    section DIFFERENCES BETWEEN xargs AND GNU Parallel.

  EXAMPLES FROM xjobs MANUAL

    Here are the examples from xjobs's man page with the equivalent
    using GNU parallel:

      1$ ls -1 *.zip | xjobs unzip

      1$ ls *.zip | parallel unzip

      2$ ls -1 *.zip | xjobs -n unzip

      2$ ls *.zip | parallel unzip >/dev/null

      3$ find . -name '*.bak' | xjobs gzip

      3$ find . -name '*.bak' | parallel gzip

      4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf

      4$ ls *.jar | parallel jar tf {} '>' {}.idx

      5$ xjobs -s script

      5$ cat script | parallel

      6$ mkfifo /var/run/my_named_pipe;
         xjobs -s /var/run/my_named_pipe &
         echo unzip 1.zip >> /var/run/my_named_pipe;
         echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

      6$ mkfifo /var/run/my_named_pipe;
         cat /var/run/my_named_pipe | parallel &
         echo unzip 1.zip >> /var/run/my_named_pipe;
         echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

    https://www.maier-komor.de/xjobs.html (Last checked: 2019-01)
437
DIFFERENCES BETWEEN prll AND GNU Parallel
    prll is also a tool for running jobs in parallel. It does not
    support running jobs on remote computers.

    prll encourages using BASH aliases and BASH functions instead of
    scripts. GNU parallel supports scripts directly, functions if they
    are exported using export -f, and aliases if using env_parallel.
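
    The export -f mechanism works for any tool that starts its jobs via
    bash, not only GNU parallel; a sketch using xargs (shout is a
    hypothetical function name):

```shell
# export -f makes a shell function visible to child bash processes:
shout() { echo "ARG=$1"; }
export -f shout
# The exported function can now be called in a child bash, e.g.:
#   ... | parallel shout {}
printf '%s\n' a b | xargs -n 1 -I {} bash -c 'shout "$@"' _ {}
```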

    prll generates a lot of status information on stderr (standard
    error), which makes it harder to use the stderr output of the job
    directly as input for another program.

  EXAMPLES FROM prll's MANUAL

    Here is the example from prll's man page with the equivalent using
    GNU parallel:

      1$ prll -s 'mogrify -flip $1' *.jpg

      1$ parallel mogrify -flip ::: *.jpg

    https://github.com/exzombie/prll (Last checked: 2019-01)

DIFFERENCES BETWEEN dxargs AND GNU Parallel
    dxargs is also a tool for running jobs in parallel.

    dxargs does not deal well with more simultaneous jobs than SSHD's
    MaxStartups. dxargs is built only for running jobs remotely, but it
    does not support transferring of files.

    https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html
    (Last checked: 2019-01)

DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
    middleman (mdm) is also a tool for running jobs in parallel.

  EXAMPLES FROM middleman's WEBSITE

    Here are the shell scripts of
    https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html
    ported to GNU parallel:

      1$ seq 19 | parallel buffon -o - | sort -n > result
         cat files | parallel cmd
         find dir -execdir sem cmd {} \;

    https://github.com/cklin/mdm (Last checked: 2019-01)
486
DIFFERENCES BETWEEN xapply AND GNU Parallel
    xapply can run jobs in parallel on the local computer.

  EXAMPLES FROM xapply's MANUAL

    Here are the examples from xapply's man page with the equivalent
    using GNU parallel:

      1$ xapply '(cd %1 && make all)' */

      1$ parallel 'cd {} && make all' ::: */

      2$ xapply -f 'diff %1 ../version5/%1' manifest | more

      2$ parallel diff {} ../version5/{} < manifest | more

      3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1

      3$ parallel --link diff {1} {2} :::: manifest1 checklist1

      4$ xapply 'indent' *.c

      4$ parallel indent ::: *.c

      5$ find ~ksb/bin -type f ! -perm -111 -print | \
           xapply -f -v 'chmod a+x' -

      5$ find ~ksb/bin -type f ! -perm -111 -print | \
           parallel -v chmod a+x

      6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -

      6$ sh <(find */ -... | parallel -s 1024 echo vi)

      6$ find */ -... | parallel -s 1024 -Xuj1 vi

      7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -

      7$ sh <(find ... | parallel -n5 echo vi)

      7$ find ... | parallel -n5 -uj1 vi

      8$ xapply -fn "" /etc/passwd

      8$ parallel -k echo < /etc/passwd

      9$ tr ':' '\012' < /etc/passwd | \
           xapply -7 -nf 'chown %1 %6' - - - - - - -

      9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}

      10$ xapply '[ -d %1/RCS ] || echo %1' */

      10$ parallel '[ -d {}/RCS ] || echo {}' ::: */

      11$ xapply -f '[ -f %1 ] && echo %1' List | ...

      11$ parallel '[ -f {} ] && echo {}' < List | ...

    https://www.databits.net/~ksb/msrc/local/bin/xapply/xapply.html

DIFFERENCES BETWEEN AIX apply AND GNU Parallel
    apply can build command lines based on a template and arguments -
    very much like GNU parallel. apply does not run jobs in parallel.
    apply does not use an argument separator (like :::); instead the
    template must be the first argument.

  EXAMPLES FROM IBM's KNOWLEDGE CENTER

    Here are the examples from IBM's Knowledge Center and the
    corresponding command using GNU parallel:

    To obtain results similar to those of the ls command, enter:

      1$ apply echo *
      1$ parallel echo ::: *

    To compare the file named a1 to the file named b1, and the file
    named a2 to the file named b2, enter:

      2$ apply -2 cmp a1 b1 a2 b2
      2$ parallel -N2 cmp ::: a1 b1 a2 b2

    To run the who command five times, enter:

      3$ apply -0 who 1 2 3 4 5
      3$ parallel -N0 who ::: 1 2 3 4 5

    To link all files in the current directory to the directory
    /usr/joe, enter:

      4$ apply 'ln %1 /usr/joe' *
      4$ parallel ln {} /usr/joe ::: *

    https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm
    (Last checked: 2019-01)
583
DIFFERENCES BETWEEN paexec AND GNU Parallel
    paexec can run jobs in parallel on both the local and remote
    computers.

    paexec requires commands to print a blank line as the last output.
    This means you will have to write a wrapper for most programs.
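
    A hypothetical minimal wrapper (paexec_wrap is an invented name)
    satisfying that protocol: run the real command, then print the blank
    line paexec uses as an end-of-output marker:

```shell
# Run any command, then emit the trailing blank line paexec expects:
paexec_wrap() { "$@"; echo; }
paexec_wrap echo hello
```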

    paexec has a job dependency facility so a job can depend on another
    job being executed successfully. Sort of a poor man's make.

  EXAMPLES FROM paexec's EXAMPLE CATALOG

    Here are the examples from paexec's example catalog with the
    equivalent using GNU parallel:

    1_div_X_run

      1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]

      1$ parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]

    all_substr_run

      2$ ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]

      2$ parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]

    cc_wrapper_run

      3$ ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
           -n 'host1 host2' \
           -t '/usr/bin/ssh -x' <<EOF [...]

      3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
           -S host1,host2 <<EOF [...]

         # This is not exactly the same, but avoids the wrapper
         parallel gcc -O2 -c -o {.}.o {} \
           -S host1,host2 <<EOF [...]

    toupper_run

      4$ ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]

      4$ parallel echo {} '|' ./toupper_cmd <<EOF [...]

         # Without the wrapper:
         parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]

    https://github.com/cheusov/paexec
633
DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
    Summary (see legend above):

      I1 - - I4 - - (I7)
      M1 (M2) M3 (M4) M5 M6
      - O2 O3 - O5 - - N/A N/A O10
      E1 - - - - - -
      - - - - - - - - -
      - -

    (I7): Only under special circumstances. See below.

    (M2+M4): Only if there is a single replacement string.

    map rejects input with special characters:

      echo "The Cure" > My\ brother\'s\ 12\"\ records

      ls | map 'echo %; wc %'

    It works with GNU parallel:

      ls | parallel 'echo {}; wc {}'

    Under some circumstances it also works with map:

      ls | map 'echo % works %'

    But tiny changes make it reject the input with special characters:

      ls | map 'echo % does not work "%"'

    This means that many UTF-8 characters will be rejected. This is by
    design. From the web page: "As such, programs that quietly handle
    them, with no warnings at all, are doing their users a disservice."

    map delays each job by 0.01 s. This can be emulated by using
    parallel --delay 0.01.

    map prints '+' on stderr when a job starts, and '-' when a job
    finishes. This cannot be disabled. parallel has --bar if you need to
    see progress.

    map's replacement strings (% %D %B %E) can be simulated in GNU
    parallel by putting this in ~/.parallel/config:

      --rpl '%'
      --rpl '%D $_=Q(::dirname($_));'
      --rpl '%B s:.*/::;s:\.[^/.]+$::;'
      --rpl '%E s:.*\.::'

    map does not have an argument separator on the command line, but
    uses the first argument as command. This makes quoting harder, which
    again may affect readability. Compare:

      map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *

      parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *

    map can do multiple arguments with context replace, but not without
    context replace:

      parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3

      map "echo 'BEGIN{'%'}END'" 1 2 3

    map has no support for grouping. So this gives the wrong results:

      parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
        ::: a b c d e f
      ls -l a b c d e f
      parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par
      map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf
      map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
      map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
      ls -l out*
      md5sum out*

  EXAMPLES FROM map's WEBSITE

    Here are the examples from map's web page with the equivalent using
    GNU parallel:

      1$ ls *.gif | map convert % %B.png # default max-args: 1

      1$ ls *.gif | parallel convert {} {.}.png

      2$ map "mkdir %B; tar -C %B -xf %" *.tgz # default max-args: 1

      2$ parallel 'mkdir {.}; tar -C {.} -xf {}' ::: *.tgz

      3$ ls *.gif | map cp % /tmp # default max-args: 100

      3$ ls *.gif | parallel -X cp {} /tmp

      4$ ls *.tar | map -n 1 tar -xf %

      4$ ls *.tar | parallel tar -xf

      5$ map "cp % /tmp" *.tgz

      5$ parallel cp {} /tmp ::: *.tgz

      6$ map "du -sm /home/%/mail" alice bob carol

      6$ parallel "du -sm /home/{}/mail" ::: alice bob carol

         or if you prefer running a single job with multiple args:

      6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol

      7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7'

      7$ cat /etc/passwd | parallel --colsep : 'echo user {1} has shell {7}'

      8$ export MAP_MAX_PROCS=$(( `nproc` / 2 ))

      8$ export PARALLEL=-j50%

    https://github.com/sitaramc/map (Last checked: 2020-05)
752
DIFFERENCES BETWEEN ladon AND GNU Parallel
    ladon can run multiple jobs on files in parallel.

    ladon only works on files, and the only way to specify files is
    using a quoted glob string (such as \*.jpg). It is not possible to
    list the files manually.

    As replacement strings it uses FULLPATH, DIRNAME, BASENAME, EXT,
    RELDIR, and RELPATH.

    These can be simulated using GNU parallel by putting this in
    ~/.parallel/config:

      --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
      --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
      --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
      --rpl 'EXT s:.*\.::'
      --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
             s:\Q$c/\E::;$_=::dirname($_);'
      --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
             s:\Q$c/\E::;'

    ladon deals badly with filenames containing " and newline, and it
    fails for output larger than 200k:

      ladon '*' -- seq 36000 | wc

  EXAMPLES FROM ladon MANUAL

    It is assumed that the '--rpl's above are put in ~/.parallel/config
    and that it is run under a shell that supports '**' globbing (such
    as zsh):

      1$ ladon "**/*.txt" -- echo RELPATH

      1$ parallel echo RELPATH ::: **/*.txt

      2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt

      2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt

      3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \
           -thumbnail 100x100^ -gravity center -extent 100x100 \
           thumbs/RELPATH

      3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH \
           -thumbnail 100x100^ -gravity center -extent 100x100 \
           thumbs/RELPATH ::: **/*.jpg

      4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3

      4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav

    https://github.com/danielgtaylor/ladon (Last checked: 2019-01)

DIFFERENCES BETWEEN jobflow AND GNU Parallel
    Summary (see legend above):

      I1 - - - - - I7
      - - M3 - - (M6)
      O1 O2 O3 - O5 O6 (O7) - - O10
      E1 - - - - E6 -
      - - - - - - - - -
      - -

    jobflow can run multiple jobs in parallel.

    Just like with xargs, output from jobflow jobs running in parallel
    mixes together by default. jobflow can buffer into files with
    -buffered (placed in /run/shm), but these are not cleaned up if
    jobflow dies unexpectedly (e.g. by Ctrl-C). If the total output is
    big (in the order of RAM+swap) it can cause the system to slow to a
    crawl and eventually run out of memory.

    Just like with xargs, redirection and composed commands require
    wrapping with bash -c.

    Input lines can be at most 4096 bytes long.

    jobflow is faster than GNU parallel but around 6 times slower than
    parallel-bash.

    jobflow has no equivalent for --pipe or --sshlogin.

    jobflow makes it possible to set resource limits on the running
    jobs. This can be emulated by GNU parallel using bash's ulimit:

      jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

      parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300 myjob'

  EXAMPLES FROM jobflow README

      1$ cat things.list | jobflow -threads=8 -exec ./mytask {}

      1$ cat things.list | parallel -j8 ./mytask {}

      2$ seq 100 | jobflow -threads=100 -exec echo {}

      2$ seq 100 | parallel -j100 echo {}

      3$ cat urls.txt | jobflow -threads=32 -exec wget {}

      3$ cat urls.txt | parallel -j32 wget {}

      4$ find . -name '*.bmp' | \
           jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

      4$ find . -name '*.bmp' | \
           parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

      5$ seq 100 | jobflow -skip 10 -count 10

      5$ seq 100 | parallel --filter '{1} > 10 and {1} <= 20' echo

      5$ seq 100 | parallel echo '{= $_>10 and $_<=20 or skip() =}'

    https://github.com/rofl0r/jobflow (Last checked: 2022-05)
870
DIFFERENCES BETWEEN gargs AND GNU Parallel
    gargs can run multiple jobs in parallel.

    Older versions cache output in memory. This causes it to be
    extremely slow when the output is larger than the physical RAM, and
    it can cause the system to run out of memory.

    See more details on this in man parallel_design.

    Newer versions cache output in files, but leave files in $TMPDIR if
    gargs is killed.

    Output to stderr (standard error) is changed if the command fails.

  EXAMPLES FROM gargs WEBSITE

      1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

      1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

      2$ cat t.txt | gargs --sep "\s+" \
           -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"

      2$ cat t.txt | parallel --colsep "\\s+" \
           -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"

    https://github.com/brentp/gargs

DIFFERENCES BETWEEN orgalorg AND GNU Parallel
    orgalorg can run the same job on multiple machines. This is related
    to --onall and --nonall.

    orgalorg supports entering the SSH password - provided it is the
    same for all servers. GNU parallel advocates using ssh-agent
    instead, but it is possible to emulate orgalorg's behavior by
    setting SSHPASS and by using --ssh "sshpass ssh".

    To make the emulation easier, make a simple alias:

      alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"

    If you want to supply a password run:

      SSHPASS=`ssh-askpass`

    or set the password directly:

      SSHPASS=P4$$w0rd!

    If the above is set up you can then do:

      orgalorg -o frontend1 -o frontend2 -p -C uptime
      par_emul -S frontend1 -S frontend2 uptime

      orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
      par_emul -S frontend1 -S frontend2 top -bid 1

      orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
        'md5sum /tmp/bigfile' -S bigfile
      par_emul -S frontend1 -S frontend2 --basefile bigfile \
        --workdir /tmp md5sum /tmp/bigfile

    orgalorg has a progress indicator for the transferring of a file.
    GNU parallel does not.

    https://github.com/reconquest/orgalorg
937
938 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
939 Rust parallel focuses on speed. It is almost as fast as xargs, but not
940 as fast as parallel-bash. It implements a few features from GNU
941 parallel, but lacks many functions. All these fail:
942
943 # Read arguments from file
944 parallel -a file echo
945 # Changing the delimiter
946 parallel -d _ echo ::: a_b_c_
947
948 These do something different from GNU parallel
949
950 # -q to protect quoted $ and space
951 parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
952 # Generation of combination of inputs
953 parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
954 # {= perl expression =} replacement string
955 parallel echo '{= s/new/old/ =}' ::: my.new your.new
956 # --pipe
957 seq 100000 | parallel --pipe wc
958 # linked arguments
959 parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
960 # Run different shell dialects
961 zsh -c 'parallel echo \={} ::: zsh && true'
962 csh -c 'parallel echo \$\{\} ::: shell && true'
963 bash -c 'parallel echo \$\({}\) ::: pwd && true'
964 # Rust parallel does not start before the last argument is read
965 (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
966 tail -f /var/log/syslog | parallel echo
967
968 Most of the examples from the book GNU Parallel 2018 do not work, thus
969 Rust parallel is not close to being a compatible replacement.
970
971 Rust parallel has no remote facilities.
972
973 It uses /tmp/parallel for tmp files and does not clean up if terminated
974 abruptly. If another user on the system uses Rust parallel, then
975 /tmp/parallel will have the wrong permissions and Rust parallel will
976 fail. A malicious user can set up the right permissions and symlink the
977 output file to one of the user's files and next time the user uses Rust
978 parallel it will overwrite this file.
979
980 attacker$ mkdir /tmp/parallel
981 attacker$ chmod a+rwX /tmp/parallel
982 # Symlink to the file the attacker wants to zero out
983 attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
984 victim$ seq 1000 | parallel echo
985 # This file is now overwritten with stderr from 'echo'
986 victim$ cat ~victim/.important-file
987
988 If /tmp/parallel runs full during the run, Rust parallel does not
989 report this, but finishes with success - thereby risking data loss.
990
991 https://github.com/mmstick/parallel
992
993 DIFFERENCES BETWEEN Rush AND GNU Parallel
994 rush (https://github.com/shenwei356/rush) is written in Go and based on
995 gargs.
996
997 Just like GNU parallel, rush buffers in temporary files. But unlike
998 GNU parallel, rush does not clean up if the process dies abnormally.
999
1000 rush has some string manipulations that can be emulated by putting this
1001 into ~/.parallel/config (/ is used instead of %, and % is used instead
1002 of ^ as that is closer to bash's ${var%postfix}):
1003
1004 --rpl '{:} s:(\.[^/]+)*$::'
1005 --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
1006 --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
1007 --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
1008 --rpl '{@(.*?)} /$$1/ and $_=$1;'
1009
1010 EXAMPLES FROM rush's WEBSITE
1011
1012 Here are the examples from rush's website with the equivalent command
1013 in GNU parallel.
1014
1015 1. Simple run, quoting is not necessary
1016
1017 1$ seq 1 3 | rush echo {}
1018
1019 1$ seq 1 3 | parallel echo {}
1020
1021 2. Read data from file (`-i`)
1022
1023 2$ rush echo {} -i data1.txt -i data2.txt
1024
1025 2$ cat data1.txt data2.txt | parallel echo {}
1026
1027 3. Keep output order (`-k`)
1028
1029 3$ seq 1 3 | rush 'echo {}' -k
1030
1031 3$ seq 1 3 | parallel -k echo {}
1032
1033 4. Timeout (`-t`)
1034
1035 4$ time seq 1 | rush 'sleep 2; echo {}' -t 1
1036
1037 4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'
1038
1039 5. Retry (`-r`)
1040
1041 5$ seq 1 | rush 'python unexisted_script.py' -r 1
1042
1043 5$ seq 1 | parallel --retries 2 'python unexisted_script.py'
1044
1045 Use -u to see it is really run twice:
1046
1047 5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py'
1048
1049 6. Dirname (`{/}`) and basename (`{%}`) and remove custom suffix
1050 (`{^suffix}`)
1051
1052 6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
1053
1054 6$ echo dir/file_1.txt.gz |
1055 parallel --plus echo {//} {/} {%_1.txt.gz}
1056
1057 7. Get basename, and remove last (`{.}`) or any (`{:}`) extension
1058
1059 7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
1060
1061 7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'
1062
1063 8. Job ID, combine fields index and other replacement strings
1064
1065 8$ echo 12 file.txt dir/s_1.fq.gz |
1066 rush 'echo job {#}: {2} {2.} {3%:^_1}'
1067
1068 8$ echo 12 file.txt dir/s_1.fq.gz |
1069 parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'
1070
1071 9. Capture submatch using regular expression (`{@regexp}`)
1072
1073 9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
1074
1075 9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'
1076
1077 10. Custom field delimiter (`-d`)
1078
1079 10$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
1080
1081 10$ echo a=b=c | parallel -d = echo {1} {2} {3}
1082
1083 11. Send multi-lines to every command (`-n`)
1084
1085 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1086
1087 11$ seq 5 |
1088 parallel -n 2 -k \
1089 'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'
1090
1091 11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1092
1093 11$ seq 5 | parallel -n 2 -k 'echo {}; echo'
1094
1095 12. Custom record delimiter (`-D`), note that empty records are not
1096 used.
1097
1098 12$ echo a b c d | rush -D " " -k 'echo {}'
1099
1100 12$ echo a b c d | parallel -d " " -k 'echo {}'
1101
1102 12$ echo abcd | rush -D "" -k 'echo {}'
1103
1104 Cannot be done by GNU Parallel
1105
1106 12$ cat fasta.fa
1107 >seq1
1108 tag
1109 >seq2
1110 cat
1111 gat
1112 >seq3
1113 attac
1114 a
1115 cat
1116
1117 12$ cat fasta.fa | rush -D ">" \
1118 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
1119 # rush fails to join the multiline sequences
1120
1121 12$ cat fasta.fa | (read -n1 ignore_first_char;
1122 parallel -d '>' --colsep '\n' echo FASTA record {#}: \
1123 name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
1124 )
1125
1126 13. Assign value to variable, like `awk -v` (`-v`)
1127
1128 13$ seq 1 |
1129 rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
1130
1131 13$ seq 1 |
1132 parallel -N0 \
1133 'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
1134
1135 13$ for var in a b; do \
1136 13$ seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
1137 13$ done
1138
1139 In GNU parallel you would typically do:
1140
1141 13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -
1142
1143 If you really want the var:
1144
1145 13$ seq 1 3 |
1146             parallel -k var={1} ';echo var: $var, data: {2}' ::: a b :::: -
1147
1148 If you really want the for-loop:
1149
1150 13$ for var in a b; do
1151 export var;
1152 seq 1 3 | parallel -k 'echo var: $var, data: {}';
1153 done
1154
1155 Unlike rush, this also works if the value is complex like:
1156
1157 My brother's 12" records
1158
1159 14. Preset variable (`-v`), avoid repeatedly writing verbose
1160 replacement strings
1161
1162 14$ # naive way
1163 echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
1164
1165 14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'
1166
1167 14$ # macro + removing suffix
1168 echo read_1.fq.gz |
1169 rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
1170
1171 14$ echo read_1.fq.gz |
1172 parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'
1173
1174 14$ # macro + regular expression
1175 echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
1176
1177 14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1178
1179 Unlike rush, GNU parallel works with complex values:
1180
1181 14$ echo "My brother's 12\"read_1.fq.gz" |
1182 parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1183
1184 15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and
1185 exit.
1186
1187 15$ seq 1 20 | rush 'sleep 1; echo {}'
1188 ^C
1189
1190 15$ seq 1 20 | parallel 'sleep 1; echo {}'
1191 ^C
1192
1193 16. Continue/resume jobs (`-c`). When some jobs failed (by execution
1194 failure, timeout, or canceling by user with `Ctrl + C`), please switch
1195 flag `-c/--continue` on and run again, so that `rush` can save
1196 successful commands and ignore them in NEXT run.
1197
1198 16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1199 cat successful_cmds.rush
1200 seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1201
1202 16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
1203 'sleep {}; echo {}'
1204 cat mylog
1205 seq 1 3 | parallel --joblog mylog --retry-failed \
1206 'sleep {}; echo {}'
1207
1208 Multi-line jobs:
1209
1210 16$ seq 1 3 | rush 'sleep {}; echo {}; \
1211 echo finish {}' -t 3 -c -C finished.rush
1212 cat finished.rush
1213 seq 1 3 | rush 'sleep {}; echo {}; \
1214 echo finish {}' -t 3 -c -C finished.rush
1215
1216 16$ seq 1 3 |
1217 parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
1218 echo finish {}'
1219 cat mylog
1220 seq 1 3 |
1221 parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
1222 echo finish {}'
1223
1224 17. A comprehensive example: downloading 1K+ pages given by three URL
1225 list files using `phantomjs save_page.js` (some page contents are
1226 dynamically generated by Javascript, so `wget` does not work). Here I
1227 set max jobs number (`-j`) as `20`, each job has a max running time
1228 (`-t`) of `60` seconds and `3` retry chances (`-r`). Continue flag `-c`
1229 is also switched on, so we can continue unfinished jobs. Luckily, it's
1230 accomplished in one run :)
1231
1232 17$ for f in $(seq 2014 2016); do \
1233 /bin/rm -rf $f; mkdir -p $f; \
1234 cat $f.html.txt | rush -v d=$f -d = \
1235 'phantomjs save_page.js "{}" > {d}/{3}.html' \
1236 -j 20 -t 60 -r 3 -c; \
1237 done
1238
1239 GNU parallel can append to an existing joblog with '+':
1240
1241 17$ rm mylog
1242 for f in $(seq 2014 2016); do
1243 /bin/rm -rf $f; mkdir -p $f;
1244 cat $f.html.txt |
1245 parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
1246 --colsep = \
1247 phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
1248 done
1249
1250 18. A bioinformatics example: mapping with `bwa`, and processing result
1251 with `samtools`:
1252
1253 18$ ref=ref/xxx.fa
1254 threads=25
1255 ls -d raw.cluster.clean.mapping/* \
1256 | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
1257 'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
1258 samtools view -bS {p}.sam > {p}.bam; \
1259 samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
1260 samtools index {p}.sorted.bam; \
1261 samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
1262 /bin/rm {p}.bam {p}.sam;' \
1263 -j 2 --verbose -c -C mapping.rush
1264
1265 GNU parallel would use a function:
1266
1267 18$ ref=ref/xxx.fa
1268 export ref
1269 thr=25
1270 export thr
1271 bwa_sam() {
1272 p="$1"
1273 bam="$p".bam
1274 sam="$p".sam
1275 sortbam="$p".sorted.bam
1276 bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
1277 samtools view -bS "$sam" > "$bam"
1278 samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
1279 samtools index "$sortbam"
1280 samtools flagstat "$sortbam" > "$sortbam".flagstat
1281 /bin/rm "$bam" "$sam"
1282 }
1283 export -f bwa_sam
1284 ls -d raw.cluster.clean.mapping/* |
1285 parallel -j 2 --verbose --joblog mylog bwa_sam
1286
1287 Other rush features
1288
1289 rush has:
1290
1291 • awk -v like custom defined variables (-v)
1292
1293 With GNU parallel you would simply set a shell variable:
1294
1295 parallel 'v={}; echo "$v"' ::: foo
1296 echo foo | rush -v v={} 'echo {v}'
1297
1298 Also rush does not like special chars. So these do not work:
1299
1300 echo does not work | rush -v v=\" 'echo {v}'
1301 echo "My brother's 12\" records" | rush -v v={} 'echo {v}'
1302
1303 Whereas the corresponding GNU parallel version works:
1304
1305 parallel 'v=\"; echo "$v"' ::: works
1306 parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"
1307
1308 • Exit on first error(s) (-e)
1309
1310 This is called --halt now,fail=1 (or shorter: --halt 2) when used
1311 with GNU parallel.
1312
1313 • Settable records sending to every command (-n, default 1)
1314
1315 This is also called -n in GNU parallel.
1316
1317 • Practical replacement strings
1318
1319 {:} remove any extension
1320 With GNU parallel this can be emulated by:
1321
1322 parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz
1323
1324 {^suffix}, remove suffix
1325 With GNU parallel this can be emulated by:
1326
1327 parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz
1328
1329 {@regexp}, capture submatch using regular expression
1330 With GNU parallel this can be emulated by:
1331
1332 parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
1333 echo '{@\d_(.*).gz}' ::: 1_foo.gz
1334
1335 {%.}, {%:}, basename without extension
1336 With GNU parallel this can be emulated by:
1337
1338 parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz
1339
1340 And if you need it often, you define a --rpl in
1341 $HOME/.parallel/config:
1342
1343 --rpl '{%.} s:.*/::;s/\..*//'
1344 --rpl '{%:} s:.*/::;s/\..*//'
1345
1346 Then you can use them as:
1347
1348 parallel echo {%.} {%:} ::: dir/foo.bar.gz
1349
1350 • Preset variable (macro)
1351
1352 E.g.
1353
1354 echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'
1355
1356 With GNU parallel this can be emulated by:
1357
1358 echo foosuffix |
1359 parallel --plus 'p={%suffix}; echo ${p}_new_suffix'
1360
1361         Unlike rush, GNU parallel works fine if the input contains double
1362         space, ' and ":
1363
1364 echo "1'6\" foosuffix" |
1365 parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
1366
1367 • Commands of multi-lines
1368
1369         While you can use multi-line commands in GNU parallel, GNU parallel
1370         discourages them to improve readability. In most cases a multi-line
1371         command can be written as a function:
1372
1373 seq 1 3 |
1374 parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
1375 echo finish {}'
1376
1377 Could be written as:
1378
1379 doit() {
1380 sleep "$1"
1381 echo "$1"
1382 echo finish "$1"
1383 }
1384 export -f doit
1385 seq 1 3 | parallel --timeout 2 --joblog my.log doit
1386
1387 The failed commands can be resumed with:
1388
1389 seq 1 3 |
1390 parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
1391 echo finish {}'
1392
1393 https://github.com/shenwei356/rush
1394
1395 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
1396 ClusterSSH solves a different problem than GNU parallel.
1397
1398 ClusterSSH opens a terminal window for each computer and using a master
1399 window you can run the same command on all the computers. This is
1400 typically used for administrating several computers that are almost
1401 identical.
1402
1403 GNU parallel runs the same (or different) commands with different
1404 arguments in parallel possibly using remote computers to help
1405 computing. If more than one computer is listed in -S GNU parallel may
1406 only use one of these (e.g. if there are 8 jobs to be run and one
1407 computer has 8 cores).
1408
1409 GNU parallel can be used as a poor-man's version of ClusterSSH:
1410
1411 parallel --nonall -S server-a,server-b do_stuff foo bar
1412
1413 https://github.com/duncs/clusterssh
1414
1415 DIFFERENCES BETWEEN coshell AND GNU Parallel
1416 coshell only accepts full commands on standard input. Any quoting needs
1417 to be done by the user.
1418
1419 Commands are run in sh so any bash/tcsh/zsh specific syntax will not
1420 work.
1421
1422 Output can be buffered by using -d. Output is buffered in memory, so
1423 big output can cause swapping and therefore be terribly slow, or even
1424 cause the system to run out of memory.
1425
1426 https://github.com/gdm85/coshell (Last checked: 2019-01)
1427
1428 DIFFERENCES BETWEEN spread AND GNU Parallel
1429 spread runs commands on all directories.
1430
1431 It can be emulated with GNU parallel using this Bash function:
1432
1433 spread() {
1434 _cmds() {
1435 perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
1436 }
1437 parallel $(_cmds "$@")'|| echo exit status $?' ::: */
1438 }
1439
1440 This works except for the --exclude option.
1441
1442 (Last checked: 2017-11)
1443
1444 DIFFERENCES BETWEEN pyargs AND GNU Parallel
1445 pyargs deals badly with input containing spaces. It buffers stdout, but
1446 not stderr. It buffers in RAM. {} does not work as replacement string.
1447 It does not support running functions.
1448
1449 pyargs does not support composed commands if run with --lines, and
1450 fails on pyargs traceroute gnu.org fsf.org.
1451
1452 Examples
1453
1454 seq 5 | pyargs -P50 -L seq
1455 seq 5 | parallel -P50 --lb seq
1456
1457 seq 5 | pyargs -P50 --mark -L seq
1458 seq 5 | parallel -P50 --lb \
1459 --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
1460 # Similar, but not precisely the same
1461 seq 5 | parallel -P50 --lb --tag seq
1462
1463 seq 5 | pyargs -P50 --mark command
1464 # Somewhat longer with GNU Parallel due to the special
1465 # --mark formatting
1466 cmd="$(echo "command" | parallel --shellquote)"
1467 wrap_cmd() {
1468 echo "MARK $cmd $@================================" >&3
1469 echo "OUTPUT START[$cmd $@]:"
1470 eval $cmd "$@"
1471 echo "OUTPUT END[$cmd $@]"
1472 }
1473 (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
1474 # Similar, but not exactly the same
1475 seq 5 | parallel -t --tag command
1476
1477 (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
1478 (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
1479 parallel -r -d' ' seq
1480 # Similar, but not exactly the same
1481 parallel seq ::: 1 2 3 4 5 6
1482
1483 https://github.com/robertblackwell/pyargs (Last checked: 2019-01)
1484
1485 DIFFERENCES BETWEEN concurrently AND GNU Parallel
1486 concurrently runs jobs in parallel.
1487
1488 The output is prepended with the job number, and may be incomplete:
1489
1490 $ concurrently 'seq 100000' | (sleep 3;wc -l)
1491 7165
1492
1493 When pretty printing, it caches output in memory. Output mixes (as
1494 shown by the test MIX below) whether or not output is cached.
1495
1496 There seems to be no way of making a template command and having
1497 concurrently fill it with different args. The full commands must be
1498 given on the command line.
1499
1500 There is also no way of controlling how many jobs should be run in
1501 parallel at a time - i.e. "number of jobslots". Instead all jobs are
1502 simply started in parallel.
1503
1504 https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)
1505
1506 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
1507 map does not run jobs in parallel by default. The README suggests
1508 using:
1509
1510 ... | map t 'sleep $t && say done &'
1511
1512 But this fails if more jobs are run in parallel than the number of
1513 available processes. Since there is no support for parallelization in
1514 map itself, the output also mixes:
1515
1516 seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'
1517
1518 The major difference is that GNU parallel is built for parallelization
1519 and map is not. So GNU parallel has lots of ways of dealing with the
1520 issues that parallelization raises:
1521
1522 • Keep the number of processes manageable
1523
1524 • Make sure output does not mix
1525
1526 • Make Ctrl-C kill all running processes
1527
1528 EXAMPLES FROM map's WEBSITE
1529
1530 Here are the 5 examples converted to GNU Parallel:
1531
1532 1$ ls *.c | map f 'foo $f'
1533 1$ ls *.c | parallel foo
1534
1535 2$ ls *.c | map f 'foo $f; bar $f'
1536 2$ ls *.c | parallel 'foo {}; bar {}'
1537
1538 3$ cat urls | map u 'curl -O $u'
1539 3$ cat urls | parallel curl -O
1540
1541 4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
1542 4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
1543 4$ parallel 'sleep {} && say done' ::: 1 1 1
1544
1545 5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
1546 5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
1547 5$ parallel -j0 'sleep {} && say done' ::: 1 1 1
1548
1549 https://github.com/soveran/map (Last checked: 2019-01)
1550
1551 DIFFERENCES BETWEEN loop AND GNU Parallel
1552 loop mixes stdout and stderr:
1553
1554 loop 'ls /no-such-file' >/dev/null
1555
1556 loop's replacement string $ITEM does not quote strings:
1557
1558 echo 'two spaces' | loop 'echo $ITEM'
1559
1560 loop cannot run functions:
1561
1562 myfunc() { echo joe; }
1563 export -f myfunc
1564 loop 'myfunc this fails'
1565
1566 EXAMPLES FROM loop's WEBSITE
1567
1568 Some of the examples from https://github.com/Miserlou/Loop/ can be
1569 emulated with GNU parallel:
1570
1571 # A couple of functions will make the code easier to read
1572 $ loopy() {
1573 yes | parallel -uN0 -j1 "$@"
1574 }
1575 $ export -f loopy
1576 $ time_out() {
1577 parallel -uN0 -q --timeout "$@" ::: 1
1578 }
1579 $ match() {
1580 perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
1581 }
1582 $ export -f match
1583
1584 $ loop 'ls' --every 10s
1585 $ loopy --delay 10s ls
1586
1587 $ loop 'touch $COUNT.txt' --count-by 5
1588 $ loopy touch '{= $_=seq()*5 =}'.txt
1589
1590 $ loop --until-contains 200 -- \
1591 ./get_response_code.sh --site mysite.biz`
1592 $ loopy --halt now,success=1 \
1593 './get_response_code.sh --site mysite.biz | match 200'
1594
1595 $ loop './poke_server' --for-duration 8h
1596 $ time_out 8h loopy ./poke_server
1597
1598 $ loop './poke_server' --until-success
1599 $ loopy --halt now,success=1 ./poke_server
1600
1601 $ cat files_to_create.txt | loop 'touch $ITEM'
1602 $ cat files_to_create.txt | parallel touch {}
1603
1604 $ loop 'ls' --for-duration 10min --summary
1605 # --joblog is somewhat more verbose than --summary
1606 $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log
1607
1608 $ loop 'echo hello'
1609 $ loopy echo hello
1610
1611 $ loop 'echo $COUNT'
1612 # GNU Parallel counts from 1
1613 $ loopy echo {#}
1614 # Counting from 0 can be forced
1615 $ loopy echo '{= $_=seq()-1 =}'
1616
1617 $ loop 'echo $COUNT' --count-by 2
1618 $ loopy echo '{= $_=2*(seq()-1) =}'
1619
1620 $ loop 'echo $COUNT' --count-by 2 --offset 10
1621 $ loopy echo '{= $_=10+2*(seq()-1) =}'
1622
1623 $ loop 'echo $COUNT' --count-by 1.1
1624 # GNU Parallel rounds 3.3000000000000003 to 3.3
1625 $ loopy echo '{= $_=1.1*(seq()-1) =}'
1626
1627 $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
1628 $ loopy echo '{= $_=2*(seq()-1) =} {#}'
1629
1630 $ loop 'echo $COUNT' --num 3 --summary
1631 # --joblog is somewhat more verbose than --summary
1632 $ seq 3 | parallel --joblog my.log echo; cat my.log
1633
1634 $ loop 'ls -foobarbatz' --num 3 --summary
1635 # --joblog is somewhat more verbose than --summary
1636 $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log
1637
1638 $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
1639 # Can be emulated by running 2 jobs
1640 $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
1641 $ echo 50| parallel echo '{= $_=2*(seq()-1) =}'
1642
1643 $ loop 'date' --every 5s
1644 $ loopy --delay 5s date
1645
1646 $ loop 'date' --for-duration 8s --every 2s
1647 $ time_out 8s loopy --delay 2s date
1648
1649 $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
1650         $ seconds=$((`date -d 2018-05-25T20:50:00 +%s` - `date +%s`))s
1651 $ time_out $seconds loopy --delay 5s date -u
1652
1653 $ loop 'echo $RANDOM' --until-contains "666"
1654 $ loopy --halt now,success=1 'echo $RANDOM | match 666'
1655
1656 $ loop 'if (( RANDOM % 2 )); then
1657 (echo "TRUE"; true);
1658 else
1659 (echo "FALSE"; false);
1660 fi' --until-success
1661 $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
1662 (echo "TRUE"; true);
1663 else
1664 (echo "FALSE"; false);
1665 fi'
1666
1667 $ loop 'if (( RANDOM % 2 )); then
1668 (echo "TRUE"; true);
1669 else
1670 (echo "FALSE"; false);
1671 fi' --until-error
1672 $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
1673 (echo "TRUE"; true);
1674 else
1675 (echo "FALSE"; false);
1676 fi'
1677
1678 $ loop 'date' --until-match "(\d{4})"
1679 $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
1680
1681 $ loop 'echo $ITEM' --for red,green,blue
1682 $ parallel echo ::: red green blue
1683
1684 $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
1685 $ cat /tmp/my-list-of-files-to-create.txt | parallel touch
1686
1687 $ ls | loop 'cp $ITEM $ITEM.bak'; ls
1688 $ ls | parallel cp {} {}.bak; ls
1689
1690 $ loop 'echo $ITEM | tr a-z A-Z' -i
1691 $ parallel 'echo {} | tr a-z A-Z'
1692 # Or more efficiently:
1693 $ parallel --pipe tr a-z A-Z
1694
1695 $ loop 'echo $ITEM' --for "`ls`"
1696 $ parallel echo {} ::: "`ls`"
1697
1698 $ ls | loop './my_program $ITEM' --until-success;
1699 $ ls | parallel --halt now,success=1 ./my_program {}
1700
1701 $ ls | loop './my_program $ITEM' --until-fail;
1702 $ ls | parallel --halt now,fail=1 ./my_program {}
1703
1704 $ ./deploy.sh;
1705 loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
1706 --every 5s --until-contains 200;
1707 ./announce_to_slack.sh
1708 $ ./deploy.sh;
1709 loopy --delay 5s --halt now,success=1 \
1710 'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
1711 ./announce_to_slack.sh
1712
1713 $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
1714 $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing
1715
1716 $ ./create_big_file -o my_big_file.bin;
1717 loop 'ls' --until-contains 'my_big_file.bin';
1718 ./upload_big_file my_big_file.bin
1719 # inotifywait is a better tool to detect file system changes.
1720 # It can even make sure the file is complete
1721 # so you are not uploading an incomplete file
1722 $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
1723 grep my_big_file.bin
1724
1725 $ ls | loop 'cp $ITEM $ITEM.bak'
1726 $ ls | parallel cp {} {}.bak
1727
1728 $ loop './do_thing.sh' --every 15s --until-success --num 5
1729 $ parallel --retries 5 --delay 15s ::: ./do_thing.sh
1730
1731 https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
1732
1733 DIFFERENCES BETWEEN lorikeet AND GNU Parallel
1734 lorikeet can run jobs in parallel. It does this based on a dependency
1735 graph described in a file, so this is similar to make.
1736
1737 https://github.com/cetra3/lorikeet (Last checked: 2018-10)
1738
1739 DIFFERENCES BETWEEN spp AND GNU Parallel
1740 spp can run jobs in parallel. spp does not use a command template to
1741 generate the jobs, but requires jobs to be in a file. Output from the
1742 jobs mix.
1743
1744 https://github.com/john01dav/spp (Last checked: 2019-01)
1745
1746 DIFFERENCES BETWEEN paral AND GNU Parallel
1747 paral prints a lot of status information and stores the output from the
1748 commands run into files. This means it cannot be used in the middle of
1749 a pipe like this:
1750
1751 paral "echo this" "echo does not" "echo work" | wc
1752
1753 Instead it puts the output into files named like out_#_command.out.log.
1754 To get a very similar behaviour with GNU parallel use --results
1755 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta
1756
1757 paral only takes arguments on the command line and each argument should
1758 be a full command. Thus it does not use command templates.
1759
1760 This limits how many jobs it can run in total, because they all need to
1761 fit on a single command line.
1762
1763 paral has no support for running jobs remotely.
1764
1765 EXAMPLES FROM README.markdown
1766
1767 The examples from README.markdown and the corresponding command run
1768 with GNU parallel (--results
1769 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta is omitted from the
1770 GNU parallel command):
1771
1772 1$ paral "command 1" "command 2 --flag" "command arg1 arg2"
1773 1$ parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
1774
1775 2$ paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
1776 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1777 2$ parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
1778 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1779 # Or shorter:
1780 parallel "sleep {} && echo c{}" ::: {1..5}
1781
1782 3$ paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1783 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1784 3$ parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
1785 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1786 # Or shorter:
1787 parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
1788
1789 4$ paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1790 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1791 4$ parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
1792
1793 5$ paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1794 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1795 5$ parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
1796
1797 6$ paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1798 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1799 6$ parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
1800
1801 7$ paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1802 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1803 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1804 echo g && sleep 0.5 && echo h"
1805 7$ parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1806 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1807 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1808 echo g && sleep 0.5 && echo h"
1809
1810 https://github.com/amattn/paral (Last checked: 2019-01)
1811
1812 DIFFERENCES BETWEEN concurr AND GNU Parallel
1813 concurr is built to run jobs in parallel using a client/server model.
1814
1815 EXAMPLES FROM README.md
1816
1817 The examples from README.md:
1818
1819 1$ concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
1820 1$ parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
1821
1822 2$ concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
1823 2$ parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
1824
1825 3$ concurr 'echo {}' < input_file
1826 3$ parallel 'echo {}' < input_file
1827
1828 4$ cat file | concurr 'echo {}'
1829 4$ cat file | parallel 'echo {}'
1830
1831 concurr deals badly with empty input files and with output larger
1832 than 64 KB.
1833
1834 https://github.com/mmstick/concurr (Last checked: 2019-01)
1835
1836 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
1837 lesser-parallel is the inspiration for parallel --embed. Both lesser-
1838 parallel and parallel --embed define bash functions that can be
1839 included as part of a bash script to run jobs in parallel.
1840
1841 lesser-parallel implements a few of the replacement strings, but hardly
1842 any options, whereas parallel --embed gives you the full GNU parallel
1843 experience.
1844
1845 https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
1846
1847 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
1848 npm-parallel can run npm tasks in parallel.
1849
1850 There are no examples and very little documentation, so it is hard to
1851 compare to GNU parallel.
1852
1853 https://github.com/spion/npm-parallel (Last checked: 2019-01)
1854
1855 DIFFERENCES BETWEEN machma AND GNU Parallel
1856 machma runs tasks in parallel. It gives time stamped output. It buffers
1857 in RAM.
1858
1859 EXAMPLES FROM README.md
1860
1861 The examples from README.md:
1862
1863 1$ # Put shorthand for timestamp in config for the examples
1864 echo '--rpl '\
1865 \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
1866 > ~/.parallel/machma
1867 echo '--line-buffer --tagstring "{#} {time} {}"' \
1868 >> ~/.parallel/machma
1869
1870 2$ find . -iname '*.jpg' |
1871 machma -- mogrify -resize 1200x1200 -filter Lanczos {}
1872 find . -iname '*.jpg' |
1873 parallel --bar -Jmachma mogrify -resize 1200x1200 \
1874 -filter Lanczos {}
1875
1876 3$ cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
1877 3$ cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}
1878
1879 4$ cat /tmp/ips |
1880 machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
1881 4$ cat /tmp/ips |
1882 parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'
1883
1884 5$ find . -iname '*.jpg' |
1885 machma --timeout 5s -- mogrify -resize 1200x1200 \
1886 -filter Lanczos {}
1887 5$ find . -iname '*.jpg' |
1888 parallel --timeout 5s --bar mogrify -resize 1200x1200 \
1889 -filter Lanczos {}
1890
1891 6$ find . -iname '*.jpg' -print0 |
1892 machma --null -- mogrify -resize 1200x1200 -filter Lanczos {}
1893 6$ find . -iname '*.jpg' -print0 |
1894 parallel --null --bar mogrify -resize 1200x1200 \
1895 -filter Lanczos {}
1896
1897 https://github.com/fd0/machma (Last checked: 2019-06)
1898
1899 DIFFERENCES BETWEEN interlace AND GNU Parallel
1900 Summary (see legend above):
1901
1902 - I2 I3 I4 - - -
1903 M1 - M3 - - M6
1904 - O2 O3 - - - - x x
1905 E1 E2 - - - - -
1906 - - - - - - - - -
1907 - -
1908
1909 interlace is built for network analysis to run network tools in
1910 parallel.
1911
1912 interlace does not buffer output, so output from different jobs mixes.
1913
1914 The overhead for each target is O(n*n), so with 1000 targets it becomes
1915 very slow with an overhead in the order of 500ms/target.
1916
1917 EXAMPLES FROM interlace's WEBSITE
1918
1919 Using prips most of the examples from
1920 https://github.com/codingo/Interlace can be run with GNU parallel:
1921
1922 Blocker
1923
1924 commands.txt:
1925 mkdir -p _output_/_target_/scans/
1926 _blocker_
1927 nmap _target_ -oA _output_/_target_/scans/_target_-nmap
1928 interlace -tL ./targets.txt -cL commands.txt -o $output
1929
1930 parallel -a targets.txt \
1931 mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap
1932
1933 Blocks
1934
1935 commands.txt:
1936 _block:nmap_
1937 mkdir -p _target_/output/scans/
1938 nmap _target_ -oN _target_/output/scans/_target_-nmap
1939 _block:nmap_
1940 nikto --host _target_
1941 interlace -tL ./targets.txt -cL commands.txt
1942
1943 _nmap() {
1944 mkdir -p $1/output/scans/
1945 nmap $1 -oN $1/output/scans/$1-nmap
1946 }
1947 export -f _nmap
1948 parallel ::: _nmap "nikto --host" :::: targets.txt
1949
1950 Run Nikto Over Multiple Sites
1951
1952 interlace -tL ./targets.txt -threads 5 \
1953 -c "nikto --host _target_ > ./_target_-nikto.txt" -v
1954
1955 parallel -a targets.txt -P5 nikto --host {} \> ./{}_-nikto.txt
1956
1957 Run Nikto Over Multiple Sites and Ports
1958
1959 interlace -tL ./targets.txt -threads 5 -c \
1960 "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
1961 -p 80,443 -v
1962
1963 parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
1964 :::: targets.txt ::: 80 443
1965
1966 Run a List of Commands against Target Hosts
1967
1968 commands.txt:
1969 nikto --host _target_:_port_ > _output_/_target_-nikto.txt
1970 sslscan _target_:_port_ > _output_/_target_-sslscan.txt
1971 testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
1972 interlace -t example.com -o ~/Engagements/example/ \
1973 -cL ./commands.txt -p 80,443
1974
1975 parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
1976 ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443
1977
1978 CIDR notation with an application that doesn't support it
1979
1980 interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
1981 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1982
1983 prips 192.168.12.0/24 |
1984 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1985
1986 Glob notation with an application that doesn't support it
1987
1988 interlace -t 192.168.12.* -c "vhostscan _target_ \
1989 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1990
1991 # Glob is not supported in prips
1992 prips 192.168.12.0/24 |
1993 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1994
1995 Dash (-) notation with an application that doesn't support it
1996
1997 interlace -t 192.168.12.1-15 -c \
1998 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
1999 -o ~/scans/ -threads 50
2000
2001 # Dash notation is not supported in prips
2002 prips 192.168.12.1 192.168.12.15 |
2003 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2004
2005 Threading Support for an application that doesn't support it
2006
2007 interlace -tL ./target-list.txt -c \
2008 "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
2009 -o ~/scans/ -threads 50
2010
2011 cat ./target-list.txt |
2012 parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
2013
2014 alternatively
2015
2016 ./vhosts-commands.txt:
2017 vhostscan -t $target -oN _output_/_target_-vhosts.txt
2018 interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
2019 -threads 50 -o ~/scans
2020
2021 ./vhosts-commands.txt:
2022 vhostscan -t "$1" -oN "$2"
2023 parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
2024 :::: ./target-list.txt
2025
2026 Exclusions
2027
2028 interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
2029 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
2030 -o ~/scans/ -threads 50
2031
2032 prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
2033 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2034
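The grep -xv -Ff idiom above is a general way to subtract one list
from another and works for any exclusion list. A minimal sketch with
printf standing in for prips (which may not be installed):

```shell
# Stand-ins for the prips output
printf '%s\n' 192.168.12.1 192.168.12.2 192.168.12.3 > /tmp/all.txt
printf '%s\n' 192.168.12.2 > /tmp/skip.txt

# -x: match whole lines, -v: invert, -F: fixed strings,
# -f: read patterns from file.
# Prints every address in all.txt that is not in skip.txt.
grep -xv -Ff /tmp/skip.txt /tmp/all.txt
```
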
2035 Run Nikto Using Multiple Proxies
2036
2037 interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
2038 "nikto --host _target_:_port_ -useproxy _proxy_ > \
2039 ./_target_-_port_-nikto.txt" -p 80,443 -v
2040
2041 parallel -j5 \
2042 "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
2043 :::: ./targets.txt ::: 80 443 :::: ./proxies.txt
2044
2045 https://github.com/codingo/Interlace (Last checked: 2019-09)
2046
2047 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel
2048 I have been unable to get the code to run at all. It seems unfinished.
2049
2050 https://github.com/otonvm/Parallel (Last checked: 2019-02)
2051
2052 DIFFERENCES BETWEEN k-bx par AND GNU Parallel
2053 par requires Haskell to work. This limits the number of platforms this
2054 can work on.
2055
2056 par does line buffering in memory. The memory usage is 3x the longest
2057 line (compared to 1x for parallel --lb). Commands must be given as
2058 arguments. There is no template.
2059
2060 These are the examples from https://github.com/k-bx/par with the
2061 corresponding GNU parallel command.
2062
2063 par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2064 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2065 parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2066 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2067
2068 par "echo foo; sleep 1; foofoo" \
2069 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2070 parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
2071 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2072
2073 par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
2074 parallel --lb --colsep , --tagstring {1} {2} \
2075 ::: "[fooechoer],echo foo" "[bar],echo bar"
2076
2077 par --succeed "foo" "bar" && echo 'wow'
2078 parallel "foo" "bar"; true && echo 'wow'
2079
2080 https://github.com/k-bx/par (Last checked: 2019-02)
2081
2082 DIFFERENCES BETWEEN parallelshell AND GNU Parallel
2083 parallelshell does not allow for composed commands:
2084
2085 # This does not work
2086 parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'
2087
2088 Instead you have to wrap that in a shell:
2089
2090 parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2091
2092 It buffers output in RAM. All commands must be given on the command
2093 line and all commands are started in parallel at the same time. This
2094 will cause the system to freeze if there are so many jobs that there is
2095 not enough memory to run them all at the same time.
2096
2097 https://github.com/keithamus/parallelshell (Last checked: 2019-02)
2098
2099 https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
2100
2101 DIFFERENCES BETWEEN shell-executor AND GNU Parallel
2102 shell-executor does not allow for composed commands:
2103
2104 # This does not work
2105 sx 'echo foo;echo bar' 'echo baz;echo quuz'
2106
2107 Instead you have to wrap that in a shell:
2108
2109 sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2110
2111 It buffers output in RAM. All commands must be given on the command
2112 line and all commands are started in parallel at the same time. This
2113 will cause the system to freeze if there are so many jobs that there is
2114 not enough memory to run them all at the same time.
2115
2116 https://github.com/royriojas/shell-executor (Last checked: 2019-02)
2117
2118 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel
2119 par buffers in memory to avoid mixing of jobs. It takes 1s per 1
2120 million output lines.
2121
2122 par needs to have all commands before starting the first job. The jobs
2123 are read from stdin (standard input) so any quoting will have to be
2124 done by the user.
2125
Stdout (standard output) is prepended with o:. Stderr (standard error)
is sent to stdout (standard output) and prepended with e:.
2128
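The o:/e: tagging can be approximated in plain shell. A sketch, not
par's implementation; with two separate sed processes the relative
order of stdout and stderr lines is not guaranteed:

```shell
# Tag stdout lines with "o:" and stderr lines with "e:",
# merging both onto stdout (as par does).
# fd3 is a copy of the caller's stdout; the command's stdout is
# prefixed and sent there, while its stderr is piped through the
# second sed before also landing on fd3.
prefix_streams() {
    { "$@" | sed 's/^/o:/' >&3; } 2>&1 | sed 's/^/e:/' >&3
} 3>&1

prefix_streams sh -c 'echo hello; echo oops >&2'
# one line is "o:hello", the other "e:oops" (order may vary)
```
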
2129 For short jobs with little output par is 20% faster than GNU parallel
2130 and 60% slower than xargs.
2131
2132 https://github.com/UnixJunkie/PAR
2133
2134 https://savannah.nongnu.org/projects/par (Last checked: 2019-02)
2135
2136 DIFFERENCES BETWEEN fd AND GNU Parallel
2137 fd does not support composed commands, so commands must be wrapped in
2138 sh -c.
2139
2140 It buffers output in RAM.
2141
2142 It only takes file names from the filesystem as input (similar to
2143 find).
2144
2145 https://github.com/sharkdp/fd (Last checked: 2019-02)
2146
2147 DIFFERENCES BETWEEN lateral AND GNU Parallel
lateral is very similar to sem: It takes a single command and runs it
in the background. The design means that output from jobs running in
parallel may mix. If it dies unexpectedly it leaves a socket in
~/.lateral/socket.PID.
2152
2153 lateral deals badly with too long command lines. This makes the lateral
2154 server crash:
2155
2156 lateral run echo `seq 100000| head -c 1000k`
2157
2158 Any options will be read by lateral so this does not work (lateral
2159 interprets the -l):
2160
2161 lateral run ls -l
2162
2163 Composed commands do not work:
2164
2165 lateral run pwd ';' ls
2166
2167 Functions do not work:
2168
2169 myfunc() { echo a; }
2170 export -f myfunc
2171 lateral run myfunc
2172
2173 Running emacs in the terminal causes the parent shell to die:
2174
2175 echo '#!/bin/bash' > mycmd
2176 echo emacs -nw >> mycmd
2177 chmod +x mycmd
2178 lateral start
2179 lateral run ./mycmd
2180
2181 Here are the examples from https://github.com/akramer/lateral with the
2182 corresponding GNU sem and GNU parallel commands:
2183
2184 1$ lateral start
2185 for i in $(cat /tmp/names); do
2186 lateral run -- some_command $i
2187 done
2188 lateral wait
2189
2190 1$ for i in $(cat /tmp/names); do
2191 sem some_command $i
2192 done
2193 sem --wait
2194
2195 1$ parallel some_command :::: /tmp/names
2196
2197 2$ lateral start
2198 for i in $(seq 1 100); do
2199 lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
2200 done
2201 lateral wait
2202
2203 2$ for i in $(seq 1 100); do
2204 sem my_slow_command < workfile$i > /tmp/logfile$i
2205 done
2206 sem --wait
2207
2208 2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
2209 ::: {1..100}
2210
2211 3$ lateral start -p 0 # yup, it will just queue tasks
2212 for i in $(seq 1 100); do
2213 lateral run -- command_still_outputs_but_wont_spam inputfile$i
2214 done
2215 # command output spam can commence
2216 lateral config -p 10; lateral wait
2217
2218 3$ for i in $(seq 1 100); do
2219 echo "command inputfile$i" >> joblist
2220 done
2221 parallel -j 10 :::: joblist
2222
2223 3$ echo 1 > /tmp/njobs
2224 parallel -j /tmp/njobs command inputfile{} \
2225 ::: {1..100} &
2226 echo 10 >/tmp/njobs
2227 wait
2228
2229 https://github.com/akramer/lateral (Last checked: 2019-03)
2230
2231 DIFFERENCES BETWEEN with-this AND GNU Parallel
2232 The examples from https://github.com/amritb/with-this.git and the
2233 corresponding GNU parallel command:
2234
2235 with -v "$(cat myurls.txt)" "curl -L this"
2236 parallel curl -L ::: myurls.txt
2237
2238 with -v "$(cat myregions.txt)" \
2239 "aws --region=this ec2 describe-instance-status"
2240 parallel aws --region={} ec2 describe-instance-status \
2241 :::: myregions.txt
2242
2243 with -v "$(ls)" "kubectl --kubeconfig=this get pods"
2244 ls | parallel kubectl --kubeconfig={} get pods
2245
2246 with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
2247 ls | grep config | parallel kubectl --kubeconfig={} get pods
2248
2249 with -v "$(echo {1..10})" "echo 123"
2250 parallel -N0 echo 123 ::: {1..10}
2251
Stderr is merged with stdout. with-this buffers in RAM, using 3x the
size of the output, so output cannot be larger than 1/3rd of the
available RAM. The input values cannot contain spaces. Composed
commands do not work.
2256
2257 with-this gives some additional information, so the output has to be
2258 cleaned before piping it to the next command.
2259
2260 https://github.com/amritb/with-this.git (Last checked: 2019-03)
2261
2262 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel
2263 Summary (see legend above):
2264
2265 - - - I4 - - I7
2266 - - M3 - - M6
2267 - O2 O3 - O5 O6 - x x
2268 E1 - - - - - E7
2269 - x x x x x x x x
2270 - -
2271
2272 EXAMPLES FROM Tollef's parallel MANUAL
2273
2274 Tollef parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3
2275
2276 GNU parallel "echo hi; sleep 2; echo bye" ::: 1 2 3
2277
2278 Tollef parallel -j 3 ufraw -o processed -- *.NEF
2279
2280 GNU parallel -j 3 ufraw -o processed ::: *.NEF
2281
2282 Tollef parallel -j 3 -- ls df "echo hi"
2283
2284 GNU parallel -j 3 ::: ls df "echo hi"
2285
2286 (Last checked: 2019-08)
2287
2288 DIFFERENCES BETWEEN rargs AND GNU Parallel
2289 Summary (see legend above):
2290
2291 I1 - - - - - I7
2292 - - M3 M4 - -
2293 - O2 O3 - O5 O6 - O8 -
2294 E1 - - E4 - - -
2295 - - - - - - - - -
2296 - -
2297
2298 rargs has elegant ways of doing named regexp capture and field ranges.
2299
With GNU parallel you can use --rpl to get functionality similar to
regexp capture, and use join and @arg to get the field ranges, but
the syntax is longer. This:
2303
2304 --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]'
2305
2306 would make it possible to use:
2307
2308 {1r3..6}
2309
2310 for field 3..6.
2311
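For a fixed field range on a single delimited input, plain cut makes
the same selection that the replacement string generalizes:

```shell
# Fields 3..6 of a colon-separated record,
# as {1r3..6} would select them
echo a:b:c:d:e:f:g | cut -d: -f3-6
# c:d:e:f
```
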
2312 For full support of {n..m:s} including negative numbers use a dynamic
2313 replacement string like this:
2314
2315 PARALLEL=--rpl\ \''{r((-?\d+)?)\.\.((-?\d+)?)((:([^}]*))?)}
2316 $a = defined $$2 ? $$2 < 0 ? 1+$#arg+$$2 : $$2 : 1;
2317 $b = defined $$4 ? $$4 < 0 ? 1+$#arg+$$4 : $$4 : $#arg+1;
2318 $s = defined $$6 ? $$7 : " ";
2319 $_ = join $s,@arg[$a..$b]'\'
2320 export PARALLEL
2321
2322 You can then do:
2323
2324 head /etc/passwd | parallel --colsep : echo ..={1r..} ..3={1r..3} \
2325 4..={1r4..} 2..4={1r2..4} 3..3={1r3..3} ..3:-={1r..3:-} \
2326 ..3:/={1r..3:/} -1={-1} -5={-5} -6={-6} -3..={1r-3..}
2327
2328 EXAMPLES FROM rargs MANUAL
2329
2330 ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}
2331 ls *.bak | parallel mv {} {.}
2332
cat download-list.csv | rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}
2334 cat download-list.csv | parallel --csv wget {1} -O {2}
2335 # or use regexps:
2336 cat download-list.csv |
parallel --rpl '{url} s/,.*//' --rpl '{filename} s/.*?,//' wget {url} -O {filename}
2338
2339 cat /etc/passwd | rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
2340 cat /etc/passwd |
2341 parallel -q --colsep : echo -e 'id: "{1}"\t name: "{5}"\t rest: "{=6 $_=join":",@arg[6..$#arg]=}"'
2342
2343 https://github.com/lotabout/rargs (Last checked: 2020-01)
2344
2345 DIFFERENCES BETWEEN threader AND GNU Parallel
2346 Summary (see legend above):
2347
2348 I1 - - - - - -
2349 M1 - M3 - - M6
2350 O1 - O3 - O5 - - N/A N/A
2351 E1 - - E4 - - -
2352 - - - - - - - - -
2353 - -
2354
Newline separates arguments, but a newline at the end of the file is
treated as an empty argument. So this runs 2 jobs:
2357
2358 echo two_jobs | threader -run 'echo "$THREADID"'
2359
2360 threader ignores stderr, so any output to stderr is lost. threader
2361 buffers in RAM, so output bigger than the machine's virtual memory will
2362 cause the machine to crash.
2363
2364 https://github.com/voodooEntity/threader (Last checked: 2020-04)
2365
2366 DIFFERENCES BETWEEN runp AND GNU Parallel
2367 Summary (see legend above):
2368
2369 I1 I2 - - - - -
2370 M1 - (M3) - - M6
2371 O1 O2 O3 - O5 O6 - N/A N/A -
2372 E1 - - - - - -
2373 - - - - - - - - -
2374 - -
2375
(M3): You can add a prefix and a postfix to the input, which means
the argument can only be inserted into the command line once.
2378
runp runs 10 jobs in parallel by default. runp blocks if the output
of a command is > 64 Kbytes. Quoting of input is needed. It adds
output to stderr (this can be prevented with -q).
2382
2383 Examples as GNU Parallel
2384
2385 base='https://images-api.nasa.gov/search'
2386 query='jupiter'
2387 desc='planet'
2388 type='image'
2389 url="$base?q=$query&description=$desc&media_type=$type"
2390
2391 # Download the images in parallel using runp
2392 curl -s $url | jq -r .collection.items[].href | \
2393 runp -p 'curl -s' | jq -r .[] | grep large | \
2394 runp -p 'curl -s -L -O'
2395
2396 time curl -s $url | jq -r .collection.items[].href | \
2397 runp -g 1 -q -p 'curl -s' | jq -r .[] | grep large | \
2398 runp -g 1 -q -p 'curl -s -L -O'
2399
2400 # Download the images in parallel
2401 curl -s $url | jq -r .collection.items[].href | \
2402 parallel curl -s | jq -r .[] | grep large | \
2403 parallel curl -s -L -O
2404
2405 time curl -s $url | jq -r .collection.items[].href | \
2406 parallel -j 1 curl -s | jq -r .[] | grep large | \
2407 parallel -j 1 curl -s -L -O
2408
2409 Run some test commands (read from file)
2410
2411 # Create a file containing commands to run in parallel.
2412 cat << EOF > /tmp/test-commands.txt
2413 sleep 5
2414 sleep 3
2415 blah # this will fail
2416 ls $PWD # PWD shell variable is used here
2417 EOF
2418
2419 # Run commands from the file.
2420 runp /tmp/test-commands.txt > /dev/null
2421
2422 parallel -a /tmp/test-commands.txt > /dev/null
2423
2424 Ping several hosts and see packet loss (read from stdin)
2425
2426 # First copy this line and press Enter
2427 runp -p 'ping -c 5 -W 2' -s '| grep loss'
2428 localhost
2429 1.1.1.1
2430 8.8.8.8
2431 # Press Enter and Ctrl-D when done entering the hosts
2432
2433 # First copy this line and press Enter
2434 parallel ping -c 5 -W 2 {} '| grep loss'
2435 localhost
2436 1.1.1.1
2437 8.8.8.8
2438 # Press Enter and Ctrl-D when done entering the hosts
2439
2440 Get directories' sizes (read from stdin)
2441
2442 echo -e "$HOME\n/etc\n/tmp" | runp -q -p 'sudo du -sh'
2443
2444 echo -e "$HOME\n/etc\n/tmp" | parallel sudo du -sh
2445 # or:
2446 parallel sudo du -sh ::: "$HOME" /etc /tmp
2447
2448 Compress files
2449
2450 find . -iname '*.txt' | runp -p 'gzip --best'
2451
2452 find . -iname '*.txt' | parallel gzip --best
2453
2454 Measure HTTP request + response time
2455
2456 export CURL="curl -w 'time_total: %{time_total}\n'"
2457 CURL="$CURL -o /dev/null -s https://golang.org/"
2458 perl -wE 'for (1..10) { say $ENV{CURL} }' |
2459 runp -q # Make 10 requests
2460
2461 perl -wE 'for (1..10) { say $ENV{CURL} }' | parallel
2462 # or:
2463 parallel -N0 "$CURL" ::: {1..10}
2464
2465 Find open TCP ports
2466
2467 cat << EOF > /tmp/host-port.txt
2468 localhost 22
2469 localhost 80
2470 localhost 81
2471 127.0.0.1 443
2472 127.0.0.1 444
2473 scanme.nmap.org 22
2474 scanme.nmap.org 23
2475 scanme.nmap.org 443
2476 EOF
2477
2478 1$ cat /tmp/host-port.txt |
2479 runp -q -p 'netcat -v -w2 -z' 2>&1 | egrep '(succeeded!|open)$'
2480
2481 # --colsep is needed to split the line
2482 1$ cat /tmp/host-port.txt |
2483 parallel --colsep ' ' netcat -v -w2 -z 2>&1 |
2484 egrep '(succeeded!|open)$'
2485 # or use uq for unquoted:
2486 1$ cat /tmp/host-port.txt |
2487 parallel netcat -v -w2 -z {=uq=} 2>&1 |
2488 egrep '(succeeded!|open)$'
2489
2490 https://github.com/jreisinger/runp (Last checked: 2020-04)
2491
2492 DIFFERENCES BETWEEN papply AND GNU Parallel
2493 Summary (see legend above):
2494
2495 - - - I4 - - -
2496 M1 - M3 - - M6
2497 - - O3 - O5 - - N/A N/A O10
2498 E1 - - E4 - - -
2499 - - - - - - - - -
2500 - -
2501
2502 papply does not print the output if the command fails:
2503
2504 $ papply 'echo %F; false' foo
2505 "echo foo; false" did not succeed
2506
2507 papply's replacement strings (%F %d %f %n %e %z) can be simulated in
2508 GNU parallel by putting this in ~/.parallel/config:
2509
2510 --rpl '%F'
2511 --rpl '%d $_=Q(::dirname($_));'
2512 --rpl '%f s:.*/::;'
2513 --rpl '%n s:.*/::;s:\.[^/.]+$::;'
2514 --rpl '%e s:.*\.:.:'
2515 --rpl '%z $_=""'
2516
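For comparison, the path manipulations behind %d, %f, %n and %e exist
as plain shell parameter expansions; a sketch of what each
replacement string computes, not papply's code:

```shell
f=/tmp/dir/photo.png
echo "${f%/*}"     # %d: directory part            -> /tmp/dir
echo "${f##*/}"    # %f: file name                 -> photo.png
b=${f##*/}
echo "${b%.*}"     # %n: file name sans extension  -> photo
echo ".${f##*.}"   # %e: extension                 -> .png
```
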
2517 papply buffers in RAM, and uses twice the amount of output. So output
2518 of 5 GB takes 10 GB RAM.
2519
2520 The buffering is very CPU intensive: Buffering a line of 5 GB takes 40
2521 seconds (compared to 10 seconds with GNU parallel).
2522
2523 Examples as GNU Parallel
2524
2525 1$ papply gzip *.txt
2526
2527 1$ parallel gzip ::: *.txt
2528
2529 2$ papply "convert %F %n.jpg" *.png
2530
2531 2$ parallel convert {} {.}.jpg ::: *.png
2532
2533 https://pypi.org/project/papply/ (Last checked: 2020-04)
2534
2535 DIFFERENCES BETWEEN async AND GNU Parallel
2536 Summary (see legend above):
2537
2538 - - - I4 - - I7
2539 - - - - - M6
2540 - O2 O3 - O5 O6 - N/A N/A O10
2541 E1 - - E4 - E6 -
2542 - - - - - - - - -
2543 S1 S2
2544
async is very similar to GNU parallel's --semaphore mode (aka sem).
async requires the user to start a server process.
2547
The input is quoted (as with -q), so you need bash -c "...;..." to
run composed commands.
2550
2551 Examples as GNU Parallel
2552
2553 1$ S="/tmp/example_socket"
2554
2555 1$ ID=myid
2556
2557 2$ async -s="$S" server --start
2558
2559 2$ # GNU Parallel does not need a server to run
2560
2561 3$ for i in {1..20}; do
2562 # prints command output to stdout
2563 async -s="$S" cmd -- bash -c "sleep 1 && echo test $i"
2564 done
2565
2566 3$ for i in {1..20}; do
2567 # prints command output to stdout
2568 sem --id "$ID" -j100% "sleep 1 && echo test $i"
2569 # GNU Parallel will only print job when it is done
2570 # If you need output from different jobs to mix
2571 # use -u or --line-buffer
2572 sem --id "$ID" -j100% --line-buffer "sleep 1 && echo test $i"
2573 done
2574
2575 4$ # wait until all commands are finished
2576 async -s="$S" wait
2577
2578 4$ sem --id "$ID" --wait
2579
2580 5$ # configure the server to run four commands in parallel
2581 async -s="$S" server -j4
2582
2583 5$ export PARALLEL=-j4
2584
2585 6$ mkdir "/tmp/ex_dir"
2586 for i in {21..40}; do
2587 # redirects command output to /tmp/ex_dir/file*
2588 async -s="$S" cmd -o "/tmp/ex_dir/file$i" -- \
2589 bash -c "sleep 1 && echo test $i"
2590 done
2591
2592 6$ mkdir "/tmp/ex_dir"
2593 for i in {21..40}; do
2594 # redirects command output to /tmp/ex_dir/file*
2595 sem --id "$ID" --result '/tmp/my-ex/file-{=$_=""=}'"$i" \
2596 "sleep 1 && echo test $i"
2597 done
2598
2599 7$ sem --id "$ID" --wait
2600
2601 7$ async -s="$S" wait
2602
2603 8$ # stops server
2604 async -s="$S" server --stop
2605
2606 8$ # GNU Parallel does not need to stop a server
2607
2608 https://github.com/ctbur/async/ (Last checked: 2020-11)
2609
2610 DIFFERENCES BETWEEN pardi AND GNU Parallel
2611 Summary (see legend above):
2612
2613 I1 I2 - - - - I7
2614 M1 - - - - M6
2615 O1 O2 O3 O4 O5 - O7 - - O10
2616 E1 - - E4 - - -
2617 - - - - - - - - -
2618 - -
2619
pardi is very similar to parallel --pipe --cat: It reads blocks of
data, not arguments, so it cannot insert an argument in the command
line. It puts each block into a temporary file, and this file name
(%IN) can be put in the command line. You can only use %IN once.
2624
2625 It can also run full command lines in parallel (like: cat file |
2626 parallel).
2627
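The temporary-file technique can be sketched in plain shell: write
the block to a file, substitute the file name into the command, run
it, and clean up. run_on_block and the IN marker are made-up names
for illustration, not pardi's code:

```shell
run_on_block() {
    # $1 = command template where the word IN marks the temp file
    tmp=$(mktemp) || return 1
    cat > "$tmp"                         # read one block from stdin
    cmd=$(printf '%s' "$1" | sed "s|IN|$tmp|")
    sh -c "$cmd"                         # run command on the file
    rm -f "$tmp"                         # clean up
}

printf 'line1\nline2\n' | run_on_block 'wc -l < IN'
# -> 2
```
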
2628 EXAMPLES FROM pardi test.sh
2629
2630 1$ time pardi -v -c 100 -i data/decoys.smi -ie .smi -oe .smi \
2631 -o data/decoys_std_pardi.smi \
2632 -w '(standardiser -i %IN -o %OUT 2>&1) > /dev/null'
2633
2634 1$ cat data/decoys.smi |
2635 time parallel -N 100 --pipe --cat \
2636 '(standardiser -i {} -o {#} 2>&1) > /dev/null; cat {#}; rm {#}' \
2637 > data/decoys_std_pardi.smi
2638
2639 2$ pardi -n 1 -i data/test_in.types -o data/test_out.types \
2640 -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2641
2642 2$ cat data/test_in.types | parallel -n 1 -k --pipe --cat \
2643 --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2644
2645 3$ pardi -c 6 -i data/test_in.types -o data/test_out.types \
2646 -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2647
2648 3$ cat data/test_in.types | parallel -n 6 -k --pipe --cat \
2649 --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2650
2651 4$ pardi -i data/decoys.mol2 -o data/still_decoys.mol2 \
2652 -d 's:@<TRIPOS>MOLECULE' -w 'cp %IN %OUT'
2653
2654 4$ cat data/decoys.mol2 |
2655 parallel -n 1 --pipe --cat --recstart '@<TRIPOS>MOLECULE' \
2656 'cp {} {#}; cat {#}; rm {#}' > data/still_decoys.mol2
2657
2658 5$ pardi -i data/decoys.mol2 -o data/decoys2.mol2 \
2659 -d b:10000 -w 'cp %IN %OUT' --preserve
2660
2661 5$ cat data/decoys.mol2 |
2662 parallel -k --pipe --block 10k --recend '' --cat \
2663 'cat {} > {#}; cat {#}; rm {#}' > data/decoys2.mol2
2664
2665 https://github.com/UnixJunkie/pardi (Last checked: 2021-01)
2666
2667 DIFFERENCES BETWEEN bthread AND GNU Parallel
2668 Summary (see legend above):
2669
2670 - - - I4 - - -
2671 - - - - - M6
2672 O1 - O3 - - - O7 O8 - -
2673 E1 - - - - - -
2674 - - - - - - - - -
2675 - -
2676
bthread takes around 1 second per MB of output. The maximal output
line length is 1073741759 bytes.
2679
2680 You cannot quote space in the command, so you cannot run composed
2681 commands like sh -c "echo a; echo b".
2682
2683 https://gitlab.com/netikras/bthread (Last checked: 2021-01)
2684
2685 DIFFERENCES BETWEEN simple_gpu_scheduler AND GNU Parallel
2686 Summary (see legend above):
2687
2688 I1 - - - - - I7
2689 M1 - - - - M6
2690 - O2 O3 - - O6 - x x O10
2691 E1 - - - - - -
2692 - - - - - - - - -
2693 - -
2694
2695 EXAMPLES FROM simple_gpu_scheduler MANUAL
2696
2697 1$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
2698
2699 1$ parallel -j3 --shuf \
2700 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt
2701
2702 2$ simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
2703 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
2704 simple_gpu_scheduler --gpus 0,1,2
2705
2706 2$ parallel --header : --shuf -j3 -v \
2707 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' \
2708 python3 train_dnn.py --lr {lr} --batch_size {bs} \
2709 ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2710
2711 3$ simple_hypersearch \
2712 "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
2713 --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
2714 simple_gpu_scheduler --gpus 0,1,2
2715
2716 3$ parallel --header : --shuf \
2717 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' \
2718 python3 train_dnn.py --lr {lr} --batch_size {bs} \
2719 ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2720
2721 4$ touch gpu.queue
2722 tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
2723 echo "my_command_with | and stuff > logfile" >> gpu.queue
2724
2725 4$ touch gpu.queue
2726 tail -f -n 0 gpu.queue |
2727 parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
2728 # Needed to fill job slots once
2729 seq 3 | parallel echo true >> gpu.queue
2730 # Add jobs
2731 echo "my_command_with | and stuff > logfile" >> gpu.queue
2732 # Needed to flush output from completed jobs
2733 seq 3 | parallel echo true >> gpu.queue
2734
2735 https://github.com/ExpectationMax/simple_gpu_scheduler (Last checked:
2736 2021-01)
2737
2738 DIFFERENCES BETWEEN parasweep AND GNU Parallel
2739 parasweep is a Python module for facilitating parallel parameter
2740 sweeps.
2741
2742 A parasweep job will normally take a text file as input. The text file
2743 contains arguments for the job. Some of these arguments will be fixed
2744 and some of them will be changed by parasweep.
2745
2746 It does this by having a template file such as template.txt:
2747
2748 Xval: {x}
2749 Yval: {y}
2750 FixedValue: 9
2751 # x with 2 decimals
2752 DecimalX: {x:.2f}
2753 TenX: ${x*10}
2754 RandomVal: {r}
2755
2756 and from this template it generates the file to be used by the job by
2757 replacing the replacement strings.
2758
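At its core, filling such a template is string substitution. A
minimal sed sketch for the plain {x} and {y} placeholders (the
formatted variants like {x:.2f} need parasweep or GNU parallel
--tmpl):

```shell
# A cut-down template with two placeholders
cat > /tmp/template.txt << 'EOF'
Xval: {x}
Yval: {y}
FixedValue: 9
EOF

# Substitute one parameter combination into the template
sed -e 's/{x}/1.5/' -e 's/{y}/2.5/' /tmp/template.txt
# Xval: 1.5
# Yval: 2.5
# FixedValue: 9
```
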
Being a Python module, parasweep integrates more tightly with Python
than GNU parallel does. You get the parameters directly in a Python
data structure. With GNU parallel you can use the JSON or CSV output
format to get something similar, but you would have to read the
output.
2763
2764 parasweep has a filtering method to ignore parameter combinations you
2765 do not need.
2766
2767 Instead of calling the jobs directly, parasweep can use Python's
2768 Distributed Resource Management Application API to make jobs run with
2769 different cluster software.
2770
2771 GNU parallel --tmpl supports templates with replacement strings. Such
2772 as:
2773
2774 Xval: {x}
2775 Yval: {y}
2776 FixedValue: 9
2777 # x with 2 decimals
2778 DecimalX: {=x $_=sprintf("%.2f",$_) =}
2779 TenX: {=x $_=$_*10 =}
2780 RandomVal: {=1 $_=rand() =}
2781
2782 that can be used like:
2783
2784 parallel --header : --tmpl my.tmpl={#}.t myprog {#}.t \
2785 ::: x 1 2 3 ::: y 1 2 3
2786
2787 Filtering is supported as:
2788
2789 parallel --filter '{1} > {2}' echo ::: 1 2 3 ::: 1 2 3
2790
2791 https://github.com/eviatarbach/parasweep (Last checked: 2021-01)
2792
2793 DIFFERENCES BETWEEN parallel-bash AND GNU Parallel
2794 Summary (see legend above):
2795
2796 I1 I2 - - - - -
2797 - - M3 - - M6
2798 - O2 O3 - O5 O6 - O8 x O10
2799 E1 - - - - - -
2800 - - - - - - - - -
2801 - -
2802
parallel-bash is written in pure bash. It is really fast (an overhead
of ~0.05 ms/job compared to GNU parallel's 3-10 ms/job). So if your
jobs are extremely short-lived, and you can live with the quite
limited command syntax, this may be useful.
2807
It works by making a queue for each process. The jobs are then
distributed to the queues in a round robin fashion, and finally the
queues are started in parallel. This works fine if you are lucky, but
if not, all the long jobs may end up in the same queue, so you may
see:
2812
2813 $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
2814 time parallel -P4 sleep {}
2815 (7 seconds)
2816 $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
2817 time ./parallel-bash.bash -p 4 -c sleep {}
2818 (12 seconds)
2819
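The imbalance is easy to reproduce: dealing the 12 jobs above
round-robin into 4 queues puts all three 4-second jobs in the same
queue, which alone needs 12 seconds of work, while greedy scheduling
finishes in about 7. A sketch of the distribution (assumed behavior,
not parallel-bash's code):

```shell
# Deal the job runtimes round-robin into 4 queues and
# print each queue's total runtime.
printf '%s\n' 1 1 1 4 1 1 1 4 1 1 1 4 |
    awk '{ sum[NR % 4] += $1 }
         END { for (q = 0; q < 4; q++) print sum[q] }'
# One queue gets 4+4+4 = 12 seconds; the others get 3 each.
```
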
Because it uses bash lists, the total number of jobs is limited to
167000..265000 depending on your environment. You get a segmentation
fault when you reach the limit.
2823
2824 Ctrl-C does not stop spawning new jobs. Ctrl-Z does not suspend running
2825 jobs.
2826
2827 EXAMPLES FROM parallel-bash
2828
2829 1$ some_input | parallel-bash -p 5 -c echo
2830
2831 1$ some_input | parallel -j 5 echo
2832
2833 2$ parallel-bash -p 5 -c echo < some_file
2834
2835 2$ parallel -j 5 echo < some_file
2836
2837 3$ parallel-bash -p 5 -c echo <<< 'some string'
2838
3$ parallel -j 5 echo <<< 'some string'
2840
2841 4$ something | parallel-bash -p 5 -c echo {} {}
2842
2843 4$ something | parallel -j 5 echo {} {}
2844
2845 https://reposhub.com/python/command-line-tools/Akianonymus-parallel-bash.html
2846 (Last checked: 2021-06)
2847
2848 DIFFERENCES BETWEEN bash-concurrent AND GNU Parallel
2849 bash-concurrent is more an alternative to make than to GNU parallel.
2850 Its input is very similar to a Makefile, where jobs depend on other
2851 jobs.
2852
It has a nice progress indicator where you can see which jobs
completed successfully, which jobs are currently running, which jobs
failed, and which jobs were skipped because a job they depend on
failed. The indicator does not deal well with resizing the window.
2857
2858 Output is cached in tempfiles on disk, but is only shown if there is an
2859 error, so it is not meant to be part of a UNIX pipeline. If bash-
2860 concurrent crashes these tempfiles are not removed.
2861
It uses an O(n*n) algorithm, so starting 1000 independent jobs takes
22 seconds.
2864
2865 https://github.com/themattrix/bash-concurrent (Last checked: 2021-02)
2866
2867 DIFFERENCES BETWEEN spawntool AND GNU Parallel
2868 Summary (see legend above):
2869
2870 I1 - - - - - -
2871 M1 - - - - M6
2872 - O2 O3 - O5 O6 - x x O10
2873 E1 - - - - - -
2874 - - - - - - - - -
2875 - -
2876
2877 spawn reads a full command line from stdin which it executes in
2878 parallel.
2879
2880 http://code.google.com/p/spawntool/ (Last checked: 2021-07)
2881
2882 DIFFERENCES BETWEEN go-pssh AND GNU Parallel
2883 Summary (see legend above):
2884
2885 - - - - - - -
2886 M1 - - - - -
2887 O1 - - - - - - x x O10
2888 E1 - - - - - -
2889 R1 R2 - - - R6 - - -
2890 - -
2891
2892 go-pssh does ssh in parallel to multiple machines. It runs the same
2893 command on multiple machines similar to --nonall.
2894
Hosts must be given as IP addresses, not as hostnames.
2896
2897 Output is sent to stdout (standard output) if the command succeeds,
2898 and to stderr (standard error) if it fails.
2899
2900 EXAMPLES FROM go-pssh
2901
2902 1$ go-pssh -l <ip>,<ip> -u <user> -p <port> -P <passwd> -c "<command>"
2903
2904 1$ parallel -S 'sshpass -p <passwd> ssh -p <port> <user>@<ip>' \
2905 --nonall "<command>"
2906
2907 2$ go-pssh scp -f host.txt -u <user> -p <port> -P <password> \
2908 -s /local/file_or_directory -d /remote/directory
2909
2910 2$ parallel --nonall --slf host.txt \
2911 --basefile /local/file_or_directory/./ --wd /remote/directory
2912 --ssh 'sshpass -p <password> ssh -p <port> -l <user>' true
2913
2914 3$ go-pssh scp -l <ip>,<ip> -u <user> -p <port> -P <password> \
2915 -s /local/file_or_directory -d /remote/directory
2916
2917 3$ parallel --nonall -S <ip>,<ip> \
2918 --basefile /local/file_or_directory/./ --wd /remote/directory
2919 --ssh 'sshpass -p <password> ssh -p <port> -l <user>' true
2920
2921 https://github.com/xuchenCN/go-pssh (Last checked: 2021-07)
2922
2923 DIFFERENCES BETWEEN go-parallel AND GNU Parallel
2924 Summary (see legend above):
2925
2926 I1 I2 - - - - I7
2927 - - M3 - - M6
2928 - O2 O3 - O5 - - x x - O10
2929 E1 - - E4 - - -
2930 - - - - - - - - -
2931 - -
2932
2933 go-parallel uses Go templates as replacement strings. They are quite
2934 similar to the {= perl expr =} replacement string.
2935
2936 EXAMPLES FROM go-parallel
2937
2938 1$ go-parallel -a ./files.txt -t 'cp {{.Input}} {{.Input | dirname | dirname}}'
2939
2940 1$ parallel -a ./files.txt cp {} '{= $_=::dirname(::dirname($_)) =}'
2941
2942 2$ go-parallel -a ./files.txt -t 'mkdir -p {{.Input}} {{noExt .Input}}'
2943
2944 2$ parallel -a ./files.txt echo mkdir -p {} {.}
2945
2946 3$ go-parallel -a ./files.txt -t 'mkdir -p {{.Input}} {{.Input | basename | noExt}}'
2947
2948 3$ parallel -a ./files.txt echo mkdir -p {} {/.}
2949
2950 https://github.com/mylanconnolly/parallel (Last checked: 2021-07)
2951
2952 DIFFERENCES BETWEEN p AND GNU Parallel
2953 Summary (see legend above):
2954
2955 - - - I4 - - N/A
2956 - - - - - M6
2957 - O2 O3 - O5 O6 - x x - O10
2958 E1 - - - - - -
2959 - - - - - - - - -
2960 - -
2961
2962 p is a tiny shell script. It can color output with some predefined
2963 colors, but is otherwise quite limited.
2964
2965 It maxes out at around 116000 jobs (probably due to limitations in
2966 Bash).
2967
2968 EXAMPLES FROM p
2969
2970 Some of the examples from p cannot be implemented 100% by GNU parallel:
2971 The coloring is a bit different, and GNU parallel cannot have --tag for
2972 some inputs and not for others.
2973
2974 The coloring done by GNU parallel is not exactly the same as p's.
2975
2976 1$ p -bc blue "ping 127.0.0.1" -uc red "ping 192.168.0.1" \
2977 -rc yellow "ping 192.168.1.1" -t example "ping example.com"
2978
2979 1$ parallel --lb -j0 --color --tag ping \
2980 ::: 127.0.0.1 192.168.0.1 192.168.1.1 example.com
2981
2982 2$ p "tail -f /var/log/httpd/access_log" \
2983 -bc red "tail -f /var/log/httpd/error_log"
2984
2985 2$ cd /var/log/httpd;
2986 parallel --lb --color --tag tail -f ::: access_log error_log
2987
2988 3$ p tail -f "some file" \& p tail -f "other file with space.txt"
2989
2990 3$ parallel --lb tail -f ::: 'some file' "other file with space.txt"
2991
2992 4$ p -t project1 "hg pull project1" -t project2 \
2993 "hg pull project2" -t project3 "hg pull project3"
2994
2995 4$ parallel --lb hg pull ::: project{1..3}
2996
2997 https://github.com/rudymatela/evenmoreutils/blob/master/man/p.1.adoc
2998 (Last checked: 2022-04)
2999
3000 DIFFERENCES BETWEEN seneschal AND GNU Parallel
3001 Summary (see legend above):
3002
3003 I1 - - - - - -
3004 M1 - M3 - - M6
3005 O1 - O3 O4 - - - x x -
3006 E1 - - - - - -
3007 - - - - - - - - -
3008 - -
3009
3010 seneschal only starts the first job after reading the last job, and
3011 output from the first job is only printed after the last job finishes.
3012
3013 Each byte of output requires 3.5 bytes of RAM.
3014
3015 This makes it impossible to have a total output bigger than the virtual
3016 memory.
3017
3018 Even though output is kept in RAM, outputting it is quite slow: 30 MB/s.
3019
3020 Output larger than 4 GB causes random problems - it looks like a race
3021 condition.
3022
3023 This:
3024
3025 echo 1 | seneschal --prefix='yes `seq 1000`|head -c 1G' >/dev/null
3026
3027 takes 4100(!) CPU seconds to run on a 64C64T server, but only 140 CPU
3028 seconds on a 4C8T laptop. So it looks like seneschal wastes a lot of
3029 CPU time coordinating the CPUs.
3030
3031 Compare this to:
3032
3033 echo 1 | time -v parallel -N0 'yes `seq 1000`|head -c 1G' >/dev/null
3034
3035 which takes 3-8 CPU seconds.
3036
3037 EXAMPLES FROM seneschal README.md
3038
3039 1$ echo $REPOS | seneschal --prefix="cd {} && git pull"
3040
3041 # If $REPOS is newline separated
3042 1$ echo "$REPOS" | parallel -k "cd {} && git pull"
3043 # If $REPOS is space separated
3044 1$ echo -n "$REPOS" | parallel -d' ' -k "cd {} && git pull"
3045
3046 COMMANDS="pwd
3047 sleep 5 && echo boom
3048 echo Howdy
3049 whoami"
3050
3051 2$ echo "$COMMANDS" | seneschal --debug
3052
3053 2$ echo "$COMMANDS" | parallel -k -v
3054
3055 3$ ls -1 | seneschal --prefix="pushd {}; git pull; popd;"
3056
3057 3$ ls -1 | parallel -k "pushd {}; git pull; popd;"
3058 # Or if current dir also contains files:
3059 3$ parallel -k "pushd {}; git pull; popd;" ::: */
3060
3061 https://github.com/TheWizardTower/seneschal (Last checked: 2022-06)
3062
3063 Todo
3064 http://code.google.com/p/push/ (cannot compile)
3065
3066 https://github.com/krashanoff/parallel
3067
3068 https://github.com/Nukesor/pueue
3069
3070 https://arxiv.org/pdf/2012.15443.pdf KumQuat
3071
3072 https://arxiv.org/pdf/2007.09436.pdf PaSH: Light-touch Data-Parallel
3073 Shell Processing
3074
3075 https://github.com/JeiKeiLim/simple_distribute_job
3076
3077 https://github.com/reggi/pkgrun - not obvious how to use
3078
3079 https://github.com/benoror/better-npm-run - not obvious how to use
3080
3081 https://github.com/bahmutov/with-package
3082
3083 https://github.com/flesler/parallel
3084
3085 https://github.com/Julian/Verge
3086
3087 https://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html
3088
3089 https://vicerveza.homeunix.net/~viric/soft/ts/
3090
3091 https://github.com/chapmanjacobd/que
3092
3094 There are certain issues that are very common in parallelizing tools.
3095 Here are a few stress tests. Be warned: If the tool is badly coded it
3096 may overload your machine.
3097
3098 MIX: Output mixes
3099 Output from 2 jobs should not mix. If the output is not used, this does
3100 not matter; but if the output is used then it is important that you do
3101 not get half a line from one job followed by half a line from another
3102 job.
3103
3104 If the tool does not buffer, output will most likely mix now and then.
3105
3106 This test stresses whether output mixes.
3107
3108 #!/bin/bash
3109
3110 paralleltool="parallel -j0"
3111
3112 cat <<-EOF > mycommand
3113 #!/bin/bash
3114
3115 # If a, b, c, d, e, and f mix: Very bad
3116 perl -e 'print STDOUT "a"x3000_000," "'
3117 perl -e 'print STDERR "b"x3000_000," "'
3118 perl -e 'print STDOUT "c"x3000_000," "'
3119 perl -e 'print STDERR "d"x3000_000," "'
3120 perl -e 'print STDOUT "e"x3000_000," "'
3121 perl -e 'print STDERR "f"x3000_000," "'
3122 echo
3123 echo >&2
3124 EOF
3125 chmod +x mycommand
3126
3127 # Run 30 jobs in parallel
3128 seq 30 |
3129 $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
3130
3131 # 'a c e' and 'b d f' should always stay together
3132 # and there should only be a single line per job
3133
3134 STDERRMERGE: Stderr is merged with stdout
3135 Output from stdout and stderr should not be merged, but kept separated.
3136
3137 This test shows whether stdout is mixed with stderr.
3138
3139 #!/bin/bash
3140
3141 paralleltool="parallel -j0"
3142
3143 cat <<-EOF > mycommand
3144 #!/bin/bash
3145
3146 echo stdout
3147 echo stderr >&2
3148 echo stdout
3149 echo stderr >&2
3150 EOF
3151 chmod +x mycommand
3152
3153 # Run one job
3154 echo |
3155 $paralleltool ./mycommand > stdout 2> stderr
3156 cat stdout
3157 cat stderr
3158
3159 RAM: Output limited by RAM
3160 Some tools cache output in RAM. This makes them extremely slow if the
3161 output is bigger than physical memory, and makes them crash if the
3162 output is bigger than the virtual memory.
3163
3164 #!/bin/bash
3165
3166 paralleltool="parallel -j0"
3167
3168 cat <<'EOF' > mycommand
3169 #!/bin/bash
3170
3171 # Generate 1 GB output
3172 yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
3173 EOF
3174 chmod +x mycommand
3175
3176 # Run 20 jobs in parallel
3177 # Adjust 20 to be > physical RAM and < free space on /tmp
3178 seq 20 | time $paralleltool ./mycommand | wc -c
3179
3180 DISKFULL: Incomplete data if /tmp runs full
3181 If caching is done on disk, the disk can run full during the run. Not
3182 all programs discover this. GNU parallel discovers it if the disk
3183 stays full for at least 2 seconds.
3184
3185 #!/bin/bash
3186
3187 paralleltool="parallel -j0"
3188
3189 # This should be a dir with less than 100 GB free space
3190 smalldisk=/tmp/shm/parallel
3191
3192 TMPDIR="$smalldisk"
3193 export TMPDIR
3194
3195 max_output() {
3196 # Force worst case scenario:
3197 # Make GNU Parallel only check once per second
3198 sleep 10
3199 # Generate 100 GB to fill $TMPDIR
3200 # Adjust if /tmp is bigger than 100 GB
3201 yes | head -c 100G >$TMPDIR/$$
3202 # Generate 10 MB output that will not be buffered due to full disk
3203 perl -e 'print "X"x10_000_000' | head -c 10M
3204 echo This part is missing from incomplete output
3205 sleep 2
3206 rm $TMPDIR/$$
3207 echo Final output
3208 }
3209
3210 export -f max_output
3211 seq 10 | $paralleltool max_output | tr -s X
3212
3213 CLEANUP: Leaving tmp files at unexpected death
3214 Some tools do not clean up their tmp files if they are killed.
3215 This is especially a risk for tools that buffer output on disk.
3216
3217 #!/bin/bash
3218
3219 paralleltool=parallel
3220
3221 ls /tmp >/tmp/before
3222 seq 10 | $paralleltool sleep &
3223 pid=$!
3224 # Give the tool time to start up
3225 sleep 1
3226 # Kill it without giving it a chance to cleanup
3227 kill -9 $pid
3228 # Should be empty: No files should be left behind
3229 diff <(ls /tmp) /tmp/before
3230
3231 SPCCHAR: Dealing badly with special file names
3232 It is not uncommon for users to create files like:
3233
3234 My brother's 12" *** record (costs $$$).jpg
3235
3236 Some tools break on this.
3237
3238 #!/bin/bash
3239
3240 paralleltool=parallel
3241
3242 touch "My brother's 12\" *** record (costs \$\$\$).jpg"
3243 ls My*jpg | $paralleltool ls -l
3244
3245 COMPOSED: Composed commands do not work
3246 Some tools require you to wrap composed commands in bash -c.
3247
3248 echo bar | $paralleltool echo foo';' echo {}
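
Where a tool lacks this support, the usual workaround is to do the
wrapping yourself. A sketch with xargs standing in for such a tool,
passing the argument positionally so quoting survives:

```shell
# Wrap the composed command in bash -c; the _ fills $0 so
# the input argument arrives as $1.
echo bar | xargs -I{} bash -c 'echo foo; echo "$1"' _ {}
# prints:
# foo
# bar
```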
3249
3250 ONEREP: Only one replacement string allowed
3251 Some tools can only insert the argument once.
3252
3253 echo bar | $paralleltool echo {} foo {}
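
For comparison, xargs -I replaces every occurrence of the replacement
string, so a tool failing this test is stricter than even classic
xargs:

```shell
# Both {} occurrences are replaced:
echo bar | xargs -I{} echo {} foo {}
# prints: bar foo bar
```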
3254
3255 INPUTSIZE: Length of input should not be limited
3256 Some tools limit the length of the input lines artificially with no
3257 good reason. GNU parallel does not:
3258
3259 perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}
3260
3261 GNU parallel limits the command to run to 128 KB due to execve(2):
3262
3263 perl -e 'print "x"x131_000' | parallel echo {} | wc
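
The underlying kernel limit behind the 128 KB bound can be inspected
with getconf (the value varies per system; on Linux it is typically
2 MB):

```shell
# ARG_MAX is the maximum combined size of arguments and
# environment that execve(2) accepts.
getconf ARG_MAX
```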
3264
3265 NUMWORDS: Speed depends on number of words
3266 Some tools become very slow if output lines have many words.
3267
3268 #!/bin/bash
3269
3270 paralleltool=parallel
3271
3272 cat <<-EOF > mycommand
3273 #!/bin/bash
3274
3275 # 10 MB of lines with 1000 words
3276 yes "`seq 1000`" | head -c 10M
3277 EOF
3278 chmod +x mycommand
3279
3280 # Run 30 jobs in parallel
3281 seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
3282
3283 4GB: Output with a line > 4GB should be OK
3284 #!/bin/bash
3285
3286 paralleltool="parallel -j0"
3287
3288 cat <<-EOF > mycommand
3289 #!/bin/bash
3290
3291 perl -e '\$a="a"x1000_000; for(1..5000) { print \$a }'
3292 EOF
3293 chmod +x mycommand
3294
3295 # Run 1 job
3296 seq 1 | $paralleltool ./mycommand | LC_ALL=C wc
3297
3299 When using GNU parallel for a publication please cite:
3300
3301 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
3302 The USENIX Magazine, February 2011:42-47.
3303
3304 This helps funding further development; and it won't cost you a cent.
3305 If you pay 10000 EUR you should feel free to use GNU Parallel without
3306 citing.
3307
3308 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
3309
3310 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
3311
3312 Copyright (C) 2010-2022 Ole Tange, http://ole.tange.dk and Free
3313 Software Foundation, Inc.
3314
3315 Parts of the manual concerning xargs compatibility are inspired by the
3316 manual of xargs from GNU findutils 4.4.2.
3317
3319 This program is free software; you can redistribute it and/or modify it
3320 under the terms of the GNU General Public License as published by the
3321 Free Software Foundation; either version 3 of the License, or (at
3322 your option) any later version.
3323
3324 This program is distributed in the hope that it will be useful, but
3325 WITHOUT ANY WARRANTY; without even the implied warranty of
3326 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
3327 General Public License for more details.
3328
3329 You should have received a copy of the GNU General Public License along
3330 with this program. If not, see <https://www.gnu.org/licenses/>.
3331
3332 Documentation license I
3333 Permission is granted to copy, distribute and/or modify this
3334 documentation under the terms of the GNU Free Documentation License,
3335 Version 1.3 or any later version published by the Free Software
3336 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
3337 with no Back-Cover Texts. A copy of the license is included in the
3338 file LICENSES/GFDL-1.3-or-later.txt.
3339
3340 Documentation license II
3341 You are free:
3342
3343 to Share to copy, distribute and transmit the work
3344
3345 to Remix to adapt the work
3346
3347 Under the following conditions:
3348
3349 Attribution
3350 You must attribute the work in the manner specified by the
3351 author or licensor (but not in any way that suggests that they
3352 endorse you or your use of the work).
3353
3354 Share Alike
3355 If you alter, transform, or build upon this work, you may
3356 distribute the resulting work only under the same, similar or
3357 a compatible license.
3358
3359 With the understanding that:
3360
3361 Waiver Any of the above conditions can be waived if you get
3362 permission from the copyright holder.
3363
3364 Public Domain
3365 Where the work or any of its elements is in the public domain
3366 under applicable law, that status is in no way affected by the
3367 license.
3368
3369 Other Rights
3370 In no way are any of the following rights affected by the
3371 license:
3372
3373 • Your fair dealing or fair use rights, or other applicable
3374 copyright exceptions and limitations;
3375
3376 • The author's moral rights;
3377
3378 • Rights other persons may have either in the work itself or
3379 in how the work is used, such as publicity or privacy
3380 rights.
3381
3382 Notice For any reuse or distribution, you must make clear to others
3383 the license terms of this work.
3384
3385 A copy of the full license is included in the file as
3386 LICENCES/CC-BY-SA-4.0.txt
3387
3389 GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
3390 Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
3391 rsync with ssh.
3392
3394 find(1), xargs(1), make(1), pexec(1), ppss(1), xjobs(1), prll(1),
3395 dxargs(1), mdm(1)
3396
3397
3398
3399 20220722 2022-08-19 PARALLEL_ALTERNATIVES(7)