PARALLEL_ALTERNATIVES(7)          parallel          PARALLEL_ALTERNATIVES(7)

NAME
   parallel_alternatives - Alternatives to GNU parallel

DESCRIPTION
   There are a lot of programs with some of the functionality of GNU
   parallel. GNU parallel strives to include the best of that
   functionality without sacrificing ease of use.

   parallel has existed since 2002 and as GNU parallel since 2010. Many of
   the alternatives have not had the vitality to survive that long, but
   have come and gone during that time.

   GNU parallel is actively maintained with a new release every month
   since 2010. Most other alternatives are fleeting interests of their
   developers, with irregular releases, and are only maintained for a few
   years.

SUMMARY TABLE
   The following features are in some of the comparable tools:

  Inputs
   I1. Arguments can be read from stdin
   I2. Arguments can be read from a file
   I3. Arguments can be read from multiple files
   I4. Arguments can be read from command line
   I5. Arguments can be read from a table
   I6. Arguments can be read from the same file using #! (shebang)
   I7. Line oriented input as default (quoting of special chars not
       needed)

  Manipulation of input
   M1. Composed command
   M2. Multiple arguments can fill up an execution line
   M3. Arguments can be put anywhere in the execution line
   M4. Multiple arguments can be put anywhere in the execution line
   M5. Arguments can be replaced with context
   M6. Input can be treated as the complete command line

  Outputs
   O1. Grouping output so output from different jobs do not mix
   O2. Send stderr (standard error) to stderr (standard error)
   O3. Send stdout (standard output) to stdout (standard output)
   O4. Order of output can be same as order of input
   O5. Stdout only contains stdout (standard output) from the command
   O6. Stderr only contains stderr (standard error) from the command
   O7. Buffering on disk
   O8. Cleanup of file if killed
   O9. Test if disk runs full during run
   O10. Output of a line bigger than 4 GB

  Execution
   E1. Running jobs in parallel
   E2. List running jobs
   E3. Finish running jobs, but do not start new jobs
   E4. Number of running jobs can depend on number of cpus
   E5. Finish running jobs, but do not start new jobs after first failure
   E6. Number of running jobs can be adjusted while running
   E7. Only spawn new jobs if load is less than a limit

  Remote execution
   R1. Jobs can be run on remote computers
   R2. Basefiles can be transferred
   R3. Argument files can be transferred
   R4. Result files can be transferred
   R5. Cleanup of transferred files
   R6. No config files needed
   R7. Do not run more than SSHD's MaxStartups can handle
   R8. Configurable SSH command
   R9. Retry if connection breaks occasionally

  Semaphore
   S1. Possibility to work as a mutex
   S2. Possibility to work as a counting semaphore

  Legend
   - = no
   x = not applicable
   ID = yes

   Not every new version of every program has been tested, so the table
   may be outdated. Please file a bug report if you find errors (see
   REPORTING BUGS).

   parallel: I1 I2 I3 I4 I5 I6 I7 M1 M2 M3 M4 M5 M6 O1 O2 O3 O4 O5 O6 O7
             O8 O9 O10 E1 E2 E3 E4 E5 E6 E7 R1 R2 R3 R4 R5 R6 R7 R8 R9
             S1 S2

   find -exec: - - - x - x - - M2 M3 - - - - - O2 O3 O4 O5 O6 -
               - - - - - - - - - - - - - - - x x

   make -j: - - - - - - - - - - - - - O1 O2 O3 - x O6 E1 - -
            - E5 - - - - - - - - - - - -

DIFFERENCES BETWEEN xargs AND GNU Parallel
   Summary table (see legend above): I1 I2 - - - - - - M2 M3 - - - - O2 O3
   - O5 O6 E1 - - - - - - - - - - - x - - - - -

   xargs offers some of the same possibilities as GNU parallel.

   xargs deals badly with special characters (such as space, \, ' and ").
   To see the problem try this:

     touch important_file
     touch 'not important_file'
     ls not* | xargs rm
     mkdir -p "My brother's 12\" records"
     ls | xargs rmdir
     touch 'c:\windows\system32\clfs.sys'
     echo 'c:\windows\system32\clfs.sys' | xargs ls -l

   You can specify -0, but many input generators are not optimized for
   using NUL as separator but are optimized for newline as separator. E.g.
   awk, ls, echo, tar -v, head (requires using -z), tail (requires using
   -z), sed (requires using -z), perl (-0 and \0 instead of \n), locate
   (requires using -0), find (requires using -print0), grep (requires
   using -z or -Z), sort (requires using -z).
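
   For example, a NUL-separated pipeline only works if every tool in the
   chain is given the right flag, while GNU parallel handles the same
   newline-separated names directly (a minimal sketch; the file names are
   made up):

     # Every step must agree on NUL as separator:
     find . -name '*.log' -print0 | sort -z | xargs -0 -n1 gzip
     # GNU parallel works on the plain newline-separated output:
     find . -name '*.log' | sort | parallel gzip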

   GNU parallel's newline separation can be emulated with:

     cat | xargs -d "\n" -n1 command

   xargs can run a given number of jobs in parallel, but has no support
   for running number-of-cpu-cores jobs in parallel.
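
   With xargs the core count must be passed by hand; GNU parallel defaults
   to one job per CPU core and also accepts a percentage. A sketch,
   assuming GNU coreutils' nproc is available:

     # xargs: the user computes the core count
     seq 100 | xargs -P "$(nproc)" -n1 echo
     # GNU parallel: one job per core is the default
     seq 100 | parallel echo
     # or relative to the number of cores
     seq 100 | parallel -j 200% echo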

   xargs has no support for grouping the output, therefore output may run
   together, e.g. the first half of a line is from one process and the
   last half of the line is from another process. The example Parallel
   grep cannot be done reliably with xargs because of this. To see this in
   action try:

     parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
       '>' {} ::: a b c d e f g h
     # Serial = no mixing = the wanted result
     # 'tr -s a-z' squeezes repeating letters into a single letter
     echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
     # Compare to 8 jobs in parallel
     parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
     echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
     echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
       tr -s a-z

   Or try this:

     slow_seq() {
       echo Count to "$@"
       seq "$@" |
         perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
     }
     export -f slow_seq
     # Serial = no mixing = the wanted result
     seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
     # Compare to 8 jobs in parallel
     seq 8 | parallel -P8 slow_seq {}
     seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'

   xargs has no support for keeping the order of the output, therefore if
   running jobs in parallel using xargs the output of the second job
   cannot be postponed till the first job is done.
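
   GNU parallel's -k keeps the output in the order of the input even when
   later jobs finish first. A small demonstration (the sleep times just
   force the jobs to finish in reverse order):

     # Prints 4 3 2 1 (input order), not 1 2 3 4 (finish order):
     parallel -j4 -k 'sleep {}; echo {}' ::: 4 3 2 1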

   xargs has no support for running jobs on remote computers.

   xargs has no support for context replace, so you will have to create
   the arguments yourself.

   If you use a replace string in xargs (-I) you cannot force xargs to
   use more than one argument.
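
   With GNU parallel -X does context replace: every argument is inserted
   with its surrounding context repeated, and as many arguments as
   possible are packed onto one command line. A sketch of the difference
   (xargs -I implies one argument per command):

     # GNU parallel: one command, context repeated per argument
     parallel -X echo pre-{}-post ::: 1 2 3 4
     # => pre-1-post pre-2-post pre-3-post pre-4-post
     # xargs -I: one command per argument
     printf '%s\n' 1 2 3 4 | xargs -I {} echo pre-{}-post
     # => pre-1-post (and so on, one per line)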

   Quoting in xargs works like -q in GNU parallel. This means composed
   commands and redirection require using bash -c.

     ls | parallel "wc {} >{}.wc"
     ls | parallel "echo {}; ls {}|wc"

   becomes (assuming you have 8 cores and that none of the filenames
   contain space, " or '):

     ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
     ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

   https://www.gnu.org/software/findutils/

DIFFERENCES BETWEEN find -exec AND GNU Parallel
   find -exec offers some of the same possibilities as GNU parallel.

   find -exec only works on files. Processing other input (such as hosts
   or URLs) will require creating these inputs as files. find -exec has no
   support for running commands in parallel.

   https://www.gnu.org/software/findutils/ (Last checked: 2019-01)

DIFFERENCES BETWEEN make -j AND GNU Parallel
   make -j can run jobs in parallel, but requires a crafted Makefile to do
   this. That results in extra quoting to get filenames containing
   newlines to work correctly.
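
   A minimal sketch of such a crafted Makefile, next to the GNU parallel
   one-liner doing the same (the file names are made up; make recipes must
   be indented with a tab, here written via printf):

     # Generate a Makefile that compresses every *.txt:
     printf '%s\n' \
       'SRC := $(wildcard *.txt)' \
       'all: $(SRC:.txt=.txt.gz)' \
       '%.txt.gz: %.txt' \
       '	gzip -c $< > $@' > Makefile
     make -j8 all
     # The same job with GNU parallel, no Makefile needed:
     parallel -j8 'gzip -c {} > {}.gz' ::: *.txt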

   make -j computes a dependency graph before running jobs. Jobs run by
   GNU parallel do not depend on each other.

   (Very early versions of GNU parallel were coincidentally implemented
   using make -j.)

   https://www.gnu.org/software/make/ (Last checked: 2019-01)

DIFFERENCES BETWEEN ppss AND GNU Parallel
   Summary table (see legend above): I1 I2 - - - - I7 M1 - M3 - - M6 O1 -
   - x - - E1 E2 ?E3 E4 - - - R1 R2 R3 R4 - - ?R7 ? ? - -

   ppss is also a tool for running jobs in parallel.

   The output of ppss is status information and thus not useful as input
   for another command. The output from the jobs is put into files.

   The argument replace string ($ITEM) cannot be changed. Arguments must
   be quoted - thus arguments containing special characters (space '"&!*)
   may cause problems. More than one argument is not supported. Filenames
   containing newlines are not processed correctly. When reading input
   from a file null cannot be used as a terminator. ppss needs to read the
   whole input file before starting any jobs.

   Output and status information is stored in ppss_dir and thus requires
   cleanup when completed. If the dir is not removed before running ppss
   again it may cause nothing to happen, as ppss thinks the task is
   already done. GNU parallel will normally not need cleaning up if
   running locally and will only need cleaning up if stopped abnormally
   while running remotely (--cleanup may not complete if stopped
   abnormally). The example Parallel grep would require extra
   postprocessing if written using ppss.

   For remote systems PPSS requires 3 steps: config, deploy, and start.
   GNU parallel only requires one step.

  EXAMPLES FROM ppss MANUAL

   Here are the examples from ppss's manual page with the equivalent using
   GNU parallel:

     1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '

     1$ find /path/to/files -type f | parallel gzip

     2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '

     2$ find /path/to/files -type f | parallel cp {} /destination/dir

     3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

     3$ parallel -a list-of-urls.txt wget -q

     4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

     4$ parallel -a list-of-urls.txt wget -q {}

     5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
          -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
          -n nodes.txt -o /some/output/dir --upload --download;
        ./ppss deploy -C config.cfg
        ./ppss start -C config

     5$ # parallel does not use configs. If you want a different
        # username put it in nodes.txt: user@hostname
        find source/dir -type f |
          parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} \
            -o {.}.mp3 --preset standard --quiet

     6$ ./ppss stop -C config.cfg

     6$ killall -TERM parallel

     7$ ./ppss pause -C config.cfg

     7$ Press: CTRL-Z or killall -SIGTSTP parallel

     8$ ./ppss continue -C config.cfg

     8$ Enter: fg or killall -SIGCONT parallel

     9$ ./ppss.sh status -C config.cfg

     9$ killall -SIGUSR2 parallel

   https://github.com/louwrentius/PPSS

DIFFERENCES BETWEEN pexec AND GNU Parallel
   Summary table (see legend above): I1 I2 - I4 I5 - - M1 - M3 - - M6 O1
   O2 O3 - O5 O6 E1 - - E4 - E6 - R1 - - - - R6 - - - S1 -

   pexec is also a tool for running jobs in parallel.

  EXAMPLES FROM pexec MANUAL

   Here are the examples from pexec's info page with the equivalent using
   GNU parallel:

     1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
          'echo "scale=10000;sqrt($NUM)" | bc'

     1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \
          bc > sqrt-{}.dat'

     2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort

     2$ ls myfiles*.ext | parallel sort {} ">{}.sort"

     3$ pexec -f image.list -n auto -e B -u star.log -c -- \
          'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'

     3$ parallel -a image.list \
          'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log

     4$ pexec -r *.png -e IMG -c -o - -- \
          'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'

     4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'

     5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'

     6$ for p in *.png ; do echo ${p%.png} ; done | \
          pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

     7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done)
        pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

     7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

     8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \
          'pexec -j -m blockread -d $IMG | \
           jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
           pexec -j -m blockwrite -s th_$IMG'

     8$ # Combining GNU parallel and GNU sem.
        ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
          'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'

        # If reading and writing is done to the same disk, this may be
        # faster as only one process will be either reading or writing:
        ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
          'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'

   https://www.gnu.org/software/pexec/

DIFFERENCES BETWEEN xjobs AND GNU Parallel
   xjobs is also a tool for running jobs in parallel. It only supports
   running jobs on your local computer.

   xjobs deals badly with special characters just like xargs. See the
   section DIFFERENCES BETWEEN xargs AND GNU Parallel.

  EXAMPLES FROM xjobs MANUAL

   Here are the examples from xjobs's man page with the equivalent using
   GNU parallel:

     1$ ls -1 *.zip | xjobs unzip

     1$ ls *.zip | parallel unzip

     2$ ls -1 *.zip | xjobs -n unzip

     2$ ls *.zip | parallel unzip >/dev/null

     3$ find . -name '*.bak' | xjobs gzip

     3$ find . -name '*.bak' | parallel gzip

     4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf

     4$ ls *.jar | parallel jar tf {} '>' {}.idx

     5$ xjobs -s script

     5$ cat script | parallel

     6$ mkfifo /var/run/my_named_pipe;
        xjobs -s /var/run/my_named_pipe &
        echo unzip 1.zip >> /var/run/my_named_pipe;
        echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

     6$ mkfifo /var/run/my_named_pipe;
        cat /var/run/my_named_pipe | parallel &
        echo unzip 1.zip >> /var/run/my_named_pipe;
        echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

   http://www.maier-komor.de/xjobs.html (Last checked: 2019-01)

DIFFERENCES BETWEEN prll AND GNU Parallel
   prll is also a tool for running jobs in parallel. It does not support
   running jobs on remote computers.

   prll encourages using BASH aliases and BASH functions instead of
   scripts. GNU parallel supports scripts directly, functions if they are
   exported using export -f, and aliases if using env_parallel.

   prll generates a lot of status information on stderr (standard error)
   which makes it harder to use the stderr (standard error) output of the
   job directly as input for another program.

  EXAMPLES FROM prll's MANUAL

   Here is the example from prll's man page with the equivalent using GNU
   parallel:

     1$ prll -s 'mogrify -flip $1' *.jpg

     1$ parallel mogrify -flip ::: *.jpg

   https://github.com/exzombie/prll (Last checked: 2019-01)

DIFFERENCES BETWEEN dxargs AND GNU Parallel
   dxargs is also a tool for running jobs in parallel.

   dxargs does not deal well with more simultaneous jobs than SSHD's
   MaxStartups. dxargs is only built for running jobs remotely, and it
   does not support transferring of files.
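
   GNU parallel spreads its ssh connections so sshd's MaxStartups limit is
   not hit, and the connection rate can also be throttled by hand. A
   sketch (the server names are placeholders):

     # At most 8 jobs per server; new ssh connections are started
     # at least 0.1 second apart:
     seq 1000 | parallel -S server1,server2 -j8 --sshdelay 0.1 echo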

   https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html
   (Last checked: 2019-01)

DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
   middleman (mdm) is also a tool for running jobs in parallel.

  EXAMPLES FROM middleman's WEBSITE

   Here are the shell scripts from
   https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html
   ported to GNU parallel:

     1$ seq 19 | parallel buffon -o - | sort -n > result
        cat files | parallel cmd
        find dir -execdir sem cmd {} \;

   https://github.com/cklin/mdm (Last checked: 2019-01)

DIFFERENCES BETWEEN xapply AND GNU Parallel
   xapply can run jobs in parallel on the local computer.

  EXAMPLES FROM xapply's MANUAL

   Here are the examples from xapply's man page with the equivalent using
   GNU parallel:

     1$ xapply '(cd %1 && make all)' */

     1$ parallel 'cd {} && make all' ::: */

     2$ xapply -f 'diff %1 ../version5/%1' manifest | more

     2$ parallel diff {} ../version5/{} < manifest | more

     3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1

     3$ parallel --link diff {1} {2} :::: manifest1 checklist1

     4$ xapply 'indent' *.c

     4$ parallel indent ::: *.c

     5$ find ~ksb/bin -type f ! -perm -111 -print | \
          xapply -f -v 'chmod a+x' -

     5$ find ~ksb/bin -type f ! -perm -111 -print | \
          parallel -v chmod a+x

     6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -

     6$ sh <(find */ -... | parallel -s 1024 echo vi)

     6$ find */ -... | parallel -s 1024 -Xuj1 vi

     7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -

     7$ sh <(find ... | parallel -n5 echo vi)

     7$ find ... | parallel -n5 -uj1 vi

     8$ xapply -fn "" /etc/passwd

     8$ parallel -k echo < /etc/passwd

     9$ tr ':' '\012' < /etc/passwd | \
          xapply -7 -nf 'chown %1 %6' - - - - - - -

     9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}

     10$ xapply '[ -d %1/RCS ] || echo %1' */

     10$ parallel '[ -d {}/RCS ] || echo {}' ::: */

     11$ xapply -f '[ -f %1 ] && echo %1' List | ...

     11$ parallel '[ -f {} ] && echo {}' < List | ...

   https://web.archive.org/web/20160702211113/http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html

DIFFERENCES BETWEEN AIX apply AND GNU Parallel
   apply can build command lines based on a template and arguments - very
   much like GNU parallel. apply does not run jobs in parallel. apply does
   not use an argument separator (like :::); instead the template must be
   the first argument.

  EXAMPLES FROM IBM's KNOWLEDGE CENTER

   Here are the examples from IBM's Knowledge Center and the corresponding
   command using GNU parallel:

   To obtain results similar to those of the ls command, enter:

     1$ apply echo *
     1$ parallel echo ::: *

   To compare the file named a1 to the file named b1, and the file named
   a2 to the file named b2, enter:

     2$ apply -2 cmp a1 b1 a2 b2
     2$ parallel -N2 cmp ::: a1 b1 a2 b2

   To run the who command five times, enter:

     3$ apply -0 who 1 2 3 4 5
     3$ parallel -N0 who ::: 1 2 3 4 5

   To link all files in the current directory to the directory /usr/joe,
   enter:

     4$ apply 'ln %1 /usr/joe' *
     4$ parallel ln {} /usr/joe ::: *

   https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm
   (Last checked: 2019-01)

DIFFERENCES BETWEEN paexec AND GNU Parallel
   paexec can run jobs in parallel on both the local and remote computers.

   paexec requires commands to print a blank line as the last output. This
   means you will have to write a wrapper for most programs.
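
   A sketch of such a wrapper: it runs the real program and then prints
   the empty line that paexec takes as the end-of-job marker (mycmd is a
   placeholder):

     #!/bin/sh
     # paexec wrapper: run the real command,
     # then print the blank line paexec expects
     mycmd "$@"
     echo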

   paexec has a job dependency facility so a job can depend on another job
   to be executed successfully. Sort of a poor-man's make.

  EXAMPLES FROM paexec's EXAMPLE CATALOG

   Here are the examples from paexec's example catalog with the equivalent
   using GNU parallel:

   1_div_X_run

     1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]

     1$ parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]

   all_substr_run

     2$ ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]

     2$ parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]

   cc_wrapper_run

     3$ ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
          -n 'host1 host2' \
          -t '/usr/bin/ssh -x' <<EOF [...]

     3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
          -S host1,host2 <<EOF [...]

        # This is not exactly the same, but avoids the wrapper
        parallel gcc -O2 -c -o {.}.o {} \
          -S host1,host2 <<EOF [...]

   toupper_run

     4$ ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]

     4$ parallel echo {} '|' ./toupper_cmd <<EOF [...]

        # Without the wrapper:
        parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]

   https://github.com/cheusov/paexec

DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
   Summary table (see legend above): I1 - - I4 - - (I7) M1 (M2) M3 (M4) M5
   M6 - O2 O3 - O5 - - N/A N/A O10 E1 - - - - - - - - - - - - - - - - -

   (I7): Only under special circumstances. See below.

   (M2+M4): Only if there is a single replacement string.

   map rejects input with special characters:

     echo "The Cure" > My\ brother\'s\ 12\"\ records

     ls | map 'echo %; wc %'

   It works with GNU parallel:

     ls | parallel 'echo {}; wc {}'

   Under some circumstances it also works with map:

     ls | map 'echo % works %'

   But tiny changes make it reject the input with special characters:

     ls | map 'echo % does not work "%"'

   This means that many UTF-8 characters will be rejected. This is by
   design. From the web page: "As such, programs that quietly handle them,
   with no warnings at all, are doing their users a disservice."

   map delays each job by 0.01 s. This can be emulated by using parallel
   --delay 0.01.

   map prints '+' on stderr when a job starts, and '-' when a job
   finishes. This cannot be disabled. parallel has --bar if you need to
   see progress.

   map's replacement strings (% %D %B %E) can be simulated in GNU parallel
   by putting this in ~/.parallel/config:

     --rpl '%'
     --rpl '%D $_=Q(::dirname($_));'
     --rpl '%B s:.*/::;s:\.[^/.]+$::;'
     --rpl '%E s:.*\.::'

   map does not have an argument separator on the command line, but uses
   the first argument as command. This makes quoting harder which again
   may affect readability. Compare:

     map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *

     parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *

   map can do multiple arguments with context replace, but not without
   context replace:

     parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3

     map "echo 'BEGIN{'%'}END'" 1 2 3

   map has no support for grouping. So this gives the wrong results:

     parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
       ::: a b c d e f
     ls -l a b c d e f
     parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par
     map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf
     map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
     map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
     ls -l out*
     md5sum out*

  EXAMPLES FROM map's WEBSITE

   Here are the examples from map's web page with the equivalent using GNU
   parallel:

     1$ ls *.gif | map convert % %B.png # default max-args: 1

     1$ ls *.gif | parallel convert {} {.}.png

     2$ map "mkdir %B; tar -C %B -xf %" *.tgz # default max-args: 1

     2$ parallel 'mkdir {.}; tar -C {.} -xf {}' ::: *.tgz

     3$ ls *.gif | map cp % /tmp # default max-args: 100

     3$ ls *.gif | parallel -X cp {} /tmp

     4$ ls *.tar | map -n 1 tar -xf %

     4$ ls *.tar | parallel tar -xf

     5$ map "cp % /tmp" *.tgz

     5$ parallel cp {} /tmp ::: *.tgz

     6$ map "du -sm /home/%/mail" alice bob carol

     6$ parallel "du -sm /home/{}/mail" ::: alice bob carol
        # or if you prefer running a single job with multiple args:
     6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol

     7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7'

     7$ cat /etc/passwd | parallel --colsep : 'echo user {1} has shell {7}'

     8$ export MAP_MAX_PROCS=$(( `nproc` / 2 ))

     8$ export PARALLEL=-j50%

   https://github.com/sitaramc/map (Last checked: 2020-05)

DIFFERENCES BETWEEN ladon AND GNU Parallel
   ladon can run multiple jobs on files in parallel.

   ladon only works on files and the only way to specify files is using a
   quoted glob string (such as \*.jpg). It is not possible to list the
   files manually.

   As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR
   RELPATH.

   These can be simulated using GNU parallel by putting this in
   ~/.parallel/config:

     --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
     --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
     --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
     --rpl 'EXT s:.*\.::'
     --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
            s:\Q$c/\E::;$_=::dirname($_);'
     --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
            s:\Q$c/\E::;'

   ladon deals badly with filenames containing " and newline, and it fails
   for output larger than 200k:

     ladon '*' -- seq 36000 | wc

  EXAMPLES FROM ladon MANUAL

   It is assumed that the '--rpl's above are put in ~/.parallel/config and
   that it is run under a shell that supports '**' globbing (such as zsh):

     1$ ladon "**/*.txt" -- echo RELPATH

     1$ parallel echo RELPATH ::: **/*.txt

     2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt

     2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt

     3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \
          -thumbnail 100x100^ -gravity center -extent 100x100 \
          thumbs/RELPATH

     3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH \
          -thumbnail 100x100^ -gravity center -extent 100x100 \
          thumbs/RELPATH ::: **/*.jpg

     4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3

     4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav

   https://github.com/danielgtaylor/ladon (Last checked: 2019-01)

DIFFERENCES BETWEEN jobflow AND GNU Parallel
   jobflow can run multiple jobs in parallel.

   Just like with xargs, output from jobflow jobs running in parallel
   mixes together by default. jobflow can buffer into files (placed in
   /run/shm), but these are not cleaned up if jobflow dies unexpectedly
   (e.g. by Ctrl-C). If the total output is big (in the order of RAM+swap)
   it can cause the system to slow to a crawl and eventually run out of
   memory.

   jobflow gives no error if the command is unknown, and, like xargs,
   redirection and composed commands require wrapping with bash -c.

   Input lines can be at most 4096 bytes. You can have at most 16 {}'s in
   the command template. More than that either crashes the program or
   simply does not execute the command.

   jobflow has no equivalent for --pipe or --sshlogin.

   jobflow makes it possible to set resource limits on the running jobs.
   This can be emulated by GNU parallel using bash's ulimit:

     jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

     parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300; myjob'

  EXAMPLES FROM jobflow README

     1$ cat things.list | jobflow -threads=8 -exec ./mytask {}

     1$ cat things.list | parallel -j8 ./mytask {}

     2$ seq 100 | jobflow -threads=100 -exec echo {}

     2$ seq 100 | parallel -j100 echo {}

     3$ cat urls.txt | jobflow -threads=32 -exec wget {}

     3$ cat urls.txt | parallel -j32 wget {}

     4$ find . -name '*.bmp' | \
          jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

     4$ find . -name '*.bmp' | \
          parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

   https://github.com/rofl0r/jobflow

DIFFERENCES BETWEEN gargs AND GNU Parallel
   gargs can run multiple jobs in parallel.

   Older versions cache output in memory. This causes it to be extremely
   slow when the output is larger than the physical RAM, and can cause the
   system to run out of memory.

   See more details on this in man parallel_design.

   Newer versions cache output in files, but leave files in $TMPDIR if it
   is killed.

   Output to stderr (standard error) is changed if the command fails.

  EXAMPLES FROM gargs WEBSITE

     1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

     1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

     2$ cat t.txt | gargs --sep "\s+" \
          -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"

     2$ cat t.txt | parallel --colsep "\\s+" \
          -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"

   https://github.com/brentp/gargs

DIFFERENCES BETWEEN orgalorg AND GNU Parallel
   orgalorg can run the same job on multiple machines. This is related to
   --onall and --nonall.

   orgalorg supports entering the SSH password - provided it is the same
   for all servers. GNU parallel advocates using ssh-agent instead, but it
   is possible to emulate orgalorg's behavior by setting SSHPASS and by
   using --ssh "sshpass ssh".

   To make the emulation easier, make a simple alias:

     alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"

   If you want to supply a password run:

     SSHPASS=`ssh-askpass`

   or set the password directly:

     SSHPASS=P4$$w0rd!

   If the above is set up you can then do:

     orgalorg -o frontend1 -o frontend2 -p -C uptime
     par_emul -S frontend1 -S frontend2 uptime

     orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
     par_emul -S frontend1 -S frontend2 top -bid 1

     orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
       'md5sum /tmp/bigfile' -S bigfile
     par_emul -S frontend1 -S frontend2 --basefile bigfile \
       --workdir /tmp md5sum /tmp/bigfile

   orgalorg has a progress indicator for the transferring of a file. GNU
   parallel does not.

   https://github.com/reconquest/orgalorg

DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
   Rust parallel focuses on speed. It is almost as fast as xargs. It
   implements a few features from GNU parallel, but lacks many functions.
   All these fail:

     # Read arguments from file
     parallel -a file echo
     # Changing the delimiter
     parallel -d _ echo ::: a_b_c_

   These do something different from GNU parallel:

     # -q to protect quoted $ and space
     parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
     # Generation of combination of inputs
     parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
     # {= perl expression =} replacement string
     parallel echo '{= s/new/old/ =}' ::: my.new your.new
     # --pipe
     seq 100000 | parallel --pipe wc
     # linked arguments
     parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
     # Run different shell dialects
     zsh -c 'parallel echo \={} ::: zsh && true'
     csh -c 'parallel echo \$\{\} ::: shell && true'
     bash -c 'parallel echo \$\({}\) ::: pwd && true'
     # Rust parallel does not start before the last argument is read
     (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
     tail -f /var/log/syslog | parallel echo

   Most of the examples from the book GNU Parallel 2018 do not work, thus
   Rust parallel is not close to being a compatible replacement.

   Rust parallel has no remote facilities.

   It uses /tmp/parallel for tmp files and does not clean up if terminated
   abruptly. If another user on the system uses Rust parallel, then
   /tmp/parallel will have the wrong permissions and Rust parallel will
   fail. A malicious user can set up the right permissions and symlink the
   output file to one of the user's files, and the next time the user uses
   Rust parallel it will overwrite this file.

     attacker$ mkdir /tmp/parallel
     attacker$ chmod a+rwX /tmp/parallel
     # Symlink to the file the attacker wants to zero out
     attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
     victim$ seq 1000 | parallel echo
     # This file is now overwritten with stderr from 'echo'
     victim$ cat ~victim/.important-file

   If /tmp/parallel runs full during the run, Rust parallel does not
   report this, but finishes with success - thereby risking data loss.

   https://github.com/mmstick/parallel

DIFFERENCES BETWEEN Rush AND GNU Parallel
   rush (https://github.com/shenwei356/rush) is written in Go and based on
   gargs.

   Just like GNU parallel, rush buffers in temporary files. But opposite
   GNU parallel, rush does not clean up if the process dies abnormally.

   rush has some string manipulations that can be emulated by putting this
   into ~/.parallel/config (/ is used instead of %, and % is used instead
   of ^ as that is closer to bash's ${var%postfix}):

     --rpl '{:} s:(\.[^/]+)*$::'
     --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
     --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
     --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
     --rpl '{@(.*?)} /$$1/ and $_=$1;'

  EXAMPLES FROM rush's WEBSITE

   Here are the examples from rush's website with the equivalent command
   in GNU parallel.

   1. Simple run, quoting is not necessary

     1$ seq 1 3 | rush echo {}

     1$ seq 1 3 | parallel echo {}

   2. Read data from file (`-i`)

     2$ rush echo {} -i data1.txt -i data2.txt

     2$ cat data1.txt data2.txt | parallel echo {}

   3. Keep output order (`-k`)

     3$ seq 1 3 | rush 'echo {}' -k

     3$ seq 1 3 | parallel -k echo {}

   4. Timeout (`-t`)

     4$ time seq 1 | rush 'sleep 2; echo {}' -t 1

     4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'

   5. Retry (`-r`)

     5$ seq 1 | rush 'python unexisted_script.py' -r 1

     5$ seq 1 | parallel --retries 2 'python unexisted_script.py'

   Use -u to see it is really run twice:

     5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py'

   6. Dirname (`{/}`) and basename (`{%}`) and remove custom suffix
   (`{^suffix}`)

     6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'

     6$ echo dir/file_1.txt.gz |
          parallel --plus echo {//} {/} {%_1.txt.gz}

   7. Get basename, and remove last (`{.}`) or any (`{:}`) extension

     7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'

     7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'

   8. Job ID, combine fields index and other replacement strings

     8$ echo 12 file.txt dir/s_1.fq.gz |
          rush 'echo job {#}: {2} {2.} {3%:^_1}'

     8$ echo 12 file.txt dir/s_1.fq.gz |
          parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'

   9. Capture submatch using regular expression (`{@regexp}`)

     9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'

     9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'

   10. Custom field delimiter (`-d`)

     10$ echo a=b=c | rush 'echo {1} {2} {3}' -d =

     10$ echo a=b=c | parallel -d = echo {1} {2} {3}

   11. Send multi-lines to every command (`-n`)

     11$ seq 5 | rush -n 2 -k 'echo "{}"; echo'

     11$ seq 5 |
           parallel -n 2 -k \
             'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'

     11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '

     11$ seq 5 | parallel -n 2 -k 'echo {}; echo'

   12. Custom record delimiter (`-D`), note that empty records are not
   used.

     12$ echo a b c d | rush -D " " -k 'echo {}'

     12$ echo a b c d | parallel -d " " -k 'echo {}'

     12$ echo abcd | rush -D "" -k 'echo {}'

   Cannot be done by GNU parallel.

     12$ cat fasta.fa
         >seq1
         tag
         >seq2
         cat
         gat
         >seq3
         attac
         a
         cat

     12$ cat fasta.fa | rush -D ">" \
           'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
         # rush fails to join the multiline sequences

     12$ cat fasta.fa | (read -n1 ignore_first_char;
           parallel -d '>' --colsep '\n' echo FASTA record {#}: \
             name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
         )

   13. Assign value to variable, like `awk -v` (`-v`)

     13$ seq 1 |
           rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen

     13$ seq 1 |
           parallel -N0 \
             'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'

     13$ for var in a b; do \
     13$   seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
     13$ done

   In GNU parallel you would typically do:

     13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -

   If you really want the var:

     13$ seq 1 3 |
           parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -

   If you really want the for-loop:

     13$ for var in a b; do
           export var;
           seq 1 3 | parallel -k 'echo var: $var, data: {}';
         done

   Contrary to rush, this also works if the value is complex like:

     My brother's 12" records

   14. Preset variable (`-v`), avoid repeatedly writing verbose
   replacement strings

     14$ # naive way
         echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'

     14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'

     14$ # macro + removing suffix
         echo read_1.fq.gz |
           rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'

     14$ echo read_1.fq.gz |
           parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'

     14$ # macro + regular expression
         echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'

     14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

   Contrary to rush, GNU parallel works with complex values:

     14$ echo "My brother's 12\"read_1.fq.gz" |
           parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

   15. Interrupt jobs by `Ctrl-C`: rush will stop unfinished commands and
   exit.

     15$ seq 1 20 | rush 'sleep 1; echo {}'
         ^C

     15$ seq 1 20 | parallel 'sleep 1; echo {}'
         ^C

   16. Continue/resume jobs (`-c`). When some jobs failed (by execution
   failure, timeout, or canceling by user with `Ctrl + C`), please switch
   flag `-c/--continue` on and run again, so that `rush` can save
   successful commands and ignore them in NEXT run.

     16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
         cat successful_cmds.rush
         seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c

     16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
           'sleep {}; echo {}'
         cat mylog
         seq 1 3 | parallel --joblog mylog --retry-failed \
           'sleep {}; echo {}'

   Multi-line jobs:

     16$ seq 1 3 | rush 'sleep {}; echo {}; \
           echo finish {}' -t 3 -c -C finished.rush
         cat finished.rush
         seq 1 3 | rush 'sleep {}; echo {}; \
           echo finish {}' -t 3 -c -C finished.rush

     16$ seq 1 3 |
           parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
             echo finish {}'
         cat mylog
         seq 1 3 |
           parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
             echo finish {}'

   17. A comprehensive example: downloading 1K+ pages given by three URL
   list files using `phantomjs save_page.js` (some page contents are
   dynamically generated by Javascript, so `wget` does not work). Here I
   set the max number of jobs (`-j`) to `20`, each job has a max running
   time (`-t`) of `60` seconds and `3` retry chances (`-r`). Continue flag
   `-c` is also switched on, so we can continue unfinished jobs. Luckily,
   it's accomplished in one run :)

     17$ for f in $(seq 2014 2016); do \
           /bin/rm -rf $f; mkdir -p $f; \
           cat $f.html.txt | rush -v d=$f -d = \
             'phantomjs save_page.js "{}" > {d}/{3}.html' \
             -j 20 -t 60 -r 3 -c; \
         done

   GNU parallel can append to an existing joblog with '+':

     17$ rm mylog
         for f in $(seq 2014 2016); do
           /bin/rm -rf $f; mkdir -p $f;
           cat $f.html.txt |
             parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
               --colsep = \
               phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
         done

   18. A bioinformatics example: mapping with `bwa`, and processing the
   result with `samtools`:

     18$ ref=ref/xxx.fa
         threads=25
         ls -d raw.cluster.clean.mapping/* \
           | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
             'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
              samtools view -bS {p}.sam > {p}.bam; \
              samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
              samtools index {p}.sorted.bam; \
              samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
              /bin/rm {p}.bam {p}.sam;' \
             -j 2 --verbose -c -C mapping.rush

   GNU parallel would use a function:

     18$ ref=ref/xxx.fa
         export ref
         thr=25
         export thr
         bwa_sam() {
           p="$1"
           bam="$p".bam
           sam="$p".sam
           sortbam="$p".sorted.bam
           bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
           samtools view -bS "$sam" > "$bam"
           samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
           samtools index "$sortbam"
           samtools flagstat "$sortbam" > "$sortbam".flagstat
           /bin/rm "$bam" "$sam"
         }
         export -f bwa_sam
         ls -d raw.cluster.clean.mapping/* |
           parallel -j 2 --verbose --joblog mylog bwa_sam

  Other rush features

   rush has:

   • awk -v like custom defined variables (-v)

     With GNU parallel you would simply set a shell variable:

       parallel 'v={}; echo "$v"' ::: foo
       echo foo | rush -v v={} 'echo {v}'

     Also rush does not like special chars. So these do not work:

       echo does not work | rush -v v=\" 'echo {v}'
       echo "My brother's 12\" records" | rush -v v={} 'echo {v}'

     Whereas the corresponding GNU parallel versions work:

       parallel 'v=\"; echo "$v"' ::: works
       parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"

   • Exit on first error(s) (-e)

     This is called --halt now,fail=1 (or shorter: --halt 2) when used
     with GNU parallel.

   • Settable number of records sent to every command (-n, default 1)

     This is also called -n in GNU parallel.

   • Practical replacement strings

     {:} remove any extension
       With GNU parallel this can be emulated by:

         parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz

     {^suffix}, remove suffix
       With GNU parallel this can be emulated by:

         parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz

     {@regexp}, capture submatch using regular expression
       With GNU parallel this can be emulated by:

         parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
           echo '{@\d_(.*).gz}' ::: 1_foo.gz

     {%.}, {%:}, basename without extension
       With GNU parallel this can be emulated by:

         parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz

       And if you need it often, you define a --rpl in
       $HOME/.parallel/config:

         --rpl '{%.} s:.*/::;s/\..*//'
         --rpl '{%:} s:.*/::;s/\..*//'

       Then you can use them as:

         parallel echo {%.} {%:} ::: dir/foo.bar.gz

   • Preset variable (macro)

     E.g.

       echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'

     With GNU parallel this can be emulated by:

       echo foosuffix |
         parallel --plus 'p={%suffix}; echo ${p}_new_suffix'

     Opposite rush, GNU parallel works fine if the input contains double
     space, ' and ":

       echo "1'6\" foosuffix" |
         parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'

   • Commands of multiple lines

     While you can use multi-lined commands in GNU parallel, to improve
     readability GNU parallel discourages the use of multi-line commands.
     In most cases it can be written as a function:

       seq 1 3 |
         parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
           echo finish {}'

     Could be written as:

       doit() {
         sleep "$1"
         echo "$1"
         echo finish "$1"
       }
       export -f doit
       seq 1 3 | parallel --timeout 2 --joblog my.log doit

     The failed commands can be resumed with:

       seq 1 3 |
         parallel --resume-failed --joblog my.log 'sleep {}; echo {}; \
           echo finish {}'

   https://github.com/shenwei356/rush

DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
   ClusterSSH solves a different problem than GNU parallel.

   ClusterSSH opens a terminal window for each computer, and using a
   master window you can run the same command on all the computers. This
   is typically used for administrating several computers that are almost
   identical.

   GNU parallel runs the same (or different) commands with different
   arguments in parallel, possibly using remote computers to help
   computing. If more than one computer is listed in -S GNU parallel may
   only use one of these (e.g. if there are 8 jobs to be run and one
   computer has 8 cores).

   GNU parallel can be used as a poor-man's version of ClusterSSH:

     parallel --nonall -S server-a,server-b do_stuff foo bar

   https://github.com/duncs/clusterssh

DIFFERENCES BETWEEN coshell AND GNU Parallel
   coshell only accepts full commands on standard input. Any quoting needs
   to be done by the user.

   Commands are run in sh, so any bash/tcsh/zsh specific syntax will not
   work.

   Output can be buffered by using -d. Output is buffered in memory, so
   big output can cause swapping and therefore be terribly slow, or even
   cause the system to run out of memory.
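
   GNU parallel likewise takes full commands on standard input (feature
   M6), buffers output on disk instead of in memory, and the shell used
   for the commands can be chosen. A small sketch:

     # Full commands on stdin, output grouped and buffered on disk:
     (echo 'echo foo'; echo 'hostname') | parallel
     # Run the commands under bash instead of sh:
     (echo 'echo {fish,cow}'; echo 'hostname') |
       PARALLEL_SHELL=bash parallel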

   https://github.com/gdm85/coshell (Last checked: 2019-01)

DIFFERENCES BETWEEN spread AND GNU Parallel
   spread runs commands on all directories.

   It can be emulated with GNU parallel using this Bash function:

     spread() {
       _cmds() {
         perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
       }
       parallel $(_cmds "$@")'|| echo exit status $?' ::: */
     }

   This works except for the --exclude option.

   (Last checked: 2017-11)

DIFFERENCES BETWEEN pyargs AND GNU Parallel
   pyargs deals badly with input containing spaces. It buffers stdout, but
   not stderr. It buffers in RAM. {} does not work as replacement string.
   It does not support running functions.

   pyargs does not support composed commands if run with --lines, and
   fails on pyargs traceroute gnu.org fsf.org.

  Examples

     seq 5 | pyargs -P50 -L seq
     seq 5 | parallel -P50 --lb seq

     seq 5 | pyargs -P50 --mark -L seq
     seq 5 | parallel -P50 --lb \
       --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
     # Similar, but not precisely the same
     seq 5 | parallel -P50 --lb --tag seq

     seq 5 | pyargs -P50 --mark command
     # Somewhat longer with GNU Parallel due to the special
     # --mark formatting
     cmd="$(echo "command" | parallel --shellquote)"
     wrap_cmd() {
       echo "MARK $cmd $@================================" >&3
       echo "OUTPUT START[$cmd $@]:"
       eval $cmd "$@"
       echo "OUTPUT END[$cmd $@]"
     }
     (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
     # Similar, but not exactly the same
     seq 5 | parallel -t --tag command

     (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
     (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
       parallel -r -d' ' seq
     # Similar, but not exactly the same
     parallel seq ::: 1 2 3 4 5 6

   https://github.com/robertblackwell/pyargs (Last checked: 2019-01)

DIFFERENCES BETWEEN concurrently AND GNU Parallel
   concurrently runs jobs in parallel.

   The output is prepended with the job number, and may be incomplete:

     $ concurrently 'seq 100000' | (sleep 3;wc -l)
     7165

   When pretty-printing, it caches output in memory. Whether or not output
   is cached, output mixes (see test MIX below).

   There seems to be no way of making a template command and having
   concurrently fill that in with different arguments. The full commands
   must be given on the command line.

   There is also no way of controlling how many jobs should be run in
   parallel at a time - i.e. the "number of jobslots". Instead all jobs
   are simply started in parallel.
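
   So with concurrently every command is spelled out in full, whereas GNU
   parallel builds the commands from a template and limits the number of
   simultaneous jobs. A sketch (the concurrently invocation follows its
   README):

     # concurrently: full commands, all started at once
     concurrently "sleep 1 && echo a" "sleep 1 && echo b"
     # GNU parallel: template plus jobslots
     parallel -j2 'sleep 1 && echo {}' ::: a b c d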

   https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)

DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
   map does not run jobs in parallel by default. The README suggests
   using:

     ... | map t 'sleep $t && say done &'

   But this fails if more jobs are run in parallel than the number of
   available processes. Since there is no support for parallelization in
   map itself, the output also mixes:

     seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'

   The major difference is that GNU parallel is built for parallelization
   and map is not. So GNU parallel has lots of ways of dealing with the
   issues that parallelization raises:

   • Keep the number of processes manageable

   • Make sure output does not mix

   • Make Ctrl-C kill all running processes

  EXAMPLES FROM map's WEBSITE

   Here are the 5 examples converted to GNU parallel:

     1$ ls *.c | map f 'foo $f'
     1$ ls *.c | parallel foo

     2$ ls *.c | map f 'foo $f; bar $f'
     2$ ls *.c | parallel 'foo {}; bar {}'

     3$ cat urls | map u 'curl -O $u'
     3$ cat urls | parallel curl -O

     4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
     4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
     4$ parallel 'sleep {} && say done' ::: 1 1 1

     5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
     5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
     5$ parallel -j0 'sleep {} && say done' ::: 1 1 1

   https://github.com/soveran/map (Last checked: 2019-01)

DIFFERENCES BETWEEN loop AND GNU Parallel
   loop mixes stdout and stderr:

     loop 'ls /no-such-file' >/dev/null

   loop's replacement string $ITEM does not quote strings:

     echo 'two  spaces' | loop 'echo $ITEM'

   loop cannot run functions:

     myfunc() { echo joe; }
     export -f myfunc
     loop 'myfunc this fails'

  EXAMPLES FROM loop's WEBSITE

   Some of the examples from https://github.com/Miserlou/Loop/ can be
   emulated with GNU parallel:

     # A couple of functions will make the code easier to read
     $ loopy() {
         yes | parallel -uN0 -j1 "$@"
       }
     $ export -f loopy
     $ time_out() {
         parallel -uN0 -q --timeout "$@" ::: 1
       }
     $ match() {
         perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
       }
     $ export -f match

     $ loop 'ls' --every 10s
     $ loopy --delay 10s ls

     $ loop 'touch $COUNT.txt' --count-by 5
     $ loopy touch '{= $_=seq()*5 =}'.txt

     $ loop --until-contains 200 -- \
         ./get_response_code.sh --site mysite.biz
     $ loopy --halt now,success=1 \
         './get_response_code.sh --site mysite.biz | match 200'

     $ loop './poke_server' --for-duration 8h
     $ time_out 8h loopy ./poke_server

     $ loop './poke_server' --until-success
     $ loopy --halt now,success=1 ./poke_server

     $ cat files_to_create.txt | loop 'touch $ITEM'
     $ cat files_to_create.txt | parallel touch {}

     $ loop 'ls' --for-duration 10min --summary
     # --joblog is somewhat more verbose than --summary
     $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log

     $ loop 'echo hello'
     $ loopy echo hello

     $ loop 'echo $COUNT'
     # GNU parallel counts from 1
     $ loopy echo {#}
     # Counting from 0 can be forced
     $ loopy echo '{= $_=seq()-1 =}'

     $ loop 'echo $COUNT' --count-by 2
     $ loopy echo '{= $_=2*(seq()-1) =}'

     $ loop 'echo $COUNT' --count-by 2 --offset 10
     $ loopy echo '{= $_=10+2*(seq()-1) =}'

     $ loop 'echo $COUNT' --count-by 1.1
     # GNU parallel rounds 3.3000000000000003 to 3.3
     $ loopy echo '{= $_=1.1*(seq()-1) =}'

     $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
     $ loopy echo '{= $_=2*(seq()-1) =} {#}'

     $ loop 'echo $COUNT' --num 3 --summary
     # --joblog is somewhat more verbose than --summary
     $ seq 3 | parallel --joblog my.log echo; cat my.log

     $ loop 'ls -foobarbatz' --num 3 --summary
     # --joblog is somewhat more verbose than --summary
     $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log

     $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
     # Can be emulated by running 2 jobs
     $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
     $ echo 50 | parallel echo '{= $_=2*(seq()-1) =}'

     $ loop 'date' --every 5s
     $ loopy --delay 5s date

     $ loop 'date' --for-duration 8s --every 2s
     $ time_out 8s loopy --delay 2s date

     $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
     $ seconds=$((`date -d 2018-05-25T20:50:00 +%s` - `date +%s`))s
     $ time_out $seconds loopy --delay 5s date -u

     $ loop 'echo $RANDOM' --until-contains "666"
     $ loopy --halt now,success=1 'echo $RANDOM | match 666'

     $ loop 'if (( RANDOM % 2 )); then
               (echo "TRUE"; true);
             else
               (echo "FALSE"; false);
             fi' --until-success
     $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
                                     (echo "TRUE"; true);
                                   else
                                     (echo "FALSE"; false);
                                   fi'

     $ loop 'if (( RANDOM % 2 )); then
               (echo "TRUE"; true);
             else
               (echo "FALSE"; false);
             fi' --until-error
     $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
                                  (echo "TRUE"; true);
                                else
                                  (echo "FALSE"; false);
                                fi'

     $ loop 'date' --until-match "(\d{4})"
     $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'

     $ loop 'echo $ITEM' --for red,green,blue
     $ parallel echo ::: red green blue

     $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
     $ cat /tmp/my-list-of-files-to-create.txt | parallel touch

     $ ls | loop 'cp $ITEM $ITEM.bak'; ls
     $ ls | parallel cp {} {}.bak; ls

     $ loop 'echo $ITEM | tr a-z A-Z' -i
     $ parallel 'echo {} | tr a-z A-Z'
     # Or more efficiently:
     $ parallel --pipe tr a-z A-Z

     $ loop 'echo $ITEM' --for "`ls`"
     $ parallel echo {} ::: "`ls`"

     $ ls | loop './my_program $ITEM' --until-success;
     $ ls | parallel --halt now,success=1 ./my_program {}

     $ ls | loop './my_program $ITEM' --until-fail;
     $ ls | parallel --halt now,fail=1 ./my_program {}

     $ ./deploy.sh;
       loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
         --every 5s --until-contains 200;
       ./announce_to_slack.sh
     $ ./deploy.sh;
       loopy --delay 5s --halt now,success=1 \
         'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
       ./announce_to_slack.sh

     $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
     $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing

     $ ./create_big_file -o my_big_file.bin;
       loop 'ls' --until-contains 'my_big_file.bin';
       ./upload_big_file my_big_file.bin
     # inotifywait is a better tool to detect file system changes.
     # It can even make sure the file is complete
     # so you are not uploading an incomplete file
     $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
         grep my_big_file.bin

     $ ls | loop 'cp $ITEM $ITEM.bak'
     $ ls | parallel cp {} {}.bak

     $ loop './do_thing.sh' --every 15s --until-success --num 5
     $ parallel --retries 5 --delay 15s ::: ./do_thing.sh

   https://github.com/Miserlou/Loop/ (Last checked: 2018-10)

DIFFERENCES BETWEEN lorikeet AND GNU Parallel
   lorikeet can run jobs in parallel. It does this based on a dependency
   graph described in a file, so it is similar to make.

   https://github.com/cetra3/lorikeet (Last checked: 2018-10)

DIFFERENCES BETWEEN spp AND GNU Parallel
   spp can run jobs in parallel. spp does not use a command template to
   generate the jobs, but requires the jobs to be in a file. Output from
   the jobs mixes.
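
   Running a file of complete commands is a single step with GNU parallel,
   and the output of each job is grouped (jobs.txt is a hypothetical file
   with one command per line):

     # 4 jobs at a time; output from the jobs does not mix:
     parallel -j4 < jobs.txt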
1674
1675 https://github.com/john01dav/spp (Last checked: 2019-01)
1676
1677 DIFFERENCES BETWEEN paral AND GNU Parallel
1678 paral prints a lot of status information and stores the output from the
1679 commands run into files. This means it cannot be used the middle of a
1680 pipe like this
1681
1682 paral "echo this" "echo does not" "echo work" | wc
1683
1684 Instead it puts the output into files named like out_#_command.out.log.
1685 To get a very similar behaviour with GNU parallel use --results
1686 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta
1687
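       As a sketch of what that replacement string does (example mine, not
       from the README): the embedded perl strips everything except
       whitespace, lowercase letters, digits, and underscore from the
       command, then turns whitespace runs into underscores:

         parallel --results 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' \
           --eta ::: "echo this" "echo does not"
         # stdout of job 1 should land in a file like: out_1_echo_this.log
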
1688 paral only takes arguments on the command line and each argument should
1689 be a full command. Thus it does not use command templates.
1690
1691 This limits how many jobs it can run in total, because they all need to
1692 fit on a single command line.
1693
1694 paral has no support for running jobs remotely.
1695
1696 EXAMPLES FROM README.markdown
1697
1698 The examples from README.markdown and the corresponding command run
1699 with GNU parallel (--results
1700 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta is omitted from the
1701 GNU parallel command):
1702
1703 1$ paral "command 1" "command 2 --flag" "command arg1 arg2"
1704 1$ parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
1705
1706 2$ paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
1707 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1708 2$ parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
1709 "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
1710 # Or shorter:
1711 parallel "sleep {} && echo c{}" ::: {1..5}
1712
1713 3$ paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1714 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1715 3$ parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
1716 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1717 # Or shorter:
1718 parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
1719
1720 4$ paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1721 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1722 4$ parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
1723
1724 5$ paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1725 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1726 5$ parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
1727
1728 6$ paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1729 "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1730 6$ parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
1731
1732 7$ paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1733 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1734 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1735 echo g && sleep 0.5 && echo h"
1736 7$ parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1737 echo c && sleep 0.5 && echo d && sleep 0.5 && \
1738 echo e && sleep 0.5 && echo f && sleep 0.5 && \
1739 echo g && sleep 0.5 && echo h"
1740
1741 https://github.com/amattn/paral (Last checked: 2019-01)
1742
1743 DIFFERENCES BETWEEN concurr AND GNU Parallel
1744 concurr is built to run jobs in parallel using a client/server model.
1745
1746 EXAMPLES FROM README.md
1747
1748 The examples from README.md:
1749
1750 1$ concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
1751 1$ parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
1752
1753 2$ concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
1754 2$ parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
1755
1756 3$ concurr 'echo {}' < input_file
1757 3$ parallel 'echo {}' < input_file
1758
1759 4$ cat file | concurr 'echo {}'
1760 4$ cat file | parallel 'echo {}'
1761
1762 concurr deals badly with empty input files and with output larger
1763 than 64 KB.
1764
1765 https://github.com/mmstick/concurr (Last checked: 2019-01)
1766
1767 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
1768 lesser-parallel is the inspiration for parallel --embed. Both lesser-
1769 parallel and parallel --embed define bash functions that can be
1770 included as part of a bash script to run jobs in parallel.
1771
1772 lesser-parallel implements a few of the replacement strings, but hardly
1773 any options, whereas parallel --embed gives you the full GNU parallel
1774 experience.
1775
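       A quick sketch of the --embed workflow (the file name is an
       example):

         parallel --embed > myscript.sh
         # Edit the end of myscript.sh and add the commands to run;
         # the script then works without GNU parallel being installed.
         bash myscript.sh
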
1776 https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
1777
1778 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
1779 npm-parallel can run npm tasks in parallel.
1780
1781 There are no examples and very little documentation, so it is hard to
1782 compare to GNU parallel.
1783
1784 https://github.com/spion/npm-parallel (Last checked: 2019-01)
1785
1786 DIFFERENCES BETWEEN machma AND GNU Parallel
1787 machma runs tasks in parallel. It gives time-stamped output. It
1788 buffers in RAM.
1789
1790 EXAMPLES FROM README.md
1791
1792 The examples from README.md:
1793
1794 1$ # Put shorthand for timestamp in config for the examples
1795 echo '--rpl '\
1796 \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
1797 > ~/.parallel/machma
1798 echo '--line-buffer --tagstring "{#} {time} {}"' \
1799 >> ~/.parallel/machma
1800
1801 2$ find . -iname '*.jpg' |
1802 machma -- mogrify -resize 1200x1200 -filter Lanczos {}
1803 2$ find . -iname '*.jpg' |
1804 parallel --bar -Jmachma mogrify -resize 1200x1200 \
1805 -filter Lanczos {}
1806
1807 3$ cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
1808 3$ cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}
1809
1810 4$ cat /tmp/ips |
1811 machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
1812 4$ cat /tmp/ips |
1813 parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'
1814
1815 5$ find . -iname '*.jpg' |
1816 machma --timeout 5s -- mogrify -resize 1200x1200 \
1817 -filter Lanczos {}
1818 5$ find . -iname '*.jpg' |
1819 parallel --timeout 5s --bar mogrify -resize 1200x1200 \
1820 -filter Lanczos {}
1821
1822 6$ find . -iname '*.jpg' -print0 |
1823 machma --null -- mogrify -resize 1200x1200 -filter Lanczos {}
1824 6$ find . -iname '*.jpg' -print0 |
1825 parallel --null --bar mogrify -resize 1200x1200 \
1826 -filter Lanczos {}
1827
1828 https://github.com/fd0/machma (Last checked: 2019-06)
1829
1830 DIFFERENCES BETWEEN interlace AND GNU Parallel
1831 Summary table (see legend above): - I2 I3 I4 - - - M1 - M3 - - M6 - O2
1832 O3 - - - - x x E1 E2 - - - - - - - - - - - - - - - -
1833
1834 interlace is built for network analysis to run network tools in
1835 parallel.
1836
1837 interlace does not buffer output, so output from different jobs mixes.
1838
1839 The overhead for each target is O(n*n), so with 1000 targets it
1840 becomes very slow, with an overhead on the order of 500 ms/target.
1841
1842 EXAMPLES FROM interlace's WEBSITE
1843
1844 Using prips most of the examples from
1845 https://github.com/codingo/Interlace can be run with GNU parallel:
1846
1847 Blocker
1848
1849 commands.txt:
1850 mkdir -p _output_/_target_/scans/
1851 _blocker_
1852 nmap _target_ -oA _output_/_target_/scans/_target_-nmap
1853 interlace -tL ./targets.txt -cL commands.txt -o $output
1854
1855 parallel -a targets.txt \
1856 mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap
1857
1858 Blocks
1859
1860 commands.txt:
1861 _block:nmap_
1862 mkdir -p _target_/output/scans/
1863 nmap _target_ -oN _target_/output/scans/_target_-nmap
1864 _block:nmap_
1865 nikto --host _target_
1866 interlace -tL ./targets.txt -cL commands.txt
1867
1868 _nmap() {
1869 mkdir -p $1/output/scans/
1870 nmap $1 -oN $1/output/scans/$1-nmap
1871 }
1872 export -f _nmap
1873 parallel ::: _nmap "nikto --host" :::: targets.txt
1874
1875 Run Nikto Over Multiple Sites
1876
1877 interlace -tL ./targets.txt -threads 5 \
1878 -c "nikto --host _target_ > ./_target_-nikto.txt" -v
1879
1880 parallel -a targets.txt -P5 nikto --host {} \> ./{}-nikto.txt
1881
1882 Run Nikto Over Multiple Sites and Ports
1883
1884 interlace -tL ./targets.txt -threads 5 -c \
1885 "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
1886 -p 80,443 -v
1887
1888 parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
1889 :::: targets.txt ::: 80 443
1890
1891 Run a List of Commands against Target Hosts
1892
1893 commands.txt:
1894 nikto --host _target_:_port_ > _output_/_target_-nikto.txt
1895 sslscan _target_:_port_ > _output_/_target_-sslscan.txt
1896 testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
1897 interlace -t example.com -o ~/Engagements/example/ \
1898 -cL ./commands.txt -p 80,443
1899
1900 parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
1901 ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443
1902
1903 CIDR notation with an application that doesn't support it
1904
1905 interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
1906 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1907
1908 prips 192.168.12.0/24 |
1909 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1910
1911 Glob notation with an application that doesn't support it
1912
1913 interlace -t 192.168.12.* -c "vhostscan _target_ \
1914 -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1915
1916 # Glob is not supported in prips
1917 prips 192.168.12.0/24 |
1918 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1919
1920 Dash (-) notation with an application that doesn't support it
1921
1922 interlace -t 192.168.12.1-15 -c \
1923 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
1924 -o ~/scans/ -threads 50
1925
1926 # Dash notation is not supported in prips
1927 prips 192.168.12.1 192.168.12.15 |
1928 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1929
1930 Threading Support for an application that doesn't support it
1931
1932 interlace -tL ./target-list.txt -c \
1933 "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
1934 -o ~/scans/ -threads 50
1935
1936 cat ./target-list.txt |
1937 parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
1938
1939 Alternatively:
1940
1941 ./vhosts-commands.txt:
1942 vhostscan -t $target -oN _output_/_target_-vhosts.txt
1943 interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
1944 -threads 50 -o ~/scans
1945
1946 ./vhosts-commands.txt:
1947 vhostscan -t "$1" -oN "$2"
1948 parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
1949 :::: ./target-list.txt
1950
1951 Exclusions
1952
1953 interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
1954 "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
1955 -o ~/scans/ -threads 50
1956
1957 prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
1958 parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1959
1960 Run Nikto Using Multiple Proxies
1961
1962 interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
1963 "nikto --host _target_:_port_ -useproxy _proxy_ > \
1964 ./_target_-_port_-nikto.txt" -p 80,443 -v
1965
1966 parallel -j5 \
1967 "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
1968 :::: ./targets.txt ::: 80 443 :::: ./proxies.txt
1969
1970 https://github.com/codingo/Interlace (Last checked: 2019-09)
1971
1972 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel
1973 I have been unable to get the code to run at all. It seems unfinished.
1974
1975 https://github.com/otonvm/Parallel (Last checked: 2019-02)
1976
1977 DIFFERENCES BETWEEN k-bx par AND GNU Parallel
1978 par requires Haskell to work. This limits the number of platforms it
1979 can run on.
1980
1981 par does line buffering in memory. The memory usage is 3x the longest
1982 line (compared to 1x for parallel --lb). Commands must be given as
1983 arguments. There is no template.
1984
1985 These are the examples from https://github.com/k-bx/par with the
1986 corresponding GNU parallel command.
1987
1988 par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
1989 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
1990 parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
1991 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
1992
1993 par "echo foo; sleep 1; foofoo" \
1994 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
1995 parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
1996 "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
1997
1998 par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
1999 parallel --lb --colsep , --tagstring {1} {2} \
2000 ::: "[fooechoer],echo foo" "[bar],echo bar"
2001
2002 par --succeed "foo" "bar" && echo 'wow'
2003 parallel "foo" "bar"; true && echo 'wow'
2004
2005 https://github.com/k-bx/par (Last checked: 2019-02)
2006
2007 DIFFERENCES BETWEEN parallelshell AND GNU Parallel
2008 parallelshell does not allow for composed commands:
2009
2010 # This does not work
2011 parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'
2012
2013 Instead you have to wrap that in a shell:
2014
2015 parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2016
2017 It buffers output in RAM. All commands must be given on the command
2018 line and all commands are started in parallel at the same time. This
2019 will cause the system to freeze if there are so many jobs that there is
2020 not enough memory to run them all at the same time.
2021
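       For comparison, GNU parallel runs the composed commands directly,
       without a wrapping shell:

         parallel ::: 'echo foo;echo bar' 'echo baz;echo quuz'
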
2022 https://github.com/keithamus/parallelshell (Last checked: 2019-02)
2023
2024 https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
2025
2026 DIFFERENCES BETWEEN shell-executor AND GNU Parallel
2027 shell-executor does not allow for composed commands:
2028
2029 # This does not work
2030 sx 'echo foo;echo bar' 'echo baz;echo quuz'
2031
2032 Instead you have to wrap that in a shell:
2033
2034 sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2035
2036 It buffers output in RAM. All commands must be given on the command
2037 line and all commands are started in parallel at the same time. This
2038 will cause the system to freeze if there are so many jobs that there is
2039 not enough memory to run them all at the same time.
2040
2041 https://github.com/royriojas/shell-executor (Last checked: 2019-02)
2042
2043 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel
2044 par buffers in memory to avoid mixing of jobs. It takes 1s per 1
2045 million output lines.
2046
2047 par needs to have all commands before starting the first job. The jobs
2048 are read from stdin (standard input) so any quoting will have to be
2049 done by the user.
2050
2051 Stdout (standard output) is prepended with o:. Stderr (standard error)
2052 is sent to stdout (standard output) and prepended with e:.
2053
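       The o:/e: prefixes can be emulated roughly with GNU parallel and
       sed (a sketch; it assumes the jobs run under bash, and it keeps
       the two streams separate where par merges them):

         mycmd() { echo "out-$1"; echo "err-$1" >&2; }
         export -f mycmd
         parallel 'mycmd {} 2> >(sed s/^/e:/ >&2) | sed s/^/o:/' ::: 1 2
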
2054 For short jobs with little output par is 20% faster than GNU parallel
2055 and 60% slower than xargs.
2056
2057 http://savannah.nongnu.org/projects/par (Last checked: 2019-02)
2058
2059 DIFFERENCES BETWEEN fd AND GNU Parallel
2060 fd does not support composed commands, so commands must be wrapped in
2061 sh -c.
2062
2063 It buffers output in RAM.
2064
2065 It only takes file names from the filesystem as input (similar to
2066 find).
2067
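       For comparison (a sketch): feeding fd's matches to GNU parallel
       runs composed commands without a wrapper shell:

         fd -e txt -0 | parallel -0 'gzip --best {} && echo zipped {}'
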
2068 https://github.com/sharkdp/fd (Last checked: 2019-02)
2069
2070 DIFFERENCES BETWEEN lateral AND GNU Parallel
2071 lateral is very similar to sem: It takes a single command and runs it
2072 in the background. The design means that output from parallel running
2073 jobs may mix. If it dies unexpectedly it leaves a socket in
2074 ~/.lateral/socket.PID.
2075
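       For reference, the sem model in two lines (a sketch):

         sem -j4 'sleep 1; echo done'  # queues the job and returns at once
         sem --wait                    # waits for all jobs started with sem
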
2076 lateral deals badly with too long command lines. This makes the lateral
2077 server crash:
2078
2079 lateral run echo `seq 100000| head -c 1000k`
2080
2081 Any options will be read by lateral so this does not work (lateral
2082 interprets the -l):
2083
2084 lateral run ls -l
2085
2086 Composed commands do not work:
2087
2088 lateral run pwd ';' ls
2089
2090 Functions do not work:
2091
2092 myfunc() { echo a; }
2093 export -f myfunc
2094 lateral run myfunc
2095
2096 Running emacs in the terminal causes the parent shell to die:
2097
2098 echo '#!/bin/bash' > mycmd
2099 echo emacs -nw >> mycmd
2100 chmod +x mycmd
2101 lateral start
2102 lateral run ./mycmd
2103
2104 Here are the examples from https://github.com/akramer/lateral with the
2105 corresponding GNU sem and GNU parallel commands:
2106
2107 1$ lateral start
2108 for i in $(cat /tmp/names); do
2109 lateral run -- some_command $i
2110 done
2111 lateral wait
2112
2113 1$ for i in $(cat /tmp/names); do
2114 sem some_command $i
2115 done
2116 sem --wait
2117
2118 1$ parallel some_command :::: /tmp/names
2119
2120 2$ lateral start
2121 for i in $(seq 1 100); do
2122 lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
2123 done
2124 lateral wait
2125
2126 2$ for i in $(seq 1 100); do
2127 sem my_slow_command < workfile$i > /tmp/logfile$i
2128 done
2129 sem --wait
2130
2131 2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
2132 ::: {1..100}
2133
2134 3$ lateral start -p 0 # yup, it will just queue tasks
2135 for i in $(seq 1 100); do
2136 lateral run -- command_still_outputs_but_wont_spam inputfile$i
2137 done
2138 # command output spam can commence
2139 lateral config -p 10; lateral wait
2140
2141 3$ for i in $(seq 1 100); do
2142 echo "command inputfile$i" >> joblist
2143 done
2144 parallel -j 10 :::: joblist
2145
2146 3$ echo 1 > /tmp/njobs
2147 parallel -j /tmp/njobs command inputfile{} \
2148 ::: {1..100} &
2149 echo 10 >/tmp/njobs
2150 wait
2151
2152 https://github.com/akramer/lateral (Last checked: 2019-03)
2153
2154 DIFFERENCES BETWEEN with-this AND GNU Parallel
2155 The examples from https://github.com/amritb/with-this.git and the
2156 corresponding GNU parallel command:
2157
2158 with -v "$(cat myurls.txt)" "curl -L this"
2159 parallel curl -L :::: myurls.txt
2160
2161 with -v "$(cat myregions.txt)" \
2162 "aws --region=this ec2 describe-instance-status"
2163 parallel aws --region={} ec2 describe-instance-status \
2164 :::: myregions.txt
2165
2166 with -v "$(ls)" "kubectl --kubeconfig=this get pods"
2167 ls | parallel kubectl --kubeconfig={} get pods
2168
2169 with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
2170 ls | grep config | parallel kubectl --kubeconfig={} get pods
2171
2172 with -v "$(echo {1..10})" "echo 123"
2173 parallel -N0 echo 123 ::: {1..10}
2174
2175 Stderr is merged with stdout. with-this buffers in RAM. It uses 3x the
2176 output size, so you cannot have output larger than 1/3rd the amount of
2177 RAM. The input values cannot contain spaces. Composed commands do not
2178 work.
2179
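       For comparison, GNU parallel passes values containing spaces
       through unchanged:

         parallel echo ::: 'two words' 'more words'  # runs 2 jobs
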
2180 with-this gives some additional information, so the output has to be
2181 cleaned before piping it to the next command.
2182
2183 https://github.com/amritb/with-this.git (Last checked: 2019-03)
2184
2185 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel
2186 Summary table (see legend above): - - - I4 - - I7 - - M3 - - M6 - O2 O3
2187 - O5 O6 - x x E1 - - - - - E7 - x x x x x x x x - -
2188
2189 EXAMPLES FROM Tollef's parallel MANUAL
2190
2191 Tollef parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3
2192
2193 GNU parallel "echo hi; sleep 2; echo bye" ::: 1 2 3
2194
2195 Tollef parallel -j 3 ufraw -o processed -- *.NEF
2196
2197 GNU parallel -j 3 ufraw -o processed ::: *.NEF
2198
2199 Tollef parallel -j 3 -- ls df "echo hi"
2200
2201 GNU parallel -j 3 ::: ls df "echo hi"
2202
2203 (Last checked: 2019-08)
2204
2205 DIFFERENCES BETWEEN rargs AND GNU Parallel
2206 Summary table (see legend above): I1 - - - - - I7 - - M3 M4 - - - O2 O3
2207 - O5 O6 - O8 - E1 - - E4 - - - - - - - - - - - - - -
2208
2209 rargs has elegant ways of doing named regexp capture and field ranges.
2210
2211 With GNU parallel you can use --rpl to get a similar functionality as
2212 regexp capture gives, and use join and @arg to get the field ranges.
2213 But the syntax is longer. This:
2214
2215 --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]'
2216
2217 would make it possible to use:
2218
2219 {1r3..6}
2220
2221 for field 3..6.
2222
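       A sketch of the idea (the input line is an example):

         printf 'a:b:c:d:e:f:g\n' |
           parallel --colsep : \
             --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]' \
             echo {1r3..6}
         # expected output: c:d:e:f
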
2223 For full support of {n..m:s} including negative numbers use a dynamic
2224 replacement string like this:
2225
2226 PARALLEL=--rpl\ \''{r((-?\d+)?)\.\.((-?\d+)?)((:([^}]*))?)}
2227 $a = defined $$2 ? $$2 < 0 ? 1+$#arg+$$2 : $$2 : 1;
2228 $b = defined $$4 ? $$4 < 0 ? 1+$#arg+$$4 : $$4 : $#arg+1;
2229 $s = defined $$6 ? $$7 : " ";
2230 $_ = join $s,@arg[$a..$b]'\'
2231 export PARALLEL
2232
2233 You can then do:
2234
2235 head /etc/passwd | parallel --colsep : echo ..={1r..} ..3={1r..3} \
2236 4..={1r4..} 2..4={1r2..4} 3..3={1r3..3} ..3:-={1r..3:-} \
2237 ..3:/={1r..3:/} -1={-1} -5={-5} -6={-6} -3..={1r-3..}
2238
2239 EXAMPLES FROM rargs MANUAL
2240
2241 ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}
2242 ls *.bak | parallel mv {} {.}
2243
2244 cat download-list.csv |
       rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}
2245 cat download-list.csv | parallel --csv wget {1} -O {2}
2246 # or use regexps:
2247 cat download-list.csv |
2248 parallel --rpl '{url} s/,.*//' --rpl '{filename} s/.*?,//' wget {url} -O {filename}
2249
2250 cat /etc/passwd |
       rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
2251 cat /etc/passwd |
2252 parallel -q --colsep : echo -e \
       'id: "{1}"\t name: "{5}"\t rest: "{=6 $_=join":",@arg[6..$#arg]=}"'
2253
2254 https://github.com/lotabout/rargs (Last checked: 2020-01)
2255
2256 DIFFERENCES BETWEEN threader AND GNU Parallel
2257 Summary table (see legend above): I1 - - - - - - M1 - M3 - - M6 O1 - O3
2258 - O5 - - N/A N/A E1 - - E4 - - - - - - - - - - - - - -
2259
2260 Newline separates arguments, but a newline at the end of the file is
2261 treated as an empty argument. So this runs 2 jobs:
2262
2263 echo two_jobs | threader -run 'echo "$THREADID"'
2264
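       GNU parallel ignores the final newline, so the equivalent runs a
       single job:

         echo two_jobs | parallel echo   # one job: "two_jobs"
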
2265 threader ignores stderr, so any output to stderr is lost. threader
2266 buffers in RAM, so output bigger than the machine's virtual memory will
2267 cause the machine to crash.
2268
2269 https://github.com/voodooEntity/threader (Last checked: 2020-04)
2270
2271 DIFFERENCES BETWEEN runp AND GNU Parallel
2272 Summary table (see legend above): I1 I2 - - - - - M1 - (M3) - - M6 O1
2273 O2 O3 - O5 O6 - N/A N/A - E1 - - - - - - - - - - - - - - - - -
2274
2275 (M3): You can add a prefix and a postfix to the input, so the
2276 argument can only be inserted into the command line once.
2277
2278 runp runs 10 jobs in parallel by default. runp blocks if the output
2279 of a command is > 64 KB. Quoting of input is needed. It adds output
2280 to stderr (this can be prevented with -q).
2281
2282 Examples as GNU Parallel
2283
2284 base='https://images-api.nasa.gov/search'
2285 query='jupiter'
2286 desc='planet'
2287 type='image'
2288 url="$base?q=$query&description=$desc&media_type=$type"
2289
2290 # Download the images in parallel using runp
2291 curl -s $url | jq -r .collection.items[].href | \
2292 runp -p 'curl -s' | jq -r .[] | grep large | \
2293 runp -p 'curl -s -L -O'
2294
2295 time curl -s $url | jq -r .collection.items[].href | \
2296 runp -g 1 -q -p 'curl -s' | jq -r .[] | grep large | \
2297 runp -g 1 -q -p 'curl -s -L -O'
2298
2299 # Download the images in parallel
2300 curl -s $url | jq -r .collection.items[].href | \
2301 parallel curl -s | jq -r .[] | grep large | \
2302 parallel curl -s -L -O
2303
2304 time curl -s $url | jq -r .collection.items[].href | \
2305 parallel -j 1 curl -s | jq -r .[] | grep large | \
2306 parallel -j 1 curl -s -L -O
2307
2308 Run some test commands (read from file)
2309
2310 # Create a file containing commands to run in parallel.
2311 cat << EOF > /tmp/test-commands.txt
2312 sleep 5
2313 sleep 3
2314 blah # this will fail
2315 ls $PWD # PWD shell variable is used here
2316 EOF
2317
2318 # Run commands from the file.
2319 runp /tmp/test-commands.txt > /dev/null
2320
2321 parallel -a /tmp/test-commands.txt > /dev/null
2322
2323 Ping several hosts and see packet loss (read from stdin)
2324
2325 # First copy this line and press Enter
2326 runp -p 'ping -c 5 -W 2' -s '| grep loss'
2327 localhost
2328 1.1.1.1
2329 8.8.8.8
2330 # Press Enter and Ctrl-D when done entering the hosts
2331
2332 # First copy this line and press Enter
2333 parallel ping -c 5 -W 2 {} '| grep loss'
2334 localhost
2335 1.1.1.1
2336 8.8.8.8
2337 # Press Enter and Ctrl-D when done entering the hosts
2338
2339 Get directories' sizes (read from stdin)
2340
2341 echo -e "$HOME\n/etc\n/tmp" | runp -q -p 'sudo du -sh'
2342
2343 echo -e "$HOME\n/etc\n/tmp" | parallel sudo du -sh
2344 # or:
2345 parallel sudo du -sh ::: "$HOME" /etc /tmp
2346
2347 Compress files
2348
2349 find . -iname '*.txt' | runp -p 'gzip --best'
2350
2351 find . -iname '*.txt' | parallel gzip --best
2352
2353 Measure HTTP request + response time
2354
2355 export CURL="curl -w 'time_total: %{time_total}\n'"
2356 CURL="$CURL -o /dev/null -s https://golang.org/"
2357 perl -wE 'for (1..10) { say $ENV{CURL} }' |
2358 runp -q # Make 10 requests
2359
2360 perl -wE 'for (1..10) { say $ENV{CURL} }' | parallel
2361 # or:
2362 parallel -N0 "$CURL" ::: {1..10}
2363
2364 Find open TCP ports
2365
2366 cat << EOF > /tmp/host-port.txt
2367 localhost 22
2368 localhost 80
2369 localhost 81
2370 127.0.0.1 443
2371 127.0.0.1 444
2372 scanme.nmap.org 22
2373 scanme.nmap.org 23
2374 scanme.nmap.org 443
2375 EOF
2376
2377 cat /tmp/host-port.txt | \
2378 runp -q -p 'netcat -v -w2 -z' 2>&1 | egrep '(succeeded!|open)$'
2379
2380 # --colsep is needed to split the line
2381 cat /tmp/host-port.txt | \
2382 parallel --colsep ' ' netcat -v -w2 -z 2>&1 | egrep '(succeeded!|open)$'
2383 # or use uq for unquoted:
2384 cat /tmp/host-port.txt | \
2385 parallel netcat -v -w2 -z {=uq=} 2>&1 | egrep '(succeeded!|open)$'
2386
2387 https://github.com/jreisinger/runp (Last checked: 2020-04)
2388
2389 DIFFERENCES BETWEEN papply AND GNU Parallel
2390 Summary table (see legend above): - - - I4 - - - M1 - M3 - - M6 - - O3
2391 - O5 - - N/A N/A O10 E1 - - E4 - - - - - - - - - - - - - -
2392
2393 papply does not print the output if the command fails:
2394
2395 $ papply 'echo %F; false' foo
2396 "echo foo; false" did not succeed
2397
2398 papply's replacement strings (%F %d %f %n %e %z) can be simulated in
2399 GNU parallel by putting this in ~/.parallel/config:
2400
2401 --rpl '%F'
2402 --rpl '%d $_=Q(::dirname($_));'
2403 --rpl '%f s:.*/::;'
2404 --rpl '%n s:.*/::;s:\.[^/.]+$::;'
2405 --rpl '%e s:.*\.:.:'
2406 --rpl '%z $_=""'
2407
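       With those definitions in ~/.parallel/config, a sketch of how they
       expand (the path is an example):

         parallel echo %d %f %n %e ::: dir/file.jpg
         # expected output: dir file.jpg file .jpg
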
2408 papply buffers in RAM and uses twice the size of the output: output
2409 of 5 GB takes 10 GB of RAM.
2410
2411 The buffering is very CPU intensive: Buffering a line of 5 GB takes 40
2412 seconds (compared to 10 seconds with GNU parallel).
2413
2414 Examples as GNU Parallel
2415
2416 1$ papply gzip *.txt
2417
2418 1$ parallel gzip ::: *.txt
2419
2420 2$ papply "convert %F %n.jpg" *.png
2421
2422 2$ parallel convert {} {.}.jpg ::: *.png
2423
2424 https://pypi.org/project/papply/ (Last checked: 2020-04)
2425
2426 DIFFERENCES BETWEEN async AND GNU Parallel
2427 Summary table (see legend above): - - - I4 - - I7 - - - - - M6 - O2 O3
2428 - O5 O6 - N/A N/A O10 E1 - - E4 - E6 - - - - - - - - - - S1 S2
2429
2430 async is very similar to GNU parallel's --semaphore mode (aka sem).
2431 async requires the user to start a server process.
2432
2433 The input is quoted like -q so you need bash -c "...;..." to run
2434 composed commands.
2435
2436 Examples as GNU Parallel
2437
2438 1$ S="/tmp/example_socket"
2439
2440 1$ ID=myid
2441
2442 2$ async -s="$S" server --start
2443
2444 2$ # GNU Parallel does not need a server to run
2445
2446 3$ for i in {1..20}; do
2447 # prints command output to stdout
2448 async -s="$S" cmd -- bash -c "sleep 1 && echo test $i"
2449 done
2450
2451 3$ for i in {1..20}; do
2452 # prints command output to stdout
2453 sem --id "$ID" -j100% "sleep 1 && echo test $i"
2454 # GNU Parallel will only print a job's output when it is done
2455 # If you need output from different jobs to mix
2456 # use -u or --line-buffer
2457 sem --id "$ID" -j100% --line-buffer "sleep 1 && echo test $i"
2458 done
2459
2460 4$ # wait until all commands are finished
2461 async -s="$S" wait
2462
2463 4$ sem --id "$ID" --wait
2464
2465 5$ # configure the server to run four commands in parallel
2466 async -s="$S" server -j4
2467
2468 5$ export PARALLEL=-j4
2469
2470 6$ mkdir "/tmp/ex_dir"
2471 for i in {21..40}; do
2472 # redirects command output to /tmp/ex_dir/file*
2473 async -s="$S" cmd -o "/tmp/ex_dir/file$i" -- \
2474 bash -c "sleep 1 && echo test $i"
2475 done
2476
2477 6$ mkdir "/tmp/ex_dir"
2478 for i in {21..40}; do
2479 # redirects command output to /tmp/ex_dir/file*
2480 sem --id "$ID" --result '/tmp/ex_dir/file-{=$_=""=}'"$i" \
2481 "sleep 1 && echo test $i"
2482 done
2483
2484 7$ sem --id "$ID" --wait
2485
2486 7$ async -s="$S" wait
2487
2488 8$ # stops server
2489 async -s="$S" server --stop
2490
2491 8$ # GNU Parallel does not need to stop a server
2492
2493 https://github.com/ctbur/async/ (Last checked: 2020-11)
2494
2495 Todo
  test_many_var() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    for a in `seq 11000`; do eval "export a$a=1"; done
    gen500k | stdout parallel --timeout 5 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_many_var_func() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    for a in `seq 5100`; do eval "export a$a=1"; done
    for a in `seq 5100`; do eval "a$a() { 1; }"; done
    for a in `seq 5100`; do eval export -f a$a; done
    gen500k | stdout parallel --timeout 21 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_many_var_func() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    for a in `seq 8000`; do eval "a$a() { 1; }"; done
    for a in `seq 8000`; do eval export -f a$a; done
    gen500k | stdout parallel --timeout 6 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_big_func() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    big=`seq 1000`
    for a in `seq 50`; do eval "a$a() { '$big'; }"; done
    for a in `seq 50`; do eval export -f a$a; done
    gen500k | stdout parallel --timeout 4 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_many_var_big_func() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    big=`seq 1000`
    for a in `seq 5100`; do eval "export a$a=1"; done
    for a in `seq 20`; do eval "a$a() { '$big'; }"; done
    for a in `seq 20`; do eval export -f a$a; done
    gen500k | stdout parallel --timeout 6 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_big_func_name() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    big=`perl -e print\"x\"x10000`
    for a in `seq 20`; do eval "export a$big$a=1"; done
    gen500k | stdout parallel --timeout 8 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  test_big_var_func_name() {
    gen500k() {
      seq -f %f 1000000000000000 1000000000050000 | head -c 131000
    }
    big=`perl -e print\"x\"x10000`
    for a in `seq 2`; do eval "export a$big$a=1"; done
    for a in `seq 2`; do eval "a$big$a() { '$big'; }"; done
    for a in `seq 2`; do eval export -f a$big$a; done
    gen500k | stdout parallel --timeout 1000 -Xj1 'echo {} {} {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  }

  tange@macosx:~$ for a in `seq 100`; do
      eval export a$a=fffffffffffffffffffffffff; done
  tange@macosx:~$ seq 50000 | stdout parallel -Xj1 'echo {} {} | wc' |
      perl -pe 's/\d{3,5} //g'
  tange@macosx:~$ for a in `seq 100`; do eval export -f a$a; done

  seq 100000 | stdout parallel -Xj1 'echo {} {} | wc'
  export a=`seq 10000`
  seq 100000 | stdout parallel -Xj1 'echo {} {} | wc'
2550
2551 my $already_spread;
2552 my $env_size;
2553
2554 if($^O eq "darwin") {
2555 $env_size ||= 500+length(join'',%ENV);
2556 $max_len -= $env_size;
2557 }
2558
2559 PASH: Light-touch Data-Parallel Shell Processing
2560 https://arxiv.org/pdf/2007.09436.pdf
2561
2562 https://github.com/UnixJunkie/pardi
2563
2564 https://github.com/UnixJunkie/PAR (Same as
2565 http://savannah.nongnu.org/projects/par above?)
2566
2567 https://gitlab.com/netikras/bthread
2568
2569 https://github.com/JeiKeiLim/simple_distribute_job
2570
2571 https://github.com/reggi/pkgrun
2572
2573 https://github.com/benoror/better-npm-run - not obvious how to use
2574
2575 https://github.com/bahmutov/with-package
2576
2577 https://github.com/xuchenCN/go-pssh
2578
2579 https://github.com/flesler/parallel
2580
2581 https://github.com/Julian/Verge
2582
2583 https://github.com/ExpectationMax/simple_gpu_scheduler
2584 simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
2585 parallel -j3 --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}
2586 {=uq;=}' < gpu_commands.txt
2587
2588 simple_hypersearch \
       "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
       -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
       simple_gpu_scheduler --gpus 0,1,2
2589 parallel --header : --shuf -j3 -v \
       CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' \
       python3 train_dnn.py --lr {lr} --batch_size {bs} \
       ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2590
2591 simple_hypersearch \
       "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
       --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
       simple_gpu_scheduler --gpus 0,1,2
2592 parallel --header : --shuf \
       CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' \
       python3 train_dnn.py --lr {lr} --batch_size {bs} \
       ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2593
2594 touch gpu.queue
2595 tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
2596 echo "my_command_with | and stuff > logfile" >> gpu.queue
2597
2598 touch gpu.queue
2599 tail -f -n 0 gpu.queue |
       parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
2600 # Needed to fill job slots once
2601 seq 3 | parallel echo true >> gpu.queue
2602 # Add jobs
2603 echo "my_command_with | and stuff > logfile" >> gpu.queue
2604 # Needed to flush output from completed jobs
2605 seq 3 | parallel echo true >> gpu.queue
2606
2607 TESTING OTHER TOOLS
2608 There are certain issues that are very common among parallelizing
2609 tools. Here are a few stress tests. Be warned: If the tool is badly
2610 coded it may overload your machine.
2611
2612 MIX: Output mixes
2613 Output from 2 jobs should not mix. If the output is not used, this does
2614 not matter; but if the output is used then it is important that you do
2615 not get half a line from one job followed by half a line from another
2616 job.
2617
2618 If the tool does not buffer, output will most likely mix now and then.
2619
2620 This test stresses whether output mixes.
2621
2622 #!/bin/bash
2623
2624 paralleltool="parallel -j0"
2625
2626 cat <<-EOF > mycommand
2627 #!/bin/bash
2628
2629 # If a, b, c, d, e, and f mix: Very bad
2630 perl -e 'print STDOUT "a"x3000_000," "'
2631 perl -e 'print STDERR "b"x3000_000," "'
2632 perl -e 'print STDOUT "c"x3000_000," "'
2633 perl -e 'print STDERR "d"x3000_000," "'
2634 perl -e 'print STDOUT "e"x3000_000," "'
2635 perl -e 'print STDERR "f"x3000_000," "'
2636 echo
2637 echo >&2
2638 EOF
2639 chmod +x mycommand
2640
2641 # Run 30 jobs in parallel
2642 seq 30 |
2643 $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
2644
2645 # 'a c e' and 'b d f' should always stay together
2646 # and there should only be a single line per job
2647
2648 STDERRMERGE: Stderr is merged with stdout
2649 Output from stdout and stderr should not be merged, but kept separated.
2650
2651 This test shows whether stdout is mixed with stderr.
2652
2653 #!/bin/bash
2654
2655 paralleltool="parallel -j0"
2656
2657 cat <<-EOF > mycommand
2658 #!/bin/bash
2659
2660 echo stdout
2661 echo stderr >&2
2662 echo stdout
2663 echo stderr >&2
2664 EOF
2665 chmod +x mycommand
2666
2667 # Run one job
2668 echo |
2669 $paralleltool ./mycommand > stdout 2> stderr
2670 cat stdout
2671 cat stderr
2672
2673 RAM: Output limited by RAM
2674 Some tools cache output in RAM. This makes them extremely slow if the
2675 output is bigger than physical memory and crash if the output is bigger
2676 than the virtual memory.
2677
2678 #!/bin/bash
2679
2680 paralleltool="parallel -j0"
2681
2682 cat <<'EOF' > mycommand
2683 #!/bin/bash
2684
2685 # Generate 1 GB output
2686 yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
2687 EOF
2688 chmod +x mycommand
2689
2690 # Run 20 jobs in parallel
2691 # Adjust 20 to be > physical RAM and < free space on /tmp
2692 seq 20 | time $paralleltool ./mycommand | wc -c
2693
2694 DISKFULL: Incomplete data if /tmp runs full
2695 If caching is done on disk, the disk can run full during the run. Not
2696 all programs discover this. GNU parallel discovers it if the disk
2697 stays full for at least 2 seconds.
2698
2699 #!/bin/bash
2700
2701 paralleltool="parallel -j0"
2702
2703 # This should be a dir with less than 100 GB free space
2704 smalldisk=/tmp/shm/parallel
2705
2706 TMPDIR="$smalldisk"
2707 export TMPDIR
2708
2709 max_output() {
2710 # Force worst case scenario:
2711 # Make GNU Parallel only check once per second
2712 sleep 10
2713 # Generate 100 GB to fill $TMPDIR
2714 # Adjust if /tmp is bigger than 100 GB
2715 yes | head -c 100G >$TMPDIR/$$
2716 # Generate 10 MB output that will not be buffered due to full disk
2717 perl -e 'print "X"x10_000_000' | head -c 10M
2718 echo This part is missing from incomplete output
2719 sleep 2
2720 rm $TMPDIR/$$
2721 echo Final output
2722 }
2723
2724 export -f max_output
2725 seq 10 | $paralleltool max_output | tr -s X
2726
2727 CLEANUP: Leaving tmp files at unexpected death
2728 Some tools do not clean up their tmp files if they are killed. This
2729 is especially a problem for tools that buffer on disk.
2730
2731 #!/bin/bash
2732
2733 paralleltool=parallel
2734
2735 ls /tmp >/tmp/before
2736 seq 10 | $paralleltool sleep &
2737 pid=$!
2738 # Give the tool time to start up
2739 sleep 1
2740 # Kill it without giving it a chance to cleanup
2741 kill -9 $pid
2742 # Should be empty: No files should be left behind
2743 diff <(ls /tmp) /tmp/before
2744
2745 SPCCHAR: Dealing badly with special file names.
2746 It is not uncommon for users to create files like:
2747
2748 My brother's 12" *** record (costs $$$).jpg
2749
2750 Some tools break on this.
2751
2752 #!/bin/bash
2753
2754 paralleltool=parallel
2755
2756 touch "My brother's 12\" *** record (costs \$\$\$).jpg"
2757 ls My*jpg | $paralleltool ls -l
2758
2759 COMPOSED: Composed commands do not work
2760 Some tools require you to wrap composed commands into bash -c.
2761
2762 echo bar | $paralleltool echo foo';' echo {}
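
       With GNU parallel as $paralleltool the expected output is:

         foo
         bar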
2763
2764 ONEREP: Only one replacement string allowed
2765 Some tools can only insert the argument once.
2766
2767 echo bar | $paralleltool echo {} foo {}
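
       With GNU parallel as $paralleltool the expected output is:

         bar foo bar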
2768
2769 INPUTSIZE: Length of input should not be limited
2770 Some tools limit the length of the input lines artificially with no
2771 good reason. GNU parallel does not:
2772
2773 perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}
2774
2775 GNU parallel limits the command to run to 128 KB due to execve(2):
2776
2777 perl -e 'print "x"x131_000' | parallel echo {} | wc
2778
2779 NUMWORDS: Speed depends on number of words
2780 Some tools become very slow if output lines have many words.
2781
2782 #!/bin/bash
2783
2784 paralleltool=parallel
2785
2786 cat <<-EOF > mycommand
2787 #!/bin/bash
2788
2789 # 10 MB of lines with 1000 words
2790 yes "`seq 1000`" | head -c 10M
2791 EOF
2792 chmod +x mycommand
2793
2794 # Run 30 jobs in parallel
2795 seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
2796
2797 4GB: Output with a line > 4GB should be OK
2798 #!/bin/bash
2799
2800 paralleltool="parallel -j0"
2801
2802 cat <<-EOF > mycommand
2803 #!/bin/bash
2804
2805 perl -e '\$a="a"x1000_000; for(1..5000) { print \$a }'
2806 EOF
2807 chmod +x mycommand
2808
2809 # Run 1 job
2810 seq 1 | $paralleltool ./mycommand | LC_ALL=C wc
2811
2812 AUTHOR
2813 When using GNU parallel for a publication please cite:
2814
2815 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
2816 The USENIX Magazine, February 2011:42-47.
2817
2818 This helps funding further development; and it won't cost you a cent.
2819 If you pay 10000 EUR you should feel free to use GNU Parallel without
2820 citing.
2821
2822 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
2823
2824 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
2825
2826 Copyright (C) 2010-2020 Ole Tange, http://ole.tange.dk and Free
2827 Software Foundation, Inc.
2828
2829 Parts of the manual concerning xargs compatibility are inspired by
2830 the manual of xargs from GNU findutils 4.4.2.
2831
2832 LICENSE
2833 This program is free software; you can redistribute it and/or modify it
2834 under the terms of the GNU General Public License as published by the
2835 Free Software Foundation; either version 3 of the License, or (at
2836 your option) any later version.
2837
2838 This program is distributed in the hope that it will be useful, but
2839 WITHOUT ANY WARRANTY; without even the implied warranty of
2840 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
2841 General Public License for more details.
2842
2843 You should have received a copy of the GNU General Public License along
2844 with this program. If not, see <http://www.gnu.org/licenses/>.
2845
2846 Documentation license I
2847 Permission is granted to copy, distribute and/or modify this
2848 documentation under the terms of the GNU Free Documentation License,
2849 Version 1.3 or any later version published by the Free Software
2850 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
2851 with no Back-Cover Texts. A copy of the license is included in the
2852 file fdl.txt.
2853
2854 Documentation license II
2855 You are free:
2856
2857 to Share to copy, distribute and transmit the work
2858
2859 to Remix to adapt the work
2860
2861 Under the following conditions:
2862
2863 Attribution
2864 You must attribute the work in the manner specified by the
2865 author or licensor (but not in any way that suggests that they
2866 endorse you or your use of the work).
2867
2868 Share Alike
2869 If you alter, transform, or build upon this work, you may
2870 distribute the resulting work only under the same, similar or
2871 a compatible license.
2872
2873 With the understanding that:
2874
2875 Waiver Any of the above conditions can be waived if you get
2876 permission from the copyright holder.
2877
2878 Public Domain
2879 Where the work or any of its elements is in the public domain
2880 under applicable law, that status is in no way affected by the
2881 license.
2882
2883 Other Rights
2884 In no way are any of the following rights affected by the
2885 license:
2886
2887 • Your fair dealing or fair use rights, or other applicable
2888 copyright exceptions and limitations;
2889
2890 • The author's moral rights;
2891
2892 • Rights other persons may have either in the work itself or
2893 in how the work is used, such as publicity or privacy
2894 rights.
2895
2896 Notice For any reuse or distribution, you must make clear to others
2897 the license terms of this work.
2898
2899 A copy of the full license is included in the file as cc-by-sa.txt.
2900
2901 DEPENDENCIES
2902 GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
2903 Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
2904 rsync with ssh.
2905
2906 SEE ALSO
2907 find(1), xargs(1), make(1), pexec(1), ppss(1), xjobs(1), prll(1),
2908 dxargs(1), mdm(1)
2909
2910
2911
2912 20201122 2020-12-20 PARALLEL_ALTERNATIVES(7)