PARALLEL_ALTERNATIVES(7)          parallel          PARALLEL_ALTERNATIVES(7)


NAME
parallel_alternatives - Alternatives to GNU parallel

DESCRIPTION
There are a lot of programs with some of the functionality of GNU
parallel. GNU parallel strives to include the best of the functionality
without sacrificing ease of use.

SUMMARY TABLE
The following features are in some of the comparable tools:

Inputs
I1. Arguments can be read from stdin
I2. Arguments can be read from a file
I3. Arguments can be read from multiple files
I4. Arguments can be read from the command line
I5. Arguments can be read from a table
I6. Arguments can be read from the same file using #! (shebang)
I7. Line oriented input as default (quoting of special chars not needed)

Manipulation of input
M1. Composed command
M2. Multiple arguments can fill up an execution line
M3. Arguments can be put anywhere in the execution line
M4. Multiple arguments can be put anywhere in the execution line
M5. Arguments can be replaced with context
M6. Input can be treated as the complete command line

Outputs
O1. Grouping output so output from different jobs do not mix
O2. Send stderr (standard error) to stderr (standard error)
O3. Send stdout (standard output) to stdout (standard output)
O4. Order of output can be same as order of input
O5. Stdout only contains stdout (standard output) from the command
O6. Stderr only contains stderr (standard error) from the command

Execution
E1. Running jobs in parallel
E2. List running jobs
E3. Finish running jobs, but do not start new jobs
E4. Number of running jobs can depend on number of cpus
E5. Finish running jobs, but do not start new jobs after first failure
E6. Number of running jobs can be adjusted while running

Remote execution
R1. Jobs can be run on remote computers
R2. Basefiles can be transferred
R3. Argument files can be transferred
R4. Result files can be transferred
R5. Cleanup of transferred files
R6. No config files needed
R7. Do not run more than SSHD's MaxStartups can handle
R8. Configurable SSH command
R9. Retry if connection breaks occasionally

Semaphore
S1. Possibility to work as a mutex
S2. Possibility to work as a counting semaphore

Legend
- = no
x = not applicable
ID = yes

Not every new version of the programs is tested, so the table may be
outdated. Please file a bug-report if you find errors (See REPORTING
BUGS).

parallel: I1 I2 I3 I4 I5 I6 I7 M1 M2 M3 M4 M5 M6 O1 O2 O3 O4 O5 O6 E1
          E2 E3 E4 E5 E6 R1 R2 R3 R4 R5 R6 R7 R8 R9 S1 S2

xargs: I1 I2 - - - - - - M2 M3 - - - - O2 O3 - O5 O6 E1 - -
       - - - - - - - - x - - - - -

find -exec: - - - x - x - - M2 M3 - - - - - O2 O3 O4 O5 O6 -
            - - - - - - - - - - - - - - - x x

make -j: - - - - - - - - - - - - - O1 O2 O3 - x O6 E1 - -
         - E5 - - - - - - - - - - - -

ppss: I1 I2 - - - - I7 M1 - M3 - - M6 O1 - - x - - E1 E2 ?E3
      E4 - - R1 R2 R3 R4 - - ?R7 ? ? - -

pexec: I1 I2 - I4 I5 - - M1 - M3 - - M6 O1 O2 O3 - O5 O6 E1 - -
       E4 - E6 R1 - - - - R6 - - - S1 -

xjobs, prll, dxargs, mdm/middleman, xapply, paexec, ladon, jobflow,
ClusterSSH: TODO - Please file a bug-report if you know what features
they support (See REPORTING BUGS).

DIFFERENCES BETWEEN xargs AND GNU Parallel
xargs offers some of the same possibilities as GNU parallel.

xargs deals badly with special characters (such as space, \, ' and ").
To see the problem try this:

  touch important_file
  touch 'not important_file'
  ls not* | xargs rm
  mkdir -p "My brother's 12\" records"
  ls | xargs rmdir
  touch 'c:\windows\system32\clfs.sys'
  echo 'c:\windows\system32\clfs.sys' | xargs ls -l

You can specify -0, but many input generators are not optimized for
using NUL as separator but are optimized for newline as separator, e.g.
head, tail, awk, ls, echo, sed, tar -v, perl (-0 and \0 instead of \n),
locate (requires using -0), find (requires using -print0), grep
(requires the user to use -z or -Z), and sort (requires using -z).

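The difference between whitespace and NUL separation can be sketched
without GNU parallel at all; the file name below is made up for
illustration:

```shell
# A name containing a space is split into two arguments by plain
# xargs, but survives find -print0 | xargs -0 intact.
dir=$(mktemp -d)
touch "$dir/not important_file"

# Whitespace splitting: the one file name becomes two arguments.
split_args=$(find "$dir" -type f | xargs -n1 echo | wc -l)

# NUL separation: the file name stays one argument.
nul_args=$(find "$dir" -type f -print0 | xargs -0 -n1 echo | wc -l)

echo "$split_args $nul_args"
rm -rf "$dir"
```

The counts come out as 2 and 1 respectively, which is exactly how
`ls not* | xargs rm` above ends up deleting the wrong file.
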
GNU parallel's newline separation can be emulated with:

  cat | xargs -d "\n" -n1 command

xargs can run a given number of jobs in parallel, but has no support
for running number-of-cpu-cores jobs in parallel.

xargs has no support for grouping the output, therefore output may run
together, e.g. the first half of a line is from one process and the
last half of the line is from another process. The example Parallel
grep cannot be done reliably with xargs because of this. To see this in
action try:

  parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
    ::: a b c d e f
  ls -l a b c d e f
  parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
  echo a b c d e f | xargs -P4 -n1 grep 1 > out.xargs-unbuf
  echo a b c d e f | \
    xargs -P4 -n1 grep --line-buffered 1 > out.xargs-linebuf
  echo a b c d e f | xargs -n1 grep 1 > out.xargs-serial
  ls -l out*
  md5sum out*

Or try this:

  slow_seq() {
    seq "$@" |
      perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
  }
  export -f slow_seq
  seq 5 | xargs -n1 -P0 -I {} bash -c 'slow_seq {}'
  seq 5 | parallel -P0 slow_seq {}

xargs has no support for keeping the order of the output; therefore,
when running jobs in parallel using xargs, the output of the second job
cannot be postponed until the first job is done.

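The underlying problem can be sketched with plain background jobs (no
xargs needed): output arrives in completion order, not start order.

```shell
# job1 is started first but sleeps, so job2's line is printed first.
out=$( { (sleep 0.3; echo job1) & (echo job2) & wait; } )
echo "$out"
# GNU parallel -k would hold job2's output back and print job1 first;
# xargs has no equivalent switch.
```
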
xargs has no support for running jobs on remote computers.

xargs has no support for context replace, so you will have to create
the arguments.

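"Context replace" means inserting each argument into surrounding text,
as GNU parallel's `parallel -X echo pic_{}.jpg ::: 1 2 3` does. With
xargs the context has to be created beforehand, e.g. with sed; the
pic_N.jpg naming here is only an illustration:

```shell
# Wrap each input line in its context before xargs ever sees it:
out=$(printf '1\n2\n3\n' | sed 's/.*/pic_&.jpg/' | xargs echo)
echo "$out"
```

This prints pic_1.jpg pic_2.jpg pic_3.jpg on one line, which is what
context replace with multiple arguments produces directly.
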
If you use a replace string in xargs (-I) you cannot force xargs to
use more than one argument.

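A sketch of that limitation, assuming GNU xargs (which treats -n and
-I as mutually exclusive and lets the later -I win, warning on
stderr):

```shell
# With -I each command gets exactly one input line as its argument,
# even though -n2 asked for two at a time:
with_I=$(printf '1\n2\n3\n4\n' | xargs -n2 -I{} echo {} 2>/dev/null | wc -l)

# Without -I, -n2 packs two arguments onto each command line:
without_I=$(printf '1\n2\n3\n4\n' | xargs -n2 echo | wc -l)

echo "$with_I $without_I"
```

Four invocations happen with -I (one per argument) against two
without it.
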
Quoting in xargs works like -q in GNU parallel. This means composed
commands and redirection require using bash -c.

  ls | parallel "wc {} >{}.wc"
  ls | parallel "echo {}; ls {}|wc"

becomes (assuming you have 8 cores)

  ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
  ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

https://www.gnu.org/software/findutils/

DIFFERENCES BETWEEN find -exec AND GNU Parallel
find -exec offers some of the same possibilities as GNU parallel.

find -exec only works on files. So processing other input (such as
hosts or URLs) will require creating these inputs as files. find -exec
has no support for running commands in parallel.

https://www.gnu.org/software/findutils/

DIFFERENCES BETWEEN make -j AND GNU Parallel
make -j can run jobs in parallel, but requires a crafted Makefile to do
this. That results in extra quoting to get filenames containing
newlines to work correctly.

make -j computes a dependency graph before running jobs. Jobs run by
GNU parallel do not depend on each other.

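A minimal sketch of such a crafted Makefile (the target names a and b
are made up): the two targets have no dependency between them, so
make -j2 is free to run both recipes at once.

```shell
dir=$(mktemp -d)
# Two independent targets; \t is the literal tab a recipe line needs.
printf 'all: a b\na:\n\t@echo made a\nb:\n\t@echo made b\n' > "$dir/Makefile"

# -j2 may run a and b in parallel; sort makes the order deterministic.
out=$(make --no-print-directory -C "$dir" -j2 | sort)
echo "$out"
rm -rf "$dir"
```

GNU parallel gets the same effect from a plain list, with no Makefile:
parallel echo made {} ::: a b
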
(Very early versions of GNU parallel were coincidentally implemented
using make -j.)

https://www.gnu.org/software/make/

DIFFERENCES BETWEEN ppss AND GNU Parallel
ppss is also a tool for running jobs in parallel.

The output of ppss is status information and thus not useful as input
for another command. The output from the jobs is put into files.

The argument replace string ($ITEM) cannot be changed. Arguments must
be quoted - thus arguments containing special characters (space '"&!*)
may cause problems. More than one argument is not supported. File names
containing newlines are not processed correctly. When reading input
from a file, NUL cannot be used as a terminator. ppss needs to read the
whole input file before starting any jobs.

Output and status information is stored in ppss_dir and thus requires
cleanup when completed. If the dir is not removed before running ppss
again, it may cause nothing to happen, as ppss thinks the task is
already done. GNU parallel will normally not need cleaning up if
running locally and will only need cleaning up if stopped abnormally
and running remotely (--cleanup may not complete if stopped
abnormally). The example Parallel grep would require extra
postprocessing if written using ppss.

For remote systems PPSS requires 3 steps: config, deploy, and start.
GNU parallel only requires one step.

EXAMPLES FROM ppss MANUAL

Here are the examples from ppss's manual page with the equivalent using
GNU parallel:

1 ./ppss.sh standalone -d /path/to/files -c 'gzip '

1 find /path/to/files -type f | parallel gzip

2 ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM"
  /destination/dir '

2 find /path/to/files -type f | parallel cp {} /destination/dir

3 ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

3 parallel -a list-of-urls.txt wget -q

4 ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

4 parallel -a list-of-urls.txt wget -q {}

5 ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir -m
  192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -o
  /some/output/dir --upload --download ; ./ppss deploy -C config.cfg ;
  ./ppss start -C config

5 # parallel does not use configs. If you want a different username
  # put it in nodes.txt: user@hostname

5 find source/dir -type f | parallel --sshloginfile nodes.txt --trc
  {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet

6 ./ppss stop -C config.cfg

6 killall -TERM parallel

7 ./ppss pause -C config.cfg

7 Press: CTRL-Z or killall -SIGTSTP parallel

8 ./ppss continue -C config.cfg

8 Enter: fg or killall -SIGCONT parallel

9 ./ppss.sh status -C config.cfg

9 killall -SIGUSR2 parallel

https://github.com/louwrentius/PPSS

DIFFERENCES BETWEEN pexec AND GNU Parallel
pexec is also a tool for running jobs in parallel.

EXAMPLES FROM pexec MANUAL

Here are the examples from pexec's info page with the equivalent using
GNU parallel:

1 pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
    'echo "scale=10000;sqrt($NUM)" | bc'

1 seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | bc >
  sqrt-{}.dat'

2 pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort

2 ls myfiles*.ext | parallel sort {} ">{}.sort"

3 pexec -f image.list -n auto -e B -u star.log -c -- \
    'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'

3 parallel -a image.list \
    'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log

4 pexec -r *.png -e IMG -c -o - -- \
    'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'

4 ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'

5 pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'

5 ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'

6 for p in *.png ; do echo ${p%.png} ; done | \
    pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

6 ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

7 LIST=$(for p in *.png ; do echo ${p%.png} ; done)
  pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

7 ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

8 pexec -n 8 -r *.jpg -y unix -e IMG -c \
    'pexec -j -m blockread -d $IMG | \
    jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
    pexec -j -m blockwrite -s th_$IMG'

8 Combining GNU parallel and GNU sem:

8 ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
    'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'

8 If reading and writing is done to the same disk, this may be faster
  as only one process will be either reading or writing:

8 ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
    'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'

https://www.gnu.org/software/pexec/

DIFFERENCES BETWEEN xjobs AND GNU Parallel
xjobs is also a tool for running jobs in parallel. It only supports
running jobs on your local computer.

xjobs deals badly with special characters just like xargs. See the
section DIFFERENCES BETWEEN xargs AND GNU Parallel.

Here are the examples from xjobs's man page with the equivalent using
GNU parallel:

1 ls -1 *.zip | xjobs unzip

1 ls *.zip | parallel unzip

2 ls -1 *.zip | xjobs -n unzip

2 ls *.zip | parallel unzip >/dev/null

3 find . -name '*.bak' | xjobs gzip

3 find . -name '*.bak' | parallel gzip

4 ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf

4 ls *.jar | parallel jar tf {} '>' {}.idx

5 xjobs -s script

5 cat script | parallel

6 mkfifo /var/run/my_named_pipe; xjobs -s /var/run/my_named_pipe & echo
  unzip 1.zip >> /var/run/my_named_pipe; echo tar cf /backup/myhome.tar
  /home/me >> /var/run/my_named_pipe

6 mkfifo /var/run/my_named_pipe; cat /var/run/my_named_pipe | parallel
  & echo unzip 1.zip >> /var/run/my_named_pipe; echo tar cf
  /backup/myhome.tar /home/me >> /var/run/my_named_pipe

http://www.maier-komor.de/xjobs.html

DIFFERENCES BETWEEN prll AND GNU Parallel
prll is also a tool for running jobs in parallel. It does not support
running jobs on remote computers.

prll encourages using BASH aliases and BASH functions instead of
scripts. GNU parallel supports scripts directly, functions if they are
exported using export -f, and aliases if using env_parallel.

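Why export -f is needed can be sketched in bash alone (the function
name slow is made up): a shell function only becomes visible to child
shells after it is exported, a bash-specific mechanism. xargs is used
here as the child-spawner so the sketch runs without GNU parallel
installed:

```shell
#!/bin/bash
slow() { echo "got $1"; }
export -f slow   # bash-specific: ships the function via the environment

# xargs starts a child bash per argument; the argument arrives as $0
# of the -c script:
out=$(echo x | xargs -n1 bash -c 'slow "$0"')
echo "$out"
```

With GNU parallel the exported function can then be called directly:
parallel slow ::: x
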
prll generates a lot of status information on stderr (standard error)
which makes it harder to use the stderr (standard error) output of the
job directly as input for another program.

Here is the example from prll's man page with the equivalent using GNU
parallel:

  prll -s 'mogrify -flip $1' *.jpg
  parallel mogrify -flip ::: *.jpg

https://github.com/exzombie/prll

DIFFERENCES BETWEEN dxargs AND GNU Parallel
dxargs is also a tool for running jobs in parallel.

dxargs does not deal well with more simultaneous jobs than SSHD's
MaxStartups. dxargs is only built for running jobs remotely, and it
does not support transferring of files.

http://www.semicomplete.com/blog/geekery/distributed-xargs.html

DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
middleman (mdm) is also a tool for running jobs in parallel.

Here are the shellscripts of http://mdm.berlios.de/usage.html ported to
GNU parallel:

  seq 19 | parallel buffon -o - | sort -n > result
  cat files | parallel cmd
  find dir -execdir sem cmd {} \;

https://github.com/cklin/mdm

DIFFERENCES BETWEEN xapply AND GNU Parallel
xapply can run jobs in parallel on the local computer.

Here are the examples from xapply's man page with the equivalent using
GNU parallel:

1 xapply '(cd %1 && make all)' */

1 parallel 'cd {} && make all' ::: */

2 xapply -f 'diff %1 ../version5/%1' manifest | more

2 parallel diff {} ../version5/{} < manifest | more

3 xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1

3 parallel --link diff {1} {2} :::: manifest1 checklist1

4 xapply 'indent' *.c

4 parallel indent ::: *.c

5 find ~ksb/bin -type f ! -perm -111 -print | xapply -f -v 'chmod a+x'
  -

5 find ~ksb/bin -type f ! -perm -111 -print | parallel -v chmod a+x

6 find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -

6 sh <(find */ -... | parallel -s 1024 echo vi)

6 find */ -... | parallel -s 1024 -Xuj1 vi

7 find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -

7 sh <(find ... | parallel -n5 echo vi)

7 find ... | parallel -n5 -uj1 vi

8 xapply -fn "" /etc/passwd

8 parallel -k echo < /etc/passwd

9 tr ':' '\012' < /etc/passwd | xapply -7 -nf 'chown %1 %6' - - - - - -
  -

9 tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}

10 xapply '[ -d %1/RCS ] || echo %1' */

10 parallel '[ -d {}/RCS ] || echo {}' ::: */

11 xapply -f '[ -f %1 ] && echo %1' List | ...

11 parallel '[ -f {} ] && echo {}' < List | ...

http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html

DIFFERENCES BETWEEN AIX apply AND GNU Parallel
apply can build command lines based on a template and arguments - very
much like GNU parallel. apply does not run jobs in parallel. apply does
not use an argument separator (like :::); instead the template must be
the first argument.

Here are the examples from
https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm

1. To obtain results similar to those of the ls command, enter:

  apply echo *
  parallel echo ::: *

2. To compare the file named a1 to the file named b1, and the file
named a2 to the file named b2, enter:

  apply -2 cmp a1 b1 a2 b2
  parallel -N2 cmp ::: a1 b1 a2 b2

3. To run the who command five times, enter:

  apply -0 who 1 2 3 4 5
  parallel -N0 who ::: 1 2 3 4 5

4. To link all files in the current directory to the directory
/usr/joe, enter:

  apply 'ln %1 /usr/joe' *
  parallel ln {} /usr/joe ::: *

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_61/com.ibm.aix.cmds1/apply.htm

DIFFERENCES BETWEEN paexec AND GNU Parallel
paexec can run jobs in parallel on both the local and remote computers.

paexec requires commands to print a blank line as the last output. This
means you will have to write a wrapper for most programs.

paexec has a job dependency facility so a job can depend on another job
being executed successfully. Sort of a poor man's make.

Here are the examples from paexec's example catalog with the equivalent
using GNU parallel:

1_div_X_run:
  ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
  parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]

all_substr_run:
  ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
  parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]

cc_wrapper_run:
  ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
    -n 'host1 host2' \
    -t '/usr/bin/ssh -x' <<EOF [...]
  parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
    -S host1,host2 <<EOF [...]
  # This is not exactly the same, but avoids the wrapper
  parallel gcc -O2 -c -o {.}.o {} \
    -S host1,host2 <<EOF [...]

toupper_run:
  ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
  parallel echo {} '|' ./toupper_cmd <<EOF [...]
  # Without the wrapper:
  parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]

https://github.com/cheusov/paexec

DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
map sees it as a feature to have fewer features, and in doing so it
also handles corner cases incorrectly. A lot of GNU parallel's code is
there to handle corner cases correctly on every platform, so you will
not get a nasty surprise if a user, for example, saves a file called:
My brother's 12" records.txt

map's example showing how to deal with special characters fails on
special characters:

  echo "The Cure" > My\ brother\'s\ 12\"\ records

  ls | \
    map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' | bc

It works with GNU parallel:

  ls | \
    parallel 'echo -n `gzip < {} | wc -c`; echo -n '*100/'; wc -c < {}' | bc

And you can even get the file name prepended:

  ls | \
    parallel --tag '(echo -n `gzip < {} | wc -c`'*100/'; wc -c < {}) | bc'

map has no support for grouping. So this gives the wrong results
without any warnings:

  parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
    ::: a b c d e f
  ls -l a b c d e f
  parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
  map -p 4 'grep 1' a b c d e f > out.map-unbuf
  map -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
  map -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
  ls -l out*
  md5sum out*

The documentation shows a workaround, but not only does that mix stdout
(standard output) with stderr (standard error), it also fails
completely for certain jobs (and may even be considered less readable):

  parallel echo -n {} ::: 1 2 3

  map -p 4 'echo -n % 2>&1 | sed -e "s/^/$$:/"' 1 2 3 | sort | cut -f2- -d:

map's replacement strings (% %D %B %E) can be simulated in GNU parallel
by putting this in ~/.parallel/config:

  --rpl '%'
  --rpl '%D $_=::shell_quote(::dirname($_));'
  --rpl '%B s:.*/::;s:\.[^/.]+$::;'
  --rpl '%E s:.*\.::'

map cannot handle bundled options: map -vp 0 echo this fails

map does not have an argument separator on the command line, but uses
the first argument as command. This makes quoting harder, which again
may affect readability. Compare:

  map -p 2 perl\\\ -ne\\\ \\\'/^\\\\S+\\\\s+\\\\S+\\\$/\\\ and\\\ print\\\ \\\$ARGV,\\\"\\\\n\\\"\\\' *

  parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *

map can do multiple arguments with context replace, but not without
context replace:

  parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3

map does not set its exit value according to whether one of the jobs
failed:

  parallel false ::: 1 || echo Job failed

  map false 1 || echo Never run

map requires Perl v5.10.0, making it harder to use on old systems.

map has no way of using % in the command (GNU Parallel has -I to
specify another replacement string than {}).

By design map is option-incompatible with xargs; it does not have
remote job execution, a structured way of saving results, multiple
input sources, a progress indicator, a configurable record delimiter
(only field delimiter), logging of jobs run with possibility to resume,
keeping the output in the same order as input, --pipe processing, or
dynamic timeouts.

https://github.com/sitaramc/map

DIFFERENCES BETWEEN ladon AND GNU Parallel
ladon can run multiple jobs on files in parallel.

ladon only works on files, and the only way to specify files is using a
quoted glob string (such as \*.jpg). It is not possible to list the
files manually.

As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR
RELPATH.

These can be simulated using GNU parallel by putting this in
~/.parallel/config:

  --rpl 'FULLPATH $_=::shell_quote($_);chomp($_=qx{readlink -f $_});'
  --rpl 'DIRNAME $_=::shell_quote(::dirname($_));chomp($_=qx{readlink -f $_});'
  --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
  --rpl 'EXT s:.*\.::'
  --rpl 'RELDIR $_=::shell_quote($_);chomp(($_,$c)=qx{readlink -f $_;pwd});s:\Q$c/\E::;$_=::dirname($_);'
  --rpl 'RELPATH $_=::shell_quote($_);chomp(($_,$c)=qx{readlink -f $_;pwd});s:\Q$c/\E::;'

ladon deals badly with filenames containing " and newline, and it fails
for output larger than 200k:

  ladon '*' -- seq 36000 | wc

EXAMPLES FROM ladon MANUAL

It is assumed that the '--rpl's above are put in ~/.parallel/config and
that it is run under a shell that supports '**' globbing (such as zsh):

1 ladon "**/*.txt" -- echo RELPATH

1 parallel echo RELPATH ::: **/*.txt

2 ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt

2 parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt

3 ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH -thumbnail
  100x100^ -gravity center -extent 100x100 thumbs/RELPATH

3 parallel mkdir -p thumbs/RELDIR\; convert FULLPATH -thumbnail
  100x100^ -gravity center -extent 100x100 thumbs/RELPATH ::: **/*.jpg

4 ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3

4 parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav

https://github.com/danielgtaylor/ladon

DIFFERENCES BETWEEN jobflow AND GNU Parallel
jobflow can run multiple jobs in parallel.

Just like with xargs, output from jobflow jobs running in parallel
mixes together by default. jobflow can buffer into files (placed in
/run/shm), but these are not cleaned up - not even if jobflow dies
unexpectedly. If the total output is big (in the order of RAM+swap) it
can cause the system to run out of memory.

jobflow gives no error if the command is unknown, and, like xargs,
redirection requires wrapping with bash -c.

jobflow makes it possible to set resource limits on the running jobs.
This can be emulated by GNU parallel using bash's ulimit:

  jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

  parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300; myjob'

EXAMPLES FROM jobflow README

1 cat things.list | jobflow -threads=8 -exec ./mytask {}

1 cat things.list | parallel -j8 ./mytask {}

2 seq 100 | jobflow -threads=100 -exec echo {}

2 seq 100 | parallel -j100 echo {}

3 cat urls.txt | jobflow -threads=32 -exec wget {}

3 cat urls.txt | parallel -j32 wget {}

4 find . -name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp
  {.}.jpg

4 find . -name '*.bmp' | parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

https://github.com/rofl0r/jobflow

DIFFERENCES BETWEEN gargs AND GNU Parallel
gargs can run multiple jobs in parallel.

It caches output in memory. This causes it to be extremely slow when
the output is larger than the physical RAM, and it can cause the system
to run out of memory.

See more details on this in man parallel_design.

Output to stderr (standard error) is changed if the command fails.

Here are the two examples from the gargs website:

1 seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

1 seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

2 cat t.txt | gargs --sep "\s+" -p 2 "echo '{0}:{1}-{2}' full-line:
  \'{}\'"

2 cat t.txt | parallel --colsep "\\s+" -P 2 "echo '{1}:{2}-{3}'
  full-line: \'{}\'"

https://github.com/brentp/gargs

DIFFERENCES BETWEEN orgalorg AND GNU Parallel
orgalorg can run the same job on multiple machines. This is related to
--onall and --nonall.

orgalorg supports entering the SSH password - provided it is the same
for all servers. GNU parallel advocates using ssh-agent instead, but it
is possible to emulate orgalorg's behavior by setting SSHPASS and by
using --ssh "sshpass ssh".

To make the emulation easier, make a simple alias:

  alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --linebuffer"

If you want to supply a password run:

  SSHPASS=`ssh-askpass`

or set the password directly:

  SSHPASS=P4$$w0rd!

If the above is set up you can then do:

  orgalorg -o frontend1 -o frontend2 -p -C uptime
  par_emul -S frontend1 -S frontend2 uptime

  orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
  par_emul -S frontend1 -S frontend2 top -bid 1

  orgalorg -o frontend1 -o frontend2 -p -er /tmp -n 'md5sum /tmp/bigfile' -S bigfile
  par_emul -S frontend1 -S frontend2 --basefile bigfile --workdir /tmp md5sum /tmp/bigfile

orgalorg has a progress indicator for the transferring of a file. GNU
parallel does not.

https://github.com/reconquest/orgalorg

DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
Rust parallel focuses on speed. It is almost as fast as xargs. It
implements a few features from GNU parallel, but lacks many functions.
All of these fail:

  # Show what would be executed
  parallel --dry-run echo ::: a
  # Read arguments from file
  parallel -a file echo
  # Changing the delimiter
  parallel -d _ echo ::: a_b_c_

These do something different from GNU parallel:

  # Read more arguments at a time -n
  parallel -n 2 echo ::: 1 a 2 b
  # -q to protect quoted $ and space
  parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
  # Generation of combination of inputs
  parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
  # {= perl expression =} replacement string
  parallel echo '{= s/new/old/ =}' ::: my.new your.new
  # --pipe
  seq 100000 | parallel --pipe wc
  # linked arguments
  parallel echo ::: S M L :::+ small medium large ::: R G B :::+ red green blue
  # Run different shell dialects
  zsh -c 'parallel echo \={} ::: zsh && true'
  csh -c 'parallel echo \$\{\} ::: shell && true'
  bash -c 'parallel echo \$\({}\) ::: pwd && true'
  # Rust parallel does not start before the last argument is read
  (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
  tail -f /var/log/syslog | parallel echo

Rust parallel has no remote facilities.

It uses /tmp/parallel for tmp files and does not clean up if terminated
abruptly. If another user on the system uses Rust parallel, then
/tmp/parallel will have the wrong permissions and Rust parallel will
fail. A malicious user can set up the right permissions and symlink the
output file to one of the user's files, and the next time the user uses
Rust parallel it will overwrite this file.

If /tmp/parallel runs full during the run, Rust parallel does not
report this, but finishes with success - thereby risking data loss.

https://github.com/mmstick/parallel

DIFFERENCES BETWEEN Rush AND GNU Parallel
rush (https://github.com/shenwei356/rush) is written in Go and based on
gargs.

Just like GNU parallel, rush buffers in temporary files. But unlike GNU
parallel, rush does not clean up if the process dies abnormally.

rush has some string manipulations that can be emulated by putting this
into ~/.parallel/config (/ is used instead of %, and % is used instead
of ^ as that is closer to bash's ${var%postfix}):

  --rpl '{:} s:(\.[^/]+)*$::'
  --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
  --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
  --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
  --rpl '{@(.*?)} /$$1/ and $_=$1;'

Here are the examples from rush's website with the equivalent command
in GNU parallel.

EXAMPLES

1. Simple run, quoting is not necessary

  $ seq 1 3 | rush echo {}

  $ seq 1 3 | parallel echo {}

2. Read data from file (`-i`)

  $ rush echo {} -i data1.txt -i data2.txt

  $ cat data1.txt data2.txt | parallel echo {}

3. Keep output order (`-k`)

  $ seq 1 3 | rush 'echo {}' -k

  $ seq 1 3 | parallel -k echo {}

4. Timeout (`-t`)

  $ time seq 1 | rush 'sleep 2; echo {}' -t 1

  $ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'

5. Retry (`-r`)

  $ seq 1 | rush 'python unexisted_script.py' -r 1

  $ seq 1 | parallel --retries 2 'python unexisted_script.py'

Use -u to see it is really run twice:

  $ seq 1 | parallel -u --retries 2 'python unexisted_script.py'

6. Dirname (`{/}`) and basename (`{%}`) and remove custom suffix
(`{^suffix}`)

  $ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'

  $ echo dir/file_1.txt.gz |
      parallel --plus echo {//} {/} {%_1.txt.gz}

7. Get basename, and remove last (`{.}`) or any (`{:}`) extension

  $ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'

  $ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'

8. Job ID, combine fields index and other replacement strings

  $ echo 12 file.txt dir/s_1.fq.gz |
      rush 'echo job {#}: {2} {2.} {3%:^_1}'

  $ echo 12 file.txt dir/s_1.fq.gz |
      parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'

9. Capture submatch using regular expression (`{@regexp}`)

  $ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'

  $ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'

10. Custom field delimiter (`-d`)

  $ echo a=b=c | rush 'echo {1} {2} {3}' -d =

  $ echo a=b=c | parallel -d = echo {1} {2} {3}

11. Send multi-lines to every command (`-n`)

  $ seq 5 | rush -n 2 -k 'echo "{}"; echo'

  $ seq 5 |
      parallel -n 2 -k \
        'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'

  $ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '

935 $ seq 5 | parallel -n 2 -k 'echo {}; echo'
936
937 12. Custom record delimiter (`-D`), note that empty records are not
938 used.
939
940 $ echo a b c d | rush -D " " -k 'echo {}'
941
942 $ echo a b c d | parallel -d " " -k 'echo {}'
943
944 $ echo abcd | rush -D "" -k 'echo {}'
945
946 Cannot be done by GNU Parallel
947
948 $ cat fasta.fa
949 >seq1
950 tag
951 >seq2
952 cat
953 gat
954 >seq3
955 attac
956 a
957 cat
958
959 $ cat fasta.fa | rush -D ">" \
960 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
961 # rush fails to join the multiline sequences
962
963 $ cat fasta.fa | (read -n1 ignore_first_char;
964 parallel -d '>' --colsep '\n' echo FASTA record {#}: \
965 name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
966 )
967
968 13. Assign value to variable, like `awk -v` (`-v`)
969
970 $ seq 1 |
971 rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
972
973 $ seq 1 |
974 parallel -N0 \
975 'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
976
977 $ for var in a b; do \
978 $ seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
979 $ done
980
981 In GNU parallel you would typically do:
982
983 $ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -
984
985 If you really want the var:
986
987 $ seq 1 3 |
988 parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -
989
990 If you really want the for-loop:
991
992 $ for var in a b; do
993 > export var;
994 > seq 1 3 | parallel -k 'echo var: $var, data: {}';
995 > done
996
Contrary to rush, this also works if the value is complex, like:
998
999 My brother's 12" records
1000
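For example, such a value can be fed as an input source as-is:

```shell
# Sketch: the complex string is just another ::: argument
seq 1 3 |
  parallel -k echo var: {1}, data: {2} ::: "My brother's 12\" records" :::: -
```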
1001 14. Preset variable (`-v`), avoid repeatedly writing verbose
1002 replacement strings
1003
1004 # naive way
1005 $ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
1006
1007 $ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'
1008
1009 # macro + removing suffix
1010 $ echo read_1.fq.gz |
1011 rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
1012
1013 $ echo read_1.fq.gz |
1014 parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'
1015
1016 # macro + regular expression
1017 $ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
1018
1019 $ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1020
Contrary to rush, GNU parallel works with complex values:
1022
1023 echo "My brother's 12\"read_1.fq.gz" |
1024 parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1025
1026 15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and
1027 exit.
1028
1029 $ seq 1 20 | rush 'sleep 1; echo {}'
1030 ^C
1031
1032 $ seq 1 20 | parallel 'sleep 1; echo {}'
1033 ^C
1034
16. Continue/resume jobs (`-c`). When some jobs fail (by execution
failure, timeout, or user cancellation with `Ctrl-C`), switch on the
flag `-c/--continue` and run again: `rush` saves the successful
commands and ignores them in the next run.
1039
1040 $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1041 $ cat successful_cmds.rush
1042 $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1043
1044 $ seq 1 3 | parallel --joblog mylog --timeout 2 \
1045 'sleep {}; echo {}'
1046 $ cat mylog
1047 $ seq 1 3 | parallel --joblog mylog --retry-failed \
1048 'sleep {}; echo {}'
1049
1050 Multi-line jobs:
1051
1052 $ seq 1 3 | rush 'sleep {}; echo {}; \
1053 echo finish {}' -t 3 -c -C finished.rush
1054 $ cat finished.rush
1055 $ seq 1 3 | rush 'sleep {}; echo {}; \
1056 echo finish {}' -t 3 -c -C finished.rush
1057
1058 $ seq 1 3 |
1059 parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
1060 echo finish {}'
1061 $ cat mylog
1062 $ seq 1 3 |
1063 parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
1064 echo finish {}'
1065
17. A comprehensive example: downloading 1K+ pages given by three URL
list files using `phantomjs save_page.js` (some page contents are
dynamically generated by JavaScript, so `wget` does not work). Here the
max number of jobs (`-j`) is set to `20`, each job has a max running
time (`-t`) of `60` seconds and `3` retry chances (`-r`). The continue
flag `-c` is also switched on, so unfinished jobs can be continued.
Luckily, it's accomplished in one run :)
1073
1074 $ for f in $(seq 2014 2016); do \
1075 $ /bin/rm -rf $f; mkdir -p $f; \
1076 $ cat $f.html.txt | rush -v d=$f -d = \
1077 'phantomjs save_page.js "{}" > {d}/{3}.html' \
1078 -j 20 -t 60 -r 3 -c; \
1079 $ done
1080
1081 GNU parallel can append to an existing joblog with '+':
1082
1083 $ rm mylog
1084 $ for f in $(seq 2014 2016); do
1085 /bin/rm -rf $f; mkdir -p $f;
1086 cat $f.html.txt |
1087 parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
1088 --colsep = \
1089 phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
1090 done
1091
1092 18. A bioinformatics example: mapping with `bwa`, and processing result
1093 with `samtools`:
1094
1095 $ ref=ref/xxx.fa
1096 $ threads=25
1097 $ ls -d raw.cluster.clean.mapping/* \
1098 | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
1099 'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
1100 samtools view -bS {p}.sam > {p}.bam; \
1101 samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
1102 samtools index {p}.sorted.bam; \
1103 samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
1104 /bin/rm {p}.bam {p}.sam;' \
1105 -j 2 --verbose -c -C mapping.rush
1106
1107 GNU parallel would use a function:
1108
1109 $ ref=ref/xxx.fa
1110 $ export ref
1111 $ thr=25
1112 $ export thr
1113 $ bwa_sam() {
1114 p="$1"
1115 bam="$p".bam
1116 sam="$p".sam
1117 sortbam="$p".sorted.bam
1118 bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
1119 samtools view -bS "$sam" > "$bam"
1120 samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
1121 samtools index "$sortbam"
1122 samtools flagstat "$sortbam" > "$sortbam".flagstat
1123 /bin/rm "$bam" "$sam"
1124 }
1125 $ export -f bwa_sam
1126 $ ls -d raw.cluster.clean.mapping/* |
1127 parallel -j 2 --verbose --joblog mylog bwa_sam
1128
1129 Other rush features
1130
1131 rush has:
1132
1133 · awk -v like custom defined variables (-v)
1134
With GNU parallel you would simply set a shell variable:
1136
1137 parallel 'v={}; echo "$v"' ::: foo
1138 echo foo | rush -v v={} 'echo {v}'
1139
Also, rush does not handle special characters. So these do not work:
1141
1142 echo does not work | rush -v v=\" 'echo {v}'
1143 echo "My brother's 12\" records" | rush -v v={} 'echo {v}'
1144
1145 Whereas the corresponding GNU parallel version works:
1146
1147 parallel 'v=\"; echo "$v"' ::: works
1148 parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"
1149
1150 · Exit on first error(s) (-e)
1151
1152 This is called --halt now,fail=1 (or shorter: --halt 2) when used
1153 with GNU parallel.
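A minimal sketch (each job's argument doubles as its exit code; after
the first failure no new jobs are started and running jobs are killed):

```shell
parallel --halt now,fail=1 'echo {}; exit {}' ::: 0 0 1 0 ||
  echo "parallel returned the failing job's exit status: $?"
```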
1154
1155 · Settable records sending to every command (-n, default 1)
1156
1157 This is also called -n in GNU parallel.
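For example (order kept with -k):

```shell
# Pass two records per command line
seq 6 | parallel -k -n 2 echo
# prints: 1 2
#         3 4
#         5 6
```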
1158
1159 · Practical replacement strings
1160
1161 {:} remove any extension
1162 With GNU parallel this can be emulated by:
1163
1164 parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz
1165
1166 {^suffix}, remove suffix
1167 With GNU parallel this can be emulated by:
1168
1169 parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz
1170
1171 {@regexp}, capture submatch using regular expression
1172 With GNU parallel this can be emulated by:
1173
1174 parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
1175 echo '{@\d_(.*).gz}' ::: 1_foo.gz
1176
1177 {%.}, {%:}, basename without extension
1178 With GNU parallel this can be emulated by:
1179
1180 parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz
1181
1182 And if you need it often, you define a --rpl in
1183 $HOME/.parallel/config:
1184
1185 --rpl '{%.} s:.*/::;s/\..*//'
1186 --rpl '{%:} s:.*/::;s/\..*//'
1187
1188 Then you can use them as:
1189
1190 parallel echo {%.} {%:} ::: dir/foo.bar.gz
1191
1192 · Preset variable (macro)
1193
1194 E.g.
1195
1196 echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'
1197
1198 With GNU parallel this can be emulated by:
1199
1200 echo foosuffix | parallel --plus 'p={%suffix}; echo ${p}_new_suffix'
1201
Unlike rush, GNU parallel works fine if the input contains double
space, ' and ":
1204
1205 echo "1'6\" foosuffix" |
1206 parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
1207
1208 · Commands of multi-lines
1209
While you can use multi-line commands in GNU parallel, GNU parallel
discourages the use of multi-line commands to improve readability.
In most cases they can be written as a function:
1213
1214 seq 1 3 | parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
1215 echo finish {}'
1216
1217 Could be written as:
1218
1219 doit() {
1220 sleep "$1"
1221 echo "$1"
1222 echo finish "$1"
1223 }
1224 export -f doit
1225 seq 1 3 | parallel --timeout 2 --joblog my.log doit
1226
1227 The failed commands can be resumed with:
1228
1229 seq 1 3 |
1230 parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
1231 echo finish {}'
1232
1233 https://github.com/shenwei356/rush
1234
1235 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
1236 ClusterSSH solves a different problem than GNU parallel.
1237
ClusterSSH opens a terminal window for each computer, and using a
master window you can run the same command on all the computers. This
is typically used for administering several computers that are almost
identical.
1242
1243 GNU parallel runs the same (or different) commands with different
1244 arguments in parallel possibly using remote computers to help
1245 computing. If more than one computer is listed in -S GNU parallel may
1246 only use one of these (e.g. if there are 8 jobs to be run and one
1247 computer has 8 cores).
1248
1249 GNU parallel can be used as a poor-man's version of ClusterSSH:
1250
1251 parallel --nonall -S server-a,server-b do_stuff foo bar
1252
1253 https://github.com/duncs/clusterssh
1254
1255 DIFFERENCES BETWEEN coshell AND GNU Parallel
1256 coshell only accepts full commands on standard input. Any quoting needs
1257 to be done by the user.
1258
1259 Commands are run in sh so any bash/tcsh/zsh specific syntax will not
1260 work.
1261
Output can be buffered by using -d. Output is buffered in memory, so
big output can cause swapping and therefore be terribly slow, or even
cause the system to run out of memory.
1265
1266 https://github.com/gdm85/coshell
1267
1268 DIFFERENCES BETWEEN spread AND GNU Parallel
1269 spread runs commands on all directories.
1270
1271 It can be emulated with GNU parallel using this Bash function:
1272
1273 spread() {
1274 _cmds() {
1275 perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
1276 }
1277 parallel $(_cmds "$@")'|| echo exit status $?' ::: */
1278 }
1279
This works except for the --exclude option.
1281
1282 DIFFERENCES BETWEEN pyargs AND GNU Parallel
1283 pyargs deals badly with input containing spaces. It buffers stdout, but
1284 not stderr. It buffers in RAM. {} does not work as replacement string.
1285 It does not support running functions.
1286
1287 pyargs does not support composed commands if run with --lines, and
1288 fails on pyargs traceroute gnu.org fsf.org.
1289
1290 Examples
1291
1292 seq 5 | pyargs -P50 -L seq
1293 seq 5 | parallel -P50 --lb seq
1294
1295 seq 5 | pyargs -P50 --mark -L seq
1296 seq 5 | parallel -P50 --lb \
1297 --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
1298 # Similar, but not precisely the same
1299 seq 5 | parallel -P50 --lb --tag seq
1300
1301 seq 5 | pyargs -P50 --mark command
1302 # Somewhat longer with GNU Parallel due to the special
1303 # --mark formatting
1304 cmd="$(echo "command" | parallel --shellquote)"
1305 wrap_cmd() {
1306 echo "MARK $cmd $@================================" >&3
1307 echo "OUTPUT START[$cmd $@]:"
1308 eval $cmd "$@"
1309 echo "OUTPUT END[$cmd $@]"
1310 }
1311 (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
1312 # Similar, but not exactly the same
1313 seq 5 | parallel -t --tag command
1314
1315 (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
1316 (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
1317 parallel -r -d' ' seq
1318 # Similar, but not exactly the same
1319 parallel seq ::: 1 2 3 4 5 6
1320
1321 https://github.com/robertblackwell/pyargs
1322
1323 DIFFERENCES BETWEEN concurrently AND GNU Parallel
1324 concurrently runs jobs in parallel.
1325
1326 The output is prepended with the job number, and may be incomplete:
1327
1328 $ concurrently 'seq 100000' | (sleep 3;wc -l)
1329 7165
1330
When pretty-printing, it caches output in memory. As test MIX below
shows, output mixes whether or not it is cached.
1333
1334 There seems to be no way of making a template command and have
1335 concurrently fill that with different args. The full commands must be
1336 given on the command line.
1337
1338 There is also no way of controlling how many jobs should be run in
1339 parallel at a time - i.e. "number of jobslots". Instead all jobs are
1340 simply started in parallel.
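For comparison, GNU parallel sets the number of jobslots with -j:

```shell
# At most 2 jobs run at a time; -j0 would mean "as many as possible"
seq 4 | parallel -j2 echo
```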
1341
1342 https://github.com/kimmobrunfeldt/concurrently
1343
1344 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
1345 map does not run jobs in parallel by default. The README suggests
1346 using:
1347
1348 ... | map t 'sleep $t && say done &'
1349
1350 But this fails if more jobs are run in parallel than the number of
1351 available processes. Since there is no support for parallelization in
1352 map itself, the output also mixes:
1353
1354 seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'
1355
The major difference is that GNU parallel is built for parallelization
1357 and map is not. So GNU parallel has lots of ways of dealing with the
1358 issues that parallelization raises:
1359
1360 · Keep the number of processes manageable
1361
1362 · Make sure output does not mix
1363
1364 · Make Ctrl-C kill all running processes
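A sketch of the first two points in one command: -j caps the number of
processes, and the default output grouping keeps each job's lines
together (-k additionally keeps input order):

```shell
seq 10 | parallel -j 4 -k 'echo start-{}; sleep 0.{}; echo end-{}'
# start-N is always directly followed by end-N, in input order
```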
1365
1366 Here are the 5 examples converted to GNU Parallel:
1367
1368 1$ ls *.c | map f 'foo $f'
1369 1$ ls *.c | parallel foo
1370
1371 2$ ls *.c | map f 'foo $f; bar $f'
1372 2$ ls *.c | parallel 'foo {}; bar {}'
1373
1374 3$ cat urls | map u 'curl -O $u'
1375 3$ cat urls | parallel curl -O
1376
1377 4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
1378 4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
4$ parallel 'sleep {} && say done' ::: 1 1 1
1380
1381 5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
1382 5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
1383 5$ parallel -j0 'sleep {} && say done' ::: 1 1 1
1384
1385 https://github.com/soveran/map
1386
1387 Todo
1388 Url for map, spread
1389
1390 machma. Requires Go >= 1.7.
1391
1392 https://github.com/k-bx/par requires Haskell to work. This limits the
1393 number of platforms this can work on.
1394
1395 https://github.com/otonvm/Parallel
1396
1397 https://github.com/flesler/parallel
1398
1399 https://github.com/kou1okada/lesser-parallel
1400
1401 https://github.com/Julian/Verge
1402
1403 https://github.com/amattn/paral
1404
1405 pyargs
1406
There are certain issues that are very common in parallelizing tools.
Here are a few stress tests. Be warned: If the tool is badly coded it
may overload your machine.
1411
1412 MIX: Output mixes
1413 Output from 2 jobs should not mix. If the output is not used, this does
1414 not matter; but if the output is used then it is important that you do
1415 not get half a line from one job followed by half a line from another
1416 job.
1417
1418 If the tool does not buffer, output will most likely mix now and then.
1419
1420 This test stresses whether output mixes.
1421
1422 #!/bin/bash
1423
1424 paralleltool="parallel -j0"
1425
1426 cat <<-EOF > mycommand
1427 #!/bin/bash
1428
1429 # If a, b, c, d, e, and f mix: Very bad
1430 perl -e 'print STDOUT "a"x3000_000," "'
1431 perl -e 'print STDERR "b"x3000_000," "'
1432 perl -e 'print STDOUT "c"x3000_000," "'
1433 perl -e 'print STDERR "d"x3000_000," "'
1434 perl -e 'print STDOUT "e"x3000_000," "'
1435 perl -e 'print STDERR "f"x3000_000," "'
1436 echo
1437 echo >&2
1438 EOF
1439 chmod +x mycommand
1440
1441 # Run 30 jobs in parallel
1442 seq 30 | $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
1443
1444 # 'a c e' and 'b d f' should always stay together
1445 # and there should only be a single line per job
1446
1447 RAM: Output limited by RAM
Some tools cache output in RAM. This makes them extremely slow if the
output is bigger than physical memory, and makes them crash if the
output is bigger than the virtual memory.
1451
1452 #!/bin/bash
1453
1454 paralleltool="parallel -j0"
1455
1456 cat <<'EOF' > mycommand
1457 #!/bin/bash
1458
1459 # Generate 1 GB output
1460 yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
1461 EOF
1462 chmod +x mycommand
1463
1464 # Run 20 jobs in parallel
1465 # Adjust 20 to be > physical RAM and < free space on /tmp
1466 seq 20 | time $paralleltool ./mycommand | wc -c
1467
1468 DISKFULL: Incomplete data if /tmp runs full
If caching is done on disk, the disk can run full during the run. Not
all programs discover this. GNU parallel discovers it if the disk stays
full for at least 2 seconds.
1472
1473 #!/bin/bash
1474
1475 paralleltool="parallel -j0"
1476
1477 # This should be a dir with less than 100 GB free space
1478 smalldisk=/tmp/shm/parallel
1479
1480 TMPDIR="$smalldisk"
1481 export TMPDIR
1482
1483 max_output() {
1484 # Force worst case scenario:
1485 # Make GNU Parallel only check once per second
1486 sleep 10
1487 # Generate 100 GB to fill $TMPDIR
1488 # Adjust if /tmp is bigger than 100 GB
1489 yes | head -c 100G >$TMPDIR/$$
1490 # Generate 10 MB output that will not be buffered due to full disk
1491 perl -e 'print "X"x10_000_000' | head -c 10M
1492 echo This part is missing from incomplete output
1493 sleep 2
1494 rm $TMPDIR/$$
1495 echo Final output
1496 }
1497
1498 export -f max_output
1499 seq 10 | $paralleltool max_output | tr -s X
1500
1501 CLEANUP: Leaving tmp files at unexpected death
Some tools do not clean up tmp files if they are killed. If the tool
buffers on disk, these files may be left behind after an unexpected
death.
1504
1505 #!/bin/bash
1506
1507 paralleltool=parallel
1508
1509 ls /tmp >/tmp/before
1510 seq 10 | $paralleltool sleep &
1511 pid=$!
1512 # Give the tool time to start up
1513 sleep 1
1514 # Kill it without giving it a chance to cleanup
kill -9 $pid
1516 # Should be empty: No files should be left behind
1517 diff <(ls /tmp) /tmp/before
1518
1519 SPCCHAR: Dealing badly with special file names.
1520 It is not uncommon for users to create files like:
1521
1522 My brother's 12" *** record (costs $$$).jpg
1523
1524 Some tools break on this.
1525
1526 #!/bin/bash
1527
1528 paralleltool=parallel
1529
1530 touch "My brother's 12\" *** record (costs \$\$\$).jpg"
1531 ls My*jpg | $paralleltool ls -l
1532
1533 COMPOSED: Composed commands do not work
1534 Some tools require you to wrap composed commands into bash -c.
1535
1536 echo bar | $paralleltool echo foo';' echo {}
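With GNU parallel the line above works directly, no bash -c wrapper
needed:

```shell
echo bar | parallel echo foo';' echo {}
# prints: foo
#         bar
```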
1537
1538 ONEREP: Only one replacement string allowed
1539 Some tools can only insert the argument once.
1540
1541 echo bar | $paralleltool echo {} foo {}
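GNU parallel replaces every occurrence of the replacement string:

```shell
echo bar | parallel echo {} foo {}
# prints: bar foo bar
```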
1542
1543 NUMWORDS: Speed depends on number of words
1544 Some tools become very slow if output lines have many words.
1545
1546 #!/bin/bash
1547
1548 paralleltool=parallel
1549
1550 cat <<-EOF > mycommand
1551 #!/bin/bash
1552
1553 # 10 MB of lines with 1000 words
1554 yes "`seq 1000`" | head -c 10M
1555 EOF
1556 chmod +x mycommand
1557
1558 # Run 30 jobs in parallel
1559 seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
1560
1562 When using GNU parallel for a publication please cite:
1563
1564 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
1565 The USENIX Magazine, February 2011:42-47.
1566
1567 This helps funding further development; and it won't cost you a cent.
1568 If you pay 10000 EUR you should feel free to use GNU Parallel without
1569 citing.
1570
1571 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
1572
1573 Copyright (C) 2008,2009,2010 Ole Tange, http://ole.tange.dk
1574
1575 Copyright (C) 2010,2011,2012,2013,2014,2015,2016,2017,2018 Ole Tange,
1576 http://ole.tange.dk and Free Software Foundation, Inc.
1577
1578 Parts of the manual concerning xargs compatibility is inspired by the
1579 manual of xargs from GNU findutils 4.4.2.
1580
1582 Copyright (C)
1583 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018 Free
1584 Software Foundation, Inc.
1585
1586 This program is free software; you can redistribute it and/or modify it
1587 under the terms of the GNU General Public License as published by the
1588 Free Software Foundation; either version 3 of the License, or at your
1589 option any later version.
1590
1591 This program is distributed in the hope that it will be useful, but
1592 WITHOUT ANY WARRANTY; without even the implied warranty of
1593 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1594 General Public License for more details.
1595
1596 You should have received a copy of the GNU General Public License along
1597 with this program. If not, see <http://www.gnu.org/licenses/>.
1598
1599 Documentation license I
1600 Permission is granted to copy, distribute and/or modify this
1601 documentation under the terms of the GNU Free Documentation License,
1602 Version 1.3 or any later version published by the Free Software
1603 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
1604 with no Back-Cover Texts. A copy of the license is included in the
1605 file fdl.txt.
1606
1607 Documentation license II
1608 You are free:
1609
1610 to Share to copy, distribute and transmit the work
1611
1612 to Remix to adapt the work
1613
1614 Under the following conditions:
1615
1616 Attribution
1617 You must attribute the work in the manner specified by the
1618 author or licensor (but not in any way that suggests that they
1619 endorse you or your use of the work).
1620
1621 Share Alike
1622 If you alter, transform, or build upon this work, you may
1623 distribute the resulting work only under the same, similar or
1624 a compatible license.
1625
1626 With the understanding that:
1627
1628 Waiver Any of the above conditions can be waived if you get
1629 permission from the copyright holder.
1630
1631 Public Domain
1632 Where the work or any of its elements is in the public domain
1633 under applicable law, that status is in no way affected by the
1634 license.
1635
1636 Other Rights
1637 In no way are any of the following rights affected by the
1638 license:
1639
1640 · Your fair dealing or fair use rights, or other applicable
1641 copyright exceptions and limitations;
1642
1643 · The author's moral rights;
1644
1645 · Rights other persons may have either in the work itself or
1646 in how the work is used, such as publicity or privacy
1647 rights.
1648
1649 Notice For any reuse or distribution, you must make clear to others
1650 the license terms of this work.
1651
1652 A copy of the full license is included in the file as cc-by-sa.txt.
1653
1655 GNU parallel uses Perl, and the Perl modules Getopt::Long, IPC::Open3,
1656 Symbol, IO::File, POSIX, and File::Temp. For remote usage it also uses
1657 rsync with ssh.
1658
1660 find(1), xargs(1), make(1), pexec(1), ppss(1), xjobs(1), prll(1),
1661 dxargs(1), mdm(1)
1662
1663
1664
20180322                         2018-03-21          PARALLEL_ALTERNATIVES(7)