PARALLEL_TUTORIAL(7)               parallel              PARALLEL_TUTORIAL(7)
2
3

GNU Parallel Tutorial
This tutorial shows off much of GNU parallel's functionality. The
tutorial is meant to teach the options and syntax of GNU parallel, not
to show realistic real-world examples.
9
10 Reader's guide
11 If you prefer reading a book buy GNU Parallel 2018 at
12 http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
13 or download it at: https://doi.org/10.5281/zenodo.1146014
14
15 Otherwise start by watching the intro videos for a quick introduction:
16 http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
17
18 Then browse through the EXAMPLEs after the list of OPTIONS in man
19 parallel (Use LESS=+/EXAMPLE: man parallel). That will give you an idea
20 of what GNU parallel is capable of.
21
22 If you want to dive even deeper: spend a couple of hours walking
23 through the tutorial (man parallel_tutorial). Your command line will
24 love you for it.
25
26 Finally you may want to look at the rest of the manual (man parallel)
27 if you have special needs not already covered.
28
29 If you want to know the design decisions behind GNU parallel, try: man
30 parallel_design. This is also a good intro if you intend to change GNU
31 parallel.

Prerequisites
34 To run this tutorial you must have the following:
35
36 parallel >= version 20160822
37 Install the newest version using your package manager
38 (recommended for security reasons), the way described in
39 README, or with this command:
40
41 $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
42 fetch -o - http://pi.dk/3 ) > install.sh
43 $ sha1sum install.sh
44 12345678 3374ec53 bacb199b 245af2dd a86df6c9
45 $ md5sum install.sh
46 029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
47 $ sha512sum install.sh
48 40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
49 60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
50 $ bash install.sh
51
52 This will also install the newest version of the tutorial
53 which you can see by running this:
54
55 man parallel_tutorial
56
57 Most of the tutorial will work on older versions, too.
58
59 abc-file:
60 The file can be generated by this command:
61
62 parallel -k echo ::: A B C > abc-file
63
64 def-file:
65 The file can be generated by this command:
66
67 parallel -k echo ::: D E F > def-file
68
69 abc0-file:
70 The file can be generated by this command:
71
72 perl -e 'printf "A\0B\0C\0"' > abc0-file
73
74 abc_-file:
75 The file can be generated by this command:
76
77 perl -e 'printf "A_B_C_"' > abc_-file
78
79 tsv-file.tsv
80 The file can be generated by this command:
81
82 perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv
83
84 num8 The file can be generated by this command:
85
86 perl -e 'for(1..8){print "$_\n"}' > num8
87
88 num128 The file can be generated by this command:
89
90 perl -e 'for(1..128){print "$_\n"}' > num128
91
92 num30000 The file can be generated by this command:
93
94 perl -e 'for(1..30000){print "$_\n"}' > num30000
95
96 num1000000
97 The file can be generated by this command:
98
99 perl -e 'for(1..1000000){print "$_\n"}' > num1000000
100
101 num_%header
102 The file can be generated by this command:
103
104 (echo %head1; echo %head2; \
105 perl -e 'for(1..10){print "$_\n"}') > num_%header
106
107 fixedlen The file can be generated by this command:
108
109 perl -e 'print "HHHHAAABBBCCC"' > fixedlen
110
111 For remote running: ssh login on 2 servers with no password in $SERVER1
112 and $SERVER2 must work.
113 SERVER1=server.example.com
114 SERVER2=server2.example.net
115
116 So you must be able to do this:
117
118 ssh $SERVER1 echo works
119 ssh $SERVER2 echo works
120
It can be set up by running 'ssh-keygen -t rsa; ssh-copy-id
$SERVER1' and using an empty passphrase.

Input sources
125 GNU parallel reads input from input sources. These can be files, the
126 command line, and stdin (standard input or a pipe).
127
128 A single input source
129 Input can be read from the command line:
130
131 parallel echo ::: A B C
132
133 Output (the order may be different because the jobs are run in
134 parallel):
135
136 A
137 B
138 C
139
140 The input source can be a file:
141
142 parallel -a abc-file echo
143
144 Output: Same as above.
145
146 STDIN (standard input) can be the input source:
147
148 cat abc-file | parallel echo
149
150 Output: Same as above.
151
152 Multiple input sources
153 GNU parallel can take multiple input sources given on the command line.
154 GNU parallel then generates all combinations of the input sources:
155
156 parallel echo ::: A B C ::: D E F
157
158 Output (the order may be different):
159
160 A D
161 A E
162 A F
163 B D
164 B E
165 B F
166 C D
167 C E
168 C F
169
170 The input sources can be files:
171
172 parallel -a abc-file -a def-file echo
173
174 Output: Same as above.
175
176 STDIN (standard input) can be one of the input sources using -:
177
178 cat abc-file | parallel -a - -a def-file echo
179
180 Output: Same as above.
181
Files can also be given after :::: instead of with -a:
183
184 cat abc-file | parallel echo :::: - def-file
185
186 Output: Same as above.
187
188 ::: and :::: can be mixed:
189
190 parallel echo ::: A B C :::: def-file
191
192 Output: Same as above.
193
194 Linking arguments from input sources
195
196 With --link you can link the input sources and get one argument from
197 each input source:
198
199 parallel --link echo ::: A B C ::: D E F
200
201 Output (the order may be different):
202
203 A D
204 B E
205 C F
206
207 If one of the input sources is too short, its values will wrap:
208
209 parallel --link echo ::: A B C D E ::: F G
210
211 Output (the order may be different):
212
213 A F
214 B G
215 C F
216 D G
217 E F
218
219 For more flexible linking you can use :::+ and ::::+. They work like
220 ::: and :::: except they link the previous input source to this input
221 source.
222
223 This will link ABC to GHI:
224
225 parallel echo :::: abc-file :::+ G H I :::: def-file
226
227 Output (the order may be different):
228
229 A G D
230 A G E
231 A G F
232 B H D
233 B H E
234 B H F
235 C I D
236 C I E
237 C I F
238
239 This will link GHI to DEF:
240
241 parallel echo :::: abc-file ::: G H I ::::+ def-file
242
243 Output (the order may be different):
244
245 A G D
246 A H E
247 A I F
248 B G D
249 B H E
250 B I F
251 C G D
252 C H E
253 C I F
254
255 If one of the input sources is too short when using :::+ or ::::+, the
256 rest will be ignored:
257
258 parallel echo ::: A B C D E :::+ F G
259
260 Output (the order may be different):
261
262 A F
263 B G
264
Changing the argument separator
266 GNU parallel can use other separators than ::: or ::::. This is
267 typically useful if ::: or :::: is used in the command to run:
268
269 parallel --arg-sep ,, echo ,, A B C :::: def-file
270
271 Output (the order may be different):
272
273 A D
274 A E
275 A F
276 B D
277 B E
278 B F
279 C D
280 C E
281 C F
282
283 Changing the argument file separator:
284
285 parallel --arg-file-sep // echo ::: A B C // def-file
286
287 Output: Same as above.
288
289 Changing the argument delimiter
290 GNU parallel will normally treat a full line as a single argument: It
291 uses \n as argument delimiter. This can be changed with -d:
292
293 parallel -d _ echo :::: abc_-file
294
295 Output (the order may be different):
296
297 A
298 B
299 C
300
301 NUL can be given as \0:
302
303 parallel -d '\0' echo :::: abc0-file
304
305 Output: Same as above.
306
307 A shorthand for -d '\0' is -0 (this will often be used to read files
308 from find ... -print0):
309
310 parallel -0 echo :::: abc0-file
311
312 Output: Same as above.
313
314 End-of-file value for input source
315 GNU parallel can stop reading when it encounters a certain value:
316
317 parallel -E stop echo ::: A B stop C D
318
319 Output:
320
321 A
322 B
323
324 Skipping empty lines
325 Using --no-run-if-empty GNU parallel will skip empty lines.
326
327 (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
328
329 Output:
330
331 1
332 2

Building the command line
335 No command means arguments are commands
336 If no command is given after parallel the arguments themselves are
337 treated as commands:
338
339 parallel ::: ls 'echo foo' pwd
340
341 Output (the order may be different):
342
343 [list of files in current dir]
344 foo
345 [/path/to/current/working/dir]
346
347 The command can be a script, a binary or a Bash function if the
348 function is exported using export -f:
349
350 # Only works in Bash
351 my_func() {
352 echo in my_func $1
353 }
354 export -f my_func
355 parallel my_func ::: 1 2 3
356
357 Output (the order may be different):
358
359 in my_func 1
360 in my_func 2
361 in my_func 3
362
363 Replacement strings
364 The 7 predefined replacement strings
365
366 GNU parallel has several replacement strings. If no replacement strings
367 are used the default is to append {}:
368
369 parallel echo ::: A/B.C
370
371 Output:
372
373 A/B.C
374
375 The default replacement string is {}:
376
377 parallel echo {} ::: A/B.C
378
379 Output:
380
381 A/B.C
382
383 The replacement string {.} removes the extension:
384
385 parallel echo {.} ::: A/B.C
386
387 Output:
388
389 A/B
390
391 The replacement string {/} removes the path:
392
393 parallel echo {/} ::: A/B.C
394
395 Output:
396
397 B.C
398
399 The replacement string {//} keeps only the path:
400
401 parallel echo {//} ::: A/B.C
402
403 Output:
404
405 A
406
407 The replacement string {/.} removes the path and the extension:
408
409 parallel echo {/.} ::: A/B.C
410
411 Output:
412
413 B
414
415 The replacement string {#} gives the job number:
416
417 parallel echo {#} ::: A B C
418
419 Output (the order may be different):
420
421 1
422 2
423 3
424
425 The replacement string {%} gives the job slot number (between 1 and
426 number of jobs to run in parallel):
427
428 parallel -j 2 echo {%} ::: A B C
429
430 Output (the order may be different and 1 and 2 may be swapped):
431
432 1
433 2
434 1
435
436 Changing the replacement strings
437
438 The replacement string {} can be changed with -I:
439
440 parallel -I ,, echo ,, ::: A/B.C
441
442 Output:
443
444 A/B.C
445
446 The replacement string {.} can be changed with --extensionreplace:
447
448 parallel --extensionreplace ,, echo ,, ::: A/B.C
449
450 Output:
451
452 A/B
453
The replacement string {/} can be changed with --basenamereplace:
455
456 parallel --basenamereplace ,, echo ,, ::: A/B.C
457
458 Output:
459
460 B.C
461
462 The replacement string {//} can be changed with --dirnamereplace:
463
464 parallel --dirnamereplace ,, echo ,, ::: A/B.C
465
466 Output:
467
468 A
469
470 The replacement string {/.} can be changed with
471 --basenameextensionreplace:
472
473 parallel --basenameextensionreplace ,, echo ,, ::: A/B.C
474
475 Output:
476
477 B
478
479 The replacement string {#} can be changed with --seqreplace:
480
481 parallel --seqreplace ,, echo ,, ::: A B C
482
483 Output (the order may be different):
484
485 1
486 2
487 3
488
489 The replacement string {%} can be changed with --slotreplace:
490
491 parallel -j2 --slotreplace ,, echo ,, ::: A B C
492
493 Output (the order may be different and 1 and 2 may be swapped):
494
495 1
496 2
497 1
498
499 Perl expression replacement string
500
When the predefined replacement strings are not flexible enough, a Perl
expression can be used instead. One example is removing two extensions:
foo.tar.gz becomes foo
504
505 parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz
506
507 Output:
508
509 foo
510
511 In {= =} you can access all of GNU parallel's internal functions and
512 variables. A few are worth mentioning.
513
514 total_jobs() returns the total number of jobs:
515
516 parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}
517
518 Output:
519
520 Job 1 of 5
521 Job 2 of 5
522 Job 3 of 5
523 Job 4 of 5
524 Job 5 of 5
525
526 Q(...) shell quotes the string:
527
528 parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'
529
530 Output:
531
532 */!#$ shell quoted is \*/\!\#\$
533
534 skip() skips the job:
535
536 parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}
537
538 Output:
539
540 1
541 2
542 4
543 5
544
545 @arg contains the input source variables:
546
547 parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
548 ::: {1..3} ::: {1..3}
549
550 Output:
551
552 1 2
553 1 3
554 2 1
555 2 3
556 3 1
557 3 2
558
559 If the strings {= and =} cause problems they can be replaced with
560 --parens:
561
562 parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
563 ::: foo.tar.gz
564
565 Output:
566
567 foo
568
569 To define a shorthand replacement string use --rpl:
570
571 parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
572 ::: foo.tar.gz
573
574 Output: Same as above.
575
576 If the shorthand starts with { it can be used as a positional
577 replacement string, too:
578
579 parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}'
580 ::: foo.tar.gz
581
582 Output: Same as above.
583
If the shorthand contains matching parentheses, the replacement string
becomes a dynamic replacement string, and the string in the parentheses
can be accessed as $$1. If there are multiple matching parentheses, the
matched strings can be accessed using $$2, $$3 and so on.
588
589 You can think of this as giving arguments to the replacement string.
590 Here we give the argument .tar.gz to the replacement string {%string}
591 which removes string:
592
593 parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz
594
595 Output:
596
597 foo.zip
598
599 Here we give the two arguments tar.gz and zip to the replacement string
600 {/string1/string2} which replaces string1 with string2:
601
602 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
603 ::: foo.tar.gz
604
605 Output:
606
607 foo.zip
608
GNU parallel's 7 predefined replacement strings are implemented like this:
610
611 --rpl '{} '
612 --rpl '{#} $_=$job->seq()'
613 --rpl '{%} $_=$job->slot()'
614 --rpl '{/} s:.*/::'
615 --rpl '{//} $Global::use{"File::Basename"} ||=
616 eval "use File::Basename; 1;"; $_ = dirname($_);'
617 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
618 --rpl '{.} s:\.[^/.]+$::'
619
620 Positional replacement strings
621
622 With multiple input sources the argument from the individual input
623 sources can be accessed with {number}:
624
625 parallel echo {1} and {2} ::: A B ::: C D
626
627 Output (the order may be different):
628
629 A and C
630 A and D
631 B and C
632 B and D
633
634 The positional replacement strings can also be modified using /, //,
635 /., and .:
636
637 parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F
638
639 Output (the order may be different):
640
641 /=B.C //=A /.=B .=A/B
642 /=E.F //=D /.=E .=D/E
643
If a position is negative, it will refer to the input source counted
from the end:
646
647 parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
648 ::: A B ::: C D ::: E F
649
650 Output (the order may be different):
651
652 1=A 2=C 3=E -1=E -2=C -3=A
653 1=A 2=C 3=F -1=F -2=C -3=A
654 1=A 2=D 3=E -1=E -2=D -3=A
655 1=A 2=D 3=F -1=F -2=D -3=A
656 1=B 2=C 3=E -1=E -2=C -3=B
657 1=B 2=C 3=F -1=F -2=C -3=B
658 1=B 2=D 3=E -1=E -2=D -3=B
659 1=B 2=D 3=F -1=F -2=D -3=B
660
661 Positional perl expression replacement string
662
To use a Perl expression as a positional replacement string, simply
prepend the Perl expression with the position number and a space:
665
666 parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
667 ::: bar ::: foo.tar.gz
668
669 Output:
670
671 foo bar
672
673 If a shorthand defined using --rpl starts with { it can be used as a
674 positional replacement string, too:
675
676 parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
677 ::: bar ::: foo.tar.gz
678
679 Output: Same as above.
680
681 Input from columns
682
683 The columns in a file can be bound to positional replacement strings
684 using --colsep. Here the columns are separated by TAB (\t):
685
686 parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
687
688 Output (the order may be different):
689
690 1=f1 2=f2
691 1=A 2=B
692 1=C 2=D
693
694 Header defined replacement strings
695
696 With --header GNU parallel will use the first value of the input source
697 as the name of the replacement string. Only the non-modified version {}
698 is supported:
699
700 parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D
701
702 Output (the order may be different):
703
704 f1=A f2=C
705 f1=A f2=D
706 f1=B f2=C
707 f1=B f2=D
708
709 It is useful with --colsep for processing files with TAB separated
710 values:
711
712 parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
713 :::: tsv-file.tsv
714
715 Output (the order may be different):
716
717 f1=A f2=B
718 f1=C f2=D
719
720 More pre-defined replacement strings with --plus
721
722 --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...}
723 {/..} {/...} {##}. The idea being that {+foo} matches the opposite of
724 {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
725 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.
726
727 parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
728 parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
729 parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
730 parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
731 parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
732 parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
733 parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
734 parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
735
736 Output:
737
738 dir/sub/file.ex1.ex2.ex3
739
740 {##} is simply the number of jobs:
741
742 parallel --plus echo Job {#} of {##} ::: {1..5}
743
744 Output:
745
746 Job 1 of 5
747 Job 2 of 5
748 Job 3 of 5
749 Job 4 of 5
750 Job 5 of 5
751
752 Dynamic replacement strings with --plus
753
754 --plus also defines these dynamic replacement strings:
755
756 {:-string} Default value is string if the argument is empty.
757
758 {:number} Substring from number till end of string.
759
760 {:number1:number2} Substring from number1 to number2.
761
762 {#string} If the argument starts with string, remove it.
763
764 {%string} If the argument ends with string, remove it.
765
766 {/string1/string2} Replace string1 with string2.
767
768 {^string} If the argument starts with string, upper case it.
769 string must be a single letter.
770
771 {^^string} If the argument contains string, upper case it.
772 string must be a single letter.
773
774 {,string} If the argument starts with string, lower case it.
775 string must be a single letter.
776
777 {,,string} If the argument contains string, lower case it.
778 string must be a single letter.
779
They are inspired by Bash:
781
782 unset myvar
783 echo ${myvar:-myval}
784 parallel --plus echo {:-myval} ::: "$myvar"
785
786 myvar=abcAaAdef
787 echo ${myvar:2}
788 parallel --plus echo {:2} ::: "$myvar"
789
790 echo ${myvar:2:3}
791 parallel --plus echo {:2:3} ::: "$myvar"
792
793 echo ${myvar#bc}
794 parallel --plus echo {#bc} ::: "$myvar"
795 echo ${myvar#abc}
796 parallel --plus echo {#abc} ::: "$myvar"
797
798 echo ${myvar%de}
799 parallel --plus echo {%de} ::: "$myvar"
800 echo ${myvar%def}
801 parallel --plus echo {%def} ::: "$myvar"
802
803 echo ${myvar/def/ghi}
804 parallel --plus echo {/def/ghi} ::: "$myvar"
805
806 echo ${myvar^a}
807 parallel --plus echo {^a} ::: "$myvar"
808 echo ${myvar^^a}
809 parallel --plus echo {^^a} ::: "$myvar"
810
811 myvar=AbcAaAdef
812 echo ${myvar,A}
813 parallel --plus echo '{,A}' ::: "$myvar"
814 echo ${myvar,,A}
815 parallel --plus echo '{,,A}' ::: "$myvar"
816
817 Output:
818
819 myval
820 myval
821 cAaAdef
822 cAaAdef
823 cAa
824 cAa
825 abcAaAdef
826 abcAaAdef
827 AaAdef
828 AaAdef
829 abcAaAdef
830 abcAaAdef
831 abcAaA
832 abcAaA
833 abcAaAghi
834 abcAaAghi
835 AbcAaAdef
836 AbcAaAdef
837 AbcAAAdef
838 AbcAAAdef
839 abcAaAdef
840 abcAaAdef
841 abcaaadef
842 abcaaadef
843
844 More than one argument
845 With --xargs GNU parallel will fit as many arguments as possible on a
846 single line:
847
848 cat num30000 | parallel --xargs echo | wc -l
849
850 Output (if you run this under Bash on GNU/Linux):
851
852 2
853
The 30000 arguments fit on 2 lines.
855
856 The maximal length of a single line can be set with -s. With a maximal
857 line length of 10000 chars 17 commands will be run:
858
859 cat num30000 | parallel --xargs -s 10000 echo | wc -l
860
861 Output:
862
863 17
864
865 For better parallelism GNU parallel can distribute the arguments
866 between all the parallel jobs when end of file is met.
867
868 Below GNU parallel reads the last argument when generating the second
869 job. When GNU parallel reads the last argument, it spreads all the
870 arguments for the second job over 4 jobs instead, as 4 parallel jobs
871 are requested.
872
873 The first job will be the same as the --xargs example above, but the
874 second job will be split into 4 evenly sized jobs, resulting in a total
875 of 5 jobs:
876
877 cat num30000 | parallel --jobs 4 -m echo | wc -l
878
879 Output (if you run this under Bash on GNU/Linux):
880
881 5
882
883 This is even more visible when running 4 jobs with 10 arguments. The 10
884 arguments are being spread over 4 jobs:
885
886 parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
887
888 Output:
889
890 1 2 3
891 4 5 6
892 7 8 9
893 10
894
895 A replacement string can be part of a word. -m will not repeat the
896 context:
897
898 parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G
899
900 Output (the order may be different):
901
902 pre-A B-post
903 pre-C D-post
904 pre-E F-post
905 pre-G-post
906
907 To repeat the context use -X which otherwise works like -m:
908
909 parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
910
911 Output (the order may be different):
912
913 pre-A-post pre-B-post
914 pre-C-post pre-D-post
915 pre-E-post pre-F-post
916 pre-G-post
917
918 To limit the number of arguments use -N:
919
920 parallel -N3 echo ::: A B C D E F G H
921
922 Output (the order may be different):
923
924 A B C
925 D E F
926 G H
927
928 -N also sets the positional replacement strings:
929
930 parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
931
932 Output (the order may be different):
933
934 1=A 2=B 3=C
935 1=D 2=E 3=F
936 1=G 2=H 3=
937
938 -N0 reads 1 argument but inserts none:
939
940 parallel -N0 echo foo ::: 1 2 3
941
942 Output:
943
944 foo
945 foo
946 foo
947
948 Quoting
949 Command lines that contain special characters may need to be protected
950 from the shell.
951
952 The perl program print "@ARGV\n" basically works like echo.
953
954 perl -e 'print "@ARGV\n"' A
955
956 Output:
957
958 A
959
960 To run that in parallel the command needs to be quoted:
961
962 parallel perl -e 'print "@ARGV\n"' ::: This wont work
963
964 Output:
965
966 [Nothing]
967
968 To quote the command use -q:
969
970 parallel -q perl -e 'print "@ARGV\n"' ::: This works
971
972 Output (the order may be different):
973
974 This
975 works
976
977 Or you can quote the critical part using \':
978
979 parallel perl -e \''print "@ARGV\n"'\' ::: This works, too
980
981 Output (the order may be different):
982
983 This
984 works,
985 too
986
987 GNU parallel can also \-quote full lines. Simply run this:
988
989 parallel --shellquote
990 Warning: Input is read from the terminal. You either know what you
991 Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
992 Warning: ::: or :::: or to pipe data into parallel. If so
993 Warning: consider going through the tutorial: man parallel_tutorial
994 Warning: Press CTRL-D to exit.
995 perl -e 'print "@ARGV\n"'
996 [CTRL-D]
997
998 Output:
999
1000 perl\ -e\ \'print\ \"@ARGV\\n\"\'
1001
1002 This can then be used as the command:
1003
1004 parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works
1005
1006 Output (the order may be different):
1007
1008 This
1009 also
1010 works
1011
1012 Trimming space
1013 Space can be trimmed on the arguments using --trim:
1014
1015 parallel --trim r echo pre-{}-post ::: ' A '
1016
1017 Output:
1018
1019 pre- A-post
1020
1021 To trim on the left side:
1022
1023 parallel --trim l echo pre-{}-post ::: ' A '
1024
1025 Output:
1026
1027 pre-A -post
1028
To trim on both sides:
1030
1031 parallel --trim lr echo pre-{}-post ::: ' A '
1032
1033 Output:
1034
1035 pre-A-post
1036
1037 Respecting the shell
1038 This tutorial uses Bash as the shell. GNU parallel respects which shell
1039 you are using, so in zsh you can do:
1040
1041 parallel echo \={} ::: zsh bash ls
1042
1043 Output:
1044
1045 /usr/bin/zsh
1046 /bin/bash
1047 /bin/ls
1048
1049 In csh you can do:
1050
1051 parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *
1052
1053 Output:
1054
1055 [somedir] is a dir
1056
1057 This also becomes useful if you use GNU parallel in a shell script: GNU
1058 parallel will use the same shell as the shell script.

Controlling the output
The output can be prefixed with the argument:
1062
1063 parallel --tag echo foo-{} ::: A B C
1064
1065 Output (the order may be different):
1066
1067 A foo-A
1068 B foo-B
1069 C foo-C
1070
1071 To prefix it with another string use --tagstring:
1072
1073 parallel --tagstring {}-bar echo foo-{} ::: A B C
1074
1075 Output (the order may be different):
1076
1077 A-bar foo-A
1078 B-bar foo-B
1079 C-bar foo-C
1080
1081 To see what commands will be run without running them use --dryrun:
1082
1083 parallel --dryrun echo {} ::: A B C
1084
1085 Output (the order may be different):
1086
1087 echo A
1088 echo B
1089 echo C
1090
To print the commands before running them, use --verbose:
1092
1093 parallel --verbose echo {} ::: A B C
1094
1095 Output (the order may be different):
1096
1097 echo A
1098 echo B
1099 A
1100 echo C
1101 B
1102 C
1103
1104 GNU parallel will postpone the output until the command completes:
1105
1106 parallel -j2 'printf "%s-start\n%s" {} {};
1107 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1108
1109 Output:
1110
1111 2-start
1112 2-middle
1113 2-end
1114 1-start
1115 1-middle
1116 1-end
1117 4-start
1118 4-middle
1119 4-end
1120
1121 To get the output immediately use --ungroup:
1122
1123 parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
1124 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1125
1126 Output:
1127
1128 4-start
1129 42-start
1130 2-middle
1131 2-end
1132 1-start
1133 1-middle
1134 1-end
1135 -middle
1136 4-end
1137
1138 --ungroup is fast, but can cause half a line from one job to be mixed
1139 with half a line of another job. That has happened in the second line,
1140 where the line '4-middle' is mixed with '2-start'.
1141
1142 To avoid this use --linebuffer:
1143
1144 parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
1145 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1146
1147 Output:
1148
1149 4-start
1150 2-start
1151 2-middle
1152 2-end
1153 1-start
1154 1-middle
1155 1-end
1156 4-middle
1157 4-end
1158
1159 To force the output in the same order as the arguments use
1160 --keep-order/-k:
1161
1162 parallel -j2 -k 'printf "%s-start\n%s" {} {};
1163 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1164
1165 Output:
1166
1167 4-start
1168 4-middle
1169 4-end
1170 2-start
1171 2-middle
1172 2-end
1173 1-start
1174 1-middle
1175 1-end
1176
1177 Saving output into files
1178 GNU parallel can save the output of each job into files:
1179
1180 parallel --files echo ::: A B C
1181
1182 Output will be similar to this:
1183
1184 /tmp/pAh6uWuQCg.par
1185 /tmp/opjhZCzAX4.par
1186 /tmp/W0AT_Rph2o.par
1187
1188 By default GNU parallel will cache the output in files in /tmp. This
1189 can be changed by setting $TMPDIR or --tmpdir:
1190
1191 parallel --tmpdir /var/tmp --files echo ::: A B C
1192
1193 Output will be similar to this:
1194
1195 /var/tmp/N_vk7phQRc.par
1196 /var/tmp/7zA4Ccf3wZ.par
1197 /var/tmp/LIuKgF_2LP.par
1198
1199 Or:
1200
1201 TMPDIR=/var/tmp parallel --files echo ::: A B C
1202
1203 Output: Same as above.
1204
1205 The output files can be saved in a structured way using --results:
1206
1207 parallel --results outdir echo ::: A B C
1208
1209 Output:
1210
1211 A
1212 B
1213 C
1214
1215 These files were also generated containing the standard output
1216 (stdout), standard error (stderr), and the sequence number (seq):
1217
1218 outdir/1/A/seq
1219 outdir/1/A/stderr
1220 outdir/1/A/stdout
1221 outdir/1/B/seq
1222 outdir/1/B/stderr
1223 outdir/1/B/stdout
1224 outdir/1/C/seq
1225 outdir/1/C/stderr
1226 outdir/1/C/stdout
1227
1228 --header : will take the first value as name and use that in the
1229 directory structure. This is useful if you are using multiple input
1230 sources:
1231
1232 parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
1233
1234 Generated files:
1235
1236 outdir/f1/A/f2/C/seq
1237 outdir/f1/A/f2/C/stderr
1238 outdir/f1/A/f2/C/stdout
1239 outdir/f1/A/f2/D/seq
1240 outdir/f1/A/f2/D/stderr
1241 outdir/f1/A/f2/D/stdout
1242 outdir/f1/B/f2/C/seq
1243 outdir/f1/B/f2/C/stderr
1244 outdir/f1/B/f2/C/stdout
1245 outdir/f1/B/f2/D/seq
1246 outdir/f1/B/f2/D/stderr
1247 outdir/f1/B/f2/D/stdout
1248
1249 The directories are named after the variables and their values.

Controlling the execution
1252 Number of simultaneous jobs
1253 The number of concurrent jobs is given with --jobs/-j:
1254
1255 /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
1256
1257 With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run -
1258 depending on how fast your machine is.
1259
1260 By default --jobs is the same as the number of CPU cores. So this:
1261
1262 /usr/bin/time parallel -N0 sleep 1 :::: num128
1263
1264 should take twice the time of running 2 jobs per CPU core:
1265
1266 /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1267
1268 --jobs 0 will run as many jobs in parallel as possible:
1269
1270 /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1271
1272 which should take 1-7 seconds depending on how fast your machine is.
1273
1274 --jobs can read from a file which is re-read when a job finishes:
1275
1276 echo 50% > my_jobs
1277 /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1278 sleep 1
1279 echo 0 > my_jobs
1280 wait
1281
For the first second only 50% of the CPU cores will run a job. Then 0
is written to my_jobs, and the rest of the jobs will be started in
parallel.
1285
1286 Instead of basing the percentage on the number of CPU cores GNU
1287 parallel can base it on the number of CPUs:
1288
1289 parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1290
1291 Shuffle job order
If you have many jobs (e.g. from multiple combinations of input
sources), it can be handy to shuffle the jobs, so that different values
are run early. Use --shuf for that:
1295
1296 parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1297
1298 Output:
1299
1300 All combinations but different order for each run.
1301
1302 Interactivity
1303 GNU parallel can ask the user if a command should be run using
1304 --interactive:
1305
1306 parallel --interactive echo ::: 1 2 3
1307
1308 Output:
1309
1310 echo 1 ?...y
1311 echo 2 ?...n
1312 1
1313 echo 3 ?...y
1314 3
1315
1316 GNU parallel can be used to put arguments on the command line for an
1317 interactive command such as emacs to edit one file at a time:
1318
1319 parallel --tty emacs ::: 1 2 3
1320
Or give multiple arguments in one go to open multiple files:
1322
1323 parallel -X --tty vi ::: 1 2 3
1324
1325 A terminal for every job
1326 Using --tmux GNU parallel can start a terminal for every job run:
1327
1328 seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1329
1330 This will tell you to run something similar to:
1331
1332 tmux -S /tmp/tmsrPrO0 attach
1333
1334 Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
1335 between windows of the running jobs. When a job is finished it will
1336 pause for 10 seconds before closing the window.
1337
1338 Timing
1339 Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1340 parallel can delay starting new jobs. --delay X will make sure there is
1341 at least X seconds between each start:
1342
1343 parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1344
1345 Output:
1346
1347 Starting 1
1348 Thu Aug 15 16:24:33 CEST 2013
1349 Starting 2
1350 Thu Aug 15 16:24:35 CEST 2013
1351 Starting 3
1352 Thu Aug 15 16:24:38 CEST 2013
1353
1354 If jobs taking more than a certain amount of time are known to fail,
1355 they can be stopped with --timeout. The accuracy of --timeout is 2
1356 seconds:
1357
1358 parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
1359
1360 Output:
1361
1362 2
1363 4
1364
1365 GNU parallel can compute the median runtime for jobs and kill those
1366 that take more than 200% of the median runtime:
1367
1368 parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
1369
1370 Output:
1371
1372 2.1
1373 2.2
1374 3
1375 2.3
1376
1377 Progress information
1378 Based on the runtime of completed jobs GNU parallel can estimate the
1379 total runtime:
1380
1381 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1382
1383 Output:
1384
1385 Computers / CPU cores / Max jobs to run
1386 1:local / 2 / 2
1387
1388 Computer:jobs running/jobs completed/%of started jobs/
1389 Average seconds to complete
1390 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1391
1392 GNU parallel can give progress information with --progress:
1393
1394 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1395
1396 Output:
1397
1398 Computers / CPU cores / Max jobs to run
1399 1:local / 2 / 2
1400
1401 Computer:jobs running/jobs completed/%of started jobs/
1402 Average seconds to complete
1403 local:0/9/100%/1.1s
1404
1405 A progress bar can be shown with --bar:
1406
1407 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1408
1409 And a graphical bar can be shown with --bar and zenity:
1410
1411 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1412 2> >(zenity --progress --auto-kill --auto-close)
1413
1414 A logfile of the jobs completed so far can be generated with --joblog:
1415
1416 parallel --joblog /tmp/log exit ::: 1 2 3 0
1417 cat /tmp/log
1418
1419 Output:
1420
1421 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1422 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1423 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1424 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1425 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1426
1427 The log contains the job sequence, which host the job was run on, the
1428 start time and run time, how much data was transferred, the exit value,
1429 the signal that killed the job, and finally the command being run.
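
 The fixed column layout makes the joblog easy to post-process with
 standard tools. A sketch (assuming the joblog from above is in
 /tmp/log and that the commands contain no tab characters):

```shell
# Print the command (column 9) of every failed job.
# Columns are tab-separated; column 7 is Exitval; line 1 is the header.
awk -F'\t' 'NR > 1 && $7 != 0 { print $9 }' /tmp/log
```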
1430
1431 With a joblog GNU parallel can be stopped and later pick up where it
1432 left off. It is important that the input of the completed jobs is
1433 unchanged.
1434
1435 parallel --joblog /tmp/log exit ::: 1 2 3 0
1436 cat /tmp/log
1437 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1438 cat /tmp/log
1439
1440 Output:
1441
1442 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1443 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1444 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1445 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1446 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1447
1448 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1449 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1450 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1451 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1452 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1453 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1454 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1455
1456 Note how the start time of the last 2 jobs shows that they were
1457 clearly started in the second run.
1458
1459 With --resume-failed GNU parallel will re-run the jobs that failed:
1460
1461 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1462 cat /tmp/log
1463
1464 Output:
1465
1466 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1467 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1468 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1469 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1470 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1471 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1472 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1473 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1474 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1475 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1476
1477 Note how Seq 1, 2, and 3 have been rerun because they had exit values
1478 different from 0.
1479
1480 --retry-failed does almost the same as --resume-failed. Where
1481 --resume-failed reads the commands from the command line (and ignores
1482 the commands in the joblog), --retry-failed ignores the command line
1483 and reruns the commands mentioned in the joblog.
1484
1485 parallel --retry-failed --joblog /tmp/log
1486 cat /tmp/log
1487
1488 Output:
1489
1490 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1491 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1492 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1493 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1494 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1495 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1496 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1497 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1498 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1499 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1500 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1501 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1502 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1503
1504 Termination
1505 Unconditional termination
1506
1507 By default GNU parallel will wait for all jobs to finish before
1508 exiting.
1509
1510 If you send GNU parallel the TERM signal, GNU parallel will stop
1511 spawning new jobs and wait for the remaining jobs to finish. If you
1512 send GNU parallel the TERM signal again, GNU parallel will kill all
1513 running jobs and exit.
1514
1515 Termination dependent on job status
1516
1517 For certain jobs there is no need to continue if one of the jobs fails
1518 and has an exit code different from 0. GNU parallel will stop spawning
1519 new jobs with --halt soon,fail=1:
1520
1521 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1522
1523 Output:
1524
1525 0
1526 0
1527 1
1528 parallel: This job failed:
1529 echo 1; exit 1
1530 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1531 2
1532
1533 With --halt now,fail=1 the running jobs will be killed immediately:
1534
1535 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1536
1537 Output:
1538
1539 0
1540 0
1541 1
1542 parallel: This job failed:
1543 echo 1; exit 1
1544
1545 If --halt is given a percentage this percentage of the jobs must fail
1546 before GNU parallel stops spawning more jobs:
1547
1548 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1549 ::: 0 1 2 3 4 5 6 7 8 9
1550
1551 Output:
1552
1553 0
1554 1
1555 parallel: This job failed:
1556 echo 1; exit 1
1557 2
1558 parallel: This job failed:
1559 echo 2; exit 2
1560 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1561 3
1562 parallel: This job failed:
1563 echo 3; exit 3
1564
1565 If you are looking for success instead of failures, you can use
1566 success. This will finish as soon as the first job succeeds:
1567
1568 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1569
1570 Output:
1571
1572 1
1573 2
1574 3
1575 0
1576 parallel: This job succeeded:
1577 echo 0; exit 0
1578
1579 GNU parallel can retry the command with --retries. This is useful if a
1580 command fails for unknown reasons now and then.
1581
1582 parallel -k --retries 3 \
1583 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1584 cat /tmp/runs
1585
1586 Output:
1587
1588 completed 1
1589 completed 2
1590 completed 0
1591
1592 tried 1
1593 tried 2
1594 tried 1
1595 tried 2
1596 tried 1
1597 tried 2
1598 tried 0
1599
1600 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried
1601 because it had exit code 0.
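
 The attempts recorded in /tmp/runs can be tallied with standard
 tools; a small sketch:

```shell
# Count how many times each argument was tried:
sort /tmp/runs | uniq -c
```

 With the run above this shows 3 attempts for 1 and 2 and a single
 attempt for 0.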
1602
1603 Termination signals (advanced)
1604
1605 Using --termseq you can control which signals are sent when killing
1606 children. Normally children will be killed by sending them SIGTERM,
1607 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1608 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1609 giving up. It looks like this:
1610
1611 show_signals() {
1612 perl -e 'for(keys %SIG) {
1613 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1614 }
1615 while(1){sleep 1}'
1616 }
1617 export -f show_signals
1618 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1619 -u --timeout 1 show_signals
1620
1621 Output:
1622
1623 Got TERM
1624 Got TERM
1625 Got TERM
1626
1627 Or just:
1628
1629 echo | parallel -u --timeout 1 show_signals
1630
1631 Output: Same as above.
1632
1633 You can change this to SIGINT, SIGTERM, SIGKILL:
1634
1635 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1636 -u --timeout 1 show_signals
1637
1638 Output:
1639
1640 Got INT
1641 Got TERM
1642
1643 The SIGKILL does not show because it cannot be caught, and thus the
1644 child dies.
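
 The difference between a catchable TERM and an uncatchable KILL can
 be seen without GNU parallel; a minimal bash sketch (exit status 137
 means "killed by signal 9"):

```shell
# TERM can be trapped and handled; KILL terminates unconditionally.
bash -c 'trap "echo caught TERM" TERM
         kill -TERM $$       # the trap fires and execution continues
         echo still alive
         kill -KILL $$       # no handler can run; the process dies here
         echo never printed'
echo "exit status: $?"
```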
1645
1646 Limiting the resources
1647 To avoid overloading systems GNU parallel can look at the system load
1648 before starting another job:
1649
1650 parallel --load 100% echo load is less than {} job per cpu ::: 1
1651
1652 Output:
1653
1654 [when the load is less than the number of cpu cores]
1655 load is less than 1 job per cpu
1656
1657 GNU parallel can also check if the system is swapping.
1658
1659 parallel --noswap echo the system is not swapping ::: now
1660
1661 Output:
1662
1663 [when the system is not swapping]
1664 the system is not swapping now
1665
1666 Some jobs need a lot of memory, and should only be started when there
1667 is enough memory free. Using --memfree GNU parallel can check if there
1668 is enough memory free. Additionally, GNU parallel will kill off the
1669 youngest job if the free memory falls below 50% of the given size. The
1670 killed job will be put back on the queue and retried later.
1671
1672 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1673
1674 GNU parallel can run the jobs with a nice value. This will work both
1675 locally and remotely.
1676
1677 parallel --nice 17 echo this is being run with nice -n ::: 17
1678
1679 Output:
1680
1681 this is being run with nice -n 17
1682
1684 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1685 with the remote machines.
1686
1687 Sshlogin
1688 The most basic sshlogin is -S host:
1689
1690 parallel -S $SERVER1 echo running on ::: $SERVER1
1691
1692 Output:
1693
1694 running on [$SERVER1]
1695
1696 To use a different username prepend the server with username@:
1697
1698 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1699
1700 Output:
1701
1702 running on [username@$SERVER1]
1703
1704 The special sshlogin : is the local machine:
1705
1706 parallel -S : echo running on ::: the_local_machine
1707
1708 Output:
1709
1710 running on the_local_machine
1711
1712 If ssh is not in $PATH it can be prepended to $SERVER1:
1713
1714 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1715
1716 Output:
1717
1718 custom ssh
1719
1720 The ssh command can also be given using --ssh:
1721
1722 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1723
1724 or by setting $PARALLEL_SSH:
1725
1726 export PARALLEL_SSH=/usr/bin/ssh
1727 parallel -S $SERVER1 echo custom ::: ssh
1728
1729 Several servers can be given using multiple -S:
1730
1731 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1732
1733 Output (the order may be different):
1734
1735 running
1736 on
1737 more
1738 hosts
1739
1740 Or they can be separated by ,:
1741
1742 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1743
1744 Output: Same as above.
1745
1746 Or newline:
1747
1748 # This gives a \n between $SERVER1 and $SERVER2
1749 SERVERS="`echo $SERVER1; echo $SERVER2`"
1750 parallel -S "$SERVERS" echo ::: running on more hosts
1751
1752 They can also be read from a file (replace user@ with the user on
1753 $SERVER2):
1754
1755 echo $SERVER1 > nodefile
1756 # Force 4 cores, special ssh-command, username
1757 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1758 parallel --sshloginfile nodefile echo ::: running on more hosts
1759
1760 Output: Same as above.
1761
1762 Every time a job finishes, the --sshloginfile will be re-read, so it is
1763 possible to both add and remove hosts while running.
1764
1765 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1766
1767 To force GNU parallel to treat a server as having a given number of
1768 CPU cores, prepend the number of cores followed by / to the sshlogin:
1769
1770 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1771
1772 Output:
1773
1774 force 4 cpus on server
1775
1776 Servers can be put into groups by prepending @groupname to the server
1777 and the group can then be selected by appending @groupname to the
1778 argument when using --hostgroup:
1779
1780 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1781 ::: run_on_grp1@grp1 run_on_grp2@grp2
1782
1783 Output:
1784
1785 run_on_grp1
1786 run_on_grp2
1787
1788 A host can be in multiple groups by separating the groups with +, and
1789 you can force GNU parallel to limit the groups on which the command can
1790 be run with -S @groupname:
1791
1792 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1793 ::: run_on_grp1 also_grp1
1794
1795 Output:
1796
1797 run_on_grp1
1798 also_grp1
1799
1800 Transferring files
1801 GNU parallel can transfer the files to be processed to the remote host.
1802 It does that using rsync.
1803
1804 echo This is input_file > input_file
1805 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1806
1807 Output:
1808
1809 This is input_file
1810
1811 If the files are processed into another file, the resulting file can be
1812 transferred back:
1813
1814 echo This is input_file > input_file
1815 parallel -S $SERVER1 --transferfile {} --return {}.out \
1816 cat {} ">"{}.out ::: input_file
1817 cat input_file.out
1818
1819 Output: Same as above.
1820
1821 To remove the input and output file on the remote server use --cleanup:
1822
1823 echo This is input_file > input_file
1824 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1825 cat {} ">"{}.out ::: input_file
1826 cat input_file.out
1827
1828 Output: Same as above.
1829
1830 There is a shorthand for --transferfile {} --return --cleanup called
1831 --trc:
1832
1833 echo This is input_file > input_file
1834 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1835 cat input_file.out
1836
1837 Output: Same as above.
1838
1839 Some jobs need a common database for all jobs. GNU parallel can
1840 transfer that using --basefile which will transfer the file before the
1841 first job:
1842
1843 echo common data > common_file
1844 parallel --basefile common_file -S $SERVER1 \
1845 cat common_file\; echo {} ::: foo
1846
1847 Output:
1848
1849 common data
1850 foo
1851
1852 To remove it from the remote host after the last job use --cleanup.
1853
1854 Working dir
1855 The default working dir on the remote machines is the login dir. This
1856 can be changed with --workdir mydir.
1857
1858 Files transferred using --transferfile and --return will be relative to
1859 mydir on remote computers, and the command will be executed in the dir
1860 mydir.
1861
1862 The special mydir value ... will create working dirs under
1863 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1864 dirs will be removed.
1865
1866 The special mydir value . uses the current working dir. If the current
1867 working dir is beneath your home dir, the value . is treated as the
1868 relative path to your home dir. This means that if your home dir is
1869 different on remote computers (e.g. if your login is different) the
1870 relative path will still be relative to your home dir.
1871
1872 parallel -S $SERVER1 pwd ::: ""
1873 parallel --workdir . -S $SERVER1 pwd ::: ""
1874 parallel --workdir ... -S $SERVER1 pwd ::: ""
1875
1876 Output:
1877
1878 [the login dir on $SERVER1]
1879 [current dir relative on $SERVER1]
1880 [a dir in ~/.parallel/tmp/...]
1881
1882 Avoid overloading sshd
1883 If many jobs are started on the same server, sshd can be overloaded.
1884 GNU parallel can insert a delay between each job run on the same
1885 server:
1886
1887 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1888
1889 Output (the order may be different):
1890
1891 1
1892 2
1893 3
1894
1895 sshd will be less overloaded if using --controlmaster, which will
1896 multiplex ssh connections:
1897
1898 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1899
1900 Output: Same as above.
1901
1902 Ignore hosts that are down
1903 In clusters with many hosts a few of them are often down. GNU parallel
1904 can ignore those hosts. In this case the host 173.194.32.46 is down:
1905
1906 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1907
1908 Output:
1909
1910 bar
1911
1912 Running the same commands on all hosts
1913 GNU parallel can run the same command on all the hosts:
1914
1915 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1916
1917 Output (the order may be different):
1918
1919 foo
1920 bar
1921 foo
1922 bar
1923
1924 Often you will just want to run a single command on all hosts without
1925 arguments. --nonall is a no-argument --onall:
1926
1927 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1928
1929 Output:
1930
1931 foo bar
1932 foo bar
1933
1934 When --tag is used with --nonall and --onall the --tagstring is the
1935 host:
1936
1937 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1938
1939 Output (the order may be different):
1940
1941 $SERVER1 foo bar
1942 $SERVER2 foo bar
1943
1944 --jobs sets the number of servers to log in to in parallel.
1945
1946 Transferring environment variables and functions
1947 env_parallel is a shell function that transfers all aliases, functions,
1948 variables, and arrays. You activate it by running:
1949
1950 source `which env_parallel.bash`
1951
1952 Replace bash with the shell you use.
1953
1954 Now you can use env_parallel instead of parallel and still have your
1955 environment:
1956
1957 alias myecho=echo
1958 myvar="Joe's var is"
1959 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1960
1961 Output:
1962
1963 Joe's var is green
1964
1965 The disadvantage is that if your environment is huge env_parallel will
1966 fail.
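
 If you suspect your environment is too big, the byte size of the
 exported environment gives a rough lower bound (env_parallel also
 serializes non-exported variables, functions, and aliases, so the
 real payload is larger):

```shell
# Approximate size in bytes of the exported environment:
env | wc -c
```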
1967
1968 When env_parallel fails, you can still use --env to tell GNU parallel
1969 to transfer an environment variable to the remote system.
1970
1971 MYVAR='foo bar'
1972 export MYVAR
1973 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1974
1975 Output:
1976
1977 foo bar baz
1978
1979 This works for functions, too, if your shell is Bash:
1980
1981 # This only works in Bash
1982 my_func() {
1983 echo in my_func $1
1984 }
1985 export -f my_func
1986 parallel --env my_func -S $SERVER1 my_func ::: baz
1987
1988 Output:
1989
1990 in my_func baz
1991
1992 GNU parallel can copy all user defined variables and functions to the
1993 remote system. It just needs to record which ones to ignore in
1994 ~/.parallel/ignored_vars. Do that by running this once:
1995
1996 parallel --record-env
1997 cat ~/.parallel/ignored_vars
1998
1999 Output:
2000
2001 [list of variables to ignore - including $PATH and $HOME]
2002
2003 Now all other variables and functions defined will be copied when using
2004 --env _.
2005
2006 # The function is only copied if using Bash
2007 my_func2() {
2008 echo in my_func2 $VAR $1
2009 }
2010 export -f my_func2
2011 VAR=foo
2012 export VAR
2013
2014 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2015
2016 Output:
2017
2018 foo
2019 in my_func2 foo bar
2020
2021 If you use env_parallel the variables, functions, and aliases do not
2022 even need to be exported to be copied:
2023
2024 NOT='not exported var'
2025 alias myecho=echo
2026 not_ex() {
2027 myecho in not_exported_func $NOT $1
2028 }
2029 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2030
2031 Output:
2032
2033 not exported var
2034 in not_exported_func not exported var bar
2035
2036 Showing what is actually run
2037 --verbose will show the command that would be run on the local machine.
2038
2039 When using --cat, --pipepart, or when a job is run on a remote machine,
2040 the command is wrapped with helper scripts. -vv shows all of this.
2041
2042 parallel -vv --pipepart --block 1M wc :::: num30000
2043
2044 Output:
2045
2046 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2047 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2048 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2049 0 0 0 168894 | (wc)
2050 30000 30000 168894
2051
2052 When the command gets more complex, the output becomes so hard to read
2053 that it is only useful for debugging:
2054
2055 my_func3() {
2056 echo in my_func $1 > $1.out
2057 }
2058 export -f my_func3
2059 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2060 -S $SERVER1 my_func3 {} ::: abc-file
2061
2062 Output will be similar to:
2063
2064 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2065 --protocol 30 -rlDzR -essh ./abc-file
2066 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2067 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2068 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2069 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2070 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2071 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2072 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2073 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2074 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2075 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2076 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2077 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2078 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2079 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2080 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2081 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2082 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2083 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2084 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2085 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2086 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2087 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2088 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2089 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2090 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2091 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2092 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2093 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2094 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2095 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2096 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2097 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2098
2100 GNU parset will set shell variables to the output of GNU parallel. GNU
2101 parset has one important limitation: It cannot be part of a pipe. In
2102 particular this means it cannot read anything from standard input
2103 (stdin) or pipe output to another program.
2104
2105 To use GNU parset, prepend the command with the destination variables:
2106
2107 parset myvar1,myvar2 echo ::: a b
2108 echo $myvar1
2109 echo $myvar2
2110
2111 Output:
2112
2113 a
2114 b
2115
2116 If you only give a single variable, it will be treated as an array:
2117
2118 parset myarray seq {} 5 ::: 1 2 3
2119 echo "${myarray[1]}"
2120
2121 Output:
2122
2123 2
2124 3
2125 4
2126 5
2127
2128 The commands to run can be an array:
2129
2130 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2131 parset data ::: "${cmd[@]}"
2132 echo "${data[0]}"
2133 echo "${data[1]}"
2134
2135 Output:
2136
2137 <<joe "double space" cartoon>>
2138 [current dir]
2139
2141 GNU parallel can save into an SQL base. Point GNU parallel to a table
2142 and it will put the joblog there together with the variables and the
2143 output each in their own column.
2144
2145 CSV as SQL base
2146 The simplest is to use a CSV file as the storage table:
2147
2148 parallel --sqlandworker csv:////%2Ftmp%2Flog.csv \
2149 seq ::: 10 ::: 12 13 14
2150 cat /tmp/log.csv
2151
2152 Note how '/' in the path must be written as %2F.
2153
2154 Output will be similar to:
2155
2156 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2157 Command,V1,V2,Stdout,Stderr
2158 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2159 11
2160 12
2161 ",
2162 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2163 11
2164 12
2165 13
2166 ",
2167 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2168 11
2169 12
2170 13
2171 14
2172 ",
2173
2174 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2175 format correctly - even with fields containing newlines as above.
2176
2177 If the output is big you may want to put it into files using --results:
2178
2179 parallel --results outdir --sqlandworker csv:////%2Ftmp%2Flog2.csv \
2180 seq ::: 10 ::: 12 13 14
2181 cat /tmp/log2.csv
2182
2183 Output will be similar to:
2184
2185 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2186 Command,V1,V2,Stdout,Stderr
2187 1,:,1458824738.287,0.029,0,9,0,0,
2188 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2189 2,:,1458824738.298,0.025,0,12,0,0,
2190 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2191 3,:,1458824738.309,0.026,0,15,0,0,
2192 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2193
2194 DBURL as table
2195 The CSV file is an example of a DBURL.
2196
2197 GNU parallel uses a DBURL to address the table. A DBURL has this
2198 format:
2199
2200 vendor://[[user][:password]@][host][:port]/[database[/table]]
2201
2202 Example:
2203
2204 mysql://scott:tiger@my.example.com/mydatabase/mytable
2205 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2206 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2207 csv:////%2Ftmp%2Flog.csv
2208
2209 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2210 as %2F.
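
 In bash the encoding can be done with a parameter expansion; a sketch
 (the ${var//pat/repl} substitution is bash-specific):

```shell
# Percent-encode every / in a path for a sqlite3/csv DBURL:
path=/tmp/mydatabase
enc=${path//\//%2F}             # replace all / with %2F
echo "sqlite3:///$enc/mytable"  # sqlite3:///%2Ftmp%2Fmydatabase/mytable
```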
2211
2212 Run a job using sqlite on mytable in /tmp/mydatabase:
2213
2214 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2215 DBURLTABLE=$DBURL/mytable
2216 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2217
2218 To see the result:
2219
2220 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2221
2222 Output will be similar to:
2223
2224 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2225 Command|V1|V2|Stdout|Stderr
2226 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2227 |
2228 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2229 |
2230 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2231 |
2232 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2233 |
2234
2235 The first columns are well known from --joblog. V1 and V2 are data from
2236 the input sources. Stdout and Stderr are standard output and standard
2237 error, respectively.
2238
2239 Using multiple workers
2240 Using an SQL base as storage costs overhead on the order of 1 second
2241 per job.
2242
2243 One of the situations where it makes sense is if you have multiple
2244 workers.
2245
2246 You can then have a single master machine that submits jobs to the SQL
2247 base (but does not do any of the work):
2248
2249 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2250
2251 On the worker machines you run exactly the same command except you
2252 replace --sqlmaster with --sqlworker.
2253
2254 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2255
2256 To run a master and a worker on the same machine use --sqlandworker as
2257 shown earlier.
2258
2260 The --pipe functionality puts GNU parallel in a different mode: Instead
2261 of treating the data on stdin (standard input) as arguments for a
2262 command to run, the data will be sent to stdin (standard input) of the
2263 command.
2264
2265 The typical situation is:
2266
2267 command_A | command_B | command_C
2268
2269 where command_B is slow, and you want to speed up command_B.
2270
2271 Chunk size
2272 By default GNU parallel will start an instance of command_B, read a
2273 chunk of 1 MB, and pass that to the instance. Then start another
2274 instance, read another chunk, and pass that to the second instance.
2275
2276 cat num1000000 | parallel --pipe wc
2277
2278 Output (the order may be different):
2279
2280 165668 165668 1048571
2281 149797 149797 1048579
2282 149796 149796 1048572
2283 149797 149797 1048579
2284 149797 149797 1048579
2285 149796 149796 1048572
2286 85349 85349 597444
2287
2288 The size of the chunk is not exactly 1 MB because GNU parallel only
2289 passes full lines - never half a line, thus the blocksize is only 1 MB
2290 on average. You can change the block size to 2 MB with --block:
2291
2292 cat num1000000 | parallel --pipe --block 2M wc
2293
2294 Output (the order may be different):
2295
2296 315465 315465 2097150
2297 299593 299593 2097151
2298 299593 299593 2097151
2299 85349 85349 597444
2300
2301 GNU parallel treats each line as a record. If the order of records is
2302 unimportant (e.g. you need all lines processed, but you do not care
2303 which is processed first), then you can use --roundrobin. Without
2304 --roundrobin GNU parallel will start a command per block; with
2305 --roundrobin only the requested number of jobs will be started
2306 (--jobs). The records will then be distributed between the running
2307 jobs:
2308
2309 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2310
2311 Output will be similar to:
2312
2313 149797 149797 1048579
2314 299593 299593 2097151
2315 315465 315465 2097150
2316 235145 235145 1646016
2317
2318 One of the 4 instances got a single record, 2 instances got 2 full
2319 records each, and one instance got 1 full and 1 partial record.
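
 A quick sanity check that --roundrobin lost no lines is to sum the
 per-job line counts; a sketch assuming the wc output above was saved
 to /tmp/wc.out:

```shell
# The line counts of the 4 jobs should add up to the 1000000 input lines:
awk '{ total += $1 } END { print total }' /tmp/wc.out
```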
2320
2321 Records
2322 GNU parallel sees the input as records. The default record is a single
2323 line.
2324
2325 Using -N140000 GNU parallel will read 140000 records at a time:
2326
2327 cat num1000000 | parallel --pipe -N140000 wc
2328
2329 Output (the order may be different):
2330
2331 140000 140000 868895
2332 140000 140000 980000
2333 140000 140000 980000
2334 140000 140000 980000
2335 140000 140000 980000
2336 140000 140000 980000
2337 140000 140000 980000
2338 20000 20000 140001
2339
2340 Note that the last job could not get the full 140000 lines, but
2341 only 20000 lines.
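
 The split follows directly from the arithmetic: 1000000 lines in
 records of 140000 leaves a remainder for the last job. A shell
 sketch:

```shell
# 1000000 lines in records of 140000: 7 full records, 20000 lines left over
echo "$(( 1000000 / 140000 )) full records, $(( 1000000 % 140000 )) lines left"
```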
2342
2343 If a record is 75 lines -L can be used:
2344
2345 cat num1000000 | parallel --pipe -L75 wc
2346
2347 Output (the order may be different):
2348
2349 165600 165600 1048095
2350 149850 149850 1048950
2351 149775 149775 1048425
2352 149775 149775 1048425
2353 149850 149850 1048950
2354 149775 149775 1048425
2355 85350 85350 597450
2356 25 25 176
2357
2358 Note how GNU parallel still reads a block of around 1 MB; but instead
2359 of passing single lines to wc it passes 75 full lines at a time. This of
2360 course does not hold for the last job (which in this case got 25
2361 lines).
2362
2363 Fixed length records
2364 Fixed length records can be processed by setting --recend '' and
2365 --block recordsize. A header of size n can be processed with --header
2366 .{n}.
2367
2368 Here is how to process a file with a 4-byte header and a 3-byte record
2369 size:
2370
2371 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2372 'echo start; cat; echo'
2373
2374 Output:
2375
2376 start
2377 HHHHAAA
2378 start
2379 HHHHCCC
2380 start
2381 HHHHBBB
2382
2383 It may be more efficient to increase --block to a multiple of the
2384 record size.
2385
2386 Record separators
2387 GNU parallel uses separators to determine where one record ends and the next begins.
2388
2389 --recstart gives the string that starts a record; --recend gives the
2390 string that ends a record. The default is --recend '\n' (newline).
2391
2392 If both --recend and --recstart are given, then the record will only
2393 split if the recend string is immediately followed by the recstart
2394 string.
2395
2396 Here the --recend is set to ', ':
2397
2398 echo /foo, bar/, /baz, qux/, | \
2399 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2400
2401 Output:
2402
2403 JOB1
2404 /foo, END
2405 JOB2
2406 bar/, END
2407 JOB3
2408 /baz, END
2409 JOB4
2410 qux/,
2411 END
2412
2413 Here the --recstart is set to /:
2414
2415 echo /foo, bar/, /baz, qux/, | \
2416 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2417
2418 Output:
2419
2420 JOB1
2421 /foo, barEND
2422 JOB2
2423 /, END
2424 JOB3
2425 /baz, quxEND
2426 JOB4
2427 /,
2428 END
2429
2430 Here both --recend and --recstart are set:
2431
2432 echo /foo, bar/, /baz, qux/, | \
2433 parallel -kN1 --recend ', ' --recstart / --pipe \
2434 echo JOB{#}\;cat\;echo END
2435
2436 Output:
2437
2438 JOB1
2439 /foo, bar/, END
2440 JOB2
2441 /baz, qux/,
2442 END
2443
2444 Note the difference between setting one string and setting both
2445 strings.
2446
2447 With --regexp the --recend and --recstart will be treated as regular
2448 expressions:
2449
2450 echo foo,bar,_baz,__qux, | \
2451 parallel -kN1 --regexp --recend ,_+ --pipe \
2452 echo JOB{#}\;cat\;echo END
2453
2454 Output:
2455
2456 JOB1
2457 foo,bar,_END
2458 JOB2
2459 baz,__END
2460 JOB3
2461 qux,
2462 END
2463
2464 GNU parallel can remove the record separators with
2465 --remove-rec-sep/--rrs:
2466
2467 echo foo,bar,_baz,__qux, | \
2468 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2469 echo JOB{#}\;cat\;echo END
2470
2471 Output:
2472
2473 JOB1
2474 foo,barEND
2475 JOB2
2476 bazEND
2477 JOB3
2478 qux,
2479 END
2480
2481 Header
2482 If the input data has a header, the header can be repeated for each job
2483 by matching the header with --header. If headers start with % you can
2484 do this:
2485
2486 cat num_%header | \
2487 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2488
2489 Output (the order may be different):
2490
2491 JOB1
2492 %head1
2493 %head2
2494 1
2495 2
2496 3
2497 JOB2
2498 %head1
2499 %head2
2500 4
2501 5
2502 6
2503 JOB3
2504 %head1
2505 %head2
2506 7
2507 8
2508 9
2509 JOB4
2510 %head1
2511 %head2
2512 10
2513
2514 If the header is 2 lines, --header 2 will work:
2515
2516 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2517
2518 Output: Same as above.
2519
2520 --pipepart
2521 --pipe is not very efficient. It maxes out at around 500 MB/s.
2522 --pipepart can easily deliver 5 GB/s, but it has a few limitations:
2523 the input has to be a normal file (not a pipe) given by -a or ::::,
2524 and -L/-l/-N do not work. --recend and --recstart, however, do work,
2525 and records can often be split on that alone.
2526
2527 parallel --pipepart -a num1000000 --block 3m wc
2528
2529 Output (the order may be different):
2530
2531 444443 444444 3000002
2532 428572 428572 3000004
2533 126985 126984 888890
2534
2535 Shebang
2536 Input data and parallel command in the same file
2537 GNU parallel is often called like this:
2538
2539 cat input_file | parallel command
2540
2541 With --shebang the input_file and parallel can be combined into the
2542 same script.
2543
2544 UNIX shell scripts start with a shebang line like this:
2545
2546 #!/bin/bash
2547
2548 GNU parallel can do that, too. With --shebang the arguments can be
2549 listed in the file. The parallel command is the first line of the
2550 script:
2551
2552 #!/usr/bin/parallel --shebang -r echo
2553
2554 foo
2555 bar
2556 baz
2557
2558 Output (the order may be different):
2559
2560 foo
2561 bar
2562 baz
2563
2564 Parallelizing existing scripts
2565 GNU parallel is often called like this:
2566
2567 cat input_file | parallel command
2568 parallel command ::: foo bar
2569
2570 If command is a script, parallel can be combined with it into a single
2571 file, so these will run the script in parallel:
2572
2573 cat input_file | command
2574 command foo bar
2575
2576 This Perl script, perl_echo, works like echo:
2577
2578 #!/usr/bin/perl
2579
2580 print "@ARGV\n"
2581
2582 It can be called as this:
2583
2584 parallel perl_echo ::: foo bar
2585
2586 By changing the #!-line it can be run in parallel:
2587
2588 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2589
2590 print "@ARGV\n"
2591
2592 Thus this will work:
2593
2594 perl_echo foo bar
2595
2596 Output (the order may be different):
2597
2598 foo
2599 bar
2600
2601 This technique can be used for:
2602
2603 Perl:
2604 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2605
2606 print "Arguments @ARGV\n";
2607
2608 Python:
2609 #!/usr/bin/parallel --shebang-wrap /usr/bin/python3
2610
2611 import sys
2612 print('Arguments', str(sys.argv))
2613
2614 Bash/sh/zsh/Korn shell:
2615 #!/usr/bin/parallel --shebang-wrap /bin/bash
2616
2617 echo Arguments "$@"
2618
2619 csh:
2620 #!/usr/bin/parallel --shebang-wrap /bin/csh
2621
2622 echo Arguments "$argv"
2623
2624 Tcl:
2625 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2626
2627 puts "Arguments $argv"
2628
2629 R:
2630 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2631
2632 args <- commandArgs(trailingOnly = TRUE)
2633 print(paste("Arguments ",args))
2634
2635 GNUplot:
2636 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2637
2638 print "Arguments ", system('echo $ARG')
2639
2640 Ruby:
2641 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2642
2643 print "Arguments "
2644 puts ARGV
2645
2646 Octave:
2647 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2648
2649 printf ("Arguments");
2650 arg_list = argv ();
2651 for i = 1:nargin
2652 printf (" %s", arg_list{i});
2653 endfor
2654 printf ("\n");
2655
2656 Common LISP:
2657 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2658
2659 (format t "~&~S~&" 'Arguments)
2660 (format t "~&~S~&" *args*)
2661
2662 PHP:
2663 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2664 <?php
2665 echo "Arguments";
2666 foreach(array_slice($argv,1) as $v)
2667 {
2668 echo " $v";
2669 }
2670 echo "\n";
2671 ?>
2672
2673 Node.js:
2674 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2675
2676 var myArgs = process.argv.slice(2);
2677 console.log('Arguments ', myArgs);
2678
2679 LUA:
2680 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2681
2682 io.write "Arguments"
2683 for a = 1, #arg do
2684 io.write(" ")
2685 io.write(arg[a])
2686 end
2687 print("")
2688
2689 C#:
2690 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2691
2692 var argv = Environment.GetEnvironmentVariable("ARGV");
2693 print("Arguments "+argv);
2694
2696 GNU parallel can work as a counting semaphore. This is slower and less
2697 efficient than its normal mode.
2698
2699 A counting semaphore is like a row of toilets. People needing a toilet
2700 can use any toilet, but if there are more people than toilets, they
2701 will have to wait for one of the toilets to become available.
2702
2703 An alias for parallel --semaphore is sem.
2704
2705 sem will follow a person to the toilets, wait until a toilet is
2706 available, leave the person in the toilet and exit.
2707
2708 sem --fg will follow a person to the toilets, wait until a toilet is
2709 available, stay with the person in the toilet and exit when the person
2710 exits.
2711
2712 sem --wait will wait for all persons to leave the toilets.
2713
2714 sem does not have a queue discipline, so the next person is chosen
2715 randomly.
2716
2717 -j sets the number of toilets.
2718
2719 Mutex
2720 The default is to have only one toilet (this is called a mutex). The
2721 program is started in the background and sem exits immediately. Use
2722 --wait to wait for all sems to finish:
2723
2724 sem 'sleep 1; echo The first finished' &&
2725 echo The first is now running in the background &&
2726 sem 'sleep 1; echo The second finished' &&
2727 echo The second is now running in the background
2728 sem --wait
2729
2730 Output:
2731
2732 The first is now running in the background
2733 The first finished
2734 The second is now running in the background
2735 The second finished
2736
2737 The command can be run in the foreground with --fg, which will only
2738 exit when the command completes:
2739
2740 sem --fg 'sleep 1; echo The first finished' &&
2741 echo The first finished running in the foreground &&
2742 sem --fg 'sleep 1; echo The second finished' &&
2743 echo The second finished running in the foreground
2744 sem --wait
2745
2746 The difference between this and just running the command is that a
2747 mutex is set, so if other sems were running in the background, only
2748 one would run at a time.
2749
2750 To control which semaphore is used, use --semaphorename/--id. Run this
2751 in one terminal:
2752
2753 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2754
2755 and simultaneously this in another terminal:
2756
2757 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2758
2759 Note how the second will only be started when the first has finished.
2760
2761 Counting semaphore
2762 A mutex is like having a single toilet: When it is in use everyone else
2763 will have to wait. A counting semaphore is like having multiple
2764 toilets: Several people can use the toilets, but when they all are in
2765 use, everyone else will have to wait.
2766
2767 sem can emulate a counting semaphore. Use --jobs to set the number of
2768 toilets like this:
2769
2770 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2771 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2772 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2773 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2774 sem --wait --id my_id
2775
2776 Output:
2777
2778 Start 1
2779 Start 2
2780 Start 3
2781 1 done
2782 Start 4
2783 2 done
2784 3 done
2785 4 done
2786
2787 Timeout
2788 With --semaphoretimeout you can force running the command anyway after
2789 a period (positive number) or give up (negative number):
2790
2791 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2792 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2793 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2794 sem --id foo --wait
2795
2796 Output:
2797
2798 Slow started
2799 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2800 Forced running after 1 sec
2801 parallel: Warning: Semaphore timed out. Exiting.
2802 Slow ended
2803
2804 Note how the 'Give up' was not run.
2805
2806 Informational
2807 GNU parallel has some options to give short information about the
2808 configuration.
2809
2810 --help will print a summary of the most important options:
2811
2812 parallel --help
2813
2814 Output:
2815
2816 Usage:
2817
2818 parallel [options] [command [arguments]] < list_of_arguments
2819 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2820 cat ... | parallel --pipe [options] [command [arguments]]
2821
2822 -j n Run n jobs in parallel
2823 -k Keep same order
2824 -X Multiple arguments with context replace
2825 --colsep regexp Split input on regexp for positional replacements
2826 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2827 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2828 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2829 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2830
2831 -S sshlogin Example: foo@server.example.com
2832 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2833 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2834 --onall Run the given command with argument on all sshlogins
2835 --nonall Run the given command with no arguments on all sshlogins
2836
2837 --pipe Split stdin (standard input) to multiple jobs.
2838 --recend str Record end separator for --pipe.
2839 --recstart str Record start separator for --pipe.
2840
2841 See 'man parallel' for details
2842
2843 Academic tradition requires you to cite works you base your article on.
2844 When using programs that use GNU Parallel to process data for publication
2845 please cite:
2846
2847 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2848 ;login: The USENIX Magazine, February 2011:42-47.
2849
2850 This helps funding further development; AND IT WON'T COST YOU A CENT.
2851 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2852
2853 When asking for help, always report the full output of this:
2854
2855 parallel --version
2856
2857 Output:
2858
2859 GNU parallel 20180122
2860 Copyright (C) 2007-2018
2861 Ole Tange and Free Software Foundation, Inc.
2862 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2863 This is free software: you are free to change and redistribute it.
2864 GNU parallel comes with no warranty.
2865
2866 Web site: http://www.gnu.org/software/parallel
2867
2868 When using programs that use GNU Parallel to process data for publication
2869 please cite as described in 'parallel --citation'.
2870
2871 In scripts --minversion can be used to ensure the user has at least
2872 this version:
2873
2874 parallel --minversion 20130722 && \
2875 echo Your version is at least 20130722.
2876
2877 Output:
2878
2879 20160322
2880 Your version is at least 20130722.
2881
2882 If you are using GNU parallel for research the BibTeX citation can be
2883 generated using --citation:
2884
2885 parallel --citation
2886
2887 Output:
2888
2889 Academic tradition requires you to cite works you base your article on.
2890 When using programs that use GNU Parallel to process data for publication
2891 please cite:
2892
2893 @article{Tange2011a,
2894 title = {GNU Parallel - The Command-Line Power Tool},
2895 author = {O. Tange},
2896 address = {Frederiksberg, Denmark},
2897 journal = {;login: The USENIX Magazine},
2898 month = {Feb},
2899 number = {1},
2900 volume = {36},
2901 url = {http://www.gnu.org/s/parallel},
2902 year = {2011},
2903 pages = {42-47},
2904 doi = {10.5281/zenodo.16303}
2905 }
2906
2907 (Feel free to use \nocite{Tange2011a})
2908
2909 This helps funding further development; AND IT WON'T COST YOU A CENT.
2910 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2911
2912 If you send a copy of your published article to tange@gnu.org, it will be
2913 mentioned in the release notes of next version of GNU Parallel.
2914
2915 With --max-line-length-allowed GNU parallel will report the maximal
2916 size of the command line:
2917
2918 parallel --max-line-length-allowed
2919
2920 Output (may vary on different systems):
2921
2922 131071
2923
2924 --number-of-cpus and --number-of-cores run system-specific code to
2925 determine the number of CPUs and CPU cores on the system. On
2926 unsupported platforms they will return 1:
2927
2928 parallel --number-of-cpus
2929 parallel --number-of-cores
2930
2931 Output (may vary on different systems):
2932
2933 4
2934 64
2935
2936 Profile files
2937 The defaults for GNU parallel can be changed systemwide by putting the
2938 command line options in /etc/parallel/config. They can be changed for a
2939 user by putting them in ~/.parallel/config.
2940
2941 Profiles work the same way, but have to be referred to with --profile:
2942
2943 echo '--nice 17' > ~/.parallel/nicetimeout
2944 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2945 parallel --profile nicetimeout echo ::: A B C
2946
2947 Output:
2948
2949 A
2950 B
2951 C
2952
2953 Profiles can be combined:
2954
2955 echo '-vv --dry-run' > ~/.parallel/dryverbose
2956 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2957
2958 Output:
2959
2960 echo A
2961 echo B
2962 echo C
2963
2964 Spreading the word
2965 I hope you have learned something from this tutorial.
2966
2967 If you like GNU parallel:
2968
2969 · (Re-)walk through the tutorial if you have not done so in the past
2970 year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2971
2972 · Give a demo at your local user group/your team/your colleagues
2973
2974 · Post the intro videos and the tutorial on Reddit, Mastodon,
2975 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2976 Linkedin, and mailing lists
2977
2978 · Request or write a review for your favourite blog or magazine
2979 (especially if you do something cool with GNU parallel)
2980
2981 · Invite me for your next conference
2982
2983 If you use GNU parallel for research:
2984
2985 · Please cite GNU parallel in your publications (use --citation)
2986
2987 If GNU parallel saves you money:
2988
2989 · (Have your company) donate to FSF or become a member
2990 https://my.fsf.org/donate/
2991
2992 (C) 2013-2019 Ole Tange, FDLv1.3 (See fdl.txt)
2993
2994
2995
2996 20190422 2019-05-01 PARALLEL_TUTORIAL(7)