PARALLEL_TUTORIAL(7)               parallel              PARALLEL_TUTORIAL(7)


GNU PARALLEL TUTORIAL
       This tutorial shows off much of GNU parallel's functionality. It
       is meant to teach the options and syntax of GNU parallel, not to
       show realistic real-world examples. For realistic examples see
       the EXAMPLE section in man parallel.

       Spend an hour walking through the tutorial. Your command line
       will love you for it.

PREREQUISITES
       To run this tutorial you must have the following:

       parallel >= version 20160822
              Install the newest version using your package manager
              (recommended for security reasons), the way described in
              README, or with this command:

                (wget -O - pi.dk/3 || curl pi.dk/3/ || \
                   fetch -o - http://pi.dk/3) | bash

              This will also install the newest version of the tutorial,
              which you can see by running this:

                man parallel_tutorial

              Most of the tutorial will work on older versions, too.

       abc-file:
              The file can be generated by this command:

                parallel -k echo ::: A B C > abc-file

       def-file:
              The file can be generated by this command:

                parallel -k echo ::: D E F > def-file

       abc0-file:
              The file can be generated by this command:

                perl -e 'printf "A\0B\0C\0"' > abc0-file

       abc_-file:
              The file can be generated by this command:

                perl -e 'printf "A_B_C_"' > abc_-file

       tsv-file.tsv
              The file can be generated by this command:

                perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

       num8   The file can be generated by this command:

                perl -e 'for(1..8){print "$_\n"}' > num8

       num128 The file can be generated by this command:

                perl -e 'for(1..128){print "$_\n"}' > num128

       num30000
              The file can be generated by this command:

                perl -e 'for(1..30000){print "$_\n"}' > num30000

       num1000000
              The file can be generated by this command:

                perl -e 'for(1..1000000){print "$_\n"}' > num1000000

       num_%header
              The file can be generated by this command:

                (echo %head1; echo %head2; \
                   perl -e 'for(1..10){print "$_\n"}') > num_%header

       fixedlen
              The file can be generated by this command:

                perl -e 'print "HHHHAAABBBCCC"' > fixedlen

       For remote running: ssh login on 2 servers with no password in
       $SERVER1 and $SERVER2 must work.

         SERVER1=server.example.com
         SERVER2=server2.example.net

       So you must be able to do this:

         ssh $SERVER1 echo works
         ssh $SERVER2 echo works

       It can be set up by running 'ssh-keygen -t dsa; ssh-copy-id
       $SERVER1' and using an empty passphrase.
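
       A sketch of the same setup with a newer key type. Note that this
       is an assumption, not part of the original instructions: many
       current OpenSSH builds reject DSA keys, so ed25519 is often a
       safer choice.

```shell
# Sketch: passwordless ssh setup with an ed25519 key instead of DSA,
# since DSA is disabled in many modern OpenSSH versions (assumption).
# $SERVER1 and $SERVER2 are the servers defined above.
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519   # empty passphrase
ssh-copy-id $SERVER1
ssh-copy-id $SERVER2
```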

INPUT SOURCES
       GNU parallel reads input from input sources. These can be files,
       the command line, and stdin (standard input or a pipe).

   A single input source
       Input can be read from the command line:

         parallel echo ::: A B C

       Output (the order may be different because the jobs are run in
       parallel):

         A
         B
         C

       The input source can be a file:

         parallel -a abc-file echo

       Output: Same as above.

       STDIN (standard input) can be the input source:

         cat abc-file | parallel echo

       Output: Same as above.

   Multiple input sources
       GNU parallel can take multiple input sources given on the command
       line. GNU parallel then generates all combinations of the input
       sources:

         parallel echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       The input sources can be files:

         parallel -a abc-file -a def-file echo

       Output: Same as above.

       STDIN (standard input) can be one of the input sources using -:

         cat abc-file | parallel -a - -a def-file echo

       Output: Same as above.

       Instead of -a, files can be given after '::::':

         cat abc-file | parallel echo :::: - def-file

       Output: Same as above.

       ::: and :::: can be mixed:

         parallel echo ::: A B C :::: def-file

       Output: Same as above.

   Linking arguments from input sources

       With --link you can link the input sources and get one argument
       from each input source:

         parallel --link echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         B E
         C F

       If one of the input sources is too short, its values will wrap:

         parallel --link echo ::: A B C D E ::: F G

       Output (the order may be different):

         A F
         B G
         C F
         D G
         E F

       For more flexible linking you can use :::+ and ::::+. They work
       like ::: and :::: except they link the previous input source to
       this input source.

       This will link ABC to GHI:

         parallel echo :::: abc-file :::+ G H I :::: def-file

       Output (the order may be different):

         A G D
         A G E
         A G F
         B H D
         B H E
         B H F
         C I D
         C I E
         C I F

       This will link GHI to DEF:

         parallel echo :::: abc-file ::: G H I ::::+ def-file

       Output (the order may be different):

         A G D
         A H E
         A I F
         B G D
         B H E
         B I F
         C G D
         C H E
         C I F

       If one of the input sources is too short when using :::+ or
       ::::+, the rest will be ignored:

         parallel echo ::: A B C D E :::+ F G

       Output (the order may be different):

         A F
         B G

   Changing the argument separator
       GNU parallel can use separators other than ::: or ::::. This is
       typically useful if ::: or :::: is used in the command to run:

         parallel --arg-sep ,, echo ,, A B C :::: def-file

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       Changing the argument file separator:

         parallel --arg-file-sep // echo ::: A B C // def-file

       Output: Same as above.
   Changing the argument delimiter
       GNU parallel will normally treat a full line as a single
       argument: it uses \n as the argument delimiter. This can be
       changed with -d:

         parallel -d _ echo :::: abc_-file

       Output (the order may be different):

         A
         B
         C

       NUL can be given as \0:

         parallel -d '\0' echo :::: abc0-file

       Output: Same as above.

       A shorthand for -d '\0' is -0 (this will often be used to read
       files from find ... -print0):

         parallel -0 echo :::: abc0-file

       Output: Same as above.

   End-of-file value for input source
       GNU parallel can stop reading when it encounters a certain value:

         parallel -E stop echo ::: A B stop C D

       Output:

         A
         B

   Skipping empty lines
       Using --no-run-if-empty GNU parallel will skip empty lines:

         (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

       Output:

         1
         2

BUILDING THE COMMAND LINE
   No command means arguments are commands
       If no command is given after parallel, the arguments themselves
       are treated as commands:

         parallel ::: ls 'echo foo' pwd

       Output (the order may be different):

         [list of files in current dir]
         foo
         [/path/to/current/working/dir]

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

         # Only works in Bash
         my_func() {
           echo in my_func $1
         }
         export -f my_func
         parallel my_func ::: 1 2 3

       Output (the order may be different):

         in my_func 1
         in my_func 2
         in my_func 3

   Replacement strings
     The 7 predefined replacement strings

       GNU parallel has several replacement strings. If no replacement
       strings are used, the default is to append {}:

         parallel echo ::: A/B.C

       Output:

         A/B.C

       The default replacement string is {}:

         parallel echo {} ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} removes the extension:

         parallel echo {.} ::: A/B.C

       Output:

         A/B

       The replacement string {/} removes the path:

         parallel echo {/} ::: A/B.C

       Output:

         B.C

       The replacement string {//} keeps only the path:

         parallel echo {//} ::: A/B.C

       Output:

         A

       The replacement string {/.} removes the path and the extension:

         parallel echo {/.} ::: A/B.C

       Output:

         B

       The replacement string {#} gives the job number:

         parallel echo {#} ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} gives the job slot number (between 1
       and the number of jobs to run in parallel):

         parallel -j 2 echo {%} ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

     Changing the replacement strings

       The replacement string {} can be changed with -I:

         parallel -I ,, echo ,, ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} can be changed with
       --extensionreplace:

         parallel --extensionreplace ,, echo ,, ::: A/B.C

       Output:

         A/B

       The replacement string {/} can be changed with --basenamereplace:

         parallel --basenamereplace ,, echo ,, ::: A/B.C

       Output:

         B.C

       The replacement string {//} can be changed with --dirnamereplace:

         parallel --dirnamereplace ,, echo ,, ::: A/B.C

       Output:

         A

       The replacement string {/.} can be changed with
       --basenameextensionreplace:

         parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

       Output:

         B

       The replacement string {#} can be changed with --seqreplace:

         parallel --seqreplace ,, echo ,, ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} can be changed with --slotreplace:

         parallel -j2 --slotreplace ,, echo ,, ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

     Perl expression replacement string

       When the predefined replacement strings are not flexible enough,
       a perl expression can be used instead. One example is to remove
       two extensions: foo.tar.gz becomes foo

         parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

       Output:

         foo

       In {= =} you can access all of GNU parallel's internal functions
       and variables. A few are worth mentioning.

       total_jobs() returns the total number of jobs:

         parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

       Q(...) shell quotes the string:

         parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

       Output:

         */!#$ shell quoted is \*/\!\#\$

       skip() skips the job:

         parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

       Output:

         1
         2
         4
         5

       @arg contains the input source variables:

         parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
           ::: {1..3} ::: {1..3}

       Output:

         1 2
         1 3
         2 1
         2 3
         3 1
         3 2

       If the strings {= and =} cause problems, they can be replaced
       with --parens:

         parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
           ::: foo.tar.gz

       Output:

         foo

       To define a shorthand replacement string use --rpl:

         parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand starts with { it can be used as a positional
       replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand contains matching parentheses, the replacement
       string becomes a dynamic replacement string and the string in the
       parentheses can be accessed as $$1. If there are multiple
       matching parentheses, the matched strings can be accessed using
       $$2, $$3 and so on.

       You can think of this as giving arguments to the replacement
       string. Here we give the argument .tar.gz to the replacement
       string {%string}, which removes string:

         parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

       Output:

         foo.zip

       Here we give the two arguments tar.gz and zip to the replacement
       string {/string1/string2}, which replaces string1 with string2:

         parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
           ::: foo.tar.gz

       Output:

         foo.zip

       GNU parallel's 7 replacement strings are implemented like this:

         --rpl '{} '
         --rpl '{#} $_=$job->seq()'
         --rpl '{%} $_=$job->slot()'
         --rpl '{/} s:.*/::'
         --rpl '{//} $Global::use{"File::Basename"} ||=
                  eval "use File::Basename; 1;"; $_ = dirname($_);'
         --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
         --rpl '{.} s:\.[^/.]+$::'

     Positional replacement strings

       With multiple input sources the argument from the individual
       input sources can be accessed with {number}:

         parallel echo {1} and {2} ::: A B ::: C D

       Output (the order may be different):

         A and C
         A and D
         B and C
         B and D

       The positional replacement strings can also be modified using /,
       //, /., and .:

         parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

       Output (the order may be different):

         /=B.C //=A /.=B .=A/B
         /=E.F //=D /.=E .=D/E

       If a position is negative, it will refer to the input source
       counted from behind:

         parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
           ::: A B ::: C D ::: E F

       Output (the order may be different):

         1=A 2=C 3=E -1=E -2=C -3=A
         1=A 2=C 3=F -1=F -2=C -3=A
         1=A 2=D 3=E -1=E -2=D -3=A
         1=A 2=D 3=F -1=F -2=D -3=A
         1=B 2=C 3=E -1=E -2=C -3=B
         1=B 2=C 3=F -1=F -2=C -3=B
         1=B 2=D 3=E -1=E -2=D -3=B
         1=B 2=D 3=F -1=F -2=D -3=B

     Positional perl expression replacement string

       To use a perl expression as a positional replacement string,
       simply prepend the perl expression with a number and a space:

         parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
           ::: bar ::: foo.tar.gz

       Output:

         foo bar

       If a shorthand defined using --rpl starts with { it can be used
       as a positional replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
           ::: bar ::: foo.tar.gz

       Output: Same as above.

     Input from columns

       The columns in a file can be bound to positional replacement
       strings using --colsep. Here the columns are separated by TAB
       (\t):

         parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

       Output (the order may be different):

         1=f1 2=f2
         1=A 2=B
         1=C 2=D

     Header defined replacement strings

       With --header GNU parallel will use the first value of the input
       source as the name of the replacement string. Only the
       non-modified version {} is supported:

         parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

       Output (the order may be different):

         f1=A f2=C
         f1=A f2=D
         f1=B f2=C
         f1=B f2=D

       It is useful with --colsep for processing files with TAB
       separated values:

         parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
           :::: tsv-file.tsv

       Output (the order may be different):

         f1=A f2=B
         f1=C f2=D

     More pre-defined replacement strings with --plus

       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..}
       {...} {/..} {/...} {##}. The idea is that {+foo} matches the
       opposite of {foo}, and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.}
       = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
       {+/}/{/...}.{+...}.

         parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

       Output:

         dir/sub/file.ex1.ex2.ex3

       {##} is simply the number of jobs:

         parallel --plus echo Job {#} of {##} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

     Dynamic replacement strings with --plus

       --plus also defines these dynamic replacement strings:

       {:-string}          Default value is string if the argument is
                           empty.

       {:number}           Substring from number till end of string.

       {:number1:number2}  Substring from number1 to number2.

       {#string}           If the argument starts with string, remove
                           it.

       {%string}           If the argument ends with string, remove it.

       {/string1/string2}  Replace string1 with string2.

       {^string}           If the argument starts with string, upper
                           case it. string must be a single letter.

       {^^string}          If the argument contains string, upper case
                           it. string must be a single letter.

       {,string}           If the argument starts with string, lower
                           case it. string must be a single letter.

       {,,string}          If the argument contains string, lower case
                           it. string must be a single letter.

       They are inspired by Bash:

         unset myvar
         echo ${myvar:-myval}
         parallel --plus echo {:-myval} ::: "$myvar"

         myvar=abcAaAdef
         echo ${myvar:2}
         parallel --plus echo {:2} ::: "$myvar"

         echo ${myvar:2:3}
         parallel --plus echo {:2:3} ::: "$myvar"

         echo ${myvar#bc}
         parallel --plus echo {#bc} ::: "$myvar"
         echo ${myvar#abc}
         parallel --plus echo {#abc} ::: "$myvar"

         echo ${myvar%de}
         parallel --plus echo {%de} ::: "$myvar"
         echo ${myvar%def}
         parallel --plus echo {%def} ::: "$myvar"

         echo ${myvar/def/ghi}
         parallel --plus echo {/def/ghi} ::: "$myvar"

         echo ${myvar^a}
         parallel --plus echo {^a} ::: "$myvar"
         echo ${myvar^^a}
         parallel --plus echo {^^a} ::: "$myvar"

         myvar=AbcAaAdef
         echo ${myvar,A}
         parallel --plus echo '{,A}' ::: "$myvar"
         echo ${myvar,,A}
         parallel --plus echo '{,,A}' ::: "$myvar"

       Output:

         myval
         myval
         cAaAdef
         cAaAdef
         cAa
         cAa
         abcAaAdef
         abcAaAdef
         AaAdef
         AaAdef
         abcAaAdef
         abcAaAdef
         abcAaA
         abcAaA
         abcAaAghi
         abcAaAghi
         AbcAaAdef
         AbcAaAdef
         AbcAAAdef
         AbcAAAdef
         abcAaAdef
         abcAaAdef
         abcaaadef
         abcaaadef

   More than one argument
       With --xargs GNU parallel will fit as many arguments as possible
       on a single line:

         cat num30000 | parallel --xargs echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         2

       The 30000 arguments fit on 2 lines.

       The maximal length of a single line can be set with -s. With a
       maximal line length of 10000 chars 17 commands will be run:

         cat num30000 | parallel --xargs -s 10000 echo | wc -l

       Output:

         17

       For better parallelism GNU parallel can distribute the arguments
       between all the parallel jobs when the end of the input is
       reached.

       Below, GNU parallel reads the last argument when generating the
       second job. When GNU parallel reads the last argument, it spreads
       all the arguments for the second job over 4 jobs instead, as 4
       parallel jobs are requested.

       The first job will be the same as the --xargs example above, but
       the second job will be split into 4 evenly sized jobs, resulting
       in a total of 5 jobs:

         cat num30000 | parallel --jobs 4 -m echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         5

       This is even more visible when running 4 jobs with 10 arguments.
       The 10 arguments are being spread over 4 jobs:

         parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

       Output:

         1 2 3
         4 5 6
         7 8 9
         10

       A replacement string can be part of a word. -m will not repeat
       the context:

         parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A B-post
         pre-C D-post
         pre-E F-post
         pre-G-post

       To repeat the context use -X, which otherwise works like -m:

         parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A-post pre-B-post
         pre-C-post pre-D-post
         pre-E-post pre-F-post
         pre-G-post

       To limit the number of arguments use -N:

         parallel -N3 echo ::: A B C D E F G H

       Output (the order may be different):

         A B C
         D E F
         G H

       -N also sets the positional replacement strings:

         parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

       Output (the order may be different):

         1=A 2=B 3=C
         1=D 2=E 3=F
         1=G 2=H 3=

       -N0 reads 1 argument but inserts none:

         parallel -N0 echo foo ::: 1 2 3

       Output:

         foo
         foo
         foo

   Quoting
       Command lines that contain special characters may need to be
       protected from the shell.

       The perl program print "@ARGV\n" basically works like echo:

         perl -e 'print "@ARGV\n"' A

       Output:

         A

       To run that in parallel the command needs to be quoted:

         parallel perl -e 'print "@ARGV\n"' ::: This wont work

       Output:

         [Nothing]

       To quote the command use -q:

         parallel -q perl -e 'print "@ARGV\n"' ::: This works

       Output (the order may be different):

         This
         works

       Or you can quote the critical part using \':

         parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

       Output (the order may be different):

         This
         works,
         too

       GNU parallel can also \-quote full lines. Simply run this:

         parallel --shellquote
         Warning: Input is read from the terminal. You either know what you
         Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
         Warning: ::: or :::: or to pipe data into parallel. If so
         Warning: consider going through the tutorial: man parallel_tutorial
         Warning: Press CTRL-D to exit.
         perl -e 'print "@ARGV\n"'
         [CTRL-D]

       Output:

         perl\ -e\ \'print\ \"@ARGV\\n\"\'

       This can then be used as the command:

         parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

       Output (the order may be different):

         This
         also
         works

   Trimming space
       Space can be trimmed on the arguments using --trim:

         parallel --trim r echo pre-{}-post ::: ' A '

       Output:

         pre- A-post

       To trim on the left side:

         parallel --trim l echo pre-{}-post ::: ' A '

       Output:

         pre-A -post

       To trim on both sides:

         parallel --trim lr echo pre-{}-post ::: ' A '

       Output:

         pre-A-post

   Respecting the shell
       This tutorial uses Bash as the shell. GNU parallel respects which
       shell you are using, so in zsh you can do:

         parallel echo \={} ::: zsh bash ls

       Output:

         /usr/bin/zsh
         /bin/bash
         /bin/ls

       In csh you can do:

         parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

       Output:

         [somedir] is a dir

       This also becomes useful if you use GNU parallel in a shell
       script: GNU parallel will use the same shell as the shell script.

CONTROLLING THE OUTPUT
       The output can be prefixed with the argument:

         parallel --tag echo foo-{} ::: A B C

       Output (the order may be different):

         A foo-A
         B foo-B
         C foo-C

       To prefix it with another string use --tagstring:

         parallel --tagstring {}-bar echo foo-{} ::: A B C

       Output (the order may be different):

         A-bar foo-A
         B-bar foo-B
         C-bar foo-C

       To see what commands will be run without running them use
       --dryrun:

         parallel --dryrun echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         echo C

       To print commands before running them use --verbose:

         parallel --verbose echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         A
         echo C
         B
         C

       GNU parallel will postpone the output until the command
       completes:

         parallel -j2 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-start
         4-middle
         4-end

       To get the output immediately use --ungroup:

         parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         42-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         -middle
         4-end

       --ungroup is fast, but can cause half a line from one job to be
       mixed with half a line of another job. That has happened in the
       second line, where the line '4-middle' is mixed with '2-start'.

       To avoid this use --linebuffer:

         parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-middle
         4-end

       To force the output in the same order as the arguments use
       --keep-order/-k:

         parallel -j2 -k 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         4-middle
         4-end
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end

   Saving output into files
       GNU parallel can save the output of each job into files:

         parallel --files echo ::: A B C

       Output will be similar to this:

         /tmp/pAh6uWuQCg.par
         /tmp/opjhZCzAX4.par
         /tmp/W0AT_Rph2o.par

       By default GNU parallel will cache the output in files in /tmp.
       This can be changed by setting $TMPDIR or --tmpdir:

         parallel --tmpdir /var/tmp --files echo ::: A B C

       Output will be similar to this:

         /var/tmp/N_vk7phQRc.par
         /var/tmp/7zA4Ccf3wZ.par
         /var/tmp/LIuKgF_2LP.par

       Or:

         TMPDIR=/var/tmp parallel --files echo ::: A B C

       Output: Same as above.

       The output files can be saved in a structured way using
       --results:

         parallel --results outdir echo ::: A B C

       Output:

         A
         B
         C

       These files were also generated, containing the standard output
       (stdout), standard error (stderr), and the sequence number (seq):

         outdir/1/A/seq
         outdir/1/A/stderr
         outdir/1/A/stdout
         outdir/1/B/seq
         outdir/1/B/stderr
         outdir/1/B/stdout
         outdir/1/C/seq
         outdir/1/C/stderr
         outdir/1/C/stdout

       --header : will take the first value as the name and use that in
       the directory structure. This is useful if you are using multiple
       input sources:

         parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

       Generated files:

         outdir/f1/A/f2/C/seq
         outdir/f1/A/f2/C/stderr
         outdir/f1/A/f2/C/stdout
         outdir/f1/A/f2/D/seq
         outdir/f1/A/f2/D/stderr
         outdir/f1/A/f2/D/stdout
         outdir/f1/B/f2/C/seq
         outdir/f1/B/f2/C/stderr
         outdir/f1/B/f2/C/stdout
         outdir/f1/B/f2/D/seq
         outdir/f1/B/f2/D/stderr
         outdir/f1/B/f2/D/stdout

       The directories are named after the variables and their values.

CONTROLLING THE EXECUTION
   Number of simultaneous jobs
       The number of concurrent jobs is given with --jobs/-j:

         /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to
       run - depending on how fast your machine is.

       By default --jobs is the same as the number of CPU cores. So
       this:

         /usr/bin/time parallel -N0 sleep 1 :::: num128

       should take twice the time of running 2 jobs per CPU core:

         /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128

       --jobs 0 will run as many jobs in parallel as possible:

         /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128

       which should take 1-7 seconds depending on how fast your machine
       is.

       --jobs can read from a file which is re-read when a job finishes:

         echo 50% > my_jobs
         /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
         sleep 1
         echo 0 > my_jobs
         wait

       For the first second only 50% of the CPU cores will run a job.
       Then 0 is put into my_jobs, and the rest of the jobs will be
       started in parallel.

       Instead of basing the percentage on the number of CPU cores, GNU
       parallel can base it on the number of CPUs:

         parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8

   Shuffle job order
       If you have many jobs (e.g. by multiple combinations of input
       sources), it can be handy to shuffle the jobs, so you get
       different values run. Use --shuf for that:

         parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C

       Output:

         All combinations but in a different order for each run.

   Interactivity
       GNU parallel can ask the user if a command should be run using
       --interactive:

         parallel --interactive echo ::: 1 2 3

       Output:

         echo 1 ?...y
         echo 2 ?...n
         1
         echo 3 ?...y
         3

       GNU parallel can be used to put arguments on the command line for
       an interactive command such as emacs to edit one file at a time:

         parallel --tty emacs ::: 1 2 3

       Or give multiple arguments in one go to open multiple files:

         parallel -X --tty vi ::: 1 2 3

   A terminal for every job
       Using --tmux GNU parallel can start a terminal for every job run:

         seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'

       This will tell you to run something similar to:

         tmux -S /tmp/tmsrPrO0 attach

       Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
       between windows of the running jobs. When a job is finished it
       will pause for 10 seconds before closing the window.

   Timing
       Some jobs do heavy I/O when they start. To avoid a thundering
       herd GNU parallel can delay starting new jobs. --delay X will
       make sure there is at least X seconds between each start:

         parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3

       Output:

         Starting 1
         Thu Aug 15 16:24:33 CEST 2013
         Starting 2
         Thu Aug 15 16:24:35 CEST 2013
         Starting 3
         Thu Aug 15 16:24:38 CEST 2013

       If jobs taking more than a certain amount of time are known to
       fail, they can be stopped with --timeout. The accuracy of
       --timeout is 2 seconds:

         parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8

       Output:

         2
         4

       GNU parallel can compute the median runtime for jobs and kill
       those that take more than 200% of the median runtime:

         parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3

       Output:

         2.1
         2.2
         3
         2.3

   Progress information
       Based on the runtime of completed jobs, GNU parallel can estimate
       the total runtime:

         parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1

       Output:

         Computers / CPU cores / Max jobs to run
         1:local / 2 / 2

         Computer:jobs running/jobs completed/%of started jobs/
         Average seconds to complete
         ETA: 2s 0left 1.11avg  local:0/9/100%/1.1s

       GNU parallel can give progress information with --progress:

         parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1

       Output:

         Computers / CPU cores / Max jobs to run
         1:local / 2 / 2

         Computer:jobs running/jobs completed/%of started jobs/
         Average seconds to complete
         local:0/9/100%/1.1s

       A progress bar can be shown with --bar:

         parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1

       And a graphic bar can be shown with --bar and zenity:

         seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
           2> >(zenity --progress --auto-kill --auto-close)

       A logfile of the jobs completed so far can be generated with
       --joblog:

         parallel --joblog /tmp/log exit ::: 1 2 3 0
         cat /tmp/log

       Output:

         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
         1   :    1376577364.974 0.008   0    0       1       0      exit 1
         2   :    1376577364.982 0.013   0    0       2       0      exit 2
         3   :    1376577364.990 0.013   0    0       3       0      exit 3
         4   :    1376577365.003 0.003   0    0       0       0      exit 0

       The log contains the job sequence, which host the job was run on,
       the start time and run time, how much data was transferred, the
       exit value, the signal that killed the job, and finally the
       command being run.
1403
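Because the joblog is tab-separated, ordinary text tools can mine it. This sketch rebuilds part of the joblog shown above by hand (in practice it comes from parallel --joblog) and prints the command of every failed job; column 7 is Exitval and column 9 is Command:

```shell
# Rebuild a tab-separated joblog like the example above
{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1376577364.974\t0.008\t0\t0\t1\t0\texit 1\n'
  printf '2\t:\t1376577364.982\t0.013\t0\t0\t2\t0\texit 2\n'
  printf '4\t:\t1376577365.003\t0.003\t0\t0\t0\t0\texit 0\n'
} > /tmp/log
# Print the Command column of jobs whose Exitval is non-zero
awk -F'\t' 'NR > 1 && $7 != 0 {print $9}' /tmp/log
```
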
1404 With a joblog GNU parallel can be stopped and later pick up where
1405 it left off. It is important that the input of the completed jobs
1406 is unchanged.
1407
1408 parallel --joblog /tmp/log exit ::: 1 2 3 0
1409 cat /tmp/log
1410 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1411 cat /tmp/log
1412
1413 Output:
1414
1415 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1416 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1417 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1418 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1419 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1420
1421 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1422 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1423 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1424 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1425 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1426 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1427 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1428
1429 Note how the start time of the last 2 jobs is clearly different,
1430 as they were run in the second invocation.
1431
1432 With --resume-failed GNU parallel will re-run the jobs that failed:
1433
1434 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1435 cat /tmp/log
1436
1437 Output:
1438
1439 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1440 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1441 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1442 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1443 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1444 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1445 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1446 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1447 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1448 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1449
1450 Note how Seq 1, 2, and 3 have been rerun because they had exit
1451 values different from 0.
1452
1453 --retry-failed does almost the same as --resume-failed. Where
1454 --resume-failed reads the commands from the command line (and ignores
1455 the commands in the joblog), --retry-failed ignores the command line
1456 and reruns the commands mentioned in the joblog.
1457
1458 parallel --retry-failed --joblog /tmp/log
1459 cat /tmp/log
1460
1461 Output:
1462
1463 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1464 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1465 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1466 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1467 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1468 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1469 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1470 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1471 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1472 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1473 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1474 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1475 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1476
1477 Termination
1478 Unconditional termination
1479
1480 By default GNU parallel will wait for all jobs to finish before
1481 exiting.
1482
1483 If you send GNU parallel the TERM signal, GNU parallel will stop
1484 spawning new jobs and wait for the remaining jobs to finish. If you
1485 send GNU parallel the TERM signal again, GNU parallel will kill all
1486 running jobs and exit.
1487
1488 Termination dependent on job status
1489
1490 For certain jobs there is no need to continue if one of the jobs fails
1491 and has an exit code different from 0. GNU parallel will stop spawning
1492 new jobs with --halt soon,fail=1:
1493
1494 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1495
1496 Output:
1497
1498 0
1499 0
1500 1
1501 parallel: This job failed:
1502 echo 1; exit 1
1503 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1504 2
1505
1506 With --halt now,fail=1 the running jobs will be killed immediately:
1507
1508 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1509
1510 Output:
1511
1512 0
1513 0
1514 1
1515 parallel: This job failed:
1516 echo 1; exit 1
1517
1518 If --halt is given a percentage, this percentage of the jobs must
1519 fail before GNU parallel stops spawning more jobs:
1520
1521 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1522 ::: 0 1 2 3 4 5 6 7 8 9
1523
1524 Output:
1525
1526 0
1527 1
1528 parallel: This job failed:
1529 echo 1; exit 1
1530 2
1531 parallel: This job failed:
1532 echo 2; exit 2
1533 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1534 3
1535 parallel: This job failed:
1536 echo 3; exit 3
1537
1538 If you are looking for success instead of failures, you can use
1539 success. This will finish as soon as the first job succeeds:
1540
1541 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1542
1543 Output:
1544
1545 1
1546 2
1547 3
1548 0
1549 parallel: This job succeeded:
1550 echo 0; exit 0
1551
1552 GNU parallel can retry the command with --retries. This is useful if a
1553 command fails for unknown reasons now and then.
1554
1555 parallel -k --retries 3 \
1556 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1557 cat /tmp/runs
1558
1559 Output:
1560
1561 completed 1
1562 completed 2
1563 completed 0
1564
1565 tried 1
1566 tried 2
1567 tried 1
1568 tried 2
1569 tried 1
1570 tried 2
1571 tried 0
1572
1573 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried
1574 because it had exit code 0.
1575
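The retry counts can be read straight out of /tmp/runs. A sketch using the file contents shown above:

```shell
# The contents of /tmp/runs after the run above
printf 'tried 1\ntried 2\ntried 1\ntried 2\ntried 1\ntried 2\ntried 0\n' > /tmp/runs
# Count how many times each argument was tried
sort /tmp/runs | uniq -c | awk '{print $1, $2, $3}'
```

This shows 3 tries each for 1 and 2, and a single try for 0.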
1576 Termination signals (advanced)
1577
1578 Using --termseq you can control which signals are sent when killing
1579 children. Normally children will be killed by sending them SIGTERM,
1580 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1581 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1582 giving up. It looks like this:
1583
1584 show_signals() {
1585 perl -e 'for(keys %SIG) {
1586 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1587 }
1588 while(1){sleep 1}'
1589 }
1590 export -f show_signals
1591 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1592 -u --timeout 1 show_signals
1593
1594 Output:
1595
1596 Got TERM
1597 Got TERM
1598 Got TERM
1599
1600 Or just:
1601
1602 echo | parallel -u --timeout 1 show_signals
1603
1604 Output: Same as above.
1605
1606 You can change this to SIGINT, SIGTERM, SIGKILL:
1607
1608 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1609 -u --timeout 1 show_signals
1610
1611 Output:
1612
1613 Got INT
1614 Got TERM
1615
1616 The SIGKILL does not show because it cannot be caught, and thus the
1617 child dies.
1618
1619 Limiting the resources
1620 To avoid overloading systems GNU parallel can look at the system load
1621 before starting another job:
1622
1623 parallel --load 100% echo load is less than {} job per cpu ::: 1
1624
1625 Output:
1626
1627 [when the load is less than the number of cpu cores]
1628 load is less than 1 job per cpu
1629
1630 GNU parallel can also check if the system is swapping.
1631
1632 parallel --noswap echo the system is not swapping ::: now
1633
1634 Output:
1635
1636 [when the system is not swapping]
1637 the system is not swapping now
1638
1639 Some jobs need a lot of memory and should only be started when
1640 there is enough memory free. Using --memfree GNU parallel can check
1641 if there is enough memory free. Additionally, it will kill off the
1642 youngest job if the free memory falls below 50% of the given size.
1643 The killed job will be put back on the queue and retried later.
1644
1645 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1646
1647 GNU parallel can run the jobs with a nice value. This will work both
1648 locally and remotely.
1649
1650 parallel --nice 17 echo this is being run with nice -n ::: 17
1651
1652 Output:
1653
1654 this is being run with nice -n 17
1655
1657 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1658 with the remote machines.
1659
1660 Sshlogin
1661 The most basic sshlogin is -S host:
1662
1663 parallel -S $SERVER1 echo running on ::: $SERVER1
1664
1665 Output:
1666
1667 running on [$SERVER1]
1668
1669 To use a different username prepend the server with username@:
1670
1671 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1672
1673 Output:
1674
1675 running on [username@$SERVER1]
1676
1677 The special sshlogin : is the local machine:
1678
1679 parallel -S : echo running on ::: the_local_machine
1680
1681 Output:
1682
1683 running on the_local_machine
1684
1685 If ssh is not in $PATH it can be prepended to $SERVER1:
1686
1687 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1688
1689 Output:
1690
1691 custom ssh
1692
1693 The ssh command can also be given using --ssh:
1694
1695 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1696
1697 or by setting $PARALLEL_SSH:
1698
1699 export PARALLEL_SSH=/usr/bin/ssh
1700 parallel -S $SERVER1 echo custom ::: ssh
1701
1702 Several servers can be given using multiple -S:
1703
1704 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1705
1706 Output (the order may be different):
1707
1708 running
1709 on
1710 more
1711 hosts
1712
1713 Or they can be separated by ,:
1714
1715 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1716
1717 Output: Same as above.
1718
1719 Or newline:
1720
1721 # This gives a \n between $SERVER1 and $SERVER2
1722 SERVERS="`echo $SERVER1; echo $SERVER2`"
1723 parallel -S "$SERVERS" echo ::: running on more hosts
1724
1725 They can also be read from a file (replace user@ with the user on
1726 $SERVER2):
1727
1728 echo $SERVER1 > nodefile
1729 # Force 4 cores, special ssh-command, username
1730 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1731 parallel --sshloginfile nodefile echo ::: running on more hosts
1732
1733 Output: Same as above.
1734
1735 Every time a job finishes, the --sshloginfile will be re-read, so
1736 it is possible to both add and remove hosts while running.
1737
1738 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1739
1740 To force GNU parallel to treat a server as having a given number
1741 of CPU cores, prepend the number of cores followed by / to the sshlogin:
1742
1743 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1744
1745 Output:
1746
1747 force 4 cpus on server
1748
1749 Servers can be put into groups by prepending @groupname to the server
1750 and the group can then be selected by appending @groupname to the
1751 argument if using --hostgroup:
1752
1753 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1754 ::: run_on_grp1@grp1 run_on_grp2@grp2
1755
1756 Output:
1757
1758 run_on_grp1
1759 run_on_grp2
1760
1761 A host can be in multiple groups by separating the groups with +, and
1762 you can force GNU parallel to limit the groups on which the command can
1763 be run with -S @groupname:
1764
1765 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1766 ::: run_on_grp1 also_grp1
1767
1768 Output:
1769
1770 run_on_grp1
1771 also_grp1
1772
1773 Transferring files
1774 GNU parallel can transfer the files to be processed to the remote host.
1775 It does that using rsync.
1776
1777 echo This is input_file > input_file
1778 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1779
1780 Output:
1781
1782 This is input_file
1783
1784 If the files are processed into another file, the resulting file can be
1785 transferred back:
1786
1787 echo This is input_file > input_file
1788 parallel -S $SERVER1 --transferfile {} --return {}.out \
1789 cat {} ">"{}.out ::: input_file
1790 cat input_file.out
1791
1792 Output: Same as above.
1793
1794 To remove the input and output file on the remote server use --cleanup:
1795
1796 echo This is input_file > input_file
1797 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1798 cat {} ">"{}.out ::: input_file
1799 cat input_file.out
1800
1801 Output: Same as above.
1802
1803 There is a shorthand for --transferfile {} --return --cleanup called
1804 --trc:
1805
1806 echo This is input_file > input_file
1807 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1808 cat input_file.out
1809
1810 Output: Same as above.
1811
1812 Some jobs need a common database for all jobs. GNU parallel can
1813 transfer that using --basefile which will transfer the file before the
1814 first job:
1815
1816 echo common data > common_file
1817 parallel --basefile common_file -S $SERVER1 \
1818 cat common_file\; echo {} ::: foo
1819
1820 Output:
1821
1822 common data
1823 foo
1824
1825 To remove it from the remote host after the last job use --cleanup.
1826
1827 Working dir
1828 The default working dir on the remote machines is the login dir. This
1829 can be changed with --workdir mydir.
1830
1831 Files transferred using --transferfile and --return will be relative to
1832 mydir on remote computers, and the command will be executed in the dir
1833 mydir.
1834
1835 The special mydir value ... will create working dirs under
1836 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1837 dirs will be removed.
1838
1839 The special mydir value . uses the current working dir. If the current
1840 working dir is beneath your home dir, the value . is treated as the
1841 relative path to your home dir. This means that if your home dir is
1842 different on remote computers (e.g. if your login is different) the
1843 relative path will still be relative to your home dir.
1844
1845 parallel -S $SERVER1 pwd ::: ""
1846 parallel --workdir . -S $SERVER1 pwd ::: ""
1847 parallel --workdir ... -S $SERVER1 pwd ::: ""
1848
1849 Output:
1850
1851 [the login dir on $SERVER1]
1852 [current dir relative on $SERVER1]
1853 [a dir in ~/.parallel/tmp/...]
1854
1855 Avoid overloading sshd
1856 If many jobs are started on the same server, sshd can be overloaded.
1857 GNU parallel can insert a delay between each job run on the same
1858 server:
1859
1860 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1861
1862 Output (the order may be different):
1863
1864 1
1865 2
1866 3
1867
1868 sshd will be less overloaded if using --controlmaster, which will
1869 multiplex ssh connections:
1870
1871 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1872
1873 Output: Same as above.
1874
1875 Ignore hosts that are down
1876 In clusters with many hosts a few of them are often down. GNU parallel
1877 can ignore those hosts. In this case the host 173.194.32.46 is down:
1878
1879 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1880
1881 Output:
1882
1883 bar
1884
1885 Running the same commands on all hosts
1886 GNU parallel can run the same command on all the hosts:
1887
1888 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1889
1890 Output (the order may be different):
1891
1892 foo
1893 bar
1894 foo
1895 bar
1896
1897 Often you will just want to run a single command on all hosts
1898 without arguments. --nonall is --onall with no arguments:
1899
1900 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1901
1902 Output:
1903
1904 foo bar
1905 foo bar
1906
1907 When --tag is used with --nonall and --onall the --tagstring is the
1908 host:
1909
1910 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1911
1912 Output (the order may be different):
1913
1914 $SERVER1 foo bar
1915 $SERVER2 foo bar
1916
1917 --jobs sets the number of servers to log in to in parallel.
1918
1919 Transferring environment variables and functions
1920 env_parallel is a shell function that transfers all aliases,
1921 functions, variables, and arrays. You activate it by running:
1922
1923 source `which env_parallel.bash`
1924
1925 Replace bash with the shell you use.
1926
1927 Now you can use env_parallel instead of parallel and still have your
1928 environment:
1929
1930 alias myecho=echo
1931 myvar="Joe's var is"
1932 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1933
1934 Output:
1935
1936 Joe's var is green
1937
1938 The disadvantage is that if your environment is huge env_parallel will
1939 fail.
1940
1941 When env_parallel fails, you can still use --env to tell GNU parallel
1942 to transfer an environment variable to the remote system.
1943
1944 MYVAR='foo bar'
1945 export MYVAR
1946 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1947
1948 Output:
1949
1950 foo bar baz
1951
1952 This works for functions, too, if your shell is Bash:
1953
1954 # This only works in Bash
1955 my_func() {
1956 echo in my_func $1
1957 }
1958 export -f my_func
1959 parallel --env my_func -S $SERVER1 my_func ::: baz
1960
1961 Output:
1962
1963 in my_func baz
1964
1965 GNU parallel can copy all user defined variables and functions to the
1966 remote system. It just needs to record which ones to ignore in
1967 ~/.parallel/ignored_vars. Do that by running this once:
1968
1969 parallel --record-env
1970 cat ~/.parallel/ignored_vars
1971
1972 Output:
1973
1974 [list of variables to ignore - including $PATH and $HOME]
1975
1976 Now all other variables and functions defined will be copied when using
1977 --env _.
1978
1979 # The function is only copied if using Bash
1980 my_func2() {
1981 echo in my_func2 $VAR $1
1982 }
1983 export -f my_func2
1984 VAR=foo
1985 export VAR
1986
1987 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
1988
1989 Output:
1990
1991 foo
1992 in my_func2 foo bar
1993
1994 If you use env_parallel the variables, functions, and aliases do not
1995 even need to be exported to be copied:
1996
1997 NOT='not exported var'
1998 alias myecho=echo
1999 not_ex() {
2000 myecho in not_exported_func $NOT $1
2001 }
2002 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2003
2004 Output:
2005
2006 not exported var
2007 in not_exported_func not exported var bar
2008
2009 Showing what is actually run
2010 --verbose will show the command that would be run on the local machine.
2011
2012 When using --cat, --pipepart, or when a job is run on a remote machine,
2013 the command is wrapped with helper scripts. -vv shows all of this.
2014
2015 parallel -vv --pipepart --block 1M wc :::: num30000
2016
2017 Output:
2018
2019 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2020 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2021 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2022 0 0 0 168894 | (wc)
2023 30000 30000 168894
2024
2025 When the command gets more complex, the output is so hard to read
2026 that it is only useful for debugging:
2027
2028 my_func3() {
2029 echo in my_func $1 > $1.out
2030 }
2031 export -f my_func3
2032 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2033 -S $SERVER1 my_func3 {} ::: abc-file
2034
2035 Output will be similar to:
2036
2037 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2038 --protocol 30 -rlDzR -essh ./abc-file
2039 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2040 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2041 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2042 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2043 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2044 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2045 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2046 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2047 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2048 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2049 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2050 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2051 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2052 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2053 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2054 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2055 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2056 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2057 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2058 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2059 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2060 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2061 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2062 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2063 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2064 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2065 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2066 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2067 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2068 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2069 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2070 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2071
2073 GNU parset will set shell variables to the output of GNU parallel. GNU
2074 parset has one important limitation: It cannot be part of a pipe. In
2075 particular this means it cannot read anything from standard input
2076 (stdin) or pipe output to another program.
2077
2078 To use GNU parset, prepend the command with the destination variables:
2079
2080 parset myvar1,myvar2 echo ::: a b
2081 echo $myvar1
2082 echo $myvar2
2083
2084 Output:
2085
2086 a
2087 b
2088
2089 If you only give a single variable, it will be treated as an array:
2090
2091 parset myarray seq {} 5 ::: 1 2 3
2092 echo "${myarray[1]}"
2093
2094 Output:
2095
2096 2
2097 3
2098 4
2099 5
2100
2101 The commands to run can be an array:
2102
2103 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2104 parset data ::: "${cmd[@]}"
2105 echo "${data[0]}"
2106 echo "${data[1]}"
2107
2108 Output:
2109
2110 <<joe "double space" cartoon>>
2111 [current dir]
2112
2114 GNU parallel can save into an SQL base. Point GNU parallel to a table
2115 and it will put the joblog there together with the variables and the
2116 output each in their own column.
2117
2118 CSV as SQL base
2119 The simplest is to use a CSV file as the storage table:
2120
2121 parallel --sqlandworker csv:////%2Ftmp%2Flog.csv \
2122 seq ::: 10 ::: 12 13 14
2123 cat /tmp/log.csv
2124
2125 Note how '/' in the path must be written as %2F.
2126
2127 Output will be similar to:
2128
2129 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2130 Command,V1,V2,Stdout,Stderr
2131 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2132 11
2133 12
2134 ",
2135 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2136 11
2137 12
2138 13
2139 ",
2140 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2141 11
2142 12
2143 13
2144 14
2145 ",
2146
2147 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2148 format correctly - even with fields containing newlines as above.
2149
2150 If the output is big you may want to put it into files using --results:
2151
2152 parallel --results outdir --sqlandworker csv:////%2Ftmp%2Flog2.csv \
2153 seq ::: 10 ::: 12 13 14
2154 cat /tmp/log2.csv
2155
2156 Output will be similar to:
2157
2158 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2159 Command,V1,V2,Stdout,Stderr
2160 1,:,1458824738.287,0.029,0,9,0,0,
2161 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2162 2,:,1458824738.298,0.025,0,12,0,0,
2163 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2164 3,:,1458824738.309,0.026,0,15,0,0,
2165 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2166
2167 DBURL as table
2168 The CSV file is an example of a DBURL.
2169
2170 GNU parallel uses a DBURL to address the table. A DBURL has this
2171 format:
2172
2173 vendor://[[user][:password]@][host][:port]/[database[/table]]
2174
2175 Example:
2176
2177 mysql://scott:tiger@my.example.com/mydatabase/mytable
2178 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2179 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2180 csv:////%2Ftmp%2Flog.csv
2181
2182 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2183 as %2F.
2184
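A small sketch of that encoding, turning a local path into a sqlite3 DBURL (the path is just an example):

```shell
# Percent-encode the '/' characters of a path for use in a DBURL
path=/tmp/mydatabase
enc=$(printf '%s' "$path" | sed 's,/,%2F,g')
echo "sqlite3:///$enc/mytable"
```

This prints sqlite3:///%2Ftmp%2Fmydatabase/mytable, the same form used in the examples above.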
2185 Run a job using sqlite on mytable in /tmp/mydatabase:
2186
2187 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2188 DBURLTABLE=$DBURL/mytable
2189 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2190
2191 To see the result:
2192
2193 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2194
2195 Output will be similar to:
2196
2197 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2198 Command|V1|V2|Stdout|Stderr
2199 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2200 |
2201 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2202 |
2203 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2204 |
2205 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2206 |
2207
2208 The first columns are well known from --joblog. V1 and V2 are data from
2209 the input sources. Stdout and Stderr are standard output and standard
2210 error, respectively.
2211
2212 Using multiple workers
2213 Using an SQL base as storage costs overhead on the order of 1
2214 second per job.
2215
2216 One of the situations where it makes sense is if you have multiple
2217 workers.
2218
2219 You can then have a single master machine that submits jobs to the SQL
2220 base (but does not do any of the work):
2221
2222 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2223
2224 On the worker machines you run exactly the same command except you
2225 replace --sqlmaster with --sqlworker.
2226
2227 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2228
2229 To run a master and a worker on the same machine use --sqlandworker as
2230 shown earlier.
2231
2233 The --pipe functionality puts GNU parallel in a different mode: Instead
2234 of treating the data on stdin (standard input) as arguments for a
2235 command to run, the data will be sent to stdin (standard input) of the
2236 command.
2237
2238 The typical situation is:
2239
2240 command_A | command_B | command_C
2241
2242 where command_B is slow, and you want to speed up command_B.
2243
2244 Chunk size
2245 By default GNU parallel will start an instance of command_B, read a
2246 chunk of 1 MB, and pass that to the instance. Then start another
2247 instance, read another chunk, and pass that to the second instance.
2248
2249 cat num1000000 | parallel --pipe wc
2250
2251 Output (the order may be different):
2252
2253 165668 165668 1048571
2254 149797 149797 1048579
2255 149796 149796 1048572
2256 149797 149797 1048579
2257 149797 149797 1048579
2258 149796 149796 1048572
2259 85349 85349 597444
2260
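The chunks still cover the whole input; summing the line counts above gives back the 1000000 lines of num1000000:

```shell
# Line counts reported by the --pipe run above; they must sum to 1000000
printf '165668\n149797\n149796\n149797\n149797\n149796\n85349\n' |
  awk '{sum += $1} END {print sum}'
```
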
2261 The size of the chunk is not exactly 1 MB because GNU parallel only
2262 passes full lines - never half a line - thus the block size is only
2263 1 MB on average. You can change the block size to 2 MB with --block:
2264
2265 cat num1000000 | parallel --pipe --block 2M wc
2266
2267 Output (the order may be different):
2268
2269 315465 315465 2097150
2270 299593 299593 2097151
2271 299593 299593 2097151
2272 85349 85349 597444
2273
2274 GNU parallel treats each line as a record. If the order of records is
2275 unimportant (e.g. you need all lines processed, but you do not care
2276 which is processed first), then you can use --round-robin. Without
2277 --round-robin GNU parallel will start a command per block; with
2278 --round-robin only the requested number of jobs will be started
2279 (--jobs). The records will then be distributed between the running
2280 jobs:
2281
2282 cat num1000000 | parallel --pipe -j4 --round-robin wc
2283
2284 Output will be similar to:
2285
2286 149797 149797 1048579
2287 299593 299593 2097151
2288 315465 315465 2097150
2289 235145 235145 1646016
2290
2291 One of the 4 instances got a single record, 2 instances got 2 full
2292 records each, and one instance got 1 full and 1 partial record.
2293
2294 Records
2295 GNU parallel sees the input as records. The default record is a single
2296 line.
2297
2298 Using -N140000 GNU parallel will read 140000 records at a time:
2299
2300 cat num1000000 | parallel --pipe -N140000 wc
2301
2302 Output (the order may be different):
2303
2304 140000 140000 868895
2305 140000 140000 980000
2306 140000 140000 980000
2307 140000 140000 980000
2308 140000 140000 980000
2309 140000 140000 980000
2310 140000 140000 980000
2311 20000 20000 140001
2312
2313 Note that the last job could not get the full 140000 lines, but
2314 only 20000 lines.
2315
2316 If a record is 75 lines -L can be used:
2317
2318 cat num1000000 | parallel --pipe -L75 wc
2319
2320 Output (the order may be different):
2321
2322 165600 165600 1048095
2323 149850 149850 1048950
2324 149775 149775 1048425
2325 149775 149775 1048425
2326 149850 149850 1048950
2327 149775 149775 1048425
2328 85350 85350 597450
2329 25 25 176
2330
2331 Note how GNU parallel still reads a block of around 1 MB; but instead
2332 of passing full lines to wc it passes full 75 lines at a time. This of
2333 course does not hold for the last job (which in this case got 25
2334 lines).
2335
2336 Fixed length records
2337 Fixed length records can be processed by setting --recend '' and
2338 --block recordsize. A header of size n can be processed with --header
2339 .{n}.
2340
2341 Here is how to process a file with a 4-byte header and a 3-byte record
2342 size:
2343
2344 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2345 'echo start; cat; echo'
2346
2347 Output:
2348
2349 start
2350 HHHHAAA
2351 start
2352 HHHHCCC
2353 start
2354 HHHHBBB
2355
2356 It may be more efficient to increase --block to a multiple of the
2357 record size.
2358
2359 Record separators
2360 GNU parallel uses separators to determine where two records split.
2361
2362 --recstart gives the string that starts a record; --recend gives the
2363 string that ends a record. The default is --recend '\n' (newline).
2364
2365 If both --recend and --recstart are given, then the record will only
2366 split if the recend string is immediately followed by the recstart
2367 string.
2368
2369 Here the --recend is set to ', ':
2370
2371 echo /foo, bar/, /baz, qux/, | \
2372 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2373
2374 Output:
2375
2376 JOB1
2377 /foo, END
2378 JOB2
2379 bar/, END
2380 JOB3
2381 /baz, END
2382 JOB4
2383 qux/,
2384 END
2385
2386 Here the --recstart is set to /:
2387
2388 echo /foo, bar/, /baz, qux/, | \
2389 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2390
2391 Output:
2392
2393 JOB1
2394 /foo, barEND
2395 JOB2
2396 /, END
2397 JOB3
2398 /baz, quxEND
2399 JOB4
2400 /,
2401 END
2402
2403 Here both --recend and --recstart are set:
2404
2405 echo /foo, bar/, /baz, qux/, | \
2406 parallel -kN1 --recend ', ' --recstart / --pipe \
2407 echo JOB{#}\;cat\;echo END
2408
2409 Output:
2410
2411 JOB1
2412 /foo, bar/, END
2413 JOB2
2414 /baz, qux/,
2415 END
2416
2417 Note the difference between setting one string and setting both
2418 strings.
2419
2420 With --regexp the --recend and --recstart will be treated as a regular
2421 expression:
2422
2423 echo foo,bar,_baz,__qux, | \
2424 parallel -kN1 --regexp --recend ,_+ --pipe \
2425 echo JOB{#}\;cat\;echo END
2426
2427 Output:
2428
2429 JOB1
2430 foo,bar,_END
2431 JOB2
2432 baz,__END
2433 JOB3
2434 qux,
2435 END
2436
2437 GNU parallel can remove the record separators with
2438 --remove-rec-sep/--rrs:
2439
2440 echo foo,bar,_baz,__qux, | \
2441 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2442 echo JOB{#}\;cat\;echo END
2443
2444 Output:
2445
2446 JOB1
2447 foo,barEND
2448 JOB2
2449 bazEND
2450 JOB3
2451 qux,
2452 END
2453
2454 Header
2455 If the input data has a header, the header can be repeated for each job
2456 by matching the header with --header. If headers start with % you can
2457 do this:
2458
2459 cat num_%header | \
2460 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2461
2462 Output (the order may be different):
2463
2464 JOB1
2465 %head1
2466 %head2
2467 1
2468 2
2469 3
2470 JOB2
2471 %head1
2472 %head2
2473 4
2474 5
2475 6
2476 JOB3
2477 %head1
2478 %head2
2479 7
2480 8
2481 9
2482 JOB4
2483 %head1
2484 %head2
2485 10
2486
2487 If the header is 2 lines, --header 2 will work:
2488
2489 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2490
2491 Output: Same as above.
2492
2493 --pipepart
2494 --pipe is not very efficient. It maxes out at around 500 MB/s.
2495 --pipepart can easily deliver 5 GB/s. But there are a few limitations.
2496 The input has to be a normal file (not a pipe) given by -a or ::::,
2497 and -L/-l/-N do not work. --recend and --recstart, however, do work,
2498 and records can often be split on those alone.
2499
2500 parallel --pipepart -a num1000000 --block 3m wc
2501
2502 Output (the order may be different):
2503
2504 444443 444444 3000002
2505 428572 428572 3000004
2506 126985 126984 888890
2507
2508 Shebang
2509 Input data and parallel command in the same file
2510 GNU parallel is often called as this:
2511
2512 cat input_file | parallel command
2513
2514 With --shebang the input_file and parallel can be combined into the
2515 same script.
2516
2517 UNIX shell scripts start with a shebang line like this:
2518
2519 #!/bin/bash
2520
2521 GNU parallel can do that, too. With --shebang the arguments can be
2522 listed in the file. The parallel command is the first line of the
2523 script:
2524
2525 #!/usr/bin/parallel --shebang -r echo
2526
2527 foo
2528 bar
2529 baz
2530
2531 Output (the order may be different):
2532
2533 foo
2534 bar
2535 baz
2536
2537 Parallelizing existing scripts
2538 GNU parallel is often called as this:
2539
2540 cat input_file | parallel command
2541 parallel command ::: foo bar
2542
2543 If command is a script, parallel can be combined with it into a
2544 single file, so that this will run the script in parallel:
2545
2546 cat input_file | command
2547 command foo bar
2548
2549 This perl script perl_echo works like echo:
2550
2551 #!/usr/bin/perl
2552
2553 print "@ARGV\n"
2554
2555 It can be called as this:
2556
2557 parallel perl_echo ::: foo bar
2558
2559 By changing the #!-line it can be run in parallel:
2560
2561 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2562
2563 print "@ARGV\n"
2564
2565 Thus this will work:
2566
2567 perl_echo foo bar
2568
2569 Output (the order may be different):
2570
2571 foo
2572 bar
2573
2574 This technique can be used for:
2575
2576 Perl:
2577 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2578
2579 print "Arguments @ARGV\n";
2580
2581 Python:
2582 #!/usr/bin/parallel --shebang-wrap /usr/bin/python3
2583
2584 import sys
2585 print('Arguments', str(sys.argv))
2586
2587 Bash/sh/zsh/Korn shell:
2588 #!/usr/bin/parallel --shebang-wrap /bin/bash
2589
2590 echo Arguments "$@"
2591
2592 csh:
2593 #!/usr/bin/parallel --shebang-wrap /bin/csh
2594
2595 echo Arguments "$argv"
2596
2597 Tcl:
2598 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2599
2600 puts "Arguments $argv"
2601
2602 R:
2603 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2604
2605 args <- commandArgs(trailingOnly = TRUE)
2606 print(paste("Arguments ",args))
2607
2608 GNUplot:
2609 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2610
2611 print "Arguments ", system('echo $ARG')
2612
2613 Ruby:
2614 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2615
2616 print "Arguments "
2617 puts ARGV
2618
2619 Octave:
2620 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2621
2622 printf ("Arguments");
2623 arg_list = argv ();
2624 for i = 1:nargin
2625 printf (" %s", arg_list{i});
2626 endfor
2627 printf ("\n");
2628
2629 Common LISP:
2630 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2631
2632 (format t "~&~S~&" 'Arguments)
2633 (format t "~&~S~&" *args*)
2634
2635 PHP:
2636 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2637 <?php
2638 echo "Arguments";
2639 foreach(array_slice($argv,1) as $v)
2640 {
2641 echo " $v";
2642 }
2643 echo "\n";
2644 ?>
2645
2646 Node.js:
2647 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2648
2649 var myArgs = process.argv.slice(2);
2650 console.log('Arguments ', myArgs);
2651
2652 LUA:
2653 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2654
2655 io.write "Arguments"
2656 for a = 1, #arg do
2657 io.write(" ")
2658 io.write(arg[a])
2659 end
2660 print("")
2661
2662 C#:
2663 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2664
2665 var argv = Environment.GetEnvironmentVariable("ARGV");
2666 print("Arguments "+argv);
2667
2668 Semaphore
2669 GNU parallel can work as a counting semaphore. This is slower and less
2670 efficient than its normal mode.
2671
2672 A counting semaphore is like a row of toilets. People needing a toilet
2673 can use any toilet, but if there are more people than toilets, they
2674 will have to wait for one of the toilets to become available.
2675
2676 An alias for parallel --semaphore is sem.
2677
2678 sem will follow a person to the toilets, wait until a toilet is
2679 available, leave the person in the toilet and exit.
2680
2681 sem --fg will follow a person to the toilets, wait until a toilet is
2682 available, stay with the person in the toilet and exit when the person
2683 exits.
2684
2685 sem --wait will wait for all persons to leave the toilets.
2686
2687 sem does not have a queue discipline, so the next person is chosen
2688 randomly.
2689
2690 -j sets the number of toilets.
2691
2692 Mutex
2693 The default is to have only one toilet (this is called a mutex). The
2694 program is started in the background and sem exits immediately. Use
2695 --wait to wait for all sems to finish:
2696
2697 sem 'sleep 1; echo The first finished' &&
2698 echo The first is now running in the background &&
2699 sem 'sleep 1; echo The second finished' &&
2700 echo The second is now running in the background
2701 sem --wait
2702
2703 Output:
2704
2705 The first is now running in the background
2706 The first finished
2707 The second is now running in the background
2708 The second finished
2709
2710 The command can be run in the foreground with --fg, which will only
2711 exit when the command completes:
2712
2713 sem --fg 'sleep 1; echo The first finished' &&
2714 echo The first finished running in the foreground &&
2715 sem --fg 'sleep 1; echo The second finished' &&
2716 echo The second finished running in the foreground
2717 sem --wait
2718
2719 The difference between this and just running the command is that a
2720 mutex is set, so if other sems were running in the background, only
2721 one would run at a time.
2722
2723 To control which semaphore is used, use --semaphorename/--id. Run this
2724 in one terminal:
2725
2726 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2727
2728 and simultaneously this in another terminal:
2729
2730 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2731
2732 Note how the second will only be started when the first has finished.
2733
2734 Counting semaphore
2735 A mutex is like having a single toilet: When it is in use everyone else
2736 will have to wait. A counting semaphore is like having multiple
2737 toilets: Several people can use the toilets, but when they all are in
2738 use, everyone else will have to wait.
2739
2740 sem can emulate a counting semaphore. Use --jobs to set the number of
2741 toilets like this:
2742
2743 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2744 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2745 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2746 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2747 sem --wait --id my_id
2748
2749 Output:
2750
2751 Start 1
2752 Start 2
2753 Start 3
2754 1 done
2755 Start 4
2756 2 done
2757 3 done
2758 4 done
2759
2760 Timeout
2761 With --semaphoretimeout you can force running the command anyway after
2762 a period (positive number) or give up (negative number):
2763
2764 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2765 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2766 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2767 sem --id foo --wait
2768
2769 Output:
2770
2771 Slow started
2772 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2773 Forced running after 1 sec
2774 parallel: Warning: Semaphore timed out. Exiting.
2775 Slow ended
2776
2777 Note how the 'Give up' was not run.
2778
2779 Informational
2780 GNU parallel has some options to give short information about the
2781 configuration.
2782
2783 --help will print a summary of the most important options:
2784
2785 parallel --help
2786
2787 Output:
2788
2789 Usage:
2790
2791 parallel [options] [command [arguments]] < list_of_arguments
2792 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2793 cat ... | parallel --pipe [options] [command [arguments]]
2794
2795 -j n Run n jobs in parallel
2796 -k Keep same order
2797 -X Multiple arguments with context replace
2798 --colsep regexp Split input on regexp for positional replacements
2799 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2800 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2801 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2802 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2803
2804 -S sshlogin Example: foo@server.example.com
2805 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2806 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2807 --onall Run the given command with argument on all sshlogins
2808 --nonall Run the given command with no arguments on all sshlogins
2809
2810 --pipe Split stdin (standard input) to multiple jobs.
2811 --recend str Record end separator for --pipe.
2812 --recstart str Record start separator for --pipe.
2813
2814 See 'man parallel' for details
2815
2816 Academic tradition requires you to cite works you base your article on.
2817 When using programs that use GNU Parallel to process data for publication
2818 please cite:
2819
2820 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2821 ;login: The USENIX Magazine, February 2011:42-47.
2822
2823 This helps funding further development; AND IT WON'T COST YOU A CENT.
2824 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2825
2826 When asking for help, always report the full output of this:
2827
2828 parallel --version
2829
2830 Output:
2831
2832 GNU parallel 20180122
2833 Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
2834 Ole Tange and Free Software Foundation, Inc.
2835 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2836 This is free software: you are free to change and redistribute it.
2837 GNU parallel comes with no warranty.
2838
2839 Web site: http://www.gnu.org/software/parallel
2840
2841 When using programs that use GNU Parallel to process data for publication
2842 please cite as described in 'parallel --citation'.
2843
2844 In scripts --minversion can be used to ensure the user has at least
2845 this version:
2846
2847 parallel --minversion 20130722 && \
2848 echo Your version is at least 20130722.
2849
2850 Output:
2851
2852 20180122
2853 Your version is at least 20130722.
2854
2855 If you are using GNU parallel for research the BibTeX citation can be
2856 generated using --citation:
2857
2858 parallel --citation
2859
2860 Output:
2861
2862 Academic tradition requires you to cite works you base your article on.
2863 When using programs that use GNU Parallel to process data for publication
2864 please cite:
2865
2866 @article{Tange2011a,
2867 title = {GNU Parallel - The Command-Line Power Tool},
2868 author = {O. Tange},
2869 address = {Frederiksberg, Denmark},
2870 journal = {;login: The USENIX Magazine},
2871 month = {Feb},
2872 number = {1},
2873 volume = {36},
2874 url = {http://www.gnu.org/s/parallel},
2875 year = {2011},
2876 pages = {42-47},
2877 doi = {10.5281/zenodo.16303}
2878 }
2879
2880 (Feel free to use \nocite{Tange2011a})
2881
2882 This helps funding further development; AND IT WON'T COST YOU A CENT.
2883 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2884
2885 If you send a copy of your published article to tange@gnu.org, it will be
2886 mentioned in the release notes of next version of GNU Parallel.
2887
2888 With --max-line-length-allowed GNU parallel will report the maximal
2889 size of the command line:
2890
2891 parallel --max-line-length-allowed
2892
2893 Output (may vary on different systems):
2894
2895 131071
2896
2897 --number-of-cpus and --number-of-cores run system specific code to
2898 determine the number of CPUs and CPU cores on the system. On
2899 unsupported platforms they will return 1:
2900
2901 parallel --number-of-cpus
2902 parallel --number-of-cores
2903
2904 Output (may vary on different systems):
2905
2906 4
2907 64
2908
2909 Profiles
2910 The defaults for GNU parallel can be changed systemwide by putting the
2911 command line options in /etc/parallel/config. They can be changed for a
2912 user by putting them in ~/.parallel/config.
2913
2914 Profiles work the same way, but have to be referred to with --profile:
2915
2916 echo '--nice 17' > ~/.parallel/nicetimeout
2917 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2918 parallel --profile nicetimeout echo ::: A B C
2919
2920 Output:
2921
2922 A
2923 B
2924 C
2925
2926 Profiles can be combined:
2927
2928 echo '-vv --dry-run' > ~/.parallel/dryverbose
2929 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2930
2931 Output:
2932
2933 echo A
2934 echo B
2935 echo C
2936
2937 Spread the word
2938 I hope you have learned something from this tutorial.
2939
2940 If you like GNU parallel:
2941
2942 · (Re-)walk through the tutorial if you have not done so in the past
2943 year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2944
2945 · Give a demo at your local user group/your team/your colleagues
2946
2947 · Post the intro videos and the tutorial on Reddit, Mastodon,
2948 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2949 Linkedin, and mailing lists
2950
2951 · Request or write a review for your favourite blog or magazine
2952 (especially if you do something cool with GNU parallel)
2953
2954 · Invite me for your next conference
2955
2956 If you use GNU parallel for research:
2957
2958 · Please cite GNU parallel in your publications (use --citation)
2959
2960 If GNU parallel saves you money:
2961
2962 · (Have your company) donate to FSF or become a member
2963 https://my.fsf.org/donate/
2964
2965 (C) 2013,2014,2015,2016,2017,2018 Ole Tange, GPLv3
2966
2967
2968
296920180122 2018-02-15 PARALLEL_TUTORIAL(7)