PARALLEL_TUTORIAL(7)               parallel               PARALLEL_TUTORIAL(7)
2
3
4

GNU Parallel Tutorial

6       This tutorial shows off much of GNU parallel's functionality. It is
7       meant to teach you the options and syntax of GNU parallel, not to show
8       realistic real-world examples. For realistic examples see the EXAMPLE
9       section in man parallel.
10
11       Spend an hour walking through the tutorial. Your command line will love
12       you for it.
13

Prerequisites

15       To run this tutorial you must have the following:
16
17       parallel >= version 20160822
18                Install the newest version using your package manager
19                (recommended for security reasons), the way described in
20                README, or with this command:
21
22                  (wget -O - pi.dk/3 || curl pi.dk/3/ || \
23                   fetch -o - http://pi.dk/3) | bash
24
25                This will also install the newest version of the tutorial
26                which you can see by running this:
27
28                  man parallel_tutorial
29
30                Most of the tutorial will work on older versions, too.
31
32       abc-file:
33                The file can be generated by this command:
34
35                  parallel -k echo ::: A B C > abc-file
36
37       def-file:
38                The file can be generated by this command:
39
40                  parallel -k echo ::: D E F > def-file
41
42       abc0-file:
43                The file can be generated by this command:
44
45                  perl -e 'printf "A\0B\0C\0"' > abc0-file
46
47       abc_-file:
48                The file can be generated by this command:
49
50                  perl -e 'printf "A_B_C_"' > abc_-file
51
52       tsv-file.tsv
53                The file can be generated by this command:
54
55                  perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv
56
57       num8     The file can be generated by this command:
58
59                  perl -e 'for(1..8){print "$_\n"}' > num8
60
61       num128   The file can be generated by this command:
62
63                  perl -e 'for(1..128){print "$_\n"}' > num128
64
65       num30000 The file can be generated by this command:
66
67                  perl -e 'for(1..30000){print "$_\n"}' > num30000
68
69       num1000000
70                The file can be generated by this command:
71
72                  perl -e 'for(1..1000000){print "$_\n"}' > num1000000
73
74       num_%header
75                The file can be generated by this command:
76
77                  (echo %head1; echo %head2; \
78                   perl -e 'for(1..10){print "$_\n"}') > num_%header
79
80       fixedlen The file can be generated by this command:
81
82                  perl -e 'print "HHHHAAABBBCCC"' > fixedlen
83
84       For remote running: passwordless ssh login must work to the 2 servers
85       whose names are stored in $SERVER1 and $SERVER2.
86                  SERVER1=server.example.com
87                  SERVER2=server2.example.net
88
89                So you must be able to do this:
90
91                  ssh $SERVER1 echo works
92                  ssh $SERVER2 echo works
93
94                It can be set up by running 'ssh-keygen -t dsa; ssh-copy-id
95                $SERVER1' and using an empty passphrase.
96
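                On newer OpenSSH versions DSA keys are often disabled; a
                sketch of the same setup using an ed25519 key instead
                (adjust the key type to what your ssh supports):

                  ssh-keygen -t ed25519
                  ssh-copy-id $SERVER1
                  ssh-copy-id $SERVER2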

Input sources

98       GNU parallel reads input from input sources. These can be files, the
99       command line, and stdin (standard input or a pipe).
100
101   A single input source
102       Input can be read from the command line:
103
104         parallel echo ::: A B C
105
106       Output (the order may be different because the jobs are run in
107       parallel):
108
109         A
110         B
111         C
112
113       The input source can be a file:
114
115         parallel -a abc-file echo
116
117       Output: Same as above.
118
119       STDIN (standard input) can be the input source:
120
121         cat abc-file | parallel echo
122
123       Output: Same as above.
124
125   Multiple input sources
126       GNU parallel can take multiple input sources given on the command line.
127       GNU parallel then generates all combinations of the input sources:
128
129         parallel echo ::: A B C ::: D E F
130
131       Output (the order may be different):
132
133         A D
134         A E
135         A F
136         B D
137         B E
138         B F
139         C D
140         C E
141         C F
142
143       The input sources can be files:
144
145         parallel -a abc-file -a def-file echo
146
147       Output: Same as above.
148
149       STDIN (standard input) can be one of the input sources using -:
150
151         cat abc-file | parallel -a - -a def-file echo
152
153       Output: Same as above.
154
155       Instead of -a, files can be given after '::::':
156
157         cat abc-file | parallel echo :::: - def-file
158
159       Output: Same as above.
160
161       ::: and :::: can be mixed:
162
163         parallel echo ::: A B C :::: def-file
164
165       Output: Same as above.
166
167       Linking arguments from input sources
168
169       With --link you can link the input sources and get one argument from
170       each input source:
171
172         parallel --link echo ::: A B C ::: D E F
173
174       Output (the order may be different):
175
176         A D
177         B E
178         C F
179
180       If one of the input sources is too short, its values will wrap:
181
182         parallel --link echo ::: A B C D E ::: F G
183
184       Output (the order may be different):
185
186         A F
187         B G
188         C F
189         D G
190         E F
191
192       For more flexible linking you can use :::+ and ::::+. They work like
193       ::: and :::: except they link the previous input source to this input
194       source.
195
196       This will link ABC to GHI:
197
198         parallel echo :::: abc-file :::+ G H I :::: def-file
199
200       Output (the order may be different):
201
202         A G D
203         A G E
204         A G F
205         B H D
206         B H E
207         B H F
208         C I D
209         C I E
210         C I F
211
212       This will link GHI to DEF:
213
214         parallel echo :::: abc-file ::: G H I ::::+ def-file
215
216       Output (the order may be different):
217
218         A G D
219         A H E
220         A I F
221         B G D
222         B H E
223         B I F
224         C G D
225         C H E
226         C I F
227
228       If one of the input sources is too short when using :::+ or ::::+, the
229       rest will be ignored:
230
231         parallel echo ::: A B C D E :::+ F G
232
233       Output (the order may be different):
234
235         A F
236         B G
237
238   Changing the argument separator
239       GNU parallel can use other separators than ::: or ::::. This is
240       typically useful if ::: or :::: is used in the command to run:
241
242         parallel --arg-sep ,, echo ,, A B C :::: def-file
243
244       Output (the order may be different):
245
246         A D
247         A E
248         A F
249         B D
250         B E
251         B F
252         C D
253         C E
254         C F
255
256       Changing the argument file separator:
257
258         parallel --arg-file-sep // echo ::: A B C // def-file
259
260       Output: Same as above.
261
262   Changing the argument delimiter
263       GNU parallel will normally treat a full line as a single argument: It
264       uses \n as argument delimiter. This can be changed with -d:
265
266         parallel -d _ echo :::: abc_-file
267
268       Output (the order may be different):
269
270         A
271         B
272         C
273
274       NUL can be given as \0:
275
276         parallel -d '\0' echo :::: abc0-file
277
278       Output: Same as above.
279
280       A shorthand for -d '\0' is -0 (this will often be used to read files
281       from find ... -print0):
282
283         parallel -0 echo :::: abc0-file
284
285       Output: Same as above.
286
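       As a sketch of the find use case: NUL-terminated file names from find
       can be fed straight to GNU parallel (assuming the current dir contains
       some files):

         find . -maxdepth 1 -type f -print0 | parallel -0 echo found {}
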
287   End-of-file value for input source
288       GNU parallel can stop reading when it encounters a certain value:
289
290         parallel -E stop echo ::: A B stop C D
291
292       Output:
293
294         A
295         B
296
297   Skipping empty lines
298       Using --no-run-if-empty GNU parallel will skip empty lines.
299
300         (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
301
302       Output:
303
304         1
305         2
306

Building the command line

308   No command means arguments are commands
309       If no command is given after parallel the arguments themselves are
310       treated as commands:
311
312         parallel ::: ls 'echo foo' pwd
313
314       Output (the order may be different):
315
316         [list of files in current dir]
317         foo
318         [/path/to/current/working/dir]
319
320       The command can be a script, a binary or a Bash function if the
321       function is exported using export -f:
322
323         # Only works in Bash
324         my_func() {
325           echo in my_func $1
326         }
327         export -f my_func
328         parallel my_func ::: 1 2 3
329
330       Output (the order may be different):
331
332         in my_func 1
333         in my_func 2
334         in my_func 3
335
336   Replacement strings
337       The 7 predefined replacement strings
338
339       GNU parallel has several replacement strings. If no replacement strings
340       are used the default is to append {}:
341
342         parallel echo ::: A/B.C
343
344       Output:
345
346         A/B.C
347
348       The default replacement string is {}:
349
350         parallel echo {} ::: A/B.C
351
352       Output:
353
354         A/B.C
355
356       The replacement string {.} removes the extension:
357
358         parallel echo {.} ::: A/B.C
359
360       Output:
361
362         A/B
363
364       The replacement string {/} removes the path:
365
366         parallel echo {/} ::: A/B.C
367
368       Output:
369
370         B.C
371
372       The replacement string {//} keeps only the path:
373
374         parallel echo {//} ::: A/B.C
375
376       Output:
377
378         A
379
380       The replacement string {/.} removes the path and the extension:
381
382         parallel echo {/.} ::: A/B.C
383
384       Output:
385
386         B
387
388       The replacement string {#} gives the job number:
389
390         parallel echo {#} ::: A B C
391
392       Output (the order may be different):
393
394         1
395         2
396         3
397
398       The replacement string {%} gives the job slot number (between 1 and
399       number of jobs to run in parallel):
400
401         parallel -j 2 echo {%} ::: A B C
402
403       Output (the order may be different and 1 and 2 may be swapped):
404
405         1
406         2
407         1
408
409       Changing the replacement strings
410
411       The replacement string {} can be changed with -I:
412
413         parallel -I ,, echo ,, ::: A/B.C
414
415       Output:
416
417         A/B.C
418
419       The replacement string {.} can be changed with --extensionreplace:
420
421         parallel --extensionreplace ,, echo ,, ::: A/B.C
422
423       Output:
424
425         A/B
426
427       The replacement string {/} can be replaced with --basenamereplace:
428
429         parallel --basenamereplace ,, echo ,, ::: A/B.C
430
431       Output:
432
433         B.C
434
435       The replacement string {//} can be changed with --dirnamereplace:
436
437         parallel --dirnamereplace ,, echo ,, ::: A/B.C
438
439       Output:
440
441         A
442
443       The replacement string {/.} can be changed with
444       --basenameextensionreplace:
445
446         parallel --basenameextensionreplace ,, echo ,, ::: A/B.C
447
448       Output:
449
450         B
451
452       The replacement string {#} can be changed with --seqreplace:
453
454         parallel --seqreplace ,, echo ,, ::: A B C
455
456       Output (the order may be different):
457
458         1
459         2
460         3
461
462       The replacement string {%} can be changed with --slotreplace:
463
464         parallel -j2 --slotreplace ,, echo ,, ::: A B C
465
466       Output (the order may be different and 1 and 2 may be swapped):
467
468         1
469         2
470         1
471
472       Perl expression replacement string
473
474       When predefined replacement strings are not flexible enough a perl
475       expression can be used instead. One example is to remove two
476       extensions: foo.tar.gz becomes foo
477
478         parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz
479
480       Output:
481
482         foo
483
484       In {= =} you can access all of GNU parallel's internal functions and
485       variables. A few are worth mentioning.
486
487       total_jobs() returns the total number of jobs:
488
489         parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}
490
491       Output:
492
493         Job 1 of 5
494         Job 2 of 5
495         Job 3 of 5
496         Job 4 of 5
497         Job 5 of 5
498
499       Q(...) shell quotes the string:
500
501         parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'
502
503       Output:
504
505         */!#$ shell quoted is \*/\!\#\$
506
507       skip() skips the job:
508
509         parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}
510
511       Output:
512
513         1
514         2
515         4
516         5
517
518       @arg contains the input source variables:
519
520         parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
521           ::: {1..3} ::: {1..3}
522
523       Output:
524
525         1 2
526         1 3
527         2 1
528         2 3
529         3 1
530         3 2
531
532       If the strings {= and =} cause problems they can be replaced with
533       --parens:
534
535         parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
536           ::: foo.tar.gz
537
538       Output:
539
540         foo
541
542       To define a shorthand replacement string use --rpl:
543
544         parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
545           ::: foo.tar.gz
546
547       Output: Same as above.
548
549       If the shorthand starts with { it can be used as a positional
550       replacement string, too:
551
552         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}'
553           ::: foo.tar.gz
554
555       Output: Same as above.
556
557       If the shorthand contains matching parentheses the replacement string
558       becomes a dynamic replacement string and the string in the parentheses
559       can be accessed as $$1. If there are multiple matching parentheses, the
560       matched strings can be accessed using $$2, $$3 and so on.
561
562       You can think of this as giving arguments to the replacement string.
563       Here we give the argument .tar.gz to the replacement string {%string}
564       which removes string:
565
566         parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz
567
568       Output:
569
570         foo.zip
571
572       Here we give the two arguments tar.gz and zip to the replacement string
573       {/string1/string2} which replaces string1 with string2:
574
575         parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
576           ::: foo.tar.gz
577
578       Output:
579
580         foo.zip
581
582       GNU parallel's 7 predefined replacement strings are implemented like this:
583
584         --rpl '{} '
585         --rpl '{#} $_=$job->seq()'
586         --rpl '{%} $_=$job->slot()'
587         --rpl '{/} s:.*/::'
588         --rpl '{//} $Global::use{"File::Basename"} ||=
589                  eval "use File::Basename; 1;"; $_ = dirname($_);'
590         --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
591         --rpl '{.} s:\.[^/.]+$::'
592
593       Positional replacement strings
594
595       With multiple input sources the argument from the individual input
596       sources can be accessed with {number}:
597
598         parallel echo {1} and {2} ::: A B ::: C D
599
600       Output (the order may be different):
601
602         A and C
603         A and D
604         B and C
605         B and D
606
607       The positional replacement strings can also be modified using /, //,
608       /., and .:
609
610         parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F
611
612       Output (the order may be different):
613
614         /=B.C //=A /.=B .=A/B
615         /=E.F //=D /.=E .=D/E
616
617       If a position is negative, it refers to the input source counted from
618       the end:
619
620         parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
621           ::: A B ::: C D ::: E F
622
623       Output (the order may be different):
624
625         1=A 2=C 3=E -1=E -2=C -3=A
626         1=A 2=C 3=F -1=F -2=C -3=A
627         1=A 2=D 3=E -1=E -2=D -3=A
628         1=A 2=D 3=F -1=F -2=D -3=A
629         1=B 2=C 3=E -1=E -2=C -3=B
630         1=B 2=C 3=F -1=F -2=C -3=B
631         1=B 2=D 3=E -1=E -2=D -3=B
632         1=B 2=D 3=F -1=F -2=D -3=B
633
634       Positional perl expression replacement string
635
636       To use a perl expression as a positional replacement string simply
637       prepend the perl expression with the position number and a space:
638
639         parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
640           ::: bar ::: foo.tar.gz
641
642       Output:
643
644         foo bar
645
646       If a shorthand defined using --rpl starts with { it can be used as a
647       positional replacement string, too:
648
649         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
650           ::: bar ::: foo.tar.gz
651
652       Output: Same as above.
653
654       Input from columns
655
656       The columns in a file can be bound to positional replacement strings
657       using --colsep. Here the columns are separated by TAB (\t):
658
659         parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
660
661       Output (the order may be different):
662
663         1=f1 2=f2
664         1=A 2=B
665         1=C 2=D
666
667       Header defined replacement strings
668
669       With --header GNU parallel will use the first value of the input source
670       as the name of the replacement string. Only the non-modified version {}
671       is supported:
672
673         parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D
674
675       Output (the order may be different):
676
677         f1=A f2=C
678         f1=A f2=D
679         f1=B f2=C
680         f1=B f2=D
681
682       It is useful with --colsep for processing files with TAB separated
683       values:
684
685         parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
686           :::: tsv-file.tsv
687
688       Output (the order may be different):
689
690         f1=A f2=B
691         f1=C f2=D
692
693       More pre-defined replacement strings with --plus
694
695       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...}
696       {/..} {/...} {##}. The idea is that {+foo} matches the opposite of
697       {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
698       {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.
699
700         parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
701         parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
702         parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
703         parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
704         parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
705         parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
706         parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
707         parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
708
709       Output:
710
711         dir/sub/file.ex1.ex2.ex3
712
713       {##} is simply the total number of jobs:
714
715         parallel --plus echo Job {#} of {##} ::: {1..5}
716
717       Output:
718
719         Job 1 of 5
720         Job 2 of 5
721         Job 3 of 5
722         Job 4 of 5
723         Job 5 of 5
724
725       Dynamic replacement strings with --plus
726
727       --plus also defines these dynamic replacement strings:
728
729       {:-string}         Default value is string if the argument is empty.
730
731       {:number}          Substring from number till end of string.
732
733       {:number1:number2} Substring from number1 to number2.
734
735       {#string}          If the argument starts with string, remove it.
736
737       {%string}          If the argument ends with string, remove it.
738
739       {/string1/string2} Replace string1 with string2.
740
741       {^string}          If the argument starts with string, upper case it.
742                          string must be a single letter.
743
744       {^^string}         If the argument contains string, upper case it.
745                          string must be a single letter.
746
747       {,string}          If the argument starts with string, lower case it.
748                          string must be a single letter.
749
750       {,,string}         If the argument contains string, lower case it.
751                          string must be a single letter.
752
753       They are inspired by Bash:
754
755         unset myvar
756         echo ${myvar:-myval}
757         parallel --plus echo {:-myval} ::: "$myvar"
758
759         myvar=abcAaAdef
760         echo ${myvar:2}
761         parallel --plus echo {:2} ::: "$myvar"
762
763         echo ${myvar:2:3}
764         parallel --plus echo {:2:3} ::: "$myvar"
765
766         echo ${myvar#bc}
767         parallel --plus echo {#bc} ::: "$myvar"
768         echo ${myvar#abc}
769         parallel --plus echo {#abc} ::: "$myvar"
770
771         echo ${myvar%de}
772         parallel --plus echo {%de} ::: "$myvar"
773         echo ${myvar%def}
774         parallel --plus echo {%def} ::: "$myvar"
775
776         echo ${myvar/def/ghi}
777         parallel --plus echo {/def/ghi} ::: "$myvar"
778
779         echo ${myvar^a}
780         parallel --plus echo {^a} ::: "$myvar"
781         echo ${myvar^^a}
782         parallel --plus echo {^^a} ::: "$myvar"
783
784         myvar=AbcAaAdef
785         echo ${myvar,A}
786         parallel --plus echo '{,A}' ::: "$myvar"
787         echo ${myvar,,A}
788         parallel --plus echo '{,,A}' ::: "$myvar"
789
790       Output:
791
792         myval
793         myval
794         cAaAdef
795         cAaAdef
796         cAa
797         cAa
798         abcAaAdef
799         abcAaAdef
800         AaAdef
801         AaAdef
802         abcAaAdef
803         abcAaAdef
804         abcAaA
805         abcAaA
806         abcAaAghi
807         abcAaAghi
808         AbcAaAdef
809         AbcAaAdef
810         AbcAAAdef
811         AbcAAAdef
812         abcAaAdef
813         abcAaAdef
814         abcaaadef
815         abcaaadef
816
817   More than one argument
818       With --xargs GNU parallel will fit as many arguments as possible on a
819       single line:
820
821         cat num30000 | parallel --xargs echo | wc -l
822
823       Output (if you run this under Bash on GNU/Linux):
824
825         2
826
827       The 30000 arguments fit on 2 lines.
828
829       The maximal length of a single line can be set with -s. With a maximal
830       line length of 10000 chars 17 commands will be run:
831
832         cat num30000 | parallel --xargs -s 10000 echo | wc -l
833
834       Output:
835
836         17
837
838       For better parallelism GNU parallel can distribute the arguments
839       between all the parallel jobs when end of file is met.
840
841       In the example below GNU parallel hits end of file while generating
842       the second job. Because 4 parallel jobs are requested, it spreads the
843       arguments that would have gone into the second job over 4 jobs
844       instead.
845
846       The first job will be the same as the --xargs example above, but the
847       second job will be split into 4 evenly sized jobs, resulting in a total
848       of 5 jobs:
849
850         cat num30000 | parallel --jobs 4 -m echo | wc -l
851
852       Output (if you run this under Bash on GNU/Linux):
853
854         5
855
856       This is even more visible when running 4 jobs with 10 arguments. The 10
857       arguments are being spread over 4 jobs:
858
859         parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
860
861       Output:
862
863         1 2 3
864         4 5 6
865         7 8 9
866         10
867
868       A replacement string can be part of a word. -m will not repeat the
869       context:
870
871         parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G
872
873       Output (the order may be different):
874
875         pre-A B-post
876         pre-C D-post
877         pre-E F-post
878         pre-G-post
879
880       To repeat the context use -X which otherwise works like -m:
881
882         parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
883
884       Output (the order may be different):
885
886         pre-A-post pre-B-post
887         pre-C-post pre-D-post
888         pre-E-post pre-F-post
889         pre-G-post
890
891       To limit the number of arguments use -N:
892
893         parallel -N3 echo ::: A B C D E F G H
894
895       Output (the order may be different):
896
897         A B C
898         D E F
899         G H
900
901       -N also sets the positional replacement strings:
902
903         parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
904
905       Output (the order may be different):
906
907         1=A 2=B 3=C
908         1=D 2=E 3=F
909         1=G 2=H 3=
910
911       -N0 reads 1 argument but inserts none:
912
913         parallel -N0 echo foo ::: 1 2 3
914
915       Output:
916
917         foo
918         foo
919         foo
920
921   Quoting
922       Command lines that contain special characters may need to be protected
923       from the shell.
924
925       The perl program print "@ARGV\n" basically works like echo.
926
927         perl -e 'print "@ARGV\n"' A
928
929       Output:
930
931         A
932
933       To run that in parallel the command needs to be quoted:
934
935         parallel perl -e 'print "@ARGV\n"' ::: This wont work
936
937       Output:
938
939         [Nothing]
940
941       To quote the command use -q:
942
943         parallel -q perl -e 'print "@ARGV\n"' ::: This works
944
945       Output (the order may be different):
946
947         This
948         works
949
950       Or you can quote the critical part using \':
951
952         parallel perl -e \''print "@ARGV\n"'\' ::: This works, too
953
954       Output (the order may be different):
955
956         This
957         works,
958         too
959
960       GNU parallel can also \-quote full lines. Simply run this:
961
962         parallel --shellquote
963         Warning: Input is read from the terminal. You either know what you
964         Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
965         Warning: ::: or :::: or to pipe data into parallel. If so
966         Warning: consider going through the tutorial: man parallel_tutorial
967         Warning: Press CTRL-D to exit.
968         perl -e 'print "@ARGV\n"'
969         [CTRL-D]
970
971       Output:
972
973         perl\ -e\ \'print\ \"@ARGV\\n\"\'
974
975       This can then be used as the command:
976
977         parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works
978
979       Output (the order may be different):
980
981         This
982         also
983         works
984
985   Trimming space
986       Space can be trimmed on the arguments using --trim:
987
988         parallel --trim r echo pre-{}-post ::: ' A '
989
990       Output:
991
992         pre- A-post
993
994       To trim on the left side:
995
996         parallel --trim l echo pre-{}-post ::: ' A '
997
998       Output:
999
1000         pre-A -post
1001
1002       To trim on both sides:
1003
1004         parallel --trim lr echo pre-{}-post ::: ' A '
1005
1006       Output:
1007
1008         pre-A-post
1009
1010   Respecting the shell
1011       This tutorial uses Bash as the shell. GNU parallel respects which shell
1012       you are using, so in zsh you can do:
1013
1014         parallel echo \={} ::: zsh bash ls
1015
1016       Output:
1017
1018         /usr/bin/zsh
1019         /bin/bash
1020         /bin/ls
1021
1022       In csh you can do:
1023
1024         parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *
1025
1026       Output:
1027
1028         [somedir] is a dir
1029
1030       This also becomes useful if you use GNU parallel in a shell script: GNU
1031       parallel will use the same shell as the shell script.
1032
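       A minimal sketch of this (the script name demo.sh is hypothetical):

         #!/bin/sh
         # demo.sh - because this script runs under /bin/sh, GNU parallel
         # will also use /bin/sh to run the jobs below
         parallel echo Hello from job {} ::: 1 2 3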

Controlling the output

1034       The output can be prefixed with the argument:
1035
1036         parallel --tag echo foo-{} ::: A B C
1037
1038       Output (the order may be different):
1039
1040         A       foo-A
1041         B       foo-B
1042         C       foo-C
1043
1044       To prefix it with another string use --tagstring:
1045
1046         parallel --tagstring {}-bar echo foo-{} ::: A B C
1047
1048       Output (the order may be different):
1049
1050         A-bar   foo-A
1051         B-bar   foo-B
1052         C-bar   foo-C
1053
1054       To see what commands will be run without running them use --dryrun:
1055
1056         parallel --dryrun echo {} ::: A B C
1057
1058       Output (the order may be different):
1059
1060         echo A
1061         echo B
1062         echo C
1063
1064       To print the commands before running them use --verbose:
1065
1066         parallel --verbose echo {} ::: A B C
1067
1068       Output (the order may be different):
1069
1070         echo A
1071         echo B
1072         A
1073         echo C
1074         B
1075         C
1076
1077       GNU parallel will postpone the output until the command completes:
1078
1079         parallel -j2 'printf "%s-start\n%s" {} {};
1080           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1081
1082       Output:
1083
1084         2-start
1085         2-middle
1086         2-end
1087         1-start
1088         1-middle
1089         1-end
1090         4-start
1091         4-middle
1092         4-end
1093
1094       To get the output immediately use --ungroup:
1095
1096         parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
1097           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1098
1099       Output:
1100
1101         4-start
1102         42-start
1103         2-middle
1104         2-end
1105         1-start
1106         1-middle
1107         1-end
1108         -middle
1109         4-end
1110
1111       --ungroup is fast, but can cause half a line from one job to be mixed
1112       with half a line of another job. That happened in the second line
1113       above, where the start of the line '4-middle' is mixed into '2-start'.
1114
1115       To avoid this use --linebuffer:
1116
1117         parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
1118           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1119
1120       Output:
1121
1122         4-start
1123         2-start
1124         2-middle
1125         2-end
1126         1-start
1127         1-middle
1128         1-end
1129         4-middle
1130         4-end
1131
1132       To force the output in the same order as the arguments use
1133       --keep-order/-k:
1134
1135         parallel -j2 -k 'printf "%s-start\n%s" {} {};
1136           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1137
1138       Output:
1139
1140         4-start
1141         4-middle
1142         4-end
1143         2-start
1144         2-middle
1145         2-end
1146         1-start
1147         1-middle
1148         1-end
1149
1150   Saving output into files
1151       GNU parallel can save the output of each job into files:
1152
1153         parallel --files echo ::: A B C
1154
1155       Output will be similar to this:
1156
1157         /tmp/pAh6uWuQCg.par
1158         /tmp/opjhZCzAX4.par
1159         /tmp/W0AT_Rph2o.par
1160
1161       By default GNU parallel will cache the output in files in /tmp. This
1162       can be changed by setting $TMPDIR or --tmpdir:
1163
1164         parallel --tmpdir /var/tmp --files echo ::: A B C
1165
1166       Output will be similar to this:
1167
1168         /var/tmp/N_vk7phQRc.par
1169         /var/tmp/7zA4Ccf3wZ.par
1170         /var/tmp/LIuKgF_2LP.par
1171
1172       Or:
1173
1174         TMPDIR=/var/tmp parallel --files echo ::: A B C
1175
1176       Output: Same as above.
1177
1178       The output files can be saved in a structured way using --results:
1179
1180         parallel --results outdir echo ::: A B C
1181
1182       Output:
1183
1184         A
1185         B
1186         C
1187
1188       These files were also generated containing the standard output
1189       (stdout), standard error (stderr), and the sequence number (seq):
1190
1191         outdir/1/A/seq
1192         outdir/1/A/stderr
1193         outdir/1/A/stdout
1194         outdir/1/B/seq
1195         outdir/1/B/stderr
1196         outdir/1/B/stdout
1197         outdir/1/C/seq
1198         outdir/1/C/stderr
1199         outdir/1/C/stdout
1200
1201       --header : will take the first value as name and use that in the
1202       directory structure. This is useful if you are using multiple input
1203       sources:
1204
1205         parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
1206
1207       Generated files:
1208
1209         outdir/f1/A/f2/C/seq
1210         outdir/f1/A/f2/C/stderr
1211         outdir/f1/A/f2/C/stdout
1212         outdir/f1/A/f2/D/seq
1213         outdir/f1/A/f2/D/stderr
1214         outdir/f1/A/f2/D/stdout
1215         outdir/f1/B/f2/C/seq
1216         outdir/f1/B/f2/C/stderr
1217         outdir/f1/B/f2/C/stdout
1218         outdir/f1/B/f2/D/seq
1219         outdir/f1/B/f2/D/stderr
1220         outdir/f1/B/f2/D/stdout
1221
1222       The directories are named after the variables and their values.
1223
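       The structured layout makes it easy to collect the captured output
       afterwards; a sketch using the outdir from the run above:

         cat outdir/f1/*/f2/*/stdout

       Output:

         A C
         A D
         B C
         B D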

Controlling the execution

1225   Number of simultaneous jobs
1226       The number of concurrent jobs is given with --jobs/-j:
1227
1228         /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
1229
1230       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run -
1231       depending on how fast your machine is.
1232
1233       By default --jobs is the same as the number of CPU cores. So this:
1234
1235         /usr/bin/time parallel -N0 sleep 1 :::: num128
1236
1237       should take twice the time of running 2 jobs per CPU core:
1238
1239         /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1240
1241       --jobs 0 will run as many jobs in parallel as possible:
1242
1243         /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1244
1245       which should take 1-7 seconds depending on how fast your machine is.
1246
1247       --jobs can read from a file which is re-read when a job finishes:
1248
1249         echo 50% > my_jobs
1250         /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1251         sleep 1
1252         echo 0 > my_jobs
1253         wait
1254
1255       For the first second only 50% of the CPU cores will run a job. Then 0
1256       is put into my_jobs, and the rest of the jobs will be started in
1257       parallel.
1258
1259       Instead of basing the percentage on the number of CPU cores GNU
1260       parallel can base it on the number of CPUs:
1261
1262         parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1263
1264   Shuffle job order
1265       If you have many jobs (e.g. from multiple combinations of input
1266       sources), it can be handy to shuffle the jobs, so different values are
1267       run early on. Use --shuf for that:
1268
1269         parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1270
1271       Output:
1272
1273         All combinations but different order for each run.
1274
1275   Interactivity
1276       GNU parallel can ask the user if a command should be run using
1277       --interactive:
1278
1279         parallel --interactive echo ::: 1 2 3
1280
1281       Output:
1282
1283         echo 1 ?...y
1284         echo 2 ?...n
1285         1
1286         echo 3 ?...y
1287         3
1288
1289       GNU parallel can be used to put arguments on the command line for an
1290       interactive command such as emacs to edit one file at a time:
1291
1292         parallel --tty emacs ::: 1 2 3
1293
1294       Or give multiple arguments in one go to open multiple files:
1295
1296         parallel -X --tty vi ::: 1 2 3
1297
1298   A terminal for every job
1299       Using --tmux GNU parallel can start a terminal for every job run:
1300
1301         seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1302
1303       This will tell you to run something similar to:
1304
1305         tmux -S /tmp/tmsrPrO0 attach
1306
1307       Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
1308       between windows of the running jobs. When a job is finished it will
1309       pause for 10 seconds before closing the window.
1310
1311   Timing
1312       Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1313       parallel can delay starting new jobs. --delay X will make sure there is
1314       at least X seconds between each start:
1315
1316         parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1317
1318       Output:
1319
1320         Starting 1
1321         Thu Aug 15 16:24:33 CEST 2013
1322         Starting 2
1323         Thu Aug 15 16:24:35 CEST 2013
1324         Starting 3
1325         Thu Aug 15 16:24:38 CEST 2013
1326
1327       If jobs taking more than a certain amount of time are known to fail,
1328       they can be stopped with --timeout. The accuracy of --timeout is 2
1329       seconds:
1330
1331         parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
1332
1333       Output:
1334
1335         2
1336         4
1337
1338       GNU parallel can compute the median runtime for jobs and kill those
1339       that take more than 200% of the median runtime:
1340
1341         parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
1342
1343       Output:
1344
1345         2.1
1346         2.2
1347         3
1348         2.3
1349
1350   Progress information
1351       Based on the runtime of completed jobs GNU parallel can estimate the
1352       total runtime:
1353
1354         parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1355
1356       Output:
1357
1358         Computers / CPU cores / Max jobs to run
1359         1:local / 2 / 2
1360
1361         Computer:jobs running/jobs completed/%of started jobs/
1362           Average seconds to complete
1363         ETA: 2s 0left 1.11avg  local:0/9/100%/1.1s
1364
1365       GNU parallel can give progress information with --progress:
1366
1367         parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1368
1369       Output:
1370
1371         Computers / CPU cores / Max jobs to run
1372         1:local / 2 / 2
1373
1374         Computer:jobs running/jobs completed/%of started jobs/
1375           Average seconds to complete
1376         local:0/9/100%/1.1s
1377
1378       A progress bar can be shown with --bar:
1379
1380         parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1381
1382       And a graphical bar can be shown with --bar and zenity:
1383
1384         seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1385           2> >(zenity --progress --auto-kill --auto-close)
1386
1387       A logfile of the jobs completed so far can be generated with --joblog:
1388
1389         parallel --joblog /tmp/log exit  ::: 1 2 3 0
1390         cat /tmp/log
1391
1392       Output:
1393
1394         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1395         1   :    1376577364.974 0.008   0    0       1       0      exit 1
1396         2   :    1376577364.982 0.013   0    0       2       0      exit 2
1397         3   :    1376577364.990 0.013   0    0       3       0      exit 3
1398         4   :    1376577365.003 0.003   0    0       0       0      exit 0
1399
1400       The log contains the job sequence, which host the job was run on, the
1401       start time and run time, how much data was transferred, the exit value,
1402       the signal that killed the job, and finally the command being run.
1403
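       Since the joblog is plain text with one job per line, it can be
       processed with standard tools; a sketch that lists the slowest jobs by
       sorting on the Runtime column (column 4):

         sort -nrk4 /tmp/log | head -n 3
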
1404       With a joblog GNU parallel can be stopped and later pick up where it
1405       left off. It is important that the input of the completed jobs is
1406       unchanged.
1407
1408         parallel --joblog /tmp/log exit  ::: 1 2 3 0
1409         cat /tmp/log
1410         parallel --resume --joblog /tmp/log exit  ::: 1 2 3 0 0 0
1411         cat /tmp/log
1412
1413       Output:
1414
1415         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1416         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1417         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1418         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1419         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1420
1421         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1422         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1423         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1424         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1425         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1426         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1427         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1428
1429       Note how the start time of the last 2 jobs is clearly different,
1430       because they were run in the second run.
1431
1432       With --resume-failed GNU parallel will re-run the jobs that failed:
1433
1434         parallel --resume-failed --joblog /tmp/log exit  ::: 1 2 3 0 0 0
1435         cat /tmp/log
1436
1437       Output:
1438
1439         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1440         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1441         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1442         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1443         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1444         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1445         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1446         1   :    1376580154.433 0.010   0    0       1       0      exit 1
1447         2   :    1376580154.444 0.022   0    0       2       0      exit 2
1448         3   :    1376580154.466 0.005   0    0       3       0      exit 3
1449
1450       Note how seq 1, 2, and 3 have been run again because they had an exit
1451       value different from 0.
1452
1453       --retry-failed does almost the same as --resume-failed. Where
1454       --resume-failed reads the commands from the command line (and ignores
1455       the commands in the joblog), --retry-failed ignores the command line
1456       and reruns the commands mentioned in the joblog.
1457
1458         parallel --retry-failed --joblog /tmp/log
1459         cat /tmp/log
1460
1461       Output:
1462
1463         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1464         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1465         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1466         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1467         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1468         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1469         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1470         1   :    1376580154.433 0.010   0    0       1       0      exit 1
1471         2   :    1376580154.444 0.022   0    0       2       0      exit 2
1472         3   :    1376580154.466 0.005   0    0       3       0      exit 3
1473         1   :    1376580164.633 0.010   0    0       1       0      exit 1
1474         2   :    1376580164.644 0.022   0    0       2       0      exit 2
1475         3   :    1376580164.666 0.005   0    0       3       0      exit 3
1476
1477   Termination
1478       Unconditional termination
1479
1480       By default GNU parallel will wait for all jobs to finish before
1481       exiting.
1482
1483       If you send GNU parallel the TERM signal, GNU parallel will stop
1484       spawning new jobs and wait for the remaining jobs to finish. If you
1485       send GNU parallel the TERM signal again, GNU parallel will kill all
1486       running jobs and exit.
1487
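       A sketch of this from another shell (assuming GNU/Linux killall and a
       single GNU parallel instance running):

         killall -TERM parallel   # finish running jobs, start no new ones
         killall -TERM parallel   # sent again: kill running jobs and exit
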
1488       Termination dependent on job status
1489
1490       For certain jobs there is no need to continue if one of the jobs fails
1491       and has an exit code different from 0. GNU parallel will stop spawning
1492       new jobs with --halt soon,fail=1:
1493
1494         parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1495
1496       Output:
1497
1498         0
1499         0
1500         1
1501         parallel: This job failed:
1502         echo 1; exit 1
1503         parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1504         2
1505
1506       With --halt now,fail=1 the running jobs will be killed immediately:
1507
1508         parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1509
1510       Output:
1511
1512         0
1513         0
1514         1
1515         parallel: This job failed:
1516         echo 1; exit 1
1517
1518       If --halt is given a percentage this percentage of the jobs must fail
1519       before GNU parallel stops spawning more jobs:
1520
1521         parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1522           ::: 0 1 2 3 4 5 6 7 8 9
1523
1524       Output:
1525
1526         0
1527         1
1528         parallel: This job failed:
1529         echo 1; exit 1
1530         2
1531         parallel: This job failed:
1532         echo 2; exit 2
1533         parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1534         3
1535         parallel: This job failed:
1536         echo 3; exit 3
1537
1538       If you are looking for success instead of failures, you can use
1539       success. This will finish as soon as the first job succeeds:
1540
1541         parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1542
1543       Output:
1544
1545         1
1546         2
1547         3
1548         0
1549         parallel: This job succeeded:
1550         echo 0; exit 0
1551
1552       GNU parallel can retry the command with --retries. This is useful if a
1553       command fails for unknown reasons now and then.
1554
1555         parallel -k --retries 3 \
1556           'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1557         cat /tmp/runs
1558
1559       Output:
1560
1561         completed 1
1562         completed 2
1563         completed 0
1564
1565         tried 1
1566         tried 2
1567         tried 1
1568         tried 2
1569         tried 1
1570         tried 2
1571         tried 0
1572
1573       Note how jobs 1 and 2 were tried 3 times, but job 0 was not retried
1574       because it had exit code 0.
1575
1576       Termination signals (advanced)
1577
1578       Using --termseq you can control which signals are sent when killing
1579       children. Normally children will be killed by sending them SIGTERM,
1580       waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1581       SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1582       giving up. It looks like this:
1583
1584         show_signals() {
1585           perl -e 'for(keys %SIG) {
1586               $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1587             }
1588             while(1){sleep 1}'
1589         }
1590         export -f show_signals
1591         echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1592           -u --timeout 1 show_signals
1593
1594       Output:
1595
1596         Got TERM
1597         Got TERM
1598         Got TERM
1599
1600       Or just:
1601
1602         echo | parallel -u --timeout 1 show_signals
1603
1604       Output: Same as above.
1605
1606       You can change this to SIGINT, SIGTERM, SIGKILL:
1607
1608         echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1609           -u --timeout 1 show_signals
1610
1611       Output:
1612
1613         Got INT
1614         Got TERM
1615
1616       The SIGKILL does not show because it cannot be caught, and thus the
1617       child dies.
1618
1619   Limiting the resources
1620       To avoid overloading systems GNU parallel can look at the system load
1621       before starting another job:
1622
1623         parallel --load 100% echo load is less than {} job per cpu ::: 1
1624
1625       Output:
1626
1627         [when the load is less than the number of CPU cores]
1628         load is less than 1 job per cpu
1629
1630       GNU parallel can also check if the system is swapping.
1631
1632         parallel --noswap echo the system is not swapping ::: now
1633
1634       Output:
1635
1636         [when the system is not swapping]
1637         the system is not swapping now
1638
1639       Some jobs need a lot of memory and should only be started when enough
1640       memory is free. Using --memfree GNU parallel can check that enough
1641       memory is free before starting a job. Additionally, GNU parallel will
1642       kill off the youngest job if the free memory falls below 50% of the
1643       given size. The killed job is put back on the queue and retried later.
1644
1645         parallel --memfree 1G echo will run if more than 1 GB is ::: free
1646
1647       GNU parallel can run the jobs with a nice value. This will work both
1648       locally and remotely.
1649
1650         parallel --nice 17 echo this is being run with nice -n ::: 17
1651
1652       Output:
1653
1654         this is being run with nice -n 17
1655

Remote execution

1657       GNU parallel can run jobs on remote servers. It uses ssh to communicate
1658       with the remote machines.
1659
1660   Sshlogin
1661       The most basic sshlogin is -S host:
1662
1663         parallel -S $SERVER1 echo running on ::: $SERVER1
1664
1665       Output:
1666
1667         running on [$SERVER1]
1668
1669       To use a different username prepend the server with username@:
1670
1671         parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1672
1673       Output:
1674
1675         running on [username@$SERVER1]
1676
1677       The special sshlogin : is the local machine:
1678
1679         parallel -S : echo running on ::: the_local_machine
1680
1681       Output:
1682
1683         running on the_local_machine
1684
1685       If ssh is not in $PATH it can be prepended to $SERVER1:
1686
1687         parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1688
1689       Output:
1690
1691         custom ssh
1692
1693       The ssh command can also be given using --ssh:
1694
1695         parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1696
1697       or by setting $PARALLEL_SSH:
1698
1699         export PARALLEL_SSH=/usr/bin/ssh
1700         parallel -S $SERVER1 echo custom ::: ssh
1701
1702       Several servers can be given using multiple -S:
1703
1704         parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1705
1706       Output (the order may be different):
1707
1708         running
1709         on
1710         more
1711         hosts
1712
1713       Or they can be separated by ,:
1714
1715         parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1716
1717       Output: Same as above.
1718
1719       Or newline:
1720
1721         # This gives a \n between $SERVER1 and $SERVER2
1722         SERVERS="`echo $SERVER1; echo $SERVER2`"
1723         parallel -S "$SERVERS" echo ::: running on more hosts
1724
1725       They can also be read from a file (replace user@ with the user on
1726       $SERVER2):
1727
1728         echo $SERVER1 > nodefile
1729         # Force 4 cores, special ssh-command, username
1730         echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1731         parallel --sshloginfile nodefile echo ::: running on more hosts
1732
1733       Output: Same as above.
1734
1735       Every time a job finishes, the --sshloginfile will be re-read, so it
1736       is possible to both add and remove hosts while running.
1737
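       A sketch of adding a host while jobs are running (assuming $SERVER2 is
       reachable; later jobs can then also run on it):

         echo $SERVER1 > nodefile
         parallel --sshloginfile nodefile 'sleep 2; echo {}' ::: {1..20} &
         sleep 3
         echo $SERVER2 >> nodefile
         wait
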
1738       The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1739
1740       To force GNU parallel to treat a server as having a given number of
1741       CPU cores, prepend the number of cores followed by / to the sshlogin:
1742
1743         parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1744
1745       Output:
1746
1747         force 4 cpus on server
1748
1749       Servers can be put into groups by prepending @groupname to the server
1750       and the group can then be selected by appending @groupname to the
1751       argument if using --hostgroup:
1752
1753         parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1754           ::: run_on_grp1@grp1 run_on_grp2@grp2
1755
1756       Output:
1757
1758         run_on_grp1
1759         run_on_grp2
1760
1761       A host can be in multiple groups by separating the groups with +, and
1762       you can force GNU parallel to limit the groups on which the command can
1763       be run with -S @groupname:
1764
1765         parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1766           ::: run_on_grp1 also_grp1
1767
1768       Output:
1769
1770         run_on_grp1
1771         also_grp1
1772
1773   Transferring files
1774       GNU parallel can transfer the files to be processed to the remote host.
1775       It does that using rsync.
1776
1777         echo This is input_file > input_file
1778         parallel -S $SERVER1 --transferfile {} cat ::: input_file
1779
1780       Output:
1781
1782         This is input_file
1783
1784       If the files are processed into another file, the resulting file can be
1785       transferred back:
1786
1787         echo This is input_file > input_file
1788         parallel -S $SERVER1 --transferfile {} --return {}.out \
1789           cat {} ">"{}.out ::: input_file
1790         cat input_file.out
1791
1792       Output: Same as above.
1793
1794       To remove the input and output file on the remote server use --cleanup:
1795
1796         echo This is input_file > input_file
1797         parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1798           cat {} ">"{}.out ::: input_file
1799         cat input_file.out
1800
1801       Output: Same as above.
1802
1803       There is a shorthand for --transferfile {} --return --cleanup called
1804       --trc:
1805
1806         echo This is input_file > input_file
1807         parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1808         cat input_file.out
1809
1810       Output: Same as above.
1811
1812       Some jobs need a common database for all jobs. GNU parallel can
1813       transfer that using --basefile which will transfer the file before the
1814       first job:
1815
1816         echo common data > common_file
1817         parallel --basefile common_file -S $SERVER1 \
1818           cat common_file\; echo {} ::: foo
1819
1820       Output:
1821
1822         common data
1823         foo
1824
1825       To remove it from the remote host after the last job use --cleanup.
1826
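       A sketch of the same run with --cleanup added, so common_file is
       removed from $SERVER1 after the last job:

         echo common data > common_file
         parallel --basefile common_file --cleanup -S $SERVER1 \
           cat common_file\; echo {} ::: foo
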
1827   Working dir
1828       The default working dir on the remote machines is the login dir. This
1829       can be changed with --workdir mydir.
1830
1831       Files transferred using --transferfile and --return will be relative to
1832       mydir on remote computers, and the command will be executed in the dir
1833       mydir.
1834
1835       The special mydir value ... will create working dirs under
1836       ~/.parallel/tmp on the remote computers. If --cleanup is given these
1837       dirs will be removed.
1838
1839       The special mydir value . uses the current working dir.  If the current
1840       working dir is beneath your home dir, the value . is treated as the
1841       relative path to your home dir. This means that if your home dir is
1842       different on remote computers (e.g. if your login is different) the
1843       relative path will still be relative to your home dir.
1844
1845         parallel -S $SERVER1 pwd ::: ""
1846         parallel --workdir . -S $SERVER1 pwd ::: ""
1847         parallel --workdir ... -S $SERVER1 pwd ::: ""
1848
1849       Output:
1850
1851         [the login dir on $SERVER1]
1852         [current dir relative on $SERVER1]
1853         [a dir in ~/.parallel/tmp/...]
1854
1855   Avoid overloading sshd
1856       If many jobs are started on the same server, sshd can be overloaded.
1857       GNU parallel can insert a delay between each job run on the same
1858       server:
1859
1860         parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1861
1862       Output (the order may be different):
1863
1864         1
1865         2
1866         3
1867
1868       sshd will be less overloaded if using --controlmaster, which will
1869       multiplex ssh connections:
1870
1871         parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1872
1873       Output: Same as above.
1874
1875   Ignore hosts that are down
1876       In clusters with many hosts a few of them are often down. GNU parallel
1877       can ignore those hosts. In this case the host 173.194.32.46 is down:
1878
1879         parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1880
1881       Output:
1882
1883         bar
1884
1885   Running the same commands on all hosts
1886       GNU parallel can run the same command on all the hosts:
1887
1888         parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1889
1890       Output (the order may be different):
1891
1892         foo
1893         bar
1894         foo
1895         bar
1896
1897       Often you will just want to run a single command on all hosts without
1898       arguments. --nonall is --onall with no arguments:
1899
1900         parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1901
1902       Output:
1903
1904         foo bar
1905         foo bar
1906
1907       When --tag is used with --nonall and --onall, the tag is the
1908       host:
1909
1910         parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1911
1912       Output (the order may be different):
1913
1914         $SERVER1 foo bar
1915         $SERVER2 foo bar
1916
1917       --jobs sets the number of servers to log in to in parallel.
1918
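           As a hedged sketch, -j1 makes GNU parallel log in to one server at a
           time, so the two servers are handled one after the other:

             parallel --nonall -j1 -S $SERVER1,$SERVER2 echo foo

           The output is still one line of 'foo' per server; only the logins
           are serialized.
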
1919   Transferring environment variables and functions
1920       env_parallel is a shell function that transfers all aliases, functions,
1921       variables, and arrays. You activate it by running:
1922
1923         source `which env_parallel.bash`
1924
1925       Replace bash with the shell you use.
1926
1927       Now you can use env_parallel instead of parallel and still have your
1928       environment:
1929
1930         alias myecho=echo
1931         myvar="Joe's var is"
1932         env_parallel -S $SERVER1 'myecho $myvar' ::: green
1933
1934       Output:
1935
1936         Joe's var is green
1937
1938       The disadvantage is that if your environment is huge env_parallel will
1939       fail.
1940
1941       When env_parallel fails, you can still use --env to tell GNU parallel
1942       to transfer an environment variable to the remote system.
1943
1944         MYVAR='foo bar'
1945         export MYVAR
1946         parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1947
1948       Output:
1949
1950         foo bar baz
1951
1952       This works for functions, too, if your shell is Bash:
1953
1954         # This only works in Bash
1955         my_func() {
1956           echo in my_func $1
1957         }
1958         export -f my_func
1959         parallel --env my_func -S $SERVER1 my_func ::: baz
1960
1961       Output:
1962
1963         in my_func baz
1964
1965       GNU parallel can copy all user defined variables and functions to the
1966       remote system. It just needs to record which ones to ignore in
1967       ~/.parallel/ignored_vars. Do that by running this once:
1968
1969         parallel --record-env
1970         cat ~/.parallel/ignored_vars
1971
1972       Output:
1973
1974         [list of variables to ignore - including $PATH and $HOME]
1975
1976       Now all other variables and functions defined will be copied when using
1977       --env _.
1978
1979         # The function is only copied if using Bash
1980         my_func2() {
1981           echo in my_func2 $VAR $1
1982         }
1983         export -f my_func2
1984         VAR=foo
1985         export VAR
1986
1987         parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
1988
1989       Output:
1990
1991         foo
1992         in my_func2 foo bar
1993
1994       If you use env_parallel the variables, functions, and aliases do not
1995       even need to be exported to be copied:
1996
1997         NOT='not exported var'
1998         alias myecho=echo
1999         not_ex() {
2000           myecho in not_exported_func $NOT $1
2001         }
2002         env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2003
2004       Output:
2005
2006         not exported var
2007         in not_exported_func not exported var bar
2008
2009   Showing what is actually run
2010       --verbose will show the command that would be run on the local machine.
2011
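           A quick hedged sketch of --verbose on a local job; each command line
           (e.g. 'echo A') is printed in addition to the job's own output:

             parallel --verbose echo ::: A B C
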
2012       When using --cat, --pipepart, or when a job is run on a remote machine,
2013       the command is wrapped with helper scripts. -vv shows all of this.
2014
2015         parallel -vv --pipepart --block 1M wc :::: num30000
2016
2017       Output:
2018
2019         <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2020         $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2021         ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2022         0 0 0 168894 | (wc)
2023           30000   30000  168894
2024
2025       When the command gets more complex, the output is so hard to read that
2026       it is only useful for debugging:
2027
2028         my_func3() {
2029           echo in my_func $1 > $1.out
2030         }
2031         export -f my_func3
2032         parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2033           -S $SERVER1 my_func3 {} ::: abc-file
2034
2035       Output will be similar to:
2036
2037         ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2038         --protocol 30 -rlDzR -essh ./abc-file
2039         server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2040         \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2041         eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2042         c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2043         TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2044         BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2045         iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2046         IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2047         0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2048         ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2049         TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2050         YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2051         Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2052         RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2053         MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2054         KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2055         fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2056         W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2057         JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2058         mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2059         dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2060         _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2061         ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2062         server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2063         ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2064         ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2065         2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2066         server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2067         sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2068         ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2069         ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2070         .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2071

Saving output to shell variables (advanced)

2073       GNU parset will set shell variables to the output of GNU parallel. GNU
2074       parset has one important limitation: It cannot be part of a pipe. In
2075       particular this means it cannot read anything from standard input
2076       (stdin) or pipe output to another program.
2077
2078       To use GNU parset, prepend the command with destination variables:
2079
2080         parset myvar1,myvar2 echo ::: a b
2081         echo $myvar1
2082         echo $myvar2
2083
2084       Output:
2085
2086         a
2087         b
2088
2089       If you only give a single variable, it will be treated as an array:
2090
2091         parset myarray seq {} 5 ::: 1 2 3
2092         echo "${myarray[1]}"
2093
2094       Output:
2095
2096         2
2097         3
2098         4
2099         5
2100
2101       The commands to run can be an array:
2102
2103         cmd=("echo '<<joe  \"double  space\"  cartoon>>'" "pwd")
2104         parset data ::: "${cmd[@]}"
2105         echo "${data[0]}"
2106         echo "${data[1]}"
2107
2108       Output:
2109
2110         <<joe  "double  space"  cartoon>>
2111         [current dir]
2112
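           Because parset cannot read from a pipe, input that would normally
           arrive on stdin has to be given as an input source instead. A hedged
           sketch using num8 from the prerequisites:

             # This would NOT work:  seq 8 | parset myvar echo
             # Read the input from a file instead:
             parset myvar echo :::: num8
             echo "${myvar[3]}"

           Output should be:

             4
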

Saving to an SQL base (advanced)

2114       GNU parallel can save into an SQL base. Point GNU parallel to a table
2115       and it will put the joblog there together with the variables and the
2116       output, each in their own column.
2117
2118   CSV as SQL base
2119       The simplest is to use a CSV file as the storage table:
2120
2121         parallel --sqlandworker csv:////%2Ftmp%2Flog.csv \
2122           seq ::: 10 ::: 12 13 14
2123         cat /tmp/log.csv
2124
2125       Note how '/' in the path must be written as %2F.
2126
2127       Output will be similar to:
2128
2129         Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2130           Command,V1,V2,Stdout,Stderr
2131         1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2132         11
2133         12
2134         ",
2135         2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2136         11
2137         12
2138         13
2139         ",
2140         3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2141         11
2142         12
2143         13
2144         14
2145         ",
2146
2147       A proper CSV reader (like LibreOffice or R's read.csv) will read this
2148       format correctly - even with fields containing newlines as above.
2149
2150       If the output is big you may want to put it into files using --results:
2151
2152         parallel --results outdir --sqlandworker csv:////%2Ftmp%2Flog2.csv \
2153           seq ::: 10 ::: 12 13 14
2154         cat /tmp/log2.csv
2155
2156       Output will be similar to:
2157
2158         Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2159           Command,V1,V2,Stdout,Stderr
2160         1,:,1458824738.287,0.029,0,9,0,0,
2161           "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2162         2,:,1458824738.298,0.025,0,12,0,0,
2163           "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2164         3,:,1458824738.309,0.026,0,15,0,0,
2165           "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2166
2167   DBURL as table
2168       The CSV file is an example of a DBURL.
2169
2170       GNU parallel uses a DBURL to address the table. A DBURL has this
2171       format:
2172
2173         vendor://[[user][:password]@][host][:port]/[database[/table]]
2174
2175       Example:
2176
2177         mysql://scott:tiger@my.example.com/mydatabase/mytable
2178         postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2179         sqlite3:///%2Ftmp%2Fmydatabase/mytable
2180         csv:////%2Ftmp%2Flog.csv
2181
2182       To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2183       as %2F.
2184
2185       Run a job using sqlite on mytable in /tmp/mydatabase:
2186
2187         DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2188         DBURLTABLE=$DBURL/mytable
2189         parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2190
2191       To see the result:
2192
2193         sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2194
2195       Output will be similar to:
2196
2197         Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2198           Command|V1|V2|Stdout|Stderr
2199         1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2200         |
2201         2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2202         |
2203         3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2204         |
2205         4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2206         |
2207
2208       The first columns are well known from --joblog. V1 and V2 are data from
2209       the input sources. Stdout and Stderr are standard output and standard
2210       error, respectively.
2211
2212   Using multiple workers
2213       Using an SQL base as storage costs overhead in the order of 1 second
2214       per job.
2215
2216       One of the situations where it makes sense is if you have multiple
2217       workers.
2218
2219       You can then have a single master machine that submits jobs to the SQL
2220       base (but does not do any of the work):
2221
2222         parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2223
2224       On the worker machines you run exactly the same command except you
2225       replace --sqlmaster with --sqlworker.
2226
2227         parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2228
2229       To run a master and a worker on the same machine use --sqlandworker as
2230       shown earlier.
2231

--pipe

2233       The --pipe functionality puts GNU parallel in a different mode: Instead
2234       of treating the data on stdin (standard input) as arguments for a
2235       command to run, the data will be sent to stdin (standard input) of the
2236       command.
2237
2238       The typical situation is:
2239
2240         command_A | command_B | command_C
2241
2242       where command_B is slow, and you want to speed up command_B.
2243
2244   Chunk size
2245       By default GNU parallel will start an instance of command_B, read a
2246       chunk of 1 MB, and pass that to the instance. Then start another
2247       instance, read another chunk, and pass that to the second instance.
2248
2249         cat num1000000 | parallel --pipe wc
2250
2251       Output (the order may be different):
2252
2253         165668  165668 1048571
2254         149797  149797 1048579
2255         149796  149796 1048572
2256         149797  149797 1048579
2257         149797  149797 1048579
2258         149796  149796 1048572
2259          85349   85349  597444
2260
2261       The size of the chunk is not exactly 1 MB because GNU parallel only
2262       passes full lines - never half a line - so the block size is only
2263       roughly 1 MB. You can change the block size to 2 MB with --block:
2264
2265         cat num1000000 | parallel --pipe --block 2M wc
2266
2267       Output (the order may be different):
2268
2269         315465  315465 2097150
2270         299593  299593 2097151
2271         299593  299593 2097151
2272          85349   85349  597444
2273
2274       GNU parallel treats each line as a record. If the order of records is
2275       unimportant (e.g. you need all lines processed, but you do not care
2276       which is processed first), then you can use --round-robin. Without
2277       --round-robin GNU parallel will start a command per block; with
2278       --round-robin only the requested number of jobs will be started
2279       (--jobs). The records will then be distributed between the running
2280       jobs:
2281
2282         cat num1000000 | parallel --pipe -j4 --round-robin wc
2283
2284       Output will be similar to:
2285
2286         149797  149797 1048579
2287         299593  299593 2097151
2288         315465  315465 2097150
2289         235145  235145 1646016
2290
2291       One of the 4 instances got a single block, 2 instances got 2 full
2292       blocks each, and one instance got 1 full and 1 partial block.
2293
2294   Records
2295       GNU parallel sees the input as records. The default record is a single
2296       line.
2297
2298       Using -N140000 GNU parallel will read 140000 records at a time:
2299
2300         cat num1000000 | parallel --pipe -N140000 wc
2301
2302       Output (the order may be different):
2303
2304         140000  140000  868895
2305         140000  140000  980000
2306         140000  140000  980000
2307         140000  140000  980000
2308         140000  140000  980000
2309         140000  140000  980000
2310         140000  140000  980000
2311          20000   20000  140001
2312
2313       Note that the last job could not get the full 140000 lines, but
2314       only 20000 lines.
2315
2316       If a record is 75 lines, -L can be used:
2317
2318         cat num1000000 | parallel --pipe -L75 wc
2319
2320       Output (the order may be different):
2321
2322         165600  165600 1048095
2323         149850  149850 1048950
2324         149775  149775 1048425
2325         149775  149775 1048425
2326         149850  149850 1048950
2327         149775  149775 1048425
2328          85350   85350  597450
2329             25      25     176
2330
2331       Note how GNU parallel still reads a block of around 1 MB, but instead
2332       of passing single lines to wc it passes 75 full lines at a time. This
2333       of course does not hold for the last job (which in this case got 25
2334       lines).
2335
2336   Fixed length records
2337       Fixed length records can be processed by setting --recend '' and
2338       --block recordsize. A header of size n can be processed with --header
2339       .{n}.
2340
2341       Here is how to process a file with a 4-byte header and a 3-byte record
2342       size:
2343
2344         cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2345           'echo start; cat; echo'
2346
2347       Output:
2348
2349         start
2350         HHHHAAA
2351         start
2352         HHHHCCC
2353         start
2354         HHHHBBB
2355
2356       It may be more efficient to increase --block to a multiple of the
2357       record size.
2358
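           For example, with the 3-byte records above, --block 6 should make
           each job (except possibly the last) receive two records at a time.
           A hedged sketch:

             cat fixedlen | parallel --pipe --header .{4} --block 6 --recend '' \
               'echo start; cat; echo'

           Output (the order may be different):

             start
             HHHHAAABBB
             start
             HHHHCCC
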
2359   Record separators
2360       GNU parallel uses separators to determine where two records split.
2361
2362       --recstart gives the string that starts a record; --recend gives the
2363       string that ends a record. The default is --recend '\n' (newline).
2364
2365       If both --recend and --recstart are given, then the record will only
2366       split if the recend string is immediately followed by the recstart
2367       string.
2368
2369       Here the --recend is set to ', ':
2370
2371         echo /foo, bar/, /baz, qux/, | \
2372           parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2373
2374       Output:
2375
2376         JOB1
2377         /foo, END
2378         JOB2
2379         bar/, END
2380         JOB3
2381         /baz, END
2382         JOB4
2383         qux/,
2384         END
2385
2386       Here the --recstart is set to /:
2387
2388         echo /foo, bar/, /baz, qux/, | \
2389           parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2390
2391       Output:
2392
2393         JOB1
2394         /foo, barEND
2395         JOB2
2396         /, END
2397         JOB3
2398         /baz, quxEND
2399         JOB4
2400         /,
2401         END
2402
2403       Here both --recend and --recstart are set:
2404
2405         echo /foo, bar/, /baz, qux/, | \
2406           parallel -kN1 --recend ', ' --recstart / --pipe \
2407           echo JOB{#}\;cat\;echo END
2408
2409       Output:
2410
2411         JOB1
2412         /foo, bar/, END
2413         JOB2
2414         /baz, qux/,
2415         END
2416
2417       Note the difference between setting one string and setting both
2418       strings.
2419
2420       With --regexp the --recend and --recstart will be treated as regular
2421       expressions:
2422
2423         echo foo,bar,_baz,__qux, | \
2424           parallel -kN1 --regexp --recend ,_+ --pipe \
2425           echo JOB{#}\;cat\;echo END
2426
2427       Output:
2428
2429         JOB1
2430         foo,bar,_END
2431         JOB2
2432         baz,__END
2433         JOB3
2434         qux,
2435         END
2436
2437       GNU parallel can remove the record separators with
2438       --remove-rec-sep/--rrs:
2439
2440         echo foo,bar,_baz,__qux, | \
2441           parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2442           echo JOB{#}\;cat\;echo END
2443
2444       Output:
2445
2446         JOB1
2447         foo,barEND
2448         JOB2
2449         bazEND
2450         JOB3
2451         qux,
2452         END
2453
2454   Header
2455       If the input data has a header, the header can be repeated for each job
2456       by matching the header with --header. If headers start with % you can
2457       do this:
2458
2459         cat num_%header | \
2460           parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2461
2462       Output (the order may be different):
2463
2464         JOB1
2465         %head1
2466         %head2
2467         1
2468         2
2469         3
2470         JOB2
2471         %head1
2472         %head2
2473         4
2474         5
2475         6
2476         JOB3
2477         %head1
2478         %head2
2479         7
2480         8
2481         9
2482         JOB4
2483         %head1
2484         %head2
2485         10
2486
2487       If the header is 2 lines, --header 2 will work:
2488
2489         cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2490
2491       Output: Same as above.
2492
2493   --pipepart
2494       --pipe is not very efficient. It maxes out at around 500 MB/s.
2495       --pipepart can easily deliver 5 GB/s. But there are a few limitations.
2496       The input has to be a normal file (not a pipe) given by -a or ::::,
2497       and -L/-l/-N do not work. --recend and --recstart, however, do work,
2498       and records can often be split on those alone.
2499
2500         parallel --pipepart -a num1000000 --block 3m wc
2501
2502       Output (the order may be different):
2503
2504        444443  444444 3000002
2505        428572  428572 3000004
2506        126985  126984  888890
2507

Shebang

2509   Input data and parallel command in the same file
2510       GNU parallel is often called like this:
2511
2512         cat input_file | parallel command
2513
2514       With --shebang the input_file and parallel can be combined into the
2515       same script.
2516
2517       UNIX shell scripts start with a shebang line like this:
2518
2519         #!/bin/bash
2520
2521       GNU parallel can do that, too. With --shebang the arguments can be
2522       listed in the file. The parallel command is the first line of the
2523       script:
2524
2525         #!/usr/bin/parallel --shebang -r echo
2526
2527         foo
2528         bar
2529         baz
2530
2531       Output (the order may be different):
2532
2533         foo
2534         bar
2535         baz
2536
2537   Parallelizing existing scripts
2538       GNU parallel is often called like this:
2539
2540         cat input_file | parallel command
2541         parallel command ::: foo bar
2542
2543       If command is a script, command and parallel can be combined into a
2544       single file, so this will run the script in parallel:
2545
2546         cat input_file | command
2547         command foo bar
2548
2549       This perl script perl_echo works like echo:
2550
2551         #!/usr/bin/perl
2552
2553         print "@ARGV\n"
2554
2555       It can be called like this:
2556
2557         parallel perl_echo ::: foo bar
2558
2559       By changing the #!-line it can be run in parallel:
2560
2561         #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2562
2563         print "@ARGV\n"
2564
2565       Thus this will work:
2566
2567         perl_echo foo bar
2568
2569       Output (the order may be different):
2570
2571         foo
2572         bar
2573
2574       This technique can be used for:
2575
2576       Perl:
2577                  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2578
2579                  print "Arguments @ARGV\n";
2580
2581       Python:
2582                  #!/usr/bin/parallel --shebang-wrap /usr/bin/python
2583
2584                  import sys
2585                  print 'Arguments', str(sys.argv)
2586
2587       Bash/sh/zsh/Korn shell:
2588                  #!/usr/bin/parallel --shebang-wrap /bin/bash
2589
2590                  echo Arguments "$@"
2591
2592       csh:
2593                  #!/usr/bin/parallel --shebang-wrap /bin/csh
2594
2595                  echo Arguments "$argv"
2596
2597       Tcl:
2598                  #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2599
2600                  puts "Arguments $argv"
2601
2602       R:
2603                  #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2604
2605                  args <- commandArgs(trailingOnly = TRUE)
2606                  print(paste("Arguments ",args))
2607
2608       GNUplot:
2609                  #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2610
2611                  print "Arguments ", system('echo $ARG')
2612
2613       Ruby:
2614                  #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2615
2616                  print "Arguments "
2617                  puts ARGV
2618
2619       Octave:
2620                  #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2621
2622                  printf ("Arguments");
2623                  arg_list = argv ();
2624                  for i = 1:nargin
2625                    printf (" %s", arg_list{i});
2626                  endfor
2627                  printf ("\n");
2628
2629       Common LISP:
2630                  #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2631
2632                  (format t "~&~S~&" 'Arguments)
2633                  (format t "~&~S~&" *args*)
2634
2635       PHP:
2636                  #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2637                  <?php
2638                  echo "Arguments";
2639                  foreach(array_slice($argv,1) as $v)
2640                  {
2641                    echo " $v";
2642                  }
2643                  echo "\n";
2644                  ?>
2645
2646       Node.js:
2647                  #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2648
2649                  var myArgs = process.argv.slice(2);
2650                  console.log('Arguments ', myArgs);
2651
2652       LUA:
2653                  #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2654
2655                  io.write "Arguments"
2656                  for a = 1, #arg do
2657                    io.write(" ")
2658                    io.write(arg[a])
2659                  end
2660                  print("")
2661
2662       C#:
2663                  #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2664
2665                  var argv = Environment.GetEnvironmentVariable("ARGV");
2666                  print("Arguments "+argv);
2667

Semaphore

2669       GNU parallel can work as a counting semaphore. This is slower and less
2670       efficient than its normal mode.
2671
2672       A counting semaphore is like a row of toilets. People needing a toilet
2673       can use any toilet, but if there are more people than toilets, they
2674       will have to wait for one of the toilets to become available.
2675
2676       An alias for parallel --semaphore is sem.
2677
2678       sem will follow a person to the toilets, wait until a toilet is
2679       available, leave the person in the toilet and exit.
2680
2681       sem --fg will follow a person to the toilets, wait until a toilet is
2682       available, stay with the person in the toilet and exit when the person
2683       exits.
2684
2685       sem --wait will wait for all persons to leave the toilets.
2686
2687       sem does not have a queue discipline, so the next person is chosen
2688       randomly.
2689
2690       -j sets the number of toilets.
2691
2692   Mutex
2693       The default is to have only one toilet (this is called a mutex). The
2694       program is started in the background and sem exits immediately. Use
2695       --wait to wait for all sems to finish:
2696
2697         sem 'sleep 1; echo The first finished' &&
2698           echo The first is now running in the background &&
2699           sem 'sleep 1; echo The second finished' &&
2700           echo The second is now running in the background
2701         sem --wait
2702
2703       Output:
2704
2705         The first is now running in the background
2706         The first finished
2707         The second is now running in the background
2708         The second finished
2709
2710       The command can be run in the foreground with --fg, which will only
2711       exit when the command completes:
2712
2713         sem --fg 'sleep 1; echo The first finished' &&
2714           echo The first finished running in the foreground &&
2715           sem --fg 'sleep 1; echo The second finished' &&
2716           echo The second finished running in the foreground
2717         sem --wait
2718
2719       The difference between this and just running the command is that a
2720       mutex is set, so if other sems were running in the background only one
2721       would run at a time.
2722
2723       To control which semaphore is used, use --semaphorename/--id. Run this
2724       in one terminal:
2725
2726         sem --id my_id -u 'echo First started; sleep 10; echo First done'
2727
2728       and simultaneously this in another terminal:
2729
2730         sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2731
2732       Note how the second will only be started when the first has finished.
2733
2734   Counting semaphore
2735       A mutex is like having a single toilet: When it is in use everyone else
2736       will have to wait. A counting semaphore is like having multiple
2737       toilets: Several people can use the toilets, but when they all are in
2738       use, everyone else will have to wait.
2739
2740       sem can emulate a counting semaphore. Use --jobs to set the number of
2741       toilets like this:
2742
2743         sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2744         sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2745         sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2746         sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2747         sem --wait --id my_id
2748
2749       Output:
2750
2751         Start 1
2752         Start 2
2753         Start 3
2754         1 done
2755         Start 4
2756         2 done
2757         3 done
2758         4 done
2759
2760   Timeout
2761       With --semaphoretimeout you can force running the command anyway after
2762       a period (positive number) or give up (negative number):
2763
2764         sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2765         sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2766         sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2767         sem --id foo --wait
2768
2769       Output:
2770
2771         Slow started
2772         parallel: Warning: Semaphore timed out. Stealing the semaphore.
2773         Forced running after 1 sec
2774         parallel: Warning: Semaphore timed out. Exiting.
2775         Slow ended
2776
2777       Note how the 'Give up' was not run.
2778

Informational

2780       GNU parallel has some options to give short information about the
2781       configuration.
2782
2783       --help will print a summary of the most important options:
2784
2785         parallel --help
2786
2787       Output:
2788
2789         Usage:
2790
2791         parallel [options] [command [arguments]] < list_of_arguments
2792         parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2793         cat ... | parallel --pipe [options] [command [arguments]]
2794
2795         -j n            Run n jobs in parallel
2796         -k              Keep same order
2797         -X              Multiple arguments with context replace
2798         --colsep regexp Split input on regexp for positional replacements
2799         {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2800         {3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
2801         With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2802                         {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2803
2804         -S sshlogin     Example: foo@server.example.com
2805         --slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
2806         --trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
2807         --onall         Run the given command with argument on all sshlogins
2808         --nonall        Run the given command with no arguments on all sshlogins
2809
2810         --pipe          Split stdin (standard input) to multiple jobs.
2811         --recend str    Record end separator for --pipe.
2812         --recstart str  Record start separator for --pipe.
2813
2814         See 'man parallel' for details
2815
2816         Academic tradition requires you to cite works you base your article on.
2817         When using programs that use GNU Parallel to process data for publication
2818         please cite:
2819
2820           O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2821           ;login: The USENIX Magazine, February 2011:42-47.
2822
2823         This helps funding further development; AND IT WON'T COST YOU A CENT.
2824         If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2825
2826       When asking for help, always report the full output of this:
2827
2828         parallel --version
2829
2830       Output:
2831
2832         GNU parallel 20180122
2833         Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
2834         Ole Tange and Free Software Foundation, Inc.
2835         License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2836         This is free software: you are free to change and redistribute it.
2837         GNU parallel comes with no warranty.
2838
2839         Web site: http://www.gnu.org/software/parallel
2840
2841         When using programs that use GNU Parallel to process data for publication
2842         please cite as described in 'parallel --citation'.
2843
2844       In scripts --minversion can be used to ensure the user has at least
2845       this version:
2846
2847         parallel --minversion 20130722 && \
2848           echo Your version is at least 20130722.
2849
2850       Output will be similar to:
2851
2852         20160322
2853         Your version is at least 20130722.
2854
2855       If you are using GNU parallel for research the BibTeX citation can be
2856       generated using --citation:
2857
2858         parallel --citation
2859
2860       Output:
2861
2862         Academic tradition requires you to cite works you base your article on.
2863         When using programs that use GNU Parallel to process data for publication
2864         please cite:
2865
2866         @article{Tange2011a,
2867           title = {GNU Parallel - The Command-Line Power Tool},
2868           author = {O. Tange},
2869           address = {Frederiksberg, Denmark},
2870           journal = {;login: The USENIX Magazine},
2871           month = {Feb},
2872           number = {1},
2873           volume = {36},
2874           url = {http://www.gnu.org/s/parallel},
2875           year = {2011},
2876           pages = {42-47},
2877           doi = {10.5281/zenodo.16303}
2878         }
2879
2880         (Feel free to use \nocite{Tange2011a})
2881
2882         This helps funding further development; AND IT WON'T COST YOU A CENT.
2883         If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2884
2885         If you send a copy of your published article to tange@gnu.org, it will be
2886         mentioned in the release notes of next version of GNU Parallel.
2887
2888       With --max-line-length-allowed GNU parallel will report the maximal
2889       size of the command line:
2890
2891         parallel --max-line-length-allowed
2892
2893       Output (may vary on different systems):
2894
2895         131071
2896
2897       --number-of-cpus and --number-of-cores run system-specific code to
2898       determine the number of CPUs and CPU cores on the system. On
2899       unsupported platforms they will return 1:
2900
2901         parallel --number-of-cpus
2902         parallel --number-of-cores
2903
2904       Output (may vary on different systems):
2905
2906         4
2907         64
2908

Profiles

2910       The defaults for GNU parallel can be changed systemwide by putting the
2911       command line options in /etc/parallel/config. They can be changed for a
2912       user by putting them in ~/.parallel/config.
2913
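           A hedged sketch of a per-user default: appending --dry-run to
           ~/.parallel/config makes every following run only print its
           commands. Remember to remove the line again afterwards:

             echo --dry-run >> ~/.parallel/config
             parallel echo ::: A B C
             # Remove the temporary default again:
             perl -ni -e 'print unless /^--dry-run$/' ~/.parallel/config

           Output of the parallel line (the order may be different):

             echo A
             echo B
             echo C
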
2914       Profiles work the same way, but have to be referred to with --profile:
2915
2916         echo '--nice 17' > ~/.parallel/nicetimeout
2917         echo '--timeout 300%' >> ~/.parallel/nicetimeout
2918         parallel --profile nicetimeout echo ::: A B C
2919
2920       Output:
2921
2922         A
2923         B
2924         C
2925
2926       Profiles can be combined:
2927
2928         echo '-vv --dry-run' > ~/.parallel/dryverbose
2929         parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2930
2931       Output:
2932
2933         echo A
2934         echo B
2935         echo C
2936

Spread the word

2938       I hope you have learned something from this tutorial.
2939
2940       If you like GNU parallel:
2941
2942       · (Re-)walk through the tutorial if you have not done so in the past
2943         year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2944
2945       · Give a demo at your local user group/your team/your colleagues
2946
2947       · Post the intro videos and the tutorial on Reddit, Mastodon,
2948         Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2949         Linkedin, and mailing lists
2950
2951       · Request or write a review for your favourite blog or magazine
2952         (especially if you do something cool with GNU parallel)
2953
2954       · Invite me for your next conference
2955
2956       If you use GNU parallel for research:
2957
2958       · Please cite GNU parallel in your publications (use --citation)
2959
2960       If GNU parallel saves you money:
2961
2962       · (Have your company) donate to FSF or become a member
2963         https://my.fsf.org/donate/
2964
2965       (C) 2013,2014,2015,2016,2017,2018 Ole Tange, GPLv3
2966
2967
2968
296920180122                          2018-02-15              PARALLEL_TUTORIAL(7)