PARALLEL_TUTORIAL(7)               parallel              PARALLEL_TUTORIAL(7)
2
3

GNU Parallel Tutorial
This tutorial shows off much of GNU parallel's functionality. The
tutorial is meant to teach the options and syntax of GNU parallel, not
to show realistic real-world examples.
9
10 Reader's guide
11 If you prefer reading a book buy GNU Parallel 2018 at
12 http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
13 or download it at: https://doi.org/10.5281/zenodo.1146014
14
15 Otherwise start by watching the intro videos for a quick introduction:
16 http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
17
18 Then browse through the EXAMPLEs after the list of OPTIONS in man
19 parallel (Use LESS=+/EXAMPLE: man parallel). That will give you an idea
20 of what GNU parallel is capable of.
21
22 If you want to dive even deeper: spend a couple of hours walking
23 through the tutorial (man parallel_tutorial). Your command line will
24 love you for it.
25
26 Finally you may want to look at the rest of the manual (man parallel)
27 if you have special needs not already covered.
28
29 If you want to know the design decisions behind GNU parallel, try: man
30 parallel_design. This is also a good intro if you intend to change GNU
31 parallel.

Prerequisites
34 To run this tutorial you must have the following:
35
36 parallel >= version 20160822
37 Install the newest version using your package manager
38 (recommended for security reasons), the way described in
39 README, or with this command:
40
41 $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
42 fetch -o - http://pi.dk/3 ) > install.sh
43 $ sha1sum install.sh
44 12345678 3374ec53 bacb199b 245af2dd a86df6c9
45 $ md5sum install.sh
46 029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
47 $ sha512sum install.sh
48 40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
49 60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
50 $ bash install.sh
51
52 This will also install the newest version of the tutorial
53 which you can see by running this:
54
55 man parallel_tutorial
56
57 Most of the tutorial will work on older versions, too.
58
59 abc-file:
60 The file can be generated by this command:
61
62 parallel -k echo ::: A B C > abc-file
63
64 def-file:
65 The file can be generated by this command:
66
67 parallel -k echo ::: D E F > def-file
68
69 abc0-file:
70 The file can be generated by this command:
71
72 perl -e 'printf "A\0B\0C\0"' > abc0-file
73
74 abc_-file:
75 The file can be generated by this command:
76
77 perl -e 'printf "A_B_C_"' > abc_-file
78
79 tsv-file.tsv
80 The file can be generated by this command:
81
82 perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv
83
84 num8 The file can be generated by this command:
85
86 perl -e 'for(1..8){print "$_\n"}' > num8
87
88 num128 The file can be generated by this command:
89
90 perl -e 'for(1..128){print "$_\n"}' > num128
91
92 num30000 The file can be generated by this command:
93
94 perl -e 'for(1..30000){print "$_\n"}' > num30000
95
96 num1000000
97 The file can be generated by this command:
98
99 perl -e 'for(1..1000000){print "$_\n"}' > num1000000
100
101 num_%header
102 The file can be generated by this command:
103
104 (echo %head1; echo %head2; \
105 perl -e 'for(1..10){print "$_\n"}') > num_%header
106
107 fixedlen The file can be generated by this command:
108
109 perl -e 'print "HHHHAAABBBCCC"' > fixedlen
110
111 For remote running: ssh login on 2 servers with no password in $SERVER1
112 and $SERVER2 must work.
113 SERVER1=server.example.com
114 SERVER2=server2.example.net
115
116 So you must be able to do this:
117
118 ssh $SERVER1 echo works
119 ssh $SERVER2 echo works
120
It can be set up by running 'ssh-keygen -t rsa; ssh-copy-id
$SERVER1' and using an empty passphrase.

Input sources
125 GNU parallel reads input from input sources. These can be files, the
126 command line, and stdin (standard input or a pipe).
127
128 A single input source
129 Input can be read from the command line:
130
131 parallel echo ::: A B C
132
133 Output (the order may be different because the jobs are run in
134 parallel):
135
136 A
137 B
138 C
139
140 The input source can be a file:
141
142 parallel -a abc-file echo
143
144 Output: Same as above.
145
146 STDIN (standard input) can be the input source:
147
148 cat abc-file | parallel echo
149
150 Output: Same as above.
151
152 Multiple input sources
153 GNU parallel can take multiple input sources given on the command line.
154 GNU parallel then generates all combinations of the input sources:
155
156 parallel echo ::: A B C ::: D E F
157
158 Output (the order may be different):
159
160 A D
161 A E
162 A F
163 B D
164 B E
165 B F
166 C D
167 C E
168 C F
169
170 The input sources can be files:
171
172 parallel -a abc-file -a def-file echo
173
174 Output: Same as above.
175
176 STDIN (standard input) can be one of the input sources using -:
177
178 cat abc-file | parallel -a - -a def-file echo
179
180 Output: Same as above.
181
Files can also be given after :::: instead of with -a:
183
184 cat abc-file | parallel echo :::: - def-file
185
186 Output: Same as above.
187
188 ::: and :::: can be mixed:
189
190 parallel echo ::: A B C :::: def-file
191
192 Output: Same as above.
193
194 Linking arguments from input sources
195
196 With --link you can link the input sources and get one argument from
197 each input source:
198
199 parallel --link echo ::: A B C ::: D E F
200
201 Output (the order may be different):
202
203 A D
204 B E
205 C F
206
207 If one of the input sources is too short, its values will wrap:
208
209 parallel --link echo ::: A B C D E ::: F G
210
211 Output (the order may be different):
212
213 A F
214 B G
215 C F
216 D G
217 E F
218
219 For more flexible linking you can use :::+ and ::::+. They work like
220 ::: and :::: except they link the previous input source to this input
221 source.
222
223 This will link ABC to GHI:
224
225 parallel echo :::: abc-file :::+ G H I :::: def-file
226
227 Output (the order may be different):
228
229 A G D
230 A G E
231 A G F
232 B H D
233 B H E
234 B H F
235 C I D
236 C I E
237 C I F
238
239 This will link GHI to DEF:
240
241 parallel echo :::: abc-file ::: G H I ::::+ def-file
242
243 Output (the order may be different):
244
245 A G D
246 A H E
247 A I F
248 B G D
249 B H E
250 B I F
251 C G D
252 C H E
253 C I F
254
255 If one of the input sources is too short when using :::+ or ::::+, the
256 rest will be ignored:
257
258 parallel echo ::: A B C D E :::+ F G
259
260 Output (the order may be different):
261
262 A F
263 B G
264
Changing the argument separator
266 GNU parallel can use other separators than ::: or ::::. This is
267 typically useful if ::: or :::: is used in the command to run:
268
269 parallel --arg-sep ,, echo ,, A B C :::: def-file
270
271 Output (the order may be different):
272
273 A D
274 A E
275 A F
276 B D
277 B E
278 B F
279 C D
280 C E
281 C F
282
283 Changing the argument file separator:
284
285 parallel --arg-file-sep // echo ::: A B C // def-file
286
287 Output: Same as above.
288
289 Changing the argument delimiter
290 GNU parallel will normally treat a full line as a single argument: It
291 uses \n as argument delimiter. This can be changed with -d:
292
293 parallel -d _ echo :::: abc_-file
294
295 Output (the order may be different):
296
297 A
298 B
299 C
300
301 NUL can be given as \0:
302
303 parallel -d '\0' echo :::: abc0-file
304
305 Output: Same as above.
306
307 A shorthand for -d '\0' is -0 (this will often be used to read files
308 from find ... -print0):
309
310 parallel -0 echo :::: abc0-file
311
312 Output: Same as above.
313
314 End-of-file value for input source
315 GNU parallel can stop reading when it encounters a certain value:
316
317 parallel -E stop echo ::: A B stop C D
318
319 Output:
320
321 A
322 B
323
324 Skipping empty lines
325 Using --no-run-if-empty GNU parallel will skip empty lines.
326
327 (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
328
329 Output:
330
331 1
332 2

Building the command line
335 No command means arguments are commands
336 If no command is given after parallel the arguments themselves are
337 treated as commands:
338
339 parallel ::: ls 'echo foo' pwd
340
341 Output (the order may be different):
342
343 [list of files in current dir]
344 foo
345 [/path/to/current/working/dir]
346
347 The command can be a script, a binary or a Bash function if the
348 function is exported using export -f:
349
350 # Only works in Bash
351 my_func() {
352 echo in my_func $1
353 }
354 export -f my_func
355 parallel my_func ::: 1 2 3
356
357 Output (the order may be different):
358
359 in my_func 1
360 in my_func 2
361 in my_func 3
362
363 Replacement strings
364 The 7 predefined replacement strings
365
366 GNU parallel has several replacement strings. If no replacement strings
367 are used the default is to append {}:
368
369 parallel echo ::: A/B.C
370
371 Output:
372
373 A/B.C
374
375 The default replacement string is {}:
376
377 parallel echo {} ::: A/B.C
378
379 Output:
380
381 A/B.C
382
383 The replacement string {.} removes the extension:
384
385 parallel echo {.} ::: A/B.C
386
387 Output:
388
389 A/B
390
391 The replacement string {/} removes the path:
392
393 parallel echo {/} ::: A/B.C
394
395 Output:
396
397 B.C
398
399 The replacement string {//} keeps only the path:
400
401 parallel echo {//} ::: A/B.C
402
403 Output:
404
405 A
406
407 The replacement string {/.} removes the path and the extension:
408
409 parallel echo {/.} ::: A/B.C
410
411 Output:
412
413 B
414
415 The replacement string {#} gives the job number:
416
417 parallel echo {#} ::: A B C
418
419 Output (the order may be different):
420
421 1
422 2
423 3
424
425 The replacement string {%} gives the job slot number (between 1 and
426 number of jobs to run in parallel):
427
428 parallel -j 2 echo {%} ::: A B C
429
430 Output (the order may be different and 1 and 2 may be swapped):
431
432 1
433 2
434 1
435
436 Changing the replacement strings
437
438 The replacement string {} can be changed with -I:
439
440 parallel -I ,, echo ,, ::: A/B.C
441
442 Output:
443
444 A/B.C
445
446 The replacement string {.} can be changed with --extensionreplace:
447
448 parallel --extensionreplace ,, echo ,, ::: A/B.C
449
450 Output:
451
452 A/B
453
The replacement string {/} can be changed with --basenamereplace:
455
456 parallel --basenamereplace ,, echo ,, ::: A/B.C
457
458 Output:
459
460 B.C
461
462 The replacement string {//} can be changed with --dirnamereplace:
463
464 parallel --dirnamereplace ,, echo ,, ::: A/B.C
465
466 Output:
467
468 A
469
470 The replacement string {/.} can be changed with
471 --basenameextensionreplace:
472
473 parallel --basenameextensionreplace ,, echo ,, ::: A/B.C
474
475 Output:
476
477 B
478
479 The replacement string {#} can be changed with --seqreplace:
480
481 parallel --seqreplace ,, echo ,, ::: A B C
482
483 Output (the order may be different):
484
485 1
486 2
487 3
488
489 The replacement string {%} can be changed with --slotreplace:
490
491 parallel -j2 --slotreplace ,, echo ,, ::: A B C
492
493 Output (the order may be different and 1 and 2 may be swapped):
494
495 1
496 2
497 1
498
499 Perl expression replacement string
500
When the predefined replacement strings are not flexible enough, a Perl
expression can be used instead. One example is removing two extensions:
foo.tar.gz becomes foo
504
505 parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz
506
507 Output:
508
509 foo
510
511 In {= =} you can access all of GNU parallel's internal functions and
512 variables. A few are worth mentioning.
513
514 total_jobs() returns the total number of jobs:
515
516 parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}
517
518 Output:
519
520 Job 1 of 5
521 Job 2 of 5
522 Job 3 of 5
523 Job 4 of 5
524 Job 5 of 5
525
526 Q(...) shell quotes the string:
527
528 parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'
529
530 Output:
531
532 */!#$ shell quoted is \*/\!\#\$
533
534 skip() skips the job:
535
536 parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}
537
538 Output:
539
540 1
541 2
542 4
543 5
544
545 @arg contains the input source variables:
546
547 parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
548 ::: {1..3} ::: {1..3}
549
550 Output:
551
552 1 2
553 1 3
554 2 1
555 2 3
556 3 1
557 3 2
558
559 If the strings {= and =} cause problems they can be replaced with
560 --parens:
561
562 parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
563 ::: foo.tar.gz
564
565 Output:
566
567 foo
568
569 To define a shorthand replacement string use --rpl:
570
571 parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
572 ::: foo.tar.gz
573
574 Output: Same as above.
575
576 If the shorthand starts with { it can be used as a positional
577 replacement string, too:
578
579 parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}'
580 ::: foo.tar.gz
581
582 Output: Same as above.
583
If the shorthand contains matching parentheses, the replacement string
becomes a dynamic replacement string, and the string in the parentheses
can be accessed as $$1. If there are multiple matching parentheses, the
matched strings can be accessed using $$2, $$3 and so on.
588
589 You can think of this as giving arguments to the replacement string.
590 Here we give the argument .tar.gz to the replacement string {%string}
591 which removes string:
592
593 parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz
594
595 Output:
596
597 foo.zip
598
599 Here we give the two arguments tar.gz and zip to the replacement string
600 {/string1/string2} which replaces string1 with string2:
601
602 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
603 ::: foo.tar.gz
604
605 Output:
606
607 foo.zip
608
GNU parallel's 7 predefined replacement strings are implemented like this:
610
611 --rpl '{} '
612 --rpl '{#} $_=$job->seq()'
613 --rpl '{%} $_=$job->slot()'
614 --rpl '{/} s:.*/::'
615 --rpl '{//} $Global::use{"File::Basename"} ||=
616 eval "use File::Basename; 1;"; $_ = dirname($_);'
617 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
618 --rpl '{.} s:\.[^/.]+$::'
619
620 Positional replacement strings
621
622 With multiple input sources the argument from the individual input
623 sources can be accessed with {number}:
624
625 parallel echo {1} and {2} ::: A B ::: C D
626
627 Output (the order may be different):
628
629 A and C
630 A and D
631 B and C
632 B and D
633
634 The positional replacement strings can also be modified using /, //,
635 /., and .:
636
637 parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F
638
639 Output (the order may be different):
640
641 /=B.C //=A /.=B .=A/B
642 /=E.F //=D /.=E .=D/E
643
If a position is negative, it will refer to the input source counted
from the end:
646
647 parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
648 ::: A B ::: C D ::: E F
649
650 Output (the order may be different):
651
652 1=A 2=C 3=E -1=E -2=C -3=A
653 1=A 2=C 3=F -1=F -2=C -3=A
654 1=A 2=D 3=E -1=E -2=D -3=A
655 1=A 2=D 3=F -1=F -2=D -3=A
656 1=B 2=C 3=E -1=E -2=C -3=B
657 1=B 2=C 3=F -1=F -2=C -3=B
658 1=B 2=D 3=E -1=E -2=D -3=B
659 1=B 2=D 3=F -1=F -2=D -3=B
660
661 Positional perl expression replacement string
662
To use a Perl expression as a positional replacement string, simply
prepend the Perl expression with the position number and a space:
665
666 parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
667 ::: bar ::: foo.tar.gz
668
669 Output:
670
671 foo bar
672
673 If a shorthand defined using --rpl starts with { it can be used as a
674 positional replacement string, too:
675
676 parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
677 ::: bar ::: foo.tar.gz
678
679 Output: Same as above.
680
681 Input from columns
682
683 The columns in a file can be bound to positional replacement strings
684 using --colsep. Here the columns are separated by TAB (\t):
685
686 parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
687
688 Output (the order may be different):
689
690 1=f1 2=f2
691 1=A 2=B
692 1=C 2=D
693
694 Header defined replacement strings
695
696 With --header GNU parallel will use the first value of the input source
697 as the name of the replacement string. Only the non-modified version {}
698 is supported:
699
700 parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D
701
702 Output (the order may be different):
703
704 f1=A f2=C
705 f1=A f2=D
706 f1=B f2=C
707 f1=B f2=D
708
709 It is useful with --colsep for processing files with TAB separated
710 values:
711
712 parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
713 :::: tsv-file.tsv
714
715 Output (the order may be different):
716
717 f1=A f2=B
718 f1=C f2=D
719
720 More pre-defined replacement strings with --plus
721
722 --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...}
723 {/..} {/...} {##}. The idea being that {+foo} matches the opposite of
724 {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
725 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.
726
727 parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
728 parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
729 parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
730 parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
731 parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
732 parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
733 parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
734 parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
735
736 Output:
737
738 dir/sub/file.ex1.ex2.ex3
739
740 {##} is simply the number of jobs:
741
742 parallel --plus echo Job {#} of {##} ::: {1..5}
743
744 Output:
745
746 Job 1 of 5
747 Job 2 of 5
748 Job 3 of 5
749 Job 4 of 5
750 Job 5 of 5
751
752 Dynamic replacement strings with --plus
753
754 --plus also defines these dynamic replacement strings:
755
756 {:-string} Default value is string if the argument is empty.
757
758 {:number} Substring from number till end of string.
759
760 {:number1:number2} Substring from number1 to number2.
761
762 {#string} If the argument starts with string, remove it.
763
764 {%string} If the argument ends with string, remove it.
765
766 {/string1/string2} Replace string1 with string2.
767
768 {^string} If the argument starts with string, upper case it.
769 string must be a single letter.
770
771 {^^string} If the argument contains string, upper case it.
772 string must be a single letter.
773
774 {,string} If the argument starts with string, lower case it.
775 string must be a single letter.
776
777 {,,string} If the argument contains string, lower case it.
778 string must be a single letter.
779
They are inspired by Bash:
781
782 unset myvar
783 echo ${myvar:-myval}
784 parallel --plus echo {:-myval} ::: "$myvar"
785
786 myvar=abcAaAdef
787 echo ${myvar:2}
788 parallel --plus echo {:2} ::: "$myvar"
789
790 echo ${myvar:2:3}
791 parallel --plus echo {:2:3} ::: "$myvar"
792
793 echo ${myvar#bc}
794 parallel --plus echo {#bc} ::: "$myvar"
795 echo ${myvar#abc}
796 parallel --plus echo {#abc} ::: "$myvar"
797
798 echo ${myvar%de}
799 parallel --plus echo {%de} ::: "$myvar"
800 echo ${myvar%def}
801 parallel --plus echo {%def} ::: "$myvar"
802
803 echo ${myvar/def/ghi}
804 parallel --plus echo {/def/ghi} ::: "$myvar"
805
806 echo ${myvar^a}
807 parallel --plus echo {^a} ::: "$myvar"
808 echo ${myvar^^a}
809 parallel --plus echo {^^a} ::: "$myvar"
810
811 myvar=AbcAaAdef
812 echo ${myvar,A}
813 parallel --plus echo '{,A}' ::: "$myvar"
814 echo ${myvar,,A}
815 parallel --plus echo '{,,A}' ::: "$myvar"
816
817 Output:
818
819 myval
820 myval
821 cAaAdef
822 cAaAdef
823 cAa
824 cAa
825 abcAaAdef
826 abcAaAdef
827 AaAdef
828 AaAdef
829 abcAaAdef
830 abcAaAdef
831 abcAaA
832 abcAaA
833 abcAaAghi
834 abcAaAghi
835 AbcAaAdef
836 AbcAaAdef
837 AbcAAAdef
838 AbcAAAdef
839 abcAaAdef
840 abcAaAdef
841 abcaaadef
842 abcaaadef
843
844 More than one argument
845 With --xargs GNU parallel will fit as many arguments as possible on a
846 single line:
847
848 cat num30000 | parallel --xargs echo | wc -l
849
850 Output (if you run this under Bash on GNU/Linux):
851
852 2
853
The 30000 arguments fit on 2 lines.
855
856 The maximal length of a single line can be set with -s. With a maximal
857 line length of 10000 chars 17 commands will be run:
858
859 cat num30000 | parallel --xargs -s 10000 echo | wc -l
860
861 Output:
862
863 17
864
865 For better parallelism GNU parallel can distribute the arguments
866 between all the parallel jobs when end of file is met.
867
868 Below GNU parallel reads the last argument when generating the second
869 job. When GNU parallel reads the last argument, it spreads all the
870 arguments for the second job over 4 jobs instead, as 4 parallel jobs
871 are requested.
872
873 The first job will be the same as the --xargs example above, but the
874 second job will be split into 4 evenly sized jobs, resulting in a total
875 of 5 jobs:
876
877 cat num30000 | parallel --jobs 4 -m echo | wc -l
878
879 Output (if you run this under Bash on GNU/Linux):
880
881 5
882
883 This is even more visible when running 4 jobs with 10 arguments. The 10
884 arguments are being spread over 4 jobs:
885
886 parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
887
888 Output:
889
890 1 2 3
891 4 5 6
892 7 8 9
893 10
894
895 A replacement string can be part of a word. -m will not repeat the
896 context:
897
898 parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G
899
900 Output (the order may be different):
901
902 pre-A B-post
903 pre-C D-post
904 pre-E F-post
905 pre-G-post
906
907 To repeat the context use -X which otherwise works like -m:
908
909 parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
910
911 Output (the order may be different):
912
913 pre-A-post pre-B-post
914 pre-C-post pre-D-post
915 pre-E-post pre-F-post
916 pre-G-post
917
918 To limit the number of arguments use -N:
919
920 parallel -N3 echo ::: A B C D E F G H
921
922 Output (the order may be different):
923
924 A B C
925 D E F
926 G H
927
928 -N also sets the positional replacement strings:
929
930 parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
931
932 Output (the order may be different):
933
934 1=A 2=B 3=C
935 1=D 2=E 3=F
936 1=G 2=H 3=
937
938 -N0 reads 1 argument but inserts none:
939
940 parallel -N0 echo foo ::: 1 2 3
941
942 Output:
943
944 foo
945 foo
946 foo
947
948 Quoting
949 Command lines that contain special characters may need to be protected
950 from the shell.
951
952 The perl program print "@ARGV\n" basically works like echo.
953
954 perl -e 'print "@ARGV\n"' A
955
956 Output:
957
958 A
959
960 To run that in parallel the command needs to be quoted:
961
962 parallel perl -e 'print "@ARGV\n"' ::: This wont work
963
964 Output:
965
966 [Nothing]
967
968 To quote the command use -q:
969
970 parallel -q perl -e 'print "@ARGV\n"' ::: This works
971
972 Output (the order may be different):
973
974 This
975 works
976
977 Or you can quote the critical part using \':
978
979 parallel perl -e \''print "@ARGV\n"'\' ::: This works, too
980
981 Output (the order may be different):
982
983 This
984 works,
985 too
986
987 GNU parallel can also \-quote full lines. Simply run this:
988
989 parallel --shellquote
990 Warning: Input is read from the terminal. You either know what you
991 Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
992 Warning: ::: or :::: or to pipe data into parallel. If so
993 Warning: consider going through the tutorial: man parallel_tutorial
994 Warning: Press CTRL-D to exit.
995 perl -e 'print "@ARGV\n"'
996 [CTRL-D]
997
998 Output:
999
1000 perl\ -e\ \'print\ \"@ARGV\\n\"\'
1001
1002 This can then be used as the command:
1003
1004 parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works
1005
1006 Output (the order may be different):
1007
1008 This
1009 also
1010 works
1011
1012 Trimming space
1013 Space can be trimmed on the arguments using --trim:
1014
1015 parallel --trim r echo pre-{}-post ::: ' A '
1016
1017 Output:
1018
1019 pre- A-post
1020
1021 To trim on the left side:
1022
1023 parallel --trim l echo pre-{}-post ::: ' A '
1024
1025 Output:
1026
1027 pre-A -post
1028
To trim on both sides:
1030
1031 parallel --trim lr echo pre-{}-post ::: ' A '
1032
1033 Output:
1034
1035 pre-A-post
1036
1037 Respecting the shell
1038 This tutorial uses Bash as the shell. GNU parallel respects which shell
1039 you are using, so in zsh you can do:
1040
1041 parallel echo \={} ::: zsh bash ls
1042
1043 Output:
1044
1045 /usr/bin/zsh
1046 /bin/bash
1047 /bin/ls
1048
1049 In csh you can do:
1050
1051 parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *
1052
1053 Output:
1054
1055 [somedir] is a dir
1056
1057 This also becomes useful if you use GNU parallel in a shell script: GNU
1058 parallel will use the same shell as the shell script.

Controlling the output
The output can be prefixed with the argument:
1062
1063 parallel --tag echo foo-{} ::: A B C
1064
1065 Output (the order may be different):
1066
1067 A foo-A
1068 B foo-B
1069 C foo-C
1070
1071 To prefix it with another string use --tagstring:
1072
1073 parallel --tagstring {}-bar echo foo-{} ::: A B C
1074
1075 Output (the order may be different):
1076
1077 A-bar foo-A
1078 B-bar foo-B
1079 C-bar foo-C
1080
1081 To see what commands will be run without running them use --dryrun:
1082
1083 parallel --dryrun echo {} ::: A B C
1084
1085 Output (the order may be different):
1086
1087 echo A
1088 echo B
1089 echo C
1090
To print the commands before running them, use --verbose:
1092
1093 parallel --verbose echo {} ::: A B C
1094
1095 Output (the order may be different):
1096
1097 echo A
1098 echo B
1099 A
1100 echo C
1101 B
1102 C
1103
1104 GNU parallel will postpone the output until the command completes:
1105
1106 parallel -j2 'printf "%s-start\n%s" {} {};
1107 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1108
1109 Output:
1110
1111 2-start
1112 2-middle
1113 2-end
1114 1-start
1115 1-middle
1116 1-end
1117 4-start
1118 4-middle
1119 4-end
1120
1121 To get the output immediately use --ungroup:
1122
1123 parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
1124 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1125
1126 Output:
1127
1128 4-start
1129 42-start
1130 2-middle
1131 2-end
1132 1-start
1133 1-middle
1134 1-end
1135 -middle
1136 4-end
1137
1138 --ungroup is fast, but can cause half a line from one job to be mixed
1139 with half a line of another job. That has happened in the second line,
1140 where the line '4-middle' is mixed with '2-start'.
1141
1142 To avoid this use --linebuffer:
1143
1144 parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
1145 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1146
1147 Output:
1148
1149 4-start
1150 2-start
1151 2-middle
1152 2-end
1153 1-start
1154 1-middle
1155 1-end
1156 4-middle
1157 4-end
1158
1159 To force the output in the same order as the arguments use
1160 --keep-order/-k:
1161
1162 parallel -j2 -k 'printf "%s-start\n%s" {} {};
1163 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1164
1165 Output:
1166
1167 4-start
1168 4-middle
1169 4-end
1170 2-start
1171 2-middle
1172 2-end
1173 1-start
1174 1-middle
1175 1-end
1176
1177 Saving output into files
1178 GNU parallel can save the output of each job into files:
1179
1180 parallel --files echo ::: A B C
1181
1182 Output will be similar to this:
1183
1184 /tmp/pAh6uWuQCg.par
1185 /tmp/opjhZCzAX4.par
1186 /tmp/W0AT_Rph2o.par
1187
1188 By default GNU parallel will cache the output in files in /tmp. This
1189 can be changed by setting $TMPDIR or --tmpdir:
1190
1191 parallel --tmpdir /var/tmp --files echo ::: A B C
1192
1193 Output will be similar to this:
1194
1195 /var/tmp/N_vk7phQRc.par
1196 /var/tmp/7zA4Ccf3wZ.par
1197 /var/tmp/LIuKgF_2LP.par
1198
1199 Or:
1200
1201 TMPDIR=/var/tmp parallel --files echo ::: A B C
1202
1203 Output: Same as above.
1204
1205 The output files can be saved in a structured way using --results:
1206
1207 parallel --results outdir echo ::: A B C
1208
1209 Output:
1210
1211 A
1212 B
1213 C
1214
1215 These files were also generated containing the standard output
1216 (stdout), standard error (stderr), and the sequence number (seq):
1217
1218 outdir/1/A/seq
1219 outdir/1/A/stderr
1220 outdir/1/A/stdout
1221 outdir/1/B/seq
1222 outdir/1/B/stderr
1223 outdir/1/B/stdout
1224 outdir/1/C/seq
1225 outdir/1/C/stderr
1226 outdir/1/C/stdout
1227
1228 --header : will take the first value as name and use that in the
1229 directory structure. This is useful if you are using multiple input
1230 sources:
1231
1232 parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
1233
1234 Generated files:
1235
1236 outdir/f1/A/f2/C/seq
1237 outdir/f1/A/f2/C/stderr
1238 outdir/f1/A/f2/C/stdout
1239 outdir/f1/A/f2/D/seq
1240 outdir/f1/A/f2/D/stderr
1241 outdir/f1/A/f2/D/stdout
1242 outdir/f1/B/f2/C/seq
1243 outdir/f1/B/f2/C/stderr
1244 outdir/f1/B/f2/C/stdout
1245 outdir/f1/B/f2/D/seq
1246 outdir/f1/B/f2/D/stderr
1247 outdir/f1/B/f2/D/stdout
1248
1249 The directories are named after the variables and their values.

Controlling the execution
1252 Number of simultaneous jobs
1253 The number of concurrent jobs is given with --jobs/-j:
1254
1255 /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
1256
1257 With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run -
1258 depending on how fast your machine is.
1259
1260 By default --jobs is the same as the number of CPU cores. So this:
1261
1262 /usr/bin/time parallel -N0 sleep 1 :::: num128
1263
1264 should take twice the time of running 2 jobs per CPU core:
1265
1266 /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1267
1268 --jobs 0 will run as many jobs in parallel as possible:
1269
1270 /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1271
1272 which should take 1-7 seconds depending on how fast your machine is.
1273
1274 --jobs can read from a file which is re-read when a job finishes:
1275
1276 echo 50% > my_jobs
1277 /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1278 sleep 1
1279 echo 0 > my_jobs
1280 wait
1281
For the first second only 50% of the CPU cores will run a job. Then 0
is written to my_jobs, and the rest of the jobs will be started in
parallel.
1285
1286 Instead of basing the percentage on the number of CPU cores GNU
1287 parallel can base it on the number of CPUs:
1288
1289 parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1290
1291 Shuffle job order
If you have many jobs (e.g. from multiple combinations of input
sources), it can be handy to shuffle the jobs, so that different values
are run early. Use --shuf for that:
1295
1296 parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1297
1298 Output:
1299
1300 All combinations but different order for each run.
1301
1302 Interactivity
1303 GNU parallel can ask the user if a command should be run using
1304 --interactive:
1305
1306 parallel --interactive echo ::: 1 2 3
1307
1308 Output:
1309
1310 echo 1 ?...y
1311 echo 2 ?...n
1312 1
1313 echo 3 ?...y
1314 3
1315
1316 GNU parallel can be used to put arguments on the command line for an
1317 interactive command such as emacs to edit one file at a time:
1318
1319 parallel --tty emacs ::: 1 2 3
1320
Or give multiple arguments in one go to open multiple files:
1322
1323 parallel -X --tty vi ::: 1 2 3
1324
1325 A terminal for every job
1326 Using --tmux GNU parallel can start a terminal for every job run:
1327
1328 seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1329
1330 This will tell you to run something similar to:
1331
1332 tmux -S /tmp/tmsrPrO0 attach
1333
1334 Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
1335 between windows of the running jobs. When a job is finished it will
1336 pause for 10 seconds before closing the window.
1337
1338 Timing
1339 Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1340 parallel can delay starting new jobs. --delay X will make sure there is
1341 at least X seconds between each start:
1342
1343 parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1344
1345 Output:
1346
1347 Starting 1
1348 Thu Aug 15 16:24:33 CEST 2013
1349 Starting 2
1350 Thu Aug 15 16:24:35 CEST 2013
1351 Starting 3
1352 Thu Aug 15 16:24:38 CEST 2013
1353
1354 If jobs taking more than a certain amount of time are known to fail,
1355 they can be stopped with --timeout. The accuracy of --timeout is 2
1356 seconds:
1357
1358 parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
1359
1360 Output:
1361
1362 2
1363 4
1364
1365 GNU parallel can compute the median runtime for jobs and kill those
1366 that take more than 200% of the median runtime:
1367
1368 parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
1369
1370 Output:
1371
1372 2.1
1373 2.2
1374 3
1375 2.3
1376
1377 Progress information
1378 Based on the runtime of completed jobs GNU parallel can estimate the
1379 total runtime:
1380
1381 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1382
1383 Output:
1384
1385 Computers / CPU cores / Max jobs to run
1386 1:local / 2 / 2
1387
1388 Computer:jobs running/jobs completed/%of started jobs/
1389 Average seconds to complete
1390 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1391
1392 GNU parallel can give progress information with --progress:
1393
1394 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1395
1396 Output:
1397
1398 Computers / CPU cores / Max jobs to run
1399 1:local / 2 / 2
1400
1401 Computer:jobs running/jobs completed/%of started jobs/
1402 Average seconds to complete
1403 local:0/9/100%/1.1s
1404
1405 A progress bar can be shown with --bar:
1406
1407 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1408
1409 And a graphical bar can be shown with --bar and zenity:
1410
1411 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1412 2> >(zenity --progress --auto-kill --auto-close)
1413
1414 A logfile of the jobs completed so far can be generated with --joblog:
1415
1416 parallel --joblog /tmp/log exit ::: 1 2 3 0
1417 cat /tmp/log
1418
1419 Output:
1420
1421 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1422 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1423 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1424 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1425 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1426
1427 The log contains the job sequence, which host the job was run on, the
1428 start time and run time, how much data was transferred, the exit value,
1429 the signal that killed the job, and finally the command being run.
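
 The fixed column layout makes the joblog easy to post-process with
 standard tools. A sketch (assuming the joblog from above is in
 /tmp/log and that the commands contain no tab characters):

```shell
# Print the command (column 9) of every failed job.
# Columns are tab-separated; column 7 is Exitval; line 1 is the header.
awk -F'\t' 'NR > 1 && $7 != 0 { print $9 }' /tmp/log
```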
1430
1431 With a joblog GNU parallel can be stopped and later pick up where it
1432 left off. It is important that the input of the completed jobs is
1433 unchanged.
1434
1435 parallel --joblog /tmp/log exit ::: 1 2 3 0
1436 cat /tmp/log
1437 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1438 cat /tmp/log
1439
1440 Output:
1441
1442 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1443 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1444 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1445 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1446 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1447
1448 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1449 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1450 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1451 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1452 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1453 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1454 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1455
1456 Note how the start time of the last 2 jobs shows that they were
1457 clearly started in the second run.
1458
1459 With --resume-failed GNU parallel will re-run the jobs that failed:
1460
1461 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1462 cat /tmp/log
1463
1464 Output:
1465
1466 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1467 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1468 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1469 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1470 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1471 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1472 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1473 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1474 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1475 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1476
1477 Note how Seq 1, 2, and 3 have been rerun because they had exit values
1478 different from 0.
1479
1480 --retry-failed does almost the same as --resume-failed. Where
1481 --resume-failed reads the commands from the command line (and ignores
1482 the commands in the joblog), --retry-failed ignores the command line
1483 and reruns the commands mentioned in the joblog.
1484
1485 parallel --retry-failed --joblog /tmp/log
1486 cat /tmp/log
1487
1488 Output:
1489
1490 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1491 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1492 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1493 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1494 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1495 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1496 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1497 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1498 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1499 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1500 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1501 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1502 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1503
1504 Termination
1505 Unconditional termination
1506
1507 By default GNU parallel will wait for all jobs to finish before
1508 exiting.
1509
1510 If you send GNU parallel the TERM signal, GNU parallel will stop
1511 spawning new jobs and wait for the remaining jobs to finish. If you
1512 send GNU parallel the TERM signal again, GNU parallel will kill all
1513 running jobs and exit.
1514
1515 Termination dependent on job status
1516
1517 For certain jobs there is no need to continue if one of the jobs fails
1518 and has an exit code different from 0. GNU parallel will stop spawning
1519 new jobs with --halt soon,fail=1:
1520
1521 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1522
1523 Output:
1524
1525 0
1526 0
1527 1
1528 parallel: This job failed:
1529 echo 1; exit 1
1530 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1531 2
1532
1533 With --halt now,fail=1 the running jobs will be killed immediately:
1534
1535 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1536
1537 Output:
1538
1539 0
1540 0
1541 1
1542 parallel: This job failed:
1543 echo 1; exit 1
1544
1545 If --halt is given a percentage this percentage of the jobs must fail
1546 before GNU parallel stops spawning more jobs:
1547
1548 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1549 ::: 0 1 2 3 4 5 6 7 8 9
1550
1551 Output:
1552
1553 0
1554 1
1555 parallel: This job failed:
1556 echo 1; exit 1
1557 2
1558 parallel: This job failed:
1559 echo 2; exit 2
1560 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1561 3
1562 parallel: This job failed:
1563 echo 3; exit 3
1564
1565 If you are looking for success instead of failures, you can use
1566 success. This will finish as soon as the first job succeeds:
1567
1568 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1569
1570 Output:
1571
1572 1
1573 2
1574 3
1575 0
1576 parallel: This job succeeded:
1577 echo 0; exit 0
1578
1579 GNU parallel can retry the command with --retries. This is useful if a
1580 command fails for unknown reasons now and then.
1581
1582 parallel -k --retries 3 \
1583 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1584 cat /tmp/runs
1585
1586 Output:
1587
1588 completed 1
1589 completed 2
1590 completed 0
1591
1592 tried 1
1593 tried 2
1594 tried 1
1595 tried 2
1596 tried 1
1597 tried 2
1598 tried 0
1599
1600 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried
1601 because it had exit code 0.
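
 The attempts recorded in /tmp/runs can be tallied with standard
 tools; a small sketch:

```shell
# Count how many times each argument was tried:
sort /tmp/runs | uniq -c
```

 With the run above this shows 3 attempts for 1 and 2 and a single
 attempt for 0.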
1602
1603 Termination signals (advanced)
1604
1605 Using --termseq you can control which signals are sent when killing
1606 children. Normally children will be killed by sending them SIGTERM,
1607 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1608 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1609 giving up. It looks like this:
1610
1611 show_signals() {
1612 perl -e 'for(keys %SIG) {
1613 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1614 }
1615 while(1){sleep 1}'
1616 }
1617 export -f show_signals
1618 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1619 -u --timeout 1 show_signals
1620
1621 Output:
1622
1623 Got TERM
1624 Got TERM
1625 Got TERM
1626
1627 Or just:
1628
1629 echo | parallel -u --timeout 1 show_signals
1630
1631 Output: Same as above.
1632
1633 You can change this to SIGINT, SIGTERM, SIGKILL:
1634
1635 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1636 -u --timeout 1 show_signals
1637
1638 Output:
1639
1640 Got INT
1641 Got TERM
1642
1643 The SIGKILL does not show because it cannot be caught, and thus the
1644 child dies.
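
 The difference between a catchable TERM and an uncatchable KILL can
 be seen without GNU parallel; a minimal bash sketch (exit status 137
 means "killed by signal 9"):

```shell
# TERM can be trapped and handled; KILL terminates unconditionally.
bash -c 'trap "echo caught TERM" TERM
         kill -TERM $$       # the trap fires and execution continues
         echo still alive
         kill -KILL $$       # no handler can run; the process dies here
         echo never printed'
echo "exit status: $?"
```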
1645
1646 Limiting the resources
1647 To avoid overloading systems GNU parallel can look at the system load
1648 before starting another job:
1649
1650 parallel --load 100% echo load is less than {} job per cpu ::: 1
1651
1652 Output:
1653
1654 [when the load is less than the number of cpu cores]
1655 load is less than 1 job per cpu
1656
1657 GNU parallel can also check if the system is swapping.
1658
1659 parallel --noswap echo the system is not swapping ::: now
1660
1661 Output:
1662
1663 [when the system is not swapping]
1664 the system is not swapping now
1665
1666 Some jobs need a lot of memory, and should only be started when there
1667 is enough memory free. Using --memfree GNU parallel can check if there
1668 is enough memory free. Additionally, GNU parallel will kill off the
1669 youngest job if the free memory falls below 50% of the given size. The
1670 killed job will be put back on the queue and retried later.
1671
1672 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1673
1674 GNU parallel can run the jobs with a nice value. This will work both
1675 locally and remotely.
1676
1677 parallel --nice 17 echo this is being run with nice -n ::: 17
1678
1679 Output:
1680
1681 this is being run with nice -n 17
1682
1684 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1685 with the remote machines.
1686
1687 Sshlogin
1688 The most basic sshlogin is -S host:
1689
1690 parallel -S $SERVER1 echo running on ::: $SERVER1
1691
1692 Output:
1693
1694 running on [$SERVER1]
1695
1696 To use a different username prepend the server with username@:
1697
1698 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1699
1700 Output:
1701
1702 running on [username@$SERVER1]
1703
1704 The special sshlogin : is the local machine:
1705
1706 parallel -S : echo running on ::: the_local_machine
1707
1708 Output:
1709
1710 running on the_local_machine
1711
1712 If ssh is not in $PATH it can be prepended to $SERVER1:
1713
1714 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1715
1716 Output:
1717
1718 custom ssh
1719
1720 The ssh command can also be given using --ssh:
1721
1722 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1723
1724 or by setting $PARALLEL_SSH:
1725
1726 export PARALLEL_SSH=/usr/bin/ssh
1727 parallel -S $SERVER1 echo custom ::: ssh
1728
1729 Several servers can be given using multiple -S:
1730
1731 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1732
1733 Output (the order may be different):
1734
1735 running
1736 on
1737 more
1738 hosts
1739
1740 Or they can be separated by ,:
1741
1742 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1743
1744 Output: Same as above.
1745
1746 Or newline:
1747
1748 # This gives a \n between $SERVER1 and $SERVER2
1749 SERVERS="`echo $SERVER1; echo $SERVER2`"
1750 parallel -S "$SERVERS" echo ::: running on more hosts
1751
1752 They can also be read from a file (replace user@ with the user on
1753 $SERVER2):
1754
1755 echo $SERVER1 > nodefile
1756 # Force 4 cores, special ssh-command, username
1757 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1758 parallel --sshloginfile nodefile echo ::: running on more hosts
1759
1760 Output: Same as above.
1761
1762 Every time a job finishes, the --sshloginfile will be re-read, so it is
1763 possible to both add and remove hosts while running.
1764
1765 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1766
1767 To force GNU parallel to treat a server as having a given number of
1768 CPU cores, prepend the number of cores followed by / to the sshlogin:
1769
1770 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1771
1772 Output:
1773
1774 force 4 cpus on server
1775
1776 Servers can be put into groups by prepending @groupname to the server
1777 and the group can then be selected by appending @groupname to the
1778 argument when using --hostgroup:
1779
1780 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1781 ::: run_on_grp1@grp1 run_on_grp2@grp2
1782
1783 Output:
1784
1785 run_on_grp1
1786 run_on_grp2
1787
1788 A host can be in multiple groups by separating the groups with +, and
1789 you can force GNU parallel to limit the groups on which the command can
1790 be run with -S @groupname:
1791
1792 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1793 ::: run_on_grp1 also_grp1
1794
1795 Output:
1796
1797 run_on_grp1
1798 also_grp1
1799
1800 Transferring files
1801 GNU parallel can transfer the files to be processed to the remote host.
1802 It does that using rsync.
1803
1804 echo This is input_file > input_file
1805 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1806
1807 Output:
1808
1809 This is input_file
1810
1811 If the files are processed into another file, the resulting file can be
1812 transferred back:
1813
1814 echo This is input_file > input_file
1815 parallel -S $SERVER1 --transferfile {} --return {}.out \
1816 cat {} ">"{}.out ::: input_file
1817 cat input_file.out
1818
1819 Output: Same as above.
1820
1821 To remove the input and output file on the remote server use --cleanup:
1822
1823 echo This is input_file > input_file
1824 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1825 cat {} ">"{}.out ::: input_file
1826 cat input_file.out
1827
1828 Output: Same as above.
1829
1830 There is a shorthand for --transferfile {} --return --cleanup called
1831 --trc:
1832
1833 echo This is input_file > input_file
1834 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1835 cat input_file.out
1836
1837 Output: Same as above.
1838
1839 Some jobs need a common database for all jobs. GNU parallel can
1840 transfer that using --basefile which will transfer the file before the
1841 first job:
1842
1843 echo common data > common_file
1844 parallel --basefile common_file -S $SERVER1 \
1845 cat common_file\; echo {} ::: foo
1846
1847 Output:
1848
1849 common data
1850 foo
1851
1852 To remove it from the remote host after the last job use --cleanup.
1853
1854 Working dir
1855 The default working dir on the remote machines is the login dir. This
1856 can be changed with --workdir mydir.
1857
1858 Files transferred using --transferfile and --return will be relative to
1859 mydir on remote computers, and the command will be executed in the dir
1860 mydir.
1861
1862 The special mydir value ... will create working dirs under
1863 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1864 dirs will be removed.
1865
1866 The special mydir value . uses the current working dir. If the current
1867 working dir is beneath your home dir, the value . is treated as the
1868 relative path to your home dir. This means that if your home dir is
1869 different on remote computers (e.g. if your login is different) the
1870 relative path will still be relative to your home dir.
1871
1872 parallel -S $SERVER1 pwd ::: ""
1873 parallel --workdir . -S $SERVER1 pwd ::: ""
1874 parallel --workdir ... -S $SERVER1 pwd ::: ""
1875
1876 Output:
1877
1878 [the login dir on $SERVER1]
1879 [current dir relative on $SERVER1]
1880 [a dir in ~/.parallel/tmp/...]
1881
1882 Avoid overloading sshd
1883 If many jobs are started on the same server, sshd can be overloaded.
1884 GNU parallel can insert a delay between each job run on the same
1885 server:
1886
1887 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1888
1889 Output (the order may be different):
1890
1891 1
1892 2
1893 3
1894
1895 sshd will be less overloaded if using --controlmaster, which will
1896 multiplex ssh connections:
1897
1898 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1899
1900 Output: Same as above.
1901
1902 Ignore hosts that are down
1903 In clusters with many hosts a few of them are often down. GNU parallel
1904 can ignore those hosts. In this case the host 173.194.32.46 is down:
1905
1906 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1907
1908 Output:
1909
1910 bar
1911
1912 Running the same commands on all hosts
1913 GNU parallel can run the same command on all the hosts:
1914
1915 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1916
1917 Output (the order may be different):
1918
1919 foo
1920 bar
1921 foo
1922 bar
1923
1924 Often you will just want to run a single command on all hosts without
1925 arguments. --nonall is a no-argument --onall:
1926
1927 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1928
1929 Output:
1930
1931 foo bar
1932 foo bar
1933
1934 When --tag is used with --nonall and --onall the --tagstring is the
1935 host:
1936
1937 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1938
1939 Output (the order may be different):
1940
1941 $SERVER1 foo bar
1942 $SERVER2 foo bar
1943
1944 --jobs sets the number of servers to log in to in parallel.
1945
1946 Transferring environment variables and functions
1947 env_parallel is a shell function that transfers all aliases, functions,
1948 variables, and arrays. You activate it by running:
1949
1950 source `which env_parallel.bash`
1951
1952 Replace bash with the shell you use.
1953
1954 Now you can use env_parallel instead of parallel and still have your
1955 environment:
1956
1957 alias myecho=echo
1958 myvar="Joe's var is"
1959 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1960
1961 Output:
1962
1963 Joe's var is green
1964
1965 The disadvantage is that if your environment is huge env_parallel will
1966 fail.
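
 If you suspect your environment is too big, the byte size of the
 exported environment gives a rough lower bound (env_parallel also
 serializes non-exported variables, functions, and aliases, so the
 real payload is larger):

```shell
# Approximate size in bytes of the exported environment:
env | wc -c
```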
1967
1968 When env_parallel fails, you can still use --env to tell GNU parallel
1969 to transfer an environment variable to the remote system.
1970
1971 MYVAR='foo bar'
1972 export MYVAR
1973 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1974
1975 Output:
1976
1977 foo bar baz
1978
1979 This works for functions, too, if your shell is Bash:
1980
1981 # This only works in Bash
1982 my_func() {
1983 echo in my_func $1
1984 }
1985 export -f my_func
1986 parallel --env my_func -S $SERVER1 my_func ::: baz
1987
1988 Output:
1989
1990 in my_func baz
1991
1992 GNU parallel can copy all user defined variables and functions to the
1993 remote system. It just needs to record which ones to ignore in
1994 ~/.parallel/ignored_vars. Do that by running this once:
1995
1996 parallel --record-env
1997 cat ~/.parallel/ignored_vars
1998
1999 Output:
2000
2001 [list of variables to ignore - including $PATH and $HOME]
2002
2003 Now all other variables and functions defined will be copied when using
2004 --env _.
2005
2006 # The function is only copied if using Bash
2007 my_func2() {
2008 echo in my_func2 $VAR $1
2009 }
2010 export -f my_func2
2011 VAR=foo
2012 export VAR
2013
2014 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2015
2016 Output:
2017
2018 foo
2019 in my_func2 foo bar
2020
2021 If you use env_parallel the variables, functions, and aliases do not
2022 even need to be exported to be copied:
2023
2024 NOT='not exported var'
2025 alias myecho=echo
2026 not_ex() {
2027 myecho in not_exported_func $NOT $1
2028 }
2029 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2030
2031 Output:
2032
2033 not exported var
2034 in not_exported_func not exported var bar
2035
2036 Showing what is actually run
2037 --verbose will show the command that would be run on the local machine.
2038
2039 When using --cat, --pipepart, or when a job is run on a remote machine,
2040 the command is wrapped with helper scripts. -vv shows all of this.
2041
2042 parallel -vv --pipepart --block 1M wc :::: num30000
2043
2044 Output:
2045
2046 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2047 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2048 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2049 0 0 0 168894 | (wc)
2050 30000 30000 168894
2051
2052 When the command gets more complex, the output becomes so hard to read
2053 that it is only useful for debugging:
2054
2055 my_func3() {
2056 echo in my_func $1 > $1.out
2057 }
2058 export -f my_func3
2059 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2060 -S $SERVER1 my_func3 {} ::: abc-file
2061
2062 Output will be similar to:
2063
2064 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2065 --protocol 30 -rlDzR -essh ./abc-file
2066 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2067 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2068 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2069 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2070 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2071 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2072 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2073 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2074 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2075 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2076 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2077 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2078 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2079 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2080 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2081 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2082 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2083 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2084 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2085 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2086 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2087 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2088 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2089 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2090 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2091 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2092 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2093 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2094 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2095 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2096 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2097 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2098
2100 GNU parset will set shell variables to the output of GNU parallel. GNU
2101 parset has one important limitation: It cannot be part of a pipe. In
2102 particular this means it cannot read anything from standard input
2103 (stdin) or pipe output to another program.
2104
2105 To use GNU parset, prepend the command with the destination variables:
2106
2107 parset myvar1,myvar2 echo ::: a b
2108 echo $myvar1
2109 echo $myvar2
2110
2111 Output:
2112
2113 a
2114 b
2115
2116 If you only give a single variable, it will be treated as an array:
2117
2118 parset myarray seq {} 5 ::: 1 2 3
2119 echo "${myarray[1]}"
2120
2121 Output:
2122
2123 2
2124 3
2125 4
2126 5
2127
2128 The commands to run can be an array:
2129
2130 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2131 parset data ::: "${cmd[@]}"
2132 echo "${data[0]}"
2133 echo "${data[1]}"
2134
2135 Output:
2136
2137 <<joe "double space" cartoon>>
2138 [current dir]
2139
2141 GNU parallel can save into an SQL base. Point GNU parallel to a table
2142 and it will put the joblog there together with the variables and the
2143 output each in their own column.
2144
2145 CSV as SQL base
2146 The simplest is to use a CSV file as the storage table:
2147
2148 parallel --sqlandworker csv:////%2Ftmp%2Flog.csv \
2149 seq ::: 10 ::: 12 13 14
2150 cat /tmp/log.csv
2151
2152 Note how '/' in the path must be written as %2F.
2153
2154 Output will be similar to:
2155
2156 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2157 Command,V1,V2,Stdout,Stderr
2158 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2159 11
2160 12
2161 ",
2162 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2163 11
2164 12
2165 13
2166 ",
2167 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2168 11
2169 12
2170 13
2171 14
2172 ",
2173
2174 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2175 format correctly - even with fields containing newlines as above.
2176
2177 If the output is big you may want to put it into files using --results:
2178
2179 parallel --results outdir --sqlandworker csv:////%2Ftmp%2Flog2.csv \
2180 seq ::: 10 ::: 12 13 14
2181 cat /tmp/log2.csv
2182
2183 Output will be similar to:
2184
2185 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2186 Command,V1,V2,Stdout,Stderr
2187 1,:,1458824738.287,0.029,0,9,0,0,
2188 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2189 2,:,1458824738.298,0.025,0,12,0,0,
2190 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2191 3,:,1458824738.309,0.026,0,15,0,0,
2192 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2193
2194 DBURL as table
2195 The CSV file is an example of a DBURL.
2196
2197 GNU parallel uses a DBURL to address the table. A DBURL has this
2198 format:
2199
2200 vendor://[[user][:password]@][host][:port]/[database[/table]]
2201
2202 Example:
2203
2204 mysql://scott:tiger@my.example.com/mydatabase/mytable
2205 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2206 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2207 csv:////%2Ftmp%2Flog.csv
2208
2209 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2210 as %2F.
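
 In bash the encoding can be done with a parameter expansion; a sketch
 (the ${var//pat/repl} substitution is bash-specific):

```shell
# Percent-encode every / in a path for a sqlite3/csv DBURL:
path=/tmp/mydatabase
enc=${path//\//%2F}             # replace all / with %2F
echo "sqlite3:///$enc/mytable"  # sqlite3:///%2Ftmp%2Fmydatabase/mytable
```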
2211
2212 Run a job using sqlite on mytable in /tmp/mydatabase:
2213
2214 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2215 DBURLTABLE=$DBURL/mytable
2216 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2217
2218 To see the result:
2219
2220 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2221
2222 Output will be similar to:
2223
2224 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2225 Command|V1|V2|Stdout|Stderr
2226 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2227 |
2228 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2229 |
2230 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2231 |
2232 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2233 |
2234
2235 The first columns are well known from --joblog. V1 and V2 are data from
2236 the input sources. Stdout and Stderr are standard output and standard
2237 error, respectively.
2238
2239 Using multiple workers
2240 Using an SQL base as storage costs overhead on the order of 1 second
2241 per job.
2242
2243 One of the situations where it makes sense is if you have multiple
2244 workers.
2245
2246 You can then have a single master machine that submits jobs to the SQL
2247 base (but does not do any of the work):
2248
2249 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2250
2251 On the worker machines you run exactly the same command except you
2252 replace --sqlmaster with --sqlworker.
2253
2254 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2255
2256 To run a master and a worker on the same machine use --sqlandworker as
2257 shown earlier.
2258
2260 The --pipe functionality puts GNU parallel in a different mode: Instead
2261 of treating the data on stdin (standard input) as arguments for a
2262 command to run, the data will be sent to stdin (standard input) of the
2263 command.
2264
2265 The typical situation is:
2266
2267 command_A | command_B | command_C
2268
2269 where command_B is slow, and you want to speed up command_B.
2270
2271 Chunk size
2272 By default GNU parallel will start an instance of command_B, read a
2273 chunk of 1 MB, and pass that to the instance. Then start another
2274 instance, read another chunk, and pass that to the second instance.
2275
2276 cat num1000000 | parallel --pipe wc
2277
2278 Output (the order may be different):
2279
2280 165668 165668 1048571
2281 149797 149797 1048579
2282 149796 149796 1048572
2283 149797 149797 1048579
2284 149797 149797 1048579
2285 149796 149796 1048572
2286 85349 85349 597444
2287
2288 The size of the chunk is not exactly 1 MB because GNU parallel only
2289 passes full lines - never half a line, thus the blocksize is only 1 MB
2290 on average. You can change the block size to 2 MB with --block:
2291
2292 cat num1000000 | parallel --pipe --block 2M wc
2293
2294 Output (the order may be different):
2295
2296 315465 315465 2097150
2297 299593 299593 2097151
2298 299593 299593 2097151
2299 85349 85349 597444
2300
2301 GNU parallel treats each line as a record. If the order of records is
2302 unimportant (e.g. you need all lines processed, but you do not care
2303 which is processed first), then you can use --roundrobin. Without
2304 --roundrobin GNU parallel will start a command per block; with
2305 --roundrobin only the requested number of jobs will be started
2306 (--jobs). The records will then be distributed between the running
2307 jobs:
2308
2309 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2310
2311 Output will be similar to:
2312
2313 149797 149797 1048579
2314 299593 299593 2097151
2315 315465 315465 2097150
2316 235145 235145 1646016
2317
2318 One of the 4 instances got a single record, 2 instances got 2 full
2319 records each, and one instance got 1 full and 1 partial record.
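
 A quick sanity check that --roundrobin lost no lines is to sum the
 per-job line counts; a sketch assuming the wc output above was saved
 to /tmp/wc.out:

```shell
# The line counts of the 4 jobs should add up to the 1000000 input lines:
awk '{ total += $1 } END { print total }' /tmp/wc.out
```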
2320
2321 Records
2322 GNU parallel sees the input as records. The default record is a single
2323 line.
2324
2325 Using -N140000 GNU parallel will read 140000 records at a time:
2326
2327 cat num1000000 | parallel --pipe -N140000 wc
2328
2329 Output (the order may be different):
2330
2331 140000 140000 868895
2332 140000 140000 980000
2333 140000 140000 980000
2334 140000 140000 980000
2335 140000 140000 980000
2336 140000 140000 980000
2337 140000 140000 980000
2338 20000 20000 140001
2339
2340 Note that the last job could not get the full 140000 lines, but
2341 only 20000 lines.
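
 The split follows directly from the arithmetic: 1000000 lines in
 records of 140000 leaves a remainder for the last job. A shell
 sketch:

```shell
# 1000000 lines in records of 140000: 7 full records, 20000 lines left over
echo "$(( 1000000 / 140000 )) full records, $(( 1000000 % 140000 )) lines left"
```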
2342
2343 If a record is 75 lines -L can be used:
2344
2345 cat num1000000 | parallel --pipe -L75 wc
2346
2347 Output (the order may be different):
2348
2349 165600 165600 1048095
2350 149850 149850 1048950
2351 149775 149775 1048425
2352 149775 149775 1048425
2353 149850 149850 1048950
2354 149775 149775 1048425
2355 85350 85350 597450
2356 25 25 176
2357
2358 Note how GNU parallel still reads a block of around 1 MB; but instead
2359 of passing single lines to wc it passes 75 full lines at a time. This of
2360 course does not hold for the last job (which in this case got 25
2361 lines).
2362
2363 Fixed length records
2364 Fixed length records can be processed by setting --recend '' and
2365 --block recordsize. A header of size n can be processed with --header
2366 .{n}.
2367
2368 Here is how to process a file with a 4-byte header and a 3-byte record
2369 size:
2370
2371 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2372 'echo start; cat; echo'
2373
2374 Output:
2375
2376 start
2377 HHHHAAA
2378 start
2379 HHHHCCC
2380 start
2381 HHHHBBB
2382
2383 It may be more efficient to increase --block to a multiple of the
2384 record size.
2385
2386 Record separators
2387 GNU parallel uses separators to determine where one record ends and the next begins.
2388
2389 --recstart gives the string that starts a record; --recend gives the
2390 string that ends a record. The default is --recend '\n' (newline).
2391
2392 If both --recend and --recstart are given, then the record will only
2393 split if the recend string is immediately followed by the recstart
2394 string.
2395
2396 Here the --recend is set to ', ':
2397
2398 echo /foo, bar/, /baz, qux/, | \
2399 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2400
2401 Output:
2402
2403 JOB1
2404 /foo, END
2405 JOB2
2406 bar/, END
2407 JOB3
2408 /baz, END
2409 JOB4
2410 qux/,
2411 END
2412
2413 Here the --recstart is set to /:
2414
2415 echo /foo, bar/, /baz, qux/, | \
2416 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2417
2418 Output:
2419
2420 JOB1
2421 /foo, barEND
2422 JOB2
2423 /, END
2424 JOB3
2425 /baz, quxEND
2426 JOB4
2427 /,
2428 END
2429
2430 Here both --recend and --recstart are set:
2431
2432 echo /foo, bar/, /baz, qux/, | \
2433 parallel -kN1 --recend ', ' --recstart / --pipe \
2434 echo JOB{#}\;cat\;echo END
2435
2436 Output:
2437
2438 JOB1
2439 /foo, bar/, END
2440 JOB2
2441 /baz, qux/,
2442 END
2443
2444 Note the difference between setting one string and setting both
2445 strings.
2446
2447 With --regexp the --recend and --recstart will be treated as regular
2448 expressions:
2449
2450 echo foo,bar,_baz,__qux, | \
2451 parallel -kN1 --regexp --recend ,_+ --pipe \
2452 echo JOB{#}\;cat\;echo END
2453
2454 Output:
2455
2456 JOB1
2457 foo,bar,_END
2458 JOB2
2459 baz,__END
2460 JOB3
2461 qux,
2462 END
2463
2464 GNU parallel can remove the record separators with
2465 --remove-rec-sep/--rrs:
2466
2467 echo foo,bar,_baz,__qux, | \
2468 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2469 echo JOB{#}\;cat\;echo END
2470
2471 Output:
2472
2473 JOB1
2474 foo,barEND
2475 JOB2
2476 bazEND
2477 JOB3
2478 qux,
2479 END
2480
2481 Header
2482 If the input data has a header, the header can be repeated for each job
2483 by matching the header with --header. If headers start with % you can
2484 do this:
2485
2486 cat num_%header | \
2487 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2488
2489 Output (the order may be different):
2490
2491 JOB1
2492 %head1
2493 %head2
2494 1
2495 2
2496 3
2497 JOB2
2498 %head1
2499 %head2
2500 4
2501 5
2502 6
2503 JOB3
2504 %head1
2505 %head2
2506 7
2507 8
2508 9
2509 JOB4
2510 %head1
2511 %head2
2512 10
2513
2514 If the header is 2 lines, --header 2 will work:
2515
2516 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2517
2518 Output: Same as above.
2519
2520 --pipepart
2521 --pipe is not very efficient. It maxes out at around 500 MB/s.
2522 --pipepart can easily deliver 5 GB/s, but it has a few limitations:
2523 the input has to be a normal file (not a pipe) given by -a or ::::,
2524 and -L/-l/-N do not work. --recend and --recstart, however, do work,
2525 and records can often be split on that alone.
2526
2527 parallel --pipepart -a num1000000 --block 3m wc
2528
2529 Output (the order may be different):
2530
2531 444443 444444 3000002
2532 428572 428572 3000004
2533 126985 126984 888890
2534
2535 Shebang
2536 Input data and parallel command in the same file
2537 GNU parallel is often called like this:
2538
2539 cat input_file | parallel command
2540
2541 With --shebang the input_file and parallel can be combined into the
2542 same script.
2543
2544 UNIX shell scripts start with a shebang line like this:
2545
2546 #!/bin/bash
2547
2548 GNU parallel can do that, too. With --shebang the arguments can be
2549 listed in the file. The parallel command is the first line of the
2550 script:
2551
2552 #!/usr/bin/parallel --shebang -r echo
2553
2554 foo
2555 bar
2556 baz
2557
2558 Output (the order may be different):
2559
2560 foo
2561 bar
2562 baz
2563
2564 Parallelizing existing scripts
2565 GNU parallel is often called like this:
2566
2567 cat input_file | parallel command
2568 parallel command ::: foo bar
2569
2570 If command is a script, parallel can be combined with it into a single
2571 file, so these will run the script in parallel:
2572
2573 cat input_file | command
2574 command foo bar
2575
2576 This Perl script, perl_echo, works like echo:
2577
2578 #!/usr/bin/perl
2579
2580 print "@ARGV\n"
2581
2582 It can be called as this:
2583
2584 parallel perl_echo ::: foo bar
2585
2586 By changing the #!-line it can be run in parallel:
2587
2588 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2589
2590 print "@ARGV\n"
2591
2592 Thus this will work:
2593
2594 perl_echo foo bar
2595
2596 Output (the order may be different):
2597
2598 foo
2599 bar
2600
2601 This technique can be used for:
2602
2603 Perl:
2604 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2605
2606 print "Arguments @ARGV\n";
2607
2608 Python:
2609 #!/usr/bin/parallel --shebang-wrap /usr/bin/python3
2610
2611 import sys
2612 print('Arguments', str(sys.argv))
2613
2614 Bash/sh/zsh/Korn shell:
2615 #!/usr/bin/parallel --shebang-wrap /bin/bash
2616
2617 echo Arguments "$@"
2618
2619 csh:
2620 #!/usr/bin/parallel --shebang-wrap /bin/csh
2621
2622 echo Arguments "$argv"
2623
2624 Tcl:
2625 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2626
2627 puts "Arguments $argv"
2628
2629 R:
2630 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2631
2632 args <- commandArgs(trailingOnly = TRUE)
2633 print(paste("Arguments ",args))
2634
2635 GNUplot:
2636 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2637
2638 print "Arguments ", system('echo $ARG')
2639
2640 Ruby:
2641 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2642
2643 print "Arguments "
2644 puts ARGV
2645
2646 Octave:
2647 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2648
2649 printf ("Arguments");
2650 arg_list = argv ();
2651 for i = 1:nargin
2652 printf (" %s", arg_list{i});
2653 endfor
2654 printf ("\n");
2655
2656 Common LISP:
2657 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2658
2659 (format t "~&~S~&" 'Arguments)
2660 (format t "~&~S~&" *args*)
2661
2662 PHP:
2663 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2664 <?php
2665 echo "Arguments";
2666 foreach(array_slice($argv,1) as $v)
2667 {
2668 echo " $v";
2669 }
2670 echo "\n";
2671 ?>
2672
2673 Node.js:
2674 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2675
2676 var myArgs = process.argv.slice(2);
2677 console.log('Arguments ', myArgs);
2678
2679 LUA:
2680 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2681
2682 io.write "Arguments"
2683 for a = 1, #arg do
2684 io.write(" ")
2685 io.write(arg[a])
2686 end
2687 print("")
2688
2689 C#:
2690 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2691
2692 var argv = Environment.GetEnvironmentVariable("ARGV");
2693 print("Arguments "+argv);
2694
2696 GNU parallel can work as a counting semaphore. This is slower and less
2697 efficient than its normal mode.
2698
2699 A counting semaphore is like a row of toilets. People needing a toilet
2700 can use any toilet, but if there are more people than toilets, they
2701 will have to wait for one of the toilets to become available.
2702
2703 An alias for parallel --semaphore is sem.
2704
2705 sem will follow a person to the toilets, wait until a toilet is
2706 available, leave the person in the toilet and exit.
2707
2708 sem --fg will follow a person to the toilets, wait until a toilet is
2709 available, stay with the person in the toilet and exit when the person
2710 exits.
2711
2712 sem --wait will wait for all persons to leave the toilets.
2713
2714 sem does not have a queue discipline, so the next person is chosen
2715 randomly.
2716
2717 -j sets the number of toilets.
2718
2719 Mutex
2720 The default is to have only one toilet (this is called a mutex). The
2721 program is started in the background and sem exits immediately. Use
2722 --wait to wait for all sems to finish:
2723
2724 sem 'sleep 1; echo The first finished' &&
2725 echo The first is now running in the background &&
2726 sem 'sleep 1; echo The second finished' &&
2727 echo The second is now running in the background
2728 sem --wait
2729
2730 Output:
2731
2732 The first is now running in the background
2733 The first finished
2734 The second is now running in the background
2735 The second finished
2736
2737 The command can be run in the foreground with --fg, which will only
2738 exit when the command completes:
2739
2740 sem --fg 'sleep 1; echo The first finished' &&
2741 echo The first finished running in the foreground &&
2742 sem --fg 'sleep 1; echo The second finished' &&
2743 echo The second finished running in the foreground
2744 sem --wait
2745
2746 The difference between this and just running the command is that a
2747 mutex is set, so if other sems were running in the background, only
2748 one would run at a time.
2749
2750 To control which semaphore is used, use --semaphorename/--id. Run this
2751 in one terminal:
2752
2753 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2754
2755 and simultaneously this in another terminal:
2756
2757 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2758
2759 Note how the second will only be started when the first has finished.
2760
2761 Counting semaphore
2762 A mutex is like having a single toilet: When it is in use everyone else
2763 will have to wait. A counting semaphore is like having multiple
2764 toilets: Several people can use the toilets, but when they all are in
2765 use, everyone else will have to wait.
2766
2767 sem can emulate a counting semaphore. Use --jobs to set the number of
2768 toilets like this:
2769
2770 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2771 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2772 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2773 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2774 sem --wait --id my_id
2775
2776 Output:
2777
2778 Start 1
2779 Start 2
2780 Start 3
2781 1 done
2782 Start 4
2783 2 done
2784 3 done
2785 4 done
2786
2787 Timeout
2788 With --semaphoretimeout you can force running the command anyway after
2789 a period (positive number) or give up (negative number):
2790
2791 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2792 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2793 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2794 sem --id foo --wait
2795
2796 Output:
2797
2798 Slow started
2799 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2800 Forced running after 1 sec
2801 parallel: Warning: Semaphore timed out. Exiting.
2802 Slow ended
2803
2804 Note how the 'Give up' was not run.
2805
2806 Informational
2807 GNU parallel has some options to give short information about the
2808 configuration.
2809
2810 --help will print a summary of the most important options:
2811
2812 parallel --help
2813
2814 Output:
2815
2816 Usage:
2817
2818 parallel [options] [command [arguments]] < list_of_arguments
2819 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2820 cat ... | parallel --pipe [options] [command [arguments]]
2821
2822 -j n Run n jobs in parallel
2823 -k Keep same order
2824 -X Multiple arguments with context replace
2825 --colsep regexp Split input on regexp for positional replacements
2826 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2827 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2828 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2829 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2830
2831 -S sshlogin Example: foo@server.example.com
2832 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2833 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2834 --onall Run the given command with argument on all sshlogins
2835 --nonall Run the given command with no arguments on all sshlogins
2836
2837 --pipe Split stdin (standard input) to multiple jobs.
2838 --recend str Record end separator for --pipe.
2839 --recstart str Record start separator for --pipe.
2840
2841 See 'man parallel' for details
2842
2843 Academic tradition requires you to cite works you base your article on.
2844 When using programs that use GNU Parallel to process data for publication
2845 please cite:
2846
2847 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2848 ;login: The USENIX Magazine, February 2011:42-47.
2849
2850 This helps funding further development; AND IT WON'T COST YOU A CENT.
2851 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2852
2853 When asking for help, always report the full output of this:
2854
2855 parallel --version
2856
2857 Output:
2858
2859 GNU parallel 20180122
2860 Copyright (C) 2007-2018
2861 Ole Tange and Free Software Foundation, Inc.
2862 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2863 This is free software: you are free to change and redistribute it.
2864 GNU parallel comes with no warranty.
2865
2866 Web site: http://www.gnu.org/software/parallel
2867
2868 When using programs that use GNU Parallel to process data for publication
2869 please cite as described in 'parallel --citation'.
2870
2871 In scripts --minversion can be used to ensure the user has at least
2872 this version:
2873
2874 parallel --minversion 20130722 && \
2875 echo Your version is at least 20130722.
2876
2877 Output:
2878
2879 20160322
2880 Your version is at least 20130722.
2881
2882 If you are using GNU parallel for research the BibTeX citation can be
2883 generated using --citation:
2884
2885 parallel --citation
2886
2887 Output:
2888
2889 Academic tradition requires you to cite works you base your article on.
2890 When using programs that use GNU Parallel to process data for publication
2891 please cite:
2892
2893 @article{Tange2011a,
2894 title = {GNU Parallel - The Command-Line Power Tool},
2895 author = {O. Tange},
2896 address = {Frederiksberg, Denmark},
2897 journal = {;login: The USENIX Magazine},
2898 month = {Feb},
2899 number = {1},
2900 volume = {36},
2901 url = {http://www.gnu.org/s/parallel},
2902 year = {2011},
2903 pages = {42-47},
2904 doi = {10.5281/zenodo.16303}
2905 }
2906
2907 (Feel free to use \nocite{Tange2011a})
2908
2909 This helps funding further development; AND IT WON'T COST YOU A CENT.
2910 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2911
2912 If you send a copy of your published article to tange@gnu.org, it will be
2913 mentioned in the release notes of next version of GNU Parallel.
2914
2915 With --max-line-length-allowed GNU parallel will report the maximal
2916 size of the command line:
2917
2918 parallel --max-line-length-allowed
2919
2920 Output (may vary on different systems):
2921
2922 131071
2923
2924 --number-of-cpus and --number-of-cores run system-specific code to
2925 determine the number of CPUs and CPU cores on the system. On
2926 unsupported platforms they will return 1:
2927
2928 parallel --number-of-cpus
2929 parallel --number-of-cores
2930
2931 Output (may vary on different systems):
2932
2933 4
2934 64
2935
2936 Profile files
2937 The defaults for GNU parallel can be changed systemwide by putting the
2938 command line options in /etc/parallel/config. They can be changed for a
2939 user by putting them in ~/.parallel/config.
2940
2941 Profiles work the same way, but have to be referred to with --profile:
2942
2943 echo '--nice 17' > ~/.parallel/nicetimeout
2944 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2945 parallel --profile nicetimeout echo ::: A B C
2946
2947 Output:
2948
2949 A
2950 B
2951 C
2952
2953 Profiles can be combined:
2954
2955 echo '-vv --dry-run' > ~/.parallel/dryverbose
2956 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2957
2958 Output:
2959
2960 echo A
2961 echo B
2962 echo C
2963
2964 Spreading the word
2965 I hope you have learned something from this tutorial.
2966
2967 If you like GNU parallel:
2968
2969 · (Re-)walk through the tutorial if you have not done so in the past
2970 year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2971
2972 · Give a demo at your local user group/your team/your colleagues
2973
2974 · Post the intro videos and the tutorial on Reddit, Mastodon,
2975 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2976 Linkedin, and mailing lists
2977
2978 · Request or write a review for your favourite blog or magazine
2979 (especially if you do something cool with GNU parallel)
2980
2981 · Invite me for your next conference
2982
2983 If you use GNU parallel for research:
2984
2985 · Please cite GNU parallel in your publications (use --citation)
2986
2987 If GNU parallel saves you money:
2988
2989 · (Have your company) donate to FSF or become a member
2990 https://my.fsf.org/donate/
2991
2992 (C) 2013-2019 Ole Tange, FDLv1.3 (See fdl.txt)
2993
2994
2995
2996 20190422 2019-05-01 PARALLEL_TUTORIAL(7)