PARALLEL_TUTORIAL(7)                  parallel                 PARALLEL_TUTORIAL(7)



       This tutorial shows off much of GNU parallel's functionality. It is
       meant to teach the options and syntax of GNU parallel, not to show
       realistic real-world examples.

   Reader's guide
       If you prefer reading a book, buy GNU Parallel 2018 at
       http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014

       Otherwise start by watching the intro videos for a quick introduction:
       http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       Then browse through the EXAMPLEs after the list of OPTIONS in man
       parallel (use LESS=+/EXAMPLE: man parallel). That will give you an
       idea of what GNU parallel is capable of.

       If you want to dive even deeper, spend a couple of hours walking
       through the tutorial (man parallel_tutorial). Your command line will
       love you for it.

       Finally you may want to look at the rest of the manual (man parallel)
       if you have special needs not already covered.

       If you want to know the design decisions behind GNU parallel, try:
       man parallel_design. This is also a good intro if you intend to
       change GNU parallel.

Prerequisites
       To run this tutorial you must have the following:

       parallel >= version 20160822
              Install the newest version using your package manager
              (recommended for security reasons), the way described in
              README, or with this command:

                $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
                   fetch -o - http://pi.dk/3 ) > install.sh
                $ sha1sum install.sh
                12345678 3374ec53 bacb199b 245af2dd a86df6c9
                $ md5sum install.sh
                029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
                $ sha512sum install.sh
                40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
                60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
                $ bash install.sh

              This will also install the newest version of the tutorial,
              which you can see by running this:

                man parallel_tutorial

              Most of the tutorial will work on older versions, too.

       abc-file:
              The file can be generated by this command:

                parallel -k echo ::: A B C > abc-file

       def-file:
              The file can be generated by this command:

                parallel -k echo ::: D E F > def-file

       abc0-file:
              The file can be generated by this command:

                perl -e 'printf "A\0B\0C\0"' > abc0-file

       abc_-file:
              The file can be generated by this command:

                perl -e 'printf "A_B_C_"' > abc_-file

       tsv-file.tsv
              The file can be generated by this command:

                perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

       num8   The file can be generated by this command:

                perl -e 'for(1..8){print "$_\n"}' > num8

       num128 The file can be generated by this command:

                perl -e 'for(1..128){print "$_\n"}' > num128

       num30000
              The file can be generated by this command:

                perl -e 'for(1..30000){print "$_\n"}' > num30000

       num1000000
              The file can be generated by this command:

                perl -e 'for(1..1000000){print "$_\n"}' > num1000000

       num_%header
              The file can be generated by this command:

                (echo %head1; echo %head2; \
                 perl -e 'for(1..10){print "$_\n"}') > num_%header

       fixedlen
              The file can be generated by this command:

                perl -e 'print "HHHHAAABBBCCC"' > fixedlen

       For remote running: ssh login on 2 servers with no password in
       $SERVER1 and $SERVER2 must work.

         SERVER1=server.example.com
         SERVER2=server2.example.net

       So you must be able to do this without entering a password:

         ssh $SERVER1 echo works
         ssh $SERVER2 echo works

       It can be set up by running 'ssh-keygen; ssh-copy-id $SERVER1'
       (using an empty passphrase), or you can use ssh-agent. Note that
       DSA keys ('ssh-keygen -t dsa') are disabled by default in modern
       OpenSSH, so use the default key type instead.

Input sources
       GNU parallel reads input from input sources. These can be files, the
       command line, and stdin (standard input or a pipe).

   A single input source
       Input can be read from the command line:

         parallel echo ::: A B C

       Output (the order may be different because the jobs are run in
       parallel):

         A
         B
         C

       The input source can be a file:

         parallel -a abc-file echo

       Output: Same as above.

       STDIN (standard input) can be the input source:

         cat abc-file | parallel echo

       Output: Same as above.

   Multiple input sources
       GNU parallel can take multiple input sources given on the command
       line. GNU parallel then generates all combinations of the input
       sources:

         parallel echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       The input sources can be files:

         parallel -a abc-file -a def-file echo

       Output: Same as above.

       STDIN (standard input) can be one of the input sources using -:

         cat abc-file | parallel -a - -a def-file echo

       Output: Same as above.

       Instead of -a, files can be given after '::::':

         cat abc-file | parallel echo :::: - def-file

       Output: Same as above.

       ::: and :::: can be mixed:

         parallel echo ::: A B C :::: def-file

       Output: Same as above.

   Linking arguments from input sources
       With --link you can link the input sources and get one argument
       from each input source:

         parallel --link echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         B E
         C F

       If one of the input sources is too short, its values will wrap:

         parallel --link echo ::: A B C D E ::: F G

       Output (the order may be different):

         A F
         B G
         C F
         D G
         E F

       For more flexible linking you can use :::+ and ::::+. They work
       like ::: and :::: except they link the previous input source to
       this input source.

       This will link ABC to GHI:

         parallel echo :::: abc-file :::+ G H I :::: def-file

       Output (the order may be different):

         A G D
         A G E
         A G F
         B H D
         B H E
         B H F
         C I D
         C I E
         C I F

       This will link GHI to DEF:

         parallel echo :::: abc-file ::: G H I ::::+ def-file

       Output (the order may be different):

         A G D
         A H E
         A I F
         B G D
         B H E
         B I F
         C G D
         C H E
         C I F

       If one of the input sources is too short when using :::+ or ::::+,
       the rest will be ignored:

         parallel echo ::: A B C D E :::+ F G

       Output (the order may be different):

         A F
         B G

   Changing the argument separator
       GNU parallel can use other separators than ::: or ::::. This is
       typically useful if ::: or :::: is used in the command to run:

         parallel --arg-sep ,, echo ,, A B C :::: def-file

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       Changing the argument file separator:

         parallel --arg-file-sep // echo ::: A B C // def-file

       Output: Same as above.

   Changing the argument delimiter
       GNU parallel will normally treat a full line as a single argument:
       it uses \n as argument delimiter. This can be changed with -d:

         parallel -d _ echo :::: abc_-file

       Output (the order may be different):

         A
         B
         C

       NUL can be given as \0:

         parallel -d '\0' echo :::: abc0-file

       Output: Same as above.

       A shorthand for -d '\0' is -0 (this will often be used to read
       files from find ... -print0):

         parallel -0 echo :::: abc0-file

       Output: Same as above.
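
       As a hedged sketch of that find use case (the directory and file
       names below are made up; GNU parallel must be on PATH):

```shell
# Sketch: NUL-delimited names survive spaces (and even newlines) in
# file names, which would break the default \n delimiter.
dir=$(mktemp -d)                      # hypothetical scratch directory
touch "$dir/a b.txt" "$dir/c.txt"
find "$dir" -name '*.txt' -print0 | parallel -0 echo Found {}
```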

   End-of-file value for input source
       GNU parallel can stop reading when it encounters a certain value:

         parallel -E stop echo ::: A B stop C D

       Output:

         A
         B

   Skipping empty lines
       Using --no-run-if-empty GNU parallel will skip empty lines:

         (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

       Output:

         1
         2

Building the command line
   No command means arguments are commands
       If no command is given after parallel the arguments themselves are
       treated as commands:

         parallel ::: ls 'echo foo' pwd

       Output (the order may be different):

         [list of files in current dir]
         foo
         [/path/to/current/working/dir]

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

         # Only works in Bash
         my_func() {
           echo in my_func $1
         }
         export -f my_func
         parallel my_func ::: 1 2 3

       Output (the order may be different):

         in my_func 1
         in my_func 2
         in my_func 3

   Replacement strings
       The 7 predefined replacement strings

       GNU parallel has several replacement strings. If no replacement
       string is used, the default is to append {}:

         parallel echo ::: A/B.C

       Output:

         A/B.C

       The default replacement string is {}:

         parallel echo {} ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} removes the extension:

         parallel echo {.} ::: A/B.C

       Output:

         A/B

       The replacement string {/} removes the path:

         parallel echo {/} ::: A/B.C

       Output:

         B.C

       The replacement string {//} keeps only the path:

         parallel echo {//} ::: A/B.C

       Output:

         A

       The replacement string {/.} removes the path and the extension:

         parallel echo {/.} ::: A/B.C

       Output:

         B

       The replacement string {#} gives the job number:

         parallel echo {#} ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} gives the job slot number (between 1 and
       the number of jobs to run in parallel):

         parallel -j 2 echo {%} ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

   Changing the replacement strings
       The replacement string {} can be changed with -I:

         parallel -I ,, echo ,, ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} can be changed with --extensionreplace:

         parallel --extensionreplace ,, echo ,, ::: A/B.C

       Output:

         A/B

       The replacement string {/} can be changed with --basenamereplace:

         parallel --basenamereplace ,, echo ,, ::: A/B.C

       Output:

         B.C

       The replacement string {//} can be changed with --dirnamereplace:

         parallel --dirnamereplace ,, echo ,, ::: A/B.C

       Output:

         A

       The replacement string {/.} can be changed with
       --basenameextensionreplace:

         parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

       Output:

         B

       The replacement string {#} can be changed with --seqreplace:

         parallel --seqreplace ,, echo ,, ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} can be changed with --slotreplace:

         parallel -j2 --slotreplace ,, echo ,, ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

   Perl expression replacement string
       When the predefined replacement strings are not flexible enough, a
       perl expression can be used instead. One example is to remove two
       extensions: foo.tar.gz becomes foo.

         parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

       Output:

         foo

       In {= =} you can access all of GNU parallel's internal functions
       and variables. A few are worth mentioning.

       total_jobs() returns the total number of jobs:

         parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

       Q(...) shell quotes the string:

         parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

       Output:

         */!#$ shell quoted is \*/\!\#\$

       skip() skips the job:

         parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

       Output:

         1
         2
         4
         5

       @arg contains the input source variables:

         parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
           ::: {1..3} ::: {1..3}

       Output:

         1 2
         1 3
         2 1
         2 3
         3 1
         3 2

       If the strings {= and =} cause problems they can be replaced with
       --parens:

         parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
           ::: foo.tar.gz

       Output:

         foo

       To define a shorthand replacement string use --rpl:

         parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand starts with { it can be used as a positional
       replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand contains matching parentheses, the replacement
       string becomes a dynamic replacement string and the string in the
       parentheses can be accessed as $$1. If there are multiple matching
       parentheses, the matched strings can be accessed using $$2, $$3 and
       so on.

       You can think of this as giving arguments to the replacement
       string. Here we give the argument .tar.gz to the replacement string
       {%string} which removes string:

         parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

       Output:

         foo.zip

       Here we give the two arguments tar.gz and zip to the replacement
       string {/string1/string2} which replaces string1 with string2:

         parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
           ::: foo.tar.gz

       Output:

         foo.zip

       GNU parallel's 7 replacement strings are implemented like this:

         --rpl '{} '
         --rpl '{#} $_=$job->seq()'
         --rpl '{%} $_=$job->slot()'
         --rpl '{/} s:.*/::'
         --rpl '{//} $Global::use{"File::Basename"} ||=
                eval "use File::Basename; 1;"; $_ = dirname($_);'
         --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
         --rpl '{.} s:\.[^/.]+$::'

   Positional replacement strings
       With multiple input sources the argument from the individual input
       sources can be accessed with {number}:

         parallel echo {1} and {2} ::: A B ::: C D

       Output (the order may be different):

         A and C
         A and D
         B and C
         B and D

       The positional replacement strings can also be modified using /,
       //, /., and .:

         parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

       Output (the order may be different):

         /=B.C //=A /.=B .=A/B
         /=E.F //=D /.=E .=D/E

       If a position is negative, it will refer to the input source
       counted from the end:

         parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
           ::: A B ::: C D ::: E F

       Output (the order may be different):

         1=A 2=C 3=E -1=E -2=C -3=A
         1=A 2=C 3=F -1=F -2=C -3=A
         1=A 2=D 3=E -1=E -2=D -3=A
         1=A 2=D 3=F -1=F -2=D -3=A
         1=B 2=C 3=E -1=E -2=C -3=B
         1=B 2=C 3=F -1=F -2=C -3=B
         1=B 2=D 3=E -1=E -2=D -3=B
         1=B 2=D 3=F -1=F -2=D -3=B

   Positional perl expression replacement string
       To use a perl expression as a positional replacement string, prefix
       the perl expression with the number and a space:

         parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
           ::: bar ::: foo.tar.gz

       Output:

         foo bar

       If a shorthand defined using --rpl starts with { it can be used as
       a positional replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
           ::: bar ::: foo.tar.gz

       Output: Same as above.

   Input from columns
       The columns in a file can be bound to positional replacement
       strings using --colsep. Here the columns are separated by TAB (\t):

         parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

       Output (the order may be different):

         1=f1 2=f2
         1=A 2=B
         1=C 2=D

   Header defined replacement strings
       With --header GNU parallel will use the first value of the input
       source as the name of the replacement string. Only the non-modified
       version {} is supported:

         parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

       Output (the order may be different):

         f1=A f2=C
         f1=A f2=D
         f1=B f2=C
         f1=B f2=D

       It is useful with --colsep for processing files with TAB separated
       values:

         parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
           :::: tsv-file.tsv

       Output (the order may be different):

         f1=A f2=B
         f1=C f2=D

   More pre-defined replacement strings with --plus
       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..}
       {...} {/..} {/...} {##}. The idea being that {+foo} matches the
       opposite of {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} =
       {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.

         parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

       Output:

         dir/sub/file.ex1.ex2.ex3

       {##} is simply the number of jobs:

         parallel --plus echo Job {#} of {##} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

   Dynamic replacement strings with --plus
       --plus also defines these dynamic replacement strings:

       {:-string}         Default value is string if the argument is empty.

       {:number}          Substring from number till end of string.

       {:number1:number2} Substring from number1 to number2.

       {#string}          If the argument starts with string, remove it.

       {%string}          If the argument ends with string, remove it.

       {/string1/string2} Replace string1 with string2.

       {^string}          If the argument starts with string, upper case
                          it. string must be a single letter.

       {^^string}         If the argument contains string, upper case it.
                          string must be a single letter.

       {,string}          If the argument starts with string, lower case
                          it. string must be a single letter.

       {,,string}         If the argument contains string, lower case it.
                          string must be a single letter.

       They are inspired by Bash:

         unset myvar
         echo ${myvar:-myval}
         parallel --plus echo {:-myval} ::: "$myvar"

         myvar=abcAaAdef
         echo ${myvar:2}
         parallel --plus echo {:2} ::: "$myvar"

         echo ${myvar:2:3}
         parallel --plus echo {:2:3} ::: "$myvar"

         echo ${myvar#bc}
         parallel --plus echo {#bc} ::: "$myvar"
         echo ${myvar#abc}
         parallel --plus echo {#abc} ::: "$myvar"

         echo ${myvar%de}
         parallel --plus echo {%de} ::: "$myvar"
         echo ${myvar%def}
         parallel --plus echo {%def} ::: "$myvar"

         echo ${myvar/def/ghi}
         parallel --plus echo {/def/ghi} ::: "$myvar"

         echo ${myvar^a}
         parallel --plus echo {^a} ::: "$myvar"
         echo ${myvar^^a}
         parallel --plus echo {^^a} ::: "$myvar"

         myvar=AbcAaAdef
         echo ${myvar,A}
         parallel --plus echo '{,A}' ::: "$myvar"
         echo ${myvar,,A}
         parallel --plus echo '{,,A}' ::: "$myvar"

       Output:

         myval
         myval
         cAaAdef
         cAaAdef
         cAa
         cAa
         abcAaAdef
         abcAaAdef
         AaAdef
         AaAdef
         abcAaAdef
         abcAaAdef
         abcAaA
         abcAaA
         abcAaAghi
         abcAaAghi
         AbcAaAdef
         AbcAaAdef
         AbcAAAdef
         AbcAAAdef
         abcAaAdef
         abcAaAdef
         abcaaadef
         abcaaadef

   More than one argument
       With --xargs GNU parallel will fit as many arguments as possible on
       a single line:

         cat num30000 | parallel --xargs echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         2

       The 30000 arguments fit on 2 lines.

       The maximal length of a single line can be set with -s. With a
       maximal line length of 10000 chars 17 commands will be run:

         cat num30000 | parallel --xargs -s 10000 echo | wc -l

       Output:

         17

       For better parallelism GNU parallel can distribute the arguments
       between all the parallel jobs when it reaches the end of the input.

       Below, GNU parallel reads the last argument while generating the
       second job. It then spreads the arguments for the second job over 4
       jobs instead, as 4 parallel jobs are requested.

       The first job will be the same as the --xargs example above, but
       the second job will be split into 4 evenly sized jobs, resulting in
       a total of 5 jobs:

         cat num30000 | parallel --jobs 4 -m echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         5

       This is even more visible when running 4 jobs with 10 arguments:
       the 10 arguments are spread over 4 jobs:

         parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

       Output:

         1 2 3
         4 5 6
         7 8 9
         10

       A replacement string can be part of a word. -m will not repeat the
       context:

         parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A B-post
         pre-C D-post
         pre-E F-post
         pre-G-post

       To repeat the context use -X, which otherwise works like -m:

         parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A-post pre-B-post
         pre-C-post pre-D-post
         pre-E-post pre-F-post
         pre-G-post

       To limit the number of arguments use -N:

         parallel -N3 echo ::: A B C D E F G H

       Output (the order may be different):

         A B C
         D E F
         G H

       -N also sets the positional replacement strings:

         parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

       Output (the order may be different):

         1=A 2=B 3=C
         1=D 2=E 3=F
         1=G 2=H 3=

       -N0 reads 1 argument but inserts none:

         parallel -N0 echo foo ::: 1 2 3

       Output:

         foo
         foo
         foo

   Quoting
       Command lines that contain special characters may need to be
       protected from the shell.

       The perl program print "@ARGV\n" basically works like echo:

         perl -e 'print "@ARGV\n"' A

       Output:

         A

       To run that in parallel the command needs to be quoted:

         parallel perl -e 'print "@ARGV\n"' ::: This wont work

       Output:

         [Nothing]

       To quote the command use -q:

         parallel -q perl -e 'print "@ARGV\n"' ::: This works

       Output (the order may be different):

         This
         works

       Or you can quote the critical part using \':

         parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

       Output (the order may be different):

         This
         works,
         too

       GNU parallel can also \-quote full lines. Simply run this:

         parallel --shellquote
         Warning: Input is read from the terminal. You either know what you
         Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
         Warning: ::: or :::: or to pipe data into parallel. If so
         Warning: consider going through the tutorial: man parallel_tutorial
         Warning: Press CTRL-D to exit.
         perl -e 'print "@ARGV\n"'
         [CTRL-D]

       Output:

         perl\ -e\ \'print\ \"@ARGV\\n\"\'

       This can then be used as the command:

         parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

       Output (the order may be different):

         This
         also
         works

   Trimming space
       Space can be trimmed on the arguments using --trim:

         parallel --trim r echo pre-{}-post ::: ' A '

       Output:

         pre- A-post

       To trim on the left side:

         parallel --trim l echo pre-{}-post ::: ' A '

       Output:

         pre-A -post

       To trim on both sides:

         parallel --trim lr echo pre-{}-post ::: ' A '

       Output:

         pre-A-post

   Respecting the shell
       This tutorial uses Bash as the shell. GNU parallel respects which
       shell you are using, so in zsh you can do:

         parallel echo \={} ::: zsh bash ls

       Output:

         /usr/bin/zsh
         /bin/bash
         /bin/ls

       In csh you can do:

         parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

       Output:

         [somedir] is a dir

       This also becomes useful if you use GNU parallel in a shell script:
       GNU parallel will use the same shell as the shell script.
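
       A minimal sketch of that (assuming /bin/bash exists and GNU
       parallel is installed): because the script below declares bash, GNU
       parallel starts its jobs under bash too, so bash variables such as
       $BASH_VERSION are set inside the jobs.

```shell
#!/bin/bash
# Sketch: GNU parallel detects that its parent shell is bash and
# runs the job under bash as well, so $BASH_VERSION is set in the job.
parallel 'echo job shell: bash $BASH_VERSION' ::: run
```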

Controlling the output
       The output can be prefixed with the argument:

         parallel --tag echo foo-{} ::: A B C

       Output (the order may be different):

         A       foo-A
         B       foo-B
         C       foo-C

       To prefix it with another string use --tagstring:

         parallel --tagstring {}-bar echo foo-{} ::: A B C

       Output (the order may be different):

         A-bar   foo-A
         B-bar   foo-B
         C-bar   foo-C

       To see what commands will be run without running them use --dryrun:

         parallel --dryrun echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         echo C

       To print the commands before running them use --verbose:

         parallel --verbose echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         A
         echo C
         B
         C

       GNU parallel will postpone the output until the command completes:

         parallel -j2 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-start
         4-middle
         4-end

       To get the output immediately use --ungroup:

         parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         42-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         -middle
         4-end

       --ungroup is fast, but can cause half a line from one job to be
       mixed with half a line of another job. That has happened in the
       second line, where the line '4-middle' is mixed with '2-start'.

       To avoid this use --linebuffer:

         parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-middle
         4-end

       To force the output in the same order as the arguments use
       --keep-order/-k:

         parallel -j2 -k 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         4-middle
         4-end
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end

   Saving output into files
       GNU parallel can save the output of each job into files:

         parallel --files echo ::: A B C

       Output will be similar to this:

         /tmp/pAh6uWuQCg.par
         /tmp/opjhZCzAX4.par
         /tmp/W0AT_Rph2o.par

       By default GNU parallel will cache the output in files in /tmp.
       This can be changed by setting $TMPDIR or --tmpdir:

         parallel --tmpdir /var/tmp --files echo ::: A B C

       Output will be similar to this:

         /var/tmp/N_vk7phQRc.par
         /var/tmp/7zA4Ccf3wZ.par
         /var/tmp/LIuKgF_2LP.par

       Or:

         TMPDIR=/var/tmp parallel --files echo ::: A B C

       Output: Same as above.

       The output files can be saved in a structured way using --results:

         parallel --results outdir echo ::: A B C

       Output:

         A
         B
         C

       These files were also generated, containing the standard output
       (stdout), standard error (stderr), and the sequence number (seq):

         outdir/1/A/seq
         outdir/1/A/stderr
         outdir/1/A/stdout
         outdir/1/B/seq
         outdir/1/B/stderr
         outdir/1/B/stdout
         outdir/1/C/seq
         outdir/1/C/stderr
         outdir/1/C/stdout

       --header : will take the first value as name and use that in the
       directory structure. This is useful if you are using multiple input
       sources:

         parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

       Generated files:

         outdir/f1/A/f2/C/seq
         outdir/f1/A/f2/C/stderr
         outdir/f1/A/f2/C/stdout
         outdir/f1/A/f2/D/seq
         outdir/f1/A/f2/D/stderr
         outdir/f1/A/f2/D/stdout
         outdir/f1/B/f2/C/seq
         outdir/f1/B/f2/C/stderr
         outdir/f1/B/f2/C/stdout
         outdir/f1/B/f2/D/seq
         outdir/f1/B/f2/D/stderr
         outdir/f1/B/f2/D/stdout

       The directories are named after the variables and their values.
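
       The captured output can then be read straight back from those
       files; a small sketch (run in a scratch directory; requires GNU
       parallel):

```shell
# Sketch: run jobs under --results and read one job's captured stdout.
cd "$(mktemp -d)"
parallel --results outdir echo ::: A B C > /dev/null
cat outdir/1/A/stdout    # the stdout of the job that got argument A
```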

Controlling the execution
   Number of simultaneous jobs
       The number of concurrent jobs is given with --jobs/-j:

         /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to
       run - depending on how fast your machine is.

       By default --jobs is the same as the number of CPU cores. So this:

         /usr/bin/time parallel -N0 sleep 1 :::: num128

       should take twice the time of running 2 jobs per CPU core:

         /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128

       --jobs 0 will run as many jobs in parallel as possible:

         /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128

       which should take 1-7 seconds depending on how fast your machine
       is.

       --jobs can read from a file which is re-read when a job finishes:

         echo 50% > my_jobs
         /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
         sleep 1
         echo 0 > my_jobs
         wait

       During the first second only 50% of the CPU cores will run a job.
       Then 0 is put into my_jobs and the rest of the jobs will be started
       in parallel.

       Instead of basing the percentage on the number of CPU cores GNU
       parallel can base it on the number of CPUs:

         parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8

   Shuffle job order
       If you have many jobs (e.g. from multiple combinations of input
       sources), it can be handy to shuffle the jobs, so that different
       combinations are run early. Use --shuf for that:

         parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C

       Output:

         All combinations, but in a different order for each run.

   Interactivity
       GNU parallel can ask the user if a command should be run using
       --interactive:

         parallel --interactive echo ::: 1 2 3

       Output:

         echo 1 ?...y
         echo 2 ?...n
         1
         echo 3 ?...y
         3

       GNU parallel can be used to put arguments on the command line for
       an interactive command such as emacs to edit one file at a time:

         parallel --tty emacs ::: 1 2 3

       Or give multiple arguments in one go to open multiple files:

         parallel -X --tty vi ::: 1 2 3

   A terminal for every job
       Using --tmux GNU parallel can start a terminal for every job run:

         seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'

       This will tell you to run something similar to:

         tmux -S /tmp/tmsrPrO0 attach

       Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
       between the windows of the running jobs. When a job is finished it
       will pause for 10 seconds before closing the window.

   Timing
       Some jobs do heavy I/O when they start. To avoid a thundering herd,
       GNU parallel can delay starting new jobs. --delay X will make sure
       there is at least X seconds between each start:

         parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3

       Output:

         Starting 1
         Thu Aug 15 16:24:33 CEST 2013
         Starting 2
         Thu Aug 15 16:24:35 CEST 2013
         Starting 3
         Thu Aug 15 16:24:38 CEST 2013

       If jobs taking more than a certain amount of time are known to
       fail, they can be stopped with --timeout. The accuracy of --timeout
       is 2 seconds:

         parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8

       Output:

         2
         4

       GNU parallel can compute the median runtime for jobs and kill those
       that take more than 200% of the median runtime:

         parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3

       Output:

         2.1
         2.2
         3
         2.3

1378 Progress information
1379 Based on the runtime of completed jobs GNU parallel can estimate the
1380 total runtime:
1381
1382 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1383
1384 Output:
1385
1386 Computers / CPU cores / Max jobs to run
1387 1:local / 2 / 2
1388
1389 Computer:jobs running/jobs completed/%of started jobs/
1390 Average seconds to complete
1391 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1392
1393 GNU parallel can give progress information with --progress:
1394
1395 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1396
1397 Output:
1398
1399 Computers / CPU cores / Max jobs to run
1400 1:local / 2 / 2
1401
1402 Computer:jobs running/jobs completed/%of started jobs/
1403 Average seconds to complete
1404 local:0/9/100%/1.1s
1405
1406 A progress bar can be shown with --bar:
1407
1408 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1409
1410 And a graphical bar can be shown with --bar and zenity:
1411
1412 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1413 2> >(zenity --progress --auto-kill --auto-close)
1414
1415 A logfile of the jobs completed so far can be generated with --joblog:
1416
1417 parallel --joblog /tmp/log exit ::: 1 2 3 0
1418 cat /tmp/log
1419
1420 Output:
1421
1422 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1423 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1424 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1425 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1426 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1427
1428 The log contains the job sequence, which host the job was run on, the
1429 start time and run time, how much data was transferred, the exit value,
1430 the signal that killed the job, and finally the command being run.
1431
1432 With a joblog GNU parallel can be stopped and later pick up where it
1433 left off. It is important that the input of the completed jobs is
1434 unchanged.
1435
1436 parallel --joblog /tmp/log exit ::: 1 2 3 0
1437 cat /tmp/log
1438 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1439 cat /tmp/log
1440
1441 Output:
1442
1443 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1444 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1445 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1446 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1447 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1448
1449 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1450 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1451 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1452 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1453 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1454 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1455 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1456
1457 Note how the start time of the last 2 jobs is clearly different,
1458 because they were started in the second run.
1459
1460 With --resume-failed GNU parallel will re-run the jobs that failed:
1461
1462 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1463 cat /tmp/log
1464
1465 Output:
1466
1467 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1468 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1469 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1470 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1471 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1472 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1473 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1474 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1475 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1476 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1477
1478 Note how seq 1, 2, and 3 have been rerun because they had exit
1479 values different from 0.
1480
1481 --retry-failed does almost the same as --resume-failed. Where
1482 --resume-failed reads the commands from the command line (and ignores
1483 the commands in the joblog), --retry-failed ignores the command line
1484 and reruns the commands mentioned in the joblog.
1485
1486 parallel --retry-failed --joblog /tmp/log
1487 cat /tmp/log
1488
1489 Output:
1490
1491 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1492 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1493 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1494 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1495 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1496 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1497 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1498 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1499 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1500 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1501 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1502 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1503 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1504
1505 Termination
1506 Unconditional termination
1507
1508 By default GNU parallel will wait for all jobs to finish before
1509 exiting.
1510
1511 If you send GNU parallel the TERM signal, GNU parallel will stop
1512 spawning new jobs and wait for the remaining jobs to finish. If you
1513 send GNU parallel the TERM signal again, GNU parallel will kill all
1514 running jobs and exit.
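
 A sketch of this behaviour, sending TERM twice to a backgrounded GNU
 parallel (the sleep lengths and job count are arbitrary):

```shell
# Start 3 long jobs, 2 at a time, in the background.
parallel -j2 sleep ::: 30 30 30 &
PID=$!
sleep 2
kill -TERM "$PID"    # first TERM: stop spawning, wait for running jobs
sleep 1
kill -TERM "$PID"    # second TERM: kill running jobs and exit
wait "$PID" || true  # the exit status reflects the killed jobs
```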
1515
1516 Termination dependent on job status
1517
1518 For certain jobs there is no need to continue if one of the jobs fails
1519 and has an exit code different from 0. GNU parallel will stop spawning
1520 new jobs with --halt soon,fail=1:
1521
1522 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1523
1524 Output:
1525
1526 0
1527 0
1528 1
1529 parallel: This job failed:
1530 echo 1; exit 1
1531 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1532 2
1533
1534 With --halt now,fail=1 the running jobs will be killed immediately:
1535
1536 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1537
1538 Output:
1539
1540 0
1541 0
1542 1
1543 parallel: This job failed:
1544 echo 1; exit 1
1545
1546 If --halt is given a percentage, this percentage of the jobs must fail
1547 before GNU parallel stops spawning more jobs:
1548
1549 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1550 ::: 0 1 2 3 4 5 6 7 8 9
1551
1552 Output:
1553
1554 0
1555 1
1556 parallel: This job failed:
1557 echo 1; exit 1
1558 2
1559 parallel: This job failed:
1560 echo 2; exit 2
1561 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1562 3
1563 parallel: This job failed:
1564 echo 3; exit 3
1565
1566 If you are looking for success instead of failures, you can use
1567 success. This will finish as soon as the first job succeeds:
1568
1569 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1570
1571 Output:
1572
1573 1
1574 2
1575 3
1576 0
1577 parallel: This job succeeded:
1578 echo 0; exit 0
1579
1580 GNU parallel can retry the command with --retries. This is useful if a
1581 command fails for unknown reasons now and then.
1582
1583 parallel -k --retries 3 \
1584 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1585 cat /tmp/runs
1586
1587 Output:
1588
1589 completed 1
1590 completed 2
1591 completed 0
1592
1593 tried 1
1594 tried 2
1595 tried 1
1596 tried 2
1597 tried 1
1598 tried 2
1599 tried 0
1600
1601 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried
1602 because it had exit code 0.
1603
1604 Termination signals (advanced)
1605
1606 Using --termseq you can control which signals are sent when killing
1607 children. Normally children will be killed by sending them SIGTERM,
1608 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1609 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1610 giving up. It looks like this:
1611
1612 show_signals() {
1613 perl -e 'for(keys %SIG) {
1614 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1615 }
1616 while(1){sleep 1}'
1617 }
1618 export -f show_signals
1619 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1620 -u --timeout 1 show_signals
1621
1622 Output:
1623
1624 Got TERM
1625 Got TERM
1626 Got TERM
1627
1628 Or just:
1629
1630 echo | parallel -u --timeout 1 show_signals
1631
1632 Output: Same as above.
1633
1634 You can change this to SIGINT, SIGTERM, SIGKILL:
1635
1636 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1637 -u --timeout 1 show_signals
1638
1639 Output:
1640
1641 Got INT
1642 Got TERM
1643
1644 The SIGKILL does not show because it cannot be caught, and thus the
1645 child dies.
1646
1647 Limiting the resources
1648 To avoid overloading systems GNU parallel can look at the system load
1649 before starting another job:
1650
1651 parallel --load 100% echo load is less than {} job per cpu ::: 1
1652
1653 Output:
1654
1655 [when the load is less than the number of cpu cores]
1656 load is less than 1 job per cpu
1657
1658 GNU parallel can also check if the system is swapping.
1659
1660 parallel --noswap echo the system is not swapping ::: now
1661
1662 Output:
1663
1664 [when the system is not swapping]
1665 the system is not swapping now
1666
1667 Some jobs need a lot of memory, and should only be started when there
1668 is enough memory free. Using --memfree GNU parallel can check if there
1669 is enough memory free. Additionally, GNU parallel will kill off the
1670 youngest job if the free memory falls below 50% of the size. The
1671 killed job will be put back on the queue and retried later.
1672
1673 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1674
1675 GNU parallel can run the jobs with a nice value. This will work both
1676 locally and remotely.
1677
1678 parallel --nice 17 echo this is being run with nice -n ::: 17
1679
1680 Output:
1681
1682 this is being run with nice -n 17
1683
1685 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1686 with the remote machines.
1687
1688 Sshlogin
1689 The most basic sshlogin is -S host:
1690
1691 parallel -S $SERVER1 echo running on ::: $SERVER1
1692
1693 Output:
1694
1695 running on [$SERVER1]
1696
1697 To use a different username prepend the server with username@:
1698
1699 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1700
1701 Output:
1702
1703 running on [username@$SERVER1]
1704
1705 The special sshlogin : is the local machine:
1706
1707 parallel -S : echo running on ::: the_local_machine
1708
1709 Output:
1710
1711 running on the_local_machine
1712
1713 If ssh is not in $PATH it can be prepended to $SERVER1:
1714
1715 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1716
1717 Output:
1718
1719 custom ssh
1720
1721 The ssh command can also be given using --ssh:
1722
1723 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1724
1725 or by setting $PARALLEL_SSH:
1726
1727 export PARALLEL_SSH=/usr/bin/ssh
1728 parallel -S $SERVER1 echo custom ::: ssh
1729
1730 Several servers can be given using multiple -S:
1731
1732 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1733
1734 Output (the order may be different):
1735
1736 running
1737 on
1738 more
1739 hosts
1740
1741 Or they can be separated by ,:
1742
1743 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1744
1745 Output: Same as above.
1746
1747 Or newline:
1748
1749 # This gives a \n between $SERVER1 and $SERVER2
1750 SERVERS="`echo $SERVER1; echo $SERVER2`"
1751 parallel -S "$SERVERS" echo ::: running on more hosts
1752
1753 They can also be read from a file (replace user@ with the user on
1754 $SERVER2):
1755
1756 echo $SERVER1 > nodefile
1757 # Force 4 cores, special ssh-command, username
1758 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1759 parallel --sshloginfile nodefile echo ::: running on more hosts
1760
1761 Output: Same as above.
1762
1763 Every time a job finishes, the --sshloginfile will be re-read, so it is
1764 possible to both add and remove hosts while running.
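
 Because the file is re-read, hosts can be added (or removed) while GNU
 parallel runs. A local-only sketch, using the special sshlogin : (the
 local machine) so no remote server is required; the file name is just
 an example:

```shell
# Start with only the local machine in the file.
echo : > /tmp/nodefile
parallel --sshloginfile /tmp/nodefile sleep 0.5\; echo {} ::: 1 2 3 4 &
sleep 1
# Append another entry; jobs started after the next re-read can use it.
echo : >> /tmp/nodefile
wait
```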
1765
1766 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1767
1768 To force GNU parallel to treat a server as having a given number of
1769 CPU cores, prepend the number of cores followed by / to the sshlogin:
1770
1771 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1772
1773 Output:
1774
1775 force 4 cpus on server
1776
1777 Servers can be put into groups by prepending @groupname to the server
1778 and the group can then be selected by appending @groupname to the
1779 argument if using --hostgroup:
1780
1781 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1782 ::: run_on_grp1@grp1 run_on_grp2@grp2
1783
1784 Output:
1785
1786 run_on_grp1
1787 run_on_grp2
1788
1789 A host can be in multiple groups by separating the groups with +, and
1790 you can force GNU parallel to limit the groups on which the command can
1791 be run with -S @groupname:
1792
1793 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1794 ::: run_on_grp1 also_grp1
1795
1796 Output:
1797
1798 run_on_grp1
1799 also_grp1
1800
1801 Transferring files
1802 GNU parallel can transfer the files to be processed to the remote host.
1803 It does that using rsync.
1804
1805 echo This is input_file > input_file
1806 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1807
1808 Output:
1809
1810 This is input_file
1811
1812 If the files are processed into another file, the resulting file can be
1813 transferred back:
1814
1815 echo This is input_file > input_file
1816 parallel -S $SERVER1 --transferfile {} --return {}.out \
1817 cat {} ">"{}.out ::: input_file
1818 cat input_file.out
1819
1820 Output: Same as above.
1821
1822 To remove the input and output file on the remote server use --cleanup:
1823
1824 echo This is input_file > input_file
1825 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1826 cat {} ">"{}.out ::: input_file
1827 cat input_file.out
1828
1829 Output: Same as above.
1830
1831 There is a shorthand for --transferfile {} --return --cleanup called
1832 --trc:
1833
1834 echo This is input_file > input_file
1835 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1836 cat input_file.out
1837
1838 Output: Same as above.
1839
1840 Some jobs need a common database for all jobs. GNU parallel can
1841 transfer that using --basefile which will transfer the file before the
1842 first job:
1843
1844 echo common data > common_file
1845 parallel --basefile common_file -S $SERVER1 \
1846 cat common_file\; echo {} ::: foo
1847
1848 Output:
1849
1850 common data
1851 foo
1852
1853 To remove it from the remote host after the last job use --cleanup.
1854
1855 Working dir
1856 The default working dir on the remote machines is the login dir. This
1857 can be changed with --workdir mydir.
1858
1859 Files transferred using --transferfile and --return will be relative to
1860 mydir on remote computers, and the command will be executed in the dir
1861 mydir.
1862
1863 The special mydir value ... will create working dirs under
1864 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1865 dirs will be removed.
1866
1867 The special mydir value . uses the current working dir. If the current
1868 working dir is beneath your home dir, the value . is treated as the
1869 relative path to your home dir. This means that if your home dir is
1870 different on remote computers (e.g. if your login is different) the
1871 relative path will still be relative to your home dir.
1872
1873 parallel -S $SERVER1 pwd ::: ""
1874 parallel --workdir . -S $SERVER1 pwd ::: ""
1875 parallel --workdir ... -S $SERVER1 pwd ::: ""
1876
1877 Output:
1878
1879 [the login dir on $SERVER1]
1880 [current dir relative on $SERVER1]
1881 [a dir in ~/.parallel/tmp/...]
1882
1883 Avoid overloading sshd
1884 If many jobs are started on the same server, sshd can be overloaded.
1885 GNU parallel can insert a delay between each job run on the same
1886 server:
1887
1888 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1889
1890 Output (the order may be different):
1891
1892 1
1893 2
1894 3
1895
1896 sshd will be less overloaded if using --controlmaster, which will
1897 multiplex ssh connections:
1898
1899 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1900
1901 Output: Same as above.
1902
1903 Ignore hosts that are down
1904 In clusters with many hosts a few of them are often down. GNU parallel
1905 can ignore those hosts. In this case the host 173.194.32.46 is down:
1906
1907 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1908
1909 Output:
1910
1911 bar
1912
1913 Running the same commands on all hosts
1914 GNU parallel can run the same command on all the hosts:
1915
1916 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1917
1918 Output (the order may be different):
1919
1920 foo
1921 bar
1922 foo
1923 bar
1924
1925 Often you will just want to run a single command on all hosts without
1926 arguments. --nonall is --onall with no arguments:
1927
1928 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1929
1930 Output:
1931
1932 foo bar
1933 foo bar
1934
1935 When --tag is used with --nonall and --onall the --tagstring is the
1936 host:
1937
1938 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1939
1940 Output (the order may be different):
1941
1942 $SERVER1 foo bar
1943 $SERVER2 foo bar
1944
1945 --jobs sets the number of servers to log in to in parallel.
1946
1947 Transferring environment variables and functions
1948 env_parallel is a shell function that transfers all aliases, functions,
1949 variables, and arrays. You activate it by running:
1950
1951 source `which env_parallel.bash`
1952
1953 Replace bash with the shell you use.
1954
1955 Now you can use env_parallel instead of parallel and still have your
1956 environment:
1957
1958 alias myecho=echo
1959 myvar="Joe's var is"
1960 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1961
1962 Output:
1963
1964 Joe's var is green
1965
1966 The disadvantage is that if your environment is huge, env_parallel
1967 will fail.
1968
1969 When env_parallel fails, you can still use --env to tell GNU parallel
1970 to transfer an environment variable to the remote system.
1971
1972 MYVAR='foo bar'
1973 export MYVAR
1974 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1975
1976 Output:
1977
1978 foo bar baz
1979
1980 This works for functions, too, if your shell is Bash:
1981
1982 # This only works in Bash
1983 my_func() {
1984 echo in my_func $1
1985 }
1986 export -f my_func
1987 parallel --env my_func -S $SERVER1 my_func ::: baz
1988
1989 Output:
1990
1991 in my_func baz
1992
1993 GNU parallel can copy all user defined variables and functions to the
1994 remote system. It just needs to record which ones to ignore in
1995 ~/.parallel/ignored_vars. Do that by running this once:
1996
1997 parallel --record-env
1998 cat ~/.parallel/ignored_vars
1999
2000 Output:
2001
2002 [list of variables to ignore - including $PATH and $HOME]
2003
2004 Now all other variables and functions defined will be copied when using
2005 --env _.
2006
2007 # The function is only copied if using Bash
2008 my_func2() {
2009 echo in my_func2 $VAR $1
2010 }
2011 export -f my_func2
2012 VAR=foo
2013 export VAR
2014
2015 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2016
2017 Output:
2018
2019 foo
2020 in my_func2 foo bar
2021
2022 If you use env_parallel the variables, functions, and aliases do not
2023 even need to be exported to be copied:
2024
2025 NOT='not exported var'
2026 alias myecho=echo
2027 not_ex() {
2028 myecho in not_exported_func $NOT $1
2029 }
2030 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2031
2032 Output:
2033
2034 not exported var
2035 in not_exported_func not exported var bar
2036
2037 Showing what is actually run
2038 --verbose will show the command that would be run on the local machine.
2039
2040 When using --cat, --pipepart, or when a job is run on a remote machine,
2041 the command is wrapped with helper scripts. -vv shows all of this.
2042
2043 parallel -vv --pipepart --block 1M wc :::: num30000
2044
2045 Output:
2046
2047 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2048 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2049 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2050 0 0 0 168894 | (wc)
2051 30000 30000 168894
2052
2053 When the command gets more complex, the output is so hard to read
2054 that it is only useful for debugging:
2055
2056 my_func3() {
2057 echo in my_func $1 > $1.out
2058 }
2059 export -f my_func3
2060 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2061 -S $SERVER1 my_func3 {} ::: abc-file
2062
2063 Output will be similar to:
2064
2065 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2066 --protocol 30 -rlDzR -essh ./abc-file
2067 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2068 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2069 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2070 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2071 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2072 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2073 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2074 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2075 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2076 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2077 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2078 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2079 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2080 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2081 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2082 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2083 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2084 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2085 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2086 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2087 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2088 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2089 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2090 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2091 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2092 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2093 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2094 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2095 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2096 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2097 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2098 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2099
2101 GNU parset will set shell variables to the output of GNU parallel. GNU
2102 parset has one important limitation: It cannot be part of a pipe. In
2103 particular this means it cannot read anything from standard input
2104 (stdin) or pipe output to another program.
2105
2106 To use GNU parset, prepend the command with the destination variables:
2107
2108 parset myvar1,myvar2 echo ::: a b
2109 echo $myvar1
2110 echo $myvar2
2111
2112 Output:
2113
2114 a
2115 b
2116
2117 If you only give a single variable, it will be treated as an array:
2118
2119 parset myarray seq {} 5 ::: 1 2 3
2120 echo "${myarray[1]}"
2121
2122 Output:
2123
2124 2
2125 3
2126 4
2127 5
2128
2129 The commands to run can be an array:
2130
2131 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2132 parset data ::: "${cmd[@]}"
2133 echo "${data[0]}"
2134 echo "${data[1]}"
2135
2136 Output:
2137
2138 <<joe "double space" cartoon>>
2139 [current dir]
2140
2142 GNU parallel can save into an SQL base. Point GNU parallel to a table
2143 and it will put the joblog there together with the variables and the
2144 output each in their own column.
2145
2146 CSV as SQL base
2147 The simplest is to use a CSV file as the storage table:
2148
2149 parallel --sqlandworker csv:///%2Ftmp/log.csv \
2150 seq ::: 10 ::: 12 13 14
2151 cat /tmp/log.csv
2152
2153 Note how '/' in the path must be written as %2F.
2154
2155 Output will be similar to:
2156
2157 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2158 Command,V1,V2,Stdout,Stderr
2159 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2160 11
2161 12
2162 ",
2163 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2164 11
2165 12
2166 13
2167 ",
2168 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2169 11
2170 12
2171 13
2172 14
2173 ",
2174
2175 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2176 format correctly - even with fields containing newlines as above.
2177
2178 If the output is big you may want to put it into files using --results:
2179
2180 parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
2181 seq ::: 10 ::: 12 13 14
2182 cat /tmp/log2.csv
2183
2184 Output will be similar to:
2185
2186 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2187 Command,V1,V2,Stdout,Stderr
2188 1,:,1458824738.287,0.029,0,9,0,0,
2189 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2190 2,:,1458824738.298,0.025,0,12,0,0,
2191 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2192 3,:,1458824738.309,0.026,0,15,0,0,
2193 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2194
2195 DBURL as table
2196 The CSV file is an example of a DBURL.
2197
2198 GNU parallel uses a DBURL to address the table. A DBURL has this
2199 format:
2200
2201 vendor://[[user][:password]@][host][:port]/[database[/table]]
2202
2203 Example:
2204
2205 mysql://scott:tiger@my.example.com/mydatabase/mytable
2206 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2207 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2208 csv:///%2Ftmp/log.csv
2209
2210 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2211 as %2F.
2212
2213 Run a job using sqlite on mytable in /tmp/mydatabase:
2214
2215 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2216 DBURLTABLE=$DBURL/mytable
2217 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2218
2219 To see the result:
2220
2221 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2222
2223 Output will be similar to:
2224
2225 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2226 Command|V1|V2|Stdout|Stderr
2227 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2228 |
2229 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2230 |
2231 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2232 |
2233 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2234 |
2235
2236 The first columns are well known from --joblog. V1 and V2 are data from
2237 the input sources. Stdout and Stderr are standard output and standard
2238 error, respectively.
2239
2240 Using multiple workers
2241 Using an SQL base as storage costs overhead in the order of 1 second
2242 per job.
2243
2244 One of the situations where it makes sense is if you have multiple
2245 workers.
2246
2247 You can then have a single master machine that submits jobs to the SQL
2248 base (but does not do any of the work):
2249
2250 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2251
2252 On the worker machines you run exactly the same command except you
2253 replace --sqlmaster with --sqlworker.
2254
2255 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2256
2257 To run a master and a worker on the same machine use --sqlandworker as
2258 shown earlier.
2259
2261 The --pipe functionality puts GNU parallel in a different mode: Instead
2262 of treating the data on stdin (standard input) as arguments for a
2263 command to run, the data will be sent to stdin (standard input) of the
2264 command.
2265
2266 The typical situation is:
2267
2268 command_A | command_B | command_C
2269
2270 where command_B is slow, and you want to speed up command_B.
2271
2272 Chunk size
2273 By default GNU parallel will start an instance of command_B, read a
2274 chunk of 1 MB, and pass that to the instance. Then start another
2275 instance, read another chunk, and pass that to the second instance.
2276
2277 cat num1000000 | parallel --pipe wc
2278
2279 Output (the order may be different):
2280
2281 165668 165668 1048571
2282 149797 149797 1048579
2283 149796 149796 1048572
2284 149797 149797 1048579
2285 149797 149797 1048579
2286 149796 149796 1048572
2287 85349 85349 597444
2288
2289 The size of the chunk is not exactly 1 MB because GNU parallel only
2290 passes full lines - never half a line - so the block size is only
2291 1 MB on average. You can change the block size to 2 MB with --block:
2292
2293 cat num1000000 | parallel --pipe --block 2M wc
2294
2295 Output (the order may be different):
2296
2297 315465 315465 2097150
2298 299593 299593 2097151
2299 299593 299593 2097151
2300 85349 85349 597444
2301
2302 GNU parallel treats each line as a record. If the order of records is
2303 unimportant (e.g. you need all lines processed, but you do not care
2304 which is processed first), then you can use --roundrobin. Without
2305 --roundrobin GNU parallel will start a command per block; with
2306 --roundrobin only the requested number of jobs will be started
2307 (--jobs). The records will then be distributed between the running
2308 jobs:
2309
2310 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2311
2312 Output will be similar to:
2313
2314 149797 149797 1048579
2315 299593 299593 2097151
2316 315465 315465 2097150
2317 235145 235145 1646016
2318
2319 One of the 4 instances got a single record, 2 instances got 2 full
2320 records each, and one instance got 1 full and 1 partial record.
2321
2322 Records
2323 GNU parallel sees the input as records. The default record is a single
2324 line.
2325
2326 Using -N140000 GNU parallel will read 140000 records at a time:
2327
2328 cat num1000000 | parallel --pipe -N140000 wc
2329
2330 Output (the order may be different):
2331
2332 140000 140000 868895
2333 140000 140000 980000
2334 140000 140000 980000
2335 140000 140000 980000
2336 140000 140000 980000
2337 140000 140000 980000
2338 140000 140000 980000
2339 20000 20000 140001
2340
2341 Note that the last job could not get the full 140000 lines, but
2342 only 20000 lines.
2343
2344 If a record is 75 lines -L can be used:
2345
2346 cat num1000000 | parallel --pipe -L75 wc
2347
2348 Output (the order may be different):
2349
2350 165600 165600 1048095
2351 149850 149850 1048950
2352 149775 149775 1048425
2353 149775 149775 1048425
2354 149850 149850 1048950
2355 149775 149775 1048425
2356 85350 85350 597450
2357 25 25 176
2358
2359 Note how GNU parallel still reads a block of around 1 MB, but instead
2360 of passing single lines to wc it passes 75 full lines at a time. This
2361 of course does not hold for the last job (which in this case got 25
2362 lines).
2363
2364 Fixed length records
2365 Fixed length records can be processed by setting --recend '' and
2366 --block recordsize. A header of size n can be processed with --header
2367 .{n}.
2368
2369 Here is how to process a file with a 4-byte header and a 3-byte record
2370 size:
2371
2372 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2373 'echo start; cat; echo'
2374
2375 Output:
2376
2377 start
2378 HHHHAAA
2379 start
2380 HHHHCCC
2381 start
2382 HHHHBBB
2383
2384 It may be more efficient to increase --block to a multiple of the
2385 record size.
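
As a sketch of that (recreating the fixedlen layout used above, and assuming GNU parallel is installed), doubling --block to 6, a multiple of the 3-byte record size, should hand each job two records per invocation rather than one:

```shell
# Hypothetical recreation of the fixedlen file from above:
# a 4-byte header 'HHHH' followed by three 3-byte records.
printf 'HHHHAAABBBCCC' > fixedlen

# --block 6 is a multiple of the 3-byte record size, so each job
# now receives two records at a time instead of one:
cat fixedlen | parallel --pipe --header '.{4}' --block 6 --recend '' \
    'echo start; cat; echo'
```

With this input, one job should see HHHHAAABBB and another HHHHCCC, though the order may vary.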
2386
2387 Record separators
2388 GNU parallel uses separators to determine where two records split.
2389
2390 --recstart gives the string that starts a record; --recend gives the
2391 string that ends a record. The default is --recend '\n' (newline).
2392
2393 If both --recend and --recstart are given, a record will only be
2394 split where the recend string is immediately followed by the
2395 recstart string.
2396
2397 Here the --recend is set to ', ':
2398
2399 echo /foo, bar/, /baz, qux/, | \
2400 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2401
2402 Output:
2403
2404 JOB1
2405 /foo, END
2406 JOB2
2407 bar/, END
2408 JOB3
2409 /baz, END
2410 JOB4
2411 qux/,
2412 END
2413
2414 Here the --recstart is set to /:
2415
2416 echo /foo, bar/, /baz, qux/, | \
2417 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2418
2419 Output:
2420
2421 JOB1
2422 /foo, barEND
2423 JOB2
2424 /, END
2425 JOB3
2426 /baz, quxEND
2427 JOB4
2428 /,
2429 END
2430
2431 Here both --recend and --recstart are set:
2432
2433 echo /foo, bar/, /baz, qux/, | \
2434 parallel -kN1 --recend ', ' --recstart / --pipe \
2435 echo JOB{#}\;cat\;echo END
2436
2437 Output:
2438
2439 JOB1
2440 /foo, bar/, END
2441 JOB2
2442 /baz, qux/,
2443 END
2444
2445 Note the difference between setting one string and setting both
2446 strings.
2447
2448 With --regexp the --recend and --recstart will be treated as a regular
2449 expression:
2450
2451 echo foo,bar,_baz,__qux, | \
2452 parallel -kN1 --regexp --recend ,_+ --pipe \
2453 echo JOB{#}\;cat\;echo END
2454
2455 Output:
2456
2457 JOB1
2458 foo,bar,_END
2459 JOB2
2460 baz,__END
2461 JOB3
2462 qux,
2463 END
2464
2465 GNU parallel can remove the record separators with
2466 --remove-rec-sep/--rrs:
2467
2468 echo foo,bar,_baz,__qux, | \
2469 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2470 echo JOB{#}\;cat\;echo END
2471
2472 Output:
2473
2474 JOB1
2475 foo,barEND
2476 JOB2
2477 bazEND
2478 JOB3
2479 qux,
2480 END
2481
2482 Header
2483 If the input data has a header, the header can be repeated for each job
2484 by matching the header with --header. If headers start with % you can
2485 do this:
2486
2487 cat num_%header | \
2488 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2489
2490 Output (the order may be different):
2491
2492 JOB1
2493 %head1
2494 %head2
2495 1
2496 2
2497 3
2498 JOB2
2499 %head1
2500 %head2
2501 4
2502 5
2503 6
2504 JOB3
2505 %head1
2506 %head2
2507 7
2508 8
2509 9
2510 JOB4
2511 %head1
2512 %head2
2513 10
2514
2515 If the header is 2 lines, --header 2 will work:
2516
2517 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2518
2519 Output: Same as above.
2520
2521 --pipepart
2522 --pipe is not very efficient. It maxes out at around 500 MB/s.
2523 --pipepart can easily deliver 5 GB/s. But there are a few limitations.
2524 The input has to be a normal file (not a pipe) given by -a or :::: and
2525 -L/-l/-N do not work. --recend and --recstart, however, do work, and
2526 records can often be split on that alone.
2527
2528 parallel --pipepart -a num1000000 --block 3m wc
2529
2530 Output (the order may be different):
2531
2532 444443 444444 3000002
2533 428572 428572 3000004
2534 126985 126984 888890
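
Since -N and -L are unavailable with --pipepart, fixed-length records have to be cut with --recend '' and --block, as in the fixed-length section above. A minimal sketch (the recs file and its 3-byte records are hypothetical):

```shell
# Hypothetical file of four 3-byte records, no header:
printf 'AAABBBCCCDDD' > recs

# --pipepart requires a real file given with -a; --recend '' plus a
# --block equal to the record size splits on size alone:
parallel -k --pipepart -a recs --block 3 --recend '' 'cat; echo'
```

With -k the four records should come out in file order, one per job.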
2535
2536Shebang
2537 Input data and parallel command in the same file
2538 GNU parallel is often called as this:
2539
2540 cat input_file | parallel command
2541
2542 With --shebang the input_file and parallel can be combined into the
2543 same script.
2544
2545 UNIX shell scripts start with a shebang line like this:
2546
2547 #!/bin/bash
2548
2549 GNU parallel can do that, too. With --shebang the arguments can be
2550 listed in the file. The parallel command is the first line of the
2551 script:
2552
2553 #!/usr/bin/parallel --shebang -r echo
2554
2555 foo
2556 bar
2557 baz
2558
2559 Output (the order may be different):
2560
2561 foo
2562 bar
2563 baz
2564
2565 Parallelizing existing scripts
2566 GNU parallel is often called as this:
2567
2568 cat input_file | parallel command
2569 parallel command ::: foo bar
2570
2571 If command is a script, parallel can be combined into a single file so
2572 this will run the script in parallel:
2573
2574 cat input_file | command
2575 command foo bar
2576
2577 This perl script perl_echo works like echo:
2578
2579 #!/usr/bin/perl
2580
2581 print "@ARGV\n"
2582
2583 It can be called as this:
2584
2585 parallel perl_echo ::: foo bar
2586
2587 By changing the #!-line it can be run in parallel:
2588
2589 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2590
2591 print "@ARGV\n"
2592
2593 Thus this will work:
2594
2595 perl_echo foo bar
2596
2597 Output (the order may be different):
2598
2599 foo
2600 bar
2601
2602 This technique can be used for:
2603
2604 Perl:
2605 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2606
2607 print "Arguments @ARGV\n";
2608
2609 Python:
2610 #!/usr/bin/parallel --shebang-wrap /usr/bin/python
2611
2612 import sys
2613 print('Arguments', str(sys.argv))
2614
2615 Bash/sh/zsh/Korn shell:
2616 #!/usr/bin/parallel --shebang-wrap /bin/bash
2617
2618 echo Arguments "$@"
2619
2620 csh:
2621 #!/usr/bin/parallel --shebang-wrap /bin/csh
2622
2623 echo Arguments "$argv"
2624
2625 Tcl:
2626 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2627
2628 puts "Arguments $argv"
2629
2630 R:
2631 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2632
2633 args <- commandArgs(trailingOnly = TRUE)
2634 print(paste("Arguments ",args))
2635
2636 GNUplot:
2637 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2638
2639 print "Arguments ", system('echo $ARG')
2640
2641 Ruby:
2642 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2643
2644 print "Arguments "
2645 puts ARGV
2646
2647 Octave:
2648 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2649
2650 printf ("Arguments");
2651 arg_list = argv ();
2652 for i = 1:nargin
2653 printf (" %s", arg_list{i});
2654 endfor
2655 printf ("\n");
2656
2657 Common LISP:
2658 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2659
2660 (format t "~&~S~&" 'Arguments)
2661 (format t "~&~S~&" *args*)
2662
2663 PHP:
2664 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2665 <?php
2666 echo "Arguments";
2667 foreach(array_slice($argv,1) as $v)
2668 {
2669 echo " $v";
2670 }
2671 echo "\n";
2672 ?>
2673
2674 Node.js:
2675 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2676
2677 var myArgs = process.argv.slice(2);
2678 console.log('Arguments ', myArgs);
2679
2680 LUA:
2681 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2682
2683 io.write "Arguments"
2684 for a = 1, #arg do
2685 io.write(" ")
2686 io.write(arg[a])
2687 end
2688 print("")
2689
2690 C#:
2691 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2692
2693 var argv = Environment.GetEnvironmentVariable("ARGV");
2694 print("Arguments "+argv);
2695
2696Semaphore
2697 GNU parallel can work as a counting semaphore. This is slower and less
2698 efficient than its normal mode.
2699
2700 A counting semaphore is like a row of toilets. People needing a toilet
2701 can use any toilet, but if there are more people than toilets, they
2702 will have to wait for one of the toilets to become available.
2703
2704 An alias for parallel --semaphore is sem.
2705
2706 sem will follow a person to the toilets, wait until a toilet is
2707 available, leave the person in the toilet and exit.
2708
2709 sem --fg will follow a person to the toilets, wait until a toilet is
2710 available, stay with the person in the toilet and exit when the person
2711 exits.
2712
2713 sem --wait will wait for all persons to leave the toilets.
2714
2715 sem does not have a queue discipline, so the next person is chosen
2716 randomly.
2717
2718 -j sets the number of toilets.
2719
2720 Mutex
2721 The default is to have only one toilet (this is called a mutex). The
2722 program is started in the background and sem exits immediately. Use
2723 --wait to wait for all sems to finish:
2724
2725 sem 'sleep 1; echo The first finished' &&
2726 echo The first is now running in the background &&
2727 sem 'sleep 1; echo The second finished' &&
2728 echo The second is now running in the background
2729 sem --wait
2730
2731 Output:
2732
2733 The first is now running in the background
2734 The first finished
2735 The second is now running in the background
2736 The second finished
2737
2738 The command can be run in the foreground with --fg, which will only
2739 exit when the command completes:
2740
2741 sem --fg 'sleep 1; echo The first finished' &&
2742 echo The first finished running in the foreground &&
2743 sem --fg 'sleep 1; echo The second finished' &&
2744 echo The second finished running in the foreground
2745 sem --wait
2746
2747 The difference between this and just running the command is that a
2748 mutex is set, so if other sems were running in the background only one
2749 would run at a time.
2750
2751 To control which semaphore is used, use --semaphorename/--id. Run this
2752 in one terminal:
2753
2754 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2755
2756 and simultaneously this in another terminal:
2757
2758 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2759
2760 Note how the second will only be started when the first has finished.
2761
2762 Counting semaphore
2763 A mutex is like having a single toilet: When it is in use everyone else
2764 will have to wait. A counting semaphore is like having multiple
2765 toilets: Several people can use the toilets, but when they all are in
2766 use, everyone else will have to wait.
2767
2768 sem can emulate a counting semaphore. Use --jobs to set the number of
2769 toilets like this:
2770
2771 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2772 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2773 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2774 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2775 sem --wait --id my_id
2776
2777 Output:
2778
2779 Start 1
2780 Start 2
2781 Start 3
2782 1 done
2783 Start 4
2784 2 done
2785 3 done
2786 4 done
2787
2788 Timeout
2789 With --semaphoretimeout you can force running the command anyway after
2790 a period (positive number) or give up (negative number):
2791
2792 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2793 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2794 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2795 sem --id foo --wait
2796
2797 Output:
2798
2799 Slow started
2800 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2801 Forced running after 1 sec
2802 parallel: Warning: Semaphore timed out. Exiting.
2803 Slow ended
2804
2805 Note how the 'Give up' was not run.
2806
2807Informational
2808 GNU parallel has some options to give short information about the
2809 configuration.
2810
2811 --help will print a summary of the most important options:
2812
2813 parallel --help
2814
2815 Output:
2816
2817 Usage:
2818
2819 parallel [options] [command [arguments]] < list_of_arguments
2820 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2821 cat ... | parallel --pipe [options] [command [arguments]]
2822
2823 -j n Run n jobs in parallel
2824 -k Keep same order
2825 -X Multiple arguments with context replace
2826 --colsep regexp Split input on regexp for positional replacements
2827 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2828 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2829 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2830 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2831
2832 -S sshlogin Example: foo@server.example.com
2833 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2834 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2835 --onall Run the given command with argument on all sshlogins
2836 --nonall Run the given command with no arguments on all sshlogins
2837
2838 --pipe Split stdin (standard input) to multiple jobs.
2839 --recend str Record end separator for --pipe.
2840 --recstart str Record start separator for --pipe.
2841
2842 See 'man parallel' for details
2843
2844 Academic tradition requires you to cite works you base your article on.
2845 When using programs that use GNU Parallel to process data for publication
2846 please cite:
2847
2848 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2849 ;login: The USENIX Magazine, February 2011:42-47.
2850
2851 This helps funding further development; AND IT WON'T COST YOU A CENT.
2852 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2853
2854 When asking for help, always report the full output of this:
2855
2856 parallel --version
2857
2858 Output:
2859
2860 GNU parallel 20200122
2861 Copyright (C) 2007-2020 Ole Tange, http://ole.tange.dk and Free Software
2862 Foundation, Inc.
2863 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2864 This is free software: you are free to change and redistribute it.
2865 GNU parallel comes with no warranty.
2866
2867 Web site: http://www.gnu.org/software/parallel
2868
2869 When using programs that use GNU Parallel to process data for publication
2870 please cite as described in 'parallel --citation'.
2871
2872 In scripts --minversion can be used to ensure the user has at least
2873 this version:
2874
2875 parallel --minversion 20130722 && \
2876 echo Your version is at least 20130722.
2877
2878 Output:
2879
2880 20160322
2881 Your version is at least 20130722.
2882
2883 If you are using GNU parallel for research the BibTeX citation can be
2884 generated using --citation:
2885
2886 parallel --citation
2887
2888 Output:
2889
2890 Academic tradition requires you to cite works you base your article on.
2891 When using programs that use GNU Parallel to process data for publication
2892 please cite:
2893
2894 @article{Tange2011a,
2895 title = {GNU Parallel - The Command-Line Power Tool},
2896 author = {O. Tange},
2897 address = {Frederiksberg, Denmark},
2898 journal = {;login: The USENIX Magazine},
2899 month = {Feb},
2900 number = {1},
2901 volume = {36},
2902 url = {http://www.gnu.org/s/parallel},
2903 year = {2011},
2904 pages = {42-47},
2905 doi = {10.5281/zenodo.16303}
2906 }
2907
2908 (Feel free to use \nocite{Tange2011a})
2909
2910 This helps funding further development; AND IT WON'T COST YOU A CENT.
2911 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2912
2913 If you send a copy of your published article to tange@gnu.org, it will be
2914 mentioned in the release notes of next version of GNU Parallel.
2915
2916 With --max-line-length-allowed GNU parallel will report the maximal
2917 size of the command line:
2918
2919 parallel --max-line-length-allowed
2920
2921 Output (may vary on different systems):
2922
2923 131071
2924
2925 --number-of-cpus and --number-of-cores run system specific code to
2926 determine the number of CPUs and CPU cores on the system. On
2927 unsupported platforms they will return 1:
2928
2929 parallel --number-of-cpus
2930 parallel --number-of-cores
2931
2932 Output (may vary on different systems):
2933
2934 4
2935 64
2936
2937Profiles
2938 The defaults for GNU parallel can be changed systemwide by putting the
2939 command line options in /etc/parallel/config. They can be changed for a
2940 user by putting them in ~/.parallel/config.
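
As a sketch (the --jobs value chosen here is arbitrary; any option from man parallel can go in the file, one or more per line), making 2 simultaneous jobs the per-user default is a one-line append:

```shell
# Make -j2 the default for this user:
mkdir -p ~/.parallel
echo '--jobs 2' >> ~/.parallel/config

# Every later invocation now behaves as if -j2 had been given:
parallel echo ::: A B C
```

Removing the line from ~/.parallel/config restores the normal default.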
2941
2942 Profiles work the same way, but have to be referred to with --profile:
2943
2944 echo '--nice 17' > ~/.parallel/nicetimeout
2945 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2946 parallel --profile nicetimeout echo ::: A B C
2947
2948 Output:
2949
2950 A
2951 B
2952 C
2953
2954 Profiles can be combined:
2955
2956 echo '-vv --dry-run' > ~/.parallel/dryverbose
2957 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2958
2959 Output:
2960
2961 echo A
2962 echo B
2963 echo C
2964
2965Spread the word
2966 I hope you have learned something from this tutorial.
2967
2968 If you like GNU parallel:
2969
2970 · (Re-)walk through the tutorial if you have not done so in the past
2971 year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2972
2973 · Give a demo at your local user group/your team/your colleagues
2974
2975 · Post the intro videos and the tutorial on Reddit, Mastodon,
2976 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2977 LinkedIn, and mailing lists
2978
2979 · Request or write a review for your favourite blog or magazine
2980 (especially if you do something cool with GNU parallel)
2981
2982 · Invite me for your next conference
2983
2984 If you use GNU parallel for research:
2985
2986 · Please cite GNU parallel in your publications (use --citation)
2987
2988 If GNU parallel saves you money:
2989
2990 · (Have your company) donate to FSF or become a member
2991 https://my.fsf.org/donate/
2992
2993 (C) 2013-2020 Ole Tange, FDLv1.3 (See fdl.txt)
2994
2995
2996
299720200322 2020-04-22 PARALLEL_TUTORIAL(7)