parallel_tutorial(7)

PARALLEL_TUTORIAL(7)               parallel               PARALLEL_TUTORIAL(7)

GNU Parallel Tutorial

       This tutorial shows off much of GNU parallel's functionality. It is
       meant to teach the options and syntax of GNU parallel, not to show
       realistic examples from the real world.

   Reader's guide
       If you prefer reading a book, buy GNU Parallel 2018 at
       https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014

       Otherwise start by watching the intro videos for a quick introduction:
       https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       Then browse through the EXAMPLEs after the list of OPTIONS in man
       parallel (Use LESS=+/EXAMPLE: man parallel). That will give you an idea
       of what GNU parallel is capable of.

       If you want to dive even deeper: spend a couple of hours walking
       through the tutorial (man parallel_tutorial). Your command line will
       love you for it.

       Finally you may want to look at the rest of the manual (man parallel)
       if you have special needs not already covered.

       If you want to know the design decisions behind GNU parallel, try: man
       parallel_design. This is also a good intro if you intend to change GNU
       parallel.

Prerequisites

       To run this tutorial you must have the following:

       parallel >= version 20160822
                Install the newest version using your package manager
                (recommended for security reasons), the way described in
                README, or with this command:

                  $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
                     fetch -o - http://pi.dk/3 ) > install.sh
                  $ sha1sum install.sh
                  12345678 3374ec53 bacb199b 245af2dd a86df6c9
                  $ md5sum install.sh
                  029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
                  $ sha512sum install.sh
                  40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
                  60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
                  $ bash install.sh

                This will also install the newest version of the tutorial
                which you can see by running this:

                  man parallel_tutorial

                Most of the tutorial will work on older versions, too.

       abc-file:
                The file can be generated by this command:

                  parallel -k echo ::: A B C > abc-file

       def-file:
                The file can be generated by this command:

                  parallel -k echo ::: D E F > def-file

       abc0-file:
                The file can be generated by this command:

                  perl -e 'printf "A\0B\0C\0"' > abc0-file

       abc_-file:
                The file can be generated by this command:

                  perl -e 'printf "A_B_C_"' > abc_-file

       tsv-file.tsv
                The file can be generated by this command:

                  perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

       num8     The file can be generated by this command:

                  perl -e 'for(1..8){print "$_\n"}' > num8

       num128   The file can be generated by this command:

                  perl -e 'for(1..128){print "$_\n"}' > num128

       num30000 The file can be generated by this command:

                  perl -e 'for(1..30000){print "$_\n"}' > num30000

       num1000000
                The file can be generated by this command:

                  perl -e 'for(1..1000000){print "$_\n"}' > num1000000

       num_%header
                The file can be generated by this command:

                  (echo %head1; echo %head2; \
                   perl -e 'for(1..10){print "$_\n"}') > num_%header

       fixedlen The file can be generated by this command:

                  perl -e 'print "HHHHAAABBBCCC"' > fixedlen

       For remote running: ssh login on 2 servers with no password in $SERVER1
       and $SERVER2 must work.
                  SERVER1=server.example.com
                  SERVER2=server2.example.net

                So you must be able to do this without entering a password:

                  ssh $SERVER1 echo works
                  ssh $SERVER2 echo works

                It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id
                $SERVER1' and using an empty passphrase, or you can use
                ssh-agent.

Input sources

       GNU parallel reads input from input sources. These can be files, the
       command line, and stdin (standard input or a pipe).

   A single input source
       Input can be read from the command line:

         parallel echo ::: A B C

       Output (the order may be different because the jobs are run in
       parallel):

         A
         B
         C

       The input source can be a file:

         parallel -a abc-file echo

       Output: Same as above.

       STDIN (standard input) can be the input source:

         cat abc-file | parallel echo

       Output: Same as above.

   Multiple input sources
       GNU parallel can take multiple input sources given on the command line.
       GNU parallel then generates all combinations of the input sources:

         parallel echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       The input sources can be files:

         parallel -a abc-file -a def-file echo

       Output: Same as above.

       STDIN (standard input) can be one of the input sources using -:

         cat abc-file | parallel -a - -a def-file echo

       Output: Same as above.

       Instead of -a, files can be given after the :::: separator:

         cat abc-file | parallel echo :::: - def-file

       Output: Same as above.

       ::: and :::: can be mixed:

         parallel echo ::: A B C :::: def-file

       Output: Same as above.

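       The combinations generated from several input sources are the same
       pairs a nested loop would produce. The following is a sequential
       sketch in plain shell, not how GNU parallel is implemented: it only
       lists the argument pairs and runs nothing in parallel. The variable
       name combos is an assumption made for this illustration.

```shell
# Sequential model of: parallel echo ::: A B C ::: D E F
# The first input source varies slowest, the last one fastest.
combos=$(
  for first in A B C; do
    for second in D E F; do
      echo "$first $second"
    done
  done
)
printf '%s\n' "$combos"
```

       Unlike GNU parallel, the loop always prints the pairs in this fixed
       order.
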
       Linking arguments from input sources

       With --link you can link the input sources and get one argument from
       each input source:

         parallel --link echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         B E
         C F

       If one of the input sources is too short, its values will wrap:

         parallel --link echo ::: A B C D E ::: F G

       Output (the order may be different):

         A F
         B G
         C F
         D G
         E F

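       The wrapping can be pictured as indexing each input source modulo its
       length. The following is a sequential sketch in plain shell under
       that assumption; the helper names nth and link_wrap are made up for
       this illustration and are not part of GNU parallel.

```shell
# Sequential model of: parallel --link echo ::: A B C D E ::: F G
# nth INDEX WORD... prints the WORD at the 0-based INDEX.
nth() {
  idx=$1
  shift
  shift "$idx"
  printf '%s\n' "$1"
}

link_wrap() {
  left="A B C D E"
  right="F G"
  # Count the words in each source (word splitting is intended here).
  set -- $left;  n_left=$#
  set -- $right; n_right=$#
  # Produce as many pairs as the longest source has values.
  if [ "$n_left" -gt "$n_right" ]; then total=$n_left; else total=$n_right; fi
  i=0
  while [ "$i" -lt "$total" ]; do
    # The modulo makes the shorter source start over when it runs out.
    echo "$(nth $((i % n_left)) $left) $(nth $((i % n_right)) $right)"
    i=$((i + 1))
  done
}

linked=$(link_wrap)
printf '%s\n' "$linked"
```

       The sketch prints the same five pairs as the --link example above,
       always in input order.
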
       For more flexible linking you can use :::+ and ::::+. They work like
       ::: and :::: except they link the previous input source to this input
       source.

       This will link ABC to GHI:

         parallel echo :::: abc-file :::+ G H I :::: def-file

       Output (the order may be different):

         A G D
         A G E
         A G F
         B H D
         B H E
         B H F
         C I D
         C I E
         C I F

       This will link GHI to DEF:

         parallel echo :::: abc-file ::: G H I ::::+ def-file

       Output (the order may be different):

         A G D
         A H E
         A I F
         B G D
         B H E
         B I F
         C G D
         C H E
         C I F

       If one of the input sources is too short when using :::+ or ::::+, the
       rest will be ignored:

         parallel echo ::: A B C D E :::+ F G

       Output (the order may be different):

         A F
         B G

   Changing the argument separator
       GNU parallel can use other separators than ::: or ::::. This is
       typically useful if ::: or :::: is used in the command to run:

         parallel --arg-sep ,, echo ,, A B C :::: def-file

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       Changing the argument file separator:

         parallel --arg-file-sep // echo ::: A B C // def-file

       Output: Same as above.

   Changing the argument delimiter
       GNU parallel will normally treat a full line as a single argument: It
       uses \n as argument delimiter. This can be changed with -d:

         parallel -d _ echo :::: abc_-file

       Output (the order may be different):

         A
         B
         C

       NUL can be given as \0:

         parallel -d '\0' echo :::: abc0-file

       Output: Same as above.

       A shorthand for -d '\0' is -0 (this will often be used to read files
       from find ... -print0):

         parallel -0 echo :::: abc0-file

       Output: Same as above.

   End-of-file value for input source
       GNU parallel can stop reading when it encounters a certain value:

         parallel -E stop echo ::: A B stop C D

       Output:

         A
         B

   Skipping empty lines
       Using --no-run-if-empty GNU parallel will skip empty lines.

         (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

       Output:

         1
         2

Building the command line

   No command means arguments are commands
       If no command is given after parallel the arguments themselves are
       treated as commands:

         parallel ::: ls 'echo foo' pwd

       Output (the order may be different):

         [list of files in current dir]
         foo
         [/path/to/current/working/dir]

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

         # Only works in Bash
         my_func() {
           echo in my_func $1
         }
         export -f my_func
         parallel my_func ::: 1 2 3

       Output (the order may be different):

         in my_func 1
         in my_func 2
         in my_func 3

   Replacement strings
       The 7 predefined replacement strings

       GNU parallel has several replacement strings. If no replacement strings
       are used the default is to append {}:

         parallel echo ::: A/B.C

       Output:

         A/B.C

       The default replacement string is {}:

         parallel echo {} ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} removes the extension:

         parallel echo {.} ::: A/B.C

       Output:

         A/B

       The replacement string {/} removes the path:

         parallel echo {/} ::: A/B.C

       Output:

         B.C

       The replacement string {//} keeps only the path:

         parallel echo {//} ::: A/B.C

       Output:

         A

       The replacement string {/.} removes the path and the extension:

         parallel echo {/.} ::: A/B.C

       Output:

         B

       The replacement string {#} gives the job number:

         parallel echo {#} ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} gives the job slot number (between 1 and
       number of jobs to run in parallel):

         parallel -j 2 echo {%} ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

       Changing the replacement strings

       The replacement string {} can be changed with -I:

         parallel -I ,, echo ,, ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} can be changed with --extensionreplace:

         parallel --extensionreplace ,, echo ,, ::: A/B.C

       Output:

         A/B

       The replacement string {/} can be changed with --basenamereplace:

         parallel --basenamereplace ,, echo ,, ::: A/B.C

       Output:

         B.C

       The replacement string {//} can be changed with --dirnamereplace:

         parallel --dirnamereplace ,, echo ,, ::: A/B.C

       Output:

         A

       The replacement string {/.} can be changed with
       --basenameextensionreplace:

         parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

       Output:

         B

       The replacement string {#} can be changed with --seqreplace:

         parallel --seqreplace ,, echo ,, ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} can be changed with --slotreplace:

         parallel -j2 --slotreplace ,, echo ,, ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

       Perl expression replacement string

       When predefined replacement strings are not flexible enough a perl
       expression can be used instead. One example is to remove two
       extensions: foo.tar.gz becomes foo

         parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

       Output:

         foo

       In {= =} you can access all of GNU parallel's internal functions and
       variables. A few are worth mentioning.

       total_jobs() returns the total number of jobs:

         parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

       Q(...) shell quotes the string:

         parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

       Output:

         */!#$ shell quoted is \*/\!\#\$

       skip() skips the job:

         parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

       Output:

         1
         2
         4
         5

       @arg contains the input source variables:

         parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
           ::: {1..3} ::: {1..3}

       Output:

         1 2
         1 3
         2 1
         2 3
         3 1
         3 2

       If the strings {= and =} cause problems they can be replaced with
       --parens:

         parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
           ::: foo.tar.gz

       Output:

         foo

       To define a shorthand replacement string use --rpl:

         parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand starts with { it can be used as a positional
       replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand contains matching parentheses the replacement string
       becomes a dynamic replacement string and the string in the parentheses
       can be accessed as $$1. If there are multiple matching parentheses, the
       matched strings can be accessed using $$2, $$3 and so on.

       You can think of this as giving arguments to the replacement string.
       Here we give the argument .tar.gz to the replacement string {%string}
       which removes string:

         parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

       Output:

         foo.zip

       Here we give the two arguments tar.gz and zip to the replacement string
       {/string1/string2} which replaces string1 with string2:

         parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
           ::: foo.tar.gz

       Output:

         foo.zip

       GNU parallel's 7 replacement strings are implemented like this:

         --rpl '{} '
         --rpl '{#} $_=$job->seq()'
         --rpl '{%} $_=$job->slot()'
         --rpl '{/} s:.*/::'
         --rpl '{//} $Global::use{"File::Basename"} ||=
                  eval "use File::Basename; 1;"; $_ = dirname($_);'
         --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
         --rpl '{.} s:\.[^/.]+$::'

       Positional replacement strings

       With multiple input sources the argument from the individual input
       sources can be accessed with {number}:

         parallel echo {1} and {2} ::: A B ::: C D

       Output (the order may be different):

         A and C
         A and D
         B and C
         B and D

       The positional replacement strings can also be modified using /, //,
       /., and .:

         parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

       Output (the order may be different):

         /=B.C //=A /.=B .=A/B
         /=E.F //=D /.=E .=D/E

       If a position is negative, it will refer to the input source counted
       from the end:

         parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
           ::: A B ::: C D ::: E F

       Output (the order may be different):

         1=A 2=C 3=E -1=E -2=C -3=A
         1=A 2=C 3=F -1=F -2=C -3=A
         1=A 2=D 3=E -1=E -2=D -3=A
         1=A 2=D 3=F -1=F -2=D -3=A
         1=B 2=C 3=E -1=E -2=C -3=B
         1=B 2=C 3=F -1=F -2=C -3=B
         1=B 2=D 3=E -1=E -2=D -3=B
         1=B 2=D 3=F -1=F -2=D -3=B

       Positional perl expression replacement string

       To use a perl expression as a positional replacement string simply
       prefix the perl expression with the number and a space:

         parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
           ::: bar ::: foo.tar.gz

       Output:

         foo bar

       If a shorthand defined using --rpl starts with { it can be used as a
       positional replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
           ::: bar ::: foo.tar.gz

       Output: Same as above.

       Input from columns

       The columns in a file can be bound to positional replacement strings
       using --colsep. Here the columns are separated by TAB (\t):

         parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

       Output (the order may be different):

         1=f1 2=f2
         1=A 2=B
         1=C 2=D

       Header defined replacement strings

       With --header GNU parallel will use the first value of the input source
       as the name of the replacement string. Only the non-modified version {}
       is supported:

         parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

       Output (the order may be different):

         f1=A f2=C
         f1=A f2=D
         f1=B f2=C
         f1=B f2=D

       It is useful with --colsep for processing files with TAB separated
       values:

         parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
           :::: tsv-file.tsv

       Output (the order may be different):

         f1=A f2=B
         f1=C f2=D

       More pre-defined replacement strings with --plus

       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..}  {...}
       {/..} {/...} {##}. The idea being that {+foo} matches the opposite of
       {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
       {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.

         parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

       Output:

         dir/sub/file.ex1.ex2.ex3

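       The identity above says that each {+foo} holds exactly the part that
       {foo} removes. The pieces can be pictured with plain shell parameter
       expansions. This is a sketch for this one example argument only: the
       variable names are made up, and the expansions are stand-ins chosen
       to match the identity, not GNU parallel's own implementation.

```shell
# Plain parameter expansions standing in for the --plus replacement
# strings; the variable names are made up for this sketch.
arg=dir/sub/file.ex1.ex2.ex3

plus_slash=${arg%/*}           # like {+/}:  dir/sub
slash=${arg##*/}               # like {/}:   file.ex1.ex2.ex3
dot=${arg%.*}                  # like {.}:   dir/sub/file.ex1.ex2
plus_dot=${arg##*.}            # like {+.}:  ex3
dotdot=${dot%.*}               # like {..}:  dir/sub/file.ex1
plus_dotdot=${arg#"$dotdot".}  # like {+..}: ex2.ex3

# Each line rebuilds the original argument from a complementary pair.
echo "$plus_slash/$slash"
echo "$dot.$plus_dot"
echo "$dotdot.$plus_dotdot"
```

       All three echo lines print the original argument.
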
       {##} is simply the number of jobs:

         parallel --plus echo Job {#} of {##} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

       Dynamic replacement strings with --plus

       --plus also defines these dynamic replacement strings:

       {:-string}         Default value is string if the argument is empty.

       {:number}          Substring from number till end of string.

       {:number1:number2} Substring of length number2 starting at number1.

       {#string}          If the argument starts with string, remove it.

       {%string}          If the argument ends with string, remove it.

       {/string1/string2} Replace string1 with string2.

       {^string}          If the argument starts with string, upper case it.
                          string must be a single letter.

       {^^string}         If the argument contains string, upper case it.
                          string must be a single letter.

       {,string}          If the argument starts with string, lower case it.
                          string must be a single letter.

       {,,string}         If the argument contains string, lower case it.
                          string must be a single letter.

       They are inspired by Bash:

         unset myvar
         echo ${myvar:-myval}
         parallel --plus echo {:-myval} ::: "$myvar"

         myvar=abcAaAdef
         echo ${myvar:2}
         parallel --plus echo {:2} ::: "$myvar"

         echo ${myvar:2:3}
         parallel --plus echo {:2:3} ::: "$myvar"

         echo ${myvar#bc}
         parallel --plus echo {#bc} ::: "$myvar"
         echo ${myvar#abc}
         parallel --plus echo {#abc} ::: "$myvar"

         echo ${myvar%de}
         parallel --plus echo {%de} ::: "$myvar"
         echo ${myvar%def}
         parallel --plus echo {%def} ::: "$myvar"

         echo ${myvar/def/ghi}
         parallel --plus echo {/def/ghi} ::: "$myvar"

         echo ${myvar^a}
         parallel --plus echo {^a} ::: "$myvar"
         echo ${myvar^^a}
         parallel --plus echo {^^a} ::: "$myvar"

         myvar=AbcAaAdef
         echo ${myvar,A}
         parallel --plus echo '{,A}' ::: "$myvar"
         echo ${myvar,,A}
         parallel --plus echo '{,,A}' ::: "$myvar"

       Output:

         myval
         myval
         cAaAdef
         cAaAdef
         cAa
         cAa
         abcAaAdef
         abcAaAdef
         AaAdef
         AaAdef
         abcAaAdef
         abcAaAdef
         abcAaA
         abcAaA
         abcAaAghi
         abcAaAghi
         AbcAaAdef
         AbcAaAdef
         AbcAAAdef
         AbcAAAdef
         abcAaAdef
         abcAaAdef
         abcaaadef
         abcaaadef

   More than one argument
       With --xargs GNU parallel will fit as many arguments as possible on a
       single line:

         cat num30000 | parallel --xargs echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         2

       The 30000 arguments fitted on 2 lines.

       The maximal length of a single line can be set with -s. With a maximal
       line length of 10000 chars 17 commands will be run:

         cat num30000 | parallel --xargs -s 10000 echo | wc -l

       Output:

         17

       For better parallelism GNU parallel can distribute the arguments
       between all the parallel jobs when end of file is met.

       Below GNU parallel reads the last argument when generating the second
       job. When GNU parallel reads the last argument, it spreads all the
       arguments for the second job over 4 jobs instead, as 4 parallel jobs
       are requested.

       The first job will be the same as the --xargs example above, but the
       second job will be split into 4 evenly sized jobs, resulting in a total
       of 5 jobs:

         cat num30000 | parallel --jobs 4 -m echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         5

       This is even more visible when running 4 jobs with 10 arguments. The 10
       arguments are being spread over 4 jobs:

         parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

       Output:

         1 2 3
         4 5 6
         7 8 9
         10

       A replacement string can be part of a word. -m will not repeat the
       context:

         parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A B-post
         pre-C D-post
         pre-E F-post
         pre-G-post

       To repeat the context use -X which otherwise works like -m:

         parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A-post pre-B-post
         pre-C-post pre-D-post
         pre-E-post pre-F-post
         pre-G-post

       To limit the number of arguments use -N:

         parallel -N3 echo ::: A B C D E F G H

       Output (the order may be different):

         A B C
         D E F
         G H

       -N also sets the positional replacement strings:

         parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

       Output (the order may be different):

         1=A 2=B 3=C
         1=D 2=E 3=F
         1=G 2=H 3=

       -N0 reads 1 argument but inserts none:

         parallel -N0 echo foo ::: 1 2 3

       Output:

         foo
         foo
         foo

   Quoting
       Command lines that contain special characters may need to be protected
       from the shell.

       The perl program print "@ARGV\n" basically works like echo.

         perl -e 'print "@ARGV\n"' A

       Output:

         A

       To run that in parallel the command needs to be quoted. This will not
       work:

         parallel perl -e 'print "@ARGV\n"' ::: This wont work

       Output:

         [Nothing]

       To quote the command use -q:

         parallel -q perl -e 'print "@ARGV\n"' ::: This works

       Output (the order may be different):

         This
         works

       Or you can quote the critical part using \':

         parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

       Output (the order may be different):

         This
         works,
         too

       GNU parallel can also \-quote full lines. Simply run this:

         parallel --shellquote
         Warning: Input is read from the terminal. You either know what you
         Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
         Warning: ::: or :::: or to pipe data into parallel. If so
         Warning: consider going through the tutorial: man parallel_tutorial
         Warning: Press CTRL-D to exit.
         perl -e 'print "@ARGV\n"'
         [CTRL-D]

       Output:

         perl\ -e\ \'print\ \"@ARGV\\n\"\'

       This can then be used as the command:

         parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

       Output (the order may be different):

         This
         also
         works

   Trimming space
       Space can be trimmed from the arguments using --trim. To trim on the
       right side:

         parallel --trim r echo pre-{}-post ::: ' A '

       Output:

         pre- A-post

       To trim on the left side:

         parallel --trim l echo pre-{}-post ::: ' A '

       Output:

         pre-A -post

       To trim on both sides:

         parallel --trim lr echo pre-{}-post ::: ' A '

       Output:

         pre-A-post

   Respecting the shell
       This tutorial uses Bash as the shell. GNU parallel respects which shell
       you are using, so in zsh you can do:

         parallel echo \={} ::: zsh bash ls

       Output:

         /usr/bin/zsh
         /bin/bash
         /bin/ls

       In csh you can do:

         parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

       Output:

         [somedir] is a dir

       This also becomes useful if you use GNU parallel in a shell script: GNU
       parallel will use the same shell as the shell script.

Controlling the output

1062       The output can prefixed with the argument:
1063
1064         parallel --tag echo foo-{} ::: A B C
1065
1066       Output (the order may be different):
1067
1068         A       foo-A
1069         B       foo-B
1070         C       foo-C
1071
1072       To prefix it with another string use --tagstring:
1073
1074         parallel --tagstring {}-bar echo foo-{} ::: A B C
1075
1076       Output (the order may be different):
1077
1078         A-bar   foo-A
1079         B-bar   foo-B
1080         C-bar   foo-C
1081
1082       To see what commands will be run without running them use --dryrun:
1083
1084         parallel --dryrun echo {} ::: A B C
1085
1086       Output (the order may be different):
1087
1088         echo A
1089         echo B
1090         echo C
1091
1092       To print the command before running them use --verbose:
1093
1094         parallel --verbose echo {} ::: A B C
1095
1096       Output (the order may be different):
1097
1098         echo A
1099         echo B
1100         A
1101         echo C
1102         B
1103         C
1104
1105       GNU parallel will postpone the output until the command completes:
1106
1107         parallel -j2 'printf "%s-start\n%s" {} {};
1108           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1109
1110       Output:
1111
1112         2-start
1113         2-middle
1114         2-end
1115         1-start
1116         1-middle
1117         1-end
1118         4-start
1119         4-middle
1120         4-end
1121
1122       To get the output immediately use --ungroup:
1123
1124         parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
1125           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1126
1127       Output:
1128
1129         4-start
1130         42-start
1131         2-middle
1132         2-end
1133         1-start
1134         1-middle
1135         1-end
1136         -middle
1137         4-end
1138
1139       --ungroup is fast, but can cause half a line from one job to be mixed
1140       with half a line of another job. That has happened in the second line,
1141       where the '4' that starts the line '4-middle' is mixed with '2-start'.
1142
1143       To avoid this use --linebuffer:
1144
1145         parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
1146           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1147
1148       Output:
1149
1150         4-start
1151         2-start
1152         2-middle
1153         2-end
1154         1-start
1155         1-middle
1156         1-end
1157         4-middle
1158         4-end
1159
1160       To force the output in the same order as the arguments use
1161       --keep-order/-k:
1162
1163         parallel -j2 -k 'printf "%s-start\n%s" {} {};
1164           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1165
1166       Output:
1167
1168         4-start
1169         4-middle
1170         4-end
1171         2-start
1172         2-middle
1173         2-end
1174         1-start
1175         1-middle
1176         1-end
1177
1178   Saving output into files
1179       GNU parallel can save the output of each job into files:
1180
1181         parallel --files echo ::: A B C
1182
1183       Output will be similar to this:
1184
1185         /tmp/pAh6uWuQCg.par
1186         /tmp/opjhZCzAX4.par
1187         /tmp/W0AT_Rph2o.par
1188
1189       By default GNU parallel will cache the output in files in /tmp. This
1190       can be changed by setting $TMPDIR or --tmpdir:
1191
1192         parallel --tmpdir /var/tmp --files echo ::: A B C
1193
1194       Output will be similar to this:
1195
1196         /var/tmp/N_vk7phQRc.par
1197         /var/tmp/7zA4Ccf3wZ.par
1198         /var/tmp/LIuKgF_2LP.par
1199
1200       Or:
1201
1202         TMPDIR=/var/tmp parallel --files echo ::: A B C
1203
1204       Output: Same as above.
1205
1206       The output files can be saved in a structured way using --results:
1207
1208         parallel --results outdir echo ::: A B C
1209
1210       Output:
1211
1212         A
1213         B
1214         C
1215
1216       These files were also generated containing the standard output
1217       (stdout), standard error (stderr), and the sequence number (seq):
1218
1219         outdir/1/A/seq
1220         outdir/1/A/stderr
1221         outdir/1/A/stdout
1222         outdir/1/B/seq
1223         outdir/1/B/stderr
1224         outdir/1/B/stdout
1225         outdir/1/C/seq
1226         outdir/1/C/stderr
1227         outdir/1/C/stdout
1228
1229       --header : will take the first value as name and use that in the
1230       directory structure. This is useful if you are using multiple input
1231       sources:
1232
1233         parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
1234
1235       Generated files:
1236
1237         outdir/f1/A/f2/C/seq
1238         outdir/f1/A/f2/C/stderr
1239         outdir/f1/A/f2/C/stdout
1240         outdir/f1/A/f2/D/seq
1241         outdir/f1/A/f2/D/stderr
1242         outdir/f1/A/f2/D/stdout
1243         outdir/f1/B/f2/C/seq
1244         outdir/f1/B/f2/C/stderr
1245         outdir/f1/B/f2/C/stdout
1246         outdir/f1/B/f2/D/seq
1247         outdir/f1/B/f2/D/stderr
1248         outdir/f1/B/f2/D/stdout
1249
1250       The directories are named after the variables and their values.
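
           Because every job gets its own stdout file, the results can be
           collected afterwards by walking the directory tree. This is a
           sketch under assumptions: show_results is a hypothetical helper,
           and the loop that builds the tree is only a stand-in for
           parallel --results outdir echo ::: A B C, so that the sketch can
           be tried without GNU parallel.

```shell
# Hypothetical helper: print every stdout file below a --results
# directory, prefixed with its path. Works for any nesting depth.
show_results() {
  find "$1" -type f -name stdout | sort | while IFS= read -r file; do
    printf '%s: %s\n' "$file" "$(cat "$file")"
  done
}

# Stand-in for: parallel --results outdir echo ::: A B C
outdir=$(mktemp -d)
for arg in A B C; do
  mkdir -p "$outdir/1/$arg"
  printf '%s\n' "$arg" > "$outdir/1/$arg/stdout"
done

report=$(show_results "$outdir")
printf '%s\n' "$report"
rm -rf "$outdir"
```

           The same helper should also work on the --header : layout, since
           it only looks for files named stdout.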
1251

Controlling the execution

1253   Number of simultaneous jobs
1254       The number of concurrent jobs is given with --jobs/-j:
1255
1256         /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
1257
1258       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run -
1259       depending on how fast your machine is.
1260
1261       By default --jobs is the same as the number of CPU cores. So this:
1262
1263         /usr/bin/time parallel -N0 sleep 1 :::: num128
1264
1265       should take twice the time of running 2 jobs per CPU core:
1266
1267         /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1268
1269       --jobs 0 will run as many jobs in parallel as possible:
1270
1271         /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1272
1273       which should take 1-7 seconds depending on how fast your machine is.
1274
1275       --jobs can read from a file which is re-read when a job finishes:
1276
1277         echo 50% > my_jobs
1278         /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1279         sleep 1
1280         echo 0 > my_jobs
1281         wait
1282
1283       The first second only 50% of the CPU cores will run a job. Then 0 is
1284       put into my_jobs and then the rest of the jobs will be started in
1285       parallel.
1286
1287       Instead of basing the percentage on the number of CPU cores GNU
1288       parallel can base it on the number of CPUs:
1289
1290         parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1291
1292   Shuffle job order
1293       If you have many jobs (e.g. by multiple combinations of input sources),
1294       it can be handy to shuffle the jobs, so you get different values run.
1295       Use --shuf for that:
1296
1297         parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1298
1299       Output:
1300
1301         All combinations but different order for each run.
1302
1303   Interactivity
1304       GNU parallel can ask the user if a command should be run using
1305       --interactive:
1306
1307         parallel --interactive echo ::: 1 2 3
1308
1309       Output:
1310
1311         echo 1 ?...y
1312         echo 2 ?...n
1313         1
1314         echo 3 ?...y
1315         3
1316
1317       GNU parallel can be used to put arguments on the command line for an
1318       interactive command such as emacs to edit one file at a time:
1319
1320         parallel --tty emacs ::: 1 2 3
1321
1322       Or give multiple arguments in one go to open multiple files:
1323
1324         parallel -X --tty vi ::: 1 2 3
1325
1326   A terminal for every job
1327       Using --tmux GNU parallel can start a terminal for every job run:
1328
1329         seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1330
1331       This will tell you to run something similar to:
1332
1333         tmux -S /tmp/tmsrPrO0 attach
1334
1335       Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
1336       between windows of the running jobs. When a job is finished it will
1337       pause for 10 seconds before closing the window.
1338
1339   Timing
1340       Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1341       parallel can delay starting new jobs. --delay X will make sure there is
1342       at least X seconds between each start:
1343
1344         parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1345
1346       Output:
1347
1348         Starting 1
1349         Thu Aug 15 16:24:33 CEST 2013
1350         Starting 2
1351         Thu Aug 15 16:24:35 CEST 2013
1352         Starting 3
1353         Thu Aug 15 16:24:38 CEST 2013
1354
1355       If jobs taking more than a certain amount of time are known to fail,
1356       they can be stopped with --timeout. The accuracy of --timeout is 2
1357       seconds:
1358
1359         parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
1360
1361       Output:
1362
1363         2
1364         4
1365
1366       GNU parallel can compute the median runtime for jobs and kill those
1367       that take more than 200% of the median runtime:
1368
1369         parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
1370
1371       Output:
1372
1373         2.1
1374         2.2
1375         3
1376         2.3
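
           The arithmetic behind the percentage can be sketched as follows.
           This is only an illustration: it assumes all five runtimes are
           known up front, whereas a real run can only use the jobs that
           have already finished, so the numbers GNU parallel works with
           during a run may differ.

```shell
# Sketch of the arithmetic behind --timeout 200%, assuming all five
# runtimes are already known (a simplification of a real run).
runtimes="2.1 2.2 3 7 2.3"

# Median: sort numerically and pick the middle value (or the mean of the
# two middle values for an even count). $runtimes is left unquoted on
# purpose so that it is split into one value per line.
median=$(printf '%s\n' $runtimes | LC_ALL=C sort -n |
  awk '{ v[NR] = $1 }
       END { if (NR % 2) print v[(NR + 1) / 2]
             else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }')

# 200% of the median is the cutoff; list the runtimes above it.
cutoff=$(awk -v m="$median" 'BEGIN { print 2 * m }')
killed=$(printf '%s\n' $runtimes |
  awk -v m="$median" '$1 > 2 * m { print $1 }')

echo "median=$median cutoff=$cutoff"
echo "would be killed: $killed"
```

           With these values only the job sleeping 7 seconds is above the
           cutoff, which matches the output above.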
1377
1378   Progress information
1379       Based on the runtime of completed jobs GNU parallel can estimate the
1380       total runtime:
1381
1382         parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1383
1384       Output:
1385
1386         Computers / CPU cores / Max jobs to run
1387         1:local / 2 / 2
1388
1389         Computer:jobs running/jobs completed/%of started jobs/
1390           Average seconds to complete
1391         ETA: 2s 0left 1.11avg  local:0/9/100%/1.1s
1392
1393       GNU parallel can give progress information with --progress:
1394
1395         parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1396
1397       Output:
1398
1399         Computers / CPU cores / Max jobs to run
1400         1:local / 2 / 2
1401
1402         Computer:jobs running/jobs completed/%of started jobs/
1403           Average seconds to complete
1404         local:0/9/100%/1.1s
1405
1406       A progress bar can be shown with --bar:
1407
1408         parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1409
1410       And a graphic bar can be shown with --bar and zenity:
1411
1412         seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1413           2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
1414                zenity --progress --auto-kill --auto-close)
1415
1416       A logfile of the jobs completed so far can be generated with --joblog:
1417
1418         parallel --joblog /tmp/log exit  ::: 1 2 3 0
1419         cat /tmp/log
1420
1421       Output:
1422
1423         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1424         1   :    1376577364.974 0.008   0    0       1       0      exit 1
1425         2   :    1376577364.982 0.013   0    0       2       0      exit 2
1426         3   :    1376577364.990 0.013   0    0       3       0      exit 3
1427         4   :    1376577365.003 0.003   0    0       0       0      exit 0
1428
1429       The log contains the job sequence, which host the job was run on, the
1430       start time and run time, how much data was transferred, the exit value,
1431       the signal that killed the job, and finally the command being run.
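
           The joblog is easy to process with awk. The sketch below lists
           the jobs that did not end cleanly. It assumes the columns are
           separated by TABs and appear in the order shown above; the log
           is built by hand from the listing above so that the sketch runs
           without GNU parallel.

```shell
# Build a small joblog by hand (TAB-separated, values taken from the
# listing above) so the sketch runs without GNU parallel.
joblog=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1376577364.974\t0.008\t0\t0\t1\t0\texit 1\n'
  printf '2\t:\t1376577364.982\t0.013\t0\t0\t2\t0\texit 2\n'
  printf '3\t:\t1376577364.990\t0.013\t0\t0\t3\t0\texit 3\n'
  printf '4\t:\t1376577365.003\t0.003\t0\t0\t0\t0\texit 0\n'
} > "$joblog"

# Column 7 is the exit value, column 8 the signal, column 9 the command.
# Skip the header and print the jobs with a non-zero exit value or signal.
failed=$(awk -F '\t' \
  'NR > 1 && ($7 != 0 || $8 != 0) { print $1 ": " $9 }' "$joblog")
printf '%s\n' "$failed"
rm -f "$joblog"
```

           For a log written by a real run, replace the hand-built file
           with the path given to --joblog, e.g. /tmp/log.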
1432
1433       With a joblog GNU parallel can be stopped and later pick up where it
1434       left off. It is important that the input of the completed jobs is
1435       unchanged.
1436
1437         parallel --joblog /tmp/log exit  ::: 1 2 3 0
1438         cat /tmp/log
1439         parallel --resume --joblog /tmp/log exit  ::: 1 2 3 0 0 0
1440         cat /tmp/log
1441
1442       Output:
1443
1444         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1445         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1446         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1447         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1448         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1449
1450         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1451         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1452         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1453         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1454         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1455         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1456         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1457
1458       Note how the start time of the last 2 jobs clearly differs from the
1459       first 4: they were started in the second run.
1460
1461       With --resume-failed GNU parallel will re-run the jobs that failed:
1462
1463         parallel --resume-failed --joblog /tmp/log exit  ::: 1 2 3 0 0 0
1464         cat /tmp/log
1465
1466       Output:
1467
1468         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1469         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1470         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1471         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1472         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1473         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1474         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1475         1   :    1376580154.433 0.010   0    0       1       0      exit 1
1476         2   :    1376580154.444 0.022   0    0       2       0      exit 2
1477         3   :    1376580154.466 0.005   0    0       3       0      exit 3
1478
1479       Note how seq 1 2 3 have been repeated because they had an exit
1480       value different from 0.
1481
1482       --retry-failed does almost the same as --resume-failed. Where
1483       --resume-failed reads the commands from the command line (and ignores
1484       the commands in the joblog), --retry-failed ignores the command line
1485       and reruns the commands mentioned in the joblog.
1486
1487         parallel --retry-failed --joblog /tmp/log
1488         cat /tmp/log
1489
1490       Output:
1491
1492         Seq Host Starttime      Runtime Send Receive Exitval Signal Command
1493         1   :    1376580069.544 0.008   0    0       1       0      exit 1
1494         2   :    1376580069.552 0.009   0    0       2       0      exit 2
1495         3   :    1376580069.560 0.012   0    0       3       0      exit 3
1496         4   :    1376580069.571 0.005   0    0       0       0      exit 0
1497         5   :    1376580070.028 0.009   0    0       0       0      exit 0
1498         6   :    1376580070.038 0.007   0    0       0       0      exit 0
1499         1   :    1376580154.433 0.010   0    0       1       0      exit 1
1500         2   :    1376580154.444 0.022   0    0       2       0      exit 2
1501         3   :    1376580154.466 0.005   0    0       3       0      exit 3
1502         1   :    1376580164.633 0.010   0    0       1       0      exit 1
1503         2   :    1376580164.644 0.022   0    0       2       0      exit 2
1504         3   :    1376580164.666 0.005   0    0       3       0      exit 3
1505
1506   Termination
1507       Unconditional termination
1508
1509       By default GNU parallel will wait for all jobs to finish before
1510       exiting.
1511
1512       If you send GNU parallel the TERM signal, GNU parallel will stop
1513       spawning new jobs and wait for the remaining jobs to finish. If you
1514       send GNU parallel the TERM signal again, GNU parallel will kill all
1515       running jobs and exit.
1516
1517       Termination dependent on job status
1518
1519       For certain jobs there is no need to continue if one of the jobs fails
1520       and has an exit code different from 0. GNU parallel will stop spawning
1521       new jobs with --halt soon,fail=1:
1522
1523         parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1524
1525       Output:
1526
1527         0
1528         0
1529         1
1530         parallel: This job failed:
1531         echo 1; exit 1
1532         parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1533         2
1534
1535       With --halt now,fail=1 the running jobs will be killed immediately:
1536
1537         parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1538
1539       Output:
1540
1541         0
1542         0
1543         1
1544         parallel: This job failed:
1545         echo 1; exit 1
1546
1547       If --halt is given a percentage this percentage of the jobs must fail
1548       before GNU parallel stops spawning more jobs:
1549
1550         parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1551           ::: 0 1 2 3 4 5 6 7 8 9
1552
1553       Output:
1554
1555         0
1556         1
1557         parallel: This job failed:
1558         echo 1; exit 1
1559         2
1560         parallel: This job failed:
1561         echo 2; exit 2
1562         parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1563         3
1564         parallel: This job failed:
1565         echo 3; exit 3
1566
1567       If you are looking for success instead of failures, you can use
1568       success. This will finish as soon as the first job succeeds:
1569
1570         parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1571
1572       Output:
1573
1574         1
1575         2
1576         3
1577         0
1578         parallel: This job succeeded:
1579         echo 0; exit 0
1580
1581       GNU parallel can retry the command with --retries. This is useful if a
1582       command fails for unknown reasons now and then.
1583
1584         parallel -k --retries 3 \
1585           'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1586         cat /tmp/runs
1587
1588       Output:
1589
1590         completed 1
1591         completed 2
1592         completed 0
1593
1594         tried 1
1595         tried 2
1596         tried 1
1597         tried 2
1598         tried 1
1599         tried 2
1600         tried 0
1601
1602       Note how jobs 1 and 2 were tried 3 times, but 0 was not retried because
1603       it had exit code 0.
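
           For a single job the idea can be sketched in plain shell. The
           retry function below is a hypothetical helper and not how GNU
           parallel implements --retries; it only mirrors the behaviour
           seen above: a failing command is tried at most 3 times, a
           succeeding command only once.

```shell
# Hypothetical helper illustrating the idea behind --retries for one job:
# run a command until it exits with 0, at most $1 times in total.
retry() {
  max=$1
  shift
  attempt=1
  while ! "$@"; do
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
  done
  return 0
}

runs=$(mktemp)
# The job exiting with 1 is tried 3 times, the job exiting with 0 once.
retry 3 sh -c 'echo "tried $1" >> "$2"; exit "$1"' sh 1 "$runs" || true
retry 3 sh -c 'echo "tried $1" >> "$2"; exit "$1"' sh 0 "$runs"
tries=$(cat "$runs")
printf '%s\n' "$tries"
rm -f "$runs"
```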
1604
1605       Termination signals (advanced)
1606
1607       Using --termseq you can control which signals are sent when killing
1608       children. Normally children will be killed by sending them SIGTERM,
1609       waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1610       SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1611       giving up. It looks like this:
1612
1613         show_signals() {
1614           perl -e 'for(keys %SIG) {
1615               $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1616             }
1617             while(1){sleep 1}'
1618         }
1619         export -f show_signals
1620         echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1621           -u --timeout 1 show_signals
1622
1623       Output:
1624
1625         Got TERM
1626         Got TERM
1627         Got TERM
1628
1629       Or just:
1630
1631         echo | parallel -u --timeout 1 show_signals
1632
1633       Output: Same as above.
1634
1635       You can change this to SIGINT, SIGTERM, SIGKILL:
1636
1637         echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1638           -u --timeout 1 show_signals
1639
1640       Output:
1641
1642         Got INT
1643         Got TERM
1644
1645       The SIGKILL does not show because it cannot be caught, and thus the
1646       child dies.
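
           The --termseq list alternates signal names and waiting times in
           milliseconds. What such a list means for a single child can be
           sketched in plain shell. send_termseq is a hypothetical helper,
           not GNU parallel's implementation, and it assumes a sleep that
           accepts fractions of a second.

```shell
# Hypothetical helper, not part of GNU parallel: walk a --termseq style
# list (signal,ms,signal,ms,...) against one process ID.
send_termseq() {
  pid=$1
  # Split e.g. "TERM,200,KILL,25" into the words TERM 200 KILL 25.
  set -- $(printf '%s\n' "$2" | tr ',' ' ')
  while [ "$#" -ge 2 ]; do
    # Stop as soon as the process no longer exists.
    kill -s "$1" "$pid" 2>/dev/null || return 0
    # Wait the given number of milliseconds before the next signal.
    sleep "$(awk -v ms="$2" 'BEGIN { print ms / 1000 }')"
    shift 2
  done
}

sleep 30 &
child=$!
send_termseq "$child" TERM,200,KILL,25
status=0
wait "$child" 2>/dev/null || status=$?
echo "child exit status: $status"
```

           Since sleep does not catch SIGTERM, the child already dies from
           the first signal and the exit status reported by wait reflects
           that signal.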
1647
1648   Limiting the resources
1649       To avoid overloading systems GNU parallel can look at the system load
1650       before starting another job:
1651
1652         parallel --load 100% echo load is less than {} job per cpu ::: 1
1653
1654       Output:
1655
1656         [when the load is less than the number of cpu cores]
1657         load is less than 1 job per cpu
1658
1659       GNU parallel can also check if the system is swapping.
1660
1661         parallel --noswap echo the system is not swapping ::: now
1662
1663       Output:
1664
1665         [when the system is not swapping]
1666         the system is not swapping now
1667
1668       Some jobs need a lot of memory, and should only be started when there
1669       is enough memory free. Using --memfree GNU parallel can check if there
1670       is enough memory free. Additionally, GNU parallel will kill off the
1671       youngest job if the memory free falls below 50% of the size. The killed
1672       job will be put back on the queue and retried later.
1673
1674         parallel --memfree 1G echo will run if more than 1 GB is ::: free
1675
1676       GNU parallel can run the jobs with a nice value. This will work both
1677       locally and remotely.
1678
1679         parallel --nice 17 echo this is being run with nice -n ::: 17
1680
1681       Output:
1682
1683         this is being run with nice -n 17
1684

Remote execution

1686       GNU parallel can run jobs on remote servers. It uses ssh to communicate
1687       with the remote machines.
1688
1689   Sshlogin
1690       The most basic sshlogin is -S host:
1691
1692         parallel -S $SERVER1 echo running on ::: $SERVER1
1693
1694       Output:
1695
1696         running on [$SERVER1]
1697
1698       To use a different username prepend the server with username@:
1699
1700         parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1701
1702       Output:
1703
1704         running on [username@$SERVER1]
1705
1706       The special sshlogin : is the local machine:
1707
1708         parallel -S : echo running on ::: the_local_machine
1709
1710       Output:
1711
1712         running on the_local_machine
1713
1714       If ssh is not in $PATH it can be prepended to $SERVER1:
1715
1716         parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1717
1718       Output:
1719
1720         custom ssh
1721
1722       The ssh command can also be given using --ssh:
1723
1724         parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1725
1726       or by setting $PARALLEL_SSH:
1727
1728         export PARALLEL_SSH=/usr/bin/ssh
1729         parallel -S $SERVER1 echo custom ::: ssh
1730
1731       Several servers can be given using multiple -S:
1732
1733         parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1734
1735       Output (the order may be different):
1736
1737         running
1738         on
1739         more
1740         hosts
1741
1742       Or they can be separated by ,:
1743
1744         parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1745
1746       Output: Same as above.
1747
1748       Or newline:
1749
1750         # This gives a \n between $SERVER1 and $SERVER2
1751         SERVERS="`echo $SERVER1; echo $SERVER2`"
1752         parallel -S "$SERVERS" echo ::: running on more hosts
1753
1754       They can also be read from a file (replace user@ with the user on
1755       $SERVER2):
1756
1757         echo $SERVER1 > nodefile
1758         # Force 4 cores, special ssh-command, username
1759         echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1760         parallel --sshloginfile nodefile echo ::: running on more hosts
1761
1762       Output: Same as above.
1763
1764       Every time a job finishes, the --sshloginfile will be re-read, so it is
1765       possible to both add and remove hosts while running.
1766
1767       The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1768
1769       To force GNU parallel to treat a server as having a given number of CPU
1770       cores prepend the number of cores followed by / to the sshlogin:
1771
1772         parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1773
1774       Output:
1775
1776         force 4 cpus on server
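
           The same ncores/ prefix can be used on the lines of a file given
           to --sshloginfile, so such a file can be generated from a list
           of hosts. A minimal sketch: the host names and the core count
           are made-up placeholders.

```shell
# Sketch: write one sshlogin per line in the form ncores/sshlogin.
# The host names and the core count are made-up placeholders.
nodefile=$(mktemp)
cores=4
for host in node1.example.com node2.example.com; do
  printf '%s/%s\n' "$cores" "$host"
done > "$nodefile"

nodes=$(cat "$nodefile")
printf '%s\n' "$nodes"
rm -f "$nodefile"
```

           A file like this would then be passed with --sshloginfile
           instead of being removed.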
1777
1778       Servers can be put into groups by prepending @groupname to the server
1779       and the group can then be selected by appending @groupname to the
1780       argument if using --hostgroup:
1781
1782         parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1783           ::: run_on_grp1@grp1 run_on_grp2@grp2
1784
1785       Output:
1786
1787         run_on_grp1
1788         run_on_grp2
1789
1790       A host can be in multiple groups by separating the groups with +, and
1791       you can force GNU parallel to limit the groups on which the command can
1792       be run with -S @groupname:
1793
1794         parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1795           ::: run_on_grp1 also_grp1
1796
1797       Output:
1798
1799         run_on_grp1
1800         also_grp1
1801
1802   Transferring files
1803       GNU parallel can transfer the files to be processed to the remote host.
1804       It does that using rsync.
1805
1806         echo This is input_file > input_file
1807         parallel -S $SERVER1 --transferfile {} cat ::: input_file
1808
1809       Output:
1810
1811         This is input_file
1812
1813       If the files are processed into another file, the resulting file can be
1814       transferred back:
1815
1816         echo This is input_file > input_file
1817         parallel -S $SERVER1 --transferfile {} --return {}.out \
1818           cat {} ">"{}.out ::: input_file
1819         cat input_file.out
1820
1821       Output: Same as above.
1822
1823       To remove the input and output file on the remote server use --cleanup:
1824
1825         echo This is input_file > input_file
1826         parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1827           cat {} ">"{}.out ::: input_file
1828         cat input_file.out
1829
1830       Output: Same as above.
1831
1832       There is a shorthand for --transferfile {} --return filename
1833       --cleanup called --trc filename:
1834
1835         echo This is input_file > input_file
1836         parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1837         cat input_file.out
1838
1839       Output: Same as above.
1840
1841       Some jobs need a common database for all jobs. GNU parallel can
1842       transfer that using --basefile which will transfer the file before the
1843       first job:
1844
1845         echo common data > common_file
1846         parallel --basefile common_file -S $SERVER1 \
1847           cat common_file\; echo {} ::: foo
1848
1849       Output:
1850
1851         common data
1852         foo
1853
1854       To remove it from the remote host after the last job use --cleanup.
1855
1856   Working dir
1857       The default working dir on the remote machines is the login dir. This
1858       can be changed with --workdir mydir.
1859
1860       Files transferred using --transferfile and --return will be relative to
1861       mydir on remote computers, and the command will be executed in the dir
1862       mydir.
1863
1864       The special mydir value ... will create working dirs under
1865       ~/.parallel/tmp on the remote computers. If --cleanup is given these
1866       dirs will be removed.
1867
1868       The special mydir value . uses the current working dir.  If the current
1869       working dir is beneath your home dir, the value . is treated as the
1870       relative path to your home dir. This means that if your home dir is
1871       different on remote computers (e.g. if your login is different) the
1872       relative path will still be relative to your home dir.
1873
1874         parallel -S $SERVER1 pwd ::: ""
1875         parallel --workdir . -S $SERVER1 pwd ::: ""
1876         parallel --workdir ... -S $SERVER1 pwd ::: ""
1877
1878       Output:
1879
1880         [the login dir on $SERVER1]
1881         [current dir relative on $SERVER1]
1882         [a dir in ~/.parallel/tmp/...]
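
           The rule for --workdir . can be sketched as a small shell
           function. workdir_for_remote is a hypothetical helper that only
           mirrors the description above; the paths are made-up examples
           and the case where the directory is the home dir itself is not
           covered.

```shell
# Hypothetical helper mirroring the rule described above: a directory
# beneath the home dir is expressed relative to it, anything else is
# kept as an absolute path.
workdir_for_remote() {
  dir=$1
  home=$2
  case $dir in
    "$home"/*)
      rel=${dir#"$home"/}
      printf '%s\n' "$rel"
      ;;
    *)
      printf '%s\n' "$dir"
      ;;
  esac
}

workdir_for_remote /home/alice/project/data /home/alice
workdir_for_remote /srv/data /home/alice
```

           The first call prints a path relative to the home dir, which is
           why it still makes sense on a remote computer where the home dir
           is located elsewhere.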
1883
1884   Avoid overloading sshd
1885       If many jobs are started on the same server, sshd can be overloaded.
1886       GNU parallel can insert a delay between each job run on the same
1887       server:
1888
1889         parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1890
1891       Output (the order may be different):
1892
1893         1
1894         2
1895         3
1896
1897       sshd will be less overloaded if using --controlmaster, which will
1898       multiplex ssh connections:
1899
1900         parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1901
1902       Output: Same as above.
1903
1904   Ignore hosts that are down
1905       In clusters with many hosts a few of them are often down. GNU parallel
1906       can ignore those hosts. In this case the host 173.194.32.46 is down:
1907
1908         parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1909
1910       Output:
1911
1912         bar
1913
1914   Running the same commands on all hosts
1915       GNU parallel can run the same command on all the hosts:
1916
1917         parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1918
1919       Output (the order may be different):
1920
1921         foo
1922         bar
1923         foo
1924         bar
1925
1926       Often you will just want to run a single command on all hosts without
1927       arguments. --nonall is a no argument --onall:
1928
1929         parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1930
1931       Output:
1932
1933         foo bar
1934         foo bar
1935
1936       When --tag is used with --nonall and --onall the --tagstring is the
1937       host:
1938
1939         parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1940
1941       Output (the order may be different):
1942
1943         $SERVER1 foo bar
1944         $SERVER2 foo bar
1945
1946       --jobs sets the number of servers to log in to in parallel.
1947
1948   Transferring environment variables and functions
1949       env_parallel is a shell function that transfers all aliases, functions,
1950       variables, and arrays. You activate it by running:
1951
1952         source `which env_parallel.bash`
1953
1954       Replace bash with the shell you use.
1955
1956       Now you can use env_parallel instead of parallel and still have your
1957       environment:
1958
1959         alias myecho=echo
1960         myvar="Joe's var is"
1961         env_parallel -S $SERVER1 'myecho $myvar' ::: green
1962
1963       Output:
1964
1965         Joe's var is green
1966
1967       The disadvantage is that if your environment is huge env_parallel will
1968       fail.
1969
1970       When env_parallel fails, you can still use --env to tell GNU parallel
1971       to transfer an environment variable to the remote system.
1972
1973         MYVAR='foo bar'
1974         export MYVAR
1975         parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1976
1977       Output:
1978
1979         foo bar baz
1980
1981       This works for functions, too, if your shell is Bash:
1982
1983         # This only works in Bash
1984         my_func() {
1985           echo in my_func $1
1986         }
1987         export -f my_func
1988         parallel --env my_func -S $SERVER1 my_func ::: baz
1989
1990       Output:
1991
1992         in my_func baz
1993
1994       GNU parallel can copy all user defined variables and functions to the
1995       remote system. It just needs to record which ones to ignore in
1996       ~/.parallel/ignored_vars. Do that by running this once:
1997
1998         parallel --record-env
1999         cat ~/.parallel/ignored_vars
2000
2001       Output:
2002
2003         [list of variables to ignore - including $PATH and $HOME]
2004
2005       Now all other variables and functions defined will be copied when using
2006       --env _.
2007
2008         # The function is only copied if using Bash
2009         my_func2() {
2010           echo in my_func2 $VAR $1
2011         }
2012         export -f my_func2
2013         VAR=foo
2014         export VAR
2015
2016         parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2017
2018       Output:
2019
2020         foo
2021         in my_func2 foo bar
2022
2023       If you use env_parallel the variables, functions, and aliases do not
2024       even need to be exported to be copied:
2025
2026         NOT='not exported var'
2027         alias myecho=echo
2028         not_ex() {
2029           myecho in not_exported_func $NOT $1
2030         }
2031         env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2032
2033       Output:
2034
2035         not exported var
2036         in not_exported_func not exported var bar
2037
2038   Showing what is actually run
2039       --verbose will show the command that would be run on the local machine.
2040
2041       When using --cat, --pipepart, or when a job is run on a remote machine,
2042       the command is wrapped with helper scripts. -vv shows all of this.
2043
2044         parallel -vv --pipepart --block 1M wc :::: num30000
2045
2046       Output:
2047
2048         <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2049         $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2050         ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2051         0 0 0 168894 | (wc)
2052           30000   30000  168894
2053
       When the command gets more complex, the output is so hard to read that
       it is only useful for debugging:
2056
2057         my_func3() {
2058           echo in my_func $1 > $1.out
2059         }
2060         export -f my_func3
2061         parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2062           -S $SERVER1 my_func3 {} ::: abc-file
2063
2064       Output will be similar to:
2065
2066         ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2067         --protocol 30 -rlDzR -essh ./abc-file
2068         server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2069         \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2070         eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2071         c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2072         TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2073         BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2074         iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2075         IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2076         0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2077         ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2078         TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2079         YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2080         Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2081         RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2082         MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2083         KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2084         fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2085         W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2086         JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2087         mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2088         dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2089         _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2090         ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2091         server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2092         ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2093         ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2094         2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2095         server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2096         sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2097         ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2098         ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2099         .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2100

Saving output to shell variables (advanced)

2102       GNU parset will set shell variables to the output of GNU parallel. GNU
2103       parset has one important limitation: It cannot be part of a pipe. In
2104       particular this means it cannot read anything from standard input
2105       (stdin) or pipe output to another program.
2106
       To use GNU parset, prepend the command with the destination
       variables:
2108
2109         parset myvar1,myvar2 echo ::: a b
2110         echo $myvar1
2111         echo $myvar2
2112
2113       Output:
2114
2115         a
2116         b
2117
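       As a rough mental model (a sketch, not how parset is implemented), the
       call above leaves the variables in the same state as the sequential
       assignments below; parset simply runs the commands in parallel. Only
       plain shell is used here, so GNU parallel is not needed to try it:

```shell
# Sequential sketch of: parset myvar1,myvar2 echo ::: a b
# Each destination variable receives the output of one job,
# in the same order as the arguments.
myvar1=$(echo a)
myvar2=$(echo b)
echo "$myvar1"
echo "$myvar2"
```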
2118       If you only give a single variable, it will be treated as an array:
2119
2120         parset myarray seq {} 5 ::: 1 2 3
2121         echo "${myarray[1]}"
2122
2123       Output:
2124
2125         2
2126         3
2127         4
2128         5
2129
2130       The commands to run can be an array:
2131
2132         cmd=("echo '<<joe  \"double  space\"  cartoon>>'" "pwd")
2133         parset data ::: "${cmd[@]}"
2134         echo "${data[0]}"
2135         echo "${data[1]}"
2136
2137       Output:
2138
2139         <<joe  "double  space"  cartoon>>
2140         [current dir]
2141

Saving to an SQL base (advanced)

2143       GNU parallel can save into an SQL base. Point GNU parallel to a table
2144       and it will put the joblog there together with the variables and the
2145       output each in their own column.
2146
2147   CSV as SQL base
2148       The simplest is to use a CSV file as the storage table:
2149
2150         parallel --sqlandworker csv:///%2Ftmp/log.csv \
2151           seq ::: 10 ::: 12 13 14
2152         cat /tmp/log.csv
2153
2154       Note how '/' in the path must be written as %2F.
2155
2156       Output will be similar to:
2157
2158         Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2159           Command,V1,V2,Stdout,Stderr
2160         1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2161         11
2162         12
2163         ",
2164         2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2165         11
2166         12
2167         13
2168         ",
2169         3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2170         11
2171         12
2172         13
2173         14
2174         ",
2175
2176       A proper CSV reader (like LibreOffice or R's read.csv) will read this
2177       format correctly - even with fields containing newlines as above.
2178
2179       If the output is big you may want to put it into files using --results:
2180
2181         parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
2182           seq ::: 10 ::: 12 13 14
2183         cat /tmp/log2.csv
2184
2185       Output will be similar to:
2186
2187         Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2188           Command,V1,V2,Stdout,Stderr
2189         1,:,1458824738.287,0.029,0,9,0,0,
2190           "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2191         2,:,1458824738.298,0.025,0,12,0,0,
2192           "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2193         3,:,1458824738.309,0.026,0,15,0,0,
2194           "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2195
2196   DBURL as table
2197       The CSV file is an example of a DBURL.
2198
2199       GNU parallel uses a DBURL to address the table. A DBURL has this
2200       format:
2201
         vendor://[[user][:password]@][host][:port]/[database[/table]]
2203
2204       Example:
2205
2206         mysql://scott:tiger@my.example.com/mydatabase/mytable
2207         postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2208         sqlite3:///%2Ftmp%2Fmydatabase/mytable
2209         csv:///%2Ftmp/log.csv
2210
2211       To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2212       as %2F.
2213
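       The encoding can be scripted. The sketch below uses a hypothetical
       helper, dburl_for_file (not part of GNU parallel), that builds a DBURL
       for the file based vendors. It only encodes /; other special
       characters in a path are not handled:

```shell
# dburl_for_file is a hypothetical helper, not part of GNU parallel.
# Usage: dburl_for_file vendor /path/to/database table
dburl_for_file() {
  # Percent-encode every "/" in the database path; the "/" between
  # database and table stays literal.
  printf '%s:///%s/%s\n' "$1" "$(printf '%s\n' "$2" | sed 's|/|%2F|g')" "$3"
}

dburl_sqlite=$(dburl_for_file sqlite3 /tmp/mydatabase mytable)
dburl_csv=$(dburl_for_file csv /tmp log.csv)
echo "$dburl_sqlite"   # sqlite3:///%2Ftmp%2Fmydatabase/mytable
echo "$dburl_csv"      # csv:///%2Ftmp/log.csv
```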
2214       Run a job using sqlite on mytable in /tmp/mydatabase:
2215
2216         DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2217         DBURLTABLE=$DBURL/mytable
2218         parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2219
2220       To see the result:
2221
2222         sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2223
2224       Output will be similar to:
2225
2226         Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2227           Command|V1|V2|Stdout|Stderr
2228         1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2229         |
2230         2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2231         |
2232         3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2233         |
2234         4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2235         |
2236
2237       The first columns are well known from --joblog. V1 and V2 are data from
2238       the input sources. Stdout and Stderr are standard output and standard
2239       error, respectively.
2240
2241   Using multiple workers
       Using an SQL base as storage costs overhead on the order of 1 second
       per job.
2244
2245       One of the situations where it makes sense is if you have multiple
2246       workers.
2247
2248       You can then have a single master machine that submits jobs to the SQL
2249       base (but does not do any of the work):
2250
2251         parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2252
2253       On the worker machines you run exactly the same command except you
2254       replace --sqlmaster with --sqlworker.
2255
2256         parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2257
2258       To run a master and a worker on the same machine use --sqlandworker as
2259       shown earlier.
2260

--pipe

2262       The --pipe functionality puts GNU parallel in a different mode: Instead
2263       of treating the data on stdin (standard input) as arguments for a
2264       command to run, the data will be sent to stdin (standard input) of the
2265       command.
2266
2267       The typical situation is:
2268
2269         command_A | command_B | command_C
2270
2271       where command_B is slow, and you want to speed up command_B.
2272
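       A minimal sketch of that pattern, assuming GNU parallel is installed:
       seq plays the role of command_A, wc -l stands in for the slow
       command_B, and awk is command_C, adding up the per-chunk line counts:

```shell
# Guard: only run if the parallel on PATH really is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # command_A | parallel --pipe command_B | command_C
  total=$(seq 1000000 | parallel --pipe wc -l |
    awk '{ sum += $1 } END { print sum }')
  echo "$total"
else
  echo "GNU parallel not found; skipping" >&2
fi
```

       Because every input line ends up in exactly one chunk, the summed
       count equals the number of input lines.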
2273   Chunk size
2274       By default GNU parallel will start an instance of command_B, read a
2275       chunk of 1 MB, and pass that to the instance. Then start another
2276       instance, read another chunk, and pass that to the second instance.
2277
2278         cat num1000000 | parallel --pipe wc
2279
2280       Output (the order may be different):
2281
2282         165668  165668 1048571
2283         149797  149797 1048579
2284         149796  149796 1048572
2285         149797  149797 1048579
2286         149797  149797 1048579
2287         149796  149796 1048572
2288          85349   85349  597444
2289
2290       The size of the chunk is not exactly 1 MB because GNU parallel only
2291       passes full lines - never half a line, thus the blocksize is only 1 MB
2292       on average. You can change the block size to 2 MB with --block:
2293
2294         cat num1000000 | parallel --pipe --block 2M wc
2295
2296       Output (the order may be different):
2297
2298         315465  315465 2097150
2299         299593  299593 2097151
2300         299593  299593 2097151
2301          85349   85349  597444
2302
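       The number of jobs in the two runs above follows from the input size.
       As a back-of-the-envelope sketch, the byte counts of the first output
       are added up to get the file size, which is then divided by the block
       size and rounded up. Since GNU parallel only splits at line
       boundaries, this is an estimate rather than a guarantee:

```shell
# Total size: the sum of the byte counts (3rd column) of the 1 MB run.
size=$((1048571 + 1048579 + 1048572 + 1048579 + 1048579 + 1048572 + 597444))

# Round up: blocks = ceil(size / blocksize)
blocks_1m=$(( (size + 1048576 - 1) / 1048576 ))
blocks_2m=$(( (size + 2097152 - 1) / 2097152 ))
echo "$size bytes: $blocks_1m blocks of 1M, $blocks_2m blocks of 2M"
```

       This gives 7 and 4 blocks, matching the number of output lines above.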
2303       GNU parallel treats each line as a record. If the order of records is
2304       unimportant (e.g. you need all lines processed, but you do not care
2305       which is processed first), then you can use --roundrobin. Without
2306       --roundrobin GNU parallel will start a command per block; with
2307       --roundrobin only the requested number of jobs will be started
2308       (--jobs). The records will then be distributed between the running
2309       jobs:
2310
2311         cat num1000000 | parallel --pipe -j4 --roundrobin wc
2312
2313       Output will be similar to:
2314
2315         149797  149797 1048579
2316         299593  299593 2097151
2317         315465  315465 2097150
2318         235145  235145 1646016
2319
       One of the 4 instances got a single block, 2 instances got 2 full
       blocks each, and one instance got 1 full and 1 partial block.
2322
2323   Records
2324       GNU parallel sees the input as records. The default record is a single
2325       line.
2326
2327       Using -N140000 GNU parallel will read 140000 records at a time:
2328
2329         cat num1000000 | parallel --pipe -N140000 wc
2330
2331       Output (the order may be different):
2332
2333         140000  140000  868895
2334         140000  140000  980000
2335         140000  140000  980000
2336         140000  140000  980000
2337         140000  140000  980000
2338         140000  140000  980000
2339         140000  140000  980000
2340          20000   20000  140001
2341
       Note that the last job could not get the full 140000 lines, but only
       20000 lines.
2344
2345       If a record is 75 lines -L can be used:
2346
2347         cat num1000000 | parallel --pipe -L75 wc
2348
2349       Output (the order may be different):
2350
2351         165600  165600 1048095
2352         149850  149850 1048950
2353         149775  149775 1048425
2354         149775  149775 1048425
2355         149850  149850 1048950
2356         149775  149775 1048425
2357          85350   85350  597450
2358             25      25     176
2359
       Note how GNU parallel still reads a block of around 1 MB, but instead
       of passing any number of full lines to wc it passes a multiple of 75
       lines at a time. This of course does not hold for the last job (which
       in this case got 25 lines).
2364
2365   Fixed length records
2366       Fixed length records can be processed by setting --recend '' and
2367       --block recordsize. A header of size n can be processed with --header
2368       .{n}.
2369
2370       Here is how to process a file with a 4-byte header and a 3-byte record
2371       size:
2372
2373         cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2374           'echo start; cat; echo'
2375
2376       Output:
2377
2378         start
2379         HHHHAAA
2380         start
2381         HHHHCCC
2382         start
2383         HHHHBBB
2384
       It may be more efficient to increase --block to a multiple of the
       record size.
2387
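       To see what each job receives in the example above, the split can be
       mimicked with standard tools. This sketch assumes fixedlen contains
       the 13 bytes HHHHAAABBBCCC (consistent with the output shown) and is
       only an illustration, not how GNU parallel implements it:

```shell
data='HHHHAAABBBCCC'                                     # assumed content of fixedlen
header=$(printf '%s\n' "$data" | cut -c1-4)              # --header .{4}: first 4 bytes
records=$(printf '%s\n' "$data" | cut -c5- | fold -w 3)  # --block 3 --recend ''

# Every job sees the header followed by one 3-byte record.
jobs_input=$(for rec in $records; do printf '%s%s\n' "$header" "$rec"; done)
echo "$jobs_input"
```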
2388   Record separators
2389       GNU parallel uses separators to determine where two records split.
2390
2391       --recstart gives the string that starts a record; --recend gives the
2392       string that ends a record. The default is --recend '\n' (newline).
2393
2394       If both --recend and --recstart are given, then the record will only
2395       split if the recend string is immediately followed by the recstart
2396       string.
2397
2398       Here the --recend is set to ', ':
2399
2400         echo /foo, bar/, /baz, qux/, | \
2401           parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2402
2403       Output:
2404
2405         JOB1
2406         /foo, END
2407         JOB2
2408         bar/, END
2409         JOB3
2410         /baz, END
2411         JOB4
2412         qux/,
2413         END
2414
2415       Here the --recstart is set to /:
2416
2417         echo /foo, bar/, /baz, qux/, | \
2418           parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2419
2420       Output:
2421
2422         JOB1
2423         /foo, barEND
2424         JOB2
2425         /, END
2426         JOB3
2427         /baz, quxEND
2428         JOB4
2429         /,
2430         END
2431
2432       Here both --recend and --recstart are set:
2433
2434         echo /foo, bar/, /baz, qux/, | \
2435           parallel -kN1 --recend ', ' --recstart / --pipe \
2436           echo JOB{#}\;cat\;echo END
2437
2438       Output:
2439
2440         JOB1
2441         /foo, bar/, END
2442         JOB2
2443         /baz, qux/,
2444         END
2445
2446       Note the difference between setting one string and setting both
2447       strings.
2448
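       The rule for giving both strings can be mimicked with awk (an
       illustration only, using the same input as above): a split point is
       inserted only where ', ' is immediately followed by '/':

```shell
input='/foo, bar/, /baz, qux/,'
# Replace every ", /" (recend followed by recstart) with
# ", " + newline + "/", so each output line is one record.
split=$(printf '%s\n' "$input" | awk '{ gsub(/, \//, ", \n/"); print }')
printf '%s\n' "$split"
nrecords=$(printf '%s\n' "$split" | wc -l)
echo "records: $((nrecords))"
```

       The ', ' inside '/foo, bar/' and '/baz, qux/' is not followed by '/',
       so those records stay in one piece.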
       With --regexp the --recend and --recstart will be treated as regular
       expressions:
2451
2452         echo foo,bar,_baz,__qux, | \
2453           parallel -kN1 --regexp --recend ,_+ --pipe \
2454           echo JOB{#}\;cat\;echo END
2455
2456       Output:
2457
2458         JOB1
2459         foo,bar,_END
2460         JOB2
2461         baz,__END
2462         JOB3
2463         qux,
2464         END
2465
2466       GNU parallel can remove the record separators with
2467       --remove-rec-sep/--rrs:
2468
2469         echo foo,bar,_baz,__qux, | \
2470           parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2471           echo JOB{#}\;cat\;echo END
2472
2473       Output:
2474
2475         JOB1
2476         foo,barEND
2477         JOB2
2478         bazEND
2479         JOB3
2480         qux,
2481         END
2482
2483   Header
2484       If the input data has a header, the header can be repeated for each job
2485       by matching the header with --header. If headers start with % you can
2486       do this:
2487
2488         cat num_%header | \
2489           parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2490
2491       Output (the order may be different):
2492
2493         JOB1
2494         %head1
2495         %head2
2496         1
2497         2
2498         3
2499         JOB2
2500         %head1
2501         %head2
2502         4
2503         5
2504         6
2505         JOB3
2506         %head1
2507         %head2
2508         7
2509         8
2510         9
2511         JOB4
2512         %head1
2513         %head2
2514         10
2515
2516       If the header is 2 lines, --header 2 will work:
2517
2518         cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2519
2520       Output: Same as above.
2521
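       What --header 2 together with -N3 does to the data can be sketched in
       awk. This only illustrates the semantics (it is sequential, with no
       parallelism); the input is generated inline and assumed to match
       num_%header, i.e. two % lines followed by the numbers 1 to 10:

```shell
emulated=$( { echo %head1; echo %head2; seq 10; } |
  awk -v h=2 -v n=3 '
    # Collect the first h lines as the header.
    NR <= h { hdr = hdr $0 "\n"; next }
    # Every n records a new job starts: print its name and repeat the header.
    (NR - h - 1) % n == 0 { printf "JOB%d\n%s", ++job, hdr }
    # Pass the record itself through.
    { print }
  ')
echo "$emulated"
```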
2522   --pipepart
2523       --pipe is not very efficient. It maxes out at around 500 MB/s.
2524       --pipepart can easily deliver 5 GB/s. But there are a few limitations.
2525       The input has to be a normal file (not a pipe) given by -a or :::: and
2526       -L/-l/-N do not work. --recend and --recstart, however, do work, and
2527       records can often be split on that alone.
2528
2529         parallel --pipepart -a num1000000 --block 3m wc
2530
2531       Output (the order may be different):
2532
2533        444443  444444 3000002
2534        428572  428572 3000004
2535        126985  126984  888890
2536

Shebang

2538   Input data and parallel command in the same file
       GNU parallel is often called like this:
2540
2541         cat input_file | parallel command
2542
2543       With --shebang the input_file and parallel can be combined into the
2544       same script.
2545
2546       UNIX shell scripts start with a shebang line like this:
2547
2548         #!/bin/bash
2549
2550       GNU parallel can do that, too. With --shebang the arguments can be
2551       listed in the file. The parallel command is the first line of the
2552       script:
2553
2554         #!/usr/bin/parallel --shebang -r echo
2555
2556         foo
2557         bar
2558         baz
2559
2560       Output (the order may be different):
2561
2562         foo
2563         bar
2564         baz
2565
2566   Parallelizing existing scripts
       GNU parallel is often called like this:
2568
2569         cat input_file | parallel command
2570         parallel command ::: foo bar
2571
       If command is a script, parallel and the script can be combined into a
       single file, so that this will run the script in parallel:
2574
2575         cat input_file | command
2576         command foo bar
2577
2578       This perl script perl_echo works like echo:
2579
2580         #!/usr/bin/perl
2581
2582         print "@ARGV\n"
2583
       It can be called like this:
2585
2586         parallel perl_echo ::: foo bar
2587
2588       By changing the #!-line it can be run in parallel:
2589
2590         #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2591
2592         print "@ARGV\n"
2593
2594       Thus this will work:
2595
2596         perl_echo foo bar
2597
2598       Output (the order may be different):
2599
2600         foo
2601         bar
2602
2603       This technique can be used for:
2604
2605       Perl:
2606                  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2607
2608                  print "Arguments @ARGV\n";
2609
       Python:
                  #!/usr/bin/parallel --shebang-wrap /usr/bin/python

                  import sys
                  print('Arguments', str(sys.argv))
2615
2616       Bash/sh/zsh/Korn shell:
2617                  #!/usr/bin/parallel --shebang-wrap /bin/bash
2618
2619                  echo Arguments "$@"
2620
2621       csh:
2622                  #!/usr/bin/parallel --shebang-wrap /bin/csh
2623
2624                  echo Arguments "$argv"
2625
2626       Tcl:
2627                  #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2628
2629                  puts "Arguments $argv"
2630
2631       R:
2632                  #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2633
2634                  args <- commandArgs(trailingOnly = TRUE)
2635                  print(paste("Arguments ",args))
2636
2637       GNUplot:
2638                  #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2639
2640                  print "Arguments ", system('echo $ARG')
2641
2642       Ruby:
2643                  #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2644
2645                  print "Arguments "
2646                  puts ARGV
2647
2648       Octave:
2649                  #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2650
2651                  printf ("Arguments");
2652                  arg_list = argv ();
2653                  for i = 1:nargin
2654                    printf (" %s", arg_list{i});
2655                  endfor
2656                  printf ("\n");
2657
2658       Common LISP:
2659                  #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2660
2661                  (format t "~&~S~&" 'Arguments)
2662                  (format t "~&~S~&" *args*)
2663
2664       PHP:
2665                  #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2666                  <?php
2667                  echo "Arguments";
2668                  foreach(array_slice($argv,1) as $v)
2669                  {
2670                    echo " $v";
2671                  }
2672                  echo "\n";
2673                  ?>
2674
2675       Node.js:
2676                  #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2677
2678                  var myArgs = process.argv.slice(2);
2679                  console.log('Arguments ', myArgs);
2680
2681       LUA:
2682                  #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2683
2684                  io.write "Arguments"
2685                  for a = 1, #arg do
2686                    io.write(" ")
2687                    io.write(arg[a])
2688                  end
2689                  print("")
2690
2691       C#:
2692                  #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2693
2694                  var argv = Environment.GetEnvironmentVariable("ARGV");
2695                  print("Arguments "+argv);
2696

Semaphore

2698       GNU parallel can work as a counting semaphore. This is slower and less
2699       efficient than its normal mode.
2700
2701       A counting semaphore is like a row of toilets. People needing a toilet
2702       can use any toilet, but if there are more people than toilets, they
2703       will have to wait for one of the toilets to become available.
2704
2705       An alias for parallel --semaphore is sem.
2706
2707       sem will follow a person to the toilets, wait until a toilet is
2708       available, leave the person in the toilet and exit.
2709
2710       sem --fg will follow a person to the toilets, wait until a toilet is
2711       available, stay with the person in the toilet and exit when the person
2712       exits.
2713
2714       sem --wait will wait for all persons to leave the toilets.
2715
2716       sem does not have a queue discipline, so the next person is chosen
2717       randomly.
2718
2719       -j sets the number of toilets.
2720
2721   Mutex
2722       The default is to have only one toilet (this is called a mutex). The
2723       program is started in the background and sem exits immediately. Use
2724       --wait to wait for all sems to finish:
2725
2726         sem 'sleep 1; echo The first finished' &&
2727           echo The first is now running in the background &&
2728           sem 'sleep 1; echo The second finished' &&
2729           echo The second is now running in the background
2730         sem --wait
2731
2732       Output:
2733
2734         The first is now running in the background
2735         The first finished
2736         The second is now running in the background
2737         The second finished
2738
2739       The command can be run in the foreground with --fg, which will only
2740       exit when the command completes:
2741
2742         sem --fg 'sleep 1; echo The first finished' &&
2743           echo The first finished running in the foreground &&
2744           sem --fg 'sleep 1; echo The second finished' &&
2745           echo The second finished running in the foreground
2746         sem --wait
2747
2748       The difference between this and just running the command, is that a
2749       mutex is set, so if other sems were running in the background only one
2750       would run at a time.
2751
2752       To control which semaphore is used, use --semaphorename/--id. Run this
2753       in one terminal:
2754
2755         sem --id my_id -u 'echo First started; sleep 10; echo First done'
2756
2757       and simultaneously this in another terminal:
2758
2759         sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2760
2761       Note how the second will only be started when the first has finished.
2762
2763   Counting semaphore
2764       A mutex is like having a single toilet: When it is in use everyone else
2765       will have to wait. A counting semaphore is like having multiple
2766       toilets: Several people can use the toilets, but when they all are in
2767       use, everyone else will have to wait.
2768
2769       sem can emulate a counting semaphore. Use --jobs to set the number of
2770       toilets like this:
2771
2772         sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2773         sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2774         sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2775         sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2776         sem --wait --id my_id
2777
2778       Output:
2779
2780         Start 1
2781         Start 2
2782         Start 3
2783         1 done
2784         Start 4
2785         2 done
2786         3 done
2787         4 done
2788
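       A common use of the counting semaphore is limiting concurrency inside
       an ordinary shell loop. The sketch below assumes sem is installed
       together with GNU parallel; the id loop_demo is an arbitrary name
       chosen for this example:

```shell
if command -v sem >/dev/null 2>&1; then
  sem_out=$(
    for i in 1 2 3 4 5 6; do
      # At most 3 of these run at once; sem returns as soon as the
      # job has been started in the background.
      sem --jobs 3 --id loop_demo "sleep 1; echo job $i done"
    done
    # Block until every job started under this id has finished.
    sem --wait --id loop_demo
  )
  echo "$sem_out"
else
  echo "sem (GNU parallel) not found; skipping" >&2
fi
```

       The order of the 'done' lines may differ between runs.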
2789   Timeout
2790       With --semaphoretimeout you can force running the command anyway after
2791       a period (positive number) or give up (negative number):
2792
2793         sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2794         sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2795         sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2796         sem --id foo --wait
2797
2798       Output:
2799
2800         Slow started
2801         parallel: Warning: Semaphore timed out. Stealing the semaphore.
2802         Forced running after 1 sec
2803         parallel: Warning: Semaphore timed out. Exiting.
2804         Slow ended
2805
2806       Note how the 'Give up' was not run.
2807

Informational

2809       GNU parallel has some options to give short information about the
2810       configuration.
2811
2812       --help will print a summary of the most important options:
2813
2814         parallel --help
2815
2816       Output:
2817
2818         Usage:
2819
2820         parallel [options] [command [arguments]] < list_of_arguments
2821         parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2822         cat ... | parallel --pipe [options] [command [arguments]]
2823
2824         -j n            Run n jobs in parallel
2825         -k              Keep same order
2826         -X              Multiple arguments with context replace
2827         --colsep regexp Split input on regexp for positional replacements
2828         {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2829         {3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
2830         With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2831                         {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2832
2833         -S sshlogin     Example: foo@server.example.com
2834         --slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
2835         --trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
2836         --onall         Run the given command with argument on all sshlogins
2837         --nonall        Run the given command with no arguments on all sshlogins
2838
2839         --pipe          Split stdin (standard input) to multiple jobs.
2840         --recend str    Record end separator for --pipe.
2841         --recstart str  Record start separator for --pipe.
2842
2843         See 'man parallel' for details
2844
2845         Academic tradition requires you to cite works you base your article on.
2846         When using programs that use GNU Parallel to process data for publication
2847         please cite:
2848
2849           O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2850           ;login: The USENIX Magazine, February 2011:42-47.
2851
2852         This helps funding further development; AND IT WON'T COST YOU A CENT.
2853         If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2854
2855       When asking for help, always report the full output of this:
2856
2857         parallel --version
2858
2859       Output:
2860
2861         GNU parallel 20210122
2862         Copyright (C) 2007-2022 Ole Tange, http://ole.tange.dk and Free Software
2863         Foundation, Inc.
2864         License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
2865         This is free software: you are free to change and redistribute it.
2866         GNU parallel comes with no warranty.
2867
2868         Web site: https://www.gnu.org/software/parallel
2869
2870         When using programs that use GNU Parallel to process data for publication
2871         please cite as described in 'parallel --citation'.
2872
2873       In scripts --minversion can be used to ensure the user has at least
2874       this version:
2875
2876         parallel --minversion 20130722 && \
2877           echo Your version is at least 20130722.
2878
2879       Output:
2880
2881         20160322
2882         Your version is at least 20130722.
2883
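       In a script the check is usually combined with a test for whether GNU
       parallel is installed at all. A sketch, using the hypothetical helper
       name require_parallel and the minimum version from the prerequisites
       of this tutorial:

```shell
# require_parallel is a hypothetical helper name used for this sketch.
# Returns 0 if GNU parallel is installed and is at least version $1.
require_parallel() {
  if ! command -v parallel >/dev/null 2>&1; then
    echo "GNU parallel is not installed" >&2
    return 1
  fi
  # --minversion prints the installed version; hide it, keep the exit code.
  if ! parallel --minversion "$1" >/dev/null 2>&1; then
    echo "GNU parallel >= $1 is required" >&2
    return 1
  fi
}

if require_parallel 20160822; then
  echo "GNU parallel is new enough for this tutorial"
fi
```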
2884       If you are using GNU parallel for research the BibTeX citation can be
2885       generated using --citation:
2886
2887         parallel --citation
2888
2889       Output:
2890
2891         Academic tradition requires you to cite works you base your article on.
2892         When using programs that use GNU Parallel to process data for publication
2893         please cite:
2894
2895         @article{Tange2011a,
2896           title = {GNU Parallel - The Command-Line Power Tool},
2897           author = {O. Tange},
2898           address = {Frederiksberg, Denmark},
2899           journal = {;login: The USENIX Magazine},
2900           month = {Feb},
2901           number = {1},
2902           volume = {36},
2903           url = {https://www.gnu.org/s/parallel},
2904           year = {2011},
2905           pages = {42-47},
2906           doi = {10.5281/zenodo.16303}
2907         }
2908
2909         (Feel free to use \nocite{Tange2011a})
2910
2911         This helps funding further development; AND IT WON'T COST YOU A CENT.
2912         If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2913
2914         If you send a copy of your published article to tange@gnu.org, it will be
2915         mentioned in the release notes of next version of GNU Parallel.
2916
2917       With --max-line-length-allowed GNU parallel will report the maximal
2918       size of the command line:
2919
2920         parallel --max-line-length-allowed
2921
2922       Output (may vary on different systems):
2923
2924         131071
2925
2926       --number-of-cpus and --number-of-cores run system specific code to
2927       determine the number of CPUs and CPU cores on the system. On
2928       unsupported platforms they will return 1:
2929
             parallel --number-of-cpus
             parallel --number-of-cores
2932
2933       Output (may vary on different systems):
2934
             4
             64
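
           A script that wants a usable number even where GNU parallel is
           missing could wrap the call as sketched below. The variable name
           cores and the getconf _NPROCESSORS_ONLN fallback are assumptions
           for illustration; that getconf value counts online processors,
           which may include hardware threads rather than physical cores:

```shell
# Try GNU parallel first; keep only a purely numeric answer.
cores=""
if command -v parallel >/dev/null 2>&1; then
  cores=$(parallel --number-of-cores </dev/null 2>/dev/null) || cores=""
fi

# Assumed fallback: online processors as reported by getconf.
case "$cores" in
  ''|*[!0-9]*) cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores="" ;;
esac

# Last resort: 1, mirroring what the options return on unsupported platforms.
case "$cores" in
  ''|*[!0-9]*) cores=1 ;;
esac

echo "Detected cores: $cores"
```

           The result could then be passed on explicitly, for example with
           parallel -j "$cores".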
2937

Profiles

2939       The defaults for GNU parallel can be changed systemwide by putting the
2940       command line options in /etc/parallel/config. They can be changed for a
2941       user by putting them in ~/.parallel/config.
2942
2943       Profiles work the same way, but have to be referred to with --profile:
2944
             echo '--nice 17' > ~/.parallel/nicetimeout
             echo '--timeout 300%' >> ~/.parallel/nicetimeout
             parallel --profile nicetimeout echo ::: A B C
2948
2949       Output:
2950
             A
             B
             C
2954
2955       Profiles can be combined:
2956
             echo '-vv --dry-run' > ~/.parallel/dryverbose
             parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2959
2960       Output:
2961
             echo A
             echo B
             echo C
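
           To experiment with profiles without touching your real
           ~/.parallel, the two examples above can be replayed in a
           throwaway directory. This is only a sketch: demo_home and
           combined are hypothetical names, and it assumes that GNU parallel
           looks up profiles in $PARALLEL_HOME when that variable is set:

```shell
# Throwaway config directory so the real ~/.parallel stays untouched.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.parallel"

# Same profiles as above, one option per line.
printf '%s\n' '--nice 17' '--timeout 300%' > "$demo_home/.parallel/nicetimeout"
printf '%s\n' '-vv --dry-run' > "$demo_home/.parallel/dryverbose"

# Show the options the two profiles contribute, in the order they are named.
combined=$(cat "$demo_home/.parallel/dryverbose" \
               "$demo_home/.parallel/nicetimeout" | paste -sd ' ' -)
echo "Options from both profiles: $combined"

# Run the combined profiles only if parallel is installed. Errors are
# tolerated because the PARALLEL_HOME lookup is an assumption here.
if command -v parallel >/dev/null 2>&1; then
  PARALLEL_HOME="$demo_home/.parallel" \
    parallel --profile dryverbose --profile nicetimeout \
    echo ::: A B C </dev/null || true
fi

rm -rf "$demo_home"
```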
2965

Spread the word

2967       I hope you have learned something from this tutorial.
2968
2969       If you like GNU parallel:
2970
2971       • (Re-)walk through the tutorial if you have not done so in the past
2972         year (https://www.gnu.org/software/parallel/parallel_tutorial.html)
2973
2974       • Give a demo to your local user group, your team, or your colleagues
2975
2976       • Post the intro videos and the tutorial on Reddit, Mastodon,
2977         Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2978         LinkedIn, and mailing lists
2979
2980       • Request or write a review for your favourite blog or magazine
2981         (especially if you do something cool with GNU parallel)
2982
2983       • Invite me to your next conference
2984
2985       If you use GNU parallel for research:
2986
2987       • Please cite GNU parallel in your publications (use --citation)
2988
2989       If GNU parallel saves you money:
2990
2991       • (Have your company) donate to FSF or become a member
2992         https://my.fsf.org/donate/
2993
2994       (C) 2013-2022 Ole Tange, GFDLv1.3+ (See LICENSES/GFDL-1.3-or-later.txt)
2995
2996
2997
20220422                          2022-05-22              PARALLEL_TUTORIAL(7)