PARALLEL_TUTORIAL(7)             parallel             PARALLEL_TUTORIAL(7)


GNU Parallel Tutorial
       This tutorial shows off much of GNU parallel's functionality. It is
       meant to teach the options and syntax of GNU parallel; it is not
       meant to show realistic examples from the real world.

   Reader's guide
       If you prefer reading a book, buy GNU Parallel 2018 at
       https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014

       Otherwise start by watching the intro videos for a quick introduction:
       https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       Then browse through the EXAMPLEs after the list of OPTIONS in man
       parallel (use LESS=+/EXAMPLE: man parallel). That will give you an
       idea of what GNU parallel is capable of.

       If you want to dive even deeper: spend a couple of hours walking
       through the tutorial (man parallel_tutorial). Your command line will
       love you for it.

       Finally, you may want to look at the rest of the manual (man parallel)
       if you have special needs not already covered.

       If you want to know the design decisions behind GNU parallel, try: man
       parallel_design. This is also a good intro if you intend to change GNU
       parallel.

Prerequisites
       To run this tutorial you must have the following:

       parallel >= version 20160822
           Install the newest version using your package manager
           (recommended for security reasons), the way described in
           README, or with this command:

               $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
                  fetch -o - http://pi.dk/3 ) > install.sh
               $ sha1sum install.sh
               12345678 3374ec53 bacb199b 245af2dd a86df6c9
               $ md5sum install.sh
               029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
               $ sha512sum install.sh
               40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
               60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
               $ bash install.sh

           This will also install the newest version of the tutorial, which
           you can read by running:

               man parallel_tutorial

           Most of the tutorial will work on older versions, too.

       abc-file
           The file can be generated by this command:

               parallel -k echo ::: A B C > abc-file

       def-file
           The file can be generated by this command:

               parallel -k echo ::: D E F > def-file

       abc0-file
           The file can be generated by this command:

               perl -e 'printf "A\0B\0C\0"' > abc0-file

       abc_-file
           The file can be generated by this command:

               perl -e 'printf "A_B_C_"' > abc_-file

       tsv-file.tsv
           The file can be generated by this command:

               perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

       num8
           The file can be generated by this command:

               perl -e 'for(1..8){print "$_\n"}' > num8

       num128
           The file can be generated by this command:

               perl -e 'for(1..128){print "$_\n"}' > num128

       num30000
           The file can be generated by this command:

               perl -e 'for(1..30000){print "$_\n"}' > num30000

       num1000000
           The file can be generated by this command:

               perl -e 'for(1..1000000){print "$_\n"}' > num1000000

       num_%header
           The file can be generated by this command:

               (echo %head1; echo %head2; \
                perl -e 'for(1..10){print "$_\n"}') > num_%header

       fixedlen
           The file can be generated by this command:

               perl -e 'print "HHHHAAABBBCCC"' > fixedlen

       For remote running
           ssh login without a password must work on 2 servers, whose names
           are stored in $SERVER1 and $SERVER2:

               SERVER1=server.example.com
               SERVER2=server2.example.net

           So you must be able to do this without entering a password:

               ssh $SERVER1 echo works
               ssh $SERVER2 echo works

           It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id
           $SERVER1; ssh-copy-id $SERVER2' and using an empty passphrase, or
           you can use ssh-agent.
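       If GNU parallel is not installed yet, the input files above can also be
       created without it. A minimal sketch in Bash, assuming GNU coreutils
       seq (some BSD versions of seq may print large numbers as 1e+06); the
       function name make_tutorial_files is hypothetical:

```bash
#!/usr/bin/env bash
# Minimal sketch: create the tutorial's input files in the current directory.
# make_tutorial_files is a hypothetical helper, not part of GNU parallel.
set -euo pipefail

make_tutorial_files() {
  printf '%s\n' A B C > abc-file            # same content as: parallel -k echo ::: A B C
  printf '%s\n' D E F > def-file            # same content as: parallel -k echo ::: D E F
  printf 'A\0B\0C\0' > abc0-file            # NUL-terminated values
  printf 'A_B_C_' > abc_-file               # _-terminated values
  printf 'f1\tf2\nA\tB\nC\tD\n' > tsv-file.tsv
  seq 8 > num8
  seq 128 > num128
  seq 30000 > num30000
  seq 1000000 > num1000000
  { echo %head1; echo %head2; seq 10; } > num_%header
  printf 'HHHHAAABBBCCC' > fixedlen         # no trailing newline
}

make_tutorial_files
```

       Run it in a scratch directory, as it overwrites files with these names.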

Input
       GNU parallel reads input from input sources. These can be files, the
       command line, and stdin (standard input or a pipe).

   A single input source
       Input can be read from the command line:

           parallel echo ::: A B C

       Output (the order may be different because the jobs are run in
       parallel):

           A
           B
           C

       The input source can be a file:

           parallel -a abc-file echo

       Output: Same as above.

       STDIN (standard input) can be the input source:

           cat abc-file | parallel echo

       Output: Same as above.

   Multiple input sources
       GNU parallel can take multiple input sources given on the command line.
       GNU parallel then generates all combinations of the input sources:

           parallel echo ::: A B C ::: D E F

       Output (the order may be different):

           A D
           A E
           A F
           B D
           B E
           B F
           C D
           C E
           C F
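       The combinations can be pictured as nested loops, one loop per input
       source, with the last input source varying fastest. A minimal Bash
       illustration of which arguments are generated; unlike GNU parallel it
       runs nothing in parallel, and combine is a hypothetical name:

```bash
#!/usr/bin/env bash
# Illustration only: list the combinations generated by
#   parallel echo ::: A B C ::: D E F
set -euo pipefail

src1=(A B C)
src2=(D E F)

combine() {
  local a b
  for a in "${src1[@]}"; do      # first input source: outer loop
    for b in "${src2[@]}"; do    # last input source: varies fastest
      echo "$a $b"               # one job per combination
    done
  done
}

combine
```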

       The input sources can be files:

           parallel -a abc-file -a def-file echo

       Output: Same as above.

       STDIN (standard input) can be one of the input sources using -:

           cat abc-file | parallel -a - -a def-file echo

       Output: Same as above.

       Instead of -a, files can be given after the :::: separator:

           cat abc-file | parallel echo :::: - def-file

       Output: Same as above.

       ::: and :::: can be mixed:

           parallel echo ::: A B C :::: def-file

       Output: Same as above.

       Linking arguments from input sources

       With --link you can link the input sources and get one argument from
       each input source:

           parallel --link echo ::: A B C ::: D E F

       Output (the order may be different):

           A D
           B E
           C F

       If one of the input sources is too short, its values will wrap:

           parallel --link echo ::: A B C D E ::: F G

       Output (the order may be different):

           A F
           B G
           C F
           D G
           E F
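       The wrapping can be pictured as indexing each input source modulo its
       length, for as many jobs as the longest source has values. A minimal
       Bash illustration (local -n needs Bash 4.3 or later); link_wrap is a
       hypothetical name, and the pairs are produced one after another:

```bash
#!/usr/bin/env bash
# Illustration only: the argument pairs generated by
#   parallel --link echo ::: A B C D E ::: F G
set -euo pipefail

link_wrap() {
  local -n s1=$1 s2=$2                 # name references to the two arrays
  local n1=${#s1[@]} n2=${#s2[@]} i
  (( n1 > 0 && n2 > 0 )) || return 0   # nothing to link
  local n=$(( n1 > n2 ? n1 : n2 ))     # one job per value of the longest source
  for (( i = 0; i < n; i++ )); do
    echo "${s1[i % n1]} ${s2[i % n2]}" # the shorter source wraps around
  done
}

src1=(A B C D E)
src2=(F G)
link_wrap src1 src2
```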

       For more flexible linking you can use :::+ and ::::+. They work like
       ::: and :::: except they link the previous input source to this input
       source.

       This will link ABC to GHI:

           parallel echo :::: abc-file :::+ G H I :::: def-file

       Output (the order may be different):

           A G D
           A G E
           A G F
           B H D
           B H E
           B H F
           C I D
           C I E
           C I F

       This will link GHI to DEF:

           parallel echo :::: abc-file ::: G H I ::::+ def-file

       Output (the order may be different):

           A G D
           A H E
           A I F
           B G D
           B H E
           B I F
           C G D
           C H E
           C I F

       If one of the input sources is too short when using :::+ or ::::+, the
       rest will be ignored:

           parallel echo ::: A B C D E :::+ F G

       Output (the order may be different):

           A F
           B G

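       A minimal Bash illustration of the first :::+ example: the two linked
       sources are paired up by position (stopping at the shorter one), and
       each pair is then combined with every value of the unlinked source.
       The array names and link_then_combine are hypothetical:

```bash
#!/usr/bin/env bash
# Illustration only: the arguments generated by
#   parallel echo :::: abc-file :::+ G H I :::: def-file
set -euo pipefail

abc=(A B C)
ghi=(G H I)
def=(D E F)

link_then_combine() {
  # linked pair: stop at the shorter source (values beyond it are ignored)
  local n=$(( ${#abc[@]} < ${#ghi[@]} ? ${#abc[@]} : ${#ghi[@]} ))
  local i d
  for (( i = 0; i < n; i++ )); do
    for d in "${def[@]}"; do            # unlinked source: all combinations
      echo "${abc[i]} ${ghi[i]} $d"
    done
  done
}

link_then_combine
```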
   Changing the argument separator
       GNU parallel can use other separators than ::: or ::::. This is
       typically useful if ::: or :::: is used in the command to run:

           parallel --arg-sep ,, echo ,, A B C :::: def-file

       Output (the order may be different):

           A D
           A E
           A F
           B D
           B E
           B F
           C D
           C E
           C F

       Changing the argument file separator:

           parallel --arg-file-sep // echo ::: A B C // def-file

       Output: Same as above.

   Changing the argument delimiter
       GNU parallel will normally treat a full line as a single argument: it
       uses \n as the argument delimiter. This can be changed with -d:

           parallel -d _ echo :::: abc_-file

       Output (the order may be different):

           A
           B
           C

       NUL can be given as \0:

           parallel -d '\0' echo :::: abc0-file

       Output: Same as above.

       A shorthand for -d '\0' is -0 (this will often be used to read files
       from find ... -print0):

           parallel -0 echo :::: abc0-file

       Output: Same as above.
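       The same splitting can be done in a plain Bash loop, which shows what
       -d and -0 change: only the character that ends each argument. A
       minimal sketch using the -d option of the read builtin (an empty
       delimiter means NUL); split_on is a hypothetical name, and the loop
       handles the arguments one after another rather than in parallel:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Read arguments from stdin, each terminated by the delimiter in $1
# (empty string = NUL), and run one "job" (here: echo) per argument.
# Note: a final value without a trailing delimiter is not read by this loop.
split_on() {
  local delim=$1 arg
  while IFS= read -r -d "$delim" arg; do
    echo "$arg"
  done
}

printf 'A_B_C_' | split_on _          # like: parallel -d _ echo :::: abc_-file
printf 'A\0B\0C\0' | split_on ''      # like: parallel -0 echo :::: abc0-file
```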

   End-of-file value for input source
       GNU parallel can stop reading when it encounters a certain value:

           parallel -E stop echo ::: A B stop C D

       Output:

           A
           B

   Skipping empty lines
       With --no-run-if-empty GNU parallel will skip empty lines:

           (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

       Output:

           1
           2

Building the command line
   No command means arguments are commands
       If no command is given after parallel, the arguments themselves are
       treated as commands:

           parallel ::: ls 'echo foo' pwd

       Output (the order may be different):

           [list of files in current dir]
           foo
           [/path/to/current/working/dir]

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

           # Only works in Bash
           my_func() {
             echo in my_func $1
           }
           export -f my_func
           parallel my_func ::: 1 2 3

       Output (the order may be different):

           in my_func 1
           in my_func 2
           in my_func 3

   Replacement strings
       The 7 predefined replacement strings

       GNU parallel has several replacement strings. If no replacement strings
       are used, the default is to append {}:

           parallel echo ::: A/B.C

       Output:

           A/B.C

       The default replacement string is {}:

           parallel echo {} ::: A/B.C

       Output:

           A/B.C

       The replacement string {.} removes the extension:

           parallel echo {.} ::: A/B.C

       Output:

           A/B

       The replacement string {/} removes the path:

           parallel echo {/} ::: A/B.C

       Output:

           B.C

       The replacement string {//} keeps only the path:

           parallel echo {//} ::: A/B.C

       Output:

           A

       The replacement string {/.} removes the path and the extension:

           parallel echo {/.} ::: A/B.C

       Output:

           B

       The replacement string {#} gives the job number:

           parallel echo {#} ::: A B C

       Output (the order may be different):

           1
           2
           3

       The replacement string {%} gives the job slot number (between 1 and
       the number of jobs to run in parallel):

           parallel -j 2 echo {%} ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

           1
           2
           1
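       For a single argument, the four path-related replacement strings can be
       reproduced in Bash. A minimal sketch: the regular expression mirrors
       the Perl substitution s:\.[^/.]+$:: that GNU parallel uses for {.}, and
       dirname(1) stands in for the Perl dirname() used for {//}; the
       function names are hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

no_ext() {                                  # like {.}: remove the extension
  local f=$1
  if [[ $f =~ ^(.*)\.[^/.]+$ ]]; then       # a dot in a directory name is kept
    f=${BASH_REMATCH[1]}
  fi
  printf '%s\n' "$f"
}
base() { printf '%s\n' "${1##*/}"; }        # like {/}: remove the path
dir_only() { dirname -- "$1"; }             # like {//}: keep only the path
base_no_ext() { no_ext "$(base "$1")"; }    # like {/.}: remove path and extension

arg=A/B.C
echo "$(no_ext "$arg") $(base "$arg") $(dir_only "$arg") $(base_no_ext "$arg")"
```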

       Changing the replacement strings

       The replacement string {} can be changed with -I:

           parallel -I ,, echo ,, ::: A/B.C

       Output:

           A/B.C

       The replacement string {.} can be changed with --extensionreplace:

           parallel --extensionreplace ,, echo ,, ::: A/B.C

       Output:

           A/B

       The replacement string {/} can be changed with --basenamereplace:

           parallel --basenamereplace ,, echo ,, ::: A/B.C

       Output:

           B.C

       The replacement string {//} can be changed with --dirnamereplace:

           parallel --dirnamereplace ,, echo ,, ::: A/B.C

       Output:

           A

       The replacement string {/.} can be changed with
       --basenameextensionreplace:

           parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

       Output:

           B

       The replacement string {#} can be changed with --seqreplace:

           parallel --seqreplace ,, echo ,, ::: A B C

       Output (the order may be different):

           1
           2
           3

       The replacement string {%} can be changed with --slotreplace:

           parallel -j2 --slotreplace ,, echo ,, ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

           1
           2
           1

       Perl expression replacement string

       When predefined replacement strings are not flexible enough, a perl
       expression can be used instead. One example is to remove two
       extensions: foo.tar.gz becomes foo

           parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

       Output:

           foo

       In {= =} you can access all of GNU parallel's internal functions and
       variables. A few are worth mentioning.

       total_jobs() returns the total number of jobs:

           parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

       Output:

           Job 1 of 5
           Job 2 of 5
           Job 3 of 5
           Job 4 of 5
           Job 5 of 5

       Q(...) shell quotes the string:

           parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

       Output:

           */!#$ shell quoted is \*/\!\#\$

       skip() skips the job:

           parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

       Output:

           1
           2
           4
           5

       @arg contains the input source variables:

           parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
             ::: {1..3} ::: {1..3}

       Output:

           1 2
           1 3
           2 1
           2 3
           3 1
           3 2

       If the strings {= and =} cause problems, they can be replaced with
       --parens:

           parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
             ::: foo.tar.gz

       Output:

           foo

       To define a shorthand replacement string use --rpl:

           parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
             ::: foo.tar.gz

       Output: Same as above.

       If the shorthand starts with { it can be used as a positional
       replacement string, too:

           parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
             ::: foo.tar.gz

       Output: Same as above.

       If the shorthand contains matching parentheses, the replacement string
       becomes a dynamic replacement string and the string matched inside the
       parentheses can be accessed as $$1. If there are multiple pairs of
       matching parentheses, the matched strings can be accessed using $$2,
       $$3 and so on.

       You can think of this as giving arguments to the replacement string.
       Here we give the argument .tar.gz to the replacement string {%string}
       which removes string:

           parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

       Output:

           foo.zip

       Here we give the two arguments tar.gz and zip to the replacement string
       {/string1/string2} which replaces string1 with string2:

           parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
             ::: foo.tar.gz

       Output:

           foo.zip

       GNU parallel's 7 replacement strings are implemented like this:

           --rpl '{} '
           --rpl '{#} $_=$job->seq()'
           --rpl '{%} $_=$job->slot()'
           --rpl '{/} s:.*/::'
           --rpl '{//} $Global::use{"File::Basename"} ||=
                    eval "use File::Basename; 1;"; $_ = dirname($_);'
           --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
           --rpl '{.} s:\.[^/.]+$::'

       Positional replacement strings

       With multiple input sources the argument from the individual input
       sources can be accessed with {number}:

           parallel echo {1} and {2} ::: A B ::: C D

       Output (the order may be different):

           A and C
           A and D
           B and C
           B and D

       The positional replacement strings can also be modified using /, //,
       /., and .:

           parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

       Output (the order may be different):

           /=B.C //=A /.=B .=A/B
           /=E.F //=D /.=E .=D/E

       If a position is negative, it will refer to the input source counted
       from behind:

           parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
             ::: A B ::: C D ::: E F

       Output (the order may be different):

           1=A 2=C 3=E -1=E -2=C -3=A
           1=A 2=C 3=F -1=F -2=C -3=A
           1=A 2=D 3=E -1=E -2=D -3=A
           1=A 2=D 3=F -1=F -2=D -3=A
           1=B 2=C 3=E -1=E -2=C -3=B
           1=B 2=C 3=F -1=F -2=C -3=B
           1=B 2=D 3=E -1=E -2=D -3=B
           1=B 2=D 3=F -1=F -2=D -3=B
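       In Bash terms, the arguments of one job behave like an array that can
       be indexed from both ends. A minimal sketch for the first job above
       (negative array subscripts need Bash 4.3 or later, and Bash counts
       from 0 where GNU parallel counts from 1); positions is a hypothetical
       name:

```bash
#!/usr/bin/env bash
set -euo pipefail

positions() {
  local args=("$@")
  # {1} is element 0; {-1} is the last element
  echo "1=${args[0]} 2=${args[1]} 3=${args[2]}" \
       "-1=${args[-1]} -2=${args[-2]} -3=${args[-3]}"
}

positions A C E   # arguments of the first job in the example above
```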

       Positional perl expression replacement string

       To use a perl expression as a positional replacement string, simply
       put the position number and a space in front of the perl expression:

           parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
             ::: bar ::: foo.tar.gz

       Output:

           foo bar

       If a shorthand defined using --rpl starts with { it can be used as a
       positional replacement string, too:

           parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
             ::: bar ::: foo.tar.gz

       Output: Same as above.

       Input from columns

       The columns in a file can be bound to positional replacement strings
       using --colsep. Here the columns are separated by TAB (\t):

           parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

       Output (the order may be different):

           1=f1 2=f2
           1=A 2=B
           1=C 2=D
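       The column binding corresponds to splitting each line on TAB before
       running the command. A minimal Bash sketch that handles two columns
       (with read, any extra columns would end up in the last variable);
       cols is a hypothetical name and the lines are processed one after
       another:

```bash
#!/usr/bin/env bash
# Illustration only: like  parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
set -euo pipefail

cols() {
  local c1 c2
  while IFS=$'\t' read -r c1 c2; do   # split the line on TAB
    echo "1=$c1 2=$c2"
  done
}

printf 'f1\tf2\nA\tB\nC\tD\n' | cols
```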

       Header defined replacement strings

       With --header GNU parallel will use the first value of the input source
       as the name of the replacement string. Only the non-modified version {}
       is supported:

           parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

       Output (the order may be different):

           f1=A f2=C
           f1=A f2=D
           f1=B f2=C
           f1=B f2=D

       It is useful with --colsep for processing files with TAB separated
       values:

           parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
             :::: tsv-file.tsv

       Output (the order may be different):

           f1=A f2=B
           f1=C f2=D

       More pre-defined replacement strings with --plus

       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...}
       {/..} {/...} {##}. The idea is that {+foo} matches the opposite of
       {foo}, so that {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
       {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}:

           parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
           parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

       Output (the same for all eight commands):

           dir/sub/file.ex1.ex2.ex3

       {##} is simply the total number of jobs:

           parallel --plus echo Job {#} of {##} ::: {1..5}

       Output:

           Job 1 of 5
           Job 2 of 5
           Job 3 of 5
           Job 4 of 5
           Job 5 of 5

       Dynamic replacement strings with --plus

       --plus also defines these dynamic replacement strings:

       {:-string}          Default value is string if the argument is empty.

       {:number}           Substring from number till end of string.

       {:number1:number2}  Substring of number2 characters starting at
                           number1.

       {#string}           If the argument starts with string, remove it.

       {%string}           If the argument ends with string, remove it.

       {/string1/string2}  Replace string1 with string2.

       {^string}           If the argument starts with string, upper case it.
                           string must be a single letter.

       {^^string}          If the argument contains string, upper case it.
                           string must be a single letter.

       {,string}           If the argument starts with string, lower case it.
                           string must be a single letter.

       {,,string}          If the argument contains string, lower case it.
                           string must be a single letter.

       They are inspired by Bash:

           unset myvar
           echo ${myvar:-myval}
           parallel --plus echo {:-myval} ::: "$myvar"

           myvar=abcAaAdef
           echo ${myvar:2}
           parallel --plus echo {:2} ::: "$myvar"

           echo ${myvar:2:3}
           parallel --plus echo {:2:3} ::: "$myvar"

           echo ${myvar#bc}
           parallel --plus echo {#bc} ::: "$myvar"
           echo ${myvar#abc}
           parallel --plus echo {#abc} ::: "$myvar"

           echo ${myvar%de}
           parallel --plus echo {%de} ::: "$myvar"
           echo ${myvar%def}
           parallel --plus echo {%def} ::: "$myvar"

           echo ${myvar/def/ghi}
           parallel --plus echo {/def/ghi} ::: "$myvar"

           echo ${myvar^a}
           parallel --plus echo {^a} ::: "$myvar"
           echo ${myvar^^a}
           parallel --plus echo {^^a} ::: "$myvar"

           myvar=AbcAaAdef
           echo ${myvar,A}
           parallel --plus echo '{,A}' ::: "$myvar"
           echo ${myvar,,A}
           parallel --plus echo '{,,A}' ::: "$myvar"

       Output:

           myval
           myval
           cAaAdef
           cAaAdef
           cAa
           cAa
           abcAaAdef
           abcAaAdef
           AaAdef
           AaAdef
           abcAaAdef
           abcAaAdef
           abcAaA
           abcAaA
           abcAaAghi
           abcAaAghi
           AbcAaAdef
           AbcAaAdef
           AbcAAAdef
           AbcAAAdef
           abcAaAdef
           abcAaAdef
           abcaaadef
           abcaaadef

   More than one argument
       With --xargs GNU parallel will fit as many arguments as possible on a
       single line:

           cat num30000 | parallel --xargs echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

           2

       The 30000 arguments fitted on 2 lines.

       The maximal length of a single line can be set with -s. With a maximal
       line length of 10000 characters, 17 commands will be run:

           cat num30000 | parallel --xargs -s 10000 echo | wc -l

       Output:

           17
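       The packing can be pictured as a greedy loop: add arguments to the
       current line while it stays within the limit, otherwise start a new
       line. A minimal Bash sketch; it counts only the arguments and the
       spaces between them, whereas GNU parallel also has to fit the command
       itself within the limit, so its line counts will differ somewhat.
       pack_args is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# pack_args MAXLEN ARG...: print the arguments greedily packed into lines of
# at most MAXLEN characters (an argument longer than MAXLEN gets its own line).
pack_args() {
  local max=$1 line="" arg
  shift
  for arg in "$@"; do
    if [[ -z $line ]]; then
      line=$arg
    elif (( ${#line} + 1 + ${#arg} <= max )); then
      line+=" $arg"
    else
      printf '%s\n' "$line"       # line is full: emit it and start a new one
      line=$arg
    fi
  done
  if [[ -n $line ]]; then
    printf '%s\n' "$line"
  fi
}

pack_args 5 1 2 3 4 5
```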

       For better parallelism GNU parallel can distribute the arguments
       between all the parallel jobs when end of file is met.

       Below, GNU parallel reads the last argument when generating the second
       job. When GNU parallel reads the last argument, it spreads all the
       arguments for the second job over 4 jobs instead, as 4 parallel jobs
       are requested.

       The first job will be the same as the --xargs example above, but the
       second job will be split into 4 evenly sized jobs, resulting in a total
       of 5 jobs:

           cat num30000 | parallel --jobs 4 -m echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

           5

       This is even more visible when running 4 jobs with 10 arguments. The 10
       arguments are spread over 4 jobs:

           parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

       Output:

           1 2 3
           4 5 6
           7 8 9
           10
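       One simple rule that reproduces the split above is to put
       ceil(arguments / jobs) arguments in each job: 10 arguments over 4 jobs
       gives 3 per job, leaving 1 for the last. A Bash sketch of that rule; it
       only illustrates the output shown here and is not GNU parallel's own
       code, and spread is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# spread JOBS ARG...: split the arguments into groups of ceil(args / JOBS).
spread() {
  local jobs=$1
  shift
  (( $# > 0 )) || return 0
  local size=$(( ($# + jobs - 1) / jobs ))   # ceiling division in integers
  local args=("$@") i
  for (( i = 0; i < ${#args[@]}; i += size )); do
    printf '%s\n' "${args[*]:i:size}"        # one job, arguments joined by spaces
  done
}

spread 4 1 2 3 4 5 6 7 8 9 10
```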

       A replacement string can be part of a word. -m will not repeat the
       context:

           parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

           pre-A B-post
           pre-C D-post
           pre-E F-post
           pre-G-post

       To repeat the context use -X which otherwise works like -m:

           parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

           pre-A-post pre-B-post
           pre-C-post pre-D-post
           pre-E-post pre-F-post
           pre-G-post

       To limit the number of arguments use -N:

           parallel -N3 echo ::: A B C D E F G H

       Output (the order may be different):

           A B C
           D E F
           G H

       -N also sets the positional replacement strings:

           parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

       Output (the order may be different):

           1=A 2=B 3=C
           1=D 2=E 3=F
           1=G 2=H 3=
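       The grouping done by -N3 can be sketched as a loop that consumes three
       arguments at a time and binds them to positions 1 to 3, leaving the
       missing positions empty in a short final group. A minimal Bash
       illustration; group3 is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Like: parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
group3() {
  while (( $# > 0 )); do
    printf '1=%s 2=%s 3=%s\n' "${1-}" "${2-}" "${3-}"   # missing values are empty
    shift $(( $# < 3 ? $# : 3 ))                        # consume up to 3 arguments
  done
}

group3 A B C D E F G H
```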

       -N0 reads 1 argument but inserts none:

           parallel -N0 echo foo ::: 1 2 3

       Output:

           foo
           foo
           foo

   Quoting
       Command lines that contain special characters may need to be protected
       from the shell.

       The perl program print "@ARGV\n" basically works like echo:

           perl -e 'print "@ARGV\n"' A

       Output:

           A

       To run that in parallel the command needs to be quoted. Without
       quoting it does not work:

           parallel perl -e 'print "@ARGV\n"' ::: This wont work

       Output:

           [Nothing]

       To quote the command use -q:

           parallel -q perl -e 'print "@ARGV\n"' ::: This works

       Output (the order may be different):

           This
           works

       Or you can quote the critical part using \':

           parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

       Output (the order may be different):

           This
           works,
           too

       GNU parallel can also \-quote full lines. Simply run this:

           parallel --shellquote
           Warning: Input is read from the terminal. You either know what you
           Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
           Warning: ::: or :::: or to pipe data into parallel. If so
           Warning: consider going through the tutorial: man parallel_tutorial
           Warning: Press CTRL-D to exit.
           perl -e 'print "@ARGV\n"'
           [CTRL-D]

       Output:

           perl\ -e\ \'print\ \"@ARGV\\n\"\'

       This can then be used as the command:

           parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

       Output (the order may be different):

           This
           also
           works
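       In Bash, the builtin printf '%q' does a similar backslash quoting,
       which can be handy when a string has to be pasted into a command line.
       A minimal sketch; Bash and --shellquote are not guaranteed to choose
       the same escapes for every input, but the result is read back by the
       shell as one word:

```bash
#!/usr/bin/env bash
set -euo pipefail

# The command from the quoting example, stored as one string.
cmd='perl -e '\''print "@ARGV\n"'\'''

# %q backslash-escapes spaces, quotes and backslashes, so the shell reads the
# result back as a single word.
quoted=$(printf '%q' "$cmd")
echo "$quoted"
```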

   Trimming space
       Space can be trimmed from the arguments using --trim. To trim on the
       right side:

           parallel --trim r echo pre-{}-post ::: ' A '

       Output:

           pre- A-post

       To trim on the left side:

           parallel --trim l echo pre-{}-post ::: ' A '

       Output:

           pre-A -post

       To trim on both sides:

           parallel --trim lr echo pre-{}-post ::: ' A '

       Output:

           pre-A-post
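       The three trim modes can be reproduced with Bash parameter expansion.
       A minimal sketch; it treats any [[:space:]] character as space, which
       is an assumption of this sketch rather than a statement about GNU
       parallel, and trim is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# trim MODE STRING: MODE contains l (left), r (right) or both.
trim() {
  local mode=$1 s=$2
  if [[ $mode == *l* ]]; then
    s=${s#"${s%%[![:space:]]*}"}    # drop leading whitespace
  fi
  if [[ $mode == *r* ]]; then
    s=${s%"${s##*[![:space:]]}"}    # drop trailing whitespace
  fi
  printf 'pre-%s-post\n' "$s"
}

trim r ' A '
trim l ' A '
trim lr ' A '
```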

   Respecting the shell
       This tutorial uses Bash as the shell. GNU parallel respects which shell
       you are using, so in zsh you can do:

           parallel echo \={} ::: zsh bash ls

       Output:

           /usr/bin/zsh
           /bin/bash
           /bin/ls

       In csh you can do:

           parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

       Output:

           [somedir] is a dir

       This also becomes useful if you use GNU parallel in a shell script: GNU
       parallel will use the same shell as the shell script.

Controlling the output
       The output can be prefixed with the argument:

           parallel --tag echo foo-{} ::: A B C

       Output (the order may be different):

           A foo-A
           B foo-B
           C foo-C

       To prefix it with another string use --tagstring:

           parallel --tagstring {}-bar echo foo-{} ::: A B C

       Output (the order may be different):

           A-bar foo-A
           B-bar foo-B
           C-bar foo-C

       To see what commands will be run without running them use --dryrun:

           parallel --dryrun echo {} ::: A B C

       Output (the order may be different):

           echo A
           echo B
           echo C

       To print the commands before running them use --verbose:

           parallel --verbose echo {} ::: A B C

       Output (the order may be different):

           echo A
           echo B
           A
           echo C
           B
           C

       GNU parallel will postpone the output until the command completes:

           parallel -j2 'printf "%s-start\n%s" {} {};
             sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

           2-start
           2-middle
           2-end
           1-start
           1-middle
           1-end
           4-start
           4-middle
           4-end

       To get the output immediately use --ungroup:

           parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
             sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

           4-start
           42-start
           2-middle
           2-end
           1-start
           1-middle
           1-end
           -middle
           4-end

       --ungroup is fast, but can cause half a line from one job to be mixed
       with half a line of another job. That has happened in the second line,
       where the line '4-middle' is mixed with '2-start'.

       To avoid this use --linebuffer:

           parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
             sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

           4-start
           2-start
           2-middle
           2-end
           1-start
           1-middle
           1-end
           4-middle
           4-end

       To force the output in the same order as the arguments use
       --keep-order/-k:

           parallel -j2 -k 'printf "%s-start\n%s" {} {};
             sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

           4-start
           4-middle
           4-end
           2-start
           2-middle
           2-end
           1-start
           1-middle
           1-end
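       The idea behind grouped, ordered output can be sketched in plain Bash:
       give every job its own temporary file and print the files in argument
       order once the jobs are done. A minimal sketch, assuming mktemp(1);
       unlike -k it waits for all jobs before printing anything and does not
       limit the number of simultaneous jobs. job and run_ordered are
       hypothetical names, and the job sleeps only $1/2 seconds (integer
       division) to keep the example short:

```bash
#!/usr/bin/env bash
set -euo pipefail

job() {
  # Stand-in for the printf/sleep command above; jobs finish out of order.
  printf '%s-start\n' "$1"
  sleep "$(( $1 / 2 ))"
  printf '%s-middle\n%s-end\n' "$1" "$1"
}

run_ordered() {
  local tmpdir i=0 j arg
  tmpdir=$(mktemp -d)
  for arg in "$@"; do
    job "$arg" > "$tmpdir/$i" &     # start every job in the background
    i=$((i + 1))
  done
  wait                              # wait for all jobs to finish
  for (( j = 0; j < i; j++ )); do
    cat "$tmpdir/$j"                # print each job's output whole, in argument order
  done
  rm -r -- "$tmpdir"
}

run_ordered 4 2 1
```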

   Saving output into files
       GNU parallel can save the output of each job into files:

           parallel --files echo ::: A B C

       Output will be similar to this:

           /tmp/pAh6uWuQCg.par
           /tmp/opjhZCzAX4.par
           /tmp/W0AT_Rph2o.par

       By default GNU parallel will cache the output in files in /tmp. This
       can be changed by setting $TMPDIR or --tmpdir:

           parallel --tmpdir /var/tmp --files echo ::: A B C

       Output will be similar to this:

           /var/tmp/N_vk7phQRc.par
           /var/tmp/7zA4Ccf3wZ.par
           /var/tmp/LIuKgF_2LP.par

       Or:

           TMPDIR=/var/tmp parallel --files echo ::: A B C

       Output: Same as above.

       The output files can be saved in a structured way using --results:

           parallel --results outdir echo ::: A B C

       Output:

           A
           B
           C

       The following files were also generated; they contain the standard
       output (stdout), standard error (stderr), and the sequence number
       (seq):

           outdir/1/A/seq
           outdir/1/A/stderr
           outdir/1/A/stdout
           outdir/1/B/seq
           outdir/1/B/stderr
           outdir/1/B/stdout
           outdir/1/C/seq
           outdir/1/C/stderr
           outdir/1/C/stdout

       --header : will take the first value as name and use that in the
       directory structure. This is useful if you are using multiple input
       sources:

           parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

       Generated files:

           outdir/f1/A/f2/C/seq
           outdir/f1/A/f2/C/stderr
           outdir/f1/A/f2/C/stdout
           outdir/f1/A/f2/D/seq
           outdir/f1/A/f2/D/stderr
           outdir/f1/A/f2/D/stdout
           outdir/f1/B/f2/C/seq
           outdir/f1/B/f2/C/stderr
           outdir/f1/B/f2/C/stdout
           outdir/f1/B/f2/D/seq
           outdir/f1/B/f2/D/stderr
           outdir/f1/B/f2/D/stdout

       The directories are named after the variables and their values.

Control the execution
   Number of simultaneous jobs
       The number of concurrent jobs is given with --jobs/-j:

           /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to run,
       depending on how fast your machine is.
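       The lower end of that estimate is simple arithmetic: with j jobs
       running at a time, N jobs of d seconds each need at least ceil(N / j)
       rounds of d seconds. A small Bash sketch of the calculation (start-up
       overhead, which makes real runs slower, is ignored); min_wall_time is
       a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# min_wall_time JOBS SLOTS SECONDS: lower bound on wall-clock time in seconds.
min_wall_time() {
  local jobs=$1 slots=$2 secs=$3
  echo $(( (jobs + slots - 1) / slots * secs ))   # ceil(jobs / slots) rounds
}

min_wall_time 128 64 1   # the -j64 example above
```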
1260
1261 By default --jobs is the same as the number of CPU cores. So this:
1262
1263 /usr/bin/time parallel -N0 sleep 1 :::: num128
1264
1265 should take twice the time of running 2 jobs per CPU core:
1266
1267 /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1268
1269 --jobs 0 will run as many jobs in parallel as possible:
1270
1271 /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1272
1273 which should take 1-7 seconds depending on how fast your machine is.
1274
1275 --jobs can read from a file which is re-read when a job finishes:
1276
1277 echo 50% > my_jobs
1278 /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1279 sleep 1
1280 echo 0 > my_jobs
1281 wait
1282
1283 The first second only 50% of the CPU cores will run a job. Then 0 is
1284 put into my_jobs and then the rest of the jobs will be started in
1285 parallel.
1286
1287 Instead of basing the percentage on the number of CPU cores GNU
1288 parallel can base it on the number of CPUs:
1289
1290 parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1291
1292 Shuffle job order
1293 If you have many jobs (e.g. by multiple combinations of input sources),
1294 it can be handy to shuffle the jobs, so you get different values run.
1295 Use --shuf for that:
1296
1297 parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1298
1299 Output:
1300
1301 All combinations but different order for each run.
1302
1303 Interactivity
1304 GNU parallel can ask the user if a command should be run using
1305 --interactive:
1306
1307 parallel --interactive echo ::: 1 2 3
1308
1309 Output:
1310
1311 echo 1 ?...y
1312 echo 2 ?...n
1313 1
1314 echo 3 ?...y
1315 3
1316
1317 GNU parallel can be used to put arguments on the command line for an
1318 interactive command such as emacs to edit one file at a time:
1319
1320 parallel --tty emacs ::: 1 2 3
1321
1322 Or give multiple argument in one go to open multiple files:
1323
1324 parallel -X --tty vi ::: 1 2 3
1325
1326 A terminal for every job
1327 Using --tmux GNU parallel can start a terminal for every job run:
1328
1329 seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1330
1331 This will tell you to run something similar to:
1332
1333 tmux -S /tmp/tmsrPrO0 attach
1334
1335 Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
1336 between windows of the running jobs. When a job is finished it will
1337 pause for 10 seconds before closing the window.
1338
1339 Timing
1340 Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1341 parallel can delay starting new jobs. --delay X will make sure there is
1342 at least X seconds between each start:
1343
1344 parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1345
1346 Output:
1347
1348 Starting 1
1349 Thu Aug 15 16:24:33 CEST 2013
1350 Starting 2
1351 Thu Aug 15 16:24:35 CEST 2013
1352 Starting 3
1353 Thu Aug 15 16:24:38 CEST 2013
1354
1355 If jobs taking more than a certain amount of time are known to fail,
1356 they can be stopped with --timeout. The accuracy of --timeout is 2
1357 seconds:
1358
1359 parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
1360
1361 Output:
1362
1363 2
1364 4
1365
1366 GNU parallel can compute the median runtime for jobs and kill those
1367 that take more than 200% of the median runtime:
1368
1369 parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
1370
1371 Output:
1372
1373 2.1
1374 2.2
1375 3
1376 2.3
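The arithmetic behind this example: sorted, the runtimes are 2.1 2.2 2.3 3 7, so the median is 2.3 and 200% of it is 4.6 seconds. Only the job sleeping 7 seconds exceeds that. The awk sketch below (the helper name over_median_limit is hypothetical) does the calculation on a complete list of runtimes. GNU parallel has to estimate the median from the jobs completed so far, so treat this as a simplification.

```shell
# Hypothetical helper: print the runtimes that exceed PCT% of the median.
# Usage: over_median_limit PCT RUNTIME...
over_median_limit() {
  pct=$1; shift
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk -v pct="$pct" '
    { t[NR] = $1 }                                  # runtimes, sorted
    END {
      if (NR % 2) median = t[(NR + 1) / 2]          # odd count: middle value
      else        median = (t[NR / 2] + t[NR / 2 + 1]) / 2
      limit = median * pct / 100
      for (i = 1; i <= NR; i++) if (t[i] > limit) print t[i]
    }'
}

over_median_limit 200 2.1 2.2 3 7 2.3   # prints 7
```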
1377
1378 Progress information
1379 Based on the runtime of completed jobs, GNU parallel can estimate the
1380 total runtime:
1381
1382 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1383
1384 Output:
1385
1386 Computers / CPU cores / Max jobs to run
1387 1:local / 2 / 2
1388
1389 Computer:jobs running/jobs completed/%of started jobs/
1390 Average seconds to complete
1391 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1392
1393 GNU parallel can give progress information with --progress:
1394
1395 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1396
1397 Output:
1398
1399 Computers / CPU cores / Max jobs to run
1400 1:local / 2 / 2
1401
1402 Computer:jobs running/jobs completed/%of started jobs/
1403 Average seconds to complete
1404 local:0/9/100%/1.1s
1405
1406 A progress bar can be shown with --bar:
1407
1408 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1409
1410 And a graphic bar can be shown with --bar and zenity:
1411
1412 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1413 2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
1414 zenity --progress --auto-kill --auto-close)
1415
1416 A logfile of the jobs completed so far can be generated with --joblog:
1417
1418 parallel --joblog /tmp/log exit ::: 1 2 3 0
1419 cat /tmp/log
1420
1421 Output:
1422
1423 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1424 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1425 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1426 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1427 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1428
1429 The log contains the job sequence, which host the job was run on, the
1430 start time and run time, how much data was transferred, the exit value,
1431 the signal that killed the job, and finally the command being run.
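Since the joblog is a plain table with one header line, ordinary text tools can query it. Here is a minimal sketch, assuming the columns are TAB-separated in the order shown above (Exitval in column 7, Signal in column 8, Command in column 9). failed_jobs is a hypothetical helper name.

```shell
# Hypothetical helper: list failed jobs in a GNU parallel joblog.
# A job counts as failed if Exitval (column 7) or Signal (column 8) is
# non-zero. The header line (NR == 1) is skipped.
failed_jobs() {
  awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 ": " $9 }' "$1"
}
```

Run it as failed_jobs /tmp/log. With the log above it should list Seq 1, 2, and 3.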
1432
1433 With a joblog, GNU parallel can be stopped and later pick up where it
1434 left off. It is important that the input of the completed jobs is
1435 unchanged.
1436
1437 parallel --joblog /tmp/log exit ::: 1 2 3 0
1438 cat /tmp/log
1439 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1440 cat /tmp/log
1441
1442 Output:
1443
1444 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1445 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1446 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1447 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1448 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1449
1450 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1451 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1452 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1453 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1454 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1455 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1456 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1457
1458 Note how the first 4 jobs keep their original start times: only the
1459 last 2 jobs were run by the second invocation.
1460
1461 With --resume-failed GNU parallel will re-run the jobs that failed:
1462
1463 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1464 cat /tmp/log
1465
1466 Output:
1467
1468 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1469 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1470 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1471 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1472 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1473 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1474 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1475 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1476 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1477 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1478
1479 Note how Seq 1, 2, and 3 have been repeated because they had an exit
1480 value different from 0.
1481
1482 --retry-failed does almost the same as --resume-failed. Where
1483 --resume-failed reads the commands from the command line (and ignores
1484 the commands in the joblog), --retry-failed ignores the command line
1485 and reruns the commands mentioned in the joblog.
1486
1487 parallel --retry-failed --joblog /tmp/log
1488 cat /tmp/log
1489
1490 Output:
1491
1492 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1493 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1494 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1495 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1496 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1497 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1498 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1499 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1500 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1501 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1502 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1503 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1504 3 : 1376580164.666 0.005 0 0 3 0 exit 3
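Seq 1, 2, and 3 now appear several times in the joblog. A rough model of what --retry-failed has to work out is this: for each Seq, keep only its most recent entry, and rerun the command if that entry failed. The awk sketch below illustrates that reading of the log. It assumes TAB-separated columns with Exitval in column 7, Signal in column 8, and Command in column 9. jobs_to_retry is a hypothetical name, and this is not GNU parallel's own code.

```shell
# Hypothetical helper: print the commands whose latest joblog entry failed.
jobs_to_retry() {
  awk -F '\t' '
    NR > 1 {
      failed[$1] = ($7 != 0 || $8 != 0)   # later entries overwrite earlier
      cmd[$1] = $9
    }
    END {
      for (seq in cmd) if (failed[seq]) print seq "\t" cmd[seq]
    }' "$1" | sort -n | cut -f 2         # order by Seq, drop the Seq column
}
```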
1505
1506 Termination
1507 Unconditional termination
1508
1509 By default GNU parallel will wait for all jobs to finish before
1510 exiting.
1511
1512 If you send GNU parallel the TERM signal, GNU parallel will stop
1513 spawning new jobs and wait for the remaining jobs to finish. If you
1514 send GNU parallel the TERM signal again, GNU parallel will kill all
1515 running jobs and exit.
1516
1517 Termination dependent on job status
1518
1519 For certain jobs there is no need to continue if one of the jobs fails
1520 and has an exit code different from 0. GNU parallel will stop spawning
1521 new jobs with --halt soon,fail=1:
1522
1523 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1524
1525 Output:
1526
1527 0
1528 0
1529 1
1530 parallel: This job failed:
1531 echo 1; exit 1
1532 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1533 2
1534
1535 With --halt now,fail=1 the running jobs will be killed immediately:
1536
1537 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1538
1539 Output:
1540
1541 0
1542 0
1543 1
1544 parallel: This job failed:
1545 echo 1; exit 1
1546
1547 If --halt is given a percentage, this percentage of the jobs must fail
1548 before GNU parallel stops spawning more jobs:
1549
1550 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1551 ::: 0 1 2 3 4 5 6 7 8 9
1552
1553 Output:
1554
1555 0
1556 1
1557 parallel: This job failed:
1558 echo 1; exit 1
1559 2
1560 parallel: This job failed:
1561 echo 2; exit 2
1562 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1563 3
1564 parallel: This job failed:
1565 echo 3; exit 3
1566
1567 If you are looking for success instead of failures, you can use
1568 success. This will finish as soon as the first job succeeds:
1569
1570 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1571
1572 Output:
1573
1574 1
1575 2
1576 3
1577 0
1578 parallel: This job succeeded:
1579 echo 0; exit 0
1580
1581 GNU parallel can retry the command with --retries. This is useful if a
1582 command fails for unknown reasons now and then.
1583
1584 parallel -k --retries 3 \
1585 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1586 cat /tmp/runs
1587
1588 Output:
1589
1590 completed 1
1591 completed 2
1592 completed 0
1593
1594 tried 1
1595 tried 2
1596 tried 1
1597 tried 2
1598 tried 1
1599 tried 2
1600 tried 0
1601
1602 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried because
1603 it had exit code 0.
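The retry behaviour can be pictured as a plain shell loop. The sketch below is a hypothetical helper named retry and is not part of GNU parallel. It runs a command up to N times and stops at the first exit code 0. If every attempt fails, it returns the last exit code.

```shell
# Hypothetical helper: run a command up to N times until it exits with 0.
# Usage: retry N COMMAND [ARGS...]
retry() {
  tries=$1; shift
  i=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    status=$?                             # exit code of the failed attempt
    [ "$i" -ge "$tries" ] && return "$status"
    i=$((i + 1))
  done
}
```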
1604
1605 Termination signals (advanced)
1606
1607 Using --termseq you can control which signals are sent when killing
1608 children. Normally children will be killed by sending them SIGTERM,
1609 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1610 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1611 giving up. It looks like this:
1612
1613 show_signals() {
1614 perl -e 'for(keys %SIG) {
1615 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1616 }
1617 while(1){sleep 1}'
1618 }
1619 export -f show_signals
1620 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1621 -u --timeout 1 show_signals
1622
1623 Output:
1624
1625 Got TERM
1626 Got TERM
1627 Got TERM
1628
1629 Or just:
1630
1631 echo | parallel -u --timeout 1 show_signals
1632
1633 Output: Same as above.
1634
1635 You can change this to SIGINT, SIGTERM, SIGKILL:
1636
1637 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1638 -u --timeout 1 show_signals
1639
1640 Output:
1641
1642 Got INT
1643 Got TERM
1644
1645 The SIGKILL does not show because it cannot be caught, and thus the
1646 child dies.
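A --termseq value is a list of signal,milliseconds pairs. To show how such a sequence is applied, here is a hypothetical shell function, termseq_kill, that walks the pairs in order. It sends the signal, waits the given time, and stops early once the process is gone. It is a simplified sketch, not GNU parallel's implementation. It relies on sleep accepting fractional seconds, which GNU, BSD, and BusyBox sleep do but POSIX does not require. Also, kill -0 may still succeed on a child that has died but has not yet been reaped.

```shell
# Hypothetical helper: apply a --termseq style sequence to one process.
# Usage: termseq_kill PID SIGNAL,MS,SIGNAL,MS,...
termseq_kill() {
  pid=$1
  set -- $(printf '%s' "$2" | tr ',' ' ')   # split the sequence on commas
  while [ "$#" -ge 2 ]; do
    kill -0 "$pid" 2>/dev/null || return 0  # process already gone
    kill -s "$1" "$pid" 2>/dev/null || :    # send the next signal
    # Convert milliseconds to seconds for sleep.
    sleep "$(awk -v ms="$2" 'BEGIN { printf "%.3f", ms / 1000 }')"
    shift 2
  done
}
```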
1647
1648 Limiting the resources
1649 To avoid overloading systems GNU parallel can look at the system load
1650 before starting another job:
1651
1652 parallel --load 100% echo load is less than {} job per cpu ::: 1
1653
1654 Output:
1655
1656 [when the load is less than the number of CPU cores]
1657 load is less than 1 job per cpu
1658
1659 GNU parallel can also check if the system is swapping.
1660
1661 parallel --noswap echo the system is not swapping ::: now
1662
1663 Output:
1664
1665 [when the system is not swapping]
1666 the system is not swapping now
1667
1668 Some jobs need a lot of memory, and should only be started when there
1669 is enough free memory. Using --memfree, GNU parallel can check if there
1670 is enough free memory. Additionally, GNU parallel will kill off the
1671 youngest job if the free memory falls below 50% of the given size. The
1672 killed job will be put back on the queue and retried later.
1673
1674 parallel --memfree 1G echo will run if more than 1 GB is ::: free
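In terms of arithmetic, with --memfree 1G no new job is started while less than 1G is free. The youngest running job is killed once free memory drops below 50% of that, which is 0.5G. Between the two thresholds nothing is killed, but nothing new starts either. The hypothetical helper memfree_action below only encodes those thresholds for byte counts you pass in. It does not read the system's memory, and treating 1G as 1024*1024*1024 bytes is an assumption.

```shell
# Hypothetical helper: decide what --memfree LIMIT implies for FREE bytes.
# Usage: memfree_action FREE_BYTES LIMIT_BYTES
memfree_action() {
  free=$1 limit=$2
  if [ "$free" -lt $((limit / 2)) ]; then
    echo kill-youngest     # below 50% of LIMIT: kill the youngest job
  elif [ "$free" -lt "$limit" ]; then
    echo wait              # below LIMIT: start no new jobs
  else
    echo start             # enough free memory: another job may start
  fi
}

G=$((1024 * 1024 * 1024))
memfree_action $((2 * G)) "$G"       # prints start
memfree_action $((G * 3 / 4)) "$G"   # prints wait
memfree_action $((G / 4)) "$G"       # prints kill-youngest
```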
1675
1676 GNU parallel can run the jobs with a nice value. This will work both
1677 locally and remotely.
1678
1679 parallel --nice 17 echo this is being run with nice -n ::: 17
1680
1681 Output:
1682
1683 this is being run with nice -n 17
1684
1686 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1687 with the remote machines.
1688
1689 Sshlogin
1690 The most basic sshlogin is -S host:
1691
1692 parallel -S $SERVER1 echo running on ::: $SERVER1
1693
1694 Output:
1695
1696 running on [$SERVER1]
1697
1698 To use a different username prepend the server with username@:
1699
1700 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1701
1702 Output:
1703
1704 running on [username@$SERVER1]
1705
1706 The special sshlogin : is the local machine:
1707
1708 parallel -S : echo running on ::: the_local_machine
1709
1710 Output:
1711
1712 running on the_local_machine
1713
1714 If ssh is not in $PATH, it can be prepended to $SERVER1:
1715
1716 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1717
1718 Output:
1719
1720 custom ssh
1721
1722 The ssh command can also be given using --ssh:
1723
1724 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1725
1726 or by setting $PARALLEL_SSH:
1727
1728 export PARALLEL_SSH=/usr/bin/ssh
1729 parallel -S $SERVER1 echo custom ::: ssh
1730
1731 Several servers can be given using multiple -S:
1732
1733 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1734
1735 Output (the order may be different):
1736
1737 running
1738 on
1739 more
1740 hosts
1741
1742 Or they can be separated by ,:
1743
1744 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1745
1746 Output: Same as above.
1747
1748 Or by newlines:
1749
1750 # This gives a \n between $SERVER1 and $SERVER2
1751 SERVERS="`echo $SERVER1; echo $SERVER2`"
1752 parallel -S "$SERVERS" echo ::: running on more hosts
1753
1754 They can also be read from a file (replace user@ with the user on
1755 $SERVER2):
1756
1757 echo $SERVER1 > nodefile
1758 # Force 4 cores, special ssh-command, username
1759 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1760 parallel --sshloginfile nodefile echo ::: running on more hosts
1761
1762 Output: Same as above.
1763
1764 Every time a job finishes, the --sshloginfile will be re-read, so it is
1765 possible to both add and remove hosts while running.
1766
1767 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1768
1769 To force GNU parallel to treat a server as having a given number of CPU
1770 cores, prepend the number of cores followed by / to the sshlogin:
1771
1772 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1773
1774 Output:
1775
1776 force 4 cpus on server
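An sshlogin such as 4//usr/bin/ssh user@$SERVER2 in the nodefile earlier packs up to three fields together: an optional number of cores followed by /, an optional ssh command, and the login. The hypothetical parse_sshlogin function below splits such a string with POSIX parameter expansion. It only covers the forms used in this tutorial. For instance, it ignores @hostgroup prefixes. The defaults it prints (auto, ssh) are labels chosen for this sketch.

```shell
# Hypothetical helper: split "[NCORES/][SSHCOMMAND ]LOGIN" into its parts.
parse_sshlogin() {
  s=$1 cores= sshcmd=
  case $s in
    [0-9]*/*)
      cores=${s%%/*}           # text before the first /
      s=${s#*/} ;;             # the rest after it
  esac
  login=${s##* }               # the last word is the login
  case $s in
    *' '*) sshcmd=${s% *} ;;   # everything before the last word
  esac
  printf 'cores=%s ssh=%s login=%s\n' "${cores:-auto}" "${sshcmd:-ssh}" "$login"
}

parse_sshlogin '4//usr/bin/ssh user@server2'
# prints: cores=4 ssh=/usr/bin/ssh login=user@server2
```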
1777
1778 Servers can be put into groups by prepending @groupname to the server,
1779 and the group can then be selected by appending @groupname to the
1780 argument if using --hostgroup:
1781
1782 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1783 ::: run_on_grp1@grp1 run_on_grp2@grp2
1784
1785 Output:
1786
1787 run_on_grp1
1788 run_on_grp2
1789
1790 A host can be in multiple groups by separating the groups with +, and
1791 you can force GNU parallel to limit the groups on which the command can
1792 be run with -S @groupname:
1793
1794 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1795 ::: run_on_grp1 also_grp1
1796
1797 Output:
1798
1799 run_on_grp1
1800 also_grp1
1801
1802 Transferring files
1803 GNU parallel can transfer the files to be processed to the remote host.
1804 It does that using rsync.
1805
1806 echo This is input_file > input_file
1807 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1808
1809 Output:
1810
1811 This is input_file
1812
1813 If the files are processed into another file, the resulting file can be
1814 transferred back:
1815
1816 echo This is input_file > input_file
1817 parallel -S $SERVER1 --transferfile {} --return {}.out \
1818 cat {} ">"{}.out ::: input_file
1819 cat input_file.out
1820
1821 Output: Same as above.
1822
1823 To remove the input and output file on the remote server use --cleanup:
1824
1825 echo This is input_file > input_file
1826 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1827 cat {} ">"{}.out ::: input_file
1828 cat input_file.out
1829
1830 Output: Same as above.
1831
1832 There is a shorthand for --transferfile {} --return {}.out --cleanup
1833 called --trc {}.out:
1834
1835 echo This is input_file > input_file
1836 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1837 cat input_file.out
1838
1839 Output: Same as above.
1840
1841 Some jobs need a common database for all jobs. GNU parallel can
1842 transfer that using --basefile which will transfer the file before the
1843 first job:
1844
1845 echo common data > common_file
1846 parallel --basefile common_file -S $SERVER1 \
1847 cat common_file\; echo {} ::: foo
1848
1849 Output:
1850
1851 common data
1852 foo
1853
1854 To remove it from the remote host after the last job use --cleanup.
1855
1856 Working dir
1857 The default working dir on the remote machines is the login dir. This
1858 can be changed with --workdir mydir.
1859
1860 Files transferred using --transferfile and --return will be relative to
1861 mydir on remote computers, and the command will be executed in the dir
1862 mydir.
1863
1864 The special mydir value ... will create working dirs under
1865 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1866 dirs will be removed.
1867
1868 The special mydir value . uses the current working dir. If the current
1869 working dir is beneath your home dir, the value . is treated as the
1870 relative path to your home dir. This means that if your home dir is
1871 different on remote computers (e.g. if your login is different) the
1872 relative path will still be relative to your home dir.
1873
1874 parallel -S $SERVER1 pwd ::: ""
1875 parallel --workdir . -S $SERVER1 pwd ::: ""
1876 parallel --workdir ... -S $SERVER1 pwd ::: ""
1877
1878 Output:
1879
1880 [the login dir on $SERVER1]
1881 [the current dir on $SERVER1, relative to the home dir]
1882 [a dir in ~/.parallel/tmp/...]
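The rule for --workdir . can be sketched as a small shell function. remote_workdir is a hypothetical name. Given the local current dir and the local home dir, it prints a path relative to the home dir when the current dir is beneath it. Returning the absolute path otherwise is an assumption about the case the text does not spell out.

```shell
# Hypothetical helper: how "--workdir ." maps the local dir to a remote one.
# Usage: remote_workdir CURRENT_DIR HOME_DIR
remote_workdir() {
  dir=$1 home=$2
  case $dir in
    "$home")   echo . ;;                      # the home dir itself
    "$home"/*) echo "${dir#"$home"/}" ;;      # beneath home: relative path
    *)         echo "$dir" ;;                 # outside home: keep absolute
  esac
}

remote_workdir /home/joe/proj/data /home/joe   # prints proj/data
remote_workdir /srv/data /home/joe             # prints /srv/data
```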
1883
1884 Avoid overloading sshd
1885 If many jobs are started on the same server, sshd can be overloaded.
1886 GNU parallel can insert a delay between each job run on the same
1887 server:
1888
1889 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1890
1891 Output (the order may be different):
1892
1893 1
1894 2
1895 3
1896
1897 sshd will be less overloaded if using --controlmaster, which will
1898 multiplex ssh connections:
1899
1900 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1901
1902 Output: Same as above.
1903
1904 Ignore hosts that are down
1905 In clusters with many hosts a few of them are often down. GNU parallel
1906 can ignore those hosts. In this case the host 173.194.32.46 is down:
1907
1908 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1909
1910 Output:
1911
1912 bar
1913
1914 Running the same commands on all hosts
1915 GNU parallel can run the same command on all the hosts:
1916
1917 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1918
1919 Output (the order may be different):
1920
1921 foo
1922 bar
1923 foo
1924 bar
1925
1926 Often you will just want to run a single command on all hosts without
1927 arguments. --nonall is --onall with no arguments:
1928
1929 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1930
1931 Output:
1932
1933 foo bar
1934 foo bar
1935
1936 When --tag is used with --nonall and --onall, the --tagstring is the
1937 host:
1938
1939 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1940
1941 Output (the order may be different):
1942
1943 $SERVER1 foo bar
1944 $SERVER2 foo bar
1945
1946 --jobs sets the number of servers to log in to in parallel.
1947
1948 Transferring environment variables and functions
1949 env_parallel is a shell function that transfers all aliases, functions,
1950 variables, and arrays. You activate it by running:
1951
1952 source `which env_parallel.bash`
1953
1954 Replace bash with the shell you use.
1955
1956 Now you can use env_parallel instead of parallel and still have your
1957 environment:
1958
1959 alias myecho=echo
1960 myvar="Joe's var is"
1961 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1962
1963 Output:
1964
1965 Joe's var is green
1966
1967 The disadvantage is that if your environment is huge, env_parallel will
1968 fail.
1969
1970 When env_parallel fails, you can still use --env to tell GNU parallel
1971 to transfer an environment variable to the remote system.
1972
1973 MYVAR='foo bar'
1974 export MYVAR
1975 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1976
1977 Output:
1978
1979 foo bar baz
1980
1981 This works for functions, too, if your shell is Bash:
1982
1983 # This only works in Bash
1984 my_func() {
1985 echo in my_func $1
1986 }
1987 export -f my_func
1988 parallel --env my_func -S $SERVER1 my_func ::: baz
1989
1990 Output:
1991
1992 in my_func baz
1993
1994 GNU parallel can copy all user defined variables and functions to the
1995 remote system. It just needs to record which ones to ignore in
1996 ~/.parallel/ignored_vars. Do that by running this once:
1997
1998 parallel --record-env
1999 cat ~/.parallel/ignored_vars
2000
2001 Output:
2002
2003 [list of variables to ignore - including $PATH and $HOME]
2004
2005 Now all other variables and functions defined will be copied when using
2006 --env _.
2007
2008 # The function is only copied if using Bash
2009 my_func2() {
2010 echo in my_func2 $VAR $1
2011 }
2012 export -f my_func2
2013 VAR=foo
2014 export VAR
2015
2016 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2017
2018 Output:
2019
2020 foo
2021 in my_func2 foo bar
2022
2023 If you use env_parallel the variables, functions, and aliases do not
2024 even need to be exported to be copied:
2025
2026 NOT='not exported var'
2027 alias myecho=echo
2028 not_ex() {
2029 myecho in not_exported_func $NOT $1
2030 }
2031 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2032
2033 Output:
2034
2035 not exported var
2036 in not_exported_func not exported var bar
2037
2038 Showing what is actually run
2039 --verbose will show the command that would be run on the local machine.
2040
2041 When using --cat, --pipepart, or when a job is run on a remote machine,
2042 the command is wrapped with helper scripts. -vv shows all of this.
2043
2044 parallel -vv --pipepart --block 1M wc :::: num30000
2045
2046 Output:
2047
2048 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2049 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2050 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2051 0 0 0 168894 | (wc)
2052 30000 30000 168894
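The perl wrapper above reads its arguments as pairs of byte offset and length. For each pair it seeks to the offset in num30000 and copies that many bytes to wc. The same idea can be sketched with standard tools. read_range is a hypothetical helper. Unlike the perl code, it reads up to the offset instead of seeking. head -c is not in POSIX, but it is available in GNU, BSD, and BusyBox.

```shell
# Hypothetical helper: copy LENGTH bytes starting at byte OFFSET (0-based).
# Usage: read_range FILE OFFSET LENGTH
read_range() {
  # tail -c +K starts output at byte K (1-based), so add 1 to the offset.
  tail -c +"$(($2 + 1))" "$1" | head -c "$3"
}
```

For example, read_range num30000 0 168894 | wc would feed wc the same bytes as the single job above.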
2053
2054 When the command gets more complex, the output is so hard to read that
2055 it is only useful for debugging:
2056
2057 my_func3() {
2058 echo in my_func $1 > $1.out
2059 }
2060 export -f my_func3
2061 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2062 -S $SERVER1 my_func3 {} ::: abc-file
2063
2064 Output will be similar to:
2065
2066 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2067 --protocol 30 -rlDzR -essh ./abc-file
2068 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2069 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2070 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2071 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2072 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2073 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2074 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2075 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2076 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2077 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2078 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2079 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2080 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2081 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2082 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2083 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2084 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2085 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2086 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2087 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2088 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2089 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2090 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2091 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2092 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2093 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2094 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2095 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2096 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2097 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2098 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2099 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2100
2102 GNU parset will set shell variables to the output of GNU parallel. GNU
2103 parset has one important limitation: It cannot be part of a pipe. In
2104 particular this means it cannot read anything from standard input
2105 (stdin) or pipe output to another program.
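The limitation comes from how shells run pipelines. In Bash (unless shopt -s lastpipe is in effect) and in dash, each part of a pipeline runs in a subshell, so variables assigned there are gone when the pipeline ends. Because parset must assign variables in the calling shell, it cannot work inside a pipe. This small demonstration uses the shell builtin read in place of parset:

```shell
# A variable assigned inside a pipeline element is lost in Bash and dash,
# because that element runs in a subshell.
myvar=unset
echo a | read myvar
echo "$myvar"          # prints unset

# Reading from a here-document keeps read in the current shell.
read myvar <<EOF
a
EOF
echo "$myvar"          # prints a
```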
2106
2107 To use GNU parset, prepend the command with the destination variables:
2108
2109 parset myvar1,myvar2 echo ::: a b
2110 echo $myvar1
2111 echo $myvar2
2112
2113 Output:
2114
2115 a
2116 b
2117
2118 If you only give a single variable, it will be treated as an array:
2119
2120 parset myarray seq {} 5 ::: 1 2 3
2121 echo "${myarray[1]}"
2122
2123 Output:
2124
2125 2
2126 3
2127 4
2128 5
2129
2130 The commands to run can be an array:
2131
2132 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2133 parset data ::: "${cmd[@]}"
2134 echo "${data[0]}"
2135 echo "${data[1]}"
2136
2137 Output:
2138
2139 <<joe "double space" cartoon>>
2140 [current dir]
2141
2143 GNU parallel can save into an SQL base. Point GNU parallel to a table
2144 and it will put the joblog there together with the variables and the
2145 output each in their own column.
2146
2147 CSV as SQL base
2148 The simplest is to use a CSV file as the storage table:
2149
2150 parallel --sqlandworker csv:///%2Ftmp/log.csv \
2151 seq ::: 10 ::: 12 13 14
2152 cat /tmp/log.csv
2153
2154 Note how '/' in the path must be written as %2F.
2155
2156 Output will be similar to:
2157
2158 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2159 Command,V1,V2,Stdout,Stderr
2160 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2161 11
2162 12
2163 ",
2164 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2165 11
2166 12
2167 13
2168 ",
2169 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2170 11
2171 12
2172 13
2173 14
2174 ",
2175
2176 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2177 format correctly - even with fields containing newlines as above.
2178
2179 If the output is big you may want to put it into files using --results:
2180
2181 parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
2182 seq ::: 10 ::: 12 13 14
2183 cat /tmp/log2.csv
2184
2185 Output will be similar to:
2186
2187 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2188 Command,V1,V2,Stdout,Stderr
2189 1,:,1458824738.287,0.029,0,9,0,0,
2190 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2191 2,:,1458824738.298,0.025,0,12,0,0,
2192 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2193 3,:,1458824738.309,0.026,0,15,0,0,
2194 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2195
2196 DBURL as table
2197 The CSV file is an example of a DBURL.
2198
2199 GNU parallel uses a DBURL to address the table. A DBURL has this
2200 format:
2201
2202 vendor://[[user][:password]@][host][:port]/[database[/table]]
2203
2204 Example:
2205
2206 mysql://scott:tiger@my.example.com/mydatabase/mytable
2207 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2208 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2209 csv:///%2Ftmp/log.csv
2210
2211 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2212 as %2F.
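Encoding a path for a DBURL only requires replacing each / with %2F. Below is a minimal sketch with sed. The helper name dburl_path is hypothetical, and it does not escape other characters (such as % or :) that may also need encoding. Reading the format above, only /tmp is encoded in the csv example because /tmp is the database part, while log.csv is the table.

```shell
# Hypothetical helper: encode / as %2F for use in a sqlite3:// or csv:// DBURL.
dburl_path() {
  printf '%s\n' "$1" | sed 's|/|%2F|g'
}

echo "sqlite3:///$(dburl_path /tmp/mydatabase)/mytable"
# prints: sqlite3:///%2Ftmp%2Fmydatabase/mytable
echo "csv:///$(dburl_path /tmp)/log.csv"
# prints: csv:///%2Ftmp/log.csv
```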
2213
2214 Run a job using sqlite on mytable in /tmp/mydatabase:
2215
2216 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2217 DBURLTABLE=$DBURL/mytable
2218 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2219
2220 To see the result:
2221
2222 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2223
2224 Output will be similar to:
2225
2226 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2227 Command|V1|V2|Stdout|Stderr
2228 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2229 |
2230 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2231 |
2232 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2233 |
2234 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2235 |
2236
2237 The first columns are well known from --joblog. V1 and V2 are data from
2238 the input sources. Stdout and Stderr are standard output and standard
2239 error, respectively.
2240
2241 Using multiple workers
2242 Using an SQL base as storage costs overhead in the order of 1 second
2243 per job.
2244
2245 One of the situations where it makes sense is if you have multiple
2246 workers.
2247
2248 You can then have a single master machine that submits jobs to the SQL
2249 base (but does not do any of the work):
2250
2251 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2252
2253 On the worker machines you run exactly the same command except you
2254 replace --sqlmaster with --sqlworker.
2255
2256 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2257
2258 To run a master and a worker on the same machine use --sqlandworker as
2259 shown earlier.
2260
2262 The --pipe functionality puts GNU parallel in a different mode: Instead
2263 of treating the data on stdin (standard input) as arguments for a
2264 command to run, the data will be sent to stdin (standard input) of the
2265 command.
2266
2267 The typical situation is:
2268
2269 command_A | command_B | command_C
2270
2271 where command_B is slow, and you want to speed up command_B.
2272
2273 Chunk size
2274 By default GNU parallel will start an instance of command_B, read a
2275 chunk of 1 MB, and pass that to the instance. Then start another
2276 instance, read another chunk, and pass that to the second instance.
2277
2278 cat num1000000 | parallel --pipe wc
2279
2280 Output (the order may be different):
2281
2282 165668 165668 1048571
2283 149797 149797 1048579
2284 149796 149796 1048572
2285 149797 149797 1048579
2286 149797 149797 1048579
2287 149796 149796 1048572
2288 85349 85349 597444
2289
2290 The size of the chunk is not exactly 1 MB because GNU parallel only
2291 passes full lines (never half a line); thus the block size is only
2292 1 MB on average. You can change the block size to 2 MB with --block:
2293
2294 cat num1000000 | parallel --pipe --block 2M wc
2295
2296 Output (the order may be different):
2297
2298 315465 315465 2097150
2299 299593 299593 2097151
2300 299593 299593 2097151
2301 85349 85349 597444
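Here is a simplified model of this splitting: collect whole lines until the chunk reaches the block size, then start a new chunk. The hypothetical awk helper chunk_sizes prints the line and byte count of each chunk, much like the wc output above without the word column. GNU parallel's real splitting differs in detail, so the numbers will not match the output above exactly. The point is only that every chunk holds whole lines.

```shell
# Hypothetical helper: split stdin into chunks of whole lines, closing a
# chunk once it reaches BLOCK bytes, and print "lines bytes" for each chunk.
chunk_sizes() {
  LC_ALL=C awk -v block="$1" '
    { lines++; bytes += length($0) + 1 }          # +1 for the newline
    bytes >= block + 0 { print lines, bytes; lines = bytes = 0 }
    END { if (lines) print lines, bytes }         # the last, partial chunk
  '
}

seq 1 10 | chunk_sizes 8
```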
2302
2303 GNU parallel treats each line as a record. If the order of records is
2304 unimportant (e.g. you need all lines processed, but you do not care
2305 which is processed first), then you can use --roundrobin. Without
2306 --roundrobin GNU parallel will start a command per block; with
2307 --roundrobin only the requested number of jobs will be started
2308 (--jobs). The records will then be distributed between the running
2309 jobs:
2310
2311 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2312
2313 Output will be similar to:
2314
2315 149797 149797 1048579
2316 299593 299593 2097151
2317 315465 315465 2097150
2318 235145 235145 1646016
2319
2320 One of the 4 instances got a single record, 2 instances got 2 full
2321 records each, and one instance got 1 full and 1 partial record.
2322
2323 Records
2324 GNU parallel sees the input as records. The default record is a single
2325 line.
2326
2327 Using -N140000 GNU parallel will read 140000 records at a time:
2328
2329 cat num1000000 | parallel --pipe -N140000 wc
2330
2331 Output (the order may be different):
2332
2333 140000 140000 868895
2334 140000 140000 980000
2335 140000 140000 980000
2336 140000 140000 980000
2337 140000 140000 980000
2338 140000 140000 980000
2339 140000 140000 980000
2340 20000 20000 140001
2341
2342 Note how the last job could not get the full 140000 lines, but
2343 only 20000 lines.
2344
2345 If a record is 75 lines, -L can be used:
2346
2347 cat num1000000 | parallel --pipe -L75 wc
2348
2349 Output (the order may be different):
2350
2351 165600 165600 1048095
2352 149850 149850 1048950
2353 149775 149775 1048425
2354 149775 149775 1048425
2355 149850 149850 1048950
2356 149775 149775 1048425
2357 85350 85350 597450
2358 25 25 176
2359
2360 Note how GNU parallel still reads a block of around 1 MB; but instead
2361 of passing full lines to wc, it passes 75 full lines at a time. This of
2362 course does not hold for the last job (which in this case got 25
2363 lines).
2364
2365 Fixed length records
2366 Fixed length records can be processed by setting --recend '' and
2367 --block recordsize. A header of size n can be processed with --header
2368 .{n}.
2369
2370 Here is how to process a file with a 4-byte header and a 3-byte record
2371 size:
2372
2373 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2374 'echo start; cat; echo'
2375
2376 Output (the order may be different):
2377
2378 start
2379 HHHHAAA
2380 start
2381 HHHHCCC
2382 start
2383 HHHHBBB
2384
2385 It may be more efficient to increase --block to a multiple of the
2386 record size.
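To see what --header .{4} --block 3 --recend '' does to the data, here is a sketch with standard tools that puts the 4-byte header in front of every 3-byte record. It assumes fixedlen contains HHHHAAABBBCCC (a 4-byte header followed by three 3-byte records), which matches the output above. The function name split_fixed is made up, and fold counts columns, so the sketch only holds for single-byte text without newlines.

```shell
#!/bin/sh
# Hypothetical helper: print header + record for each fixed-length
# record in file $1. $2 = header size, $3 = record size (in bytes).
# Assumes single-byte characters and no newlines in the data.
split_fixed() {
  header=$(head -c "$2" "$1")
  tail -c +"$(($2 + 1))" "$1" | fold -w "$3" |
    while IFS= read -r rec || [ -n "$rec" ]; do
      printf '%s%s\n' "$header" "$rec"
    done
}

tmp=$(mktemp -d)
printf 'HHHHAAABBBCCC' > "$tmp/fixedlen"   # assumed sample data
split_fixed "$tmp/fixedlen" 4 3
rm -rf "$tmp"
```

This prints HHHHAAA, HHHHBBB and HHHHCCC in order; GNU parallel gives each of these to a separate job, which is why its output order may differ.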
2387
2388 Record separators
2389 GNU parallel uses separators to determine where one record ends and the next begins.
2390
2391 --recstart gives the string that starts a record; --recend gives the
2392 string that ends a record. The default is --recend '\n' (newline).
2393
2394 If both --recend and --recstart are given, then the record will only
2395 split if the recend string is immediately followed by the recstart
2396 string.
2397
2398 Here the --recend is set to ', ':
2399
2400 echo /foo, bar/, /baz, qux/, | \
2401 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2402
2403 Output:
2404
2405 JOB1
2406 /foo, END
2407 JOB2
2408 bar/, END
2409 JOB3
2410 /baz, END
2411 JOB4
2412 qux/,
2413 END
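The splitting rule for --recend alone can be sketched in plain POSIX shell: cut the input after every occurrence of the separator and keep the separator at the end of the record, as in the output above. This ignores --recstart, --regexp and block sizes, and the function name split_recend is made up. Brackets are printed so the trailing spaces are visible.

```shell
#!/bin/sh
# Hypothetical helper: split $1 into records, each ending with the
# literal separator $2 (a simplified model of --recend only).
split_recend() {
  rest=$1
  sep=$2
  while [ -n "$rest" ]; do
    case $rest in
      *"$sep"*)
        rec=${rest%%"$sep"*}$sep   # text up to and including the first separator
        rest=${rest#*"$sep"}       # text after the first separator
        ;;
      *)
        rec=$rest                  # the last record has no separator
        rest=
        ;;
    esac
    printf '[%s]\n' "$rec"
  done
}

split_recend '/foo, bar/, /baz, qux/,' ', '
```

The four records [/foo, ], [bar/, ], [/baz, ] and [qux/,] correspond to JOB1 to JOB4 above.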
2414
2415 Here the --recstart is set to /:
2416
2417 echo /foo, bar/, /baz, qux/, | \
2418 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2419
2420 Output:
2421
2422 JOB1
2423 /foo, barEND
2424 JOB2
2425 /, END
2426 JOB3
2427 /baz, quxEND
2428 JOB4
2429 /,
2430 END
2431
2432 Here both --recend and --recstart are set:
2433
2434 echo /foo, bar/, /baz, qux/, | \
2435 parallel -kN1 --recend ', ' --recstart / --pipe \
2436 echo JOB{#}\;cat\;echo END
2437
2438 Output:
2439
2440 JOB1
2441 /foo, bar/, END
2442 JOB2
2443 /baz, qux/,
2444 END
2445
2446 Note the difference between setting one string and setting both
2447 strings.
2448
2449 With --regexp, --recend and --recstart are treated as regular
2450 expressions:
2451
2452 echo foo,bar,_baz,__qux, | \
2453 parallel -kN1 --regexp --recend ,_+ --pipe \
2454 echo JOB{#}\;cat\;echo END
2455
2456 Output:
2457
2458 JOB1
2459 foo,bar,_END
2460 JOB2
2461 baz,__END
2462 JOB3
2463 qux,
2464 END
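To see which strings the regular expression ,_+ picks out as record separators in this input, grep -oE can list the matches (-o is not in POSIX, but GNU and BSD grep support it). The function name show_separators is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print every match of the extended regexp $2
# in the string $1, one match per line.
show_separators() {
  printf '%s\n' "$1" | grep -oE "$2"
}

show_separators 'foo,bar,_baz,__qux,' ',_+'
```

The two matches are ,_ (after bar) and ,__ (after baz), which is why JOB1 ends with foo,bar,_ and JOB2 ends with baz,__ above.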
2465
2466 GNU parallel can remove the record separators with
2467 --remove-rec-sep/--rrs:
2468
2469 echo foo,bar,_baz,__qux, | \
2470 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2471 echo JOB{#}\;cat\;echo END
2472
2473 Output:
2474
2475 JOB1
2476 foo,barEND
2477 JOB2
2478 bazEND
2479 JOB3
2480 qux,
2481 END
2482
2483 Header
2484 If the input data has a header, the header can be repeated for each job
2485 by matching the header with a regular expression given to --header. If
2486 headers start with %, you can do this:
2487
2488 cat num_%header | \
2489 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2490
2491 Output (the order may be different):
2492
2493 JOB1
2494 %head1
2495 %head2
2496 1
2497 2
2498 3
2499 JOB2
2500 %head1
2501 %head2
2502 4
2503 5
2504 6
2505 JOB3
2506 %head1
2507 %head2
2508 7
2509 8
2510 9
2511 JOB4
2512 %head1
2513 %head2
2514 10
2515
2516 If the header is 2 lines, --header 2 will work:
2517
2518 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2519
2520 Output: Same as above.
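What --header 2 --pipe -N3 does to this file can be sketched with head, tail and split: take the first 2 lines as the header, cut the rest into 3-line chunks, and put the header in front of each chunk. This assumes num_%header consists of the 2 header lines followed by the output of seq 10, as in the output above; repeat_header is a hypothetical name, and the chunks are printed in order rather than run as parallel jobs.

```shell
#!/bin/sh
# Hypothetical helper: $1 = input file, $2 = number of header lines,
# $3 = records (lines) per job.
repeat_header() {
  tmp=$(mktemp -d)
  head -n "$2" "$1" > "$tmp/header"
  tail -n +"$(($2 + 1))" "$1" | split -l "$3" - "$tmp/body."
  n=0
  for f in "$tmp"/body.*; do
    n=$((n + 1))
    echo "JOB$n"
    cat "$tmp/header" "$f"
  done
  rm -rf "$tmp"
}

tmpfile=$(mktemp)
(echo %head1; echo %head2; seq 10) > "$tmpfile"   # assumed sample data
repeat_header "$tmpfile" 2 3
rm -f "$tmpfile"
```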
2521
2522 --pipepart
2523 --pipe is not very efficient: it maxes out at around 500 MB/s, whereas
2524 --pipepart can easily deliver 5 GB/s. But there are a few limitations:
2525 the input has to be a normal file (not a pipe) given by -a or ::::, and
2526 -L/-l/-N do not work. --recend and --recstart, however, do work, and
2527 records can often be split on those alone.
2528
2529 parallel --pipepart -a num1000000 --block 3m wc
2530
2531 Output (the order may be different):
2532
2533 444443 444444 3000002
2534 428572 428572 3000004
2535 126985 126984 888890
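The three output lines follow from the file size. This arithmetic check assumes that num1000000 is the output of seq 1000000 and that the lowercase suffix in --block 3m means 3000000 bytes (GNU parallel uses K/M/G for powers of 1024 and k/m/g for powers of 1000). The byte counts are copied from the output above.

```shell
#!/bin/sh
size=$(seq 1000000 | wc -c | tr -d ' ')   # assumed content of num1000000
block=3000000                             # --block 3m, assuming m = 1000000
echo "file size: $size"
echo "jobs: $(( (size + block - 1) / block ))"   # blocks, rounded up
echo "sum of wc bytes: $((3000002 + 3000004 + 888890))"
```

This gives a file size of 6888896 bytes and 3 jobs, and the three wc byte counts add up to the same 6888896. The first two blocks are a few bytes longer than 3000000, presumably because each block boundary is moved to a record boundary.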
2536
2538 Input data and parallel command in the same file
2539 GNU parallel is often called like this:
2540
2541 cat input_file | parallel command
2542
2543 With --shebang the input_file and parallel can be combined into the
2544 same script.
2545
2546 UNIX shell scripts start with a shebang line like this:
2547
2548 #!/bin/bash
2549
2550 GNU parallel can do that, too. With --shebang the arguments can be
2551 listed in the file. The parallel command is the first line of the
2552 script:
2553
2554 #!/usr/bin/parallel --shebang -r echo
2555
2556 foo
2557 bar
2558 baz
2559
2560 Output (the order may be different):
2561
2562 foo
2563 bar
2564 baz
2565
2566 Parallelizing existing scripts
2567 GNU parallel is often called like this:
2568
2569 cat input_file | parallel command
2570 parallel command ::: foo bar
2571
2572 If command is a script, parallel can be combined with it into a single
2573 file, so that these will run the script in parallel:
2574
2575 cat input_file | command
2576 command foo bar
2577
2578 This Perl script, perl_echo, works like echo:
2579
2580 #!/usr/bin/perl
2581
2582 print "@ARGV\n"
2583
2584 It can be called like this:
2585
2586 parallel perl_echo ::: foo bar
2587
2588 By changing the #!-line it can be run in parallel:
2589
2590 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2591
2592 print "@ARGV\n"
2593
2594 Thus this will work:
2595
2596 perl_echo foo bar
2597
2598 Output (the order may be different):
2599
2600 foo
2601 bar
2602
2603 This technique can be used for:
2604
2605 Perl:
2606 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2607
2608 print "Arguments @ARGV\n";
2609
2610 Python:
2611 #!/usr/bin/parallel --shebang-wrap /usr/bin/python3
2612
2613 import sys
2614 print('Arguments', ' '.join(sys.argv[1:]))
2615
2616 Bash/sh/zsh/Korn shell:
2617 #!/usr/bin/parallel --shebang-wrap /bin/bash
2618
2619 echo Arguments "$@"
2620
2621 csh:
2622 #!/usr/bin/parallel --shebang-wrap /bin/csh
2623
2624 echo Arguments "$argv"
2625
2626 Tcl:
2627 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2628
2629 puts "Arguments $argv"
2630
2631 R:
2632 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2633
2634 args <- commandArgs(trailingOnly = TRUE)
2635 print(paste("Arguments ",args))
2636
2637 GNUplot:
2638 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2639
2640 print "Arguments ", system('echo $ARG')
2641
2642 Ruby:
2643 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2644
2645 print "Arguments "
2646 puts ARGV
2647
2648 Octave:
2649 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2650
2651 printf ("Arguments");
2652 arg_list = argv ();
2653 for i = 1:nargin
2654 printf (" %s", arg_list{i});
2655 endfor
2656 printf ("\n");
2657
2658 Common LISP:
2659 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2660
2661 (format t "~&~S~&" 'Arguments)
2662 (format t "~&~S~&" *args*)
2663
2664 PHP:
2665 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2666 <?php
2667 echo "Arguments";
2668 foreach(array_slice($argv,1) as $v)
2669 {
2670 echo " $v";
2671 }
2672 echo "\n";
2673 ?>
2674
2675 Node.js:
2676 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2677
2678 var myArgs = process.argv.slice(2);
2679 console.log('Arguments ', myArgs);
2680
2681 LUA:
2682 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2683
2684 io.write "Arguments"
2685 for a = 1, #arg do
2686 io.write(" ")
2687 io.write(arg[a])
2688 end
2689 print("")
2690
2691 C#:
2692 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2693
2694 var argv = Environment.GetEnvironmentVariable("ARGV");
2695 print("Arguments "+argv);
2696
2698 GNU parallel can work as a counting semaphore. This is slower and less
2699 efficient than its normal mode.
2700
2701 A counting semaphore is like a row of toilets. People needing a toilet
2702 can use any toilet, but if there are more people than toilets, they
2703 will have to wait for one of the toilets to become available.
2704
2705 An alias for parallel --semaphore is sem.
2706
2707 sem will follow a person to the toilets, wait until a toilet is
2708 available, leave the person in the toilet and exit.
2709
2710 sem --fg will follow a person to the toilets, wait until a toilet is
2711 available, stay with the person in the toilet and exit when the person
2712 exits.
2713
2714 sem --wait will wait for all persons to leave the toilets.
2715
2716 sem does not have a queue discipline, so the next person is chosen
2717 randomly.
2718
2719 -j sets the number of toilets.
2720
2721 Mutex
2722 The default is to have only one toilet (this is called a mutex). The
2723 program is started in the background and sem exits immediately. Use
2724 --wait to wait for all sems to finish:
2725
2726 sem 'sleep 1; echo The first finished' &&
2727 echo The first is now running in the background &&
2728 sem 'sleep 1; echo The second finished' &&
2729 echo The second is now running in the background
2730 sem --wait
2731
2732 Output:
2733
2734 The first is now running in the background
2735 The first finished
2736 The second is now running in the background
2737 The second finished
2738
2739 The command can be run in the foreground with --fg, which will only
2740 exit when the command completes:
2741
2742 sem --fg 'sleep 1; echo The first finished' &&
2743 echo The first finished running in the foreground &&
2744 sem --fg 'sleep 1; echo The second finished' &&
2745 echo The second finished running in the foreground
2746 sem --wait
2747
2748 The difference between this and just running the command is that a
2749 mutex is set, so if other sems were running in the background, only one
2750 would run at a time.
2751
2752 To control which semaphore is used, use --semaphorename/--id. Run this
2753 in one terminal:
2754
2755 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2756
2757 and simultaneously this in another terminal:
2758
2759 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2760
2761 Note how the second will only be started when the first has finished.
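The mutex idea (a single toilet) can be illustrated without GNU parallel. The sketch below uses the common mkdir locking trick, which relies on mkdir failing when the directory already exists; it is not how sem is implemented, and with_mutex, job and the lock path are made-up names. Two background jobs share one lock, so their start and done lines never interleave.

```shell
#!/bin/sh
# Hypothetical helper: run "$@" while holding the lock directory $1.
with_mutex() {
  lock=$1
  shift
  # mkdir fails if the directory exists, so only one caller can hold it.
  until mkdir "$lock" 2>/dev/null; do
    sleep 1
  done
  "$@"
  status=$?
  rmdir "$lock"      # release the mutex
  return "$status"
}

dir=$(mktemp -d)
log=$dir/log
# Hypothetical job: log its start, work for a second, log its end.
job() { echo "$1 started" >> "$log"; sleep 1; echo "$1 done" >> "$log"; }

with_mutex "$dir/lock" job First &
with_mutex "$dir/lock" job Second &
wait
cat "$log"
```

Either job may win the lock first, but each "started" line is immediately followed by the matching "done" line. A real implementation also has to deal with stale locks left behind by killed jobs, which this sketch ignores.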
2762
2763 Counting semaphore
2764 A mutex is like having a single toilet: When it is in use everyone else
2765 will have to wait. A counting semaphore is like having multiple
2766 toilets: Several people can use the toilets, but when they all are in
2767 use, everyone else will have to wait.
2768
2769 sem can emulate a counting semaphore. Use --jobs to set the number of
2770 toilets like this:
2771
2772 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2773 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2774 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2775 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2776 sem --wait --id my_id
2777
2778 Output:
2779
2780 Start 1
2781 Start 2
2782 Start 3
2783 1 done
2784 Start 4
2785 2 done
2786 3 done
2787 4 done
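The order of the output above follows from a simple model: 3 slots, and each new job takes the slot that becomes free first. This awk-based sketch replays that model for the sleep times 5, 6, 7 and 8 seconds and ignores start-up overhead. simulate_sem is a made-up name; sem picks the next waiting job randomly, but here only one job is ever waiting, so that does not matter.

```shell
#!/bin/sh
# Hypothetical helper: replay a counting semaphore with $1 slots for
# jobs whose run times (in seconds) are the remaining arguments.
# Each job takes the slot that becomes free first.
simulate_sem() {
  slots=$1
  shift
  printf '%s\n' "$@" | awk -v slots="$slots" '
    { dur[NR] = $1 }
    END {
      for (s = 1; s <= slots; s++) free[s] = 0
      for (j = 1; j <= NR; j++) {
        best = 1                      # slot that is free earliest
        for (s = 2; s <= slots; s++)
          if (free[s] < free[best]) best = s
        start = free[best]
        finish = start + dur[j]
        free[best] = finish
        printf "job %d: start %d, done %d\n", j, start, finish
      }
    }'
}

# sem --jobs 3 with sleep 5, 6, 7 and 8 as in the example above
simulate_sem 3 5 6 7 8
```

Jobs 1 to 3 start at 0 seconds and finish at 5, 6 and 7 seconds; job 4 starts at 5 seconds, when job 1 frees its slot, and finishes at 13 seconds. Sorted by time, these events give exactly the order of the output above.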
2788
2789 Timeout
2790 With --semaphoretimeout you can force running the command anyway after
2791 a period (positive number) or give up (negative number):
2792
2793 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2794 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2795 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2796 sem --id foo --wait
2797
2798 Output:
2799
2800 Slow started
2801 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2802 Forced running after 1 sec
2803 parallel: Warning: Semaphore timed out. Exiting.
2804 Slow ended
2805
2806 Note how the 'Give up' was not run.
2807
2809 GNU parallel has some options that give brief information about its
2810 configuration.
2811
2812 --help will print a summary of the most important options:
2813
2814 parallel --help
2815
2816 Output:
2817
2818 Usage:
2819
2820 parallel [options] [command [arguments]] < list_of_arguments
2821 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2822 cat ... | parallel --pipe [options] [command [arguments]]
2823
2824 -j n Run n jobs in parallel
2825 -k Keep same order
2826 -X Multiple arguments with context replace
2827 --colsep regexp Split input on regexp for positional replacements
2828 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2829 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2830 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2831 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2832
2833 -S sshlogin Example: foo@server.example.com
2834 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2835 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2836 --onall Run the given command with argument on all sshlogins
2837 --nonall Run the given command with no arguments on all sshlogins
2838
2839 --pipe Split stdin (standard input) to multiple jobs.
2840 --recend str Record end separator for --pipe.
2841 --recstart str Record start separator for --pipe.
2842
2843 See 'man parallel' for details
2844
2845 Academic tradition requires you to cite works you base your article on.
2846 When using programs that use GNU Parallel to process data for publication
2847 please cite:
2848
2849 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2850 ;login: The USENIX Magazine, February 2011:42-47.
2851
2852 This helps funding further development; AND IT WON'T COST YOU A CENT.
2853 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2854
2855 When asking for help, always report the full output of this:
2856
2857 parallel --version
2858
2859 Output:
2860
2861 GNU parallel 20210122
2862 Copyright (C) 2007-2022 Ole Tange, http://ole.tange.dk and Free Software
2863 Foundation, Inc.
2864 License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
2865 This is free software: you are free to change and redistribute it.
2866 GNU parallel comes with no warranty.
2867
2868 Web site: https://www.gnu.org/software/parallel
2869
2870 When using programs that use GNU Parallel to process data for publication
2871 please cite as described in 'parallel --citation'.
2872
2873 In scripts, --minversion can be used to ensure the user has at least
2874 this version:
2875
2876 parallel --minversion 20130722 && \
2877 echo Your version is at least 20130722.
2878
2879 Output:
2880
2881 20160322
2882 Your version is at least 20130722.
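GNU parallel version numbers are dates in YYYYMMDD form, so they compare correctly as plain integers. As a sketch, assuming the version is the third word of the first line of parallel --version (as in "GNU parallel 20210122" above), a script could check it like this; version_at_least is a made-up helper, not part of GNU parallel.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the version in a line such as
# "GNU parallel 20210122" is at least $2 (both in YYYYMMDD form).
version_at_least() {
  version=$(printf '%s\n' "$1" | awk '{ print $3 }')
  [ "$version" -ge "$2" ]
}

# Example with the --version line shown above
if version_at_least 'GNU parallel 20210122' 20130722; then
  echo "Your version is at least 20130722."
fi

# With a real installation one could pass the actual line instead:
#   version_at_least "$(parallel --version | head -n 1)" 20130722
```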
2883
2884 If you are using GNU parallel for research the BibTeX citation can be
2885 generated using --citation:
2886
2887 parallel --citation
2888
2889 Output:
2890
2891 Academic tradition requires you to cite works you base your article on.
2892 When using programs that use GNU Parallel to process data for publication
2893 please cite:
2894
2895 @article{Tange2011a,
2896 title = {GNU Parallel - The Command-Line Power Tool},
2897 author = {O. Tange},
2898 address = {Frederiksberg, Denmark},
2899 journal = {;login: The USENIX Magazine},
2900 month = {Feb},
2901 number = {1},
2902 volume = {36},
2903 url = {https://www.gnu.org/s/parallel},
2904 year = {2011},
2905 pages = {42-47},
2906 doi = {10.5281/zenodo.16303}
2907 }
2908
2909 (Feel free to use \nocite{Tange2011a})
2910
2911 This helps funding further development; AND IT WON'T COST YOU A CENT.
2912 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2913
2914 If you send a copy of your published article to tange@gnu.org, it will be
2915 mentioned in the release notes of next version of GNU Parallel.
2916
2917 With --max-line-length-allowed, GNU parallel will report the maximum
2918 size of the command line:
2919
2920 parallel --max-line-length-allowed
2921
2922 Output (may vary on different systems):
2923
2924 131071
2925
2926 --number-of-cpus and --number-of-cores run system specific code to
2927 determine the number of CPUs and CPU cores on the system. On
2928 unsupported platforms they will return 1:
2929
2930 parallel --number-of-cpus
2931 parallel --number-of-cores
2932
2933 Output (may vary on different systems):
2934
2935 4
2936 64
2937
2939 The defaults for GNU parallel can be changed systemwide by putting the
2940 command line options in /etc/parallel/config. They can be changed for a
2941 user by putting them in ~/.parallel/config.
2942
2943 Profiles work the same way, but have to be referred to with --profile:
2944
2945 echo '--nice 17' > ~/.parallel/nicetimeout
2946 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2947 parallel --profile nicetimeout echo ::: A B C
2948
2949 Output:
2950
2951 A
2952 B
2953 C
2954
2955 Profiles can be combined:
2956
2957 echo '-vv --dry-run' > ~/.parallel/dryverbose
2958 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2959
2960 Output:
2961
2962 echo A
2963 echo B
2964 echo C
2965
2967 I hope you have learned something from this tutorial.
2968
2969 If you like GNU parallel:
2970
2971 • (Re-)walk through the tutorial if you have not done so in the past
2972 year (https://www.gnu.org/software/parallel/parallel_tutorial.html)
2973
2974 • Give a demo at your local user group/your team/your colleagues
2975
2976 • Post the intro videos and the tutorial on Reddit, Mastodon,
2977 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2978 LinkedIn, and mailing lists
2979
2980 • Request or write a review for your favourite blog or magazine
2981 (especially if you do something cool with GNU parallel)
2982
2983 • Invite me for your next conference
2984
2985 If you use GNU parallel for research:
2986
2987 • Please cite GNU parallel in your publications (use --citation)
2988
2989 If GNU parallel saves you money:
2990
2991 • (Have your company) donate to FSF or become a member
2992 https://my.fsf.org/donate/
2993
2994 (C) 2013-2022 Ole Tange, GFDLv1.3+ (See LICENSES/GFDL-1.3-or-later.txt)
2995
2996
2997
2998 20220422 2022-05-22 PARALLEL_TUTORIAL(7)