PARALLEL_TUTORIAL(7)                  parallel                 PARALLEL_TUTORIAL(7)



       This tutorial shows off much of GNU parallel's functionality. It is
       meant to teach the options and syntax of GNU parallel, not to show
       realistic real-world examples.

   Reader's guide
       If you prefer reading a book, buy GNU Parallel 2018 at
       http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
       or download it at: https://doi.org/10.5281/zenodo.1146014

       Otherwise start by watching the intro videos for a quick introduction:
       http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

       Then browse through the EXAMPLEs after the list of OPTIONS in man
       parallel (use LESS=+/EXAMPLE: man parallel). That will give you an
       idea of what GNU parallel is capable of.

       If you want to dive even deeper, spend a couple of hours walking
       through the tutorial (man parallel_tutorial). Your command line will
       love you for it.

       Finally you may want to look at the rest of the manual (man parallel)
       if you have special needs not already covered.

       If you want to know the design decisions behind GNU parallel, try:
       man parallel_design. This is also a good intro if you intend to
       change GNU parallel.

Prerequisites
       To run this tutorial you must have the following:

       parallel >= version 20160822
              Install the newest version using your package manager
              (recommended for security reasons), the way described in
              README, or with this command:

                $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
                   fetch -o - http://pi.dk/3 ) > install.sh
                $ sha1sum install.sh
                12345678 3374ec53 bacb199b 245af2dd a86df6c9
                $ md5sum install.sh
                029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
                $ sha512sum install.sh
                40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
                60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
                $ bash install.sh

              This will also install the newest version of the tutorial,
              which you can see by running this:

                man parallel_tutorial

              Most of the tutorial will work on older versions, too.

       abc-file:
              The file can be generated by this command:

                parallel -k echo ::: A B C > abc-file

       def-file:
              The file can be generated by this command:

                parallel -k echo ::: D E F > def-file

       abc0-file:
              The file can be generated by this command:

                perl -e 'printf "A\0B\0C\0"' > abc0-file

       abc_-file:
              The file can be generated by this command:

                perl -e 'printf "A_B_C_"' > abc_-file

       tsv-file.tsv
              The file can be generated by this command:

                perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

       num8   The file can be generated by this command:

                perl -e 'for(1..8){print "$_\n"}' > num8

       num128 The file can be generated by this command:

                perl -e 'for(1..128){print "$_\n"}' > num128

       num30000
              The file can be generated by this command:

                perl -e 'for(1..30000){print "$_\n"}' > num30000

       num1000000
              The file can be generated by this command:

                perl -e 'for(1..1000000){print "$_\n"}' > num1000000

       num_%header
              The file can be generated by this command:

                (echo %head1; echo %head2; \
                 perl -e 'for(1..10){print "$_\n"}') > num_%header

       fixedlen
              The file can be generated by this command:

                perl -e 'print "HHHHAAABBBCCC"' > fixedlen

       For remote running: ssh login on 2 servers with no password in
       $SERVER1 and $SERVER2 must work.

         SERVER1=server.example.com
         SERVER2=server2.example.net

       So you must be able to do this without entering a password:

         ssh $SERVER1 echo works
         ssh $SERVER2 echo works

       It can be set up by running 'ssh-keygen; ssh-copy-id $SERVER1'
       (using an empty passphrase), or you can use ssh-agent. Note that
       DSA keys ('ssh-keygen -t dsa') are disabled by default in modern
       OpenSSH, so use the default key type instead.

Input sources
       GNU parallel reads input from input sources. These can be files, the
       command line, and stdin (standard input or a pipe).

   A single input source
       Input can be read from the command line:

         parallel echo ::: A B C

       Output (the order may be different because the jobs are run in
       parallel):

         A
         B
         C

       The input source can be a file:

         parallel -a abc-file echo

       Output: Same as above.

       STDIN (standard input) can be the input source:

         cat abc-file | parallel echo

       Output: Same as above.

   Multiple input sources
       GNU parallel can take multiple input sources given on the command
       line. GNU parallel then generates all combinations of the input
       sources:

         parallel echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       The input sources can be files:

         parallel -a abc-file -a def-file echo

       Output: Same as above.

       STDIN (standard input) can be one of the input sources using -:

         cat abc-file | parallel -a - -a def-file echo

       Output: Same as above.

       Instead of -a, files can be given after '::::':

         cat abc-file | parallel echo :::: - def-file

       Output: Same as above.

       ::: and :::: can be mixed:

         parallel echo ::: A B C :::: def-file

       Output: Same as above.

   Linking arguments from input sources
       With --link you can link the input sources and get one argument
       from each input source:

         parallel --link echo ::: A B C ::: D E F

       Output (the order may be different):

         A D
         B E
         C F

       If one of the input sources is too short, its values will wrap:

         parallel --link echo ::: A B C D E ::: F G

       Output (the order may be different):

         A F
         B G
         C F
         D G
         E F

       For more flexible linking you can use :::+ and ::::+. They work
       like ::: and :::: except they link the previous input source to
       this input source.

       This will link ABC to GHI:

         parallel echo :::: abc-file :::+ G H I :::: def-file

       Output (the order may be different):

         A G D
         A G E
         A G F
         B H D
         B H E
         B H F
         C I D
         C I E
         C I F

       This will link GHI to DEF:

         parallel echo :::: abc-file ::: G H I ::::+ def-file

       Output (the order may be different):

         A G D
         A H E
         A I F
         B G D
         B H E
         B I F
         C G D
         C H E
         C I F

       If one of the input sources is too short when using :::+ or ::::+,
       the rest will be ignored:

         parallel echo ::: A B C D E :::+ F G

       Output (the order may be different):

         A F
         B G

   Changing the argument separator
       GNU parallel can use other separators than ::: or ::::. This is
       typically useful if ::: or :::: is used in the command to run:

         parallel --arg-sep ,, echo ,, A B C :::: def-file

       Output (the order may be different):

         A D
         A E
         A F
         B D
         B E
         B F
         C D
         C E
         C F

       Changing the argument file separator:

         parallel --arg-file-sep // echo ::: A B C // def-file

       Output: Same as above.

   Changing the argument delimiter
       GNU parallel will normally treat a full line as a single argument:
       it uses \n as argument delimiter. This can be changed with -d:

         parallel -d _ echo :::: abc_-file

       Output (the order may be different):

         A
         B
         C

       NUL can be given as \0:

         parallel -d '\0' echo :::: abc0-file

       Output: Same as above.

       A shorthand for -d '\0' is -0 (this will often be used to read
       files from find ... -print0):

         parallel -0 echo :::: abc0-file

       Output: Same as above.
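
       As a hedged sketch of that find use case (the directory and file
       names below are made up; GNU parallel must be on PATH):

```shell
# Sketch: NUL-delimited names survive spaces (and even newlines) in
# file names, which would break the default \n delimiter.
dir=$(mktemp -d)                      # hypothetical scratch directory
touch "$dir/a b.txt" "$dir/c.txt"
find "$dir" -name '*.txt' -print0 | parallel -0 echo Found {}
```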

   End-of-file value for input source
       GNU parallel can stop reading when it encounters a certain value:

         parallel -E stop echo ::: A B stop C D

       Output:

         A
         B

   Skipping empty lines
       Using --no-run-if-empty GNU parallel will skip empty lines:

         (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

       Output:

         1
         2

Building the command line
   No command means arguments are commands
       If no command is given after parallel the arguments themselves are
       treated as commands:

         parallel ::: ls 'echo foo' pwd

       Output (the order may be different):

         [list of files in current dir]
         foo
         [/path/to/current/working/dir]

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

         # Only works in Bash
         my_func() {
           echo in my_func $1
         }
         export -f my_func
         parallel my_func ::: 1 2 3

       Output (the order may be different):

         in my_func 1
         in my_func 2
         in my_func 3

   Replacement strings
       The 7 predefined replacement strings

       GNU parallel has several replacement strings. If no replacement
       string is used, the default is to append {}:

         parallel echo ::: A/B.C

       Output:

         A/B.C

       The default replacement string is {}:

         parallel echo {} ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} removes the extension:

         parallel echo {.} ::: A/B.C

       Output:

         A/B

       The replacement string {/} removes the path:

         parallel echo {/} ::: A/B.C

       Output:

         B.C

       The replacement string {//} keeps only the path:

         parallel echo {//} ::: A/B.C

       Output:

         A

       The replacement string {/.} removes the path and the extension:

         parallel echo {/.} ::: A/B.C

       Output:

         B

       The replacement string {#} gives the job number:

         parallel echo {#} ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} gives the job slot number (between 1 and
       the number of jobs to run in parallel):

         parallel -j 2 echo {%} ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

   Changing the replacement strings
       The replacement string {} can be changed with -I:

         parallel -I ,, echo ,, ::: A/B.C

       Output:

         A/B.C

       The replacement string {.} can be changed with --extensionreplace:

         parallel --extensionreplace ,, echo ,, ::: A/B.C

       Output:

         A/B

       The replacement string {/} can be changed with --basenamereplace:

         parallel --basenamereplace ,, echo ,, ::: A/B.C

       Output:

         B.C

       The replacement string {//} can be changed with --dirnamereplace:

         parallel --dirnamereplace ,, echo ,, ::: A/B.C

       Output:

         A

       The replacement string {/.} can be changed with
       --basenameextensionreplace:

         parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

       Output:

         B

       The replacement string {#} can be changed with --seqreplace:

         parallel --seqreplace ,, echo ,, ::: A B C

       Output (the order may be different):

         1
         2
         3

       The replacement string {%} can be changed with --slotreplace:

         parallel -j2 --slotreplace ,, echo ,, ::: A B C

       Output (the order may be different and 1 and 2 may be swapped):

         1
         2
         1

   Perl expression replacement string
       When the predefined replacement strings are not flexible enough, a
       perl expression can be used instead. One example is to remove two
       extensions: foo.tar.gz becomes foo.

         parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

       Output:

         foo

       In {= =} you can access all of GNU parallel's internal functions
       and variables. A few are worth mentioning.

       total_jobs() returns the total number of jobs:

         parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

       Q(...) shell quotes the string:

         parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

       Output:

         */!#$ shell quoted is \*/\!\#\$

       skip() skips the job:

         parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

       Output:

         1
         2
         4
         5

       @arg contains the input source variables:

         parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
           ::: {1..3} ::: {1..3}

       Output:

         1 2
         1 3
         2 1
         2 3
         3 1
         3 2

       If the strings {= and =} cause problems they can be replaced with
       --parens:

         parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
           ::: foo.tar.gz

       Output:

         foo

       To define a shorthand replacement string use --rpl:

         parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand starts with { it can be used as a positional
       replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
           ::: foo.tar.gz

       Output: Same as above.

       If the shorthand contains matching parentheses, the replacement
       string becomes a dynamic replacement string and the string in the
       parentheses can be accessed as $$1. If there are multiple matching
       parentheses, the matched strings can be accessed using $$2, $$3 and
       so on.

       You can think of this as giving arguments to the replacement
       string. Here we give the argument .tar.gz to the replacement string
       {%string} which removes string:

         parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

       Output:

         foo.zip

       Here we give the two arguments tar.gz and zip to the replacement
       string {/string1/string2} which replaces string1 with string2:

         parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
           ::: foo.tar.gz

       Output:

         foo.zip

       GNU parallel's 7 replacement strings are implemented like this:

         --rpl '{} '
         --rpl '{#} $_=$job->seq()'
         --rpl '{%} $_=$job->slot()'
         --rpl '{/} s:.*/::'
         --rpl '{//} $Global::use{"File::Basename"} ||=
                eval "use File::Basename; 1;"; $_ = dirname($_);'
         --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
         --rpl '{.} s:\.[^/.]+$::'

   Positional replacement strings
       With multiple input sources the argument from the individual input
       sources can be accessed with {number}:

         parallel echo {1} and {2} ::: A B ::: C D

       Output (the order may be different):

         A and C
         A and D
         B and C
         B and D

       The positional replacement strings can also be modified using /,
       //, /., and .:

         parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

       Output (the order may be different):

         /=B.C //=A /.=B .=A/B
         /=E.F //=D /.=E .=D/E

       If a position is negative, it will refer to the input source
       counted from the end:

         parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
           ::: A B ::: C D ::: E F

       Output (the order may be different):

         1=A 2=C 3=E -1=E -2=C -3=A
         1=A 2=C 3=F -1=F -2=C -3=A
         1=A 2=D 3=E -1=E -2=D -3=A
         1=A 2=D 3=F -1=F -2=D -3=A
         1=B 2=C 3=E -1=E -2=C -3=B
         1=B 2=C 3=F -1=F -2=C -3=B
         1=B 2=D 3=E -1=E -2=D -3=B
         1=B 2=D 3=F -1=F -2=D -3=B

   Positional perl expression replacement string
       To use a perl expression as a positional replacement string, prefix
       the perl expression with the number and a space:

         parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
           ::: bar ::: foo.tar.gz

       Output:

         foo bar

       If a shorthand defined using --rpl starts with { it can be used as
       a positional replacement string, too:

         parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
           ::: bar ::: foo.tar.gz

       Output: Same as above.

   Input from columns
       The columns in a file can be bound to positional replacement
       strings using --colsep. Here the columns are separated by TAB (\t):

         parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

       Output (the order may be different):

         1=f1 2=f2
         1=A 2=B
         1=C 2=D

   Header defined replacement strings
       With --header GNU parallel will use the first value of the input
       source as the name of the replacement string. Only the non-modified
       version {} is supported:

         parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

       Output (the order may be different):

         f1=A f2=C
         f1=A f2=D
         f1=B f2=C
         f1=B f2=D

       It is useful with --colsep for processing files with TAB separated
       values:

         parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
           :::: tsv-file.tsv

       Output (the order may be different):

         f1=A f2=B
         f1=C f2=D

   More pre-defined replacement strings with --plus
       --plus adds the replacement strings {+/} {+.} {+..} {+...} {..}
       {...} {/..} {/...} {##}. The idea being that {+foo} matches the
       opposite of {foo} and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} =
       {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}.

         parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
         parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

       Output:

         dir/sub/file.ex1.ex2.ex3

       {##} is simply the number of jobs:

         parallel --plus echo Job {#} of {##} ::: {1..5}

       Output:

         Job 1 of 5
         Job 2 of 5
         Job 3 of 5
         Job 4 of 5
         Job 5 of 5

   Dynamic replacement strings with --plus
       --plus also defines these dynamic replacement strings:

       {:-string}         Default value is string if the argument is empty.

       {:number}          Substring from number till end of string.

       {:number1:number2} Substring from number1 to number2.

       {#string}          If the argument starts with string, remove it.

       {%string}          If the argument ends with string, remove it.

       {/string1/string2} Replace string1 with string2.

       {^string}          If the argument starts with string, upper case
                          it. string must be a single letter.

       {^^string}         If the argument contains string, upper case it.
                          string must be a single letter.

       {,string}          If the argument starts with string, lower case
                          it. string must be a single letter.

       {,,string}         If the argument contains string, lower case it.
                          string must be a single letter.

       They are inspired by Bash:

         unset myvar
         echo ${myvar:-myval}
         parallel --plus echo {:-myval} ::: "$myvar"

         myvar=abcAaAdef
         echo ${myvar:2}
         parallel --plus echo {:2} ::: "$myvar"

         echo ${myvar:2:3}
         parallel --plus echo {:2:3} ::: "$myvar"

         echo ${myvar#bc}
         parallel --plus echo {#bc} ::: "$myvar"
         echo ${myvar#abc}
         parallel --plus echo {#abc} ::: "$myvar"

         echo ${myvar%de}
         parallel --plus echo {%de} ::: "$myvar"
         echo ${myvar%def}
         parallel --plus echo {%def} ::: "$myvar"

         echo ${myvar/def/ghi}
         parallel --plus echo {/def/ghi} ::: "$myvar"

         echo ${myvar^a}
         parallel --plus echo {^a} ::: "$myvar"
         echo ${myvar^^a}
         parallel --plus echo {^^a} ::: "$myvar"

         myvar=AbcAaAdef
         echo ${myvar,A}
         parallel --plus echo '{,A}' ::: "$myvar"
         echo ${myvar,,A}
         parallel --plus echo '{,,A}' ::: "$myvar"

       Output:

         myval
         myval
         cAaAdef
         cAaAdef
         cAa
         cAa
         abcAaAdef
         abcAaAdef
         AaAdef
         AaAdef
         abcAaAdef
         abcAaAdef
         abcAaA
         abcAaA
         abcAaAghi
         abcAaAghi
         AbcAaAdef
         AbcAaAdef
         AbcAAAdef
         AbcAAAdef
         abcAaAdef
         abcAaAdef
         abcaaadef
         abcaaadef

   More than one argument
       With --xargs GNU parallel will fit as many arguments as possible on
       a single line:

         cat num30000 | parallel --xargs echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         2

       The 30000 arguments fit on 2 lines.

       The maximal length of a single line can be set with -s. With a
       maximal line length of 10000 chars 17 commands will be run:

         cat num30000 | parallel --xargs -s 10000 echo | wc -l

       Output:

         17

       For better parallelism GNU parallel can distribute the arguments
       between all the parallel jobs when it reaches the end of the input.

       Below, GNU parallel reads the last argument while generating the
       second job. It then spreads the arguments for the second job over 4
       jobs instead, as 4 parallel jobs are requested.

       The first job will be the same as the --xargs example above, but
       the second job will be split into 4 evenly sized jobs, resulting in
       a total of 5 jobs:

         cat num30000 | parallel --jobs 4 -m echo | wc -l

       Output (if you run this under Bash on GNU/Linux):

         5

       This is even more visible when running 4 jobs with 10 arguments:
       the 10 arguments are spread over 4 jobs:

         parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

       Output:

         1 2 3
         4 5 6
         7 8 9
         10

       A replacement string can be part of a word. -m will not repeat the
       context:

         parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A B-post
         pre-C D-post
         pre-E F-post
         pre-G-post

       To repeat the context use -X, which otherwise works like -m:

         parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

       Output (the order may be different):

         pre-A-post pre-B-post
         pre-C-post pre-D-post
         pre-E-post pre-F-post
         pre-G-post

       To limit the number of arguments use -N:

         parallel -N3 echo ::: A B C D E F G H

       Output (the order may be different):

         A B C
         D E F
         G H

       -N also sets the positional replacement strings:

         parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

       Output (the order may be different):

         1=A 2=B 3=C
         1=D 2=E 3=F
         1=G 2=H 3=

       -N0 reads 1 argument but inserts none:

         parallel -N0 echo foo ::: 1 2 3

       Output:

         foo
         foo
         foo

   Quoting
       Command lines that contain special characters may need to be
       protected from the shell.

       The perl program print "@ARGV\n" basically works like echo:

         perl -e 'print "@ARGV\n"' A

       Output:

         A

       To run that in parallel the command needs to be quoted:

         parallel perl -e 'print "@ARGV\n"' ::: This wont work

       Output:

         [Nothing]

       To quote the command use -q:

         parallel -q perl -e 'print "@ARGV\n"' ::: This works

       Output (the order may be different):

         This
         works

       Or you can quote the critical part using \':

         parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

       Output (the order may be different):

         This
         works,
         too

       GNU parallel can also \-quote full lines. Simply run this:

         parallel --shellquote
         Warning: Input is read from the terminal. You either know what you
         Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
         Warning: ::: or :::: or to pipe data into parallel. If so
         Warning: consider going through the tutorial: man parallel_tutorial
         Warning: Press CTRL-D to exit.
         perl -e 'print "@ARGV\n"'
         [CTRL-D]

       Output:

         perl\ -e\ \'print\ \"@ARGV\\n\"\'

       This can then be used as the command:

         parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

       Output (the order may be different):

         This
         also
         works

   Trimming space
       Space can be trimmed on the arguments using --trim:

         parallel --trim r echo pre-{}-post ::: ' A '

       Output:

         pre- A-post

       To trim on the left side:

         parallel --trim l echo pre-{}-post ::: ' A '

       Output:

         pre-A -post

       To trim on both sides:

         parallel --trim lr echo pre-{}-post ::: ' A '

       Output:

         pre-A-post

   Respecting the shell
       This tutorial uses Bash as the shell. GNU parallel respects which
       shell you are using, so in zsh you can do:

         parallel echo \={} ::: zsh bash ls

       Output:

         /usr/bin/zsh
         /bin/bash
         /bin/ls

       In csh you can do:

         parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

       Output:

         [somedir] is a dir

       This also becomes useful if you use GNU parallel in a shell script:
       GNU parallel will use the same shell as the shell script.
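
       A minimal sketch of that (assuming /bin/bash exists and GNU
       parallel is installed): because the script below declares bash, GNU
       parallel starts its jobs under bash too, so bash variables such as
       $BASH_VERSION are set inside the jobs.

```shell
#!/bin/bash
# Sketch: GNU parallel detects that its parent shell is bash and
# runs the job under bash as well, so $BASH_VERSION is set in the job.
parallel 'echo job shell: bash $BASH_VERSION' ::: run
```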

Controlling the output
       The output can be prefixed with the argument:

         parallel --tag echo foo-{} ::: A B C

       Output (the order may be different):

         A       foo-A
         B       foo-B
         C       foo-C

       To prefix it with another string use --tagstring:

         parallel --tagstring {}-bar echo foo-{} ::: A B C

       Output (the order may be different):

         A-bar   foo-A
         B-bar   foo-B
         C-bar   foo-C

       To see what commands will be run without running them use --dryrun:

         parallel --dryrun echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         echo C

       To print the commands before running them use --verbose:

         parallel --verbose echo {} ::: A B C

       Output (the order may be different):

         echo A
         echo B
         A
         echo C
         B
         C

       GNU parallel will postpone the output until the command completes:

         parallel -j2 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-start
         4-middle
         4-end

       To get the output immediately use --ungroup:

         parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         42-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         -middle
         4-end

       --ungroup is fast, but can cause half a line from one job to be
       mixed with half a line of another job. That has happened in the
       second line, where the line '4-middle' is mixed with '2-start'.

       To avoid this use --linebuffer:

         parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end
         4-middle
         4-end

       To force the output in the same order as the arguments use
       --keep-order/-k:

         parallel -j2 -k 'printf "%s-start\n%s" {} {};
           sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

       Output:

         4-start
         4-middle
         4-end
         2-start
         2-middle
         2-end
         1-start
         1-middle
         1-end

   Saving output into files
       GNU parallel can save the output of each job into files:

         parallel --files echo ::: A B C

       Output will be similar to this:

         /tmp/pAh6uWuQCg.par
         /tmp/opjhZCzAX4.par
         /tmp/W0AT_Rph2o.par

       By default GNU parallel will cache the output in files in /tmp.
       This can be changed by setting $TMPDIR or --tmpdir:

         parallel --tmpdir /var/tmp --files echo ::: A B C

       Output will be similar to this:

         /var/tmp/N_vk7phQRc.par
         /var/tmp/7zA4Ccf3wZ.par
         /var/tmp/LIuKgF_2LP.par

       Or:

         TMPDIR=/var/tmp parallel --files echo ::: A B C

       Output: Same as above.

       The output files can be saved in a structured way using --results:

         parallel --results outdir echo ::: A B C

       Output:

         A
         B
         C

       These files were also generated, containing the standard output
       (stdout), standard error (stderr), and the sequence number (seq):

         outdir/1/A/seq
         outdir/1/A/stderr
         outdir/1/A/stdout
         outdir/1/B/seq
         outdir/1/B/stderr
         outdir/1/B/stdout
         outdir/1/C/seq
         outdir/1/C/stderr
         outdir/1/C/stdout

       --header : will take the first value as name and use that in the
       directory structure. This is useful if you are using multiple input
       sources:

         parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

       Generated files:

         outdir/f1/A/f2/C/seq
         outdir/f1/A/f2/C/stderr
         outdir/f1/A/f2/C/stdout
         outdir/f1/A/f2/D/seq
         outdir/f1/A/f2/D/stderr
         outdir/f1/A/f2/D/stdout
         outdir/f1/B/f2/C/seq
         outdir/f1/B/f2/C/stderr
         outdir/f1/B/f2/C/stdout
         outdir/f1/B/f2/D/seq
         outdir/f1/B/f2/D/stderr
         outdir/f1/B/f2/D/stdout

       The directories are named after the variables and their values.
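
       The captured output can then be read straight back from those
       files; a small sketch (run in a scratch directory; requires GNU
       parallel):

```shell
# Sketch: run jobs under --results and read one job's captured stdout.
cd "$(mktemp -d)"
parallel --results outdir echo ::: A B C > /dev/null
cat outdir/1/A/stdout    # the stdout of the job that got argument A
```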

Controlling the execution
   Number of simultaneous jobs
       The number of concurrent jobs is given with --jobs/-j:

         /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

       With 64 jobs in parallel the 128 sleeps will take 2-8 seconds to
       run - depending on how fast your machine is.

       By default --jobs is the same as the number of CPU cores. So this:

         /usr/bin/time parallel -N0 sleep 1 :::: num128

       should take twice the time of running 2 jobs per CPU core:

         /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128

       --jobs 0 will run as many jobs in parallel as possible:

         /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128

       which should take 1-7 seconds depending on how fast your machine
       is.

       --jobs can read from a file which is re-read when a job finishes:

         echo 50% > my_jobs
         /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
         sleep 1
         echo 0 > my_jobs
         wait

       During the first second only 50% of the CPU cores will run a job.
       Then 0 is put into my_jobs and the rest of the jobs will be started
       in parallel.

       Instead of basing the percentage on the number of CPU cores GNU
       parallel can base it on the number of CPUs:

         parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8

   Shuffle job order
       If you have many jobs (e.g. from multiple combinations of input
       sources), it can be handy to shuffle the jobs, so that different
       combinations are run early. Use --shuf for that:

         parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C

       Output:

         All combinations, but in a different order for each run.

   Interactivity
       GNU parallel can ask the user if a command should be run using
       --interactive:

         parallel --interactive echo ::: 1 2 3

       Output:

         echo 1 ?...y
         echo 2 ?...n
         1
         echo 3 ?...y
         3

       GNU parallel can be used to put arguments on the command line for
       an interactive command such as emacs to edit one file at a time:

         parallel --tty emacs ::: 1 2 3

       Or give multiple arguments in one go to open multiple files:

         parallel -X --tty vi ::: 1 2 3

   A terminal for every job
       Using --tmux GNU parallel can start a terminal for every job run:

         seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'

       This will tell you to run something similar to:

         tmux -S /tmp/tmsrPrO0 attach

       Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can cycle
       between the windows of the running jobs. When a job is finished it
       will pause for 10 seconds before closing the window.

   Timing
       Some jobs do heavy I/O when they start. To avoid a thundering herd,
       GNU parallel can delay starting new jobs. --delay X will make sure
       there is at least X seconds between each start:

         parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3

       Output:

         Starting 1
         Thu Aug 15 16:24:33 CEST 2013
         Starting 2
         Thu Aug 15 16:24:35 CEST 2013
         Starting 3
         Thu Aug 15 16:24:38 CEST 2013

       If jobs taking more than a certain amount of time are known to
       fail, they can be stopped with --timeout. The accuracy of --timeout
       is 2 seconds:

         parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8

       Output:

         2
         4

       GNU parallel can compute the median runtime for jobs and kill those
       that take more than 200% of the median runtime:

         parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3

       Output:

         2.1
         2.2
         3
         2.3

1378 Progress information
1379 Based on the runtime of completed jobs GNU parallel can estimate the
1380 total runtime:
1381
1382 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1383
1384 Output:
1385
1386 Computers / CPU cores / Max jobs to run
1387 1:local / 2 / 2
1388
1389 Computer:jobs running/jobs completed/%of started jobs/
1390 Average seconds to complete
1391 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1392
1393 GNU parallel can give progress information with --progress:
1394
1395 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1396
1397 Output:
1398
1399 Computers / CPU cores / Max jobs to run
1400 1:local / 2 / 2
1401
1402 Computer:jobs running/jobs completed/%of started jobs/
1403 Average seconds to complete
1404 local:0/9/100%/1.1s
1405
1406 A progress bar can be shown with --bar:
1407
1408 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1409
1410 And a graphical bar can be shown with --bar and zenity:
1411
1412 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1413 2> >(zenity --progress --auto-kill --auto-close)
1414
1415 A logfile of the jobs completed so far can be generated with --joblog:
1416
1417 parallel --joblog /tmp/log exit ::: 1 2 3 0
1418 cat /tmp/log
1419
1420 Output:
1421
1422 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1423 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1424 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1425 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1426 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1427
1428 The log contains the job sequence, which host the job was run on, the
1429 start time and run time, how much data was transferred, the exit value,
1430 the signal that killed the job, and finally the command being run.
1431
1432 With a joblog GNU parallel can be stopped and later pick up where it
1433 left off. It is important that the input of the completed jobs is
1434 unchanged.
1435
1436 parallel --joblog /tmp/log exit ::: 1 2 3 0
1437 cat /tmp/log
1438 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1439 cat /tmp/log
1440
1441 Output:
1442
1443 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1444 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1445 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1446 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1447 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1448
1449 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1450 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1451 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1452 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1453 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1454 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1455 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1456
1457 Note how the start time of the last 2 jobs is clearly different,
1458 because they were started in the second run.
1459
1460 With --resume-failed GNU parallel will re-run the jobs that failed:
1461
1462 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1463 cat /tmp/log
1464
1465 Output:
1466
1467 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1468 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1469 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1470 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1471 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1472 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1473 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1474 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1475 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1476 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1477
1478 Note how seq 1, 2, and 3 have been rerun because they had exit
1479 values different from 0.
1480
1481 --retry-failed does almost the same as --resume-failed. Where
1482 --resume-failed reads the commands from the command line (and ignores
1483 the commands in the joblog), --retry-failed ignores the command line
1484 and reruns the commands mentioned in the joblog.
1485
1486 parallel --retry-failed --joblog /tmp/log
1487 cat /tmp/log
1488
1489 Output:
1490
1491 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1492 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1493 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1494 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1495 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1496 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1497 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1498 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1499 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1500 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1501 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1502 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1503 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1504
1505 Termination
1506 Unconditional termination
1507
1508 By default GNU parallel will wait for all jobs to finish before
1509 exiting.
1510
1511 If you send GNU parallel the TERM signal, GNU parallel will stop
1512 spawning new jobs and wait for the remaining jobs to finish. If you
1513 send GNU parallel the TERM signal again, GNU parallel will kill all
1514 running jobs and exit.
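
 A sketch of this behaviour, sending TERM twice to a backgrounded GNU
 parallel (the sleep lengths and job count are arbitrary):

```shell
# Start 3 long jobs, 2 at a time, in the background.
parallel -j2 sleep ::: 30 30 30 &
PID=$!
sleep 2
kill -TERM "$PID"    # first TERM: stop spawning, wait for running jobs
sleep 1
kill -TERM "$PID"    # second TERM: kill running jobs and exit
wait "$PID" || true  # the exit status reflects the killed jobs
```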
1515
1516 Termination dependent on job status
1517
1518 For certain jobs there is no need to continue if one of the jobs fails
1519 and has an exit code different from 0. GNU parallel will stop spawning
1520 new jobs with --halt soon,fail=1:
1521
1522 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1523
1524 Output:
1525
1526 0
1527 0
1528 1
1529 parallel: This job failed:
1530 echo 1; exit 1
1531 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1532 2
1533
1534 With --halt now,fail=1 the running jobs will be killed immediately:
1535
1536 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
1537
1538 Output:
1539
1540 0
1541 0
1542 1
1543 parallel: This job failed:
1544 echo 1; exit 1
1545
1546 If --halt is given a percentage, this percentage of the jobs must fail
1547 before GNU parallel stops spawning more jobs:
1548
1549 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1550 ::: 0 1 2 3 4 5 6 7 8 9
1551
1552 Output:
1553
1554 0
1555 1
1556 parallel: This job failed:
1557 echo 1; exit 1
1558 2
1559 parallel: This job failed:
1560 echo 2; exit 2
1561 parallel: Starting no more jobs. Waiting for 1 jobs to finish.
1562 3
1563 parallel: This job failed:
1564 echo 3; exit 3
1565
1566 If you are looking for success instead of failures, you can use
1567 success. This will finish as soon as the first job succeeds:
1568
1569 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
1570
1571 Output:
1572
1573 1
1574 2
1575 3
1576 0
1577 parallel: This job succeeded:
1578 echo 0; exit 0
1579
1580 GNU parallel can retry the command with --retries. This is useful if a
1581 command fails for unknown reasons now and then.
1582
1583 parallel -k --retries 3 \
1584 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1585 cat /tmp/runs
1586
1587 Output:
1588
1589 completed 1
1590 completed 2
1591 completed 0
1592
1593 tried 1
1594 tried 2
1595 tried 1
1596 tried 2
1597 tried 1
1598 tried 2
1599 tried 0
1600
1601 Note how jobs 1 and 2 were tried 3 times, but 0 was not retried
1602 because it had exit code 0.
1603
1604 Termination signals (advanced)
1605
1606 Using --termseq you can control which signals are sent when killing
1607 children. Normally children will be killed by sending them SIGTERM,
1608 waiting 200 ms, then another SIGTERM, waiting 100 ms, then another
1609 SIGTERM, waiting 50 ms, then a SIGKILL, finally waiting 25 ms before
1610 giving up. It looks like this:
1611
1612 show_signals() {
1613 perl -e 'for(keys %SIG) {
1614 $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
1615 }
1616 while(1){sleep 1}'
1617 }
1618 export -f show_signals
1619 echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
1620 -u --timeout 1 show_signals
1621
1622 Output:
1623
1624 Got TERM
1625 Got TERM
1626 Got TERM
1627
1628 Or just:
1629
1630 echo | parallel -u --timeout 1 show_signals
1631
1632 Output: Same as above.
1633
1634 You can change this to SIGINT, SIGTERM, SIGKILL:
1635
1636 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1637 -u --timeout 1 show_signals
1638
1639 Output:
1640
1641 Got INT
1642 Got TERM
1643
1644 The SIGKILL does not show because it cannot be caught, and thus the
1645 child dies.
1646
1647 Limiting the resources
1648 To avoid overloading systems GNU parallel can look at the system load
1649 before starting another job:
1650
1651 parallel --load 100% echo load is less than {} job per cpu ::: 1
1652
1653 Output:
1654
1655 [when the load is less than the number of cpu cores]
1656 load is less than 1 job per cpu
1657
1658 GNU parallel can also check if the system is swapping.
1659
1660 parallel --noswap echo the system is not swapping ::: now
1661
1662 Output:
1663
1664 [when the system is not swapping]
1665 the system is not swapping now
1666
1667 Some jobs need a lot of memory, and should only be started when there
1668 is enough memory free. Using --memfree GNU parallel can check if there
1669 is enough memory free. Additionally, GNU parallel will kill off the
1670 youngest job if the free memory falls below 50% of the size. The
1671 killed job will be put back on the queue and retried later.
1672
1673 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1674
1675 GNU parallel can run the jobs with a nice value. This will work both
1676 locally and remotely.
1677
1678 parallel --nice 17 echo this is being run with nice -n ::: 17
1679
1680 Output:
1681
1682 this is being run with nice -n 17
1683
1685 GNU parallel can run jobs on remote servers. It uses ssh to communicate
1686 with the remote machines.
1687
1688 Sshlogin
1689 The most basic sshlogin is -S host:
1690
1691 parallel -S $SERVER1 echo running on ::: $SERVER1
1692
1693 Output:
1694
1695 running on [$SERVER1]
1696
1697 To use a different username prepend the server with username@:
1698
1699 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1700
1701 Output:
1702
1703 running on [username@$SERVER1]
1704
1705 The special sshlogin : is the local machine:
1706
1707 parallel -S : echo running on ::: the_local_machine
1708
1709 Output:
1710
1711 running on the_local_machine
1712
1713 If ssh is not in $PATH it can be prepended to $SERVER1:
1714
1715 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1716
1717 Output:
1718
1719 custom ssh
1720
1721 The ssh command can also be given using --ssh:
1722
1723 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1724
1725 or by setting $PARALLEL_SSH:
1726
1727 export PARALLEL_SSH=/usr/bin/ssh
1728 parallel -S $SERVER1 echo custom ::: ssh
1729
1730 Several servers can be given using multiple -S:
1731
1732 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1733
1734 Output (the order may be different):
1735
1736 running
1737 on
1738 more
1739 hosts
1740
1741 Or they can be separated by ,:
1742
1743 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1744
1745 Output: Same as above.
1746
1747 Or newline:
1748
1749 # This gives a \n between $SERVER1 and $SERVER2
1750 SERVERS="`echo $SERVER1; echo $SERVER2`"
1751 parallel -S "$SERVERS" echo ::: running on more hosts
1752
1753 They can also be read from a file (replace user@ with the user on
1754 $SERVER2):
1755
1756 echo $SERVER1 > nodefile
1757 # Force 4 cores, special ssh-command, username
1758 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1759 parallel --sshloginfile nodefile echo ::: running on more hosts
1760
1761 Output: Same as above.
1762
1763 Every time a job finishes, the --sshloginfile will be re-read, so it is
1764 possible to both add and remove hosts while running.
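
 Because the file is re-read, hosts can be added (or removed) while GNU
 parallel runs. A local-only sketch, using the special sshlogin : (the
 local machine) so no remote server is required; the file name is just
 an example:

```shell
# Start with only the local machine in the file.
echo : > /tmp/nodefile
parallel --sshloginfile /tmp/nodefile sleep 0.5\; echo {} ::: 1 2 3 4 &
sleep 1
# Append another entry; jobs started after the next re-read can use it.
echo : >> /tmp/nodefile
wait
```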
1765
1766 The special --sshloginfile .. reads from ~/.parallel/sshloginfile.
1767
1768 To force GNU parallel to treat a server as having a given number of
1769 CPU cores, prepend the number of cores followed by / to the sshlogin:
1770
1771 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1772
1773 Output:
1774
1775 force 4 cpus on server
1776
1777 Servers can be put into groups by prepending @groupname to the server
1778 and the group can then be selected by appending @groupname to the
1779 argument if using --hostgroup:
1780
1781 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1782 ::: run_on_grp1@grp1 run_on_grp2@grp2
1783
1784 Output:
1785
1786 run_on_grp1
1787 run_on_grp2
1788
1789 A host can be in multiple groups by separating the groups with +, and
1790 you can force GNU parallel to limit the groups on which the command can
1791 be run with -S @groupname:
1792
1793 parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
1794 ::: run_on_grp1 also_grp1
1795
1796 Output:
1797
1798 run_on_grp1
1799 also_grp1
1800
1801 Transferring files
1802 GNU parallel can transfer the files to be processed to the remote host.
1803 It does that using rsync.
1804
1805 echo This is input_file > input_file
1806 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1807
1808 Output:
1809
1810 This is input_file
1811
1812 If the files are processed into another file, the resulting file can be
1813 transferred back:
1814
1815 echo This is input_file > input_file
1816 parallel -S $SERVER1 --transferfile {} --return {}.out \
1817 cat {} ">"{}.out ::: input_file
1818 cat input_file.out
1819
1820 Output: Same as above.
1821
1822 To remove the input and output file on the remote server use --cleanup:
1823
1824 echo This is input_file > input_file
1825 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1826 cat {} ">"{}.out ::: input_file
1827 cat input_file.out
1828
1829 Output: Same as above.
1830
1831 There is a shorthand for --transferfile {} --return --cleanup called
1832 --trc:
1833
1834 echo This is input_file > input_file
1835 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1836 cat input_file.out
1837
1838 Output: Same as above.
1839
1840 Some jobs need a common database for all jobs. GNU parallel can
1841 transfer that using --basefile which will transfer the file before the
1842 first job:
1843
1844 echo common data > common_file
1845 parallel --basefile common_file -S $SERVER1 \
1846 cat common_file\; echo {} ::: foo
1847
1848 Output:
1849
1850 common data
1851 foo
1852
1853 To remove it from the remote host after the last job use --cleanup.
1854
1855 Working dir
1856 The default working dir on the remote machines is the login dir. This
1857 can be changed with --workdir mydir.
1858
1859 Files transferred using --transferfile and --return will be relative to
1860 mydir on remote computers, and the command will be executed in the dir
1861 mydir.
1862
1863 The special mydir value ... will create working dirs under
1864 ~/.parallel/tmp on the remote computers. If --cleanup is given these
1865 dirs will be removed.
1866
1867 The special mydir value . uses the current working dir. If the current
1868 working dir is beneath your home dir, the value . is treated as the
1869 relative path to your home dir. This means that if your home dir is
1870 different on remote computers (e.g. if your login is different) the
1871 relative path will still be relative to your home dir.
1872
1873 parallel -S $SERVER1 pwd ::: ""
1874 parallel --workdir . -S $SERVER1 pwd ::: ""
1875 parallel --workdir ... -S $SERVER1 pwd ::: ""
1876
1877 Output:
1878
1879 [the login dir on $SERVER1]
1880 [current dir relative on $SERVER1]
1881 [a dir in ~/.parallel/tmp/...]
1882
1883 Avoid overloading sshd
1884 If many jobs are started on the same server, sshd can be overloaded.
1885 GNU parallel can insert a delay between each job run on the same
1886 server:
1887
1888 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
1889
1890 Output (the order may be different):
1891
1892 1
1893 2
1894 3
1895
1896 sshd will be less overloaded if using --controlmaster, which will
1897 multiplex ssh connections:
1898
1899 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1900
1901 Output: Same as above.
1902
1903 Ignore hosts that are down
1904 In clusters with many hosts a few of them are often down. GNU parallel
1905 can ignore those hosts. In this case the host 173.194.32.46 is down:
1906
1907 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
1908
1909 Output:
1910
1911 bar
1912
1913 Running the same commands on all hosts
1914 GNU parallel can run the same command on all the hosts:
1915
1916 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
1917
1918 Output (the order may be different):
1919
1920 foo
1921 bar
1922 foo
1923 bar
1924
1925 Often you will just want to run a single command on all hosts without
1926 arguments. --nonall is --onall with no arguments:
1927
1928 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
1929
1930 Output:
1931
1932 foo bar
1933 foo bar
1934
1935 When --tag is used with --nonall and --onall the --tagstring is the
1936 host:
1937
1938 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
1939
1940 Output (the order may be different):
1941
1942 $SERVER1 foo bar
1943 $SERVER2 foo bar
1944
1945 --jobs sets the number of servers to log in to in parallel.
1946
1947 Transferring environment variables and functions
1948 env_parallel is a shell function that transfers all aliases, functions,
1949 variables, and arrays. You activate it by running:
1950
1951 source `which env_parallel.bash`
1952
1953 Replace bash with the shell you use.
1954
1955 Now you can use env_parallel instead of parallel and still have your
1956 environment:
1957
1958 alias myecho=echo
1959 myvar="Joe's var is"
1960 env_parallel -S $SERVER1 'myecho $myvar' ::: green
1961
1962 Output:
1963
1964 Joe's var is green
1965
1966 The disadvantage is that if your environment is huge, env_parallel
1967 will fail.
1968
1969 When env_parallel fails, you can still use --env to tell GNU parallel
1970 to transfer an environment variable to the remote system.
1971
1972 MYVAR='foo bar'
1973 export MYVAR
1974 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
1975
1976 Output:
1977
1978 foo bar baz
1979
1980 This works for functions, too, if your shell is Bash:
1981
1982 # This only works in Bash
1983 my_func() {
1984 echo in my_func $1
1985 }
1986 export -f my_func
1987 parallel --env my_func -S $SERVER1 my_func ::: baz
1988
1989 Output:
1990
1991 in my_func baz
1992
1993 GNU parallel can copy all user defined variables and functions to the
1994 remote system. It just needs to record which ones to ignore in
1995 ~/.parallel/ignored_vars. Do that by running this once:
1996
1997 parallel --record-env
1998 cat ~/.parallel/ignored_vars
1999
2000 Output:
2001
2002 [list of variables to ignore - including $PATH and $HOME]
2003
2004 Now all other variables and functions defined will be copied when using
2005 --env _.
2006
2007 # The function is only copied if using Bash
2008 my_func2() {
2009 echo in my_func2 $VAR $1
2010 }
2011 export -f my_func2
2012 VAR=foo
2013 export VAR
2014
2015 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
2016
2017 Output:
2018
2019 foo
2020 in my_func2 foo bar
2021
2022 If you use env_parallel the variables, functions, and aliases do not
2023 even need to be exported to be copied:
2024
2025 NOT='not exported var'
2026 alias myecho=echo
2027 not_ex() {
2028 myecho in not_exported_func $NOT $1
2029 }
2030 env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2031
2032 Output:
2033
2034 not exported var
2035 in not_exported_func not exported var bar
2036
2037 Showing what is actually run
2038 --verbose will show the command that would be run on the local machine.
2039
2040 When using --cat, --pipepart, or when a job is run on a remote machine,
2041 the command is wrapped with helper scripts. -vv shows all of this.
2042
2043 parallel -vv --pipepart --block 1M wc :::: num30000
2044
2045 Output:
2046
2047 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2048 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2049 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2050 0 0 0 168894 | (wc)
2051 30000 30000 168894
2052
2053 When the command gets more complex, the output is so hard to read
2054 that it is only useful for debugging:
2055
2056 my_func3() {
2057 echo in my_func $1 > $1.out
2058 }
2059 export -f my_func3
2060 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2061 -S $SERVER1 my_func3 {} ::: abc-file
2062
2063 Output will be similar to:
2064
2065 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2066 --protocol 30 -rlDzR -essh ./abc-file
2067 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2068 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2069 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2070 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2071 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2072 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2073 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2074 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2075 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2076 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2077 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2078 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2079 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2080 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2081 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2082 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2083 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2084 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2085 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2086 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2087 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2088 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2089 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2090 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2091 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2092 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2093 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2094 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2095 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2096 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2097 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2098 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2099
2101 GNU parset will set shell variables to the output of GNU parallel. GNU
2102 parset has one important limitation: It cannot be part of a pipe. In
2103 particular this means it cannot read anything from standard input
2104 (stdin) or pipe output to another program.
2105
2106 To use GNU parset, prepend the command with the destination variables:
2107
2108 parset myvar1,myvar2 echo ::: a b
2109 echo $myvar1
2110 echo $myvar2
2111
2112 Output:
2113
2114 a
2115 b
2116
2117 If you only give a single variable, it will be treated as an array:
2118
2119 parset myarray seq {} 5 ::: 1 2 3
2120 echo "${myarray[1]}"
2121
2122 Output:
2123
2124 2
2125 3
2126 4
2127 5
2128
2129 The commands to run can be an array:
2130
2131 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2132 parset data ::: "${cmd[@]}"
2133 echo "${data[0]}"
2134 echo "${data[1]}"
2135
2136 Output:
2137
2138 <<joe "double space" cartoon>>
2139 [current dir]
2140
2142 GNU parallel can save into an SQL base. Point GNU parallel to a table
2143 and it will put the joblog there together with the variables and the
2144 output each in their own column.
2145
2146 CSV as SQL base
2147 The simplest is to use a CSV file as the storage table:
2148
2149 parallel --sqlandworker csv:///%2Ftmp/log.csv \
2150 seq ::: 10 ::: 12 13 14
2151 cat /tmp/log.csv
2152
2153 Note how '/' in the path must be written as %2F.
2154
2155 Output will be similar to:
2156
2157 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2158 Command,V1,V2,Stdout,Stderr
2159 1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
2160 11
2161 12
2162 ",
2163 2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
2164 11
2165 12
2166 13
2167 ",
2168 3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
2169 11
2170 12
2171 13
2172 14
2173 ",
2174
2175 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2176 format correctly - even with fields containing newlines as above.
2177
2178 If the output is big you may want to put it into files using --results:
2179
2180 parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
2181 seq ::: 10 ::: 12 13 14
2182 cat /tmp/log2.csv
2183
2184 Output will be similar to:
2185
2186 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2187 Command,V1,V2,Stdout,Stderr
2188 1,:,1458824738.287,0.029,0,9,0,0,
2189 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2190 2,:,1458824738.298,0.025,0,12,0,0,
2191 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2192 3,:,1458824738.309,0.026,0,15,0,0,
2193 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2194
2195 DBURL as table
2196 The CSV file is an example of a DBURL.
2197
2198 GNU parallel uses a DBURL to address the table. A DBURL has this
2199 format:
2200
2201 vendor://[[user][:password]@][host][:port]/[database[/table]]
2202
2203 Example:
2204
2205 mysql://scott:tiger@my.example.com/mydatabase/mytable
2206 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2207 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2208 csv:///%2Ftmp/log.csv
2209
2210 To refer to /tmp/mydatabase with sqlite or csv you need to encode the /
2211 as %2F.
2212
2213 Run a job using sqlite on mytable in /tmp/mydatabase:
2214
2215 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2216 DBURLTABLE=$DBURL/mytable
2217 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2218
2219 To see the result:
2220
2221 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2222
2223 Output will be similar to:
2224
2225 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2226 Command|V1|V2|Stdout|Stderr
2227 1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
2228 |
2229 2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
2230 |
2231 3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
2232 |
2233 4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
2234 |
2235
2236 The first columns are well known from --joblog. V1 and V2 are data from
2237 the input sources. Stdout and Stderr are standard output and standard
2238 error, respectively.
2239
2240 Using multiple workers
2241 Using an SQL base as storage costs overhead in the order of 1 second
2242 per job.
2243
2244 One of the situations where it makes sense is if you have multiple
2245 workers.
2246
2247 You can then have a single master machine that submits jobs to the SQL
2248 base (but does not do any of the work):
2249
2250 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2251
2252 On the worker machines you run exactly the same command except you
2253 replace --sqlmaster with --sqlworker.
2254
2255 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2256
2257 To run a master and a worker on the same machine use --sqlandworker as
2258 shown earlier.
2259
2261 The --pipe functionality puts GNU parallel in a different mode: Instead
2262 of treating the data on stdin (standard input) as arguments for a
2263 command to run, the data will be sent to stdin (standard input) of the
2264 command.
2265
2266 The typical situation is:
2267
2268 command_A | command_B | command_C
2269
2270 where command_B is slow, and you want to speed up command_B.
2271
2272 Chunk size
2273 By default GNU parallel will start an instance of command_B, read a
2274 chunk of 1 MB, and pass that to the instance. Then start another
2275 instance, read another chunk, and pass that to the second instance.
2276
2277 cat num1000000 | parallel --pipe wc
2278
2279 Output (the order may be different):
2280
2281 165668 165668 1048571
2282 149797 149797 1048579
2283 149796 149796 1048572
2284 149797 149797 1048579
2285 149797 149797 1048579
2286 149796 149796 1048572
2287 85349 85349 597444
2288
2289 The size of the chunk is not exactly 1 MB because GNU parallel only
2290 passes full lines - never half a line - so the block size is only
2291 1 MB on average. You can change the block size to 2 MB with --block:
2292
2293 cat num1000000 | parallel --pipe --block 2M wc
2294
2295 Output (the order may be different):
2296
2297 315465 315465 2097150
2298 299593 299593 2097151
2299 299593 299593 2097151
2300 85349 85349 597444
2301
2302 GNU parallel treats each line as a record. If the order of records is
2303 unimportant (e.g. you need all lines processed, but you do not care
2304 which is processed first), then you can use --roundrobin. Without
2305 --roundrobin GNU parallel will start a command per block; with
2306 --roundrobin only the requested number of jobs will be started
2307 (--jobs). The records will then be distributed between the running
2308 jobs:
2309
2310 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2311
2312 Output will be similar to:
2313
2314 149797 149797 1048579
2315 299593 299593 2097151
2316 315465 315465 2097150
2317 235145 235145 1646016
2318
2319 One of the 4 instances got a single record, 2 instances got 2 full
2320 records each, and one instance got 1 full and 1 partial record.
2321
2322 Records
2323 GNU parallel sees the input as records. The default record is a single
2324 line.
2325
2326 Using -N140000 GNU parallel will read 140000 records at a time:
2327
2328 cat num1000000 | parallel --pipe -N140000 wc
2329
2330 Output (the order may be different):
2331
2332 140000 140000 868895
2333 140000 140000 980000
2334 140000 140000 980000
2335 140000 140000 980000
2336 140000 140000 980000
2337 140000 140000 980000
2338 140000 140000 980000
2339 20000 20000 140001
2340
2341 Note that the last job could not get the full 140000 lines, but
2342 only 20000 lines.
2343
2344 If a record is 75 lines -L can be used:
2345
2346 cat num1000000 | parallel --pipe -L75 wc
2347
2348 Output (the order may be different):
2349
2350 165600 165600 1048095
2351 149850 149850 1048950
2352 149775 149775 1048425
2353 149775 149775 1048425
2354 149850 149850 1048950
2355 149775 149775 1048425
2356 85350 85350 597450
2357 25 25 176
2358
2359 Note how GNU parallel still reads a block of around 1 MB, but instead
2360 of passing single lines to wc it passes 75 full lines at a time. This
2361 of course does not hold for the last job (which in this case got 25
2362 lines).
2363
2364 Fixed length records
2365 Fixed length records can be processed by setting --recend '' and
2366 --block recordsize. A header of size n can be processed with --header
2367 .{n}.
2368
2369 Here is how to process a file with a 4-byte header and a 3-byte record
2370 size:
2371
2372 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2373 'echo start; cat; echo'
2374
2375 Output:
2376
2377 start
2378 HHHHAAA
2379 start
2380 HHHHCCC
2381 start
2382 HHHHBBB
2383
2384 It may be more efficient to increase --block to a multiple of the
2385 record size.
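
As a sketch of that (recreating the fixedlen layout used above, and assuming GNU parallel is installed), doubling --block to 6, a multiple of the 3-byte record size, should hand each job two records per invocation rather than one:

```shell
# Hypothetical recreation of the fixedlen file from above:
# a 4-byte header 'HHHH' followed by three 3-byte records.
printf 'HHHHAAABBBCCC' > fixedlen

# --block 6 is a multiple of the 3-byte record size, so each job
# now receives two records at a time instead of one:
cat fixedlen | parallel --pipe --header '.{4}' --block 6 --recend '' \
    'echo start; cat; echo'
```

With this input, one job should see HHHHAAABBB and another HHHHCCC, though the order may vary.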
2386
2387 Record separators
2388 GNU parallel uses separators to determine where two records split.
2389
2390 --recstart gives the string that starts a record; --recend gives the
2391 string that ends a record. The default is --recend '\n' (newline).
2392
2393 If both --recend and --recstart are given, a record will only be
2394 split where the recend string is immediately followed by the
2395 recstart string.
2396
2397 Here the --recend is set to ', ':
2398
2399 echo /foo, bar/, /baz, qux/, | \
2400 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2401
2402 Output:
2403
2404 JOB1
2405 /foo, END
2406 JOB2
2407 bar/, END
2408 JOB3
2409 /baz, END
2410 JOB4
2411 qux/,
2412 END
2413
2414 Here the --recstart is set to /:
2415
2416 echo /foo, bar/, /baz, qux/, | \
2417 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2418
2419 Output:
2420
2421 JOB1
2422 /foo, barEND
2423 JOB2
2424 /, END
2425 JOB3
2426 /baz, quxEND
2427 JOB4
2428 /,
2429 END
2430
2431 Here both --recend and --recstart are set:
2432
2433 echo /foo, bar/, /baz, qux/, | \
2434 parallel -kN1 --recend ', ' --recstart / --pipe \
2435 echo JOB{#}\;cat\;echo END
2436
2437 Output:
2438
2439 JOB1
2440 /foo, bar/, END
2441 JOB2
2442 /baz, qux/,
2443 END
2444
2445 Note the difference between setting one string and setting both
2446 strings.
2447
2448 With --regexp the --recend and --recstart will be treated as a regular
2449 expression:
2450
2451 echo foo,bar,_baz,__qux, | \
2452 parallel -kN1 --regexp --recend ,_+ --pipe \
2453 echo JOB{#}\;cat\;echo END
2454
2455 Output:
2456
2457 JOB1
2458 foo,bar,_END
2459 JOB2
2460 baz,__END
2461 JOB3
2462 qux,
2463 END
2464
2465 GNU parallel can remove the record separators with
2466 --remove-rec-sep/--rrs:
2467
2468 echo foo,bar,_baz,__qux, | \
2469 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2470 echo JOB{#}\;cat\;echo END
2471
2472 Output:
2473
2474 JOB1
2475 foo,barEND
2476 JOB2
2477 bazEND
2478 JOB3
2479 qux,
2480 END
2481
2482 Header
2483 If the input data has a header, the header can be repeated for each job
2484 by matching the header with --header. If headers start with % you can
2485 do this:
2486
2487 cat num_%header | \
2488 parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat
2489
2490 Output (the order may be different):
2491
2492 JOB1
2493 %head1
2494 %head2
2495 1
2496 2
2497 3
2498 JOB2
2499 %head1
2500 %head2
2501 4
2502 5
2503 6
2504 JOB3
2505 %head1
2506 %head2
2507 7
2508 8
2509 9
2510 JOB4
2511 %head1
2512 %head2
2513 10
2514
2515 If the header is 2 lines, --header 2 will work:
2516
2517 cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat
2518
2519 Output: Same as above.
2520
2521 --pipepart
2522 --pipe is not very efficient. It maxes out at around 500 MB/s.
2523 --pipepart can easily deliver 5 GB/s. But there are a few limitations.
2524 The input has to be a normal file (not a pipe) given by -a or :::: and
2525 -L/-l/-N do not work. --recend and --recstart, however, do work, and
2526 records can often be split on that alone.
2527
2528 parallel --pipepart -a num1000000 --block 3m wc
2529
2530 Output (the order may be different):
2531
2532 444443 444444 3000002
2533 428572 428572 3000004
2534 126985 126984 888890
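
Since -N and -L are unavailable with --pipepart, fixed-length records have to be cut with --recend '' and --block, as in the fixed-length section above. A minimal sketch (the recs file and its 3-byte records are hypothetical):

```shell
# Hypothetical file of four 3-byte records, no header:
printf 'AAABBBCCCDDD' > recs

# --pipepart requires a real file given with -a; --recend '' plus a
# --block equal to the record size splits on size alone:
parallel -k --pipepart -a recs --block 3 --recend '' 'cat; echo'
```

With -k the four records should come out in file order, one per job.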
2535
2536Shebang
2537 Input data and parallel command in the same file
2538 GNU parallel is often called as this:
2539
2540 cat input_file | parallel command
2541
2542 With --shebang the input_file and parallel can be combined into the
2543 same script.
2544
2545 UNIX shell scripts start with a shebang line like this:
2546
2547 #!/bin/bash
2548
2549 GNU parallel can do that, too. With --shebang the arguments can be
2550 listed in the file. The parallel command is the first line of the
2551 script:
2552
2553 #!/usr/bin/parallel --shebang -r echo
2554
2555 foo
2556 bar
2557 baz
2558
2559 Output (the order may be different):
2560
2561 foo
2562 bar
2563 baz
2564
2565 Parallelizing existing scripts
2566 GNU parallel is often called as this:
2567
2568 cat input_file | parallel command
2569 parallel command ::: foo bar
2570
2571 If command is a script, parallel can be combined into a single file so
2572 this will run the script in parallel:
2573
2574 cat input_file | command
2575 command foo bar
2576
2577 This perl script perl_echo works like echo:
2578
2579 #!/usr/bin/perl
2580
2581 print "@ARGV\n"
2582
2583 It can be called as this:
2584
2585 parallel perl_echo ::: foo bar
2586
2587 By changing the #!-line it can be run in parallel:
2588
2589 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2590
2591 print "@ARGV\n"
2592
2593 Thus this will work:
2594
2595 perl_echo foo bar
2596
2597 Output (the order may be different):
2598
2599 foo
2600 bar
2601
2602 This technique can be used for:
2603
2604 Perl:
2605 #!/usr/bin/parallel --shebang-wrap /usr/bin/perl
2606
2607 print "Arguments @ARGV\n";
2608
2609 Python:
2610 #!/usr/bin/parallel --shebang-wrap /usr/bin/python
2611
2612 import sys
2613 print('Arguments', str(sys.argv))
2614
2615 Bash/sh/zsh/Korn shell:
2616 #!/usr/bin/parallel --shebang-wrap /bin/bash
2617
2618 echo Arguments "$@"
2619
2620 csh:
2621 #!/usr/bin/parallel --shebang-wrap /bin/csh
2622
2623 echo Arguments "$argv"
2624
2625 Tcl:
2626 #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh
2627
2628 puts "Arguments $argv"
2629
2630 R:
2631 #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
2632
2633 args <- commandArgs(trailingOnly = TRUE)
2634 print(paste("Arguments ",args))
2635
2636 GNUplot:
2637 #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot
2638
2639 print "Arguments ", system('echo $ARG')
2640
2641 Ruby:
2642 #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby
2643
2644 print "Arguments "
2645 puts ARGV
2646
2647 Octave:
2648 #!/usr/bin/parallel --shebang-wrap /usr/bin/octave
2649
2650 printf ("Arguments");
2651 arg_list = argv ();
2652 for i = 1:nargin
2653 printf (" %s", arg_list{i});
2654 endfor
2655 printf ("\n");
2656
2657 Common LISP:
2658 #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp
2659
2660 (format t "~&~S~&" 'Arguments)
2661 (format t "~&~S~&" *args*)
2662
2663 PHP:
2664 #!/usr/bin/parallel --shebang-wrap /usr/bin/php
2665 <?php
2666 echo "Arguments";
2667 foreach(array_slice($argv,1) as $v)
2668 {
2669 echo " $v";
2670 }
2671 echo "\n";
2672 ?>
2673
2674 Node.js:
2675 #!/usr/bin/parallel --shebang-wrap /usr/bin/node
2676
2677 var myArgs = process.argv.slice(2);
2678 console.log('Arguments ', myArgs);
2679
2680 LUA:
2681 #!/usr/bin/parallel --shebang-wrap /usr/bin/lua
2682
2683 io.write "Arguments"
2684 for a = 1, #arg do
2685 io.write(" ")
2686 io.write(arg[a])
2687 end
2688 print("")
2689
2690 C#:
2691 #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp
2692
2693 var argv = Environment.GetEnvironmentVariable("ARGV");
2694 print("Arguments "+argv);
2695
2696Semaphore
2697 GNU parallel can work as a counting semaphore. This is slower and less
2698 efficient than its normal mode.
2699
2700 A counting semaphore is like a row of toilets. People needing a toilet
2701 can use any toilet, but if there are more people than toilets, they
2702 will have to wait for one of the toilets to become available.
2703
2704 An alias for parallel --semaphore is sem.
2705
2706 sem will follow a person to the toilets, wait until a toilet is
2707 available, leave the person in the toilet and exit.
2708
2709 sem --fg will follow a person to the toilets, wait until a toilet is
2710 available, stay with the person in the toilet and exit when the person
2711 exits.
2712
2713 sem --wait will wait for all persons to leave the toilets.
2714
2715 sem does not have a queue discipline, so the next person is chosen
2716 randomly.
2717
2718 -j sets the number of toilets.
2719
2720 Mutex
2721 The default is to have only one toilet (this is called a mutex). The
2722 program is started in the background and sem exits immediately. Use
2723 --wait to wait for all sems to finish:
2724
2725 sem 'sleep 1; echo The first finished' &&
2726 echo The first is now running in the background &&
2727 sem 'sleep 1; echo The second finished' &&
2728 echo The second is now running in the background
2729 sem --wait
2730
2731 Output:
2732
2733 The first is now running in the background
2734 The first finished
2735 The second is now running in the background
2736 The second finished
2737
2738 The command can be run in the foreground with --fg, which will only
2739 exit when the command completes:
2740
2741 sem --fg 'sleep 1; echo The first finished' &&
2742 echo The first finished running in the foreground &&
2743 sem --fg 'sleep 1; echo The second finished' &&
2744 echo The second finished running in the foreground
2745 sem --wait
2746
2747 The difference between this and just running the command is that a
2748 mutex is set, so if other sems were running in the background only one
2749 would run at a time.
2750
2751 To control which semaphore is used, use --semaphorename/--id. Run this
2752 in one terminal:
2753
2754 sem --id my_id -u 'echo First started; sleep 10; echo First done'
2755
2756 and simultaneously this in another terminal:
2757
2758 sem --id my_id -u 'echo Second started; sleep 10; echo Second done'
2759
2760 Note how the second will only be started when the first has finished.
2761
2762 Counting semaphore
2763 A mutex is like having a single toilet: When it is in use everyone else
2764 will have to wait. A counting semaphore is like having multiple
2765 toilets: Several people can use the toilets, but when they all are in
2766 use, everyone else will have to wait.
2767
2768 sem can emulate a counting semaphore. Use --jobs to set the number of
2769 toilets like this:
2770
2771 sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
2772 sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
2773 sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
2774 sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
2775 sem --wait --id my_id
2776
2777 Output:
2778
2779 Start 1
2780 Start 2
2781 Start 3
2782 1 done
2783 Start 4
2784 2 done
2785 3 done
2786 4 done
2787
2788 Timeout
2789 With --semaphoretimeout you can force running the command anyway after
2790 a period (positive number) or give up (negative number):
2791
2792 sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
2793 sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
2794 sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
2795 sem --id foo --wait
2796
2797 Output:
2798
2799 Slow started
2800 parallel: Warning: Semaphore timed out. Stealing the semaphore.
2801 Forced running after 1 sec
2802 parallel: Warning: Semaphore timed out. Exiting.
2803 Slow ended
2804
2805 Note how the 'Give up' was not run.
2806
2807Informational
2808 GNU parallel has some options to give short information about the
2809 configuration.
2810
2811 --help will print a summary of the most important options:
2812
2813 parallel --help
2814
2815 Output:
2816
2817 Usage:
2818
2819 parallel [options] [command [arguments]] < list_of_arguments
2820 parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
2821 cat ... | parallel --pipe [options] [command [arguments]]
2822
2823 -j n Run n jobs in parallel
2824 -k Keep same order
2825 -X Multiple arguments with context replace
2826 --colsep regexp Split input on regexp for positional replacements
2827 {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
2828 {3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
2829 With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
2830 {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
2831
2832 -S sshlogin Example: foo@server.example.com
2833 --slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
2834 --trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
2835 --onall Run the given command with argument on all sshlogins
2836 --nonall Run the given command with no arguments on all sshlogins
2837
2838 --pipe Split stdin (standard input) to multiple jobs.
2839 --recend str Record end separator for --pipe.
2840 --recstart str Record start separator for --pipe.
2841
2842 See 'man parallel' for details
2843
2844 Academic tradition requires you to cite works you base your article on.
2845 When using programs that use GNU Parallel to process data for publication
2846 please cite:
2847
2848 O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
2849 ;login: The USENIX Magazine, February 2011:42-47.
2850
2851 This helps funding further development; AND IT WON'T COST YOU A CENT.
2852 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2853
2854 When asking for help, always report the full output of this:
2855
2856 parallel --version
2857
2858 Output:
2859
2860 GNU parallel 20200122
2861 Copyright (C) 2007-2020 Ole Tange, http://ole.tange.dk and Free Software
2862 Foundation, Inc.
2863 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2864 This is free software: you are free to change and redistribute it.
2865 GNU parallel comes with no warranty.
2866
2867 Web site: http://www.gnu.org/software/parallel
2868
2869 When using programs that use GNU Parallel to process data for publication
2870 please cite as described in 'parallel --citation'.
2871
2872 In scripts --minversion can be used to ensure the user has at least
2873 this version:
2874
2875 parallel --minversion 20130722 && \
2876 echo Your version is at least 20130722.
2877
2878 Output:
2879
2880 20160322
2881 Your version is at least 20130722.
2882
2883 If you are using GNU parallel for research the BibTeX citation can be
2884 generated using --citation:
2885
2886 parallel --citation
2887
2888 Output:
2889
2890 Academic tradition requires you to cite works you base your article on.
2891 When using programs that use GNU Parallel to process data for publication
2892 please cite:
2893
2894 @article{Tange2011a,
2895 title = {GNU Parallel - The Command-Line Power Tool},
2896 author = {O. Tange},
2897 address = {Frederiksberg, Denmark},
2898 journal = {;login: The USENIX Magazine},
2899 month = {Feb},
2900 number = {1},
2901 volume = {36},
2902 url = {http://www.gnu.org/s/parallel},
2903 year = {2011},
2904 pages = {42-47},
2905 doi = {10.5281/zenodo.16303}
2906 }
2907
2908 (Feel free to use \nocite{Tange2011a})
2909
2910 This helps funding further development; AND IT WON'T COST YOU A CENT.
2911 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2912
2913 If you send a copy of your published article to tange@gnu.org, it will be
2914 mentioned in the release notes of next version of GNU Parallel.
2915
2916 With --max-line-length-allowed GNU parallel will report the maximal
2917 size of the command line:
2918
2919 parallel --max-line-length-allowed
2920
2921 Output (may vary on different systems):
2922
2923 131071
2924
2925 --number-of-cpus and --number-of-cores run system specific code to
2926 determine the number of CPUs and CPU cores on the system. On
2927 unsupported platforms they will return 1:
2928
2929 parallel --number-of-cpus
2930 parallel --number-of-cores
2931
2932 Output (may vary on different systems):
2933
2934 4
2935 64
2936
2937Profiles
2938 The defaults for GNU parallel can be changed systemwide by putting the
2939 command line options in /etc/parallel/config. They can be changed for a
2940 user by putting them in ~/.parallel/config.
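
As a sketch (the --jobs value chosen here is arbitrary; any option from man parallel can go in the file, one or more per line), making 2 simultaneous jobs the per-user default is a one-line append:

```shell
# Make -j2 the default for this user:
mkdir -p ~/.parallel
echo '--jobs 2' >> ~/.parallel/config

# Every later invocation now behaves as if -j2 had been given:
parallel echo ::: A B C
```

Removing the line from ~/.parallel/config restores the normal default.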
2941
2942 Profiles work the same way, but have to be referred to with --profile:
2943
2944 echo '--nice 17' > ~/.parallel/nicetimeout
2945 echo '--timeout 300%' >> ~/.parallel/nicetimeout
2946 parallel --profile nicetimeout echo ::: A B C
2947
2948 Output:
2949
2950 A
2951 B
2952 C
2953
2954 Profiles can be combined:
2955
2956 echo '-vv --dry-run' > ~/.parallel/dryverbose
2957 parallel --profile dryverbose --profile nicetimeout echo ::: A B C
2958
2959 Output:
2960
2961 echo A
2962 echo B
2963 echo C
2964
2965Spread the word
2966 I hope you have learned something from this tutorial.
2967
2968 If you like GNU parallel:
2969
2970 · (Re-)walk through the tutorial if you have not done so in the past
2971 year (http://www.gnu.org/software/parallel/parallel_tutorial.html)
2972
2973 · Give a demo at your local user group/your team/your colleagues
2974
2975 · Post the intro videos and the tutorial on Reddit, Mastodon,
2976 Diaspora*, forums, blogs, Identi.ca, Google+, Twitter, Facebook,
2977 LinkedIn, and mailing lists
2978
2979 · Request or write a review for your favourite blog or magazine
2980 (especially if you do something cool with GNU parallel)
2981
2982 · Invite me for your next conference
2983
2984 If you use GNU parallel for research:
2985
2986 · Please cite GNU parallel in your publications (use --citation)
2987
2988 If GNU parallel saves you money:
2989
2990 · (Have your company) donate to FSF or become a member
2991 https://my.fsf.org/donate/
2992
2993 (C) 2013-2020 Ole Tange, FDLv1.3 (See fdl.txt)
2994
2995
2996
299720200322 2020-04-22 PARALLEL_TUTORIAL(7)