parallel_book(7)

PARALLEL_BOOK(7)                   parallel                   PARALLEL_BOOK(7)

Why should you read this book?

       If you write shell scripts to do the same processing for different
       input, then GNU parallel will make your life easier and make your
       scripts run faster.

       The book is written so you get the juicy parts first: The goal is that
       you read just enough to get you going. GNU parallel has an overwhelming
       number of special features to help in different situations, and to
       avoid overloading you with information, the most used features are
       presented first.

       All the examples are tested in Bash, and most will work in other
       shells, too, but there are a few exceptions. So we recommend using
       Bash while testing out the examples.

Learn GNU Parallel in 5 minutes

       You just need to run commands in parallel. You do not care about fine
       tuning.

       To get going please run this to make some example files:

         # If your system does not have 'seq', replace 'seq' with 'jot'
         seq 5 | parallel seq {} '>' example.{}

   Input sources
       GNU parallel reads values from input sources. One input source is the
       command line. The values are put after ::: :

         parallel echo ::: 1 2 3 4 5

       This makes it easy to run the same program on some files:

         parallel wc ::: example.*

       If you give multiple :::s, GNU parallel will generate all combinations:

         parallel wc ::: -l -c ::: example.*

       GNU parallel can also read the values from stdin (standard input):

         seq 5 | parallel echo

   Building the command line
       The command line is put before the :::. It can contain a command and
       options for the command:

         parallel wc -l ::: example.*

       The command can contain multiple programs. Just remember to quote
       characters that are interpreted by the shell (such as ;):

         parallel echo counting lines';' wc -l ::: example.*

       The value will normally be appended to the command, but can be placed
       anywhere by using the replacement string {}:

         parallel echo counting {}';' wc -l {} ::: example.*

       When using multiple input sources you use the positional replacement
       strings {1} and {2}:

         parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

       You can check what will be run with --dry-run:

         parallel --dry-run echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

       This is a good idea to do for every command until you are comfortable
       with GNU parallel.

   Controlling the output
       The output will be printed as soon as the command completes. This means
       the output may come in a different order than the input:

         parallel sleep {}';' echo {} done ::: 5 4 3 2 1

       You can force GNU parallel to print in the order of the values with
       --keep-order/-k. This will still run the commands in parallel. The
       output of the later jobs will be delayed until the earlier jobs are
       printed:

         parallel -k sleep {}';' echo {} done ::: 5 4 3 2 1

   Controlling the execution
       If your jobs are compute intensive, you will most likely run one job
       for each core in the system. This is the default for GNU parallel.

       But sometimes you want more jobs running. You control the number of job
       slots with -j. Give -j the number of jobs to run in parallel:

         parallel -j50 \
           wget http://ftpmirror.gnu.org/parallel/parallel-{1}{2}22.tar.bz2 \
           ::: 2012 2013 2014 2015 2016 \
           ::: 01 02 03 04 05 06 07 08 09 10 11 12

   Pipe mode
       GNU parallel can also pass blocks of data to commands on stdin
       (standard input):

         seq 1000000 | parallel --pipe wc

       This can be used to process big text files. By default GNU parallel
       splits on \n (newline) and passes a block of around 1 MB to each job.

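       Because blocks are split on newlines, every input line should end up
       in exactly one job. A minimal sketch to check this adds up the line
       counts reported by the individual jobs. The awk summation and the
       variable name total are only there for illustration and are not part
       of GNU parallel:

```shell
# Each job counts the lines in its own block; awk adds the per-block counts.
total=$(seq 1000000 | parallel --pipe wc -l | awk '{ sum += $1 } END { print sum }')
echo "$total"
```

       If no line is split or lost, the total equals the number of input
       lines.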
   That's it
       You have now learned the basic use of GNU parallel. This will probably
       cover most of your use cases.

       The rest of this document goes into more detail on each of the
       sections and covers special use cases.

Learn GNU Parallel in an hour

       In this part we will dive deeper into what you learned in the first 5
       minutes.

       To get going please run this to make some example files:

         seq 6 > seq6
         seq 6 -1 1 > seq-6

   Input sources
       In addition to the command line, input sources can be stdin (standard
       input or '-'), files, and fifos, and they can be mixed. Files are given
       after -a or ::::. So these all do the same:

         parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
         parallel echo Dice1={1} Dice2={2} :::: <(seq 6) :::: <(seq 6 -1 1)
         parallel echo Dice1={1} Dice2={2} :::: seq6 seq-6
         parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq-6
         parallel -a seq6 -a seq-6 echo Dice1={1} Dice2={2}
         parallel -a seq6 echo Dice1={1} Dice2={2} :::: seq-6
         parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq-6
         cat seq-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 -

       If stdin (standard input) is the only input source, you do not need the
       '-':

         cat seq6 | parallel echo Dice1={1}

       Linking input sources

       You can link multiple input sources with :::+ and ::::+:

         parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
         parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6

       The :::+ (and ::::+) will link each value to the corresponding value in
       the previous input source, so value number 3 from the first input
       source will be linked to value number 3 from the second input source.

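       The difference between ::: and :::+ shows up in the number of jobs.
       The following sketch uses made-up values (a b c and 1 2 3) and the
       illustrative variable names combined and linked: three values combined
       with three values give nine jobs, while linking them gives three.

```shell
# All combinations: 3 x 3 values give 9 jobs, one output line each.
combined=$(parallel echo {1}{2} ::: a b c ::: 1 2 3 | wc -l)
echo "$combined"

# Linked: value N is paired with value N, so only 3 jobs run.
# -k keeps the output in the order of the input values.
linked=$(parallel -k echo {1}{2} ::: a b c :::+ 1 2 3)
echo "$linked"
```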
       You can combine :::+ and :::, so you link 2 input sources, but generate
       all combinations with other input sources:

         parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \
           ::: VI V IV III II I ::::+ seq-6

   Building the command line
       The command

       The command can be a script, a binary or a Bash function if the
       function is exported using export -f:

         # Works only in Bash
         my_func() {
           echo in my_func "$1"
         }
         export -f my_func
         parallel my_func ::: 1 2 3

       If the command is complex, it often improves readability to make it
       into a function.

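       As an illustration, the quoted two-step command from the 5 minute
       section (echo counting {}';' wc -l {}) could be rewritten as a
       function. This is only a sketch: the function name count_lines and
       the variable result are made up for this example, and it works only
       in Bash.

```shell
# Make a few example files, as in the 5 minute section.
seq 3 | parallel seq {} '>' example.{}

# The two-step command written as a Bash function; no ';' quoting needed.
count_lines() {
  echo "counting $1"
  wc -l < "$1"
}
export -f count_lines

# -k keeps the output in the order of the input values.
result=$(parallel -k count_lines ::: example.1 example.2 example.3)
echo "$result"
```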
       The replacement strings

       GNU parallel has some replacement strings to make it easier to refer to
       the input read from the input sources.

       If the input is mydir/mysubdir/myfile.myext then:

         {} = mydir/mysubdir/myfile.myext
         {.} = mydir/mysubdir/myfile
         {/} = myfile.myext
         {//} = mydir/mysubdir
         {/.} = myfile
         {#} = the sequence number of the job
         {%} = the job slot number

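       You can try the path-related replacement strings on the value from
       the table above. The second command is a sketch of a typical use:
       'convert' and photos/cat.jpg are placeholders chosen for this example,
       and --dry-run only prints the command, so neither has to exist.

```shell
# Print every path-related replacement string for one input value.
parts=$(parallel echo {} {.} {/} {//} {/.} ::: mydir/mysubdir/myfile.myext)
echo "$parts"

# Build an output name by swapping the extension; nothing is executed.
planned=$(parallel --dry-run convert {} {.}.png ::: photos/cat.jpg)
echo "$planned"
```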
       When a job is started it gets a sequence number that starts at 1 and
       increases by 1 for each new job. The job also gets assigned a slot
       number. This number is from 1 to the number of jobs running in
       parallel. It is unique between the running jobs, but is re-used as soon
       as a job finishes.

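       A small sketch of the two numbers, using made-up input values and the
       illustrative variable name slots. With -j2 there are only two job
       slots, so {%} is always 1 or 2, while {#} keeps counting. Which slot
       a given job gets can differ from run to run.

```shell
# Run at most 2 jobs at a time; -k keeps the output in input order.
slots=$(parallel -j2 -k echo job {#} ran in slot {%} ::: a b c d)
echo "$slots"
```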
       The positional replacement strings

       The replacement strings have corresponding positional replacement
       strings. If the value from the 3rd input source is
       mydir/mysubdir/myfile.myext:

         {3} = mydir/mysubdir/myfile.myext
         {3.} = mydir/mysubdir/myfile
         {3/} = myfile.myext
         {3//} = mydir/mysubdir
         {3/.} = myfile

       So the number of the input source is simply prepended inside the {}'s.

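       A short sketch with two input sources. The values (jpg, png,
       photos/cat.bmp) and the variable name pos are invented for this
       example:

```shell
# {1} is the value from the first input source. {2/.} is the basename of
# the value from the second input source without its extension, and
# {2//} is its directory.
pos=$(parallel -k echo {1} {2/.} {2//} ::: jpg png ::: photos/cat.bmp)
echo "$pos"
```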

Replacement strings

       --plus replacement strings

       change the replacement string (-I --extensionreplace
       --basenamereplace --dirnamereplace --basenameextensionreplace
       --seqreplace --slotreplace)

       --header with named replacement string

       {= =}

       Dynamic replacement strings

   Defining replacement strings
   Copying environment
       env_parallel

   Controlling the output
       parset

       parset is a shell function to get the output from GNU parallel into
       shell variables.

       parset is fully supported for Bash/Zsh/Ksh and partially supported for
       ash/dash. I will assume you run Bash.

       To activate parset you have to run:

         . `which env_parallel.bash`

       (replace bash with your shell's name).

       Then you can run:

         parset a,b,c seq ::: 4 5 6
         echo "$c"

       or:

         parset 'a b c' seq ::: 4 5 6
         echo "$c"

       If you give a single variable, this will become an array:

         parset arr seq ::: 4 5 6
         echo "${arr[1]}"

       parset has one limitation: If it reads from a pipe, the output will be
       lost.

         echo This will not work | parset myarr echo
         echo Nothing: "${myarr[*]}"

       Instead you can do this:

         echo This will work > tempfile
         parset myarr echo < tempfile
         echo "${myarr[*]}"

       sql csv

   Controlling the execution
       --dryrun -v

   Remote execution
       For this section you must have ssh access with no password to 2
       servers: $server1 and $server2.

         server1=server.example.com
         server2=server2.example.net

       So you must be able to do this:

         ssh $server1 echo works
         ssh $server2 echo works

       It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id
       $server1' with an empty passphrase, and repeating ssh-copy-id for
       $server2. Or you can use ssh-agent.

       Workers

       --transferfile

       --transferfile filename will transfer filename to the worker. filename
       can contain a replacement string:

         parallel -S $server1,$server2 --transferfile {} wc ::: example.*
         parallel -S $server1,$server2 --transferfile {2} \
            echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

       A shorthand for --transferfile {} is --transfer.

       --return

       --cleanup

       A shorthand for --transfer --return {} --cleanup is --trc {}.

   Pipe mode
       --pipepart

   That's it

Advanced usage

       parset fifo, cmd substitution, array elements, array with var names and
       cmds, env_parset

       env_parallel

       Interfacing with R.

       Interfacing with JSON/jq

       # Fetch every attachment listed in a thread's JSON, given the thread URL
       4dl() {
         # Fields 4 and 6 of the '/'-separated URL are the board and the thread
         board="$(printf -- '%s' "${1}" | cut -d '/' -f4)"
         thread="$(printf -- '%s' "${1}" | cut -d '/' -f6)"
         wget -qO- "https://a.4cdn.org/${board}/thread/${thread}.json" |
           jq -r '
             .posts
             | map(select(.tim != null))
             | map((.tim | tostring) + .ext)
             | map("https://i.4cdn.org/'"${board}"'/"+.)[]
           ' |
             parallel --gnu -j 0 wget -nv
       }

       Interfacing with XML/?

       Interfacing with HTML/?

   Controlling the execution
       --termseq

   Remote execution
       seq 10 | parallel --sshlogin 'ssh -i "key.pem" a@b.com' echo

       seq 10 | PARALLEL_SSH='ssh -i "key.pem"' parallel --sshlogin a@b.com echo

       seq 10 | parallel --ssh 'ssh -i "key.pem"' --sshlogin a@b.com echo

       ssh-agent

       The sshlogin file format

       Check if servers are up



20180722                          2018-08-22                  PARALLEL_BOOK(7)