PARALLEL_BOOK(7) parallel PARALLEL_BOOK(7)

If you write shell scripts to do the same processing for different
input, then GNU parallel will make your life easier and make your
scripts run faster.

The book is written so you get the juicy parts first: The goal is that
you read just enough to get you going. GNU parallel has an overwhelming
number of special features to help in different situations, and to
avoid overloading you with information, the most used features are
presented first.

All the examples are tested in Bash, and most will work in other
shells, too, but there are a few exceptions. So it is recommended that
you use Bash while testing out the examples.

You just need to run commands in parallel. You do not care about fine
tuning.

To get going please run this to make some example files:

    # If your system does not have 'seq', replace 'seq' with 'jot'
    seq 5 | parallel seq {} '>' example.{}

Input sources
GNU parallel reads values from input sources. One input source is the
command line. The values are put after ::: :

    parallel echo ::: 1 2 3 4 5

This makes it easy to run the same program on some files:

    parallel wc ::: example.*

If you give multiple :::s, GNU parallel will generate all combinations:

    parallel wc ::: -l -c ::: example.*

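As a small illustration of the combinations, here is a sketch that
does not depend on the example files. The values a, b, 1 and 2 are made
up for this example, and -k (--keep-order) is only used so that the
lines come out in input order:

```shell
# Two input sources with two values each give 2 x 2 = 4 jobs.
# The last input source varies fastest.
combos=$(parallel -k echo {1}-{2} ::: a b ::: 1 2)
printf '%s\n' "$combos"
# a-1
# a-2
# b-1
# b-2
```
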
GNU parallel can also read the values from stdin (standard input):

    seq 5 | parallel echo

Building the command line
The command line is put before the :::. It can contain a
command and options for the command:

    parallel wc -l ::: example.*

The command can contain multiple programs. Just remember to quote
characters that are interpreted by the shell (such as ;):

    parallel echo counting lines';' wc -l ::: example.*

The value will normally be appended to the command, but can be placed
anywhere by using the replacement string {}:

    parallel echo counting {}';' wc -l {} ::: example.*

When using multiple input sources you use the positional replacement
strings {1} and {2}:

    parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

You can check what will be run with --dry-run:

    parallel --dry-run echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

It is a good idea to do this for every command until you are
comfortable with GNU parallel.

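To make the effect of --dry-run concrete, here is a minimal sketch. The
values a and b are placeholders, and -k (--keep-order) only keeps the
printed lines in input order:

```shell
# --dry-run prints each generated command line instead of running it.
planned=$(parallel -k --dry-run echo {} ::: a b)
printf '%s\n' "$planned"
# echo a
# echo b
```
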
Controlling the output
The output of a job is printed as soon as the job completes. This means
the output may come in a different order than the input:

    parallel sleep {}';' echo {} done ::: 5 4 3 2 1

You can force GNU parallel to print in the order of the values with
--keep-order/-k. This will still run the commands in parallel. The
output of the later jobs will be delayed until the earlier jobs are
printed:

    parallel -k sleep {}';' echo {} done ::: 5 4 3 2 1

Controlling the execution
If your jobs are compute intensive, you will most likely run one job
for each core in the system. This is the default for GNU parallel.

But sometimes you want more jobs running. You control the number of job
slots with -j. Give -j the number of jobs to run in parallel:

    parallel -j50 \
      wget http://ftpmirror.gnu.org/parallel/parallel-{1}{2}22.tar.bz2 \
      ::: 2012 2013 2014 2015 2016 \
      ::: 01 02 03 04 05 06 07 08 09 10 11 12

Pipe mode
GNU parallel can also pass blocks of data to commands on stdin
(standard input):

    seq 1000000 | parallel --pipe wc

This can be used to process big text files. By default GNU parallel
splits on \n (newline) and passes a block of around 1 MB to each job.

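Because blocks are split at newlines, no line is cut in half and the
per-block results can simply be added up. The sketch below assumes the
--block option (not covered in the text above) to ask for blocks of
about 2 MB instead of the default:

```shell
# Each job receives a block of whole lines on stdin and counts them.
# awk then adds the per-block counts together.
total=$(seq 1000000 | parallel --pipe --block 2M wc -l |
  awk '{ sum += $1 } END { print sum }')
echo "$total"
# 1000000
```
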
That's it
You have now learned the basic use of GNU parallel. This will probably
cover most of your use cases.

The rest of this document will go into more detail on each of the
sections and cover special use cases.

In this part we will dive deeper into what you learned in the first 5
minutes.

To get going please run this to make some example files:

    seq 6 > seq6
    seq 6 -1 1 > seq-6

Input sources
In addition to the command line, input sources can also be stdin
(standard input or '-'), files and fifos, and they can be mixed. Files
are given after -a or ::::. So these all do the same:

    parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
    parallel echo Dice1={1} Dice2={2} :::: <(seq 6) :::: <(seq 6 -1 1)
    parallel echo Dice1={1} Dice2={2} :::: seq6 seq-6
    parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq-6
    parallel -a seq6 -a seq-6 echo Dice1={1} Dice2={2}
    parallel -a seq6 echo Dice1={1} Dice2={2} :::: seq-6
    parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq-6
    cat seq-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 -

If stdin (standard input) is the only input source, you do not need the
'-':

    cat seq6 | parallel echo Dice1={1}

Linking input sources

You can link multiple input sources with :::+ and ::::+:

    parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
    parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6

The :::+ (and ::::+) will link each value to the corresponding value in
the previous input source, so value number 3 from the first input
source will be linked to value number 3 from the second input source.

You can combine :::+ and :::, so you link 2 input sources, but generate
all combinations with other input sources:

    parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \
      ::: VI V IV III II I ::::+ seq-6

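A smaller sketch of the same idea may make the difference between
linking and combining easier to see. The values a, b, 1, 2, x and y are
made up for this example, and -k (--keep-order) only keeps the lines in
input order:

```shell
# {1} and {2} are linked pairwise (a with 1, b with 2);
# each pair is then combined with every value of the third source.
linked=$(parallel -k echo {1}{2}-{3} ::: a b :::+ 1 2 ::: x y)
printf '%s\n' "$linked"
# a1-x
# a1-y
# b2-x
# b2-y
```
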
Building the command line
The command

The command can be a script, a binary or a Bash function if the
function is exported using export -f:

    # Works only in Bash
    my_func() {
      echo in my_func "$1"
    }
    export -f my_func
    parallel my_func ::: 1 2 3

If the command is complex, it often improves readability to make it
into a function.

The replacement strings

GNU parallel has some replacement strings to make it easier to refer to
the input read from the input sources.

If the input is mydir/mysubdir/myfile.myext then:

    {}   = mydir/mysubdir/myfile.myext
    {.}  = mydir/mysubdir/myfile
    {/}  = myfile.myext
    {//} = mydir/mysubdir
    {/.} = myfile
    {#}  = the sequence number of the job
    {%}  = the job slot number

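The path-based replacement strings can be tried out directly. This is a
minimal sketch using the example path from the list above:

```shell
# Print the five path-based replacement strings for one input value.
# The file does not need to exist: only the string is manipulated.
parts=$(parallel echo {} {.} {/} {//} {/.} ::: mydir/mysubdir/myfile.myext)
printf '%s\n' "$parts"
# mydir/mysubdir/myfile.myext mydir/mysubdir/myfile myfile.myext mydir/mysubdir myfile
```
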
When a job is started it gets a sequence number that starts at 1 and
increases by 1 for each new job. The job also gets assigned a slot
number. This number is from 1 to the number of jobs running in
parallel. It is unique among the running jobs, but is re-used as soon
as a job finishes.

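The difference between the two numbers can be seen by running more jobs
than there are job slots. This is a sketch with made-up values a to f;
-j2 limits GNU parallel to two job slots and -k (--keep-order) keeps
the first listing in input order:

```shell
# Three jobs on two job slots: {#} keeps counting, one per job.
numbered=$(parallel -k -j2 echo seq={#} value={} ::: a b c)
printf '%s\n' "$numbered"
# seq=1 value=a
# seq=2 value=b
# seq=3 value=c

# Slot numbers are re-used, so with -j2 only 1 and 2 can show up,
# no matter how many jobs are run.
slots=$(parallel -j2 echo {%} ::: a b c d e f | sort -u)
printf '%s\n' "$slots"
```
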
The positional replacement strings

The replacement strings have corresponding positional replacement
strings. If the value from the 3rd input source is
mydir/mysubdir/myfile.myext:

    {3}   = mydir/mysubdir/myfile.myext
    {3.}  = mydir/mysubdir/myfile
    {3/}  = myfile.myext
    {3//} = mydir/mysubdir
    {3/.} = myfile

So the number of the input source is simply prepended inside the braces.

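A short sketch with two input sources shows how the position and the
modifier combine. The paths dir/a.txt and other/sub/b.log are
hypothetical and do not need to exist:

```shell
# {1/.} is the basename of the first value without its extension;
# {2//} is the directory part of the second value.
picked=$(parallel echo {1/.} {2//} ::: dir/a.txt ::: other/sub/b.log)
printf '%s\n' "$picked"
# a other/sub
```
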
--plus replacement strings

change the replacement string (-I --extensionreplace --basenamereplace
--dirnamereplace --basenameextensionreplace --seqreplace --slotreplace)

--header with named replacement string

{= =}

Dynamic replacement strings

Defining replacement strings
Copying environment
env_parallel

Controlling the output
parset

parset is a shell function to get the output from GNU parallel into
shell variables.

parset is fully supported for Bash/Zsh/Ksh and partially supported for
ash/dash. I will assume you run Bash.

To activate parset you have to run:

    . `which env_parallel.bash`

(replace bash with your shell's name).

Then you can run:

    parset a,b,c seq ::: 4 5 6
    echo "$c"

or:

    parset 'a b c' seq ::: 4 5 6
    echo "$c"

If you give a single variable, this will become an array:

    parset arr seq ::: 4 5 6
    echo "${arr[1]}"

parset has one limitation: If it reads from a pipe, the output will be
lost. In Bash each part of a pipeline runs in a subshell, so the
variables set there are gone when the pipeline ends.

    echo This will not work | parset myarr echo
    echo Nothing: "${myarr[*]}"

Instead you can do this:

    echo This will work > tempfile
    parset myarr echo < tempfile
    echo "${myarr[*]}"

sql csv

Controlling the execution
--dryrun -v

Remote execution
For this section you must have ssh access with no password to 2
servers: $server1 and $server2.

    server1=server.example.com
    server2=server2.example.net

So you must be able to do this:

    ssh $server1 echo works
    ssh $server2 echo works

It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id $server1'
and using an empty passphrase. Or you can use ssh-agent.

Workers

--transferfile

--transferfile filename will transfer filename to the worker. filename
can contain a replacement string:

    parallel -S $server1,$server2 --transferfile {} wc ::: example.*
    parallel -S $server1,$server2 --transferfile {2} \
      echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

A shorthand for --transferfile {} is --transfer.

--return

--cleanup

A shorthand for --transfer --return {} --cleanup is --trc {}.

Pipe mode
--pipepart

That's it

parset fifo, cmd substitution, array elements, array with var names and
cmds, env_parset

env_parallel

Interfacing with R.

Interfacing with JSON/jq

    # Download every image of a thread. Expects the thread URL as $1,
    # with the board name in field 4 and the thread id in field 6
    # when the URL is split on '/'.
    4dl() {
      board="$(printf -- '%s' "${1}" | cut -d '/' -f4)"
      thread="$(printf -- '%s' "${1}" | cut -d '/' -f6)"
      wget -qO- "https://a.4cdn.org/${board}/thread/${thread}.json" |
        jq -r '
          .posts
          | map(select(.tim != null))
          | map((.tim | tostring) + .ext)
          | map("https://i.4cdn.org/'"${board}"'/"+.)[]
        ' |
        parallel --gnu -j 0 wget -nv
    }

Interfacing with XML/?

Interfacing with HTML/?

Controlling the execution
--termseq

Remote execution

    seq 10 | parallel --sshlogin 'ssh -i "key.pem" a@b.com' echo

    seq 10 | PARALLEL_SSH='ssh -i "key.pem"' parallel --sshlogin a@b.com echo

    seq 10 | parallel --ssh 'ssh -i "key.pem"' --sshlogin a@b.com echo

ssh-agent

The sshlogin file format

Check if servers are up



20180722 2018-08-22 PARALLEL_BOOK(7)