PARALLEL_EXAMPLES(7)               parallel               PARALLEL_EXAMPLES(7)

GNU PARALLEL EXAMPLES

   EXAMPLE: Working as xargs -n1. Argument appending
       GNU parallel can work similarly to xargs -n1.
8
9       To compress all html files using gzip run:
10
11         find . -name '*.html' | parallel gzip --best
12
13       If the file names may contain a newline use -0. Substitute FOO BAR with
14       FUBAR in all files in this dir and subdirs:
15
16         find . -type f -print0 | \
17           parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
18
19       Note -q is needed because of the space in 'FOO BAR'.
20
21   EXAMPLE: Simple network scanner
22       prips can generate IP-addresses from CIDR notation. With GNU parallel
23       you can build a simple network scanner to see which addresses respond
24       to ping:
25
26         prips 130.229.16.0/20 | \
27           parallel --timeout 2 -j0 \
28             'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
29
30   EXAMPLE: Reading arguments from command line
31       GNU parallel can take the arguments from command line instead of stdin
32       (standard input). To compress all html files in the current dir using
33       gzip run:
34
35         parallel gzip --best ::: *.html
36
37       To convert *.wav to *.mp3 using LAME running one process per CPU run:
38
39         parallel lame {} -o {.}.mp3 ::: *.wav
40
41   EXAMPLE: Inserting multiple arguments
42       When moving a lot of files like this: mv *.log destdir you will
43       sometimes get the error:
44
45         bash: /bin/mv: Argument list too long
46
47       because there are too many files. You can instead do:
48
49         ls | grep -E '\.log$' | parallel mv {} destdir
50
       This will run mv for each file. It can be done faster if mv gets as
       many arguments as will fit on the line:
53
54         ls | grep -E '\.log$' | parallel -m mv {} destdir
55
56       In many shells you can also use printf:
57
58         printf '%s\0' *.log | parallel -0 -m mv {} destdir
59
60   EXAMPLE: Context replace
61       To remove the files pict0000.jpg .. pict9999.jpg you could do:
62
63         seq -w 0 9999 | parallel rm pict{}.jpg
64
65       You could also do:
66
67         seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
68
       The first will run rm 10000 times, while the last will only run rm as
       many times as needed to keep the command line length short enough to
       avoid Argument list too long (it typically runs 1-2 times).
72
73       You could also run:
74
75         seq -w 0 9999 | parallel -X rm pict{}.jpg
76
77       This will also only run rm as many times needed to keep the command
78       line length short enough.
79
80   EXAMPLE: Compute intensive jobs and substitution
81       If ImageMagick is installed this will generate a thumbnail of a jpg
82       file:
83
84         convert -geometry 120 foo.jpg thumb_foo.jpg
85
86       This will run with number-of-cpus jobs in parallel for all jpg files in
87       a directory:
88
89         ls *.jpg | parallel convert -geometry 120 {} thumb_{}
90
91       To do it recursively use find:
92
93         find . -name '*.jpg' | \
94           parallel convert -geometry 120 {} {}_thumb.jpg
95
       Notice how the argument has to start with {} as {} will include the
       path (e.g. running convert -geometry 120 ./foo/bar.jpg
       thumb_./foo/bar.jpg would clearly be wrong). The command will
       generate files like ./foo/bar.jpg_thumb.jpg.
100
101       Use {.} to avoid the extra .jpg in the file name. This command will
102       make files like ./foo/bar_thumb.jpg:
103
104         find . -name '*.jpg' | \
105           parallel convert -geometry 120 {} {.}_thumb.jpg
106
107   EXAMPLE: Substitution and redirection
108       This will generate an uncompressed version of .gz-files next to the
109       .gz-file:
110
111         parallel zcat {} ">"{.} ::: *.gz
112
113       Quoting of > is necessary to postpone the redirection. Another solution
114       is to quote the whole command:
115
116         parallel "zcat {} >{.}" ::: *.gz
117
118       Other special shell characters (such as * ; $ > < |  >> <<) also need
119       to be put in quotes, as they may otherwise be interpreted by the shell
120       and not given to GNU parallel.
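
       For example, quoting the whole command also protects a pipe inside
       it. This is a sketch that counts the lines in each .gz file and
       writes the count next to it:

         # the | and > are protected by the single quotes
         parallel 'zcat {} | wc -l >{.}.count' ::: *.gz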
121
122   EXAMPLE: Composed commands
123       A job can consist of several commands. This will print the number of
124       files in each directory:
125
126         ls | parallel 'echo -n {}" "; ls {}|wc -l'
127
128       To put the output in a file called <name>.dir:
129
130         ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
131
132       Even small shell scripts can be run by GNU parallel:
133
134         find . | parallel 'a={}; name=${a##*/};' \
135           'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
136           'echo "$name - $upper"'
137
138         ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
139
140       Given a list of URLs, list all URLs that fail to download. Print the
141       line number and the URL.
142
143         cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
144
145       Create a mirror directory with the same file names except all files and
146       symlinks are empty files.
147
148         cp -rs /the/source/dir mirror_dir
149         find mirror_dir -type l | parallel -m rm {} '&&' touch {}
150
151       Find the files in a list that do not exist
152
153         cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
154
155   EXAMPLE: Composed command with perl replacement string
       You have a bunch of files. You want them sorted into dirs. The dir of
       each file should be named after the first letter of the file name.
158
159         parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
160
161   EXAMPLE: Composed command with multiple input sources
       You have a dir with files named as 24 hours in 5 minute intervals:
       00:00, 00:05, 00:10 .. 23:55. You want to find the missing files:
164
165         parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
166           ::: {00..23} ::: {00..55..5}
167
168   EXAMPLE: Calling Bash functions
169       If the composed command is longer than a line, it becomes hard to read.
170       In Bash you can use functions. Just remember to export -f the function.
171
172         doit() {
173           echo Doing it for $1
174           sleep 2
175           echo Done with $1
176         }
177         export -f doit
178         parallel doit ::: 1 2 3
179
180         doubleit() {
181           echo Doing it for $1 $2
182           sleep 2
183           echo Done with $1 $2
184         }
185         export -f doubleit
186         parallel doubleit ::: 1 2 3 ::: a b
187
188       To do this on remote servers you need to transfer the function using
189       --env:
190
191         parallel --env doit -S server doit ::: 1 2 3
192         parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
193
194       If your environment (aliases, variables, and functions) is small you
195       can copy the full environment without having to export -f anything. See
196       env_parallel.
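
       A minimal sketch of that, assuming bash and that env_parallel is
       installed and enabled for the shell:

         # enable env_parallel for this bash session (assumes it is installed)
         . $(which env_parallel.bash)
         doit() { echo Doing it for $1; }
         env_parallel doit ::: 1 2 3
         env_parallel -S server doit ::: 1 2 3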
197
198   EXAMPLE: Function tester
199       To test a program with different parameters:
200
201         tester() {
202           if (eval "$@") >&/dev/null; then
203             perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
204           else
205             perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
206           fi
207         }
208         export -f tester
209         parallel tester my_program ::: arg1 arg2
210         parallel tester exit ::: 1 0 2 0
211
212       If my_program fails a red FAIL will be printed followed by the failing
213       command; otherwise a green OK will be printed followed by the command.
214
   EXAMPLE: Continuously show the latest line of output
216       It can be useful to monitor the output of running jobs.
217
       This shows the most recent output line until a job finishes, after
       which the output of the job is printed in full:
220
221         parallel '{} | tee >(cat >&3)' ::: 'command 1' 'command 2' \
222           3> >(perl -ne '$|=1;chomp;printf"%.'$COLUMNS's\r",$_." "x100')
223
224   EXAMPLE: Log rotate
225       Log rotation renames a logfile to an extension with a higher number:
226       log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
227       removed. To avoid overwriting files the process starts backwards from
228       the high number to the low number.  This will keep 10 old versions of
229       the log:
230
231         seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
232         mv log log.1
233
234   EXAMPLE: Removing file extension when processing files
235       When processing files removing the file extension using {.} is often
236       useful.
237
238       Create a directory for each zip-file and unzip it in that dir:
239
240         parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
241
242       Recompress all .gz files in current directory using bzip2 running 1 job
243       per CPU in parallel:
244
245         parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
246
247       Convert all WAV files to MP3 using LAME:
248
249         find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
250
       Put all converted files in the same directory:
252
253         find sounddir -type f -name '*.wav' | \
254           parallel lame {} -o mydir/{/.}.mp3
255
256   EXAMPLE: Replacing parts of file names
       If you deal with paired-end reads, you will have files like
       barcode1_R1.fq.gz, barcode1_R2.fq.gz, barcode2_R1.fq.gz, and
       barcode2_R2.fq.gz.
260
261       You want barcodeN_R1 to be processed with barcodeN_R2.
262
263           parallel --plus myprocess {} {/_R1.fq.gz/_R2.fq.gz} ::: *_R1.fq.gz
264
265       If the barcode does not contain '_R1', you can do:
266
267           parallel --plus myprocess {} {/_R1/_R2} ::: *_R1.fq.gz
268
269   EXAMPLE: Removing strings from the argument
       If you have a directory with tar.gz files and want these extracted in
       the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
       foo) you can do:
273
274         parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
275
276       If you want to remove a different ending, you can use {%string}:
277
278         parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
279
       You can also remove a starting string with {#string}:
281
282         parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
283
284       To remove a string anywhere you can use regular expressions with
285       {/regexp/replacement} and leave the replacement empty:
286
287         parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
288
289   EXAMPLE: Download 24 images for each of the past 30 days
290       Let us assume a website stores images like:
291
292         https://www.example.com/path/to/YYYYMMDD_##.jpg
293
294       where YYYYMMDD is the date and ## is the number 01-24. This will
295       download images for the past 30 days:
296
297         getit() {
298           date=$(date -d "today -$1 days" +%Y%m%d)
299           num=$2
300           echo wget https://www.example.com/path/to/${date}_${num}.jpg
301         }
302         export -f getit
303
304         parallel getit ::: $(seq 30) ::: $(seq -w 24)
305
306       $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
307       with $1 days subtracted.
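
       Note that getit only echoes the wget command. To actually download,
       drop the echo (a sketch):

         getit() {
           # same as getit above, but without the leading echo
           date=$(date -d "today -$1 days" +%Y%m%d)
           num=$2
           wget https://www.example.com/path/to/${date}_${num}.jpg
         }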
308
309   EXAMPLE: Download world map from NASA
310       NASA provides tiles to download on earthdata.nasa.gov. Download tiles
311       for Blue Marble world map and create a 10240x20480 map.
312
313         base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
314         service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
315         layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
316         set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
317         tile="TILEROW={1}&TILECOL={2}"
318         format="FORMAT=image%2Fjpeg"
319         url="$base?$service&$layer&$set&$tile&$format"
320
321         parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
322         parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
323         convert -append line{0..19}.jpg world.jpg
324
325   EXAMPLE: Download Apollo-11 images from NASA using jq
326       Search NASA using their API to get JSON for images related to 'apollo
327       11' and has 'moon landing' in the description.
328
       The search query returns JSON containing URLs to JSON containing
       collections of pictures. One of the pictures in each of these
       collections is large.
332
333       wget is used to get the JSON for the search query. jq is then used to
334       extract the URLs of the collections. parallel then calls wget to get
335       each collection, which is passed to jq to extract the URLs of all
336       images. grep filters out the large images, and parallel finally uses
337       wget to fetch the images.
338
339         base="https://images-api.nasa.gov/search"
340         q="q=apollo 11"
341         description="description=moon landing"
342         media_type="media_type=image"
343         wget -O - "$base?$q&$description&$media_type" |
344           jq -r .collection.items[].href |
345           parallel wget -O - |
346           jq -r .[] |
347           grep large |
348           parallel wget
349
350   EXAMPLE: Download video playlist in parallel
       youtube-dl is an excellent tool to download videos. It cannot,
       however, download videos in parallel. This takes a playlist and
       downloads 10 videos in parallel.
354
355         url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
356         export url
357         youtube-dl --flat-playlist "https://$url" |
358           parallel --tagstring {#} --lb -j10 \
359             youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'
360
361   EXAMPLE: Prepend last modified date (ISO8601) to file name
362         parallel mv {} '{= $a=pQ($_); $b=$_;' \
363           '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *
364
365       {= and =} mark a perl expression. pQ perl-quotes the string. date
366       +%FT%T is the date in ISO8601 with time.
367
368   EXAMPLE: Save output in ISO8601 dirs
369       Save output from ps aux every second into dirs named
370       yyyy-mm-ddThh:mm:ss+zz:zz.
371
372         seq 1000 | parallel -N0 -j1 --delay 1 \
373           --results '{= $_=`date -Isec`; chomp=}/' ps aux
374
375   EXAMPLE: Digital clock with "blinking" :
376       The : in a digital clock blinks. To make every other line have a ':'
377       and the rest a ' ' a perl expression is used to look at the 3rd input
378       source. If the value modulo 2 is 1: Use ":" otherwise use " ":
379
380         parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
381           ::: {0..12} ::: {0..5} ::: {0..9}
382
383   EXAMPLE: Aggregating content of files
384       This:
385
386         parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
387         ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
388
389       will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
390       the output grouping on x and z you can do this:
391
392         parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
393
394       For all values of x and z it runs commands like:
395
396         cat x1y*z1 > x1z1
397
398       So you end up with x1z1 .. x5z5 each containing the content of all
399       values of y.
400
401   EXAMPLE: Breadth first parallel web crawler/mirrorer
       The script below will crawl and mirror a URL in parallel. It first
       downloads pages that are 1 click down, then 2 clicks down, then 3;
       instead of the normal depth first, where the first link on each page
       is fetched first.
406
407       Run like this:
408
409         PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/
410
411       Remove the wget part if you only want a web crawler.
412
       It works by fetching a page from a list of URLs and looking for links
       in that page that are within the same starting URL and that have not
       already been seen. These links are added to a new queue. When all the
       pages from the list are done, the new queue is moved to the list of
       URLs and the process is started over until no unseen links are found.
418
419         #!/bin/bash
420
421         # E.g. http://gatt.org.yeslab.org/
422         URL=$1
423         # Stay inside the start dir
424         BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
425         URLLIST=$(mktemp urllist.XXXX)
426         URLLIST2=$(mktemp urllist.XXXX)
427         SEEN=$(mktemp seen.XXXX)
428
429         # Spider to get the URLs
430         echo $URL >$URLLIST
431         cp $URLLIST $SEEN
432
433         while [ -s $URLLIST ] ; do
434           cat $URLLIST |
435             parallel lynx -listonly -image_links -dump {} \; \
436               wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
437               perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
438                 do { $seen{$1}++ or print }' |
439             grep -F $BASEURL |
440             grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
441           mv $URLLIST2 $URLLIST
442         done
443
444         rm -f $URLLIST $URLLIST2 $SEEN
445
446   EXAMPLE: Process files from a tar file while unpacking
447       If the files to be processed are in a tar file then unpacking one file
448       and processing it immediately may be faster than first unpacking all
449       files.
450
451         tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
452           parallel echo
453
       The Perl one-liner is needed to make sure the file is complete before
       handing it to GNU parallel: it delays the output by one line, so a
       file name is only printed once tar has moved on to the next file,
       i.e. once the file has been fully unpacked.
456
457   EXAMPLE: Rewriting a for-loop and a while-read-loop
458       for-loops like this:
459
460         (for x in `cat list` ; do
461           do_something $x
462         done) | process_output
463
464       and while-read-loops like this:
465
466         cat list | (while read x ; do
467           do_something $x
468         done) | process_output
469
470       can be written like this:
471
472         cat list | parallel do_something | process_output
473
       For example: Find which host name in a list has IP address 1.2.3.4:
475
476         cat hosts.txt | parallel -P 100 host | grep 1.2.3.4
477
478       If the processing requires more steps the for-loop like this:
479
480         (for x in `cat list` ; do
481           no_extension=${x%.*};
482           do_step1 $x scale $no_extension.jpg
483           do_step2 <$x $no_extension
484         done) | process_output
485
486       and while-loops like this:
487
488         cat list | (while read x ; do
489           no_extension=${x%.*};
490           do_step1 $x scale $no_extension.jpg
491           do_step2 <$x $no_extension
492         done) | process_output
493
494       can be written like this:
495
496         cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
497           process_output
498
499       If the body of the loop is bigger, it improves readability to use a
500       function:
501
502         (for x in `cat list` ; do
503           do_something $x
504           [... 100 lines that do something with $x ...]
505         done) | process_output
506
507         cat list | (while read x ; do
508           do_something $x
509           [... 100 lines that do something with $x ...]
510         done) | process_output
511
512       can both be rewritten as:
513
514         doit() {
515           x=$1
516           do_something $x
517           [... 100 lines that do something with $x ...]
518         }
519         export -f doit
520         cat list | parallel doit
521
522   EXAMPLE: Rewriting nested for-loops
523       Nested for-loops like this:
524
525         (for x in `cat xlist` ; do
526           for y in `cat ylist` ; do
527             do_something $x $y
528           done
529         done) | process_output
530
531       can be written like this:
532
533         parallel do_something {1} {2} :::: xlist ylist | process_output
534
535       Nested for-loops like this:
536
537         (for colour in red green blue ; do
538           for size in S M L XL XXL ; do
539             echo $colour $size
540           done
541         done) | sort
542
543       can be written like this:
544
545         parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
546
547   EXAMPLE: Finding the lowest difference between files
548       diff is good for finding differences in text files. diff | wc -l gives
549       an indication of the size of the difference. To find the differences
550       between all files in the current dir do:
551
552         parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3
553
554       This way it is possible to see if some files are closer to other files.
555
556   EXAMPLE: for-loops with column names
       When doing multiple nested for-loops it can be easier to keep track
       of the loop variable if it is named instead of just having a number.
       Use --header : to let the first argument be a named alias for the
       positional replacement string:
561
562         parallel --header : echo {colour} {size} \
563           ::: colour red green blue ::: size S M L XL XXL
564
565       This also works if the input file is a file with columns:
566
567         cat addressbook.tsv | \
568           parallel --colsep '\t' --header : echo {Name} {E-mail address}
569
570   EXAMPLE: All combinations in a list
571       GNU parallel makes all combinations when given two lists.
572
573       To make all combinations in a single list with unique values, you
574       repeat the list and use replacement string {choose_k}:
575
576         parallel --plus echo {choose_k} ::: A B C D ::: A B C D
577
578         parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D
579
580       {choose_k} works for any number of input sources:
581
582         parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D
583
584       Where {choose_k} does not care about order, {uniq} cares about order.
585       It simply skips jobs where values from different input sources are the
586       same:
587
588         parallel --plus echo {uniq} ::: A B C  ::: A B C  ::: A B C
589         parallel --plus echo {1uniq}+{2uniq}+{3uniq} \
590           ::: A B C  ::: A B C  ::: A B C
591
       The behaviour of {choose_k} is undefined if the input values of the
       sources are not identical.
594
595   EXAMPLE: From a to b and b to c
596       Assume you have input like:
597
598         aardvark
599         babble
600         cab
601         dab
602         each
603
604       and want to run combinations like:
605
606         aardvark babble
607         babble cab
608         cab dab
609         dab each
610
611       If the input is in the file in.txt:
612
613         parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)
614
615       If the input is in the array $a here are two solutions:
616
617         seq $((${#a[@]}-1)) | \
618           env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
619         parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"
620
621   EXAMPLE: Count the differences between all files in a dir
622       Using --results the results are saved in /tmp/diffcount*.
623
624         parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
625           tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
626
627       To see the difference between file A and file B look at the file
628       '/tmp/diffcount/1/A/2/B'.
629
630   EXAMPLE: Speeding up fast jobs
631       Starting a job on the local machine takes around 3-10 ms. This can be a
632       big overhead if the job takes very few ms to run. Often you can group
633       small jobs together using -X which will make the overhead less
634       significant. Compare the speed of these:
635
636         seq -w 0 9999 | parallel touch pict{}.jpg
637         seq -w 0 9999 | parallel -X touch pict{}.jpg
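
       One way to compare them is to time each pipeline (a sketch, using
       bash's time keyword):

         # time both variants; the second groups arguments with -X
         time (seq -w 0 9999 | parallel touch pict{}.jpg)
         time (seq -w 0 9999 | parallel -X touch pict{}.jpg)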
638
639       If your program cannot take multiple arguments, then you can use GNU
640       parallel to spawn multiple GNU parallels:
641
642         seq -w 0 9999999 | \
643           parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg
644
645       If -j0 normally spawns 252 jobs, then the above will try to spawn 2520
646       jobs. On a normal GNU/Linux system you can spawn 32000 jobs using this
647       technique with no problems. To raise the 32000 jobs limit raise
648       /proc/sys/kernel/pid_max to 4194303.
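
       A sketch of one way to raise it (requires root and does not persist
       across reboots):

         # raise the kernel's pid limit for this boot
         sudo sysctl -w kernel.pid_max=4194303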
649
650       If you do not need GNU parallel to have control over each job (so no
651       need for --retries or --joblog or similar), then it can be even faster
652       if you can generate the command lines and pipe those to a shell. So if
653       you can do this:
654
655         mygenerator | sh
656
657       Then that can be parallelized like this:
658
659         mygenerator | parallel --pipe --block 10M sh
660
661       E.g.
662
663         mygenerator() {
664           seq 10000000 | perl -pe 'print "echo This is fast job number "';
665         }
666         mygenerator | parallel --pipe --block 10M sh
667
       The overhead is 100000 times smaller, namely around 100 nanoseconds
       per job.
670
671   EXAMPLE: Using shell variables
672       When using shell variables you need to quote them correctly as they may
673       otherwise be interpreted by the shell.
674
675       Notice the difference between:
676
677         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
678         parallel echo ::: ${ARR[@]} # This is probably not what you want
679
680       and:
681
682         ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
683         parallel echo ::: "${ARR[@]}"
684
       When using variables that contain special characters (e.g. space) in
       the actual command, you can quote them using '"$VAR"' or using "'s
       and -q:
688
689         VAR="My brother's 12\" records are worth <\$\$\$>"
690         parallel -q echo "$VAR" ::: '!'
691         export VAR
692         parallel echo '"$VAR"' ::: '!'
693
694       If $VAR does not contain ' then "'$VAR'" will also work (and does not
695       need export):
696
697         VAR="My 12\" records are worth <\$\$\$>"
698         parallel echo "'$VAR'" ::: '!'
699
700       If you use them in a function you just quote as you normally would do:
701
702         VAR="My brother's 12\" records are worth <\$\$\$>"
703         export VAR
704         myfunc() { echo "$VAR" "$1"; }
705         export -f myfunc
706         parallel myfunc ::: '!'
707
708   EXAMPLE: Group output lines
709       When running jobs that output data, you often do not want the output of
710       multiple jobs to run together. GNU parallel defaults to grouping the
711       output of each job, so the output is printed when the job finishes. If
712       you want full lines to be printed while the job is running you can use
713       --line-buffer. If you want output to be printed as soon as possible you
714       can use -u.
715
716       Compare the output of:
717
718         parallel wget --progress=dot --limit-rate=100k \
719           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
720           ::: {12..16}
721         parallel --line-buffer wget --progress=dot --limit-rate=100k \
722           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
723           ::: {12..16}
724         parallel --latest-line wget --progress=dot --limit-rate=100k \
725           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
726           ::: {12..16}
727         parallel -u wget --progress=dot --limit-rate=100k \
728           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
729           ::: {12..16}
730
731   EXAMPLE: Tag output lines
732       GNU parallel groups the output lines, but it can be hard to see where
733       the different jobs begin. --tag prepends the argument to make that more
734       visible:
735
736         parallel --tag wget --limit-rate=100k \
737           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
738           ::: {12..16}
739
740       --tag works with --line-buffer but not with -u:
741
742         parallel --tag --line-buffer wget --limit-rate=100k \
743           https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
744           ::: {12..16}
745
746       Check the uptime of the servers in ~/.parallel/sshloginfile:
747
748         parallel --tag -S .. --nonall uptime
749
750   EXAMPLE: Colorize output
751       Give each job a new color. Most terminals support ANSI colors with the
752       escape code "\033[30;3Xm" where 0 <= X <= 7:
753
754           seq 10 | \
755             parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
756           parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
757             --tagstring {color} seq {} ::: {1..10}
758
759       To get rid of the initial \t (which comes from --tagstring):
760
761           ... | perl -pe 's/\t//'
762
763   EXAMPLE: Keep order of output same as order of input
764       Normally the output of a job will be printed as soon as it completes.
765       Sometimes you want the order of the output to remain the same as the
766       order of the input. This is often important, if the output is used as
767       input for another system. -k will make sure the order of output will be
768       in the same order as input even if later jobs end before earlier jobs.
769
770       Append a string to every line in a text file:
771
772         cat textfile | parallel -k echo {} append_string
773
774       If you remove -k some of the lines may come out in the wrong order.
775
776       Another example is traceroute:
777
778         parallel traceroute ::: qubes-os.org debian.org freenetproject.org
779
780       will give traceroute of qubes-os.org, debian.org and
781       freenetproject.org, but it will be sorted according to which job
782       completed first.
783
784       To keep the order the same as input run:
785
786         parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org
787
788       This will make sure the traceroute to qubes-os.org will be printed
789       first.
790
       A bit more complex example is downloading a huge file in chunks in
       parallel: Some internet connections will deliver more data if you
       download files in parallel. For downloading files in parallel see:
       "EXAMPLE: Download 24 images for each of the past 30 days". But if
       you are downloading a big file you can download the file in chunks
       in parallel.
797
798       To download byte 10000000-19999999 you can use curl:
799
800         curl -r 10000000-19999999 https://example.com/the/big/file >file.part
801
802       To download a 1 GB file we need 100 10MB chunks downloaded and combined
803       in the correct order.
804
805         seq 0 99 | parallel -k curl -r \
806           {}0000000-{}9999999 https://example.com/the/big/file > file
807
808   EXAMPLE: Parallel grep
809       grep -r greps recursively through directories. GNU parallel can often
810       speed this up.
811
812         find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
813
       This will run 1.5 jobs per CPU, and give 1000 arguments to grep.
815
816       There are situations where the above will be slower than grep -r:
817
818       • If data is already in RAM. The overhead of starting jobs and
819         buffering output may outweigh the benefit of running in parallel.
820
821       • If the files are big. If a file cannot be read in a single seek, the
822         disk may start thrashing.
823
824       The speedup is caused by two factors:
825
826       • On rotating harddisks small files often require a seek for each file.
827         By searching for more files in parallel, the arm may pass another
828         wanted file on its way.
829
       • NVMe drives often perform better by having multiple commands
         running in parallel.
832
833   EXAMPLE: Grepping n lines for m regular expressions.
834       The simplest solution to grep a big file for a lot of regexps is:
835
836         grep -f regexps.txt bigfile
837
838       Or if the regexps are fixed strings:
839
840         grep -F -f regexps.txt bigfile
841
842       There are 3 limiting factors: CPU, RAM, and disk I/O.
843
844       RAM is easy to measure: If the grep process takes up most of your free
845       memory (e.g. when running top), then RAM is a limiting factor.
846
847       CPU is also easy to measure: If the grep takes >90% CPU in top, then
848       the CPU is a limiting factor, and parallelization will speed this up.
849
850       It is harder to see if disk I/O is the limiting factor, and depending
851       on the disk system it may be faster or slower to parallelize. The only
852       way to know for certain is to test and measure.
853
854       Limiting factor: RAM
855
856       The normal grep -f regexps.txt bigfile works no matter the size of
857       bigfile, but if regexps.txt is so big it cannot fit into memory, then
858       you need to split this.
859
860       grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of
861       RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it
862       may be too big.
863
       If you can convert your regexps into fixed strings do that. E.g. if
       the lines you are looking for in bigfile all look like:
866
867         ID1 foo bar baz Identifier1 quux
868         fubar ID2 foo bar baz Identifier2
869
870       then your regexps.txt can be converted from:
871
872         ID1.*Identifier1
873         ID2.*Identifier2
874
875       into:
876
877         ID1 foo bar baz Identifier1
878         ID2 foo bar baz Identifier2
879
880       This way you can use grep -F which takes around 80% less memory and is
881       much faster.
882
883       If it still does not fit in memory you can do this:
884
885         parallel --pipe-part -a regexps.txt --block 1M grep -F -f - -n bigfile | \
886           sort -un | perl -pe 's/^\d+://'
887
888       The 1M should be your free memory divided by the number of CPU threads
889       and divided by 200 for grep -F and by 1000 for normal grep. On
890       GNU/Linux you can do:
891
892         free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
893                     END { print sum }' /proc/meminfo)
894         percpu=$((free / 200 / $(parallel --number-of-threads)))k
895
896         parallel --pipe-part -a regexps.txt --block $percpu --compress \
897           grep -F -f - -n bigfile | \
898           sort -un | perl -pe 's/^\d+://'
899
900       If you can live with duplicated lines and wrong order, it is faster to
901       do:
902
903         parallel --pipe-part -a regexps.txt --block $percpu --compress \
904           grep -F -f - bigfile
905
906       Limiting factor: CPU
907
908       If the CPU is the limiting factor parallelization should be done on the
909       regexps:
910
911         cat regexps.txt | parallel --pipe -L1000 --round-robin --compress \
912           grep -f - -n bigfile | \
913           sort -un | perl -pe 's/^\d+://'
914
915       The command will start one grep per CPU and read bigfile one time per
916       CPU, but as that is done in parallel, all reads except the first will
917       be cached in RAM. Depending on the size of regexps.txt it may be faster
918       to use --block 10m instead of -L1000.
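
       That is, the same command as above with -L1000 swapped for --block
       10m (a sketch):

         # split regexps.txt into ~10 MB chunks instead of 1000-line chunks
         cat regexps.txt | parallel --pipe --block 10m --round-robin \
           --compress grep -f - -n bigfile | \
           sort -un | perl -pe 's/^\d+://'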
919
920       Some storage systems perform better when reading multiple chunks in
921       parallel. This is true for some RAID systems and for some network file
922       systems. To parallelize the reading of bigfile:
923
924         parallel --pipe-part --block 100M -a bigfile -k --compress \
925           grep -f regexps.txt
926
927       This will split bigfile into 100MB chunks and run grep on each of these
928       chunks. To parallelize both reading of bigfile and regexps.txt combine
929       the two using --cat:
930
931         parallel --pipe-part --block 100M -a bigfile --cat cat regexps.txt \
932           \| parallel --pipe -L1000 --round-robin grep -f - {}
933
934       If a line matches multiple regexps, the line may be duplicated.
935
936       Bigger problem
937
938       If the problem is too big to be solved by this, you are probably ready
939       for Lucene.
940
941   EXAMPLE: Using remote computers
       To run commands on a remote computer SSH needs to be set up and you
       must be able to log in without entering a password (the commands ssh-
       copy-id, ssh-agent, and sshpass may help you do that).
945
946       If you need to login to a whole cluster, you typically do not want to
947       accept the host key for every host. You want to accept them the first
948       time and be warned if they are ever changed. To do that:
949
950         # Add the servers to the sshloginfile
951         (echo servera; echo serverb) > .parallel/my_cluster
952         # Make sure .ssh/config exist
953         touch .ssh/config
954         cp .ssh/config .ssh/config.backup
955         # Disable StrictHostKeyChecking temporarily
956         (echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
957         parallel --slf my_cluster --nonall true
958         # Remove the disabling of StrictHostKeyChecking
959         mv .ssh/config.backup .ssh/config
960
961       The servers in .parallel/my_cluster are now added in .ssh/known_hosts.
962
963       To run echo on server.example.com:
964
965         seq 10 | parallel --sshlogin server.example.com echo
966
967       To run commands on more than one remote computer run:
968
969         seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo
970
971       Or:
972
973         seq 10 | parallel --sshlogin server.example.com \
974           --sshlogin server2.example.net echo
975
976       If the login username is foo on server2.example.net use:
977
978         seq 10 | parallel --sshlogin server.example.com \
979           --sshlogin foo@server2.example.net echo
980
981       If your list of hosts is server1-88.example.net with login foo:
982
983         seq 10 | parallel -Sfoo@server{1..88}.example.net echo
984
985       To distribute the commands to a list of computers, make a file
986       mycomputers with all the computers:
987
988         server.example.com
989         foo@server2.example.com
990         server3.example.com
991
992       Then run:
993
994         seq 10 | parallel --sshloginfile mycomputers echo
995
996       To include the local computer add the special sshlogin ':' to the list:
997
998         server.example.com
999         foo@server2.example.com
1000         server3.example.com
1001         :
1002
1003       GNU parallel will try to determine the number of CPUs on each of the
1004       remote computers, and run one job per CPU - even if the remote
1005       computers do not have the same number of CPUs.
1006
1007       If the number of CPUs on the remote computers is not identified
1008       correctly the number of CPUs can be added in front. Here the computer
1009       has 8 CPUs.
1010
1011         seq 10 | parallel --sshlogin 8/server.example.com echo
1012
1013   EXAMPLE: Transferring of files
1014       To recompress gzipped files with bzip2 using a remote computer run:
1015
1016         find logs/ -name '*.gz' | \
1017           parallel --sshlogin server.example.com \
1018           --transfer "zcat {} | bzip2 -9 >{.}.bz2"
1019
1020       This will list the .gz-files in the logs directory and all directories
1021       below. Then it will transfer the files to server.example.com to the
1022       corresponding directory in $HOME/logs. On server.example.com the file
1023       will be recompressed using zcat and bzip2 resulting in the
1024       corresponding file with .gz replaced with .bz2.
1025
1026       If you want the resulting bz2-file to be transferred back to the local
1027       computer add --return {.}.bz2:
1028
1029         find logs/ -name '*.gz' | \
1030           parallel --sshlogin server.example.com \
1031           --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1032
1033       After the recompressing is done the .bz2-file is transferred back to
1034       the local computer and put next to the original .gz-file.
1035
1036       If you want to delete the transferred files on the remote computer add
1037       --cleanup. This will remove both the file transferred to the remote
1038       computer and the files transferred from the remote computer:
1039
1040         find logs/ -name '*.gz' | \
1041           parallel --sshlogin server.example.com \
1042           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1043
       If you want to run on several computers add the computers to
       --sshlogin either using ',' or multiple --sshlogin:
1046
1047         find logs/ -name '*.gz' | \
1048           parallel --sshlogin server.example.com,server2.example.com \
1049           --sshlogin server3.example.com \
1050           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1051
1052       You can add the local computer using --sshlogin :. This will disable
1053       the removing and transferring for the local computer only:
1054
1055         find logs/ -name '*.gz' | \
1056           parallel --sshlogin server.example.com,server2.example.com \
1057           --sshlogin server3.example.com \
1058           --sshlogin : \
1059           --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1060
1061       Often --transfer, --return and --cleanup are used together. They can be
1062       shortened to --trc:
1063
1064         find logs/ -name '*.gz' | \
1065           parallel --sshlogin server.example.com,server2.example.com \
1066           --sshlogin server3.example.com \
1067           --sshlogin : \
1068           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1069
1070       With the file mycomputers containing the list of computers it becomes:
1071
1072         find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
1073           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1074
1075       If the file ~/.parallel/sshloginfile contains the list of computers the
1076       special short hand -S .. can be used:
1077
1078         find logs/ -name '*.gz' | parallel -S .. \
1079           --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1080
1081   EXAMPLE: Advanced file transfer
1082       Assume you have files in in/*, want them processed on server, and
1083       transferred back into /other/dir:
1084
1085         parallel -S server --trc /other/dir/./{/}.out \
1086           cp {/} {/}.out ::: in/./*
1087
1088   EXAMPLE: Distributing work to local and remote computers
1089       Convert *.mp3 to *.ogg running one process per CPU on local computer
1090       and server2:
1091
1092         parallel --trc {.}.ogg -S server2,: \
1093           'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
1094
1095   EXAMPLE: Running the same command on remote computers
1096       To run the command uptime on remote computers you can do:
1097
1098         parallel --tag --nonall -S server1,server2 uptime
1099
1100       --nonall reads no arguments. If you have a list of jobs you want to run
1101       on each computer you can do:
1102
1103         parallel --tag --onall -S server1,server2 echo ::: 1 2 3
1104
1105       Remove --tag if you do not want the sshlogin added before the output.
1106
1107       If you have a lot of hosts use '-j0' to access more hosts in parallel.
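
       For example (a sketch, reusing the my_cluster sshloginfile from the
       earlier example):

         # -j0: log in to as many hosts in parallel as possible
         parallel -j0 --tag --nonall --slf my_cluster uptime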
1108
1109   EXAMPLE: Running 'sudo' on remote computers
1110       Put the password into passwordfile then run:
1111
1112         parallel --ssh 'cat passwordfile | ssh' --nonall \
1113           -S user@server1,user@server2 sudo -S ls -l /root
1114
1115   EXAMPLE: Using remote computers behind NAT wall
1116       If the workers are behind a NAT wall, you need some trickery to get to
1117       them.
1118
1119       If you can ssh to a jumphost, and reach the workers from there, then
1120       the obvious solution would be this, but it does not work:
1121
1122         parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
1123
       It does not work because the command is dequoted by ssh twice whereas
       GNU parallel only expects it to be dequoted once.
1126
1127       You can use a bash function and have GNU parallel quote the command:
1128
1129         jumpssh() { ssh -A jumphost ssh $(parallel --shellquote ::: "$@"); }
1130         export -f jumpssh
1131         parallel --ssh jumpssh -S host1 echo ::: this works
1132
1133       Or you can instead put this in ~/.ssh/config:
1134
1135         Host host1 host2 host3
1136           ProxyCommand ssh jumphost.domain nc -w 1 %h 22
1137
1138       It requires nc(netcat) to be installed on jumphost. With this you can
1139       simply:
1140
1141         parallel -S host1,host2,host3 echo ::: This does work
1142
1143       No jumphost, but port forwards
1144
1145       If there is no jumphost but each server has port 22 forwarded from the
1146       firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 =
1147       host2, 22003 = host3) then you can use ~/.ssh/config:
1148
1149         Host host1.v
1150           Port 22001
1151         Host host2.v
1152           Port 22002
1153         Host host3.v
1154           Port 22003
1155         Host *.v
1156           Hostname firewall
1157
1158       And then use host{1..3}.v as normal hosts:
1159
1160         parallel -S host1.v,host2.v,host3.v echo ::: a b c
1161
1162       No jumphost, no port forwards
1163
       If ports cannot be forwarded, you need some sort of VPN to traverse
       the NAT-wall. TOR is one option for that, as it is very easy to get
       working.
1167
       You need to install TOR and set up a hidden service. In torrc put:
1169
1170         HiddenServiceDir /var/lib/tor/hidden_service/
1171         HiddenServicePort 22 127.0.0.1:22
1172
1173       Then start TOR: /etc/init.d/tor restart
1174
1175       The TOR hostname is now in /var/lib/tor/hidden_service/hostname and is
1176       something similar to izjafdceobowklhz.onion. Now you simply prepend
1177       torsocks to ssh:
1178
1179         parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
1180           -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
1181
1182       If not all hosts are accessible through TOR:
1183
1184         parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
1185           echo ::: a b c
1186
1187       See more ssh tricks on
1188       https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
1189
1190   EXAMPLE: Use sshpass with ssh
1191       If you cannot use passwordless login, you may be able to use sshpass:
1192
1193         seq 10 | parallel -S user-with-password:MyPassword@server echo
1194
1195       or:
1196
1197         export SSHPASS='MyPa$$w0rd'
1198         seq 10 | parallel -S user-with-password:@server echo
1199
1200   EXAMPLE: Use outrun instead of ssh
1201       outrun lets you run a command on a remote server. outrun sets up a
1202       connection to access files at the source server, and automatically
1203       transfers files. outrun must be installed on the remote system.
1204
1205       You can use outrun in an sshlogin this way:
1206
1207         parallel -S 'outrun user@server' command
1208
1209       or:
1210
1211         parallel --ssh outrun -S server command
1212
1213   EXAMPLE: Slurm cluster
1214       The Slurm Workload Manager is used in many clusters.
1215
1216       Here is a simple example of using GNU parallel to call srun:
1217
1218         #!/bin/bash
1219
1220         #SBATCH --time 00:02:00
1221         #SBATCH --ntasks=4
1222         #SBATCH --job-name GnuParallelDemo
1223         #SBATCH --output gnuparallel.out
1224
1225         module purge
1226         module load gnu_parallel
1227
1228         my_parallel="parallel --delay .2 -j $SLURM_NTASKS"
1229         my_srun="srun --export=all --exclusive -n1"
1230         my_srun="$my_srun --cpus-per-task=1 --cpu-bind=cores"
1231         $my_parallel "$my_srun" echo This is job {} ::: {1..20}
1232
1233   EXAMPLE: Parallelizing rsync
1234       rsync is a great tool, but sometimes it will not fill up the available
1235       bandwidth. Running multiple rsync in parallel can fix this.
1236
1237         cd src-dir
1238         find . -type f |
1239           parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
1240
1241       Adjust -j10 until you find the optimal number.
1242
1243       rsync -R will create the needed subdirectories, so all files are not
1244       put into a single dir. The ./ is needed so the resulting command looks
1245       similar to:
1246
1247         rsync -zR ././sub/dir/file fooserver:/dest-dir/
1248
1249       The /./ is what rsync -R works on.
1250
1251       If you are unable to push data, but need to pull them and the files are
1252       called digits.png (e.g. 000000.png) you might be able to do:
1253
1254         seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
1255
1256   EXAMPLE: Use multiple inputs in one command
1257       Copy files like foo.es.ext to foo.ext:
1258
1259         ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
1260
1261       The perl command spits out 2 lines for each input. GNU parallel takes 2
1262       inputs (using -N2) and replaces {1} and {2} with the inputs.
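
       For a single file foo.es.ext the two lines from perl are foo.es.ext
       and foo.ext, so the command that ends up being run is simply:

         cp foo.es.ext foo.ext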
1263
1264       Count in binary:
1265
1266         parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
1267
1268       Print the number on the opposing sides of a six sided die:
1269
1270         parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
1271         parallel --link echo :::: <(seq 6) <(seq 6 -1 1)
1272
1273       Convert files from all subdirs to PNG-files with consecutive numbers
1274       (useful for making input PNG's for ffmpeg):
1275
1276         parallel --link -a <(find . -type f | sort) \
1277           -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
1278
1279       Alternative version:
1280
1281         find . -type f | sort | parallel convert {} {#}.png
1282
1283   EXAMPLE: Use a table as input
1284       Content of table_file.tsv:
1285
1286         foo<TAB>bar
1287         baz <TAB> quux
1288
1289       To run:
1290
1291         cmd -o bar -i foo
1292         cmd -o quux -i baz
1293
1294       you can run:
1295
1296         parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
1297
1298       Note: The default for GNU parallel is to remove the spaces around the
1299       columns. To keep the spaces:
1300
1301         parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
1302
1303   EXAMPLE: Output to database
1304       GNU parallel can output to a database table and a CSV-file:
1305
1306         dburl=csv:///%2Ftmp%2Fmydir
1307         dbtableurl=$dburl/mytable.csv
1308         parallel --sqlandworker $dbtableurl seq ::: {1..10}
1309
1310       It is rather slow and takes up a lot of CPU time because GNU parallel
1311       parses the whole CSV file for each update.
1312
       A better approach is to use an SQLite database and then convert that
       to CSV:
1315
1316         dburl=sqlite3:///%2Ftmp%2Fmy.sqlite
1317         dbtableurl=$dburl/mytable
1318         parallel --sqlandworker $dbtableurl seq ::: {1..10}
1319         sql $dburl '.headers on' '.mode csv' 'SELECT * FROM mytable;'
1320
1321       This takes around a second per job.
1322
1323       If you have access to a real database system, such as PostgreSQL, it is
1324       even faster:
1325
1326         dburl=pg://user:pass@host/mydb
1327         dbtableurl=$dburl/mytable
1328         parallel --sqlandworker $dbtableurl seq ::: {1..10}
1329         sql $dburl \
1330           "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
1331
1332       Or MySQL:
1333
1334         dburl=mysql://user:pass@host/mydb
1335         dbtableurl=$dburl/mytable
1336         parallel --sqlandworker $dbtableurl seq ::: {1..10}
1337         sql -p -B $dburl "SELECT * FROM mytable;" > mytable.tsv
1338         perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/;
1339           %s=("\\" => "\\", "t" => "\t", "n" => "\n");
1340           s/\\([\\tn])/$s{$1}/g;' mytable.tsv
1341
1342   EXAMPLE: Output to CSV-file for R
1343       If you have no need for the advanced job distribution control that a
1344       database provides, but you simply want output into a CSV file that you
1345       can read into R or LibreCalc, then you can use --results:
1346
1347         parallel --results my.csv seq ::: 10 20 30
1348         R
1349         > mydf <- read.csv("my.csv");
1350         > print(mydf[2,])
1351         > write(as.character(mydf[2,c("Stdout")]),'')
1352
1353   EXAMPLE: Use XML as input
1354       The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
1355       podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
1356
1357       Using xpath you can extract the URLs for 2019 and download them using
1358       GNU parallel:
1359
1360         wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
1361           xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
1362           parallel -u wget '{= s/ url="//; s/"//; =}'
1363
1364   EXAMPLE: Run the same command 10 times
1365       If you want to run the same command with the same arguments 10 times in
1366       parallel you can do:
1367
1368         seq 10 | parallel -n0 my_command my_args
1369
1370   EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
       GNU parallel can work similarly to cat | sh.
1372
1373       A resource inexpensive job is a job that takes very little CPU, disk
1374       I/O and network I/O. Ping is an example of a resource inexpensive job.
1375       wget is too - if the webpages are small.
1376
1377       The content of the file jobs_to_run:
1378
1379         ping -c 1 10.0.0.1
1380         wget http://example.com/status.cgi?ip=10.0.0.1
1381         ping -c 1 10.0.0.2
1382         wget http://example.com/status.cgi?ip=10.0.0.2
1383         ...
1384         ping -c 1 10.0.0.255
1385         wget http://example.com/status.cgi?ip=10.0.0.255
1386
1387       To run 100 processes simultaneously do:
1388
1389         parallel -j 100 < jobs_to_run
1390
       As there is no command, the jobs will be evaluated by the shell.
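
       A job file like the one above could be generated with a small shell
       loop (a sketch, assuming the same status.cgi URL as above):

         # generate one ping job and one wget job per address
         for i in $(seq 1 255); do
           echo "ping -c 1 10.0.0.$i"
           echo "wget http://example.com/status.cgi?ip=10.0.0.$i"
         done > jobs_to_run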
1392
1393   EXAMPLE: Call program with FASTA sequence
1394       FASTA files have the format:
1395
1396         >Sequence name1
1397         sequence
1398         sequence continued
1399         >Sequence name2
1400         sequence
1401         sequence continued
1402         more sequence
1403
1404       To call myprog with the sequence as argument run:
1405
1406         cat file.fasta |
1407           parallel --pipe -N1 --recstart '>' --rrs \
1408             'read a; echo Name: "$a"; myprog $(tr -d "\n")'
1409
1410   EXAMPLE: Call program with interleaved FASTQ records
1411       FASTQ files have the format:
1412
1413         @M10991:61:000000000-A7EML:1:1101:14011:1001 1:N:0:28
1414         CTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAAGG
1415         +
1416         #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
1417
1418       Interleaved FASTQ starts with a line like these:
1419
1420         @HWUSI-EAS100R:6:73:941:1973#0/1
1421         @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
1422         @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1
1423
       where '/1' and ' 1:' determine that this is read 1.
1425
1426       This will cut big.fq into one chunk per CPU thread and pass it on stdin
1427       (standard input) to the program fastq-reader:
1428
1429         parallel --pipe-part -a big.fq --block -1 --regexp \
1430           --recend '\n' --recstart '@.*(/1| 1:.*)\n[A-Za-z\n\.~]' \
1431           fastq-reader
1432
1433   EXAMPLE: Processing a big file using more CPUs
1434       To process a big file or some output you can use --pipe to split up the
1435       data into blocks and pipe the blocks into the processing program.
1436
1437       If the program is gzip -9 you can do:
1438
1439         cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
1440
1441       This will split bigfile into blocks of 1 MB and pass each block to
1442       gzip -9 in parallel. One gzip will be run per CPU. The output of
1443       gzip -9 will be kept in order and saved to bigfile.gz.
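
       The result is a series of concatenated gzip streams, which the
       normal gzip tools read as a single file. A quick sanity check (a
       sketch):

         zcat bigfile.gz | cmp - bigfile && echo identical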
1444
1445       gzip works fine if the output is appended, but some processing does not
1446       work like that - for example sorting. For this GNU parallel can put the
1447       output of each command into a file. This will sort a big file in
1448       parallel:
1449
1450         cat bigfile | parallel --pipe --files sort |\
1451           parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
1452
1453       Here bigfile is split into blocks of around 1MB, each block ending in
1454       '\n' (which is the default for --recend). Each block is passed to sort
1455       and the output from sort is saved into files. These files are passed to
1456       the second parallel that runs sort -m on the files before it removes
1457       the files. The output is saved to bigfile.sort.
1458
1459       GNU parallel's --pipe maxes out at around 100 MB/s because every byte
1460       has to be copied through GNU parallel. But if bigfile is a real
1461       (seekable) file, GNU parallel can bypass the copying and send the parts
1462       directly to the program:
1463
1464         parallel --pipe-part --block 100m -a bigfile --files sort |\
1465           parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
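
       Either way the result can be verified with sort's built-in order
       check (a sketch):

         sort -c bigfile.sort && echo sorted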
1466
1467   EXAMPLE: Grouping input lines
1468       When processing with --pipe you may have lines grouped by a value. Here
1469       is my.csv:
1470
1471          Transaction Customer Item
1472               1       a       53
1473               2       b       65
1474               3       b       82
1475               4       c       96
1476               5       c       67
1477               6       c       13
1478               7       d       90
1479               8       d       43
1480               9       d       91
1481               10      d       84
1482               11      e       72
1483               12      e       102
1484               13      e       63
1485               14      e       56
1486               15      e       74
1487
1488       Let us assume you want GNU parallel to process each customer. In other
1489       words: You want all the transactions for a single customer to be
1490       treated as a single record.
1491
1492       To do this we preprocess the data with a program that inserts a record
1493       separator before each customer (column 2 = $F[1]). Here we first make a
1494       50 character random string, which we then use as the separator:
1495
1496         sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
1497         cat my.csv | \
1498            perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
1499            parallel --recend $sep --rrs --pipe -N1 wc
1500
1501       If your program can process multiple customers replace -N1 with a
1502       reasonable --blocksize.
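
       As a sketch of that change, reusing the $sep from above, with wc
       still standing in for your program and 50k chosen purely for
       illustration:

         cat my.csv | \
            perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
            parallel --recend $sep --rrs --pipe --block 50k wc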
1503
1504   EXAMPLE: Running more than 250 jobs workaround
1505       If you need to run a massive number of jobs in parallel, then you will
1506       likely hit the filehandle limit which is often around 250 jobs. If you
1507       are super user you can raise the limit in /etc/security/limits.conf but
1508       you can also use this workaround. The filehandle limit is per process.
1509       That means that if you just spawn more GNU parallels then each of them
1510       can run 250 jobs. This will spawn up to 2500 jobs:
1511
1512         cat myinput |\
1513           parallel --pipe -N 50 --round-robin -j50 parallel -j50 your_prg
1514
1515       This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
1516       to do this, and you may need to increase /proc/sys/kernel/pid_max):
1517
1518         cat myinput |\
1519           parallel --pipe -N 250 --round-robin -j250 parallel -j250 your_prg
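
       If you would rather raise the filehandle limit than nest GNU
       parallel, the per-process limit can be checked and raised up to
       the hard limit from the shell; going beyond the hard limit still
       requires root or /etc/security/limits.conf:

         ulimit -n        # show the current limit on open filehandles
         ulimit -n 4096   # raise it for this shell and its children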
1520
1521   EXAMPLE: Working as mutex and counting semaphore
1522       The command sem is an alias for parallel --semaphore.
1523
1524       A counting semaphore will allow a given number of jobs to be started in
1525       the background. When that number of jobs is running in the background,
1526       GNU sem will wait for one of these to complete before starting another
1527       command. sem --wait will wait for all jobs to complete.
1528
1529       Run 10 jobs concurrently in the background:
1530
1531         for i in *.log ; do
1532           echo $i
1533           sem -j10 gzip $i ";" echo done
1534         done
1535         sem --wait
1536
1537       A mutex is a counting semaphore allowing only one job to run. This will
1538       edit the file myfile and prepend lines with the numbers 1 to 3 to the
1539       file.
1540
1541         seq 3 | parallel sem sed -i -e '1i{}' myfile
1542
1543       As myfile can be very big, it is important that only one process edits
1544       the file at a time.
1545
1546       Name the semaphore to have multiple different semaphores active at the
1547       same time:
1548
1549         seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
1550
1551   EXAMPLE: Mutex for a script
1552       Assume a script is called from cron or from a web service, but only one
1553       instance can be run at a time. With sem and --shebang-wrap the script
1554       can be made to wait for other instances to finish. Here in bash:
1555
1556         #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
1557
1558         echo This will run
1559         sleep 5
1560         echo exclusively
1561
1562       Here perl:
1563
1564         #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl
1565
1566         print "This will run ";
1567         sleep 5;
1568         print "exclusively\n";
1569
1570       Here python:
1571
1572         #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python3
1573
1574         import time
1575         print("This will run ", end="", flush=True)
1576         time.sleep(5)
1577         print("exclusively")
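
       Starting two copies of such a script at the same time shows the
       effect: the second instance waits until the first has finished.
       Assuming the bash version above was saved as myscript (a
       hypothetical name):

         ./myscript & ./myscript & wait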
1578
1579   EXAMPLE: Start editor with file names from stdin (standard input)
1580       You can use GNU parallel to start interactive programs like emacs or
1581       vi:
1582
1583         cat filelist | parallel --tty -X emacs
1584         cat filelist | parallel --tty -X vi
1585
1586       If there are more files than will fit on a single command line, the
1587       editor will be started again with the remaining files.
1588
1589   EXAMPLE: Running sudo
1590       sudo requires a password to run a command as root. It caches the
1591       authentication, so you only need to enter the password again if you
1592       have not used sudo for a while.
1593
1594       The command:
1595
1596         parallel sudo echo ::: This is a bad idea
1597
1598       is no good, as you would be prompted for the sudo password for each of
1599       the jobs. Instead do:
1600
1601         sudo parallel echo ::: This is a good idea
1602
1603       This way you only have to enter the sudo password once.
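
       If you would rather not run GNU parallel itself as root, another
       approach (a sketch; sudo's credential cache can still time out
       during long runs) is to prime the cache first:

         sudo -v
         parallel sudo echo ::: this may also work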
1604
1605   EXAMPLE: Run ping in parallel
1606       ping prints out statistics when killed with CTRL-C.
1607
1608       Unfortunately, CTRL-C will also normally kill GNU parallel.
1609
1610       But by using --open-tty and ignoring SIGINT you can get the wanted
1611       effect:
1612
1613         parallel -j0 --open-tty --lb --tag ping '{= $SIG{INT}=sub {} =}' \
1614           ::: 1.1.1.1 8.8.8.8 9.9.9.9 21.21.21.21 80.80.80.80 88.88.88.88
1615
1616       --open-tty will make the pings receive SIGINT (from CTRL-C). CTRL-C
1617       will not kill GNU parallel, which only exits once the pings are done.
1618
1619   EXAMPLE: GNU Parallel as queue system/batch manager
1620       GNU parallel can work as a simple job queue system or batch manager.
1621       The idea is to put the jobs into a file and have GNU parallel read from
1622       that continuously. As GNU parallel will stop at end of file we use tail
1623       to continue reading:
1624
1625         true >jobqueue; tail -n+0 -f jobqueue | parallel
1626
1627       To submit your jobs to the queue:
1628
1629         echo my_command my_arg >> jobqueue
1630
1631       You can of course use -S to distribute the jobs to remote computers:
1632
1633         true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
1634
1635       Output will only be printed when the next input is read after a job has
1636       finished: so you need to submit a job after the first has finished to
1637       see the output from the first job.
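
       A cheap way to flush that output is to submit a dummy job:

         echo true >> jobqueue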
1638
1639       If you keep this running for a long time, jobqueue will grow. A way of
1640       removing the jobs already run is by making GNU parallel stop when it
1641       hits a special value and then restart. To use --eof to make GNU
1642       parallel exit, tail also needs to be forced to exit:
1643
1644         true >jobqueue;
1645         while true; do
1646           tail -n+0 -f jobqueue |
1647             (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
1648              perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
1649              (seq 1000 >> jobqueue &);
1650              echo Done appending dummy data forcing tail to exit)
1651           echo tail exited;
1652           mv j2 jobqueue
1653         done
1654
1655       In some cases you can run on more CPUs and computers during the night:
1656
1657         # Day time
1658         echo 50% > jobfile
1659         cp day_server_list ~/.parallel/sshloginfile
1660         # Night time
1661         echo 100% > jobfile
1662         cp night_server_list ~/.parallel/sshloginfile
1663         tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..
1664
1665       GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
1666
1667   EXAMPLE: GNU Parallel as dir processor
1668       If you have a dir in which users drop files that need to be processed,
1669       you can do this on GNU/Linux (if you know what inotifywait is called on
1670       other platforms, file a bug report):
1671
1672         inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
1673           parallel -u echo
1674
1675       This will run the command echo on each file put into my_dir or subdirs
1676       of my_dir.
1677
1678       You can of course use -S to distribute the jobs to remote computers:
1679
1680         inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
1681           parallel -S ..  -u echo
1682
1683       If the files to be processed are in a tar file then unpacking one file
1684       and processing it immediately may be faster than first unpacking all
1685       files. Set up the dir processor as above and unpack into the dir.
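
       As a sketch (files.tar being a hypothetical archive), unpacking
       straight into the watched dir looks like this:

         tar -C my_dir -xf files.tar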
1686
1687       Using GNU parallel as dir processor has the same limitations as using
1688       GNU parallel as queue system/batch manager.
1689
1690   EXAMPLE: Locate the missing package
1691       If you have downloaded source and tried compiling it, you may have
1692       seen:
1693
1694         $ ./configure
1695         [...]
1696         checking for something.h... no
1697         configure: error: "libsomething not found"
1698
1699       Often it is not obvious which package you should install to get that
1700       file. Debian has `apt-file` to search for a file. `tracefile` from
1701       https://gitlab.com/ole.tange/tangetools can tell which files a program
1702       tried to access. In this case we are interested in one of the last
1703       files:
1704
1705         $ tracefile -un ./configure | tail | parallel -j0 apt-file search
1706

AUTHOR

1708       When using GNU parallel for a publication please cite:
1709
1710       O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
1711       The USENIX Magazine, February 2011:42-47.
1712
1713       This helps funding further development; and it won't cost you a cent.
1714       If you pay 10000 EUR you should feel free to use GNU Parallel without
1715       citing.
1716
1717       Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
1718
1719       Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
1720
1721       Copyright (C) 2010-2023 Ole Tange, http://ole.tange.dk and Free
1722       Software Foundation, Inc.
1723
1724       Parts of the manual concerning xargs compatibility are inspired by the
1725       manual of xargs from GNU findutils 4.4.2.
1726

LICENSE

1728       This program is free software; you can redistribute it and/or modify it
1729       under the terms of the GNU General Public License as published by the
1730       Free Software Foundation; either version 3 of the License, or at your
1731       option any later version.
1732
1733       This program is distributed in the hope that it will be useful, but
1734       WITHOUT ANY WARRANTY; without even the implied warranty of
1735       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
1736       General Public License for more details.
1737
1738       You should have received a copy of the GNU General Public License along
1739       with this program.  If not, see <https://www.gnu.org/licenses/>.
1740
1741   Documentation license I
1742       Permission is granted to copy, distribute and/or modify this
1743       documentation under the terms of the GNU Free Documentation License,
1744       Version 1.3 or any later version published by the Free Software
1745       Foundation; with no Invariant Sections, with no Front-Cover Texts, and
1746       with no Back-Cover Texts.  A copy of the license is included in the
1747       file LICENSES/GFDL-1.3-or-later.txt.
1748
1749   Documentation license II
1750       You are free:
1751
1752       to Share to copy, distribute and transmit the work
1753
1754       to Remix to adapt the work
1755
1756       Under the following conditions:
1757
1758       Attribution
1759                You must attribute the work in the manner specified by the
1760                author or licensor (but not in any way that suggests that they
1761                endorse you or your use of the work).
1762
1763       Share Alike
1764                If you alter, transform, or build upon this work, you may
1765                distribute the resulting work only under the same, similar or
1766                a compatible license.
1767
1768       With the understanding that:
1769
1770       Waiver   Any of the above conditions can be waived if you get
1771                permission from the copyright holder.
1772
1773       Public Domain
1774                Where the work or any of its elements is in the public domain
1775                under applicable law, that status is in no way affected by the
1776                license.
1777
1778       Other Rights
1779                In no way are any of the following rights affected by the
1780                license:
1781
1782                • Your fair dealing or fair use rights, or other applicable
1783                  copyright exceptions and limitations;
1784
1785                • The author's moral rights;
1786
1787                • Rights other persons may have either in the work itself or
1788                  in how the work is used, such as publicity or privacy
1789                  rights.
1790
1791       Notice   For any reuse or distribution, you must make clear to others
1792                the license terms of this work.
1793
1794       A copy of the full license is included in the file
1795       LICENCES/CC-BY-SA-4.0.txt.
1796

SEE ALSO

1798       parallel(1), parallel_tutorial(7), env_parallel(1), parset(1),
1799       parsort(1), parallel_alternatives(7), parallel_design(7), niceload(1),
1800       sql(1), ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1), rsync(1)
1801
1802
1803
180420230722                          2023-07-28              PARALLEL_EXAMPLES(7)