PARALLEL_EXAMPLES(7)                 parallel                PARALLEL_EXAMPLES(7)
2
3
4
6 EXAMPLE: Working as xargs -n1. Argument appending
GNU parallel can work similarly to xargs -n1.
8
9 To compress all html files using gzip run:
10
11 find . -name '*.html' | parallel gzip --best
12
13 If the file names may contain a newline use -0. Substitute FOO BAR with
14 FUBAR in all files in this dir and subdirs:
15
16 find . -type f -print0 | \
17 parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
18
19 Note -q is needed because of the space in 'FOO BAR'.
20
21 EXAMPLE: Simple network scanner
22 prips can generate IP-addresses from CIDR notation. With GNU parallel
23 you can build a simple network scanner to see which addresses respond
24 to ping:
25
26 prips 130.229.16.0/20 | \
27 parallel --timeout 2 -j0 \
28 'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null
29
30 EXAMPLE: Reading arguments from command line
GNU parallel can take the arguments from the command line instead of stdin
32 (standard input). To compress all html files in the current dir using
33 gzip run:
34
35 parallel gzip --best ::: *.html
36
37 To convert *.wav to *.mp3 using LAME running one process per CPU run:
38
39 parallel lame {} -o {.}.mp3 ::: *.wav
40
41 EXAMPLE: Inserting multiple arguments
42 When moving a lot of files like this: mv *.log destdir you will
43 sometimes get the error:
44
45 bash: /bin/mv: Argument list too long
46
47 because there are too many files. You can instead do:
48
49 ls | grep -E '\.log$' | parallel mv {} destdir
50
This will run mv for each file. It can be done faster if mv gets as
many arguments as will fit on the line:
53
54 ls | grep -E '\.log$' | parallel -m mv {} destdir
55
56 In many shells you can also use printf:
57
58 printf '%s\0' *.log | parallel -0 -m mv {} destdir
59
60 EXAMPLE: Context replace
61 To remove the files pict0000.jpg .. pict9999.jpg you could do:
62
63 seq -w 0 9999 | parallel rm pict{}.jpg
64
65 You could also do:
66
67 seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm
68
The first will run rm 10000 times, while the last will only run rm as
many times as needed to keep the command line length short enough to
avoid Argument list too long (it typically runs 1-2 times).
72
73 You could also run:
74
75 seq -w 0 9999 | parallel -X rm pict{}.jpg
76
This will also only run rm as many times as needed to keep the command
line length short enough.
79
80 EXAMPLE: Compute intensive jobs and substitution
81 If ImageMagick is installed this will generate a thumbnail of a jpg
82 file:
83
84 convert -geometry 120 foo.jpg thumb_foo.jpg
85
86 This will run with number-of-cpus jobs in parallel for all jpg files in
87 a directory:
88
89 ls *.jpg | parallel convert -geometry 120 {} thumb_{}
90
91 To do it recursively use find:
92
93 find . -name '*.jpg' | \
94 parallel convert -geometry 120 {} {}_thumb.jpg
95
Notice how the argument has to start with {} as {} will include the
path (e.g. running convert -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg
would clearly be wrong). The command will generate files like
./foo/bar.jpg_thumb.jpg.
100
101 Use {.} to avoid the extra .jpg in the file name. This command will
102 make files like ./foo/bar_thumb.jpg:
103
104 find . -name '*.jpg' | \
105 parallel convert -geometry 120 {} {.}_thumb.jpg
106
107 EXAMPLE: Substitution and redirection
108 This will generate an uncompressed version of .gz-files next to the
109 .gz-file:
110
111 parallel zcat {} ">"{.} ::: *.gz
112
113 Quoting of > is necessary to postpone the redirection. Another solution
114 is to quote the whole command:
115
116 parallel "zcat {} >{.}" ::: *.gz
117
118 Other special shell characters (such as * ; $ > < | >> <<) also need
119 to be put in quotes, as they may otherwise be interpreted by the shell
120 and not given to GNU parallel.
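
For example, the quoted '|' and '>' below are executed by the shell
that runs each job, not by the shell that starts GNU parallel (a
minimal sketch; *.txt is a placeholder for your own files):

  parallel 'cat {} | wc -l > {}.count' ::: *.txt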
121
122 EXAMPLE: Composed commands
123 A job can consist of several commands. This will print the number of
124 files in each directory:
125
126 ls | parallel 'echo -n {}" "; ls {}|wc -l'
127
128 To put the output in a file called <name>.dir:
129
130 ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'
131
132 Even small shell scripts can be run by GNU parallel:
133
134 find . | parallel 'a={}; name=${a##*/};' \
135 'upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");'\
136 'echo "$name - $upper"'
137
138 ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'
139
140 Given a list of URLs, list all URLs that fail to download. Print the
141 line number and the URL.
142
143 cat urlfile | parallel "wget {} 2>/dev/null || grep -n {} urlfile"
144
145 Create a mirror directory with the same filenames except all files and
146 symlinks are empty files.
147
148 cp -rs /the/source/dir mirror_dir
149 find mirror_dir -type l | parallel -m rm {} '&&' touch {}
150
Find the files in a list that do not exist:
152
153 cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
154
155 EXAMPLE: Composed command with perl replacement string
You have a bunch of files. You want them sorted into dirs. The dir of
each file should be named after the first letter of the file name.
158
159 parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
160
161 EXAMPLE: Composed command with multiple input sources
162 You have a dir with files named as 24 hours in 5 minute intervals:
00:00, 00:05, 00:10 .. 23:55. You want to find the missing files:
164
165 parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
166 ::: {00..23} ::: {00..55..5}
167
168 EXAMPLE: Calling Bash functions
169 If the composed command is longer than a line, it becomes hard to read.
170 In Bash you can use functions. Just remember to export -f the function.
171
172 doit() {
173 echo Doing it for $1
174 sleep 2
175 echo Done with $1
176 }
177 export -f doit
178 parallel doit ::: 1 2 3
179
180 doubleit() {
181 echo Doing it for $1 $2
182 sleep 2
183 echo Done with $1 $2
184 }
185 export -f doubleit
186 parallel doubleit ::: 1 2 3 ::: a b
187
188 To do this on remote servers you need to transfer the function using
189 --env:
190
191 parallel --env doit -S server doit ::: 1 2 3
192 parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
193
194 If your environment (aliases, variables, and functions) is small you
195 can copy the full environment without having to export -f anything. See
196 env_parallel.
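
A minimal sketch of that approach (assuming env_parallel is installed
for your shell and sourced first; bash shown):

  # Source once per shell (other shells have their own env_parallel file)
  . $(which env_parallel.bash)
  myvar="copied without export"
  myfunc() { echo "$myvar: $1"; }
  # env_parallel copies aliases, variables, and functions automatically
  env_parallel myfunc ::: 1 2 3
  env_parallel -S server myfunc ::: 1 2 3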
197
198 EXAMPLE: Function tester
199 To test a program with different parameters:
200
201 tester() {
202 if (eval "$@") >&/dev/null; then
203 perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
204 else
205 perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
206 fi
207 }
208 export -f tester
209 parallel tester my_program ::: arg1 arg2
210 parallel tester exit ::: 1 0 2 0
211
212 If my_program fails a red FAIL will be printed followed by the failing
213 command; otherwise a green OK will be printed followed by the command.
214
EXAMPLE: Continuously show the latest line of output
216 It can be useful to monitor the output of running jobs.
217
This shows the most recent output line until a job finishes, after
which the output of the job is printed in full:
220
221 parallel '{} | tee >(cat >&3)' ::: 'command 1' 'command 2' \
222 3> >(perl -ne '$|=1;chomp;printf"%.'$COLUMNS's\r",$_." "x100')
223
224 EXAMPLE: Log rotate
225 Log rotation renames a logfile to an extension with a higher number:
226 log.1 becomes log.2, log.2 becomes log.3, and so on. The oldest log is
227 removed. To avoid overwriting files the process starts backwards from
228 the high number to the low number. This will keep 10 old versions of
229 the log:
230
231 seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
232 mv log log.1
233
234 EXAMPLE: Removing file extension when processing files
235 When processing files removing the file extension using {.} is often
236 useful.
237
238 Create a directory for each zip-file and unzip it in that dir:
239
240 parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
241
242 Recompress all .gz files in current directory using bzip2 running 1 job
243 per CPU in parallel:
244
245 parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
246
247 Convert all WAV files to MP3 using LAME:
248
249 find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3
250
Put all converted files in the same directory:
252
253 find sounddir -type f -name '*.wav' | \
254 parallel lame {} -o mydir/{/.}.mp3
255
256 EXAMPLE: Removing strings from the argument
If you have a directory with tar.gz files and want them extracted in
the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
foo) you can do:
260
261 parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz
262
263 If you want to remove a different ending, you can use {%string}:
264
265 parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here
266
You can also remove a starting string with {#string}:
268
269 parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here
270
271 To remove a string anywhere you can use regular expressions with
272 {/regexp/replacement} and leave the replacement empty:
273
274 parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
275
276 EXAMPLE: Download 24 images for each of the past 30 days
277 Let us assume a website stores images like:
278
279 https://www.example.com/path/to/YYYYMMDD_##.jpg
280
281 where YYYYMMDD is the date and ## is the number 01-24. This will
282 download images for the past 30 days:
283
284 getit() {
285 date=$(date -d "today -$1 days" +%Y%m%d)
286 num=$2
287 echo wget https://www.example.com/path/to/${date}_${num}.jpg
288 }
289 export -f getit
290
291 parallel getit ::: $(seq 30) ::: $(seq -w 24)
292
293 $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD
294 with $1 days subtracted.
295
296 EXAMPLE: Download world map from NASA
297 NASA provides tiles to download on earthdata.nasa.gov. Download tiles
298 for Blue Marble world map and create a 10240x20480 map.
299
300 base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
301 service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
302 layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
303 set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
304 tile="TILEROW={1}&TILECOL={2}"
305 format="FORMAT=image%2Fjpeg"
306 url="$base?$service&$layer&$set&$tile&$format"
307
308 parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
309 parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
310 convert -append line{0..19}.jpg world.jpg
311
312 EXAMPLE: Download Apollo-11 images from NASA using jq
Search NASA using their API to get JSON for images that are related
to 'apollo 11' and have 'moon landing' in the description.
315
316 The search query returns JSON containing URLs to JSON containing
317 collections of pictures. One of the pictures in each of these
collections is large.
319
320 wget is used to get the JSON for the search query. jq is then used to
321 extract the URLs of the collections. parallel then calls wget to get
322 each collection, which is passed to jq to extract the URLs of all
images. grep selects the large images, and parallel finally uses
324 wget to fetch the images.
325
326 base="https://images-api.nasa.gov/search"
327 q="q=apollo 11"
328 description="description=moon landing"
329 media_type="media_type=image"
330 wget -O - "$base?$q&$description&$media_type" |
331 jq -r .collection.items[].href |
332 parallel wget -O - |
333 jq -r .[] |
334 grep large |
335 parallel wget
336
337 EXAMPLE: Download video playlist in parallel
youtube-dl is an excellent tool to download videos. It cannot,
however, download videos in parallel. This takes a playlist and
downloads 10 videos in parallel.
341
342 url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
343 export url
344 youtube-dl --flat-playlist "https://$url" |
345 parallel --tagstring {#} --lb -j10 \
346 youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'
347
348 EXAMPLE: Prepend last modified date (ISO8601) to file name
349 parallel mv {} '{= $a=pQ($_); $b=$_;' \
350 '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *
351
352 {= and =} mark a perl expression. pQ perl-quotes the string. date
353 +%FT%T is the date in ISO8601 with time.
354
355 EXAMPLE: Save output in ISO8601 dirs
356 Save output from ps aux every second into dirs named
357 yyyy-mm-ddThh:mm:ss+zz:zz.
358
359 seq 1000 | parallel -N0 -j1 --delay 1 \
360 --results '{= $_=`date -Isec`; chomp=}/' ps aux
361
362 EXAMPLE: Digital clock with "blinking" :
The : in a digital clock blinks. To make every other line have a ':'
and the rest a ' ', a perl expression is used to look at the 3rd input
source. If the value modulo 2 is 1, use ":", otherwise use " ":
366
367 parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
368 ::: {0..12} ::: {0..5} ::: {0..9}
369
370 EXAMPLE: Aggregating content of files
371 This:
372
373 parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
374 ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
375
376 will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
377 the output grouping on x and z you can do this:
378
379 parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
380
381 For all values of x and z it runs commands like:
382
383 cat x1y*z1 > x1z1
384
385 So you end up with x1z1 .. x5z5 each containing the content of all
386 values of y.
387
388 EXAMPLE: Breadth first parallel web crawler/mirrorer
The script below will crawl and mirror a URL in parallel. It first
downloads the pages that are 1 click down, then 2 clicks down, then
3; instead of the normal depth first, where the first link on each
page is fetched first.
393
394 Run like this:
395
396 PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/
397
398 Remove the wget part if you only want a web crawler.
399
400 It works by fetching a page from a list of URLs and looking for links
401 in that page that are within the same starting URL and that have not
402 already been seen. These links are added to a new queue. When all the
pages from the list are done, the new queue is moved to the list of URLs
404 and the process is started over until no unseen links are found.
405
406 #!/bin/bash
407
408 # E.g. http://gatt.org.yeslab.org/
409 URL=$1
410 # Stay inside the start dir
411 BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
412 URLLIST=$(mktemp urllist.XXXX)
413 URLLIST2=$(mktemp urllist.XXXX)
414 SEEN=$(mktemp seen.XXXX)
415
416 # Spider to get the URLs
417 echo $URL >$URLLIST
418 cp $URLLIST $SEEN
419
420 while [ -s $URLLIST ] ; do
421 cat $URLLIST |
422 parallel lynx -listonly -image_links -dump {} \; \
423 wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
424 perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
425 do { $seen{$1}++ or print }' |
426 grep -F $BASEURL |
427 grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
428 mv $URLLIST2 $URLLIST
429 done
430
431 rm -f $URLLIST $URLLIST2 $SEEN
432
433 EXAMPLE: Process files from a tar file while unpacking
434 If the files to be processed are in a tar file then unpacking one file
435 and processing it immediately may be faster than first unpacking all
436 files.
437
438 tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
439 parallel echo
440
441 The Perl one-liner is needed to make sure the file is complete before
442 handing it to GNU parallel.
443
444 EXAMPLE: Rewriting a for-loop and a while-read-loop
445 for-loops like this:
446
447 (for x in `cat list` ; do
448 do_something $x
449 done) | process_output
450
451 and while-read-loops like this:
452
453 cat list | (while read x ; do
454 do_something $x
455 done) | process_output
456
457 can be written like this:
458
459 cat list | parallel do_something | process_output
460
For example: Find which host name in a list has IP address 1.2.3.4:
462
463 cat hosts.txt | parallel -P 100 host | grep 1.2.3.4
464
If the processing requires more steps, a for-loop like this:
466
467 (for x in `cat list` ; do
468 no_extension=${x%.*};
469 do_step1 $x scale $no_extension.jpg
470 do_step2 <$x $no_extension
471 done) | process_output
472
473 and while-loops like this:
474
475 cat list | (while read x ; do
476 no_extension=${x%.*};
477 do_step1 $x scale $no_extension.jpg
478 do_step2 <$x $no_extension
479 done) | process_output
480
481 can be written like this:
482
483 cat list | parallel "do_step1 {} scale {.}.jpg ; do_step2 <{} {.}" |\
484 process_output
485
486 If the body of the loop is bigger, it improves readability to use a
487 function:
488
489 (for x in `cat list` ; do
490 do_something $x
491 [... 100 lines that do something with $x ...]
492 done) | process_output
493
494 cat list | (while read x ; do
495 do_something $x
496 [... 100 lines that do something with $x ...]
497 done) | process_output
498
499 can both be rewritten as:
500
501 doit() {
502 x=$1
503 do_something $x
504 [... 100 lines that do something with $x ...]
505 }
506 export -f doit
507 cat list | parallel doit
508
509 EXAMPLE: Rewriting nested for-loops
510 Nested for-loops like this:
511
512 (for x in `cat xlist` ; do
513 for y in `cat ylist` ; do
514 do_something $x $y
515 done
516 done) | process_output
517
518 can be written like this:
519
520 parallel do_something {1} {2} :::: xlist ylist | process_output
521
522 Nested for-loops like this:
523
524 (for colour in red green blue ; do
525 for size in S M L XL XXL ; do
526 echo $colour $size
527 done
528 done) | sort
529
530 can be written like this:
531
532 parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
533
534 EXAMPLE: Finding the lowest difference between files
535 diff is good for finding differences in text files. diff | wc -l gives
536 an indication of the size of the difference. To find the differences
537 between all files in the current dir do:
538
539 parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3
540
541 This way it is possible to see if some files are closer to other files.
542
543 EXAMPLE: for-loops with column names
When doing multiple nested for-loops it can be easier to keep track
of the loop variable if it is named instead of just having a number.
Use --header : to let the first argument be a named alias for the
positional replacement string:
548
549 parallel --header : echo {colour} {size} \
550 ::: colour red green blue ::: size S M L XL XXL
551
552 This also works if the input file is a file with columns:
553
554 cat addressbook.tsv | \
555 parallel --colsep '\t' --header : echo {Name} {E-mail address}
556
557 EXAMPLE: All combinations in a list
558 GNU parallel makes all combinations when given two lists.
559
560 To make all combinations in a single list with unique values, you
561 repeat the list and use replacement string {choose_k}:
562
563 parallel --plus echo {choose_k} ::: A B C D ::: A B C D
564
565 parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D
566
567 {choose_k} works for any number of input sources:
568
569 parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D
570
571 Where {choose_k} does not care about order, {uniq} cares about order.
572 It simply skips jobs where values from different input sources are the
573 same:
574
575 parallel --plus echo {uniq} ::: A B C ::: A B C ::: A B C
576 parallel --plus echo {1uniq}+{2uniq}+{3uniq} ::: A B C ::: A B C ::: A B C
577
578 EXAMPLE: From a to b and b to c
579 Assume you have input like:
580
581 aardvark
582 babble
583 cab
584 dab
585 each
586
587 and want to run combinations like:
588
589 aardvark babble
590 babble cab
591 cab dab
592 dab each
593
594 If the input is in the file in.txt:
595
596 parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)
597
598 If the input is in the array $a here are two solutions:
599
600 seq $((${#a[@]}-1)) | \
601 env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'
602 parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"
603
604 EXAMPLE: Count the differences between all files in a dir
605 Using --results the results are saved in /tmp/diffcount*.
606
607 parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
608 tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
609
610 To see the difference between file A and file B look at the file
611 '/tmp/diffcount/1/A/2/B'.
612
613 EXAMPLE: Speeding up fast jobs
614 Starting a job on the local machine takes around 3-10 ms. This can be a
615 big overhead if the job takes very few ms to run. Often you can group
616 small jobs together using -X which will make the overhead less
617 significant. Compare the speed of these:
618
619 seq -w 0 9999 | parallel touch pict{}.jpg
620 seq -w 0 9999 | parallel -X touch pict{}.jpg
621
622 If your program cannot take multiple arguments, then you can use GNU
623 parallel to spawn multiple GNU parallels:
624
625 seq -w 0 9999999 | \
626 parallel -j10 -q -I,, --pipe parallel -j0 touch pict{}.jpg
627
628 If -j0 normally spawns 252 jobs, then the above will try to spawn 2520
629 jobs. On a normal GNU/Linux system you can spawn 32000 jobs using this
630 technique with no problems. To raise the 32000 jobs limit raise
631 /proc/sys/kernel/pid_max to 4194303.
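
To raise it you can, as root, write the new value (a sketch; the
setting lasts until the next reboot):

  echo 4194303 > /proc/sys/kernel/pid_max
  # or equivalently:
  sysctl -w kernel.pid_max=4194303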
632
633 If you do not need GNU parallel to have control over each job (so no
634 need for --retries or --joblog or similar), then it can be even faster
635 if you can generate the command lines and pipe those to a shell. So if
636 you can do this:
637
638 mygenerator | sh
639
640 Then that can be parallelized like this:
641
642 mygenerator | parallel --pipe --block 10M sh
643
644 E.g.
645
646 mygenerator() {
647 seq 10000000 | perl -pe 'print "echo This is fast job number "';
648 }
649 mygenerator | parallel --pipe --block 10M sh
650
The overhead is 100000 times smaller, namely around 100 nanoseconds
per job.
653
654 EXAMPLE: Using shell variables
655 When using shell variables you need to quote them correctly as they may
656 otherwise be interpreted by the shell.
657
658 Notice the difference between:
659
660 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
661 parallel echo ::: ${ARR[@]} # This is probably not what you want
662
663 and:
664
665 ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
666 parallel echo ::: "${ARR[@]}"
667
When using variables in the actual command that contain special
characters (e.g. space) you can quote them using '"$VAR"' or using "'s
and -q:
671
672 VAR="My brother's 12\" records are worth <\$\$\$>"
673 parallel -q echo "$VAR" ::: '!'
674 export VAR
675 parallel echo '"$VAR"' ::: '!'
676
677 If $VAR does not contain ' then "'$VAR'" will also work (and does not
678 need export):
679
680 VAR="My 12\" records are worth <\$\$\$>"
681 parallel echo "'$VAR'" ::: '!'
682
683 If you use them in a function you just quote as you normally would do:
684
685 VAR="My brother's 12\" records are worth <\$\$\$>"
686 export VAR
687 myfunc() { echo "$VAR" "$1"; }
688 export -f myfunc
689 parallel myfunc ::: '!'
690
691 EXAMPLE: Group output lines
When running jobs that output data, you often do not want the output
of multiple jobs to be mixed together. GNU parallel defaults to
grouping the
694 output of each job, so the output is printed when the job finishes. If
695 you want full lines to be printed while the job is running you can use
696 --line-buffer. If you want output to be printed as soon as possible you
697 can use -u.
698
699 Compare the output of:
700
701 parallel wget --progress=dot --limit-rate=100k \
702 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
703 ::: {12..16}
704 parallel --line-buffer wget --progress=dot --limit-rate=100k \
705 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
706 ::: {12..16}
707 parallel --latest-line wget --progress=dot --limit-rate=100k \
708 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
709 ::: {12..16}
710 parallel -u wget --progress=dot --limit-rate=100k \
711 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
712 ::: {12..16}
713
714 EXAMPLE: Tag output lines
715 GNU parallel groups the output lines, but it can be hard to see where
716 the different jobs begin. --tag prepends the argument to make that more
717 visible:
718
719 parallel --tag wget --limit-rate=100k \
720 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
721 ::: {12..16}
722
723 --tag works with --line-buffer but not with -u:
724
725 parallel --tag --line-buffer wget --limit-rate=100k \
726 https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
727 ::: {12..16}
728
729 Check the uptime of the servers in ~/.parallel/sshloginfile:
730
731 parallel --tag -S .. --nonall uptime
732
733 EXAMPLE: Colorize output
734 Give each job a new color. Most terminals support ANSI colors with the
735 escape code "\033[30;3Xm" where 0 <= X <= 7:
736
737 seq 10 | \
738 parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
739 parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
740 --tagstring {color} seq {} ::: {1..10}
741
742 To get rid of the initial \t (which comes from --tagstring):
743
744 ... | perl -pe 's/\t//'
745
746 EXAMPLE: Keep order of output same as order of input
747 Normally the output of a job will be printed as soon as it completes.
748 Sometimes you want the order of the output to remain the same as the
749 order of the input. This is often important, if the output is used as
750 input for another system. -k will make sure the order of output will be
751 in the same order as input even if later jobs end before earlier jobs.
752
753 Append a string to every line in a text file:
754
755 cat textfile | parallel -k echo {} append_string
756
757 If you remove -k some of the lines may come out in the wrong order.
758
759 Another example is traceroute:
760
761 parallel traceroute ::: qubes-os.org debian.org freenetproject.org
762
763 will give traceroute of qubes-os.org, debian.org and
764 freenetproject.org, but it will be sorted according to which job
765 completed first.
766
767 To keep the order the same as input run:
768
769 parallel -k traceroute ::: qubes-os.org debian.org freenetproject.org
770
771 This will make sure the traceroute to qubes-os.org will be printed
772 first.
773
774 A bit more complex example is downloading a huge file in chunks in
775 parallel: Some internet connections will deliver more data if you
776 download files in parallel. For downloading files in parallel see:
777 "EXAMPLE: Download 10 images for each of the past 30 days". But if you
778 are downloading a big file you can download the file in chunks in
779 parallel.
780
781 To download byte 10000000-19999999 you can use curl:
782
783 curl -r 10000000-19999999 https://example.com/the/big/file >file.part
784
785 To download a 1 GB file we need 100 10MB chunks downloaded and combined
786 in the correct order.
787
788 seq 0 99 | parallel -k curl -r \
789 {}0000000-{}9999999 https://example.com/the/big/file > file
790
791 EXAMPLE: Parallel grep
792 grep -r greps recursively through directories. GNU parallel can often
793 speed this up.
794
795 find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
796
This will run 1.5 jobs per CPU, and give 1000 arguments to grep.
798
799 There are situations where the above will be slower than grep -r:
800
801 • If data is already in RAM. The overhead of starting jobs and
802 buffering output may outweigh the benefit of running in parallel.
803
804 • If the files are big. If a file cannot be read in a single seek, the
805 disk may start thrashing.
806
807 The speedup is caused by two factors:
808
809 • On rotating harddisks small files often require a seek for each file.
810 By searching for more files in parallel, the arm may pass another
811 wanted file on its way.
812
• NVMe drives often perform better by having multiple commands running
in parallel.
815
816 EXAMPLE: Grepping n lines for m regular expressions.
817 The simplest solution to grep a big file for a lot of regexps is:
818
819 grep -f regexps.txt bigfile
820
821 Or if the regexps are fixed strings:
822
823 grep -F -f regexps.txt bigfile
824
825 There are 3 limiting factors: CPU, RAM, and disk I/O.
826
827 RAM is easy to measure: If the grep process takes up most of your free
828 memory (e.g. when running top), then RAM is a limiting factor.
829
830 CPU is also easy to measure: If the grep takes >90% CPU in top, then
831 the CPU is a limiting factor, and parallelization will speed this up.
832
833 It is harder to see if disk I/O is the limiting factor, and depending
834 on the disk system it may be faster or slower to parallelize. The only
835 way to know for certain is to test and measure.
836
837 Limiting factor: RAM
838
839 The normal grep -f regexps.txt bigfile works no matter the size of
840 bigfile, but if regexps.txt is so big it cannot fit into memory, then
841 you need to split this.
842
843 grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of
844 RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it
845 may be too big.
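
A back-of-the-envelope check based on these numbers could look like
this (a sketch for GNU/Linux; 500 is the factor for normal grep, use
100 for grep -F):

  free_kb=$(awk '/MemFree/ {print $2}' /proc/meminfo)
  regexp_kb=$(du -k regexps.txt | awk '{print $1}')
  echo "Estimated grep RAM: $((regexp_kb * 500)) KB; free: $free_kb KB"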
846
If you can convert your regexps into fixed strings do that. E.g. if
the lines you are looking for in bigfile all look like:
849
850 ID1 foo bar baz Identifier1 quux
851 fubar ID2 foo bar baz Identifier2
852
853 then your regexps.txt can be converted from:
854
855 ID1.*Identifier1
856 ID2.*Identifier2
857
858 into:
859
860 ID1 foo bar baz Identifier1
861 ID2 foo bar baz Identifier2
862
863 This way you can use grep -F which takes around 80% less memory and is
864 much faster.
865
866 If it still does not fit in memory you can do this:
867
868 parallel --pipe-part -a regexps.txt --block 1M grep -F -f - -n bigfile | \
869 sort -un | perl -pe 's/^\d+://'
870
871 The 1M should be your free memory divided by the number of CPU threads
872 and divided by 200 for grep -F and by 1000 for normal grep. On
873 GNU/Linux you can do:
874
875 free=$(awk '/^((Swap)?Cached|MemFree|Buffers):/ { sum += $2 }
876 END { print sum }' /proc/meminfo)
877 percpu=$((free / 200 / $(parallel --number-of-threads)))k
878
879 parallel --pipe-part -a regexps.txt --block $percpu --compress \
880 grep -F -f - -n bigfile | \
881 sort -un | perl -pe 's/^\d+://'
882
883 If you can live with duplicated lines and wrong order, it is faster to
884 do:
885
886 parallel --pipe-part -a regexps.txt --block $percpu --compress \
887 grep -F -f - bigfile
888
889 Limiting factor: CPU
890
891 If the CPU is the limiting factor parallelization should be done on the
892 regexps:
893
894 cat regexps.txt | parallel --pipe -L1000 --round-robin --compress \
895 grep -f - -n bigfile | \
896 sort -un | perl -pe 's/^\d+://'
897
898 The command will start one grep per CPU and read bigfile one time per
899 CPU, but as that is done in parallel, all reads except the first will
900 be cached in RAM. Depending on the size of regexps.txt it may be faster
901 to use --block 10m instead of -L1000.
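
The --block 10m variant is the same pipeline with only the chunking of
regexps.txt changed (a sketch):

  cat regexps.txt | parallel --pipe --block 10m --round-robin --compress \
    grep -f - -n bigfile | \
    sort -un | perl -pe 's/^\d+://'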
902
903 Some storage systems perform better when reading multiple chunks in
904 parallel. This is true for some RAID systems and for some network file
905 systems. To parallelize the reading of bigfile:
906
907 parallel --pipe-part --block 100M -a bigfile -k --compress \
908 grep -f regexps.txt
909
910 This will split bigfile into 100MB chunks and run grep on each of these
911 chunks. To parallelize both reading of bigfile and regexps.txt combine
912 the two using --cat:
913
914 parallel --pipe-part --block 100M -a bigfile --cat cat regexps.txt \
915 \| parallel --pipe -L1000 --round-robin grep -f - {}
916
917 If a line matches multiple regexps, the line may be duplicated.
918
919 Bigger problem
920
921 If the problem is too big to be solved by this, you are probably ready
922 for Lucene.
923
924 EXAMPLE: Using remote computers
To run commands on a remote computer SSH needs to be set up and you
must be able to log in without entering a password (the commands
ssh-copy-id, ssh-agent, and sshpass may help you do that).
928
If you need to log in to a whole cluster, you typically do not want to
930 accept the host key for every host. You want to accept them the first
931 time and be warned if they are ever changed. To do that:
932
933 # Add the servers to the sshloginfile
934 (echo servera; echo serverb) > .parallel/my_cluster
# Make sure .ssh/config exists
936 touch .ssh/config
937 cp .ssh/config .ssh/config.backup
938 # Disable StrictHostKeyChecking temporarily
939 (echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
940 parallel --slf my_cluster --nonall true
941 # Remove the disabling of StrictHostKeyChecking
942 mv .ssh/config.backup .ssh/config
943
The servers in .parallel/my_cluster are now added to .ssh/known_hosts.
945
946 To run echo on server.example.com:
947
948 seq 10 | parallel --sshlogin server.example.com echo
949
950 To run commands on more than one remote computer run:
951
952 seq 10 | parallel --sshlogin s1.example.com,s2.example.net echo
953
954 Or:
955
956 seq 10 | parallel --sshlogin server.example.com \
957 --sshlogin server2.example.net echo
958
959 If the login username is foo on server2.example.net use:
960
961 seq 10 | parallel --sshlogin server.example.com \
962 --sshlogin foo@server2.example.net echo
963
964 If your list of hosts is server1-88.example.net with login foo:
965
966 seq 10 | parallel -Sfoo@server{1..88}.example.net echo
967
968 To distribute the commands to a list of computers, make a file
969 mycomputers with all the computers:
970
971 server.example.com
972 foo@server2.example.com
973 server3.example.com
974
975 Then run:
976
977 seq 10 | parallel --sshloginfile mycomputers echo
978
979 To include the local computer add the special sshlogin ':' to the list:
980
981 server.example.com
982 foo@server2.example.com
983 server3.example.com
984 :
985
986 GNU parallel will try to determine the number of CPUs on each of the
987 remote computers, and run one job per CPU - even if the remote
988 computers do not have the same number of CPUs.
989
990 If the number of CPUs on the remote computers is not identified
991 correctly the number of CPUs can be added in front. Here the computer
992 has 8 CPUs.
993
994 seq 10 | parallel --sshlogin 8/server.example.com echo
995
996 EXAMPLE: Transferring of files
997 To recompress gzipped files with bzip2 using a remote computer run:
998
999 find logs/ -name '*.gz' | \
1000 parallel --sshlogin server.example.com \
1001 --transfer "zcat {} | bzip2 -9 >{.}.bz2"
1002
1003 This will list the .gz-files in the logs directory and all directories
below. Then it will transfer the files to the corresponding directory
in $HOME/logs on server.example.com. There the file
1006 will be recompressed using zcat and bzip2 resulting in the
1007 corresponding file with .gz replaced with .bz2.
1008
1009 If you want the resulting bz2-file to be transferred back to the local
1010 computer add --return {.}.bz2:
1011
1012 find logs/ -name '*.gz' | \
1013 parallel --sshlogin server.example.com \
1014 --transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1015
1016 After the recompressing is done the .bz2-file is transferred back to
1017 the local computer and put next to the original .gz-file.
1018
1019 If you want to delete the transferred files on the remote computer add
1020 --cleanup. This will remove both the file transferred to the remote
1021 computer and the files transferred from the remote computer:
1022
1023 find logs/ -name '*.gz' | \
1024 parallel --sshlogin server.example.com \
1025 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1026
If you want to run on several computers, add the computers to
--sshlogin either using ',' or multiple --sshlogin:
1029
1030 find logs/ -name '*.gz' | \
1031 parallel --sshlogin server.example.com,server2.example.com \
1032 --sshlogin server3.example.com \
1033 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1034
1035 You can add the local computer using --sshlogin :. This will disable
1036 the removing and transferring for the local computer only:
1037
1038 find logs/ -name '*.gz' | \
1039 parallel --sshlogin server.example.com,server2.example.com \
1040 --sshlogin server3.example.com \
1041 --sshlogin : \
1042 --transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
1043
1044 Often --transfer, --return and --cleanup are used together. They can be
1045 shortened to --trc:
1046
1047 find logs/ -name '*.gz' | \
1048 parallel --sshlogin server.example.com,server2.example.com \
1049 --sshlogin server3.example.com \
1050 --sshlogin : \
1051 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1052
1053 With the file mycomputers containing the list of computers it becomes:
1054
1055 find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
1056 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1057
1058 If the file ~/.parallel/sshloginfile contains the list of computers the
special shorthand -S .. can be used:
1060
1061 find logs/ -name '*.gz' | parallel -S .. \
1062 --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
1063
1064 EXAMPLE: Advanced file transfer
1065 Assume you have files in in/*, want them processed on server, and
1066 transferred back into /other/dir:
1067
1068 parallel -S server --trc /other/dir/./{/}.out \
1069 cp {/} {/}.out ::: in/./*
1070
1071 EXAMPLE: Distributing work to local and remote computers
1072 Convert *.mp3 to *.ogg running one process per CPU on local computer
1073 and server2:
1074
1075 parallel --trc {.}.ogg -S server2,: \
1076 'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
1077
1078 EXAMPLE: Running the same command on remote computers
1079 To run the command uptime on remote computers you can do:
1080
1081 parallel --tag --nonall -S server1,server2 uptime
1082
1083 --nonall reads no arguments. If you have a list of jobs you want to run
1084 on each computer you can do:
1085
1086 parallel --tag --onall -S server1,server2 echo ::: 1 2 3
1087
1088 Remove --tag if you do not want the sshlogin added before the output.
1089
1090 If you have a lot of hosts use '-j0' to access more hosts in parallel.
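
For example (a sketch using the servers in ~/.parallel/sshloginfile):

  parallel -j0 --tag --nonall -S .. uptime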
1091
1092 EXAMPLE: Running 'sudo' on remote computers
1093 Put the password into passwordfile then run:
1094
1095 parallel --ssh 'cat passwordfile | ssh' --nonall \
1096 -S user@server1,user@server2 sudo -S ls -l /root
1097
1098 EXAMPLE: Using remote computers behind NAT wall
1099 If the workers are behind a NAT wall, you need some trickery to get to
1100 them.
1101
1102 If you can ssh to a jumphost, and reach the workers from there, then
1103 the obvious solution would be this, but it does not work:
1104
1105 parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
1106
It does not work because the command is dequoted by ssh twice,
whereas GNU parallel only expects it to be dequoted once.
1109
1110 You can use a bash function and have GNU parallel quote the command:
1111
1112 jumpssh() { ssh -A jumphost ssh $(parallel --shellquote ::: "$@"); }
1113 export -f jumpssh
1114 parallel --ssh jumpssh -S host1 echo ::: this works
1115
1116 Or you can instead put this in ~/.ssh/config:
1117
1118 Host host1 host2 host3
1119 ProxyCommand ssh jumphost.domain nc -w 1 %h 22
1120
It requires nc (netcat) to be installed on the jumphost. With this you
can simply run:
1123
1124 parallel -S host1,host2,host3 echo ::: This does work
1125
1126 No jumphost, but port forwards
1127
1128 If there is no jumphost but each server has port 22 forwarded from the
1129 firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 =
1130 host2, 22003 = host3) then you can use ~/.ssh/config:
1131
1132 Host host1.v
1133 Port 22001
1134 Host host2.v
1135 Port 22002
1136 Host host3.v
1137 Port 22003
1138 Host *.v
1139 Hostname firewall
1140
1141 And then use host{1..3}.v as normal hosts:
1142
1143 parallel -S host1.v,host2.v,host3.v echo ::: a b c
1144
1145 No jumphost, no port forwards
1146
If ports cannot be forwarded, you need some sort of VPN to traverse
the NAT-wall. TOR is one option for that, as it is very easy to get
working.
1150
You need to install TOR and set up a hidden service. In torrc put:
1152
1153 HiddenServiceDir /var/lib/tor/hidden_service/
1154 HiddenServicePort 22 127.0.0.1:22
1155
1156 Then start TOR: /etc/init.d/tor restart
1157
1158 The TOR hostname is now in /var/lib/tor/hidden_service/hostname and is
1159 something similar to izjafdceobowklhz.onion. Now you simply prepend
1160 torsocks to ssh:
1161
1162 parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
1163 -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
1164
1165 If not all hosts are accessible through TOR:
1166
1167 parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
1168 echo ::: a b c
1169
1170 See more ssh tricks on
1171 https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
1172
1173 EXAMPLE: Use sshpass with ssh
1174 If you cannot use passwordless login, you may be able to use sshpass:
1175
1176 seq 10 | parallel -S user-with-password:MyPassword@server echo
1177
1178 or:
1179
1180 export SSHPASS='MyPa$$w0rd'
1181 seq 10 | parallel -S user-with-password:@server echo
1182
1183 EXAMPLE: Use outrun instead of ssh
1184 outrun lets you run a command on a remote server. outrun sets up a
1185 connection to access files at the source server, and automatically
1186 transfers files. outrun must be installed on the remote system.
1187
1188 You can use outrun in an sshlogin this way:
1189
1190 parallel -S 'outrun user@server' command
1191
1192 or:
1193
1194 parallel --ssh outrun -S server command
1195
1196 EXAMPLE: Slurm cluster
1197 The Slurm Workload Manager is used in many clusters.
1198
1199 Here is a simple example of using GNU parallel to call srun:
1200
1201 #!/bin/bash
1202
1203 #SBATCH --time 00:02:00
1204 #SBATCH --ntasks=4
1205 #SBATCH --job-name GnuParallelDemo
1206 #SBATCH --output gnuparallel.out
1207
1208 module purge
1209 module load gnu_parallel
1210
1211 my_parallel="parallel --delay .2 -j $SLURM_NTASKS"
1212 my_srun="srun --export=all --exclusive -n1 --cpus-per-task=1 --cpu-bind=cores"
1213 $my_parallel "$my_srun" echo This is job {} ::: {1..20}
1214
1215 EXAMPLE: Parallelizing rsync
1216 rsync is a great tool, but sometimes it will not fill up the available
1217 bandwidth. Running multiple rsync in parallel can fix this.
1218
1219 cd src-dir
1220 find . -type f |
1221 parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
1222
1223 Adjust -j10 until you find the optimal number.
1224
1225 rsync -R will create the needed subdirectories, so all files are not
1226 put into a single dir. The ./ is needed so the resulting command looks
1227 similar to:
1228
1229 rsync -zR ././sub/dir/file fooserver:/dest-dir/
1230
1231 The /./ is what rsync -R works on.
1232
1233 If you are unable to push data, but need to pull them and the files are
1234 called digits.png (e.g. 000000.png) you might be able to do:
1235
1236 seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
1237
1238 EXAMPLE: Use multiple inputs in one command
1239 Copy files like foo.es.ext to foo.ext:
1240
1241 ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
1242
1243 The perl command spits out 2 lines for each input. GNU parallel takes 2
1244 inputs (using -N2) and replaces {1} and {2} with the inputs.
1245
1246 Count in binary:
1247
1248 parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
1249
1250 Print the number on the opposing sides of a six sided die:
1251
1252 parallel --link -a <(seq 6) -a <(seq 6 -1 1) echo
1253 parallel --link echo :::: <(seq 6) <(seq 6 -1 1)
1254
1255 Convert files from all subdirs to PNG-files with consecutive numbers
1256 (useful for making input PNG's for ffmpeg):
1257
1258 parallel --link -a <(find . -type f | sort) \
1259 -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
1260
1261 Alternative version:
1262
1263 find . -type f | sort | parallel convert {} {#}.png
1264
1265 EXAMPLE: Use a table as input
1266 Content of table_file.tsv:
1267
1268 foo<TAB>bar
1269 baz <TAB> quux
1270
1271 To run:
1272
1273 cmd -o bar -i foo
1274 cmd -o quux -i baz
1275
1276 you can run:
1277
1278 parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
1279
1280 Note: The default for GNU parallel is to remove the spaces around the
1281 columns. To keep the spaces:
1282
1283 parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
1284
1285 EXAMPLE: Output to database
1286 GNU parallel can output to a database table and a CSV-file:
1287
1288 dburl=csv:///%2Ftmp%2Fmydir
1289 dbtableurl=$dburl/mytable.csv
1290 parallel --sqlandworker $dbtableurl seq ::: {1..10}
1291
1292 It is rather slow and takes up a lot of CPU time because GNU parallel
1293 parses the whole CSV file for each update.
1294
A better approach is to use an SQLite database and then convert that
to CSV:
1297
1298 dburl=sqlite3:///%2Ftmp%2Fmy.sqlite
1299 dbtableurl=$dburl/mytable
1300 parallel --sqlandworker $dbtableurl seq ::: {1..10}
1301 sql $dburl '.headers on' '.mode csv' 'SELECT * FROM mytable;'
1302
1303 This takes around a second per job.
1304
1305 If you have access to a real database system, such as PostgreSQL, it is
1306 even faster:
1307
1308 dburl=pg://user:pass@host/mydb
1309 dbtableurl=$dburl/mytable
1310 parallel --sqlandworker $dbtableurl seq ::: {1..10}
1311 sql $dburl \
1312 "COPY (SELECT * FROM mytable) TO stdout DELIMITER ',' CSV HEADER;"
1313
1314 Or MySQL:
1315
1316 dburl=mysql://user:pass@host/mydb
1317 dbtableurl=$dburl/mytable
1318 parallel --sqlandworker $dbtableurl seq ::: {1..10}
1319 sql -p -B $dburl "SELECT * FROM mytable;" > mytable.tsv
1320 perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/;
1321 %s=("\\" => "\\", "t" => "\t", "n" => "\n");
1322 s/\\([\\tn])/$s{$1}/g;' mytable.tsv
1323
1324 EXAMPLE: Output to CSV-file for R
1325 If you have no need for the advanced job distribution control that a
1326 database provides, but you simply want output into a CSV file that you
1327 can read into R or LibreCalc, then you can use --results:
1328
1329 parallel --results my.csv seq ::: 10 20 30
1330 R
1331 > mydf <- read.csv("my.csv");
1332 > print(mydf[2,])
1333 > write(as.character(mydf[2,c("Stdout")]),'')
1334
1335 EXAMPLE: Use XML as input
1336 The show Aflyttet on Radio 24syv publishes an RSS feed with their audio
1337 podcasts on: http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
1338
1339 Using xpath you can extract the URLs for 2019 and download them using
1340 GNU parallel:
1341
1342 wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
1343 xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
1344 parallel -u wget '{= s/ url="//; s/"//; =}'
1345
1346 EXAMPLE: Run the same command 10 times
1347 If you want to run the same command with the same arguments 10 times in
1348 parallel you can do:
1349
1350 seq 10 | parallel -n0 my_command my_args
1351
1352 EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
GNU parallel can work similarly to cat | sh.
1354
1355 A resource inexpensive job is a job that takes very little CPU, disk
1356 I/O and network I/O. Ping is an example of a resource inexpensive job.
1357 wget is too - if the webpages are small.
1358
1359 The content of the file jobs_to_run:
1360
1361 ping -c 1 10.0.0.1
1362 wget http://example.com/status.cgi?ip=10.0.0.1
1363 ping -c 1 10.0.0.2
1364 wget http://example.com/status.cgi?ip=10.0.0.2
1365 ...
1366 ping -c 1 10.0.0.255
1367 wget http://example.com/status.cgi?ip=10.0.0.255
1368
1369 To run 100 processes simultaneously do:
1370
1371 parallel -j 100 < jobs_to_run
1372
As there is no command, the jobs will be evaluated by the shell.
1374
1375 EXAMPLE: Call program with FASTA sequence
1376 FASTA files have the format:
1377
1378 >Sequence name1
1379 sequence
1380 sequence continued
1381 >Sequence name2
1382 sequence
1383 sequence continued
1384 more sequence
1385
1386 To call myprog with the sequence as argument run:
1387
1388 cat file.fasta |
1389 parallel --pipe -N1 --recstart '>' --rrs \
1390 'read a; echo Name: "$a"; myprog $(tr -d "\n")'
1391
1392 EXAMPLE: Call program with interleaved FASTQ records
1393 FASTQ files have the format:
1394
1395 @M10991:61:000000000-A7EML:1:1101:14011:1001 1:N:0:28
1396 CTCCTAGGTCGGCATGATGGGGGAAGGAGAGCATGGGAAGAAATGAGAGAGTAGCAAGG
1397 +
1398 #8BCCGGGGGFEFECFGGGGGGGGG@;FFGGGEG@FF<EE<@FFC,CEGCCGGFF<FGF
1399
1400 Interleaved FASTQ starts with a line like these:
1401
1402 @HWUSI-EAS100R:6:73:941:1973#0/1
1403 @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
1404 @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1
1405
where '/1' and ' 1:' determine that this is read 1.
1407
1408 This will cut big.fq into one chunk per CPU thread and pass it on stdin
1409 (standard input) to the program fastq-reader:
1410
1411 parallel --pipe-part -a big.fq --block -1 --regexp \
1412 --recend '\n' --recstart '@.*(/1| 1:.*)\n[A-Za-z\n\.~]' \
1413 fastq-reader
1414
1415 EXAMPLE: Processing a big file using more CPUs
1416 To process a big file or some output you can use --pipe to split up the
1417 data into blocks and pipe the blocks into the processing program.
1418
1419 If the program is gzip -9 you can do:
1420
1421 cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
1422
1423 This will split bigfile into blocks of 1 MB and pass that to gzip -9 in
1424 parallel. One gzip will be run per CPU. The output of gzip -9 will be
1425 kept in order and saved to bigfile.gz
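
To convince yourself that the concatenated output is still valid gzip
data, you can compare the round trip (a sketch; relies on gzip
accepting concatenated members):

  zcat bigfile.gz | cmp - bigfile && echo identical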
1426
1427 gzip works fine if the output is appended, but some processing does not
1428 work like that - for example sorting. For this GNU parallel can put the
1429 output of each command into a file. This will sort a big file in
1430 parallel:
1431
1432 cat bigfile | parallel --pipe --files sort |\
1433 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
1434
1435 Here bigfile is split into blocks of around 1MB, each block ending in
1436 '\n' (which is the default for --recend). Each block is passed to sort
1437 and the output from sort is saved into files. These files are passed to
1438 the second parallel that runs sort -m on the files before it removes
1439 the files. The output is saved to bigfile.sort.
1440
1441 GNU parallel's --pipe maxes out at around 100 MB/s because every byte
1442 has to be copied through GNU parallel. But if bigfile is a real
(seekable) file GNU parallel can bypass the copying and send the parts
1444 directly to the program:
1445
1446 parallel --pipe-part --block 100m -a bigfile --files sort |\
1447 parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
1448
1449 EXAMPLE: Grouping input lines
1450 When processing with --pipe you may have lines grouped by a value. Here
1451 is my.csv:
1452
1453 Transaction Customer Item
1454 1 a 53
1455 2 b 65
1456 3 b 82
1457 4 c 96
1458 5 c 67
1459 6 c 13
1460 7 d 90
1461 8 d 43
1462 9 d 91
1463 10 d 84
1464 11 e 72
1465 12 e 102
1466 13 e 63
1467 14 e 56
1468 15 e 74
1469
1470 Let us assume you want GNU parallel to process each customer. In other
1471 words: You want all the transactions for a single customer to be
1472 treated as a single record.
1473
1474 To do this we preprocess the data with a program that inserts a record
1475 separator before each customer (column 2 = $F[1]). Here we first make a
1476 50 character random string, which we then use as the separator:
1477
1478 sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
1479 cat my.csv | \
1480 perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
1481 parallel --recend $sep --rrs --pipe -N1 wc
1482
1483 If your program can process multiple customers replace -N1 with a
1484 reasonable --blocksize.
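
For example, something like this (a sketch reusing $sep from above and
assuming wc can handle several customers per block):

  cat my.csv | \
    perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
    parallel --recend $sep --rrs --pipe --block 10m wc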
1485
1486 EXAMPLE: Running more than 250 jobs workaround
If you need to run a massive number of jobs in parallel, then you will
likely hit the filehandle limit which is often around 250 jobs. If you
are super user you can raise the limit in /etc/security/limits.conf
(see the sketch at the end of this example) but you can also use this
workaround. The filehandle limit is per process. That means that if
you just spawn more GNU parallels then each of them can run 250 jobs.
This will spawn up to 2500 jobs:
1493
1494 cat myinput |\
1495 parallel --pipe -N 50 --round-robin -j50 parallel -j50 your_prg
1496
1497 This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM
1498 to do this, and you may need to increase /proc/sys/kernel/pid_max):
1499
1500 cat myinput |\
1501 parallel --pipe -N 250 --round-robin -j250 parallel -j250 your_prg
1502
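If you prefer raising the filehandle limit instead, an entry in
/etc/security/limits.conf could look like this (a sketch; pick values
that suit your site and log in again for them to take effect):

  *    soft    nofile    65535
  *    hard    nofile    65535
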
1503 EXAMPLE: Working as mutex and counting semaphore
1504 The command sem is an alias for parallel --semaphore.
1505
A counting semaphore will allow a given number of jobs to be started
in the background. When that number of jobs is running in the
background, GNU sem will wait for one of these to complete before
starting another command. sem --wait will wait for all jobs to
complete.
1510
1511 Run 10 jobs concurrently in the background:
1512
1513 for i in *.log ; do
1514 echo $i
1515 sem -j10 gzip $i ";" echo done
1516 done
1517 sem --wait
1518
A mutex is a counting semaphore allowing only one job to run. This
will edit the file myfile and prepend it with lines containing the
numbers 1 to 3.
1522
1523 seq 3 | parallel sem sed -i -e '1i{}' myfile
1524
1525 As myfile can be very big it is important only one process edits the
1526 file at the same time.
1527
1528 Name the semaphore to have multiple different semaphores active at the
1529 same time:
1530
1531 seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
1532
1533 EXAMPLE: Mutex for a script
1534 Assume a script is called from cron or from a web service, but only one
1535 instance can be run at a time. With sem and --shebang-wrap the script
1536 can be made to wait for other instances to finish. Here in bash:
1537
1538 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
1539
1540 echo This will run
1541 sleep 5
1542 echo exclusively
1543
1544 Here perl:
1545
1546 #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/perl
1547
1548 print "This will run ";
1549 sleep 5;
1550 print "exclusively\n";
1551
1552 Here python:
1553
1554 #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python
1555
1556 import time
1557 print "This will run ";
1558 time.sleep(5)
1559 print "exclusively";
1560
1561 EXAMPLE: Start editor with filenames from stdin (standard input)
1562 You can use GNU parallel to start interactive programs like emacs or
1563 vi:
1564
1565 cat filelist | parallel --tty -X emacs
1566 cat filelist | parallel --tty -X vi
1567
1568 If there are more files than will fit on a single command line, the
1569 editor will be started again with the remaining files.
1570
1571 EXAMPLE: Running sudo
1572 sudo requires a password to run a command as root. It caches the
1573 access, so you only need to enter the password again if you have not
1574 used sudo for a while.
1575
1576 The command:
1577
1578 parallel sudo echo ::: This is a bad idea
1579
1580 is no good, as you would be prompted for the sudo password for each of
1581 the jobs. Instead do:
1582
1583 sudo parallel echo ::: This is a good idea
1584
1585 This way you only have to enter the sudo password once.
1586
1587 EXAMPLE: Run ping in parallel
1588 ping prints out statistics when killed with CTRL-C.
1589
1590 Unfortunately, CTRL-C will also normally kill GNU parallel.
1591
1592 But by using --open-tty and ignoring SIGINT you can get the wanted
1593 effect:
1594
1595 parallel -j0 --open-tty --lb --tag ping '{= $SIG{INT}=sub {} =}' \
1596 ::: 1.1.1.1 8.8.8.8 9.9.9.9 21.21.21.21 80.80.80.80 88.88.88.88
1597
1598 --open-tty will make the pings receive SIGINT (from CTRL-C). CTRL-C
1599 will not kill GNU parallel, so that will only exit after ping is done.
1600
1601 EXAMPLE: GNU Parallel as queue system/batch manager
1602 GNU parallel can work as a simple job queue system or batch manager.
1603 The idea is to put the jobs into a file and have GNU parallel read from
1604 that continuously. As GNU parallel will stop at end of file we use tail
1605 to continue reading:
1606
1607 true >jobqueue; tail -n+0 -f jobqueue | parallel
1608
1609 To submit your jobs to the queue:
1610
1611 echo my_command my_arg >> jobqueue
1612
1613 You can of course use -S to distribute the jobs to remote computers:
1614
1615 true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..
1616
Output will only be printed when the next input is read after a job
has finished: so you need to submit a job after the first has finished
to see the output from the first job.
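
For example (a sketch):

  echo echo first  >> jobqueue   # runs, but its output is held back
  echo echo second >> jobqueue   # reading this releases the output of 'first'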
1620
1621 If you keep this running for a long time, jobqueue will grow. A way of
1622 removing the jobs already run is by making GNU parallel stop when it
1623 hits a special value and then restart. To use --eof to make GNU
1624 parallel exit, tail also needs to be forced to exit:
1625
1626 true >jobqueue;
1627 while true; do
1628 tail -n+0 -f jobqueue |
1629 (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
1630 perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
1631 (seq 1000 >> jobqueue &);
1632 echo Done appending dummy data forcing tail to exit)
1633 echo tail exited;
1634 mv j2 jobqueue
1635 done
1636
1637 In some cases you can run on more CPUs and computers during the night:
1638
1639 # Day time
1640 echo 50% > jobfile
1641 cp day_server_list ~/.parallel/sshloginfile
1642 # Night time
1643 echo 100% > jobfile
1644 cp night_server_list ~/.parallel/sshloginfile
1645 tail -n+0 -f jobqueue | parallel --jobs jobfile -S ..
1646
1647 GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
1648
1649 EXAMPLE: GNU Parallel as dir processor
If you have a dir in which users drop files that need to be processed
you can do this on GNU/Linux (if you know what inotifywait is called
on other platforms, file a bug report):
1653
1654 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
1655 parallel -u echo
1656
1657 This will run the command echo on each file put into my_dir or subdirs
1658 of my_dir.
1659
1660 You can of course use -S to distribute the jobs to remote computers:
1661
1662 inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
1663 parallel -S .. -u echo
1664
1665 If the files to be processed are in a tar file then unpacking one file
1666 and processing it immediately may be faster than first unpacking all
1667 files. Set up the dir processor as above and unpack into the dir.
1668
1669 Using GNU parallel as dir processor has the same limitations as using
1670 GNU parallel as queue system/batch manager.
1671
1672 EXAMPLE: Locate the missing package
1673 If you have downloaded source and tried compiling it, you may have
1674 seen:
1675
1676 $ ./configure
1677 [...]
1678 checking for something.h... no
1679 configure: error: "libsomething not found"
1680
1681 Often it is not obvious which package you should install to get that
1682 file. Debian has `apt-file` to search for a file. `tracefile` from
1683 https://gitlab.com/ole.tange/tangetools can tell which files a program
1684 tried to access. In this case we are interested in one of the last
1685 files:
1686
1687 $ tracefile -un ./configure | tail | parallel -j0 apt-file search
1688
AUTHOR
When using GNU parallel for a publication please cite:
1691
1692 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
1693 The USENIX Magazine, February 2011:42-47.
1694
1695 This helps funding further development; and it won't cost you a cent.
1696 If you pay 10000 EUR you should feel free to use GNU Parallel without
1697 citing.
1698
1699 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
1700
1701 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
1702
1703 Copyright (C) 2010-2022 Ole Tange, http://ole.tange.dk and Free
1704 Software Foundation, Inc.
1705
Parts of the manual concerning xargs compatibility are inspired by the
manual of xargs from GNU findutils 4.4.2.
1708
LICENSE
This program is free software; you can redistribute it and/or modify it
1711 under the terms of the GNU General Public License as published by the
1712 Free Software Foundation; either version 3 of the License, or at your
1713 option any later version.
1714
1715 This program is distributed in the hope that it will be useful, but
1716 WITHOUT ANY WARRANTY; without even the implied warranty of
1717 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1718 General Public License for more details.
1719
1720 You should have received a copy of the GNU General Public License along
1721 with this program. If not, see <https://www.gnu.org/licenses/>.
1722
1723 Documentation license I
1724 Permission is granted to copy, distribute and/or modify this
1725 documentation under the terms of the GNU Free Documentation License,
1726 Version 1.3 or any later version published by the Free Software
1727 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
1728 with no Back-Cover Texts. A copy of the license is included in the
1729 file LICENSES/GFDL-1.3-or-later.txt.
1730
1731 Documentation license II
1732 You are free:
1733
1734 to Share to copy, distribute and transmit the work
1735
1736 to Remix to adapt the work
1737
1738 Under the following conditions:
1739
1740 Attribution
1741 You must attribute the work in the manner specified by the
1742 author or licensor (but not in any way that suggests that they
1743 endorse you or your use of the work).
1744
1745 Share Alike
1746 If you alter, transform, or build upon this work, you may
1747 distribute the resulting work only under the same, similar or
1748 a compatible license.
1749
1750 With the understanding that:
1751
1752 Waiver Any of the above conditions can be waived if you get
1753 permission from the copyright holder.
1754
1755 Public Domain
1756 Where the work or any of its elements is in the public domain
1757 under applicable law, that status is in no way affected by the
1758 license.
1759
1760 Other Rights
1761 In no way are any of the following rights affected by the
1762 license:
1763
1764 • Your fair dealing or fair use rights, or other applicable
1765 copyright exceptions and limitations;
1766
1767 • The author's moral rights;
1768
1769 • Rights other persons may have either in the work itself or
1770 in how the work is used, such as publicity or privacy
1771 rights.
1772
1773 Notice For any reuse or distribution, you must make clear to others
1774 the license terms of this work.
1775
1776 A copy of the full license is included in the file as
1777 LICENCES/CC-BY-SA-4.0.txt
1778
SEE ALSO
parallel(1), parallel_tutorial(7), env_parallel(1), parset(1),
1781 parsort(1), parallel_alternatives(7), parallel_design(7), niceload(1),
1782 sql(1), ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1), rsync(1)
1783
1784
1785
20221022                          2022-11-06                PARALLEL_EXAMPLES(7)