PARALLEL_DESIGN(7) parallel PARALLEL_DESIGN(7)

This document describes design decisions made in the development of GNU
parallel and the reasoning behind them. It will give an overview of why
some of the code looks the way it does, and will help new maintainers
understand the code better.

One file program
GNU parallel is a Perl script in a single file. It is object oriented,
but contrary to normal Perl scripts each class is not in its own file.
This is due to user experience: The goal is that in a pinch the user
will be able to get GNU parallel working simply by copying a single
file: No need to mess around with environment variables like PERL5LIB.

Choice of programming language
GNU parallel is designed to be able to run on old systems. That means
that it cannot depend on a compiler being installed - and especially
not a compiler for a language that is younger than 20 years old.

The goal is that you can use GNU parallel on any system, even if you
are not allowed to install additional software.

Of all the systems I have experienced, I have yet to see a system that
had GCC installed that did not have Perl. The same goes for Rust, Go,
Haskell, and other younger languages. I have, however, seen systems
with Perl without any of the mentioned compilers.

Most modern systems also have either Python2 or Python3 installed, but
you still cannot be certain which version, and since Python2 code cannot
run under Python3, Python is not an option.

Perl has the added benefit that implementing the {= perlexpr =}
replacement string was fairly easy.

The primary drawback is that Perl is slow. So there is an overhead of
3-10 ms/job and 1 ms/MB output (and even more if you use --tag).

Old Perl style
GNU parallel uses some old, deprecated constructs. This is due to a
goal of being able to run on old installations. Currently the target is
CentOS 3.9 and Perl 5.8.0.

Scalability up and down
The smallest system GNU parallel is tested on is a 32 MB ASUS WL500gP.
The largest is a 2 TB 128-core machine. It scales up to around 100
machines - depending on the duration of each job.

Exponentially back off
GNU parallel busy waits. A job may be held back because of the load
average (when using --load), so it does not make sense to just wait for
a running job to finish. Instead the load average must be rechecked
regularly. Load average is not the only reason: --timeout has a similar
problem.

To avoid burning too much CPU, GNU parallel sleeps exponentially longer
and longer if nothing happens, maxing out at 1 second.
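As a minimal sketch of such a back off, the function below borrows the growth formula from the remote system wrapper shown later in this document ($s = $s < 1 ? 0.001 + $s * 1.03 : $s). The name next_sleep_us and the use of integer microseconds are assumptions made for illustration, and the main loop of GNU parallel may use different constants.

```shell
# next_sleep_us CURRENT_US: print the next sleep time in microseconds.
# Integer microseconds are used because POSIX sh has no floating point.
next_sleep_us() {
    if [ "$1" -lt 1000000 ]; then
        # Still below 1 second: add 1 ms and grow by 3%
        echo $(( 1000 + $1 * 103 / 100 ))
    else
        # At (or just above) 1 second: stop growing
        echo "$1"
    fi
}

# Show the first few sleep times of an idle loop
s=0
i=0
while [ "$i" -lt 5 ]; do
    s=$(next_sleep_us "$s")
    echo "$s"
    i=$((i + 1))
done
```

Whenever something happens (a job starts or finishes), the sleep time would be reset to 0 so the loop reacts quickly again.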

Shell compatibility
It is a goal to have GNU parallel work equally well in any shell.
However, in practice GNU parallel is being developed in bash and thus
testing in other shells is limited to reported bugs.

When an incompatibility is found there is often not an easy fix: Fixing
the problem in csh often breaks it in bash. In these cases the fix is
often to use a small Perl script and call that.

env_parallel
env_parallel is a dummy shell script that will run if env_parallel is
not an alias or a function and tell the user how to activate the
alias/function for the supported shells.

The alias or function will copy the current environment and run the
command with GNU parallel in the copy of the environment.

The problem is that you cannot access all of the current environment
inside Perl. E.g. aliases, functions and unexported shell variables.

The idea is therefore to take the environment and put it in
$PARALLEL_ENV which GNU parallel prepends to every command.

The only way to have access to the environment is directly from the
shell, so the program must be written in a shell script that will be
sourced and therefore has to deal with the dialect of the relevant
shell.

env_parallel.*

These are the files that implement the alias or function env_parallel
for a given shell. It could be argued that these should be put in some
obscure place under /usr/lib, but by putting them in your path it
becomes trivial to find the path to them and source them:

    source `which env_parallel.foo`

The beauty is that they can be put anywhere in the path without the
user having to know the location. So if the user's path includes
/afs/bin/i386_fc5 or /usr/pkg/parallel/bin or
/usr/local/parallel/20161222/sunos5.6/bin the files can be put in the
dir that makes most sense for the sysadmin.

env_parallel.bash / env_parallel.sh / env_parallel.ash /
env_parallel.dash / env_parallel.zsh / env_parallel.ksh /
env_parallel.mksh

env_parallel.(bash|sh|ash|dash|ksh|mksh|zsh) defines the function
env_parallel. It uses alias and typeset to dump the configuration (with
a few exceptions) into $PARALLEL_ENV before running GNU parallel.

After GNU parallel is finished, $PARALLEL_ENV is deleted.

env_parallel.csh

env_parallel.csh has two purposes: If env_parallel is not an alias:
make it into an alias that sets $PARALLEL with arguments and calls
env_parallel.csh.

If env_parallel is an alias, then env_parallel.csh uses $PARALLEL as
the arguments for GNU parallel.

It exports the environment by writing a variable definition to a file
for each variable. The definitions of aliases are appended to this
file. Finally the file is put into $PARALLEL_ENV.

GNU parallel is then run and $PARALLEL_ENV is deleted.

env_parallel.fish

First all function definitions are generated using a loop and
functions.

Dumping the scalar variable definitions is harder.

fish can represent non-printable characters in (at least) 2 ways. To
avoid problems all scalars are converted to \XX quoting.

Then commands to generate the definitions are made and separated by
NUL.

This is then piped into a Perl script that quotes all values. List
elements will be appended using two spaces.

Finally \n is converted into \1 because fish variables cannot contain
\n. GNU parallel will later convert all \1 from $PARALLEL_ENV into \n.

This is then all saved in $PARALLEL_ENV.

GNU parallel is called, and $PARALLEL_ENV is deleted.

parset (supported in sh, ash, dash, bash, zsh, ksh, mksh)
parset is a shell function. This is the reason why parset can set
variables: It runs in the shell which is calling it.

It is also the reason why parset does not work when data is piped into
it: ... | parset ... makes parset start in a subshell, and any changes
in environment can therefore not make it back to the calling shell.

Job slots
The easiest way to explain what GNU parallel does is to assume that
there are a number of job slots, and when a slot becomes available a
job from the queue will be run in that slot. But originally GNU
parallel did not model job slots in the code. Job slots have been added
to make it possible to use {%} as a replacement string.

While the job sequence number can be computed in advance, the job slot
can only be computed the moment a slot becomes available. So it has
been implemented as a stack with lazy evaluation: Draw one from an
empty stack and the stack is extended by one. When a job is done, push
the available job slot back on the stack.

This implementation also means that if you re-run the same jobs, you
cannot assume jobs will get the same slots. And if you use remote
executions, you cannot assume that a given job slot will remain on the
same remote server. This goes double since number of job slots can be
adjusted on the fly (by giving --jobs a file name).
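The lazily extended stack can be sketched as follows. This is an illustration in POSIX sh, not the Perl code in GNU parallel; the names free_slots, max_slot, slot_get and slot_release are hypothetical.

```shell
free_slots=""   # stack of released slot numbers, top of stack first
max_slot=0      # highest slot number handed out so far

# slot_get: set $slot to a free job slot.
# Drawing from an empty stack extends the range of slots by one.
slot_get() {
    if [ -z "$free_slots" ]; then
        max_slot=$((max_slot + 1))
        slot=$max_slot
    else
        # Pop the top of the stack
        slot=${free_slots%% *}
        case $free_slots in
            *" "*) free_slots=${free_slots#* } ;;
            *)     free_slots="" ;;
        esac
    fi
}

# slot_release N: push slot N back on the stack when its job is done.
slot_release() {
    free_slots="$1${free_slots:+ $free_slots}"
}
```

Calling slot_get twice hands out slots 1 and 2. If slot 1 is then released, the next slot_get reuses slot 1, and the one after that extends the range to slot 3. This also shows why a re-run cannot expect the same slots: the slot a job gets depends on the order in which earlier jobs finished.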

Rsync protocol version
rsync 3.1.x uses protocol 31 which is unsupported by version 2.5.7.
That means that you cannot push a file to a remote system using rsync
protocol 31, if the remote system uses 2.5.7. rsync does not
automatically downgrade to protocol 30.

GNU parallel does not require protocol 31, so if the rsync version is
>= 3.1.0 then --protocol 30 is added to force newer rsyncs to talk to
version 2.5.7.
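The version test can be sketched like this. The function name rsync_protocol_opt is hypothetical, and the sketch assumes the version has already been extracted as a plain dotted number such as 3.1.0; how GNU parallel obtains the version is not covered here.

```shell
# rsync_protocol_opt VERSION: print "--protocol 30" if VERSION >= 3.1.0,
# and print nothing for older versions.
rsync_protocol_opt() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second component
    if [ "$major" -gt 3 ] ||
       { [ "$major" -eq 3 ] && [ "$minor" -ge 1 ]; }; then
        echo "--protocol 30"
    fi
}

# Example: build the option list for the local rsync version
opts=$(rsync_protocol_opt "3.1.0")
echo "rsync $opts -rlDzR ..."
```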

Compression
GNU parallel buffers output in temporary files. --compress compresses
the buffered data. This is a bit tricky because there should be no
files to clean up if GNU parallel is killed by a power outage.

GNU parallel first selects a compression program. If the user has not
selected one, the first of these that is in $PATH is used: pzstd lbzip2
pbzip2 zstd pixz lz4 pigz lzop plzip lzip gzip lrz pxz bzip2 lzma xz
clzip. They are sorted by speed on a 128 core machine.

Schematically the setup is as follows:

    command started by parallel | compress > tmpfile
    cattail tmpfile | uncompress | parallel which reads the output

The setup is duplicated for both standard output (stdout) and standard
error (stderr).

GNU parallel pipes output from the command run into the compression
program which saves to a tmpfile. GNU parallel records the pid of the
compress program. At the same time a small Perl script (called cattail
above) is started: It basically does cat followed by tail -f, but it
also removes the tmpfile as soon as the first byte is read, and it
continuously checks if the pid of the compression program is dead. If
the compress program is dead, cattail reads the rest of tmpfile and
exits.

As most compression programs write out a header when they start, the
tmpfile in practice is removed by cattail after around 40 ms.

In more detail, it works like this:

    bash ( command ) |
      sh ( emptywrapper ( bash ( compound compress ) ) >tmpfile )
    cattail ( rm tmpfile; compound decompress ) < tmpfile

This complex setup is to make sure the compress program is only started
if there is input. This means each job will cause 8 processes to run.
If combined with --keep-order these processes will run until the job
has been printed.
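The reason no files are left behind is that a file can be removed while it is still open: the name disappears at once, but the data stays readable until the last file descriptor is closed. The sketch below shows only this property. It is a simplification: the writer finishes before the reader starts, there is no compression, and cattail's polling of the compress program's pid is left out.

```shell
tmpfile=$(mktemp)

# Writer: stands in for "command | compress > tmpfile"
printf 'hello\nworld\n' > "$tmpfile"

# Reader: open the file on file descriptor 3, then unlink it
exec 3< "$tmpfile"
rm "$tmpfile"

# The name is gone, but the data is still readable through fd 3
buffered=$(cat <&3)
exec 3<&-

echo "$buffered"
```

If the process is killed after the rm, the kernel frees the data when the file descriptor is closed, so nothing needs to be cleaned up.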

Wrapping
The command given by the user can be wrapped in multiple templates.
Templates can be wrapped in other templates.

$COMMAND the command to run.

$INPUT the input to run.

$SHELL the shell that started GNU Parallel.

$SSHLOGIN the sshlogin.

$WORKDIR the working dir.

$FILE the file to read parts from.

$STARTPOS the first byte position to read from $FILE.

$LENGTH the number of bytes to read from $FILE.

--shellquote echo Double quoted $INPUT

--nice pri Remote: See The remote system wrapper.

Local: setpriority(0,0,$nice)

--cat
    cat > {}; $COMMAND {};
    perl -e '$bash = shift;
      $csh = shift;
      for(@ARGV) { unlink;rmdir; }
      if($bash =~ s/h//) { exit $bash; }
      exit $csh;' "$?h" "$status" {};

{} is set to $PARALLEL_TMP which is a tmpfile. The Perl
script saves the exit value, unlinks the tmpfile, and
returns the exit value - no matter if the shell is
bash/ksh/zsh (using $?) or *csh/fish (using $status).

--fifo
    perl -e '($s,$c,$f) = @ARGV;
      # mkfifo $PARALLEL_TMP
      system "mkfifo", $f;
      # spawn $shell -c $command &
      $pid = fork || exec $s, "-c", $c;
      open($o,">",$f) || die $!;
      # cat > $PARALLEL_TMP
      while(sysread(STDIN,$buf,131072)){
         syswrite $o, $buf;
      }
      close $o;
      # waitpid to get the exit code from $command
      waitpid $pid,0;
      # Cleanup
      unlink $f;
      exit $?/256;' $SHELL -c $COMMAND $PARALLEL_TMP

This is an elaborate way of: mkfifo {}; run $COMMAND in
the background using $SHELL; copying STDIN to {};
waiting for background to complete; remove {} and exit
with the exit code from $COMMAND.

It is made this way to be compatible with *csh/fish.

--pipepart
    < $FILE perl -e 'while(@ARGV) {
        sysseek(STDIN,shift,0) || die;
        $left = shift;
        while($read =
              sysread(STDIN,$buf,
                      ($left > 131072 ? 131072 : $left))){
          $left -= $read;
          syswrite(STDOUT,$buf);
        }
      }' $STARTPOS $LENGTH

This will read $LENGTH bytes from $FILE starting at
$STARTPOS and send it to STDOUT.

--sshlogin $SSHLOGIN
    ssh $SSHLOGIN "$COMMAND"

--transfer
    ssh $SSHLOGIN mkdir -p ./$WORKDIR;
    rsync --protocol 30 -rlDzR \
          -essh ./{} $SSHLOGIN:./$WORKDIR;
    ssh $SSHLOGIN "$COMMAND"

Read about --protocol 30 in the section Rsync protocol
version.

--transferfile file
<<todo>>

--basefile <<todo>>

--return file
    $COMMAND; _EXIT_status=$?; mkdir -p $WORKDIR;
    rsync --protocol 30 \
      --rsync-path=cd\ ./$WORKDIR\;\ rsync \
      -rlDzR -essh $SSHLOGIN:./$FILE ./$WORKDIR;
    exit $_EXIT_status;

The --rsync-path=cd ... is needed because old versions
of rsync do not support --no-implied-dirs.

The $_EXIT_status trick is to postpone the exit value.
This makes it incompatible with *csh and should be fixed
in the future. Maybe a wrapping 'sh -c' is enough?

--cleanup $RETURN is the wrapper from --return

    $COMMAND; _EXIT_status=$?; $RETURN;
    ssh $SSHLOGIN \(rm\ -f\ ./$WORKDIR/{}\;\
                    rmdir\ ./$WORKDIR\ \>\&/dev/null\;\);
    exit $_EXIT_status;

$_EXIT_status: see --return above.

--pipe
    perl -e 'if(sysread(STDIN, $buf, 1)) {
          open($fh, "|-", "@ARGV") || die;
          syswrite($fh, $buf);
          # Align up to 128k block
          if($read = sysread(STDIN, $buf, 131071)) {
              syswrite($fh, $buf);
          }
          while($read = sysread(STDIN, $buf, 131072)) {
              syswrite($fh, $buf);
          }
          close $fh;
          exit ($?&127 ? 128+($?&127) : 1+$?>>8)
      }' $SHELL -c $COMMAND

This small wrapper makes sure that $COMMAND will never
be run if there is no data.

--tmux <<TODO Fixup with '-quoting>> mkfifo /tmp/tmx3cMEV &&
sh -c 'tmux -S /tmp/tmsaKpv1 new-session -s p334310 -d
"sleep .2" >/dev/null 2>&1'; tmux -S /tmp/tmsaKpv1
new-window -t p334310 -n wc\ 10 \(wc\ 10\)\;\ perl\ -e\
\'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\
\$\?h/\$status\ \>\>\ /tmp/tmx3cMEV\&echo\ wc\\\ 10\;\
echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10; exec
perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and
exit($1);exit$c' /tmp/tmx3cMEV

mkfifo tmpfile.tmx; tmux -S <tmpfile.tms> new-session -s
pPID -d 'sleep .2' >&/dev/null; tmux -S <tmpfile.tms>
new-window -t pPID -n <<shell quoted input>> \(<<shell
quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\
print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\
tmpfile.tmx\&echo\ <<shell double quoted input>>\;echo\
\Job\ finished\ at:\ \`date\`\;sleep\ 10; exec perl -e
'$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and
exit($1);exit$c' tmpfile.tmx

First a FIFO is made (.tmx). It is used for
communicating exit value. Next a new tmux session is
made. This may fail if there is already a session, so
the output is ignored. If all job slots finish at the
same time, then tmux will close the session. A temporary
socket is made (.tms) to avoid a race condition in tmux.
It is cleaned up when GNU parallel finishes.

The input is used as the name of the windows in tmux.
When the job inside tmux finishes, the exit value is
printed to the FIFO (.tmx). This FIFO is opened by perl
outside tmux, and perl then removes the FIFO. Perl
blocks until the first value is read from the FIFO, and
this value is used as exit value.

To make it compatible with csh and bash the exit value
is printed as: $?h/$status and this is parsed by perl.

There is a bug that makes it necessary to print the exit
value 3 times.

Another bug in tmux requires the length of the tmux
title and command to not be within certain limits. When
inside these limits, 75 '\ ' are added to the title to
force it to be outside the limits.

You can map the bad limits using:

    perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 1600 1500 90 |
      perl -ane '$F[0]+$F[1]+$F[2] < 2037 and print ' |
      parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' \
        new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm -f /tmp/p{%}-O*'

    perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 17000 17000 90 |
      parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' \
        tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm /tmp/p{%}-O*' \
      > value.csv 2>/dev/null

    R -e 'a<-read.table("value.csv");X11();plot(a[,1],a[,2],col=a[,4]+5,cex=0.1);Sys.sleep(1000)'

For tmux 1.8 17000 can be lowered to 2100.

The interesting areas are title 0..1000 with (title +
whole command) in 996..1127 and 9331..9636.

The ordering of the wrapping is important:

• $PARALLEL_ENV which is set in env_parallel.* must be prepended to
  the command first, as the command may contain exported variables
  or functions.

• --nice/--cat/--fifo should be done on the remote machine

• --pipepart/--pipe should be done on the local machine inside
  --tmux
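The "$?h" "$status" trick used by the --cat and --tmux wrappers can be sketched in isolation. In bash/ksh/zsh "$?h" expands to the exit value followed by the letter h (e.g. 3h) and "$status" is normally empty. In csh, $?h instead tests whether the variable h is set, so no h appears and the exit value has to be taken from $status. The function pick_exit below is a hypothetical shell rendering of the Perl logic in the --cat wrapper, not code taken from GNU parallel.

```shell
# pick_exit FIRST SECOND: print the exit value, given the two words that
# the calling shell produced from "$?h" and "$status".
pick_exit() {
    case $1 in
        # bash/ksh/zsh: "$?h" became e.g. "3h", so strip the h
        *h) echo "${1%h}" ;;
        # *csh/fish: no h present, so use the value of $status
        *)  echo "$2" ;;
    esac
}

pick_exit "3h" ""    # as expanded by a Bourne-like shell
pick_exit "0" "7"    # as expanded by csh
```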

Convenience options --nice --basefile --transfer --return --cleanup --tmux
--group --compress --cat --fifo --workdir --tag --tagstring
These are all convenience options that make it easier to do a task. But
more importantly: They are tested to work on corner cases, too. Take
--nice as an example:

    nice parallel command ...

will work just fine. But when run remotely, you need to move the nice
command so it is being run on the server:

    parallel -S server nice command ...

And this will again work just fine, as long as you are running a single
command. When you are running a composed command you need nice to apply
to the whole command, and it gets harder still:

    parallel -S server -q nice bash -c 'command1 ...; cmd2 | cmd3'

It is not impossible, but by using --nice GNU parallel will do the
right thing for you. Similarly when transferring files: It starts to
get hard when the file names contain space, :, `, *, or other special
characters.

To run the commands in a tmux session you basically just need to quote
the command. For simple commands that is easy, but when commands
contain special characters, it gets much harder to get right.

--compress not only compresses standard output (stdout) but also
standard error (stderr); and it does so into files that are open but
deleted, so a crash will not leave these files around.

--cat and --fifo are easy to do by hand, until you want to clean up the
tmpfile and keep the exit code of the command.

The real killer comes when you try to combine several of these: Doing
that correctly for all corner cases is next to impossible to do by
hand.
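To give an idea of what "by hand" involves for --fifo, here is a minimal sketch for Bourne-like shells only. The function name fifo_run and its helper variables are hypothetical, mktemp -u is assumed to be available, and the command is expected to open the fifo named in $PARALLEL_TMP (otherwise the copy blocks). Unlike the Perl wrapper in the section Wrapping, this does not work in *csh/fish.

```shell
# fifo_run COMMAND: feed stdin to COMMAND through a fifo named in
# $PARALLEL_TMP, clean up the fifo, and return COMMAND's exit value.
fifo_run() {
    PARALLEL_TMP=$(mktemp -u) || return 1
    export PARALLEL_TMP
    mkfifo "$PARALLEL_TMP" || return 1
    sh -c "$1" &                # run the command in the background
    fifo_pid=$!
    cat > "$PARALLEL_TMP"       # copy stdin into the fifo
    wait "$fifo_pid"            # collect the exit value of the command
    fifo_rc=$?
    rm -f "$PARALLEL_TMP"       # clean up the fifo
    return "$fifo_rc"
}

# Example: count the lines that arrive through the fifo
printf 'a\nb\nc\n' | fifo_run 'wc -l < "$PARALLEL_TMP"'
```

Even this short version has to juggle a background process, the cleanup and the exit value, and it still ignores quoting of the command and signals that arrive halfway through.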

--shard
The simple way to implement sharding would be to:

1. start n jobs,

2. split each line into columns,

3. select the data from the relevant column

4. compute a hash value from the data

5. take the modulo n of the hash value

6. pass the full line to the jobslot that has the computed value

Unfortunately Perl is rather slow at computing the hash value (and
somewhat slow at splitting into columns).

One solution is to use a compiled language for the splitting and
hashing, but that would go against the design criteria of not depending
on a compiler.

Luckily those tasks can be parallelized. So GNU parallel starts n
sharders that do step 2-6, and passes blocks of 100k to each of those
in a round robin manner. To make sure these sharders compute the hash
the same way, $PERL_HASH_SEED is set to the same value for all
sharders.

Running n sharders poses a new problem: Instead of having n outputs
(one for each computed value) you now have n outputs for each of the n
values, so in total n*n outputs; and you need to merge these n*n
outputs together into n outputs.

This can be done by simply running 'parallel -j0 --lb cat :::
outputs_for_one_value', but that is rather inefficient, as it spawns a
process for each file. Instead the core code from 'parcat' is run,
which is also a bit faster.

All the sharders and parcats communicate through named pipes that are
unlinked as soon as they are opened.
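Steps 2-6 for a single sharder can be sketched as below. This is an illustration only: cksum stands in for Perl's hash function, columns are assumed to be TAB separated, the names shard_of and shard_lines are hypothetical, and the shard number is printed in front of each line instead of the line being written to a named pipe.

```shell
# shard_of KEY N: print the shard number (0..N-1) for KEY.
shard_of() {
    hash=$(printf '%s' "$1" | cksum)   # prints "<checksum> <byte count>"
    hash=${hash%% *}                   # keep only the checksum
    echo $(( hash % $2 ))
}

# shard_lines COLUMN N: read lines on stdin, print "shard<TAB>line".
shard_lines() {
    while IFS= read -r line; do
        key=$(printf '%s\n' "$line" | cut -f "$1")
        printf '%s\t%s\n' "$(shard_of "$key" "$2")" "$line"
    done
}

# Lines with the same value in column 2 get the same shard number
printf 'x\tfoo\ny\tbar\nz\tfoo\n' | shard_lines 2 4
```

Because every sharder must map the same key to the same shard, the hash has to be deterministic across processes. cksum is deterministic by definition, while Perl needs $PERL_HASH_SEED fixed as described above.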

Shell shock
The shell shock bug in bash did not affect GNU parallel, but the
solutions did. bash first introduced functions in variables named:
BASH_FUNC_myfunc() and later changed that to BASH_FUNC_myfunc%%. When
transferring functions GNU parallel reads off the function and changes
that into a function definition, which is copied to the remote system
and executed before the actual command is executed. Therefore GNU
parallel needs to know how to read the function.

From version 20150122 GNU parallel tries both the ()-version and the
%%-version, and the function definition works on both pre- and
post-shell shock versions of bash.
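Recognizing both encodings amounts to stripping the BASH_FUNC_ prefix and either suffix. A minimal sketch, with the hypothetical helper bash_func_name working on the name of an environment variable:

```shell
# bash_func_name ENVNAME: print the function name if ENVNAME looks like an
# exported bash function in either encoding; print nothing otherwise.
bash_func_name() {
    case $1 in
        BASH_FUNC_*"()" | BASH_FUNC_*%%)
            name=${1#BASH_FUNC_}   # drop the prefix
            echo "${name%??}"      # drop the 2-character suffix: () or %%
            ;;
    esac
}

bash_func_name 'BASH_FUNC_myfunc()'   # first post-shell shock encoding
bash_func_name 'BASH_FUNC_myfunc%%'   # later encoding
```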

The remote system wrapper
The remote system wrapper does some initialization before starting the
command on the remote system.

Make quoting unnecessary by hex encoding everything

When you run ssh server foo then foo has to be quoted once:

    ssh server "echo foo; echo bar"

If you run ssh server1 ssh server2 foo then foo has to be quoted twice:

    ssh server1 ssh server2 \'"echo foo; echo bar"\'

GNU parallel avoids this by packing everything into hex values and
running a command that does not need quoting:

    perl -X -e GNU_Parallel_worker,eval+pack+q/H10000000/,join+q//,@ARGV

This command reads hex from the command line and converts that to bytes
that are then eval'ed as a Perl expression.

The string GNU_Parallel_worker is not needed. It is simply there to let
the user know that this process is GNU parallel working.
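The point of the hex encoding is that the encoded string only contains the characters 0-9 and a-f, none of which are special in any shell, so it survives any number of ssh hops unquoted. The round trip can be sketched with standard tools; hex_encode and hex_decode are hypothetical helpers, and GNU parallel itself does the decoding with Perl's pack as shown above.

```shell
# hex_encode STRING: print STRING as lowercase hex, two digits per byte.
hex_encode() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# hex_decode HEX: print the bytes that HEX encodes.
hex_decode() {
    for byte in $(printf '%s' "$1" | sed 's/../& /g'); do
        # Convert the hex pair to a 3-digit octal escape and print that byte
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

cmd='echo foo; echo bar'
hex=$(hex_encode "$cmd")
echo "$hex"
hex_decode "$hex"; echo
```

The encoded form is twice as long as the original, which is one reason the line can end up longer than csh accepts (see the section Base64 encoded bzip2).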

Ctrl-C and standard error (stderr)

If the user presses Ctrl-C the user expects jobs to stop. This works
out of the box if the jobs are run locally. Unfortunately it is not so
simple if the jobs are run remotely.

If remote jobs are run in a tty using ssh -tt, then Ctrl-C works, but
all output to standard error (stderr) is sent to standard output
(stdout). This is not what the user expects.

If remote jobs are run without a tty using ssh (without -tt), then
output to standard error (stderr) is kept on stderr, but Ctrl-C does
not kill remote jobs. This is not what the user expects.

So what is needed is a way to have both. It seems the reason why Ctrl-C
does not kill the remote jobs is because the shell does not propagate
the hang-up signal from sshd. But when sshd dies, the parent of the
login shell becomes init (process id 1). So by exec'ing a Perl wrapper
to monitor the parent pid and kill the child if the parent pid becomes
1, then Ctrl-C works and stderr is kept on stderr.

Ctrl-C does, however, kill the ssh connection, so any output from a
remote dying process is lost.

To be able to kill all (grand)*children a new process group is started.

--nice

niceing the remote process is done by setpriority(0,0,$nice). A few old
systems do not implement this and --nice is unsupported on those.

Setting $PARALLEL_TMP

$PARALLEL_TMP is used by --fifo and --cat and must point to a
non-existent file in $TMPDIR. This file name is computed on the remote
system.

The wrapper

The wrapper looks like this:

    $shell = $PARALLEL_SHELL || $SHELL;
    $tmpdir = $TMPDIR || $PARALLEL_REMOTE_TMPDIR;
    $nice = $opt::nice;
    $termseq = $opt::termseq;

    # Check that $tmpdir is writable
    -w $tmpdir ||
        die("$tmpdir is not writable.".
            " Set PARALLEL_REMOTE_TMPDIR");
    # Set $PARALLEL_TMP to a non-existent file name in $TMPDIR
    do {
        $ENV{PARALLEL_TMP} = $tmpdir."/par".
          join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
    } while(-e $ENV{PARALLEL_TMP});
    # Set $script to a non-existent file name in $TMPDIR
    do {
        $script = $tmpdir."/par".
          join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
    } while(-e $script);
    # Create a script from the hex code
    # that removes itself and runs the commands
    open($fh,">",$script) || die;
    # ' needed due to rc-shell
    print($fh("rm \'$script\'\n",$bashfunc.$cmd));
    close $fh;
    my $parent = getppid;
    my $done = 0;
    $SIG{CHLD} = sub { $done = 1; };
    $pid = fork;
    unless($pid) {
        # Make own process group to be able to kill HUP it later
        eval { setpgrp };
        # Set nice value
        eval { setpriority(0,0,$nice) };
        # Run the script
        exec($shell,$script);
        die("exec failed: $!");
    }
    while((not $done) and (getppid == $parent)) {
        # Parent pid is not changed, so sshd is alive
        # Exponential sleep up to 1 sec
        $s = $s < 1 ? 0.001 + $s * 1.03 : $s;
        select(undef, undef, undef, $s);
    }
    if(not $done) {
        # sshd is dead: User pressed Ctrl-C
        # Kill as per --termseq
        my @term_seq = split/,/,$termseq;
        if(not @term_seq) {
            @term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25);
        }
        while(@term_seq && kill(0,-$pid)) {
            kill(shift @term_seq, -$pid);
            select(undef, undef, undef, (shift @term_seq)/1000);
        }
    }
    wait;
    exit ($?&127 ? 128+($?&127) : 1+$?>>8)

Transferring of variables and functions
Transferring of variables and functions given by --env is done by
running a Perl script remotely that calls the actual command. The Perl
script sets $ENV{variable} to the correct value before exec'ing a shell
that runs the function definition followed by the actual command.

The function env_parallel copies the full current environment into the
environment variable PARALLEL_ENV. This variable is picked up by GNU
parallel and used to create the Perl script mentioned above.

Base64 encoded bzip2
csh limits words of commands to 1024 chars. This is often too little
when GNU parallel encodes environment variables and wraps the command
with different templates. All of these are combined and quoted into one
single word, which often is longer than 1024 chars.

When the line to run is > 1000 chars, GNU parallel therefore encodes
the line to run. The encoding bzip2s the line to run, converts this to
base64, splits the base64 into 1000 char blocks (so csh does not fail),
and prepends it with this Perl script that decodes, decompresses and
evals the line.

    @GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
    eval "@GNU_Parallel";

    $SIG{CHLD}="IGNORE";
    # Search for bzip2. Not found => use default path
    my $zip = (grep { -x $_ } "/usr/local/bin/bzip2")[0] || "bzip2";
    # $in = stdin on $zip, $out = stdout from $zip
    my($in, $out,$eval);
    open3($in,$out,">&STDERR",$zip,"-dc");
    if(my $perlpid = fork) {
        close $in;
        $eval = join "", <$out>;
        close $out;
    } else {
        close $out;
        # Pipe decoded base64 into 'bzip2 -dc'
        print $in (decode_base64(join"",@ARGV));
        close $in;
        exit;
    }
    wait;
    eval $eval;

Perl and bzip2 must be installed on the remote system, but a small test
showed that bzip2 is installed by default on all platforms that run
GNU parallel, so this is not a big problem.

The added bonus of this is that much bigger environments can now be
transferred as they will be below bash's limit of 131072 chars.

Which shell to use
Different shells behave differently. A command that works in tcsh may
not work in bash. It is therefore important that the correct shell is
used when GNU parallel executes commands.

GNU parallel tries hard to use the right shell. If GNU parallel is
called from tcsh it will use tcsh. If it is called from bash it will
use bash. It does this by looking at the (grand)*parent process: If the
(grand)*parent process is a shell, use this shell; otherwise look at
the parent of this (grand)*parent. If none of the (grand)*parents are
shells, then $SHELL is used.
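The decision rule can be sketched as a function over the names of the ancestor processes, nearest first. How those names are collected (e.g. from ps or /proc) is left out. The function pick_shell and the list of shell names in it are assumptions for illustration, not the list GNU parallel uses.

```shell
# pick_shell FALLBACK ANCESTOR...: print the first ancestor process name
# (nearest first) that is a known shell; if none is, print FALLBACK,
# which stands in for $SHELL.
pick_shell() {
    fallback=$1
    shift
    for name in "$@"; do
        # Compare only the basename; the list is an illustrative subset
        case ${name##*/} in
            sh|ash|dash|bash|zsh|ksh|mksh|csh|tcsh|fish|rc)
                echo "$name"
                return 0 ;;
        esac
    done
    echo "$fallback"
}

# parallel <- perl <- make <- tcsh <- sshd: tcsh is the nearest shell
pick_shell /bin/sh perl make tcsh sshd
```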

This will do the right thing if called from:

• an interactive shell

• a shell script

• a Perl script in `` or using system if called as a single string.

While these cover most cases, there are situations where it will fail:

• When run using exec.

• When run as the last command using -c from another shell (because
  some shells use exec):

      zsh% bash -c "parallel 'echo {} is not run in bash; \
           set | grep BASH_VERSION' ::: This"

  You can work around that by appending '&& true':

      zsh% bash -c "parallel 'echo {} is run in bash; \
           set | grep BASH_VERSION' ::: This && true"

• When run in a Perl script using system with parallel as the first
  string:

      #!/usr/bin/perl

      system("parallel",'setenv a {}; echo $a',":::",2);

  Here it depends on which shell is used to call the Perl script. If
  the Perl script is called from tcsh it will work just fine, but if it
  is called from bash it will fail, because the command setenv is not
  known to bash.

If GNU parallel guesses wrong in these situations, set the shell using
$PARALLEL_SHELL.

Always running commands in a shell
If the command is a simple command with no redirection and setting of
variables, the command could be run without spawning a shell. E.g. this
simple grep matching either 'ls ' or ' wc >> c':

    parallel "grep -E 'ls | wc >> c' {}" ::: foo

could be run as:

    system("grep","-E","ls | wc >> c","foo");

However, as soon as the command is a bit more complex a shell must be
spawned:

    parallel "grep -E 'ls | wc >> c' {} | wc >> c" ::: foo
    parallel "LANG=C grep -E 'ls | wc >> c' {}" ::: foo

It is impossible to tell how | wc >> c should be interpreted without
parsing the string (is the | a pipe in shell or an alternation in a
grep regexp? Is LANG=C a command in csh or setting a variable in bash?
Is >> redirection or part of a regexp?).

On top of this, wrapper scripts will often require a shell to be
spawned.

The downside is that you need to quote special shell chars twice:

    parallel echo '*' ::: This will expand the asterisk
    parallel echo "'*'" ::: This will not
    parallel "echo '*'" ::: This will not
    parallel echo '\*' ::: This will not
    parallel echo \''*'\' ::: This will not
    parallel -q echo '*' ::: This will not

-q will quote all special chars, thus redirection will not work: this
prints '* > out.1' and does not save '*' into the file out.1:

    parallel -q echo "*" ">" out.{} ::: 1

GNU parallel tries to live up to Principle Of Least Astonishment
(POLA), and the requirement of using -q is hard to understand when you
do not see the whole picture.

Quoting
Quoting depends on the shell. For most shells '-quoting is used for
strings containing special characters.

For tcsh/csh newline is quoted as \ followed by newline. Other special
characters are also \-quoted.

For rc everything is quoted using '.
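For Bourne-like shells, '-quoting is commonly done by wrapping the string in single quotes and writing each embedded ' as '\'' (close the quote, add an escaped quote, reopen the quote). The sketch below shows that common technique; shell_quote is a hypothetical helper and is not taken from GNU parallel, and it does not cover the tcsh/csh or rc rules described above.

```shell
# shell_quote STRING: print STRING so that a Bourne-like shell reads it
# back as one literal word.
shell_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

q=$(shell_quote "it's * here")
echo "$q"

# The quoted word evaluates back to the original string: the * is not
# expanded and the ' does not end the word early
eval "out=$q"
echo "$out"
```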
813
814 --pipepart vs. --pipe
815 While --pipe and --pipepart look much the same to the user, they are
816 implemented very differently.
817
818 With --pipe GNU parallel reads the blocks from standard input (stdin),
819 which is then given to the command on standard input (stdin); so every
820 block is being processed by GNU parallel itself. This is the reason why
821 --pipe maxes out at around 500 MB/sec.
822
823 --pipepart, on the other hand, first identifies at which byte positions
824 blocks start and how long they are. It does that by seeking into the
825 file by the size of a block and then reading until it meets end of a
826 block. The seeking explains why GNU parallel does not know the line
827 number and why -L/-l and -N do not work.
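
A minimal sketch of this seek-then-scan idea, in Python for illustration. The function name find_blocks, the 4096-byte scan chunk, and the assumption of a single-byte record separator are choices made for this sketch and are not taken from GNU parallel's code:

```python
import os

def find_blocks(path, block_size, recend=b"\n"):
    """Return (start, length) pairs that split the file at record ends.

    Assumes recend is a single byte, so it cannot straddle two chunks.
    """
    size = os.path.getsize(path)
    blocks = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            # Jump ahead by one block instead of reading what lies between.
            f.seek(min(start + block_size, size))
            end = f.tell()
            # Read on only until the end of the current record is found.
            while end < size:
                chunk = f.read(4096)
                if not chunk:
                    end = size
                    break
                pos = chunk.find(recend)
                if pos >= 0:
                    end += pos + len(recend)
                    break
                end += len(chunk)
            blocks.append((start, end - start))
            start = end
    return blocks
```

Only a few bytes around each block boundary are read, so the number of records skipped by each seek is never counted. That is the reason line numbers are unknown in this mode.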
828
829 With a reasonable block and file size this seeking is more than 1000
830 times faster than reading the full file. The byte positions are then
831 given to a small script that reads from position X to Y and sends
832 output to standard output (stdout). This small script is prepended to
833 the command and the full command is executed just as if GNU parallel
834 had been in its normal mode. The script looks like this:
835
836 < file perl -e 'while(@ARGV) {
837 sysseek(STDIN,shift,0) || die;
838 $left = shift;
839 while($read = sysread(STDIN,$buf,
840 ($left > 131072 ? 131072 : $left))){
841 $left -= $read; syswrite(STDOUT,$buf);
842 }
843 }' startbyte length_in_bytes
844
845 It delivers 1 GB/s per core.
846
847 Instead of the script dd was tried, but many versions of dd do not
848 support reading from one byte to another and might cause partial data.
849 See this for a surprising example:
850
851 yes | dd bs=1024k count=10 | wc
852
853 --block-size adjustment
854 Every time GNU parallel detects a record bigger than --block-size it
855 increases the block size by 30%. A small --block-size gives very poor
856 performance; by exponentially increasing the block size performance
857 will not suffer.
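
Why repeated 30% increases are cheap can be seen from a small sketch (Python, for illustration; grow_block_size is a hypothetical helper and the integer rounding is an assumption of this sketch):

```python
def grow_block_size(block_size, record_size):
    """Raise block_size by 30% at a time until one record fits.

    Returns the final block size and the number of increases needed.
    """
    increases = 0
    while block_size < record_size:
        # Integer arithmetic; max() guarantees progress for tiny sizes.
        block_size = max(block_size + 1, block_size * 13 // 10)
        increases += 1
    return block_size, increases
```

Because the growth is exponential, a record ten times larger than the initial block size is reached after 9 increases, and each further factor of ten costs about 9 more.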
858
859 GNU parallel will waste CPU power if --block-size does not contain a
860 full record, because it tries to find a full record and will fail to do
861 so. The recommendation is therefore to use a --block-size > 2 records,
862 so you always get at least one full record when you read one block.
863
864 If you use -N then --block-size should be big enough to contain N+1
865 records.
866
867 Automatic --block-size computation
868 With --pipepart GNU parallel can compute the --block-size
869 automatically. A --block-size of -1 will use a block size so that each
870 jobslot will receive approximately 1 block. --block -2 will pass 2
871 blocks to each jobslot and -n will pass n blocks to each jobslot.
872
873 This can be done because --pipepart reads from files, and we can
874 compute the total size of the input.
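
The arithmetic can be sketched as follows (Python, for illustration). The function name auto_block_size and the rounding up are assumptions of this sketch; the text above only promises approximately n blocks per jobslot:

```python
def auto_block_size(total_size, jobslots, block_arg):
    """Translate a negative --block-size argument into a size in bytes.

    block_arg = -n means: aim for n blocks per jobslot.
    A non-negative block_arg is returned unchanged.
    """
    if block_arg >= 0:
        return block_arg
    total_blocks = jobslots * -block_arg
    # Ceiling division, so that total_blocks blocks cover the whole input.
    return -(-total_size // total_blocks)
```

With a 1000-byte input and 4 jobslots, --block -1 gives 250-byte blocks and --block -2 gives 125-byte blocks under these assumptions.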
875
876 --jobs and --onall
877 When running the same commands on many servers what should --jobs
878 signify? Is it the number of servers to run on in parallel? Is it the
879 number of jobs run in parallel on each server?
880
881 GNU parallel lets --jobs represent the number of servers to run on in
882 parallel. This is to make it possible to run a sequence of commands
883 (that cannot be parallelized) on each server, but run the same sequence
884 on multiple servers.
885
886 --shuf
887 When using --shuf to shuffle the jobs, all jobs are read, then they are
888 shuffled, and finally executed. When using SQL this makes the
889 --sqlmaster be the part that shuffles the jobs. The --sqlworkers simply
890 execute according to Seq number.
891
892 --csv
893 --pipepart is incompatible with --csv because you can have records
894 like:
895
896 a,b,c
897 a,"
898 a,b,c
899 a,b,c
900 a,b,c
901 ",c
902 a,b,c
903
904 Here the second record contains a multi-line field that looks like
905 records. Since --pipepart does not read the whole file when searching
906 for record endings, it may start reading in this multi-line field,
907 which would be wrong.
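
The mismatch between lines and CSV records in the example above can be checked with a short sketch (Python, for illustration; the helper names are hypothetical):

```python
import csv
import io

# The example from the text: the second record has a multi-line field.
DATA = 'a,b,c\na,"\na,b,c\na,b,c\na,b,c\n",c\na,b,c\n'

def count_lines(text):
    """Number of newline-terminated lines, as a line-based split sees it."""
    return len(text.splitlines())

def count_csv_records(text):
    """Number of records when quoted fields are parsed from the start."""
    return len(list(csv.reader(io.StringIO(text, newline=""))))
```

A reader that starts at the beginning sees 3 records, while a reader that lands in the middle of the quoted field has no way of knowing that the following lines are not records.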
908
909 Buffering on disk
910 GNU parallel buffers output, because if output is not buffered you have
911 to be ridiculously careful on sizes to avoid mixing of outputs (see
912 the excellent example at https://catern.com/posts/pipes.html).
913
914 GNU parallel buffers on disk in $TMPDIR using files that are removed
915 as soon as they are created, but which are kept open. So even if GNU
916 parallel is killed by a power outage, there will be no files to clean
917 up afterwards. Another advantage is that the file system is aware that
918 these files will be lost in case of a crash, so it does not need to
919 sync them to disk.
920
921 It gives the odd situation that a disk can be fully used, but there are
922 no visible files on it.
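
The remove-but-keep-open technique can be sketched like this (Python, for illustration; it assumes a POSIX file system where an unlinked file stays usable through its open file descriptor, and open_unlinked_buffer is a hypothetical name):

```python
import os
import tempfile

def open_unlinked_buffer(tmpdir=None):
    """Create a temporary file, remove its name at once, keep it open.

    With tmpdir=None the directory is taken from $TMPDIR (or the system
    default). Returns the open file object and the path it used to have.
    """
    fd, path = tempfile.mkstemp(dir=tmpdir)
    os.unlink(path)              # No directory entry is left behind.
    return os.fdopen(fd, "w+b"), path
```

Data written to the returned file object can still be read back after seeking to the start. The space is released by the operating system when the file is closed or the process dies, which is why nothing needs to be cleaned up afterwards.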
923
924 Partly buffering in memory
925
926 When using output formats SQL and CSV then GNU parallel has to read the
927 whole output into memory. When run normally it will only read the
928 output from a single job. But when using --linebuffer every line
929 printed will also be buffered in memory - for all jobs currently
930 running.
931
932 If memory is tight, then do not use the output format SQL/CSV with
933 --linebuffer.
934
935 Comparing to buffering in memory
936
937 gargs is a parallelizing tool that buffers in memory. It is therefore a
938 useful way of comparing the advantages and disadvantages of buffering
939 in memory to buffering on disk.
940
941 On a system with 6 GB RAM free and 6 GB free swap these were tested
942 with different sizes:
943
944 echo /dev/zero | gargs "head -c $size {}" >/dev/null
945 echo /dev/zero | parallel "head -c $size {}" >/dev/null
946
947 The results are here:
948
949 JobRuntime Command
950 0.344 parallel_test 1M
951 0.362 parallel_test 10M
952 0.640 parallel_test 100M
953 9.818 parallel_test 1000M
954 23.888 parallel_test 2000M
955 30.217 parallel_test 2500M
956 30.963 parallel_test 2750M
957 34.648 parallel_test 3000M
958 43.302 parallel_test 4000M
959 55.167 parallel_test 5000M
960 67.493 parallel_test 6000M
961 178.654 parallel_test 7000M
962 204.138 parallel_test 8000M
963 230.052 parallel_test 9000M
964 255.639 parallel_test 10000M
965 757.981 parallel_test 30000M
966 0.537 gargs_test 1M
967 0.292 gargs_test 10M
968 0.398 gargs_test 100M
969 3.456 gargs_test 1000M
970 8.577 gargs_test 2000M
971 22.705 gargs_test 2500M
972 123.076 gargs_test 2750M
973 89.866 gargs_test 3000M
974 291.798 gargs_test 4000M
975
976 GNU parallel is pretty much limited by the speed of the disk: Up to 6
977 GB data is written to disk but cached, so reading is fast. Above 6 GB
978 data are both written and read from disk. When the 30000MB job is
979 running, the disk system is slow, but usable: If you are not using the
980 disk, you almost do not feel it.
981
982 gargs has a speed advantage up until 2500M where it hits a wall. Then
983 the system starts swapping like crazy and is completely unusable. At
984 5000M it goes out of memory.
985
986 You can make GNU parallel behave similarly to gargs if you point $TMPDIR
987 to a tmpfs-filesystem: It will be faster for small outputs, but may
988 kill your system for larger outputs and cause you to lose output.
989
990 Disk full
991 GNU parallel buffers on disk. If the disk is full, data may be lost. To
992 check if the disk is full GNU parallel writes an 8193-byte file every
993 second. If this file is written successfully, it is removed
994 immediately. If it is not written successfully, the disk is full. The
995 size 8193 was chosen because 8192 gave the wrong result on some file
996 systems, whereas 8193 did the correct thing on all tested filesystems.
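
A minimal sketch of such a probe (Python, for illustration). The function name disk_is_full is hypothetical, and treating every OSError as "disk full" is a simplification of this sketch:

```python
import os
import tempfile

def disk_is_full(directory, size=8193):
    """Try to write a small probe file; report True if that fails.

    Simplification: any OSError (not only "no space left on device")
    is treated as a full disk.
    """
    try:
        # The probe file is removed automatically when it is closed.
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            probe.write(b"x" * size)
            probe.flush()        # Push the data from Python to the OS.
    except OSError:
        return True
    return False
```

A caller would run this once per second against the buffering directory and act when it returns True.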
997
998 Memory usage
999 Normally GNU parallel will use around 17 MB RAM constantly - no matter
1000 how many jobs or how much output there is. There are a few things that
1001 cause the memory usage to rise:
1002
1003 • Multiple input sources. GNU parallel reads an input source only
1004 once. This is by design, as an input source can be a stream (e.g.
1005 FIFO, pipe, standard input (stdin)) which cannot be rewound and read
1006 again. When reading a single input source, the memory is freed as
1007 soon as the job is done - thus keeping the memory usage constant.
1008
1009 But when reading multiple input sources GNU parallel keeps the
1010 already read values for generating all combinations with other input
1011 sources.
1012
1013 • Computing the number of jobs. --bar, --eta, and --halt xx% use
1014 total_jobs() to compute the total number of jobs. It does this by
1015 generating the data structures for all jobs. All these job data
1016 structures will be stored in memory and take up around 400
1017 bytes/job.
1018
1019 • Buffering a full line. --linebuffer will read a full line per
1020 running job. A very long output line (say 1 GB without \n) will
1021 increase RAM usage temporarily: From when the beginning of the line
1022 is read till the line is printed.
1023
1024 • Buffering the full output of a single job. This happens when using
1025 --results *.csv/*.tsv or --sql*. Here GNU parallel will read the
1026 whole output of a single job and save it as csv/tsv or SQL.
1027
1028 Argument separators ::: :::: :::+ ::::+
1029 The argument separator ::: was chosen because I have never seen :::
1030 used in any command. The natural choice -- would be a bad idea since it
1031 is not unlikely that the template command will contain --. I have seen
1032 :: used in programming languages to separate classes, and I did not
1033 want the user to be confused that the separator had anything to do with
1034 classes.
1035
1036 ::: also makes a visual separation, which is good if there are multiple
1037 :::.
1038
1039 When ::: was chosen, :::: came as a fairly natural extension.
1040
1041 Linking input sources meant having to decide for some way to indicate
1042 linking of ::: and ::::. :::+ and ::::+ were chosen, so that they were
1043 similar to ::: and ::::.
1044
1045 In 2022 I realized that /// would have been an even better choice,
1046 because you cannot have a file named /// whereas you can have a file
1047 named :::.
1048
1049 Perl replacement strings, {= =}, and --rpl
1050 The shorthands for replacement strings make a command look more
1051 cryptic. Different users will need different replacement strings.
1052 Instead of inventing more shorthands you get more flexible replacement
1053 strings if they can be programmed by the user.
1054
1055 The language Perl was chosen because GNU parallel is written in Perl
1056 and it was easy and reasonably fast to run the code given by the user.
1057
1058 If a user needs the same programmed replacement string again and again,
1059 the user may want to make their own shorthand for it. This is what --rpl
1060 is for. It works so well, that even GNU parallel's own shorthands are
1061 implemented using --rpl.
1062
1063 In Perl code the bigrams {= and =} rarely exist. They look like a
1064 matching pair and can be entered on all keyboards. This made them good
1065 candidates for enclosing the Perl expression in the replacement
1066 strings. Another candidate ,, and ,, was rejected because they do not
1067 look like a matching pair. --parens was made, so that the users can
1068 still use ,, and ,, if they like: --parens ,,,,
1069
1070 Internally, however, the {= and =} are replaced by \257< and \257>.
1071 This is to make it simpler to make regular expressions. You only need
1072 to look one character ahead, and never have to look behind.
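
Why a one-character marker simplifies the regular expressions can be sketched as follows (Python, for illustration; the names are hypothetical, and the sketch assumes the marker character itself does not occur inside the Perl expression):

```python
import re

MARK = "\u00af"                  # \257 in octal is 0xAF
OPEN, CLOSE = MARK + "<", MARK + ">"

def to_internal(template):
    """Replace the two-character delimiters by marker-prefixed ones."""
    return template.replace("{=", OPEN).replace("=}", CLOSE)

# Since every delimiter starts with the same single character, the body
# can be matched with a plain negated character class.
PERLEXPR = re.compile(OPEN + "([^" + MARK + "]*)" + CLOSE)

def perl_expressions(template):
    """Return the text of every {= ... =} block in the template."""
    return PERLEXPR.findall(to_internal(template))
```

With the original {= and =} the body could not be described by a simple negated character class, because = and } may each appear on their own inside the expression.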
1073
1074 Test suite
1075 GNU parallel uses its own testing framework. This is mostly due to
1076 historical reasons. It deals reasonably well with tests that are
1077 dependent on how long a given test runs (e.g. more than 10 secs is a
1078 pass, but less is a fail). It parallelizes most tests, but it is easy
1079 to force a test to run as the single test (which may be important for
1080 timing issues). It deals reasonably well with tests that fail
1081 intermittently. It detects which tests failed and pushes these to the
1082 top, so when running the test suite again, the tests that failed most
1083 recently are run first.
1084
1085 If GNU parallel should adopt a real testing framework then those
1086 elements would be important.
1087
1088 Since many tests are dependent on which hardware it is running on,
1089 these tests break when run on different hardware than what the test
1090 was written for.
1091
1092 When a bug is fixed a test is usually added, so that the bug will not
1093 reappear. It is, however, sometimes hard to create the environment in
1094 which the bug shows up - especially if the bug only shows up sometimes.
1095 One of the harder problems was to make a machine start swapping without
1096 forcing it to its knees.
1097
1098 Median run time
1099 Using a percentage for --timeout causes GNU parallel to compute the
1100 median run time of a job. The median is a better indicator of the
1101 expected run time than average, because there will often be outliers
1102 taking way longer than the normal run time.
1103
1104 To avoid keeping all run times in memory, an implementation of remedian
1105 was made (Rousseeuw et al.).
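
The idea behind the remedian can be sketched as follows (Python, for illustration). This is a simplified version and not GNU parallel's implementation: the class name, the default base, and the choice to estimate from the highest non-empty level only are assumptions of this sketch:

```python
import statistics

class Remedian:
    """Approximate a running median using a few small buffers.

    Each level holds at most `base` values. When a level fills up, its
    median is pushed to the next level and the level is emptied.
    """

    def __init__(self, base=11):
        self.base = base
        self.levels = [[]]

    def add(self, value):
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append([])
            buf = self.levels[level]
            buf.append(value)
            if len(buf) < self.base:
                return
            # Level is full: condense it into one value for the next level.
            value = statistics.median(buf)
            buf.clear()
            level += 1

    def estimate(self):
        """Median of the highest non-empty level (a simplification)."""
        for buf in reversed(self.levels):
            if buf:
                return statistics.median(buf)
        raise ValueError("no values added")
```

Memory use grows with the number of levels, which is logarithmic in the number of run times seen, instead of with the number of run times itself.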
1106
1107 Error messages and warnings
1108 Error messages like: ERROR, Not found, and 42 are not very helpful. GNU
1109 parallel strives to inform the user:
1110
1111 • What went wrong?
1112
1113 • Why did it go wrong?
1114
1115 • What can be done about it?
1116
1117 Unfortunately it is not always possible to predict the root cause of
1118 the error.
1119
1120 Determine number of CPUs
1121 CPUs is an ambiguous term. It can mean the number of sockets filled
1122 (i.e. the number of physical chips). It can mean the number of cores
1123 (i.e. the number of physical compute cores). It can mean the number of
1124 hyperthreaded cores (i.e. the number of virtual cores - with some of
1125 them possibly being hyperthreaded).
1126
1127 On ark.intel.com Intel uses the terms cores and threads for number of
1128 physical cores and the number of hyperthreaded cores respectively.
1129
1130 GNU parallel uses CPUs as the number of compute units and the
1131 terms sockets, cores, and threads to specify how the number of compute
1132 units is calculated.
1133
1134 Computation of load
1135 Contrary to what one might expect, --load does not use load average. This is due
1136 to load average rising too slowly. Instead it uses ps to list the
1137 number of threads in running or blocked state (state D, O or R). This
1138 gives an instant load.
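
Counting the states can be sketched like this (Python, for illustration). The function works on text with one state per line, such as a state column printed by ps; the exact ps invocation is not given here and is left out of the sketch, and instant_load is a hypothetical name:

```python
def instant_load(ps_states):
    """Count entries whose state starts with D, O or R.

    ps_states: text with one process or thread state per line. Extra
    flag characters after the first letter (e.g. 'R+') are ignored.
    """
    return sum(1 for line in ps_states.splitlines()
               if line.strip()[:1] in ("D", "O", "R"))
```

Unlike the load average, this number changes as soon as a job starts or stops running.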
1139
1140 As remote calculation of load can be slow, a process is spawned to run
1141 ps and put the result in a file, which is then used next time.
1142
1143 Killing jobs
1144 GNU parallel kills jobs. It can be due to --memfree, --halt, or when
1145 GNU parallel meets a condition from which it cannot recover. Every job
1146 is started as its own process group. This way any (grand)*children will
1147 get killed, too. The process group is killed with the specification
1148 mentioned in --termseq.
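
Killing a whole process group can be sketched as follows (Python on a POSIX system, for illustration). The helper names and the default signal sequence are hypothetical and are not the defaults of --termseq; start_new_session is used here as a widely available way to put the child in its own process group:

```python
import os
import signal
import subprocess
import sys

def start_job(argv):
    """Start a command as the leader of its own process group."""
    return subprocess.Popen(argv, start_new_session=True)

def kill_job(proc, sequence=((signal.SIGTERM, 0.2), (signal.SIGKILL, 0.2))):
    """Send each signal to the whole group, waiting a little in between.

    sequence: pairs of (signal, seconds to wait). It should end with
    SIGKILL, because the final wait below blocks until the job exits.
    """
    for sig, wait in sequence:
        try:
            os.killpg(proc.pid, sig)     # Reaches children and grandchildren.
        except ProcessLookupError:
            break                        # The group is already gone.
        try:
            proc.wait(timeout=wait)
            break                        # The job exited; stop escalating.
        except subprocess.TimeoutExpired:
            continue
    return proc.wait()
```

Because the signal goes to the group and not to a single process ID, processes started by the job are signalled as well, as long as they have not moved to a different process group.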
1149
1150 SQL interface
1151 GNU parallel uses the DBURL from GNU sql to give database software,
1152 username, password, host, port, database, and table in a single string.
1153
1154 The DBURL must point to a table name. The table will be dropped and
1155 created. The reason for not reusing an existing table is that the user
1156 may have added more input sources which would require more columns in
1157 the table. By prepending '+' to the DBURL the table will not be
1158 dropped.
1159
1160 The table columns are similar to joblog with the addition of V1 .. Vn
1161 which are values from the input sources, and Stdout and Stderr which
1162 are the output from standard output and standard error, respectively.
1163
1164 The Signal column has been renamed to _Signal due to Signal being a
1165 reserved word in MySQL.
1166
1167 Logo
1168 The logo is inspired by the Cafe Wall illusion. The font is DejaVu
1169 Sans.
1170
1171 Citation notice
1172 Funding a free software project is hard. GNU parallel is no exception.
1173 On top of that it seems the less visible a project is, the harder it is
1174 to get funding. And the nature of GNU parallel is that it will never be
1175 seen by "the guy with the checkbook", but only by the people doing the
1176 actual work.
1177
1178 This problem has been covered by others - though no solution has been
1179 found: https://www.slideshare.net/NadiaEghbal/consider-the-maintainer
1180 https://www.numfocus.org/blog/why-is-numpy-only-now-getting-funded/
1181
1182 Before implementing the citation notice it was discussed with the
1183 users:
1184 https://lists.gnu.org/archive/html/parallel/2013-11/msg00006.html
1185
1186 Having to spend 10 seconds on running parallel --citation once is no
1187 doubt not an ideal solution, but no one has so far come up with an
1188 ideal solution - neither for funding GNU parallel nor other free
1189 software.
1190
1191 If you believe you have the perfect solution, you should try it out,
1192 and if it works, you should post it on the email list. Ideas that will
1193 cost work and which have not been tested are, however, unlikely to be
1194 prioritized.
1195
1196 Running parallel --citation one single time takes less than 10 seconds,
1197 and will silence the citation notice for future runs. This is
1198 comparable to graphical tools where you have to click a checkbox saying
1199 "Do not show this again". But if that is too much trouble for you, why
1200 not use one of the alternatives instead? See a list in: man
1201 parallel_alternatives.
1202
1203 As the request for citation is not a legal requirement this is
1204 acceptable under GPLv3 and cleared with Richard M. Stallman himself.
1205 Thus it does not fall under this:
1206 https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation
1207
1209 Multiple processes working together
1210 Open3 is slow. Printing is slow. It would be good if they did not tie
1211 up resources, but were run in separate threads.
1212
1213 --rrs on remote using a perl wrapper
1214 ... | perl -pe '$/=$recend$recstart;BEGIN{ if(substr($_) eq $recstart)
1215 substr($_)="" } eof and substr($_) eq $recend) substr($_)=""
1216
1217 It ought to be possible to write a filter that removed rec sep on the
1218 fly instead of inside GNU parallel. This could then use more cpus.
1219
1220 Will that require 2x record size memory?
1221
1222 Will that require 2x block size memory?
1223
1225 These decisions were relevant for earlier versions of GNU parallel, but
1226 not the current version. They are kept here as historical record.
1227
1228 --tollef
1229 You can read about the history of GNU parallel on
1230 https://www.gnu.org/software/parallel/history.html
1231
1232 --tollef was included to make GNU parallel switch compatible with the
1233 parallel from moreutils (which is made by Tollef Fog Heen). This was
1234 done so that users of that parallel easily could port their use to GNU
1235 parallel: Simply set PARALLEL="--tollef" and that would be it.
1236
1237 But several distributions chose to make --tollef global (by putting it
1238 into /etc/parallel/config) without making the users aware of this, and
1239 that caused much confusion when people tried out the examples from GNU
1240 parallel's man page and these did not work. The users became
1241 frustrated because the distribution did not make it clear to them that
1242 it had made --tollef global.
1243
1244 So to lessen the frustration and the resulting support, --tollef was
1245 obsoleted 20130222 and removed one year later.
1246
1247
1248
124920221022 2022-11-02 PARALLEL_DESIGN(7)