Parallel::ForkManager(3pm)

Parallel::ForkManager(3)   User Contributed Perl Documentation   Parallel::ForkManager(3)

NAME

       Parallel::ForkManager - A simple parallel processing fork manager

VERSION

       version 1.20

SYNOPSIS

         use Parallel::ForkManager;

         my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

         DATA_LOOP:
         foreach my $data (@all_data) {
           # Forks and returns the pid for the child:
           my $pid = $pm->start and next DATA_LOOP;

           # ... do some work with $data in the child process ...

           $pm->finish; # Terminates the child process
         }
         $pm->wait_all_children; # Waits for every child to exit

DESCRIPTION

       This module is intended for use in operations that can be done in
       parallel where the number of processes to be forked off should be
       limited. Typical use is a downloader which will be retrieving
       hundreds/thousands of files.

       The code for a downloader would look something like this:

         use LWP::Simple;
         use Parallel::ForkManager;

         # ...

         my @links=(
           ["http://www.foo.bar/rulez.data","rulez_data.txt"],
           ["http://new.host/more_data.doc","more_data.doc"],
           # ...
         );

         # ...

         # Max 30 processes for parallel download
         my $pm = Parallel::ForkManager->new(30);

         LINKS:
         foreach my $linkarray (@links) {
           $pm->start and next LINKS; # do the fork

           my ($link, $fn) = @$linkarray;
           warn "Cannot get $fn from $link"
             if getstore($link, $fn) != RC_OK;

           $pm->finish; # do the exit in the child process
         }
         $pm->wait_all_children;

       First you need to instantiate the ForkManager with the "new"
       constructor.  You must specify the maximum number of processes to be
       created. If you specify 0, then NO fork will be done; this is good for
       debugging purposes.

       Next, use $pm->start to do the fork. $pm->start returns 0 in the
       child process and the child's pid in the parent process (see also
       "fork()" in perlfunc(1p)).  The "and next" therefore skips the rest
       of the loop body in the parent process, so only the child does the
       work. NOTE: $pm->start dies if the fork fails.

       $pm->finish terminates the child process (assuming a fork was done in
       the "start").
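
       The same control flow can also be written with an explicit branch on
       the value returned by "start", which some readers find easier to
       follow than "and next". The sketch below is illustrative only: the
       @jobs list and the exit codes are made up, and the parent collects
       the codes through the "run_on_finish" callback described under
       CALLBACKS.

```perl
         use strict;
         use warnings;
         use Parallel::ForkManager;

         my @jobs = (1 .. 4);      # hypothetical work items
         my %exit_code_for;        # job => exit code, filled in the parent

         my $pm = Parallel::ForkManager->new(2);

         # Runs in the parent each time a child is reaped.
         $pm->run_on_finish(sub {
             my ($pid, $exit_code, $ident) = @_;
             $exit_code_for{$ident} = $exit_code;
         });

         for my $job (@jobs) {
             my $pid = $pm->start($job);
             if ($pid) {
                 # Parent: start() returned the child's pid.
                 next;
             }
             # Child: start() returned 0; do the work, then exit.
             $pm->finish($job * 10);   # exit codes must fit in 0..255
         }
         $pm->wait_all_children;
```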

       NOTE: You cannot use $pm->start if you are already in the child
       process.  If you want to manage another set of subprocesses in the
       child process, you must instantiate another Parallel::ForkManager
       object!
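
       A minimal sketch of that pattern, assuming two levels are enough:
       each child creates its own manager (called $inner here, a name chosen
       for the example) for its grandchildren, waits for them, and reports
       how many succeeded through its own exit code. The group names and
       counts are hypothetical.

```perl
         use strict;
         use warnings;
         use Parallel::ForkManager;

         my %grandchildren_done;   # group => count reported by that child

         my $outer = Parallel::ForkManager->new(2);
         $outer->run_on_finish(sub {
             my ($pid, $exit_code, $ident) = @_;
             $grandchildren_done{$ident} = $exit_code;
         });

         for my $group (qw(a b)) {
             $outer->start($group) and next;

             # Child: $outer->start would die here, so use a new manager.
             my $done  = 0;
             my $inner = Parallel::ForkManager->new(3);
             $inner->run_on_finish(sub { $done++ if $_[1] == 0 });

             for my $task (1 .. 3) {
                 $inner->start and next;
                 $inner->finish(0);        # grandchild: placeholder work
             }
             $inner->wait_all_children;

             $outer->finish($done);        # report the count as exit code
         }
         $outer->wait_all_children;
```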
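
METHODS

       Each method is meant to be run either in the parent (P) or in the
       child (C) process. "finish" is run in the child; "new", "start",
       "start_child" and the methods that configure or wait for children
       are run in the parent.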

       new $processes [, $tempdir ]
            Instantiate a new Parallel::ForkManager object. You must specify
            the maximum number of children to fork off. If you specify 0
            (zero), then no children will be forked. This is intended for
            debugging purposes.

            The optional second parameter, $tempdir, is only used if you want
            the children to send back a reference to some data (see RETRIEVING
            DATASTRUCTURES below). If not provided, it is set via a call to
            File::Temp::tempdir().

            The new method will die if the temporary directory does not exist
            or it is not a directory.
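
            A sketch of debug mode, assuming the behaviour described here and
            under "start" and "finish": nothing is forked, so the loop body
            runs sequentially in the current process, and "finish" invokes
            the "run_on_finish" callback instead of exiting. That the
            callback receives the current pid ($$) is an assumption about the
            implementation, not something this page documents.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my @finished_pids;

                # 0 => debug mode: nothing is forked.
                my $pm = Parallel::ForkManager->new(0);
                $pm->run_on_finish(sub {
                    my ($pid, $exit_code, $ident) = @_;
                    push @finished_pids, $pid;
                });

                for my $n (1 .. 3) {
                    $pm->start and next;   # always 0 in debug mode
                    # ... work on $n here, inside this very process ...
                    $pm->finish;           # calls the callback, no exit
                }
                $pm->wait_all_children;
```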

       start [ $process_identifier ]
            This method does the fork. It returns the pid of the child process
            for the parent, and 0 for the child process. If the $processes
            parameter for the constructor is 0, no fork is done and
            $pm->start simply returns 0, as if you were in the child process.

            An optional $process_identifier can be provided to this method.
            It is used by the "run_on_finish" callback (see CALLBACKS) for
            identifying the finished process.

       start_child [ $process_identifier, ] \&callback
            Like "start", but will run the &callback as the child. If the
            callback returns anything, it'll be passed as the data to transmit
            back to the parent process via "finish()".
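
            A minimal sketch, assuming the callback's return value is handed
            to "finish()" unchanged, so that returning a reference makes it
            arrive as the last argument of "run_on_finish". Computing squares
            is only a placeholder workload.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my %square_of;

                my $pm = Parallel::ForkManager->new(3);
                $pm->run_on_finish(sub {
                    my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
                    $square_of{$ident} = $data->{square} if ref $data eq 'HASH';
                });

                for my $n (1 .. 5) {
                    # No "and next": the child runs the callback and exits.
                    $pm->start_child($n, sub {
                        return { square => $n * $n };   # data sent to parent
                    });
                }
                $pm->wait_all_children;
```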

       finish [ $exit_code [, $data_structure_reference] ]
            Closes the child process by exiting and accepts an optional exit
            code (default exit code is 0) which can be retrieved in the parent
            via callback.  If the second optional parameter is provided, the
            child attempts to send its contents back to the parent. If you use
            the program in debug mode ($processes == 0), this method just
            calls the callback.

            If the $data_structure_reference is provided, then it is
            serialized and passed to the parent process. See RETRIEVING
            DATASTRUCTURES for more info.

       set_max_procs $processes
            Allows you to set a new maximum number of children to maintain.

       wait_all_children
            You can call this method to wait for all the processes which have
            been forked. This is a blocking wait.

       reap_finished_children
            This is a non-blocking call to reap children and execute callbacks
            independent of calls to "start" or "wait_all_children". Use this
            in scenarios where "start" is called infrequently but you would
            like the callbacks executed quickly.
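
            The following sketch assumes a parent that starts a few children
            and then continues with its own loop without calling "start"
            again; "reap_finished_children" lets the "run_on_finish"
            callbacks fire while that loop runs. The names and sleep times
            are arbitrary.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my @reaped;

                my $pm = Parallel::ForkManager->new(5);
                $pm->run_on_finish(sub { push @reaped, $_[2] });  # $_[2]: ident

                for my $name (qw(alpha beta gamma)) {
                    $pm->start($name) and next;
                    select(undef, undef, undef, 0.2);   # child: pretend to work
                    $pm->finish;
                }

                # The parent's own loop: no more start() calls, but callbacks
                # still run as soon as each child has been reaped.
                while ( my $left = () = $pm->running_procs ) {
                    $pm->reap_finished_children;        # never blocks
                    select(undef, undef, undef, 0.05);  # other parent work
                }
                $pm->wait_all_children;   # nothing left to wait for
```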

       is_parent
            Returns "true" if within the parent or "false" if within the
            child.

       is_child
            Returns "true" if within the child or "false" if within the
            parent.
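
            A small sketch that checks both methods: the parent tests them
            before forking, and each child reports "is_child" back through
            its exit code (a trick used here only for illustration).

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my %child_saw;   # ident => 1 if is_child was true in the child

                my $pm = Parallel::ForkManager->new(2);
                $pm->run_on_finish(sub {
                    my ($pid, $exit_code, $ident) = @_;
                    $child_saw{$ident} = $exit_code;
                });

                my $parent_ok = $pm->is_parent && !$pm->is_child;

                for my $id (1 .. 2) {
                    $pm->start($id) and next;
                    $pm->finish( $pm->is_child ? 1 : 0 );   # child reports back
                }
                $pm->wait_all_children;
```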

       max_procs
            Returns the maximal number of processes the object will fork.
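
            A sketch combining "max_procs" with "set_max_procs": the limit is
            lowered after construction, and the parent records the largest
            number of monitored children it sees right after each "start".
            The peak check relies on "start" waiting for a free slot before
            it forks; the timings are arbitrary.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my $pm = Parallel::ForkManager->new(4);
                my $before = $pm->max_procs;      # 4

                $pm->set_max_procs(2);            # e.g. the machine got busy
                my $after = $pm->max_procs;       # 2

                my $peak = 0;                     # most children seen at once
                for my $i (1 .. 6) {
                    if ($pm->start) {
                        my $running = () = $pm->running_procs;
                        $peak = $running if $running > $peak;
                        next;
                    }
                    select(undef, undef, undef, 0.1);   # child: brief work
                    $pm->finish;
                }
                $pm->wait_all_children;
```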

       running_procs
            Returns the pids of the forked processes currently monitored by
            the "Parallel::ForkManager". Note that children are still reported
            as running until the fork manager harvests them, via the next call
            to "start", "wait_all_children" or "reap_finished_children".

                my @pids = $pm->running_procs;

                my $nbr_children = () = $pm->running_procs;  # count only

       wait_for_available_procs( $n )
            Wait until $n process slots are available.  If $n is not given,
            defaults to 1.
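
            A sketch, assuming a batch of four short-lived children fills
            every slot: "wait_for_available_procs(2)" returns once at least
            two of them have been reaped. The free-slot arithmetic uses
            "max_procs" and "running_procs" from this section.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my $pm = Parallel::ForkManager->new(4);

                # Fill every slot with a short-lived child.
                for my $i (1 .. 4) {
                    $pm->start and next;
                    select(undef, undef, undef, 0.1 * $i);   # staggered work
                    $pm->finish;
                }

                # Block until at least two slots are free, e.g. before a step
                # that will itself need two workers.
                $pm->wait_for_available_procs(2);
                my $free = $pm->max_procs - ( () = $pm->running_procs );

                $pm->wait_all_children;
```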

       waitpid_blocking_sleep
            Returns the sleep period, in seconds, of the pseudo-blocking
            calls. The sleep period can be a fraction of a second.

            Returns 0 if disabled.

            Defaults to 1 second.

            See BLOCKING CALLS for more details.

       set_waitpid_blocking_sleep $seconds
            Sets the sleep period, in seconds, of the pseudo-blocking
            calls.  Set to 0 to disable.

            See BLOCKING CALLS for more details.
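
            A short sketch of reading and changing the setting; the 0.25
            second value is an arbitrary example of a fractional period.

```perl
                use strict;
                use warnings;
                use Parallel::ForkManager;

                my $pm = Parallel::ForkManager->new(2);
                my $default = $pm->waitpid_blocking_sleep;   # 1, per the docs

                # Poll four times a second instead of once a second.
                $pm->set_waitpid_blocking_sleep(0.25);
                my $current = $pm->waitpid_blocking_sleep;   # 0.25

                for my $n (1 .. 2) {
                    $pm->start and next;
                    $pm->finish;          # child: exits immediately
                }
                $pm->wait_all_children;   # polls every 0.25 seconds
```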

CALLBACKS

       You can define callbacks in the code, which are called on events like
       starting a process or upon finish. Declare these before the first call
       to start().

       The callbacks can be defined with the following methods:

       run_on_finish $code [, $pid ]
           You can define a subroutine which is called when a child is
           terminated. It is called in the parent process.

           The parameters of the $code are the following:

             - pid of the process, which has terminated
             - exit code of the program
             - identification of the process (if provided in the "start" method)
             - exit signal (0-127: signal number)
             - core dump (1 if there was a core dump at exit)
             - datastructure reference or undef (see RETRIEVING DATASTRUCTURES)
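
           A sketch that unpacks all six parameters. One child exits normally
           with code 3 and the other terminates itself with SIGTERM, so the
           callback can tell a normal exit from a death by signal. The
           "normal" and "killed" identifiers are made up for the example.

```perl
               use strict;
               use warnings;
               use Parallel::ForkManager;

               my %status;   # ident => [ exit code, exit signal, core dump ]

               my $pm = Parallel::ForkManager->new(2);
               $pm->run_on_finish(sub {
                   my ($pid, $exit_code, $ident, $exit_signal, $core_dump,
                       $data_structure_reference) = @_;
                   $status{$ident} = [ $exit_code, $exit_signal, $core_dump ];
               });

               for my $how (qw(normal killed)) {
                   $pm->start($how) and next;
                   if ($how eq 'killed') {
                       kill 'TERM', $$;   # child terminates itself by signal
                       sleep 5;           # not reached once TERM is delivered
                   }
                   $pm->finish(3);        # normal exit with code 3
               }
               $pm->wait_all_children;

               for my $ident (sort keys %status) {
                   my ($code, $signal, $core) = @{ $status{$ident} };
                   print "$ident: exit=$code signal=$signal core=$core\n";
               }
```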

       run_on_start $code
           You can define a subroutine which is called when a child is
           started. It is called after the successful startup of a child in
           the parent process.

           The parameters of the $code are the following:

             - pid of the process which has been started
             - identification of the process (if provided in the "start" method)

       run_on_wait $code, [$period]
           You can define a subroutine which is called while the parent
           process waits for a child to finish, for example in "start" when
           the maximum number of children is already running. If $period is
           not defined, then one call is done per child. If $period is
           defined, then $code is called periodically and the module waits
           for $period seconds between the two calls. Note, $period can be a
           fractional number also. The exact "$period seconds" is not
           guaranteed: signals can shorten it and the process scheduler can
           make it longer (on busy systems).

           The $code is called from both the "start" and the
           "wait_all_children" methods.

           No parameters are passed to the $code on the call.

BLOCKING CALLS

       When it comes to waiting for child processes to terminate,
       "Parallel::ForkManager" is between a fork and a hard place (if you
       excuse the terrible pun). The underlying Perl "waitpid" function that
       the module relies on can block until either one specific or any child
       process terminates, but not for any process that is part of a given
       group.

       This means that the module can do one of two things when it waits for
       one of its child processes to terminate:

       Only wait for its own child processes
           This is done via a loop using a "waitpid" non-blocking call and a
           sleep statement.  The code does something along the lines of

               while(1) {
                   if ( any of the P::FM child processes has terminated ) {
                       return its pid
                   }

                   sleep $sleep_period
               }

           This is the default behavior that the module will use.  This is not
           the most efficient way to wait for child processes, but it's the
           safest way to ensure that "Parallel::ForkManager" won't interfere
           with any other part of the codebase.

           The sleep period is set via the method
           "set_waitpid_blocking_sleep".
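
           For comparison, here is a runnable sketch of the polling strategy
           using only core modules. It is not the module's actual code: the
           helper wait_for_any_of() and the %children bookkeeping are
           hypothetical. It shows why polling never consumes the termination
           status of a process outside the watched set, since waitpid() is
           only ever called with specific pids.

```perl
               use strict;
               use warnings;
               use POSIX qw(WNOHANG);
               use Time::HiRes qw(sleep);

               # Hypothetical helper: return the pid of whichever watched
               # child ends first, polling with non-blocking waitpid().
               sub wait_for_any_of {
                   my ($watched, $sleep_period) = @_;
                   while (1) {
                       for my $pid (keys %$watched) {
                           # > 0: reaped; -1: no longer our child; 0: running.
                           if (waitpid($pid, WNOHANG) != 0) {
                               delete $watched->{$pid};
                               return $pid;
                           }
                       }
                       sleep $sleep_period;   # fractional, via Time::HiRes
                   }
               }

               my %children;
               for my $n (1 .. 3) {
                   my $pid = fork() // die "fork failed: $!";
                   if ($pid == 0) {
                       sleep(0.1 * $n);       # child: staggered fake work
                       exit 0;
                   }
                   $children{$pid} = 1;
               }

               my @finished;
               push @finished, wait_for_any_of(\%children, 0.05) while %children;
```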

       Block until any process terminates
           Alternatively, "Parallel::ForkManager" can call "waitpid" such that
           it will block until any child process terminates. If the child
           process was not one of the monitored subprocesses, the wait will
           resume. This is more efficient, but means that "P::FM" can capture
           (and discard) the termination notification that a different part
           of the code might be waiting for.

           If this is a race condition that doesn't apply to your codebase,
           you can set the waitpid_blocking_sleep period to 0, which will
           enable "waitpid" call blocking.

               use Parallel::ForkManager;

               my $pm = Parallel::ForkManager->new( 4 );

               $pm->set_waitpid_blocking_sleep(0);  # true blocking calls enabled

               for ( 1..100 ) {
                   $pm->start and next;

                   # ... do work ...

                   $pm->finish;
               }
               $pm->wait_all_children;

RETRIEVING DATASTRUCTURES from child processes

       The ability for the parent to retrieve data structures is new as of
       version 0.7.6.

       Each child process may optionally send 1 data structure back to the
       parent.  By data structure, we mean a reference to a string, hash or
       array. The contents of the data structure are written out to temporary
       files on disc using the Storable module's store() function. The
       reference is then retrieved from within the code you send to the
       run_on_finish callback.

       The data structure can be any scalar perl data structure which makes
       sense: string, numeric value or a reference to an array, hash or
       object.

       There are 2 steps involved in retrieving data structures:

       1) A reference to the data structure the child wishes to send back to
       the parent is provided as the second argument to the finish() call. It
       is up to the child to decide whether or not to send anything back to
       the parent.

       2) The data structure reference is retrieved using the callback
       provided in the run_on_finish() method.

       Keep in mind that data structure retrieval is not the same as returning
       a data structure from a method call. That is not what actually occurs.
       The data structure referenced in a given child process is serialized
       and written out to a file by Storable. The file is subsequently read
       back into memory and a new data structure belonging to the parent
       process is created. This round trip through the disk has a performance
       cost, so try to keep the returned structure small.
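
       A minimal sketch of what that means in practice: the child modifies
       its copy of a hash and sends it back, and the parent receives a new
       structure while its own hash is left untouched. The %settings hash is
       a made-up example.

```perl
         use strict;
         use warnings;
         use Parallel::ForkManager;

         my %settings = (retries => 1);   # the parent's own data
         my $received;                    # what comes back from the child

         my $pm = Parallel::ForkManager->new(1);
         $pm->run_on_finish(sub {
             my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
             $received = $data;   # a new structure, deserialized by Storable
         });

         unless ($pm->start) {
             # Child: this only changes the child's copy of %settings.
             $settings{retries} = 5;
             $pm->finish(0, \%settings);
         }
         $pm->wait_all_children;

         print "parent still has retries=$settings{retries}, ",
               "child sent retries=$received->{retries}\n";
```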

EXAMPLES

   Parallel get
       This small example can be used to get URLs in parallel.

         use Parallel::ForkManager;
         use LWP::Simple;

         my $pm = Parallel::ForkManager->new(10);

         LINKS:
         for my $link (@ARGV) {
           $pm->start and next LINKS;
           my ($fn) = $link =~ /^.*\/(.*?)$/;
           if (!$fn) {
             warn "Cannot determine filename from $link\n";
           } else {
             $0 .= " " . $fn;
             print "Getting $fn from $link\n";
             my $rc = getstore($link, $fn);
             print "$link downloaded. response code: $rc\n";
           };
           $pm->finish;
         };
         $pm->wait_all_children;

   Callbacks
       Example of a program using callbacks to get child exit codes:

         use strict;
         use Parallel::ForkManager;

         my $max_procs = 5;
         my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );
         # each child is identified by its name, passed to start() below

         my $pm = Parallel::ForkManager->new($max_procs);

         # Set up a callback for when a child finishes up so we can
         # get its exit code
         $pm->run_on_finish( sub {
             my ($pid, $exit_code, $ident) = @_;
             print "** $ident just got out of the pool ".
               "with PID $pid and exit code: $exit_code\n";
         });

         $pm->run_on_start( sub {
             my ($pid, $ident)=@_;
             print "** $ident started, pid: $pid\n";
         });

         $pm->run_on_wait( sub {
             print "** Have to wait for one child ...\n"
           },
           0.5
         );

         NAMES:
         foreach my $child ( 0 .. $#names ) {
           my $pid = $pm->start($names[$child]) and next NAMES;

           # This code is the child process
           print "This is $names[$child], Child number $child\n";
           sleep ( 2 * $child );
           print "$names[$child], Child $child is about to get out...\n";
           sleep 1;
           $pm->finish($child); # pass an exit code to finish
         }

         print "Waiting for Children...\n";
         $pm->wait_all_children;
         print "Everybody is out of the pool!\n";

   Data structure retrieval
       In this simple example, each child sends back a string reference.

         use Parallel::ForkManager 0.7.6;
         use strict;

         my $pm = Parallel::ForkManager->new(2, '/server/path/to/temp/dir/');

         # data structure retrieval and handling
         $pm -> run_on_finish ( # called BEFORE the first call to start()
           sub {
             my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

             # retrieve data structure from child
             if (defined($data_structure_reference)) {  # children are not forced to send anything
               my $string = ${$data_structure_reference};  # child passed a string reference
               print "$string\n";
             }
             else {  # problems occurring during storage or retrieval will throw a warning
               print qq|No message received from child process $pid!\n|;
             }
           }
         );

         # prep random statement components
         my @foods = ('chocolate', 'ice cream', 'peanut butter', 'pickles', 'pizza', 'bacon', 'pancakes', 'spaghetti', 'cookies');
         my @preferences = ('loves', q|can't stand|, 'always wants more', 'will walk 100 miles for', 'only eats', 'would starve rather than eat');

         # run the parallel processes
         PERSONS:
         foreach my $person (qw(Fred Wilma Ernie Bert Lucy Ethel Curly Moe Larry)) {
           $pm->start() and next PERSONS;

           # generate a random statement about food preferences
           my $statement = $person . ' ' . $preferences[int(rand @preferences)] . ' ' . $foods[int(rand @foods)];

           # send it back to the parent process
           $pm->finish(0, \$statement);  # note that it's a scalar REFERENCE, not the scalar itself
         }
         $pm->wait_all_children;

       A second datastructure retrieval example demonstrates how children
       decide whether or not to send anything back, what to send and how the
       parent should process whatever is retrieved.

         use Parallel::ForkManager 0.7.6;
         use Data::Dumper;  # to display the data structures retrieved.
         use strict;

         my $pm = Parallel::ForkManager->new(20);  # using the system temp dir via File::Temp::tempdir()

         # data structure retrieval and handling
         my %retrieved_responses = ();  # for collecting responses
         $pm -> run_on_finish (
           sub {
             my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

             # see what the child sent us, if anything
             if (defined($data_structure_reference)) {  # test rather than assume child sent anything
               my $reftype = ref($data_structure_reference);
               print qq|ident "$ident" returned a "$reftype" reference.\n\n|;
               if (1) {  # simple on/off switch to display the contents
                 print &Dumper($data_structure_reference) . qq|end of "$ident" sent structure\n\n|;
               }

               # we can also collect retrieved data structures for processing after all children have exited
               $retrieved_responses{$ident} = $data_structure_reference;
             } else {
               print qq|ident "$ident" did not send anything.\n\n|;
             }
           }
         );

         # generate a list of instructions
         my @instructions = (  # a unique identifier and what the child process should send
           {'name' => '%ENV keys as a string', 'send' => 'keys'},
           {'name' => 'Send Nothing'},  # not instructing the child to send anything back to the parent
           {'name' => "Child's %ENV", 'send' => 'all'},
           {'name' => 'Child chooses randomly', 'send' => 'random'},
           {'name' => 'Invalid send instructions', 'send' => 'Na Na Nana Na'},
           {'name' => 'ENV values in an array', 'send' => 'values'},
         );

         INSTRUCTS:
         foreach my $instruction (@instructions) {
           $pm->start($instruction->{'name'}) and next INSTRUCTS;  # this time we are using an explicit, unique child process identifier

           # last step in child processing
           $pm->finish(0) unless $instruction->{'send'};  # no data structure is sent unless this child is told what to send.

           if ($instruction->{'send'} eq 'keys') {
             $pm->finish(0, \join(', ', keys %ENV));

           } elsif ($instruction->{'send'} eq 'values') {
             $pm->finish(0, [values %ENV]);  # kinda useless without knowing which keys they belong to...

           } elsif ($instruction->{'send'} eq 'all') {
             $pm->finish(0, \%ENV);  # remember, we are not "returning" anything, just copying the hash to disc

           # demonstrate clearly that the child determines what type of reference to send
           } elsif ($instruction->{'send'} eq 'random') {
             my $string = q|I'm just a string.|;
             my @array = qw(I am an array);
             my %hash = (type => 'associative array', synonym => 'hash', cool => 'very :)');
             my $return_choice = ('string', 'array', 'hash')[int(rand 3)];  # randomly choose return data type
             $pm->finish(0, \$string) if ($return_choice eq 'string');
             $pm->finish(0, \@array) if ($return_choice eq 'array');
             $pm->finish(0, \%hash) if ($return_choice eq 'hash');

           # as a responsible child, inform parent that their instruction was invalid
           } else {
             $pm->finish(0, \qq|Invalid instructions: "$instruction->{'send'}".|);  # ordinarily I wouldn't include invalid input in a response...
           }
         }
         $pm->wait_all_children;  # blocks until all forked processes have exited

         # post fork processing of returned data structures
         for (sort keys %retrieved_responses) {
           print qq|Post processing "$_"...\n|;
         }

USING RAND() IN FORKED PROCESSES

       A caveat worth noting is that forked processes inherit the parent's
       random number generator state: once the parent has seeded it (by
       calling "rand()" or "srand()"), all children will use the same random
       seed, and so potentially produce the same results (see
       <http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html>).
       If you are using "rand()" and want each forked child to use a different
       seed, you can add the following to your program:

           $pm->run_on_start(sub { srand });
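
       An alternative, sketched here under the assumption that reseeding
       inside each child is acceptable for your program, is to call srand()
       at the start of every child. The parent deliberately calls rand()
       before forking, which is the situation where children would otherwise
       share a seed.

```perl
         use strict;
         use warnings;
         use Parallel::ForkManager;

         my %first_roll;   # child number => first rand() value in that child

         my $pm = Parallel::ForkManager->new(4);
         $pm->run_on_finish(sub {
             my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
             $first_roll{$ident} = $$data if defined $data;
         });

         my $warm_up = rand();   # the parent seeds its generator before forking

         for my $child (1 .. 4) {
             $pm->start($child) and next;
             srand();            # child: pick a fresh seed of its own
             my $roll = rand();
             $pm->finish(0, \$roll);
         }
         $pm->wait_all_children;

         my %distinct = map { $_ => 1 } values %first_roll;
         printf "%d children, %d distinct first values\n",
             scalar(keys %first_roll), scalar(keys %distinct);
```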

SECURITY

       Parallel::ForkManager uses temporary files when a child process returns
       information to its parent process. The filenames are based on the
       process IDs of the parent and child processes, so they are fairly easy
       to guess. So if security is a concern in your environment, make sure
       the directory used by Parallel::ForkManager is restricted to the
       current user only (the default behavior is to create a directory, via
       File::Temp's "tempdir", which does that).
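
       If you pass your own $tempdir to "new", a sketch like the following
       keeps the same guarantee by creating it with File::Temp's tempdir(),
       which makes the directory accessible to the current user only. The
       "hello" message is a placeholder.

```perl
         use strict;
         use warnings;
         use File::Temp qw(tempdir);
         use Parallel::ForkManager;

         # A private directory, removed automatically when the program ends.
         my $dir  = tempdir( CLEANUP => 1 );
         my $mode = (stat $dir)[2] & 07777;   # permission bits only

         my $pm = Parallel::ForkManager->new(2, $dir);
         my $reply;
         $pm->run_on_finish(sub {
             my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
             $reply = $$data if defined $data;
         });

         unless ($pm->start) {
             $pm->finish(0, \"hello from $$");   # stored under $dir
         }
         $pm->wait_all_children;

         printf "mode %04o, reply: %s\n", $mode, $reply;
```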

TROUBLESHOOTING

   PerlIO::gzip and Parallel::ForkManager do not play nice together
       If you are using PerlIO::gzip in your child processes, you may end up
       with garbled files. This is not really P::FM's fault, but rather a
       problem between PerlIO::gzip and "fork()" (see
       <https://rt.cpan.org/Public/Bug/Display.html?id=114557>).

       Fortunately, it seems there is an easy way to fix the problem by adding
       the "unix" layer, i.e.,

           open(IN, '<:unix:gzip', ...

BUGS AND LIMITATIONS

       Do not use Parallel::ForkManager in an environment where other child
       processes can affect the run of the main program; using this module
       is not recommended in an environment where fork() / wait() is already
       used.

       If you want to use more than one Parallel::ForkManager object, then
       you have to make sure that all child processes have terminated
       before you use the second object in the main program.

       You are free to use a new copy of Parallel::ForkManager in the child
       processes, although I don't think it makes sense.

CREDITS

         Michael Gang (bug report)
         Noah Robin <sitz@onastick.net> (documentation tweaks)
         Chuck Hirstius <chirstius@megapathdsl.net> (callback exit status, example)
         Grant Hopwood <hopwoodg@valero.com> (win32 port)
         Mark Southern <mark_southern@merck.com> (bugfix)
         Ken Clarke <www.perlprogrammer.net>  (datastructure retrieval)

AUTHORS

       ·   dLux (Szabó, Balázs) <dlux@dlux.hu>

       ·   Yanick Champoux <yanick@cpan.org>

       ·   Gabor Szabo <gabor@szabgab.com>

COPYRIGHT AND LICENSE

       This software is copyright (c) 2018, 2016, 2015 by Balázs Szabó.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.28.0                      2018-07-19          Parallel::ForkManager(3)