Parallel::ForkManager(3pm)

Parallel::ForkManager(3)    User Contributed Perl Documentation    Parallel::ForkManager(3)

NAME

       Parallel::ForkManager - A simple parallel processing fork manager

VERSION

       version 2.02

SYNOPSIS

         use Parallel::ForkManager;

         my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

         DATA_LOOP:
         foreach my $data (@all_data) {
           # Forks and returns the pid for the child:
           my $pid = $pm->start and next DATA_LOOP;

           ... do some work with $data in the child process ...

           $pm->finish; # Terminates the child process
         }

DESCRIPTION

       This module is intended for use in operations that can be done in
       parallel where the number of processes to be forked off should be
       limited. A typical use is a downloader that retrieves hundreds or
       thousands of files.

       The code for a downloader would look something like this:

         use LWP::Simple;
         use Parallel::ForkManager;

         ...

         my @links=(
           ["http://www.foo.bar/rulez.data","rulez_data.txt"],
           ["http://new.host/more_data.doc","more_data.doc"],
           ...
         );

         ...

         # Max 30 processes for parallel download
         my $pm = Parallel::ForkManager->new(30);

         LINKS:
         foreach my $linkarray (@links) {
           $pm->start and next LINKS; # do the fork

           my ($link, $fn) = @$linkarray;
           warn "Cannot get $fn from $link"
             if getstore($link, $fn) != RC_OK;

           $pm->finish; # do the exit in the child process
         }
         $pm->wait_all_children;

       First you need to instantiate the ForkManager with the "new"
       constructor.  You must specify the maximum number of processes to be
       created. If you specify 0, then NO fork will be done; this is good for
       debugging purposes.

       Next, use $pm->start to do the fork. $pm->start returns 0 for the
       child process, and the child pid for the parent process (see also
       "fork()" in perlfunc(1p)).  The "and next" skips the rest of the loop
       body in the parent process. NOTE: $pm->start dies if the fork fails.

       $pm->finish terminates the child process (assuming a fork was done in
       the "start").

       NOTE: You cannot use $pm->start if you are already in the child
       process.  If you want to manage another set of subprocesses in the
       child process, you must instantiate another Parallel::ForkManager
       object!

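       The debugging mode mentioned above (a maximum of 0 processes) can be
       sketched as follows. This is a minimal illustration, assuming the
       module is installed; the @processed array and the item names are
       invented for the example. Because no fork happens, the loop body runs
       in the current process and can modify the parent's variables.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# A maximum of 0 disables forking: every "child" body runs inline.
my $pm = Parallel::ForkManager->new(0);

my @processed;    # illustrative variable, not part of the module
for my $item (qw(alpha beta gamma)) {
    # With a maximum of 0, start returns 0, so "and next" never fires.
    $pm->start and next;

    # This body executes in the current process, in loop order.
    push @processed, uc $item;

    $pm->finish;    # does not exit, because no fork was done
}
$pm->wait_all_children;

print "processed: @processed\n";
```

       With a non-zero maximum the same push would happen in a child process
       and @processed would stay empty in the parent, which is why data has
       to be sent back explicitly (see RETRIEVING DATASTRUCTURES).
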

METHODS

       The comment letter indicates where the method should be run. P for
       parent, C for child.

       new $processes
            Instantiate a new Parallel::ForkManager object. You must specify
            the maximum number of children to fork off. If you specify 0
            (zero), then no children will be forked. This is intended for
            debugging purposes.

            The optional second parameter, $tempdir, is only used if you want
            the children to send back a reference to some data (see RETRIEVING
            DATASTRUCTURES below). If not provided, it is set via a call to
            File::Temp::tempdir().

            The new method will die if the temporary directory does not exist
            or it is not a directory.

            Since version 2.00, the constructor can also be called in the
            typical Moo/Moose fashion. I.e.

                my $fm = Parallel::ForkManager->new(
                    max_procs => 4,
                    tempdir => '...',
                    child_role => 'Parallel::ForkManager::CustomChild',
                );

       child_role
            Returns the name of the role consumed by the ForkManager object in
            child processes. Defaults to Parallel::ForkManager::Child and can
            be set to something else via the constructor.

       start [ $process_identifier ]
            This method does the fork. It returns the pid of the child process
            for the parent, and 0 for the child process. If the $processes
            parameter for the constructor is 0, no fork is done and
            $pm->start simply returns 0, as if you were in the child process.

            An optional $process_identifier can be provided to this method.
            It is used by the "run_on_finish" callback (see CALLBACKS) for
            identifying the finished process.

       start_child [ $process_identifier, ] \&callback
            Like "start", but will run the &callback as the child. If the
            callback returns anything, it'll be passed as the data to transmit
            back to the parent process via "finish()".

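            A minimal sketch of "start_child" combined with "run_on_finish",
            assuming the default file-based storage. The %square_of hash and
            the "n1" .. "n4" identifiers are invented for the example. The
            callback returns a reference because the default storage
            serializes references.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

my %square_of;    # filled in the parent by the callback
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
    $square_of{$ident} = $$data if defined $data;
});

for my $n (1 .. 4) {
    # The identifier comes first, the callback last.
    $pm->start_child("n$n", sub {
        my $square = $n * $n;
        return \$square;    # a reference, so it can be serialized
    });
}
$pm->wait_all_children;

print "$_ => $square_of{$_}\n" for sort keys %square_of;
```
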
       finish [ $exit_code [, $data_structure_reference] ]
            Closes the child process by exiting and accepts an optional exit
            code (default exit code is 0) which can be retrieved in the parent
            via callback.  If the second optional parameter is provided, the
            child attempts to send its contents back to the parent. If you use
            the program in debug mode ($processes == 0), this method just
            calls the "run_on_finish" callback.

            If the $data_structure_reference is provided, then it is
            serialized and passed to the parent process. See RETRIEVING
            DATASTRUCTURES for more info.

       set_max_procs $processes
            Allows you to set a new maximum number of children to maintain.

       wait_all_children
            You can call this method to wait for all the processes which have
            been forked. This is a blocking wait.

       reap_finished_children
            This is a non-blocking call to reap children and execute callbacks
            independent of calls to "start" or "wait_all_children". Use this
            in scenarios where "start" is called infrequently but you would
            like the callbacks executed quickly.

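            A sketch of the polling pattern that "reap_finished_children"
            is meant for. The job names, the sleep durations and the
            @finished array are assumptions made for the illustration; the
            sleeps stand in for real work.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

my @finished;    # identifiers, in the order the callbacks ran
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    push @finished, $ident;
});

for my $job (qw(a b c)) {
    $pm->start($job) and next;
    sleep 0.2;      # stand-in for real work in the child
    $pm->finish;
}

# The parent keeps doing its own work and polls between steps.
until (@finished == 3) {
    $pm->reap_finished_children;    # returns immediately
    sleep 0.05;                     # stand-in for other parent work
}

print "finished: @{[ sort @finished ]}\n";
```
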
       is_parent
            Returns "true" if within the parent or "false" if within the
            child.

       is_child
            Returns "true" if within the child or "false" if within the
            parent.

       max_procs
            Returns the maximal number of processes the object will fork.

       running_procs
            Returns the pids of the forked processes currently monitored by
            the "Parallel::ForkManager". Note that children are still reported
            as running until the fork manager harvests them, via the next call
            to "start" or "wait_all_children".

                my @pids = $pm->running_procs;

                my $nbr_children = $pm->running_procs;

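            The note about harvesting can be observed with a short sketch.
            The one-second sleep is only there to keep the children alive
            while the parent takes its first reading, and the variable names
            are invented for the example.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

for my $job (1 .. 2) {
    $pm->start and next;
    sleep 1;        # keep the child alive while the parent counts
    $pm->finish;
}

my @pids_before = $pm->running_procs;    # list context: the pids
$pm->wait_all_children;                  # harvests both children
my @pids_after = $pm->running_procs;

printf "before: %d, after: %d\n", scalar @pids_before, scalar @pids_after;
```
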
       wait_for_available_procs( $n )
            Wait until $n process slots are available.  If $n is not given,
            defaults to 1.

       waitpid_blocking_sleep
            Returns the sleep period, in seconds, of the pseudo-blocking
            calls. The sleep period can be a fraction of a second.

            Returns 0 if disabled.

            Defaults to 1 second.

            See BLOCKING CALLS for more details.

       set_waitpid_blocking_sleep $seconds
            Sets the sleep period, in seconds, of the pseudo-blocking
            calls.  Set to 0 to disable.

            See BLOCKING CALLS for more details.

CALLBACKS

       You can define callbacks in the code, which are called on events like
       starting a process or upon finish. Declare these before the first call
       to start().

       The callbacks can be defined with the following methods:

       run_on_finish $code [, $pid ]
           You can define a subroutine which is called when a child is
           terminated. It is called in the parent process.

           The parameters of the $code are the following:

             - pid of the process, which is terminated
             - exit code of the program
             - identification of the process (if provided in the "start" method)
             - exit signal (0-127: signal number)
             - core dump (1 if there was core dump at exit)
             - datastructure reference or undef (see RETRIEVING DATASTRUCTURES)

       run_on_start $code
           You can define a subroutine which is called when a child is
           started. It is called in the parent process, after the successful
           startup of a child.

           The parameters of the $code are the following:

             - pid of the process which has been started
             - identification of the process (if provided in the "start" method)

       run_on_wait $code, [$period]
           You can define a subroutine which is called when a child process
           has to wait for its startup, that is, while the parent waits for
           a free process slot. If $period is not defined, then one call is
           done per child. If $period is defined, then $code is called
           periodically and the module waits for $period seconds between two
           calls. Note that $period can also be a fractional number. The
           exact "$period seconds" is not guaranteed: signals can shorten it
           and the process scheduler can make it longer (on busy systems).

           The $code is also called in the "start" and the
           "wait_all_children" methods.

           No parameters are passed to the $code on the call.

BLOCKING CALLS

       When it comes to waiting for child processes to terminate,
       "Parallel::ForkManager" is between a fork and a hard place (if you
       excuse the terrible pun). The underlying Perl "waitpid" function that
       the module relies on can block until either one specific child
       process or any child process terminates, but not until one of a given
       group of processes terminates.

       This means that the module can do one of two things when it waits for
       one of its child processes to terminate:

       Only wait for its own child processes
           This is done via a loop using a "waitpid" non-blocking call and a
           sleep statement.  The code does something along the lines of

               while(1) {
                   if ( any of the P::FM child processes terminated ) {
                       return its pid
                   }

                   sleep $sleep_period
               }

           This is the default behavior that the module will use.  This is not
           the most efficient way to wait for child processes, but it's the
           safest way to ensure that "Parallel::ForkManager" won't interfere
           with any other part of the codebase.

           The sleep period is set via the method
           "set_waitpid_blocking_sleep".

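           The polling loop outlined above can be sketched with core Perl
           only. This is an illustration of the technique, not the module's
           actual implementation; the helper wait_for_monitored_child, the
           %monitored hash and the exit codes 3 and 5 are invented for the
           example.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

# Hypothetical helper: poll only the pids in %$monitored and return
# (pid, exit code) for the first one found to have terminated.
sub wait_for_monitored_child {
    my ($monitored, $sleep_period) = @_;
    while (1) {
        for my $pid (keys %$monitored) {
            # WNOHANG makes waitpid return at once if $pid is still running.
            if (waitpid($pid, WNOHANG) == $pid) {
                delete $monitored->{$pid};
                return ($pid, $? >> 8);
            }
        }
        sleep $sleep_period;
    }
}

my %monitored;
for my $code (3, 5) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit $code if $pid == 0;    # child: exit with a distinct code
    $monitored{$pid} = 1;
}

my @exit_codes;
while (%monitored) {
    my ($pid, $code) = wait_for_monitored_child(\%monitored, 0.05);
    push @exit_codes, $code;
}
print "exit codes: @{[ sort { $a <=> $b } @exit_codes ]}\n";
```

           Because each "waitpid" call names one specific pid, children
           forked by other parts of the program are never reaped by this
           loop, at the cost of waking up once per sleep period.
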
       Block until any process terminates
           Alternatively, "Parallel::ForkManager" can call "waitpid" such that
           it will block until any child process terminates. If the child
           process was not one of the monitored subprocesses, the wait will
           resume. This is more efficient, but means that "P::FM" can capture
           (and discard) the termination notification that a different part
           of the code might be waiting for.

           If this is a race condition that doesn't apply to your codebase,
           you can set the waitpid_blocking_sleep period to 0, which will
           enable "waitpid" call blocking.

               my $pm = Parallel::ForkManager->new( 4 );

               $pm->set_waitpid_blocking_sleep(0);  # true blocking calls enabled

               for ( 1..100 ) {
                   $pm->start and next;

                   ...; # do work

                   $pm->finish;
               }

RETRIEVING DATASTRUCTURES from child processes

       The ability for the parent to retrieve data structures is new as of
       version 0.7.6.

       Each child process may optionally send 1 data structure back to the
       parent.  By data structure, we mean a reference to a string, hash or
       array. The contents of the data structure are written out to temporary
       files on disc using the Storable module's store() function. The
       reference is then retrieved from within the code you send to the
       run_on_finish callback.

       What is referenced can be any Perl data which makes sense: a string,
       a numeric value, an array, a hash or an object.

       There are 2 steps involved in retrieving data structures:

       1) A reference to the data structure the child wishes to send back to
       the parent is provided as the second argument to the finish() call. It
       is up to the child to decide whether or not to send anything back to
       the parent.

       2) The data structure reference is retrieved using the callback
       provided in the run_on_finish() method.

       Keep in mind that data structure retrieval is not the same as returning
       a data structure from a method call. That is not what actually occurs.
       The data structure referenced in a given child process is serialized
       and written out to a file by Storable. The file is subsequently read
       back into memory and a new data structure belonging to the parent
       process is created. Keep the returned structure small, since this
       round trip through the file system carries a performance penalty.

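       The round trip described above can be reproduced with Storable alone.
       This sketch runs in a single process and only mimics the two sides;
       the file name and the %result hash are invented for the example, and
       it is not the module's own code.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/child-result.sto";    # hypothetical file name

# "Child" side: serialize the referenced structure to disk.
my %result = (status => 'ok', items => [1, 2, 3]);
store(\%result, $file);

# "Parent" side: read it back; this builds a brand-new structure.
my $copy = retrieve($file);

# Changing the original afterwards does not affect the copy.
$result{status} = 'changed';
push @{ $result{items} }, 4;

print "copy status: $copy->{status}, items: @{ $copy->{items} }\n";
```

       The parent therefore receives a snapshot taken when the child called
       finish(), not a live view of the child's data.
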

EXAMPLES

   Parallel get
       This small example can be used to get URLs in parallel.

         use Parallel::ForkManager;
         use LWP::Simple;

         my $pm = Parallel::ForkManager->new(10);

         LINKS:
         for my $link (@ARGV) {
           $pm->start and next LINKS;
           my ($fn) = $link =~ /^.*\/(.*?)$/;
           if (!$fn) {
             warn "Cannot determine filename from $link\n";
           } else {
             $0 .= " " . $fn;
             print "Getting $fn from $link\n";
             my $rc = getstore($link, $fn);
             print "$link downloaded. response code: $rc\n";
           }
           $pm->finish;
         }
         $pm->wait_all_children;
   Callbacks
       Example of a program using callbacks to get child exit codes:

         use strict;
         use Parallel::ForkManager;

         my $max_procs = 5;
         my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );

         my $pm = Parallel::ForkManager->new($max_procs);

         # Set up a callback for when a child finishes up so we can
         # get its exit code
         $pm->run_on_finish( sub {
             my ($pid, $exit_code, $ident) = @_;
             print "** $ident just got out of the pool ".
               "with PID $pid and exit code: $exit_code\n";
         });

         $pm->run_on_start( sub {
             my ($pid, $ident)=@_;
             print "** $ident started, pid: $pid\n";
         });

         $pm->run_on_wait( sub {
             print "** Have to wait for a child ...\n"
           },
           0.5
         );

         NAMES:
         foreach my $child ( 0 .. $#names ) {
           my $pid = $pm->start($names[$child]) and next NAMES;

           # This code is the child process
           print "This is $names[$child], Child number $child\n";
           sleep ( 2 * $child );
           print "$names[$child], Child $child is about to get out...\n";
           sleep 1;
           $pm->finish($child); # pass an exit code to finish
         }

         print "Waiting for Children...\n";
         $pm->wait_all_children;
         print "Everybody is out of the pool!\n";

   Data structure retrieval
       In this simple example, each child sends back a string reference.

         use Parallel::ForkManager 0.7.6;
         use strict;

         my $pm = Parallel::ForkManager->new(2, '/server/path/to/temp/dir/');

         # data structure retrieval and handling
         $pm -> run_on_finish ( # called BEFORE the first call to start()
           sub {
             my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

             # retrieve data structure from child
             if (defined($data_structure_reference)) {  # children are not forced to send anything
               my $string = ${$data_structure_reference};  # child passed a string reference
               print "$string\n";
             }
             else {  # problems occurring during storage or retrieval will throw a warning
               print qq|No message received from child process $pid!\n|;
             }
           }
         );

         # prep random statement components
         my @foods = ('chocolate', 'ice cream', 'peanut butter', 'pickles', 'pizza', 'bacon', 'pancakes', 'spaghetti', 'cookies');
         my @preferences = ('loves', q|can't stand|, 'always wants more', 'will walk 100 miles for', 'only eats', 'would starve rather than eat');

         # run the parallel processes
         PERSONS:
         foreach my $person (qw(Fred Wilma Ernie Bert Lucy Ethel Curly Moe Larry)) {
           $pm->start() and next PERSONS;

           # generate a random statement about food preferences
           my $statement = $person . ' ' . $preferences[int(rand @preferences)] . ' ' . $foods[int(rand @foods)];

           # send it back to the parent process
           $pm->finish(0, \$statement);  # note that it's a scalar REFERENCE, not the scalar itself
         }
         $pm->wait_all_children;

       A second data structure retrieval example demonstrates how children
       decide whether or not to send anything back, what to send and how the
       parent should process whatever is retrieved.

         use Parallel::ForkManager 0.7.6;
         use Data::Dumper;  # to display the data structures retrieved.
         use strict;

         my $pm = Parallel::ForkManager->new(20);  # using the system temp dir via File::Temp::tempdir()

         # data structure retrieval and handling
         my %retrieved_responses = ();  # for collecting responses
         $pm -> run_on_finish (
           sub {
             my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

             # see what the child sent us, if anything
             if (defined($data_structure_reference)) {  # test rather than assume child sent anything
               my $reftype = ref($data_structure_reference);
               print qq|ident "$ident" returned a "$reftype" reference.\n\n|;
               if (1) {  # simple on/off switch to display the contents
                 print &Dumper($data_structure_reference) . qq|end of "$ident" sent structure\n\n|;
               }

               # we can also collect retrieved data structures for processing after all children have exited
               $retrieved_responses{$ident} = $data_structure_reference;
             } else {
               print qq|ident "$ident" did not send anything.\n\n|;
             }
           }
         );

         # generate a list of instructions
         my @instructions = (  # a unique identifier and what the child process should send
           {'name' => '%ENV keys as a string', 'send' => 'keys'},
           {'name' => 'Send Nothing'},  # not instructing the child to send anything back to the parent
           {'name' => 'Childs %ENV', 'send' => 'all'},
           {'name' => 'Child chooses randomly', 'send' => 'random'},
           {'name' => 'Invalid send instructions', 'send' => 'Na Na Nana Na'},
           {'name' => 'ENV values in an array', 'send' => 'values'},
         );

         INSTRUCTS:
         foreach my $instruction (@instructions) {
           $pm->start($instruction->{'name'}) and next INSTRUCTS;  # this time we are using an explicit, unique child process identifier

           # last step in child processing
           $pm->finish(0) unless $instruction->{'send'};  # no data structure is sent unless this child is told what to send.

           if ($instruction->{'send'} eq 'keys') {
             $pm->finish(0, \join(', ', keys %ENV));

           } elsif ($instruction->{'send'} eq 'values') {
             $pm->finish(0, [values %ENV]);  # kinda useless without knowing which keys they belong to...

           } elsif ($instruction->{'send'} eq 'all') {
             $pm->finish(0, \%ENV);  # remember, we are not "returning" anything, just copying the hash to disc

           # demonstrate clearly that the child determines what type of reference to send
           } elsif ($instruction->{'send'} eq 'random') {
             my $string = q|I'm just a string.|;
             my @array = qw(I am an array);
             my %hash = (type => 'associative array', synonym => 'hash', cool => 'very :)');
             my $return_choice = ('string', 'array', 'hash')[int(rand 3)];  # randomly choose return data type
             $pm->finish(0, \$string) if ($return_choice eq 'string');
             $pm->finish(0, \@array) if ($return_choice eq 'array');
             $pm->finish(0, \%hash) if ($return_choice eq 'hash');

           # as a responsible child, inform parent that their instruction was invalid
           } else {
             $pm->finish(0, \qq|Invalid instructions: "$instruction->{'send'}".|);  # ordinarily I wouldn't include invalid input in a response...
           }
         }
         $pm->wait_all_children;  # blocks until all forked processes have exited

         # post fork processing of returned data structures
         for (sort keys %retrieved_responses) {
           print qq|Post processing "$_"...\n|;
         }

USING RAND() IN FORKED PROCESSES

       A caveat worth noting is that all forked processes will use the same
       random seed, and so may produce the same results (see
       <http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html>).
       If you are using "rand()" and want each forked child to use a different
       seed, you can add the following to your program:

           $pm->run_on_start(sub { srand });

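       The inherited seed can be demonstrated with core Perl only. The
       sketch below does not use the callback shown above; it is an
       alternative illustration that calls "srand" directly inside the
       child. The helper first_draw_in_child and the pipe used to report
       the number are invented for the example.

```perl
use strict;
use warnings;

rand();    # using rand in the parent seeds the generator before any fork

# Hypothetical helper: fork a child, optionally reseed it, and
# return the first random number the child draws.
sub first_draw_in_child {
    my ($reseed) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        srand() if $reseed;        # pick a fresh seed in this child
        print {$writer} rand();
        close $writer;
        exit 0;
    }
    close $writer;
    my $value = <$reader>;
    close $reader;
    waitpid($pid, 0);
    return $value;
}

my @inherited = map { first_draw_in_child(0) } 1 .. 2;
my @reseeded  = map { first_draw_in_child(1) } 1 .. 2;

print "inherited seed: @inherited\n";    # the same number twice
print "reseeded:       @reseeded\n";     # almost certainly two different numbers
```
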

EXTENDING

       As of version 2.0.0, "Parallel::ForkManager" uses Moo under the hood.
       When a process is being forked from the parent object, the forked
       instance of the object will be modified to consume the
       Parallel::ForkManager::Child role. All of this makes extending
       Parallel::ForkManager to implement any storing/retrieving mechanism or
       any other behavior fairly easy.

   Example: store and retrieve data via a web service
           {
               package Parallel::ForkManager::Web;

               use HTTP::Tiny;

               use Moo;
               extends 'Parallel::ForkManager';

               has ua => (
                   is => 'ro',
                   lazy => 1,
                   default => sub {
                       HTTP::Tiny->new;
                   }
               );

               sub store {
                   my( $self, $data ) = @_;

                   # HTTP::Tiny takes the request body under the "content" key
                   $self->ua->post( "http://.../store/$$", { content => $data } );
               }

               sub retrieve {
                   my( $self, $kid_id ) = @_;

                   $self->ua->get( "http://.../store/$kid_id" )->{content};
               }

           }

           my $fm = Parallel::ForkManager::Web->new(2);

           $fm->run_on_finish(sub{
               my $retrieved = $_[5];

               print "got ", $retrieved, "\n";
           });

           $fm->start_child(sub {
               return $_**2;
           }) for 1..3;

           $fm->wait_all_children;

   Example: have the child processes exit differently
           use Parallel::ForkManager;

           package Parallel::ForkManager::Child::PosixExit {
               use Moo::Role;
               with 'Parallel::ForkManager::Child';

               use POSIX ();

               # POSIX::_exit requires an explicit status argument
               sub finish {
                   my( $self, $exit_code ) = @_;
                   POSIX::_exit( $exit_code || 0 );
               }
           }

           my $fm = Parallel::ForkManager->new(
               max_procs  => 1,
               child_role => 'Parallel::ForkManager::Child::PosixExit'
           );
578   Example: have the child processes exit differently
579           use Parallel::ForkManager;
580
581           package Parallel::ForkManager::Child::PosixExit {
582               use Moo::Role;
583               with 'Parallel::ForkManager::Child';
584
585               sub finish  { POSIX::_exit() };
586           }
587
588           my $fm = Parallel::ForkManager->new(
589               max_proc   => 1,
590               child_role => 'Parallel::ForkManager::Child::PosixExit'
591           );
592

SECURITY

       Parallel::ForkManager uses temporary files when a child process returns
       information to its parent process. The filenames are based on the
       process IDs of the parent and child processes, so they are fairly easy
       to guess. If security is a concern in your environment, make sure the
       directory used by Parallel::ForkManager is restricted to the current
       user only (the default behavior is to create a directory, via
       File::Temp's "tempdir", which does that).

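       If you pass your own directory as the second constructor argument,
       one way to keep it private is to create it with File::Temp as well.
       This is a minimal sketch; the variable names are invented for the
       example, and the permission check at the end is only there to show
       what to verify on a Unix-like system.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Parallel::ForkManager;

# tempdir creates a fresh directory for the current user;
# CLEANUP => 1 removes it when the program exits.
my $private_dir = tempdir(CLEANUP => 1);

# Pass the directory as the optional second constructor argument.
my $pm = Parallel::ForkManager->new(4, $private_dir);

# Permission bits of the directory; group and other bits should be unset.
my $mode = (stat $private_dir)[2] & 07777;
printf "temp dir mode: %04o\n", $mode;
```
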

TROUBLESHOOTING

   PerlIO::gzip and Parallel::ForkManager do not play nice together
       If you are using PerlIO::gzip in your child processes, you may end up
       with garbled files. This is not really P::FM's fault, but rather a
       problem between PerlIO::gzip and "fork()" (see
       <https://rt.cpan.org/Public/Bug/Display.html?id=114557>).

       Fortunately, it seems there is an easy way to fix the problem by adding
       the "unix" layer, i.e.,

           open(IN, '<:unix:gzip', ...

BUGS AND LIMITATIONS

       Do not use Parallel::ForkManager in an environment where other child
       processes can affect the run of the main program; using this module is
       not recommended in an environment where fork() / wait() is already
       used.

       If you want to use more than one copy of Parallel::ForkManager, then
       you have to make sure that all child processes are terminated before
       you use the second object in the main program.

       You are free to use a new copy of Parallel::ForkManager in the child
       processes, although I don't think it makes sense.

CREDITS

         Michael Gang (bug report)
         Noah Robin <sitz@onastick.net> (documentation tweaks)
         Chuck Hirstius <chirstius@megapathdsl.net> (callback exit status, example)
         Grant Hopwood <hopwoodg@valero.com> (win32 port)
         Mark Southern <mark_southern@merck.com> (bugfix)
         Ken Clarke <www.perlprogrammer.net>  (datastructure retrieval)

AUTHORS

       •   dLux (Szabó, Balázs) <dlux@dlux.hu>

       •   Yanick Champoux <yanick@cpan.org>

       •   Gabor Szabo <gabor@szabgab.com>

COPYRIGHT AND LICENSE

       This software is copyright (c) 2018, 2016, 2015 by Balázs Szabó.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.32.1                      2021-01-27          Parallel::ForkManager(3)