Parallel::ForkManager(3)  User Contributed Perl Documentation Parallel::ForkManager(3)

NAME
       Parallel::ForkManager - A simple parallel processing fork manager

VERSION
       version 1.20

SYNOPSIS
           use Parallel::ForkManager;

           my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

           DATA_LOOP:
           foreach my $data (@all_data) {
             # Forks and returns the pid for the child:
             my $pid = $pm->start and next DATA_LOOP;

             ... do some work with $data in the child process ...

             $pm->finish; # Terminates the child process
           }

DESCRIPTION
       This module is intended for use in operations that can be done in
       parallel where the number of processes to be forked off should be
       limited. Typical use is a downloader which will be retrieving
       hundreds/thousands of files.

       The code for a downloader would look something like this:

           use LWP::Simple;
           use Parallel::ForkManager;

           ...

           my @links=(
             ["http://www.foo.bar/rulez.data","rulez_data.txt"],
             ["http://new.host/more_data.doc","more_data.doc"],
             ...
           );

           ...

           # Max 30 processes for parallel download
           my $pm = Parallel::ForkManager->new(30);

           LINKS:
           foreach my $linkarray (@links) {
             $pm->start and next LINKS; # do the fork

             my ($link, $fn) = @$linkarray;
             warn "Cannot get $fn from $link"
               if getstore($link, $fn) != RC_OK;

             $pm->finish; # do the exit in the child process
           }
           $pm->wait_all_children;

       First you need to instantiate the ForkManager with the "new"
       constructor. You must specify the maximum number of processes to be
       created. If you specify 0, then NO fork will be done; this is good for
       debugging purposes.

       Next, use $pm->start to do the fork. $pm->start returns 0 in the child
       process and the child's pid in the parent process (see also "fork()"
       in perlfunc(1p)). In the parent, the "and next" skips the rest of the
       loop body and moves on to the next item. NOTE: $pm->start dies if the
       fork fails.

       $pm->finish terminates the child process (assuming a fork was done in
       the "start").

       NOTE: You cannot use $pm->start if you are already in the child
       process. If you want to manage another set of subprocesses in the
       child process, you must instantiate another Parallel::ForkManager
       object!
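
       The start/finish/wait_all_children pattern above can be pictured with
       core Perl alone. The sketch below is a hand-rolled illustration, not
       Parallel::ForkManager's implementation: the limit of 3, the made-up
       work items 1 to 8, the helper name reap_one and the use of each
       child's exit code to carry a small result are all assumptions of this
       example. It reaps with a blocking waitpid(-1, 0), which is only safe
       because this script has no other children (see BLOCKING CALLS).

```perl
use strict;
use warnings;

# Hand-rolled sketch, not Parallel::ForkManager's code: run at most $max
# children at once and collect every child's exit code.
my $max      = 3;           # plays the role of $MAX_PROCESSES
my @all_data = (1 .. 8);    # made-up work items
my %running;                # pid => work item
my %exit_code_for;          # work item => exit code of the child that handled it

# Block until one child exits, then record its exit code.
sub reap_one {
    my $pid = waitpid(-1, 0);
    die "no child to wait for" if $pid <= 0;
    my $data = delete $running{$pid};
    $exit_code_for{$data} = $? >> 8 if defined $data;
    return;
}

for my $data (@all_data) {
    # Like start(): wait for a free slot before forking again.
    reap_one() while scalar(keys %running) >= $max;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: "do some work with $data", then exit (the job of finish()).
        exit($data * 2);
    }
    $running{$pid} = $data; # parent: remember the child, move on to the next item
}

# Like wait_all_children(): reap everything still running.
reap_one() while %running;

printf "%d => %d\n", $_, $exit_code_for{$_} for sort { $a <=> $b } keys %exit_code_for;
```

       In this picture, start's wait for a free slot is the reap_one() loop
       at the top of each iteration, and finish is simply the child's exit.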

METHODS
       The comment letter indicates where the method should be run. P for
       parent, C for child.

       new $processes [, $tempdir]
           Instantiate a new Parallel::ForkManager object. You must specify
           the maximum number of children to fork off. If you specify 0
           (zero), then no children will be forked. This is intended for
           debugging purposes.

           The optional second parameter, $tempdir, is only used if you want
           the children to send back a reference to some data (see RETRIEVING
           DATASTRUCTURES below). If not provided, it is set via a call to
           File::Temp::tempdir().

           The new method will die if the temporary directory does not exist
           or it is not a directory.

       start [ $process_identifier ]
           This method does the fork. It returns the pid of the child process
           for the parent, and 0 for the child process. If the $processes
           parameter for the constructor is 0 then, assuming you're in the
           child process, $pm->start simply returns 0.

           An optional $process_identifier can be provided to this method.
           It is used by the "run_on_finish" callback (see CALLBACKS) for
           identifying the finished process.

       start_child [ $process_identifier, ] \&callback
           Like "start", but will run the &callback as the child. If the
           callback returns anything, it'll be passed as the data to transmit
           back to the parent process via "finish()".

       finish [ $exit_code [, $data_structure_reference] ]
           Closes the child process by exiting and accepts an optional exit
           code (default exit code is 0) which can be retrieved in the parent
           via callback. If the second optional parameter is provided, the
           child attempts to send its contents back to the parent. If you use
           the program in debug mode ($processes == 0), this method just
           calls the callback.

           If the $data_structure_reference is provided, then it is
           serialized and passed to the parent process. See RETRIEVING
           DATASTRUCTURES for more info.

       set_max_procs $processes
           Allows you to set a new maximum number of children to maintain.

       wait_all_children
           You can call this method to wait for all the processes which have
           been forked. This is a blocking wait.

       reap_finished_children
           This is a non-blocking call to reap children and execute callbacks
           independent of calls to "start" or "wait_all_children". Use this
           in scenarios where "start" is called infrequently but you would
           like the callbacks executed quickly.

       is_parent
           Returns "true" if within the parent or "false" if within the
           child.

       is_child
           Returns "true" if within the child or "false" if within the
           parent.

       max_procs
           Returns the maximal number of processes the object will fork.

       running_procs
           Returns the pids of the forked processes currently monitored by
           the "Parallel::ForkManager". Note that children are still reported
           as running until the fork manager harvests them, via the next call
           to "start" or "wait_all_children".

               my @pids = $pm->running_procs;

               my $nbr_children =()= $pm->running_procs;

       wait_for_available_procs( $n )
           Waits until $n process slots are available. If $n is not given,
           it defaults to 1.

       waitpid_blocking_sleep
           Returns the sleep period, in seconds, of the pseudo-blocking
           calls. The sleep period can be a fraction of a second.

           Returns 0 if disabled.

           Defaults to 1 second.

           See BLOCKING CALLS for more details.

       set_waitpid_blocking_sleep $seconds
           Sets the sleep period, in seconds, of the pseudo-blocking calls.
           Set to 0 to disable.

           See BLOCKING CALLS for more details.
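
       Counting what a list-returning method such as running_procs gives
       back is a plain Perl matter. The sketch below uses a hypothetical
       stand-in sub, fake_running_procs, with made-up pids, since the real
       method needs live children. Assigning to an empty list and then to a
       scalar (the =()= idiom) yields the element count, whereas a plain
       scalar assignment from a sub that returns a literal list yields its
       last element instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of pids;
# the pids are made up for illustration.
sub fake_running_procs { return (101, 102, 103) }

my @pids  = fake_running_procs();       # list context: every pid
my $count = () = fake_running_procs();  # "=()=": a list assignment in scalar context gives the count
my $last  = fake_running_procs();       # scalar context: the comma operator yields the last element

print "pids=@pids count=$count last=$last\n";   # prints "pids=101 102 103 count=3 last=103"
```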

CALLBACKS
       You can define callbacks in the code, which are called on events like
       starting a process or upon finish. Declare these before the first call
       to start().

       The callbacks can be defined with the following methods:

       run_on_finish $code [, $pid ]
           You can define a subroutine which is called when a child is
           terminated. It is called in the parent process.

           The parameters of the $code are the following:

               - pid of the process that terminated
               - exit code of the program
               - identification of the process (if provided in the "start" method)
               - exit signal (number of the signal, 0-127, that terminated the child; 0 if none)
               - core dump (1 if there was a core dump at exit, else 0)
               - datastructure reference or undef (see RETRIEVING DATASTRUCTURES)

       run_on_start $code
           You can define a subroutine which is called when a child is
           started. It is called after the successful startup of a child in
           the parent process.

           The parameters of the $code are the following:

               - pid of the process which has been started
               - identification of the process (if provided in the "start" method)

       run_on_wait $code, [$period]
           You can define a subroutine which is called while the parent
           process has to wait for a child to exit. If $period is not
           defined, then one call is done per child. If $period is defined,
           then $code is called periodically and the module waits for
           $period seconds between two calls. Note, $period can be a
           fractional number also. The exact "$period seconds" is not
           guaranteed: signals can shorten it and the process scheduler can
           make it longer (on busy systems).

           The $code is called in both the "start" and the
           "wait_all_children" methods.

           No parameters are passed to the $code on the call.
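
       The exit code, exit signal and core-dump parameters of run_on_finish
       look like the usual three-way split of a Perl wait status ($?), as
       documented in perlvar. Assuming that correspondence, the sketch below
       decodes a status with core Perl; decode_status is a hypothetical
       helper, and the two children (one exiting with 5, one killed by
       SIGTERM) exist only for the demonstration.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

# Split a wait status into (exit code, signal number, core-dump flag),
# following the layout of $? described in perlvar.
sub decode_status {
    my ($status) = @_;
    return (
        $status >> 8,              # value the child passed to exit()
        $status & 127,             # signal that killed the child, 0 if none
        ($status & 128) ? 1 : 0,   # 1 if a core dump was produced
    );
}

# Child 1 exits normally with code 5.
my $pid1 = fork();
die "fork failed: $!" unless defined $pid1;
exit 5 if $pid1 == 0;
waitpid($pid1, 0);
my @normal = decode_status($?);

# Child 2 is terminated by SIGTERM (sent to itself).
my $pid2 = fork();
die "fork failed: $!" unless defined $pid2;
if ($pid2 == 0) { kill 'TERM', $$; sleep 5; exit 0 }
waitpid($pid2, 0);
my @killed = decode_status($?);

print "normal: @normal\n";   # prints "normal: 5 0 0"
print "killed: @killed\n";   # exit code 0, the SIGTERM number, core flag 0
```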

BLOCKING CALLS
       When it comes to waiting for child processes to terminate,
       "Parallel::ForkManager" is between a fork and a hard place (if you
       will excuse the terrible pun). The underlying Perl "waitpid" function
       that the module relies on can block until either one specific child
       process or any child process terminates, but not until any one of an
       arbitrary set of processes (such as the children of one
       Parallel::ForkManager object) terminates.

       This means that the module can do one of two things when it waits for
       one of its child processes to terminate:

       Only wait for its own child processes
           This is done via a loop using a "waitpid" non-blocking call and a
           sleep statement. The code does something along the lines of

               while(1) {
                 if ( any of the P::FM child processes terminated ) {
                   return its pid
                 }

                 sleep $sleep_period
               }

           This is the default behavior that the module will use. This is not
           the most efficient way to wait for child processes, but it's the
           safest way to ensure that "Parallel::ForkManager" won't interfere
           with any other part of the codebase.

           The sleep period is set via the method
           "set_waitpid_blocking_sleep".

       Block until any process terminates
           Alternatively, "Parallel::ForkManager" can call "waitpid" such
           that it will block until any child process terminates. If the
           child process was not one of the monitored subprocesses, the wait
           will resume. This is more efficient, but means that "P::FM" can
           capture (and discard) the termination notification that a
           different part of the code might be waiting for.

           If this race condition doesn't apply to your codebase, you can
           set the waitpid_blocking_sleep period to 0, which will enable
           "waitpid" call blocking.

               use Parallel::ForkManager;

               my $pm = Parallel::ForkManager->new( 4 );

               $pm->set_waitpid_blocking_sleep(0);  # true blocking calls enabled

               for ( 1..100 ) {
                 $pm->start and next;

                 ...; # do work

                 $pm->finish;
               }

               $pm->wait_all_children;
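
       The first strategy, polling only one's own children, can be sketched
       with core Perl. This is an illustration of the idea, not the module's
       own code: wait_for_one_of is a hypothetical helper, Time::HiRes
       provides the fractional sleep, and the "foreign" child stands in for
       a process that some other part of the program is responsible for.
       Because the loop only polls its own pids with the WNOHANG flag, the
       foreign child's exit status is left for its owner; a blocking
       waitpid(-1, 0) in the same spot could have collected it instead.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

# Illustrative sketch, not the module's own code: poll only the pids we own,
# sleeping between rounds, so other children's statuses are left alone.
sub wait_for_one_of {
    my ($sleep_period, @mine) = @_;
    while (1) {
        for my $pid (@mine) {
            # With WNOHANG, waitpid returns the pid if that child has exited
            # and 0 if it is still running.
            return ($pid, $? >> 8) if waitpid($pid, WNOHANG) == $pid;
        }
        sleep $sleep_period;    # fractional seconds via Time::HiRes
    }
}

# A "foreign" child that another part of the program is responsible for.
my $foreign = fork();
die "fork failed: $!" unless defined $foreign;
exit 7 if $foreign == 0;

# A child owned by our hypothetical pool; it exits a little later.
my $mine = fork();
die "fork failed: $!" unless defined $mine;
if ($mine == 0) { sleep 0.2; exit 3 }

my ($done, $code) = wait_for_one_of(0.05, $mine);
print "pool child $done exited with code $code\n";

# The foreign child's status is still waiting for its owner.
my $reaped       = waitpid($foreign, 0);
my $foreign_code = $? >> 8;
print "foreign child $reaped exited with code $foreign_code\n";
```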

RETRIEVING DATASTRUCTURES
       The ability for the parent to retrieve data structures is new as of
       version 0.7.6.

       Each child process may optionally send 1 data structure back to the
       parent. By data structure, we mean a reference to a string, hash or
       array. The contents of the data structure are written out to temporary
       files on disc using the Storable module's store() function. The
       reference is then retrieved from within the code you send to the
       run_on_finish callback.

       The data structure can be any scalar perl data structure which makes
       sense: string, numeric value or a reference to an array, hash or
       object.

       There are 2 steps involved in retrieving data structures:

       1) A reference to the data structure the child wishes to send back to
       the parent is provided as the second argument to the finish() call. It
       is up to the child to decide whether or not to send anything back to
       the parent.

       2) The data structure reference is retrieved using the callback
       provided in the run_on_finish() method.

       Keep in mind that data structure retrieval is not the same as returning
       a data structure from a method call. That is not what actually occurs.
       The data structure referenced in a given child process is serialized
       and written out to a file by Storable. The file is subsequently read
       back into memory and a new data structure belonging to the parent
       process is created. This round trip through the disk has a performance
       cost, so try to keep the returned structure small.
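
       The serialize-then-read-back round trip can be imitated with Storable
       and File::Temp directly. This is a sketch of the idea, not the
       module's code: the result_file naming scheme and the sample hash are
       hypothetical, and the child leaves through POSIX::_exit so that no
       END-block cleanup (such as File::Temp's) runs in the child; only the
       parent tidies up the directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use POSIX ();
use Storable qw(store retrieve);

# Sketch of the two steps described above (not the module's own code):
# the child serializes a reference to a file, the parent reads back a copy.
my $dir = tempdir(CLEANUP => 1);   # removed when the parent exits

# Hypothetical naming scheme: one file per child pid.
sub result_file { File::Spec->catfile($dir, "child-$_[0].sto") }

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    my %result = (pid => $$, squares => [ map { $_ * $_ } 1 .. 4 ]);
    my $ok = eval { store(\%result, result_file($$)); 1 };
    # _exit skips END blocks, so temp-dir cleanup only ever runs in the parent.
    POSIX::_exit($ok ? 0 : 1);
}

waitpid($pid, 0);
die "child could not store its result" if $? >> 8;
my $copy = retrieve(result_file($pid));   # a new hash owned by the parent
print "child $copy->{pid} sent squares @{ $copy->{squares} }\n";
```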

EXAMPLES
   Parallel get
       This small example can be used to get URLs in parallel.

           use Parallel::ForkManager;
           use LWP::Simple;

           my $pm = Parallel::ForkManager->new(10);

           LINKS:
           for my $link (@ARGV) {
             $pm->start and next LINKS;
             my ($fn) = $link =~ /^.*\/(.*?)$/;
             if (!$fn) {
               warn "Cannot determine filename from $link\n";
             } else {
               $0 .= " " . $fn;
               print "Getting $fn from $link\n";
               my $rc = getstore($link, $fn);
               print "$link downloaded. response code: $rc\n";
             }
             $pm->finish;
           }
           $pm->wait_all_children;

   Callbacks
       Example of a program using callbacks to get child exit codes:

           use strict;
           use Parallel::ForkManager;

           my $max_procs = 5;
           my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );

           my $pm = Parallel::ForkManager->new($max_procs);

           # Setup a callback for when a child finishes up so we can
           # get its exit code
           $pm->run_on_finish( sub {
               my ($pid, $exit_code, $ident) = @_;
               print "** $ident just got out of the pool ".
                 "with PID $pid and exit code: $exit_code\n";
           });

           $pm->run_on_start( sub {
               my ($pid, $ident)=@_;
               print "** $ident started, pid: $pid\n";
           });

           $pm->run_on_wait( sub {
               print "** Have to wait for one child ...\n"
             },
             0.5
           );

           NAMES:
           foreach my $child ( 0 .. $#names ) {
             my $pid = $pm->start($names[$child]) and next NAMES;

             # This code is the child process
             print "This is $names[$child], Child number $child\n";
             sleep ( 2 * $child );
             print "$names[$child], Child $child is about to get out...\n";
             sleep 1;
             $pm->finish($child); # pass an exit code to finish
           }

           print "Waiting for Children...\n";
           $pm->wait_all_children;
           print "Everybody is out of the pool!\n";

   Data structure retrieval
       In this simple example, each child sends back a string reference.

           use Parallel::ForkManager 0.7.6;
           use strict;

           my $pm = Parallel::ForkManager->new(2, '/server/path/to/temp/dir/');

           # data structure retrieval and handling
           $pm -> run_on_finish ( # called BEFORE the first call to start()
             sub {
               my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

               # retrieve data structure from child
               if (defined($data_structure_reference)) {  # children are not forced to send anything
                 my $string = ${$data_structure_reference};  # child passed a string reference
                 print "$string\n";
               }
               else {  # problems occurring during storage or retrieval will throw a warning
                 print qq|No message received from child process $pid!\n|;
               }
             }
           );

           # prep random statement components
           my @foods = ('chocolate', 'ice cream', 'peanut butter', 'pickles', 'pizza', 'bacon', 'pancakes', 'spaghetti', 'cookies');
           my @preferences = ('loves', q|can't stand|, 'always wants more', 'will walk 100 miles for', 'only eats', 'would starve rather than eat');

           # run the parallel processes
           PERSONS:
           foreach my $person (qw(Fred Wilma Ernie Bert Lucy Ethel Curly Moe Larry)) {
             $pm->start() and next PERSONS;

             # generate a random statement about food preferences
             my $statement = $person . ' ' . $preferences[int(rand @preferences)] . ' ' . $foods[int(rand @foods)];

             # send it back to the parent process
             $pm->finish(0, \$statement);  # note that it's a scalar REFERENCE, not the scalar itself
           }
           $pm->wait_all_children;

       A second datastructure retrieval example demonstrates how children
       decide whether or not to send anything back, what to send and how the
       parent should process whatever is retrieved.

           use Parallel::ForkManager 0.7.6;
           use Data::Dumper;  # to display the data structures retrieved.
           use strict;

           my $pm = Parallel::ForkManager->new(20);  # using the system temp dir (File::Temp::tempdir())

           # data structure retrieval and handling
           my %retrieved_responses = ();  # for collecting responses
           $pm -> run_on_finish (
             sub {
               my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

               # see what the child sent us, if anything
               if (defined($data_structure_reference)) {  # test rather than assume child sent anything
                 my $reftype = ref($data_structure_reference);
                 print qq|ident "$ident" returned a "$reftype" reference.\n\n|;
                 if (1) {  # simple on/off switch to display the contents
                   print &Dumper($data_structure_reference) . qq|end of "$ident" sent structure\n\n|;
                 }

                 # we can also collect retrieved data structures for processing after all children have exited
                 $retrieved_responses{$ident} = $data_structure_reference;
               } else {
                 print qq|ident "$ident" did not send anything.\n\n|;
               }
             }
           );

           # generate a list of instructions
           my @instructions = (  # a unique identifier and what the child process should send
             {'name' => '%ENV keys as a string', 'send' => 'keys'},
             {'name' => 'Send Nothing'},  # not instructing the child to send anything back to the parent
             {'name' => 'Childs %ENV', 'send' => 'all'},
             {'name' => 'Child chooses randomly', 'send' => 'random'},
             {'name' => 'Invalid send instructions', 'send' => 'Na Na Nana Na'},
             {'name' => 'ENV values in an array', 'send' => 'values'},
           );

           INSTRUCTS:
           foreach my $instruction (@instructions) {
             $pm->start($instruction->{'name'}) and next INSTRUCTS;  # this time we are using an explicit, unique child process identifier

             # last step in child processing
             $pm->finish(0) unless $instruction->{'send'};  # no data structure is sent unless this child is told what to send.

             if ($instruction->{'send'} eq 'keys') {
               $pm->finish(0, \join(', ', keys %ENV));

             } elsif ($instruction->{'send'} eq 'values') {
               $pm->finish(0, [values %ENV]);  # kinda useless without knowing which keys they belong to...

             } elsif ($instruction->{'send'} eq 'all') {
               $pm->finish(0, \%ENV);  # remember, we are not "returning" anything, just copying the hash to disc

             # demonstrate clearly that the child determines what type of reference to send
             } elsif ($instruction->{'send'} eq 'random') {
               my $string = q|I'm just a string.|;
               my @array = qw(I am an array);
               my %hash = (type => 'associative array', synonym => 'hash', cool => 'very :)');
               my $return_choice = ('string', 'array', 'hash')[int(rand 3)];  # randomly choose return data type
               $pm->finish(0, \$string) if ($return_choice eq 'string');
               $pm->finish(0, \@array) if ($return_choice eq 'array');
               $pm->finish(0, \%hash) if ($return_choice eq 'hash');

             # as a responsible child, inform parent that their instruction was invalid
             } else {
               $pm->finish(0, \qq|Invalid instructions: "$instruction->{'send'}".|);  # ordinarily I wouldn't include invalid input in a response...
             }
           }
           $pm->wait_all_children;  # blocks until all forked processes have exited

           # post fork processing of returned data structures
           for (sort keys %retrieved_responses) {
             print qq|Post processing "$_"...\n|;
           }

USING RAND() IN FORKED PROCESSES
       A caveat worth noting is that forked processes inherit the parent's
       random number generator state: if the parent has already called
       "rand()" or "srand()" (directly or through a module) before forking,
       all children will use the same seed and may produce the same results
       (see
       <http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html>).
       If you are using "rand()" and want each forked child to use a different
       seed, you can add the following to your program:

           $pm->run_on_start(sub { srand });
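
       The effect can be reproduced with core Perl. In this sketch,
       child_rolls is a hypothetical helper, and passing each child's
       int(rand 256) back as its exit code is just a convenient channel. The
       parent calls srand(42) before forking, so every child that does not
       reseed continues from the same generator state and draws the same
       number. Calling srand() inside the child, right after the fork, is
       shown as another way to give each child its own seed.

```perl
use strict;
use warnings;

# Hypothetical helper: fork $n children, each exiting with int(rand 256),
# and return those numbers. With $reseed true, each child calls srand() first.
sub child_rolls {
    my ($n, $reseed) = @_;
    my @pids;
    for (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            srand() if $reseed;     # give this child its own seed
            exit int(rand 256);     # the exit code carries the "result"
        }
        push @pids, $pid;
    }
    my @rolls;
    for my $pid (@pids) {
        waitpid($pid, 0);
        push @rolls, $? >> 8;
    }
    return @rolls;
}

srand(42);                            # the parent seeds before forking
my @inherited = child_rolls(3, 0);    # all three continue from the same state
my @reseeded  = child_rolls(3, 1);    # each child reseeds itself
print "inherited: @inherited\n";
print "reseeded:  @reseeded\n";
```

       The reseeded values will usually differ from child to child, but that
       is not guaranteed for any single run.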

SECURITY
       Parallel::ForkManager uses temporary files when a child process returns
       information to its parent process. The filenames are based on the
       process IDs of the parent and child processes, so they are fairly easy
       to guess. So if security is a concern in your environment, make sure
       the directory used by Parallel::ForkManager is restricted to the
       current user only (the default behavior is to create a directory, via
       File::Temp's "tempdir", which does that).
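
       To control the directory yourself, for example to pass it as the
       optional $tempdir argument of new, File::Temp's tempdir can create
       one that only the current user can access. The sketch below assumes
       that File::Temp creates the directory with owner-only permissions
       (mode 0700, which the umask can only narrow further); the 'pfm-XXXXXX'
       template is an arbitrary choice for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Owner-only directory under the system temp dir, removed at program exit.
my $dir  = tempdir('pfm-XXXXXX', TMPDIR => 1, CLEANUP => 1);
my $mode = (stat $dir)[2] & 0777;   # permission bits only
printf "%s has mode %04o\n", $dir, $mode;

# It could then be handed to the constructor, e.g.:
#   my $pm = Parallel::ForkManager->new($max_procs, $dir);
```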

TROUBLESHOOTING
   PerlIO::gzip and Parallel::ForkManager do not play nice together
       If you are using PerlIO::gzip in your child processes, you may end up
       with garbled files. This is not really P::FM's fault, but rather a
       problem between PerlIO::gzip and "fork()" (see
       <https://rt.cpan.org/Public/Bug/Display.html?id=114557>).

       Fortunately, it seems there is an easy way to fix the problem by adding
       the "unix" layer, i.e.,

           open(IN, '<:unix:gzip', ...

BUGS AND LIMITATIONS
       Do not use Parallel::ForkManager in an environment where other child
       processes can affect the run of the main program; using this module
       is not recommended in an environment where fork() / wait() is already
       used.

       If you want to use more than one Parallel::ForkManager object, then
       you have to make sure that all child processes are terminated before
       you use the second object in the main program.

       You are free to use a new copy of Parallel::ForkManager in the child
       processes, although I don't think it makes sense.

CREDITS
           Michael Gang (bug report)
           Noah Robin <sitz@onastick.net> (documentation tweaks)
           Chuck Hirstius <chirstius@megapathdsl.net> (callback exit status, example)
           Grant Hopwood <hopwoodg@valero.com> (win32 port)
           Mark Southern <mark_southern@merck.com> (bugfix)
           Ken Clarke <www.perlprogrammer.net> (datastructure retrieval)

AUTHORS
       ·   dLux (Szabó, Balázs) <dlux@dlux.hu>

       ·   Yanick Champoux <yanick@cpan.org>

       ·   Gabor Szabo <gabor@szabgab.com>

COPYRIGHT AND LICENSE
       This software is copyright (c) 2018, 2016, 2015 by Balázs Szabó.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.



perl v5.28.0                      2018-07-19          Parallel::ForkManager(3)