Parallel::ForkManager(3)    User Contributed Perl Documentation    Parallel::ForkManager(3)

NAME
    Parallel::ForkManager - A simple parallel processing fork manager

VERSION
    version 2.02

SYNOPSIS
        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

        DATA_LOOP:
        foreach my $data (@all_data) {
            # Forks and returns the pid for the child:
            my $pid = $pm->start and next DATA_LOOP;

            # ... do some work with $data in the child process ...

            $pm->finish; # Terminates the child process
        }

DESCRIPTION
    This module is intended for use in operations that can be done in
    parallel where the number of processes to be forked off should be
    limited. Typical use is a downloader which will be retrieving
    hundreds/thousands of files.

    The code for a downloader would look something like this:

        use LWP::Simple;
        use Parallel::ForkManager;

        ...

        my @links=(
            ["http://www.foo.bar/rulez.data","rulez_data.txt"],
            ["http://new.host/more_data.doc","more_data.doc"],
            ...
        );

        ...

        # Max 30 processes for parallel download
        my $pm = Parallel::ForkManager->new(30);

        LINKS:
        foreach my $linkarray (@links) {
            $pm->start and next LINKS; # do the fork

            my ($link, $fn) = @$linkarray;
            warn "Cannot get $fn from $link"
                if getstore($link, $fn) != RC_OK;

            $pm->finish; # do the exit in the child process
        }
        $pm->wait_all_children;

    First you need to instantiate the ForkManager with the "new"
    constructor. You must specify the maximum number of processes to be
    created. If you specify 0, then NO fork will be done; this is good for
    debugging purposes.

    Next, use $pm->start to do the fork. $pm->start returns 0 for the
    child process, and the child pid for the parent process (see also
    "fork()" in perlfunc(1p)). The "and next" skips the rest of the loop
    body in the parent process. NOTE: $pm->start dies if the fork fails.

    $pm->finish terminates the child process (assuming a fork was done in
    the "start").

    NOTE: You cannot use $pm->start if you are already in the child
    process. If you want to manage another set of subprocesses in the
    child process, you must instantiate another Parallel::ForkManager
    object!
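
    The debugging mode mentioned above can be sketched as follows. This is
    a minimal illustration, assuming only what this page documents: with a
    limit of 0, "start" returns 0 without forking and "finish" returns
    instead of exiting, so the loop body runs sequentially in the parent.
    The @seen array is a hypothetical name used for illustration.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# A limit of 0 disables forking: everything runs in the current process.
my $pm = Parallel::ForkManager->new(0);

my @seen;    # hypothetical: records which items were processed

for my $n (1 .. 3) {
    $pm->start and next;    # returns 0 here, so we always fall through
    push @seen, $n;         # visible afterwards, since no child was forked
    $pm->finish;            # returns instead of exiting in this mode
}
$pm->wait_all_children;

print "processed: @seen\n";
```

    With a non-zero limit the same loop would fork, and @seen would stay
    empty in the parent, because each child modifies its own copy.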

METHODS
    The comment letter indicates where the method should be run. P for
    parent, C for child.

    new $processes
        Instantiate a new Parallel::ForkManager object. You must specify
        the maximum number of children to fork off. If you specify 0
        (zero), then no children will be forked. This is intended for
        debugging purposes.

        The optional second parameter, $tempdir, is only used if you want
        the children to send back a reference to some data (see RETRIEVING
        DATASTRUCTURES below). If not provided, it is set via a call to
        File::Temp::tempdir().

        The new method will die if the temporary directory does not exist
        or it is not a directory.

        Since version 2.00, the constructor can also be called in the
        typical Moo/Moose fashion, i.e.:

            my $fm = Parallel::ForkManager->new(
                max_procs  => 4,
                tempdir    => '...',
                child_role => 'Parallel::ForkManager::CustomChild',
            );

    child_role
        Returns the name of the role consumed by the ForkManager object in
        child processes. Defaults to Parallel::ForkManager::Child and can
        be set to something else via the constructor.

    start [ $process_identifier ]
        This method does the fork. It returns the pid of the child process
        for the parent, and 0 for the child process. If the $processes
        parameter for the constructor is 0 then, assuming you're in the
        child process, $pm->start simply returns 0.

        An optional $process_identifier can be provided to this method.
        It is used by the "run_on_finish" callback (see CALLBACKS) for
        identifying the finished process.

    start_child [ $process_identifier, ] \&callback
        Like "start", but will run the &callback as the child. If the
        callback returns anything, it'll be passed as the data to transmit
        back to the parent process via "finish()".
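
        A short sketch of "start_child" combined with "run_on_finish" (see
        CALLBACKS) may help. The %square hash and the "value" key are
        hypothetical names chosen for illustration; each callback returns
        a hash reference, which the parent receives as the sixth argument
        of its "run_on_finish" callback.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

my %square;    # hypothetical: results keyed by process identifier

$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $square{$ident} = $data->{value} if defined $data;
});

for my $n (1 .. 4) {
    # The callback runs in the child; whatever it returns is handed
    # to finish() and travels back to the parent.
    $pm->start_child($n, sub { return { value => $n * $n } });
}

$pm->wait_all_children;

print "$_ squared is $square{$_}\n" for sort { $a <=> $b } keys %square;
```

        Returning a reference keeps the example within the documented
        "data structure reference" contract of "finish".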

    finish [ $exit_code [, $data_structure_reference] ]
        Closes the child process by exiting and accepts an optional exit
        code (default exit code is 0) which can be retrieved in the parent
        via callback. If the second optional parameter is provided, the
        child attempts to send its contents back to the parent. If you use
        the program in debug mode ($processes == 0), this method just
        calls the callback.

        If the $data_structure_reference is provided, then it is
        serialized and passed to the parent process. See RETRIEVING
        DATASTRUCTURES for more info.

    set_max_procs $processes
        Allows you to set a new maximum number of children to maintain.

    wait_all_children
        You can call this method to wait for all the processes which have
        been forked. This is a blocking wait.

    reap_finished_children
        This is a non-blocking call to reap children and execute callbacks
        independent of calls to "start" or "wait_all_children". Use this
        in scenarios where "start" is called infrequently but you would
        like the callbacks executed quickly.
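
        One possible way to use "reap_finished_children" is from a
        polling loop in the parent, sketched below. The $finished counter
        and the 0.1 second pause are assumptions made for illustration; a
        real program would do its own parent-side work between polls.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

my $finished = 0;    # hypothetical counter, bumped by the callback
$pm->run_on_finish(sub { $finished++ });

for my $n (1 .. 3) {
    $pm->start and next;
    sleep 1;         # stand-in for real work in the child
    $pm->finish;
}

# The parent stays responsive: it polls instead of blocking.
until ($finished == 3) {
    $pm->reap_finished_children;         # runs callbacks for exited children
    select(undef, undef, undef, 0.1);    # stand-in for other parent work
}

print "all $finished children reaped\n";
```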

    is_parent
        Returns "true" if within the parent or "false" if within the
        child.

    is_child
        Returns "true" if within the child or "false" if within the
        parent.

    max_procs
        Returns the maximal number of processes the object will fork.

    running_procs
        Returns the pids of the forked processes currently monitored by
        the "Parallel::ForkManager". Note that children are still reported
        as running until the fork manager harvests them, via the next call
        to "start" or "wait_all_children".

            my @pids = $pm->running_procs;

            my $nbr_children = $pm->running_procs;

    wait_for_available_procs( $n )
        Wait until $n process slots are available. If $n is not given,
        defaults to 1.

    waitpid_blocking_sleep
        Returns the sleep period, in seconds, of the pseudo-blocking
        calls. The sleep period can be a fraction of a second.

        Returns 0 if disabled.

        Defaults to 1 second.

        See BLOCKING CALLS for more details.

    set_waitpid_blocking_sleep $seconds
        Sets the sleep period, in seconds, of the pseudo-blocking
        calls. Set to 0 to disable.

        See BLOCKING CALLS for more details.

CALLBACKS
    You can define callbacks in the code, which are called on events like
    starting a process or upon finish. Declare these before the first call
    to start().

    The callbacks can be defined with the following methods:

    run_on_finish $code [, $pid ]
        You can define a subroutine which is called when a child is
        terminated. It is called in the parent process.

        The parameters of the $code are the following:

          - pid of the process, which is terminated
          - exit code of the program
          - identification of the process (if provided in the "start" method)
          - exit signal (0-127: signal name)
          - core dump (1 if there was core dump at exit)
          - datastructure reference or undef (see RETRIEVING DATASTRUCTURES)

    run_on_start $code
        You can define a subroutine which is called when a child is
        started. It is called after the successful startup of a child, in
        the parent process.

        The parameters of the $code are the following:

          - pid of the process which has been started
          - identification of the process (if provided in the "start" method)

    run_on_wait $code, [$period]
        You can define a subroutine which is called when the child process
        needs to wait for the startup. If $period is not defined, then one
        call is done per child. If $period is defined, then $code is called
        periodically and the module waits for $period seconds between the
        two calls. Note that $period can also be a fractional number. The
        exact "$period seconds" is not guaranteed: signals can shorten it
        and the process scheduler can make it longer (on busy systems).

        The $code is also called in the "start" and "wait_all_children"
        methods.

        No parameters are passed to the $code on the call.

BLOCKING CALLS
    When it comes to waiting for child processes to terminate,
    "Parallel::ForkManager" is between a fork and a hard place (if you
    excuse the terrible pun). The underlying Perl "waitpid" function that
    the module relies on can block until either one specific or any child
    process terminates, but not for a process that is part of a given
    group.

    This means that the module can do one of two things when it waits for
    one of its child processes to terminate:

    Only wait for its own child processes
        This is done via a loop using a "waitpid" non-blocking call and a
        sleep statement. The code does something along the lines of

            while(1) {
                if ( any of the P::FM child process terminated ) {
                    return its pid
                }

                sleep $sleep_period
            }

        This is the default behavior that the module will use. This is not
        the most efficient way to wait for child processes, but it's the
        safest way to ensure that "Parallel::ForkManager" won't interfere
        with any other part of the codebase.

        The sleep period is set via the method
        "set_waitpid_blocking_sleep".
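
        As a small sketch of tuning that period, the following shortens
        the polling interval to a quarter of a second and reads it back.
        The value 0.25 is an arbitrary choice for illustration, not a
        recommendation.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);

# Poll for finished children every 0.25 s instead of the default 1 s.
$pm->set_waitpid_blocking_sleep(0.25);

my $period = $pm->waitpid_blocking_sleep;
print "polling every $period seconds\n";

for my $n (1 .. 4) {
    $pm->start and next;
    # ... child work would go here ...
    $pm->finish;
}
$pm->wait_all_children;
```

        A shorter period makes the parent notice finished children sooner,
        at the cost of waking up more often.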

    Block until any process terminates
        Alternatively, "Parallel::ForkManager" can call "waitpid" such that
        it will block until any child process terminates. If the child
        process was not one of the monitored subprocesses, the wait will
        resume. This is more efficient, but means that "P::FM" can capture
        (and discard) the termination notification that a different part
        of the code might be waiting for.

        If this is a race condition that doesn't apply to your codebase,
        you can set the waitpid_blocking_sleep period to 0, which will
        enable "waitpid" call blocking.

            my $pm = Parallel::ForkManager->new( 4 );

            $pm->set_waitpid_blocking_sleep(0);  # true blocking calls enabled

            for ( 1..100 ) {
                $pm->start and next;

                ...; # do work

                $pm->finish;
            }

RETRIEVING DATASTRUCTURES
    The ability for the parent to retrieve data structures is new as of
    version 0.7.6.

    Each child process may optionally send 1 data structure back to the
    parent. By data structure, we mean a reference to a string, hash or
    array. The contents of the data structure are written out to temporary
    files on disc using the Storable module's store() method. The reference
    is then retrieved from within the code you send to the run_on_finish
    callback.

    The data structure can be any scalar perl data structure which makes
    sense: string, numeric value or a reference to an array, hash or
    object.

    There are 2 steps involved in retrieving data structures:

    1) A reference to the data structure the child wishes to send back to
    the parent is provided as the second argument to the finish() call. It
    is up to the child to decide whether or not to send anything back to
    the parent.

    2) The data structure reference is retrieved using the callback
    provided in the run_on_finish() method.

    Keep in mind that data structure retrieval is not the same as returning
    a data structure from a method call. That is not what actually occurs.
    The data structure referenced in a given child process is serialized
    and written out to a file by Storable. The file is subsequently read
    back into memory and a new data structure belonging to the parent
    process is created. This carries a performance penalty, so try to keep
    the returned structure small.
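
    The two steps above can be sketched in a few lines. The %report hash
    and the word list are hypothetical, chosen only to keep the example
    small; each child sends back an array reference holding a word and its
    length.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

my %report;    # hypothetical: what each child sent back, keyed by identifier

# Step 2: the parent receives a fresh copy of the structure here.
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    return unless defined $data;    # children are free to send nothing
    $report{$ident} = $data;
});

for my $word (qw(fork manager)) {
    $pm->start($word) and next;

    # Step 1: the child passes a reference as finish()'s second argument.
    my @result = ($word, length $word);
    $pm->finish(0, \@result);
}
$pm->wait_all_children;

print "$_ has $report{$_}[1] letters\n" for sort keys %report;
```

    Note that the parent's copy is independent: changing it has no effect
    on the (already exited) child.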

EXAMPLES
  Parallel get
    This small example can be used to get URLs in parallel.

        use Parallel::ForkManager;
        use LWP::Simple;

        my $pm = Parallel::ForkManager->new(10);

        LINKS:
        for my $link (@ARGV) {
            $pm->start and next LINKS;
            my ($fn) = $link =~ /^.*\/(.*?)$/;
            if (!$fn) {
                # report the URL, since no filename could be extracted
                warn "Cannot determine filename from $link\n";
            } else {
                $0 .= " " . $fn;
                print "Getting $fn from $link\n";
                my $rc = getstore($link, $fn);
                print "$link downloaded. response code: $rc\n";
            };
            $pm->finish;
        };

  Callbacks
    Example of a program using callbacks to get child exit codes:

        use strict;
        use Parallel::ForkManager;

        my $max_procs = 5;
        my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );

        my $pm = Parallel::ForkManager->new($max_procs);

        # Setup a callback for when a child finishes up so we can
        # get its exit code
        $pm->run_on_finish( sub {
            my ($pid, $exit_code, $ident) = @_;
            print "** $ident just got out of the pool ".
                "with PID $pid and exit code: $exit_code\n";
        });

        $pm->run_on_start( sub {
            my ($pid, $ident)=@_;
            print "** $ident started, pid: $pid\n";
        });

        $pm->run_on_wait( sub {
                print "** Have to wait for one children ...\n"
            },
            0.5
        );

        NAMES:
        foreach my $child ( 0 .. $#names ) {
            my $pid = $pm->start($names[$child]) and next NAMES;

            # This code is the child process
            print "This is $names[$child], Child number $child\n";
            sleep ( 2 * $child );
            print "$names[$child], Child $child is about to get out...\n";
            sleep 1;
            $pm->finish($child); # pass an exit code to finish
        }

        print "Waiting for Children...\n";
        $pm->wait_all_children;
        print "Everybody is out of the pool!\n";

  Data structure retrieval
    In this simple example, each child sends back a string reference.

        use Parallel::ForkManager 0.7.6;
        use strict;

        my $pm = Parallel::ForkManager->new(2, '/server/path/to/temp/dir/');

        # data structure retrieval and handling
        $pm -> run_on_finish ( # called BEFORE the first call to start()
            sub {
                my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

                # retrieve data structure from child
                if (defined($data_structure_reference)) {  # children are not forced to send anything
                    my $string = ${$data_structure_reference};  # child passed a string reference
                    print "$string\n";
                }
                else {  # problems occurring during storage or retrieval will throw a warning
                    print qq|No message received from child process $pid!\n|;
                }
            }
        );

        # prep random statement components
        my @foods = ('chocolate', 'ice cream', 'peanut butter', 'pickles', 'pizza', 'bacon', 'pancakes', 'spaghetti', 'cookies');
        my @preferences = ('loves', q|can't stand|, 'always wants more', 'will walk 100 miles for', 'only eats', 'would starve rather than eat');

        # run the parallel processes
        PERSONS:
        foreach my $person (qw(Fred Wilma Ernie Bert Lucy Ethel Curly Moe Larry)) {
            $pm->start() and next PERSONS;

            # generate a random statement about food preferences
            my $statement = $person . ' ' . $preferences[int(rand @preferences)] . ' ' . $foods[int(rand @foods)];

            # send it back to the parent process
            $pm->finish(0, \$statement);  # note that it's a scalar REFERENCE, not the scalar itself
        }
        $pm->wait_all_children;

    A second datastructure retrieval example demonstrates how children
    decide whether or not to send anything back, what to send and how the
    parent should process whatever is retrieved.

        use Parallel::ForkManager 0.7.6;
        use Data::Dumper;  # to display the data structures retrieved.
        use strict;

        my $pm = Parallel::ForkManager->new(20);  # using the system temp dir, via File::Temp::tempdir()

        # data structure retrieval and handling
        my %retrieved_responses = ();  # for collecting responses
        $pm -> run_on_finish (
            sub {
                my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;

                # see what the child sent us, if anything
                if (defined($data_structure_reference)) {  # test rather than assume child sent anything
                    my $reftype = ref($data_structure_reference);
                    print qq|ident "$ident" returned a "$reftype" reference.\n\n|;
                    if (1) {  # simple on/off switch to display the contents
                        print &Dumper($data_structure_reference) . qq|end of "$ident" sent structure\n\n|;
                    }

                    # we can also collect retrieved data structures for processing after all children have exited
                    $retrieved_responses{$ident} = $data_structure_reference;
                } else {
                    print qq|ident "$ident" did not send anything.\n\n|;
                }
            }
        );

        # generate a list of instructions
        my @instructions = (  # a unique identifier and what the child process should send
            {'name' => '%ENV keys as a string', 'send' => 'keys'},
            {'name' => 'Send Nothing'},  # not instructing the child to send anything back to the parent
            {'name' => 'Childs %ENV', 'send' => 'all'},
            {'name' => 'Child chooses randomly', 'send' => 'random'},
            {'name' => 'Invalid send instructions', 'send' => 'Na Na Nana Na'},
            {'name' => 'ENV values in an array', 'send' => 'values'},
        );

        INSTRUCTS:
        foreach my $instruction (@instructions) {
            $pm->start($instruction->{'name'}) and next INSTRUCTS;  # this time we are using an explicit, unique child process identifier

            # last step in child processing
            $pm->finish(0) unless $instruction->{'send'};  # no data structure is sent unless this child is told what to send.

            if ($instruction->{'send'} eq 'keys') {
                $pm->finish(0, \join(', ', keys %ENV));

            } elsif ($instruction->{'send'} eq 'values') {
                $pm->finish(0, [values %ENV]);  # kinda useless without knowing which keys they belong to...

            } elsif ($instruction->{'send'} eq 'all') {
                $pm->finish(0, \%ENV);  # remember, we are not "returning" anything, just copying the hash to disc

            # demonstrate clearly that the child determines what type of reference to send
            } elsif ($instruction->{'send'} eq 'random') {
                my $string = q|I'm just a string.|;
                my @array = qw(I am an array);
                my %hash = (type => 'associative array', synonym => 'hash', cool => 'very :)');
                my $return_choice = ('string', 'array', 'hash')[int(rand 3)];  # randomly choose return data type
                $pm->finish(0, \$string) if ($return_choice eq 'string');
                $pm->finish(0, \@array) if ($return_choice eq 'array');
                $pm->finish(0, \%hash) if ($return_choice eq 'hash');

            # as a responsible child, inform parent that their instruction was invalid
            } else {
                $pm->finish(0, \qq|Invalid instructions: "$instruction->{'send'}".|);  # ordinarily I wouldn't include invalid input in a response...
            }
        }
        $pm->wait_all_children;  # blocks until all forked processes have exited

        # post fork processing of returned data structures
        for (sort keys %retrieved_responses) {
            print qq|Post processing "$_"...\n|;
        }

USING RAND() IN FORKED PROCESSES
    A caveat worth noting is that all forked processes will use the same
    random seed, and so may produce the same results (see
    <http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html>).
    If you are using "rand()" and want each forked child to use a different
    seed, you can add the following to your program:

        $pm->run_on_start(sub { srand });

EXTENDING
    As of version 2.0.0, "Parallel::ForkManager" uses Moo under the hood.
    When a process is being forked from the parent object, the forked
    instance of the object will be modified to consume the
    Parallel::ForkManager::Child role. All of this makes extending
    Parallel::ForkManager to implement any storing/retrieving mechanism or
    any other behavior fairly easy.

  Example: store and retrieve data via a web service
        {
            package Parallel::ForkManager::Web;

            use HTTP::Tiny;

            use Moo;
            extends 'Parallel::ForkManager';

            has ua => (
                is => 'ro',
                lazy => 1,
                default => sub {
                    HTTP::Tiny->new;
                }
            );

            sub store {
                my( $self, $data ) = @_;

                $self->ua->post( "http://.../store/$$", { body => $data } );
            }

            sub retrieve {
                my( $self, $kid_id ) = @_;

                $self->ua->get( "http://.../store/$kid_id" )->{content};
            }

        }

        my $fm = Parallel::ForkManager::Web->new(2);

        $fm->run_on_finish(sub{
            my $retrieved = $_[5];

            print "got ", $retrieved, "\n";
        });

        $fm->start_child(sub {
            return $_**2;
        }) for 1..3;

        $fm->wait_all_children;

  Example: have the child processes exit differently
        use Parallel::ForkManager;

        package Parallel::ForkManager::Child::PosixExit {
            use Moo::Role;
            use POSIX ();    # provides POSIX::_exit
            with 'Parallel::ForkManager::Child';

            # exit immediately with status 0, skipping END blocks and destructors
            sub finish { POSIX::_exit(0) };
        }

        my $fm = Parallel::ForkManager->new(
            max_procs  => 1,
            child_role => 'Parallel::ForkManager::Child::PosixExit'
        );

SECURITY
    Parallel::ForkManager uses temporary files when a child process returns
    information to its parent process. The filenames are based on the
    process ids of the parent and child processes, so they are fairly easy
    to guess. So if security is a concern in your environment, make sure
    the directory used by Parallel::ForkManager is restricted to the
    current user only (the default behavior is to create a directory, via
    File::Temp's "tempdir", which does that).
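
    If you pass your own directory, one way to keep it private is
    sketched below. This is an illustration under assumptions: the
    $private_dir name is hypothetical, and the explicit chmod only
    restates the owner-only permissions that File::Temp's "tempdir"
    normally applies on Unix-like systems.

```perl
use strict;
use warnings;
use File::Temp ();
use Parallel::ForkManager;

# Create a fresh directory that is removed when the program exits.
my $private_dir = File::Temp::tempdir( CLEANUP => 1 );

# Make the owner-only permissions explicit (Unix-like systems).
chmod 0700, $private_dir
    or die "Cannot restrict $private_dir: $!";

# Second constructor argument: where children store returned data.
my $pm = Parallel::ForkManager->new( 4, $private_dir );

printf "using %s (mode %04o)\n", $private_dir, (stat $private_dir)[2] & 07777;
```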

TROUBLESHOOTING
  PerlIO::gzip and Parallel::ForkManager do not play nice together
    If you are using PerlIO::gzip in your child processes, you may end up
    with garbled files. This is not really P::FM's fault, but rather a
    problem between PerlIO::gzip and "fork()" (see
    <https://rt.cpan.org/Public/Bug/Display.html?id=114557>).

    Fortunately, it seems there is an easy way to fix the problem by adding
    the "unix" layer, i.e.:

        open(IN, '<:unix:gzip', ...

BUGS AND LIMITATIONS
    Do not use Parallel::ForkManager in an environment where other child
    processes can affect the run of the main program; using this module is
    not recommended in an environment where fork() / wait() is already
    used.

    If you want to use more than one copy of the Parallel::ForkManager,
    then you have to make sure that all children processes are terminated
    before you use the second object in the main program.

    You are free to use a new copy of Parallel::ForkManager in the child
    processes, although I don't think it makes sense.

CREDITS
    Michael Gang (bug report)
    Noah Robin <sitz@onastick.net> (documentation tweaks)
    Chuck Hirstius <chirstius@megapathdsl.net> (callback exit status, example)
    Grant Hopwood <hopwoodg@valero.com> (win32 port)
    Mark Southern <mark_southern@merck.com> (bugfix)
    Ken Clarke <www.perlprogrammer.net> (datastructure retrieval)

AUTHORS
    • dLux (Szabó, Balázs) <dlux@dlux.hu>

    • Yanick Champoux <yanick@cpan.org>

    • Gabor Szabo <gabor@szabgab.com>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2018, 2016, 2015 by Balázs Szabó.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

perl v5.32.1                      2021-01-27          Parallel::ForkManager(3)