MPI_Comm_spawn(3)                   LAM/MPI                  MPI_Comm_spawn(3)


NAME

       MPI_Comm_spawn -  Spawn a dynamic MPI process

SYNOPSIS

       #include <mpi.h>
       int
       MPI_Comm_spawn(char* command, char** argv, int maxprocs, MPI_Info info,
                      int root, MPI_Comm comm, MPI_Comm *intercomm,
                      int *errcodes)

INPUT PARAMETERS

       command
              - Name of program to spawn (only significant at root)
       argv   - arguments to command (only significant at root)
       maxprocs
              - max number of processes to start (only significant at root)
       info   - startup hints
       root   - rank of process to perform the spawn
       comm   - parent intracommunicator

OUTPUT PARAMETERS

       intercomm
              - child intercommunicator containing spawned processes
       errcodes
              - one code per process

DESCRIPTION

       A group of processes can create another group of processes with
       MPI_Comm_spawn .  This function is a collective operation over the
       parent communicator.  The child group starts up like any MPI
       application.  The processes must begin by calling MPI_Init , after
       which the pre-defined communicator, MPI_COMM_WORLD , may be used.
       This world communicator contains only the child processes.  It is
       distinct from the MPI_COMM_WORLD of the parent processes.

       MPI_Comm_spawn_multiple is used to manually specify a group of
       different executables and arguments to spawn.  MPI_Comm_spawn is
       used to specify one executable and set of arguments (although a
       LAM/MPI appschema(5) can be provided to MPI_Comm_spawn via the
       "lam_spawn_file" info key).

       Communication With Spawned Processes

       The natural communication mechanism between two groups is the
       intercommunicator.  The second communicator argument to
       MPI_Comm_spawn returns an intercommunicator whose local group
       contains the parent processes (same as the first communicator
       argument) and whose remote group contains the child processes.
       The child processes can access the same intercommunicator by
       using the MPI_Comm_get_parent call.  The remote group size of the
       parent communicator is zero if the process was created by
       mpirun (1) instead of one of the spawn functions.  Both groups can
       decide to merge the intercommunicator into an intracommunicator
       (with the MPI_Intercomm_merge function) and take advantage of
       other MPI collective operations.  They can then use the merged
       intracommunicator to create new communicators and reach other
       processes in the MPI application.
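       For example, a minimal sketch of this pattern, where the parent
       spawns two copies of a placeholder program "./child" and both
       sides merge the resulting intercommunicator:

       /* parent.c */
       #include <mpi.h>
       int main(int argc, char *argv[])
       {
           MPI_Comm children, merged;
           MPI_Init(&argc, &argv);
           MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                          0, MPI_COMM_WORLD, &children,
                          MPI_ERRCODES_IGNORE);
           /* high = 0: parent ranks are ordered first in the merged group */
           MPI_Intercomm_merge(children, 0, &merged);
           /* ... communicate over "merged" or "children" ... */
           MPI_Finalize();
           return 0;
       }

       /* child.c */
       #include <mpi.h>
       int main(int argc, char *argv[])
       {
           MPI_Comm parent, merged;
           MPI_Init(&argc, &argv);
           MPI_Comm_get_parent(&parent);   /* MPI_COMM_NULL if not spawned */
           if (parent != MPI_COMM_NULL) {
               MPI_Intercomm_merge(parent, 1, &merged);
               /* ... communicate over "merged" or "parent" ... */
           }
           MPI_Finalize();
           return 0;
       }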
       Resource Allocation

       LAM/MPI offers some MPI_Info keys for the placement of spawned
       applications.  Keys are looked for in the order listed below.  The
       first key that is found is used; any remaining keys are ignored.

       lam_spawn_file

       The value of this key can be the filename of an appschema(5).
       This allows the programmer to specify an arbitrary set of LAM CPUs
       or nodes to spawn MPI processes on.  In this case, only the
       appschema is used to spawn the application; command , argv , and
       maxprocs are all ignored (even at the root).  Note that even
       though maxprocs is ignored, errcodes must still be an array long
       enough to hold an integer error code for every process that tried
       to launch, or be the MPI constant MPI_ERRCODES_IGNORE .  Also note
       that MPI_Comm_spawn_multiple does not accept the "lam_spawn_file"
       info key.  As such, the "lam_spawn_file" info key to
       MPI_Comm_spawn is mainly intended to spawn MPMD applications
       and/or specify an arbitrary number of nodes to run on.

       Also note that this "lam_spawn_file" key is not portable to other
       MPI implementations; it is a LAM/MPI-specific info key.  If
       specifying exact LAM nodes or CPUs is not necessary, users should
       probably use MPI_Comm_spawn_multiple to make their program more
       portable.
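       For example, a sketch of supplying this key, where "spawn.asc" is
       a placeholder appschema filename:

       #include <mpi.h>
       int main(int argc, char *argv[])
       {
           MPI_Comm children;
           MPI_Info info;
           MPI_Init(&argc, &argv);
           MPI_Info_create(&info);
           MPI_Info_set(info, "lam_spawn_file", "spawn.asc");
           /* command, argv, and maxprocs are ignored in this mode, but
            * errcodes must still be valid; MPI_ERRCODES_IGNORE is used here */
           MPI_Comm_spawn("ignored", MPI_ARGV_NULL, 1, info, 0,
                          MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
           MPI_Info_free(&info);
           MPI_Finalize();
           return 0;
       }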
       file

       This key is a synonym for "lam_spawn_file".  Since "file" is not a
       LAM-specific name, yet this key carries a LAM-specific meaning,
       its use is deprecated in favor of "lam_spawn_file".
       lam_spawn_sched_round_robin

       The value of this key is a string representing a LAM CPU or node
       (using standard LAM nomenclature -- see mpirun(1)) to begin
       spawning on.  The use of this key allows the programmer to
       indicate which node/CPU LAM should start spawning on without
       having to write out a temporary app schema file.

       The CPU number is relative to the boot schema given to
       lamboot(1).  Only a single LAM node/CPU may be specified, such as
       "n3" or "c1".  If a node is specified, LAM will spawn one MPI
       process per node.  If a CPU is specified, LAM will schedule one
       MPI process per CPU.  An error is returned if "N" or "C" is used.

       Note that LAM is not involved with run-time scheduling of the MPI
       processes -- LAM only spawns processes on the indicated nodes.
       The operating system schedules these processes for execution just
       like any other process.  No attempt is made by LAM to bind
       processes to CPUs.  Hence, the "cX" nomenclature is just a
       convenience mechanism to indicate how many MPI processes should
       be spawned on a given node; it is not indicative of operating
       system scheduling.

       For "nX" values, the first MPI process will be spawned on the
       indicated node.  The remaining (maxprocs - 1) MPI processes will
       be spawned on successive nodes.  Specifically, if X is the
       starting node number, process i will be launched on "nK", where
       K = ((X + i) % total_nodes).  LAM takes the node number modulo
       the total number of nodes in the current LAM universe to prevent
       errors, thereby creating a "wraparound" effect.  Hence, this
       mechanism can be used for round-robin scheduling, regardless of
       how many nodes are in the LAM universe.

       For "cX" values, the algorithm is essentially the same, except
       that LAM will resolve "cX" to a specific node before spawning,
       and successive processes are spawned on the node where "cK"
       resides, where K = ((X + i) % total_cpus).
       For example, if there are 8 nodes and 16 CPUs in the current LAM
       universe (2 CPUs per node), a "lam_spawn_sched_round_robin" key
       is given with the value of "c14", and maxprocs is 4, LAM will
       spawn MPI processes on:

       CPU  Node  MPI_COMM_WORLD rank
       ---  ----  -------------------
       c14  n7    0
       c15  n7    1
       c0   n0    2
       c1   n0    3
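       A sketch of requesting this placement, matching the example above
       ("./child" is a placeholder program name):

       #include <mpi.h>
       int main(int argc, char *argv[])
       {
           MPI_Comm children;
           MPI_Info info;
           MPI_Init(&argc, &argv);
           MPI_Info_create(&info);
           /* begin spawning at CPU c14; 4 processes wrap around to c0/c1 */
           MPI_Info_set(info, "lam_spawn_sched_round_robin", "c14");
           MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, info, 0,
                          MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
           MPI_Info_free(&info);
           MPI_Finalize();
           return 0;
       }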
       lam_no_root_node_schedule

       This key is used to designate that the spawned processes must not
       be spawned or scheduled on the "root node" (the node doing the
       spawn).  There is no specific value associated with this key, but
       it should be given some non-null/non-empty dummy value.

       It is a node-specific key and not a CPU-specific one.  Hence, if
       the root node has multiple CPUs, none of the CPUs on this root
       node will take part in the scheduling of the spawned processes.
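       A sketch of setting this key ("./child" and the helper name are
       placeholders; any non-empty string works as the dummy value):

       #include <mpi.h>
       /* Spawn "count" copies of "program", keeping them off the node
        * that performs the spawn. */
       static void spawn_off_root(const char *program, int count,
                                  MPI_Comm *children)
       {
           MPI_Info info;
           MPI_Info_create(&info);
           MPI_Info_set(info, "lam_no_root_node_schedule", "1");
           MPI_Comm_spawn((char *) program, MPI_ARGV_NULL, count, info, 0,
                          MPI_COMM_WORLD, children, MPI_ERRCODES_IGNORE);
           MPI_Info_free(&info);
       }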
       No keys given

       If none of the info keys listed above are used, MPI_INFO_NULL
       should be given for info (all other keys are ignored anyway --
       there is no harm in providing other keys).  In this case, LAM
       schedules the given number of processes onto LAM nodes by
       starting with CPU 0 (or the lowest numbered CPU), and continuing
       through higher CPU numbers, placing one process on each CPU.  If
       the process count is greater than the CPU count, the procedure
       repeats.
       Predefined Attributes

       The pre-defined attribute on MPI_COMM_WORLD , MPI_UNIVERSE_SIZE ,
       can be useful in determining how many CPUs are currently unused.
       For example, the value in MPI_UNIVERSE_SIZE is the number of CPUs
       that LAM was booted with (see MPI_Init(3)).  Subtracting the size
       of MPI_COMM_WORLD from this value gives the number of CPUs in the
       current LAM universe that the current application is not using
       (and that are therefore likely idle).
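       A sketch of reading this attribute with the MPI-2 call
       MPI_Comm_get_attr (the older MPI_Attr_get works the same way):

       #include <mpi.h>
       #include <stdio.h>
       int main(int argc, char *argv[])
       {
           void *val;
           int flag, size, universe = 0;
           MPI_Init(&argc, &argv);
           MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &val, &flag);
           if (flag)
               universe = *(int *) val;
           MPI_Comm_size(MPI_COMM_WORLD, &size);
           printf("CPUs not used by this application: %d\n", universe - size);
           MPI_Finalize();
           return 0;
       }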
       Process Termination

       Note that the process[es] spawned by MPI_COMM_SPAWN (and
       MPI_COMM_SPAWN_MULTIPLE ) effectively become orphans.  That is,
       the spawning MPI application does not wait for the spawned
       application to finish.  Hence, there is no guarantee that the
       spawned application has finished when the spawning application
       completes.  Similarly, killing the spawning application will have
       no effect on the spawned application.

       User applications can effect this kind of synchronization with an
       MPI_BARRIER between the spawning and spawned processes before
       MPI_FINALIZE .
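       One way to arrange this, sketched as a helper that each side calls
       with its own intercommunicator ("children" from MPI_Comm_spawn on
       the parent side, the result of MPI_Comm_get_parent on the child
       side) before MPI_Finalize:

       #include <mpi.h>
       /* Merge the intercommunicator and block until both applications
        * have reached the barrier. */
       static void barrier_with_peer(MPI_Comm intercomm, int high)
       {
           MPI_Comm merged;
           MPI_Intercomm_merge(intercomm, high, &merged);
           MPI_Barrier(merged);
           MPI_Comm_free(&merged);
       }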
       Note that lamclean will kill *all* MPI processes.
       Process Count

       The maxprocs parameter to MPI_Comm_spawn specifies the exact
       number of processes to be started.  If it is not possible to
       start the desired number of processes, MPI_Comm_spawn will return
       an error code.  Note that even though maxprocs is only relevant
       on the root, all ranks must have an errcodes array long enough to
       hold an integer error code for every process that tries to
       launch, or give the MPI constant MPI_ERRCODES_IGNORE for the
       errcodes argument.  While this appears to be a contradiction, it
       is per the MPI-2 standard.  :-\

       Frequently, an application wishes to choose a process count so as
       to fill all processors available to a job.  MPI indicates the
       maximum number of processes recommended for a job in the
       pre-defined attribute, MPI_UNIVERSE_SIZE , which is cached on
       MPI_COMM_WORLD .

       The typical usage is to subtract the number of processes
       currently in the job from the value of MPI_UNIVERSE_SIZE and
       spawn the difference.  LAM sets MPI_UNIVERSE_SIZE to the number
       of CPUs in the user's LAM session (as defined in the boot schema
       [bhost(5)] via lamboot (1)).

       See MPI_Init(3) for other pre-defined attributes that are helpful
       when spawning.
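       For example, a sketch that spawns enough copies of a placeholder
       "./worker" program to fill the rest of the LAM universe and
       checks the per-process error codes:

       #include <mpi.h>
       #include <stdio.h>
       #include <stdlib.h>
       int main(int argc, char *argv[])
       {
           void *val;
           int flag, world, universe, nspawn, i, *errcodes;
           MPI_Comm children;

           MPI_Init(&argc, &argv);
           MPI_Comm_size(MPI_COMM_WORLD, &world);
           MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &val, &flag);
           universe = flag ? *(int *) val : world;

           nspawn = universe - world;
           if (nspawn > 0) {
               /* every rank needs room for one code per spawned process */
               errcodes = (int *) malloc(nspawn * sizeof(int));
               MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nspawn,
                              MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                              &children, errcodes);
               for (i = 0; i < nspawn; ++i)
                   if (errcodes[i] != MPI_SUCCESS)
                       printf("spawn of process %d failed\n", i);
               free(errcodes);
           }
           MPI_Finalize();
           return 0;
       }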
       Locating an Executable Program

       The executable program file must be located on the node(s) where
       the process(es) will run.  On any node, the directories specified
       by the user's PATH environment variable are searched to find the
       program.

       All MPI runtime options selected by mpirun (1) in the initial
       application launch remain in effect for all child processes
       created by the spawn functions.
       Command-line Arguments

       The argv parameter to MPI_Comm_spawn should not contain the
       program name since it is given in the first parameter.  The
       command line that is passed to the newly launched program will be
       the program name followed by the strings in argv .
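       For example, the spawned program below sees the command line
       "./child arg1 arg2" ("./child" and the arguments are
       placeholders); note that argv is NULL-terminated:

       MPI_Comm children;
       char *child_argv[] = { "arg1", "arg2", NULL };
       MPI_Comm_spawn("./child", child_argv, 2, MPI_INFO_NULL, 0,
                      MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);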

USAGE WITH IMPI EXTENSIONS

       The IMPI standard only supports MPI-1 functions.  Hence, this
       function is currently not designed to operate within an IMPI job.

ERRORS

       If an error occurs in an MPI function, the current MPI error
       handler is called to handle it.  By default, this error handler
       aborts the MPI job.  The error handler may be changed with
       MPI_Errhandler_set ; the predefined error handler
       MPI_ERRORS_RETURN may be used to cause error values to be
       returned (in C and Fortran; this error handler is less useful
       with the C++ MPI bindings.  The predefined error handler
       MPI::ERRORS_THROW_EXCEPTIONS should be used in C++ if the error
       value needs to be recovered).  Note that MPI does not guarantee
       that an MPI program can continue past an error.

       All MPI routines (except MPI_Wtime and MPI_Wtick ) return an
       error value; C routines as the value of the function and Fortran
       routines in the last argument.  The C++ bindings for MPI do not
       return error values; instead, error values are communicated by
       throwing exceptions of type MPI::Exception (but not by default).
       Exceptions are only thrown if the error value is not
       MPI::SUCCESS .

       Note that if the MPI::ERRORS_RETURN handler is set in C++, while
       MPI functions will return upon an error, there will be no way to
       recover what the actual error value was.
       MPI_SUCCESS
              - No error; MPI routine completed successfully.
       MPI_ERR_COMM
              - Invalid communicator.  A common error is to use a null
              communicator in a call (not even allowed in
              MPI_Comm_rank ).
       MPI_ERR_SPAWN
              - Spawn error; one or more of the applications attempting
              to be launched failed.  Check the returned error code
              array.
       MPI_ERR_ARG
              - Invalid argument.  Some argument is invalid and is not
              identified by a specific error class.  This is typically a
              NULL pointer or other such error.
       MPI_ERR_ROOT
              - Invalid root.  The root must be specified as a rank in
              the communicator.  Ranks must be between zero and the size
              of the communicator minus one.
       MPI_ERR_OTHER
              - Other error; use MPI_Error_string to get more
              information about this error code.
       MPI_ERR_INTERN
              - An internal error has been detected.  This is fatal.
              Please send a bug report to the LAM mailing list (see
              http://www.lam-mpi.org/contact.php ).
       MPI_ERR_NO_MEM
              - This error class is associated with an error code that
              indicates that free space is exhausted.
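       A sketch of switching to MPI_ERRORS_RETURN so that a failed spawn
       can be reported instead of aborting the job ("./worker" is a
       placeholder program name):

       #include <mpi.h>
       #include <stdio.h>
       int main(int argc, char *argv[])
       {
           MPI_Comm children;
           char msg[MPI_MAX_ERROR_STRING];
           int rc, len, errcodes[2];

           MPI_Init(&argc, &argv);
           MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
           rc = MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                               0, MPI_COMM_WORLD, &children, errcodes);
           if (rc != MPI_SUCCESS) {
               MPI_Error_string(rc, msg, &len);
               printf("MPI_Comm_spawn failed: %s\n", msg);
           }
           MPI_Finalize();
           return 0;
       }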

SEE ALSO

       appschema(5), bhost(5), lamboot(1), MPI_Comm_get_parent(3),
       MPI_Intercomm_merge(3), MPI_Comm_spawn_multiple(3),
       MPI_Info_create(3), MPI_Info_set(3), MPI_Info_delete(3),
       MPI_Info_free(3), MPI_Init(3), mpirun(1)

MORE INFORMATION

       For more information, please see the official MPI Forum web site,
       which contains the text of both the MPI-1 and MPI-2 standards.
       These documents contain detailed information about each MPI
       function (most of which is not duplicated in these man pages).

       http://www.mpi-forum.org/

ACKNOWLEDGEMENTS

       The LAM Team would like to thank the MPICH Team for the handy
       program to generate man pages ("doctext" from
       ftp://ftp.mcs.anl.gov/pub/sowing/sowing.tar.gz ), the initial
       formatting, and some initial text for most of the MPI-1 man
       pages.

LOCATION

       spawn.c
LAM/MPI 7.1.2                      3/10/2006                 MPI_Comm_spawn(3)