MPI_Comm_spawn(3)                    LAM/MPI                   MPI_Comm_spawn(3)


NAME
       MPI_Comm_spawn - Spawn a dynamic MPI process

SYNOPSIS
       #include <mpi.h>
       int
       MPI_Comm_spawn(char* command, char** argv, int maxprocs, MPI_Info info,
                      int root, MPI_Comm comm, MPI_Comm *intercomm,
                      int *errcodes)
INPUT PARAMETERS
       command
              - Name of program to spawn (only significant at root)
       argv   - arguments to command (only significant at root)
       maxprocs
              - maximum number of processes to start (only significant at
                root)
       info   - startup hints
       root   - rank of process to perform the spawn
       comm   - parent intracommunicator

OUTPUT PARAMETERS
       intercomm
              - child intercommunicator containing spawned processes
       errcodes
              - one code per process
DESCRIPTION
       A group of processes can create another group of processes with
       MPI_Comm_spawn .  This function is a collective operation over the
       parent communicator.  The child group starts up like any MPI
       application.  The processes must begin by calling MPI_Init , after
       which the predefined communicator, MPI_COMM_WORLD , may be used.
       This world communicator contains only the child processes.  It is
       distinct from the MPI_COMM_WORLD of the parent processes.

       MPI_Comm_spawn_multiple is used to manually specify a group of
       different executables and arguments to spawn.  MPI_Comm_spawn is
       used to specify one executable and one set of arguments (although a
       LAM/MPI appschema(5) can be provided to MPI_Comm_spawn via the
       "lam_file" info key).
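       As an illustration, the following hedged sketch shows a parent-side
       call (the program name "./worker" and the process count of 4 are
       illustrative values, not mandated by this page):

              #include <mpi.h>

              int main(int argc, char **argv)
              {
                  MPI_Comm children;
                  int      errcodes[4];

                  MPI_Init(&argc, &argv);
                  /* Spawn 4 copies of "./worker" with no extra arguments;
                     only the root's command/argv/maxprocs are significant. */
                  MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                                 0, MPI_COMM_WORLD, &children, errcodes);
                  /* ... communicate with the children via the "children"
                     intercommunicator ... */
                  MPI_Finalize();
                  return 0;
              }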

   Communication With Spawned Processes
       The natural communication mechanism between two groups is the
       intercommunicator.  The second communicator argument to
       MPI_Comm_spawn returns an intercommunicator whose local group
       contains the parent processes (same as the first communicator
       argument) and whose remote group contains the child processes.  The
       child processes can access the same intercommunicator by using the
       MPI_Comm_get_parent call.  The remote group size of the parent
       communicator is zero if the process was created by mpirun (1)
       instead of one of the spawn functions.  Both groups can decide to
       merge the intercommunicator into an intracommunicator (with the
       MPI_Intercomm_merge function) and take advantage of other MPI
       collective operations.  They can then use the merged
       intracommunicator to create new communicators and reach other
       processes in the MPI application.
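       For example, a minimal child-side sketch (assuming the child has
       already called MPI_Init ; the "high" value of 1 simply orders the
       children after the parents in the merged group):

              MPI_Comm parent, everyone;

              /* Same intercommunicator the parent got back from
                 MPI_Comm_spawn. */
              MPI_Comm_get_parent(&parent);

              /* Optional: merge parents and children into one
                 intracommunicator for ordinary collectives. */
              MPI_Intercomm_merge(parent, 1, &everyone);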

   Resource Allocation
       LAM/MPI offers some MPI_Info keys for the placement of spawned
       applications.  Keys are looked for in the order listed below.  The
       first key that is found is used; any remaining keys are ignored.
   lam_spawn_file
       The value of this key can be the filename of an appschema(5).  This
       allows the programmer to specify an arbitrary set of LAM CPUs or
       nodes to spawn MPI processes on.  In this case, only the appschema
       is used to spawn the application; command , argv , and maxprocs are
       all ignored (even at the root).  Note that even though maxprocs is
       ignored, errcodes must still be an array long enough to hold an
       integer error code for every process that tried to launch, or be the
       MPI constant MPI_ERRCODES_IGNORE .  Also note that
       MPI_Comm_spawn_multiple does not accept the "lam_spawn_file" info
       key.  As such, the "lam_spawn_file" info key to MPI_Comm_spawn is
       mainly intended to spawn MPMD applications and/or to specify an
       arbitrary number of nodes to run on.

       Also note that this "lam_spawn_file" key is not portable to other
       MPI implementations; it is a LAM/MPI-specific info key.  If
       specifying exact LAM nodes or CPUs is not necessary, users should
       probably use MPI_Comm_spawn_multiple to make their program more
       portable.
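       A hedged sketch of using this key (the filename "my.appschema" is a
       placeholder; command and maxprocs are ignored when the key is
       present, but errcodes must still be valid):

              MPI_Info info;
              MPI_Comm children;

              MPI_Info_create(&info);
              MPI_Info_set(info, "lam_spawn_file", "my.appschema");
              MPI_Comm_spawn("ignored", MPI_ARGV_NULL, 1, info,
                             0, MPI_COMM_WORLD, &children,
                             MPI_ERRCODES_IGNORE);
              MPI_Info_free(&info);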

   file
       This key is a synonym for "lam_spawn_file".  Since "file" is not a
       LAM-specific name, yet this key carries a LAM-specific meaning, its
       use is deprecated in favor of "lam_spawn_file".
   lam_spawn_sched_round_robin
       The value of this key is a string representing a LAM CPU or node
       (using standard LAM nomenclature -- see mpirun(1)) to begin spawning
       on.  The use of this key allows the programmer to indicate which
       node/CPU LAM should start spawning on without having to write out a
       temporary appschema file.

       The CPU number is relative to the boot schema given to lamboot(1).
       Only a single LAM node/CPU may be specified, such as "n3" or "c1".
       If a node is specified, LAM will spawn one MPI process per node.  If
       a CPU is specified, LAM will schedule one MPI process per CPU.  An
       error is returned if "N" or "C" is used.

       Note that LAM is not involved with run-time scheduling of the MPI
       processes -- LAM only spawns processes on the indicated nodes.  The
       operating system schedules these processes for execution just like
       any other process.  No attempt is made by LAM to bind processes to
       CPUs.  Hence, the "cX" nomenclature is just a convenience mechanism
       to indicate how many MPI processes should be spawned on a given
       node; it is not indicative of operating system scheduling.

       For "nX" values, the first MPI process will be spawned on the
       indicated node.  The remaining (maxprocs - 1) MPI processes will be
       spawned on successive nodes.  Specifically, if X is the starting
       node number, process i will be launched on "nK", where K = ((X + i)
       % total_nodes).  LAM takes the node number modulo the total number
       of nodes in the current LAM universe to prevent errors, thereby
       creating a "wraparound" effect.  Hence, this mechanism can be used
       for round-robin scheduling, regardless of how many nodes are in the
       LAM universe.

       For "cX" values, the algorithm is essentially the same, except that
       LAM will resolve "cX" to a specific node before spawning, and
       successive processes are spawned on the node where "cK" resides,
       where K = ((X + i) % total_cpus).

       For example, if there are 8 nodes and 16 CPUs in the current LAM
       universe (2 CPUs per node), a "lam_spawn_sched_round_robin" key is
       given with the value of "c14", and maxprocs is 4, LAM will spawn MPI
       processes as follows:

              CPU    Node    MPI_COMM_WORLD rank
              ---    ----    -------------------
              c14    n7      0
              c15    n7      1
              c0     n0      2
              c1     n0      3
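       The corresponding hedged code sketch for that example (the program
       name "./worker" is illustrative):

              MPI_Info info;
              MPI_Comm children;
              int      errcodes[4];

              MPI_Info_create(&info);
              /* Start round-robin placement at CPU c14. */
              MPI_Info_set(info, "lam_spawn_sched_round_robin", "c14");
              MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info,
                             0, MPI_COMM_WORLD, &children, errcodes);
              MPI_Info_free(&info);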

   lam_no_root_node_schedule
       This key is used to designate that the spawned processes must not be
       spawned or scheduled on the "root node" (the node doing the spawn).
       There is no specific value associated with this key, but it should
       be given some non-null/non-empty dummy value.

       It is a node-specific key and not a CPU-specific one.  Hence, if the
       root node has multiple CPUs, none of the CPUs on the root node will
       take part in the scheduling of the spawned processes.
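       A minimal hedged sketch of setting this key (the value "1" is an
       arbitrary non-empty placeholder, since the key's value is not
       interpreted):

              MPI_Info info;

              MPI_Info_create(&info);
              MPI_Info_set(info, "lam_no_root_node_schedule", "1");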

   No keys given
       If none of the info keys listed above are used, the value
       MPI_INFO_NULL should be given for info (all other keys are ignored
       anyway -- there is no harm in providing other keys).  In this case,
       LAM schedules the given number of processes onto LAM nodes by
       starting with CPU 0 (or the lowest numbered CPU), and continuing
       through higher CPU numbers, placing one process on each CPU.  If the
       process count is greater than the CPU count, the procedure repeats.

   Predefined Attributes
       The pre-defined attribute on MPI_COMM_WORLD , MPI_UNIVERSE_SIZE ,
       can be useful in determining how many CPUs are currently unused.
       For example, the value in MPI_UNIVERSE_SIZE is the number of CPUs
       that LAM was booted with (see MPI_Init(3)).  Subtracting the size of
       MPI_COMM_WORLD from this value gives the number of CPUs in the
       current LAM universe that the current application is not using (and
       that are therefore likely idle).
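       A hedged sketch of reading this attribute (the attribute value is
       retrieved as a pointer to an int, and flag indicates whether the
       attribute is set):

              int *usize, wsize, flag, unused_cpus;

              MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                                &usize, &flag);
              MPI_Comm_size(MPI_COMM_WORLD, &wsize);
              if (flag)
                  unused_cpus = *usize - wsize;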

   Process Termination
       Note that the process(es) spawned by MPI_COMM_SPAWN (and
       MPI_COMM_SPAWN_MULTIPLE ) effectively become orphans.  That is, the
       spawning MPI application does not wait for the spawned application
       to finish.  Hence, there is no guarantee that the spawned
       application has finished when the spawning application completes.
       Similarly, killing the spawning application will have no effect on
       the spawned application.

       User applications can achieve this kind of synchronization with an
       MPI_BARRIER between the spawning and spawned processes before
       MPI_FINALIZE .
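       One hedged sketch of such a synchronization, assuming both sides
       have already merged the intercommunicator into an intracommunicator
       named everyone (as shown earlier):

              /* Executed by both the spawning and the spawned processes
                 just before shutting down. */
              MPI_Barrier(everyone);
              MPI_Comm_free(&everyone);
              MPI_Finalize();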

       Note that lamclean will kill *all* MPI processes.
   Process Count
       The maxprocs parameter to MPI_Comm_spawn specifies the exact number
       of processes to be started.  If it is not possible to start the
       desired number of processes, MPI_Comm_spawn will return an error
       code.  Note that even though maxprocs is only relevant on the root,
       all ranks must have an errcodes array long enough to hold an integer
       error code for every process that tries to launch, or must give the
       MPI constant MPI_ERRCODES_IGNORE for the errcodes argument.  While
       this appears to be a contradiction, it is per the MPI-2
       standard.  :-\
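       A hedged sketch of checking the per-process error codes (the count
       of 4 and the program name "./worker" are illustrative):

              MPI_Comm children;
              int      errcodes[4], i;

              MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                             0, MPI_COMM_WORLD, &children, errcodes);
              for (i = 0; i < 4; ++i) {
                  if (errcodes[i] != MPI_SUCCESS) {
                      /* process i failed to launch */
                  }
              }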

       Frequently, an application wishes to choose a process count so as to
       fill all processors available to a job.  MPI indicates the maximum
       number of processes recommended for a job in the pre-defined
       attribute MPI_UNIVERSE_SIZE , which is cached on MPI_COMM_WORLD .

       The typical usage is to subtract the number of processes currently
       in the job from the value of MPI_UNIVERSE_SIZE and spawn the
       difference.  LAM sets MPI_UNIVERSE_SIZE to the number of CPUs in the
       user's LAM session (as defined in the boot schema [bhost(5)] via
       lamboot (1)).
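       A hedged sketch of spawning that difference (building on the
       attribute query shown above; "./worker" is again an illustrative
       program name):

              int *usize, wsize, flag, nspawn;
              MPI_Comm children;

              MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                                &usize, &flag);
              MPI_Comm_size(MPI_COMM_WORLD, &wsize);
              nspawn = (flag && *usize > wsize) ? (*usize - wsize) : 0;
              if (nspawn > 0)
                  MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nspawn,
                                 MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                                 &children, MPI_ERRCODES_IGNORE);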

       See MPI_Init(3) for other pre-defined attributes that are helpful
       when spawning.

   Locating an Executable Program
       The executable program file must be located on the node(s) where the
       process(es) will run.  On any node, the directories specified by the
       user's PATH environment variable are searched to find the program.

       All MPI runtime options selected by mpirun (1) in the initial
       application launch remain in effect for all child processes created
       by the spawn functions.
   Command-line Arguments
       The argv parameter to MPI_Comm_spawn should not contain the program
       name since it is given in the first parameter.  The command line
       that is passed to the newly launched program will be the program
       name followed by the strings in argv .
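       A hedged sketch of building such an argument vector (the argument
       strings are illustrative; the array must be NULL-terminated, and
       MPI_ARGV_NULL may be passed instead when there are no arguments):

              char *child_argv[] = { "-v", "input.dat", NULL };
              MPI_Comm children;

              /* Each child sees the command line:  ./worker -v input.dat */
              MPI_Comm_spawn("./worker", child_argv, 2, MPI_INFO_NULL,
                             0, MPI_COMM_WORLD, &children,
                             MPI_ERRCODES_IGNORE);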

USAGE WITH IMPI EXTENSIONS
       The IMPI standard only supports MPI-1 functions.  Hence, this
       function is currently not designed to operate within an IMPI job.

ERRORS
       If an error occurs in an MPI function, the current MPI error handler
       is called to handle it.  By default, this error handler aborts the
       MPI job.  The error handler may be changed with MPI_Errhandler_set ;
       the predefined error handler MPI_ERRORS_RETURN may be used to cause
       error values to be returned (in C and Fortran; this error handler is
       less useful with the C++ MPI bindings.  The predefined error handler
       MPI::ERRORS_THROW_EXCEPTIONS should be used in C++ if the error
       value needs to be recovered).  Note that MPI does not guarantee that
       an MPI program can continue past an error.

       All MPI routines (except MPI_Wtime and MPI_Wtick ) return an error
       value; C routines as the value of the function and Fortran routines
       in the last argument.  The C++ bindings for MPI do not return error
       values; instead, error values are communicated by throwing
       exceptions of type MPI::Exception (but not by default).  Exceptions
       are only thrown if the error value is not MPI::SUCCESS .

       Note that if the MPI::ERRORS_RETURN handler is set in C++, while MPI
       functions will return upon an error, there will be no way to recover
       what the actual error value was.
       MPI_SUCCESS
              - No error; MPI routine completed successfully.
       MPI_ERR_COMM
              - Invalid communicator.  A common error is to use a null
                communicator in a call (not even allowed in MPI_Comm_rank ).
       MPI_ERR_SPAWN
              - Spawn error; one or more of the applications attempting to
                be launched failed.  Check the returned error code array.
       MPI_ERR_ARG
              - Invalid argument.  Some argument is invalid and is not
                identified by a specific error class.  This is typically a
                NULL pointer or other such error.
       MPI_ERR_ROOT
              - Invalid root.  The root must be specified as a rank in the
                communicator.  Ranks must be between zero and the size of
                the communicator minus one.
       MPI_ERR_OTHER
              - Other error; use MPI_Error_string to get more information
                about this error code.
       MPI_ERR_INTERN
              - An internal error has been detected.  This is fatal.
                Please send a bug report to the LAM mailing list (see
                http://www.lam-mpi.org/contact.php ).
       MPI_ERR_NO_MEM
              - This error class is associated with an error code that
                indicates that free space is exhausted.

SEE ALSO
       appschema(5), bhost(5), lamboot(1), MPI_Comm_get_parent(3),
       MPI_Intercomm_merge(3), MPI_Comm_spawn_multiple(3),
       MPI_Info_create(3), MPI_Info_set(3), MPI_Info_delete(3),
       MPI_Info_free(3), MPI_Init(3), mpirun(1)

MORE INFORMATION
       For more information, please see the official MPI Forum web site,
       which contains the text of both the MPI-1 and MPI-2 standards.
       These documents contain detailed information about each MPI function
       (most of which is not duplicated in these man pages).

       http://www.mpi-forum.org/

ACKNOWLEDGEMENTS
       The LAM Team would like to thank the MPICH Team for the handy
       program to generate man pages ("doctext" from
       ftp://ftp.mcs.anl.gov/pub/sowing/sowing.tar.gz ), the initial
       formatting, and some initial text for most of the MPI-1 man pages.

LOCATION
       spawn.c

LAM/MPI 7.1.2                        3/10/2006                MPI_Comm_spawn(3)