prte-map(1)                            PRTE                        prte-map(1)


PRTE: Mapping, Ranking, and Binding

QUICK SUMMARY
PRTE employs a three-phase procedure for assigning process locations
and ranks:

1. mapping: Assigns a default location to each process

2. ranking: Assigns a unique rank value to each process

3. binding: Constrains each process to run on specific processors

This document describes these three phases with examples. Unless
otherwise noted, this behavior is shared by prun, prterun, and prte.

The two binaries that most influence process layout are prte and prun.
The prte process discovers the allocation, starts the daemons, and
defines the default mapping/ranking/binding for all jobs. The prun
process defines the specific mapping/ranking/binding for a specific
job. Most of the command line controls are targeted to prun since each
job has its own unique requirements.

prterun is a wrapper around prte for a single-job PRTE DVM. It does
the job of both prte and prun and, as such, accepts the sum of all
their command line arguments. Any example that uses prun can
substitute prterun except where otherwise noted.

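For example, the two workflows below are equivalent ways of launching
a single job (an illustrative sketch; it assumes prte supports the
--daemonize option and that the pterm tool is available to shut the
DVM down):

   # Persistent DVM: start it, run a job against it, then tear it down
   prte --daemonize
   prun --np 4 ./a.out
   pterm

   # Single-shot equivalent: prterun starts a DVM, runs the job, exits
   prterun --np 4 ./a.out
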
The prte process attempts to automatically discover the nodes in the
allocation by querying supported resource managers. If a supported
resource manager is not present, then prte relies on a hostfile
provided by the user. In the absence of such a hostfile, it will run
all processes on the localhost.

If running under a supported resource manager, the prte process will
start the daemon processes (prted) on the remote nodes using the
corresponding resource manager process starter. If no such starter is
available, then rsh or ssh is used.

In the absence of any further directives, PRTE automatically maps
processes in a round-robin fashion by CPU slot in one of two ways:

Map by core:
       when the total number of processes in the job is <= 2

Map by package:
       when the total number of processes in the job is > 2

PRTE automatically binds processes. Three binding patterns are used in
the absence of any further directives:

Bind to core:
       when the total number of processes in the job is <= 2

Bind to package:
       when the total number of processes in the job is > 2

Bind to none:
       when oversubscribed

If your application uses threads, then you probably want to ensure
that you are either not bound at all (by specifying --bind-to none),
or bound to multiple cores using an appropriate binding level or a
specific number of processing elements per application process.

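For example, either of the following (illustrative) invocations would
suit a multi-threaded application; ./threaded_app is a placeholder for
your own binary:

   # Leave the processes unbound
   prun --np 4 --bind-to none ./threaded_app

   # Or give each process 4 processing elements (cores) for its threads
   prun --np 4 --map-by slot:PE=4 ./threaded_app
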
PRTE automatically ranks processes starting from 0. Two ranking
patterns are used in the absence of any further directives:

Rank by slot:
       when the total number of processes in the job is <= 2

Rank by package:
       when the total number of processes in the job is > 2
COMMAND LINE OPTIONS
Listed here is the subset of command line options that will be used in
the process mapping/ranking/binding discussion in this manual page.

Specifying Host Nodes
Use one of the following options to specify which hosts (nodes) within
the PRTE DVM environment to run on.

--host <host1,host2,...,hostN> or --host <host1:X,host2:Y,...,hostN:Z>
       List of hosts on which to invoke processes. After each hostname
       a colon (:) followed by a positive integer can be used to
       specify the number of slots on that host (:X, :Y, and :Z). The
       default is 1.

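For example, the following (illustrative) command uses the slot syntax
to launch two processes on each of the hypothetical hosts aa and bb:

   prun --host aa:2,bb:2 --np 4 ./a.out
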
--hostfile <hostfile>
       Provide a hostfile to use.

--machinefile <machinefile>
       Synonym for --hostfile.

--default-hostfile <hostfile>
       Provide a default hostfile to use.

Process Mapping / Ranking / Binding Options
The following options specify the number of processes to launch. Note
that none of the options imply a particular binding policy - e.g.,
requesting N processes for each package does not imply that the
processes will be bound to the package.

-c, -n, --n, --np <#>
       Run this many copies of the program on the given nodes. This
       option indicates that the specified file is an executable
       program and not an application context. If no value is provided
       for the number of copies to execute (i.e., neither the --np
       option nor its synonyms are provided on the command line), prun
       will automatically execute a copy of the program on each
       process slot (see below for a description of a “process
       slot”). This feature, however, can only be used in the SPMD
       model and will return an error (without beginning execution of
       the application) otherwise.

To map processes across sets of objects:

--map-by <object>
       Map to the specified object. See defaults in Quick Summary.
       Supported options include slot, hwthread, core, l1cache,
       l2cache, l3cache, package, node, seq, dist, ppr, and rankfile.

Any object can include qualifiers by adding a colon (:) and any
combination of one or more of the following to the --map-by option:

• PE=n bind n processing elements to each process

• SPAN load balance the processes across the allocation

• OVERSUBSCRIBE allow more processes on a node than processing elements

• NOOVERSUBSCRIBE means !OVERSUBSCRIBE

• NOLOCAL do not launch processes on the same node as prun

• HWTCPUS use hardware threads as CPU slots

• CORECPUS use cores as CPU slots (default)

• DEVICE=dev device specifier for the dist policy

• INHERIT

• NOINHERIT means !INHERIT

• PE-LIST=a,b comma-delimited ranges of CPUs to use for this job,
  processed as an unordered pool of CPUs

• FILE=<path> path to a file containing sequential or rankfile entries

ppr policy example: --map-by ppr:N:<object> will launch N times the
number of objects of the specified type on each node.

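For instance, the ppr policy and the PE qualifier can be used as shown
below (illustrative invocations, mirroring examples later in this
page):

   # Launch 2 processes per package on each node
   prun --np 8 --map-by ppr:2:package ./a.out

   # Map by package and give each process 2 cores
   prun --np 4 --map-by package:PE=2 --bind-to core ./a.out
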
To order processes’ ranks:

--rank-by <object>
       Rank in round-robin fashion according to the specified object.
       See defaults in Quick Summary. Supported options include slot,
       hwthread, core, l1cache, l2cache, l3cache, package, and node.

Any object can include qualifiers by adding a colon (:) and any
combination of one or more of the following to the --rank-by option:

• SPAN

• FILL

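For example (using the two-node, dual-package scenario illustrated in
the DESCRIPTION section below):

   # Rank round-robin across packages, treating the whole allocation
   # as a single entity
   prun --np 8 --map-by ppr:2:package --rank-by package:SPAN ./a.out
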
To bind processes to sets of objects:

--bind-to <object>
       Bind processes to the specified object. See defaults in Quick
       Summary. Supported options include none, hwthread, core,
       l1cache, l2cache, l3cache, and package.

Any object can include qualifiers by adding a colon (:) and any
combination of one or more of the following to the --bind-to option:

• overload-allowed allows for binding more than one process in
  relation to a CPU

• if-supported if that object is supported on this system

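For example (an illustrative command, explored in more detail in the
“Difference between overloading and oversubscription” section below):

   # Permit two or more processes to share a core when there are more
   # processes than cores
   prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname
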
Diagnostics
--map-by :DISPLAY
       Display a table showing the mapped location of each process
       prior to launch.

--map-by :DISPLAYALLOC
       Display the detected allocation of resources (e.g., nodes,
       slots).

--bind-to :REPORT
       Report bindings for launched processes to stderr.

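These qualifiers can be combined with other directives. For example
(a hypothetical invocation; qualifiers chain with colons):

   # Show the detected allocation, the resulting process map, and the
   # final bindings
   prun --np 4 --map-by core:DISPLAY:DISPLAYALLOC --bind-to core:REPORT ./a.out
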
DESCRIPTION
PRTE employs a three-phase procedure for assigning process locations
and ranks:

1. mapping: Assigns a default location to each process

2. ranking: Assigns a unique rank value to each process

3. binding: Constrains each process to run on specific processors

The first phase, mapping, assigns a default location to each process
based on the mapper being employed. Mapping by slot, by node, or
sequentially results in the assignment of processes at the node level.
In contrast, mapping by object allows the mapper to assign each
process to an actual object on a node.

Note: The location assigned to a process is independent of where it
will be bound - the assignment is used solely as input to the binding
algorithm.

The second phase focuses on the ranking of each process within the
job’s namespace. PRTE separates this from the mapping procedure to
allow more flexibility in the relative placement of processes.

The third phase, binding, actually binds each process to a given set
of processors. This can improve performance if the operating system is
placing processes suboptimally. For example, it might oversubscribe
some multi-core processor packages, leaving other packages idle; this
can lead processes to contend unnecessarily for common resources. Or,
it might spread processes out too widely; this can be suboptimal if
application performance is sensitive to interprocess communication
costs. Binding can also keep the operating system from migrating
processes excessively, regardless of how optimally those processes
were placed to begin with.

PRTE’s support for process binding depends on the underlying operating
system. Therefore, certain process binding options may not be
available on every system.

Specifying Host Nodes
Host nodes can be identified on the command line with the --host
option or in a hostfile.

For example, assuming no other resource manager or scheduler is
involved:

prte --host aa,aa,bb ./a.out
       launches two processes on node aa and one on bb.

prun --host aa ./a.out
       launches one process on node aa.

prun --host aa:5 ./a.out
       launches five processes on node aa.

Or, consider the hostfile

   $ cat myhostfile
   aa slots=2
   bb slots=2
   cc slots=2

Here we list both the host names (aa, bb, and cc) and how many “slots”
there are for each. Slots indicate how many processes can potentially
execute on a node. For best performance, the number of slots may be
chosen to be the number of cores on the node or the number of
processor packages.

If the hostfile does not provide slots information, the PRTE DVM will
attempt to discover the number of cores (or hwthreads, if the :HWTCPUS
qualifier to the --map-by option is set) and set the number of slots
to that value.

Examples using the hostfile above, with and without the --host option:

prun --hostfile myhostfile ./a.out
       will launch two processes on each of the three nodes.

prun --hostfile myhostfile --host aa ./a.out
       will launch two processes, both on node aa.

prun --hostfile myhostfile --host dd ./a.out
       will find no hosts to run on and abort with an error. That is,
       the specified host dd is not in the specified hostfile.

When running under resource managers (e.g., SLURM, Torque, etc.), PRTE
will obtain both the hostnames and the number of slots directly from
the resource manager. In that environment, --host behaves as if the
host information had been provided via a hostfile (since it is
supplied by the resource manager).

Specifying Number of Processes
As we have just seen, the number of processes to run can be set using
the hostfile. Other mechanisms exist.

The number of processes launched can be specified as a multiple of the
number of nodes or processor packages available. Consider the hostfile
below for the examples that follow.

   $ cat myhostfile
   aa
   bb

For example,

prun --hostfile myhostfile --map-by ppr:2:package ./a.out
       launches processes 0-3 on node aa and processes 4-7 on node bb,
       where aa and bb are both dual-package nodes. The --map-by
       ppr:2:package option also turns on the --bind-to package
       option, which is discussed in a later section.

prun --hostfile myhostfile --map-by ppr:2:node ./a.out
       launches processes 0-1 on node aa and processes 2-3 on node bb.

prun --hostfile myhostfile --map-by ppr:1:node ./a.out
       launches one process per host node.

Alternatively, the number of processes can be specified with the --np
option. Consider now the hostfile

   $ cat myhostfile
   aa slots=4
   bb slots=4
   cc slots=4

Now,

prun --hostfile myhostfile --np 6 ./a.out
       will launch processes 0-3 on node aa and processes 4-5 on node
       bb. The remaining slots in the hostfile will not be used since
       the --np option indicated that only 6 processes should be
       launched.

Mapping Processes to Nodes: Using Policies
The examples above illustrate the default mapping of processes to
nodes. This mapping can also be controlled with various prun/prterun
options that describe mapping policies.

   $ cat myhostfile
   aa slots=4
   bb slots=4
   cc slots=4

Consider the hostfile above, with --np 6:

                                 node aa      node bb      node cc
   prun                          0 1 2 3      4 5
   prun --map-by node            0 3          1 4          2 5
   prun --map-by node:NOLOCAL                 0 2 4        1 3 5

The --map-by node option will load balance the processes across the
available nodes, numbering each process in a round-robin fashion.

The :NOLOCAL qualifier to --map-by prevents any processes from being
mapped onto the local host (in this case node aa). While prun
typically consumes few system resources, the :NOLOCAL qualifier can be
helpful for launching very large jobs where prun may actually need to
use noticeable amounts of memory and/or processing time.

Just as --np can specify fewer processes than there are slots, it can
also oversubscribe the slots. For example, with the same hostfile:

prun --hostfile myhostfile --np 14 ./a.out
       will produce an error since the default :NOOVERSUBSCRIBE
       qualifier to --map-by prevents oversubscription.

To oversubscribe the nodes you can use the :OVERSUBSCRIBE qualifier to
--map-by:

prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
       will launch processes 0-5 on node aa, 6-9 on bb, and 10-13 on
       cc.

Limits to oversubscription can also be specified in the hostfile
itself with the max_slots field:

   % cat myhostfile
   aa slots=4 max_slots=4
   bb max_slots=8
   cc slots=4

The max_slots field specifies such a limit. When it is given without a
slots value (as for node bb above), slots defaults to the limit. Now:

prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
       causes the first 12 processes to be launched as before, but the
       remaining two processes will be forced onto node cc. The other
       two nodes are protected by the hostfile against
       oversubscription by this job.

Using the :NOOVERSUBSCRIBE qualifier to the --map-by option can be
helpful since the PRTE DVM currently does not get “max_slots” values
from the resource manager.

Of course, --np can also be used with the --host option. For example,

prun --host aa,bb --np 8 ./a.out
       will produce an error since the default :NOOVERSUBSCRIBE
       qualifier to --map-by prevents oversubscription.

prun --host aa,bb --np 8 --map-by :OVERSUBSCRIBE ./a.out
       launches 8 processes. Since only two hosts are specified, after
       the first two processes are mapped, one to aa and one to bb,
       the remaining processes oversubscribe the specified hosts
       evenly.

prun --host aa:2,bb:6 --np 8 ./a.out
       launches 8 processes: processes 0-1 run on node aa since it has
       2 slots, and processes 2-7 run on node bb since it has 6 slots.

And here is a MIMD example:

prun --host aa --np 1 hostname : --host bb,cc --np 2 uptime
       will launch process 0 running hostname on node aa and processes
       1 and 2 each running uptime on nodes bb and cc, respectively.

Mapping, Ranking, and Binding: Fundamentals
The mapping of processes to nodes can be defined not just with general
policies but also, if necessary, using arbitrary mappings that cannot
be described by a simple policy. One can use the “sequential mapper,”
which reads the hostfile line by line, assigning processes to nodes in
whatever order the hostfile specifies. Use the --prtemca rmaps seq
option.

For example, using the hostfile below:

   % cat myhostfile
   aa slots=4
   bb slots=4
   cc slots=4

The command below will launch three processes, one on each of nodes
aa, bb, and cc, respectively. The slot counts don’t matter; one
process is launched per line on whatever node is listed on the line.

   % prun --hostfile myhostfile --prtemca rmaps seq ./a.out

The ranking phase is best illustrated by considering the following
hostfile and test cases in which we used the --map-by ppr:2:package
option:

   % cat myhostfile
   aa
   bb

                                node aa         node bb
   --rank-by core               0 1 ! 2 3       4 5 ! 6 7
   --rank-by package            0 2 ! 1 3       4 6 ! 5 7
   --rank-by package:SPAN       0 4 ! 1 5       2 6 ! 3 7

(Each “!” separates the two packages on a node.) Ranking by core and
by slot provide the identical result - a simple progression of ranks
across each node. Ranking by package does a round-robin ranking within
each node until all processes have been assigned a rank, and then
progresses to the next node. Adding the :SPAN qualifier to the ranking
directive causes the ranking algorithm to treat the entire allocation
as a single entity - thus, the process ranks are assigned across all
packages before circling back around to the beginning.

The binding phase restricts the process to a subset of the CPU
resources on the node.

The processors to be used for binding can be identified in terms of
topological groupings - e.g., binding to an l3cache will bind each
process to all processors within the scope of a single L3 cache within
its assigned location. Thus, if a process is assigned by the mapper to
a certain package, then a --bind-to l3cache directive will cause the
process to be bound to the processors that share a single L3 cache
within that package.

To help balance loads, the binding directive uses a round-robin method
when binding to levels lower than that used in the mapper. For
example, consider the case where a job is mapped to the package level
and then bound to core. Each package will have multiple cores, so if
multiple processes are mapped to a given package, the binding
algorithm will assign each process located on that package to a unique
core in a round-robin manner.

Alternatively, processes mapped by l2cache and then bound to package
will simply be bound to all the processors in the package where they
are located. In this manner, users can exert detailed control over
relative process location and binding.

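The two cases just described correspond to invocations like the
following (illustrative commands):

   # Map at the package level, then bind each process to a single
   # core within its assigned package
   prun --np 4 --map-by package --bind-to core ./a.out

   # Map at the l2cache level, then bind each process to the whole
   # package containing its assigned L2 cache
   prun --np 4 --map-by l2cache --bind-to package ./a.out
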
Process mapping/ranking/binding can also be set with MCA parameters.
Their usage is less convenient than that of the command line options.
On the other hand, MCA parameters can be set not only on the prun
command line, but alternatively in a system or user mca-params.conf
file or as environment variables, as described in the MCA section
below. Some examples include:

   prun option          MCA parameter key             value
   --map-by core        rmaps_base_mapping_policy     core
   --map-by package     rmaps_base_mapping_policy     package
   --rank-by core       rmaps_base_ranking_policy     core
   --bind-to core       hwloc_base_binding_policy     core
   --bind-to package    hwloc_base_binding_policy     package
   --bind-to none       hwloc_base_binding_policy     none

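For instance, the first row of the table could be expressed through
the environment instead of the command line (a sketch assuming the
standard PRTE_MCA_ prefix for MCA environment variables):

   # Equivalent to passing --map-by core on the prun command line
   export PRTE_MCA_rmaps_base_mapping_policy=core
   prun --np 4 ./a.out
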
Difference between overloading and oversubscription
Users are often confused by the difference between these two
scenarios, so this section provides a number of examples to help
illustrate the distinction.

• --map-by :OVERSUBSCRIBE allows more processes on a node than
  processing elements

• --bind-to <object>:overload-allowed allows for binding more than one
  process in relation to a CPU

The important thing to remember with oversubscribing is that it can be
defined separately from the actual number of CPUs on a node. This
allows the mapper to place more or fewer processes per node than CPUs.
By default, PRTE uses cores to determine slots in the absence of such
information provided in the hostfile or by the resource manager
(except in the case of --host, as described in the “Specifying Host
Nodes” section).

The important thing to remember with overloading is that it is defined
as binding more processes than CPUs. By default, PRTE uses cores as a
means of counting the number of CPUs. However, the user can adjust
this. For example, when using the :HWTCPUS qualifier to the --map-by
option, PRTE will use hardware threads as a means of counting the
number of CPUs.

For the following examples, consider a node with:

• Two processor packages,

• Ten cores per package, and

• Eight hardware threads per core.

Consider the node from above with the hostfile below:

   $ cat myhostfile
   node01 slots=32
   node02 slots=32

The “slots=32” value tells PRTE that it can place up to 32 processes
on a node before oversubscribing it.

If we run the following:

   prun --np 34 --hostfile myhostfile --map-by core --bind-to core hostname

This will return an error at binding time, indicating an overloading
scenario.

The mapping mechanism assigns 32 processes to node01, matching the
“slots” specification in the hostfile. The binding mechanism will bind
the first 20 processes to unique cores, leaving it with 12 processes
that it cannot bind without overloading one of the cores (putting more
than one process on the core).

Using the overload-allowed qualifier to the --bind-to core option
tells PRTE that it may assign more than one process to a core.

If we run the following:

   prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname

This will run correctly, placing 32 processes on node01 and 2
processes on node02. On node01, cores 0-11 each have two processes
bound to them, accounting for the overloading of those cores.

Alternatively, we could use hardware threads to give binding a
lower-level CPU to bind to without overloading.

If we run the following:

   prun --np 34 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

This will run correctly, placing 32 processes on node01 and 2
processes on node02. On node01, cores 0-11 each have two processes
mapped to them, but the processes are bound to different hardware
threads on those cores (the logically first and second hardware
threads), so no hardware threads are overloaded at binding time.

In both of the examples above, the node is not oversubscribed at
mapping time because the hostfile set the oversubscription limit to
“slots=32” for each node. It is only after we exceed that limit that
PRTE will throw an oversubscription error.

Consider next if we ran the following:

   prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

This will return an error at mapping time, indicating an
oversubscription scenario. The mapping mechanism will assign all of
the available slots (64 across 2 nodes) and be left with two processes
to map. The only way to map those processes is to exceed the number of
available slots, putting the job into an oversubscription scenario.

You can force PRTE to oversubscribe the nodes by using the
:OVERSUBSCRIBE qualifier to the --map-by option, as seen in the
example below:

   prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS:OVERSUBSCRIBE --bind-to hwthread hostname

This will run correctly, placing 34 processes on node01 and 32 on
node02. Each process is bound to a unique hardware thread.

Overloading vs Oversubscription: Package Example
Let’s extend these examples by considering the package level. Consider
the same node as before, but with the hostfile below:

   $ cat myhostfile
   node01 slots=22
   node02 slots=22

The lowest-level CPUs are “cores” and we have 20 in total (10 per
package).

If we run:

   prun --np 20 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

then 10 processes are mapped to each package and bound at the package
level. This is not overloading since we have 10 CPUs (cores) available
in the package at the hardware level.

However, if we run:

   prun --np 21 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

then 11 processes are mapped to the first package and 10 to the second
package. At binding time we have an overloading scenario because there
are only 10 CPUs (cores) available in the package at the hardware
level. So the first package is overloaded.

Overloading vs Oversubscription: Hardware Threads Example
A similar situation arises when we consider hardware threads.

Consider the same node as before, but with the hostfile below:

   $ cat myhostfile
   node01 slots=165
   node02 slots=165

The lowest-level CPUs are “hwthreads” (because we are going to use the
:HWTCPUS qualifier) and we have 160 in total (80 per package).

If we re-run the command from the package example and add the :HWTCPUS
qualifier:

   prun --np 21 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

Without the :HWTCPUS qualifier this would be overloading (as we saw
previously). The mapper places 11 processes on the first package and
10 on the second package. The processes are still bound at the package
level. However, with the :HWTCPUS qualifier, it is not overloading
since we have 80 CPUs (hwthreads) available in the package at the
hardware level.

Alternatively, if we run:

   prun --np 161 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

then 81 processes are mapped to the first package and 80 to the second
package. At binding time we have an overloading scenario because there
are only 80 CPUs (hwthreads) available in the package at the hardware
level. So the first package is overloaded.

Diagnostics
PRTE provides various diagnostic reports that aid the user in
verifying and tuning the mapping/ranking/binding for a specific job.

The :REPORT qualifier to the --bind-to command line option can be used
to report process bindings.

As an example, consider the same node as before:

• Two processor packages,

• Ten cores per package, and

• Eight hardware threads per core.

In each of the examples below, the binding is reported in a
human-readable format.

   $ prun --np 4 --map-by core --bind-to core:REPORT ./a.out
   [node01:103137] MCW rank 0 bound to package[0][core:0]
   [node01:103137] MCW rank 1 bound to package[0][core:1]
   [node01:103137] MCW rank 2 bound to package[0][core:2]
   [node01:103137] MCW rank 3 bound to package[0][core:3]

In the example above, processes bind to successive cores on the first
package.

   $ prun --np 4 --map-by package --bind-to package:REPORT ./a.out
   [node01:103115] MCW rank 0 bound to package[0][core:0-9]
   [node01:103115] MCW rank 1 bound to package[1][core:10-19]
   [node01:103115] MCW rank 2 bound to package[0][core:0-9]
   [node01:103115] MCW rank 3 bound to package[1][core:10-19]

In the example above, processes bind to all cores on successive
packages. The processes cycle through the packages in a round-robin
fashion as many times as are needed.

   $ prun --np 4 --map-by package:PE=2 --bind-to core:REPORT ./a.out
   [node01:103328] MCW rank 0 bound to package[0][core:0-1]
   [node01:103328] MCW rank 1 bound to package[1][core:10-11]
   [node01:103328] MCW rank 2 bound to package[0][core:2-3]
   [node01:103328] MCW rank 3 bound to package[1][core:12-13]

The example above shows us that 2 cores have been bound per process.
The :PE=2 qualifier states that 2 processing elements underneath the
package (which would be cores in this case) are mapped to each
process. The processes cycle through the packages in a round-robin
fashion as many times as are needed.

   $ prun --np 4 --map-by core:PE=2:HWTCPUS --bind-to :REPORT hostname
   [node01:103506] MCW rank 0 bound to package[0][hwt:0-1]
   [node01:103506] MCW rank 1 bound to package[0][hwt:8-9]
   [node01:103506] MCW rank 2 bound to package[0][hwt:16-17]
   [node01:103506] MCW rank 3 bound to package[0][hwt:24-25]

The example above shows us that 2 hardware threads have been bound per
process. In this case prun is mapping by hardware threads since we
used the :HWTCPUS qualifier. Without that qualifier this command would
return an error since by default prun will not map to resources
smaller than a core. The :PE=2 qualifier states that 2 processing
elements underneath the core (which would be hardware threads in this
case) are mapped to each process. The processes cycle through the
cores in a round-robin fashion as many times as are needed.

   $ prun --np 4 --bind-to none:REPORT hostname
   [node01:107126] MCW rank 0 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 1 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 2 is not bound (or bound to all available processors)
   [node01:107126] MCW rank 3 is not bound (or bound to all available processors)

In the example above, binding is turned off.

Rankfiles
Another way to specify arbitrary mappings is with a rankfile, which
gives you detailed control over process binding as well.

Rankfiles are text files that specify detailed information about how
individual processes should be mapped to nodes, and to which
processor(s) they should be bound. Each line of a rankfile specifies
the location of one process. The general form of each line in the
rankfile is:

   rank <N>=<hostname> slot=<slot list>

For example:

   $ cat myrankfile
   rank 0=aa slot=10-12
   rank 1=bb slot=0,1,4
   rank 2=cc slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

means that

   Rank 0 runs on node aa, bound to logical cores 10-12.
   Rank 1 runs on node bb, bound to logical cores 0, 1, and 4.
   Rank 2 runs on node cc, bound to logical cores 1 and 2.

As another example:

   $ cat myrankfile
   rank 0=aa slot=1:0-2
   rank 1=bb slot=0:0,1,4
   rank 2=cc slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

means that

   Rank 0 runs on node aa, bound to logical package 1, cores 10-12
   (the 0th through 2nd cores on that package).
   Rank 1 runs on node bb, bound to logical package 0, cores 0, 1, and 4.
   Rank 2 runs on node cc, bound to logical cores 1 and 2.

The hostnames listed above are “absolute,” meaning that actual
resolvable hostnames are specified. However, hostnames can also be
specified as “relative,” meaning that they are specified in relation
to an externally-specified list of hostnames (e.g., by prun’s --host
argument, a hostfile, or a job scheduler).

The “relative” specification is of the form “+n<X>”, where X is an
integer specifying the Xth hostname in the set of all available
hostnames, indexed from 0. For example:

   $ cat myrankfile
   rank 0=+n0 slot=10-12
   rank 1=+n1 slot=0,1,4
   rank 2=+n2 slot=1-2
   $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

All package/core slot locations are specified as logical indexes. You
can use tools such as HWLOC’s “lstopo” to find the logical indexes of
packages and cores.

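For instance, the following (illustrative) lstopo invocation lists
only the core objects on the local node, along with their logical
(L#) indexes; the exact output format varies with the hwloc version:

   # Show just the cores and their logical indexes
   lstopo --only core
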
Deprecated Options
These deprecated options will be removed in a future release.

--bind-to-core
       (Deprecated: Use --bind-to core) Bind processes to cores

-bind-to-socket, --bind-to-socket
       (Deprecated: Use --bind-to package) Bind processes to processor
       sockets

--bycore
       (Deprecated: Use --map-by core) Map processes by core

-bynode, --bynode
       (Deprecated: Use --map-by node) Launch processes one per node,
       cycling by node in a round-robin fashion. This spreads
       processes evenly among nodes and assigns ranks in a
       round-robin, “by node” manner.

--byslot
       (Deprecated: Use --map-by slot) Map and rank processes
       round-robin by slot.

--cpus-per-proc <#perproc>
       (Deprecated: Use --map-by <obj>:PE=<#perproc>) Bind each
       process to the specified number of cpus.

--cpus-per-rank <#perrank>
       (Deprecated: Use --map-by <obj>:PE=<#perrank>) Alias for
       --cpus-per-proc.

--display-allocation
       (Deprecated: Use --map-by :DISPLAYALLOC) Display the detected
       resource allocation.

--display-devel-map
       (Deprecated: Use --map-by :DISPLAYDEVEL) Display a detailed
       process map (mostly intended for developers) just before
       launch.

--display-map
       (Deprecated: Use --map-by :DISPLAY) Display a table showing the
       mapped location of each process prior to launch.

--display-topo
       (Deprecated: Use --map-by :DISPLAYTOPO) Display the topology as
       part of the process map (mostly intended for developers) just
       before launch.

--do-not-launch
       (Deprecated: Use --map-by :DONOTLAUNCH) Perform all necessary
       operations to prepare to launch the application, but do not
       actually launch it (usually used to test mapping patterns).

--do-not-resolve
       (Deprecated: Use --map-by :DONOTRESOLVE) Do not attempt to
       resolve interfaces - usually used to determine proposed process
       placement/binding prior to obtaining an allocation.

-N <num>
       (Deprecated: Use --map-by ppr:<num>:node) Launch num processes
       per node on all allocated nodes.

--nolocal
       (Deprecated: Use --map-by :NOLOCAL) Do not run any copies of
       the launched application on the same node as prun is running.
       This option will override listing the localhost with --host or
       any other host-specifying mechanism.

--nooversubscribe
       (Deprecated: Use --map-by :NOOVERSUBSCRIBE) Do not
       oversubscribe any nodes; error (without starting any processes)
       if the requested number of processes would cause
       oversubscription. This option implicitly sets “max_slots” equal
       to the “slots” value for each node. (Enabled by default.)

--npernode <#pernode>
       (Deprecated: Use --map-by ppr:<#pernode>:node) On each node,
       launch this many processes.

--npersocket <#persocket>
       (Deprecated: Use --map-by ppr:<#persocket>:package) On each
       node, launch this many processes times the number of processor
       sockets on the node. The --npersocket option also turns on the
       --bind-to socket option. The term socket has been globally
       replaced with package.

--oversubscribe
       (Deprecated: Use --map-by :OVERSUBSCRIBE) Nodes are allowed to
       be oversubscribed, even on a managed system, and processing
       elements may be overloaded.

--pernode
       (Deprecated: Use --map-by ppr:1:node) On each node, launch one
       process.

--ppr  (Deprecated: Use --map-by ppr:<list>) Comma-separated list of
       the number of processes on a given resource type [default:
       none].

--rankfile <FILENAME>
       (Deprecated: Use --map-by rankfile:FILE=<FILENAME>) Use a
       rankfile for mapping/ranking/binding

--report-bindings
       (Deprecated: Use --bind-to :REPORT) Report any bindings for
       launched processes.

--tag-output
       (Deprecated: Use --map-by :TAGOUTPUT) Tag all output with
       [job,rank]

--timestamp-output
       (Deprecated: Use --map-by :TIMESTAMPOUTPUT) Timestamp all
       application process output

--use-hwthread-cpus
       (Deprecated: Use --map-by :HWTCPUS) Use hardware threads as
       independent cpus.

--xml  (Deprecated: Use --map-by :XMLOUTPUT) Provide all output in XML
       format



2021-06-29                                                         prte-map(1)