prte-map(1)                          PRTE                          prte-map(1)


NAME

       PRTE: Mapping, Ranking, and Binding

SYNOPSIS

       PRTE employs a three-phase procedure for assigning process locations
       and ranks:

       1. mapping: Assigns a default location to each process

       2. ranking: Assigns a unique rank value to each process

       3. binding: Constrains each process to run on specific processors

       This document describes these three phases with examples.  Unless
       otherwise noted, this behavior is shared by prun, prterun, and prte.

QUICK SUMMARY

       The two binaries that most influence process layout are prte and
       prun.  The prte process discovers the allocation, starts the
       daemons, and defines the default mapping/ranking/binding for all
       jobs.  The prun process defines the specific mapping/ranking/binding
       for a specific job.  Most of the command line controls are targeted
       to prun since each job has its own unique requirements.

       prterun is just a wrapper around prte for a single-job PRTE DVM.  It
       does the job of both prte and prun and, as such, accepts the sum of
       both of their command line arguments.  Any example that uses prun
       can substitute prterun except where otherwise noted.

       The prte process attempts to automatically discover the nodes in the
       allocation by querying supported resource managers.  If a supported
       resource manager is not present, then prte relies on a hostfile
       provided by the user.  In the absence of such a hostfile, it will
       run all processes on the localhost.

       If running under a supported resource manager, the prte process will
       start the daemon processes (prted) on the remote nodes using the
       corresponding resource manager process starter.  If no such starter
       is available, then rsh or ssh is used.

       In the absence of any further directives, PRTE automatically maps
       processes in a round-robin fashion by CPU slot in one of two ways:

       Map by core:
              when the number of total processes in the job is <= 2

       Map by package:
              when the number of total processes in the job is > 2

       PRTE automatically binds processes.  Three binding patterns are used
       in the absence of any further directives:

       Bind to core:
              when the number of total processes in the job is <= 2

       Bind to package:
              when the number of total processes in the job is > 2

       Bind to none:
              when oversubscribed

       If your application uses threads, then you probably want to ensure
       that you are either not bound at all (by specifying --bind-to none),
       or bound to multiple cores using an appropriate binding level or a
       specific number of processing elements per application process.
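
       For example, the following sketch (with a hypothetical threaded
       executable named threaded_app) removes the binding constraint so the
       application’s own threads may use all cores:

              prun --np 2 --bind-to none ./threaded_app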

       PRTE automatically ranks processes starting from 0.  Two ranking
       patterns are used in the absence of any further directives:

       Rank by slot:
              when the number of total processes in the job is <= 2

       Rank by package:
              when the number of total processes in the job is > 2

OPTIONS

       Listed here is the subset of command line options that will be used
       in the process mapping/ranking/binding discussion in this manual
       page.

   Specifying Host Nodes
       Use one of the following options to specify which hosts (nodes)
       within the PRTE DVM environment to run on.

       --host <host1,host2,...,hostN> or --host <host1:X,host2:Y,...,hostN:Z>
              List of hosts on which to invoke processes.  After each
              hostname a colon (:) followed by a positive integer can be
              used to specify the number of slots on that host (:X, :Y, and
              :Z).  The default is 1.
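
              For example, the following sketch (with hypothetical host
              names n1 and n2) requests two slots on n1 and four on n2:

                     prun --host n1:2,n2:4 --np 6 ./a.out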

       --hostfile <hostfile>
              Provide a hostfile to use.

       --machinefile <machinefile>
              Synonym for --hostfile.

       --default-hostfile <hostfile>
              Provide a default hostfile to use.
102
   Process Mapping / Ranking / Binding Options
       The following options specify the number of processes to launch.
       Note that none of the options imply a particular binding policy -
       e.g., requesting N processes for each package does not imply that
       the processes will be bound to the package.

       -c, -n, --n, --np <#>
              Run this many copies of the program on the given nodes.  This
              option indicates that the specified file is an executable
              program and not an application context.  If no value is
              provided for the number of copies to execute (i.e., neither
              --np nor its synonyms are provided on the command line), prun
              will automatically execute a copy of the program on each
              process slot (see below for a description of a “process
              slot”).  This feature, however, can only be used in the SPMD
              model and will return an error (without beginning execution
              of the application) otherwise.
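
              For example, the first command below runs exactly four
              copies, while the second (with no count given) runs one copy
              on every available process slot:

                     prun --np 4 ./a.out
                     prun ./a.out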

       To map processes across sets of objects:

       --map-by <object>
              Map to the specified object.  See defaults in Quick Summary.
              Supported options include slot, hwthread, core, l1cache,
              l2cache, l3cache, package, node, seq, dist, ppr, and
              rankfile.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --map-by option
       (see the examples after this list):

       • PE=n bind n processing elements to each process

       • SPAN load balance the processes across the allocation

       • OVERSUBSCRIBE allow more processes on a node than processing
         elements

       • NOOVERSUBSCRIBE means !OVERSUBSCRIBE

       • NOLOCAL do not launch processes on the same node as prun

       • HWTCPUS use hardware threads as CPU slots

       • CORECPUS use cores as CPU slots (default)

       • DEVICE=dev device specifier for the dist policy

       • INHERIT

       • NOINHERIT means !INHERIT

       • PE-LIST=a,b comma-delimited ranges of CPUs to use for this job,
         processed as an unordered pool of CPUs

       • FILE=%s (path to file containing sequential or rankfile entries).

       ppr policy example: --map-by ppr:N:<object> will launch N times the
       number of objects of the specified type on each node.
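
       For example, the following sketches show a plain mapping policy, a
       policy with qualifiers, and the ppr policy (all drawn from examples
       later in this page):

              prun --np 6 --map-by node ./a.out
              prun --np 4 --map-by core:PE=2:HWTCPUS ./a.out
              prun --map-by ppr:2:package ./a.out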

       To order processes’ ranks:

       --rank-by <object>
              Rank in round-robin fashion according to the specified
              object.  See defaults in Quick Summary.  Supported options
              include slot, hwthread, core, l1cache, l2cache, l3cache,
              package, and node.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --rank-by option:

       • SPAN

       • FILL
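
       For example, a sketch whose resulting rank layout is shown in the
       ranking discussion later in this page:

              prun --map-by ppr:2:package --rank-by package ./a.out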

       To bind processes to sets of objects:

       --bind-to <object>
              Bind processes to the specified object.  See defaults in
              Quick Summary.  Supported options include none, hwthread,
              core, l1cache, l2cache, l3cache, and package.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --bind-to option:

       • overload-allowed allows for binding more than one process in
         relation to a CPU

       • if-supported if that object is supported on this system
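
       For example, a sketch that permits more than one process to share a
       core (discussed further in the overloading section below):

              prun --np 34 --map-by core --bind-to core:overload-allowed ./a.out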

   Diagnostics
       --map-by :DISPLAY
              Display a table showing the mapped location of each process
              prior to launch.

       --map-by :DISPLAYALLOC
              Display the detected allocation of resources (e.g., nodes,
              slots).

       --bind-to :REPORT
              Report bindings for launched processes to stderr.
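
       For example, a sketch that shows the mapping table and reports the
       resulting bindings for a four-process job:

              prun --np 4 --map-by :DISPLAY --bind-to :REPORT ./a.out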

DESCRIPTION

       PRTE employs a three-phase procedure for assigning process locations
       and ranks:

       1. mapping: Assigns a default location to each process

       2. ranking: Assigns a unique rank value to each process

       3. binding: Constrains each process to run on specific processors

       The first phase of mapping is used to assign a default location to
       each process based on the mapper being employed.  Mapping by slot,
       by node, and sequentially results in the assignment of the processes
       to the node level.  In contrast, mapping by object allows the mapper
       to assign the process to an actual object on each node.

       Note: The location assigned to the process is independent of where
       it will be bound - the assignment is used solely as input to the
       binding algorithm.

       The second phase focuses on the ranking of the process within the
       job’s namespace.  PRTE separates this from the mapping procedure to
       allow more flexibility in the relative placement of processes.

       The third phase of binding actually binds each process to a given
       set of processors.  This can improve performance if the operating
       system is placing processes sub-optimally.  For example, it might
       oversubscribe some multi-core processor sockets, leaving other
       sockets idle; this can lead processes to contend unnecessarily for
       common resources.  Or, it might spread processes out too widely;
       this can be suboptimal if application performance is sensitive to
       interprocess communication costs.  Binding can also keep the
       operating system from migrating processes excessively, regardless of
       how optimally those processes were placed to begin with.

       PRTE’s support for process binding depends on the underlying
       operating system.  Therefore, certain process binding options may
       not be available on every system.

   Specifying Host Nodes
       Host nodes can be identified on the command line with the --host
       option or in a hostfile.

       For example, assuming no other resource manager or scheduler is
       involved,

       prte --host aa,aa,bb ./a.out
              launches two processes on node aa and one on bb.

       prun --host aa ./a.out
              launches one process on node aa.

       prun --host aa:5 ./a.out
              launches five processes on node aa.

       Or, consider the hostfile

              $ cat myhostfile
              aa slots=2
              bb slots=2
              cc slots=2

       Here, we list not only the host names (aa, bb, and cc) but also how
       many “slots” there are for each.  Slots indicate how many processes
       can potentially execute on a node.  For best performance, the number
       of slots may be chosen to be the number of cores on the node or the
       number of processor sockets.

       If the hostfile does not provide slots information, the PRTE DVM
       will attempt to discover the number of cores (or hwthreads, if the
       :HWTCPUS qualifier to the --map-by option is set) and set the number
       of slots to that value.

       Examples using the hostfile above, with and without the --host
       option:

       prun --hostfile myhostfile ./a.out
              will launch two processes on each of the three nodes.

       prun --hostfile myhostfile --host aa ./a.out
              will launch two processes, both on node aa.

       prun --hostfile myhostfile --host dd ./a.out
              will find no hosts to run on and abort with an error.  That
              is, the specified host dd is not in the specified hostfile.

       When running under resource managers (e.g., SLURM, Torque, etc.),
       PRTE will obtain both the hostnames and the number of slots directly
       from the resource manager.  The behavior of --host in that
       environment is the same as if a hostfile were provided (since the
       host information is supplied by the resource manager).

   Specifying Number of Processes
       As we have just seen, the number of processes to run can be set
       using the hostfile.  Other mechanisms exist.

       The number of processes launched can be specified as a multiple of
       the number of nodes or processor sockets available.  Consider the
       hostfile below for the examples that follow.

              $ cat myhostfile
              aa
              bb

       For example,

       prun --hostfile myhostfile --map-by ppr:2:package ./a.out
              launches processes 0-3 on node aa and processes 4-7 on node
              bb, where aa and bb are both dual-package nodes.  The
              --map-by ppr:2:package option also turns on the --bind-to
              package option, which is discussed in a later section.

       prun --hostfile myhostfile --map-by ppr:2:node ./a.out
              launches processes 0-1 on node aa and processes 2-3 on node
              bb.

       prun --hostfile myhostfile --map-by ppr:1:node ./a.out
              launches one process per host node.

       Another alternative is to specify the number of processes with the
       --np option.  Consider now the hostfile

              $ cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       Now,

       prun --hostfile myhostfile --np 6 ./a.out
              will launch processes 0-3 on node aa and processes 4-5 on
              node bb.  The remaining slots in the hostfile will not be
              used since the --np option indicated that only 6 processes
              should be launched.

   Mapping Processes to Nodes: Using Policies
       The examples above illustrate the default mapping of processes to
       nodes.  This mapping can also be controlled with various
       prun/prterun options that describe mapping policies.

              $ cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       Consider the hostfile above, with --np 6:

                                            node aa      node bb      node cc
              prun                          0 1 2 3      4 5
              prun --map-by node            0 1          2 3          4 5
              prun --map-by node:NOLOCAL                 0 1 2        3 4 5

       The --map-by node option will load balance the processes across the
       available nodes, numbering each process in a round-robin fashion.

       The :NOLOCAL qualifier to --map-by prevents any processes from being
       mapped onto the local host (in this case node aa).  While prun
       typically consumes few system resources, the :NOLOCAL qualifier can
       be helpful for launching very large jobs where prun may actually
       need to use noticeable amounts of memory and/or processing time.

       Just as --np can specify fewer processes than there are slots, it
       can also oversubscribe the slots.  For example, with the same
       hostfile:

       prun --hostfile myhostfile --np 14 ./a.out
              will produce an error since the default :NOOVERSUBSCRIBE
              qualifier to --map-by prevents oversubscription.

       To oversubscribe the nodes you can use the :OVERSUBSCRIBE qualifier
       to --map-by:

       prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
              will launch processes 0-5 on node aa, 6-9 on bb, and 10-13 on
              cc.

       Limits to oversubscription can also be specified in the hostfile
       itself with the max_slots field:

              % cat myhostfile
              aa slots=4 max_slots=4
              bb         max_slots=8
              cc slots=4

       The max_slots field specifies such a limit.  When it is given
       without a slots value, slots defaults to the limit.  Now:

       prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
              causes the first 12 processes to be launched as before, but
              the remaining two processes will be forced onto node cc.  The
              other two nodes are protected by the hostfile against
              oversubscription by this job.

       Using the :NOOVERSUBSCRIBE qualifier to the --map-by option can be
       helpful since the PRTE DVM currently does not get “max_slots” values
       from the resource manager.

       Of course, --np can also be used with the --host option.  For
       example,

       prun --host aa,bb --np 8 ./a.out
              will produce an error since the default :NOOVERSUBSCRIBE
              qualifier to --map-by prevents oversubscription.

       prun --host aa,bb --np 8 --map-by :OVERSUBSCRIBE ./a.out
              launches 8 processes.  Since only two hosts are specified,
              after the first two processes are mapped, one to aa and one
              to bb, the remaining processes oversubscribe the specified
              hosts evenly.

       prun --host aa:2,bb:6 --np 8 ./a.out
              launches 8 processes: processes 0-1 on node aa since it has 2
              slots, and processes 2-7 on node bb since it has 6 slots.

       And here is a MIMD example:

       prun --host aa --np 1 hostname : --host bb,cc --np 2 uptime
              will launch process 0 running hostname on node aa and
              processes 1 and 2 each running uptime on nodes bb and cc,
              respectively.

   Mapping, Ranking, and Binding: Fundamentals
       The mapping of processes to nodes can be defined not just with
       general policies but also, if necessary, using arbitrary mappings
       that cannot be described by a simple policy.  One can use the
       “sequential mapper,” which reads the hostfile line by line,
       assigning processes to nodes in whatever order the hostfile
       specifies.  Use the --prtemca rmaps seq option.

       For example, using the hostfile below:

              % cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       The command below will launch three processes, one on each of nodes
       aa, bb, and cc, respectively.  The slot counts don’t matter; one
       process is launched per line on whatever node is listed on the line.

              % prun --hostfile myhostfile --prtemca rmaps seq ./a.out

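       Since seq is also listed among the supported --map-by options above,
       the same placement can presumably be requested with that form (a
       sketch, not verified here):

              % prun --hostfile myhostfile --map-by seq ./a.out
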
       The ranking phase is best illustrated by considering the following
       hostfile and test cases where we used the --map-by ppr:2:package
       option:

              % cat myhostfile
              aa
              bb

                                       node aa       node bb
              --rank-by core           0 1 ! 2 3     4 5 ! 6 7
              --rank-by package        0 2 ! 1 3     4 6 ! 5 7
              --rank-by package:SPAN   0 4 ! 1 5     2 6 ! 3 7

       Ranking by core and by slot provide the identical result - a simple
       progression of ranks across each node.  Ranking by package does a
       round-robin ranking within each node until all processes have been
       assigned a rank, and then progresses to the next node.  Adding the
       :SPAN qualifier to the ranking directive causes the ranking
       algorithm to treat the entire allocation as a single entity - thus,
       the process ranks are assigned across all packages before circling
       back around to the beginning.
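
       For example, the :SPAN row of the table corresponds to a command
       like:

              % prun --hostfile myhostfile --map-by ppr:2:package --rank-by package:SPAN ./a.out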

       The binding phase restricts the process to a subset of the CPU
       resources on the node.

       The processors to be used for binding can be identified in terms of
       topological groupings - e.g., binding to an l3cache will bind each
       process to all processors within the scope of a single L3 cache
       within their assigned location.  Thus, if a process is assigned by
       the mapper to a certain package, then a --bind-to l3cache directive
       will cause the process to be bound to the processors that share a
       single L3 cache within that package.

       To help balance loads, the binding directive uses a round-robin
       method when binding to levels lower than used in the mapper.  For
       example, consider the case where a job is mapped to the package
       level, and then bound to core.  Each package will have multiple
       cores, so if multiple processes are mapped to a given package, the
       binding algorithm will assign each process located on a package to a
       unique core in a round-robin manner.
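
       For example, the package-then-core case just described corresponds
       to a command like:

              prun --np 4 --map-by package --bind-to core ./a.out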

       Alternatively, processes mapped by l2cache and then bound to package
       will simply be bound to all the processors in the package where they
       are located.  In this manner, users can exert detailed control over
       relative process location and binding.

       Process mapping/ranking/binding can also be set with MCA parameters.
       Their usage is less convenient than that of the command line
       options.  On the other hand, MCA parameters can be set not only on
       the prun command line, but alternatively in a system or user
       mca-params.conf file or as environment variables, as described in
       the MCA section below.  Some examples include:

              prun option          MCA parameter key           value
              --map-by core        rmaps_base_mapping_policy   core
              --map-by package     rmaps_base_mapping_policy   package
              --rank-by core       rmaps_base_ranking_policy   core
              --bind-to core       hwloc_base_binding_policy   core
              --bind-to package    hwloc_base_binding_policy   package
              --bind-to none       hwloc_base_binding_policy   none
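
       For example, using the --prtemca option shown earlier, the following
       sketch is presumably equivalent to --map-by package --bind-to core:

              prun --prtemca rmaps_base_mapping_policy package --prtemca hwloc_base_binding_policy core ./a.out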

   Difference between overloading and oversubscription
       Users are often confused by the difference between these two
       scenarios, so this section provides a number of scenarios to help
       illustrate the differences:

       • --map-by :OVERSUBSCRIBE allow more processes on a node than
         processing elements

       • --bind-to <object>:overload-allowed allows for binding more than
         one process in relation to a CPU

       The important thing to remember with oversubscribing is that it can
       be defined separately from the actual number of CPUs on a node.
       This allows the mapper to place more or fewer processes per node
       than CPUs.  By default, PRTE uses cores to determine slots in the
       absence of such information provided in the hostfile or by the
       resource manager (except in the case of --host as described in the
       “Specifying Host Nodes” section).

       The important thing to remember with overloading is that it is
       defined as binding more processes than CPUs.  By default, PRTE uses
       cores as a means of counting the number of CPUs.  However, the user
       can adjust this.  For example, when using the :HWTCPUS qualifier to
       the --map-by option, PRTE will use hardware threads as a means of
       counting the number of CPUs.

       For the following examples consider a node with:

       • Two processor packages,

       • Ten cores per package, and

       • Eight hardware threads per core.

       Consider the node from above with the hostfile below:

              $ cat myhostfile
              node01 slots=32
              node02 slots=32

       The “slots” value tells PRTE that it can place up to 32 processes
       before oversubscribing the node.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core --bind-to core hostname

       It will return an error at binding time indicating an overloading
       scenario.

       The mapping mechanism assigns 32 processes to node01, matching the
       “slots” specification in the hostfile.  The binding mechanism binds
       the first 20 processes to unique cores, leaving it with 12 processes
       that it cannot bind without overloading one of the cores (putting
       more than one process on the core).

       Using the overload-allowed qualifier to the --bind-to core option
       tells PRTE that it may assign more than one process to a core.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname

       This will run correctly, placing 32 processes on node01 and 2
       processes on node02.  On node01, two processes are bound to each of
       cores 0-11, accounting for the overloading of those cores.

       Alternatively, we could use hardware threads to give binding a lower
       level CPU to bind to without overloading.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

       This will run correctly, placing 32 processes on node01 and 2
       processes on node02.  On node01, two processes are mapped to each of
       cores 0-11 but bound to different hardware threads on those cores
       (the logical first and second hardware thread), thus no hardware
       threads are overloaded at binding time.

       In both of the examples above the node is not oversubscribed at
       mapping time because the hostfile set the oversubscription limit to
       “slots=32” for each node.  It is only after we exceed that limit
       that PRTE will throw an oversubscription error.

       Consider next if we ran the following:

              prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

       This will return an error at mapping time indicating an
       oversubscription scenario.  The mapping mechanism will assign all of
       the available slots (64 across 2 nodes) and be left with two
       processes to map.  The only way to map those processes is to exceed
       the number of available slots, putting the job into an
       oversubscription scenario.

       You can force PRTE to oversubscribe the nodes by using the
       :OVERSUBSCRIBE qualifier to the --map-by option as seen in the
       example below:

              prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS:OVERSUBSCRIBE --bind-to hwthread hostname

       This will run correctly, placing 34 processes on node01 and 32 on
       node02.  Each process is bound to a unique hardware thread.

   Overloading vs Oversubscription: Package Example
       Let’s extend these examples by considering the package level.
       Consider the same node as before, but with the hostfile below:

              $ cat myhostfile
              node01 slots=22
              node02 slots=22

       The lowest level CPUs are “cores” and we have 20 total (10 per
       package).

       If we run:

              prun --np 20 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

       Then 10 processes are mapped to each package, and bound at the
       package level.  This is not overloading since we have 10 CPUs
       (cores) available in the package at the hardware level.

       However, if we run:

              prun --np 21 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

       Then 11 processes are mapped to the first package and 10 to the
       second package.  At binding time we have an overloading scenario
       because there are only 10 CPUs (cores) available in the package at
       the hardware level.  So the first package is overloaded.

   Overloading vs Oversubscription: Hardware Threads Example
       A similar situation arises if we consider hardware threads.

       Consider the same node as before, but with the hostfile below:

              $ cat myhostfile
              node01 slots=165
              node02 slots=165

       The lowest level CPUs are “hwthreads” (because we are going to use
       the :HWTCPUS qualifier) and we have 160 total (80 per package).

       If we re-run (from the package example) and add the :HWTCPUS
       qualifier:

              prun --np 21 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

       Without the :HWTCPUS qualifier this would be overloading (as we saw
       previously).  The mapper places 11 processes on the first package
       and 10 on the second package.  The processes are still bound to the
       package level.  However, with the :HWTCPUS qualifier, it is not
       overloading since we have 80 CPUs (hwthreads) available in the
       package at the hardware level.

       Alternatively, if we run:

              prun --np 161 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

       Then 81 processes are mapped to the first package and 80 to the
       second package.  At binding time we have an overloading scenario
       because there are only 80 CPUs (hwthreads) available in the package
       at the hardware level.  So the first package is overloaded.

   Diagnostics
       PRTE provides various diagnostic reports that aid the user in
       verifying and tuning the mapping/ranking/binding for a specific job.

       The :REPORT qualifier to the --bind-to command line option can be
       used to report process bindings.

       As an example, consider a node with:

       • Two processor packages,

       • Ten cores per package, and

       • Eight hardware threads per core.

       In each of the examples below the binding is reported in a human
       readable format.

              $ prun --np 4 --map-by core --bind-to core:REPORT ./a.out
              [node01:103137] MCW rank 0 bound to package[0][core:0]
              [node01:103137] MCW rank 1 bound to package[0][core:1]
              [node01:103137] MCW rank 2 bound to package[0][core:2]
              [node01:103137] MCW rank 3 bound to package[0][core:3]

       In the example above, processes bind to successive cores on the
       first package.

              $ prun --np 4 --map-by package --bind-to package:REPORT ./a.out
              [node01:103115] MCW rank 0 bound to package[0][core:0-9]
              [node01:103115] MCW rank 1 bound to package[1][core:10-19]
              [node01:103115] MCW rank 2 bound to package[0][core:0-9]
              [node01:103115] MCW rank 3 bound to package[1][core:10-19]

       In the example above, processes bind to all cores on successive
       packages.  The processes cycle through the packages in a round-robin
       fashion as many times as are needed.

              $ prun --np 4 --map-by package:PE=2 --bind-to core:REPORT ./a.out
              [node01:103328] MCW rank 0 bound to package[0][core:0-1]
              [node01:103328] MCW rank 1 bound to package[1][core:10-11]
              [node01:103328] MCW rank 2 bound to package[0][core:2-3]
              [node01:103328] MCW rank 3 bound to package[1][core:12-13]

       The example above shows us that 2 cores have been bound per process.
       The :PE=2 qualifier states that 2 processing elements underneath the
       package (which would be cores in this case) are mapped to each
       process.  The processes cycle through the packages in a round-robin
       fashion as many times as are needed.

              $ prun --np 4 --map-by core:PE=2:HWTCPUS --bind-to :REPORT hostname
              [node01:103506] MCW rank 0 bound to package[0][hwt:0-1]
              [node01:103506] MCW rank 1 bound to package[0][hwt:8-9]
              [node01:103506] MCW rank 2 bound to package[0][hwt:16-17]
              [node01:103506] MCW rank 3 bound to package[0][hwt:24-25]

       The example above shows us that 2 hardware threads have been bound
       per process.  In this case prun is mapping by hardware threads since
       we used the :HWTCPUS qualifier.  Without that qualifier this command
       would return an error since by default prun will not map to
       resources smaller than a core.  The :PE=2 qualifier states that 2
       processing elements underneath the core (which would be hardware
       threads in this case) are mapped to each process.  The processes
       cycle through the cores in a round-robin fashion as many times as
       are needed.

              $ prun --np 4 --bind-to none:REPORT hostname
              [node01:107126] MCW rank 0 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 1 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 2 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 3 is not bound (or bound to all available processors)

       In the example above, binding is turned off.

   Rankfiles
       Another way to specify arbitrary mappings is with a rankfile, which
       gives you detailed control over process binding as well.

       Rankfiles are text files that specify detailed information about how
       individual processes should be mapped to nodes, and to which
       processor(s) they should be bound.  Each line of a rankfile
       specifies the location of one process.  The general form of each
       line in the rankfile is:

              rank <N>=<hostname> slot=<slot list>

       For example:

              $ cat myrankfile
              rank 0=aa slot=10-12
              rank 1=bb slot=0,1,4
              rank 2=cc slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       Means that

              Rank 0 runs on node aa, bound to logical cores 10-12.
              Rank 1 runs on node bb, bound to logical cores 0, 1, and 4.
              Rank 2 runs on node cc, bound to logical cores 1 and 2.

       For example:

              $ cat myrankfile
              rank 0=aa slot=1:0-2
              rank 1=bb slot=0:0,1,4
              rank 2=cc slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       Means that

              Rank 0 runs on node aa, bound to logical package 1, cores 10-12 (the 0th through 2nd cores on that package).
              Rank 1 runs on node bb, bound to logical package 0, cores 0, 1, and 4.
              Rank 2 runs on node cc, bound to logical cores 1 and 2.

       The hostnames listed above are “absolute,” meaning that actual
       resolvable hostnames are specified.  However, hostnames can also be
       specified as “relative,” meaning that they are specified in relation
       to an externally-specified list of hostnames (e.g., by prun’s --host
       argument, a hostfile, or a job scheduler).

       The “relative” specification is of the form “+n<X>”, where X is an
       integer specifying the Xth hostname in the set of all available
       hostnames, indexed from 0.  For example:

              $ cat myrankfile
              rank 0=+n0 slot=10-12
              rank 1=+n1 slot=0,1,4
              rank 2=+n2 slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       All package/core slot locations are specified as logical indexes.
       You can use tools such as HWLOC’s “lstopo” to find the logical
       indexes of packages and cores.
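
       For example, something like the following (assuming hwloc’s lstopo
       is installed) lists objects of one type with their logical indexes:

              $ lstopo --only core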

   Deprecated Options
       These deprecated options will be removed in a future release.

       --bind-to-core
              (Deprecated: Use --bind-to core) Bind processes to cores

       -bind-to-socket, --bind-to-socket
              (Deprecated: Use --bind-to package) Bind processes to
              processor sockets

       --bycore
              (Deprecated: Use --map-by core) Map processes by core

       -bynode, --bynode
              (Deprecated: Use --map-by node) Launch processes one per
              node, cycling by node in a round-robin fashion.  This spreads
              processes evenly among nodes and assigns ranks in a
              round-robin, “by node” manner.

       --byslot
              (Deprecated: Use --map-by slot) Map and rank processes
              round-robin by slot.

       --cpus-per-proc <#perproc>
              (Deprecated: Use --map-by <obj>:PE=<#perproc>) Bind each
              process to the specified number of cpus.

       --cpus-per-rank <#perrank>
              (Deprecated: Use --map-by <obj>:PE=<#perrank>) Alias for
              --cpus-per-proc.

       --display-allocation
              (Deprecated: Use --map-by :DISPLAYALLOC) Display the detected
              resource allocation.

       --display-devel-map
              (Deprecated: Use --map-by :DISPLAYDEVEL) Display a detailed
              process map (mostly intended for developers) just before
              launch.

       --display-map
              (Deprecated: Use --map-by :DISPLAY) Display a table showing
              the mapped location of each process prior to launch.

       --display-topo
              (Deprecated: Use --map-by :DISPLAYTOPO) Display the topology
              as part of the process map (mostly intended for developers)
              just before launch.

       --do-not-launch
              (Deprecated: Use --map-by :DONOTLAUNCH) Perform all necessary
              operations to prepare to launch the application, but do not
              actually launch it (usually used to test mapping patterns).

       --do-not-resolve
              (Deprecated: Use --map-by :DONOTRESOLVE) Do not attempt to
              resolve interfaces - usually used to determine proposed
              process placement/binding prior to obtaining an allocation.

       -N <num>
              (Deprecated: Use --map-by ppr:<num>:node) Launch num
              processes per node on all allocated nodes.

       --nolocal
              (Deprecated: Use --map-by :NOLOCAL) Do not run any copies of
              the launched application on the same node as prun is running.
              This option will override listing the localhost with --host
              or any other host-specifying mechanism.

       --nooversubscribe
              (Deprecated: Use --map-by :NOOVERSUBSCRIBE) Do not
              oversubscribe any nodes; error (without starting any
              processes) if the requested number of processes would cause
              oversubscription.  This option implicitly sets “max_slots”
              equal to the “slots” value for each node.  (Enabled by
              default).

       --npernode <#pernode>
              (Deprecated: Use --map-by ppr:<#pernode>:node) On each node,
              launch this many processes.

       --npersocket <#persocket>
              (Deprecated: Use --map-by ppr:<#perpackage>:package) On each
              node, launch this many processes times the number of
              processor sockets on the node.  The --npersocket option also
              turns on the --bind-to socket option.  The term socket has
              been globally replaced with package.

       --oversubscribe
              (Deprecated: Use --map-by :OVERSUBSCRIBE) Nodes are allowed
              to be oversubscribed, even on a managed system, and
              overloading of processing elements is allowed.

       --pernode
              (Deprecated: Use --map-by ppr:1:node) On each node, launch
              one process.

       --ppr  (Deprecated: Use --map-by ppr:<list>) Comma-separated list of
              number of processes on a given resource type [default: none].

       --rankfile <FILENAME>
              (Deprecated: Use --map-by rankfile:FILE=<FILENAME>) Use a
              rankfile for mapping/ranking/binding

       --report-bindings
              (Deprecated: Use --bind-to :REPORT) Report any bindings for
              launched processes.

       --tag-output
              (Deprecated: Use --map-by :TAGOUTPUT) Tag all output with
              [job,rank]

       --timestamp-output
              (Deprecated: Use --map-by :TIMESTAMPOUTPUT) Timestamp all
              application process output

       --use-hwthread-cpus
              (Deprecated: Use --map-by :HWTCPUS) Use hardware threads as
              independent cpus.

       --xml  (Deprecated: Use --map-by :XMLOUTPUT) Provide all output in
              XML format


2021-06-29                                                         prte-map(1)