prte-map(1)                          PRRTE                         prte-map(1)

NAME

       PRTE: Mapping, Ranking, and Binding

SYNOPSIS

       PRTE employs a three-phase procedure for assigning process locations
       and ranks:

       1. mapping: Assigns a default location to each process

       2. ranking: Assigns a unique rank value to each process

       3. binding: Constrains each process to run on specific processors

       This document describes these three phases with examples.  Unless
       otherwise noted, this behavior is shared by prun, prterun, and prte.

QUICK SUMMARY

       The two binaries that most influence process layout are prte and prun.
       The prte process discovers the allocation, starts the daemons, and
       defines the default mapping/ranking/binding for all jobs.  The prun
       process defines the specific mapping/ranking/binding for a specific
       job.  Most of the command line controls are targeted to prun since
       each job has its own unique requirements.

       prterun is just a wrapper around prte for a single-job PRTE DVM.  It
       does the job of both prte and prun and, as such, accepts the sum of
       all their command line arguments.  Any example that uses prun can
       substitute the use of prterun except where otherwise noted.
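
       For example, the two command sequences below launch the same
       four-process job (a sketch; ./a.out is an illustrative executable):

              # Start a persistent DVM with prte (e.g., in one terminal),
              # then submit a job to it with prun:
              $ prte
              $ prun --np 4 ./a.out

              # Or start the DVM, run the job, and tear the DVM down in a
              # single step with prterun:
              $ prterun --np 4 ./a.out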

       The prte process attempts to automatically discover the nodes in the
       allocation by querying supported resource managers.  If a supported
       resource manager is not present, then prte relies on a hostfile
       provided by the user.  In the absence of such a hostfile it will run
       all processes on the localhost.

       If running under a supported resource manager, the prte process will
       start the daemon processes (prted) on the remote nodes using the
       corresponding resource manager process starter.  If no such starter
       is available, then rsh or ssh is used.

       PRTE automatically maps processes in a round-robin fashion by CPU
       slot in one of two ways in the absence of any further directives:

       Map by core:
              when the number of total processes in the job is <= 2

       Map by NUMA:
              when the number of total processes in the job is > 2

       PRTE automatically binds processes.  Three binding patterns are used
       in the absence of any further directives:

       Bind to core:
              when the number of total processes in the job is <= 2

       Bind to NUMA:
              when the number of total processes in the job is > 2

       Bind to none:
              when oversubscribed

       If your application uses threads, then you probably want to ensure
       that you are either not bound at all (by specifying --bind-to none),
       or bound to multiple cores using an appropriate binding level or
       specific number of processing elements per application process.
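
       For example, either command below would be a reasonable starting
       point for a hypothetical application that spawns four threads per
       process (a sketch; the executable name is illustrative):

              # Leave the processes unbound so their threads can float:
              $ prun --np 4 --bind-to none ./threaded_app

              # Or bind each process to 4 processing elements (cores):
              $ prun --np 4 --map-by slot:PE=4 ./threaded_app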

       PRTE automatically ranks processes starting from 0.  Two ranking
       patterns are used in the absence of any further directives:

       Rank by slot:
              when the number of total processes in the job is <= 2

       Rank by NUMA:
              when the number of total processes in the job is > 2
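
       All of these defaults can be overridden explicitly.  As a sketch,
       the command below sets all three phases by hand rather than relying
       on the process-count-based defaults:

              $ prun --np 8 --map-by core --rank-by core --bind-to core ./a.out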

OPTIONS

       Listed here is the subset of command line options that will be used
       in the process mapping/ranking/binding discussion in this manual
       page.

   Specifying Host Nodes
       Use one of the following options to specify which hosts (nodes)
       within the PRTE DVM environment to run on.

       --host <host1,host2,...,hostN> or --host <host1:X,host2:Y,...,hostN:Z>
              List of hosts on which to invoke processes.  After each
              hostname a colon (:) followed by a positive integer can be
              used to specify the number of slots on that host (:X, :Y, and
              :Z).  The default is 1.

       --hostfile <hostfile>
              Provide a hostfile to use.

       --machinefile <machinefile>
              Synonym for --hostfile.

       --default-hostfile <hostfile>
              Provide a default hostfile to use.

   Process Mapping / Ranking / Binding Options
       The following options specify the number of processes to launch.
       Note that none of the options imply a particular binding policy -
       e.g., requesting N processes for each socket does not imply that the
       processes will be bound to the socket.

       -c, -n, --n, --np <#>
              Run this many copies of the program on the given nodes.  This
              option indicates that the specified file is an executable
              program and not an application context.  If no value is
              provided for the number of copies to execute (i.e., neither
              --np nor its synonyms are provided on the command line), prun
              will automatically execute a copy of the program on each
              process slot (see below for description of a “process slot”).
              This feature, however, can only be used in the SPMD model and
              will return an error (without beginning execution of the
              application) otherwise.

       To map processes across sets of objects:

       --map-by <object>
              Map to the specified object.  See defaults in Quick Summary.
              Supported options include slot, hwthread, core, l1cache,
              l2cache, l3cache, numa, package, node, seq, dist, ppr, and
              rankfile.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --map-by option:

       • PE=n bind n processing elements to each process

       • SPAN load balance the processes across the allocation

       • OVERSUBSCRIBE allow more processes on a node than processing
         elements

       • NOOVERSUBSCRIBE means !OVERSUBSCRIBE

       • NOLOCAL do not launch processes on the same node as prun

       • HWTCPUS use hardware threads as CPU slots

       • CORECPUS use cores as CPU slots (default)

       • DEVICE=dev device specifier for the dist policy

       • INHERIT

       • NOINHERIT means !INHERIT

       • PE-LIST=a,b comma-delimited ranges of CPUs to use for this job,
         processed as an unordered pool of CPUs

       • FILE=%s (path to file containing sequential or rankfile entries).
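
       For example (a sketch; the process count and executable are
       illustrative):

              # Load balance the processes across the whole allocation:
              $ prun --np 8 --map-by package:SPAN ./a.out

              # Bind 2 processing elements (cores) to each process:
              $ prun --np 8 --map-by package:PE=2 ./a.out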

       ppr policy example: --map-by ppr:N:<object> will launch N times the
       number of objects of the specified type on each node.

       To order processes’ ranks:

       --rank-by <object>
              Rank in round-robin fashion according to the specified
              object.  See defaults in Quick Summary.  Supported options
              include slot, hwthread, core, l1cache, l2cache, l3cache,
              numa, package, and node.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --rank-by option:

       • SPAN

       • FILL
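
       For example, the command below maps two processes onto each package
       and assigns ranks across the entire allocation with the SPAN
       qualifier (a sketch; the ranking table in the DESCRIPTION section
       illustrates the resulting rank order):

              $ prun --map-by ppr:2:package --rank-by package:SPAN ./a.out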

       To bind processes to sets of objects:

       --bind-to <object>
              Bind processes to the specified object.  See defaults in
              Quick Summary.  Supported options include none, hwthread,
              core, l1cache, l2cache, l3cache, numa, and package.

       Any object can include qualifiers by adding a colon (:) and any
       combination of one or more of the following to the --bind-to option:

       • overload-allowed allows for binding more than one process in
         relation to a CPU

       • if-supported if that object is supported on this system

   Diagnostics
       --map-by :DISPLAY
              Display a table showing the mapped location of each process
              prior to launch.

       --map-by :DISPLAYALLOC
              Display the detected allocation of resources (e.g., nodes,
              slots).

       --bind-to :REPORT
              Report bindings for launched processes to stderr.
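
       These qualifiers can be chained with the other qualifiers of their
       respective options.  A sketch:

              # Show the detected allocation and the process map before
              # launch, and report the resulting bindings to stderr:
              $ prun --np 4 --map-by core:DISPLAY:DISPLAYALLOC \
                     --bind-to core:REPORT ./a.out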

DESCRIPTION

       PRTE employs a three-phase procedure for assigning process locations
       and ranks:

       1. mapping: Assigns a default location to each process

       2. ranking: Assigns a unique rank value to each process

       3. binding: Constrains each process to run on specific processors

       The first phase of mapping is used to assign a default location to
       each process based on the mapper being employed.  Mapping by slot,
       by node, or sequentially results in the assignment of processes at
       the node level.  In contrast, mapping by object allows the mapper to
       assign each process to an actual object on each node.

       Note: The location assigned to the process is independent of where
       it will be bound - the assignment is used solely as input to the
       binding algorithm.

       The second phase focuses on the ranking of the process within the
       job’s namespace.  PRTE separates this from the mapping procedure to
       allow more flexibility in the relative placement of processes.

       The third phase of binding actually binds each process to a given
       set of processors.  This can improve performance if the operating
       system is placing processes sub-optimally.  For example, it might
       oversubscribe some multi-core processor sockets, leaving other
       sockets idle; this can lead processes to contend unnecessarily for
       common resources.  Or, it might spread processes out too widely;
       this can be suboptimal if application performance is sensitive to
       interprocess communication costs.  Binding can also keep the
       operating system from migrating processes excessively, regardless of
       how optimally those processes were placed to begin with.

       PRTE’s support for process binding depends on the underlying
       operating system.  Therefore, certain process binding options may
       not be available on every system.

   Specifying Host Nodes
       Host nodes can be identified on the command line with the --host
       option or in a hostfile.

       For example, assuming no other resource manager or scheduler is
       involved,

       prte --host aa,aa,bb ./a.out
              launches two processes on node aa and one on bb.

       prun --host aa ./a.out
              launches one process on node aa.

       prun --host aa:5 ./a.out
              launches five processes on node aa.

       Or, consider the hostfile

              $ cat myhostfile
              aa slots=2
              bb slots=2
              cc slots=2

       Here, we list not only the host names (aa, bb, and cc) but also how
       many “slots” there are for each.  Slots indicate how many processes
       can potentially execute on a node.  For best performance, the number
       of slots may be chosen to be the number of cores on the node or the
       number of processor sockets.

       If the hostfile does not provide slots information, the PRTE DVM
       will attempt to discover the number of cores (or hwthreads, if the
       :HWTCPUS qualifier to the --map-by option is set) and set the number
       of slots to that value.
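
       For example, given the hostfile below with no slots information (a
       sketch; assume aa and bb each have 16 cores and 32 hwthreads):

              $ cat myhostfile
              aa
              bb

       each node would default to 16 slots; with --map-by :HWTCPUS, each
       node would instead default to 32 slots.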

       Examples using the hostfile above, with and without the --host
       option:

       prun --hostfile myhostfile ./a.out
              will launch two processes on each of the three nodes.

       prun --hostfile myhostfile --host aa ./a.out
              will launch two processes, both on node aa.

       prun --hostfile myhostfile --host dd ./a.out
              will find no hosts to run on and abort with an error.  That
              is, the specified host dd is not in the specified hostfile.

       When running under resource managers (e.g., SLURM, Torque, etc.),
       PRTE will obtain both the hostnames and the number of slots directly
       from the resource manager.  The --host option in that environment
       will behave the same as if a hostfile were provided (since the host
       information is supplied by the resource manager).

   Specifying Number of Processes
       As we have just seen, the number of processes to run can be set
       using the hostfile.  Other mechanisms exist.

       The number of processes launched can be specified as a multiple of
       the number of nodes or processor sockets available.  Consider the
       hostfile below for the examples that follow.

              $ cat myhostfile
              aa
              bb

       For example,

       prun --hostfile myhostfile --map-by ppr:2:package ./a.out
              launches processes 0-3 on node aa and processes 4-7 on node
              bb, where aa and bb are both dual-package nodes.  The
              --map-by ppr:2:package option also turns on the --bind-to
              package option, which is discussed in a later section.

       prun --hostfile myhostfile --map-by ppr:2:node ./a.out
              launches processes 0-1 on node aa and processes 2-3 on node
              bb.

       prun --hostfile myhostfile --map-by ppr:1:node ./a.out
              launches one process per host node.

       Another alternative is to specify the number of processes with the
       --np option.  Consider now the hostfile

              $ cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       Now,

       prun --hostfile myhostfile --np 6 ./a.out
              will launch processes 0-3 on node aa and processes 4-5 on
              node bb.  The remaining slots in the hostfile will not be
              used since the --np option indicated that only 6 processes
              should be launched.

   Mapping Processes to Nodes: Using Policies
       The examples above illustrate the default mapping of processes to
       nodes.  This mapping can also be controlled with various
       prun/prterun options that describe mapping policies.

              $ cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       Consider the hostfile above, with --np 6:

                                            node aa      node bb      node cc
              prun                          0 1 2 3      4 5
              prun --map-by node            0 1          2 3          4 5
              prun --map-by node:NOLOCAL                 0 1 2        3 4 5

       The --map-by node option will load balance the processes across the
       available nodes, numbering each process in a round-robin fashion.

       The :NOLOCAL qualifier to --map-by prevents any processes from being
       mapped onto the local host (in this case node aa).  While prun
       typically consumes few system resources, the :NOLOCAL qualifier can
       be helpful for launching very large jobs where prun may actually
       need to use noticeable amounts of memory and/or processing time.

       Just as --np can specify fewer processes than there are slots, it
       can also oversubscribe the slots.  For example, with the same
       hostfile:

       prun --hostfile myhostfile --np 14 ./a.out
              will produce an error since the default :NOOVERSUBSCRIBE
              qualifier to --map-by prevents oversubscription.

       To oversubscribe the nodes you can use the :OVERSUBSCRIBE qualifier
       to --map-by:

       prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
              will launch processes 0-5 on node aa, 6-9 on bb, and 10-13 on
              cc.

       Limits to oversubscription can also be specified in the hostfile
       itself with the max_slots field:

              % cat myhostfile
              aa slots=4 max_slots=4
              bb         max_slots=8
              cc slots=4

       The max_slots field specifies such a limit.  When it does, the slots
       value defaults to the limit.  Now:

       prun --hostfile myhostfile --np 14 --map-by :OVERSUBSCRIBE ./a.out
              causes the first 12 processes to be launched as before, but
              the remaining two processes will be forced onto node cc.  The
              other two nodes are protected by the hostfile against
              oversubscription by this job.

       Using the :NOOVERSUBSCRIBE qualifier to the --map-by option can be
       helpful since the PRTE DVM currently does not get “max_slots” values
       from the resource manager.

       Of course, --np can also be used with the --host option.  For
       example,

       prun --host aa,bb --np 8 ./a.out
              will produce an error since the default :NOOVERSUBSCRIBE
              qualifier to --map-by prevents oversubscription.

       prun --host aa,bb --np 8 --map-by :OVERSUBSCRIBE ./a.out
              launches 8 processes.  Since only two hosts are specified,
              after the first two processes are mapped, one to aa and one
              to bb, the remaining processes oversubscribe the specified
              hosts evenly.

       prun --host aa:2,bb:6 --np 8 ./a.out
              launches 8 processes: processes 0-1 on node aa since it has 2
              slots, and processes 2-7 on node bb since it has 6 slots.

       And here is a MIMD example:

       prun --host aa --np 1 hostname : --host bb,cc --np 2 uptime
              will launch process 0 running hostname on node aa and
              processes 1 and 2 each running uptime on nodes bb and cc,
              respectively.

   Mapping, Ranking, and Binding: Fundamentals
       The mapping of processes to nodes can be defined not just with
       general policies but also, if necessary, using arbitrary mappings
       that cannot be described by a simple policy.  One can use the
       “sequential mapper,” which reads the hostfile line by line,
       assigning processes to nodes in whatever order the hostfile
       specifies.  Use the --prtemca rmaps seq option.

       For example, using the hostfile below:

              % cat myhostfile
              aa slots=4
              bb slots=4
              cc slots=4

       The command below will launch three processes, one on each of nodes
       aa, bb, and cc, respectively.  The slot counts don’t matter; one
       process is launched per line on whatever node is listed on the line.

              % prun --hostfile myhostfile --prtemca rmaps seq ./a.out

       The ranking phase is best illustrated by considering the following
       hostfile and test cases where we used the --map-by ppr:2:package
       option:

              % cat myhostfile
              aa
              bb

                                       node aa       node bb
              --rank-by core           0 1 ! 2 3     4 5 ! 6 7
              --rank-by package        0 2 ! 1 3     4 6 ! 5 7
              --rank-by package:SPAN   0 4 ! 1 5     2 6 ! 3 7

       Ranking by core and by slot provide the identical result - a simple
       progression of ranks across each node.  Ranking by package does a
       round-robin ranking within each node until all processes have been
       assigned a rank, and then progresses to the next node.  Adding the
       :SPAN qualifier to the ranking directive causes the ranking
       algorithm to treat the entire allocation as a single entity - thus,
       the process ranks are assigned across all packages before circling
       back around to the beginning.

       The binding phase restricts the process to a subset of the CPU
       resources on the node.

       The processors to be used for binding can be identified in terms of
       topological groupings - e.g., binding to an l3cache will bind each
       process to all processors within the scope of a single L3 cache
       within their assigned location.  Thus, if a process is assigned by
       the mapper to a certain package, then a --bind-to l3cache directive
       will cause the process to be bound to the processors that share a
       single L3 cache within that package.

       To help balance loads, the binding directive uses a round-robin
       method when binding to levels lower than used in the mapper.  For
       example, consider the case where a job is mapped to the package
       level, and then bound to core.  Each package will have multiple
       cores, so if multiple processes are mapped to a given package, the
       binding algorithm will assign each process located on a package to a
       unique core in a round-robin manner.

       Alternatively, processes mapped by l2cache and then bound to package
       will simply be bound to all the processors in the package where they
       are located.  In this manner, users can exert detailed control over
       relative process location and binding.
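
       As a sketch, the two commands below bind at a level lower and
       higher, respectively, than the level used by the mapper:

              # Map to packages, then bind each process to its own core
              # within its assigned package (round-robin over the cores):
              $ prun --np 4 --map-by package --bind-to core ./a.out

              # Map to L2 caches, then bind each process to all processors
              # in the package containing its assigned L2 cache:
              $ prun --np 4 --map-by l2cache --bind-to package ./a.out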

       Process mapping/ranking/binding can also be set with MCA parameters.
       Their usage is less convenient than that of the command line
       options.  On the other hand, MCA parameters can be set not only on
       the prun command line, but alternatively in a system or user
       mca-params.conf file or as environment variables, as described in
       the MCA section below.  Some examples include:

              prun option          MCA parameter key              value
              --map-by core        rmaps_default_mapping_policy   core
              --map-by package     rmaps_default_mapping_policy   package
              --rank-by core       rmaps_default_ranking_policy   core
              --bind-to core       hwloc_default_binding_policy   core
              --bind-to package    hwloc_default_binding_policy   package
              --bind-to none       hwloc_default_binding_policy   none
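
       For example, the two commands below are equivalent ways of mapping
       by package (a sketch using the --prtemca option shown elsewhere in
       this page to set the MCA parameter on the command line):

              $ prun --np 4 --map-by package ./a.out
              $ prun --np 4 --prtemca rmaps_default_mapping_policy package ./a.out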

   Difference between overloading and oversubscription
       Users are often confused by the difference between oversubscription
       and overloading.  This section provides a number of scenarios to
       help illustrate the differences.

       • --map-by :OVERSUBSCRIBE allow more processes on a node than
         processing elements

       • --bind-to <object>:overload-allowed allows for binding more than
         one process in relation to a CPU

       The important thing to remember with oversubscribing is that it can
       be defined separately from the actual number of CPUs on a node.
       This allows the mapper to place more or fewer processes per node
       than CPUs.  By default, PRTE uses cores to determine slots in the
       absence of such information provided in the hostfile or by the
       resource manager (except in the case of --host as described in the
       “Specifying Host Nodes” section).

       The important thing to remember with overloading is that it is
       defined as binding more processes than CPUs.  By default, PRTE uses
       cores as a means of counting the number of CPUs.  However, the user
       can adjust this.  For example, when using the :HWTCPUS qualifier to
       the --map-by option, PRTE will use hardware threads as a means of
       counting the number of CPUs.

       For the following examples, consider a node with:

       • Two processor packages,

       • Ten cores per package, and

       • Eight hardware threads per core.

       Consider the node from above with the hostfile below:

              $ cat myhostfile
              node01 slots=32
              node02 slots=32

       The “slots” count tells PRTE that it can place up to 32 processes
       per node before oversubscribing that node.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core --bind-to core hostname

       It will return an error at binding time indicating an overloading
       scenario.

       The mapping mechanism assigns 32 processes to node01, matching the
       “slots” specification in the hostfile.  The binding mechanism will
       bind the first 20 processes to unique cores, leaving it with 12
       processes that it cannot bind without overloading one of the cores
       (putting more than one process on the core).

       Using the overload-allowed qualifier to the --bind-to core option
       tells PRTE that it may assign more than one process to a core.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core --bind-to core:overload-allowed hostname

       This will run correctly, placing 32 processes on node01 and 2
       processes on node02.  On node01, cores 0-11 each have two processes
       bound to them, accounting for the overloading of those cores.

       Alternatively, we could use hardware threads to give binding a lower
       level CPU to bind to without overloading.

       If we run the following:

              prun --np 34 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

       This will run correctly, placing 32 processes on node01 and 2
       processes on node02.  On node01, cores 0-11 each have two processes
       mapped to them, but the two processes on each core are bound to
       different hardware threads (the logical first and second hardware
       thread), so no hardware threads are overloaded at binding time.

       In both of the examples above the node is not oversubscribed at
       mapping time because the hostfile set the oversubscription limit to
       “slots=32” for each node.  It is only after we exceed that limit
       that PRTE will throw an oversubscription error.

       Consider next if we ran the following:

              prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS --bind-to hwthread hostname

       This will return an error at mapping time indicating an
       oversubscription scenario.  The mapping mechanism will assign all of
       the available slots (64 across 2 nodes) and be left with two
       processes to map.  The only way to map those processes is to exceed
       the number of available slots, putting the job into an
       oversubscription scenario.

       You can force PRTE to oversubscribe the nodes by using the
       :OVERSUBSCRIBE qualifier to the --map-by option as seen in the
       example below:

              prun --np 66 --hostfile myhostfile --map-by core:HWTCPUS:OVERSUBSCRIBE --bind-to hwthread hostname

       This will run correctly, placing 34 processes on node01 and 32 on
       node02.  Each process is bound to a unique hardware thread.

   Overloading vs Oversubscription: Package Example
       Let’s extend these examples by considering the package level.
       Consider the same node as before, but with the hostfile below:

              $ cat myhostfile
              node01 slots=22
              node02 slots=22

       The lowest level CPUs are `cores' and we have 20 total (10 per
       package).

       If we run:

              prun --np 20 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

       Then 10 processes are mapped to each package, and bound at the
       package level.  This is not overloading since we have 10 CPUs
       (cores) available in the package at the hardware level.

       However, if we run:

              prun --np 21 --hostfile myhostfile --map-by package --bind-to package:REPORT hostname

       Then 11 processes are mapped to the first package and 10 to the
       second package.  At binding time we have an overloading scenario
       because there are only 10 CPUs (cores) available in the package at
       the hardware level.  So the first package is overloaded.

   Overloading vs Oversubscription: Hardware Threads Example
       The same reasoning applies if we consider hardware threads.

       Consider the same node as before, but with the hostfile below:

              $ cat myhostfile
              node01 slots=165
              node02 slots=165

       The lowest level CPUs are `hwthreads' (because we are going to use
       the :HWTCPUS qualifier) and we have 160 total (80 per package).

       If we re-run (from the package example) and add the :HWTCPUS
       qualifier:

              prun --np 21 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

       Without the :HWTCPUS qualifier this would be overloading (as we saw
       previously).  The mapper places 11 processes on the first package
       and 10 on the second package.  The processes are still bound to the
       package level.  However, with the :HWTCPUS qualifier, it is not
       overloading since we have 80 CPUs (hwthreads) available in the
       package at the hardware level.

       Alternatively, if we run:

              prun --np 161 --hostfile myhostfile --map-by package:HWTCPUS --bind-to package:REPORT hostname

       Then 81 processes are mapped to the first package and 80 to the
       second package.  At binding time we have an overloading scenario
       because there are only 80 CPUs (hwthreads) available in the package
       at the hardware level.  So the first package is overloaded.

   Diagnostics
       PRTE provides various diagnostic reports that aid the user in
       verifying and tuning the mapping/ranking/binding for a specific job.

       The :REPORT qualifier to the --bind-to command line option can be
       used to report process bindings.

       As an example, consider a node with:

       • Two processor packages,

       • Ten cores per package, and

       • Eight hardware threads per core.

       In each of the examples below the binding is reported in a human
       readable format.

              $ prun --np 4 --map-by core --bind-to core:REPORT ./a.out
              [node01:103137] MCW rank 0 bound to package[0][core:0]
              [node01:103137] MCW rank 1 bound to package[0][core:1]
              [node01:103137] MCW rank 2 bound to package[0][core:2]
              [node01:103137] MCW rank 3 bound to package[0][core:3]

       In the example above, processes bind to successive cores on the
       first package.

              $ prun --np 4 --map-by package --bind-to package:REPORT ./a.out
              [node01:103115] MCW rank 0 bound to package[0][core:0-9]
              [node01:103115] MCW rank 1 bound to package[1][core:10-19]
              [node01:103115] MCW rank 2 bound to package[0][core:0-9]
              [node01:103115] MCW rank 3 bound to package[1][core:10-19]

       In the example above, processes bind to all cores on successive
       packages.  The processes cycle through the packages in a round-robin
       fashion as many times as are needed.

              $ prun --np 4 --map-by package:PE=2 --bind-to core:REPORT ./a.out
              [node01:103328] MCW rank 0 bound to package[0][core:0-1]
              [node01:103328] MCW rank 1 bound to package[1][core:10-11]
              [node01:103328] MCW rank 2 bound to package[0][core:2-3]
              [node01:103328] MCW rank 3 bound to package[1][core:12-13]

       The example above shows us that 2 cores have been bound per process.
       The :PE=2 qualifier states that 2 processing elements underneath the
       package (which would be cores in this case) are mapped to each
       process.  The processes cycle through the packages in a round-robin
       fashion as many times as are needed.

              $ prun --np 4 --map-by core:PE=2:HWTCPUS --bind-to :REPORT hostname
              [node01:103506] MCW rank 0 bound to package[0][hwt:0-1]
              [node01:103506] MCW rank 1 bound to package[0][hwt:8-9]
              [node01:103506] MCW rank 2 bound to package[0][hwt:16-17]
              [node01:103506] MCW rank 3 bound to package[0][hwt:24-25]

       The example above shows us that 2 hardware threads have been bound
       per process.  In this case prun is mapping by hardware threads since
       we used the :HWTCPUS qualifier.  Without that qualifier this command
       would return an error since by default prun will not map to
       resources smaller than a core.  The :PE=2 qualifier states that 2
       processing elements underneath the core (which would be hardware
       threads in this case) are mapped to each process.  The processes
       cycle through the cores in a round-robin fashion as many times as
       are needed.

              $ prun --np 4 --bind-to none:REPORT hostname
              [node01:107126] MCW rank 0 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 1 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 2 is not bound (or bound to all available processors)
              [node01:107126] MCW rank 3 is not bound (or bound to all available processors)

       In the example above, binding is turned off.

   Rankfiles
       Another way to specify arbitrary mappings is with a rankfile, which
       gives you detailed control over process binding as well.

       Rankfiles are text files that specify detailed information about how
       individual processes should be mapped to nodes, and to which
       processor(s) they should be bound.  Each line of a rankfile
       specifies the location of one process.  The general form of each
       line in the rankfile is:

              rank <N>=<hostname> slot=<slot list>

       For example:

              $ cat myrankfile
              rank 0=aa slot=10-12
              rank 1=bb slot=0,1,4
              rank 2=cc slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       Means that

              Rank 0 runs on node aa, bound to logical cores 10-12.
              Rank 1 runs on node bb, bound to logical cores 0, 1, and 4.
              Rank 2 runs on node cc, bound to logical cores 1 and 2.

       Similarly:

              $ cat myrankfile
              rank 0=aa slot=1:0-2
              rank 1=bb slot=0:0,1,4
              rank 2=cc slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       Means that

              Rank 0 runs on node aa, bound to logical package 1, cores
              10-12 (the 0th through 2nd cores on that package).
              Rank 1 runs on node bb, bound to logical package 0, cores 0,
              1, and 4.
              Rank 2 runs on node cc, bound to logical cores 1 and 2.

       The hostnames listed above are “absolute,” meaning that actual
       resolvable hostnames are specified.  However, hostnames can also be
       specified as “relative,” meaning that they are specified in relation
       to an externally-specified list of hostnames (e.g., by prun’s --host
       argument, a hostfile, or a job scheduler).

       The “relative” specification is of the form “+n<X>”, where X is an
       integer specifying the Xth hostname in the set of all available
       hostnames, indexed from 0.  For example:

              $ cat myrankfile
              rank 0=+n0 slot=10-12
              rank 1=+n1 slot=0,1,4
              rank 2=+n2 slot=1-2
              $ prun --host aa,bb,cc,dd --map-by rankfile:FILE=myrankfile ./a.out

       All package/core slot locations are specified as logical indexes.
       You can use tools such as HWLOC’s “lstopo” to find the logical
       indexes of packages and cores.
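
       As a sketch, lstopo can restrict its output to one object type (the
       exact option set and output depend on your hwloc version):

              # List the logical indexes of the cores on this node:
              $ lstopo --only core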

   Deprecated Options
       These deprecated options will be removed in a future release.

       --bind-to-core
              (Deprecated: Use --bind-to core) Bind processes to cores

       -bind-to-socket, --bind-to-socket
              (Deprecated: Use --bind-to package) Bind processes to
              processor sockets

       --bycore
              (Deprecated: Use --map-by core) Map processes by core

       -bynode, --bynode
              (Deprecated: Use --map-by node) Launch processes one per
              node, cycling by node in a round-robin fashion.  This spreads
              processes evenly among nodes and assigns ranks in a
              round-robin, “by node” manner.

       --byslot
              (Deprecated: Use --map-by slot) Map and rank processes
              round-robin by slot.

       --cpus-per-proc <#perproc>
              (Deprecated: Use --map-by <obj>:PE=<#perproc>) Bind each
              process to the specified number of CPUs.

       --cpus-per-rank <#perrank>
              (Deprecated: Use --map-by <obj>:PE=<#perrank>) Alias for
              --cpus-per-proc.

       --display-allocation
              (Deprecated: Use --map-by :DISPLAYALLOC) Display the detected
              resource allocation.

       --display-devel-map
              (Deprecated: Use --map-by :DISPLAYDEVEL) Display a detailed
              process map (mostly intended for developers) just before
              launch.

       --display-map
              (Deprecated: Use --map-by :DISPLAY) Display a table showing
              the mapped location of each process prior to launch.

       --display-topo
              (Deprecated: Use --map-by :DISPLAYTOPO) Display the topology
              as part of the process map (mostly intended for developers)
              just before launch.

       --do-not-launch
              (Deprecated: Use --map-by :DONOTLAUNCH) Perform all necessary
              operations to prepare to launch the application, but do not
              actually launch it (usually used to test mapping patterns).

       --do-not-resolve
              (Deprecated: Use --map-by :DONOTRESOLVE) Do not attempt to
              resolve interfaces - usually used to determine proposed
              process placement/binding prior to obtaining an allocation.

       -N <num>
              (Deprecated: Use --map-by ppr:<num>:node) Launch num
              processes per node on all allocated nodes.

       --nolocal
              (Deprecated: Use --map-by :NOLOCAL) Do not run any copies of
              the launched application on the same node as prun is running.
              This option will override listing the localhost with --host
              or any other host-specifying mechanism.

       --nooversubscribe
              (Deprecated: Use --map-by :NOOVERSUBSCRIBE) Do not
              oversubscribe any nodes; error (without starting any
              processes) if the requested number of processes would cause
              oversubscription.  This option implicitly sets “max_slots”
              equal to the “slots” value for each node.  (Enabled by
              default).

       --npernode <#pernode>
              (Deprecated: Use --map-by ppr:<#pernode>:node) On each node,
              launch this many processes.

       --npersocket <#persocket>
              (Deprecated: Use --map-by ppr:<#perpackage>:package) On each
              node, launch this many processes times the number of
              processor sockets on the node.  The --npersocket option also
              turns on the --bind-to socket option.  The term socket has
              been globally replaced with package.

       --oversubscribe
              (Deprecated: Use --map-by :OVERSUBSCRIBE) Nodes are allowed
              to be oversubscribed, even on a managed system, and
              overloading of processing elements is allowed.

       --pernode
              (Deprecated: Use --map-by ppr:1:node) On each node, launch
              one process.

       --ppr  (Deprecated: Use --map-by ppr:<list>) Comma-separated list of
              number of processes on a given resource type [default: none].

       --rankfile <FILENAME>
              (Deprecated: Use --map-by rankfile:FILE=<FILENAME>) Use a
              rankfile for mapping/ranking/binding

       --report-bindings
              (Deprecated: Use --bind-to :REPORT) Report any bindings for
              launched processes.

       --tag-output
              (Deprecated: Use --map-by :TAGOUTPUT) Tag all output with
              [job,rank]

       --timestamp-output
              (Deprecated: Use --map-by :TIMESTAMPOUTPUT) Timestamp all
              application process output

       --use-hwthread-cpus
              (Deprecated: Use --map-by :HWTCPUS) Use hardware threads as
              independent CPUs.

       --xml  (Deprecated: Use --map-by :XMLOUTPUT) Provide all output in
              XML format
905
906
2021-10-09                                                         prte-map(1)