srun(1)                          Slurm Commands                          srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)...] [ : [OPTIONS(N)...]] executable(0) [args(0)...]
11
12 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
13 For more details about heterogeneous jobs see the document
14 https://slurm.schedmd.com/heterogeneous_jobs.html
15
16
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
will first create a resource allocation in which to run the parallel
job.
21
22 The following document describes the influence of various options on
23 the allocation of cpus to jobs and tasks.
24 https://slurm.schedmd.com/cpu_management.html
25
26
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
29 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
30 signal) of any task that exited with a signal.
31
32
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
35
36 1. If executable starts with ".", then path is constructed as: current
37 working directory / executable
38
39 2. If executable starts with a "/", then path is considered absolute.
40
41 3. If executable can be resolved through PATH. See path_resolution(7).
42
43 4. If executable is in current working directory.
44
The current working directory is the calling process's working
directory unless the --chdir argument is passed, which will override
the current working directory.
48
49
OPTIONS

--accel-bind=<options>
52 Control how tasks are bound to generic resources of type gpu,
53 mic and nic. Multiple options may be specified. Supported
54 options include:
55
56 g Bind each task to GPUs which are closest to the allocated
57 CPUs.
58
59 m Bind each task to MICs which are closest to the allocated
60 CPUs.
61
62 n Bind each task to NICs which are closest to the allocated
63 CPUs.
64
65 v Verbose mode. Log how tasks are bound to GPU and NIC
66 devices.
67
68 This option applies to job allocations.
69
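For example, an illustrative invocation (./my_app, the node, task and
GPU counts are placeholders, and the single-letter flags are assumed
to be concatenated into one string):

    # Bind each task to its closest GPU, and log the chosen bindings.
    srun -N2 -n8 --gres=gpu:4 --accel-bind=gv ./my_app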
70
71 -A, --account=<account>
72 Charge resources used by this job to specified account. The
73 account is an arbitrary string. The account name may be changed
74 after job submission using the scontrol command. This option
75 applies to job allocations.
76
77
78 --acctg-freq
79 Define the job accounting and profiling sampling intervals.
80 This can be used to override the JobAcctGatherFrequency parame‐
81 ter in Slurm's configuration file, slurm.conf. The supported
format is as follows:
83
84 --acctg-freq=<datatype>=<interval>
85 where <datatype>=<interval> specifies the task sam‐
86 pling interval for the jobacct_gather plugin or a
87 sampling interval for a profiling type by the
88 acct_gather_profile plugin. Multiple, comma-sepa‐
89 rated <datatype>=<interval> intervals may be speci‐
90 fied. Supported datatypes are as follows:
91
92 task=<interval>
93 where <interval> is the task sampling inter‐
94 val in seconds for the jobacct_gather plugins
95 and for task profiling by the
96 acct_gather_profile plugin. NOTE: This fre‐
97 quency is used to monitor memory usage. If
98 memory limits are enforced the highest fre‐
99 quency a user can request is what is config‐
100 ured in the slurm.conf file. They can not
101 turn it off (=0) either.
102
103 energy=<interval>
104 where <interval> is the sampling interval in
105 seconds for energy profiling using the
106 acct_gather_energy plugin
107
108 network=<interval>
109 where <interval> is the sampling interval in
110 seconds for infiniband profiling using the
111 acct_gather_infiniband plugin.
112
113 filesystem=<interval>
114 where <interval> is the sampling interval in
115 seconds for filesystem profiling using the
116 acct_gather_filesystem plugin.
117
The default value for the task sampling interval is 30. The
default value for all other intervals is 0. An
121 interval of 0 disables sampling of the specified type. If the
122 task sampling interval is 0, accounting information is collected
123 only at job termination (reducing Slurm interference with the
124 job).
125 Smaller (non-zero) values have a greater impact upon job perfor‐
126 mance, but a value of 30 seconds is not likely to be noticeable
127 for applications having less than 10,000 tasks. This option
applies to job allocations.
129
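For example, an illustrative invocation (./my_app and the intervals
are placeholders):

    # Sample task (memory) usage every 15 seconds and energy usage
    # every 60 seconds for this job.
    srun --acctg-freq=task=15,energy=60 -n4 ./my_app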
130
-B, --extra-node-info=<sockets[:cores[:threads]]>
132 Restrict node selection to nodes with at least the specified
133 number of sockets, cores per socket and/or threads per core.
134 NOTE: These options do not specify the resource allocation size.
135 Each value specified is considered a minimum. An asterisk (*)
136 can be used as a placeholder indicating that all available
137 resources of that type are to be utilized. Values can also be
138 specified as min-max. The individual levels can also be speci‐
139 fied in separate options if desired:
140 --sockets-per-node=<sockets>
141 --cores-per-socket=<cores>
142 --threads-per-core=<threads>
143 If task/affinity plugin is enabled, then specifying an alloca‐
144 tion in this manner also sets a default --cpu-bind option of
145 threads if the -B option specifies a thread count, otherwise an
146 option of cores if a core count is specified, otherwise an
147 option of sockets. If SelectType is configured to
148 select/cons_res, it must have a parameter of CR_Core,
149 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
150 to be honored. If not specified, the scontrol show job will
151 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
152 tions.
153
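For example, an illustrative invocation (./my_app and the socket and
core counts are placeholders):

    # Restrict selection to nodes with at least 2 sockets and at
    # least 8 cores per socket (equivalent to -B 2:8).
    srun -N1 --sockets-per-node=2 --cores-per-socket=8 ./my_app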
154
155 --bb=<spec>
156 Burst buffer specification. The form of the specification is
157 system dependent. Also see --bbf. This option applies to job
158 allocations.
159
160
161 --bbf=<file_name>
162 Path of file containing burst buffer specification. The form of
163 the specification is system dependent. Also see --bb. This
164 option applies to job allocations.
165
166
167 --bcast[=<dest_path>]
168 Copy executable file to allocated compute nodes. If a file name
169 is specified, copy the executable to the specified destination
170 file path. If no path is specified, copy the file to a file
named "slurm_bcast_<job_id>.<step_id>" in the current working directory.
172 For example, "srun --bcast=/tmp/mine -N3 a.out" will copy the
173 file "a.out" from your current directory to the file "/tmp/mine"
174 on each of the three allocated compute nodes and execute that
175 file. This option applies to step allocations.
176
177
178 -b, --begin=<time>
179 Defer initiation of this job until the specified time. It
180 accepts times of the form HH:MM:SS to run a job at a specific
181 time of day (seconds are optional). (If that time is already
182 past, the next day is assumed.) You may also specify midnight,
183 noon, fika (3 PM) or teatime (4 PM) and you can have a
184 time-of-day suffixed with AM or PM for running in the morning or
185 the evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY, MM/DD/YY or YYYY-MM-DD.
187 Combine date and time using the following format
188 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
189 count time-units, where the time-units can be seconds (default),
190 minutes, hours, days, or weeks and you can tell Slurm to run the
191 job today with the keyword today and to run the job tomorrow
192 with the keyword tomorrow. The value may be changed after job
193 submission using the scontrol command. For example:
194 --begin=16:00
195 --begin=now+1hour
196 --begin=now+60 (seconds by default)
197 --begin=2010-01-20T12:34:00
198
199
200 Notes on date/time specifications:
201 - Although the 'seconds' field of the HH:MM:SS time specifica‐
202 tion is allowed by the code, note that the poll time of the
203 Slurm scheduler is not precise enough to guarantee dispatch of
204 the job on the exact second. The job will be eligible to start
205 on the next poll following the specified time. The exact poll
206 interval depends on the Slurm scheduler (e.g., 60 seconds with
207 the default sched/builtin).
208 - If no time (HH:MM:SS) is specified, the default is
209 (00:00:00).
210 - If a date is specified without a year (e.g., MM/DD) then the
211 current year is assumed, unless the combination of MM/DD and
212 HH:MM:SS has already passed for that year, in which case the
213 next year is used.
214 This option applies to job allocations.
215
216
217 --checkpoint=<time>
218 Specifies the interval between creating checkpoints of the job
219 step. By default, the job step will have no checkpoints cre‐
220 ated. Acceptable time formats include "minutes", "minutes:sec‐
221 onds", "hours:minutes:seconds", "days-hours", "days-hours:min‐
222 utes" and "days-hours:minutes:seconds". This option applies to
223 job and step allocations.
224
225
226 --cluster-constraint=<list>
227 Specifies features that a federated cluster must have to have a
228 sibling job submitted to it. Slurm will attempt to submit a sib‐
229 ling job to a cluster if it has at least one of the specified
230 features.
231
232
233 --comment=<string>
234 An arbitrary comment. This option applies to job allocations.
235
236
237 --compress[=type]
238 Compress file before sending it to compute hosts. The optional
239 argument specifies the data compression library to be used.
240 Supported values are "lz4" (default) and "zlib". Some compres‐
241 sion libraries may be unavailable on some systems. For use with
242 the --bcast option. This option applies to step allocations.
243
244
245 -C, --constraint=<list>
246 Nodes can have features assigned to them by the Slurm adminis‐
247 trator. Users can specify which of these features are required
248 by their job using the constraint option. Only nodes having
249 features matching the job constraints will be used to satisfy
250 the request. Multiple constraints may be specified with AND,
251 OR, matching OR, resource counts, etc. (some operators are not
252 supported on all system types). Supported constraint options
253 include:
254
255 Single Name
256 Only nodes which have the specified feature will be used.
257 For example, --constraint="intel"
258
259 Node Count
260 A request can specify the number of nodes needed with
261 some feature by appending an asterisk and count after the
262 feature name. For example "--nodes=16 --con‐
263 straint=graphics*4 ..." indicates that the job requires
264 16 nodes and that at least four of those nodes must have
265 the feature "graphics."
266
AND Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"
270
OR Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR opera‐
tor. For example, --constraint="intel|amd"
274
275 Matching OR
276 If only one of a set of possible options should be used
277 for all allocated nodes, then use the OR operator and
278 enclose the options within square brackets. For example:
279 "--constraint=[rack1|rack2|rack3|rack4]" might be used to
280 specify that all nodes must be allocated on a single rack
281 of the cluster, but any of those four racks can be used.
282
283 Multiple Counts
284 Specific counts of multiple resources may be specified by
285 using the AND operator and enclosing the options within
286 square brackets. For example: "--con‐
287 straint=[rack1*2&rack2*4]" might be used to specify that
288 two nodes must be allocated from nodes with the feature
289 of "rack1" and four nodes must be allocated from nodes
290 with the feature "rack2".
291
292 NOTE: This construct does not support multiple Intel KNL
293 NUMA or MCDRAM modes. For example, while "--con‐
294 straint=[(knl&quad)*2&(knl&hemi)*4]" is not supported,
295 "--constraint=[haswell*2&(knl&hemi)*4]" is supported.
296 Specification of multiple KNL modes requires the use of a
297 heterogeneous job.
298
299
Parentheses
Parentheses can be used to group like node features
302 together. For example "--con‐
303 straint=[(knl&snc4&flat)*4&haswell*1]" might be used to
304 specify that four nodes with the features "knl", "snc4"
305 and "flat" plus one node with the feature "haswell" are
required. All options within parentheses should be
grouped with AND (e.g. "&") operands.
308
309 WARNING: When srun is executed from within salloc or sbatch, the con‐
310 straint value can only contain a single feature name. None of the other
311 operators are currently supported for job steps.
312 This option applies to job and step allocations.
313
314
315 --contiguous
316 If set, then the allocated nodes must form a contiguous set.
317 Not honored with the topology/tree or topology/3d_torus plugins,
318 both of which can modify the node ordering. This option applies
319 to job allocations.
320
321
322 --cores-per-socket=<cores>
323 Restrict node selection to nodes with at least the specified
324 number of cores per socket. See additional information under -B
325 option above when task/affinity plugin is enabled. This option
326 applies to job allocations.
327
328
329 --cpu-bind=[{quiet,verbose},]type
330 Bind tasks to CPUs. Used only when the task/affinity or
331 task/cgroup plugin is enabled. NOTE: To have Slurm always
332 report on the selected CPU binding for all commands executed in
333 a shell, you can enable verbose mode by setting the
334 SLURM_CPU_BIND environment variable value to "verbose".
335
336 The following informational environment variables are set when
337 --cpu-bind is in use:
338 SLURM_CPU_BIND_VERBOSE
339 SLURM_CPU_BIND_TYPE
340 SLURM_CPU_BIND_LIST
341
342 See the ENVIRONMENT VARIABLES section for a more detailed
343 description of the individual SLURM_CPU_BIND variables. These
344 variable are available only if the task/affinity plugin is con‐
345 figured.
346
347 When using --cpus-per-task to run multithreaded tasks, be aware
348 that CPU binding is inherited from the parent of the process.
349 This means that the multithreaded task should either specify or
350 clear the CPU binding itself to avoid having all threads of the
351 multithreaded task use the same mask/CPU as the parent. Alter‐
352 natively, fat masks (masks which specify more than one allowed
353 CPU) could be used for the tasks in order to provide multiple
354 CPUs for the multithreaded tasks.
355
356 By default, a job step has access to every CPU allocated to the
357 job. To ensure that distinct CPUs are allocated to each job
358 step, use the --exclusive option.
359
360 Note that a job step can be allocated different numbers of CPUs
361 on each node or be allocated CPUs not starting at location zero.
362 Therefore one of the options which automatically generate the
363 task binding is recommended. Explicitly specified masks or
364 bindings are only honored when the job step has been allocated
365 every available CPU on the node.
366
367 Binding a task to a NUMA locality domain means to bind the task
368 to the set of CPUs that belong to the NUMA locality domain or
369 "NUMA node". If NUMA locality domain options are used on sys‐
370 tems with no NUMA support, then each socket is considered a
371 locality domain.
372
373 If the --cpu-bind option is not used, the default binding mode
374 will depend upon Slurm's configuration and the step's resource
375 allocation. If all allocated nodes have the same configured
376 CpuBind mode, that will be used. Otherwise if the job's Parti‐
377 tion has a configured CpuBind mode, that will be used. Other‐
378 wise if Slurm has a configured TaskPluginParam value, that mode
379 will be used. Otherwise automatic binding will be performed as
380 described below.
381
382
383 Auto Binding
384 Applies only when task/affinity is enabled. If the job
385 step allocation includes an allocation with a number of
386 sockets, cores, or threads equal to the number of tasks
387 times cpus-per-task, then the tasks will by default be
388 bound to the appropriate resources (auto binding). Dis‐
389 able this mode of operation by explicitly setting
390 "--cpu-bind=none". Use TaskPluginParam=auto‐
391 bind=[threads|cores|sockets] to set a default cpu binding
392 in case "auto binding" doesn't find a match.
393
394 Supported options include:
395
396 q[uiet]
397 Quietly bind before task runs (default)
398
399 v[erbose]
400 Verbosely report binding before task runs
401
402 no[ne] Do not bind tasks to CPUs (default unless auto
403 binding is applied)
404
405 rank Automatically bind by task rank. The lowest num‐
406 bered task on each node is bound to socket (or
407 core or thread) zero, etc. Not supported unless
408 the entire node is allocated to the job.
409
410 map_cpu:<list>
Bind by mapping CPU IDs to tasks (or ranks) as
specified where <list> is
413 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
414 IDs are interpreted as decimal values unless they
are preceded with '0x' in which case they are inter‐
416 preted as hexadecimal values. If the number of
417 tasks (or ranks) exceeds the number of elements in
418 this list, elements in the list will be reused as
419 needed starting from the beginning of the list.
420 To simplify support for large task counts, the
421 lists may follow a map with an asterisk and repe‐
tition count. For example "map_cpu:0x0f*4,0xf0*4".
423 Not supported unless the entire node is allocated
424 to the job.
425
426 mask_cpu:<list>
427 Bind by setting CPU masks on tasks (or ranks) as
428 specified where <list> is
429 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
430 The mapping is specified for a node and identical
431 mapping is applied to the tasks on every node
432 (i.e. the lowest task ID on each node is mapped to
433 the first mask specified in the list, etc.). CPU
434 masks are always interpreted as hexadecimal values
435 but can be preceded with an optional '0x'. Not
436 supported unless the entire node is allocated to
437 the job. To simplify support for large task
438 counts, the lists may follow a map with an aster‐
isk and repetition count. For example
440 "mask_cpu:0x0f*4,0xf0*4". Not supported unless
441 the entire node is allocated to the job.
442
443 rank_ldom
444 Bind to a NUMA locality domain by rank. Not sup‐
445 ported unless the entire node is allocated to the
446 job.
447
448 map_ldom:<list>
449 Bind by mapping NUMA locality domain IDs to tasks
450 as specified where <list> is
451 <ldom1>,<ldom2>,...<ldomN>. The locality domain
452 IDs are interpreted as decimal values unless they
453 are preceded with '0x' in which case they are
454 interpreted as hexadecimal values. Not supported
455 unless the entire node is allocated to the job.
456
457 mask_ldom:<list>
458 Bind by setting NUMA locality domain masks on
459 tasks as specified where <list> is
460 <mask1>,<mask2>,...<maskN>. NUMA locality domain
461 masks are always interpreted as hexadecimal values
462 but can be preceded with an optional '0x'. Not
463 supported unless the entire node is allocated to
464 the job.
465
466 sockets
467 Automatically generate masks binding tasks to
468 sockets. Only the CPUs on the socket which have
469 been allocated to the job will be used. If the
470 number of tasks differs from the number of allo‐
471 cated sockets this can result in sub-optimal bind‐
472 ing.
473
474 cores Automatically generate masks binding tasks to
475 cores. If the number of tasks differs from the
476 number of allocated cores this can result in
477 sub-optimal binding.
478
479 threads
480 Automatically generate masks binding tasks to
481 threads. If the number of tasks differs from the
482 number of allocated threads this can result in
483 sub-optimal binding.
484
485 ldoms Automatically generate masks binding tasks to NUMA
486 locality domains. If the number of tasks differs
487 from the number of allocated locality domains this
488 can result in sub-optimal binding.
489
490 boards Automatically generate masks binding tasks to
491 boards. If the number of tasks differs from the
492 number of allocated boards this can result in
493 sub-optimal binding. This option is supported by
494 the task/cgroup plugin only.
495
496 help Show help message for cpu-bind
497
498 This option applies to job and step allocations.
499
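For example, illustrative invocations (./my_app and the task counts
are placeholders):

    # Bind one task per core and report the generated binding masks
    # before the tasks start.
    srun -n8 --cpu-bind=verbose,cores ./my_app

    # Explicit masks: task 0 on CPUs 0-3, task 1 on CPUs 4-7
    # (requires the entire node to be allocated to the job).
    srun -n2 --cpu-bind=mask_cpu:0x0f,0xf0 ./my_app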
500
--cpu-freq=<p1[-p2[:p3]]>
502
503 Request that the job step initiated by this srun command be run
504 at some requested frequency if possible, on the CPUs selected
505 for the step on the compute node(s).
506
507 p1 can be [#### | low | medium | high | highm1] which will set
508 the frequency scaling_speed to the corresponding value, and set
509 the frequency scaling_governor to UserSpace. See below for defi‐
510 nition of the values.
511
512 p1 can be [Conservative | OnDemand | Performance | PowerSave]
513 which will set the scaling_governor to the corresponding value.
514 The governor has to be in the list set by the slurm.conf option
515 CpuFreqGovernors.
516
517 When p2 is present, p1 will be the minimum scaling frequency and
518 p2 will be the maximum scaling frequency.
519
p2 can be [#### | medium | high | highm1]. p2 must be greater
521 than p1.
522
523 p3 can be [Conservative | OnDemand | Performance | PowerSave |
524 UserSpace] which will set the governor to the corresponding
525 value.
526
527 If p3 is UserSpace, the frequency scaling_speed will be set by a
528 power or energy aware scheduling strategy to a value between p1
529 and p2 that lets the job run within the site's power goal. The
530 job may be delayed if p1 is higher than a frequency that allows
531 the job to run within the goal.
532
533 If the current frequency is < min, it will be set to min. Like‐
534 wise, if the current frequency is > max, it will be set to max.
535
536 Acceptable values at present include:
537
538 #### frequency in kilohertz
539
540 Low the lowest available frequency
541
542 High the highest available frequency
543
544 HighM1 (high minus one) will select the next highest
545 available frequency
546
547 Medium attempts to set a frequency in the middle of the
548 available range
549
550 Conservative attempts to use the Conservative CPU governor
551
552 OnDemand attempts to use the OnDemand CPU governor (the
553 default value)
554
555 Performance attempts to use the Performance CPU governor
556
557 PowerSave attempts to use the PowerSave CPU governor
558
559 UserSpace attempts to use the UserSpace CPU governor
560
561
The following informational environment variable is set in the
job step when the --cpu-freq option is requested.
565 SLURM_CPU_FREQ_REQ
566
567 This environment variable can also be used to supply the value
568 for the CPU frequency request if it is set when the 'srun' com‐
569 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
ment variable value is the same as on the command line. See the ENVIRON‐
572 MENT VARIABLES section for a description of the
573 SLURM_CPU_FREQ_REQ variable.
574
575 NOTE: This parameter is treated as a request, not a requirement.
576 If the job step's node does not support setting the CPU fre‐
577 quency, or the requested value is outside the bounds of the
578 legal frequencies, an error is logged, but the job step is
579 allowed to continue.
580
581 NOTE: Setting the frequency for just the CPUs of the job step
582 implies that the tasks are confined to those CPUs. If task con‐
583 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
584 gin=task/cgroup with the "ConstrainCores" option) is not config‐
585 ured, this parameter is ignored.
586
587 NOTE: When the step completes, the frequency and governor of
588 each selected CPU is reset to the previous values.
589
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is configured as the ProctrackType can cause jobs to run too
quickly, before accounting is able to poll for job information.
As a result, not all of the accounting information will be
present.
594
595 This option applies to job and step allocations.
596
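For example, illustrative invocations (./my_app and the task counts
are placeholders):

    # Run the step with the selected CPUs at their highest available
    # frequency under the UserSpace governor.
    srun --cpu-freq=high -n4 ./my_app

    # Allow scaling between the lowest and highest frequencies using
    # the OnDemand governor.
    srun --cpu-freq=low-high:OnDemand -n4 ./my_app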
597
598 --cpus-per-gpu=<ncpus>
599 Advise Slurm that ensuing job steps will require ncpus proces‐
600 sors per allocated GPU. Requires the --gpus option. Not com‐
601 patible with the --cpus-per-task option.
602
603
604 -c, --cpus-per-task=<ncpus>
605 Request that ncpus be allocated per process. This may be useful
606 if the job is multithreaded and requires more than one CPU per
607 task for optimal performance. The default is one CPU per
608 process. If -c is specified without -n, as many tasks will be
609 allocated per node as possible while satisfying the -c restric‐
610 tion. For instance on a cluster with 8 CPUs per node, a job
611 request for 4 nodes and 3 CPUs per task may be allocated 3 or 6
612 CPUs per node (1 or 2 tasks per node) depending upon resource
613 consumption by other jobs. Such a job may be unable to execute
614 more than a total of 4 tasks. This option may also be useful to
615 spawn tasks without allocating resources to the job step from
616 the job's allocation when running multiple job steps with the
617 --exclusive option.
618
619 WARNING: There are configurations and options interpreted dif‐
620 ferently by job and job step requests which can result in incon‐
621 sistencies for this option. For example srun -c2
622 --threads-per-core=1 prog may allocate two cores for the job,
623 but if each of those cores contains two threads, the job alloca‐
624 tion will include four CPUs. The job step allocation will then
625 launch two threads per CPU for a total of two tasks.
626
627 WARNING: When srun is executed from within salloc or sbatch,
628 there are configurations and options which can result in incon‐
629 sistent allocations when -c has a value greater than -c on sal‐
630 loc or sbatch.
631
632 This option applies to job allocations.
633
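For example, an illustrative invocation (./my_omp_app is a
placeholder for a multithreaded program):

    # Four tasks, each allocated two CPUs for its threads
    # (e.g. with OMP_NUM_THREADS=2).
    srun -n4 -c2 ./my_omp_app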
634
635 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
(start > (deadline - time[-min])). The default is no deadline.
638 Valid time formats are:
639 HH:MM[:SS] [AM|PM]
640 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
641 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
643
644 This option applies only to job allocations.
645
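For example, an illustrative invocation (./my_app is a placeholder;
--time is the job's time limit option, not described in this
excerpt):

    # Remove the job if its 2 hour time limit cannot complete
    # before 18:00 today.
    srun --time=02:00:00 --deadline=18:00 ./my_app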
646
647 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
649 specification if the job has been eligible to run for less than
650 this time period. If the job has waited for less than the spec‐
651 ified period, it will use only nodes which already have the
652 specified features. The argument is in units of minutes. A
653 default value may be set by a system administrator using the
654 delay_boot option of the SchedulerParameters configuration
655 parameter in the slurm.conf file, otherwise the default value is
656 zero (no delay).
657
658 This option applies only to job allocations.
659
660
661 -d, --dependency=<dependency_list>
662 Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job
steps (executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
666 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
667 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
668 must be satisfied if the "," separator is used. Any dependency
669 may be satisfied if the "?" separator is used. Many jobs can
670 share the same dependency and these jobs may even belong to dif‐
671 ferent users. The value may be changed after job submission
672 using the scontrol command. Once a job dependency fails due to
673 the termination state of a preceding job, the dependent job will
674 never be run, even if the preceding job is requeued and has a
675 different termination state in a subsequent execution. This
676 option applies to job allocations.
677
678 after:job_id[:jobid...]
679 This job can begin execution after the specified jobs
680 have begun execution.
681
682 afterany:job_id[:jobid...]
683 This job can begin execution after the specified jobs
684 have terminated.
685
686 afterburstbuffer:job_id[:jobid...]
687 This job can begin execution after the specified jobs
688 have terminated and any associated burst buffer stage out
689 operations have completed.
690
691 aftercorr:job_id[:jobid...]
692 A task of this job array can begin execution after the
693 corresponding task ID in the specified job has completed
694 successfully (ran to completion with an exit code of
695 zero).
696
697 afternotok:job_id[:jobid...]
698 This job can begin execution after the specified jobs
699 have terminated in some failed state (non-zero exit code,
700 node failure, timed out, etc).
701
702 afterok:job_id[:jobid...]
703 This job can begin execution after the specified jobs
704 have successfully executed (ran to completion with an
705 exit code of zero).
706
707 expand:job_id
708 Resources allocated to this job should be used to expand
709 the specified job. The job to expand must share the same
710 QOS (Quality of Service) and partition. Gang scheduling
711 of resources in the partition is also not supported.
712
713 singleton
714 This job can begin execution after any previously
715 launched jobs sharing the same job name and user have
716 terminated. In other words, only one job by that name
717 and owned by that user can be running or suspended at any
718 point in time.
719
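For example, illustrative invocations (the job ids and script names
are placeholders):

    # Start only after job 12345 has completed successfully.
    srun --dependency=afterok:12345 ./postprocess.sh

    # Start after either of the listed jobs has terminated.
    srun --dependency="afterany:12345?afterany:67890" ./cleanup.sh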
720
721 -D, --chdir=<path>
722 Have the remote processes do a chdir to path before beginning
723 execution. The default is to chdir to the current working direc‐
724 tory of the srun process. The path can be specified as full path
725 or relative path to the directory where the command is executed.
726 This option applies to job allocations.
727
728
729 -e, --error=<filename pattern>
730 Specify how stderr is to be redirected. By default in interac‐
731 tive mode, srun redirects stderr to the same file as stdout, if
732 one is specified. The --error option is provided to allow stdout
733 and stderr to be redirected to different locations. See IO Re‐
734 direction below for more options. If the specified file already
735 exists, it will be overwritten. This option applies to job and
736 step allocations.
737
738
739 -E, --preserve-env
740 Pass the current values of environment variables SLURM_JOB_NODES
741 and SLURM_NTASKS through to the executable, rather than comput‐
742 ing them from commandline parameters. This option applies to job
743 allocations.
744
745
746 --epilog=<executable>
747 srun will run executable just after the job step completes. The
748 command line arguments for executable will be the command and
749 arguments of the job step. If executable is "none", then no
750 srun epilog will be run. This parameter overrides the SrunEpilog
751 parameter in slurm.conf. This parameter is completely indepen‐
752 dent from the Epilog parameter in slurm.conf. This option
753 applies to job allocations.
754
755
756
757 --exclusive[=user|mcs]
758 This option applies to job and job step allocations, and has two
759 slightly different meanings for each one. When used to initiate
760 a job, the job allocation cannot share nodes with other running
761 jobs (or just other users with the "=user" option or "=mcs"
762 option). The default shared/exclusive behavior depends on sys‐
763 tem configuration and the partition's OverSubscribe option takes
764 precedence over the job's option.
765
766 This option can also be used when initiating more than one job
767 step within an existing resource allocation, where you want sep‐
768 arate processors to be dedicated to each job step. If sufficient
769 processors are not available to initiate the job step, it will
770 be deferred. This can be thought of as providing a mechanism for
resource management to the job within its allocation.
772
773 The exclusive allocation of CPUs only applies to job steps
774 explicitly invoked with the --exclusive option. For example, a
775 job might be allocated one node with four CPUs and a remote
776 shell invoked on the allocated node. If that shell is not
777 invoked with the --exclusive option, then it may create a job
778 step with four tasks using the --exclusive option and not con‐
flict with the remote shell's resource allocation. Invoke
every job step with the --exclusive option to ensure distinct
resources for each step.
782
783 Note that all CPUs allocated to a job are available to each job
784 step unless the --exclusive option is used plus task affinity is
785 configured. Since resource management is provided by processor,
786 the --ntasks option must be specified, but the following options
787 should NOT be specified --relative, --distribution=arbitrary.
788 See EXAMPLE below.
789
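An illustrative sketch of dedicating CPUs to concurrent job steps
inside an existing allocation (the step names and task counts are
placeholders):

    # Inside an allocation with four CPUs: each step requests
    # dedicated processors, so a step is deferred until enough
    # CPUs are free if the steps cannot run side by side.
    srun --exclusive -n2 ./step_a &
    srun --exclusive -n2 ./step_b &
    wait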
790
791 --export=<environment variables [ALL] | NONE>
792 Identify which environment variables are propagated to the
793 launched application. By default, all are propagated. Multiple
794 environment variable names should be comma separated. Environ‐
795 ment variable names may be specified to propagate the current
796 value (e.g. "--export=EDITOR") or specific values may be
797 exported (e.g. "--export=EDITOR=/bin/emacs"). In these two exam‐
798 ples, the propagated environment will only contain the variable
799 EDITOR. If one desires to add to the environment instead of
800 replacing it, have the argument include ALL (e.g.
801 "--export=ALL,EDITOR=/bin/emacs"). This will propagate EDITOR
802 along with the current environment. Unlike sbatch, if ALL is
803 specified, any additional specified environment variables are
804 ignored. If one desires no environment variables be propagated,
805 use the argument NONE. Regardless of this setting, the appro‐
806 priate SLURM_* task environment variables are always exported to
807 the environment. srun may deviate from the above behavior if
808 the default launch plugin, launch/slurm, is not used.
809
810
811 -F, --nodefile=<node file>
812 Much like --nodelist, but the list is contained in a file of
813 name node file. The node names of the list may also span multi‐
814 ple lines in the file. Duplicate node names in the file will
815 be ignored. The order of the node names in the list is not
816 important; the node names will be sorted by Slurm.
817
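For example, an illustrative sequence (node01 and node02 are
placeholder node names):

    # Create a node file with one node name per line, then request
    # those nodes for the job.
    printf 'node01\nnode02\n' > nodes.txt
    srun --nodefile=nodes.txt -n2 hostname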
818
819 --gid=<group>
820 If srun is run as root, and the --gid option is used, submit the
821 job with group's group access permissions. group may be the
822 group name or the numerical group ID. This option applies to job
823 allocations.
824
825
826 -G, --gpus=[<type>:]<number>
827 Specify the total number of GPUs required for the job. An
828 optional GPU type specification can be supplied. For example
829 "--gpus=volta:3". Multiple options can be requested in a comma
830 separated list, for example: "--gpus=volta:3,kepler:1". See
831 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
832 options.
833
834
835 --gpu-bind=<type>
836 Bind tasks to specific GPUs. By default every spawned task can
837 access every GPU allocated to the job.
838
839 Supported type options:
840
841 closest Bind each task to the GPU(s) which are closest. In a
842 NUMA environment, each task may be bound to more than
843 one GPU (i.e. all GPUs in that NUMA environment).
844
845 map_gpu:<list>
Bind by mapping GPU IDs to tasks (or ranks) as spec‐
ified where <list> is
848 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
849 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
851 hexadecimal values. If the number of tasks (or ranks)
852 exceeds the number of elements in this list, elements
853 in the list will be reused as needed starting from the
854 beginning of the list. To simplify support for large
855 task counts, the lists may follow a map with an aster‐
856 isk and repetition count. For example
857 "map_gpu:0*4,1*4". Not supported unless the entire
858 node is allocated to the job.
859
860 mask_gpu:<list>
861 Bind by setting GPU masks on tasks (or ranks) as spec‐
862 ified where <list> is
863 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
864 mapping is specified for a node and identical mapping
865 is applied to the tasks on every node (i.e. the lowest
866 task ID on each node is mapped to the first mask spec‐
867 ified in the list, etc.). GPU masks are always inter‐
868 preted as hexadecimal values but can be preceded with
869 an optional '0x'. Not supported unless the entire node
870 is allocated to the job. To simplify support for large
871 task counts, the lists may follow a map with an aster‐
872 isk and repetition count. For example
873 "mask_gpu:0x0f*4,0xf0*4". Not supported unless the
874 entire node is allocated to the job.
875
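For example, illustrative invocations (./my_gpu_app and the GPU
counts are placeholders):

    # Two tasks on one node: task 0 bound to GPU 0, task 1 to GPU 1
    # (requires the entire node to be allocated to the job).
    srun -n2 --gpus=2 --gpu-bind=map_gpu:0,1 ./my_gpu_app

    # Let each task use the GPU(s) closest to its allocated CPUs.
    srun -n4 --gpus-per-node=4 --gpu-bind=closest ./my_gpu_app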
876
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
878 Request that GPUs allocated to the job are configured with spe‐
879 cific frequency values. This option can be used to indepen‐
880 dently configure the GPU and its memory frequencies. After the
881 job is completed, the frequencies of all affected GPUs will be
882 reset to the highest possible values. In some cases, system
883 power caps may override the requested values. The field type
884 can be "memory". If type is not specified, the GPU frequency is
885 implied. The value field can either be "low", "medium", "high",
886 "highm1" or a numeric value in megahertz (MHz). If the speci‐
887 fied numeric value is not possible, a value as close as possible
888 will be used. See below for definition of the values. The ver‐
889 bose option causes current GPU frequency information to be
890 logged. Examples of use include "--gpu-freq=medium,memory=high"
891 and "--gpu-freq=450".
892
893 Supported value definitions:
894
895 low the lowest available frequency.
896
897 medium attempts to set a frequency in the middle of the
898 available range.
899
900 high the highest available frequency.
901
902 highm1 (high minus one) will select the next highest avail‐
903 able frequency.
904
905
906 --gpus-per-node=[<type>:]<number>
907 Specify the number of GPUs required for the job on each node
908 included in the job's resource allocation. An optional GPU type
909 specification can be supplied. For example
910 "--gpus-per-node=volta:3". Multiple options can be requested in
911 a comma separated list, for example:
912 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
913 --gpus-per-socket and --gpus-per-task options.
914
915
916 --gpus-per-socket=[<type>:]<number>
917 Specify the number of GPUs required for the job on each socket
918 included in the job's resource allocation. An optional GPU type
919 specification can be supplied. For example
920 "--gpus-per-socket=volta:3". Multiple options can be requested
921 in a comma separated list, for example:
922 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
923 sockets per node count ( --sockets-per-node). See also the
924 --gpus, --gpus-per-node and --gpus-per-task options. This
925 option applies to job allocations.
926
927
928 --gpus-per-task=[<type>:]<number>
929 Specify the number of GPUs required for the job on each task to
930 be spawned in the job's resource allocation. An optional GPU
931 type specification can be supplied. This option requires the
932 specification of a task count. For example
933 "--gpus-per-task=volta:1". Multiple options can be requested in
934 a comma separated list, for example:
"--gpus-per-task=volta:3,kepler:1". Requires job to specify a
task count (--ntasks). See also the --gpus, --gpus-per-socket
937 and --gpus-per-node options.
938
939
940 --gres=<list>
941 Specifies a comma delimited list of generic consumable
942 resources. The format of each entry on the list is
943 "name[[:type]:count]". The name is that of the consumable
944 resource. The count is the number of those resources with a
945 default value of 1. The count can have a suffix of "k" or "K"
946 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
947 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
948 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
949 x 1024 x 1024 x 1024). The specified resources will be allo‐
950 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
952 of available generic consumable resources will be printed and
953 the command will exit if the option argument is "help". Exam‐
954 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
955 and "--gres=help". NOTE: This option applies to job and step
956 allocations. By default, a job step is allocated all of the
generic resources that have been allocated to the job. To change the
958 behavior so that each job step is allocated no generic
959 resources, explicitly set the value of --gres to specify zero
960 counts for each generic resource OR set "--gres=none" OR set the
961 SLURM_STEP_GRES environment variable to "none".
962
963
964 --gres-flags=<type>
965 Specify generic resource task binding options. This option
966 applies to job allocations.
967
968 disable-binding
969 Disable filtering of CPUs with respect to generic
970 resource locality. This option is currently required to
971 use more CPUs than are bound to a GRES (i.e. if a GPU is
972 bound to the CPUs on one socket, but resources on more
973 than one socket are required to run the job). This
974 option may permit a job to be allocated resources sooner
975 than otherwise possible, but may result in lower job per‐
976 formance.
977
978 enforce-binding
979 The only CPUs available to the job will be those bound to
980 the selected GRES (i.e. the CPUs identified in the
981 gres.conf file will be strictly enforced). This option
982 may result in delayed initiation of a job. For example a
983 job requiring two GPUs and one CPU will be delayed until
984 both GPUs on a single socket are available rather than
985 using GPUs bound to separate sockets, however the appli‐
986 cation performance may be improved due to improved commu‐
987 nication speed. Requires the node to be configured with
988 more than one socket and resource filtering will be per‐
989 formed on a per-socket basis.
990
991
992 -H, --hold
993 Specify the job is to be submitted in a held state (priority of
994 zero). A held job can now be released using scontrol to reset
995 its priority (e.g. "scontrol release <job_id>"). This option
996 applies to job allocations.
997
998
999 -h, --help
1000 Display help information and exit.
1001
1002
1003 --hint=<type>
1004 Bind tasks according to application hints.
1005
1006 compute_bound
1007 Select settings for compute bound applications: use all
1008 cores in each socket, one thread per core.
1009
1010 memory_bound
1011 Select settings for memory bound applications: use only
1012 one core in each socket, one thread per core.
1013
1014 [no]multithread
1015 [don't] use extra threads with in-core multi-threading
1016 which can benefit communication intensive applications.
1017 Only supported with the task/affinity plugin.
1018
1019 help show this help message
1020
1021 This option applies to job allocations.
1022
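For example, an illustrative invocation (./my_app and the task count
are placeholders):

    # Use all cores in each socket, one thread per core.
    srun -n16 --hint=compute_bound ./my_app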
1023
1024 -I, --immediate[=<seconds>]
1025 exit if resources are not available within the time period spec‐
1026 ified. If no argument is given (seconds defaults to 1),
1027 resources must be available immediately for the request to suc‐
1028 ceed. If defer is configured in SchedulerParameters and sec‐
1029 onds=1 the allocation request will fail immediately; defer con‐
1030 flicts and takes precedence over this option. By default,
1031 --immediate is off, and the command will block until resources
1032 become available. Since this option's argument is optional, for
1033 proper parsing the single letter option must be followed immedi‐
1034 ately with the value and not include a space between them. For
1035 example "-I60" and not "-I 60". This option applies to job and
1036 step allocations.
1037
1038
1039 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
stdin from the terminal to all tasks. See IO Redirection below for
1042 more options. For OS X, the poll() function does not support
1043 stdin, so input from a terminal is not possible. This option
1044 applies to job and step allocations.
1045
1046
1047 -J, --job-name=<jobname>
1048 Specify a name for the job. The specified name will appear along
1049 with the job id number when querying running jobs on the system.
1050 The default is the supplied executable program's name. NOTE:
1051 This information may be written to the slurm_jobacct.log file.
This file is space delimited so if a space is used in the job‐
name it will cause problems in properly displaying the con‐
1054 tents of the slurm_jobacct.log file when the sacct command is
1055 used. This option applies to job and step allocations.
1056
1057
1058 --jobid=<jobid>
Initiate a job step under an already allocated job with the
specified job id. Using this option will cause srun to behave
exactly as if
1061 the SLURM_JOB_ID environment variable was set. This option
1062 applies to step allocations.
1063
1064
1065 -K, --kill-on-bad-exit[=0|1]
1066 Controls whether or not to terminate a step if any task exits
1067 with a non-zero exit code. If this option is not specified, the
1068 default action will be based upon the Slurm configuration param‐
1069 eter of KillOnBadExit. If this option is specified, it will take
1070 precedence over KillOnBadExit. An option argument of zero will
1071 not terminate the job. A non-zero argument or no argument will
1072 terminate the job. Note: This option takes precedence over the
1073 -W, --wait option to terminate the job immediately if a task
1074 exits with a non-zero exit code. Since this option's argument
1075 is optional, for proper parsing the single letter option must be
1076 followed immediately with the value and not include a space
1077 between them. For example "-K1" and not "-K 1".
1078
1079
1080 -k, --no-kill [=off]
1081 Do not automatically terminate a job if one of the nodes it has
1082 been allocated fails. This option applies to job and step allo‐
1083 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1085 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1086 --wait options will have no effect upon the job step). The
1087 active job step (MPI job) will likely suffer a fatal error, but
1088 subsequent job steps may be run if this option is specified.
1089
Specify an optional argument of "off" to disable the effect of the
1091 SLURM_NO_KILL environment variable.
1092
1093 The default action is to terminate the job upon node failure.
1094
1095
1096 -l, --label
1097 Prepend task number to lines of stdout/err. The --label option
1098 will prepend lines of output with the remote task id. This
1099 option applies to step allocations.
1100
1101
1102 -L, --licenses=<license>
1103 Specification of licenses (or other resources available on all
1104 nodes of the cluster) which must be allocated to this job.
1105 License names can be followed by a colon and count (the default
1106 count is one). Multiple license names should be comma separated
1107 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1108 cations.
1109
1110
1111 -M, --clusters=<string>
1112 Clusters to issue commands to. Multiple cluster names may be
1113 comma separated. The job will be submitted to the one cluster
1114 providing the earliest expected job initiation time. The default
1115 value is the current cluster. A value of 'all' will query to run
1116 on all clusters. Note the --export option to control environ‐
1117 ment variables exported between clusters. This option applies
1118 only to job allocations. Note that the SlurmDBD must be up for
1119 this option to work properly.
1120
1121
-m, --distribution=*|block|cyclic|arbitrary|plane=<options>
[:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]][,Pack|NoPack]
1126
1127 Specify alternate distribution methods for remote processes.
1128 This option controls the distribution of tasks to the nodes on
1129 which resources have been allocated, and the distribution of
1130 those resources to tasks for binding (task affinity). The first
1131 distribution method (before the first ":") controls the distri‐
1132 bution of tasks to nodes. The second distribution method (after
1133 the first ":") controls the distribution of allocated CPUs
1134 across sockets for binding to tasks. The third distribution
1135 method (after the second ":") controls the distribution of allo‐
1136 cated CPUs across cores for binding to tasks. The second and
1137 third distributions apply only if task affinity is enabled. The
1138 third distribution is supported only if the task/cgroup plugin
1139 is configured. The default value for each distribution type is
1140 specified by *.
1141
1142 Note that with select/cons_res, the number of CPUs allocated on
1143 each socket and node may be different. Refer to
1144 https://slurm.schedmd.com/mc_support.html for more information
1145 on resource allocation, distribution of tasks to nodes, and
1146 binding of tasks to CPUs.
1147 First distribution method (distribution of tasks across nodes):
1148
1149
1150 * Use the default method for distributing tasks to nodes
1151 (block).
1152
1153 block The block distribution method will distribute tasks to a
1154 node such that consecutive tasks share a node. For exam‐
1155 ple, consider an allocation of three nodes each with two
1156 cpus. A four-task block distribution request will dis‐
1157 tribute those tasks to the nodes with tasks one and two
1158 on the first node, task three on the second node, and
1159 task four on the third node. Block distribution is the
1160 default behavior if the number of tasks exceeds the num‐
1161 ber of allocated nodes.
1162
1163 cyclic The cyclic distribution method will distribute tasks to a
1164 node such that consecutive tasks are distributed over
1165 consecutive nodes (in a round-robin fashion). For exam‐
1166 ple, consider an allocation of three nodes each with two
1167 cpus. A four-task cyclic distribution request will dis‐
1168 tribute those tasks to the nodes with tasks one and four
1169 on the first node, task two on the second node, and task
1170 three on the third node. Note that when SelectType is
1171 select/cons_res, the same number of CPUs may not be allo‐
1172 cated on each node. Task distribution will be round-robin
1173 among all the nodes with CPUs yet to be assigned to
1174 tasks. Cyclic distribution is the default behavior if
1175 the number of tasks is no larger than the number of allo‐
1176 cated nodes.
1177
1178 plane The tasks are distributed in blocks of a specified size.
1179 The options include a number representing the size of the
1180 task block. This is followed by an optional specifica‐
1181 tion of the task distribution scheme within a block of
1182 tasks and between the blocks of tasks. The number of
1183 tasks distributed to each node is the same as for cyclic
1184 distribution, but the taskids assigned to each node
1185 depend on the plane size. For more details (including
1186 examples and diagrams), please see
1187 https://slurm.schedmd.com/mc_support.html
1188 and
1189 https://slurm.schedmd.com/dist_plane.html
1190
1191 arbitrary
The arbitrary method of distribution will allocate pro‐
cesses in order as listed in the file designated by the envi‐
ronment variable SLURM_HOSTFILE. If this variable is
set it will override any other method specified. If it is
not set the method will default to block. The
hostfile must contain at minimum the number of hosts
requested, one per line or comma separated. If
1199 specifying a task count (-n, --ntasks=<number>), your
1200 tasks will be laid out on the nodes in the order of the
1201 file.
1202 NOTE: The arbitrary distribution option on a job alloca‐
1203 tion only controls the nodes to be allocated to the job
1204 and not the allocation of CPUs on those nodes. This
1205 option is meant primarily to control a job step's task
1206 layout in an existing job allocation for the srun com‐
1207 mand.
1208 NOTE: If number of tasks is given and a list of requested
1209 nodes is also given the number of nodes used from that
1210 list will be reduced to match that of the number of tasks
1211 if the number of nodes in the list is greater than the
1212 number of tasks.
1213
1214
1215 Second distribution method (distribution of CPUs across sockets
1216 for binding):
1217
1218
1219 * Use the default method for distributing CPUs across sock‐
1220 ets (cyclic).
1221
1222 block The block distribution method will distribute allocated
1223 CPUs consecutively from the same socket for binding to
1224 tasks, before using the next consecutive socket.
1225
1226 cyclic The cyclic distribution method will distribute allocated
1227 CPUs for binding to a given task consecutively from the
1228 same socket, and from the next consecutive socket for the
1229 next task, in a round-robin fashion across sockets.
1230
1231 fcyclic
1232 The fcyclic distribution method will distribute allocated
1233 CPUs for binding to tasks from consecutive sockets in a
1234 round-robin fashion across the sockets.
1235
1236
1237 Third distribution method (distribution of CPUs across cores for
1238 binding):
1239
1240
1241 * Use the default method for distributing CPUs across cores
1242 (inherited from second distribution method).
1243
1244 block The block distribution method will distribute allocated
1245 CPUs consecutively from the same core for binding to
1246 tasks, before using the next consecutive core.
1247
1248 cyclic The cyclic distribution method will distribute allocated
1249 CPUs for binding to a given task consecutively from the
1250 same core, and from the next consecutive core for the
1251 next task, in a round-robin fashion across cores.
1252
1253 fcyclic
1254 The fcyclic distribution method will distribute allocated
1255 CPUs for binding to tasks from consecutive cores in a
1256 round-robin fashion across the cores.
1257
1258
1259
1260 Optional control for task distribution over nodes:
1261
1262
Pack Rather than distributing a job step's tasks evenly
across its allocated nodes, pack them as tightly as pos‐
1265 sible on the nodes.
1266
1267 NoPack Rather than packing a job step's tasks as tightly as pos‐
1268 sible on the nodes, distribute them evenly. This user
1269 option will supersede the SelectTypeParameters
1270 CR_Pack_Nodes configuration parameter.
1271
1272 This option applies to job and step allocations.
1273
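For example, illustrative invocations (./my_app and the node and
task counts are placeholders):

    # Distribute tasks over the nodes round-robin and distribute
    # each task's CPUs over the sockets in blocks.
    srun -N3 -n6 -m cyclic:block ./my_app

    # Inside an existing multi-node allocation, pack this step's
    # tasks onto as few of the allocated nodes as possible.
    srun -n4 -m block,Pack ./my_app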
1274
1275 --mail-type=<type>
1276 Notify user by email when certain event types occur. Valid type
1277 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1278 BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buf‐
1279 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1280 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1281 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1282 time limit). Multiple type values may be specified in a comma
1283 separated list. The user to be notified is indicated with
1284 --mail-user. This option applies to job allocations.
1285
1286
1287 --mail-user=<user>
1288 User to receive email notification of state changes as defined
1289 by --mail-type. The default value is the submitting user. This
1290 option applies to job allocations.
1291
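For example, an illustrative invocation (user@example.com and
./my_app are placeholders):

    # Send mail to the given address when the job ends or fails.
    srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app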
1292
1293 --mcs-label=<mcs>
1294 Used only when the mcs/group plugin is enabled. This parameter
1295 is a group among the groups of the user. Default value is cal‐
1296 culated by the Plugin mcs if it's enabled. This option applies
1297 to job allocations.
1298
1299
1300 --mem=<size[units]>
1301 Specify the real memory required per node. Default units are
1302 megabytes unless the SchedulerParameters configuration parameter
1303 includes the "default_gbytes" option for gigabytes. Different
1304 units can be specified using the suffix [K|M|G|T]. Default
1305 value is DefMemPerNode and the maximum value is MaxMemPerNode.
If configured, both parameters can be seen using the scontrol
1307 show config command. This parameter would generally be used if
1308 whole nodes are allocated to jobs (SelectType=select/linear).
1309 Specifying a memory limit of zero for a job step will restrict
1310 the job step to the amount of memory allocated to the job, but
1311 not remove any of the job's memory allocation from being avail‐
1312 able to other job steps. Also see --mem-per-cpu and
1313 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1314 options are mutually exclusive. If --mem, --mem-per-cpu or
1315 --mem-per-gpu are specified as command line arguments, then they
1316 will take precedence over the environment (potentially inherited
1317 from salloc or sbatch).
1318
1319 NOTE: A memory size specification of zero is treated as a spe‐
1320 cial case and grants the job access to all of the memory on each
node for newly submitted jobs and all available job memory to
new job steps.
1323
1324 Specifying new memory limits for job steps is only advisory.
1325
1326 If the job is allocated multiple nodes in a heterogeneous clus‐
1327 ter, the memory limit on each node will be that of the node in
1328 the allocation with the smallest memory size (same limit will
1329 apply to every node in the job's allocation).
1330
1331 NOTE: Enforcement of memory limits currently relies upon the
1332 task/cgroup plugin or enabling of accounting, which samples mem‐
1333 ory use on a periodic basis (data need not be stored, just col‐
1334 lected). In both cases memory use is based upon the job's Resi‐
1335 dent Set Size (RSS). A task may exceed the memory limit until
1336 the next periodic accounting sample.
1337
1338 This option applies to job and step allocations.
1339
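A brief sketch, with the executable names and sizes as placeholders:
request 16 gigabytes per node for the job, and let a later step use all
of the job's memory by specifying zero:

       srun -N 2 --mem=16G ./a.out       # 16 GB on each allocated node
       srun --mem=0 ./step_prog          # job step inside an existing allocation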
1340
1341 --mem-per-cpu=<size[units]>
1342 Minimum memory required per allocated CPU. Default units are
1343 megabytes unless the SchedulerParameters configuration parameter
1344 includes the "default_gbytes" option for gigabytes. Different
1345 units can be specified using the suffix [K|M|G|T]. Default
1346 value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
1347 exception below). If configured, both parameters can be seen
1348 using the scontrol show config command. Note that if the job's
1349 --mem-per-cpu value exceeds the configured MaxMemPerCPU, then
1350 the user's limit will be treated as a memory limit per task;
1351 --mem-per-cpu will be reduced to a value no larger than MaxMem‐
1352 PerCPU; --cpus-per-task will be set and the value of
1353 --cpus-per-task multiplied by the new --mem-per-cpu value will
1354 equal the original --mem-per-cpu value specified by the user.
1355 This parameter would generally be used if individual processors
1356 are allocated to jobs (SelectType=select/cons_res). If
1357 resources are allocated by the core, socket or whole nodes; the
1358 number of CPUs allocated to a job may be higher than the task
1359 count and the value of --mem-per-cpu should be adjusted accord‐
1360 ingly. Specifying a memory limit of zero for a job step will
1361 restrict the job step to the amount of memory allocated to the
1362 job, but not remove any of the job's memory allocation from
1363 being available to other job steps. Also see --mem and
1364 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1365 options are mutually exclusive.
1366
1367 NOTE: If the final amount of memory requested by a job (e.g. when
1368 --mem-per-cpu is used with the --exclusive option) cannot be satisfied
1369 by any of the nodes configured in the partition, the job will be
1370 rejected.
1371
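For example (all names and sizes are placeholders), 8 tasks with 2 CPUs
each and 2 gigabytes per CPU request a total of 32 gigabytes:

       srun -n 8 -c 2 --mem-per-cpu=2G ./a.out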
1372
1373 --mem-per-gpu=<size[units]>
1374 Minimum memory required per allocated GPU. Default units are
1375 megabytes unless the SchedulerParameters configuration parameter
1376 includes the "default_gbytes" option for gigabytes. Different
1377 units can be specified using the suffix [K|M|G|T]. Default
1378 value is DefMemPerGPU and is available on both a global and per
1379 partition basis. If configured, the parameters can be seen
1380 using the scontrol show config and scontrol show partition com‐
1381 mands. Also see --mem. The --mem, --mem-per-cpu and
1382 --mem-per-gpu options are mutually exclusive.
1383
1384
1385 --mem-bind=[{quiet,verbose},]type
1386 Bind tasks to memory. Used only when the task/affinity plugin is
1387 enabled and the NUMA memory functions are available. Note that
1388 the resolution of CPU and memory binding may differ on some
1389 architectures. For example, CPU binding may be performed at the
1390 level of the cores within a processor while memory binding will
1391 be performed at the level of nodes, where the definition of
1392 "nodes" may differ from system to system. By default no memory
1393 binding is performed; any task using any CPU can use any memory.
1394 This option is typically used to ensure that each task is bound
1395 to the memory closest to its assigned CPU. The use of any type
1396 other than "none" or "local" is not recommended. If you want
1397 greater control, try running a simple test code with the options
1398 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1399 the specific configuration.
1400
1401 NOTE: To have Slurm always report on the selected memory binding
1402 for all commands executed in a shell, you can enable verbose
1403 mode by setting the SLURM_MEM_BIND environment variable value to
1404 "verbose".
1405
1406 The following informational environment variables are set when
1407 --mem-bind is in use:
1408
1409 SLURM_MEM_BIND_LIST
1410 SLURM_MEM_BIND_PREFER
1411 SLURM_MEM_BIND_SORT
1412 SLURM_MEM_BIND_TYPE
1413 SLURM_MEM_BIND_VERBOSE
1414
1415 See the ENVIRONMENT VARIABLES section for a more detailed
1416 description of the individual SLURM_MEM_BIND* variables.
1417
1418 Supported options include:
1419
1420 help show this help message
1421
1422 local Use memory local to the processor in use
1423
1424 map_mem:<list>
1425 Bind by setting memory masks on tasks (or ranks) as spec‐
1426 ified where <list> is
1427 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1428 ping is specified for a node and identical mapping is
1429 applied to the tasks on every node (i.e. the lowest task
1430 ID on each node is mapped to the first ID specified in
1431 the list, etc.). NUMA IDs are interpreted as decimal
1432 values unless they are preceded with '0x' in which case
1433 they are interpreted as hexadecimal values. If the number of
1434 tasks (or ranks) exceeds the number of elements in this
1435 list, elements in the list will be reused as needed
1436 starting from the beginning of the list. To simplify
1437 support for large task counts, the lists may follow a map
1438 with an asterisk and repetition count. For example
1439 "map_mem:0x0f*4,0xf0*4". Not supported unless the entire
1440 node is allocated to the job.
1441
1442 mask_mem:<list>
1443 Bind by setting memory masks on tasks (or ranks) as spec‐
1444 ified where <list> is
1445 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1446 mapping is specified for a node and identical mapping is
1447 applied to the tasks on every node (i.e. the lowest task
1448 ID on each node is mapped to the first mask specified in
1449 the list, etc.). NUMA masks are always interpreted as
1450 hexadecimal values. Note that masks must be preceded
1451 with a '0x' if they don't begin with [0-9] so they are
1452 seen as numerical values. If the number of tasks (or
1453 ranks) exceeds the number of elements in this list, ele‐
1454 ments in the list will be reused as needed starting from
1455 the beginning of the list. To simplify support for large
1456 task counts, the lists may follow a mask with an asterisk
1457 and repetition count. For example "mask_mem:0*4,1*4". Not
1458 supported unless the entire node is allocated to the job.
1459
1460 no[ne] don't bind tasks to memory (default)
1461
1462 nosort avoid sorting free cache pages (default, LaunchParameters
1463 configuration parameter can override this default)
1464
1465 p[refer]
1466 Prefer use of first specified NUMA node, but permit
1467 use of other available NUMA nodes.
1468
1469 q[uiet]
1470 quietly bind before task runs (default)
1471
1472 rank bind by task rank (not recommended)
1473
1474 sort sort free cache pages (run zonesort on Intel KNL nodes)
1475
1476 v[erbose]
1477 verbosely report binding before task runs
1478
1479 This option applies to job and step allocations.
1480
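A sketch of the test suggested above (the executable is a placeholder):
first report the default placement verbosely, then bind each task's
memory to its local NUMA node:

       srun -n 4 --cpu-bind=verbose,none --mem-bind=verbose,none ./a.out
       srun -n 4 --mem-bind=verbose,local ./a.out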
1481
1482 --mincpus=<n>
1483 Specify a minimum number of logical cpus/processors per node.
1484 This option applies to job allocations.
1485
1486
1487 --msg-timeout=<seconds>
1488 Modify the job launch message timeout. The default value is
1489 MessageTimeout in the Slurm configuration file slurm.conf.
1490 Changes to this are typically not recommended, but could be use‐
1491 ful to diagnose problems. This option applies to job alloca‐
1492 tions.
1493
1494
1495 --mpi=<mpi_type>
1496 Identify the type of MPI to be used. May result in unique initi‐
1497 ation procedures.
1498
1499 list Lists available mpi types to choose from.
1500
1501 openmpi
1502 For use with OpenMPI.
1503
1504 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1505 only if the MPI implementation supports it, in other
1506 words if the MPI has the PMI2 interface implemented. The
1507 --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1508 which provides the server side functionality but the
1509 client side must implement PMI2_Init() and the other
1510 interface calls.
1511
1512 pmix To enable PMIx support (http://pmix.github.io/master).
1513 The PMIx support in Slurm can be used to launch parallel
1514 applications (e.g. MPI) if it supports PMIx, PMI2 or
1515 PMI1. Slurm must be configured with pmix support by pass‐
1516 ing "--with-pmix=<PMIx installation path>" option to its
1517 "./configure" script.
1518
1519 At the time of writing PMIx is supported in Open MPI
1520 starting from version 2.0. PMIx also supports backward
1521 compatibility with PMI1 and PMI2 and can be used if MPI
1522 was configured with PMI2/PMI1 support pointing to the
1523 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1524 doesn't provide the way to point to a specific implemen‐
1525 tation, a hack'ish solution leveraging LD_PRELOAD can be
1526 used to force "libpmix" usage.
1527
1528
1529 none No special MPI processing. This is the default and works
1530 with many other versions of MPI.
1531
1532 This option applies to step allocations.
1533
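For example, list the MPI plugin types available in this installation,
then launch a PMIx-enabled MPI application (the executable name is a
placeholder):

       srun --mpi=list
       srun --mpi=pmix -n 64 ./mpi_app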
1534
1535 --multi-prog
1536 Run a job with different programs and different arguments for
1537 each task. In this case, the executable program specified is
1538 actually a configuration file specifying the executable and
1539 arguments for each task. See MULTIPLE PROGRAM CONFIGURATION
1540 below for details on the configuration file contents. This
1541 option applies to step allocations.
1542
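A hypothetical sketch (file and program names are placeholders); the
configuration file follows the task-rank/executable layout described in
MULTIPLE PROGRAM CONFIGURATION below:

       # multi.conf: rank 0 runs the master, ranks 1-3 run workers
       0    ./master
       1-3  ./worker

       srun -n 4 --multi-prog multi.conf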
1543
1544 -N, --nodes=<minnodes[-maxnodes]>
1545 Request that a minimum of minnodes nodes be allocated to this
1546 job. A maximum node count may also be specified with maxnodes.
1547 If only one number is specified, this is used as both the mini‐
1548 mum and maximum node count. The partition's node limits super‐
1549 sede those of the job. If a job's node limits are outside of
1550 the range permitted for its associated partition, the job will
1551 be left in a PENDING state. This permits possible execution at
1552 a later time, when the partition limit is changed. If a job
1553 node limit exceeds the number of nodes configured in the parti‐
1554 tion, the job will be rejected. Note that the environment vari‐
1555 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1556 ibility) will be set to the count of nodes actually allocated to
1557 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1558 tion. If -N is not specified, the default behavior is to allo‐
1559 cate enough nodes to satisfy the requirements of the -n and -c
1560 options. The job will be allocated as many nodes as possible
1561 within the range specified and without delaying the initiation
1562 of the job. If the number of tasks is given and a number of
1563 requested nodes is also given, the number of nodes used from that
1564 request will be reduced to match the number of tasks if
1565 the number of nodes in the request is greater than the number of
1566 tasks. The node count specification may include a numeric value
1567 followed by a suffix of "k" (multiplies numeric value by 1,024)
1568 or "m" (multiplies numeric value by 1,048,576). This option
1569 applies to job and step allocations.
1570
1571
1572 -n, --ntasks=<number>
1573 Specify the number of tasks to run. Request that srun allocate
1574 resources for ntasks tasks. The default is one task per node,
1575 but note that the --cpus-per-task option will change this
1576 default. This option applies to job and step allocations.
1577
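For instance (the executable is a placeholder), run 8 tasks spread over
exactly 2 nodes:

       srun -N 2 -n 8 ./a.out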
1578
1579 --network=<type>
1580 Specify information pertaining to the switch or network. The
1581 interpretation of type is system dependent. This option is sup‐
1582 ported when running Slurm on a Cray natively. It is used to
1583 request using Network Performance Counters. Only one value per
1584 request is valid. All options are case-insensitive. In this
1585 configuration the supported values include:
1586
1587 system
1588 Use the system-wide network performance counters. Only
1589 nodes requested will be marked in use for the job alloca‐
1590 tion. If the job does not fill up the entire system the
1591 rest of the nodes are not able to be used by other jobs
1592 using NPC; if idle, their state will appear as PerfCnts.
1593 These nodes are still available for other jobs not using
1594 NPC.
1595
1596 blade Use the blade network performance counters. Only nodes
1597 requested will be marked in use for the job allocation.
1598 If the job does not fill up the entire blade(s) allocated
1599 to the job, those blade(s) are not able to be used by other
1600 jobs using NPC; if idle, their state will appear as PerfC‐
1601 nts. These nodes are still available for other jobs not
1602 using NPC.
1603
1604
1605 In all cases the job or step allocation request must
1606 specify the --exclusive option. Otherwise the request
1607 will be denied.
1608
1609 Also with any of these options, steps are not allowed to share
1610 blades, so resources would remain idle inside an allocation if
1611 the step running on a blade does not take up all the nodes on
1612 the blade.
1613
1614 The network option is also supported on systems with IBM's Par‐
1615 allel Environment (PE). See IBM's LoadLeveler job command key‐
1616 word documentation about the keyword "network" for more informa‐
1617 tion. Multiple values may be specified in a comma separated
1618 list. All options are case-insensitive. Supported values
1619 include:
1620
1621 BULK_XFER[=<resources>]
1622 Enable bulk transfer of data using Remote Direct-
1623 Memory Access (RDMA). The optional resources speci‐
1624 fication is a numeric value which can have a suffix
1625 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1626 megabytes or gigabytes. NOTE: The resources speci‐
1627 fication is not supported by the underlying IBM in‐
1628 frastructure as of Parallel Environment version 2.2
1629 and no value should be specified at this time. The
1630 devices allocated to a job must all be of the same
1631 type. The default value depends upon
1632 what hardware is available and in order of prefer‐
1633 ence is IPONLY (which is not considered in User
1634 Space mode), HFI, IB, HPCE, and KMUX.
1635
1636 CAU=<count> Number of Collective Acceleration Units (CAU)
1637 required. Applies only to IBM Power7-IH processors.
1638 Default value is zero. Independent CAU will be
1639 allocated for each programming interface (MPI, LAPI,
1640 etc.)
1641
1642 DEVNAME=<name>
1643 Specify the device name to use for communications
1644 (e.g. "eth0" or "mlx4_0").
1645
1646 DEVTYPE=<type>
1647 Specify the device type to use for communications.
1648 The supported values of type are: "IB" (InfiniBand),
1649 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1650 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1651 nel Emulation of HPCE). The devices allocated to a
1652 job must all be of the same type. The default value
1653 depends upon what hardware is available
1654 and in order of preference is IPONLY (which is not
1655 considered in User Space mode), HFI, IB, HPCE, and
1656 KMUX.
1657
1658 IMMED =<count>
1659 Number of immediate send slots per window required.
1660 Applies only to IBM Power7-IH processors. Default
1661 value is zero.
1662
1663 INSTANCES =<count>
1664 Specify number of network connections for each task
1665 on each network. The default instance
1666 count is 1.
1667
1668 IPV4 Use Internet Protocol (IP) version 4 communications
1669 (default).
1670
1671 IPV6 Use Internet Protocol (IP) version 6 communications.
1672
1673 LAPI Use the LAPI programming interface.
1674
1675 MPI Use the MPI programming interface. MPI is the
1676 default interface.
1677
1678 PAMI Use the PAMI programming interface.
1679
1680 SHMEM Use the OpenSHMEM programming interface.
1681
1682 SN_ALL Use all available switch networks (default).
1683
1684 SN_SINGLE Use one available switch network.
1685
1686 UPC Use the UPC programming interface.
1687
1688 US Use User Space communications.
1689
1690
1691 Some examples of network specifications:
1692
1693 Instances=2,US,MPI,SN_ALL
1694 Create two user space connections for MPI communica‐
1695 tions on every switch network for each task.
1696
1697 US,MPI,Instances=3,Devtype=IB
1698 Create three user space connections for MPI communi‐
1699 cations on every InfiniBand network for each task.
1700
1701 IPV4,LAPI,SN_Single
1702 Create an IP version 4 connection for LAPI communica‐
1703 tions on one switch network for each task.
1704
1705 Instances=2,US,LAPI,MPI
1706 Create two user space connections each for LAPI and
1707 MPI communications on every switch network for each
1708 task. Note that SN_ALL is the default option so
1709 every switch network is used. Also note that
1710 Instances=2 specifies that two connections are
1711 established for each protocol (LAPI and MPI) and
1712 each task. If there are two networks and four tasks
1713 on the node then a total of 32 connections are
1714 established (2 instances x 2 protocols x 2 networks
1715 x 4 tasks).
1716
1717 This option applies to job and step allocations.
1718
1719
1720 --nice[=adjustment]
1721 Run the job with an adjusted scheduling priority within Slurm.
1722 With no adjustment value the scheduling priority is decreased by
1723 100. A negative nice value increases the priority, otherwise
1724 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1725 ileged users can specify a negative adjustment.
1726
1727
1728 --ntasks-per-core=<ntasks>
1729 Request the maximum ntasks be invoked on each core. This option
1730 applies to the job allocation, but not to step allocations.
1731 Meant to be used with the --ntasks option. Related to
1732 --ntasks-per-node except at the core level instead of the node
1733 level. Masks will automatically be generated to bind the tasks
1734 to specific cores unless --cpu-bind=none is specified. NOTE:
1735 This option is not supported unless SelectType=cons_res is con‐
1736 figured (either directly or indirectly on Cray systems) along
1737 with the node's core count.
1738
1739
1740 --ntasks-per-node=<ntasks>
1741 Request that ntasks be invoked on each node. If used with the
1742 --ntasks option, the --ntasks option will take precedence and
1743 the --ntasks-per-node will be treated as a maximum count of
1744 tasks per node. Meant to be used with the --nodes option. This
1745 is related to --cpus-per-task=ncpus, but does not require knowl‐
1746 edge of the actual number of cpus on each node. In some cases,
1747 it is more convenient to be able to request that no more than a
1748 specific number of tasks be invoked on each node. Examples of
1749 this include submitting a hybrid MPI/OpenMP app where only one
1750 MPI "task/rank" should be assigned to each node while allowing
1751 the OpenMP portion to utilize all of the parallelism present in
1752 the node, or submitting a single setup/cleanup/monitoring job to
1753 each node of a pre-existing allocation as one step in a larger
1754 job script. This option applies to job allocations.
1755
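A sketch of the hybrid MPI/OpenMP case described above (application
name and sizes are placeholders): one task per node, each task given
16 CPUs for its OpenMP threads:

       OMP_NUM_THREADS=16 srun -N 4 --ntasks-per-node=1 -c 16 ./hybrid_app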
1756
1757 --ntasks-per-socket=<ntasks>
1758 Request the maximum ntasks be invoked on each socket. This
1759 option applies to the job allocation, but not to step alloca‐
1760 tions. Meant to be used with the --ntasks option. Related to
1761 --ntasks-per-node except at the socket level instead of the node
1762 level. Masks will automatically be generated to bind the tasks
1763 to specific sockets unless --cpu-bind=none is specified. NOTE:
1764 This option is not supported unless SelectType=cons_res is con‐
1765 figured (either directly or indirectly on Cray systems) along
1766 with the node's socket count.
1767
1768
1769 -O, --overcommit
1770 Overcommit resources. This option applies to job and step allo‐
1771 cations. When applied to job allocation, only one CPU is allo‐
1772 cated to the job per node and options used to specify the number
1773 of tasks per node, socket, core, etc. are ignored. When
1774 applied to job step allocations (the srun command when executed
1775 within an existing job allocation), this option can be used to
1776 launch more than one task per CPU. Normally, srun will not
1777 allocate more than one process per CPU. By specifying --over‐
1778 commit you are explicitly allowing more than one process per
1779 CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1780 to execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1781 file slurm.h and is not a variable, it is set at Slurm build
1782 time.
1783
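For example (task count, node count and executable are placeholders),
allow more tasks than allocated CPUs:

       srun -N 2 -n 16 --overcommit ./a.out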
1784
1785 -o, --output=<filename pattern>
1786 Specify the "filename pattern" for stdout redirection. By
1787 default in interactive mode, srun collects stdout from all tasks
1788 and sends this output via TCP/IP to the attached terminal. With
1789 --output stdout may be redirected to a file, to one file per
1790 task, or to /dev/null. See section IO Redirection below for the
1791 various forms of filename pattern. If the specified file
1792 already exists, it will be overwritten.
1793
1794 If --error is not also specified on the command line, both std‐
1795 out and stderr will be directed to the file specified by --output.
1796 This option applies to job and step allocations.
1797
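For example, using the filename patterns described under IO Redirection
below (the executable is a placeholder), write one output file per task:

       srun -n 4 --output=job%j-task%t.out ./a.out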
1798
1799 --open-mode=<append|truncate>
1800 Open the output and error files using append or truncate mode as
1801 specified. For heterogeneous job steps the default value is
1802 "append". Otherwise the default value is specified by the sys‐
1803 tem configuration parameter JobFileAppend. This option applies
1804 to job and step allocations.
1805
1806
1807 --pack-group=<expr>
1808 Identify each job in a heterogeneous job allocation for which a
1809 step is to be created. Applies only to srun commands issued
1810 inside a salloc allocation or sbatch script. <expr> is a set of
1811 integers corresponding to one or more options indexes on the
1812 salloc or sbatch command line. Examples: "--pack-group=2",
1813 "--pack-group=0,4", "--pack-group=1,3-5". The default value is
1814 --pack-group=0.
1815
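A minimal sketch: inside a heterogeneous allocation with two components
(created beforehand with salloc or sbatch), run a single step spanning
both components:

       srun --pack-group=0,1 hostname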
1816
1817 -p, --partition=<partition_names>
1818 Request a specific partition for the resource allocation. If
1819 not specified, the default behavior is to allow the slurm con‐
1820 troller to select the default partition as designated by the
1821 system administrator. If the job can use more than one parti‐
1822 tion, specify their names in a comma separated list and the one
1823 offering earliest initiation will be used with no regard given
1824 to the partition name ordering (although higher priority parti‐
1825 tions will be considered first). When the job is initiated, the
1826 name of the partition used will be placed first in the job
1827 record partition string. This option applies to job allocations.
1828
1829
1830 --power=<flags>
1831 Comma separated list of power management plugin options. Cur‐
1832 rently available flags include: level (all nodes allocated to
1833 the job should have identical power caps, may be disabled by the
1834 Slurm configuration option PowerParameters=job_no_level). This
1835 option applies to job allocations.
1836
1837
1838 --priority=<value>
1839 Request a specific job priority. May be subject to configura‐
1840 tion specific constraints. value should either be a numeric
1841 value or "TOP" (for highest possible value). Only Slurm opera‐
1842 tors and administrators can set the priority of a job. This
1843 option applies to job allocations only.
1844
1845
1846 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1847 Enables detailed data collection by the acct_gather_profile
1848 plugin. Detailed data are typically time-series that are stored
1849 in an HDF5 file for the job or an InfluxDB database depending on
1850 the configured plugin.
1851
1852
1853 All All data types are collected. (Cannot be combined with
1854 other values.)
1855
1856
1857 None No data types are collected. This is the default.
1858 (Cannot be combined with other values.)
1859
1860
1861 Energy Energy data is collected.
1862
1863
1864 Task Task (I/O, Memory, ...) data is collected.
1865
1866
1867 Filesystem
1868 Filesystem data is collected.
1869
1870
1871 Network Network (InfiniBand) data is collected.
1872
1873
1874 This option applies to job and step allocations.
1875
1876
1877 --prolog=<executable>
1878 srun will run executable just before launching the job step.
1879 The command line arguments for executable will be the command
1880 and arguments of the job step. If executable is "none", then no
1881 srun prolog will be run. This parameter overrides the SrunProlog
1882 parameter in slurm.conf. This parameter is completely indepen‐
1883 dent from the Prolog parameter in slurm.conf. This option
1884 applies to job allocations.
1885
1886
1887 --propagate[=rlimit[,rlimit...]]
1888 Allows users to specify which of the modifiable (soft) resource
1889 limits to propagate to the compute nodes and apply to their
1890 jobs. If no rlimit is specified, then all resource limits will
1891 be propagated. The following rlimit names are supported by
1892 Slurm (although some options may not be supported on some sys‐
1893 tems):
1894
1895 ALL All limits listed below (default)
1896
1897 NONE No limits listed below
1898
1899 AS The maximum address space for a process
1900
1901 CORE The maximum size of core file
1902
1903 CPU The maximum amount of CPU time
1904
1905 DATA The maximum size of a process's data segment
1906
1907 FSIZE The maximum size of files created. Note that if the
1908 user sets FSIZE to less than the current size of the
1909 slurmd.log, job launches will fail with a 'File size
1910 limit exceeded' error.
1911
1912 MEMLOCK The maximum size that may be locked into memory
1913
1914 NOFILE The maximum number of open files
1915
1916 NPROC The maximum number of processes available
1917
1918 RSS The maximum resident set size
1919
1920 STACK The maximum stack size
1921
1922 This option applies to job allocations.
1923
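For example (the executable is a placeholder), propagate only the
locked-memory and stack limits from the submission shell:

       srun --propagate=MEMLOCK,STACK ./a.out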
1924
1925 --pty Execute task zero in pseudo terminal mode. Implicitly sets
1926 --unbuffered. Implicitly sets --error and --output to /dev/null
1927 for all tasks except task zero, which may cause those tasks to
1928 exit immediately (e.g. shells will typically exit immediately in
1929 that situation). This option applies to step allocations.
1930
1931
1932 -q, --qos=<qos>
1933 Request a quality of service for the job. QOS values can be
1934 defined for each user/cluster/account association in the Slurm
1935 database. Users will be limited to their association's defined
1936 set of qos's when the Slurm configuration parameter, Account‐
1937 ingStorageEnforce, includes "qos" in its definition. This
1938 option applies to job allocations.
1939
1940
1941 -Q, --quiet
1942 Suppress informational messages from srun. Errors will still be
1943 displayed. This option applies to job and step allocations.
1944
1945
1946 --quit-on-interrupt
1947 Quit immediately on single SIGINT (Ctrl-C). Use of this option
1948 disables the status feature normally available when srun
1949 receives a single Ctrl-C and causes srun to instead immediately
1950 terminate the running job. This option applies to step alloca‐
1951 tions.
1952
1953
1954 -r, --relative=<n>
1955 Run a job step relative to node n of the current allocation.
1956 This option may be used to spread several job steps out among
1957 the nodes of the current job. If -r is used, the current job
1958 step will begin at node n of the allocated nodelist, where the
1959 first node is considered node 0. The -r option is not permitted
1960 with the -w or -x options and will result in a fatal error when not
1961 running within a prior allocation (i.e. when SLURM_JOB_ID is not
1962 set). The default for n is 0. If the value of --nodes exceeds
1963 the number of nodes identified with the --relative option, a
1964 warning message will be printed and the --relative option will
1965 take precedence. This option applies to step allocations.
1966
1967
1968 --reboot
1969 Force the allocated nodes to reboot before starting the job.
1970 This is only supported with some system configurations and will
1971 otherwise be silently ignored. This option applies to job allo‐
1972 cations.
1973
1974
1975 --resv-ports[=count]
1976 Reserve communication ports for this job. Users can specify the
1977 number of ports they want to reserve. The parameter Mpi‐
1978 Params=ports=12000-12999 must be specified in slurm.conf. If not
1979 specified and Slurm's OpenMPI plugin is used, then by default
1980 the number of reserved ports equals the highest number of tasks on
1981 any node in the job step allocation. If the number of reserved
1982 ports is zero then no ports are reserved. Used for OpenMPI. This
1983 option applies to job and step allocations.
1984
1985
1986 --reservation=<name>
1987 Allocate resources for the job from the named reservation. This
1988 option applies to job allocations.
1989
1990
1991 -s, --oversubscribe
1992 The job allocation can over-subscribe resources with other run‐
1993 ning jobs. The resources to be over-subscribed can be nodes,
1994 sockets, cores, and/or hyperthreads depending upon configura‐
1995 tion. The default over-subscribe behavior depends on system
1996 configuration and the partition's OverSubscribe option takes
1997 precedence over the job's option. This option may result in the
1998 allocation being granted sooner than if the --oversubscribe
1999 option was not set and allow higher system utilization, but
2000 application performance will likely suffer due to competition
2001 for resources. Also see the --exclusive option. This option
2002 applies to step allocations.
2003
2004
2005 -S, --core-spec=<num>
2006 Count of specialized cores per node reserved by the job for sys‐
2007 tem operations and not used by the application. The application
2008 will not use these cores, but will be charged for their alloca‐
2009 tion. Default value is dependent upon the node's configured
2010 CoreSpecCount value. If a value of zero is designated and the
2011 Slurm configuration option AllowSpecResourcesUsage is enabled,
2012 the job will be allowed to override CoreSpecCount and use the
2013 specialized resources on nodes it is allocated. This option can
2014 not be used with the --thread-spec option. This option applies
2015 to job allocations.
2016
2017
2018 --signal=<sig_num>[@<sig_time>]
2019 When a job is within sig_time seconds of its end time, send it
2020 the signal sig_num. Due to the resolution of event handling by
2021 Slurm, the signal may be sent up to 60 seconds earlier than
2022 specified. sig_num may either be a signal number or name (e.g.
2023 "10" or "USR1"). sig_time must have an integer value between 0
2024 and 65535. By default, no signal is sent before the job's end
2025 time. If a sig_num is specified without any sig_time, the
2026 default time will be 60 seconds. This option applies to job
2027 allocations. To have the signal sent at preemption time see the
2028 preempt_send_user_signal SlurmctldParameter.
2029
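For example (the executable is a placeholder), request SIGUSR1 roughly
two minutes before a one-hour limit expires:

       srun --time=60 --signal=USR1@120 ./a.out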
2030
2031 --slurmd-debug=<level>
2032 Specify a debug level for slurmd(8). The level may be specified
2033 as either an integer value between 0 [quiet, only errors are dis‐
2034 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2035
2036 quiet Log nothing
2037
2038 fatal Log only fatal errors
2039
2040 error Log only errors
2041
2042 info Log errors and general informational messages
2043
2044 verbose Log errors and verbose informational messages
2045
2046
2047 The slurmd debug information is copied onto the stderr of
2048 the job. By default only errors are displayed. This option
2049 applies to job and step allocations.
2050
2051
2052 --sockets-per-node=<sockets>
2053 Restrict node selection to nodes with at least the specified
2054 number of sockets. See additional information under -B option
2055 above when task/affinity plugin is enabled. This option applies
2056 to job allocations.
2057
2058
2059 --spread-job
2060 Spread the job allocation over as many nodes as possible and
2061 attempt to evenly distribute tasks across the allocated nodes.
2062 This option disables the topology/tree plugin. This option
2063 applies to job allocations.
2064
2065
2066 --switches=<count>[@<max-time>]
2067 When a tree topology is used, this defines the maximum count of
2068 switches desired for the job allocation and optionally the maxi‐
2069 mum time to wait for that number of switches. If Slurm finds an
2070 allocation containing more switches than the count specified,
2071 the job remains pending until it either finds an allocation with
2072 desired switch count or the time limit expires. If there is no
2073 switch count limit, there is no delay in starting the job.
2074 Acceptable time formats include "minutes", "minutes:seconds",
2075 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2076 "days-hours:minutes:seconds". The job's maximum time delay may
2077 be limited by the system administrator using the SchedulerParam‐
2078 eters configuration parameter with the max_switch_wait parameter
2079 option. On a dragonfly network the only switch count supported
2080 is 1 since communication performance will be highest when a job
2081 is allocated resources on one leaf switch or more than 2 leaf
2082 switches. The default max-time is the max_switch_wait Sched‐
2083 ulerParameters value. This option applies to job allocations.
2084
2085
2086 -T, --threads=<nthreads>
2087 Allows limiting the number of concurrent threads used to send
2088 the job request from the srun process to the slurmd processes on
2089 the allocated nodes. Default is to use one thread per allocated
2090 node up to a maximum of 60 concurrent threads. Specifying this
2091 option limits the number of concurrent threads to nthreads (less
2092 than or equal to 60). This should only be used to set a low
2093 thread count for testing on very small memory computers. This
2094 option applies to job allocations.
2095
2096
2097 -t, --time=<time>
2098 Set a limit on the total run time of the job allocation. If the
2099 requested time limit exceeds the partition's time limit, the job
2100 will be left in a PENDING state (possibly indefinitely). The
2101 default time limit is the partition's default time limit. When
2102 the time limit is reached, each task in each job step is sent
2103 SIGTERM followed by SIGKILL. The interval between signals is
2104 specified by the Slurm configuration parameter KillWait. The
2105 OverTimeLimit configuration parameter may permit the job to run
2106 longer than scheduled. Time resolution is one minute and second
2107 values are rounded up to the next minute.
2108
2109 A time limit of zero requests that no time limit be imposed.
2110 Acceptable time formats include "minutes", "minutes:seconds",
2111 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2112 "days-hours:minutes:seconds". This option applies to job and
2113 step allocations.
2114
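For example (the executable is a placeholder), request a limit of one
day, twelve hours and thirty minutes using the days-hours:minutes
format:

       srun --time=1-12:30 ./a.out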
2115
2116 --task-epilog=<executable>
2117 The slurmstepd daemon will run executable just after each task
2118 terminates. This will be executed before any TaskEpilog parame‐
2119 ter in slurm.conf is executed. This is meant to be a very
2120 short-lived program. If it fails to terminate within a few sec‐
2121 onds, it will be killed along with any descendant processes.
2122 This option applies to step allocations.
2123
2124
2125 --task-prolog=<executable>
2126 The slurmstepd daemon will run executable just before launching
2127 each task. This will be executed after any TaskProlog parameter
2128 in slurm.conf is executed. Besides the normal environment vari‐
2129 ables, this has SLURM_TASK_PID available to identify the process
2130 ID of the task being started. Standard output from this program
2131 of the form "export NAME=value" will be used to set environment
2132 variables for the task being spawned. This option applies to
2133 step allocations.
2134
2135
2136 --test-only
2137 Returns an estimate of when a job would be scheduled to run
2138 given the current job queue and all the other srun arguments
2139 specifying the job. This limits srun's behavior to just return
2140 information; no job is actually submitted. The program will be
2141 executed directly by the slurmd daemon. This option applies to
2142 job allocations.
2143
2144
2145 --thread-spec=<num>
2146 Count of specialized threads per node reserved by the job for
2147 system operations and not used by the application. The applica‐
2148 tion will not use these threads, but will be charged for their
2149 allocation. This option can not be used with the --core-spec
2150 option. This option applies to job allocations.
2151
2152
2153 --threads-per-core=<threads>
2154 Restrict node selection to nodes with at least the specified
2155 number of threads per core. NOTE: "Threads" refers to the num‐
2156 ber of processing units on each core rather than the number of
2157 application tasks to be launched per core. See additional
2158 information under -B option above when task/affinity plugin is
2159 enabled. This option applies to job allocations.
2160
2161
2162 --time-min=<time>
2163 Set a minimum time limit on the job allocation. If specified,
2164 the job may have its --time limit lowered to a value no lower
2165 than --time-min if doing so permits the job to begin execution
2166 earlier than otherwise possible. The job's time limit will not
2167 be changed after the job is allocated resources. This is per‐
2168 formed by a backfill scheduling algorithm to allocate resources
2169 otherwise reserved for higher priority jobs. Acceptable time
2170 formats include "minutes", "minutes:seconds", "hours:min‐
2171 utes:seconds", "days-hours", "days-hours:minutes" and
2172 "days-hours:minutes:seconds". This option applies to job alloca‐
2173 tions.
2174
2175
2176 --tmp=<size[units]>
2177 Specify a minimum amount of temporary disk space per node.
2178 Default units are megabytes unless the SchedulerParameters con‐
2179 figuration parameter includes the "default_gbytes" option for
2180 gigabytes. Different units can be specified using the suffix
2181 [K|M|G|T]. This option applies to job allocations.
2182
2183
2184 -u, --unbuffered
2185 By default the connection between slurmstepd and the user
2186 launched application is over a pipe. The stdio output written by
2187 the application is buffered by glibc until it is flushed or
2188 the output is set as unbuffered. See setbuf(3). If this option
2189 is specified the tasks are executed with a pseudo terminal so
2190 that the application output is unbuffered. This option applies
2191 to step allocations.
2192
2193 --usage
2194 Display brief help message and exit.
2195
2196
2197 --uid=<user>
2198 Attempt to submit and/or run a job as user instead of the invok‐
2199 ing user id. The invoking user's credentials will be used to
2200 check access permissions for the target partition. User root may
2201 use this option to run jobs as a normal user in a RootOnly par‐
2202 tition for example. If run as root, srun will drop its permis‐
2203 sions to the uid specified after node allocation is successful.
2204 user may be the user name or numerical user ID. This option
2205 applies to job and step allocations.
2206
2207
2208 --use-min-nodes
2209 If a range of node counts is given, prefer the smaller count.
2210
2211
2212 -V, --version
2213 Display version information and exit.
2214
2215
2216 -v, --verbose
2217 Increase the verbosity of srun's informational messages. Multi‐
2218 ple -v's will further increase srun's verbosity. By default
2219 only errors will be displayed. This option applies to job and
2220 step allocations.
2221
2222
2223 -W, --wait=<seconds>
2224 Specify how long to wait after the first task terminates before
2225 terminating all remaining tasks. A value of 0 indicates an
2226 unlimited wait (a warning will be issued after 60 seconds). The
2227 default value is set by the WaitTime parameter in the slurm con‐
2228 figuration file (see slurm.conf(5)). This option can be useful
2229 to ensure that a job is terminated in a timely fashion in the
2230 event that one or more tasks terminate prematurely. Note: The
2231 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2232 to terminate the job immediately if a task exits with a non-zero
2233 exit code. This option applies to job allocations.
2234
2235
2236 -w, --nodelist=<host1,host2,... or filename>
2237 Request a specific list of hosts. The job will contain all of
2238 these hosts and possibly additional hosts as needed to satisfy
2239 resource requirements. The list may be specified as a
2240 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2241 for example), or a filename. The host list will be assumed to
2242 be a filename if it contains a "/" character. If you specify a
2243 minimum node or processor count larger than can be satisfied by
2244 the supplied host list, additional resources will be allocated
2245 on other nodes as needed. Rather than repeating a host name
2246 multiple times, an asterisk and a repetition count may be
2247 appended to a host name. For example "host1,host1" and "host1*2"
2248 are equivalent. If the number of tasks is given and a list of
2249 requested nodes is also given, the number of nodes used from that
2250 list will be reduced to match the number of tasks if the
2251 number of nodes in the list is greater than the number of tasks.
2252 This option applies to job and step allocations.
2253
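For example (host names and executable are placeholders), request
specific hosts either inline or from a file listing one host per line;
the second form is treated as a filename because it contains a "/":

       srun -N 3 -w "host[1-2],host7" ./a.out
       srun -w ./hosts.txt ./a.out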
2254
2255 --wckey=<wckey>
2256 Specify wckey to be used with job. If TrackWCKey=no (default)
2257 in the slurm.conf this value is ignored. This option applies to
2258 job allocations.
2259
2260
2261 -X, --disable-status
2262 Disable the display of task status when srun receives a single
2263 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2264 running job. Without this option a second Ctrl-C in one second
2265 is required to forcibly terminate the job and srun will immedi‐
2266 ately exit. May also be set via the environment variable
2267 SLURM_DISABLE_STATUS. This option applies to job allocations.
2268
2269
2270 -x, --exclude=<host1,host2,... or filename>
2271 Request that a specific list of hosts not be included in the
2272 resources allocated to this job. The host list will be assumed
2273 to be a filename if it contains a "/" character. This option
2274 applies to job allocations.
2275
2276
2277 --x11[=<all|first|last>]
2278 Sets up X11 forwarding on all, first or last node(s) of the
2279 allocation. This option is only enabled if Slurm was compiled
2280 with X11 support and PrologFlags=x11 is defined in the
2281 slurm.conf. Default is all.
2282
2283
2284 -Z, --no-allocate
2285 Run the specified tasks on a set of nodes without creating a
2286 Slurm "job" in the Slurm queue structure, bypassing the normal
2287 resource allocation step. The list of nodes must be specified
2288 with the -w, --nodelist option. This is a privileged option
2289 only available for the users "SlurmUser" and "root". This option
2290 applies to job allocations.
2291
2292
2293 srun will submit the job request to the slurm job controller, then ini‐
2294 tiate all processes on the remote nodes. If the request cannot be met
2295 immediately, srun will block until the resources are free to run the
2296 job. If the -I (--immediate) option is specified srun will terminate if
2297 resources are not immediately available.
2298
2299 When initiating remote processes srun will propagate the current work‐
2300 ing directory, unless --chdir=<path> is specified, in which case path
2301 will become the working directory for the remote processes.
2302
2303 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2304 cated to the job. When specifying only the number of processes to run
2305 with -n, a default of one CPU per process is allocated. By specifying
2306 the number of CPUs required per task (-c), more than one CPU may be
2307 allocated per process. If the number of nodes is specified with -N,
2308 srun will attempt to allocate at least the number of nodes specified.
2309
2310 Combinations of the above three options may be used to change how pro‐
2311 cesses are distributed across nodes and cpus. For instance, by specify‐
2312 ing both the number of processes and number of nodes on which to run,
2313 the number of processes per node is implied. However, if the number of
2314 CPUs per process is more important, then the number of processes (-n) and
2315 the number of CPUs per process (-c) should be specified.
2316
2317 srun will refuse to allocate more than one process per CPU unless
2318 --overcommit (-O) is also specified.
2319
2320 srun will attempt to meet the above specifications "at a minimum." That
2321 is, if 16 nodes are requested for 32 processes, and some nodes do not
2322 have 2 CPUs, the allocation of nodes will be increased in order to meet
2323 the demand for CPUs. In other words, a minimum of 16 nodes are being
2324 requested. However, if 16 nodes are requested for 15 processes, srun
2325 will consider this an error, as 15 processes cannot run across 16
2326 nodes.
2327
2328
2329 IO Redirection
2330
2331 By default, stdout and stderr will be redirected from all tasks to the
2332 stdout and stderr of srun, and stdin will be redirected from the stan‐
2333 dard input of srun to all remote tasks. If stdin is only to be read by
2334 a subset of the spawned tasks, specifying a file to read from rather
2335 than forwarding stdin from the srun command may be preferable as it
2336 avoids moving and storing data that will never be read.
2337
2338 For OS X, the poll() function does not support stdin, so input from a
2339 terminal is not possible.
2340
2341 This behavior may be changed with the --output, --error, and --input
2342 (-o, -e, -i) options. Valid format specifications for these options are
2343
2344 all stdout and stderr are redirected from all tasks to srun. stdin is
2345 broadcast to all remote tasks. (This is the default behav‐
2346 ior)
2347
2348 none stdout and stderr are not received from any task. stdin is
2349 not sent to any task (stdin is closed).
2350
2351 taskid stdout and/or stderr are redirected from only the task with
2352 relative id equal to taskid, where 0 <= taskid <= ntasks,
2353 where ntasks is the total number of tasks in the current job
2354 step. stdin is redirected from the stdin of srun to this
2355 same task. This file will be written on the node executing
2356 the task.
2357
2358 filename srun will redirect stdout and/or stderr to the named file
2359 from all tasks. stdin will be redirected from the named file
2360 and broadcast to all tasks in the job. filename refers to a
2361 path on the host that runs srun. Depending on the cluster's
2362 file system layout, this may result in the output appearing
2363 in different places depending on whether the job is run in
2364 batch mode.
2365
2366 filename pattern
2367 srun allows for a filename pattern to be used to generate the
2368 named IO file described above. The following list of format
2369 specifiers may be used in the format string to generate a
2370 filename that will be unique to a given jobid, stepid, node,
2371 or task. In each case, the appropriate number of files are
2372 opened and associated with the corresponding tasks. Note that
2373 any format string containing %t, %n, and/or %N will be writ‐
2374 ten on the node executing the task rather than the node where
2375 srun executes; these format specifiers are not supported on a
2376 BGQ system.
2377
2378 \\ Do not process any of the replacement symbols.
2379
2380 %% The character "%".
2381
2382 %A Job array's master job allocation number.
2383
2384 %a Job array ID (index) number.
2385
2386 %J jobid.stepid of the running job. (e.g. "128.0")
2387
2388 %j jobid of the running job.
2389
2390 %s stepid of the running job.
2391
2392 %N short hostname. This will create a separate IO file
2393 per node.
2394
2395 %n Node identifier relative to current job (e.g. "0" is
2396 the first node of the running job) This will create a
2397 separate IO file per node.
2398
2399 %t task identifier (rank) relative to current job. This
2400 will create a separate IO file per task.
2401
2402 %u User name.
2403
2404 %x Job name.
2405
2406 A number placed between the percent character and format
2407 specifier may be used to zero-pad the result in the IO file‐
2408 name. This number is ignored if the format specifier corre‐
2409 sponds to non-numeric data (%N for example).
2410
2411 Some examples of how the format string may be used for a 4
2412 task job step with a Job ID of 128 and step id of 0 are
2413 included below:
2414
2415 job%J.out job128.0.out
2416
2417 job%4j.out job0128.out
2418
2419 job%j-%2t.out job128-00.out, job128-01.out, ...
2420
2421 INPUT ENVIRONMENT VARIABLES
2422 Some srun options may be set via environment variables. These environ‐
2423 ment variables, along with their corresponding options, are listed
2424 below. Note: Command line options will always override these settings.
2425
2426 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2427 MVAPICH2) and controls the fanout of data commu‐
2428 nications. The srun command sends messages to
2429 application programs (via the PMI library) and
2430 those applications may be called upon to forward
2431 that data to up to this number of additional
2432 tasks. Higher values offload work from the srun
2433 command to the applications and likely increase
2434 the vulnerability to failures. The default value
2435 is 32.
2436
2437 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2438 MVAPICH2) and controls the fanout of data commu‐
2439 nications. The srun command sends messages to
2440 application programs (via the PMI library) and
2441 those applications may be called upon to forward
2442 that data to additional tasks. By default, srun
2443 sends one message per host and one task on that
2444 host forwards the data to other tasks on that
2445 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2446 defined, the user task may be required to forward
2447 the data to tasks on other hosts. Setting
2448 PMI_FANOUT_OFF_HOST may increase performance.
2449 Since more work is performed by the PMI library
2450 loaded by the user application, failures also can
2451 be more common and more difficult to diagnose.
2452
2453 PMI_TIME This is used exclusively with PMI (MPICH2 and
2454 MVAPICH2) and controls how much the communica‐
2455 tions from the tasks to the srun are spread out
2456 in time in order to avoid overwhelming the srun
2457 command with work. The default value is 500
2458 (microseconds) per task. On relatively slow pro‐
2459 cessors or systems with very large processor
2460 counts (and large PMI data sets), higher values
2461 may be required.
2462
2463 SLURM_CONF The location of the Slurm configuration file.
2464
2465 SLURM_ACCOUNT Same as -A, --account
2466
2467 SLURM_ACCTG_FREQ Same as --acctg-freq
2468
2469 SLURM_BCAST Same as --bcast
2470
2471 SLURM_BURST_BUFFER Same as --bb
2472
2473 SLURM_CHECKPOINT Same as --checkpoint
2474
2475 SLURM_COMPRESS Same as --compress
2476
2477 SLURM_CONSTRAINT Same as -C, --constraint
2478
2479 SLURM_CORE_SPEC Same as --core-spec
2480
2481 SLURM_CPU_BIND Same as --cpu-bind
2482
2483 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2484
2485 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2486
2487 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2488
2489 SLURM_DEBUG Same as -v, --verbose
2490
2491 SLURM_DELAY_BOOT Same as --delay-boot
2492
2493 SLURMD_DEBUG Same as -d, --slurmd-debug
2494
2495 SLURM_DEPENDENCY Same as -P, --dependency=<jobid>
2496
2497 SLURM_DISABLE_STATUS Same as -X, --disable-status
2498
2499 SLURM_DIST_PLANESIZE Same as -m plane
2500
2501 SLURM_DISTRIBUTION Same as -m, --distribution
2502
2503 SLURM_EPILOG Same as --epilog
2504
2505 SLURM_EXCLUSIVE Same as --exclusive
2506
2507 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2508 error occurs (e.g. invalid options). This can be
2509 used by a script to distinguish application exit
2510 codes from various Slurm error conditions. Also
2511 see SLURM_EXIT_IMMEDIATE.
2512
2513 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
2514 --immediate option is used and resources are not
2515 currently available. This can be used by a
2516 script to distinguish application exit codes from
2517 various Slurm error conditions. Also see
2518 SLURM_EXIT_ERROR.
2519
2520 SLURM_EXPORT_ENV Same as --export
2521
2522 SLURM_GPUS Same as -G, --gpus
2523
2524 SLURM_GPU_BIND Same as --gpu-bind
2525
2526 SLURM_GPU_FREQ Same as --gpu-freq
2527
2528 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2529
2530 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2531
2532 SLURM_GRES_FLAGS Same as --gres-flags
2533
2534 SLURM_HINT Same as --hint
2535
2536 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2537
2538 SLURM_IMMEDIATE Same as -I, --immediate
2539
2540 SLURM_JOB_ID Same as --jobid
2541
2542 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2543 allocation, in which case it is ignored to avoid
2544 using the batch job's name as the name of each
2545 job step.
2546
2547 SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
2548 Same as -N, --nodes. Total number of nodes in the
2549 job’s resource allocation.
2550
2551 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2552
2553 SLURM_LABELIO Same as -l, --label
2554
2555 SLURM_MEM_BIND Same as --mem-bind
2556
2557 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2558
2559 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2560
2561 SLURM_MEM_PER_NODE Same as --mem
2562
2563 SLURM_MPI_TYPE Same as --mpi
2564
2565 SLURM_NETWORK Same as --network
2566
2567 SLURM_NO_KILL Same as -k, --no-kill
2568
2569 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2570 Same as -n, --ntasks
2571
2572 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2573
2574 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2575
2576 SLURM_NTASKS_PER_SOCKET
2577 Same as --ntasks-per-socket
2578
2579 SLURM_OPEN_MODE Same as --open-mode
2580
2581 SLURM_OVERCOMMIT Same as -O, --overcommit
2582
2583 SLURM_PARTITION Same as -p, --partition
2584
2585 SLURM_PMI_KVS_NO_DUP_KEYS
2586 If set, then PMI key-pairs will contain no dupli‐
2587 cate keys. MPI can use this variable to inform
2588 the PMI library that it will not use duplicate
2589 keys so PMI can skip the check for duplicate
2590 keys. This is the case for MPICH2 and reduces
2591 overhead in testing for duplicates for improved
2592 performance
2593
2594 SLURM_POWER Same as --power
2595
2596 SLURM_PROFILE Same as --profile
2597
2598 SLURM_PROLOG Same as --prolog
2599
2600 SLURM_QOS Same as --qos
2601
2602 SLURM_REMOTE_CWD Same as -D, --chdir=
2603
2604 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2605 maximum count of switches desired for the job
2606 allocation and optionally the maximum time to
2607 wait for that number of switches. See --switches
2608
2609 SLURM_RESERVATION Same as --reservation
2610
2611 SLURM_RESV_PORTS Same as --resv-ports
2612
2613 SLURM_SIGNAL Same as --signal
2614
2615 SLURM_STDERRMODE Same as -e, --error
2616
2617 SLURM_STDINMODE Same as -i, --input
2618
2619 SLURM_SPREAD_JOB Same as --spread-job
2620
2621 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2622 if set and non-zero, successive task exit mes‐
2623 sages with the same exit code will be printed
2624 only once.
2625
2626 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2627 job allocations). Also see SLURM_GRES
2628
2629 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2630 If set, only the specified node will log when the
2631 job or step are killed by a signal.
2632
2633 SLURM_STDOUTMODE Same as -o, --output
2634
2635 SLURM_TASK_EPILOG Same as --task-epilog
2636
2637 SLURM_TASK_PROLOG Same as --task-prolog
2638
2639 SLURM_TEST_EXEC If defined, srun will verify existence of the
2640 executable program along with user execute per‐
2641 mission on the node where srun was called before
2642 attempting to launch it on nodes in the step.
2643
2644 SLURM_THREAD_SPEC Same as --thread-spec
2645
2646 SLURM_THREADS Same as -T, --threads
2647
2648 SLURM_TIMELIMIT Same as -t, --time
2649
2650 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2651
2652 SLURM_USE_MIN_NODES Same as --use-min-nodes
2653
2654 SLURM_WAIT Same as -W, --wait
2655
2656 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2657 --switches
2658
2659 SLURM_WCKEY           Same as --wckey
2660
2661 SLURM_WORKING_DIR     Same as -D, --chdir
2662
2663 SRUN_EXPORT_ENV Same as --export, and will override any setting
2664 for SLURM_EXPORT_ENV.
2665
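       As an illustration (the option values here are arbitrary), these input
       environment variables can supply defaults that would otherwise be
       given on the command line:

          > export SLURM_NTASKS=4
          > export SLURM_TIMELIMIT=10
          > srun hostname

       is expected to behave like "srun -n4 -t10 hostname". Explicit command
       line options always override these environment variables.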
2666
2667
2668OUTPUT ENVIRONMENT VARIABLES
2669 srun will set some environment variables in the environment of the exe‐
2670 cuting tasks on the remote compute nodes. These environment variables
2671 are:
2672
2673
2674 SLURM_*_PACK_GROUP_# For a heterogeneous job allocation, the environ‐
2675 ment variables are set separately for each compo‐
2676 nent.
2677
2678 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2679 ing.
2680
2681 SLURM_CPU_BIND_VERBOSE
2682 --cpu-bind verbosity (quiet,verbose).
2683
2684 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2685
2686 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2687 IDs or masks for this node, CPU_ID = Board_ID x
2688 threads_per_board + Socket_ID x
2689 threads_per_socket + Core_ID x threads_per_core +
2690 Thread_ID).
2691
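       As a worked illustration of this formula (the hardware counts
       below are hypothetical): on a node with one board, two sockets,
       eight cores per socket and two threads per core,
       threads_per_board = 32, threads_per_socket = 16 and
       threads_per_core = 2, so socket 1, core 3 of that socket,
       thread 0 maps to CPU_ID = 0 x 32 + 1 x 16 + 3 x 2 + 0 = 22.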
2692
2693 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2694 the srun command as a numerical frequency in
2695 kilohertz, or a coded value for a request of low,
2696 medium, highm1 or high for the frequency. See the
2697 description of the --cpu-freq option or the
2698 SLURM_CPU_FREQ_REQ input environment variable.
2699
2700 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2701 node. Note the select/linear plugin allocates
2702 entire nodes to jobs, so the value indicates the
2703 total count of CPUs on the node. For the
2704 select/cons_res plugin, this number indicates the
2705 number of cores on this node allocated to the
2706 job.
2707
2708 SLURM_CPUS_PER_GPU Number of CPUs requested per allocated GPU. Only
2709 set if the --cpus-per-gpu option is specified.
2710
2711 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2712 the --cpus-per-task option is specified.
2713
2714 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2715 distribution with -m, --distribution.
2716
2717 SLURM_GPUS Number of GPUs requested. Only set if the -G,
2718 --gpus option is specified.
2719
2720 SLURM_GPU_BIND Requested binding of tasks to GPU. Only set if
2721 the --gpu-bind option is specified.
2722
2723 SLURM_GPU_FREQ Requested GPU frequency. Only set if the
2724 --gpu-freq option is specified.
2725
2726 SLURM_GPUS_PER_NODE Requested GPU count per allocated node. Only set
2727 if the --gpus-per-node option is specified.
2728
2729 SLURM_GPUS_PER_SOCKET Requested GPU count per allocated socket. Only
2730 set if the --gpus-per-socket option is specified.
2731
2732 SLURM_GPUS_PER_TASK Requested GPU count per allocated task. Only set
2733 if the --gpus-per-task option is specified.
2734
2735 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2736 gin and comma separated.
2737
2738 SLURM_JOB_ACCOUNT Account name associated with the job allocation.
2739
2740 SLURM_JOB_CPUS_PER_NODE
2741 Number of CPUs per node.
2742
2743 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2744
2745 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2746 Job id of the executing job.
2747
2748
2749 SLURM_JOB_NAME Set to the value of the --job-name option or the
2750 command name when srun is used to create a new
2751 job allocation. Not set when srun is used only to
2752 create a job step (i.e. within an existing job
2753 allocation).
2754
2755
2756 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2757 ning.
2758
2759
2760 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2761
2762 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2763 tion, if any.
2764
2765
2766 SLURM_LAUNCH_NODE_IPADDR
2767 IP address of the node from which the task launch
2768 was initiated (where the srun command ran from).
2769
2770 SLURM_LOCALID Node local task ID for the process within a job.
2771
2772
2773 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2774 masks for this node>).
2775
2776 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2777
2778 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2779 nodes).
2780
2781 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2782
2783 SLURM_MEM_BIND_VERBOSE
2784 --mem-bind verbosity (quiet,verbose).
2785
2786 SLURM_MEM_PER_GPU Requested memory per allocated GPU. Only set if
2787 the --mem-per-gpu option is specified.
2788
2789 SLURM_JOB_NUM_NODES Total number of nodes in the job's resource allo‐
2790 cation.
2791
2792 SLURM_NODE_ALIASES Sets of node name, communication address and
2793 hostname for nodes allocated to the job from the
2794 cloud. Each element in the set is colon separated
2795 and each set is comma separated. For example:
2796 SLURM_NODE_ALIASES=
2797 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2798
2799 SLURM_NODEID The relative node ID of the current node.
2800
2801 SLURM_JOB_NODELIST List of nodes allocated to the job.
2802
2803 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2804 Total number of processes in the current job or
2805 job step.
2806
2807 SLURM_PACK_SIZE Set to count of components in heterogeneous job.
2808
2809 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2810 of job submission. This value is propagated to
2811 the spawned processes.
2812
2813 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2814 rent process.
2815
2816 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2817
2818 SLURM_SRUN_COMM_PORT srun communication port.
2819
2820 SLURM_STEP_LAUNCHER_PORT
2821 Step launcher port.
2822
2823 SLURM_STEP_NODELIST List of nodes allocated to the step.
2824
2825 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2826
2827 SLURM_STEP_NUM_TASKS Number of processes in the step.
2828
2829 SLURM_STEP_TASKS_PER_NODE
2830 Number of processes per node within the step.
2831
2832 SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2833 The step ID of the current job.
2834
2835 SLURM_SUBMIT_DIR The directory from which srun was invoked or, if
2836 applicable, the directory specified by the -D,
2837 --chdir option.
2838
2839 SLURM_SUBMIT_HOST The hostname of the computer from which salloc
2840 was invoked.
2841
2842 SLURM_TASK_PID The process ID of the task being started.
2843
2844 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2845 Values are comma separated and in the same order
2846 as SLURM_JOB_NODELIST. If two or more consecu‐
2847 tive nodes are to have the same task count, that
2848 count is followed by "(x#)" where "#" is the rep‐
2849 etition count. For example,
2850 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2851 first three nodes will each execute two tasks
2852 and the fourth node will execute one task.
2853
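       The compressed form can be expanded into one count per allocated
       node with standard shell tools. The following is only a sketch
       (the awk command is not part of Slurm):

          > echo "$SLURM_TASKS_PER_NODE" | tr ',' '\n' | \
               awk -F'[(x)]' '{n = ($3 == "") ? 1 : $3; for (i = 0; i < n; i++) print $1}'

       For the value "2(x3),1" above this prints 2, 2, 2 and 1 on sepa‐
       rate lines, in the same order as SLURM_JOB_NODELIST.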
2854
2855 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2856 ogy/tree plugin configured. The value will be
2857 set to the names of the network switches which may be
2858 involved in the job's communications from the
2859 system's top level switch down to the leaf switch
2860 and ending with node name. A period is used to
2861 separate each hardware component name.
2862
2863 SLURM_TOPOLOGY_ADDR_PATTERN
2864 This is set only if the system has the topol‐
2865 ogy/tree plugin configured. The value will be
2866 set to the component types listed in SLURM_TOPOL‐
2867 OGY_ADDR. Each component will be identified as
2868 either "switch" or "node". A period is used to
2869 separate each hardware component type.
2870
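       As a hypothetical illustration (the switch and node names are
       invented), a node reached through a top level switch "s0" and a
       leaf switch "s3" could see:

          SLURM_TOPOLOGY_ADDR=s0.s3.node12
          SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node
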
2871 SLURM_UMASK The umask in effect when the job was submitted.
2872
2873 SLURMD_NODENAME Name of the node running the task. In the case of
2874 a parallel job executing on multiple compute
2875 nodes, the various tasks will have this environ‐
2876 ment variable set to different values on each
2877 compute node.
2878
2879 SRUN_DEBUG Set to the logging level of the srun command.
2880 Default value is 3 (info level). The value is
2881 incremented or decremented based upon the --ver‐
2882 bose and --quiet options.
2883
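       For example, with the default level of 3, tasks launched with
       "srun -vv ..." are expected to see SRUN_DEBUG=5, while tasks
       launched with "srun --quiet ..." are expected to see SRUN_DEBUG=2.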
2884
2885SIGNALS AND ESCAPE SEQUENCES
2886 Signals sent to the srun command are automatically forwarded to the
2887 tasks it is controlling with a few exceptions. The escape sequence
2888 <control-c> will report the state of all tasks associated with the srun
2889 command. If <control-c> is entered twice within one second, then the
2890 associated SIGINT signal will be sent to all tasks and a termination
2891 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2892 spawned tasks. If a third <control-c> is received, the srun program
2893 will be terminated without waiting for remote tasks to exit or their
2894 I/O to complete.
2895
2896 The escape sequence <control-z> is presently ignored. Our intent is for
2897 this to put the srun command into a mode where various special actions may
2898 be invoked.
2899
2900
2901MPI SUPPORT
2902 MPI use depends upon the type of MPI being used. There are three fun‐
2903 damentally different modes of operation used by these various MPI
2904 implementations.
2905
2906 1. Slurm directly launches the tasks and performs initialization of
2907 communications through the PMI2 or PMIx APIs. For example: "srun -n16
2908 a.out".
2909
2910 2. Slurm creates a resource allocation for the job and then mpirun
2911 launches tasks using Slurm's infrastructure (OpenMPI).
2912
2913 3. Slurm creates a resource allocation for the job and then mpirun
2914 launches tasks using some mechanism other than Slurm, such as SSH or
2915 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
2916 trol. Slurm's epilog should be configured to purge these tasks when the
2917 job's allocation is relinquished; the use of pam_slurm_adopt is also
2918 highly recommended.
2919
2920 See https://slurm.schedmd.com/mpi_guide.html for more information on
2921 use of these various MPI implementations with Slurm.
2922
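       As a sketch of the second mode above (this assumes an Open MPI
       installation built with Slurm support; "a.out" is only a placeholder
       application):

          > salloc -N2 -n16 mpirun a.out

       Here salloc creates the resource allocation and mpirun is expected to
       detect that allocation from the environment and launch the 16 tasks
       on the allocated nodes using Slurm's infrastructure.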
2923
2924MULTIPLE PROGRAM CONFIGURATION
2925 Comments in the configuration file must have a "#" in column one. The
2926 configuration file contains the following fields separated by white
2927 space:
2928
2929 Task rank
2930 One or more task ranks to use this configuration. Multiple val‐
2931 ues may be comma separated. Ranges may be indicated with two
2932 numbers separated with a '-' with the smaller number first (e.g.
2933 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
2934 ified, specify a rank of '*' as the last line of the file. If
2935 an attempt is made to initiate a task for which no executable
2936 program is defined, the following error message will be produced
2937 "No executable program specified for this task".
2938
2939 Executable
2940 The name of the program to execute. May be fully qualified
2941 pathname if desired.
2942
2943 Arguments
2944 Program arguments. The expression "%t" will be replaced with
2945 the task's number. The expression "%o" will be replaced with
2946 the task's offset within this range (e.g. a configured task rank
2947 value of "1-5" would have offset values of "0-4"). Single
2948 quotes may be used to avoid having the enclosed values inter‐
2949 preted. This field is optional. Any arguments for the program
2950 entered on the command line will be added to the arguments spec‐
2951 ified in the configuration file.
2952
2953 For example:
2954 ###################################################################
2955 # srun multiple program configuration file
2956 #
2957 # srun -n8 -l --multi-prog silly.conf
2958 ###################################################################
2959 4-6 hostname
2960 1,7 echo task:%t
2961 0,2-3 echo offset:%o
2962
2963 > srun -n8 -l --multi-prog silly.conf
2964 0: offset:0
2965 1: task:1
2966 2: offset:1
2967 3: offset:2
2968 4: linux15.llnl.gov
2969 5: linux16.llnl.gov
2970 6: linux17.llnl.gov
2971 7: task:7
2972
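       A rank of '*' may be used as a catch-all for any task not otherwise
       listed. For example, the following sketch (the echoed strings are
       arbitrary) runs one program on task 0 and another on every remaining
       task:

          0   echo leader
          *   echo worker:%t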
2973
2974
2975
2976EXAMPLES
2977 This simple example demonstrates the execution of the command hostname
2978 in eight tasks. At least eight processors will be allocated to the job
2979 (the same as the task count) on however many nodes are required to sat‐
2980 isfy the request. The output of each task will be preceded by its
2981 task number. (The machine "dev" in the example below has a total of
2982 two CPUs per node)
2983
2984
2985 > srun -n8 -l hostname
2986 0: dev0
2987 1: dev0
2988 2: dev1
2989 3: dev1
2990 4: dev2
2991 5: dev2
2992 6: dev3
2993 7: dev3
2994
2995
2996 The srun -r option is used within a job script to run two job steps on
2997 disjoint nodes in the following example. The script is run using allo‐
2998 cate mode instead of as a batch job in this case.
2999
3000
3001 > cat test.sh
3002 #!/bin/sh
3003 echo $SLURM_JOB_NODELIST
3004 srun -lN2 -r2 hostname
3005 srun -lN2 hostname
3006
3007 > salloc -N4 test.sh
3008 dev[7-10]
3009 0: dev9
3010 1: dev10
3011 0: dev7
3012 1: dev8
3013
3014
3015 The following script runs two job steps in parallel within an allocated
3016 set of nodes.
3017
3018
3019 > cat test.sh
3020 #!/bin/bash
3021 srun -lN2 -n4 -r 2 sleep 60 &
3022 srun -lN2 -r 0 sleep 60 &
3023 sleep 1
3024 squeue
3025 squeue -s
3026 wait
3027
3028 > salloc -N4 test.sh
3029 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3030 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3031
3032 STEPID PARTITION USER TIME NODELIST
3033 65641.0 batch grondo 0:01 dev[7-8]
3034 65641.1 batch grondo 0:01 dev[9-10]
3035
3036
3037 This example demonstrates how one executes a simple MPI job. We use
3038 srun to build a list of machines (nodes) to be used by mpirun in its
3039 required format. A sample command line and the script to be executed
3040 follow.
3041
3042
3043 > cat test.sh
3044 #!/bin/sh
3045 MACHINEFILE="nodes.$SLURM_JOB_ID"
3046
3047 # Generate Machinefile for mpi such that hosts are in the same
3048 # order as if run via srun
3049 #
3050 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3051
3052 # Run using generated Machine file:
3053 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3054
3055 rm $MACHINEFILE
3056
3057 > salloc -N2 -n4 test.sh
3058
3059
3060 This simple example demonstrates the execution of different jobs on
3061 different nodes in the same srun. You can do this for any number of
3062 nodes or any number of jobs. The executables are placed on the nodes
3063 indicated by the SLURM_NODEID env var, starting at 0 and going up to
3064 the number of nodes specified on the srun command line.
3065
3066
3067 > cat test.sh
3068 case $SLURM_NODEID in
3069 0) echo "I am running on "
3070 hostname ;;
3071 1) hostname
3072 echo "is where I am running" ;;
3073 esac
3074
3075 > srun -N2 test.sh
3076 dev0
3077 is where I am running
3078 I am running on
3079 dev1
3080
3081
3082 This example demonstrates use of multi-core options to control layout
3083 of tasks. We request that four sockets per node and two cores per
3084 socket be dedicated to the job.
3085
3086
3087 > srun -N2 -B 4-4:2-2 a.out
3088
3089 This example shows a script in which Slurm is used to provide resource
3090 management for a job by executing the various job steps as processors
3091 become available for their dedicated use.
3092
3093
3094 > cat my.script
3095 #!/bin/bash
3096 srun --exclusive -n4 prog1 &
3097 srun --exclusive -n3 prog2 &
3098 srun --exclusive -n1 prog3 &
3099 srun --exclusive -n1 prog4 &
3100 wait
3101
3102
3103 This example shows how to launch an application called "master" with
3104 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another
3105 application called "slave" with 16 tasks, 1 CPU per task (the default)
3106 and 1 GB of memory per task.
3107
3108
3109 > srun -n1 -c8 --mem-per-cpu=2gb master : -n16 --mem-per-cpu=1gb slave
3110
3111
3112COPYING
3113 Copyright (C) 2006-2007 The Regents of the University of California.
3114 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3115 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3116 Copyright (C) 2010-2015 SchedMD LLC.
3117
3118 This file is part of Slurm, a resource management program. For
3119 details, see <https://slurm.schedmd.com/>.
3120
3121 Slurm is free software; you can redistribute it and/or modify it under
3122 the terms of the GNU General Public License as published by the Free
3123 Software Foundation; either version 2 of the License, or (at your
3124 option) any later version.
3125
3126 Slurm is distributed in the hope that it will be useful, but WITHOUT
3127 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3128 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3129 for more details.
3130
3131
3132SEE ALSO
3133 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3134 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3135
3136
3137
3138December 2019 Slurm Commands srun(1)