srun(1)                         Slurm Commands                         srun(1)


NAME
srun - Run parallel jobs


SYNOPSIS
srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
executable(N) [args(N)...]

Option(s) define multiple jobs in a co-scheduled heterogeneous job.
For more details about heterogeneous jobs see the document
https://slurm.schedmd.com/heterogeneous_jobs.html


DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
will first create a resource allocation in which to run the parallel
job.

The following document describes the influence of various options on
the allocation of cpus to jobs and tasks:
https://slurm.schedmd.com/cpu_management.html


RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
signal) of any task that exited with a signal.
The value 253 is reserved for out-of-memory errors.
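
For example (an illustrative sketch; ./my_app is a placeholder for your own executable): if one task of

    srun -n 4 ./my_app

is killed by SIGSEGV (signal 11) and no task returns a higher value, srun exits with 139 (128 + 11), which a calling script can test via $?.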


EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:

1. If executable starts with ".", then path is constructed as: current
   working directory / executable
2. If executable starts with a "/", then path is considered absolute.
3. If executable can be resolved through PATH. See path_resolution(7).
4. If executable is in current working directory.

The current working directory is the calling process's working directory,
unless the --chdir argument is passed, which overrides it.
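
For example (hypothetical names, shown only to illustrate the resolution order):

    srun -N1 ./my_app          # relative to the current working directory
    srun -N1 /opt/apps/my_app  # absolute path, used as given
    srun -N1 my_app            # searched for in PATH, then in the current directory

The name my_app and the directory /opt/apps are placeholders, not real installations.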


OPTIONS
50 --accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu,
52 mic and nic. Multiple options may be specified. Supported op‐
53 tions include:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 m Bind each task to MICs which are closest to the allocated
59 CPUs.
60
61 n Bind each task to NICs which are closest to the allocated
62 CPUs.
63
64 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
65 vices.
66
67 This option applies to job allocations.
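
For example, assuming the single-letter flags are simply concatenated and my_app is a placeholder, the following binds each task to its closest GPU (g) and NIC (n) and logs the binding (v):

    srun --accel-bind=gnv -n 8 ./my_app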
68
69
70 -A, --account=<account>
71 Charge resources used by this job to specified account. The ac‐
72 count is an arbitrary string. The account name may be changed
73 after job submission using the scontrol command. This option ap‐
74 plies to job allocations.
75
76
77 --acctg-freq
78 Define the job accounting and profiling sampling intervals.
79 This can be used to override the JobAcctGatherFrequency parame‐
80 ter in Slurm's configuration file, slurm.conf. The supported
format is as follows:
82
83 --acctg-freq=<datatype>=<interval>
84 where <datatype>=<interval> specifies the task sam‐
85 pling interval for the jobacct_gather plugin or a
86 sampling interval for a profiling type by the
87 acct_gather_profile plugin. Multiple, comma-sepa‐
88 rated <datatype>=<interval> intervals may be speci‐
89 fied. Supported datatypes are as follows:
90
91 task=<interval>
92 where <interval> is the task sampling inter‐
93 val in seconds for the jobacct_gather plugins
94 and for task profiling by the
acct_gather_profile plugin. NOTE: This
frequency is used to monitor memory usage.
If memory limits are enforced, the highest
frequency a user can request is the one
configured in the slurm.conf file; sampling
cannot be turned off (=0) in that case.
101
102 energy=<interval>
103 where <interval> is the sampling interval in
104 seconds for energy profiling using the
105 acct_gather_energy plugin
106
107 network=<interval>
108 where <interval> is the sampling interval in
109 seconds for infiniband profiling using the
110 acct_gather_interconnect plugin.
111
112 filesystem=<interval>
113 where <interval> is the sampling interval in
114 seconds for filesystem profiling using the
115 acct_gather_filesystem plugin.
116
The default value for the task sampling interval is 30 seconds.
The default value for all other intervals is 0. An interval of 0
disables sampling of the specified type. If the task sampling
interval is 0, accounting information is collected only at job
termination (reducing Slurm interference with the job).
Smaller (non-zero) values have a greater impact upon job
performance, but a value of 30 seconds is not likely to be
noticeable for applications having less than 10,000 tasks. This
option applies to job allocations.
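
For example, to sample task (and memory) usage every 10 seconds and energy use every 60 seconds (the executable name is a placeholder):

    srun --acctg-freq=task=10,energy=60 -n 16 ./my_app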
128
129
-B, --extra-node-info=<sockets[:cores[:threads]]>
131 Restrict node selection to nodes with at least the specified
132 number of sockets, cores per socket and/or threads per core.
133 NOTE: These options do not specify the resource allocation size.
134 Each value specified is considered a minimum. An asterisk (*)
135 can be used as a placeholder indicating that all available re‐
136 sources of that type are to be utilized. Values can also be
137 specified as min-max. The individual levels can also be speci‐
138 fied in separate options if desired:
139 --sockets-per-node=<sockets>
140 --cores-per-socket=<cores>
141 --threads-per-core=<threads>
142 If task/affinity plugin is enabled, then specifying an alloca‐
143 tion in this manner also sets a default --cpu-bind option of
144 threads if the -B option specifies a thread count, otherwise an
145 option of cores if a core count is specified, otherwise an op‐
146 tion of sockets. If SelectType is configured to se‐
147 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
148 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
149 ored. If not specified, the scontrol show job will display
150 'ReqS:C:T=*:*:*'. This option applies to job allocations. NOTE:
151 This option is mutually exclusive with --hint,
152 --threads-per-core and --ntasks-per-core.
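
For example, to restrict selection to nodes with at least 2 sockets, 8 cores per socket and 2 threads per core (values chosen purely for illustration):

    srun -B 2:8:2 -n 4 ./my_app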
153
154
155 --bb=<spec>
156 Burst buffer specification. The form of the specification is
157 system dependent. Also see --bbf. This option applies to job
158 allocations.
159
160
161 --bbf=<file_name>
162 Path of file containing burst buffer specification. The form of
163 the specification is system dependent. Also see --bb. This op‐
164 tion applies to job allocations.
165
166
167 --bcast[=<dest_path>]
168 Copy executable file to allocated compute nodes. If a file name
169 is specified, copy the executable to the specified destination
170 file path. If the path specified ends with '/' it is treated as
171 a target directory, and the destination file name will be
172 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
173 specified, then the current working directory is used, and the
174 filename follows the above pattern. For example, "srun
175 --bcast=/tmp/mine -N3 a.out" will copy the file "a.out" from
176 your current directory to the file "/tmp/mine" on each of the
177 three allocated compute nodes and execute that file. This option
178 applies to step allocations.
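
As a further illustration, a destination ending in '/' is treated as a directory; here /tmp/bcast/ is an assumed, already-existing directory on the compute nodes:

    srun --bcast=/tmp/bcast/ -N3 a.out

copies a.out to /tmp/bcast/slurm_bcast_<job_id>.<step_id>_<nodename> on each of the three allocated nodes and executes it.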
179
180
181 -b, --begin=<time>
182 Defer initiation of this job until the specified time. It ac‐
183 cepts times of the form HH:MM:SS to run a job at a specific time
184 of day (seconds are optional). (If that time is already past,
185 the next day is assumed.) You may also specify midnight, noon,
186 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
187 suffixed with AM or PM for running in the morning or the
188 evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
190 Combine date and time using the following format
191 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
192 count time-units, where the time-units can be seconds (default),
193 minutes, hours, days, or weeks and you can tell Slurm to run the
194 job today with the keyword today and to run the job tomorrow
195 with the keyword tomorrow. The value may be changed after job
196 submission using the scontrol command. For example:
197 --begin=16:00
198 --begin=now+1hour
199 --begin=now+60 (seconds by default)
200 --begin=2010-01-20T12:34:00
201
202
203 Notes on date/time specifications:
204 - Although the 'seconds' field of the HH:MM:SS time specifica‐
205 tion is allowed by the code, note that the poll time of the
206 Slurm scheduler is not precise enough to guarantee dispatch of
207 the job on the exact second. The job will be eligible to start
208 on the next poll following the specified time. The exact poll
209 interval depends on the Slurm scheduler (e.g., 60 seconds with
210 the default sched/builtin).
211 - If no time (HH:MM:SS) is specified, the default is
212 (00:00:00).
213 - If a date is specified without a year (e.g., MM/DD) then the
214 current year is assumed, unless the combination of MM/DD and
215 HH:MM:SS has already passed for that year, in which case the
216 next year is used.
217 This option applies to job allocations.
218
219
220 --cluster-constraint=<list>
221 Specifies features that a federated cluster must have to have a
222 sibling job submitted to it. Slurm will attempt to submit a sib‐
223 ling job to a cluster if it has at least one of the specified
224 features.
225
226
227 --comment=<string>
228 An arbitrary comment. This option applies to job allocations.
229
230
231 --compress[=type]
232 Compress file before sending it to compute hosts. The optional
233 argument specifies the data compression library to be used.
234 Supported values are "lz4" (default) and "zlib". Some compres‐
235 sion libraries may be unavailable on some systems. For use with
236 the --bcast option. This option applies to step allocations.
237
238
239 -C, --constraint=<list>
240 Nodes can have features assigned to them by the Slurm adminis‐
241 trator. Users can specify which of these features are required
242 by their job using the constraint option. Only nodes having
243 features matching the job constraints will be used to satisfy
244 the request. Multiple constraints may be specified with AND,
245 OR, matching OR, resource counts, etc. (some operators are not
246 supported on all system types). Supported constraint options
247 include:
248
249 Single Name
250 Only nodes which have the specified feature will be used.
251 For example, --constraint="intel"
252
253 Node Count
254 A request can specify the number of nodes needed with
255 some feature by appending an asterisk and count after the
256 feature name. For example, --nodes=16 --con‐
257 straint="graphics*4 ..." indicates that the job requires
258 16 nodes and that at least four of those nodes must have
259 the feature "graphics."
260
AND    Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"
264
OR     Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR operator.
For example, --constraint="intel|amd"
268
269 Matching OR
270 If only one of a set of possible options should be used
271 for all allocated nodes, then use the OR operator and en‐
272 close the options within square brackets. For example,
273 --constraint="[rack1|rack2|rack3|rack4]" might be used to
274 specify that all nodes must be allocated on a single rack
275 of the cluster, but any of those four racks can be used.
276
277 Multiple Counts
278 Specific counts of multiple resources may be specified by
279 using the AND operator and enclosing the options within
280 square brackets. For example, --con‐
281 straint="[rack1*2&rack2*4]" might be used to specify that
282 two nodes must be allocated from nodes with the feature
283 of "rack1" and four nodes must be allocated from nodes
284 with the feature "rack2".
285
286 NOTE: This construct does not support multiple Intel KNL
287 NUMA or MCDRAM modes. For example, while --con‐
288 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
289 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
290 Specification of multiple KNL modes requires the use of a
291 heterogeneous job.
292
293 Brackets
294 Brackets can be used to indicate that you are looking for
295 a set of nodes with the different requirements contained
296 within the brackets. For example, --con‐
297 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
298 node with either the "rack1" or "rack2" features and two
299 nodes with the "rack3" feature. The same request without
300 the brackets will try to find a single node that meets
301 those requirements.
302
Parentheses
Parentheses can be used to group like node features
together. For example,
--constraint="[(knl&snc4&flat)*4&haswell*1]" might be used
to specify that four nodes with the features "knl", "snc4"
and "flat" plus one node with the feature "haswell" are
required. All options within parentheses should be
grouped with AND (e.g. "&") operators.
311
312 WARNING: When srun is executed from within salloc or sbatch, the
313 constraint value can only contain a single feature name. None of
314 the other operators are currently supported for job steps.
315 This option applies to job and step allocations.
316
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326
327 --cores-per-socket=<cores>
328 Restrict node selection to nodes with at least the specified
329 number of cores per socket. See additional information under -B
330 option above when task/affinity plugin is enabled. This option
331 applies to job allocations.
332
333
334 --cpu-bind=[{quiet,verbose},]type
335 Bind tasks to CPUs. Used only when the task/affinity or
336 task/cgroup plugin is enabled. NOTE: To have Slurm always re‐
337 port on the selected CPU binding for all commands executed in a
338 shell, you can enable verbose mode by setting the SLURM_CPU_BIND
339 environment variable value to "verbose".
340
341 The following informational environment variables are set when
342 --cpu-bind is in use:
343 SLURM_CPU_BIND_VERBOSE
344 SLURM_CPU_BIND_TYPE
345 SLURM_CPU_BIND_LIST
346
347 See the ENVIRONMENT VARIABLES section for a more detailed de‐
348 scription of the individual SLURM_CPU_BIND variables. These
variables are available only if the task/affinity plugin is con‐
350 figured.
351
352 When using --cpus-per-task to run multithreaded tasks, be aware
353 that CPU binding is inherited from the parent of the process.
354 This means that the multithreaded task should either specify or
355 clear the CPU binding itself to avoid having all threads of the
356 multithreaded task use the same mask/CPU as the parent. Alter‐
357 natively, fat masks (masks which specify more than one allowed
358 CPU) could be used for the tasks in order to provide multiple
359 CPUs for the multithreaded tasks.
360
361 Note that a job step can be allocated different numbers of CPUs
362 on each node or be allocated CPUs not starting at location zero.
363 Therefore one of the options which automatically generate the
364 task binding is recommended. Explicitly specified masks or
365 bindings are only honored when the job step has been allocated
366 every available CPU on the node.
367
368 Binding a task to a NUMA locality domain means to bind the task
369 to the set of CPUs that belong to the NUMA locality domain or
370 "NUMA node". If NUMA locality domain options are used on sys‐
371 tems with no NUMA support, then each socket is considered a lo‐
372 cality domain.
373
374 If the --cpu-bind option is not used, the default binding mode
375 will depend upon Slurm's configuration and the step's resource
376 allocation. If all allocated nodes have the same configured
377 CpuBind mode, that will be used. Otherwise if the job's Parti‐
378 tion has a configured CpuBind mode, that will be used. Other‐
379 wise if Slurm has a configured TaskPluginParam value, that mode
380 will be used. Otherwise automatic binding will be performed as
381 described below.
382
383
384 Auto Binding
385 Applies only when task/affinity is enabled. If the job
386 step allocation includes an allocation with a number of
387 sockets, cores, or threads equal to the number of tasks
388 times cpus-per-task, then the tasks will by default be
389 bound to the appropriate resources (auto binding). Dis‐
390 able this mode of operation by explicitly setting
391 "--cpu-bind=none". Use TaskPluginParam=auto‐
392 bind=[threads|cores|sockets] to set a default cpu binding
393 in case "auto binding" doesn't find a match.
394
395 Supported options include:
396
397 q[uiet]
398 Quietly bind before task runs (default)
399
400 v[erbose]
401 Verbosely report binding before task runs
402
403 no[ne] Do not bind tasks to CPUs (default unless auto
404 binding is applied)
405
406 rank Automatically bind by task rank. The lowest num‐
407 bered task on each node is bound to socket (or
408 core or thread) zero, etc. Not supported unless
409 the entire node is allocated to the job.
410
411 map_cpu:<list>
412 Bind by setting CPU masks on tasks (or ranks) as
413 specified where <list> is
414 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
415 IDs are interpreted as decimal values unless they
are preceded with '0x' in which case they are
interpreted as hexadecimal values. If the number of
418 tasks (or ranks) exceeds the number of elements in
419 this list, elements in the list will be reused as
420 needed starting from the beginning of the list.
421 To simplify support for large task counts, the
422 lists may follow a map with an asterisk and repe‐
423 tition count. For example
424 "map_cpu:0x0f*4,0xf0*4". Not supported unless the
425 entire node is allocated to the job.
426
427 mask_cpu:<list>
428 Bind by setting CPU masks on tasks (or ranks) as
429 specified where <list> is
430 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
431 The mapping is specified for a node and identical
432 mapping is applied to the tasks on every node
433 (i.e. the lowest task ID on each node is mapped to
434 the first mask specified in the list, etc.). CPU
435 masks are always interpreted as hexadecimal values
436 but can be preceded with an optional '0x'. If the
437 number of tasks (or ranks) exceeds the number of
438 elements in this list, elements in the list will
439 be reused as needed starting from the beginning of
440 the list. To simplify support for large task
441 counts, the lists may follow a map with an aster‐
442 isk and repetition count. For example
443 "mask_cpu:0x0f*4,0xf0*4". Not supported unless
444 the entire node is allocated to the job.
445
446 rank_ldom
447 Bind to a NUMA locality domain by rank. Not sup‐
448 ported unless the entire node is allocated to the
449 job.
450
451 map_ldom:<list>
452 Bind by mapping NUMA locality domain IDs to tasks
453 as specified where <list> is
454 <ldom1>,<ldom2>,...<ldomN>. The locality domain
455 IDs are interpreted as decimal values unless they
456 are preceded with '0x' in which case they are in‐
457 terpreted as hexadecimal values. Not supported
458 unless the entire node is allocated to the job.
459
460 mask_ldom:<list>
461 Bind by setting NUMA locality domain masks on
462 tasks as specified where <list> is
463 <mask1>,<mask2>,...<maskN>. NUMA locality domain
464 masks are always interpreted as hexadecimal values
465 but can be preceded with an optional '0x'. Not
466 supported unless the entire node is allocated to
467 the job.
468
469 sockets
470 Automatically generate masks binding tasks to
471 sockets. Only the CPUs on the socket which have
472 been allocated to the job will be used. If the
473 number of tasks differs from the number of allo‐
474 cated sockets this can result in sub-optimal bind‐
475 ing.
476
477 cores Automatically generate masks binding tasks to
478 cores. If the number of tasks differs from the
479 number of allocated cores this can result in
480 sub-optimal binding.
481
482 threads
483 Automatically generate masks binding tasks to
484 threads. If the number of tasks differs from the
485 number of allocated threads this can result in
486 sub-optimal binding.
487
488 ldoms Automatically generate masks binding tasks to NUMA
489 locality domains. If the number of tasks differs
490 from the number of allocated locality domains this
491 can result in sub-optimal binding.
492
493 boards Automatically generate masks binding tasks to
494 boards. If the number of tasks differs from the
495 number of allocated boards this can result in
496 sub-optimal binding. This option is supported by
497 the task/cgroup plugin only.
498
499 help Show help message for cpu-bind
500
501 This option applies to job and step allocations.
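
For example, to bind one task per core and report the resulting binding (the executable name is a placeholder):

    srun --cpu-bind=verbose,cores -n 8 ./my_app

An explicit map is also possible; the CPU IDs below are purely illustrative and, as noted above, require the entire node to be allocated to the job:

    srun --cpu-bind=map_cpu:0,2,4,6 -n 4 ./my_app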
502
503
--cpu-freq=<p1[-p2[:p3]]>
505
506 Request that the job step initiated by this srun command be run
507 at some requested frequency if possible, on the CPUs selected
508 for the step on the compute node(s).
509
510 p1 can be [#### | low | medium | high | highm1] which will set
511 the frequency scaling_speed to the corresponding value, and set
512 the frequency scaling_governor to UserSpace. See below for defi‐
513 nition of the values.
514
515 p1 can be [Conservative | OnDemand | Performance | PowerSave]
516 which will set the scaling_governor to the corresponding value.
517 The governor has to be in the list set by the slurm.conf option
518 CpuFreqGovernors.
519
520 When p2 is present, p1 will be the minimum scaling frequency and
521 p2 will be the maximum scaling frequency.
522
p2 can be [#### | medium | high | highm1]. p2 must be greater
than p1.
525
526 p3 can be [Conservative | OnDemand | Performance | PowerSave |
527 UserSpace] which will set the governor to the corresponding
528 value.
529
530 If p3 is UserSpace, the frequency scaling_speed will be set by a
531 power or energy aware scheduling strategy to a value between p1
532 and p2 that lets the job run within the site's power goal. The
533 job may be delayed if p1 is higher than a frequency that allows
534 the job to run within the goal.
535
536 If the current frequency is < min, it will be set to min. Like‐
537 wise, if the current frequency is > max, it will be set to max.
538
539 Acceptable values at present include:
540
541 #### frequency in kilohertz
542
543 Low the lowest available frequency
544
545 High the highest available frequency
546
547 HighM1 (high minus one) will select the next highest
548 available frequency
549
550 Medium attempts to set a frequency in the middle of the
551 available range
552
553 Conservative attempts to use the Conservative CPU governor
554
555 OnDemand attempts to use the OnDemand CPU governor (the de‐
556 fault value)
557
558 Performance attempts to use the Performance CPU governor
559
560 PowerSave attempts to use the PowerSave CPU governor
561
562 UserSpace attempts to use the UserSpace CPU governor
563
564
The following informational environment variable is set in the
job step when the --cpu-freq option is requested.
      SLURM_CPU_FREQ_REQ
569
570 This environment variable can also be used to supply the value
571 for the CPU frequency request if it is set when the 'srun' com‐
572 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
574 ment variable is the same as the command line. See the ENVIRON‐
575 MENT VARIABLES section for a description of the
576 SLURM_CPU_FREQ_REQ variable.
577
578 NOTE: This parameter is treated as a request, not a requirement.
579 If the job step's node does not support setting the CPU fre‐
580 quency, or the requested value is outside the bounds of the le‐
581 gal frequencies, an error is logged, but the job step is allowed
582 to continue.
583
584 NOTE: Setting the frequency for just the CPUs of the job step
585 implies that the tasks are confined to those CPUs. If task con‐
586 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
587 gin=task/cgroup with the "ConstrainCores" option) is not config‐
588 ured, this parameter is ignored.
589
590 NOTE: When the step completes, the frequency and governor of
591 each selected CPU is reset to the previous values.
592
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is configured as the ProctrackType can cause jobs to run too
quickly, before accounting is able to poll for job information.
As a result, not all of the accounting information will be
present.
597
598 This option applies to job and step allocations.
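
For example, to ask that the step run with its CPU frequency kept between 2.0 GHz and 2.4 GHz under the OnDemand governor (frequencies in kilohertz; the values and executable are illustrative):

    srun --cpu-freq=2000000-2400000:OnDemand -n 8 ./my_app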
599
600
601 --cpus-per-gpu=<ncpus>
602 Advise Slurm that ensuing job steps will require ncpus proces‐
603 sors per allocated GPU. Not compatible with the --cpus-per-task
604 option.
605
606
607 -c, --cpus-per-task=<ncpus>
608 Request that ncpus be allocated per process. This may be useful
609 if the job is multithreaded and requires more than one CPU per
610 task for optimal performance. The default is one CPU per
611 process. If -c is specified without -n, as many tasks will be
612 allocated per node as possible while satisfying the -c restric‐
613 tion. For instance on a cluster with 8 CPUs per node, a job re‐
614 quest for 4 nodes and 3 CPUs per task may be allocated 3 or 6
615 CPUs per node (1 or 2 tasks per node) depending upon resource
616 consumption by other jobs. Such a job may be unable to execute
617 more than a total of 4 tasks.
618
619 WARNING: There are configurations and options interpreted dif‐
620 ferently by job and job step requests which can result in incon‐
621 sistencies for this option. For example srun -c2
622 --threads-per-core=1 prog may allocate two cores for the job,
623 but if each of those cores contains two threads, the job alloca‐
624 tion will include four CPUs. The job step allocation will then
625 launch two threads per CPU for a total of two tasks.
626
627 WARNING: When srun is executed from within salloc or sbatch,
628 there are configurations and options which can result in incon‐
629 sistent allocations when -c has a value greater than -c on sal‐
630 loc or sbatch.
631
632 This option applies to job allocations.
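
For example, a 4-task job in which each task runs 8 threads (counts chosen only for illustration) would typically be launched as:

    srun -n 4 -c 8 ./my_threaded_app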
633
634
635 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
637 (start > (deadline - time[-min])). Default is no deadline.
638 Valid time formats are:
639 HH:MM[:SS] [AM|PM]
640 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
641 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
643 now[+count[seconds(default)|minutes|hours|days|weeks]]
644
645 This option applies only to job allocations.
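
For example, to run a job with a 30-minute time limit only if it can finish before 6 PM on the given (illustrative) date:

    srun --time=30 --deadline=2010-01-20T18:00:00 -n 1 ./my_app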
646
647
648 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
650 specification if the job has been eligible to run for less than
651 this time period. If the job has waited for less than the spec‐
652 ified period, it will use only nodes which already have the
653 specified features. The argument is in units of minutes. A de‐
654 fault value may be set by a system administrator using the de‐
655 lay_boot option of the SchedulerParameters configuration parame‐
656 ter in the slurm.conf file, otherwise the default value is zero
657 (no delay).
658
659 This option applies only to job allocations.
660
661
662 -d, --dependency=<dependency_list>
663 Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job
steps (executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
667 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
668 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
669 must be satisfied if the "," separator is used. Any dependency
670 may be satisfied if the "?" separator is used. Only one separa‐
671 tor may be used. Many jobs can share the same dependency and
672 these jobs may even belong to different users. The value may
673 be changed after job submission using the scontrol command. De‐
674 pendencies on remote jobs are allowed in a federation. Once a
675 job dependency fails due to the termination state of a preceding
676 job, the dependent job will never be run, even if the preceding
677 job is requeued and has a different termination state in a sub‐
678 sequent execution. This option applies to job allocations.
679
680 after:job_id[[+time][:jobid[+time]...]]
This job can begin execution after the specified jobs
start or are cancelled, and the given 'time' in minutes
from that start or cancellation has elapsed. If no 'time'
is given then there is no delay after start or
cancellation.
685
686 afterany:job_id[:jobid...]
687 This job can begin execution after the specified jobs
688 have terminated.
689
690 afterburstbuffer:job_id[:jobid...]
691 This job can begin execution after the specified jobs
692 have terminated and any associated burst buffer stage out
693 operations have completed.
694
695 aftercorr:job_id[:jobid...]
696 A task of this job array can begin execution after the
697 corresponding task ID in the specified job has completed
698 successfully (ran to completion with an exit code of
699 zero).
700
701 afternotok:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have terminated in some failed state (non-zero exit code,
704 node failure, timed out, etc).
705
706 afterok:job_id[:jobid...]
707 This job can begin execution after the specified jobs
708 have successfully executed (ran to completion with an
709 exit code of zero).
710
711 expand:job_id
712 Resources allocated to this job should be used to expand
713 the specified job. The job to expand must share the same
714 QOS (Quality of Service) and partition. Gang scheduling
715 of resources in the partition is also not supported.
716 "expand" is not allowed for jobs that didn't originate on
717 the same cluster as the submitted job.
718
719 singleton
720 This job can begin execution after any previously
721 launched jobs sharing the same job name and user have
722 terminated. In other words, only one job by that name
723 and owned by that user can be running or suspended at any
724 point in time. In a federation, a singleton dependency
725 must be fulfilled on all clusters unless DependencyParam‐
726 eters=disable_remote_singleton is used in slurm.conf.
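
For example, assuming (hypothetical) job ids 12345 and 12346 and placeholder executables:

    srun --dependency=afterok:12345 -n 1 ./postprocess
    srun --dependency=afterany:12345:12346 -n 1 ./cleanup

The first step starts only if job 12345 completed successfully; the second starts after jobs 12345 and 12346 have terminated for any reason.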
727
728
729 -D, --chdir=<path>
730 Have the remote processes do a chdir to path before beginning
731 execution. The default is to chdir to the current working direc‐
732 tory of the srun process. The path can be specified as full path
733 or relative path to the directory where the command is executed.
734 This option applies to job allocations.
735
736
737 -e, --error=<filename pattern>
738 Specify how stderr is to be redirected. By default in interac‐
739 tive mode, srun redirects stderr to the same file as stdout, if
740 one is specified. The --error option is provided to allow stdout
741 and stderr to be redirected to different locations. See IO Re‐
742 direction below for more options. If the specified file already
743 exists, it will be overwritten. This option applies to job and
744 step allocations.
745
746
747 -E, --preserve-env
748 Pass the current values of environment variables
749 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
750 rather than computing them from commandline parameters. This op‐
751 tion applies to job allocations.
752
753
754 --exact
755 Allow a step access to only the resources requested for the
756 step. By default, all non-GRES resources on each node in the
757 step allocation will be used. Note that no other parallel step
758 will have access to those CPUs unless --overlap is specified.
759 This option applies to step allocations.
760
761
762 --epilog=<executable>
763 srun will run executable just after the job step completes. The
764 command line arguments for executable will be the command and
765 arguments of the job step. If executable is "none", then no
766 srun epilog will be run. This parameter overrides the SrunEpilog
767 parameter in slurm.conf. This parameter is completely indepen‐
768 dent from the Epilog parameter in slurm.conf. This option ap‐
769 plies to job allocations.
770
771
772
773 --exclusive[=user|mcs]
774 This option applies to job and job step allocations, and has two
775 slightly different meanings for each one. When used to initiate
776 a job, the job allocation cannot share nodes with other running
777 jobs (or just other users with the "=user" option or "=mcs" op‐
778 tion). The default shared/exclusive behavior depends on system
779 configuration and the partition's OverSubscribe option takes
780 precedence over the job's option.
781
782 This option can also be used when initiating more than one job
783 step within an existing resource allocation (default), where you
784 want separate processors to be dedicated to each job step. If
785 sufficient processors are not available to initiate the job
786 step, it will be deferred. This can be thought of as providing a
787 mechanism for resource management to the job within its alloca‐
788 tion (--exact implied).
789
790 The exclusive allocation of CPUs applies to job steps by de‐
791 fault. In order to share the resources use the --overlap option.
792
793 See EXAMPLE below.
794
795
796 --export=<[ALL,]environment variables|ALL|NONE>
797 Identify which environment variables from the submission envi‐
798 ronment are propagated to the launched application.
799
800 --export=ALL
801 Default mode if --export is not specified. All of the
user's environment will be loaded from the caller's environ‐
803 ment.
804
805 --export=NONE
None of the user environment will be defined. The user
must use an absolute path to the binary to be executed
that will define the environment. The user cannot
specify explicit environment variables with NONE.
This option is particularly important for jobs that
are submitted on one cluster and execute on a
different cluster (e.g. with different paths). To
avoid steps inheriting environment export settings
(e.g. NONE) from the sbatch command, either set
--export=ALL or set the environment variable
SLURM_EXPORT_ENV to ALL.
817
818 --export=<[ALL,]environment variables>
819 Exports all SLURM* environment variables along with
820 explicitly defined variables. Multiple environment
821 variable names should be comma separated. Environment
822 variable names may be specified to propagate the cur‐
823 rent value (e.g. "--export=EDITOR") or specific values
824 may be exported (e.g. "--export=EDITOR=/bin/emacs").
825 If ALL is specified, then all user environment vari‐
826 ables will be loaded and will take precedence over any
827 explicitly given environment variables.
828
829 Example: --export=EDITOR,ARG1=test
830 In this example, the propagated environment will only
831 contain the variable EDITOR from the user's environ‐
832 ment, SLURM_* environment variables, and ARG1=test.
833
834 Example: --export=ALL,EDITOR=/bin/emacs
835 There are two possible outcomes for this example. If
836 the caller has the EDITOR environment variable de‐
837 fined, then the job's environment will inherit the
838 variable from the caller's environment. If the caller
839 doesn't have an environment variable defined for EDI‐
840 TOR, then the job's environment will use the value
841 given by --export.
842
843
844 -F, --nodefile=<node file>
845 Much like --nodelist, but the list is contained in a file of
846 name node file. The node names of the list may also span multi‐
847 ple lines in the file. Duplicate node names in the file will
848 be ignored. The order of the node names in the list is not im‐
849 portant; the node names will be sorted by Slurm.
850
851
852 --gid=<group>
853 If srun is run as root, and the --gid option is used, submit the
854 job with group's group access permissions. group may be the
855 group name or the numerical group ID. This option applies to job
856 allocations.
857
858
859 -G, --gpus=[<type>:]<number>
860 Specify the total number of GPUs required for the job. An op‐
861 tional GPU type specification can be supplied. For example
862 "--gpus=volta:3". Multiple options can be requested in a comma
863 separated list, for example: "--gpus=volta:3,kepler:1". See
864 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
865 options.
866
867
868 --gpu-bind=[verbose,]<type>
869 Bind tasks to specific GPUs. By default every spawned task can
870 access every GPU allocated to the job. If "verbose," is speci‐
871 fied before <type>, then print out GPU binding information.
872
873 Supported type options:
874
875 closest Bind each task to the GPU(s) which are closest. In a
876 NUMA environment, each task may be bound to more than
877 one GPU (i.e. all GPUs in that NUMA environment).
878
879 map_gpu:<list>
880 Bind by setting GPU masks on tasks (or ranks) as spec‐
881 ified where <list> is
882 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
883 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
885 hexadecimal values. If the number of tasks (or ranks)
886 exceeds the number of elements in this list, elements
887 in the list will be reused as needed starting from the
888 beginning of the list. To simplify support for large
889 task counts, the lists may follow a map with an aster‐
890 isk and repetition count. For example
891 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
892 and ConstrainDevices is set in cgroup.conf, then the
893 GPU IDs are zero-based indexes relative to the GPUs
894 allocated to the job (e.g. the first GPU is 0, even if
895 the global ID is 3). Otherwise, the GPU IDs are global
896 IDs, and all GPUs on each node in the job should be
897 allocated for predictable binding results.
898
899 mask_gpu:<list>
900 Bind by setting GPU masks on tasks (or ranks) as spec‐
901 ified where <list> is
902 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
903 mapping is specified for a node and identical mapping
904 is applied to the tasks on every node (i.e. the lowest
905 task ID on each node is mapped to the first mask spec‐
906 ified in the list, etc.). GPU masks are always inter‐
907 preted as hexadecimal values but can be preceded with
908 an optional '0x'. To simplify support for large task
909 counts, the lists may follow a map with an asterisk
910 and repetition count. For example
911 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
912 is used and ConstrainDevices is set in cgroup.conf,
913 then the GPU IDs are zero-based indexes relative to
914 the GPUs allocated to the job (e.g. the first GPU is
915 0, even if the global ID is 3). Otherwise, the GPU IDs
916 are global IDs, and all GPUs on each node in the job
917 should be allocated for predictable binding results.
918
919 single:<tasks_per_gpu>
920 Like --gpu-bind=closest, except that each task can
921 only be bound to a single GPU, even when it can be
922 bound to multiple GPUs that are equally close. The
923 GPU to bind to is determined by <tasks_per_gpu>, where
924 the first <tasks_per_gpu> tasks are bound to the first
925 GPU available, the second <tasks_per_gpu> tasks are
926 bound to the second GPU available, etc. This is basi‐
927 cally a block distribution of tasks onto available
928 GPUs, where the available GPUs are determined by the
929 socket affinity of the task and the socket affinity of
930 the GPUs as specified in gres.conf's Cores parameter.
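
For example, with two GPUs allocated per node and four tasks per node (an illustrative layout; my_app is a placeholder), either of the following binds tasks to GPUs; the map form assumes all GPUs on each node are allocated to the job:

    srun --gpus-per-node=2 --ntasks-per-node=4 --gpu-bind=single:2 ./my_app
    srun --gpus-per-node=2 --ntasks-per-node=4 --gpu-bind=verbose,map_gpu:0,1 ./my_app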
931
932
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
934 Request that GPUs allocated to the job are configured with spe‐
935 cific frequency values. This option can be used to indepen‐
936 dently configure the GPU and its memory frequencies. After the
937 job is completed, the frequencies of all affected GPUs will be
938 reset to the highest possible values. In some cases, system
939 power caps may override the requested values. The field type
940 can be "memory". If type is not specified, the GPU frequency is
941 implied. The value field can either be "low", "medium", "high",
942 "highm1" or a numeric value in megahertz (MHz). If the speci‐
943 fied numeric value is not possible, a value as close as possible
944 will be used. See below for definition of the values. The ver‐
945 bose option causes current GPU frequency information to be
946 logged. Examples of use include "--gpu-freq=medium,memory=high"
947 and "--gpu-freq=450".
948
949 Supported value definitions:
950
951 low the lowest available frequency.
952
953 medium attempts to set a frequency in the middle of the
954 available range.
955
956 high the highest available frequency.
957
958 highm1 (high minus one) will select the next highest avail‐
959 able frequency.
960
961
962 --gpus-per-node=[<type>:]<number>
963 Specify the number of GPUs required for the job on each node in‐
964 cluded in the job's resource allocation. An optional GPU type
965 specification can be supplied. For example
966 "--gpus-per-node=volta:3". Multiple options can be requested in
967 a comma separated list, for example:
968 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
969 --gpus-per-socket and --gpus-per-task options.
970
971
972 --gpus-per-socket=[<type>:]<number>
973 Specify the number of GPUs required for the job on each socket
974 included in the job's resource allocation. An optional GPU type
975 specification can be supplied. For example
976 "--gpus-per-socket=volta:3". Multiple options can be requested
977 in a comma separated list, for example:
978 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
sockets per node count (--sockets-per-node). See also the
980 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
981 tion applies to job allocations.
982
983
984 --gpus-per-task=[<type>:]<number>
Specify the number of GPUs required for each task to be
spawned in the job's resource allocation. An optional GPU
987 type specification can be supplied. For example
988 "--gpus-per-task=volta:1". Multiple options can be requested in
989 a comma separated list, for example:
990 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
991 --gpus-per-socket and --gpus-per-node options. This option re‐
992 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
993 --gpus-per-task=Y" rather than an ambiguous range of nodes with
994 -N, --nodes.
995 NOTE: This option will not have any impact on GPU binding,
996 specifically it won't limit the number of devices set for
997 CUDA_VISIBLE_DEVICES.
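
For example, four tasks each requiring one GPU of the (illustrative) type "volta", with the explicit task count this option requires:

    srun -n 4 --gpus-per-task=volta:1 ./my_app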
998
999
1000 --gres=<list>
1001 Specifies a comma delimited list of generic consumable re‐
1002 sources. The format of each entry on the list is
1003 "name[[:type]:count]". The name is that of the consumable re‐
1004 source. The count is the number of those resources with a de‐
1005 fault value of 1. The count can have a suffix of "k" or "K"
1006 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1007 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1008 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1009 x 1024 x 1024 x 1024). The specified resources will be allo‐
1010 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
1012 of available generic consumable resources will be printed and
1013 the command will exit if the option argument is "help". Exam‐
1014 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
1015 and "--gres=help". NOTE: This option applies to job and step
1016 allocations. By default, a job step is allocated all of the
1017 generic resources that have been allocated to the job. To
1018 change the behavior so that each job step is allocated no
1019 generic resources, explicitly set the value of --gres to specify
1020 zero counts for each generic resource OR set "--gres=none" OR
1021 set the SLURM_STEP_GRES environment variable to "none".
1022
1023
1024 --gres-flags=<type>
1025 Specify generic resource task binding options. This option ap‐
1026 plies to job allocations.
1027
1028 disable-binding
1029 Disable filtering of CPUs with respect to generic re‐
1030 source locality. This option is currently required to
1031 use more CPUs than are bound to a GRES (i.e. if a GPU is
1032 bound to the CPUs on one socket, but resources on more
1033 than one socket are required to run the job). This op‐
1034 tion may permit a job to be allocated resources sooner
1035 than otherwise possible, but may result in lower job per‐
1036 formance.
1037 NOTE: This option is specific to SelectType=cons_res.
1038
1039 enforce-binding
1040 The only CPUs available to the job will be those bound to
1041 the selected GRES (i.e. the CPUs identified in the
1042 gres.conf file will be strictly enforced). This option
1043 may result in delayed initiation of a job. For example a
1044 job requiring two GPUs and one CPU will be delayed until
1045 both GPUs on a single socket are available rather than
1046 using GPUs bound to separate sockets, however, the appli‐
1047 cation performance may be improved due to improved commu‐
1048 nication speed. Requires the node to be configured with
1049 more than one socket and resource filtering will be per‐
1050 formed on a per-socket basis.
1051 NOTE: This option is specific to SelectType=cons_tres.
1052
1053
1054 -H, --hold
1055 Specify the job is to be submitted in a held state (priority of
1056 zero). A held job can now be released using scontrol to reset
1057 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1058 plies to job allocations.
1059
1060
1061 -h, --help
1062 Display help information and exit.
1063
1064
1065 --hint=<type>
1066 Bind tasks according to application hints.
1067 NOTE: This option cannot be used in conjunction with any of
1068 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1069 --cpu-bind=verbose) or -B. If --hint is specified as a command
1070 line argument, it will take precedence over the environment.
1071
1072 compute_bound
1073 Select settings for compute bound applications: use all
1074 cores in each socket, one thread per core.
1075
1076 memory_bound
1077 Select settings for memory bound applications: use only
1078 one core in each socket, one thread per core.
1079
1080 [no]multithread
1081 [don't] use extra threads with in-core multi-threading
1082 which can benefit communication intensive applications.
1083 Only supported with the task/affinity plugin.
1084
1085 help show this help message
1086
1087 This option applies to job allocations.
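
For example (the executable name is a placeholder):

    srun --hint=compute_bound -n 8 ./my_app     # all cores, one thread per core
    srun --hint=nomultithread -n 8 ./my_app     # do not use extra hardware threads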
1088
1089
1090 -I, --immediate[=<seconds>]
Exit if resources are not available within the time period spec‐
1092 ified. If no argument is given (seconds defaults to 1), re‐
1093 sources must be available immediately for the request to suc‐
1094 ceed. If defer is configured in SchedulerParameters and sec‐
1095 onds=1 the allocation request will fail immediately; defer con‐
1096 flicts and takes precedence over this option. By default, --im‐
1097 mediate is off, and the command will block until resources be‐
1098 come available. Since this option's argument is optional, for
1099 proper parsing the single letter option must be followed immedi‐
1100 ately with the value and not include a space between them. For
1101 example "-I60" and not "-I 60". This option applies to job and
1102 step allocations.
1103
1104
1105 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
stdin from the terminal to all tasks. See IO Redirection below for
1108 more options. For OS X, the poll() function does not support
1109 stdin, so input from a terminal is not possible. This option ap‐
1110 plies to job and step allocations.
1111
1112
1113 -J, --job-name=<jobname>
1114 Specify a name for the job. The specified name will appear along
1115 with the job id number when querying running jobs on the system.
1116 The default is the supplied executable program's name. NOTE:
1117 This information may be written to the slurm_jobacct.log file.
1118 This file is space delimited so if a space is used in the job‐
name it will cause problems in properly displaying the con‐
1120 tents of the slurm_jobacct.log file when the sacct command is
1121 used. This option applies to job and step allocations.
1122
1123
1124 --jobid=<jobid>
1125 Initiate a job step under an already allocated job with job id
<jobid>. Using this option will cause srun to behave exactly as if
1127 the SLURM_JOB_ID environment variable was set. This option ap‐
1128 plies to step allocations.
1129
1130
1131 -K, --kill-on-bad-exit[=0|1]
1132 Controls whether or not to terminate a step if any task exits
1133 with a non-zero exit code. If this option is not specified, the
1134 default action will be based upon the Slurm configuration param‐
1135 eter of KillOnBadExit. If this option is specified, it will take
1136 precedence over KillOnBadExit. An option argument of zero will
1137 not terminate the job. A non-zero argument or no argument will
1138 terminate the job. Note: This option takes precedence over the
1139 -W, --wait option to terminate the job immediately if a task ex‐
1140 its with a non-zero exit code. Since this option's argument is
1141 optional, for proper parsing the single letter option must be
1142 followed immediately with the value and not include a space be‐
1143 tween them. For example "-K1" and not "-K 1".
1144
1145
1146 -k, --no-kill [=off]
1147 Do not automatically terminate a job if one of the nodes it has
1148 been allocated fails. This option applies to job and step allo‐
1149 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1151 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1152 --wait options will have no effect upon the job step). The ac‐
1153 tive job step (MPI job) will likely suffer a fatal error, but
1154 subsequent job steps may be run if this option is specified.
1155
Specify an optional argument of "off" to disable the effect of
the SLURM_NO_KILL environment variable.
1158
1159 The default action is to terminate the job upon node failure.
1160
1161
1162 -l, --label
1163 Prepend task number to lines of stdout/err. The --label option
1164 will prepend lines of output with the remote task id. This op‐
1165 tion applies to step allocations.
1166
1167
1168 -L, --licenses=<license>
1169 Specification of licenses (or other resources available on all
1170 nodes of the cluster) which must be allocated to this job. Li‐
1171 cense names can be followed by a colon and count (the default
1172 count is one). Multiple license names should be comma separated
1173 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1174 cations.
1175
1176
1177 -M, --clusters=<string>
1178 Clusters to issue commands to. Multiple cluster names may be
1179 comma separated. The job will be submitted to the one cluster
1180 providing the earliest expected job initiation time. The default
1181 value is the current cluster. A value of 'all' will query to run
1182 on all clusters. Note the --export option to control environ‐
1183 ment variables exported between clusters. This option applies
1184 only to job allocations. Note that the SlurmDBD must be up for
1185 this option to work properly.
1186
1187
-m, --distribution=*|block|cyclic|arbitrary|plane=<options>
[:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]][,Pack|NoPack]
1192
1193 Specify alternate distribution methods for remote processes.
1194 This option controls the distribution of tasks to the nodes on
1195 which resources have been allocated, and the distribution of
1196 those resources to tasks for binding (task affinity). The first
1197 distribution method (before the first ":") controls the distri‐
1198 bution of tasks to nodes. The second distribution method (after
1199 the first ":") controls the distribution of allocated CPUs
1200 across sockets for binding to tasks. The third distribution
1201 method (after the second ":") controls the distribution of allo‐
1202 cated CPUs across cores for binding to tasks. The second and
1203 third distributions apply only if task affinity is enabled. The
1204 third distribution is supported only if the task/cgroup plugin
1205 is configured. The default value for each distribution type is
1206 specified by *.
1207
1208 Note that with select/cons_res and select/cons_tres, the number
1209 of CPUs allocated to each socket and node may be different. Re‐
1210 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
1211 mation on resource allocation, distribution of tasks to nodes,
1212 and binding of tasks to CPUs.
1213 First distribution method (distribution of tasks across nodes):
1214
1215
1216 * Use the default method for distributing tasks to nodes
1217 (block).
1218
1219 block The block distribution method will distribute tasks to a
1220 node such that consecutive tasks share a node. For exam‐
1221 ple, consider an allocation of three nodes each with two
1222 cpus. A four-task block distribution request will dis‐
1223 tribute those tasks to the nodes with tasks one and two
1224 on the first node, task three on the second node, and
1225 task four on the third node. Block distribution is the
1226 default behavior if the number of tasks exceeds the num‐
1227 ber of allocated nodes.
1228
1229 cyclic The cyclic distribution method will distribute tasks to a
1230 node such that consecutive tasks are distributed over
1231 consecutive nodes (in a round-robin fashion). For exam‐
1232 ple, consider an allocation of three nodes each with two
1233 cpus. A four-task cyclic distribution request will dis‐
1234 tribute those tasks to the nodes with tasks one and four
1235 on the first node, task two on the second node, and task
1236 three on the third node. Note that when SelectType is
1237 select/cons_res, the same number of CPUs may not be allo‐
1238 cated on each node. Task distribution will be round-robin
1239 among all the nodes with CPUs yet to be assigned to
1240 tasks. Cyclic distribution is the default behavior if
1241 the number of tasks is no larger than the number of allo‐
1242 cated nodes.
1243
1244 plane The tasks are distributed in blocks of a specified size.
1245 The number of tasks distributed to each node is the same
1246 as for cyclic distribution, but the taskids assigned to
1247 each node depend on the plane size. Additional distribu‐
1248 tion specifications cannot be combined with this option.
1249 For more details (including examples and diagrams),
1250 please see
1251 https://slurm.schedmd.com/mc_support.html
1252 and
1253 https://slurm.schedmd.com/dist_plane.html
1254
1255 arbitrary
1256 The arbitrary method of distribution will allocate pro‐
1257 cesses in-order as listed in file designated by the envi‐
1258 ronment variable SLURM_HOSTFILE. If this variable is
1259 listed it will over ride any other method specified. If
1260 not set the method will default to block. Inside the
1261 hostfile must contain at minimum the number of hosts re‐
1262 quested and be one per line or comma separated. If spec‐
1263 ifying a task count (-n, --ntasks=<number>), your tasks
1264 will be laid out on the nodes in the order of the file.
1265 NOTE: The arbitrary distribution option on a job alloca‐
1266 tion only controls the nodes to be allocated to the job
1267 and not the allocation of CPUs on those nodes. This op‐
1268 tion is meant primarily to control a job step's task lay‐
1269 out in an existing job allocation for the srun command.
1270 NOTE: If the number of tasks is given and a list of re‐
1271 quested nodes is also given, the number of nodes used
1272 from that list will be reduced to match that of the num‐
1273 ber of tasks if the number of nodes in the list is
1274 greater than the number of tasks.
1275
1276
1277 Second distribution method (distribution of CPUs across sockets
1278 for binding):
1279
1280
1281 * Use the default method for distributing CPUs across sock‐
1282 ets (cyclic).
1283
1284 block The block distribution method will distribute allocated
1285 CPUs consecutively from the same socket for binding to
1286 tasks, before using the next consecutive socket.
1287
1288 cyclic The cyclic distribution method will distribute allocated
1289 CPUs for binding to a given task consecutively from the
1290 same socket, and from the next consecutive socket for the
1291 next task, in a round-robin fashion across sockets.
1292
1293 fcyclic
1294 The fcyclic distribution method will distribute allocated
1295 CPUs for binding to tasks from consecutive sockets in a
1296 round-robin fashion across the sockets.
1297
1298
1299 Third distribution method (distribution of CPUs across cores for
1300 binding):
1301
1302
1303 * Use the default method for distributing CPUs across cores
1304 (inherited from second distribution method).
1305
1306 block The block distribution method will distribute allocated
1307 CPUs consecutively from the same core for binding to
1308 tasks, before using the next consecutive core.
1309
1310 cyclic The cyclic distribution method will distribute allocated
1311 CPUs for binding to a given task consecutively from the
1312 same core, and from the next consecutive core for the
1313 next task, in a round-robin fashion across cores.
1314
1315 fcyclic
1316 The fcyclic distribution method will distribute allocated
1317 CPUs for binding to tasks from consecutive cores in a
1318 round-robin fashion across the cores.
1319
1320
1321
1322 Optional control for task distribution over nodes:
1323
1324
1325 Pack Rather than distributing a job step's tasks evenly
1326 across its allocated nodes, pack them as tightly as pos‐
1327 sible on the nodes. This only applies when the "block"
1328 task distribution method is used.
1329
1330 NoPack Rather than packing a job step's tasks as tightly as pos‐
1331 sible on the nodes, distribute them evenly. This user
1332 option will supersede the SelectTypeParameters
1333 CR_Pack_Nodes configuration parameter.
1334
1335 This option applies to job and step allocations.
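
       For example, combining the first and second distribution methods, and
       using the arbitrary method described above with a hostfile, might look
       like the following sketch (the program name ./my_app and the host
       names are placeholders):

              srun -N2 -n8 -m block:cyclic ./my_app

              cat > hosts.txt <<EOF
              node01
              node01
              node02
              node03
              EOF
              SLURM_HOSTFILE=hosts.txt srun -n4 -m arbitrary ./my_app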
1336
1337
1338 --mail-type=<type>
1339 Notify user by email when certain event types occur. Valid type
1340 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1341 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1342 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1343 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1344 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1345 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1346 time limit). Multiple type values may be specified in a comma
1347 separated list. The user to be notified is indicated with
1348 --mail-user. This option applies to job allocations.
1349
1350
1351 --mail-user=<user>
1352 User to receive email notification of state changes as defined
1353 by --mail-type. The default value is the submitting user. This
1354 option applies to job allocations.
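
       For example, one possible way to request mail when the job ends or
       fails (the address user@example.com and the program ./my_app are
       placeholders):

              srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app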
1355
1356
1357 --mcs-label=<mcs>
1358 Used only when the mcs/group plugin is enabled. This parameter
1359 is one of the groups to which the user belongs. The default value is cal‐
1360 culated by the mcs plugin if it is enabled. This option applies
1361 to job allocations.
1362
1363
1364 --mem=<size[units]>
1365 Specify the real memory required per node. Default units are
1366 megabytes. Different units can be specified using the suffix
1367 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1368 is MaxMemPerNode. If configured, both parameters can be seen
1369 using the scontrol show config command. This parameter would
1370 generally be used if whole nodes are allocated to jobs (Select‐
1371 Type=select/linear). Specifying a memory limit of zero for a
1372 job step will restrict the job step to the amount of memory al‐
1373 located to the job, but not remove any of the job's memory allo‐
1374 cation from being available to other job steps. Also see
1375 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1376 --mem-per-gpu options are mutually exclusive. If --mem,
1377 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1378 guments, then they will take precedence over the environment
1379 (potentially inherited from salloc or sbatch).
1380
1381 NOTE: A memory size specification of zero is treated as a spe‐
1382 cial case and grants the job access to all of the memory on each
1383 node for newly submitted jobs and all available job memory to
1384 new job steps.
1385
1386 Specifying new memory limits for job steps is only advisory.
1387
1388 If the job is allocated multiple nodes in a heterogeneous clus‐
1389 ter, the memory limit on each node will be that of the node in
1390 the allocation with the smallest memory size (same limit will
1391 apply to every node in the job's allocation).
1392
1393 NOTE: Enforcement of memory limits currently relies upon the
1394 task/cgroup plugin or enabling of accounting, which samples mem‐
1395 ory use on a periodic basis (data need not be stored, just col‐
1396 lected). In both cases memory use is based upon the job's Resi‐
1397 dent Set Size (RSS). A task may exceed the memory limit until
1398 the next periodic accounting sample.
1399
1400 This option applies to job and step allocations.
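
       For example, a sketch of a job that needs roughly 16 gigabytes of real
       memory on each of two nodes (./my_app is a placeholder):

              srun -N2 --mem=16G ./my_app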
1401
1402
1403 --mem-per-cpu=<size[units]>
1404 Minimum memory required per allocated CPU. Default units are
1405 megabytes. Different units can be specified using the suffix
1406 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1407 value is MaxMemPerCPU (see exception below). If configured, both
1408 parameters can be seen using the scontrol show config command.
1409 Note that if the job's --mem-per-cpu value exceeds the config‐
1410 ured MaxMemPerCPU, then the user's limit will be treated as a
1411 memory limit per task; --mem-per-cpu will be reduced to a value
1412 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1413 value of --cpus-per-task multiplied by the new --mem-per-cpu
1414 value will equal the original --mem-per-cpu value specified by
1415 the user. This parameter would generally be used if individual
1416 processors are allocated to jobs (SelectType=select/cons_res).
1417 If resources are allocated by core, socket, or whole nodes, then
1418 the number of CPUs allocated to a job may be higher than the
1419 task count and the value of --mem-per-cpu should be adjusted ac‐
1420 cordingly. Specifying a memory limit of zero for a job step
1421 will restrict the job step to the amount of memory allocated to
1422 the job, but not remove any of the job's memory allocation from
1423 being available to other job steps. Also see --mem and
1424 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu op‐
1425 tions are mutually exclusive.
1426
1427 NOTE: If the final amount of memory requested by a job can't be
1428 satisfied by any of the nodes configured in the partition, the
1429 job will be rejected. This could happen if --mem-per-cpu is
1430 used with the --exclusive option for a job allocation and
1431 --mem-per-cpu times the number of CPUs on a node is greater than
1432 the total memory of that node.
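
       For example, a sketch of a job with 8 tasks, 4 CPUs per task and 2
       gigabytes per allocated CPU, i.e. roughly 8 gigabytes per task
       (./my_app is a placeholder):

              srun -n8 -c4 --mem-per-cpu=2G ./my_app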
1433
1434
1435 --mem-per-gpu=<size[units]>
1436 Minimum memory required per allocated GPU. Default units are
1437 megabytes. Different units can be specified using the suffix
1438 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1439 both a global and per partition basis. If configured, the pa‐
1440 rameters can be seen using the scontrol show config and scontrol
1441 show partition commands. Also see --mem. The --mem,
1442 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1443
1444
1445 --mem-bind=[{quiet,verbose},]type
1446 Bind tasks to memory. Used only when the task/affinity plugin is
1447 enabled and the NUMA memory functions are available. Note that
1448 the resolution of CPU and memory binding may differ on some ar‐
1449 chitectures. For example, CPU binding may be performed at the
1450 level of the cores within a processor while memory binding will
1451 be performed at the level of nodes, where the definition of
1452 "nodes" may differ from system to system. By default no memory
1453 binding is performed; any task using any CPU can use any memory.
1454 This option is typically used to ensure that each task is bound
1455 to the memory closest to its assigned CPU. The use of any type
1456 other than "none" or "local" is not recommended. If you want
1457 greater control, try running a simple test code with the options
1458 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1459 the specific configuration.
1460
1461 NOTE: To have Slurm always report on the selected memory binding
1462 for all commands executed in a shell, you can enable verbose
1463 mode by setting the SLURM_MEM_BIND environment variable value to
1464 "verbose".
1465
1466 The following informational environment variables are set when
1467 --mem-bind is in use:
1468
1469 SLURM_MEM_BIND_LIST
1470 SLURM_MEM_BIND_PREFER
1471 SLURM_MEM_BIND_SORT
1472 SLURM_MEM_BIND_TYPE
1473 SLURM_MEM_BIND_VERBOSE
1474
1475 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1476 scription of the individual SLURM_MEM_BIND* variables.
1477
1478 Supported options include:
1479
1480 help show this help message
1481
1482 local Use memory local to the processor in use
1483
1484 map_mem:<list>
1485 Bind by setting memory masks on tasks (or ranks) as spec‐
1486 ified where <list> is
1487 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1488 ping is specified for a node and identical mapping is ap‐
1489 plied to the tasks on every node (i.e. the lowest task ID
1490 on each node is mapped to the first ID specified in the
1491 list, etc.). NUMA IDs are interpreted as decimal values
1492 unless they are preceded with '0x', in which case they are in‐
1493 terpreted as hexadecimal values. If the number of tasks
1494 (or ranks) exceeds the number of elements in this list,
1495 elements in the list will be reused as needed starting
1496 from the beginning of the list. To simplify support for
1497 large task counts, a map in the list may be followed by
1498 an asterisk and a repetition count. For example
1499 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1500 sults, all CPUs for each node in the job should be allo‐
1501 cated to the job.
1502
1503 mask_mem:<list>
1504 Bind by setting memory masks on tasks (or ranks) as spec‐
1505 ified where <list> is
1506 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1507 mapping is specified for a node and identical mapping is
1508 applied to the tasks on every node (i.e. the lowest task
1509 ID on each node is mapped to the first mask specified in
1510 the list, etc.). NUMA masks are always interpreted as
1511 hexadecimal values. Note that masks must be preceded
1512 with a '0x' if they don't begin with [0-9] so they are
1513 seen as numerical values. If the number of tasks (or
1514 ranks) exceeds the number of elements in this list, ele‐
1515 ments in the list will be reused as needed starting from
1516 the beginning of the list. To simplify support for large
1517 task counts, a mask in the list may be followed by an asterisk
1518 and a repetition count. For example "mask_mem:0*4,1*4".
1519 For predictable binding results, all CPUs for each node
1520 in the job should be allocated to the job.
1521
1522 no[ne] don't bind tasks to memory (default)
1523
1524 nosort avoid sorting free cache pages (default, LaunchParameters
1525 configuration parameter can override this default)
1526
1527 p[refer]
1528 Prefer use of first specified NUMA node, but permit
1529 use of other available NUMA nodes.
1530
1531 q[uiet]
1532 quietly bind before task runs (default)
1533
1534 rank bind by task rank (not recommended)
1535
1536 sort sort free cache pages (run zonesort on Intel KNL nodes)
1537
1538 v[erbose]
1539 verbosely report binding before task runs
1540
1541 This option applies to job and step allocations.
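
       For example, a sketch that binds each task to the memory local to its
       assigned CPUs and reports the binding (./my_app is a placeholder):

              srun --cpu-bind=cores --mem-bind=verbose,local ./my_app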
1542
1543
1544 --mincpus=<n>
1545 Specify a minimum number of logical cpus/processors per node.
1546 This option applies to job allocations.
1547
1548
1549 --msg-timeout=<seconds>
1550 Modify the job launch message timeout. The default value is
1551 MessageTimeout in the Slurm configuration file slurm.conf.
1552 Changes to this are typically not recommended, but could be use‐
1553 ful to diagnose problems. This option applies to job alloca‐
1554 tions.
1555
1556
1557 --mpi=<mpi_type>
1558 Identify the type of MPI to be used. May result in unique initi‐
1559 ation procedures.
1560
1561 list Lists available mpi types to choose from.
1562
1563 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1564 only if the MPI implementation supports it, in other
1565 words if the MPI has the PMI2 interface implemented. The
1566 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1567 which provides the server side functionality but the
1568 client side must implement PMI2_Init() and the other in‐
1569 terface calls.
1570
1571 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1572 support in Slurm can be used to launch parallel applica‐
1573 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1574 must be configured with pmix support by passing "--with-
1575 pmix=<PMIx installation path>" option to its "./config‐
1576 ure" script.
1577
1578 At the time of writing PMIx is supported in Open MPI
1579 starting from version 2.0. PMIx also supports backward
1580 compatibility with PMI1 and PMI2 and can be used if MPI
1581 was configured with PMI2/PMI1 support pointing to the
1582 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1583 doesn't provide a way to point to a specific implemen‐
1584 tation, a hack'ish solution leveraging LD_PRELOAD can be
1585 used to force "libpmix" usage.
1586
1587
1588 none No special MPI processing. This is the default and works
1589 with many other versions of MPI.
1590
1591 This option applies to step allocations.
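
       For example, one might first list the available MPI plugins and then
       launch a PMIx-enabled application (./my_mpi_app is a placeholder):

              srun --mpi=list
              srun -n16 --mpi=pmix ./my_mpi_app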
1592
1593
1594 --multi-prog
1595 Run a job with different programs and different arguments for
1596 each task. In this case, the executable program specified is ac‐
1597 tually a configuration file specifying the executable and argu‐
1598 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1599 for details on the configuration file contents. This option ap‐
1600 plies to step allocations.
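
       For example, a sketch of a configuration file and its use (the file
       name multi.conf and the commands are placeholders; see MULTIPLE
       PROGRAM CONFIGURATION below for the authoritative file format):

              cat > multi.conf <<EOF
              0      echo master
              1-3    hostname
              EOF
              srun -n4 --multi-prog multi.conf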
1601
1602
1603 -N, --nodes=<minnodes[-maxnodes]>
1604 Request that a minimum of minnodes nodes be allocated to this
1605 job. A maximum node count may also be specified with maxnodes.
1606 If only one number is specified, this is used as both the mini‐
1607 mum and maximum node count. The partition's node limits super‐
1608 sede those of the job. If a job's node limits are outside of
1609 the range permitted for its associated partition, the job will
1610 be left in a PENDING state. This permits possible execution at
1611 a later time, when the partition limit is changed. If a job
1612 node limit exceeds the number of nodes configured in the parti‐
1613 tion, the job will be rejected. Note that the environment vari‐
1614 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1615 ibility) will be set to the count of nodes actually allocated to
1616 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1617 tion. If -N is not specified, the default behavior is to allo‐
1618 cate enough nodes to satisfy the requirements of the -n and -c
1619 options. The job will be allocated as many nodes as possible
1620 within the range specified and without delaying the initiation
1621 of the job. If the number of tasks is given and a number of re‐
1622 quested nodes is also given, the number of nodes used from that
1623 request will be reduced to match that of the number of tasks if
1624 the number of nodes in the request is greater than the number of
1625 tasks. The node count specification may include a numeric value
1626 followed by a suffix of "k" (multiplies numeric value by 1,024)
1627 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1628 plies to job and step allocations.
1629
1630
1631 -n, --ntasks=<number>
1632 Specify the number of tasks to run. Request that srun allocate
1633 resources for ntasks tasks. The default is one task per node,
1634 but note that the --cpus-per-task option will change this de‐
1635 fault. This option applies to job and step allocations.
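
       For example, a sketch of running 8 tasks on a minimum of 2 and a
       maximum of 4 nodes (./my_app is a placeholder):

              srun -N2-4 -n8 ./my_app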
1636
1637
1638 --network=<type>
1639 Specify information pertaining to the switch or network. The
1640 interpretation of type is system dependent. This option is sup‐
1641 ported when running Slurm on a Cray natively. It is used to re‐
1642 quest using Network Performance Counters. Only one value per
1643 request is valid. All options are case-insensitive. In this
1644 configuration, supported values include:
1645
1646 system
1647 Use the system-wide network performance counters. Only
1648 nodes requested will be marked in use for the job alloca‐
1649 tion. If the job does not fill up the entire system the
1650 rest of the nodes are not able to be used by other jobs
1651 using NPC; if idle, their state will appear as PerfCnts.
1652 These nodes are still available for other jobs not using
1653 NPC.
1654
1655 blade Use the blade network performance counters. Only nodes re‐
1656 quested will be marked in use for the job allocation. If
1657 the job does not fill up the entire blade(s) allocated to
1658 the job those blade(s) are not able to be used by other
1659 jobs using NPC; if idle, their state will appear as PerfC‐
1660 nts. These nodes are still available for other jobs not
1661 using NPC.
1662
1663
1664 In all cases the job allocation request must specify the
1665 --exclusive option and the step cannot specify the --overlap op‐
1666 tion. Otherwise the request will be denied.
1667
1668 Also with any of these options steps are not allowed to share
1669 blades, so resources would remain idle inside an allocation if
1670 the step running on a blade does not take up all the nodes on
1671 the blade.
1672
1673 The network option is also supported on systems with IBM's Par‐
1674 allel Environment (PE). See IBM's LoadLeveler job command key‐
1675 word documentation about the keyword "network" for more informa‐
1676 tion. Multiple values may be specified in a comma separated
1677 list. All options are case-insensitive. Supported values in‐
1678 clude:
1679
1680 BULK_XFER[=<resources>]
1681 Enable bulk transfer of data using Remote Direct-
1682 Memory Access (RDMA). The optional resources speci‐
1683 fication is a numeric value which can have a suffix
1684 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1685 megabytes or gigabytes. NOTE: The resources speci‐
1686 fication is not supported by the underlying IBM in‐
1687 frastructure as of Parallel Environment version 2.2
1688 and no value should be specified at this time. The
1689 devices allocated to a job must all be of the same
1690 type. The default value depends upon
1691 what hardware is available and, in order of prefer‐
1692 ence, is IPONLY (which is not considered in User
1693 Space mode), HFI, IB, HPCE, and KMUX.
1694
1695 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1696 quired. Applies only to IBM Power7-IH processors.
1697 Default value is zero. Independent CAU will be al‐
1698 located for each programming interface (MPI, LAPI,
1699 etc.)
1700
1701 DEVNAME=<name>
1702 Specify the device name to use for communications
1703 (e.g. "eth0" or "mlx4_0").
1704
1705 DEVTYPE=<type>
1706 Specify the device type to use for communications.
1707 The supported values of type are: "IB" (InfiniBand),
1708 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1709 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1710 nel Emulation of HPCE). The devices allocated to a
1711 job must all be of the same type. The default value
1712 depends upon what hardware is available
1713 and, in order of preference, is IPONLY (which is not
1714 considered in User Space mode), HFI, IB, HPCE, and
1715 KMUX.
1716
1717 IMMED=<count>
1718 Number of immediate send slots per window required.
1719 Applies only to IBM Power7-IH processors. Default
1720 value is zero.
1721
1722 INSTANCES=<count>
1723 Specify the number of network connections for each task
1724 on each network. The default instance
1725 count is 1.
1726
1727 IPV4 Use Internet Protocol (IP) version 4 communications
1728 (default).
1729
1730 IPV6 Use Internet Protocol (IP) version 6 communications.
1731
1732 LAPI Use the LAPI programming interface.
1733
1734 MPI Use the MPI programming interface. MPI is the de‐
1735 fault interface.
1736
1737 PAMI Use the PAMI programming interface.
1738
1739 SHMEM Use the OpenSHMEM programming interface.
1740
1741 SN_ALL Use all available switch networks (default).
1742
1743 SN_SINGLE Use one available switch network.
1744
1745 UPC Use the UPC programming interface.
1746
1747 US Use User Space communications.
1748
1749
1750 Some examples of network specifications:
1751
1752 Instances=2,US,MPI,SN_ALL
1753 Create two user space connections for MPI communica‐
1754 tions on every switch network for each task.
1755
1756 US,MPI,Instances=3,Devtype=IB
1757 Create three user space connections for MPI communi‐
1758 cations on every InfiniBand network for each task.
1759
1760 IPV4,LAPI,SN_Single
1761 Create an IP version 4 connection for LAPI communica‐
1762 tions on one switch network for each task.
1763
1764 Instances=2,US,LAPI,MPI
1765 Create two user space connections each for LAPI and
1766 MPI communications on every switch network for each
1767 task. Note that SN_ALL is the default option so ev‐
1768 ery switch network is used. Also note that In‐
1769 stances=2 specifies that two connections are estab‐
1770 lished for each protocol (LAPI and MPI) and each
1771 task. If there are two networks and four tasks on
1772 the node then a total of 32 connections are estab‐
1773 lished (2 instances x 2 protocols x 2 networks x 4
1774 tasks).
1775
1776 This option applies to job and step allocations.
1777
1778
1779 --nice[=adjustment]
1780 Run the job with an adjusted scheduling priority within Slurm.
1781 With no adjustment value the scheduling priority is decreased by
1782 100. A negative nice value increases the priority, otherwise de‐
1783 creases it. The adjustment range is +/- 2147483645. Only privi‐
1784 leged users can specify a negative adjustment.
1785
1786
1787 --ntasks-per-core=<ntasks>
1788 Request the maximum ntasks be invoked on each core. This option
1789 applies to the job allocation, but not to step allocations.
1790 Meant to be used with the --ntasks option. Related to
1791 --ntasks-per-node except at the core level instead of the node
1792 level. Masks will automatically be generated to bind the tasks
1793 to specific cores unless --cpu-bind=none is specified. NOTE:
1794 This option is not supported when using SelectType=select/lin‐
1795 ear.
1796
1797
1798 --ntasks-per-gpu=<ntasks>
1799 Request that there are ntasks tasks invoked for every GPU. This
1800 option can work in two ways: 1) either specify --ntasks in addi‐
1801 tion, in which case a type-less GPU specification will be auto‐
1802 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1803 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1804 --ntasks, and the total task count will be automatically deter‐
1805 mined. The number of CPUs needed will be automatically in‐
1806 creased if necessary to allow for any calculated task count.
1807 This option will implicitly set --gpu-bind=single:<ntasks>, but
1808 that can be overridden with an explicit --gpu-bind specifica‐
1809 tion. This option is not compatible with a node range (i.e.
1810 -N<minnodes-maxnodes>). This option is not compatible with
1811 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1812 option is not supported unless SelectType=cons_tres is config‐
1813 ured (either directly or indirectly on Cray systems).
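
       For example, a sketch of running two tasks per GPU on four GPUs, i.e.
       8 tasks in total, assuming a cluster configured with
       SelectType=cons_tres (./my_app is a placeholder):

              srun --gpus=4 --ntasks-per-gpu=2 ./my_app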
1814
1815
1816 --ntasks-per-node=<ntasks>
1817 Request that ntasks be invoked on each node. If used with the
1818 --ntasks option, the --ntasks option will take precedence and
1819 the --ntasks-per-node will be treated as a maximum count of
1820 tasks per node. Meant to be used with the --nodes option. This
1821 is related to --cpus-per-task=ncpus, but does not require knowl‐
1822 edge of the actual number of cpus on each node. In some cases,
1823 it is more convenient to be able to request that no more than a
1824 specific number of tasks be invoked on each node. Examples of
1825 this include submitting a hybrid MPI/OpenMP app where only one
1826 MPI "task/rank" should be assigned to each node while allowing
1827 the OpenMP portion to utilize all of the parallelism present in
1828 the node, or submitting a single setup/cleanup/monitoring job to
1829 each node of a pre-existing allocation as one step in a larger
1830 job script. This option applies to job allocations.
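
       For example, a sketch of a hybrid MPI/OpenMP launch with one MPI rank
       per node and 16 threads per rank, assuming nodes with at least 16
       CPUs (./my_hybrid_app is a placeholder):

              srun -N4 --ntasks-per-node=1 --cpus-per-task=16 ./my_hybrid_app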
1831
1832
1833 --ntasks-per-socket=<ntasks>
1834 Request the maximum ntasks be invoked on each socket. This op‐
1835 tion applies to the job allocation, but not to step allocations.
1836 Meant to be used with the --ntasks option. Related to
1837 --ntasks-per-node except at the socket level instead of the node
1838 level. Masks will automatically be generated to bind the tasks
1839 to specific sockets unless --cpu-bind=none is specified. NOTE:
1840 This option is not supported when using SelectType=select/lin‐
1841 ear.
1842
1843
1844 -O, --overcommit
1845 Overcommit resources. This option applies to job and step allo‐
1846 cations. When applied to job allocation, only one CPU is allo‐
1847 cated to the job per node and options used to specify the number
1848 of tasks per node, socket, core, etc. are ignored. When ap‐
1849 plied to job step allocations (the srun command when executed
1850 within an existing job allocation), this option can be used to
1851 launch more than one task per CPU. Normally, srun will not al‐
1852 locate more than one process per CPU. By specifying --overcom‐
1853 mit you are explicitly allowing more than one process per CPU.
1854 However no more than MAX_TASKS_PER_NODE tasks are permitted to
1855 execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1856 file slurm.h and is not a variable, it is set at Slurm build
1857 time.
1858
1859
1860 --overlap
1861 Allow steps to overlap each other on the CPUs. By default steps
1862 do not share CPUs with other parallel steps.
1863
1864
1865 -o, --output=<filename pattern>
1866 Specify the "filename pattern" for stdout redirection. By de‐
1867 fault in interactive mode, srun collects stdout from all tasks
1868 and sends this output via TCP/IP to the attached terminal. With
1869 --output stdout may be redirected to a file, to one file per
1870 task, or to /dev/null. See section IO Redirection below for the
1871 various forms of filename pattern. If the specified file al‐
1872 ready exists, it will be overwritten.
1873
1874 If --error is not also specified on the command line, both std‐
1875 out and stderr will be directed to the file specified by --output.
1876 This option applies to job and step allocations.
1877
1878
1879 --open-mode=<append|truncate>
1880 Open the output and error files using append or truncate mode as
1881 specified. For heterogeneous job steps the default value is
1882 "append". Otherwise the default value is specified by the sys‐
1883 tem configuration parameter JobFileAppend. This option applies
1884 to job and step allocations.
1885
1886
1887 --het-group=<expr>
1888 Identify each component in a heterogeneous job allocation for
1889 which a step is to be created. Applies only to srun commands is‐
1890 sued inside a salloc allocation or sbatch script. <expr> is a
1891 set of integers corresponding to one or more option offsets on
1892 the salloc or sbatch command line. Examples: "--het-group=2",
1893 "--het-group=0,4", "--het-group=1,3-5". The default value is
1894 --het-group=0.
1895
1896
1897 -p, --partition=<partition_names>
1898 Request a specific partition for the resource allocation. If
1899 not specified, the default behavior is to allow the slurm con‐
1900 troller to select the default partition as designated by the
1901 system administrator. If the job can use more than one parti‐
1902 tion, specify their names in a comma separated list and the one
1903 offering earliest initiation will be used with no regard given
1904 to the partition name ordering (although higher priority parti‐
1905 tions will be considered first). When the job is initiated, the
1906 name of the partition used will be placed first in the job
1907 record partition string. This option applies to job allocations.
1908
1909
1910 --power=<flags>
1911 Comma separated list of power management plugin options. Cur‐
1912 rently available flags include: level (all nodes allocated to
1913 the job should have identical power caps, may be disabled by the
1914 Slurm configuration option PowerParameters=job_no_level). This
1915 option applies to job allocations.
1916
1917
1918 --priority=<value>
1919 Request a specific job priority. May be subject to configura‐
1920 tion specific constraints. value should either be a numeric
1921 value or "TOP" (for highest possible value). Only Slurm opera‐
1922 tors and administrators can set the priority of a job. This op‐
1923 tion applies to job allocations only.
1924
1925
1926 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1927 Enables detailed data collection by the acct_gather_profile
1928 plugin. Detailed data are typically time-series that are stored
1929 in an HDF5 file for the job or an InfluxDB database depending on
1930 the configured plugin.
1931
1932
1933 All All data types are collected. (Cannot be combined with
1934 other values.)
1935
1936
1937 None No data types are collected. This is the default.
1938 (Cannot be combined with other values.)
1939
1940
1941 Energy Energy data is collected.
1942
1943
1944 Task Task (I/O, Memory, ...) data is collected.
1945
1946
1947 Filesystem
1948 Filesystem data is collected.
1949
1950
1951 Network Network (InfiniBand) data is collected.
1952
1953
1954 This option applies to job and step allocations.
1955
1956
1957 --prolog=<executable>
1958 srun will run executable just before launching the job step.
1959 The command line arguments for executable will be the command
1960 and arguments of the job step. If executable is "none", then no
1961 srun prolog will be run. This parameter overrides the SrunProlog
1962 parameter in slurm.conf. This parameter is completely indepen‐
1963 dent from the Prolog parameter in slurm.conf. This option ap‐
1964 plies to job allocations.
1965
1966
1967 --propagate[=rlimit[,rlimit...]]
1968 Allows users to specify which of the modifiable (soft) resource
1969 limits to propagate to the compute nodes and apply to their
1970 jobs. If no rlimit is specified, then all resource limits will
1971 be propagated. The following rlimit names are supported by
1972 Slurm (although some options may not be supported on some sys‐
1973 tems):
1974
1975 ALL All limits listed below (default)
1976
1977 NONE No limits listed below
1978
1979 AS The maximum address space for a process
1980
1981 CORE The maximum size of core file
1982
1983 CPU The maximum amount of CPU time
1984
1985 DATA The maximum size of a process's data segment
1986
1987 FSIZE The maximum size of files created. Note that if the
1988 user sets FSIZE to less than the current size of the
1989 slurmd.log, job launches will fail with a 'File size
1990 limit exceeded' error.
1991
1992 MEMLOCK The maximum size that may be locked into memory
1993
1994 NOFILE The maximum number of open files
1995
1996 NPROC The maximum number of processes available
1997
1998 RSS The maximum resident set size
1999
2000 STACK The maximum stack size
2001
2002 This option applies to job allocations.
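
       For example, a sketch that propagates only the core file size and
       open file limits to the job (./my_app is a placeholder):

              srun --propagate=CORE,NOFILE ./my_app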
2003
2004
2005 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2006 --unbuffered. Implicitly sets --error and --output to /dev/null
2007 for all tasks except task zero, which may cause those tasks to
2008 exit immediately (e.g. shells will typically exit immediately in
2009 that situation). This option applies to step allocations.
2010
2011
2012 -q, --qos=<qos>
2013 Request a quality of service for the job. QOS values can be de‐
2014 fined for each user/cluster/account association in the Slurm
2015 database. Users will be limited to their association's defined
2016 set of qos's when the Slurm configuration parameter, Account‐
2017 ingStorageEnforce, includes "qos" in its definition. This option
2018 applies to job allocations.
2019
2020
2021 -Q, --quiet
2022 Suppress informational messages from srun. Errors will still be
2023 displayed. This option applies to job and step allocations.
2024
2025
2026 --quit-on-interrupt
2027 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2028 disables the status feature normally available when srun re‐
2029 ceives a single Ctrl-C and causes srun to instead immediately
2030 terminate the running job. This option applies to step alloca‐
2031 tions.
2032
2033
2034 -r, --relative=<n>
2035 Run a job step relative to node n of the current allocation.
2036 This option may be used to spread several job steps out among
2037 the nodes of the current job. If -r is used, the current job
2038 step will begin at node n of the allocated nodelist, where the
2039 first node is considered node 0. The -r option is not permitted
2040 with the -w or -x options and will result in a fatal error when not
2041 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2042 set). The default for n is 0. If the value of --nodes exceeds
2043 the number of nodes identified with the --relative option, a
2044 warning message will be printed and the --relative option will
2045 take precedence. This option applies to step allocations.
2046
2047
2048 --reboot
2049 Force the allocated nodes to reboot before starting the job.
2050 This is only supported with some system configurations and will
2051 otherwise be silently ignored. Only root, SlurmUser or admins
2052 can reboot nodes. This option applies to job allocations.
2053
2054
2055 --resv-ports[=count]
2056 Reserve communication ports for this job. Users can specify the
2057 number of ports they want to reserve. The parameter Mpi‐
2058 Params=ports=12000-12999 must be specified in slurm.conf. If not
2059 specified and Slurm's OpenMPI plugin is used, then by default
2060 the number of reserved ports will equal the highest number of
2061 tasks on any node in the job step allocation. If the number of
2062 reserved ports is zero then no ports are reserved. Used for OpenMPI. This
2063 option applies to job and step allocations.
2064
2065
2066 --reservation=<reservation_names>
2067 Allocate resources for the job from the named reservation. If
2068 the job can use more than one reservation, specify their names
2069 in a comma separated list and the one offering the earliest ini‐
2070 tiation will be used. Each reservation will be considered in the order it was
2071 requested. All reservations will be listed in scontrol/squeue
2072 through the life of the job. In accounting the first reserva‐
2073 tion will be seen and after the job starts the reservation used
2074 will replace it.
2075
2076
2077 -s, --oversubscribe
2078 The job allocation can over-subscribe resources with other run‐
2079 ning jobs. The resources to be over-subscribed can be nodes,
2080 sockets, cores, and/or hyperthreads depending upon configura‐
2081 tion. The default over-subscribe behavior depends on system
2082 configuration and the partition's OverSubscribe option takes
2083 precedence over the job's option. This option may result in the
2084 allocation being granted sooner than if the --oversubscribe op‐
2085 tion was not set and allow higher system utilization, but appli‐
2086 cation performance will likely suffer due to competition for re‐
2087 sources. This option applies to step allocations.
2088
2089
2090 -S, --core-spec=<num>
2091 Count of specialized cores per node reserved by the job for sys‐
2092 tem operations and not used by the application. The application
2093 will not use these cores, but will be charged for their alloca‐
2094 tion. Default value is dependent upon the node's configured
2095 CoreSpecCount value. If a value of zero is designated and the
2096 Slurm configuration option AllowSpecResourcesUsage is enabled,
2097 the job will be allowed to override CoreSpecCount and use the
2098 specialized resources on nodes it is allocated. This option can
2099 not be used with the --thread-spec option. This option applies
2100 to job allocations.
2101
2102
2103 --signal=[R:]<sig_num>[@<sig_time>]
2104 When a job is within sig_time seconds of its end time, send it
2105 the signal sig_num. Due to the resolution of event handling by
2106 Slurm, the signal may be sent up to 60 seconds earlier than
2107 specified. sig_num may either be a signal number or name (e.g.
2108 "10" or "USR1"). sig_time must have an integer value between 0
2109 and 65535. By default, no signal is sent before the job's end
2110 time. If a sig_num is specified without any sig_time, the de‐
2111 fault time will be 60 seconds. This option applies to job allo‐
2112 cations. Use the "R:" option to allow this job to overlap with
2113 a reservation with MaxStartDelay set. To have the signal sent
2114 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2115 rameter.
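
       For example, a sketch that requests SIGUSR1 be sent roughly five
       minutes before the 30 minute time limit expires (./my_app is a
       placeholder):

              srun -t 30:00 --signal=USR1@300 ./my_app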
2116
2117
2118 --slurmd-debug=<level>
2119 Specify a debug level for slurmd(8). The level may be specified
2120 either as an integer value between 0 [quiet, only errors are dis‐
2121 played] and 4 [verbose operation] or as one of the SlurmdDebug tags.
2122
2123 quiet Log nothing
2124
2125 fatal Log only fatal errors
2126
2127 error Log only errors
2128
2129 info Log errors and general informational messages
2130
2131 verbose Log errors and verbose informational messages
2132
2133
2134 The slurmd debug information is copied onto the stderr of
2135 the job. By default only errors are displayed. This option ap‐
2136 plies to job and step allocations.
2137
2138
2139 --sockets-per-node=<sockets>
2140 Restrict node selection to nodes with at least the specified
2141 number of sockets. See additional information under -B option
2142 above when task/affinity plugin is enabled. This option applies
2143 to job allocations.
2144
2145
2146 --spread-job
2147 Spread the job allocation over as many nodes as possible and at‐
2148 tempt to evenly distribute tasks across the allocated nodes.
2149 This option disables the topology/tree plugin. This option ap‐
2150 plies to job allocations.
2151
2152
2153 --switches=<count>[@<max-time>]
2154 When a tree topology is used, this defines the maximum count of
2155 switches desired for the job allocation and optionally the maxi‐
2156 mum time to wait for that number of switches. If Slurm finds an
2157 allocation containing more switches than the count specified,
2158 the job remains pending until it either finds an allocation with
2159 desired switch count or the time limit expires. If there is no
2160 switch count limit, there is no delay in starting the job. Ac‐
2161 ceptable time formats include "minutes", "minutes:seconds",
2162 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2163 "days-hours:minutes:seconds". The job's maximum time delay may
2164 be limited by the system administrator using the SchedulerParam‐
2165 eters configuration parameter with the max_switch_wait parameter
2166 option. On a dragonfly network the only switch count supported
2167 is 1 since communication performance will be highest when a job
2168 is allocated resources on one leaf switch or more than 2 leaf
2169 switches. The default max-time is the max_switch_wait Sched‐
2170 ulerParameters value. This option applies to job allocations.
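
       For example, a sketch that asks for an allocation on at most one leaf
       switch, waiting up to 30 minutes for it (./my_app is a placeholder):

              srun -N8 --switches=1@30:00 ./my_app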
2171
2172
2173 -T, --threads=<nthreads>
2174 Allows limiting the number of concurrent threads used to send
2175 the job request from the srun process to the slurmd processes on
2176 the allocated nodes. Default is to use one thread per allocated
2177 node up to a maximum of 60 concurrent threads. Specifying this
2178 option limits the number of concurrent threads to nthreads (less
2179 than or equal to 60). This should only be used to set a low
2180 thread count for testing on very small memory computers. This
2181 option applies to job allocations.
2182
2183
2184 -t, --time=<time>
2185 Set a limit on the total run time of the job allocation. If the
2186 requested time limit exceeds the partition's time limit, the job
2187 will be left in a PENDING state (possibly indefinitely). The
2188 default time limit is the partition's default time limit. When
2189 the time limit is reached, each task in each job step is sent
2190 SIGTERM followed by SIGKILL. The interval between signals is
2191 specified by the Slurm configuration parameter KillWait. The
2192 OverTimeLimit configuration parameter may permit the job to run
2193 longer than scheduled. Time resolution is one minute and second
2194 values are rounded up to the next minute.
2195
2196 A time limit of zero requests that no time limit be imposed.
2197 Acceptable time formats include "minutes", "minutes:seconds",
2198 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2199 "days-hours:minutes:seconds". This option applies to job and
2200 step allocations.
2201
2202
2203 --task-epilog=<executable>
2204 The slurmstepd daemon will run executable just after each task
2205 terminates. This will be executed before any TaskEpilog parame‐
2206 ter in slurm.conf is executed. This is meant to be a very
2207 short-lived program. If it fails to terminate within a few sec‐
2208 onds, it will be killed along with any descendant processes.
2209 This option applies to step allocations.
2210
2211
2212 --task-prolog=<executable>
2213 The slurmstepd daemon will run executable just before launching
2214 each task. This will be executed after any TaskProlog parameter
2215 in slurm.conf is executed. Besides the normal environment vari‐
2216 ables, this has SLURM_TASK_PID available to identify the process
2217 ID of the task being started. Standard output from this program
2218 of the form "export NAME=value" will be used to set environment
2219 variables for the task being spawned. This option applies to
2220 step allocations.
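
       For example, a minimal task prolog sketch that sets an environment
       variable for each spawned task (the script path, variable name and
       ./my_app are placeholders):

              cat > /tmp/task_prolog.sh <<'EOF'
              #!/bin/sh
              # Lines of the form "export NAME=value" printed on stdout
              # become environment variables of the task being spawned.
              echo "export MY_TASK_NOTE=started"
              EOF
              chmod +x /tmp/task_prolog.sh
              srun -n4 --task-prolog=/tmp/task_prolog.sh ./my_app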
2221
2222
2223 --test-only
2224 Returns an estimate of when a job would be scheduled to run
2225 given the current job queue and all the other srun arguments
2226 specifying the job. This limits srun's behavior to just return
2227 information; no job is actually submitted. The program will be
2228 executed directly by the slurmd daemon. This option applies to
2229 job allocations.
2230
2231
2232 --thread-spec=<num>
2233 Count of specialized threads per node reserved by the job for
2234 system operations and not used by the application. The applica‐
2235 tion will not use these threads, but will be charged for their
2236 allocation. This option can not be used with the --core-spec
2237 option. This option applies to job allocations.
2238
2239
2240 --threads-per-core=<threads>
2241 Restrict node selection to nodes with at least the specified
2242 number of threads per core. In task layout, use the specified
2243 maximum number of threads per core. Implies --cpu-bind=threads.
2244 NOTE: "Threads" refers to the number of processing units on each
2245 core rather than the number of application tasks to be launched
2246 per core. See additional information under -B option above when
2247 task/affinity plugin is enabled. This option applies to job and
2248 step allocations.
2249
2250
2251 --time-min=<time>
2252 Set a minimum time limit on the job allocation. If specified,
2253 the job may have its --time limit lowered to a value no lower
2254 than --time-min if doing so permits the job to begin execution
2255 earlier than otherwise possible. The job's time limit will not
2256 be changed after the job is allocated resources. This is per‐
2257 formed by a backfill scheduling algorithm to allocate resources
2258 otherwise reserved for higher priority jobs. Acceptable time
2259 formats include "minutes", "minutes:seconds", "hours:min‐
2260 utes:seconds", "days-hours", "days-hours:minutes" and
2261 "days-hours:minutes:seconds". This option applies to job alloca‐
2262 tions.
2263
2264
2265 --tmp=<size[units]>
2266 Specify a minimum amount of temporary disk space per node. De‐
2267 fault units are megabytes. Different units can be specified us‐
2268 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2269 tions.
2270
2271
2272 -u, --unbuffered
2273 By default the connection between slurmstepd and the user
2274 launched application is over a pipe. The stdio output written by
2275 the application is buffered by the glibc until it is flushed or
2276 the output is set as unbuffered. See setbuf(3). If this option
2277 is specified the tasks are executed with a pseudo terminal so
2278 that the application output is unbuffered. This option applies
2279 to step allocations.
2280
2281 --usage
2282 Display brief help message and exit.
2283
2284
2285 --uid=<user>
2286 Attempt to submit and/or run a job as user instead of the invok‐
2287 ing user id. The invoking user's credentials will be used to
2288 check access permissions for the target partition. User root may
2289 use this option to run jobs as a normal user in a RootOnly par‐
2290 tition for example. If run as root, srun will drop its permis‐
2291 sions to the uid specified after node allocation is successful.
2292 user may be the user name or numerical user ID. This option ap‐
2293 plies to job and step allocations.
2294
2295
2296 --use-min-nodes
2297 If a range of node counts is given, prefer the smaller count.
2298
2299
2300 -V, --version
2301 Display version information and exit.
2302
2303
2304 -v, --verbose
2305 Increase the verbosity of srun's informational messages. Multi‐
2306 ple -v's will further increase srun's verbosity. By default
2307 only errors will be displayed. This option applies to job and
2308 step allocations.
2309
2310
2311 -W, --wait=<seconds>
2312 Specify how long to wait after the first task terminates before
2313 terminating all remaining tasks. A value of 0 indicates an un‐
2314 limited wait (a warning will be issued after 60 seconds). The
2315 default value is set by the WaitTime parameter in the slurm con‐
2316 figuration file (see slurm.conf(5)). This option can be useful
2317 to ensure that a job is terminated in a timely fashion in the
2318 event that one or more tasks terminate prematurely. Note: The
2319 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2320 to terminate the job immediately if a task exits with a non-zero
2321 exit code. This option applies to job allocations.
2322
2323
2324 -w, --nodelist=<host1,host2,... or filename>
2325 Request a specific list of hosts. The job will contain all of
2326 these hosts and possibly additional hosts as needed to satisfy
2327 resource requirements. The list may be specified as a
2328 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2329 for example), or a filename. The host list will be assumed to
2330 be a filename if it contains a "/" character. If you specify a
2331 minimum node or processor count larger than can be satisfied by
2332 the supplied host list, additional resources will be allocated
2333 on other nodes as needed. Rather than repeating a host name
2334 multiple times, an asterisk and a repetition count may be ap‐
2335 pended to a host name. For example "host1,host1" and "host1*2"
2336 are equivalent. If the number of tasks is given and a list of
2337 requested nodes is also given, the number of nodes used from
2338 that list will be reduced to match that of the number of tasks
2339 if the number of nodes in the list is greater than the number of
2340 tasks. This option applies to job and step allocations.
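
       For example, a sketch that requests three specific hosts for a four
       task job (the host names and ./my_app are placeholders):

              srun -N3 -n4 -w "host[1-2],host5" ./my_app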
2341
2342
2343 --wckey=<wckey>
2344 Specify wckey to be used with job. If TrackWCKey=no (default)
2345 in the slurm.conf this value is ignored. This option applies to
2346 job allocations.
2347
2348
2349 -X, --disable-status
2350 Disable the display of task status when srun receives a single
2351 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2352 running job. Without this option a second Ctrl-C in one second
2353 is required to forcibly terminate the job and srun will immedi‐
2354 ately exit. May also be set via the environment variable
2355 SLURM_DISABLE_STATUS. This option applies to job allocations.
2356
2357
2358 -x, --exclude=<host1,host2,... or filename>
2359 Request that a specific list of hosts not be included in the re‐
2360 sources allocated to this job. The host list will be assumed to
2361 be a filename if it contains a "/" character. This option ap‐
2362 plies to job and step allocations.
2363
2364
2365 --x11[=<all|first|last>]
2366 Sets up X11 forwarding on all, first or last node(s) of the al‐
2367 location. This option is only enabled if Slurm was compiled with
2368 X11 support and PrologFlags=x11 is defined in the slurm.conf.
2369 Default is all.
2370
2371
2372 -Z, --no-allocate
2373 Run the specified tasks on a set of nodes without creating a
2374 Slurm "job" in the Slurm queue structure, bypassing the normal
2375 resource allocation step. The list of nodes must be specified
2376 with the -w, --nodelist option. This is a privileged option
2377 only available for the users "SlurmUser" and "root". This option
2378 applies to job allocations.
2379
2380
2381 srun will submit the job request to the slurm job controller, then ini‐
2382 tiate all processes on the remote nodes. If the request cannot be met
2383 immediately, srun will block until the resources are free to run the
2384 job. If the -I (--immediate) option is specified srun will terminate if
2385 resources are not immediately available.
2386
2387 When initiating remote processes srun will propagate the current work‐
2388 ing directory, unless --chdir=<path> is specified, in which case path
2389 will become the working directory for the remote processes.
2390
2391 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2392 cated to the job. When specifying only the number of processes to run
2393 with -n, a default of one CPU per process is allocated. By specifying
2394 the number of CPUs required per task (-c), more than one CPU may be al‐
2395 located per process. If the number of nodes is specified with -N, srun
2396 will attempt to allocate at least the number of nodes specified.
2397
2398 Combinations of the above three options may be used to change how pro‐
2399 cesses are distributed across nodes and cpus. For instance, by specify‐
2400 ing both the number of processes and number of nodes on which to run,
2401 the number of processes per node is implied. However, if the number of
2402 CPUs per process is more important, then the number of processes (-n) and
2403 the number of CPUs per process (-c) should be specified.
2404
2405 srun will refuse to allocate more than one process per CPU unless
2406 --overcommit (-O) is also specified.
2407
2408 srun will attempt to meet the above specifications "at a minimum." That
2409 is, if 16 nodes are requested for 32 processes, and some nodes do not
2410 have 2 CPUs, the allocation of nodes will be increased in order to meet
2411 the demand for CPUs. In other words, a minimum of 16 nodes are being
2412 requested. However, if 16 nodes are requested for 15 processes, srun
2413 will consider this an error, as 15 processes cannot run across 16
2414 nodes.
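
       As an illustration of the above, one possible request for 32
       processes on a minimum of 16 nodes (./my_app is a placeholder):

              srun -N16 -n32 ./my_app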
2415
2416
2417 IO Redirection
2418
2419 By default, stdout and stderr will be redirected from all tasks to the
2420 stdout and stderr of srun, and stdin will be redirected from the stan‐
2421 dard input of srun to all remote tasks. If stdin is only to be read by
2422 a subset of the spawned tasks, specifying a file to read from rather
2423 than forwarding stdin from the srun command may be preferable as it
2424 avoids moving and storing data that will never be read.
2425
2426 For OS X, the poll() function does not support stdin, so input from a
2427 terminal is not possible.
2428
2429 This behavior may be changed with the --output, --error, and --input
2430 (-o, -e, -i) options. Valid format specifications for these options are
2431
2432 all stdout and stderr are redirected from all tasks to srun. stdin is
2433 broadcast to all remote tasks. (This is the default behav‐
2434 ior)
2435
2436 none stdout and stderr are not received from any task. stdin is
2437 not sent to any task (stdin is closed).
2438
2439 taskid stdout and/or stderr are redirected from only the task with
2440 relative id equal to taskid, where 0 <= taskid < ntasks,
2441 where ntasks is the total number of tasks in the current job
2442 step. stdin is redirected from the stdin of srun to this
2443 same task. This file will be written on the node executing
2444 the task.
2445
2446 filename srun will redirect stdout and/or stderr to the named file
2447 from all tasks. stdin will be redirected from the named file
2448 and broadcast to all tasks in the job. filename refers to a
2449 path on the host that runs srun. Depending on the cluster's
2450 file system layout, this may result in the output appearing
2451 in different places depending on whether the job is run in
2452 batch mode.
2453
2454 filename pattern
2455 srun allows for a filename pattern to be used to generate the
2456 named IO file described above. The following list of format
2457 specifiers may be used in the format string to generate a
2458 filename that will be unique to a given jobid, stepid, node,
2459 or task. In each case, the appropriate number of files are
2460 opened and associated with the corresponding tasks. Note that
2461 any format string containing %t, %n, and/or %N will be writ‐
2462 ten on the node executing the task rather than the node where
2463 srun executes; these format specifiers are not supported on a
2464 BGQ system.
2465
2466 \\ Do not process any of the replacement symbols.
2467
2468 %% The character "%".
2469
2470 %A Job array's master job allocation number.
2471
2472 %a Job array ID (index) number.
2473
2474 %J jobid.stepid of the running job. (e.g. "128.0")
2475
2476 %j jobid of the running job.
2477
2478 %s stepid of the running job.
2479
2480 %N short hostname. This will create a separate IO file
2481 per node.
2482
2483 %n Node identifier relative to current job (e.g. "0" is
2484 the first node of the running job) This will create a
2485 separate IO file per node.
2486
2487 %t task identifier (rank) relative to current job. This
2488 will create a separate IO file per task.
2489
2490 %u User name.
2491
2492 %x Job name.
2493
2494 A number placed between the percent character and format
2495 specifier may be used to zero-pad the result in the IO file‐
2496 name. This number is ignored if the format specifier corre‐
2497 sponds to non-numeric data (%N for example).
2498
2499 Some examples of how the format string may be used for a 4
2500 task job step with a Job ID of 128 and step id of 0 are in‐
2501 cluded below:
2502
2503 job%J.out job128.0.out
2504
2505 job%4j.out job0128.out
2506
2507 job%j-%2t.out job128-00.out, job128-01.out, ...
2508
2510 Executing srun sends a remote procedure call to slurmctld. If enough
2511 calls from srun or other Slurm client commands that send remote proce‐
2512 dure calls to the slurmctld daemon come in at once, it can result in a
2513 degradation of performance of the slurmctld daemon, possibly resulting
2514 in a denial of service.
2515
2516 Do not run srun or other Slurm client commands that send remote proce‐
2517 dure calls to slurmctld from loops in shell scripts or other programs.
2518 Ensure that programs limit calls to srun to the minimum necessary for
2519 the information you are trying to gather.
2520
2521
2523 Some srun options may be set via environment variables. These environ‐
2524 ment variables, along with their corresponding options, are listed be‐
2525 low. Note: Command line options will always override these settings.
2526
2527 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2528 MVAPICH2) and controls the fanout of data commu‐
2529 nications. The srun command sends messages to ap‐
2530 plication programs (via the PMI library) and
2531 those applications may be called upon to forward
2532 that data to up to this number of additional
2533 tasks. Higher values offload work from the srun
2534 command to the applications and likely increase
2535 the vulnerability to failures. The default value
2536 is 32.
2537
2538 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2539 MVAPICH2) and controls the fanout of data commu‐
2540 nications. The srun command sends messages to
2541 application programs (via the PMI library) and
2542 those applications may be called upon to forward
2543 that data to additional tasks. By default, srun
2544 sends one message per host and one task on that
2545 host forwards the data to other tasks on that
2546 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2547 defined, the user task may be required to forward
2548 the data to tasks on other hosts. Setting
2549 PMI_FANOUT_OFF_HOST may increase performance.
2550 Since more work is performed by the PMI library
2551 loaded by the user application, failures also can
2552 be more common and more difficult to diagnose.
2553
2554 PMI_TIME This is used exclusively with PMI (MPICH2 and
2555 MVAPICH2) and controls how much the communica‐
2556 tions from the tasks to the srun are spread out
2557 in time in order to avoid overwhelming the srun
2558 command with work. The default value is 500 (mi‐
2559 croseconds) per task. On relatively slow proces‐
2560 sors or systems with very large processor counts
2561 (and large PMI data sets), higher values may be
2562 required.
2563
2564 SLURM_ACCOUNT Same as -A, --account
2565
2566 SLURM_ACCTG_FREQ Same as --acctg-freq
2567
2568 SLURM_BCAST Same as --bcast
2569
2570 SLURM_BURST_BUFFER Same as --bb
2571
2572 SLURM_CLUSTERS Same as -M, --clusters
2573
2574 SLURM_COMPRESS Same as --compress
2575
2576 SLURM_CONF The location of the Slurm configuration file.
2577
2578 SLURM_CONSTRAINT Same as -C, --constraint
2579
2580 SLURM_CORE_SPEC Same as --core-spec
2581
2582 SLURM_CPU_BIND Same as --cpu-bind
2583
2584 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2585
2586 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2587
2588 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2589
2590 SLURM_DEBUG Same as -v, --verbose
2591
2592 SLURM_DELAY_BOOT Same as --delay-boot
2593
2594 SLURM_DEPENDENCY Same as -P, --dependency=<jobid>
2595
2596 SLURM_DISABLE_STATUS Same as -X, --disable-status
2597
2598 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2599 tion=plane, without =<size>, is set.
2600
2601 SLURM_DISTRIBUTION Same as -m, --distribution
2602
2603 SLURM_EPILOG Same as --epilog
2604
2605 SLURM_EXACT Same as --exact
2606
2607 SLURM_EXCLUSIVE Same as --exclusive
2608
2609 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2610 error occurs (e.g. invalid options). This can be
2611 used by a script to distinguish application exit
2612 codes from various Slurm error conditions. Also
2613 see SLURM_EXIT_IMMEDIATE.
2614
2615 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2616 mediate option is used and resources are not cur‐
2617 rently available. This can be used by a script
2618 to distinguish application exit codes from vari‐
2619 ous Slurm error conditions. Also see
2620 SLURM_EXIT_ERROR.
2621
2622 SLURM_EXPORT_ENV Same as --export
2623
2624 SLURM_GPU_BIND Same as --gpu-bind
2625
2626 SLURM_GPU_FREQ Same as --gpu-freq
2627
2628 SLURM_GPUS Same as -G, --gpus
2629
2630 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2631
2632 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2633
2634 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2635
2636 SLURM_GRES_FLAGS Same as --gres-flags
2637
2638 SLURM_HINT Same as --hint
2639
2640 SLURM_IMMEDIATE Same as -I, --immediate
2641
2642 SLURM_JOB_ID Same as --jobid
2643
2644 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2645 allocation, in which case it is ignored to avoid
2646 using the batch job's name as the name of each
2647 job step.
2648
       SLURM_JOB_NODELIST    Same as -w, --nodelist=<host1,host2,... or
                             filename>. If the job has been resized, ensure
                             that this nodelist is adjusted (or undefined) to
                             avoid job steps being rejected due to down
                             nodes.
2653
2654 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2655 the job’s resource allocation.
2656
2657 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2658
2659 SLURM_LABELIO Same as -l, --label
2660
2661 SLURM_MEM_BIND Same as --mem-bind
2662
2663 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2664
2665 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2666
2667 SLURM_MEM_PER_NODE Same as --mem
2668
2669 SLURM_MPI_TYPE Same as --mpi
2670
2671 SLURM_NETWORK Same as --network
2672
2673 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2674 job’s resource allocation. See
2675 SLURM_JOB_NUM_NODES. Included for backwards com‐
2676 patibility.
2677
2678 SLURM_NO_KILL Same as -k, --no-kill
2679
2680 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2681 for backwards compatibility.
2682
2683 SLURM_NTASKS Same as -n, --ntasks
2684
2685 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2686
2687 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2688
2689 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2690
2691 SLURM_NTASKS_PER_SOCKET
2692 Same as --ntasks-per-socket
2693
2694 SLURM_OPEN_MODE Same as --open-mode
2695
2696 SLURM_OVERCOMMIT Same as -O, --overcommit
2697
2698 SLURM_OVERLAP Same as --overlap
2699
2700 SLURM_PARTITION Same as -p, --partition
2701
       SLURM_PMI_KVS_NO_DUP_KEYS
                             If set, then PMI key-pairs will contain no
                             duplicate keys. MPI can use this variable to
                             inform the PMI library that it will not use
                             duplicate keys, so PMI can skip the check for
                             duplicate keys. This is the case for MPICH2 and
                             reduces the overhead of testing for duplicates,
                             improving performance.
2710
2711 SLURM_POWER Same as --power
2712
2713 SLURM_PROFILE Same as --profile
2714
2715 SLURM_PROLOG Same as --prolog
2716
2717 SLURM_QOS Same as --qos
2718
2719 SLURM_REMOTE_CWD Same as -D, --chdir=
2720
2721 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2722 maximum count of switches desired for the job al‐
2723 location and optionally the maximum time to wait
2724 for that number of switches. See --switches
2725
2726 SLURM_RESERVATION Same as --reservation
2727
2728 SLURM_RESV_PORTS Same as --resv-ports
2729
2730 SLURM_SIGNAL Same as --signal
2731
2732 SLURM_SPREAD_JOB Same as --spread-job
2733
       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
                             If set and non-zero, successive task exit
                             messages with the same exit code will be printed
                             only once.
2738
2739 SLURM_STDERRMODE Same as -e, --error
2740
2741 SLURM_STDINMODE Same as -i, --input
2742
2743 SLURM_STDOUTMODE Same as -o, --output
2744
2745 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2746 job allocations). Also see SLURM_GRES
2747
2748 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2749 If set, only the specified node will log when the
2750 job or step are killed by a signal.
2751
2752 SLURM_TASK_EPILOG Same as --task-epilog
2753
2754 SLURM_TASK_PROLOG Same as --task-prolog
2755
2756 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2757 ecutable program along with user execute permis‐
2758 sion on the node where srun was called before at‐
2759 tempting to launch it on nodes in the step.
2760
2761 SLURM_THREAD_SPEC Same as --thread-spec
2762
2763 SLURM_THREADS Same as -T, --threads
2764
       SLURM_THREADS_PER_CORE
                             Same as --threads-per-core
2767
2768 SLURM_TIMELIMIT Same as -t, --time
2769
2770 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2771
2772 SLURM_USE_MIN_NODES Same as --use-min-nodes
2773
2774 SLURM_WAIT Same as -W, --wait
2775
2776 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2777 --switches
2778
2779 SLURM_WCKEY Same as -W, --wckey
2780
       SLURM_WORKING_DIR     Same as -D, --chdir
2782
2783 SLURMD_DEBUG Same as -d, --slurmd-debug
2784
2785 SRUN_EXPORT_ENV Same as --export, and will override any setting
2786 for SLURM_EXPORT_ENV.
2787
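       The following illustrative session (hypothetical; output not shown)
       sets options through the environment and then overrides one of them on
       the command line:

             $ export SLURM_NTASKS=4
             $ export SLURM_PARTITION=debug
             $ srun hostname      # runs 4 tasks in the "debug" partition
             $ srun -n2 hostname  # -n2 overrides SLURM_NTASKS for this run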
2788
2789
OUTPUT ENVIRONMENT VARIABLES
       srun will set some environment variables in the environment of the
       executing tasks on the remote compute nodes. These environment
       variables are listed below; a short example of inspecting them from
       within a job step follows the list.
2794
2795
2796 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2797 ment variables are set separately for each compo‐
2798 nent.
2799
2800 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2801 ing.
2802
2803 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2804 IDs or masks for this node, CPU_ID = Board_ID x
2805 threads_per_board + Socket_ID x
2806 threads_per_socket + Core_ID x threads_per_core +
2807 Thread_ID).
2808
2809 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2810
2811 SLURM_CPU_BIND_VERBOSE
2812 --cpu-bind verbosity (quiet,verbose).
2813
       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency
                             on the srun command as a numerical frequency in
                             kilohertz, or a coded value for a request of
                             low, medium, highm1 or high for the frequency.
                             See the description of the --cpu-freq option or
                             the SLURM_CPU_FREQ_REQ input environment
                             variable.
2820
2821 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2822 node. Note the select/linear plugin allocates
2823 entire nodes to jobs, so the value indicates the
2824 total count of CPUs on the node. For the se‐
2825 lect/cons_res plugin, this number indicates the
2826 number of cores on this node allocated to the
2827 job.
2828
2829 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2830 the --cpus-per-task option is specified.
2831
2832 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2833 distribution with -m, --distribution.
2834
2835 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2836 gin and comma separated.
2837
2838 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2839
       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2841
       SLURM_JOB_CPUS_PER_NODE
                             Number of CPUs per node.
2844
2845 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2846
2847 SLURM_JOB_ID Job id of the executing job.
2848
2849 SLURM_JOB_NAME Set to the value of the --job-name option or the
2850 command name when srun is used to create a new
2851 job allocation. Not set when srun is used only to
2852 create a job step (i.e. within an existing job
2853 allocation).
2854
2855 SLURM_JOB_NODELIST List of nodes allocated to the job.
2856
2857 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2858 cation.
2859
2860 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2861 ning.
2862
2863 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2864
2865 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2866 tion, if any.
2867
2868 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2869 Included for backwards compatibility.
2870
2871 SLURM_LAUNCH_NODE_IPADDR
2872 IP address of the node from which the task launch
2873 was initiated (where the srun command ran from).
2874
2875 SLURM_LOCALID Node local task ID for the process within a job.
2876
2877 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2878 masks for this node>).
2879
2880 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2881
2882 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2883 nodes).
2884
2885 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2886
2887 SLURM_MEM_BIND_VERBOSE
2888 --mem-bind verbosity (quiet,verbose).
2889
       SLURM_NODE_ALIASES    Sets of node name, communication address and
                             hostname for nodes allocated to the job from the
                             cloud. Each element in the set is colon
                             separated and each set is comma separated. For
                             example:
                             SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2896
2897 SLURM_NODEID The relative node ID of the current node.
2898
2899 SLURM_NPROCS Total number of processes in the current job or
2900 job step. See SLURM_NTASKS. Included for back‐
2901 wards compatibility.
2902
2903 SLURM_NTASKS Total number of processes in the current job or
2904 job step.
2905
2906 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2907
2908 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2909 of job submission. This value is propagated to
2910 the spawned processes.
2911
2912 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2913 rent process.
2914
2915 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2916
2917 SLURM_SRUN_COMM_PORT srun communication port.
2918
2919 SLURM_STEP_ID The step ID of the current job.
2920
2921 SLURM_STEP_LAUNCHER_PORT
2922 Step launcher port.
2923
2924 SLURM_STEP_NODELIST List of nodes allocated to the step.
2925
2926 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2927
2928 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2929 erogeneous job step.
2930
2931 SLURM_STEP_TASKS_PER_NODE
2932 Number of processes per node within the step.
2933
2934 SLURM_STEPID The step ID of the current job. See
2935 SLURM_STEP_ID. Included for backwards compatibil‐
2936 ity.
2937
2938 SLURM_SUBMIT_DIR The directory from which srun was invoked.
2939
2940 SLURM_SUBMIT_HOST The hostname of the computer from which salloc
2941 was invoked.
2942
2943 SLURM_TASK_PID The process ID of the task being started.
2944
2945 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2946 Values are comma separated and in the same order
2947 as SLURM_JOB_NODELIST. If two or more consecu‐
2948 tive nodes are to have the same task count, that
2949 count is followed by "(x#)" where "#" is the rep‐
2950 etition count. For example,
2951 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2952 first three nodes will each execute two tasks and
2953 the fourth node will execute one task.
2954
2955
       SLURM_TOPOLOGY_ADDR   This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the names of the network switches
                             which may be involved in the job's
                             communications, from the system's top level
                             switch down to the leaf switch, and ending with
                             the node name. A period is used to separate each
                             hardware component name.
2963
       SLURM_TOPOLOGY_ADDR_PATTERN
                             This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the component types listed in
                             SLURM_TOPOLOGY_ADDR. Each component will be
                             identified as either "switch" or "node". A
                             period is used to separate each hardware
                             component type.
2971
2972 SLURM_UMASK The umask in effect when the job was submitted.
2973
2974 SLURMD_NODENAME Name of the node running the task. In the case of
2975 a parallel job executing on multiple compute
2976 nodes, the various tasks will have this environ‐
2977 ment variable set to different values on each
2978 compute node.
2979
2980 SRUN_DEBUG Set to the logging level of the srun command.
2981 Default value is 3 (info level). The value is
2982 incremented or decremented based upon the --ver‐
2983 bose and --quiet options.
2984
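       As an illustrative sketch (show_env.sh is a hypothetical script and
       the exact output will vary), a few of these variables can be examined
       from within a job step:

             $ cat show_env.sh
             #!/bin/sh
             # Report which rank this task was assigned and where it runs
             echo "task $SLURM_PROCID of $SLURM_NTASKS (local ID $SLURM_LOCALID)"
             echo "  on node $SLURMD_NODENAME (node ID $SLURM_NODEID)"

             $ srun -N2 -n4 show_env.sh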
2985
SIGNALS AND ESCAPE SEQUENCES
2987 Signals sent to the srun command are automatically forwarded to the
2988 tasks it is controlling with a few exceptions. The escape sequence
2989 <control-c> will report the state of all tasks associated with the srun
2990 command. If <control-c> is entered twice within one second, then the
2991 associated SIGINT signal will be sent to all tasks and a termination
2992 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2993 spawned tasks. If a third <control-c> is received, the srun program
2994 will be terminated without waiting for remote tasks to exit or their
2995 I/O to complete.
2996
2997 The escape sequence <control-z> is presently ignored.
2998
2999
MPI SUPPORT
       MPI use depends upon the type of MPI being used. There are three
       fundamentally different modes of operation used by these various MPI
       implementations. Illustrative command sketches follow the list of
       modes below.
3004
3005 1. Slurm directly launches the tasks and performs initialization of
3006 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3007 a.out".
3008
3009 2. Slurm creates a resource allocation for the job and then mpirun
3010 launches tasks using Slurm's infrastructure (OpenMPI).
3011
       3. Slurm creates a resource allocation for the job and then mpirun
       launches tasks using some mechanism other than Slurm, such as SSH or
       RSH. These tasks are initiated outside of Slurm's monitoring or
       control. Slurm's epilog should be configured to purge these tasks when
       the job's allocation is relinquished; alternatively, the use of
       pam_slurm_adopt is highly recommended.
3018
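       The commands below are illustrative sketches only: availability of the
       pmi2 and pmix plugins depends on how Slurm and the MPI library were
       built, and a.out stands in for an MPI application.

             # Mode 1: Slurm launches the tasks and initializes
             # communications through a PMI plugin (PMIx in this sketch).
             $ srun --mpi=pmix -n16 a.out

             # Mode 2: Slurm only creates the allocation; mpirun launches
             # the tasks within it using Slurm's infrastructure.
             $ salloc -N4
             $ mpirun a.out
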
       See https://slurm.schedmd.com/mpi_guide.html for more information on
       the use of these various MPI implementations with Slurm.
3021
3022
MULTIPLE PROGRAM CONFIGURATION
3024 Comments in the configuration file must have a "#" in column one. The
3025 configuration file contains the following fields separated by white
3026 space:
3027
3028 Task rank
3029 One or more task ranks to use this configuration. Multiple val‐
3030 ues may be comma separated. Ranges may be indicated with two
3031 numbers separated with a '-' with the smaller number first (e.g.
3032 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3033 ified, specify a rank of '*' as the last line of the file. If
3034 an attempt is made to initiate a task for which no executable
              program is defined, the following error message will be
              produced: "No executable program specified for this task".
3037
       Executable
              The name of the program to execute. May be a fully qualified
              pathname if desired.
3041
3042 Arguments
3043 Program arguments. The expression "%t" will be replaced with
3044 the task's number. The expression "%o" will be replaced with
3045 the task's offset within this range (e.g. a configured task rank
3046 value of "1-5" would have offset values of "0-4"). Single
3047 quotes may be used to avoid having the enclosed values inter‐
3048 preted. This field is optional. Any arguments for the program
3049 entered on the command line will be added to the arguments spec‐
3050 ified in the configuration file.
3051
3052 For example:
3053 $ cat silly.conf
3054 ###################################################################
3055 # srun multiple program configuration file
3056 #
3057 # srun -n8 -l --multi-prog silly.conf
3058 ###################################################################
3059 4-6 hostname
3060 1,7 echo task:%t
3061 0,2-3 echo offset:%o
3062
3063 $ srun -n8 -l --multi-prog silly.conf
3064 0: offset:0
3065 1: task:1
3066 2: offset:1
3067 3: offset:2
3068 4: linux15.llnl.gov
3069 5: linux16.llnl.gov
3070 6: linux17.llnl.gov
3071 7: task:7
3072
3073
EXAMPLES
       This simple example demonstrates the execution of the command hostname
       in eight tasks. At least eight processors will be allocated to the job
       (the same as the task count) on however many nodes are required to
       satisfy the request. The output of each task will be preceded by its
       task number. (The machine "dev" in the example below has a total of
       two CPUs per node.)
3081
3082 $ srun -n8 -l hostname
3083 0: dev0
3084 1: dev0
3085 2: dev1
3086 3: dev1
3087 4: dev2
3088 5: dev2
3089 6: dev3
3090 7: dev3
3091
3092
       The srun -r option is used within a job script to run two job steps on
       disjoint nodes in the following example. The script is run in an
       interactive allocation created by salloc rather than as a batch job in
       this case.
3096
3097 $ cat test.sh
3098 #!/bin/sh
3099 echo $SLURM_JOB_NODELIST
3100 srun -lN2 -r2 hostname
3101 srun -lN2 hostname
3102
3103 $ salloc -N4 test.sh
3104 dev[7-10]
3105 0: dev9
3106 1: dev10
3107 0: dev7
3108 1: dev8
3109
3110
3111 The following script runs two job steps in parallel within an allocated
3112 set of nodes.
3113
3114 $ cat test.sh
3115 #!/bin/bash
3116 srun -lN2 -n4 -r 2 sleep 60 &
3117 srun -lN2 -r 0 sleep 60 &
3118 sleep 1
3119 squeue
3120 squeue -s
3121 wait
3122
3123 $ salloc -N4 test.sh
3124 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3125 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3126
3127 STEPID PARTITION USER TIME NODELIST
3128 65641.0 batch grondo 0:01 dev[7-8]
3129 65641.1 batch grondo 0:01 dev[9-10]
3130
3131
3132 This example demonstrates how one executes a simple MPI job. We use
3133 srun to build a list of machines (nodes) to be used by mpirun in its
3134 required format. A sample command line and the script to be executed
3135 follow.
3136
3137 $ cat test.sh
3138 #!/bin/sh
3139 MACHINEFILE="nodes.$SLURM_JOB_ID"
3140
3141 # Generate Machinefile for mpi such that hosts are in the same
3142 # order as if run via srun
3143 #
3144 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3145
3146 # Run using generated Machine file:
3147 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3148
3149 rm $MACHINEFILE
3150
3151 $ salloc -N2 -n4 test.sh
3152
3153
       This simple example demonstrates the execution of different programs
       on different nodes in the same srun. You can do this for any number of
       nodes or any number of programs. The executable to run on each node is
       selected by the SLURM_NODEID environment variable, which starts at 0
       and ranges up to one less than the number of nodes specified on the
       srun command line.
3159
           $ cat test.sh
           #!/bin/sh
           case $SLURM_NODEID in
3162 0) echo "I am running on "
3163 hostname ;;
3164 1) hostname
3165 echo "is where I am running" ;;
3166 esac
3167
3168 $ srun -N2 test.sh
3169 dev0
3170 is where I am running
3171 I am running on
3172 dev1
3173
3174
3175 This example demonstrates use of multi-core options to control layout
3176 of tasks. We request that four sockets per node and two cores per
3177 socket be dedicated to the job.
3178
3179 $ srun -N2 -B 4-4:2-2 a.out
3180
3181
3182 This example shows a script in which Slurm is used to provide resource
3183 management for a job by executing the various job steps as processors
3184 become available for their dedicated use.
3185
3186 $ cat my.script
3187 #!/bin/bash
3188 srun -n4 prog1 &
3189 srun -n3 prog2 &
3190 srun -n1 prog3 &
3191 srun -n1 prog4 &
3192 wait
3193
3194
       This example shows how to launch an application called "server" with
       one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another
       application called "client" with 16 tasks, 1 CPU per task (the
       default) and 1 GB of memory per task.
3199
3200 $ srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3201
3202
COPYING
3204 Copyright (C) 2006-2007 The Regents of the University of California.
3205 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3206 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3207 Copyright (C) 2010-2015 SchedMD LLC.
3208
3209 This file is part of Slurm, a resource management program. For de‐
3210 tails, see <https://slurm.schedmd.com/>.
3211
3212 Slurm is free software; you can redistribute it and/or modify it under
3213 the terms of the GNU General Public License as published by the Free
3214 Software Foundation; either version 2 of the License, or (at your op‐
3215 tion) any later version.
3216
3217 Slurm is distributed in the hope that it will be useful, but WITHOUT
3218 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3219 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3220 for more details.
3221
3222
SEE ALSO
       salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3226
3227
3228
April 2021                      Slurm Commands                        srun(1)