srun(1)                         Slurm Commands                         srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
will first create a resource allocation in which to run the parallel
job.
22
23 The following document describes the influence of various options on
24 the allocation of cpus to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
signal) of any task that exited with a signal.
The value 253 is reserved for out-of-memory errors.
33
34
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
44 Current working directory is the calling process working directory
45 unless the --chdir argument is passed, which will override the current
46 working directory.
47
48
OPTIONS
--accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu,
52 mic and nic. Multiple options may be specified. Supported
53 options include:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 m Bind each task to MICs which are closest to the allocated
59 CPUs.
60
61 n Bind each task to NICs which are closest to the allocated
62 CPUs.
63
64 v Verbose mode. Log how tasks are bound to GPU and NIC
65 devices.
66
67 This option applies to job allocations.
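
For example (a sketch assuming a GRES of type gpu is configured and
./my_app is a placeholder application), the following binds each task to
the GPUs closest to its allocated CPUs:

       srun -n4 --gres=gpu:4 --accel-bind=g ./my_app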
68
69
70 -A, --account=<account>
71 Charge resources used by this job to specified account. The
72 account is an arbitrary string. The account name may be changed
73 after job submission using the scontrol command. This option
74 applies to job allocations.
75
76
77 --acctg-freq
78 Define the job accounting and profiling sampling intervals.
79 This can be used to override the JobAcctGatherFrequency parame‐
80 ter in Slurm's configuration file, slurm.conf. The supported
format is as follows:
82
83 --acctg-freq=<datatype>=<interval>
84 where <datatype>=<interval> specifies the task sam‐
85 pling interval for the jobacct_gather plugin or a
86 sampling interval for a profiling type by the
87 acct_gather_profile plugin. Multiple, comma-sepa‐
88 rated <datatype>=<interval> intervals may be speci‐
89 fied. Supported datatypes are as follows:
90
91 task=<interval>
92 where <interval> is the task sampling inter‐
93 val in seconds for the jobacct_gather plugins
94 and for task profiling by the
95 acct_gather_profile plugin. NOTE: This fre‐
96 quency is used to monitor memory usage. If
97 memory limits are enforced the highest fre‐
98 quency a user can request is what is config‐
99 ured in the slurm.conf file. They can not
100 turn it off (=0) either.
101
102 energy=<interval>
103 where <interval> is the sampling interval in
104 seconds for energy profiling using the
105 acct_gather_energy plugin
106
107 network=<interval>
108 where <interval> is the sampling interval in
109 seconds for infiniband profiling using the
110 acct_gather_interconnect plugin.
111
112 filesystem=<interval>
113 where <interval> is the sampling interval in
114 seconds for filesystem profiling using the
115 acct_gather_filesystem plugin.
116
The default value for the task sampling interval is 30 seconds.
The default value for all other intervals is 0. An interval of 0
disables sampling of the specified type. If the task sampling
interval is 0, accounting information is collected only at job
termination (reducing Slurm interference with the job).
Smaller (non-zero) values have a greater impact upon job
performance, but a value of 30 seconds is not likely to be
noticeable for applications having less than 10,000 tasks. This
option applies to job allocations.
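
For example (assuming ./my_app is a placeholder application), the
following samples task accounting every 15 seconds and energy data every
60 seconds:

       srun -n8 --acctg-freq=task=15,energy=60 ./my_app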
128
129
-B, --extra-node-info=<sockets[:cores[:threads]]>
131 Restrict node selection to nodes with at least the specified
132 number of sockets, cores per socket and/or threads per core.
133 NOTE: These options do not specify the resource allocation size.
134 Each value specified is considered a minimum. An asterisk (*)
135 can be used as a placeholder indicating that all available
136 resources of that type are to be utilized. Values can also be
137 specified as min-max. The individual levels can also be speci‐
138 fied in separate options if desired:
139 --sockets-per-node=<sockets>
140 --cores-per-socket=<cores>
141 --threads-per-core=<threads>
142 If task/affinity plugin is enabled, then specifying an alloca‐
143 tion in this manner also sets a default --cpu-bind option of
144 threads if the -B option specifies a thread count, otherwise an
145 option of cores if a core count is specified, otherwise an
146 option of sockets. If SelectType is configured to
147 select/cons_res, it must have a parameter of CR_Core,
148 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
149 to be honored. If not specified, the scontrol show job will
150 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
151 tions. NOTE: This option is mutually exclusive with --hint,
152 --threads-per-core and --ntasks-per-core.
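
For example (assuming ./my_app is a placeholder application), either of
the following restricts selection to nodes with at least two sockets and
four cores per socket:

       srun -N2 -B 2:4 ./my_app
       srun -N2 --sockets-per-node=2 --cores-per-socket=4 ./my_app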
153
154
155 --bb=<spec>
156 Burst buffer specification. The form of the specification is
157 system dependent. Also see --bbf. This option applies to job
158 allocations.
159
160
161 --bbf=<file_name>
162 Path of file containing burst buffer specification. The form of
163 the specification is system dependent. Also see --bb. This
164 option applies to job allocations.
165
166
167 --bcast[=<dest_path>]
168 Copy executable file to allocated compute nodes. If a file name
169 is specified, copy the executable to the specified destination
170 file path. If the path specified ends with '/' it is treated as
171 a target directory, and the destination file name will be
172 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
173 specified, then the current working directory is used, and the
174 filename follows the above pattern. For example, "srun
175 --bcast=/tmp/mine -N3 a.out" will copy the file "a.out" from
176 your current directory to the file "/tmp/mine" on each of the
177 three allocated compute nodes and execute that file. This option
178 applies to step allocations.
179
180
181 -b, --begin=<time>
182 Defer initiation of this job until the specified time. It
183 accepts times of the form HH:MM:SS to run a job at a specific
184 time of day (seconds are optional). (If that time is already
185 past, the next day is assumed.) You may also specify midnight,
186 noon, fika (3 PM) or teatime (4 PM) and you can have a
187 time-of-day suffixed with AM or PM for running in the morning or
188 the evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
190 Combine date and time using the following format
191 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
192 count time-units, where the time-units can be seconds (default),
193 minutes, hours, days, or weeks and you can tell Slurm to run the
194 job today with the keyword today and to run the job tomorrow
195 with the keyword tomorrow. The value may be changed after job
196 submission using the scontrol command. For example:
197 --begin=16:00
198 --begin=now+1hour
199 --begin=now+60 (seconds by default)
200 --begin=2010-01-20T12:34:00
201
202
203 Notes on date/time specifications:
204 - Although the 'seconds' field of the HH:MM:SS time specifica‐
205 tion is allowed by the code, note that the poll time of the
206 Slurm scheduler is not precise enough to guarantee dispatch of
207 the job on the exact second. The job will be eligible to start
208 on the next poll following the specified time. The exact poll
209 interval depends on the Slurm scheduler (e.g., 60 seconds with
210 the default sched/builtin).
211 - If no time (HH:MM:SS) is specified, the default is
212 (00:00:00).
213 - If a date is specified without a year (e.g., MM/DD) then the
214 current year is assumed, unless the combination of MM/DD and
215 HH:MM:SS has already passed for that year, in which case the
216 next year is used.
217 This option applies to job allocations.
218
219
220 --cluster-constraint=<list>
221 Specifies features that a federated cluster must have to have a
222 sibling job submitted to it. Slurm will attempt to submit a sib‐
223 ling job to a cluster if it has at least one of the specified
224 features.
225
226
227 --comment=<string>
228 An arbitrary comment. This option applies to job allocations.
229
230
231 --compress[=type]
232 Compress file before sending it to compute hosts. The optional
233 argument specifies the data compression library to be used.
234 Supported values are "lz4" (default) and "zlib". Some compres‐
235 sion libraries may be unavailable on some systems. For use with
236 the --bcast option. This option applies to step allocations.
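
For example (assuming ./my_app is a placeholder application), the
executable can be compressed with lz4 while being broadcast to the
allocated nodes:

       srun -N4 --bcast=/tmp/my_app --compress=lz4 ./my_app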
237
238
239 -C, --constraint=<list>
240 Nodes can have features assigned to them by the Slurm adminis‐
241 trator. Users can specify which of these features are required
242 by their job using the constraint option. Only nodes having
243 features matching the job constraints will be used to satisfy
244 the request. Multiple constraints may be specified with AND,
245 OR, matching OR, resource counts, etc. (some operators are not
246 supported on all system types). Supported constraint options
247 include:
248
249 Single Name
250 Only nodes which have the specified feature will be used.
251 For example, --constraint="intel"
252
253 Node Count
254 A request can specify the number of nodes needed with
255 some feature by appending an asterisk and count after the
256 feature name. For example, --nodes=16 --con‐
257 straint="graphics*4 ..." indicates that the job requires
258 16 nodes and that at least four of those nodes must have
259 the feature "graphics."
260
AND    Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"
264
OR     Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR opera‐
tor. For example, --constraint="intel|amd"
268
269 Matching OR
270 If only one of a set of possible options should be used
271 for all allocated nodes, then use the OR operator and
272 enclose the options within square brackets. For example,
273 --constraint="[rack1|rack2|rack3|rack4]" might be used to
274 specify that all nodes must be allocated on a single rack
275 of the cluster, but any of those four racks can be used.
276
277 Multiple Counts
278 Specific counts of multiple resources may be specified by
279 using the AND operator and enclosing the options within
280 square brackets. For example, --con‐
281 straint="[rack1*2&rack2*4]" might be used to specify that
282 two nodes must be allocated from nodes with the feature
283 of "rack1" and four nodes must be allocated from nodes
284 with the feature "rack2".
285
286 NOTE: This construct does not support multiple Intel KNL
287 NUMA or MCDRAM modes. For example, while --con‐
288 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
289 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
290 Specification of multiple KNL modes requires the use of a
291 heterogeneous job.
292
293 Brackets
294 Brackets can be used to indicate that you are looking for
295 a set of nodes with the different requirements contained
296 within the brackets. For example, --con‐
297 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
298 node with either the "rack1" or "rack2" features and two
299 nodes with the "rack3" feature. The same request without
300 the brackets will try to find a single node that meets
301 those requirements.
302
Parentheses
Parentheses can be used to group like node features
305 together. For example, --con‐
306 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
307 specify that four nodes with the features "knl", "snc4"
308 and "flat" plus one node with the feature "haswell" are
309 required. All options within parenthesis should be
310 grouped with AND (e.g. "&") operands.
311
312 WARNING: When srun is executed from within salloc or sbatch, the
313 constraint value can only contain a single feature name. None of
314 the other operators are currently supported for job steps.
315 This option applies to job and step allocations.
316
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326
327 --cores-per-socket=<cores>
328 Restrict node selection to nodes with at least the specified
329 number of cores per socket. See additional information under -B
330 option above when task/affinity plugin is enabled. This option
331 applies to job allocations.
332
333
334 --cpu-bind=[{quiet,verbose},]type
335 Bind tasks to CPUs. Used only when the task/affinity or
336 task/cgroup plugin is enabled. NOTE: To have Slurm always
337 report on the selected CPU binding for all commands executed in
338 a shell, you can enable verbose mode by setting the
339 SLURM_CPU_BIND environment variable value to "verbose".
340
341 The following informational environment variables are set when
342 --cpu-bind is in use:
343 SLURM_CPU_BIND_VERBOSE
344 SLURM_CPU_BIND_TYPE
345 SLURM_CPU_BIND_LIST
346
347 See the ENVIRONMENT VARIABLES section for a more detailed
348 description of the individual SLURM_CPU_BIND variables. These
variables are available only if the task/affinity plugin is
configured.
351
352 When using --cpus-per-task to run multithreaded tasks, be aware
353 that CPU binding is inherited from the parent of the process.
354 This means that the multithreaded task should either specify or
355 clear the CPU binding itself to avoid having all threads of the
356 multithreaded task use the same mask/CPU as the parent. Alter‐
357 natively, fat masks (masks which specify more than one allowed
358 CPU) could be used for the tasks in order to provide multiple
359 CPUs for the multithreaded tasks.
360
361 Note that a job step can be allocated different numbers of CPUs
362 on each node or be allocated CPUs not starting at location zero.
363 Therefore one of the options which automatically generate the
364 task binding is recommended. Explicitly specified masks or
365 bindings are only honored when the job step has been allocated
366 every available CPU on the node.
367
368 Binding a task to a NUMA locality domain means to bind the task
369 to the set of CPUs that belong to the NUMA locality domain or
370 "NUMA node". If NUMA locality domain options are used on sys‐
371 tems with no NUMA support, then each socket is considered a
372 locality domain.
373
374 If the --cpu-bind option is not used, the default binding mode
375 will depend upon Slurm's configuration and the step's resource
376 allocation. If all allocated nodes have the same configured
377 CpuBind mode, that will be used. Otherwise if the job's Parti‐
378 tion has a configured CpuBind mode, that will be used. Other‐
379 wise if Slurm has a configured TaskPluginParam value, that mode
380 will be used. Otherwise automatic binding will be performed as
381 described below.
382
383
384 Auto Binding
385 Applies only when task/affinity is enabled. If the job
386 step allocation includes an allocation with a number of
387 sockets, cores, or threads equal to the number of tasks
388 times cpus-per-task, then the tasks will by default be
389 bound to the appropriate resources (auto binding). Dis‐
390 able this mode of operation by explicitly setting
391 "--cpu-bind=none". Use TaskPluginParam=auto‐
392 bind=[threads|cores|sockets] to set a default cpu binding
393 in case "auto binding" doesn't find a match.
394
395 Supported options include:
396
397 q[uiet]
398 Quietly bind before task runs (default)
399
400 v[erbose]
401 Verbosely report binding before task runs
402
403 no[ne] Do not bind tasks to CPUs (default unless auto
404 binding is applied)
405
406 rank Automatically bind by task rank. The lowest num‐
407 bered task on each node is bound to socket (or
408 core or thread) zero, etc. Not supported unless
409 the entire node is allocated to the job.
410
411 map_cpu:<list>
412 Bind by setting CPU masks on tasks (or ranks) as
413 specified where <list> is
414 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
415 IDs are interpreted as decimal values unless they
are preceded with '0x' in which case they are
interpreted as hexadecimal values. If the number of
418 tasks (or ranks) exceeds the number of elements in
419 this list, elements in the list will be reused as
420 needed starting from the beginning of the list.
421 To simplify support for large task counts, the
422 lists may follow a map with an asterisk and repe‐
423 tition count. For example
424 "map_cpu:0x0f*4,0xf0*4". Not supported unless the
425 entire node is allocated to the job.
426
427 mask_cpu:<list>
428 Bind by setting CPU masks on tasks (or ranks) as
429 specified where <list> is
430 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
431 The mapping is specified for a node and identical
432 mapping is applied to the tasks on every node
433 (i.e. the lowest task ID on each node is mapped to
434 the first mask specified in the list, etc.). CPU
435 masks are always interpreted as hexadecimal values
436 but can be preceded with an optional '0x'. If the
437 number of tasks (or ranks) exceeds the number of
438 elements in this list, elements in the list will
439 be reused as needed starting from the beginning of
440 the list. To simplify support for large task
441 counts, the lists may follow a map with an aster‐
442 isk and repetition count. For example
443 "mask_cpu:0x0f*4,0xf0*4". Not supported unless
444 the entire node is allocated to the job.
445
446 rank_ldom
447 Bind to a NUMA locality domain by rank. Not sup‐
448 ported unless the entire node is allocated to the
449 job.
450
451 map_ldom:<list>
452 Bind by mapping NUMA locality domain IDs to tasks
453 as specified where <list> is
454 <ldom1>,<ldom2>,...<ldomN>. The locality domain
455 IDs are interpreted as decimal values unless they
456 are preceded with '0x' in which case they are
457 interpreted as hexadecimal values. Not supported
458 unless the entire node is allocated to the job.
459
460 mask_ldom:<list>
461 Bind by setting NUMA locality domain masks on
462 tasks as specified where <list> is
463 <mask1>,<mask2>,...<maskN>. NUMA locality domain
464 masks are always interpreted as hexadecimal values
465 but can be preceded with an optional '0x'. Not
466 supported unless the entire node is allocated to
467 the job.
468
469 sockets
470 Automatically generate masks binding tasks to
471 sockets. Only the CPUs on the socket which have
472 been allocated to the job will be used. If the
473 number of tasks differs from the number of allo‐
474 cated sockets this can result in sub-optimal bind‐
475 ing.
476
477 cores Automatically generate masks binding tasks to
478 cores. If the number of tasks differs from the
479 number of allocated cores this can result in
480 sub-optimal binding.
481
482 threads
483 Automatically generate masks binding tasks to
484 threads. If the number of tasks differs from the
485 number of allocated threads this can result in
486 sub-optimal binding.
487
488 ldoms Automatically generate masks binding tasks to NUMA
489 locality domains. If the number of tasks differs
490 from the number of allocated locality domains this
491 can result in sub-optimal binding.
492
493 boards Automatically generate masks binding tasks to
494 boards. If the number of tasks differs from the
495 number of allocated boards this can result in
496 sub-optimal binding. This option is supported by
497 the task/cgroup plugin only.
498
499 help Show help message for cpu-bind
500
501 This option applies to job and step allocations.
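
For example (assuming ./my_app is a placeholder application), the first
command binds one task per core and reports the binding, while the second
applies explicit per-node CPU masks:

       srun -n8 --cpu-bind=verbose,cores ./my_app
       srun -n2 --cpu-bind=mask_cpu:0x0f,0xf0 ./my_app

As noted above, the explicit mask form is only honored when the step has
been allocated every available CPU on the node.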
502
503
--cpu-freq=<p1[-p2[:p3]]>
505
506 Request that the job step initiated by this srun command be run
507 at some requested frequency if possible, on the CPUs selected
508 for the step on the compute node(s).
509
510 p1 can be [#### | low | medium | high | highm1] which will set
511 the frequency scaling_speed to the corresponding value, and set
512 the frequency scaling_governor to UserSpace. See below for defi‐
513 nition of the values.
514
515 p1 can be [Conservative | OnDemand | Performance | PowerSave]
516 which will set the scaling_governor to the corresponding value.
517 The governor has to be in the list set by the slurm.conf option
518 CpuFreqGovernors.
519
520 When p2 is present, p1 will be the minimum scaling frequency and
521 p2 will be the maximum scaling frequency.
522
p2 can be [#### | medium | high | highm1]. p2 must be greater
524 than p1.
525
526 p3 can be [Conservative | OnDemand | Performance | PowerSave |
527 UserSpace] which will set the governor to the corresponding
528 value.
529
530 If p3 is UserSpace, the frequency scaling_speed will be set by a
531 power or energy aware scheduling strategy to a value between p1
532 and p2 that lets the job run within the site's power goal. The
533 job may be delayed if p1 is higher than a frequency that allows
534 the job to run within the goal.
535
536 If the current frequency is < min, it will be set to min. Like‐
537 wise, if the current frequency is > max, it will be set to max.
538
539 Acceptable values at present include:
540
541 #### frequency in kilohertz
542
543 Low the lowest available frequency
544
545 High the highest available frequency
546
547 HighM1 (high minus one) will select the next highest
548 available frequency
549
550 Medium attempts to set a frequency in the middle of the
551 available range
552
553 Conservative attempts to use the Conservative CPU governor
554
555 OnDemand attempts to use the OnDemand CPU governor (the
556 default value)
557
558 Performance attempts to use the Performance CPU governor
559
560 PowerSave attempts to use the PowerSave CPU governor
561
562 UserSpace attempts to use the UserSpace CPU governor
563
564
The following informational environment variable is set in the
job step when the --cpu-freq option is requested.
568 SLURM_CPU_FREQ_REQ
569
570 This environment variable can also be used to supply the value
571 for the CPU frequency request if it is set when the 'srun' com‐
572 mand is issued. The --cpu-freq on the command line will over‐
573 ride the environment variable value. The form on the environ‐
574 ment variable is the same as the command line. See the ENVIRON‐
575 MENT VARIABLES section for a description of the
576 SLURM_CPU_FREQ_REQ variable.
577
578 NOTE: This parameter is treated as a request, not a requirement.
579 If the job step's node does not support setting the CPU fre‐
580 quency, or the requested value is outside the bounds of the
581 legal frequencies, an error is logged, but the job step is
582 allowed to continue.
583
584 NOTE: Setting the frequency for just the CPUs of the job step
585 implies that the tasks are confined to those CPUs. If task con‐
586 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
587 gin=task/cgroup with the "ConstrainCores" option) is not config‐
588 ured, this parameter is ignored.
589
590 NOTE: When the step completes, the frequency and governor of
591 each selected CPU is reset to the previous values.
592
NOTE: Submitting jobs with the --cpu-freq option with linuxproc
as the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result not
all of the accounting information will be present.
597
598 This option applies to job and step allocations.
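
For example (assuming ./my_app is a placeholder application and the
requested governor is listed in CpuFreqGovernors), a step can request a
fixed 2.4 GHz frequency, or a 1.6-2.4 GHz range managed by the OnDemand
governor:

       srun -n16 --cpu-freq=2400000 ./my_app
       srun -n16 --cpu-freq=1600000-2400000:OnDemand ./my_app

Numeric values are in kilohertz and, as noted above, are treated as
requests rather than requirements.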
599
600
601 --cpus-per-gpu=<ncpus>
602 Advise Slurm that ensuing job steps will require ncpus proces‐
603 sors per allocated GPU. Not compatible with the --cpus-per-task
604 option.
605
606
607 -c, --cpus-per-task=<ncpus>
608 Request that ncpus be allocated per process. This may be useful
609 if the job is multithreaded and requires more than one CPU per
610 task for optimal performance. The default is one CPU per
611 process. If -c is specified without -n, as many tasks will be
612 allocated per node as possible while satisfying the -c restric‐
613 tion. For instance on a cluster with 8 CPUs per node, a job
614 request for 4 nodes and 3 CPUs per task may be allocated 3 or 6
615 CPUs per node (1 or 2 tasks per node) depending upon resource
616 consumption by other jobs. Such a job may be unable to execute
617 more than a total of 4 tasks.
618
619 WARNING: There are configurations and options interpreted dif‐
620 ferently by job and job step requests which can result in incon‐
621 sistencies for this option. For example srun -c2
622 --threads-per-core=1 prog may allocate two cores for the job,
623 but if each of those cores contains two threads, the job alloca‐
624 tion will include four CPUs. The job step allocation will then
625 launch two threads per CPU for a total of two tasks.
626
627 WARNING: When srun is executed from within salloc or sbatch,
628 there are configurations and options which can result in incon‐
629 sistent allocations when -c has a value greater than -c on sal‐
630 loc or sbatch.
631
632 This option applies to job allocations.
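
For example (assuming ./my_threaded_app is a placeholder multithreaded
application), the following launches four tasks on one node with two CPUs
available to each task:

       srun -N1 -n4 -c2 ./my_threaded_app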
633
634
635 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
637 (start > (deadline - time[-min])). Default is no deadline.
638 Valid time formats are:
639 HH:MM[:SS] [AM|PM]
640 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
641 MM/DD[/YY]-HH:MM[:SS]
642 YYYY-MM-DD[THH:MM[:SS]]]
643 now[+count[seconds(default)|minutes|hours|days|weeks]]
644
645 This option applies only to job allocations.
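
For example (assuming ./my_app is a placeholder application), a job with
a 30-minute time limit that must be able to finish within the next four
hours could be requested with:

       srun -n1 -t 30 --deadline=now+4hours ./my_app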
646
647
648 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
650 specification if the job has been eligible to run for less than
651 this time period. If the job has waited for less than the spec‐
652 ified period, it will use only nodes which already have the
653 specified features. The argument is in units of minutes. A
654 default value may be set by a system administrator using the
655 delay_boot option of the SchedulerParameters configuration
656 parameter in the slurm.conf file, otherwise the default value is
657 zero (no delay).
658
659 This option applies only to job allocations.
660
661
662 -d, --dependency=<dependency_list>
Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job steps
(executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
667 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
668 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
669 must be satisfied if the "," separator is used. Any dependency
670 may be satisfied if the "?" separator is used. Only one separa‐
671 tor may be used. Many jobs can share the same dependency and
672 these jobs may even belong to different users. The value may
673 be changed after job submission using the scontrol command.
674 Dependencies on remote jobs are allowed in a federation. Once a
675 job dependency fails due to the termination state of a preceding
676 job, the dependent job will never be run, even if the preceding
677 job is requeued and has a different termination state in a sub‐
678 sequent execution. This option applies to job allocations.
679
680 after:job_id[[+time][:jobid[+time]...]]
681 After the specified jobs start or are cancelled and
682 'time' in minutes from job start or cancellation happens,
683 this job can begin execution. If no 'time' is given then
684 there is no delay after start or cancellation.
685
686 afterany:job_id[:jobid...]
687 This job can begin execution after the specified jobs
688 have terminated.
689
690 afterburstbuffer:job_id[:jobid...]
691 This job can begin execution after the specified jobs
692 have terminated and any associated burst buffer stage out
693 operations have completed.
694
695 aftercorr:job_id[:jobid...]
696 A task of this job array can begin execution after the
697 corresponding task ID in the specified job has completed
698 successfully (ran to completion with an exit code of
699 zero).
700
701 afternotok:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have terminated in some failed state (non-zero exit code,
704 node failure, timed out, etc).
705
706 afterok:job_id[:jobid...]
707 This job can begin execution after the specified jobs
708 have successfully executed (ran to completion with an
709 exit code of zero).
710
711 expand:job_id
712 Resources allocated to this job should be used to expand
713 the specified job. The job to expand must share the same
714 QOS (Quality of Service) and partition. Gang scheduling
715 of resources in the partition is also not supported.
716 "expand" is not allowed for jobs that didn't originate on
717 the same cluster as the submitted job.
718
719 singleton
720 This job can begin execution after any previously
721 launched jobs sharing the same job name and user have
722 terminated. In other words, only one job by that name
723 and owned by that user can be running or suspended at any
724 point in time. In a federation, a singleton dependency
725 must be fulfilled on all clusters unless DependencyParam‐
726 eters=disable_remote_singleton is used in slurm.conf.
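
For example (assuming 12345 is the id of a previously submitted job and
./postprocess is a placeholder application), the following job starts
only after that job completes successfully:

       srun -n1 --dependency=afterok:12345 ./postprocess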
727
728
729 -D, --chdir=<path>
730 Have the remote processes do a chdir to path before beginning
731 execution. The default is to chdir to the current working direc‐
732 tory of the srun process. The path can be specified as full path
733 or relative path to the directory where the command is executed.
734 This option applies to job allocations.
735
736
737 -e, --error=<filename pattern>
738 Specify how stderr is to be redirected. By default in interac‐
739 tive mode, srun redirects stderr to the same file as stdout, if
740 one is specified. The --error option is provided to allow stdout
741 and stderr to be redirected to different locations. See IO Re‐
742 direction below for more options. If the specified file already
743 exists, it will be overwritten. This option applies to job and
744 step allocations.
745
746
747 -E, --preserve-env
748 Pass the current values of environment variables SLURM_JOB_NODES
749 and SLURM_NTASKS through to the executable, rather than comput‐
750 ing them from commandline parameters. This option applies to job
751 allocations.
752
753
754 --exact
755 Allow a step access to only the resources requested for the
756 step. By default, all non-GRES resources on each node in the
757 step allocation will be used. Note that no other parallel step
758 will have access to those resources unless --overlap is speci‐
759 fied. This option applies to step allocations.
760
761
762 --epilog=<executable>
763 srun will run executable just after the job step completes. The
764 command line arguments for executable will be the command and
765 arguments of the job step. If executable is "none", then no
766 srun epilog will be run. This parameter overrides the SrunEpilog
767 parameter in slurm.conf. This parameter is completely indepen‐
768 dent from the Epilog parameter in slurm.conf. This option
769 applies to job allocations.
770
771
772
773 --exclusive[=user|mcs]
774 This option applies to job and job step allocations, and has two
775 slightly different meanings for each one. When used to initiate
776 a job, the job allocation cannot share nodes with other running
777 jobs (or just other users with the "=user" option or "=mcs"
778 option). The default shared/exclusive behavior depends on sys‐
779 tem configuration and the partition's OverSubscribe option takes
780 precedence over the job's option.
781
782 This option can also be used when initiating more than one job
783 step within an existing resource allocation (default), where you
784 want separate processors to be dedicated to each job step. If
785 sufficient processors are not available to initiate the job
786 step, it will be deferred. This can be thought of as providing a
787 mechanism for resource management to the job within its alloca‐
788 tion (--exact implied).
789
790 The exclusive allocation of CPUs applies to job steps by
791 default. In order to share the resources use the --overlap
792 option.
793
794 See EXAMPLE below.
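
As a sketch of the job step case (assuming ./step_a and ./step_b are
placeholder applications run from within an existing 8-task allocation
created by salloc or sbatch), the two steps below are given disjoint sets
of CPUs and can run concurrently:

       srun -n4 --exclusive ./step_a &
       srun -n4 --exclusive ./step_b &
       wait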
795
796
797 --export=<[ALL,]environment variables|ALL|NONE>
798 Identify which environment variables from the submission envi‐
799 ronment are propagated to the launched application.
800
801 --export=ALL
802 Default mode if --export is not specified. All of the
user's environment will be loaded from the caller's
environment.
805
806 --export=NONE
807 None of the user environment will be defined. User
808 must use absolute path to the binary to be executed
809 that will define the environment. User can not specify
810 explicit environment variables with NONE.
811 This option is particularly important for jobs that
812 are submitted on one cluster and execute on a differ‐
813 ent cluster (e.g. with different paths). To avoid
814 steps inheriting environment export settings (e.g.
NONE) from the sbatch command, either set --export=ALL
or set the environment variable SLURM_EXPORT_ENV to
ALL.
818
819 --export=<[ALL,]environment variables>
820 Exports all SLURM* environment variables along with
821 explicitly defined variables. Multiple environment
822 variable names should be comma separated. Environment
823 variable names may be specified to propagate the cur‐
824 rent value (e.g. "--export=EDITOR") or specific values
825 may be exported (e.g. "--export=EDITOR=/bin/emacs").
826 If ALL is specified, then all user environment vari‐
827 ables will be loaded and will take precedence over any
828 explicitly given environment variables.
829
830 Example: --export=EDITOR,ARG1=test
831 In this example, the propagated environment will only
832 contain the variable EDITOR from the user's environ‐
833 ment, SLURM_* environment variables, and ARG1=test.
834
835 Example: --export=ALL,EDITOR=/bin/emacs
836 There are two possible outcomes for this example. If
837 the caller has the EDITOR environment variable
838 defined, then the job's environment will inherit the
839 variable from the caller's environment. If the caller
840 doesn't have an environment variable defined for EDI‐
841 TOR, then the job's environment will use the value
842 given by --export.
843
844
845 -F, --nodefile=<node file>
846 Much like --nodelist, but the list is contained in a file of
847 name node file. The node names of the list may also span multi‐
848 ple lines in the file. Duplicate node names in the file will
849 be ignored. The order of the node names in the list is not
850 important; the node names will be sorted by Slurm.
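
For example (assuming node01 and node02 are valid node names on the
cluster), a node file can be created and used as follows:

       printf "node01\nnode02\n" > my_nodes.txt
       srun --nodefile=my_nodes.txt -N2 hostname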
851
852
853 --gid=<group>
854 If srun is run as root, and the --gid option is used, submit the
855 job with group's group access permissions. group may be the
856 group name or the numerical group ID. This option applies to job
857 allocations.
858
859
860 -G, --gpus=[<type>:]<number>
861 Specify the total number of GPUs required for the job. An
862 optional GPU type specification can be supplied. For example
863 "--gpus=volta:3". Multiple options can be requested in a comma
864 separated list, for example: "--gpus=volta:3,kepler:1". See
865 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
866 options.
867
868
869 --gpu-bind=[verbose,]<type>
870 Bind tasks to specific GPUs. By default every spawned task can
871 access every GPU allocated to the job. If "verbose," is speci‐
872 fied before <type>, then print out GPU binding information.
873
874 Supported type options:
875
876 closest Bind each task to the GPU(s) which are closest. In a
877 NUMA environment, each task may be bound to more than
878 one GPU (i.e. all GPUs in that NUMA environment).
879
880 map_gpu:<list>
881 Bind by setting GPU masks on tasks (or ranks) as spec‐
882 ified where <list> is
883 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
884 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
886 hexadecimal values. If the number of tasks (or ranks)
887 exceeds the number of elements in this list, elements
888 in the list will be reused as needed starting from the
889 beginning of the list. To simplify support for large
890 task counts, the lists may follow a map with an aster‐
891 isk and repetition count. For example
892 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
893 and ConstrainDevices is set in cgroup.conf, then the
894 GPU IDs are zero-based indexes relative to the GPUs
895 allocated to the job (e.g. the first GPU is 0, even if
896 the global ID is 3). Otherwise, the GPU IDs are global
897 IDs, and all GPUs on each node in the job should be
898 allocated for predictable binding results.
899
900 mask_gpu:<list>
901 Bind by setting GPU masks on tasks (or ranks) as spec‐
902 ified where <list> is
903 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
904 mapping is specified for a node and identical mapping
905 is applied to the tasks on every node (i.e. the lowest
906 task ID on each node is mapped to the first mask spec‐
907 ified in the list, etc.). GPU masks are always inter‐
908 preted as hexadecimal values but can be preceded with
909 an optional '0x'. To simplify support for large task
910 counts, the lists may follow a map with an asterisk
911 and repetition count. For example
912 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
913 is used and ConstrainDevices is set in cgroup.conf,
914 then the GPU IDs are zero-based indexes relative to
915 the GPUs allocated to the job (e.g. the first GPU is
916 0, even if the global ID is 3). Otherwise, the GPU IDs
917 are global IDs, and all GPUs on each node in the job
918 should be allocated for predictable binding results.
919
920 single:<tasks_per_gpu>
921 Like --gpu-bind=closest, except that each task can
922 only be bound to a single GPU, even when it can be
923 bound to multiple GPUs that are equally close. The
924 GPU to bind to is determined by <tasks_per_gpu>, where
925 the first <tasks_per_gpu> tasks are bound to the first
926 GPU available, the second <tasks_per_gpu> tasks are
927 bound to the second GPU available, etc. This is basi‐
928 cally a block distribution of tasks onto available
929 GPUs, where the available GPUs are determined by the
930 socket affinity of the task and the socket affinity of
931 the GPUs as specified in gres.conf's Cores parameter.
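
For example (a sketch assuming a GRES of type gpu is configured and
./my_gpu_app is a placeholder application), each task is bound to its
closest GPU(s) and the binding is logged:

       srun -n4 --gpus-per-node=4 --gpu-bind=verbose,closest ./my_gpu_app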
932
933
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
935 Request that GPUs allocated to the job are configured with spe‐
936 cific frequency values. This option can be used to indepen‐
937 dently configure the GPU and its memory frequencies. After the
938 job is completed, the frequencies of all affected GPUs will be
939 reset to the highest possible values. In some cases, system
940 power caps may override the requested values. The field type
941 can be "memory". If type is not specified, the GPU frequency is
942 implied. The value field can either be "low", "medium", "high",
943 "highm1" or a numeric value in megahertz (MHz). If the speci‐
944 fied numeric value is not possible, a value as close as possible
945 will be used. See below for definition of the values. The ver‐
946 bose option causes current GPU frequency information to be
947 logged. Examples of use include "--gpu-freq=medium,memory=high"
948 and "--gpu-freq=450".
949
950 Supported value definitions:
951
952 low the lowest available frequency.
953
954 medium attempts to set a frequency in the middle of the
955 available range.
956
957 high the highest available frequency.
958
959 highm1 (high minus one) will select the next highest avail‐
960 able frequency.
961
962
963 --gpus-per-node=[<type>:]<number>
964 Specify the number of GPUs required for the job on each node
965 included in the job's resource allocation. An optional GPU type
966 specification can be supplied. For example
967 "--gpus-per-node=volta:3". Multiple options can be requested in
968 a comma separated list, for example:
969 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
970 --gpus-per-socket and --gpus-per-task options.
971
972
973 --gpus-per-socket=[<type>:]<number>
974 Specify the number of GPUs required for the job on each socket
975 included in the job's resource allocation. An optional GPU type
976 specification can be supplied. For example
977 "--gpus-per-socket=volta:3". Multiple options can be requested
978 in a comma separated list, for example:
979 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
980 sockets per node count ( --sockets-per-node). See also the
981 --gpus, --gpus-per-node and --gpus-per-task options. This
982 option applies to job allocations.
983
984
985 --gpus-per-task=[<type>:]<number>
986 Specify the number of GPUs required for the job on each task to
987 be spawned in the job's resource allocation. An optional GPU
988 type specification can be supplied. For example
989 "--gpus-per-task=volta:1". Multiple options can be requested in
990 a comma separated list, for example:
991 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
992 --gpus-per-socket and --gpus-per-node options. This option
993 requires an explicit task count, e.g. -n, --ntasks or "--gpus=X
994 --gpus-per-task=Y" rather than an ambiguous range of nodes with
995 -N, --nodes.
996 NOTE: This option will not have any impact on GPU binding,
997 specifically it won't limit the number of devices set for
998 CUDA_VISIBLE_DEVICES.
999
1000
1001 --gres=<list>
1002 Specifies a comma delimited list of generic consumable
1003 resources. The format of each entry on the list is
1004 "name[[:type]:count]". The name is that of the consumable
1005 resource. The count is the number of those resources with a
1006 default value of 1. The count can have a suffix of "k" or "K"
1007 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1008 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1009 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1010 x 1024 x 1024 x 1024). The specified resources will be allo‐
1011 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
1013 of available generic consumable resources will be printed and
1014 the command will exit if the option argument is "help". Exam‐
1015 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
1016 and "--gres=help". NOTE: This option applies to job and step
1017 allocations. By default, a job step is allocated all of the
1018 generic resources that have been allocated to the job. To
1019 change the behavior so that each job step is allocated no
1020 generic resources, explicitly set the value of --gres to specify
1021 zero counts for each generic resource OR set "--gres=none" OR
1022 set the SLURM_STEP_GRES environment variable to "none".
1023
1024
1025 --gres-flags=<type>
1026 Specify generic resource task binding options. This option
1027 applies to job allocations.
1028
1029 disable-binding
1030 Disable filtering of CPUs with respect to generic
1031 resource locality. This option is currently required to
1032 use more CPUs than are bound to a GRES (i.e. if a GPU is
1033 bound to the CPUs on one socket, but resources on more
1034 than one socket are required to run the job). This
1035 option may permit a job to be allocated resources sooner
1036 than otherwise possible, but may result in lower job per‐
1037 formance.
1038 NOTE: This option is specific to SelectType=cons_res.
1039
1040 enforce-binding
1041 The only CPUs available to the job will be those bound to
1042 the selected GRES (i.e. the CPUs identified in the
1043 gres.conf file will be strictly enforced). This option
1044 may result in delayed initiation of a job. For example a
1045 job requiring two GPUs and one CPU will be delayed until
1046 both GPUs on a single socket are available rather than
1047 using GPUs bound to separate sockets, however, the appli‐
1048 cation performance may be improved due to improved commu‐
1049 nication speed. Requires the node to be configured with
1050 more than one socket and resource filtering will be per‐
1051 formed on a per-socket basis.
1052 NOTE: This option is specific to SelectType=cons_tres.
1053
1054
1055 -H, --hold
1056 Specify the job is to be submitted in a held state (priority of
1057 zero). A held job can now be released using scontrol to reset
1058 its priority (e.g. "scontrol release <job_id>"). This option
1059 applies to job allocations.
1060
1061
1062 -h, --help
1063 Display help information and exit.
1064
1065
1066 --hint=<type>
1067 Bind tasks according to application hints.
1068 NOTE: This option cannot be used in conjunction with any of
1069 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1070 --cpu-bind=verbose) or -B. If --hint is specified as a command
1071 line argument, it will take precedence over the environment.
1072
1073 compute_bound
1074 Select settings for compute bound applications: use all
1075 cores in each socket, one thread per core.
1076
1077 memory_bound
1078 Select settings for memory bound applications: use only
1079 one core in each socket, one thread per core.
1080
1081 [no]multithread
1082 [don't] use extra threads with in-core multi-threading
1083 which can benefit communication intensive applications.
1084 Only supported with the task/affinity plugin.
1085
1086 help show this help message
1087
1088 This option applies to job allocations.
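
For example (assuming ./my_app is a placeholder application), the
following binds tasks using the settings recommended for compute bound
applications:

       srun -n8 --hint=compute_bound ./my_app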
1089
1090
1091 -I, --immediate[=<seconds>]
1092 exit if resources are not available within the time period spec‐
1093 ified. If no argument is given (seconds defaults to 1),
1094 resources must be available immediately for the request to suc‐
1095 ceed. If defer is configured in SchedulerParameters and sec‐
1096 onds=1 the allocation request will fail immediately; defer con‐
1097 flicts and takes precedence over this option. By default,
1098 --immediate is off, and the command will block until resources
1099 become available. Since this option's argument is optional, for
1100 proper parsing the single letter option must be followed immedi‐
1101 ately with the value and not include a space between them. For
1102 example "-I60" and not "-I 60". This option applies to job and
1103 step allocations.
1104
1105
1106 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
stdin from the terminal to all tasks. See IO Redirection below for
1109 more options. For OS X, the poll() function does not support
1110 stdin, so input from a terminal is not possible. This option
1111 applies to job and step allocations.
1112
1113
1114 -J, --job-name=<jobname>
1115 Specify a name for the job. The specified name will appear along
1116 with the job id number when querying running jobs on the system.
1117 The default is the supplied executable program's name. NOTE:
1118 This information may be written to the slurm_jobacct.log file.
This file is space delimited so if a space is used in the job
name it will cause problems in properly displaying the con‐
1121 tents of the slurm_jobacct.log file when the sacct command is
1122 used. This option applies to job and step allocations.
1123
1124
1125 --jobid=<jobid>
Initiate a job step under an already allocated job with job id
<jobid>. Using this option will cause srun to behave exactly as if
1128 the SLURM_JOB_ID environment variable was set. This option
1129 applies to step allocations.
1130
1131
1132 -K, --kill-on-bad-exit[=0|1]
1133 Controls whether or not to terminate a step if any task exits
1134 with a non-zero exit code. If this option is not specified, the
1135 default action will be based upon the Slurm configuration param‐
1136 eter of KillOnBadExit. If this option is specified, it will take
1137 precedence over KillOnBadExit. An option argument of zero will
1138 not terminate the job. A non-zero argument or no argument will
1139 terminate the job. Note: This option takes precedence over the
1140 -W, --wait option to terminate the job immediately if a task
1141 exits with a non-zero exit code. Since this option's argument
1142 is optional, for proper parsing the single letter option must be
1143 followed immediately with the value and not include a space
1144 between them. For example "-K1" and not "-K 1".
1145
1146
1147 -k, --no-kill [=off]
1148 Do not automatically terminate a job if one of the nodes it has
1149 been allocated fails. This option applies to job and step allo‐
1150 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1152 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1153 --wait options will have no effect upon the job step). The
1154 active job step (MPI job) will likely suffer a fatal error, but
1155 subsequent job steps may be run if this option is specified.
1156
Specify an optional argument of "off" to disable the effect of the
1158 SLURM_NO_KILL environment variable.
1159
1160 The default action is to terminate the job upon node failure.
1161
1162
1163 -l, --label
1164 Prepend task number to lines of stdout/err. The --label option
1165 will prepend lines of output with the remote task id. This
1166 option applies to step allocations.
1167
1168
1169 -L, --licenses=<license>
1170 Specification of licenses (or other resources available on all
1171 nodes of the cluster) which must be allocated to this job.
1172 License names can be followed by a colon and count (the default
1173 count is one). Multiple license names should be comma separated
1174 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1175 cations.
1176
1177
1178 -M, --clusters=<string>
1179 Clusters to issue commands to. Multiple cluster names may be
1180 comma separated. The job will be submitted to the one cluster
1181 providing the earliest expected job initiation time. The default
1182 value is the current cluster. A value of 'all' will query to run
1183 on all clusters. Note the --export option to control environ‐
1184 ment variables exported between clusters. This option applies
1185 only to job allocations. Note that the SlurmDBD must be up for
1186 this option to work properly.
1187
1188
-m, --distribution=*|block|cyclic|arbitrary|plane=<options>
[:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]][,Pack|NoPack]
1193
1194 Specify alternate distribution methods for remote processes.
1195 This option controls the distribution of tasks to the nodes on
1196 which resources have been allocated, and the distribution of
1197 those resources to tasks for binding (task affinity). The first
1198 distribution method (before the first ":") controls the distri‐
1199 bution of tasks to nodes. The second distribution method (after
1200 the first ":") controls the distribution of allocated CPUs
1201 across sockets for binding to tasks. The third distribution
1202 method (after the second ":") controls the distribution of allo‐
1203 cated CPUs across cores for binding to tasks. The second and
1204 third distributions apply only if task affinity is enabled. The
1205 third distribution is supported only if the task/cgroup plugin
1206 is configured. The default value for each distribution type is
1207 specified by *.
1208
1209 Note that with select/cons_res and select/cons_tres, the number
1210 of CPUs allocated to each socket and node may be different.
1211 Refer to https://slurm.schedmd.com/mc_support.html for more
1212 information on resource allocation, distribution of tasks to
1213 nodes, and binding of tasks to CPUs.
1214 First distribution method (distribution of tasks across nodes):
1215
1216
1217 * Use the default method for distributing tasks to nodes
1218 (block).
1219
1220 block The block distribution method will distribute tasks to a
1221 node such that consecutive tasks share a node. For exam‐
1222 ple, consider an allocation of three nodes each with two
1223 cpus. A four-task block distribution request will dis‐
1224 tribute those tasks to the nodes with tasks one and two
1225 on the first node, task three on the second node, and
1226 task four on the third node. Block distribution is the
1227 default behavior if the number of tasks exceeds the num‐
1228 ber of allocated nodes.
1229
1230 cyclic The cyclic distribution method will distribute tasks to a
1231 node such that consecutive tasks are distributed over
1232 consecutive nodes (in a round-robin fashion). For exam‐
1233 ple, consider an allocation of three nodes each with two
1234 cpus. A four-task cyclic distribution request will dis‐
1235 tribute those tasks to the nodes with tasks one and four
1236 on the first node, task two on the second node, and task
1237 three on the third node. Note that when SelectType is
1238 select/cons_res, the same number of CPUs may not be allo‐
1239 cated on each node. Task distribution will be round-robin
1240 among all the nodes with CPUs yet to be assigned to
1241 tasks. Cyclic distribution is the default behavior if
1242 the number of tasks is no larger than the number of allo‐
1243 cated nodes.
1244
1245 plane The tasks are distributed in blocks of a specified size.
1246 The number of tasks distributed to each node is the same
1247 as for cyclic distribution, but the taskids assigned to
1248 each node depend on the plane size. Additional distribu‐
1249 tion specifications cannot be combined with this option.
1250 For more details (including examples and diagrams),
1251 please see
1252 https://slurm.schedmd.com/mc_support.html
1253 and
1254 https://slurm.schedmd.com/dist_plane.html
1255
1256 arbitrary
1257 The arbitrary method of distribution will allocate pro‐
1258 cesses in-order as listed in file designated by the envi‐
ronment variable SLURM_HOSTFILE. If this variable is
set it will override any other method specified. If
not set the method will default to block. The
hostfile must contain at minimum the number of hosts
requested, one per line or comma separated. If
1264 specifying a task count (-n, --ntasks=<number>), your
1265 tasks will be laid out on the nodes in the order of the
1266 file.
1267 NOTE: The arbitrary distribution option on a job alloca‐
1268 tion only controls the nodes to be allocated to the job
1269 and not the allocation of CPUs on those nodes. This
1270 option is meant primarily to control a job step's task
1271 layout in an existing job allocation for the srun com‐
1272 mand.
1273 NOTE: If the number of tasks is given and a list of
1274 requested nodes is also given, the number of nodes used
1275 from that list will be reduced to match that of the num‐
1276 ber of tasks if the number of nodes in the list is
1277 greater than the number of tasks.
1278
1279
1280 Second distribution method (distribution of CPUs across sockets
1281 for binding):
1282
1283
1284 * Use the default method for distributing CPUs across sock‐
1285 ets (cyclic).
1286
1287 block The block distribution method will distribute allocated
1288 CPUs consecutively from the same socket for binding to
1289 tasks, before using the next consecutive socket.
1290
1291 cyclic The cyclic distribution method will distribute allocated
1292 CPUs for binding to a given task consecutively from the
1293 same socket, and from the next consecutive socket for the
1294 next task, in a round-robin fashion across sockets.
1295
1296 fcyclic
1297 The fcyclic distribution method will distribute allocated
1298 CPUs for binding to tasks from consecutive sockets in a
1299 round-robin fashion across the sockets.
1300
1301
1302 Third distribution method (distribution of CPUs across cores for
1303 binding):
1304
1305
1306 * Use the default method for distributing CPUs across cores
1307 (inherited from second distribution method).
1308
1309 block The block distribution method will distribute allocated
1310 CPUs consecutively from the same core for binding to
1311 tasks, before using the next consecutive core.
1312
1313 cyclic The cyclic distribution method will distribute allocated
1314 CPUs for binding to a given task consecutively from the
1315 same core, and from the next consecutive core for the
1316 next task, in a round-robin fashion across cores.
1317
1318 fcyclic
1319 The fcyclic distribution method will distribute allocated
1320 CPUs for binding to tasks from consecutive cores in a
1321 round-robin fashion across the cores.
1322
1323
1324
1325 Optional control for task distribution over nodes:
1326
1327
1328 Pack   Rather than distributing a job step's tasks evenly
1329 across its allocated nodes, pack them as tightly as pos‐
1330 sible on the nodes. This only applies when the "block"
1331 task distribution method is used.
1332
1333 NoPack Rather than packing a job step's tasks as tightly as pos‐
1334 sible on the nodes, distribute them evenly. This user
1335 option will supersede the SelectTypeParameters
1336 CR_Pack_Nodes configuration parameter.
1337
1338 This option applies to job and step allocations.
1339
1340
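       The sketches below are illustrative only; the executable name,
       task counts, plane size, and hostfile are hypothetical:

              # block over nodes, cyclic across sockets for binding
              srun -n 8 --distribution=block:cyclic ./a.out

              # plane distribution with a plane size of 4
              srun -n 16 --distribution=plane=4 ./a.out

              # arbitrary layout taken from a hostfile (one host per line)
              SLURM_HOSTFILE=./hosts.txt srun -n 4 --distribution=arbitrary ./a.out
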
1341 --mail-type=<type>
1342 Notify user by email when certain event types occur. Valid type
1343 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1344 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT),
1345 INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst
1346 buffer stage out and teardown completed), TIME_LIMIT,
1347 TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80
1348 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached
1349 50 percent of time limit). Multiple type values may be speci‐
1350 fied in a comma separated list. The user to be notified is
1351 indicated with --mail-user. This option applies to job alloca‐
1352 tions.
1353
1354
1355 --mail-user=<user>
1356 User to receive email notification of state changes as defined
1357 by --mail-type. The default value is the submitting user. This
1358 option applies to job allocations.
1359
1360
1361 --mcs-label=<mcs>
1362 Used only when the mcs/group plugin is enabled. This parameter
1363 is a group among the groups of the user. Default value is cal‐
1364 culated by the Plugin mcs if it's enabled. This option applies
1365 to job allocations.
1366
1367
1368 --mem=<size[units]>
1369 Specify the real memory required per node. Default units are
1370 megabytes. Different units can be specified using the suffix
1371 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1372 is MaxMemPerNode. If configured, both parameters can be seen
1373 using the scontrol show config command. This parameter would
1374 generally be used if whole nodes are allocated to jobs (Select‐
1375 Type=select/linear). Specifying a memory limit of zero for a
1376 job step will restrict the job step to the amount of memory
1377 allocated to the job, but not remove any of the job's memory
1378 allocation from being available to other job steps. Also see
1379 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1380 --mem-per-gpu options are mutually exclusive. If --mem,
1381 --mem-per-cpu or --mem-per-gpu are specified as command line
1382 arguments, then they will take precedence over the environment
1383 (potentially inherited from salloc or sbatch).
1384
1385 NOTE: A memory size specification of zero is treated as a spe‐
1386 cial case and grants the job access to all of the memory on each
1387 node for newly submitted jobs and all available job memory to
1388 new job steps.
1389
1390 New memory limits specified for job steps are only advisory.
1391
1392 If the job is allocated multiple nodes in a heterogeneous clus‐
1393 ter, the memory limit on each node will be that of the node in
1394 the allocation with the smallest memory size (same limit will
1395 apply to every node in the job's allocation).
1396
1397 NOTE: Enforcement of memory limits currently relies upon the
1398 task/cgroup plugin or enabling of accounting, which samples mem‐
1399 ory use on a periodic basis (data need not be stored, just col‐
1400 lected). In both cases memory use is based upon the job's Resi‐
1401 dent Set Size (RSS). A task may exceed the memory limit until
1402 the next periodic accounting sample.
1403
1404 This option applies to job and step allocations.
1405
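       For example (a hedged sketch; the size and executable names are
       arbitrary), a job-level memory request and a step that is given
       access to all of the job's allocated memory:

              # request 16 GB of real memory per node for the job
              srun --mem=16G ./my_app

              # within an existing allocation, let the step use all of
              # the memory already allocated to the job
              srun --mem=0 ./my_step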
1406
1407 --mem-per-cpu=<size[units]>
1408 Minimum memory required per allocated CPU. Default units are
1409 megabytes. Different units can be specified using the suffix
1410 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1411 value is MaxMemPerCPU (see exception below). If configured, both
1412 parameters can be seen using the scontrol show config command.
1413 Note that if the job's --mem-per-cpu value exceeds the config‐
1414 ured MaxMemPerCPU, then the user's limit will be treated as a
1415 memory limit per task; --mem-per-cpu will be reduced to a value
1416 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1417 value of --cpus-per-task multiplied by the new --mem-per-cpu
1418 value will equal the original --mem-per-cpu value specified by
1419 the user. This parameter would generally be used if individual
1420 processors are allocated to jobs (SelectType=select/cons_res).
1421 If resources are allocated by core, socket, or whole nodes, then
1422 the number of CPUs allocated to a job may be higher than the
1423 task count and the value of --mem-per-cpu should be adjusted
1424 accordingly. Specifying a memory limit of zero for a job step
1425 will restrict the job step to the amount of memory allocated to
1426 the job, but not remove any of the job's memory allocation from
1427 being available to other job steps. Also see --mem and
1428 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1429 options are mutually exclusive.
1430
1431 NOTE: If the final amount of memory requested by a job can't be
1432 satisfied by any of the nodes configured in the partition, the
1433 job will be rejected. This could happen if --mem-per-cpu is
1434 used with the --exclusive option for a job allocation and
1435 --mem-per-cpu times the number of CPUs on a node is greater than
1436 the total memory of that node.
1437
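       As a worked example of the adjustment described above (the
       MaxMemPerCPU value here is hypothetical): with MaxMemPerCPU=2G,
       a request of --mem-per-cpu=8G is reduced to 2G and
       --cpus-per-task is set to 4, so 4 x 2G still equals the
       requested 8G per task:

              # assuming MaxMemPerCPU=2G in slurm.conf
              srun --mem-per-cpu=8G ./my_app
              # is treated approximately as
              srun --mem-per-cpu=2G --cpus-per-task=4 ./my_app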
1438
1439 --mem-per-gpu=<size[units]>
1440 Minimum memory required per allocated GPU. Default units are
1441 megabytes. Different units can be specified using the suffix
1442 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1443 both a global and per partition basis. If configured, the
1444 parameters can be seen using the scontrol show config and scon‐
1445 trol show partition commands. Also see --mem. The --mem,
1446 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1447
1448
1449 --mem-bind=[{quiet,verbose},]type
1450 Bind tasks to memory. Used only when the task/affinity plugin is
1451 enabled and the NUMA memory functions are available. Note that
1452 the resolution of CPU and memory binding may differ on some
1453 architectures. For example, CPU binding may be performed at the
1454 level of the cores within a processor while memory binding will
1455 be performed at the level of nodes, where the definition of
1456 "nodes" may differ from system to system. By default no memory
1457 binding is performed; any task using any CPU can use any memory.
1458 This option is typically used to ensure that each task is bound
1459 to the memory closest to its assigned CPU. The use of any type
1460 other than "none" or "local" is not recommended. If you want
1461 greater control, try running a simple test code with the options
1462 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1463 the specific configuration.
1464
1465 NOTE: To have Slurm always report on the selected memory binding
1466 for all commands executed in a shell, you can enable verbose
1467 mode by setting the SLURM_MEM_BIND environment variable value to
1468 "verbose".
1469
1470 The following informational environment variables are set when
1471 --mem-bind is in use:
1472
1473 SLURM_MEM_BIND_LIST
1474 SLURM_MEM_BIND_PREFER
1475 SLURM_MEM_BIND_SORT
1476 SLURM_MEM_BIND_TYPE
1477 SLURM_MEM_BIND_VERBOSE
1478
1479 See the ENVIRONMENT VARIABLES section for a more detailed
1480 description of the individual SLURM_MEM_BIND* variables.
1481
1482 Supported options include:
1483
1484 help show this help message
1485
1486 local Use memory local to the processor in use
1487
1488 map_mem:<list>
1489 Bind by setting memory masks on tasks (or ranks) as spec‐
1490 ified where <list> is
1491 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1492 ping is specified for a node and identical mapping is
1493 applied to the tasks on every node (i.e. the lowest task
1494 ID on each node is mapped to the first ID specified in
1495 the list, etc.). NUMA IDs are interpreted as decimal
1496 values unless they are preceded with '0x' in which case
1497 they are interpreted as hexadecimal values. If the number of
1498 tasks (or ranks) exceeds the number of elements in this
1499 list, elements in the list will be reused as needed
1500 starting from the beginning of the list. To simplify
1501 support for large task counts, the lists may follow a map
1502 with an asterisk and repetition count. For example
1503 "map_mem:0x0f*4,0xf0*4". For predictable binding
1504 results, all CPUs for each node in the job should be
1505 allocated to the job.
1506
1507 mask_mem:<list>
1508 Bind by setting memory masks on tasks (or ranks) as spec‐
1509 ified where <list> is
1510 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1511 mapping is specified for a node and identical mapping is
1512 applied to the tasks on every node (i.e. the lowest task
1513 ID on each node is mapped to the first mask specified in
1514 the list, etc.). NUMA masks are always interpreted as
1515 hexadecimal values. Note that masks must be preceded
1516 with a '0x' if they don't begin with [0-9] so they are
1517 seen as numerical values. If the number of tasks (or
1518 ranks) exceeds the number of elements in this list, ele‐
1519 ments in the list will be reused as needed starting from
1520 the beginning of the list. To simplify support for large
1521 task counts, the lists may follow a mask with an asterisk
1522 and repetition count. For example "mask_mem:0*4,1*4".
1523 For predictable binding results, all CPUs for each node
1524 in the job should be allocated to the job.
1525
1526 no[ne] don't bind tasks to memory (default)
1527
1528 nosort avoid sorting free cache pages (default, LaunchParameters
1529 configuration parameter can override this default)
1530
1531 p[refer]
1532 Prefer use of first specified NUMA node, but permit
1533 use of other available NUMA nodes.
1534
1535 q[uiet]
1536 quietly bind before task runs (default)
1537
1538 rank bind by task rank (not recommended)
1539
1540 sort sort free cache pages (run zonesort on Intel KNL nodes)
1541
1542 v[erbose]
1543 verbosely report binding before task runs
1544
1545 This option applies to job and step allocations.
1546
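       For illustration (a non-authoritative sketch; the NUMA IDs and
       the executable are placeholders), binding tasks to local memory
       or alternately to NUMA nodes 0 and 1 with verbose reporting:

              srun -n 4 --mem-bind=verbose,local ./my_app
              # the NUMA ID list is reused cyclically across tasks
              srun -n 4 --mem-bind=verbose,map_mem:0,1 ./my_app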
1547
1548 --mincpus=<n>
1549 Specify a minimum number of logical cpus/processors per node.
1550 This option applies to job allocations.
1551
1552
1553 --msg-timeout=<seconds>
1554 Modify the job launch message timeout. The default value is
1555 MessageTimeout in the Slurm configuration file slurm.conf.
1556 Changes to this are typically not recommended, but could be use‐
1557 ful to diagnose problems. This option applies to job alloca‐
1558 tions.
1559
1560
1561 --mpi=<mpi_type>
1562 Identify the type of MPI to be used. May result in unique initi‐
1563 ation procedures.
1564
1565 list Lists available mpi types to choose from.
1566
1567 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1568 only if the MPI implementation supports it, in other
1569 words if the MPI has the PMI2 interface implemented. The
1570 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1571 which provides the server side functionality but the
1572 client side must implement PMI2_Init() and the other
1573 interface calls.
1574
1575 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1576 support in Slurm can be used to launch parallel applica‐
1577 tions (e.g. MPI) if they support PMIx, PMI2 or PMI1. Slurm
1578 must be configured with pmix support by passing "--with-
1579 pmix=<PMIx installation path>" option to its "./config‐
1580 ure" script.
1581
1582 At the time of writing PMIx is supported in Open MPI
1583 starting from version 2.0. PMIx also supports backward
1584 compatibility with PMI1 and PMI2 and can be used if MPI
1585 was configured with PMI2/PMI1 support pointing to the
1586 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1587 doesn't provide a way to point to a specific implemen‐
1588 tation, a hackish solution leveraging LD_PRELOAD can be
1588 tation, a hack'ish solution leveraging LD_PRELOAD can be
1589 used to force "libpmix" usage.
1590
1591
1592 none No special MPI processing. This is the default and works
1593 with many other versions of MPI.
1594
1595 This option applies to step allocations.
1596
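       For example (a sketch; the application name is a placeholder),
       list the available MPI plugin types and then launch a step with
       PMIx support:

              srun --mpi=list
              srun -n 64 --mpi=pmix ./mpi_app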
1597
1598 --multi-prog
1599 Run a job with different programs and different arguments for
1600 each task. In this case, the executable program specified is
1601 actually a configuration file specifying the executable and
1602 arguments for each task. See MULTIPLE PROGRAM CONFIGURATION
1603 below for details on the configuration file contents. This
1604 option applies to step allocations.
1605
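       A minimal sketch, assuming a hypothetical configuration file
       named multi.conf; the exact file syntax is defined under
       MULTIPLE PROGRAM CONFIGURATION below:

              $ cat multi.conf
              0      ./master
              1-3    ./worker
              $ srun -n 4 --multi-prog multi.conf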
1606
1607 -N, --nodes=<minnodes[-maxnodes]>
1608 Request that a minimum of minnodes nodes be allocated to this
1609 job. A maximum node count may also be specified with maxnodes.
1610 If only one number is specified, this is used as both the mini‐
1611 mum and maximum node count. The partition's node limits super‐
1612 sede those of the job. If a job's node limits are outside of
1613 the range permitted for its associated partition, the job will
1614 be left in a PENDING state. This permits possible execution at
1615 a later time, when the partition limit is changed. If a job
1616 node limit exceeds the number of nodes configured in the parti‐
1617 tion, the job will be rejected. Note that the environment vari‐
1618 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1619 ibility) will be set to the count of nodes actually allocated to
1620 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1621 tion. If -N is not specified, the default behavior is to allo‐
1622 cate enough nodes to satisfy the requirements of the -n and -c
1623 options. The job will be allocated as many nodes as possible
1624 within the range specified and without delaying the initiation
1625 of the job. If the number of tasks is given and a number of
1626 requested nodes is also given, the number of nodes used from
1627 that request will be reduced to match that of the number of
1628 tasks if the number of nodes in the request is greater than the
1629 number of tasks. The node count specification may include a
1630 numeric value followed by a suffix of "k" (multiplies numeric
1631 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
1632 This option applies to job and step allocations.
1633
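       For instance (a hedged sketch with arbitrary counts), request
       between two and four nodes for eight tasks, or use the "k"
       suffix to multiply the node count by 1,024:

              srun -N 2-4 -n 8 ./my_app
              srun -N 2k -n 2048 ./my_app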
1634
1635 -n, --ntasks=<number>
1636 Specify the number of tasks to run. Request that srun allocate
1637 resources for ntasks tasks. The default is one task per node,
1638 but note that the --cpus-per-task option will change this
1639 default. This option applies to job and step allocations.
1640
1641
1642 --network=<type>
1643 Specify information pertaining to the switch or network. The
1644 interpretation of type is system dependent. This option is sup‐
1645 ported when running Slurm on a Cray natively. It is used to
1646 request the use of Network Performance Counters. Only one value per
1647 request is valid. All options are case-insensitive. In this
1648 configuration supported values include:
1649
1650 system
1651 Use the system-wide network performance counters. Only
1652 nodes requested will be marked in use for the job alloca‐
1653 tion. If the job does not fill up the entire system, the
1654 rest of the nodes cannot be used by other jobs
1655 using NPC; if idle, their state will appear as PerfCnts.
1656 These nodes are still available for other jobs not using
1657 NPC.
1658
1659 blade Use the blade network performance counters. Only nodes
1660 requested will be marked in use for the job allocation.
1661 If the job does not fill up the entire blade(s) allocated
1662 to the job, those blade(s) cannot be used by other
1663 jobs using NPC; if idle, their state will appear as PerfC‐
1664 nts. These nodes are still available for other jobs not
1665 using NPC.
1666
1667
1668 In all cases the job allocation request must specify the
1669 --exclusive option and the step cannot specify the --overlap
1670 option. Otherwise the request will be denied.
1671
1672 Also with any of these options steps are not allowed to share
1673 blades, so resources would remain idle inside an allocation if
1674 the step running on a blade does not take up all the nodes on
1675 the blade.
1676
1677 The network option is also supported on systems with IBM's Par‐
1678 allel Environment (PE). See IBM's LoadLeveler job command key‐
1679 word documentation about the keyword "network" for more informa‐
1680 tion. Multiple values may be specified in a comma separated
1681 list. All options are case-insensitive. Supported values
1682 include:
1683
1684 BULK_XFER[=<resources>]
1685 Enable bulk transfer of data using Remote Direct-
1686 Memory Access (RDMA). The optional resources speci‐
1687 fication is a numeric value which can have a suffix
1688 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1689 megabytes or gigabytes. NOTE: The resources speci‐
1690 fication is not supported by the underlying IBM in‐
1691 frastructure as of Parallel Environment version 2.2
1692 and no value should be specified at this time. The
1693 devices allocated to a job must all be of the same
1694 type. The default value depends upon
1695 what hardware is available and, in order of prefer‐
1696 ence, is IPONLY (which is not considered in User
1697 Space mode), HFI, IB, HPCE, and KMUX.
1698
1699 CAU=<count> Number of Collective Acceleration Units (CAU)
1700 required. Applies only to IBM Power7-IH processors.
1701 Default value is zero. Independent CAU will be
1702 allocated for each programming interface (MPI, LAPI,
1703 etc.)
1704
1705 DEVNAME=<name>
1706 Specify the device name to use for communications
1707 (e.g. "eth0" or "mlx4_0").
1708
1709 DEVTYPE=<type>
1710 Specify the device type to use for communications.
1711 The supported values of type are: "IB" (InfiniBand),
1712 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1713 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1714 nel Emulation of HPCE). The devices allocated to a
1715 job must all be of the same type. The default value
1716 depends upon what hardware is available
1717 and, in order of preference, is IPONLY (which is not
1718 considered in User Space mode), HFI, IB, HPCE, and
1719 KMUX.
1720
1721 IMMED =<count>
1722 Number of immediate send slots per window required.
1723 Applies only to IBM Power7-IH processors. Default
1724 value is zero.
1725
1726 INSTANCES =<count>
1727 Specify number of network connections for each task
1728 on each network. The default instance
1729 count is 1.
1730
1731 IPV4 Use Internet Protocol (IP) version 4 communications
1732 (default).
1733
1734 IPV6 Use Internet Protocol (IP) version 6 communications.
1735
1736 LAPI Use the LAPI programming interface.
1737
1738 MPI Use the MPI programming interface. MPI is the
1739 default interface.
1740
1741 PAMI Use the PAMI programming interface.
1742
1743 SHMEM Use the OpenSHMEM programming interface.
1744
1745 SN_ALL Use all available switch networks (default).
1746
1747 SN_SINGLE Use one available switch network.
1748
1749 UPC Use the UPC programming interface.
1750
1751 US Use User Space communications.
1752
1753
1754 Some examples of network specifications:
1755
1756 Instances=2,US,MPI,SN_ALL
1757 Create two user space connections for MPI communica‐
1758 tions on every switch network for each task.
1759
1760 US,MPI,Instances=3,Devtype=IB
1761 Create three user space connections for MPI communi‐
1762 cations on every InfiniBand network for each task.
1763
1764 IPV4,LAPI,SN_Single
1765 Create an IP version 4 connection for LAPI communica‐
1766 tions on one switch network for each task.
1767
1768 Instances=2,US,LAPI,MPI
1769 Create two user space connections each for LAPI and
1770 MPI communications on every switch network for each
1771 task. Note that SN_ALL is the default option so
1772 every switch network is used. Also note that
1773 Instances=2 specifies that two connections are
1774 established for each protocol (LAPI and MPI) and
1775 each task. If there are two networks and four tasks
1776 on the node then a total of 32 connections are
1777 established (2 instances x 2 protocols x 2 networks
1778 x 4 tasks).
1779
1780 This option applies to job and step allocations.
1781
1782
1783 --nice[=adjustment]
1784 Run the job with an adjusted scheduling priority within Slurm.
1785 With no adjustment value the scheduling priority is decreased by
1786 100. A negative nice value increases the priority, otherwise
1787 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1788 ileged users can specify a negative adjustment.
1789
1790
1791 --ntasks-per-core=<ntasks>
1792 Request the maximum ntasks be invoked on each core. This option
1793 applies to the job allocation, but not to step allocations.
1794 Meant to be used with the --ntasks option. Related to
1795 --ntasks-per-node except at the core level instead of the node
1796 level. Masks will automatically be generated to bind the tasks
1797 to specific cores unless --cpu-bind=none is specified. NOTE:
1798 This option is not supported unless SelectType=cons_res is con‐
1799 figured (either directly or indirectly on Cray systems) along
1800 with the node's core count.
1801
1802
1803 --ntasks-per-gpu=<ntasks>
1804 Request that there are ntasks tasks invoked for every GPU. This
1805 option can work in two ways: 1) either specify --ntasks in addi‐
1806 tion, in which case a type-less GPU specification will be auto‐
1807 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1808 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1809 --ntasks, and the total task count will be automatically deter‐
1810 mined. The number of CPUs needed will be automatically
1811 increased if necessary to allow for any calculated task count.
1812 This option will implicitly set --gpu-bind=single:<ntasks>, but
1813 that can be overridden with an explicit --gpu-bind specifica‐
1814 tion. This option is not compatible with a node range (i.e.
1815 -N<minnodes-maxnodes>). This option is not compatible with
1816 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1817 option is not supported unless SelectType=cons_tres is config‐
1818 ured (either directly or indirectly on Cray systems).
1819
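       The two usage modes described above might look like the
       following sketch (GPU counts and the executable are
       placeholders):

              # mode 1: task count given; a matching typeless GPU
              # count (here 4) is derived automatically
              srun -n 8 --ntasks-per-gpu=2 ./gpu_app

              # mode 2: GPU count given; the total task count (here 8)
              # is derived automatically
              srun --gpus=4 --ntasks-per-gpu=2 ./gpu_app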
1820
1821 --ntasks-per-node=<ntasks>
1822 Request that ntasks be invoked on each node. If used with the
1823 --ntasks option, the --ntasks option will take precedence and
1824 the --ntasks-per-node will be treated as a maximum count of
1825 tasks per node. Meant to be used with the --nodes option. This
1826 is related to --cpus-per-task=ncpus, but does not require knowl‐
1827 edge of the actual number of cpus on each node. In some cases,
1828 it is more convenient to be able to request that no more than a
1829 specific number of tasks be invoked on each node. Examples of
1830 this include submitting a hybrid MPI/OpenMP app where only one
1831 MPI "task/rank" should be assigned to each node while allowing
1832 the OpenMP portion to utilize all of the parallelism present in
1833 the node, or submitting a single setup/cleanup/monitoring job to
1834 each node of a pre-existing allocation as one step in a larger
1835 job script. This option applies to job allocations.
1836
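       For example (a sketch; the node and CPU counts are arbitrary),
       a hybrid MPI/OpenMP launch with one task per node that leaves
       the node's remaining parallelism to OpenMP threads:

              srun -N 4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app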
1837
1838 --ntasks-per-socket=<ntasks>
1839 Request the maximum ntasks be invoked on each socket. This
1840 option applies to the job allocation, but not to step alloca‐
1841 tions. Meant to be used with the --ntasks option. Related to
1842 --ntasks-per-node except at the socket level instead of the node
1843 level. Masks will automatically be generated to bind the tasks
1844 to specific sockets unless --cpu-bind=none is specified. NOTE:
1845 This option is not supported unless SelectType=cons_res is con‐
1846 figured (either directly or indirectly on Cray systems) along
1847 with the node's socket count.
1848
1849
1850 -O, --overcommit
1851 Overcommit resources. This option applies to job and step allo‐
1852 cations. When applied to job allocation, only one CPU is allo‐
1853 cated to the job per node and options used to specify the number
1854 of tasks per node, socket, core, etc. are ignored. When
1855 applied to job step allocations (the srun command when executed
1856 within an existing job allocation), this option can be used to
1857 launch more than one task per CPU. Normally, srun will not
1858 allocate more than one process per CPU. By specifying --over‐
1859 commit you are explicitly allowing more than one process per
1860 CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1861 to execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1862 file slurm.h and is not a variable, it is set at Slurm build
1863 time.
1864
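       Within an existing job allocation, a step can launch more tasks
       than allocated CPUs by overcommitting, as in this sketch (the
       task count is arbitrary):

              srun -n 32 --overcommit hostname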
1865
1866 --overlap
1867 Allow steps to overlap each other on the same resources. By
1868 default steps do not share resources with other parallel steps.
1869
1870
1871 -o, --output=<filename pattern>
1872 Specify the "filename pattern" for stdout redirection. By
1873 default in interactive mode, srun collects stdout from all tasks
1874 and sends this output via TCP/IP to the attached terminal. With
1875 --output stdout may be redirected to a file, to one file per
1876 task, or to /dev/null. See section IO Redirection below for the
1877 various forms of filename pattern. If the specified file
1878 already exists, it will be overwritten.
1879
1880 If --error is not also specified on the command line, both std‐
1881 out and stderr will be directed to the file specified by --output.
1882 This option applies to job and step allocations.
1883
1884
1885 --open-mode=<append|truncate>
1886 Open the output and error files using append or truncate mode as
1887 specified. For heterogeneous job steps the default value is
1888 "append". Otherwise the default value is specified by the sys‐
1889 tem configuration parameter JobFileAppend. This option applies
1890 to job and step allocations.
1891
1892
1893 --het-group=<expr>
1894 Identify each component in a heterogeneous job allocation for
1895 which a step is to be created. Applies only to srun commands
1896 issued inside a salloc allocation or sbatch script. <expr> is a
1897 set of integers corresponding to one or more option offsets on
1898 the salloc or sbatch command line. Examples: "--het-group=2",
1899 "--het-group=0,4", "--het-group=1,3-5". The default value is
1900 --het-group=0.
1901
1902
1903 -p, --partition=<partition_names>
1904 Request a specific partition for the resource allocation. If
1905 not specified, the default behavior is to allow the slurm con‐
1906 troller to select the default partition as designated by the
1907 system administrator. If the job can use more than one parti‐
1908 tion, specify their names in a comma separated list and the one
1909 offering earliest initiation will be used with no regard given
1910 to the partition name ordering (although higher priority parti‐
1911 tions will be considered first). When the job is initiated, the
1912 name of the partition used will be placed first in the job
1913 record partition string. This option applies to job allocations.
1914
1915
1916 --power=<flags>
1917 Comma separated list of power management plugin options. Cur‐
1918 rently available flags include: level (all nodes allocated to
1919 the job should have identical power caps, may be disabled by the
1920 Slurm configuration option PowerParameters=job_no_level). This
1921 option applies to job allocations.
1922
1923
1924 --priority=<value>
1925 Request a specific job priority. May be subject to configura‐
1926 tion specific constraints. value should either be a numeric
1927 value or "TOP" (for highest possible value). Only Slurm opera‐
1928 tors and administrators can set the priority of a job. This
1929 option applies to job allocations only.
1930
1931
1932 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1933 enables detailed data collection by the acct_gather_profile
1934 plugin. Detailed data are typically time-series that are stored
1935 in an HDF5 file for the job or an InfluxDB database depending on
1936 the configured plugin.
1937
1938
1939 All All data types are collected. (Cannot be combined with
1940 other values.)
1941
1942
1943 None No data types are collected. This is the default.
1944 (Cannot be combined with other values.)
1945
1946
1947 Energy Energy data is collected.
1948
1949
1950 Task Task (I/O, Memory, ...) data is collected.
1951
1952
1953 Filesystem
1954 Filesystem data is collected.
1955
1956
1957 Network Network (InfiniBand) data is collected.
1958
1959
1960 This option applies to job and step allocations.
1961
1962
1963 --prolog=<executable>
1964 srun will run executable just before launching the job step.
1965 The command line arguments for executable will be the command
1966 and arguments of the job step. If executable is "none", then no
1967 srun prolog will be run. This parameter overrides the SrunProlog
1968 parameter in slurm.conf. This parameter is completely indepen‐
1969 dent from the Prolog parameter in slurm.conf. This option
1970 applies to job allocations.
1971
1972
1973 --propagate[=rlimit[,rlimit...]]
1974 Allows users to specify which of the modifiable (soft) resource
1975 limits to propagate to the compute nodes and apply to their
1976 jobs. If no rlimit is specified, then all resource limits will
1977 be propagated. The following rlimit names are supported by
1978 Slurm (although some options may not be supported on some sys‐
1979 tems):
1980
1981 ALL All limits listed below (default)
1982
1983 NONE No limits listed below
1984
1985 AS The maximum address space for a process
1986
1987 CORE The maximum size of core file
1988
1989 CPU The maximum amount of CPU time
1990
1991 DATA The maximum size of a process's data segment
1992
1993 FSIZE The maximum size of files created. Note that if the
1994 user sets FSIZE to less than the current size of the
1995 slurmd.log, job launches will fail with a 'File size
1996 limit exceeded' error.
1997
1998 MEMLOCK The maximum size that may be locked into memory
1999
2000 NOFILE The maximum number of open files
2001
2002 NPROC The maximum number of processes available
2003
2004 RSS The maximum resident set size
2005
2006 STACK The maximum stack size
2007
2008 This option applies to job allocations.
2009
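       For example (a sketch), propagate only the stack size and open
       file limits, or propagate none at all:

              srun --propagate=STACK,NOFILE ./my_app
              srun --propagate=NONE ./my_app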
2010
2011 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2012 --unbuffered. Implicitly sets --error and --output to /dev/null
2013 for all tasks except task zero, which may cause those tasks to
2014 exit immediately (e.g. shells will typically exit immediately in
2015 that situation). This option applies to step allocations.
2016
2017
2018 -q, --qos=<qos>
2019 Request a quality of service for the job. QOS values can be
2020 defined for each user/cluster/account association in the Slurm
2021 database. Users will be limited to their association's defined
2022 set of qos's when the Slurm configuration parameter, Account‐
2023 ingStorageEnforce, includes "qos" in its definition. This option
2024 applies to job allocations.
2025
2026
2027 -Q, --quiet
2028 Suppress informational messages from srun. Errors will still be
2029 displayed. This option applies to job and step allocations.
2030
2031
2032 --quit-on-interrupt
2033 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2034 disables the status feature normally available when srun
2035 receives a single Ctrl-C and causes srun to instead immediately
2036 terminate the running job. This option applies to step alloca‐
2037 tions.
2038
2039
2040 -r, --relative=<n>
2041 Run a job step relative to node n of the current allocation.
2042 This option may be used to spread several job steps out among
2043 the nodes of the current job. If -r is used, the current job
2044 step will begin at node n of the allocated nodelist, where the
2045 first node is considered node 0. The -r option is not permitted
2046 with the -w or -x options and will result in a fatal error when not
2047 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2048 set). The default for n is 0. If the value of --nodes exceeds
2049 the number of nodes identified with the --relative option, a
2050 warning message will be printed and the --relative option will
2051 take precedence. This option applies to step allocations.
2052
2053
2054 --reboot
2055 Force the allocated nodes to reboot before starting the job.
2056 This is only supported with some system configurations and will
2057 otherwise be silently ignored. Only root, SlurmUser or admins
2058 can reboot nodes. This option applies to job allocations.
2059
2060
2061 --resv-ports[=count]
2062 Reserve communication ports for this job. Users can specify the
2063 number of ports they want to reserve. The parameter Mpi‐
2064 Params=ports=12000-12999 must be specified in slurm.conf. If not
2065 specified and Slurm's OpenMPI plugin is used, then by default
2066 the number of reserved ports is equal to the highest number of tasks on
2067 any node in the job step allocation. If the number of reserved
2068 ports is zero then no ports are reserved. Used for OpenMPI. This
2069 option applies to job and step allocations.
2070
2071
2072 --reservation=<reservation_names>
2073 Allocate resources for the job from the named reservation. If
2074 the job can use more than one reservation, specify their names
2075 in a comma separated list and the one offering earliest initia‐
2076 tion will be used. Each reservation will be considered in the order it was
2077 requested. All reservations will be listed in scontrol/squeue
2078 through the life of the job. In accounting the first reserva‐
2079 tion will be seen and after the job starts the reservation used
2080 will replace it.
2081
2082
2083 -s, --oversubscribe
2084 The job allocation can over-subscribe resources with other run‐
2085 ning jobs. The resources to be over-subscribed can be nodes,
2086 sockets, cores, and/or hyperthreads depending upon configura‐
2087 tion. The default over-subscribe behavior depends on system
2088 configuration and the partition's OverSubscribe option takes
2089 precedence over the job's option. This option may result in the
2090 allocation being granted sooner than if the --oversubscribe
2091 option was not set and allow higher system utilization, but
2092 application performance will likely suffer due to competition
2093 for resources. This option applies to step allocations.
2094
2095
2096 -S, --core-spec=<num>
2097 Count of specialized cores per node reserved by the job for sys‐
2098 tem operations and not used by the application. The application
2099 will not use these cores, but will be charged for their alloca‐
2100 tion. Default value is dependent upon the node's configured
2101 CoreSpecCount value. If a value of zero is designated and the
2102 Slurm configuration option AllowSpecResourcesUsage is enabled,
2103 the job will be allowed to override CoreSpecCount and use the
2104 specialized resources on nodes it is allocated. This option can
2105 not be used with the --thread-spec option. This option applies
2106 to job allocations.
2107
2108
2109 --signal=[R:]<sig_num>[@<sig_time>]
2110 When a job is within sig_time seconds of its end time, send it
2111 the signal sig_num. Due to the resolution of event handling by
2112 Slurm, the signal may be sent up to 60 seconds earlier than
2113 specified. sig_num may either be a signal number or name (e.g.
2114 "10" or "USR1"). sig_time must have an integer value between 0
2115 and 65535. By default, no signal is sent before the job's end
2116 time. If a sig_num is specified without any sig_time, the
2117 default time will be 60 seconds. This option applies to job
2118 allocations. Use the "R:" option to allow this job to overlap
2119 with a reservation with MaxStartDelay set. To have the signal
2120 sent at preemption time see the preempt_send_user_signal Slurm‐
2121 ctldParameter.
2122
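       For instance (a sketch with arbitrary values), send SIGUSR1 ten
       minutes before the end time, or use the "R:" prefix so the job
       may overlap a reservation with MaxStartDelay set:

              srun --signal=USR1@600 ./my_app
              srun --signal=R:10@300 ./my_app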
2123
2124 --slurmd-debug=<level>
2125 Specify a debug level for slurmd(8). The level may be specified
2126 as either an integer value between 0 [quiet, only errors are dis‐
2127 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2128
2129 quiet Log nothing
2130
2131 fatal Log only fatal errors
2132
2133 error Log only errors
2134
2135 info Log errors and general informational messages
2136
2137 verbose Log errors and verbose informational messages
2138
2139
2140 The slurmd debug information is copied onto the stderr of
2141 the job. By default only errors are displayed. This option
2142 applies to job and step allocations.
2143
2144
2145 --sockets-per-node=<sockets>
2146 Restrict node selection to nodes with at least the specified
2147 number of sockets. See additional information under -B option
2148 above when task/affinity plugin is enabled. This option applies
2149 to job allocations.
2150
2151
2152 --spread-job
2153 Spread the job allocation over as many nodes as possible and
2154 attempt to evenly distribute tasks across the allocated nodes.
2155 This option disables the topology/tree plugin. This option
2156 applies to job allocations.
2157
2158
2159 --switches=<count>[@<max-time>]
2160 When a tree topology is used, this defines the maximum count of
2161 switches desired for the job allocation and optionally the maxi‐
2162 mum time to wait for that number of switches. If Slurm finds an
2163 allocation containing more switches than the count specified,
2164 the job remains pending until it either finds an allocation with
2165 desired switch count or the time limit expires. If there is no
2166 switch count limit, there is no delay in starting the job.
2167 Acceptable time formats include "minutes", "minutes:seconds",
2168 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2169 "days-hours:minutes:seconds". The job's maximum time delay may
2170 be limited by the system administrator using the SchedulerParam‐
2171 eters configuration parameter with the max_switch_wait parameter
2172 option. On a dragonfly network the only switch count supported
2173 is 1 since communication performance will be highest when a job
2174 is allocated resources on one leaf switch or more than 2 leaf
2175 switches. The default max-time is the max_switch_wait Sched‐
2176 ulerParameters value. This option applies to job allocations.
2177
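       For example (a sketch), request an allocation spanning at most
       one leaf switch and wait up to 30 minutes for it:

              srun --switches=1@30:00 ./my_app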
2178
2179 -T, --threads=<nthreads>
2180 Allows limiting the number of concurrent threads used to send
2181 the job request from the srun process to the slurmd processes on
2182 the allocated nodes. Default is to use one thread per allocated
2183 node up to a maximum of 60 concurrent threads. Specifying this
2184 option limits the number of concurrent threads to nthreads (less
2185 than or equal to 60). This should only be used to set a low
2186 thread count for testing on very small memory computers. This
2187 option applies to job allocations.
2188
2189
2190 -t, --time=<time>
2191 Set a limit on the total run time of the job allocation. If the
2192 requested time limit exceeds the partition's time limit, the job
2193 will be left in a PENDING state (possibly indefinitely). The
2194 default time limit is the partition's default time limit. When
2195 the time limit is reached, each task in each job step is sent
2196 SIGTERM followed by SIGKILL. The interval between signals is
2197 specified by the Slurm configuration parameter KillWait. The
2198 OverTimeLimit configuration parameter may permit the job to run
2199 longer than scheduled. Time resolution is one minute and second
2200 values are rounded up to the next minute.
2201
2202 A time limit of zero requests that no time limit be imposed.
2203 Acceptable time formats include "minutes", "minutes:seconds",
2204 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2205 "days-hours:minutes:seconds". This option applies to job and
2206 step allocations.
2207
2208
2209 --task-epilog=<executable>
2210 The slurmstepd daemon will run executable just after each task
2211 terminates. This will be executed before any TaskEpilog parame‐
2212 ter in slurm.conf is executed. This is meant to be a very
2213 short-lived program. If it fails to terminate within a few sec‐
2214 onds, it will be killed along with any descendant processes.
2215 This option applies to step allocations.
2216
2217
2218 --task-prolog=<executable>
2219 The slurmstepd daemon will run executable just before launching
2220 each task. This will be executed after any TaskProlog parameter
2221 in slurm.conf is executed. Besides the normal environment vari‐
2222 ables, this has SLURM_TASK_PID available to identify the process
2223 ID of the task being started. Standard output from this program
2224 of the form "export NAME=value" will be used to set environment
2225 variables for the task being spawned. This option applies to
2226 step allocations.
2227
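       A minimal sketch of a task prolog, assuming a hypothetical
       script task_prolog.sh that uses the "export NAME=value"
       convention described above to add one variable to each task's
       environment:

              $ cat task_prolog.sh
              #!/bin/sh
              # SLURM_TASK_PID identifies the task being started; any
              # line of the form "export NAME=value" becomes part of
              # the spawned task's environment
              echo "export MY_TASK_PID=$SLURM_TASK_PID"

              $ srun --task-prolog=./task_prolog.sh -n 4 ./my_app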
2228
2229 --test-only
2230 Returns an estimate of when a job would be scheduled to run
2231 given the current job queue and all the other srun arguments
2232 specifying the job. This limits srun's behavior to just return
2233 information; no job is actually submitted. The program will be
2234 executed directly by the slurmd daemon. This option applies to
2235 job allocations.
2236
2237
2238 --thread-spec=<num>
2239 Count of specialized threads per node reserved by the job for
2240 system operations and not used by the application. The applica‐
2241 tion will not use these threads, but will be charged for their
2242 allocation. This option can not be used with the --core-spec
2243 option. This option applies to job allocations.
2244
2245
2246 --threads-per-core=<threads>
2247 Restrict node selection to nodes with at least the specified
2248 number of threads per core. In task layout, use the specified
2249 maximum number of threads per core. Implies --cpu-bind=threads.
2250 NOTE: "Threads" refers to the number of processing units on each
2251 core rather than the number of application tasks to be launched
2252 per core. See additional information under -B option above when
2253 task/affinity plugin is enabled. This option applies to job and
2254 step allocations.
2255
2256
2257 --time-min=<time>
2258 Set a minimum time limit on the job allocation. If specified,
2259 the job may have its --time limit lowered to a value no lower
2260 than --time-min if doing so permits the job to begin execution
2261 earlier than otherwise possible. The job's time limit will not
2262 be changed after the job is allocated resources. This is per‐
2263 formed by a backfill scheduling algorithm to allocate resources
2264 otherwise reserved for higher priority jobs. Acceptable time
2265 formats include "minutes", "minutes:seconds", "hours:min‐
2266 utes:seconds", "days-hours", "days-hours:minutes" and
2267 "days-hours:minutes:seconds". This option applies to job alloca‐
2268 tions.
2269
2270
2271 --tmp=<size[units]>
2272 Specify a minimum amount of temporary disk space per node.
2273 Default units are megabytes. Different units can be specified
2274 using the suffix [K|M|G|T]. This option applies to job alloca‐
2275 tions.
2276
2277
2278 -u, --unbuffered
2279 By default the connection between slurmstepd and the user
2280 launched application is over a pipe. The stdio output written by
2281 the application is buffered by glibc until it is flushed or
2282 the output is set as unbuffered. See setbuf(3). If this option
2283 is specified the tasks are executed with a pseudo terminal so
2284 that the application output is unbuffered. This option applies
2285 to step allocations.
2286
2287 --usage
2288 Display brief help message and exit.
2289
2290
2291 --uid=<user>
2292 Attempt to submit and/or run a job as user instead of the invok‐
2293 ing user id. The invoking user's credentials will be used to
2294 check access permissions for the target partition. User root may
2295 use this option to run jobs as a normal user in a RootOnly par‐
2296 tition for example. If run as root, srun will drop its permis‐
2297 sions to the uid specified after node allocation is successful.
2298 user may be the user name or numerical user ID. This option
2299 applies to job and step allocations.
2300
2301
2302 --use-min-nodes
2303 If a range of node counts is given, prefer the smaller count.
2304
2305
2306 -V, --version
2307 Display version information and exit.
2308
2309
2310 -v, --verbose
2311 Increase the verbosity of srun's informational messages. Multi‐
2312 ple -v's will further increase srun's verbosity. By default
2313 only errors will be displayed. This option applies to job and
2314 step allocations.
2315
2316
2317 -W, --wait=<seconds>
2318 Specify how long to wait after the first task terminates before
2319 terminating all remaining tasks. A value of 0 indicates an
2320 unlimited wait (a warning will be issued after 60 seconds). The
2321 default value is set by the WaitTime parameter in the slurm con‐
2322 figuration file (see slurm.conf(5)). This option can be useful
2323 to ensure that a job is terminated in a timely fashion in the
2324 event that one or more tasks terminate prematurely. Note: The
2325 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2326 to terminate the job immediately if a task exits with a non-zero
2327 exit code. This option applies to job allocations.
2328
2329
2330 -w, --nodelist=<host1,host2,... or filename>
2331 Request a specific list of hosts. The job will contain all of
2332 these hosts and possibly additional hosts as needed to satisfy
2333 resource requirements. The list may be specified as a
2334 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2335 for example), or a filename. The host list will be assumed to
2336 be a filename if it contains a "/" character. If you specify a
2337 minimum node or processor count larger than can be satisfied by
2338 the supplied host list, additional resources will be allocated
2339 on other nodes as needed. Rather than repeating a host name
2340 multiple times, an asterisk and a repetition count may be
2341 appended to a host name. For example "host1,host1" and "host1*2"
2342 are equivalent. If the number of tasks is given and a list of
2343 requested nodes is also given, the number of nodes used from
2344 that list will be reduced to match that of the number of tasks
2345 if the number of nodes in the list is greater than the number of
2346 tasks. This option applies to job and step allocations.
2347
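       Some examples of the host list forms accepted here (host names
       and the executable are placeholders):

              srun -w host1,host2 ./my_app          # explicit list
              srun -w host[1-5,7] ./my_app          # host range expression
              srun -w host1*2 ./my_app              # same as "host1,host1"
              srun -w /path/to/hostfile ./my_app    # "/" implies a filename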
2348
2349 --wckey=<wckey>
2350 Specify wckey to be used with job. If TrackWCKey=no (default)
2351 in the slurm.conf this value is ignored. This option applies to
2352 job allocations.
2353
2354
2355 -X, --disable-status
2356 Disable the display of task status when srun receives a single
2357 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2358 running job. Without this option a second Ctrl-C in one second
2359 is required to forcibly terminate the job and srun will immedi‐
2360 ately exit. May also be set via the environment variable
2361 SLURM_DISABLE_STATUS. This option applies to job allocations.
2362
2363
2364 -x, --exclude=<host1,host2,... or filename>
2365 Request that a specific list of hosts not be included in the
2366 resources allocated to this job. The host list will be assumed
2367 to be a filename if it contains a "/" character. This option
2368 applies to job and step allocations.
2369
2370
2371 --x11[=<all|first|last>]
2372 Sets up X11 forwarding on all, first or last node(s) of the
2373 allocation. This option is only enabled if Slurm was compiled
2374 with X11 support and PrologFlags=x11 is defined in the
2375 slurm.conf. Default is all.
2376
2377
2378 -Z, --no-allocate
2379 Run the specified tasks on a set of nodes without creating a
2380 Slurm "job" in the Slurm queue structure, bypassing the normal
2381 resource allocation step. The list of nodes must be specified
2382 with the -w, --nodelist option. This is a privileged option
2383 only available for the users "SlurmUser" and "root". This option
2384 applies to job allocations.
2385
2386
2387 srun will submit the job request to the slurm job controller, then ini‐
2388 tiate all processes on the remote nodes. If the request cannot be met
2389 immediately, srun will block until the resources are free to run the
2390 job. If the -I (--immediate) option is specified srun will terminate if
2391 resources are not immediately available.
2392
2393 When initiating remote processes srun will propagate the current work‐
2394 ing directory, unless --chdir=<path> is specified, in which case path
2395 will become the working directory for the remote processes.
2396
2397 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2398 cated to the job. When specifying only the number of processes to run
2399 with -n, a default of one CPU per process is allocated. By specifying
2400 the number of CPUs required per task (-c), more than one CPU may be
2401 allocated per process. If the number of nodes is specified with -N,
2402 srun will attempt to allocate at least the number of nodes specified.
2403
2404 Combinations of the above three options may be used to change how pro‐
2405 cesses are distributed across nodes and cpus. For instance, by specify‐
2406 ing both the number of processes and number of nodes on which to run,
2407 the number of processes per node is implied. However, if the number of
2408 CPUs per process is more important, then the number of processes (-n) and
2409 the number of CPUs per process (-c) should be specified.
2410
2411 srun will refuse to allocate more than one process per CPU unless
2412 --overcommit (-O) is also specified.
2413
2414 srun will attempt to meet the above specifications "at a minimum." That
2415 is, if 16 nodes are requested for 32 processes, and some nodes do not
2416 have 2 CPUs, the allocation of nodes will be increased in order to meet
2417 the demand for CPUs. In other words, a minimum of 16 nodes are being
2418 requested. However, if 16 nodes are requested for 15 processes, srun
2419 will consider this an error, as 15 processes cannot run across 16
2420 nodes.
2421
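       The 16-node, 32-process case above could be written as in the
       following sketch (the executable is a placeholder); the second
       command instead emphasizes CPUs per process:

              # 32 processes spread over at least 16 nodes
              srun -N 16 -n 32 ./my_app

              # 8 processes, each allocated 4 CPUs
              srun -n 8 -c 4 ./my_app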
2422
2423 IO Redirection
2424
2425 By default, stdout and stderr will be redirected from all tasks to the
2426 stdout and stderr of srun, and stdin will be redirected from the stan‐
2427 dard input of srun to all remote tasks. If stdin is only to be read by
2428 a subset of the spawned tasks, specifying a file to read from rather
2429 than forwarding stdin from the srun command may be preferable as it
2430 avoids moving and storing data that will never be read.
2431
2432 For OS X, the poll() function does not support stdin, so input from a
2433 terminal is not possible.
2434
2435 This behavior may be changed with the --output, --error, and --input
2436 (-o, -e, -i) options. Valid format specifications for these options are
2437
2438 all       stdout and stderr are redirected from all tasks to srun. stdin is
2439 broadcast to all remote tasks. (This is the default behav‐
2440 ior)
2441
2442 none      stdout and stderr are not received from any task. stdin is
2443 not sent to any task (stdin is closed).
2444
2445 taskid stdout and/or stderr are redirected from only the task with
2446 relative id equal to taskid, where 0 <= taskid < ntasks,
2447 where ntasks is the total number of tasks in the current job
2448 step. stdin is redirected from the stdin of srun to this
2449 same task. This file will be written on the node executing
2450 the task.
2451
2452 filename srun will redirect stdout and/or stderr to the named file
2453 from all tasks. stdin will be redirected from the named file
2454 and broadcast to all tasks in the job. filename refers to a
2455 path on the host that runs srun. Depending on the cluster's
2456 file system layout, this may result in the output appearing
2457 in different places depending on whether the job is run in
2458 batch mode.
2459
2460 filename pattern
2461 srun allows for a filename pattern to be used to generate the
2462 named IO file described above. The following list of format
2463 specifiers may be used in the format string to generate a
2464 filename that will be unique to a given jobid, stepid, node,
2465 or task. In each case, the appropriate number of files are
2466 opened and associated with the corresponding tasks. Note that
2467 any format string containing %t, %n, and/or %N will be writ‐
2468 ten on the node executing the task rather than the node where
2469 srun executes. These format specifiers are not supported on a
2470 BGQ system.
2471
2472 \\ Do not process any of the replacement symbols.
2473
2474 %% The character "%".
2475
2476 %A Job array's master job allocation number.
2477
2478 %a Job array ID (index) number.
2479
2480 %J jobid.stepid of the running job. (e.g. "128.0")
2481
2482 %j jobid of the running job.
2483
2484 %s stepid of the running job.
2485
2486 %N short hostname. This will create a separate IO file
2487 per node.
2488
2489 %n Node identifier relative to current job (e.g. "0" is
2490 the first node of the running job) This will create a
2491 separate IO file per node.
2492
2493 %t task identifier (rank) relative to current job. This
2494 will create a separate IO file per task.
2495
2496 %u User name.
2497
2498 %x Job name.
2499
2500 A number placed between the percent character and format
2501 specifier may be used to zero-pad the result in the IO file‐
2502 name. This number is ignored if the format specifier corre‐
2503 sponds to non-numeric data (%N for example).
2504
2505 Some examples of how the format string may be used for a 4
2506 task job step with a Job ID of 128 and step id of 0 are
2507 included below:
2508
2509 job%J.out job128.0.out
2510
2511 job%4j.out job0128.out
2512
2513 job%j-%2t.out job128-00.out, job128-01.out, ...
2514
2515 PERFORMANCE
2516 Executing srun sends a remote procedure call to slurmctld. If enough
2517 calls from srun or other Slurm client commands that send remote proce‐
2518 dure calls to the slurmctld daemon come in at once, it can result in a
2519 degradation of performance of the slurmctld daemon, possibly resulting
2520 in a denial of service.
2521
2522 Do not run srun or other Slurm client commands that send remote proce‐
2523 dure calls to slurmctld from loops in shell scripts or other programs.
2524 Ensure that programs limit calls to srun to the minimum necessary for
2525 the information you are trying to gather.
2526
2527
2528 INPUT ENVIRONMENT VARIABLES
2529 Some srun options may be set via environment variables. These environ‐
2530 ment variables, along with their corresponding options, are listed
2531 below. Note: Command line options will always override these settings.
2532
2533 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2534 MVAPICH2) and controls the fanout of data commu‐
2535 nications. The srun command sends messages to
2536 application programs (via the PMI library) and
2537 those applications may be called upon to forward
2538 that data to up to this number of additional
2539 tasks. Higher values offload work from the srun
2540 command to the applications and likely increase
2541 the vulnerability to failures. The default value
2542 is 32.
2543
2544 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2545 MVAPICH2) and controls the fanout of data commu‐
2546 nications. The srun command sends messages to
2547 application programs (via the PMI library) and
2548 those applications may be called upon to forward
2549 that data to additional tasks. By default, srun
2550 sends one message per host and one task on that
2551 host forwards the data to other tasks on that
2552 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2553 defined, the user task may be required to forward
2554 the data to tasks on other hosts. Setting
2555 PMI_FANOUT_OFF_HOST may increase performance.
2556 Since more work is performed by the PMI library
2557 loaded by the user application, failures also can
2558 be more common and more difficult to diagnose.
2559
2560 PMI_TIME This is used exclusively with PMI (MPICH2 and
2561 MVAPICH2) and controls how much the communica‐
2562 tions from the tasks to the srun are spread out
2563 in time in order to avoid overwhelming the srun
2564 command with work. The default value is 500
2565 (microseconds) per task. On relatively slow pro‐
2566 cessors or systems with very large processor
2567 counts (and large PMI data sets), higher values
2568 may be required.
2569
2570 SLURM_CONF The location of the Slurm configuration file.
2571
2572 SLURM_ACCOUNT Same as -A, --account
2573
2574 SLURM_ACCTG_FREQ Same as --acctg-freq
2575
2576 SLURM_BCAST Same as --bcast
2577
2578 SLURM_BURST_BUFFER Same as --bb
2579
2580 SLURM_CLUSTERS Same as -M, --clusters
2581
2582 SLURM_COMPRESS Same as --compress
2583
2584 SLURM_CONSTRAINT Same as -C, --constraint
2585
2586 SLURM_CORE_SPEC Same as --core-spec
2587
2588 SLURM_CPU_BIND Same as --cpu-bind
2589
2590 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2591
2592 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2593
2594 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2595
2596 SLURM_DEBUG Same as -v, --verbose
2597
2598 SLURM_DELAY_BOOT Same as --delay-boot
2599
2600 SLURMD_DEBUG Same as -d, --slurmd-debug
2601
2602 SLURM_DEPENDENCY Same as -P, --dependency=<jobid>
2603
2604 SLURM_DISABLE_STATUS Same as -X, --disable-status
2605
2606 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2607 tion=plane, without =<size>, is set.
2608
2609 SLURM_DISTRIBUTION Same as -m, --distribution
2610
2611 SLURM_EPILOG Same as --epilog
2612
2613 SLURM_EXCLUSIVE Same as --exclusive
2614
       SLURM_EXIT_ERROR      Specifies the exit code generated when a Slurm
                             error occurs (e.g. invalid options). This can be
                             used by a script to distinguish application exit
                             codes from various Slurm error conditions (see
                             the example at the end of this section). Also
                             see SLURM_EXIT_IMMEDIATE.
2620
2621 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
2622 --immediate option is used and resources are not
2623 currently available. This can be used by a
2624 script to distinguish application exit codes from
2625 various Slurm error conditions. Also see
2626 SLURM_EXIT_ERROR.
2627
2628 SLURM_EXPORT_ENV Same as --export
2629
2630 SLURM_GPUS Same as -G, --gpus
2631
2632 SLURM_GPU_BIND Same as --gpu-bind
2633
2634 SLURM_GPU_FREQ Same as --gpu-freq
2635
2636 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2637
2638 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2639
2640 SLURM_GRES_FLAGS Same as --gres-flags
2641
2642 SLURM_HINT Same as --hint
2643
2644 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2645
2646 SLURM_IMMEDIATE Same as -I, --immediate
2647
2648 SLURM_JOB_ID Same as --jobid
2649
2650 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2651 allocation, in which case it is ignored to avoid
2652 using the batch job's name as the name of each
2653 job step.
2654
       SLURM_JOB_NODELIST    Same as -w, --nodelist=<host1,host2,... or
                             filename>. If the job has been resized, ensure
                             that this nodelist is adjusted (or undefined) to
                             avoid job steps being rejected due to down nodes.
2659
       SLURM_JOB_NUM_NODES   (and SLURM_NNODES for backwards compatibility)
                             Same as -N, --nodes. Total number of nodes in
                             the job's resource allocation.
2663
2664 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2665
2666 SLURM_LABELIO Same as -l, --label
2667
2668 SLURM_MEM_BIND Same as --mem-bind
2669
2670 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2671
2672 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2673
2674 SLURM_MEM_PER_NODE Same as --mem
2675
2676 SLURM_MPI_TYPE Same as --mpi
2677
2678 SLURM_NETWORK Same as --network
2679
2680 SLURM_NO_KILL Same as -k, --no-kill
2681
2682 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2683 Same as -n, --ntasks
2684
2685 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2686
2687 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2688
2689 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2690
2691 SLURM_NTASKS_PER_SOCKET
2692 Same as --ntasks-per-socket
2693
2694 SLURM_OPEN_MODE Same as --open-mode
2695
2696 SLURM_OVERCOMMIT Same as -O, --overcommit
2697
2698 SLURM_OVERLAP Same as --overlap
2699
2700 SLURM_PARTITION Same as -p, --partition
2701
       SLURM_PMI_KVS_NO_DUP_KEYS
                             If set, then PMI key-pairs will contain no
                             duplicate keys. MPI can use this variable to
                             inform the PMI library that it will not use
                             duplicate keys so PMI can skip the check for
                             duplicate keys. This is the case for MPICH2 and
                             reduces the overhead of testing for duplicates,
                             improving performance.
2710
2711 SLURM_POWER Same as --power
2712
2713 SLURM_PROFILE Same as --profile
2714
2715 SLURM_PROLOG Same as --prolog
2716
2717 SLURM_QOS Same as --qos
2718
2719 SLURM_REMOTE_CWD Same as -D, --chdir=
2720
2721 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2722 maximum count of switches desired for the job
2723 allocation and optionally the maximum time to
2724 wait for that number of switches. See --switches
2725
2726 SLURM_RESERVATION Same as --reservation
2727
2728 SLURM_RESV_PORTS Same as --resv-ports
2729
2730 SLURM_SIGNAL Same as --signal
2731
2732 SLURM_STDERRMODE Same as -e, --error
2733
2734 SLURM_STDINMODE Same as -i, --input
2735
2736 SLURM_SPREAD_JOB Same as --spread-job
2737
       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
                             If set and non-zero, successive task exit
                             messages with the same exit code will be printed
                             only once.
2742
2743 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2744 job allocations). Also see SLURM_GRES
2745
       SLURM_STEP_KILLED_MSG_NODE_ID=ID
                             If set, only the specified node will log when
                             the job or step is killed by a signal.
2749
2750 SLURM_STDOUTMODE Same as -o, --output
2751
2752 SLURM_TASK_EPILOG Same as --task-epilog
2753
2754 SLURM_TASK_PROLOG Same as --task-prolog
2755
2756 SLURM_TEST_EXEC If defined, srun will verify existence of the
2757 executable program along with user execute per‐
2758 mission on the node where srun was called before
2759 attempting to launch it on nodes in the step.
2760
2761 SLURM_THREAD_SPEC Same as --thread-spec
2762
2763 SLURM_THREADS Same as -T, --threads
2764
       SLURM_THREADS_PER_CORE
                             Same as --threads-per-core
2767
2768 SLURM_TIMELIMIT Same as -t, --time
2769
2770 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2771
2772 SLURM_USE_MIN_NODES Same as --use-min-nodes
2773
2774 SLURM_WAIT Same as -W, --wait
2775
2776 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2777 --switches
2778
       SLURM_WCKEY           Same as --wckey
2780
2781 SLURM_WHOLE Same as --whole
2782
       SLURM_WORKING_DIR     Same as -D, --chdir
2784
2785 SRUN_EXPORT_ENV Same as --export, and will override any setting
2786 for SLURM_EXPORT_ENV.
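
       As referenced above for SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE, a
       script can reserve exit codes in order to tell Slurm error conditions
       apart from application exit codes. The sketch below assumes a
       hypothetical program ./app; the codes 200 and 201 are arbitrary
       choices.

          #!/bin/bash
          # Reserve distinct exit codes for Slurm error conditions.
          export SLURM_EXIT_ERROR=200
          export SLURM_EXIT_IMMEDIATE=201

          srun --immediate -n1 ./app
          rc=$?
          if [ "$rc" -eq "$SLURM_EXIT_IMMEDIATE" ]; then
              echo "resources were not immediately available"
          elif [ "$rc" -eq "$SLURM_EXIT_ERROR" ]; then
              echo "srun reported a Slurm error"
          else
              echo "./app exited with code $rc"
          fi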
2787
2788

OUTPUT ENVIRONMENT VARIABLES
2791 srun will set some environment variables in the environment of the exe‐
2792 cuting tasks on the remote compute nodes. These environment variables
2793 are:
2794
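       As a brief sketch, a job step can inspect several of the variables
       described below (the script name show_env.sh is illustrative):

          > cat show_env.sh
          #!/bin/sh
          # Each task reports its rank, node-local ID and the node it runs on.
          echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME (local ID $SLURM_LOCALID)"

          > srun -n4 -l ./show_env.sh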
2795
2796 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2797 ment variables are set separately for each compo‐
2798 nent.
2799
2800 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2801 ing.
2802
2803 SLURM_CPU_BIND_VERBOSE
2804 --cpu-bind verbosity (quiet,verbose).
2805
2806 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2807
2808 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2809 IDs or masks for this node, CPU_ID = Board_ID x
2810 threads_per_board + Socket_ID x
2811 threads_per_socket + Core_ID x threads_per_core +
2812 Thread_ID).
2813
2814
       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency
                             on the srun command as a numerical frequency in
                             kilohertz, or a coded value for a request of
                             low, medium, highm1 or high for the frequency.
                             See the description of the --cpu-freq option or
                             the SLURM_CPU_FREQ_REQ input environment
                             variable.
2821
2822 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2823 node. Note the select/linear plugin allocates
2824 entire nodes to jobs, so the value indicates the
2825 total count of CPUs on the node. For the
2826 select/cons_res plugin, this number indicates the
2827 number of cores on this node allocated to the
2828 job.
2829
2830 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2831 the --cpus-per-task option is specified.
2832
2833 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2834 distribution with -m, --distribution.
2835
2836 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2837 gin and comma separated.
2838
       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2840
       SLURM_JOB_CPUS_PER_NODE
                             Number of CPUs per node.
2843
2844 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2845
2846 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2847 Job id of the executing job.
2848
2849
2850 SLURM_JOB_NAME Set to the value of the --job-name option or the
2851 command name when srun is used to create a new
2852 job allocation. Not set when srun is used only to
2853 create a job step (i.e. within an existing job
2854 allocation).
2855
2856
2857 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2858 ning.
2859
2860
2861 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2862
2863 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2864 tion, if any.
2865
2866
2867 SLURM_LAUNCH_NODE_IPADDR
2868 IP address of the node from which the task launch
2869 was initiated (where the srun command ran from).
2870
2871 SLURM_LOCALID Node local task ID for the process within a job.
2872
2873
2874 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2875 masks for this node>).
2876
2877 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2878
2879 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2880 nodes).
2881
2882 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2883
2884 SLURM_MEM_BIND_VERBOSE
2885 --mem-bind verbosity (quiet,verbose).
2886
2887 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2888 cation.
2889
       SLURM_NODE_ALIASES    Sets of node name, communication address and
                             hostname for nodes allocated to the job from the
                             cloud. Each element in the set is colon
                             separated and each set is comma separated. For
                             example:
                             SLURM_NODE_ALIASES=
                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2896
2897 SLURM_NODEID The relative node ID of the current node.
2898
2899 SLURM_JOB_NODELIST List of nodes allocated to the job.
2900
2901 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2902 Total number of processes in the current job or
2903 job step.
2904
       SLURM_HET_SIZE        Set to the count of components in the
                             heterogeneous job.
2906
2907 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2908 of job submission. This value is propagated to
2909 the spawned processes.
2910
2911 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2912 rent process.
2913
2914 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2915
2916 SLURM_SRUN_COMM_PORT srun communication port.
2917
2918 SLURM_STEP_LAUNCHER_PORT
2919 Step launcher port.
2920
2921 SLURM_STEP_NODELIST List of nodes allocated to the step.
2922
2923 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2924
2925 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2926 erogeneous job step.
2927
2928 SLURM_STEP_TASKS_PER_NODE
2929 Number of processes per node within the step.
2930
2931 SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2932 The step ID of the current job.
2933
2934 SLURM_SUBMIT_DIR The directory from which srun was invoked or, if
2935 applicable, the directory specified by the -D,
2936 --chdir option.
2937
       SLURM_SUBMIT_HOST     The hostname of the computer from which srun
                             was invoked.
2940
2941 SLURM_TASK_PID The process ID of the task being started.
2942
2943 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2944 Values are comma separated and in the same order
2945 as SLURM_JOB_NODELIST. If two or more consecu‐
2946 tive nodes are to have the same task count, that
2947 count is followed by "(x#)" where "#" is the rep‐
2948 etition count. For example,
2949 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2950 first three nodes will each execute two tasks and
2951 the fourth node will execute one task.
2952
2953
       SLURM_TOPOLOGY_ADDR   This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the names of the network switches
                             which may be involved in the job's
                             communications, from the system's top level
                             switch down to the leaf switch and ending with
                             the node name. A period is used to separate each
                             hardware component name.
2961
       SLURM_TOPOLOGY_ADDR_PATTERN
                             This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the component types listed in
                             SLURM_TOPOLOGY_ADDR. Each component will be
                             identified as either "switch" or "node". A
                             period is used to separate each hardware
                             component type.
2969
2970 SLURM_UMASK The umask in effect when the job was submitted.
2971
2972 SLURMD_NODENAME Name of the node running the task. In the case of
2973 a parallel job executing on multiple compute
2974 nodes, the various tasks will have this environ‐
2975 ment variable set to different values on each
2976 compute node.
2977
2978 SRUN_DEBUG Set to the logging level of the srun command.
2979 Default value is 3 (info level). The value is
2980 incremented or decremented based upon the --ver‐
2981 bose and --quiet options.
2982

SIGNALS AND ESCAPE SEQUENCES
2985 Signals sent to the srun command are automatically forwarded to the
2986 tasks it is controlling with a few exceptions. The escape sequence
2987 <control-c> will report the state of all tasks associated with the srun
2988 command. If <control-c> is entered twice within one second, then the
2989 associated SIGINT signal will be sent to all tasks and a termination
2990 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2991 spawned tasks. If a third <control-c> is received, the srun program
2992 will be terminated without waiting for remote tasks to exit or their
2993 I/O to complete.
2994
       The escape sequence <control-z> is presently ignored. Our intent is
       for this to put the srun command into a mode where various special
       actions may be invoked.
2998

MPI SUPPORT
       MPI use depends upon the type of MPI being used. There are three
       fundamentally different modes of operation used by these various MPI
       implementations.
3004
3005 1. Slurm directly launches the tasks and performs initialization of
3006 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3007 a.out".
3008
3009 2. Slurm creates a resource allocation for the job and then mpirun
3010 launches tasks using Slurm's infrastructure (OpenMPI).
3011
3012 3. Slurm creates a resource allocation for the job and then mpirun
3013 launches tasks using some mechanism other than Slurm, such as SSH or
3014 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3015 trol. Slurm's epilog should be configured to purge these tasks when the
3016 job's allocation is relinquished, or the use of pam_slurm_adopt is
3017 highly recommended.
3018
       See https://slurm.schedmd.com/mpi_guide.html for more information on
       the use of these various MPI implementations with Slurm.
3021
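       The first two modes are sketched below. The program name mpi_app is
       illustrative, and the commands assume an MPI library built with the
       corresponding Slurm support (a configured pmix plugin for the first,
       a Slurm-aware mpirun such as Open MPI's for the second).

          # Mode 1: srun launches the tasks and initializes communications
          > srun --mpi=pmix -n16 ./mpi_app

          # Mode 2: create an allocation, then let mpirun launch tasks in it
          > salloc -N2 -n16
          > mpirun ./mpi_app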

MULTIPLE PROGRAM CONFIGURATION
3024 Comments in the configuration file must have a "#" in column one. The
3025 configuration file contains the following fields separated by white
3026 space:
3027
3028 Task rank
3029 One or more task ranks to use this configuration. Multiple val‐
3030 ues may be comma separated. Ranges may be indicated with two
3031 numbers separated with a '-' with the smaller number first (e.g.
3032 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3033 ified, specify a rank of '*' as the last line of the file. If
3034 an attempt is made to initiate a task for which no executable
3035 program is defined, the following error message will be produced
3036 "No executable program specified for this task".
3037
       Executable
              The name of the program to execute. May be a fully qualified
              pathname if desired.
3041
3042 Arguments
3043 Program arguments. The expression "%t" will be replaced with
3044 the task's number. The expression "%o" will be replaced with
3045 the task's offset within this range (e.g. a configured task rank
3046 value of "1-5" would have offset values of "0-4"). Single
3047 quotes may be used to avoid having the enclosed values inter‐
3048 preted. This field is optional. Any arguments for the program
3049 entered on the command line will be added to the arguments spec‐
3050 ified in the configuration file.
3051
3052 For example:
3053 ###################################################################
3054 # srun multiple program configuration file
3055 #
3056 # srun -n8 -l --multi-prog silly.conf
3057 ###################################################################
3058 4-6 hostname
3059 1,7 echo task:%t
3060 0,2-3 echo offset:%o
3061
3062 > srun -n8 -l --multi-prog silly.conf
3063 0: offset:0
3064 1: task:1
3065 2: offset:1
3066 3: offset:2
3067 4: linux15.llnl.gov
3068 5: linux16.llnl.gov
3069 6: linux17.llnl.gov
3070 7: task:7
3071
3072
3073

EXAMPLES
       This simple example demonstrates the execution of the command hostname
       in eight tasks. At least eight processors will be allocated to the job
       (the same as the task count) on however many nodes are required to
       satisfy the request. The output of each task will be preceded by its
       task number. (The machine "dev" in the example below has a total of
       two CPUs per node.)
3082
3083
3084 > srun -n8 -l hostname
3085 0: dev0
3086 1: dev0
3087 2: dev1
3088 3: dev1
3089 4: dev2
3090 5: dev2
3091 6: dev3
3092 7: dev3
3093
3094
3095 The srun -r option is used within a job script to run two job steps on
3096 disjoint nodes in the following example. The script is run using allo‐
3097 cate mode instead of as a batch job in this case.
3098
3099
3100 > cat test.sh
3101 #!/bin/sh
3102 echo $SLURM_JOB_NODELIST
3103 srun -lN2 -r2 hostname
3104 srun -lN2 hostname
3105
3106 > salloc -N4 test.sh
3107 dev[7-10]
3108 0: dev9
3109 1: dev10
3110 0: dev7
3111 1: dev8
3112
3113
3114 The following script runs two job steps in parallel within an allocated
3115 set of nodes.
3116
3117
3118 > cat test.sh
3119 #!/bin/bash
3120 srun -lN2 -n4 -r 2 sleep 60 &
3121 srun -lN2 -r 0 sleep 60 &
3122 sleep 1
3123 squeue
3124 squeue -s
3125 wait
3126
3127 > salloc -N4 test.sh
3128 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3129 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3130
3131 STEPID PARTITION USER TIME NODELIST
3132 65641.0 batch grondo 0:01 dev[7-8]
3133 65641.1 batch grondo 0:01 dev[9-10]
3134
3135
3136 This example demonstrates how one executes a simple MPI job. We use
3137 srun to build a list of machines (nodes) to be used by mpirun in its
3138 required format. A sample command line and the script to be executed
3139 follow.
3140
3141
3142 > cat test.sh
3143 #!/bin/sh
3144 MACHINEFILE="nodes.$SLURM_JOB_ID"
3145
3146 # Generate Machinefile for mpi such that hosts are in the same
3147 # order as if run via srun
3148 #
3149 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3150
3151 # Run using generated Machine file:
3152 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3153
3154 rm $MACHINEFILE
3155
3156 > salloc -N2 -n4 test.sh
3157
3158
       This simple example demonstrates the execution of different jobs on
       different nodes in the same srun. You can do this for any number of
       nodes or any number of jobs. The executables are run on the nodes
       identified by the SLURM_NODEID environment variable, starting at 0 and
       going up to the number of nodes specified on the srun command line.
3164
3165
3166 > cat test.sh
3167 case $SLURM_NODEID in
3168 0) echo "I am running on "
3169 hostname ;;
3170 1) hostname
3171 echo "is where I am running" ;;
3172 esac
3173
3174 > srun -N2 test.sh
3175 dev0
3176 is where I am running
3177 I am running on
3178 dev1
3179
3180
3181 This example demonstrates use of multi-core options to control layout
3182 of tasks. We request that four sockets per node and two cores per
3183 socket be dedicated to the job.
3184
3185
3186 > srun -N2 -B 4-4:2-2 a.out
3187
3188 This example shows a script in which Slurm is used to provide resource
3189 management for a job by executing the various job steps as processors
3190 become available for their dedicated use.
3191
3192
3193 > cat my.script
3194 #!/bin/bash
3195 srun -n4 prog1 &
3196 srun -n3 prog2 &
3197 srun -n1 prog3 &
3198 srun -n1 prog4 &
3199 wait
3200
3201
       This example shows how to launch an application called "server" with
       one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another
       application called "client" with 16 tasks, 1 CPU per task (the
       default) and 1 GB of memory per task.
3206
3207
3208 > srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3209

COPYING
3212 Copyright (C) 2006-2007 The Regents of the University of California.
3213 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3214 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3215 Copyright (C) 2010-2015 SchedMD LLC.
3216
3217 This file is part of Slurm, a resource management program. For
3218 details, see <https://slurm.schedmd.com/>.
3219
3220 Slurm is free software; you can redistribute it and/or modify it under
3221 the terms of the GNU General Public License as published by the Free
3222 Software Foundation; either version 2 of the License, or (at your
3223 option) any later version.
3224
3225 Slurm is distributed in the hope that it will be useful, but WITHOUT
3226 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3227 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3228 for more details.
3229

SEE ALSO
       salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3234
3235
3236
3237January 2021 Slurm Commands srun(1)