srun(1)                         Slurm Commands                         srun(1)



NAME
       srun - Run parallel jobs


SYNOPSIS
       srun [OPTIONS(0)...] [ : [OPTIONS(N)...]] executable(0) [args(0)...]

       Option(s)  define  multiple  jobs  in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html


DESCRIPTION
       Run a parallel job on a cluster managed by Slurm.  If  necessary,  srun
       will  first  create  a resource allocation in which to run the parallel
       job.

       The following document describes the influence of  various  options  on
       the allocation of cpus to jobs and tasks.
       https://slurm.schedmd.com/cpu_management.html


RETURN VALUE
       srun  will return the highest exit code of all tasks run or the highest
       signal (with the high-order bit set in an 8-bit integer -- e.g.  128  +
       signal) of any task that exited with a signal.


EXECUTABLE PATH RESOLUTION
       The executable is resolved in the following order:

       1. If the executable starts with ".", then the path is constructed  as:
          current working directory / executable

       2. If the executable starts with a "/", then the path is considered
          absolute.

       3. If the executable can be resolved through PATH. See
          path_resolution(7).

       4. If the executable is in the current working directory.

       The current working directory is the calling  process's  working
       directory unless the --chdir argument is passed, which overrides the
       current working directory.


OPTIONS
       --accel-bind=<options>
              Control how tasks are bound to generic resources  of  type  gpu,
              mic  and  nic.   Multiple  options  may  be specified. Supported
              options include:

              g      Bind each task to GPUs which are closest to the allocated
                     CPUs.

              m      Bind each task to MICs which are closest to the allocated
                     CPUs.

              n      Bind each task to NICs which are closest to the allocated
                     CPUs.

              v      Verbose  mode.  Log  how  tasks  are bound to GPU and NIC
                     devices.

              This option applies to job allocations.

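              For example, a minimal sketch that binds each task to its  clos‐
              est GPU (the task count and executable name a.out are illustra‐
              tive placeholders):

                   # bind each of 8 tasks to the GPU(s) nearest its CPUs
                   srun -n8 --accel-bind=g a.out
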
       -A, --account=<account>
              Charge resources used by this job to the  specified  account.
              The account is an arbitrary string. The account name may be
              changed after job submission using the scontrol command.  This
              option applies to job allocations.


       --acctg-freq
              Define  the  job  accounting  and  profiling sampling intervals.
              This can be used to override the JobAcctGatherFrequency  parame‐
              ter  in  Slurm's  configuration file, slurm.conf.  The supported
              format is as follows:

              --acctg-freq=<datatype>=<interval>
                          where <datatype>=<interval> specifies the task  sam‐
                          pling  interval  for  the jobacct_gather plugin or a
                          sampling  interval  for  a  profiling  type  by  the
                          acct_gather_profile  plugin.  Multiple,  comma-sepa‐
                          rated <datatype>=<interval> intervals may be  speci‐
                          fied. Supported datatypes are as follows:

                          task=<interval>
                                 where  <interval> is the task sampling inter‐
                                 val in seconds for the jobacct_gather plugins
                                 and     for    task    profiling    by    the
                                 acct_gather_profile plugin.  NOTE: This  fre‐
                                 quency  is  used  to monitor memory usage. If
                                 memory limits are enforced, the highest  fre‐
                                 quency  a user can request is what is config‐
                                 ured in the slurm.conf file.   They  can  not
                                 turn it off (=0) either.

                          energy=<interval>
                                 where  <interval> is the sampling interval in
                                 seconds  for  energy  profiling   using   the
                                 acct_gather_energy plugin.

                          network=<interval>
                                 where  <interval> is the sampling interval in
                                 seconds for infiniband  profiling  using  the
                                 acct_gather_infiniband plugin.

                          filesystem=<interval>
                                 where  <interval> is the sampling interval in
                                 seconds for filesystem  profiling  using  the
                                 acct_gather_filesystem plugin.

              The default value for the task sampling interval is 30  seconds.
              The default value for all other intervals is 0.  An interval  of
              0 disables sampling of the specified type.  If the task sampling
              interval is 0, accounting information is collected only  at  job
              termination (reducing Slurm interference with the job).
              Smaller (non-zero) values have a greater impact upon job perfor‐
              mance,  but a value of 30 seconds is not likely to be noticeable
              for applications having less  than  10,000  tasks.  This  option
              applies to job allocations.

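              For example, an illustrative request that samples task  account‐
              ing every 15 seconds and energy data every 30 seconds (the task
              count and executable name a.out are placeholders):

                   srun --acctg-freq=task=15,energy=30 -n4 a.out
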
       -B --extra-node-info=<sockets[:cores[:threads]]>
              Restrict  node  selection  to  nodes with at least the specified
              number of sockets, cores per socket  and/or  threads  per  core.
              NOTE: These options do not specify the resource allocation size.
              Each value specified is considered a minimum.  An  asterisk  (*)
              can  be  used  as  a  placeholder  indicating that all available
              resources of that type are to be utilized. Values  can  also  be
              specified  as  min-max. The individual levels can also be speci‐
              fied in separate options if desired:
                  --sockets-per-node=<sockets>
                  --cores-per-socket=<cores>
                  --threads-per-core=<threads>
              If task/affinity plugin is enabled, then specifying  an  alloca‐
              tion  in  this  manner  also sets a default --cpu-bind option of
              threads if the -B option specifies a thread count, otherwise  an
              option  of  cores  if  a  core  count is specified, otherwise an
              option   of   sockets.    If   SelectType   is   configured   to
              select/cons_res,   it   must   have   a  parameter  of  CR_Core,
              CR_Core_Memory, CR_Socket, or CR_Socket_Memory for  this  option
              to  be  honored.   If  not specified, the scontrol show job will
              display 'ReqS:C:T=*:*:*'. This option  applies  to  job  alloca‐
              tions.

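              For example, a sketch that restricts selection to nodes with  at
              least two sockets, four cores per socket and two threads per
              core (the task count and executable name a.out are placehold‐
              ers):

                   srun -B 2:4:2 -n16 a.out
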
       --bb=<spec>
              Burst  buffer  specification.  The  form of the specification is
              system dependent.  Also see --bbf. This option  applies  to  job
              allocations.


       --bbf=<file_name>
              Path of file containing burst buffer specification.  The form of
              the specification is system  dependent.   Also  see  --bb.  This
              option applies to job allocations.


       --bcast[=<dest_path>]
              Copy executable file to allocated compute nodes.  If a file name
              is specified, copy the executable to the  specified  destination
              file  path.  If  no  path  is specified, copy the file to a file
              named "slurm_bcast_<job_id>.<step_id>" in the  current  working
              directory.
              For  example,  "srun  --bcast=/tmp/mine -N3 a.out" will copy the
              file "a.out" from your current directory to the file "/tmp/mine"
              on  each  of  the three allocated compute nodes and execute that
              file. This option applies to step allocations.


       -b, --begin=<time>
              Defer initiation of this  job  until  the  specified  time.   It
              accepts  times  of  the form HH:MM:SS to run a job at a specific
              time of day (seconds are optional).  (If that  time  is  already
              past,  the next day is assumed.)  You may also specify midnight,
              noon, fika (3  PM)  or  teatime  (4  PM)  and  you  can  have  a
              time-of-day suffixed with AM or PM for running in the morning or
              the evening.  You can also say what day the job will be run,  by
              specifying a date of the form MMDDYY, MM/DD/YY  or  YYYY-MM-DD.
              Combine   date   and   time   using   the    following    format
              YYYY-MM-DD[THH:MM[:SS]].  You  can  also  give  times like now +
              count time-units, where the time-units can be seconds (default),
              minutes, hours, days, or weeks and you can tell Slurm to run the
              job today with the keyword today and to  run  the  job  tomorrow
              with  the  keyword tomorrow.  The value may be changed after job
              submission using the scontrol command.  For example:
                 --begin=16:00
                 --begin=now+1hour
                 --begin=now+60           (seconds by default)
                 --begin=2010-01-20T12:34:00


              Notes on date/time specifications:
               - Although the 'seconds' field of the HH:MM:SS time  specifica‐
              tion  is  allowed  by  the  code, note that the poll time of the
              Slurm scheduler is not precise enough to guarantee  dispatch  of
              the  job on the exact second.  The job will be eligible to start
              on the next poll following the specified time.  The  exact  poll
              interval  depends  on the Slurm scheduler (e.g., 60 seconds with
              the default sched/builtin).
               -  If  no  time  (HH:MM:SS)  is  specified,  the   default   is
              (00:00:00).
               -  If a date is specified without a year (e.g., MM/DD) then the
              current year is assumed, unless the  combination  of  MM/DD  and
              HH:MM:SS  has  already  passed  for that year, in which case the
              next year is used.
              This option applies to job allocations.


       --checkpoint=<time>
              Specifies the interval between creating checkpoints of  the  job
              step.   By  default,  the job step will have no checkpoints cre‐
              ated.  Acceptable time formats include "minutes",  "minutes:sec‐
              onds",  "hours:minutes:seconds",  "days-hours", "days-hours:min‐
              utes" and "days-hours:minutes:seconds". This option  applies  to
              job and step allocations.


       --cluster-constraint=<list>
              Specifies  features that a federated cluster must have to have a
              sibling job submitted to it. Slurm will attempt to submit a sib‐
              ling  job  to  a cluster if it has at least one of the specified
              features.


       --comment=<string>
              An arbitrary comment. This option applies to job allocations.


       --compress[=type]
              Compress file before sending it to compute hosts.  The  optional
              argument  specifies  the  data  compression  library to be used.
              Supported values are "lz4" (default) and "zlib".  Some  compres‐
              sion libraries may be unavailable on some systems.  For use with
              the --bcast option. This option applies to step allocations.

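              For example, a sketch that compresses and broadcasts a local
              executable before running it on three nodes (the destination
              path /tmp/mine mirrors the --bcast example above):

                   srun --bcast=/tmp/mine --compress=lz4 -N3 a.out
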
       -C, --constraint=<list>
              Nodes can have features assigned to them by the  Slurm  adminis‐
              trator.   Users can specify which of these features are required
              by their job using the constraint  option.   Only  nodes  having
              features  matching  the  job constraints will be used to satisfy
              the request.  Multiple constraints may be  specified  with  AND,
              OR,  matching  OR, resource counts, etc. (some operators are not
              supported on all system types).   Supported  constraint  options
              include:

              Single Name
                     Only nodes which have the specified feature will be used.
                     For example, --constraint="intel"

              Node Count
                     A request can specify the number  of  nodes  needed  with
                     some feature by appending an asterisk and count after the
                     feature   name.    For   example    "--nodes=16    --con‐
                     straint=graphics*4  ..."  indicates that the job requires
                     16 nodes and that at least four of those nodes must  have
                     the feature "graphics."

              AND    Only  nodes  with  all  of the specified features will be
                     used.  The ampersand is used for an  AND  operator.   For
                     example, --constraint="intel&gpu"

              OR     Only  nodes  with  at least one of the specified features
                     will be used.  The vertical bar is used for an OR  opera‐
                     tor.  For example, --constraint="intel|amd"

              Matching OR
                     If  only  one of a set of possible options should be used
                     for all allocated nodes, then use  the  OR  operator  and
                     enclose the options within square brackets.  For example:
                     "--constraint=[rack1|rack2|rack3|rack4]" might be used to
                     specify that all nodes must be allocated on a single rack
                     of the cluster, but any of those four racks can be used.

              Multiple Counts
                     Specific counts of multiple resources may be specified by
                     using  the  AND operator and enclosing the options within
                     square     brackets.       For      example:      "--con‐
                     straint=[rack1*2&rack2*4]"  might be used to specify that
                     two nodes must be allocated from nodes with  the  feature
                     of  "rack1"  and  four nodes must be allocated from nodes
                     with the feature "rack2".

                     NOTE: This construct does not support multiple Intel  KNL
                     NUMA   or   MCDRAM  modes.  For  example,  while  "--con‐
                     straint=[(knl&quad)*2&(knl&hemi)*4]"  is  not  supported,
                     "--constraint=[haswell*2&(knl&hemi)*4]"   is   supported.
                     Specification of multiple KNL modes requires the use of a
                     heterogeneous job.


              Parentheses
                     Parentheses  can  be  used  to  group  like node features
                     together.          For          example           "--con‐
                     straint=[(knl&snc4&flat)*4&haswell*1]"  might  be used to
                     specify that four nodes with the features  "knl",  "snc4"
                     and  "flat"  plus one node with the feature "haswell" are
                     required.  All  options  within  parentheses  should   be
                     grouped with AND (e.g. "&") operands.

       WARNING:  When  srun is executed from within salloc or sbatch, the con‐
       straint value can only contain a single feature name. None of the other
       operators are currently supported for job steps.
       This option applies to job and step allocations.


       --contiguous
              If  set,  then  the  allocated nodes must form a contiguous set.
              Not honored with the topology/tree or topology/3d_torus plugins,
              both  of which can modify the node ordering. This option applies
              to job allocations.


       --cores-per-socket=<cores>
              Restrict node selection to nodes with  at  least  the  specified
              number of cores per socket.  See additional information under -B
              option above when task/affinity plugin is enabled.  This  option
              applies to job allocations.

       --cpu-bind=[{quiet,verbose},]type
              Bind  tasks  to  CPUs.   Used  only  when  the  task/affinity or
              task/cgroup plugin is  enabled.   NOTE:  To  have  Slurm  always
              report  on the selected CPU binding for all commands executed in
              a  shell,  you  can  enable  verbose   mode   by   setting   the
              SLURM_CPU_BIND environment variable value to "verbose".

              The  following  informational environment variables are set when
              --cpu-bind is in use:
                   SLURM_CPU_BIND_VERBOSE
                   SLURM_CPU_BIND_TYPE
                   SLURM_CPU_BIND_LIST

              See the  ENVIRONMENT  VARIABLES  section  for  a  more  detailed
              description  of  the  individual SLURM_CPU_BIND variables. These
              variables are available only if the task/affinity plugin is con‐
              figured.

              When  using --cpus-per-task to run multithreaded tasks, be aware
              that CPU binding is inherited from the parent  of  the  process.
              This  means that the multithreaded task should either specify or
              clear the CPU binding itself to avoid having all threads of  the
              multithreaded  task use the same mask/CPU as the parent.  Alter‐
              natively, fat masks (masks which specify more than  one  allowed
              CPU)  could  be  used for the tasks in order to provide multiple
              CPUs for the multithreaded tasks.

              By default, a job step has access to every CPU allocated to  the
              job.   To  ensure  that  distinct CPUs are allocated to each job
              step, use the --exclusive option.

              Note that a job step can be allocated different numbers of  CPUs
              on each node or be allocated CPUs not starting at location zero.
              Therefore one of the options which  automatically  generate  the
              task  binding  is  recommended.   Explicitly  specified masks or
              bindings are only honored when the job step has  been  allocated
              every available CPU on the node.

              Binding  a task to a NUMA locality domain means to bind the task
              to the set of CPUs that belong to the NUMA  locality  domain  or
              "NUMA  node".   If NUMA locality domain options are used on sys‐
              tems with no NUMA support, then  each  socket  is  considered  a
              locality domain.

              If  the  --cpu-bind option is not used, the default binding mode
              will depend upon Slurm's configuration and the  step's  resource
              allocation.   If  all  allocated  nodes have the same configured
              CpuBind mode, that will be used.  Otherwise if the job's  Parti‐
              tion  has  a configured CpuBind mode, that will be used.  Other‐
              wise if Slurm has a configured TaskPluginParam value, that  mode
              will  be used.  Otherwise automatic binding will be performed as
              described below.


              Auto Binding
                     Applies only when task/affinity is enabled.  If  the  job
                     step  allocation  includes an allocation with a number of
                     sockets, cores, or threads equal to the number  of  tasks
                     times  cpus-per-task,  then  the tasks will by default be
                     bound to the appropriate resources (auto  binding).  Dis‐
                     able   this  mode  of  operation  by  explicitly  setting
                     "--cpu-bind=none".       Use        TaskPluginParam=auto‐
                     bind=[threads|cores|sockets] to set a default cpu binding
                     in case "auto binding" doesn't find a match.

              Supported options include:

                     q[uiet]
                            Quietly bind before task runs (default)

                     v[erbose]
                            Verbosely report binding before task runs

                     no[ne] Do not bind tasks to  CPUs  (default  unless  auto
                            binding is applied)

                     rank   Automatically  bind by task rank.  The lowest num‐
                            bered task on each node is  bound  to  socket  (or
                            core  or  thread) zero, etc.  Not supported unless
                            the entire node is allocated to the job.

                     map_cpu:<list>
                            Bind by setting CPU masks on tasks (or  ranks)  as
                            specified          where         <list>         is
                            <cpu_id_for_task_0>,<cpu_id_for_task_1>,...    CPU
                            IDs  are interpreted as decimal values unless they
                            are preceded with '0x' in which case they are  in‐
                            terpreted as hexadecimal values.  If the number of
                            tasks (or ranks) exceeds the number of elements in
                            this  list, elements in the list will be reused as
                            needed starting from the beginning  of  the  list.
                            To  simplify  support  for  large task counts, the
                            lists may follow a map with an asterisk and  repe‐
                            tition count. For example "map_cpu:0x0f*4,0xf0*4".
                            Not supported unless the entire node is  allocated
                            to the job.

                     mask_cpu:<list>
                            Bind  by  setting CPU masks on tasks (or ranks) as
                            specified         where         <list>          is
                            <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
                            The mapping is specified for a node and  identical
                            mapping  is  applied  to  the  tasks on every node
                            (i.e. the lowest task ID on each node is mapped to
                            the  first mask specified in the list, etc.).  CPU
                            masks are always interpreted as hexadecimal values
                            but  can  be  preceded  with an optional '0x'. Not
                            supported unless the entire node is  allocated  to
                            the  job.   To  simplify  support  for  large task
                            counts, the lists may follow a map with an  aster‐
                            isk    and    repetition    count.   For   example
                            "mask_cpu:0x0f*4,0xf0*4".

                     rank_ldom
                            Bind  to  a NUMA locality domain by rank. Not sup‐
                            ported unless the entire node is allocated to  the
                            job.

                     map_ldom:<list>
                            Bind  by mapping NUMA locality domain IDs to tasks
                            as      specified      where       <list>       is
                            <ldom1>,<ldom2>,...<ldomN>.   The  locality domain
                            IDs are interpreted as decimal values unless  they
                            are  preceded  with  '0x'  in  which case they are
                            interpreted as hexadecimal values.  Not  supported
                            unless the entire node is allocated to the job.

                     mask_ldom:<list>
                            Bind  by  setting  NUMA  locality  domain masks on
                            tasks    as    specified    where    <list>     is
                            <mask1>,<mask2>,...<maskN>.   NUMA locality domain
                            masks are always interpreted as hexadecimal values
                            but  can  be  preceded with an optional '0x'.  Not
                            supported unless the entire node is  allocated  to
                            the job.

                     sockets
                            Automatically  generate  masks  binding  tasks  to
                            sockets.  Only the CPUs on the socket  which  have
                            been  allocated  to  the job will be used.  If the
                            number of tasks differs from the number  of  allo‐
                            cated sockets this can result in sub-optimal bind‐
                            ing.

                     cores  Automatically  generate  masks  binding  tasks  to
                            cores.   If  the  number of tasks differs from the
                            number of  allocated  cores  this  can  result  in
                            sub-optimal binding.

                     threads
                            Automatically  generate  masks  binding  tasks  to
                            threads.  If the number of tasks differs from  the
                            number  of  allocated  threads  this can result in
                            sub-optimal binding.

                     ldoms  Automatically generate masks binding tasks to NUMA
                            locality  domains.  If the number of tasks differs
                            from the number of allocated locality domains this
                            can result in sub-optimal binding.

                     boards Automatically  generate  masks  binding  tasks  to
                            boards.  If the number of tasks differs  from  the
                            number  of  allocated  boards  this  can result in
                            sub-optimal binding. This option is  supported  by
                            the task/cgroup plugin only.

                     help   Show help message for cpu-bind

              This option applies to job and step allocations.

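              For example, a sketch that binds one task per core and  reports
              the resulting bindings (the task count and executable name
              a.out are placeholders):

                   srun -n8 --cpu-bind=verbose,cores a.out
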
       --cpu-freq=<p1[-p2[:p3]]>

              Request  that the job step initiated by this srun command be run
              at some requested frequency if possible, on  the  CPUs  selected
              for the step on the compute node(s).

              p1  can be  [#### | low | medium | high | highm1] which will set
              the frequency scaling_speed to the corresponding value, and  set
              the frequency scaling_governor to UserSpace. See below for defi‐
              nition of the values.

              p1 can be [Conservative | OnDemand |  Performance  |  PowerSave]
              which  will set the scaling_governor to the corresponding value.
              The governor has to be in the list set by the slurm.conf  option
              CpuFreqGovernors.

              When p2 is present, p1 will be the minimum scaling frequency and
              p2 will be the maximum scaling frequency.

              p2 can be [#### | medium | high | highm1].  p2 must  be  greater
              than p1.

              p3  can  be [Conservative | OnDemand | Performance | PowerSave |
              UserSpace] which will set  the  governor  to  the  corresponding
              value.

              If p3 is UserSpace, the frequency scaling_speed will be set by a
              power or energy aware scheduling strategy to a value between  p1
              and  p2  that lets the job run within the site's power goal. The
              job may be delayed if p1 is higher than a frequency that  allows
              the job to run within the goal.

              If  the current frequency is < min, it will be set to min. Like‐
              wise, if the current frequency is > max, it will be set to max.

              Acceptable values at present include:

              ####          frequency in kilohertz

              Low           the lowest available frequency

              High          the highest available frequency

              HighM1        (high minus one)  will  select  the  next  highest
                            available frequency

              Medium        attempts  to  set a frequency in the middle of the
                            available range

              Conservative  attempts to use the Conservative CPU governor

              OnDemand      attempts to use the  OnDemand  CPU  governor  (the
                            default value)

              Performance   attempts to use the Performance CPU governor

              PowerSave     attempts to use the PowerSave CPU governor

              UserSpace     attempts to use the UserSpace CPU governor


              The following informational environment variable is set in  the
              job step when the --cpu-freq option is requested.
                      SLURM_CPU_FREQ_REQ

              This environment variable can also be used to supply  the  value
              for  the CPU frequency request if it is set when the 'srun' com‐
              mand is issued.  The --cpu-freq on the command line  will  over‐
              ride  the  environment variable value.  The form of the environ‐
              ment variable is the same as the command line.  See the ENVIRON‐
              MENT    VARIABLES    section    for   a   description   of   the
              SLURM_CPU_FREQ_REQ variable.

              NOTE: This parameter is treated as a request, not a requirement.
              If  the  job  step's  node does not support setting the CPU fre‐
              quency, or the requested value is  outside  the  bounds  of  the
              legal  frequencies,  an  error  is  logged,  but the job step is
              allowed to continue.

              NOTE: Setting the frequency for just the CPUs of  the  job  step
              implies that the tasks are confined to those CPUs.  If task con‐
              finement    (i.e.,    TaskPlugin=task/affinity    or    TaskPlu‐
              gin=task/cgroup with the "ConstrainCores" option) is not config‐
              ured, this parameter is ignored.

              NOTE: When the step completes, the  frequency  and  governor  of
              each selected CPU is reset to the previous values.

              NOTE: Submitting jobs with the --cpu-freq option when  linuxproc
              is configured as the ProctrackType can cause jobs  to  run  too
              quickly before accounting is able to poll for job  information.
              As a result, not all accounting information will be present.

              This option applies to job and step allocations.

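              For example, an illustrative request for a frequency range  with
              the OnDemand governor (the frequency values, task count and exe‐
              cutable name a.out are placeholders; legal frequencies are site
              and hardware dependent):

                   srun --cpu-freq=1600000-2400000:OnDemand -n4 a.out
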
       --cpus-per-gpu=<ncpus>
              Advise Slurm that ensuing job steps will require  ncpus  proces‐
              sors  per  allocated GPU.  Requires the --gpus option.  Not com‐
              patible with the --cpus-per-task option.


       -c, --cpus-per-task=<ncpus>
              Request that ncpus be allocated per process. This may be  useful
              if  the  job is multithreaded and requires more than one CPU per
              task for  optimal  performance.  The  default  is  one  CPU  per
              process.   If  -c is specified without -n, as many tasks will be
              allocated per node as possible while satisfying the -c  restric‐
              tion.  For  instance  on  a  cluster with 8 CPUs per node, a job
              request for 4 nodes and 3 CPUs per task may be allocated 3 or  6
              CPUs  per  node  (1 or 2 tasks per node) depending upon resource
              consumption by other jobs. Such a job may be unable  to  execute
              more than a total of 4 tasks.  This option may also be useful to
              spawn tasks without allocating resources to the  job  step  from
              the  job's  allocation  when running multiple job steps with the
              --exclusive option.

              WARNING: There are configurations and options  interpreted  dif‐
              ferently by job and job step requests which can result in incon‐
              sistencies   for   this   option.    For   example   srun    -c2
              --threads-per-core=1  prog  may  allocate two cores for the job,
              but if each of those cores contains two threads, the job alloca‐
              tion  will  include four CPUs. The job step allocation will then
              launch two threads per CPU for a total of two tasks.

              WARNING: When srun is executed from  within  salloc  or  sbatch,
              there  are configurations and options which can result in incon‐
              sistent allocations when -c has a value greater than -c on  sal‐
              loc or sbatch.

              This option applies to job allocations.

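              For example, a sketch that runs a multithreaded program with
              four tasks and four CPUs per task (the executable name a.out is
              a placeholder):

                   srun -n4 -c4 a.out
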
       --deadline=<OPT>
              remove  the  job  if  no ending is possible before this deadline
              (start > (deadline -  time[-min])).   Default  is  no  deadline.
              Valid time formats are:
              HH:MM[:SS] [AM|PM]
              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]

              This option applies only to job allocations.


       --delay-boot=<minutes>
              Do  not  reboot  nodes  in  order  to satisfy this job's feature
              specification if the job has been eligible to run for less  than
              this time period.  If the job has waited for less than the spec‐
              ified period, it will use only  nodes  which  already  have  the
              specified  features.   The  argument  is in units of minutes.  A
              default value may be set by a  system  administrator  using  the
              delay_boot   option  of  the  SchedulerParameters  configuration
              parameter in the slurm.conf file, otherwise the default value is
              zero (no delay).

              This option applies only to job allocations.


       -d, --dependency=<dependency_list>
              Defer  the  start  of  this job until the specified dependencies
              have been satisfied. This option does not apply  to  job  steps
              (executions  of  srun  within  an existing salloc or sbatch
              allocation) only to job allocations.   <dependency_list>  is  of
              the    form   <type:job_id[:job_id][,type:job_id[:job_id]]>   or
              <type:job_id[:job_id][?type:job_id[:job_id]]>.  All dependencies
              must  be satisfied if the "," separator is used.  Any dependency
              may be satisfied if the "?" separator is used.   Many  jobs  can
              share the same dependency and these jobs may even belong to dif‐
              ferent  users. The  value may be changed  after  job  submission
              using  the scontrol command.  Once a job dependency fails due to
              the termination state of a preceding job, the dependent job will
              never  be  run,  even if the preceding job is requeued and has a
              different termination state  in  a  subsequent  execution.  This
              option applies to job allocations.

              after:job_id[:jobid...]
                     This  job  can  begin  execution after the specified jobs
                     have begun execution.

              afterany:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have terminated.

              afterburstbuffer:job_id[:jobid...]
                     This  job  can  begin  execution after the specified jobs
                     have terminated and any associated burst buffer stage out
                     operations have completed.

              aftercorr:job_id[:jobid...]
                     A  task  of  this job array can begin execution after the
                     corresponding task ID in the specified job has  completed
                     successfully  (ran  to  completion  with  an exit code of
                     zero).

              afternotok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have terminated in some failed state (non-zero exit code,
                     node failure, timed out, etc).

              afterok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have  successfully  executed  (ran  to completion with an
                     exit code of zero).

              expand:job_id
                     Resources allocated to this job should be used to  expand
                     the specified job.  The job to expand must share the same
                     QOS (Quality of Service) and partition.  Gang  scheduling
                     of resources in the partition is also not supported.

              singleton
                     This   job  can  begin  execution  after  any  previously
                     launched jobs sharing the same job  name  and  user  have
                     terminated.   In  other  words, only one job by that name
                     and owned by that user can be running or suspended at any
                     point in time.

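              For example, a sketch that starts a post-processing step only
              after another job has completed successfully (the job id 12345
              and the program name post_process are hypothetical):

                   srun --dependency=afterok:12345 -n1 post_process
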
       -D, --chdir=<path>
              Have  the  remote  processes do a chdir to path before beginning
              execution. The default is to chdir to the current working direc‐
              tory of the srun process. The path can be specified as full path
              or relative path to the directory where the command is executed.
              This option applies to job allocations.


       -e, --error=<filename pattern>
              Specify  how  stderr is to be redirected. By default in interac‐
              tive mode, srun redirects stderr to the same file as stdout,  if
              one is specified. The --error option is provided to allow stdout
              and stderr to be redirected to different locations.  See IO  Re‐
              direction below for more options.  If the specified file already
              exists, it will be overwritten. This option applies to  job  and
              step allocations.


       -E, --preserve-env
              Pass the current values of environment variables SLURM_JOB_NODES
              and SLURM_NTASKS through to the executable, rather than  comput‐
              ing them from commandline parameters. This option applies to job
              allocations.


       --epilog=<executable>
              srun will run executable just after the job step completes.  The
              command  line  arguments  for executable will be the command and
              arguments of the job step.  If executable  is  "none",  then  no
              srun epilog will be run. This parameter overrides the SrunEpilog
              parameter in slurm.conf. This parameter is  completely  indepen‐
              dent  from  the  Epilog  parameter  in  slurm.conf.  This option
              applies to job allocations.



       --exclusive[=user|mcs]
              This option applies to job and job step allocations, and has two
              slightly different meanings for each one.  When used to initiate
              a job, the job allocation cannot share nodes with other  running
              jobs   (or  just  other  users with the "=user" option or "=mcs"
              option).  The default shared/exclusive behavior depends on  sys‐
              tem configuration and the partition's OverSubscribe option takes
              precedence over the job's option.

              This option can also be used when initiating more than  one  job
              step within an existing resource allocation, where you want sep‐
              arate processors to be dedicated to each job step. If sufficient
              processors  are  not available to initiate the job step, it will
              be deferred. This can be thought of as providing a mechanism for
              resource management to the job within its allocation.

              The  exclusive  allocation  of  CPUs  only  applies to job steps
              explicitly invoked with the --exclusive option.  For example,  a
              job  might  be  allocated  one  node with four CPUs and a remote
              shell invoked on the  allocated  node.  If  that  shell  is  not
              invoked  with  the  --exclusive option, then it may create a job
              step with four tasks using the --exclusive option and  not  con‐
              flict  with  the  remote  shell's  resource allocation.  Use the
              --exclusive option to invoke every job step to  ensure  distinct
              resources for each step.

              Note  that all CPUs allocated to a job are available to each job
              step unless the --exclusive option is used plus task affinity is
              configured.  Since resource management is provided by processor,
              the --ntasks option must be specified, but the following options
              should  NOT  be  specified --relative, --distribution=arbitrary.
              See EXAMPLE below.

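              For example, a sketch that runs two concurrent job steps on dis‐
              tinct CPUs inside an existing four-CPU allocation (the  program
              names step_a and step_b are placeholders):

                   # from within an salloc or sbatch allocation
                   srun --exclusive -n2 step_a &
                   srun --exclusive -n2 step_b &
                   wait
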
       --export=<environment variables [ALL] | NONE>
              Identify which  environment  variables  are  propagated  to  the
              launched application.  By default, all are propagated.  Multiple
              environment variable names should be comma separated.   Environ‐
              ment  variable  names  may be specified to propagate the current
              value  (e.g.  "--export=EDITOR")  or  specific  values  may   be
              exported (e.g. "--export=EDITOR=/bin/emacs"). In these two exam‐
              ples, the propagated environment will only contain the  variable
              EDITOR.   If  one  desires  to add to the environment instead of
              replacing   it,   have   the   argument   include   ALL    (e.g.
              "--export=ALL,EDITOR=/bin/emacs").   This  will propagate EDITOR
              along with the current environment.  Unlike sbatch,  if  ALL  is
              specified,  any  additional  specified environment variables are
              ignored.  If one desires no environment variables be propagated,
              use  the  argument NONE.  Regardless of this setting, the appro‐
              priate SLURM_* task environment variables are always exported to
              the  environment.   srun  may deviate from the above behavior if
              the default launch plugin, launch/slurm, is not used.


       -F, --nodefile=<node file>
              Much like --nodelist, but the list is contained  in  a  file  of
              name node file.  The node names of the list may also span multi‐
              ple lines in the file.  Duplicate node names in the  file  will
              be  ignored.   The  order  of  the node names in the list is not
              important; the node names will be sorted by Slurm.


       --gid=<group>
              If srun is run as root, and the --gid option is used, submit the
              job  with  group's  group  access permissions.  group may be the
              group name or the numerical group ID. This option applies to job
              allocations.


       -G, --gpus=[<type>:]<number>
              Specify  the  total  number  of  GPUs  required for the job.  An
              optional GPU type specification can be  supplied.   For  example
              "--gpus=volta:3".   Multiple options can be requested in a comma
              separated list,  for  example:  "--gpus=volta:3,kepler:1".   See
              also  the --gpus-per-node, --gpus-per-socket and --gpus-per-task
              options.


       --gpu-bind=<type>
              Bind tasks to specific GPUs.  By default every spawned task  can
              access every GPU allocated to the job.

              Supported type options:

              closest   Bind  each task to the GPU(s) which are closest.  In a
                        NUMA environment, each task may be bound to more  than
                        one GPU (i.e.  all GPUs in that NUMA environment).

              map_gpu:<list>
                        Bind by setting GPU masks on tasks (or ranks) as spec‐
                        ified            where            <list>            is
                        <gpu_id_for_task_0>,<gpu_id_for_task_1>,...   GPU  IDs
                        are interpreted as decimal values unless they are pre‐
                        ceded  with  '0x'  in  which case they are interpreted
                        as hexadecimal values. If the number of tasks (or
                        ranks) exceeds  the number of elements in this  list,
                        elements in the list will be reused as needed starting
                        from the beginning  of  the list. To simplify  support
                        for large task counts, the lists may follow a map with
                        an asterisk and repetition  count.   For      example
                        "map_gpu:0*4,1*4".  Not supported  unless  the  entire
                        node is allocated to the job.

              mask_gpu:<list>
                        Bind by setting GPU masks on tasks (or ranks) as spec‐
                        ified            where            <list>            is
                        <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,...    The
                        mapping is specified for a node and identical  mapping
                        is applied to the tasks on every node (i.e. the lowest
                        task ID on each node is mapped to the first mask spec‐
                        ified  in the list, etc.). GPU masks are always inter‐
                        preted as hexadecimal values but can be preceded  with
                        an optional '0x'. Not supported unless the entire node
                        is allocated to the job. To simplify support for large
                        task counts, the lists may follow a map with an aster‐
                        isk    and    repetition    count.     For     example
                        "mask_gpu:0x0f*4,0xf0*4".   Not  supported  unless the
                        entire node is allocated to the job.

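              For example, a sketch that gives each of four tasks access to
              its closest GPU on a node with four allocated GPUs (the  exe‐
              cutable name a.out is a placeholder):

                   srun -N1 -n4 --gpus=4 --gpu-bind=closest a.out
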
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
              Request that GPUs allocated to the job are configured with  spe‐
              cific  frequency  values.   This  option can be used to indepen‐
              dently configure the GPU and its memory frequencies.  After  the
              job  is  completed, the frequencies of all affected GPUs will be
              reset to the highest possible values.   In  some  cases,  system
              power  caps  may  override the requested values.  The field type
              can be "memory".  If type is not specified, the GPU frequency is
              implied.  The value field can either be "low", "medium", "high",
              "highm1" or a numeric value in megahertz (MHz).  If  the  speci‐
              fied numeric value is not possible, a value as close as possible
              will be used. See below for definition of the values.  The  ver‐
              bose  option  causes  current  GPU  frequency  information to be
              logged.  Examples of use include "--gpu-freq=medium,memory=high"
              and "--gpu-freq=450".

              Supported value definitions:

              low       the lowest available frequency.

              medium    attempts  to  set  a  frequency  in  the middle of the
                        available range.

              high      the highest available frequency.

              highm1    (high minus one) will select the next  highest  avail‐
                        able frequency.


       --gpus-per-node=[<type>:]<number>
              Specify  the  number  of  GPUs required for the job on each node
              included in the job's resource allocation.  An optional GPU type
              specification      can     be     supplied.      For     example
              "--gpus-per-node=volta:3".  Multiple options can be requested in
              a       comma       separated       list,      for      example:
              "--gpus-per-node=volta:3,kepler:1".   See   also   the   --gpus,
              --gpus-per-socket and --gpus-per-task options.


       --gpus-per-socket=[<type>:]<number>
              Specify  the  number of GPUs required for the job on each socket
              included in the job's resource allocation.  An optional GPU type
              specification      can     be     supplied.      For     example
              "--gpus-per-socket=volta:3".  Multiple options can be  requested
              in      a     comma     separated     list,     for     example:
              "--gpus-per-socket=volta:3,kepler:1".  Requires job to specify a
              sockets  per  node  count  (--sockets-per-node).   See also the
              --gpus,  --gpus-per-node  and  --gpus-per-task  options.    This
              option applies to job allocations.


       --gpus-per-task=[<type>:]<number>
              Specify  the number of GPUs required for the job on each task to
              be spawned in the job's resource allocation.   An  optional  GPU
              type  specification  can  be supplied.  This option requires the
              specification    of    a    task     count.      For     example
              "--gpus-per-task=volta:1".  Multiple options can be requested in
              a      comma      separated       list,       for       example:
              "--gpus-per-task=volta:3,kepler:1".  Requires  job  to specify a
              task count (--ntasks).  See also the --gpus,  --gpus-per-socket
              and --gpus-per-node options.

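              For example, a sketch that launches four tasks with one GPU per
              task (the GPU type volta and executable name a.out are illus‐
              trative):

                   srun -n4 --gpus-per-task=volta:1 a.out
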
       --gres=<list>
              Specifies   a   comma   delimited  list  of  generic  consumable
              resources.   The  format  of  each  entry   on   the   list   is
              "name[[:type]:count]".   The  name  is  that  of  the consumable
              resource.  The count is the number of  those  resources  with  a
              default  value  of 1.  The count can have a suffix of "k" or "K"
              (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
              "G"  (multiple  of  1024 x 1024 x 1024), "t" or "T" (multiple of
              1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x  1024
              x  1024  x  1024 x 1024).  The specified resources will be allo‐
              cated to the job on each node.  The available generic consumable
              resources  are configurable by the system administrator.  A list
              of available generic consumable resources will  be  printed  and
              the  command  will exit if the option argument is "help".  Exam‐
              ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
              and  "--gres=help".   NOTE:  This option applies to job and step
              allocations. By default, a job step  is  allocated  all  of  the
              generic resources that have been allocated to the job.  To change
              the behavior so that each  job  step  is  allocated  no  generic
              resources,  explicitly  set  the value of --gres to specify zero
              counts for each generic resource OR set "--gres=none" OR set the
              SLURM_STEP_GRES environment variable to "none".

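              For example, inside an allocation created with "--gres=gpu:2", a
              sketch of one step that uses both GPUs and another that requests
              no generic resources (the program names are placeholders):

                   srun --gres=gpu:2 -n2 gpu_work &
                   srun --gres=none -n1 housekeeping &
                   wait
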
       --gres-flags=<type>
              Specify  generic  resource  task  binding  options.  This option
              applies to job allocations.

              disable-binding
                     Disable  filtering  of  CPUs  with  respect  to   generic
                     resource  locality.  This option is currently required to
                     use more CPUs than are bound to a GRES (i.e. if a GPU  is
                     bound  to  the  CPUs on one socket, but resources on more
                     than one socket are  required  to  run  the  job).   This
                     option  may permit a job to be allocated resources sooner
                     than otherwise possible, but may result in lower job per‐
                     formance.

              enforce-binding
                     The only CPUs available to the job will be those bound to
                     the selected  GRES  (i.e.  the  CPUs  identified  in  the
                     gres.conf  file  will  be strictly enforced). This option
                     may result in delayed initiation of a job.  For example a
                     job  requiring two GPUs and one CPU will be delayed until
                     both GPUs on a single socket are  available  rather  than
                     using  GPUs bound to separate sockets, however the appli‐
                     cation performance may be improved due to improved commu‐
                     nication  speed.  Requires the node to be configured with
                     more than one socket and resource filtering will be  per‐
                     formed on a per-socket basis.


       -H, --hold
              Specify  the job is to be submitted in a held state (priority of
              zero).  A held job can now be released using scontrol  to  reset
              its  priority  (e.g.  "scontrol  release <job_id>"). This option
              applies to job allocations.


       -h, --help
              Display help information and exit.


       --hint=<type>
              Bind tasks according to application hints.

              compute_bound
                     Select settings for compute bound applications:  use  all
                     cores in each socket, one thread per core.

              memory_bound
                     Select  settings  for memory bound applications: use only
                     one core in each socket, one thread per core.

              [no]multithread
                     [don't] use extra threads  with  in-core  multi-threading
                     which  can  benefit communication intensive applications.
                     Only supported with the task/affinity plugin.

              help   show this help message

              This option applies to job allocations.

1024       -I, --immediate[=<seconds>]
1025              Exit if resources are not available within the time period spec‐
1026              ified.   If  no  argument  is given, resources must be available
1027              immediately for the request to succeed.  By default, --immediate
1028              is off, and the command will block until resources become avail‐
1029              able. Since this option's argument is optional, for proper pars‐
1030              ing  the  single letter option must be followed immediately with
1031              the value and not include a  space  between  them.  For  example
1032              "-I60"  and  not  "-I  60".  This option applies to job and step
1033              allocations.
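
              For example, the following sketch gives up after 60 seconds if
              resources cannot be obtained ("my_app" is a placeholder
              executable):

                   srun -I60 -n4 my_app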
1034
1035
1036       -i, --input=<mode>
1037              Specify how stdin is to be redirected. By default, srun redirects
1038              stdin from the terminal to all tasks. See IO Redirection below for
1039              more options.  For OS X, the poll() function  does  not  support
1040              stdin,  so  input  from  a terminal is not possible. This option
1041              applies to job and step allocations.
1042
1043
1044       -J, --job-name=<jobname>
1045              Specify a name for the job. The specified name will appear along
1046              with the job id number when querying running jobs on the system.
1047              The default is the supplied  executable  program's  name.  NOTE:
1048              This  information  may be written to the slurm_jobacct.log file.
1049              This file is space delimited so if a space is used in  the  job‐
1050              name it will cause problems in properly displaying the con‐
1051              tents of the slurm_jobacct.log file when the  sacct  command  is
1052              used. This option applies to job and step allocations.
1053
1054
1055       --jobid=<jobid>
1056              Initiate a job step under an already allocated job with the
1057              specified job id.  Using this option will cause srun to behave exactly as if
1058              the  SLURM_JOB_ID  environment  variable  was  set.  This option
1059              applies to step allocations.
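
              For example, assuming a job with id 1234 has already been
              allocated (the id is a placeholder), a step could be started
              under it with:

                   srun --jobid=1234 -N1 hostname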
1060
1061
1062       -K, --kill-on-bad-exit[=0|1]
1063              Controls whether or not to terminate a step if  any  task  exits
1064              with  a non-zero exit code. If this option is not specified, the
1065              default action will be based upon the Slurm configuration param‐
1066              eter of KillOnBadExit. If this option is specified, it will take
1067              precedence over KillOnBadExit. An option argument of  zero  will
1068              not  terminate  the job. A non-zero argument or no argument will
1069              terminate the job.  Note: This option takes precedence over  the
1070              -W,  --wait  option  to  terminate the job immediately if a task
1071              exits with a non-zero exit code.  Since this  option's  argument
1072              is optional, for proper parsing the single letter option must be
1073              followed immediately with the value  and  not  include  a  space
1074              between them. For example "-K1" and not "-K 1".
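
              For example, the following sketch terminates the whole step as
              soon as any task exits with a non-zero code ("my_app" is a
              placeholder executable):

                   srun -K1 -n8 my_app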
1075
1076
1077       -k, --no-kill [=off]
1078              Do  not automatically terminate a job if one of the nodes it has
1079              been allocated fails. This option applies to job and step  allo‐
1080              cations.    The   job   will  assume  all  responsibilities  for
1081              fault-tolerance.  Tasks launched using this option will not be
1082              considered  terminated  (e.g.  -K,  --kill-on-bad-exit  and  -W,
1083              --wait options will have no effect  upon  the  job  step).   The
1084              active  job step (MPI job) will likely suffer a fatal error, but
1085              subsequent job steps may be run if this option is specified.
1086
1087              Specify an optional argument of "off" to disable the effect of the
1088              SLURM_NO_KILL environment variable.
1089
1090              The default action is to terminate the job upon node failure.
1091
1092
1093       -l, --label
1094              Prepend  task number to lines of stdout/err.  The --label option
1095              will prepend lines of output  with  the  remote  task  id.  This
1096              option applies to step allocations.
1097
1098
1099       -L, --licenses=<license>
1100              Specification  of  licenses (or other resources available on all
1101              nodes of the cluster) which  must  be  allocated  to  this  job.
1102              License  names can be followed by a colon and count (the default
1103              count is one).  Multiple license names should be comma separated
1104              (e.g.  "--licenses=foo:4,bar"). This option applies to job allo‐
1105              cations.
1106
1107
1108       -M, --clusters=<string>
1109              Clusters to issue commands to.  Multiple cluster  names  may  be
1110              comma  separated.   The job will be submitted to the one cluster
1111              providing the earliest expected job initiation time. The default
1112              value is the current cluster. A value of 'all' will query to run
1113              on all clusters.  Note the --export option to  control  environ‐
1114              ment  variables  exported between clusters.  This option applies
1115              only to job allocations.  Note that the SlurmDBD must be up  for
1116              this option to work properly.
1117
1118
1119       -m, --distribution=
1120              *|block|cyclic|arbitrary|plane=<options>
1121              [:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]]
1122              [,Pack|NoPack]
1123
1124              Specify  alternate  distribution  methods  for remote processes.
1125              This option controls the distribution of tasks to the  nodes  on
1126              which  resources  have  been  allocated, and the distribution of
1127              those resources to tasks for binding (task affinity). The  first
1128              distribution  method (before the first ":") controls the distri‐
1129              bution of tasks to nodes.  The second distribution method (after
1130              the  first  ":")  controls  the  distribution  of allocated CPUs
1131              across sockets for binding  to  tasks.  The  third  distribution
1132              method (after the second ":") controls the distribution of allo‐
1133              cated CPUs across cores for binding to tasks.   The  second  and
1134              third distributions apply only if task affinity is enabled.  The
1135              third distribution is supported only if the  task/cgroup  plugin
1136              is  configured.  The default value for each distribution type is
1137              specified by *.
1138
1139              Note that with select/cons_res, the number of CPUs allocated  on
1140              each    socket   and   node   may   be   different.   Refer   to
1141              https://slurm.schedmd.com/mc_support.html for  more  information
1142              on  resource  allocation,  distribution  of  tasks to nodes, and
1143              binding of tasks to CPUs.
1144              First distribution method (distribution of tasks across nodes):
1145
1146
1147              *      Use the default method for distributing  tasks  to  nodes
1148                     (block).
1149
1150              block  The  block distribution method will distribute tasks to a
1151                     node such that consecutive tasks share a node. For  exam‐
1152                     ple,  consider an allocation of three nodes each with two
1153                     cpus. A four-task block distribution  request  will  dis‐
1154                     tribute  those  tasks to the nodes with tasks one and two
1155                     on the first node, task three on  the  second  node,  and
1156                     task  four  on the third node.  Block distribution is the
1157                     default behavior if the number of tasks exceeds the  num‐
1158                     ber of allocated nodes.
1159
1160              cyclic The cyclic distribution method will distribute tasks to a
1161                     node such that consecutive  tasks  are  distributed  over
1162                     consecutive  nodes  (in a round-robin fashion). For exam‐
1163                     ple, consider an allocation of three nodes each with  two
1164                     cpus.  A  four-task cyclic distribution request will dis‐
1165                     tribute those tasks to the nodes with tasks one and  four
1166                     on  the first node, task two on the second node, and task
1167                     three on the third node.  Note that  when  SelectType  is
1168                     select/cons_res, the same number of CPUs may not be allo‐
1169                     cated on each node. Task distribution will be round-robin
1170                     among  all  the  nodes  with  CPUs  yet to be assigned to
1171                     tasks.  Cyclic distribution is the  default  behavior  if
1172                     the number of tasks is no larger than the number of allo‐
1173                     cated nodes.
1174
1175              plane  The tasks are distributed in blocks of a specified  size.
1176                     The options include a number representing the size of the
1177                     task block.  This is followed by an  optional  specifica‐
1178                     tion  of  the  task distribution scheme within a block of
1179                     tasks and between the blocks of  tasks.   The  number  of
1180                     tasks  distributed to each node is the same as for cyclic
1181                     distribution, but  the  taskids  assigned  to  each  node
1182                     depend  on  the  plane  size. For more details (including
1183                     examples and diagrams), please see
1184                     https://slurm.schedmd.com/mc_support.html
1185                     and
1186                     https://slurm.schedmd.com/dist_plane.html
1187
1188              arbitrary
1189                     The arbitrary method of distribution will  allocate  pro‐
1190                     cesses in-order as listed in file designated by the envi‐
1191                     ronment variable SLURM_HOSTFILE.   If  this  variable  is
1192                     listed it will override any other method specified.  If
1193                     not set the method will default to block.  The hostfile
1194                     must contain at minimum the number of hosts requested,
1195                     one per line or comma separated.  If
1196                     specifying  a  task  count  (-n, --ntasks=<number>), your
1197                     tasks will be laid out on the nodes in the order  of  the
1198                     file.
1199                     NOTE:  The arbitrary distribution option on a job alloca‐
1200                     tion only controls the nodes to be allocated to  the  job
1201                     and  not  the  allocation  of  CPUs  on those nodes. This
1202                     option is meant primarily to control a  job  step's  task
1203                     layout  in  an  existing job allocation for the srun com‐
1204                     mand.
1205                     NOTE: If number of tasks is given and a list of requested
1206                     nodes  is  also  given the number of nodes used from that
1207                     list will be reduced to match that of the number of tasks
1208                     if  the  number  of nodes in the list is greater than the
1209                     number of tasks.
1210
1211
1212              Second distribution method (distribution of CPUs across  sockets
1213              for binding):
1214
1215
1216              *      Use the default method for distributing CPUs across sock‐
1217                     ets (cyclic).
1218
1219              block  The block distribution method will  distribute  allocated
1220                     CPUs  consecutively  from  the same socket for binding to
1221                     tasks, before using the next consecutive socket.
1222
1223              cyclic The cyclic distribution method will distribute  allocated
1224                     CPUs  for  binding to a given task consecutively from the
1225                     same socket, and from the next consecutive socket for the
1226                     next task, in a round-robin fashion across sockets.
1227
1228              fcyclic
1229                     The fcyclic distribution method will distribute allocated
1230                     CPUs for binding to tasks from consecutive sockets  in  a
1231                     round-robin fashion across the sockets.
1232
1233
1234              Third distribution method (distribution of CPUs across cores for
1235              binding):
1236
1237
1238              *      Use the default method for distributing CPUs across cores
1239                     (inherited from second distribution method).
1240
1241              block  The  block  distribution method will distribute allocated
1242                     CPUs consecutively from the  same  core  for  binding  to
1243                     tasks, before using the next consecutive core.
1244
1245              cyclic The  cyclic distribution method will distribute allocated
1246                     CPUs for binding to a given task consecutively  from  the
1247                     same  core,  and  from  the next consecutive core for the
1248                     next task, in a round-robin fashion across cores.
1249
1250              fcyclic
1251                     The fcyclic distribution method will distribute allocated
1252                     CPUs  for  binding  to  tasks from consecutive cores in a
1253                     round-robin fashion across the cores.
1254
1255
1256
1257              Optional control for task distribution over nodes:
1258
1259
1260              Pack   Rather than distributing a job step's tasks evenly across
1261                     its allocated nodes, pack them as tightly as pos‐
1262                     sible on the nodes.
1263
1264              NoPack Rather than packing a job step's tasks as tightly as pos‐
1265                     sible  on  the  nodes, distribute them evenly.  This user
1266                     option   will    supersede    the    SelectTypeParameters
1267                     CR_Pack_Nodes configuration parameter.
1268
1269              This option applies to job and step allocations.
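
              For example, the following sketch distributes tasks to nodes
              cyclically and binds CPUs to tasks block-wise across sockets
              ("my_app" is a placeholder executable):

                   srun -N2 -n8 --distribution=cyclic:block my_app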
1270
1271
1272       --mail-type=<type>
1273              Notify user by email when certain event types occur.  Valid type
1274              values are NONE, BEGIN, END, FAIL, REQUEUE, ALL  (equivalent  to
1275              BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buf‐
1276              fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1277              (reached  90  percent  of time limit), TIME_LIMIT_80 (reached 80
1278              percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1279              time  limit).   Multiple type values may be specified in a comma
1280              separated list.  The user  to  be  notified  is  indicated  with
1281              --mail-user. This option applies to job allocations.
1282
1283
1284       --mail-user=<user>
1285              User  to  receive email notification of state changes as defined
1286              by --mail-type.  The default value is the submitting user.  This
1287              option applies to job allocations.
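
              For example, the following sketch requests mail when the job
              ends or fails (the address and "my_app" are placeholders):

                   srun --mail-type=END,FAIL --mail-user=user@example.com my_app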
1288
1289
1290       --mcs-label=<mcs>
1291              Used  only when the mcs/group plugin is enabled.  This parameter
1292              is a group among the groups of the user.  Default value is cal‐
1293              culated by the mcs plugin if it is enabled. This option applies
1294              to job allocations.
1295
1296
1297       --mem=<size[units]>
1298              Specify the real memory required per node.   Default  units  are
1299              megabytes unless the SchedulerParameters configuration parameter
1300              includes the "default_gbytes" option for  gigabytes.   Different
1301              units  can  be  specified  using  the suffix [K|M|G|T].  Default
1302              value is DefMemPerNode and the maximum value  is  MaxMemPerNode.
1303              If configured, both parameters can be seen using the scontrol
1304              show config command.  This parameter would generally be used  if
1305              whole  nodes  are  allocated to jobs (SelectType=select/linear).
1306              Specifying a memory limit of zero for a job step  will  restrict
1307              the  job  step to the amount of memory allocated to the job, but
1308              not remove any of the job's memory allocation from being  avail‐
1309              able   to   other   job   steps.   Also  see  --mem-per-cpu  and
1310              --mem-per-gpu.   The  --mem,  --mem-per-cpu  and   --mem-per-gpu
1311              options  are  mutually  exclusive.  If  --mem,  --mem-per-cpu or
1312              --mem-per-gpu are specified as command line arguments, then they
1313              will take precedence over the environment (potentially inherited
1314              from salloc or sbatch).
1315
1316              NOTE: A memory size specification of zero is treated as  a  spe‐
1317              cial case and grants the job access to all of the memory on each
1318              node for newly submitted jobs and all available job memory to
1319              new job steps.
1320
1321              Specifying new memory limits for job steps is only advisory.
1322
1323              If  the job is allocated multiple nodes in a heterogeneous clus‐
1324              ter, the memory limit on each node will be that of the  node  in
1325              the  allocation  with  the smallest memory size (same limit will
1326              apply to every node in the job's allocation).
1327
1328              NOTE: Enforcement of memory limits  currently  relies  upon  the
1329              task/cgroup plugin or enabling of accounting, which samples mem‐
1330              ory use on a periodic basis (data need not be stored, just  col‐
1331              lected).  In both cases memory use is based upon the job's Resi‐
1332              dent Set Size (RSS). A task may exceed the  memory  limit  until
1333              the next periodic accounting sample.
1334
1335              This option applies to job and step allocations.
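
              For example, the following sketch requests 16 gigabytes of real
              memory on each allocated node (the size and "my_app" are
              placeholders):

                   srun -N2 --mem=16G my_app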
1336
1337
1338       --mem-per-cpu=<size[units]>
1339              Minimum  memory  required  per allocated CPU.  Default units are
1340              megabytes unless the SchedulerParameters configuration parameter
1341              includes  the  "default_gbytes" option for gigabytes.  Different
1342              units can be specified  using  the  suffix  [K|M|G|T].   Default
1343              value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
1344              exception below). If configured, both parameters can be seen
1345              using  the scontrol show config command.  Note that if the job's
1346              --mem-per-cpu value exceeds the  configured  MaxMemPerCPU,  then
1347              the  user's  limit  will  be treated as a memory limit per task;
1348              --mem-per-cpu will be reduced to a value no larger than  MaxMem‐
1349              PerCPU;   --cpus-per-task   will   be   set  and  the  value  of
1350              --cpus-per-task multiplied by the new --mem-per-cpu  value  will
1351              equal  the  original  --mem-per-cpu value specified by the user.
1352              This parameter would generally be used if individual  processors
1353              are   allocated   to   jobs   (SelectType=select/cons_res).   If
1354              resources are allocated by the core, socket or whole nodes;  the
1355              number  of  CPUs  allocated to a job may be higher than the task
1356              count and the value of --mem-per-cpu should be adjusted  accord‐
1357              ingly.   Specifying  a  memory limit of zero for a job step will
1358              restrict the job step to the amount of memory allocated  to  the
1359              job,  but  not  remove  any  of the job's memory allocation from
1360              being  available  to  other  job  steps.   Also  see  --mem  and
1361              --mem-per-gpu.    The  --mem,  --mem-per-cpu  and  --mem-per-gpu
1362              options are mutually exclusive.
1363
1364              NOTE: If the final amount of memory requested by the job (e.g.
1365              when --mem-per-cpu is used with the --exclusive option) cannot
1366              be satisfied by any of the nodes configured in the partition,
1367              the job will be rejected.
1368
1369
1370       --mem-per-gpu=<size[units]>
1371              Minimum  memory  required  per allocated GPU.  Default units are
1372              megabytes unless the SchedulerParameters configuration parameter
1373              includes  the  "default_gbytes" option for gigabytes.  Different
1374              units can be specified  using  the  suffix  [K|M|G|T].   Default
1375              value  is DefMemPerGPU and is available on both a global and per
1376              partition basis.  If configured,  the  parameters  can  be  seen
1377              using  the scontrol show config and scontrol show partition com‐
1378              mands.   Also  see  --mem.    The   --mem,   --mem-per-cpu   and
1379              --mem-per-gpu options are mutually exclusive.
1380
1381
1382       --mem-bind=[{quiet,verbose},]type
1383              Bind tasks to memory. Used only when the task/affinity plugin is
1384              enabled and the NUMA memory functions are available.  Note  that
1385              the  resolution  of  CPU  and  memory binding may differ on some
1386              architectures. For example, CPU binding may be performed at  the
1387              level  of the cores within a processor while memory binding will
1388              be performed at the level of  nodes,  where  the  definition  of
1389              "nodes"  may differ from system to system.  By default no memory
1390              binding is performed; any task using any CPU can use any memory.
1391              This  option is typically used to ensure that each task is bound
1392              to the memory closest to its assigned CPU. The use of any type
1393              other  than  "none"  or "local" is not recommended.  If you want
1394              greater control, try running a simple test code with the options
1395              "--cpu-bind=verbose,none  --mem-bind=verbose,none"  to determine
1396              the specific configuration.
1397
1398              NOTE: To have Slurm always report on the selected memory binding
1399              for  all  commands  executed  in a shell, you can enable verbose
1400              mode by setting the SLURM_MEM_BIND environment variable value to
1401              "verbose".
1402
1403              The  following  informational environment variables are set when
1404              --mem-bind is in use:
1405
1406                   SLURM_MEM_BIND_LIST
1407                   SLURM_MEM_BIND_PREFER
1408                   SLURM_MEM_BIND_SORT
1409                   SLURM_MEM_BIND_TYPE
1410                   SLURM_MEM_BIND_VERBOSE
1411
1412              See the  ENVIRONMENT  VARIABLES  section  for  a  more  detailed
1413              description of the individual SLURM_MEM_BIND* variables.
1414
1415              Supported options include:
1416
1417              help   show this help message
1418
1419              local  Use memory local to the processor in use
1420
1421              map_mem:<list>
1422                     Bind by setting memory masks on tasks (or ranks) as spec‐
1423                     ified             where             <list>             is
1424                     <numa_id_for_task_0>,<numa_id_for_task_1>,...   The  map‐
1425                     ping is specified for a node  and  identical  mapping  is
1426                     applied  to the tasks on every node (i.e. the lowest task
1427                     ID on each node is mapped to the first  ID  specified  in
1428                     the  list,  etc.).   NUMA  IDs are interpreted as decimal
1429                     values unless they are preceded with '0x' in  which  case
1430                     they are interpreted as hexadecimal values.  If the number of
1431                     tasks (or ranks) exceeds the number of elements  in  this
1432                     list,  elements  in  the  list  will  be reused as needed
1433                     starting from the beginning of  the  list.   To  simplify
1434                     support for large task counts, the lists may follow a map
1435                     with an asterisk and repetition count.  For example
1436                     "map_mem:0x0f*4,0xf0*4".  Not supported unless the entire
1437                     node is allocated to the job.
1438
1439              mask_mem:<list>
1440                     Bind by setting memory masks on tasks (or ranks) as spec‐
1441                     ified             where             <list>             is
1442                     <numa_mask_for_task_0>,<numa_mask_for_task_1>,...     The
1443                     mapping  is specified for a node and identical mapping is
1444                     applied to the tasks on every node (i.e. the lowest  task
1445                     ID  on each node is mapped to the first mask specified in
1446                     the list, etc.).  NUMA masks are  always  interpreted  as
1447                     hexadecimal  values.   Note  that  masks must be preceded
1448                     with a '0x' if they don't begin with [0-9]  so  they  are
1449                     seen  as  numerical  values.   If the number of tasks (or
1450                     ranks) exceeds the number of elements in this list,  ele‐
1451                     ments  in the list will be reused as needed starting from
1452                     the beginning of the list.  To simplify support for large
1453                     task counts, the lists may follow a mask with an asterisk
1454                     and repetition count.  For example "mask_mem:0*4,1*4".  Not
1455                     supported unless the entire node is allocated to the job.
1456
1457              no[ne] don't bind tasks to memory (default)
1458
1459              nosort avoid sorting free cache pages (default, LaunchParameters
1460                     configuration parameter can override this default)
1461
1462              p[refer]
1463                     Prefer use of first specified NUMA node, but permit
1464                     use of other available NUMA nodes.
1465
1466              q[uiet]
1467                     quietly bind before task runs (default)
1468
1469              rank   bind by task rank (not recommended)
1470
1471              sort   sort free cache pages (run zonesort on Intel KNL nodes)
1472
1473              v[erbose]
1474                     verbosely report binding before task runs
1475
1476              This option applies to job and step allocations.
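
              For example, the following sketch binds each task to its local
              NUMA memory and reports the binding ("my_app" is a placeholder
              executable):

                   srun --mem-bind=verbose,local my_app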
1477
1478
1479       --mincpus=<n>
1480              Specify a minimum number of logical  cpus/processors  per  node.
1481              This option applies to job allocations.
1482
1483
1484       --msg-timeout=<seconds>
1485              Modify  the  job  launch  message timeout.  The default value is
1486              MessageTimeout  in  the  Slurm  configuration  file  slurm.conf.
1487              Changes to this are typically not recommended, but could be use‐
1488              ful to diagnose problems.  This option applies  to  job  alloca‐
1489              tions.
1490
1491
1492       --mpi=<mpi_type>
1493              Identify the type of MPI to be used. May result in unique initi‐
1494              ation procedures.
1495
1496              list   Lists available mpi types to choose from.
1497
1498              openmpi
1499                     For use with OpenMPI.
1500
1501              pmi2   To enable PMI2 support. The PMI2 support in  Slurm  works
1502                     only  if  the  MPI  implementation  supports it, in other
1503                     words if the MPI has the PMI2 interface implemented.  The
1504                     --mpi=pmi2  will  load  the library lib/slurm/mpi_pmi2.so
1505                     which provides the  server  side  functionality  but  the
1506                     client  side  must  implement  PMI2_Init()  and the other
1507                     interface calls.
1508
1509              pmix   To enable  PMIx  support  (http://pmix.github.io/master).
1510                     The  PMIx support in Slurm can be used to launch parallel
1511                     applications (e.g. MPI) if  it  supports  PMIx,  PMI2  or
1512                     PMI1. Slurm must be configured with pmix support by pass‐
1513                     ing "--with-pmix=<PMIx installation path>" option to  its
1514                     "./configure" script.
1515
1516                     At  the  time  of  writing  PMIx is supported in Open MPI
1517                     starting from version 2.0.  PMIx also  supports  backward
1518                     compatibility  with  PMI1 and PMI2 and can be used if MPI
1519                     was configured with PMI2/PMI1  support  pointing  to  the
1520                     PMIx  library ("libpmix").  If MPI supports PMI1/PMI2 but
1521                     doesn't provide a way to point to a specific implemen‐
1522                     tation, a hackish solution leveraging LD_PRELOAD can be
1523                     used to force "libpmix" usage.
1524
1525
1526              none   No special MPI processing. This is the default and  works
1527                     with many other versions of MPI.
1528
1529              This option applies to step allocations.
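
              For example, an MPI application built against a PMI2-capable
              library might be launched as in this sketch ("mpi_app" is a
              placeholder executable):

                   srun --mpi=pmi2 -n32 mpi_app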
1530
1531
1532       --multi-prog
1533              Run  a  job  with different programs and different arguments for
1534              each task. In this case, the  executable  program  specified  is
1535              actually  a  configuration  file  specifying  the executable and
1536              arguments for each  task.  See  MULTIPLE  PROGRAM  CONFIGURATION
1537              below  for  details  on  the  configuration  file contents. This
1538              option applies to step allocations.
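
              For example, the following sketch launches the programs listed
              in a configuration file ("multi.conf" is a placeholder; see
              MULTIPLE PROGRAM CONFIGURATION below for its format):

                   srun -n4 --multi-prog multi.conf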
1539
1540
1541       -N, --nodes=<minnodes[-maxnodes]>
1542              Request that a minimum of minnodes nodes be  allocated  to  this
1543              job.   A maximum node count may also be specified with maxnodes.
1544              If only one number is specified, this is used as both the  mini‐
1545              mum  and maximum node count.  The partition's node limits super‐
1546              sede those of the job.  If a job's node limits  are  outside  of
1547              the  range  permitted for its associated partition, the job will
1548              be left in a PENDING state.  This permits possible execution  at
1549              a  later  time,  when  the partition limit is changed.  If a job
1550              node limit exceeds the number of nodes configured in the  parti‐
1551              tion, the job will be rejected.  Note that the environment vari‐
1552              able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1553              ibility) will be set to the count of nodes actually allocated to
1554              the job. See the ENVIRONMENT VARIABLES section for more informa‐
1555              tion.   If -N is not specified, the default behavior is to allo‐
1556              cate enough nodes to satisfy the requirements of the -n  and  -c
1557              options.   The  job  will be allocated as many nodes as possible
1558              within the range specified and without delaying  the  initiation
1559              of  the  job.   If  number  of  tasks  is  given and a number of
1560              requested nodes is also given the number of nodes used from that
1561              request  will be reduced to match that of the number of tasks if
1562              the number of nodes in the request is greater than the number of
1563              tasks.  The node count specification may include a numeric value
1564              followed by a suffix of "k" (multiplies numeric value by  1,024)
1565              or  "m"  (multiplies  numeric  value  by 1,048,576). This option
1566              applies to job and step allocations.
1567
1568
1569       -n, --ntasks=<number>
1570              Specify the number of tasks to run. Request that  srun  allocate
1571              resources  for  ntasks tasks.  The default is one task per node,
1572              but note  that  the  --cpus-per-task  option  will  change  this
1573              default. This option applies to job and step allocations.
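
              For example, the following sketch runs eight tasks spread over
              two nodes ("my_app" is a placeholder executable):

                   srun -N2 -n8 my_app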
1574
1575
1576       --network=<type>
1577              Specify  information  pertaining  to the switch or network.  The
1578              interpretation of type is system dependent.  This option is sup‐
1579              ported  when  running  Slurm  on a Cray natively.  It is used to
1580              request using Network Performance Counters.  Only one value  per
1581              request is valid.  All options are case-insensitive.  In this
1582              configuration supported values include:
1583
1584              system
1585                    Use the system-wide  network  performance  counters.  Only
1586                    nodes  requested will be marked in use for the job alloca‐
1587                    tion.  If the job does not fill up the entire system, the
1588                    rest of the nodes cannot be used by other jobs using NPC;
1589                    if idle, their state will appear as PerfCnts.
1590                    These  nodes  are still available for other jobs not using
1591                    NPC.
1592
1593              blade Use the blade network  performance  counters.  Only  nodes
1594                    requested  will  be  marked in use for the job allocation.
1595                    If the job does not fill up the entire blade(s) allocated
1596                    to the job, those blade(s) cannot be used by other jobs
1597                    using NPC; if idle, their state will appear as PerfC‐
1598                    nts.  These nodes are still available for other jobs not
1599                    using NPC.
1600
1601
1602              In all cases the job or step allocation request must specify
1603              the --exclusive option.  Otherwise the request will be
1604              denied.
1605
1606              Also  with  any  of these options steps are not allowed to share
1607              blades, so resources would remain idle inside an  allocation  if
1608              the  step  running  on a blade does not take up all the nodes on
1609              the blade.
1610
1611              The network option is also supported on systems with IBM's  Par‐
1612              allel  Environment (PE).  See IBM's LoadLeveler job command key‐
1613              word documentation about the keyword "network" for more informa‐
1614              tion.   Multiple  values  may  be specified in a comma separated
1615              list.  All options are case-insensitive.  Supported values
1616              include:
1617
1618              BULK_XFER[=<resources>]
1619                          Enable  bulk  transfer  of data using Remote Direct-
1620                          Memory Access (RDMA).  The optional resources speci‐
1621                          fication  is a numeric value which can have a suffix
1622                          of "k", "K", "m", "M", "g"  or  "G"  for  kilobytes,
1623                          megabytes  or gigabytes.  NOTE: The resources speci‐
1624                          fication is not supported by the underlying IBM  in‐
1625                          frastructure  as of Parallel Environment version 2.2
1626                          and no value should be specified at this time.   The
1627                          devices  allocated  to a job must all be of the same
1628                          type.  The default value depends upon what hardware
1629                          is available and, in order of preference, is IPONLY
1630                          (which is not considered in User Space mode), HFI,
1631                          IB, HPCE, and KMUX.
1632
1633              CAU=<count> Number   of   Collective  Acceleration  Units  (CAU)
1634                          required.  Applies only to IBM Power7-IH processors.
1635                          Default  value  is  zero.   Independent  CAU will be
1636                          allocated for each programming interface (MPI, LAPI,
1637                          etc.)
1638
1639              DEVNAME=<name>
1640                          Specify  the  device  name to use for communications
1641                          (e.g. "eth0" or "mlx4_0").
1642
1643              DEVTYPE=<type>
1644                          Specify the device type to use  for  communications.
1645                          The supported values of type are: "IB" (InfiniBand),
1646                          "HFI" (P7 Host Fabric Interface), "IPONLY"  (IP-Only
1647                          interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1648                          nel Emulation of HPCE).  The devices allocated to  a
1649                          job must all be of the same type.  The default value
1650                          depends upon what hardware is available and, in
1651                          order of preference, is IPONLY (which is not
1652                          considered in User Space mode), HFI, IB, HPCE, and
1653                          KMUX.
1654
1655              IMMED=<count>
1656                          Number  of immediate send slots per window required.
1657                          Applies only to IBM Power7-IH  processors.   Default
1658                          value is zero.
1659
1660              INSTANCES=<count>
1661                          Specify the number of network connections for each
1662                          task on each network.  The default instance
1663                          count is 1.
1664
1665              IPV4        Use  Internet Protocol (IP) version 4 communications
1666                          (default).
1667
1668              IPV6        Use Internet Protocol (IP) version 6 communications.
1669
1670              LAPI        Use the LAPI programming interface.
1671
1672              MPI         Use the  MPI  programming  interface.   MPI  is  the
1673                          default interface.
1674
1675              PAMI        Use the PAMI programming interface.
1676
1677              SHMEM       Use the OpenSHMEM programming interface.
1678
1679              SN_ALL      Use all available switch networks (default).
1680
1681              SN_SINGLE   Use one available switch network.
1682
1683              UPC         Use the UPC programming interface.
1684
1685              US          Use User Space communications.
1686
1687
1688              Some examples of network specifications:
1689
1690              Instances=2,US,MPI,SN_ALL
1691                          Create two user space connections for MPI communica‐
1692                          tions on every switch network for each task.
1693
1694              US,MPI,Instances=3,Devtype=IB
1695                          Create three user space connections for MPI communi‐
1696                          cations on every InfiniBand network for each task.
1697
1698              IPV4,LAPI,SN_Single
1699                          Create an IP version 4 connection for LAPI communica‐
1700                          tions on one switch network for each task.
1701
1702              Instances=2,US,LAPI,MPI
1703                          Create two user space connections each for LAPI  and
1704                          MPI  communications on every switch network for each
1705                          task. Note that SN_ALL  is  the  default  option  so
1706                          every   switch  network  is  used.  Also  note  that
1707                          Instances=2  specifies  that  two  connections   are
1708                          established  for  each  protocol  (LAPI and MPI) and
1709                          each task.  If there are two networks and four tasks
1710                          on  the  node  then  a  total  of 32 connections are
1711                          established (2 instances x 2 protocols x 2  networks
1712                          x 4 tasks).
1713
1714              This option applies to job and step allocations.
1715
1716
1717       --nice[=adjustment]
1718              Run  the  job with an adjusted scheduling priority within Slurm.
1719              With no adjustment value the scheduling priority is decreased by
1720              100.  A  negative  nice  value increases the priority, otherwise
1721              decreases it. The adjustment range is +/- 2147483645. Only priv‐
1722              ileged users can specify a negative adjustment.
1723
1724
1725       --ntasks-per-core=<ntasks>
1726              Request the maximum ntasks be invoked on each core.  This option
1727              applies to the job allocation,  but  not  to  step  allocations.
1728              Meant   to  be  used  with  the  --ntasks  option.   Related  to
1729              --ntasks-per-node except at the core level instead of  the  node
1730              level.   Masks will automatically be generated to bind the tasks
1731              to specific cores unless --cpu-bind=none is specified.  NOTE:
1732              This  option is not supported unless SelectType=cons_res is con‐
1733              figured (either directly or indirectly on  Cray  systems)  along
1734              with the node's core count.
1735
1736
1737       --ntasks-per-node=<ntasks>
1738              Request  that  ntasks be invoked on each node.  If used with the
1739              --ntasks option, the --ntasks option will  take  precedence  and
1740              the  --ntasks-per-node  will  be  treated  as a maximum count of
1741              tasks per node.  Meant to be used with the --nodes option.  This
1742              is related to --cpus-per-task=ncpus, but does not require knowl‐
1743              edge of the actual number of cpus on each node.  In some  cases,
1744              it  is more convenient to be able to request that no more than a
1745              specific number of tasks be invoked on each node.   Examples  of
1746              this  include  submitting a hybrid MPI/OpenMP app where only one
1747              MPI "task/rank" should be assigned to each node  while  allowing
1748              the  OpenMP portion to utilize all of the parallelism present in
1749              the node, or submitting a single setup/cleanup/monitoring job to
1750              each  node  of a pre-existing allocation as one step in a larger
1751              job script. This option applies to job allocations.
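
              For example, a hybrid MPI/OpenMP sketch placing one task on
              each of four nodes and giving it 16 CPUs (the counts and
              "hybrid_app" are placeholders) might look like:

                   srun -N4 --ntasks-per-node=1 --cpus-per-task=16 hybrid_app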
1752
1753
1754       --ntasks-per-socket=<ntasks>
1755              Request the maximum ntasks be  invoked  on  each  socket.   This
1756              option  applies  to  the job allocation, but not to step alloca‐
1757              tions.  Meant to be used with the --ntasks option.   Related  to
1758              --ntasks-per-node except at the socket level instead of the node
1759              level.  Masks will automatically be generated to bind the  tasks
1760              to  specific sockets unless --cpu-bind=none is specified.  NOTE:
1761              This option is not supported unless SelectType=cons_res is  con‐
1762              figured  (either  directly  or indirectly on Cray systems) along
1763              with the node's socket count.
1764
1765
1766       -O, --overcommit
1767              Overcommit resources. This option applies to job and step  allo‐
1768              cations.   When applied to job allocation, only one CPU is allo‐
1769              cated to the job per node and options used to specify the number
1770              of  tasks  per  node,  socket,  core,  etc.   are ignored.  When
1771              applied to job step allocations (the srun command when  executed
1772              within  an  existing job allocation), this option can be used to
1773              launch more than one task per  CPU.   Normally,  srun  will  not
1774              allocate  more  than one process per CPU.  By specifying --over‐
1775              commit you are explicitly allowing more  than  one  process  per
1776              CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1777              to execute per node.  NOTE: MAX_TASKS_PER_NODE is defined in the
1778              file  slurm.h  and  is  not a variable, it is set at Slurm build
1779              time.
1780
1781
1782       -o, --output=<filename pattern>
1783              Specify  the  "filename  pattern"  for  stdout  redirection.  By
1784              default in interactive mode, srun collects stdout from all tasks
1785              and sends this output via TCP/IP to the attached terminal.  With
1786              --output  stdout  may  be  redirected to a file, to one file per
1787              task, or to /dev/null. See section IO Redirection below for  the
1788              various  forms  of  filename  pattern.   If  the  specified file
1789              already exists, it will be overwritten.
1790
1791              If --error is not also specified on the command line, both  std‐
1792              out and stderr will be directed to the file specified by --output.
1793              This option applies to job and step allocations.
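
              For example, the following sketch writes one output file per
              task using the task-id ("%t") format specifier described under
              IO Redirection ("my_app" is a placeholder executable):

                   srun -n4 --output=out_%t.txt my_app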
1794
1795
1796       --open-mode=<append|truncate>
1797              Open the output and error files using append or truncate mode as
1798              specified.   For  heterogeneous  job  steps the default value is
1799              "append".  Otherwise the default value is specified by the  sys‐
1800              tem  configuration  parameter JobFileAppend. This option applies
1801              to job and step allocations.
1802
1803
1804       --pack-group=<expr>
1805              Identify each job in a heterogeneous job allocation for which  a
1806              step  is  to  be  created.  Applies only to srun commands issued
1807              inside a salloc allocation or sbatch script.  <expr> is a set of
1808              integers  corresponding  to  one  or more options indexes on the
1809              salloc or  sbatch  command  line.   Examples:  "--pack-group=2",
1810              "--pack-group=0,4",  "--pack-group=1,3-5".  The default value is
1811              --pack-group=0.
1812
1813
1814       -p, --partition=<partition_names>
1815              Request a specific partition for the  resource  allocation.   If
1816              not  specified,  the default behavior is to allow the slurm con‐
1817              troller to select the default partition  as  designated  by  the
1818              system  administrator.  If  the job can use more than one parti‐
1819              tion, specify their names in a comma separated list and the one
1820              offering  earliest  initiation will be used with no regard given
1821              to the partition name ordering (although higher priority  parti‐
1822              tions will be considered first).  When the job is initiated, the
1823              name of the partition used will  be  placed  first  in  the  job
1824              record partition string. This option applies to job allocations.
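
              For example, the following sketch submits to whichever of two
              partitions can start the job first (the partition names are
              placeholders):

                   srun -p debug,batch -n1 hostname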
1825
1826
1827       --power=<flags>
1828              Comma  separated  list of power management plugin options.  Cur‐
1829              rently available flags include: level (all  nodes  allocated  to
1830              the job should have identical power caps, may be disabled by the
1831              Slurm configuration option PowerParameters=job_no_level).   This
1832              option applies to job allocations.
1833
1834
1835       --priority=<value>
1836              Request  a  specific job priority.  May be subject to configura‐
1837              tion specific constraints.  value should  either  be  a  numeric
1838              value  or "TOP" (for highest possible value).  Only Slurm opera‐
1839              tors and administrators can set the priority  of  a  job.   This
1840              option applies to job allocations only.
1841
1842
1843       --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1844              Enables detailed data collection by the acct_gather_profile
1845              plugin.  Detailed data are typically time-series that are stored
1846              in an HDF5 file for the job or an InfluxDB database depending on
1847              the configured plugin.
1848
1849
1850              All       All data types are collected. (Cannot be combined with
1851                        other values.)
1852
1853
1854              None      No data types are collected. This is the default.
1855                         (Cannot be combined with other values.)
1856
1857
1858              Energy    Energy data is collected.
1859
1860
1861              Task      Task (I/O, Memory, ...) data is collected.
1862
1863
1864              Filesystem
1865                        Filesystem data is collected.
1866
1867
1868              Network   Network (InfiniBand) data is collected.
1869
1870
1871              This option applies to job and step allocations.
1872
1873
1874       --prolog=<executable>
1875              srun  will  run  executable  just before launching the job step.
1876              The command line arguments for executable will  be  the  command
1877              and arguments of the job step.  If executable is "none", then no
1878              srun prolog will be run. This parameter overrides the SrunProlog
1879              parameter  in  slurm.conf. This parameter is completely indepen‐
1880              dent from  the  Prolog  parameter  in  slurm.conf.  This  option
1881              applies to job allocations.
1882
1883
1884       --propagate[=rlimit[,rlimit...]]
1885              Allows  users to specify which of the modifiable (soft) resource
1886              limits to propagate to the compute  nodes  and  apply  to  their
1887              jobs.  If  no rlimit is specified, then all resource limits will
1888              be propagated.  The following  rlimit  names  are  supported  by
1889              Slurm  (although  some options may not be supported on some sys‐
1890              tems):
1891
1892              ALL       All limits listed below (default)
1893
1894              NONE      No limits listed below
1895
1896              AS        The maximum address space for a process
1897
1898              CORE      The maximum size of core file
1899
1900              CPU       The maximum amount of CPU time
1901
1902              DATA      The maximum size of a process's data segment
1903
1904              FSIZE     The maximum size of files created. Note  that  if  the
1905                        user  sets  FSIZE to less than the current size of the
1906                        slurmd.log, job launches will fail with a  'File  size
1907                        limit exceeded' error.
1908
1909              MEMLOCK   The maximum size that may be locked into memory
1910
1911              NOFILE    The maximum number of open files
1912
1913              NPROC     The maximum number of processes available
1914
1915              RSS       The maximum resident set size
1916
1917              STACK     The maximum stack size
1918
1919              This option applies to job allocations.
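
              For example, the following sketch propagates only the locked
              memory and open file limits ("my_app" is a placeholder
              executable):

                   srun --propagate=MEMLOCK,NOFILE my_app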
1920
1921
1922       --pty  Execute  task  zero  in  pseudo  terminal mode.  Implicitly sets
1923              --unbuffered.  Implicitly sets --error and --output to /dev/null
1924              for  all  tasks except task zero, which may cause those tasks to
1925              exit immediately (e.g. shells will typically exit immediately in
1926              that situation).  This option applies to step allocations.
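
              For example, an interactive shell on an allocated node could be
              obtained as in this sketch (assuming bash is available on the
              node):

                   srun -n1 --pty bash -i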
1927
1928
1929       -q, --qos=<qos>
1930              Request  a  quality  of  service for the job.  QOS values can be
1931              defined for each user/cluster/account association in  the  Slurm
1932              database.   Users will be limited to their association's defined
1933              set of qos's when the Slurm  configuration  parameter,  Account‐
1934              ingStorageEnforce, includes "qos" in its definition.  This
1935              option applies to job allocations.
1936
1937
1938       -Q, --quiet
1939              Suppress informational messages from srun. Errors will still  be
1940              displayed. This option applies to job and step allocations.
1941
1942
1943       --quit-on-interrupt
1944              Quit  immediately  on single SIGINT (Ctrl-C). Use of this option
1945              disables  the  status  feature  normally  available  when   srun
1946              receives  a single Ctrl-C and causes srun to instead immediately
1947              terminate the running job. This option applies to  step  alloca‐
1948              tions.
1949
1950
1951       -r, --relative=<n>
1952              Run  a  job  step  relative to node n of the current allocation.
1953              This option may be used to spread several job  steps  out  among
1954              the  nodes  of  the  current job. If -r is used, the current job
1955              step will begin at node n of the allocated nodelist,  where  the
1956              first node is considered node 0.  The -r option is not permitted
1957              with -w or -x option and will result in a fatal error  when  not
1958              running within a prior allocation (i.e. when SLURM_JOB_ID is not
1959              set). The default for n is 0. If the value  of  --nodes  exceeds
1960              the  number  of  nodes  identified with the --relative option, a
1961              warning message will be printed and the --relative  option  will
1962              take precedence. This option applies to step allocations.
1963
1964
1965       --reboot
1966              Force  the  allocated  nodes  to reboot before starting the job.
1967              This is only supported with some system configurations and  will
1968              otherwise  be silently ignored. This option applies to job allo‐
1969              cations.
1970
1971
1972       --resv-ports[=count]
1973              Reserve communication ports for this job. Users can specify the
1974              number of ports they want to reserve.  The parameter Mpi‐
1975              Params=ports=12000-12999 must be specified in slurm.conf. If not
1976              specified and Slurm's OpenMPI plugin is used, then by default
1977              the number of reserved ports is equal to the highest number of
1978              tasks on any node in the job step allocation.  If the number of
1979              reserved ports is zero then no ports are reserved.  Used for OpenMPI. This
1980              option applies to job and step allocations.
1981
1982
1983       --reservation=<name>
1984              Allocate  resources for the job from the named reservation. This
1985              option applies to job allocations.
1986
1987
1988       -s, --oversubscribe
1989              The job allocation can over-subscribe resources with other  run‐
1990              ning  jobs.   The  resources to be over-subscribed can be nodes,
1991              sockets, cores, and/or hyperthreads  depending  upon  configura‐
1992              tion.   The  default  over-subscribe  behavior depends on system
1993              configuration and the  partition's  OverSubscribe  option  takes
1994              precedence over the job's option.  This option may result in the
1995              allocation being granted  sooner  than  if  the  --oversubscribe
1996              option  was  not  set  and  allow higher system utilization, but
1997              application performance will likely suffer  due  to  competition
1998              for  resources.   Also  see  the --exclusive option. This option
1999              applies to step allocations.
2000
2001
2002       -S, --core-spec=<num>
2003              Count of specialized cores per node reserved by the job for sys‐
2004              tem  operations and not used by the application. The application
2005              will not use these cores, but will be charged for their  alloca‐
2006              tion.   Default  value  is  dependent upon the node's configured
2007              CoreSpecCount value.  If a value of zero is designated  and  the
2008              Slurm  configuration  option AllowSpecResourcesUsage is enabled,
2009              the job will be allowed to override CoreSpecCount  and  use  the
2010              specialized resources on nodes it is allocated.  This option can
2011              not be used with the --thread-spec option. This  option  applies
2012              to job allocations.
2013
2014
2015       --signal=<sig_num>[@<sig_time>]
2016              When  a  job is within sig_time seconds of its end time, send it
2017              the signal sig_num.  Due to the resolution of event handling  by
2018              Slurm,  the  signal  may  be  sent up to 60 seconds earlier than
2019              specified.  sig_num may either be a signal number or name  (e.g.
2020              "10"  or "USR1").  sig_time must have an integer value between 0
2021              and 65535.  By default, no signal is sent before the  job's  end
2022              time.   If  a  sig_num  is  specified  without any sig_time, the
2023              default time will be 60 seconds.  This  option  applies  to  job
2024              allocations.  To have the signal sent at preemption time see the
2025              preempt_send_user_signal SlurmctldParameter.
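
                  For example, the following asks Slurm to deliver SIGUSR1  to
                  the job roughly two minutes before its 30 minute time  limit
                  expires (a.out is a placeholder for the user's program):

                  > srun -t 30 --signal=USR1@120 a.out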
2026
2027
2028       --slurmd-debug=<level>
2029              Specify a debug level for slurmd(8). The level may be  specified
2030              either as an integer value between 0 [quiet, only errors are dis‐
2031              played] and 4 [verbose operation] or as one of the SlurmdDebug tags:
2032
2033              quiet     Log nothing
2034
2035              fatal     Log only fatal errors
2036
2037              error     Log only errors
2038
2039              info      Log errors and general informational messages
2040
2041              verbose   Log errors and verbose informational messages
2042
2043
2044              The slurmd debug information is copied onto the stderr of
2045              the job. By default  only  errors  are  displayed.  This  option
2046              applies to job and step allocations.
2047
2048
2049       --sockets-per-node=<sockets>
2050              Restrict  node  selection  to  nodes with at least the specified
2051              number of sockets.  See additional information under  -B  option
2052              above  when task/affinity plugin is enabled. This option applies
2053              to job allocations.
2054
2055
2056       --spread-job
2057              Spread the job allocation over as many  nodes  as  possible  and
2058              attempt  to  evenly distribute tasks across the allocated nodes.
2059              This option disables  the  topology/tree  plugin.   This  option
2060              applies to job allocations.
2061
2062
2063       --switches=<count>[@<max-time>]
2064              When  a tree topology is used, this defines the maximum count of
2065              switches desired for the job allocation and optionally the maxi‐
2066              mum  time to wait for that number of switches. If Slurm finds an
2067              allocation containing more switches than  the  count  specified,
2068              the job remains pending until it either finds an allocation with
2069              the desired switch count or the time limit expires.  If there is no
2070              switch  count  limit,  there  is  no  delay in starting the job.
2071              Acceptable time formats  include  "minutes",  "minutes:seconds",
2072              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2073              "days-hours:minutes:seconds".  The job's maximum time delay  may
2074              be limited by the system administrator using the SchedulerParam‐
2075              eters configuration parameter with the max_switch_wait parameter
2076              option.   On a dragonfly network the only switch count supported
2077              is 1 since communication performance will be highest when a  job
2078              is  allocated  resources  on one leaf switch or more than 2 leaf
2079              switches.  The default max-time is  the  max_switch_wait  Sched‐
2080              ulerParameters value. This option applies to job allocations.
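
                  For example, the following asks that all 16  nodes  share  a
                  single leaf switch, waiting up to  60  minutes  for  such  an
                  allocation (a.out is a placeholder for the user's program):

                  > srun -N16 --switches=1@60 a.out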
2081
2082
2083       -T, --threads=<nthreads>
2084              Allows  limiting  the  number of concurrent threads used to send
2085              the job request from the srun process to the slurmd processes on
2086              the  allocated nodes. Default is to use one thread per allocated
2087              node up to a maximum of 60 concurrent threads.  Specifying  this
2088              option limits the number of concurrent threads to nthreads (less
2089              than or equal to 60).  This should only be used  to  set  a  low
2090              thread  count  for  testing on very small memory computers. This
2091              option applies to job allocations.
2092
2093
2094       -t, --time=<time>
2095              Set a limit on the total run time of the job allocation.  If the
2096              requested time limit exceeds the partition's time limit, the job
2097              will be left in a PENDING state  (possibly  indefinitely).   The
2098              default  time limit is the partition's default time limit.  When
2099              the time limit is reached, each task in each job  step  is  sent
2100              SIGTERM  followed  by  SIGKILL.  The interval between signals is
2101              specified by the Slurm configuration  parameter  KillWait.   The
2102              OverTimeLimit  configuration parameter may permit the job to run
2103              longer than scheduled.  Time resolution is one minute and second
2104              values are rounded up to the next minute.
2105
2106              A  time  limit  of  zero requests that no time limit be imposed.
2107              Acceptable time formats  include  "minutes",  "minutes:seconds",
2108              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2109              "days-hours:minutes:seconds". This option  applies  to  job  and
2110              step allocations.
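
                  For example, the following two requests are equivalent, each
                  asking for a 90 minute limit (a.out is  a  placeholder  for
                  the user's program):

                  > srun -t 90 a.out
                  > srun --time=1:30:00 a.out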
2111
2112
2113       --task-epilog=<executable>
2114              The  slurmstepd  daemon will run executable just after each task
2115              terminates. This will be executed before any TaskEpilog  parame‐
2116              ter  in  slurm.conf  is  executed.  This  is  meant to be a very
2117              short-lived program. If it fails to terminate within a few  sec‐
2118              onds,  it  will  be  killed along with any descendant processes.
2119              This option applies to step allocations.
2120
2121
2122       --task-prolog=<executable>
2123              The slurmstepd daemon will run executable just before  launching
2124              each  task. This will be executed after any TaskProlog parameter
2125              in slurm.conf is executed.  Besides the normal environment vari‐
2126              ables, this has SLURM_TASK_PID available to identify the process
2127              ID of the task being started.  Standard output from this program
2128              of  the form "export NAME=value" will be used to set environment
2129              variables for the task being spawned.  This  option  applies  to
2130              step allocations.
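
                  For example, a minimal, executable task prolog (the path and
                  exported variable name below are hypothetical)  that  makes
                  the task's PID available to the application could look like:

                  > cat /home/user/task_prolog.sh
                  #!/bin/sh
                  # Copy SLURM_TASK_PID into a variable the application reads
                  echo "export MY_TASK_PID=$SLURM_TASK_PID"

                  > srun --task-prolog=/home/user/task_prolog.sh -n4 a.out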
2131
2132
2133       --test-only
2134              Returns  an  estimate  of  when  a job would be scheduled to run
2135              given the current job queue and all  the  other  srun  arguments
2136              specifying  the job.  This limits srun's behavior to just return
2137              information; no job is actually submitted.  The program will  be
2138              executed  directly  by the slurmd daemon. This option applies to
2139              job allocations.
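
                  For example, to see when a four node, one hour job might  be
                  expected to start, without actually submitting  it  (a.out
                  is a placeholder for the user's program):

                  > srun --test-only -N4 -t 1:00:00 a.out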
2140
2141
2142       --thread-spec=<num>
2143              Count of specialized threads per node reserved by  the  job  for
2144              system  operations and not used by the application. The applica‐
2145              tion will not use these threads, but will be charged  for  their
2146              allocation.   This  option  can not be used with the --core-spec
2147              option. This option applies to job allocations.
2148
2149
2150       --threads-per-core=<threads>
2151              Restrict node selection to nodes with  at  least  the  specified
2152              number  of threads per core.  NOTE: "Threads" refers to the num‐
2153              ber of processing units on each core rather than the  number  of
2154              application  tasks  to  be  launched  per  core.  See additional
2155              information under -B option above when task/affinity  plugin  is
2156              enabled. This option applies to job allocations.
2157
2158
2159       --time-min=<time>
2160              Set  a  minimum time limit on the job allocation.  If specified,
2161              the job may have its --time limit lowered to a value  no  lower
2162              than  --time-min  if doing so permits the job to begin execution
2163              earlier than otherwise possible.  The job's time limit will  not
2164              be  changed  after the job is allocated resources.  This is per‐
2165              formed by a backfill scheduling algorithm to allocate  resources
2166              otherwise  reserved  for  higher priority jobs.  Acceptable time
2167              formats  include   "minutes",   "minutes:seconds",   "hours:min‐
2168              utes:seconds",     "days-hours",     "days-hours:minutes"    and
2169              "days-hours:minutes:seconds". This option applies to job alloca‐
2170              tions.
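
                  For example, the following requests a four hour  limit  but
                  accepts as little as two hours if that allows  the  job  to
                  be backfilled sooner (a.out is a placeholder):

                  > srun -t 4:00:00 --time-min=2:00:00 a.out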
2171
2172
2173       --tmp=<size[units]>
2174              Specify  a  minimum  amount  of  temporary  disk space per node.
2175              Default units are megabytes unless the SchedulerParameters  con‐
2176              figuration  parameter  includes  the "default_gbytes" option for
2177              gigabytes.  Different units can be specified  using  the  suffix
2178              [K|M|G|T].  This option applies to job allocations.
2179
2180
2181       -u, --unbuffered
2182              By  default  the  connection  between  slurmstepd  and  the user
2183              launched application is over a pipe. The stdio output written by
2184              the  application is buffered by the glibc until it is flushed or
2185              the output is set as unbuffered.  See setbuf(3). If this  option
2186              is  specified  the  tasks are executed with a pseudo terminal so
2187              that the application output is unbuffered. This  option  applies
2188              to step allocations.
2189
2190       --usage
2191              Display brief help message and exit.
2192
2193
2194       --uid=<user>
2195              Attempt to submit and/or run a job as user instead of the invok‐
2196              ing user id. The invoking user's credentials  will  be  used  to
2197              check access permissions for the target partition. User root may
2198              use this option to run jobs as a normal user in a RootOnly  par‐
2199              tition  for  example. If run as root, srun will drop its permis‐
2200              sions to the uid specified after node allocation is  successful.
2201              user  may  be  the  user  name or numerical user ID. This option
2202              applies to job and step allocations.
2203
2204
2205       --use-min-nodes
2206              If a range of node counts is given, prefer the smaller count.
2207
2208
2209       -V, --version
2210              Display version information and exit.
2211
2212
2213       -v, --verbose
2214              Increase the verbosity of srun's informational messages.  Multi‐
2215              ple  -v's  will  further  increase srun's verbosity.  By default
2216              only errors will be displayed. This option applies  to  job  and
2217              step allocations.
2218
2219
2220       -W, --wait=<seconds>
2221              Specify  how long to wait after the first task terminates before
2222              terminating all remaining tasks.  A  value  of  0  indicates  an
2223              unlimited  wait (a warning will be issued after 60 seconds). The
2224              default value is set by the WaitTime parameter in the slurm con‐
2225              figuration  file  (see slurm.conf(5)). This option can be useful
2226              to ensure that a job is terminated in a timely  fashion  in  the
2227              event  that  one or more tasks terminate prematurely.  Note: The
2228              -K, --kill-on-bad-exit option takes precedence over  -W,  --wait
2229              to terminate the job immediately if a task exits with a non-zero
2230              exit code. This option applies to job allocations.
2231
2232
2233       -w, --nodelist=<host1,host2,... or filename>
2234              Request a specific list of hosts.  The job will contain  all  of
2235              these  hosts  and possibly additional hosts as needed to satisfy
2236              resource  requirements.   The  list  may  be  specified   as   a
2237              comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2238              for example), or a filename.  The host list will be  assumed  to
2239              be  a filename if it contains a "/" character.  If you specify a
2240              minimum node or processor count larger than can be satisfied  by
2241              the  supplied  host list, additional resources will be allocated
2242              on other nodes as needed.  Rather than  repeating  a  host  name
2243              multiple  times,  an  asterisk  and  a  repetition  count may be
2244              appended to a host name. For example "host1,host1" and "host1*2"
2245              are equivalent.  If the number of tasks is given and a  list  of
2246              requested nodes is also given, the number of nodes used from that
2247              list will be reduced to match the number of tasks  whenever  the
2248              number of nodes in the list is greater than the number of tasks.
2249              This option applies to job and step allocations.
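
                  For example, the following two commands request the same set
                  of three hosts (host names and program are hypothetical):

                  > srun -w host1,host2,host3 -n3 a.out
                  > srun -w host[1-3] -n3 a.out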
2250
2251
2252       --wckey=<wckey>
2253              Specify  wckey  to be used with job.  If TrackWCKey=no (default)
2254              in the slurm.conf this value is ignored. This option applies  to
2255              job allocations.
2256
2257
2258       -X, --disable-status
2259              Disable  the  display of task status when srun receives a single
2260              SIGINT (Ctrl-C). Instead immediately forward the SIGINT  to  the
2261              running  job.  Without this option a second Ctrl-C in one second
2262              is required to forcibly terminate the job and srun will  immedi‐
2263              ately  exit.  May  also  be  set  via  the  environment variable
2264              SLURM_DISABLE_STATUS. This option applies to job allocations.
2265
2266
2267       -x, --exclude=<host1,host2,... or filename>
2268              Request that a specific list of hosts not  be  included  in  the
2269              resources  allocated  to this job. The host list will be assumed
2270              to be a filename if it contains a  "/"  character.  This  option
2271              applies to job allocations.
2272
2273
2274       --x11[=<all|first|last>]
2275              Sets  up  X11  forwarding  on  all, first or last node(s) of the
2276              allocation. This option is only enabled if  Slurm  was  compiled
2277              with   X11   support  and  PrologFlags=x11  is  defined  in  the
2278              slurm.conf. Default is all.
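
                  For example, assuming Slurm was built with X11  support  and
                  PrologFlags=x11 is set in slurm.conf, the following runs  an
                  interactive X11 client (xterm here, assuming it is installed
                  on the compute node) with forwarding from the first  allo‐
                  cated node:

                  > srun --x11=first xterm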
2279
2280
2281       -Z, --no-allocate
2282              Run the specified tasks on a set of  nodes  without  creating  a
2283              Slurm  "job"  in the Slurm queue structure, bypassing the normal
2284              resource allocation step.  The list of nodes must  be  specified
2285              with  the  -w,  --nodelist  option.  This is a privileged option
2286              only available for the users "SlurmUser" and "root". This option
2287              applies to job allocations.
2288
2289
2290       srun will submit the job request to the slurm job controller, then ini‐
2291       tiate all processes on the remote nodes. If the request cannot  be  met
2292       immediately,  srun  will  block until the resources are free to run the
2293       job. If the -I (--immediate) option is specified srun will terminate if
2294       resources are not immediately available.
2295
2296       When  initiating remote processes srun will propagate the current work‐
2297       ing directory, unless --chdir=<path> is specified, in which  case  path
2298       will become the working directory for the remote processes.
2299
2300       The  -n,  -c,  and -N options control how CPUs  and nodes will be allo‐
2301       cated to the job. When specifying only the number of processes  to  run
2302       with  -n,  a default of one CPU per process is allocated. By specifying
2303       the number of CPUs required per task (-c), more than  one  CPU  may  be
2304       allocated  per  process.  If  the number of nodes is specified with -N,
2305       srun will attempt to allocate at least the number of nodes specified.
2306
2307       Combinations of the above three options may be used to change how  pro‐
2308       cesses are distributed across nodes and cpus. For instance, by specify‐
2309       ing both the number of processes and number of nodes on which  to  run,
2310       the  number of processes per node is implied. However, if the number of
2311       CPUs per process is more important, then both the number of  processes
2312       (-n) and the number of CPUs per process (-c) should be specified.
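
           For example, the following launches four processes with  two  CPUs
           each, for a total of eight allocated CPUs (a.out is a  placeholder
           for the user's program):

           > srun -n4 -c2 a.out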
2313
2314       srun  will  refuse  to   allocate  more than one process per CPU unless
2315       --overcommit (-O) is also specified.
2316
2317       srun will attempt to meet the above specifications "at a minimum." That
2318       is,  if  16 nodes are requested for 32 processes, and some nodes do not
2319       have 2 CPUs, the allocation of nodes will be increased in order to meet
2320       the  demand  for  CPUs. In other words, a minimum of 16 nodes are being
2321       requested. However, if 16 nodes are requested for  15  processes,  srun
2322       will  consider  this  an  error,  as  15 processes cannot run across 16
2323       nodes.
2324
2325
2326       IO Redirection
2327
2328       By default, stdout and stderr will be redirected from all tasks to  the
2329       stdout  and stderr of srun, and stdin will be redirected from the stan‐
2330       dard input of srun to all remote tasks.  If stdin is only to be read by
2331       a  subset  of  the spawned tasks, specifying a file to read from rather
2332       than forwarding stdin from the srun command may  be  preferable  as  it
2333       avoids moving and storing data that will never be read.
2334
2335       For  OS  X, the poll() function does not support stdin, so input from a
2336       terminal is not possible.
2337
2338       This behavior may be changed with the --output,  --error,  and  --input
2339       (-o, -e, -i) options. Valid format specifications for these options are
2340
2341       all       stdout and stderr are redirected from all tasks to srun.  stdin is
2342                 broadcast to all remote tasks.  (This is the  default  behav‐
2343                 ior)
2344
2345       none      stdout  and  stderr  are not received from any task.  stdin is
2346                 not sent to any task (stdin is closed).
2347
2348       taskid    stdout and/or stderr are redirected from only the  task  with
2349                 relative  id  equal  to  taskid, where 0 <= taskid < ntasks,
2350                 where ntasks is the total number of tasks in the current  job
2351                 step.   stdin  is  redirected  from the stdin of srun to this
2352                 same task.  This file will be written on the  node  executing
2353                 the task.
2354
2355       filename  srun  will  redirect  stdout  and/or stderr to the named file
2356                 from all tasks.  stdin will be redirected from the named file
2357                 and  broadcast to all tasks in the job.  filename refers to a
2358                 path on the host that runs srun.  Depending on the  cluster's
2359                 file  system  layout, this may result in the output appearing
2360                 in different places depending on whether the job  is  run  in
2361                 batch mode.
2362
2363       filename pattern
2364                 srun allows for a filename pattern to be used to generate the
2365                 named IO file described above. The following list  of  format
2366                 specifiers  may  be  used  in the format string to generate a
2367                 filename that will be unique to a given jobid, stepid,  node,
2368                 or  task.  In  each case, the appropriate number of files are
2369                 opened and associated with the corresponding tasks. Note that
2370                 any  format string containing %t, %n, and/or %N will be writ‐
2371                 ten on the node executing the task rather than the node where
2372                 srun executes. These format specifiers are not supported on a
2373                 BGQ system.
2374
2375                 \\     Do not process any of the replacement symbols.
2376
2377                 %%     The character "%".
2378
2379                 %A     Job array's master job allocation number.
2380
2381                 %a     Job array ID (index) number.
2382
2383                 %J     jobid.stepid of the running job. (e.g. "128.0")
2384
2385                 %j     jobid of the running job.
2386
2387                 %s     stepid of the running job.
2388
2389                 %N     short hostname. This will create a  separate  IO  file
2390                        per node.
2391
2392                 %n     Node  identifier  relative to current job (e.g. "0" is
2393                        the first node of the running job) This will create  a
2394                        separate IO file per node.
2395
2396                 %t     task  identifier  (rank) relative to current job. This
2397                        will create a separate IO file per task.
2398
2399                 %u     User name.
2400
2401                 %x     Job name.
2402
2403                 A number placed between  the  percent  character  and  format
2404                 specifier  may be used to zero-pad the result in the IO file‐
2405                 name. This number is ignored if the format  specifier  corre‐
2406                 sponds to  non-numeric data (%N for example).
2407
2408                 Some  examples  of  how the format string may be used for a 4
2409                 task job step with a Job ID of 128  and  step  id  of  0  are
2410                 included below:
2411
2412                 job%J.out      job128.0.out
2413
2414                 job%4j.out     job0128.out
2415
2416                 job%j-%2t.out  job128-00.out, job128-01.out, ...
2417
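                     For example, the following  (a.out  is  a  placeholder)
                     writes one output file per task, named with the job  ID
                     and a zero padded task rank:

                     > srun -n4 -o job%j-task%2t.out a.out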

INPUT ENVIRONMENT VARIABLES

2419       Some srun options may be set via environment variables.  These environ‐
2420       ment variables, along with  their  corresponding  options,  are  listed
2421       below.  Note: Command line options will always override these settings.
2422
2423       PMI_FANOUT            This  is  used  exclusively  with PMI (MPICH2 and
2424                             MVAPICH2) and controls the fanout of data  commu‐
2425                             nications.  The  srun  command  sends messages to
2426                             application programs (via the  PMI  library)  and
2427                             those  applications may be called upon to forward
2428                             that data to up  to  this  number  of  additional
2429                             tasks.  Higher  values offload work from the srun
2430                             command to the applications and  likely  increase
2431                             the vulnerability to failures.  The default value
2432                             is 32.
2433
2434       PMI_FANOUT_OFF_HOST   This is used exclusively  with  PMI  (MPICH2  and
2435                             MVAPICH2)  and controls the fanout of data commu‐
2436                             nications.  The srun command  sends  messages  to
2437                             application  programs  (via  the PMI library) and
2438                             those applications may be called upon to  forward
2439                             that  data  to additional tasks. By default, srun
2440                             sends one message per host and one task  on  that
2441                             host  forwards  the  data  to other tasks on that
2442                             host up to PMI_FANOUT.  If PMI_FANOUT_OFF_HOST is
2443                             defined, the user task may be required to forward
2444                             the  data  to  tasks  on  other  hosts.   Setting
2445                             PMI_FANOUT_OFF_HOST   may  increase  performance.
2446                             Since more work is performed by the  PMI  library
2447                             loaded by the user application, failures also can
2448                             be more common and more difficult to diagnose.
2449
2450       PMI_TIME              This is used exclusively  with  PMI  (MPICH2  and
2451                             MVAPICH2)  and  controls  how much the communica‐
2452                             tions from the tasks to the srun are  spread  out
2453                             in  time  in order to avoid overwhelming the srun
2454                             command with  work.  The  default  value  is  500
2455                             (microseconds)  per task. On relatively slow pro‐
2456                             cessors or  systems  with  very  large  processor
2457                             counts  (and  large PMI data sets), higher values
2458                             may be required.
2459
2460       SLURM_CONF            The location of the Slurm configuration file.
2461
2462       SLURM_ACCOUNT         Same as -A, --account
2463
2464       SLURM_ACCTG_FREQ      Same as --acctg-freq
2465
2466       SLURM_BCAST           Same as --bcast
2467
2468       SLURM_BURST_BUFFER    Same as --bb
2469
2470       SLURM_CHECKPOINT      Same as --checkpoint
2471
2472       SLURM_COMPRESS        Same as --compress
2473
2474       SLURM_CONSTRAINT      Same as -C, --constraint
2475
2476       SLURM_CORE_SPEC       Same as --core-spec
2477
2478       SLURM_CPU_BIND        Same as --cpu-bind
2479
2480       SLURM_CPU_FREQ_REQ    Same as --cpu-freq.
2481
2482       SLURM_CPUS_PER_GPU    Same as --cpus-per-gpu
2483
2484       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task
2485
2486       SLURM_DEBUG           Same as -v, --verbose
2487
2488       SLURM_DELAY_BOOT      Same as --delay-boot
2489
2490       SLURMD_DEBUG          Same as -d, --slurmd-debug
2491
2492       SLURM_DEPENDENCY      Same as -P, --dependency=<jobid>
2493
2494       SLURM_DISABLE_STATUS  Same as -X, --disable-status
2495
2496       SLURM_DIST_PLANESIZE  Same as -m plane
2497
2498       SLURM_DISTRIBUTION    Same as -m, --distribution
2499
2500       SLURM_EPILOG          Same as --epilog
2501
2502       SLURM_EXCLUSIVE       Same as --exclusive
2503
2504       SLURM_EXIT_ERROR      Specifies the exit code generated  when  a  Slurm
2505                             error occurs (e.g. invalid options).  This can be
2506                             used by a script to distinguish application  exit
2507                             codes  from various Slurm error conditions.  Also
2508                             see SLURM_EXIT_IMMEDIATE.
2509
2510       SLURM_EXIT_IMMEDIATE  Specifies  the  exit  code  generated  when   the
2511                             --immediate  option is used and resources are not
2512                             currently available.   This  can  be  used  by  a
2513                             script to distinguish application exit codes from
2514                             various  Slurm  error   conditions.    Also   see
2515                             SLURM_EXIT_ERROR.
2516
2517       SLURM_EXPORT_ENV      Same as --export
2518
2519       SLURM_GPUS            Same as -G, --gpus
2520
2521       SLURM_GPU_BIND        Same as --gpu-bind
2522
2523       SLURM_GPU_FREQ        Same as --gpu-freq
2524
2525       SLURM_GPUS_PER_NODE   Same as --gpus-per-node
2526
2527       SLURM_GPUS_PER_TASK   Same as --gpus-per-task
2528
2529       SLURM_GRES_FLAGS      Same as --gres-flags
2530
2531       SLURM_HINT            Same as --hint
2532
2533       SLURM_GRES            Same as --gres. Also see SLURM_STEP_GRES
2534
2535       SLURM_IMMEDIATE       Same as -I, --immediate
2536
2537       SLURM_JOB_ID          Same as --jobid
2538
2539       SLURM_JOB_NAME        Same  as -J, --job-name except within an existing
2540                             allocation, in which case it is ignored to  avoid
2541                             using  the  batch  job's name as the name of each
2542                             job step.
2543
2544       SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
2545                             Same as -N, --nodes.  Total number of nodes in the
2546                             job’s resource allocation.
2547
2548       SLURM_KILL_BAD_EXIT   Same as -K, --kill-on-bad-exit
2549
2550       SLURM_LABELIO         Same as -l, --label
2551
2552       SLURM_MEM_BIND        Same as --mem-bind
2553
2554       SLURM_MEM_PER_CPU     Same as --mem-per-cpu
2555
2556       SLURM_MEM_PER_GPU     Same as --mem-per-gpu
2557
2558       SLURM_MEM_PER_NODE    Same as --mem
2559
2560       SLURM_MPI_TYPE        Same as --mpi
2561
2562       SLURM_NETWORK         Same as --network
2563
2564       SLURM_NO_KILL         Same as -k, --no-kill
2565
2566       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2567                             Same as -n, --ntasks
2568
2569       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2570
2571       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2572
2573       SLURM_NTASKS_PER_SOCKET
2574                             Same as --ntasks-per-socket
2575
2576       SLURM_OPEN_MODE       Same as --open-mode
2577
2578       SLURM_OVERCOMMIT      Same as -O, --overcommit
2579
2580       SLURM_PARTITION       Same as -p, --partition
2581
2582       SLURM_PMI_KVS_NO_DUP_KEYS
2583                             If set, then PMI key-pairs will contain no dupli‐
2584                             cate keys. MPI can use this  variable  to  inform
2585                             the  PMI  library  that it will not use duplicate
2586                             keys so PMI can  skip  the  check  for  duplicate
2587                             keys.   This  is  the case for MPICH2 and reduces
2588                             overhead in testing for duplicates  for  improved
2589                             performance.
2590
2591       SLURM_POWER           Same as --power
2592
2593       SLURM_PROFILE         Same as --profile
2594
2595       SLURM_PROLOG          Same as --prolog
2596
2597       SLURM_QOS             Same as --qos
2598
2599       SLURM_REMOTE_CWD      Same as -D, --chdir=
2600
2601       SLURM_REQ_SWITCH      When  a  tree  topology is used, this defines the
2602                             maximum count of switches  desired  for  the  job
2603                             allocation  and  optionally  the  maximum time to
2604                             wait for that number of switches. See --switches
2605
2606       SLURM_RESERVATION     Same as --reservation
2607
2608       SLURM_RESV_PORTS      Same as --resv-ports
2609
2610       SLURM_SIGNAL          Same as --signal
2611
2612       SLURM_STDERRMODE      Same as -e, --error
2613
2614       SLURM_STDINMODE       Same as -i, --input
2615
2616       SLURM_SPREAD_JOB      Same as --spread-job
2617
2618       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2619                             If set and non-zero, successive  task  exit  mes‐
2620                             sages  with  the  same  exit code will be printed
2621                             only once.
2622
2623       SLURM_STEP_GRES       Same as --gres (only applies to job steps, not to
2624                             job allocations).  Also see SLURM_GRES
2625
2626       SLURM_STEP_KILLED_MSG_NODE_ID=ID
2627                             If set, only the specified node will log when the
2628                             job or step is killed by a signal.
2629
2630       SLURM_STDOUTMODE      Same as -o, --output
2631
2632       SLURM_TASK_EPILOG     Same as --task-epilog
2633
2634       SLURM_TASK_PROLOG     Same as --task-prolog
2635
2636       SLURM_TEST_EXEC       If defined, srun will  verify  existence  of  the
2637                             executable  program  along with user execute per‐
2638                             mission on the node where srun was called  before
2639                             attempting to launch it on nodes in the step.
2640
2641       SLURM_THREAD_SPEC     Same as --thread-spec
2642
2643       SLURM_THREADS         Same as -T, --threads
2644
2645       SLURM_TIMELIMIT       Same as -t, --time
2646
2647       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered
2648
2649       SLURM_USE_MIN_NODES   Same as --use-min-nodes
2650
2651       SLURM_WAIT            Same as -W, --wait
2652
2653       SLURM_WAIT4SWITCH     Max  time  waiting  for  requested  switches. See
2654                             --switches
2655
2656       SLURM_WCKEY           Same as --wckey
2657
2658       SLURM_WORKING_DIR     Same as -D, --chdir
2659
2660       SRUN_EXPORT_ENV       Same as --export, and will override  any  setting
2661                             for SLURM_EXPORT_ENV.
2662
2663
2664

OUTPUT ENVIRONMENT VARIABLES

2666       srun will set some environment variables in the environment of the exe‐
2667       cuting tasks on the remote compute nodes.  These environment  variables
2668       are:
2669
2670
2671       SLURM_*_PACK_GROUP_#  For  a heterogeneous job allocation, the environ‐
2672                             ment variables are set separately for each compo‐
2673                             nent.
2674
2675       SLURM_CLUSTER_NAME    Name  of  the cluster on which the job is execut‐
2676                             ing.
2677
2678       SLURM_CPU_BIND_VERBOSE
2679                             --cpu-bind verbosity (quiet,verbose).
2680
2681       SLURM_CPU_BIND_TYPE   --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2682
2683       SLURM_CPU_BIND_LIST   --cpu-bind map or mask list (list  of  Slurm  CPU
2684                             IDs  or  masks for this node, CPU_ID = Board_ID x
2685                             threads_per_board       +       Socket_ID       x
2686                             threads_per_socket + Core_ID x threads_per_core +
2687                             Thread_ID).
2688
2689
2690       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency on
2691                             the  srun  command  as  a  numerical frequency in
2692                             kilohertz, or a coded value for a request of low,
2693                             medium, highm1 or high for the frequency.  See the
2694                             description  of  the  --cpu-freq  option  or  the
2695                             SLURM_CPU_FREQ_REQ input environment variable.
2696
2697       SLURM_CPUS_ON_NODE    Count  of processors available to the job on this
2698                             node.  Note the  select/linear  plugin  allocates
2699                             entire  nodes to jobs, so the value indicates the
2700                             total  count  of  CPUs  on  the  node.   For  the
2701                             select/cons_res plugin, this number indicates the
2702                             number of cores on this  node  allocated  to  the
2703                             job.
2704
2705       SLURM_CPUS_PER_GPU    Number of CPUs requested per allocated GPU.  Only
2706                             set if the --cpus-per-gpu option is specified.
2707
2708       SLURM_CPUS_PER_TASK   Number of cpus requested per task.  Only  set  if
2709                             the --cpus-per-task option is specified.
2710
2711       SLURM_DISTRIBUTION    Distribution type for the allocated jobs. Set the
2712                             distribution with -m, --distribution.
2713
2714       SLURM_GPUS            Number of GPUs requested.  Only set  if  the  -G,
2715                             --gpus option is specified.
2716
2717       SLURM_GPU_BIND        Requested  binding  of tasks to GPU.  Only set if
2718                             the --gpu-bind option is specified.
2719
2720       SLURM_GPU_FREQ        Requested  GPU  frequency.   Only  set   if   the
2721                             --gpu-freq option is specified.
2722
2723       SLURM_GPUS_PER_NODE   Requested GPU count per allocated node.  Only set
2724                             if the --gpus-per-node option is specified.
2725
2726       SLURM_GPUS_PER_SOCKET Requested GPU count per allocated  socket.   Only
2727                             set if the --gpus-per-socket option is specified.
2728
2729       SLURM_GPUS_PER_TASK   Requested GPU count per allocated task.  Only set
2730                             if the --gpus-per-task option is specified.
2731
2732       SLURM_GTIDS           Global task IDs running on this node.  Zero  ori‐
2733                             gin and comma separated.
2734
2735       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2736
2737       SLURM_JOB_CPUS_PER_NODE
2738                             Number of CPUs per node.
2739
2740       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.
2741
2742       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2743                             Job id of the executing job.
2744
2745
2746       SLURM_JOB_NAME        Set  to the value of the --job-name option or the
2747                             command name when srun is used to  create  a  new
2748                             job allocation. Not set when srun is used only to
2749                             create a job step (i.e. within  an  existing  job
2750                             allocation).
2751
2752
2753       SLURM_JOB_PARTITION   Name  of  the  partition in which the job is run‐
2754                             ning.
2755
2756
2757       SLURM_JOB_QOS         Quality Of Service (QOS) of the job allocation.
2758
2759       SLURM_JOB_RESERVATION Advanced reservation containing the  job  alloca‐
2760                             tion, if any.
2761
2762
2763       SLURM_LAUNCH_NODE_IPADDR
2764                             IP address of the node from which the task launch
2765                             was initiated (where the srun command ran from).
2766
2767       SLURM_LOCALID         Node local task ID for the process within a job.
2768
2769
2770       SLURM_MEM_BIND_LIST   --mem-bind map or mask  list  (<list  of  IDs  or
2771                             masks for this node>).
2772
2773       SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2774
2775       SLURM_MEM_BIND_SORT   Sort  free cache pages (run zonesort on Intel KNL
2776                             nodes).
2777
2778       SLURM_MEM_BIND_TYPE   --mem-bind type (none,rank,map_mem:,mask_mem:).
2779
2780       SLURM_MEM_BIND_VERBOSE
2781                             --mem-bind verbosity (quiet,verbose).
2782
2783       SLURM_MEM_PER_GPU     Requested memory per allocated GPU.  Only set  if
2784                             the --mem-per-gpu option is specified.
2785
2786       SLURM_JOB_NODES       Total number of nodes in the job's resource allo‐
2787                             cation.
2788
2789       SLURM_NODE_ALIASES    Sets of  node  name,  communication  address  and
2790                             hostname  for nodes allocated to the job from the
2791                             cloud. Each element in the set is colon separated
2792                             and each set is comma separated. For example:
2793                             SLURM_NODE_ALIASES=
2794                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2795
2796       SLURM_NODEID          The relative node ID of the current node.
2797
2798       SLURM_JOB_NODELIST    List of nodes allocated to the job.
2799
2800       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2801                             Total  number  of processes in the current job or
2802                             job step.
2803
2804       SLURM_PACK_SIZE       Set to count of components in heterogeneous job.
2805
2806       SLURM_PRIO_PROCESS    The scheduling priority (nice value) at the  time
2807                             of  job  submission.  This value is propagated to
2808                             the spawned processes.
2809
2810       SLURM_PROCID          The MPI rank (or relative process ID) of the cur‐
2811                             rent process.
2812
2813       SLURM_SRUN_COMM_HOST  IP address of srun communication host.
2814
2815       SLURM_SRUN_COMM_PORT  srun communication port.
2816
2817       SLURM_STEP_LAUNCHER_PORT
2818                             Step launcher port.
2819
2820       SLURM_STEP_NODELIST   List of nodes allocated to the step.
2821
2822       SLURM_STEP_NUM_NODES  Number of nodes allocated to the step.
2823
2824       SLURM_STEP_NUM_TASKS  Number of processes in the step.
2825
2826       SLURM_STEP_TASKS_PER_NODE
2827                             Number of processes per node within the step.
2828
2829       SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2830                             The step ID of the current job.
2831
2832       SLURM_SUBMIT_DIR      The  directory from which srun was invoked or, if
2833                             applicable, the directory specified  by  the  -D,
2834                             --chdir option.
2835
2836       SLURM_SUBMIT_HOST     The  hostname  of  the  computer  from which srun
2837                             was invoked.
2838
2839       SLURM_TASK_PID        The process ID of the task being started.
2840
2841       SLURM_TASKS_PER_NODE  Number of tasks to be  initiated  on  each  node.
2842                             Values  are comma separated and in the same order
2843                             as SLURM_JOB_NODELIST.  If two or  more  consecu‐
2844                             tive  nodes are to have the same task count, that
2845                             count is followed by "(x#)" where "#" is the rep‐
2846                             etition         count.        For        example,
2847                             "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2848                             first three nodes will each execute  two  tasks
2849                             and the fourth node will execute one task.
2850
2851
2852       SLURM_TOPOLOGY_ADDR   This is set only if the  system  has  the  topol‐
2853                             ogy/tree  plugin  configured.   The value will be
2854                             set to the names of the network switches which may
2855                             be involved in the  job's  communications from the
2856                             system's top level switch down to the leaf switch
2857                             and ending with the node name. A period is used to
2858                             separate each hardware component name.
2859
2860       SLURM_TOPOLOGY_ADDR_PATTERN
2861                             This is set only if the  system  has  the  topol‐
2862                             ogy/tree  plugin  configured.   The value will be
2863                             set to the component types  listed  in SLURM_TOPOL‐
2864                             OGY_ADDR.   Each  component will be identified as
2865                             either "switch" or "node".  A period is  used  to
2866                             separate each hardware component type.
2867
2868       SLURM_UMASK           The umask in effect when the job was submitted.
2869
2870       SLURMD_NODENAME       Name of the node running the task. In the case of
2871                             a parallel  job  executing  on  multiple  compute
2872                             nodes,  the various tasks will have this environ‐
2873                             ment variable set to  different  values  on  each
2874                             compute node.
2875
2876       SRUN_DEBUG            Set  to  the  logging  level of the srun command.
2877                             Default value is 3 (info level).   The  value  is
2878                             incremented  or decremented based upon the --ver‐
2879                             bose and --quiet options.
2880
2881

SIGNALS AND ESCAPE SEQUENCES

2883       Signals sent to the srun command are  automatically  forwarded  to  the
2884       tasks  it  is  controlling  with  a few exceptions. The escape sequence
2885       <control-c> will report the state of all tasks associated with the srun
2886       command.  If  <control-c>  is entered twice within one second, then the
2887       associated SIGINT signal will be sent to all tasks  and  a  termination
2888       sequence  will  be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2889       spawned tasks.  If a third <control-c> is received,  the  srun  program
2890       will  be  terminated  without waiting for remote tasks to exit or their
2891       I/O to complete.
2892
2893       The escape sequence <control-z> is presently ignored. Our intent is for
2894       this to put the srun command into a mode where various special actions may
2895       be invoked.
2896
2897

MPI SUPPORT

2899       MPI use depends upon the type of MPI being used.  There are three  fun‐
2900       damentally  different  modes  of  operation  used  by these various MPI
2901       implementations.
2902
2903       1. Slurm directly launches the tasks  and  performs  initialization  of
2904       communications  through the PMI2 or PMIx APIs.  For example: "srun -n16
2905       a.out".
2906
2907       2. Slurm creates a resource allocation for  the  job  and  then  mpirun
2908       launches tasks using Slurm's infrastructure (OpenMPI).
2909
2910       3.  Slurm  creates  a  resource  allocation for the job and then mpirun
2911       launches tasks using some mechanism other than Slurm, such  as  SSH  or
2912       RSH.   These  tasks are initiated outside of Slurm's monitoring or con‐
2913       trol. Slurm's epilog should be configured to purge these tasks when the
2914       job's allocation is relinquished. The use of  pam_slurm_adopt  is  also
2915       highly recommended.
2916
2917       See https://slurm.schedmd.com/mpi_guide.html for  more  information  on
2918       use of these various MPI implementations with Slurm.
2919
2920

MULTIPLE PROGRAM CONFIGURATION

2922       Comments  in the configuration file must have a "#" in column one.  The
2923       configuration file contains the following  fields  separated  by  white
2924       space:
2925
2926       Task rank
2927              One or more task ranks to use this configuration.  Multiple val‐
2928              ues may be comma separated.  Ranges may be  indicated  with  two
2929              numbers separated with a '-' with the smaller number first (e.g.
2930              "0-4" and not "4-0").  To indicate all tasks not otherwise spec‐
2931              ified,  specify  a rank of '*' as the last line of the file.  If
2932              an attempt is made to initiate a task for  which  no  executable
2933              program is defined, the following error message will be produced
2934              "No executable program specified for this task".
2935
2936       Executable
2937              The name of the program to  execute.   May  be  fully  qualified
2938              pathname if desired.
2939
2940       Arguments
2941              Program  arguments.   The  expression "%t" will be replaced with
2942              the task's number.  The expression "%o" will  be  replaced  with
2943              the task's offset within this range (e.g. a configured task rank
2944              value of "1-5" would  have  offset  values  of  "0-4").   Single
2945              quotes  may  be  used to avoid having the enclosed values inter‐
2946              preted.  This field is optional.  Any arguments for the  program
2947              entered on the command line will be added to the arguments spec‐
2948              ified in the configuration file.
2949
2950       For example:
2951       ###################################################################
2952       # srun multiple program configuration file
2953       #
2954       # srun -n8 -l --multi-prog silly.conf
2955       ###################################################################
2956       4-6       hostname
2957       1,7       echo  task:%t
2958       0,2-3     echo  offset:%o
2959
2960       > srun -n8 -l --multi-prog silly.conf
2961       0: offset:0
2962       1: task:1
2963       2: offset:1
2964       3: offset:2
2965       4: linux15.llnl.gov
2966       5: linux16.llnl.gov
2967       6: linux17.llnl.gov
2968       7: task:7
2969
2970
2971
2972

EXAMPLES

2974       This simple example demonstrates the execution of the command  hostname
2975       in  eight tasks. At least eight processors will be allocated to the job
2976       (the same as the task count) on however many nodes are required to sat‐
2977       isfy  the  request.  The output of each task will be preceded  by  its
2978       task number.  (The machine "dev" in the example below has  a  total  of
2979       two CPUs per node)
2980
2981
2982       > srun -n8 -l hostname
2983       0: dev0
2984       1: dev0
2985       2: dev1
2986       3: dev1
2987       4: dev2
2988       5: dev2
2989       6: dev3
2990       7: dev3
2991
2992
2993       The  srun -r option is used within a job script to run two job steps on
2994       disjoint nodes in the following example. The script is run using  allo‐
2995       cate mode instead of as a batch job in this case.
2996
2997
2998       > cat test.sh
2999       #!/bin/sh
3000       echo $SLURM_JOB_NODELIST
3001       srun -lN2 -r2 hostname
3002       srun -lN2 hostname
3003
3004       > salloc -N4 test.sh
3005       dev[7-10]
3006       0: dev9
3007       1: dev10
3008       0: dev7
3009       1: dev8
3010
3011
3012       The following script runs two job steps in parallel within an allocated
3013       set of nodes.
3014
3015
3016       > cat test.sh
3017       #!/bin/bash
3018       srun -lN2 -n4 -r 2 sleep 60 &
3019       srun -lN2 -r 0 sleep 60 &
3020       sleep 1
3021       squeue
3022       squeue -s
3023       wait
3024
3025       > salloc -N4 test.sh
3026         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
3027         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]
3028
3029       STEPID     PARTITION     USER      TIME NODELIST
3030       65641.0        batch   grondo      0:01 dev[7-8]
3031       65641.1        batch   grondo      0:01 dev[9-10]
3032
3033
3034       This example demonstrates how one executes a simple MPI  job.   We  use
3035       srun  to  build  a list of machines (nodes) to be used by mpirun in its
3036       required format. A sample command line and the script  to  be  executed
3037       follow.
3038
3039
3040       > cat test.sh
3041       #!/bin/sh
3042       MACHINEFILE="nodes.$SLURM_JOB_ID"
3043
3044       # Generate Machinefile for mpi such that hosts are in the same
3045       #  order as if run via srun
3046       #
3047       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3048
3049       # Run using generated Machine file:
3050       mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3051
3052       rm $MACHINEFILE
3053
3054       > salloc -N2 -n4 test.sh
3055
3056
3057       This  simple  example  demonstrates  the execution of different jobs on
3058       different nodes in the same srun.  You can do this for  any  number  of
3059       nodes  or  any number of jobs.  The executables are placed on the nodes
3060       identified by the SLURM_NODEID env var, starting at 0 and going to the
3061       number of nodes specified on the srun command line.
3062
3063
3064       > cat test.sh
3065       case $SLURM_NODEID in
3066           0) echo "I am running on "
3067              hostname ;;
3068           1) hostname
3069              echo "is where I am running" ;;
3070       esac
3071
3072       > srun -N2 test.sh
3073       dev0
3074       is where I am running
3075       I am running on
3076       dev1
3077
3078
3079       This  example  demonstrates use of multi-core options to control layout
3080       of tasks.  We request that four sockets per  node  and  two  cores  per
3081       socket be dedicated to the job.
3082
3083
3084       > srun -N2 -B 4-4:2-2 a.out
3085
3086       This  example shows a script in which Slurm is used to provide resource
3087       management for a job by executing the various job steps  as  processors
3088       become available for their dedicated use.
3089
3090
3091       > cat my.script
3092       #!/bin/bash
3093       srun --exclusive -n4 prog1 &
3094       srun --exclusive -n3 prog2 &
3095       srun --exclusive -n1 prog3 &
3096       srun --exclusive -n1 prog4 &
3097       wait
3098
3099
3100       This  example  shows  how to launch an application called "master" with
3101       one task, 16 CPUs and 16 GB of memory (1 GB per  CPU)  plus  another
3102       application  called "slave" with 16 tasks, 1 CPU per task (the default)
3103       and 1 GB of memory per task.
3104
3105
3106       > srun -n1 -c16 --mem-per-cpu=1gb master : -n16 --mem-per-cpu=1gb slave
3107
3108

COPYING

3110       Copyright (C) 2006-2007 The Regents of the  University  of  California.
3111       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3112       Copyright (C) 2008-2010 Lawrence Livermore National Security.
3113       Copyright (C) 2010-2015 SchedMD LLC.
3114
3115       This  file  is  part  of  Slurm,  a  resource  management program.  For
3116       details, see <https://slurm.schedmd.com/>.
3117
3118       Slurm is free software; you can redistribute it and/or modify it  under
3119       the  terms  of  the GNU General Public License as published by the Free
3120       Software Foundation; either version 2  of  the  License,  or  (at  your
3121       option) any later version.
3122
3123       Slurm  is  distributed  in the hope that it will be useful, but WITHOUT
3124       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
3125       FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU General Public License
3126       for more details.
3127
3128

SEE ALSO

3130       salloc(1), sattach(1), sbatch(1), sbcast(1),  scancel(1),  scontrol(1),
3131       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3132
3133
3134
3135October 2019                    Slurm Commands                         srun(1)