srun(1)                         Slurm Commands                         srun(1)
2
3
4

NAME

6       srun - Run parallel jobs
7
8

SYNOPSIS

10       srun  [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
11       executable(N) [args(N)...]
12
13       Option(s) define multiple jobs in  a  co-scheduled  heterogeneous  job.
14       For more details about heterogeneous jobs see the document
15       https://slurm.schedmd.com/heterogeneous_jobs.html
16
17

DESCRIPTION

       Run a parallel job on a cluster managed by Slurm.  If necessary, srun
       will first create a resource allocation in which to run the parallel
       job.
22
       The following document describes the influence of various options on
       the allocation of CPUs to jobs and tasks.
       https://slurm.schedmd.com/cpu_management.html
26
27

RETURN VALUE

29       srun will return the highest exit code of all tasks run or the  highest
30       signal  (with  the high-order bit set in an 8-bit integer -- e.g. 128 +
31       signal) of any task that exited with a signal.
32       The value 253 is reserved for out-of-memory errors.
33
34

EXECUTABLE PATH RESOLUTION

36       The executable is resolved in the following order:
37
38       1. If executable starts with ".", then path is constructed as:  current
39       working directory / executable
40       2. If executable starts with a "/", then path is considered absolute.
41       3. If executable can be resolved through PATH. See path_resolution(7).
42       4. If executable is in current working directory.
43
       The current working directory is the calling process's working
       directory unless the --chdir argument is passed, which will override
       the current working directory.
47
48

OPTIONS

50       --accel-bind=<options>
51              Control how tasks are bound to generic resources of type gpu and
52              nic.  Multiple options may be specified. Supported  options  in‐
53              clude:
54
55              g      Bind each task to GPUs which are closest to the allocated
56                     CPUs.
57
58              n      Bind each task to NICs which are closest to the allocated
59                     CPUs.
60
61              v      Verbose  mode. Log how tasks are bound to GPU and NIC de‐
62                     vices.
63
64              This option applies to job allocations.
65
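              As an illustration (the GPU count, task count, and program name
              ./my_app are placeholders), a step that binds each task to the
              GPUs closest to its allocated CPUs might look like:

                 srun -n4 --gres=gpu:4 --accel-bind=g ./my_app
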
66       -A, --account=<account>
67              Charge resources used by this job to specified account.  The ac‐
68              count  is  an  arbitrary string. The account name may be changed
69              after job submission using the scontrol command. This option ap‐
70              plies to job allocations.
71
72       --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73              Define  the  job  accounting and profiling sampling intervals in
74              seconds.  This can be used  to  override  the  JobAcctGatherFre‐
75              quency  parameter  in the slurm.conf file. <datatype>=<interval>
76              specifies the task  sampling  interval  for  the  jobacct_gather
77              plugin  or  a  sampling  interval  for  a  profiling type by the
78              acct_gather_profile     plugin.     Multiple     comma-separated
79              <datatype>=<interval> pairs may be specified. Supported datatype
80              values are:
81
82              task        Sampling interval for the jobacct_gather plugins and
83                          for   task   profiling  by  the  acct_gather_profile
84                          plugin.
85                          NOTE: This frequency is used to monitor  memory  us‐
86                          age.  If memory limits are enforced the highest fre‐
87                          quency a user can request is what is  configured  in
88                          the slurm.conf file.  It can not be disabled.
89
90              energy      Sampling  interval  for  energy  profiling using the
91                          acct_gather_energy plugin.
92
93              network     Sampling interval for infiniband profiling using the
94                          acct_gather_interconnect plugin.
95
96              filesystem  Sampling interval for filesystem profiling using the
97                          acct_gather_filesystem plugin.
98
99
100              The default value for the task sampling interval is 30  seconds.
101              The  default value for all other intervals is 0.  An interval of
102              0 disables sampling of the specified type.  If the task sampling
103              interval  is  0, accounting information is collected only at job
104              termination (reducing Slurm interference with the job).
105              Smaller (non-zero) values have a greater impact upon job perfor‐
106              mance,  but a value of 30 seconds is not likely to be noticeable
107              for applications having less than 10,000 tasks. This option  ap‐
108              plies to job allocations.
109
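              As a sketch (the task count and program name ./my_app are
              placeholders), the following samples task accounting every 15
              seconds and energy data every 60 seconds:

                 srun --acctg-freq=task=15,energy=60 -n8 ./my_app
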
110       --bb=<spec>
111              Burst  buffer  specification.  The  form of the specification is
112              system dependent.  Also see --bbf. This option  applies  to  job
113              allocations.   When  the  --bb option is used, Slurm parses this
114              option and creates a temporary burst buffer script file that  is
115              used  internally  by the burst buffer plugins. See Slurm's burst
116              buffer guide for more information and examples:
117              https://slurm.schedmd.com/burst_buffer.html
118
119       --bbf=<file_name>
120              Path of file containing burst buffer specification.  The form of
121              the  specification is system dependent.  Also see --bb. This op‐
122              tion applies to job allocations.  See Slurm's burst buffer guide
123              for more information and examples:
124              https://slurm.schedmd.com/burst_buffer.html
125
126       --bcast[=<dest_path>]
127              Copy executable file to allocated compute nodes.  If a file name
128              is specified, copy the executable to the  specified  destination
129              file path.  If the path specified ends with '/' it is treated as
130              a target directory,  and  the  destination  file  name  will  be
131              slurm_bcast_<job_id>.<step_id>_<nodename>.   If  no dest_path is
132              specified and the slurm.conf BcastParameters DestDir is  config‐
133              ured  then  it  is used, and the filename follows the above pat‐
134              tern. If none of the previous  is  specified,  then  --chdir  is
135              used, and the filename follows the above pattern too.  For exam‐
136              ple, "srun --bcast=/tmp/mine  -N3  a.out"  will  copy  the  file
137              "a.out"  from  your current directory to the file "/tmp/mine" on
138              each of the three allocated compute nodes and execute that file.
139              This option applies to step allocations.
140
141       --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142              Comma-separated  list of absolute directory paths to be excluded
143              when autodetecting and broadcasting executable shared object de‐
144              pendencies through --bcast. If the keyword "NONE" is configured,
145              no directory paths will be excluded. The default value  is  that
146              of  slurm.conf  BcastExclude  and  this option overrides it. See
147              also --bcast and --send-libs.
148
149       -b, --begin=<time>
150              Defer initiation of this job until the specified time.   It  ac‐
151              cepts times of the form HH:MM:SS to run a job at a specific time
152              of day (seconds are optional).  (If that time is  already  past,
153              the  next day is assumed.)  You may also specify midnight, noon,
154              fika (3 PM) or teatime (4 PM) and you  can  have  a  time-of-day
155              suffixed  with  AM  or  PM  for  running  in  the morning or the
156              evening.  You can also say what day the  job  will  be  run,  by
              specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
158              Combine   date   and   time   using   the    following    format
159              YYYY-MM-DD[THH:MM[:SS]].  You  can  also  give  times like now +
160              count time-units, where the time-units can be seconds (default),
161              minutes, hours, days, or weeks and you can tell Slurm to run the
162              job today with the keyword today and to  run  the  job  tomorrow
163              with  the  keyword tomorrow.  The value may be changed after job
164              submission using the scontrol command.  For example:
165
166                 --begin=16:00
167                 --begin=now+1hour
168                 --begin=now+60           (seconds by default)
169                 --begin=2010-01-20T12:34:00
170
171
172              Notes on date/time specifications:
173               - Although the 'seconds' field of the HH:MM:SS time  specifica‐
174              tion  is  allowed  by  the  code, note that the poll time of the
175              Slurm scheduler is not precise enough to guarantee  dispatch  of
176              the  job on the exact second.  The job will be eligible to start
177              on the next poll following the specified time.  The  exact  poll
178              interval  depends  on the Slurm scheduler (e.g., 60 seconds with
179              the default sched/builtin).
180               -  If  no  time  (HH:MM:SS)  is  specified,  the   default   is
181              (00:00:00).
182               -  If a date is specified without a year (e.g., MM/DD) then the
183              current year is assumed, unless the  combination  of  MM/DD  and
184              HH:MM:SS  has  already  passed  for that year, in which case the
185              next year is used.
186              This option applies to job allocations.
187
188       -D, --chdir=<path>
189              Have the remote processes do a chdir to  path  before  beginning
190              execution. The default is to chdir to the current working direc‐
191              tory of the srun process. The path can be specified as full path
192              or relative path to the directory where the command is executed.
193              This option applies to job allocations.
194
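              A minimal sketch (the directory and program name are
              placeholders); the remote tasks change to /scratch/demo before
              starting, so relative paths used by the program resolve there:

                 srun --chdir=/scratch/demo -n2 ./my_app
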
195       --cluster-constraint=<list>
              Specifies features that a federated cluster must have in order
              for a sibling job to be submitted to it. Slurm will attempt to
              submit a sibling job to a cluster if it has at least one of the
              specified features.
200
201       -M, --clusters=<string>
202              Clusters  to  issue  commands to.  Multiple cluster names may be
203              comma separated.  The job will be submitted to the  one  cluster
204              providing the earliest expected job initiation time. The default
205              value is the current cluster. A value of 'all' will query to run
206              on  all  clusters.  Note the --export option to control environ‐
207              ment variables exported between clusters.  This  option  applies
208              only  to job allocations.  Note that the SlurmDBD must be up for
209              this option to work properly.
210
211       --comment=<string>
212              An arbitrary comment. This option applies to job allocations.
213
214       --compress[=type]
215              Compress file before sending it to compute hosts.  The  optional
216              argument specifies the data compression library to be used.  The
217              default is BcastParameters Compression= if set or  "lz4"  other‐
218              wise.   Supported  values are "lz4".  Some compression libraries
219              may be unavailable on some systems.  For use  with  the  --bcast
220              option. This option applies to step allocations.
221
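              For example (the destination path is illustrative), the
              following broadcasts an lz4-compressed copy of a.out to three
              allocated nodes and runs it from the broadcast location:

                 srun -N3 --bcast=/tmp/mine --compress=lz4 a.out
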
222       -C, --constraint=<list>
223              Nodes  can  have features assigned to them by the Slurm adminis‐
224              trator.  Users can specify which of these features are  required
225              by  their  job  using  the constraint option.  Only nodes having
226              features matching the job constraints will be  used  to  satisfy
227              the  request.   Multiple  constraints may be specified with AND,
228              OR, matching OR, resource counts, etc. (some operators  are  not
229              supported  on  all  system types).  Supported constraint options
230              include:
231
232              Single Name
233                     Only nodes which have the specified feature will be used.
234                     For example, --constraint="intel"
235
236              Node Count
237                     A  request  can  specify  the number of nodes needed with
238                     some feature by appending an asterisk and count after the
239                     feature    name.     For   example,   --nodes=16   --con‐
240                     straint="graphics*4 ..."  indicates that the job requires
241                     16  nodes and that at least four of those nodes must have
242                     the feature "graphics."
243
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
                     example, --constraint="intel&gpu"
247
              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR
                     operator.  For example, --constraint="intel|amd"
251
252              Matching OR
253                     If only one of a set of possible options should  be  used
254                     for all allocated nodes, then use the OR operator and en‐
255                     close the options within square brackets.   For  example,
256                     --constraint="[rack1|rack2|rack3|rack4]" might be used to
257                     specify that all nodes must be allocated on a single rack
258                     of the cluster, but any of those four racks can be used.
259
260              Multiple Counts
261                     Specific counts of multiple resources may be specified by
262                     using the AND operator and enclosing the  options  within
263                     square      brackets.       For      example,      --con‐
264                     straint="[rack1*2&rack2*4]" might be used to specify that
265                     two  nodes  must be allocated from nodes with the feature
266                     of "rack1" and four nodes must be  allocated  from  nodes
267                     with the feature "rack2".
268
269                     NOTE:  This construct does not support multiple Intel KNL
270                     NUMA  or  MCDRAM  modes.  For   example,   while   --con‐
271                     straint="[(knl&quad)*2&(knl&hemi)*4]"  is  not supported,
272                     --constraint="[haswell*2&(knl&hemi)*4]"   is   supported.
273                     Specification of multiple KNL modes requires the use of a
274                     heterogeneous job.
275
276              Brackets
277                     Brackets can be used to indicate that you are looking for
278                     a  set of nodes with the different requirements contained
279                     within    the    brackets.    For     example,     --con‐
280                     straint="[(rack1|rack2)*1&(rack3)*2]"  will  get  you one
281                     node with either the "rack1" or "rack2" features and  two
282                     nodes with the "rack3" feature.  The same request without
283                     the brackets will try to find a single  node  that  meets
284                     those requirements.
285
286                     NOTE:  Brackets are only reserved for Multiple Counts and
287                     Matching OR syntax.  AND operators require  a  count  for
288                     each     feature    inside    square    brackets    (i.e.
289                     "[quad*2&hemi*1]"). Slurm will only allow a single set of
290                     bracketed constraints per job.
291
              Parentheses
                     Parentheses can be used to group like node features to‐
294                     gether.           For           example,           --con‐
295                     straint="[(knl&snc4&flat)*4&haswell*1]"  might be used to
296                     specify that four nodes with the features  "knl",  "snc4"
297                     and  "flat"  plus one node with the feature "haswell" are
298                     required.  All  options  within  parenthesis  should   be
299                     grouped with AND (e.g. "&") operands.
300
301              WARNING: When srun is executed from within salloc or sbatch, the
302              constraint value can only contain a single feature name. None of
303              the other operators are currently supported for job steps.
304              This option applies to job and step allocations.
305
306       --container=<path_to_container>
307              Absolute path to OCI container bundle.
308
309       --contiguous
310              If set, then the allocated nodes must form a contiguous set.
311
312              NOTE: If SelectPlugin=cons_res this option won't be honored with
313              the topology/tree or topology/3d_torus plugins,  both  of  which
314              can modify the node ordering. This option applies to job alloca‐
315              tions.
316
317       -S, --core-spec=<num>
318              Count of specialized cores per node reserved by the job for sys‐
319              tem  operations and not used by the application. The application
320              will not use these cores, but will be charged for their  alloca‐
321              tion.   Default  value  is  dependent upon the node's configured
322              CoreSpecCount value.  If a value of zero is designated  and  the
323              Slurm  configuration  option AllowSpecResourcesUsage is enabled,
324              the job will be allowed to override CoreSpecCount  and  use  the
325              specialized resources on nodes it is allocated.  This option can
326              not be used with the --thread-spec option. This  option  applies
327              to job allocations.
328              NOTE:  This  option may implicitly impact the number of tasks if
329              -n was not specified.
330
331       --cores-per-socket=<cores>
332              Restrict node selection to nodes with  at  least  the  specified
              number of cores per socket.  See additional information under
              the -B option below when the task/affinity plugin is enabled.
              This option applies to job allocations.
336
337       --cpu-bind=[{quiet|verbose},]<type>
338              Bind  tasks to CPUs.  Used only when the task/affinity plugin is
339              enabled.  NOTE: To have Slurm always report on the selected  CPU
340              binding  for  all  commands  executed in a shell, you can enable
341              verbose mode by setting the SLURM_CPU_BIND environment  variable
342              value to "verbose".
343
344              The  following  informational environment variables are set when
345              --cpu-bind is in use:
346
347                   SLURM_CPU_BIND_VERBOSE
348                   SLURM_CPU_BIND_TYPE
349                   SLURM_CPU_BIND_LIST
350
351              See the ENVIRONMENT VARIABLES section for a  more  detailed  de‐
              scription of the individual SLURM_CPU_BIND variables.  These
              variables are available only if the task/affinity plugin is
              configured.
355
356              When  using --cpus-per-task to run multithreaded tasks, be aware
357              that CPU binding is inherited from the parent  of  the  process.
358              This  means that the multithreaded task should either specify or
359              clear the CPU binding itself to avoid having all threads of  the
360              multithreaded  task use the same mask/CPU as the parent.  Alter‐
361              natively, fat masks (masks which specify more than  one  allowed
362              CPU)  could  be  used for the tasks in order to provide multiple
363              CPUs for the multithreaded tasks.
364
365              Note that a job step can be allocated different numbers of  CPUs
366              on each node or be allocated CPUs not starting at location zero.
367              Therefore one of the options which  automatically  generate  the
368              task  binding  is  recommended.   Explicitly  specified masks or
369              bindings are only honored when the job step has  been  allocated
370              every available CPU on the node.
371
372              Binding  a task to a NUMA locality domain means to bind the task
373              to the set of CPUs that belong to the NUMA  locality  domain  or
374              "NUMA  node".   If NUMA locality domain options are used on sys‐
375              tems with no NUMA support, then each socket is considered a  lo‐
376              cality domain.
377
378              If  the  --cpu-bind option is not used, the default binding mode
379              will depend upon Slurm's configuration and the  step's  resource
380              allocation.   If  all  allocated  nodes have the same configured
381              CpuBind mode, that will be used.  Otherwise if the job's  Parti‐
382              tion  has  a configured CpuBind mode, that will be used.  Other‐
383              wise if Slurm has a configured TaskPluginParam value, that  mode
384              will  be used.  Otherwise automatic binding will be performed as
385              described below.
386
387              Auto Binding
388                     Applies only when task/affinity is enabled.  If  the  job
389                     step  allocation  includes an allocation with a number of
390                     sockets, cores, or threads equal to the number  of  tasks
391                     times  cpus-per-task,  then  the tasks will by default be
392                     bound to the appropriate resources (auto  binding).  Dis‐
393                     able   this  mode  of  operation  by  explicitly  setting
394                     "--cpu-bind=none".       Use        TaskPluginParam=auto‐
395                     bind=[threads|cores|sockets] to set a default cpu binding
396                     in case "auto binding" doesn't find a match.
397
398              Supported options include:
399
400                     q[uiet]
401                            Quietly bind before task runs (default)
402
403                     v[erbose]
404                            Verbosely report binding before task runs
405
406                     no[ne] Do not bind tasks to  CPUs  (default  unless  auto
407                            binding is applied)
408
409                     rank   Automatically  bind by task rank.  The lowest num‐
410                            bered task on each node is  bound  to  socket  (or
411                            core  or  thread) zero, etc.  Not supported unless
412                            the entire node is allocated to the job.
413
414                     map_cpu:<list>
415                            Bind by setting CPU masks on tasks (or  ranks)  as
416                            specified          where         <list>         is
417                            <cpu_id_for_task_0>,<cpu_id_for_task_1>,...    CPU
418                            IDs  are interpreted as decimal values unless they
                            are preceded with '0x', in which case they are
                            interpreted as hexadecimal values.  If the number of
421                            tasks (or ranks) exceeds the number of elements in
422                            this  list, elements in the list will be reused as
423                            needed starting from the beginning  of  the  list.
                            To simplify support for large task counts, each
                            list entry may be followed by an asterisk and a
                            repetition count.  For example
                            "map_cpu:0x0f*4,0xf0*4".
428
429                     mask_cpu:<list>
430                            Bind by setting CPU masks on tasks (or  ranks)  as
431                            specified          where         <list>         is
432                            <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
433                            The  mapping is specified for a node and identical
434                            mapping is applied to  the  tasks  on  every  node
435                            (i.e. the lowest task ID on each node is mapped to
436                            the first mask specified in the list, etc.).   CPU
437                            masks are always interpreted as hexadecimal values
438                            but can be preceded with an optional '0x'.  If the
439                            number  of  tasks (or ranks) exceeds the number of
440                            elements in this list, elements in the  list  will
441                            be reused as needed starting from the beginning of
                            the list.  To simplify support for large task
                            counts, each list entry may be followed by an
                            asterisk and a repetition count.  For example
                            "mask_cpu:0x0f*4,0xf0*4".
446
447                     rank_ldom
448                            Bind  to  a NUMA locality domain by rank. Not sup‐
449                            ported unless the entire node is allocated to  the
450                            job.
451
452                     map_ldom:<list>
453                            Bind  by mapping NUMA locality domain IDs to tasks
454                            as      specified      where       <list>       is
455                            <ldom1>,<ldom2>,...<ldomN>.   The  locality domain
456                            IDs are interpreted as decimal values unless  they
457                            are  preceded with '0x' in which case they are in‐
458                            terpreted as hexadecimal  values.   Not  supported
459                            unless the entire node is allocated to the job.
460
461                     mask_ldom:<list>
462                            Bind  by  setting  NUMA  locality  domain masks on
463                            tasks    as    specified    where    <list>     is
464                            <mask1>,<mask2>,...<maskN>.   NUMA locality domain
465                            masks are always interpreted as hexadecimal values
466                            but  can  be  preceded with an optional '0x'.  Not
467                            supported unless the entire node is  allocated  to
468                            the job.
469
470                     sockets
471                            Automatically  generate  masks  binding  tasks  to
472                            sockets.  Only the CPUs on the socket  which  have
473                            been  allocated  to  the job will be used.  If the
474                            number of tasks differs from the number  of  allo‐
475                            cated sockets this can result in sub-optimal bind‐
476                            ing.
477
478                     cores  Automatically  generate  masks  binding  tasks  to
479                            cores.   If  the  number of tasks differs from the
480                            number of  allocated  cores  this  can  result  in
481                            sub-optimal binding.
482
483                     threads
484                            Automatically  generate  masks  binding  tasks  to
485                            threads.  If the number of tasks differs from  the
486                            number  of  allocated  threads  this can result in
487                            sub-optimal binding.
488
489                     ldoms  Automatically generate masks binding tasks to NUMA
490                            locality  domains.  If the number of tasks differs
491                            from the number of allocated locality domains this
492                            can result in sub-optimal binding.
493
494                     help   Show help message for cpu-bind
495
496              This option applies to job and step allocations.
497
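              Two illustrative invocations (task counts and the program name
              ./my_app are placeholders): the first requests automatic binding
              to cores with verbose reporting, and the second binds the two
              tasks on each node to explicit CPU masks:

                 srun -n8 --cpu-bind=verbose,cores ./my_app
                 srun -n2 --cpu-bind=mask_cpu:0x0f,0xf0 ./my_app
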
498       --cpu-freq=<p1>[-p2[:p3]]
499
500              Request  that the job step initiated by this srun command be run
501              at some requested frequency if possible, on  the  CPUs  selected
502              for the step on the compute node(s).
503
504              p1  can be  [#### | low | medium | high | highm1] which will set
505              the frequency scaling_speed to the corresponding value, and  set
506              the frequency scaling_governor to UserSpace. See below for defi‐
507              nition of the values.
508
509              p1 can be [Conservative | OnDemand |  Performance  |  PowerSave]
510              which  will set the scaling_governor to the corresponding value.
511              The governor has to be in the list set by the slurm.conf  option
512              CpuFreqGovernors.
513
514              When p2 is present, p1 will be the minimum scaling frequency and
515              p2 will be the maximum scaling frequency.
516
              p2 can be [#### | medium | high | highm1].  p2 must be greater
              than p1.
519
520              p3  can  be [Conservative | OnDemand | Performance | PowerSave |
521              SchedUtil | UserSpace] which will set the governor to the corre‐
522              sponding value.
523
524              If p3 is UserSpace, the frequency scaling_speed will be set by a
525              power or energy aware scheduling strategy to a value between  p1
526              and  p2  that lets the job run within the site's power goal. The
527              job may be delayed if p1 is higher than a frequency that  allows
528              the job to run within the goal.
529
530              If  the current frequency is < min, it will be set to min. Like‐
531              wise, if the current frequency is > max, it will be set to max.
532
533              Acceptable values at present include:
534
535              ####          frequency in kilohertz
536
537              Low           the lowest available frequency
538
539              High          the highest available frequency
540
541              HighM1        (high minus one)  will  select  the  next  highest
542                            available frequency
543
544              Medium        attempts  to  set a frequency in the middle of the
545                            available range
546
547              Conservative  attempts to use the Conservative CPU governor
548
549              OnDemand      attempts to use the OnDemand CPU governor (the de‐
550                            fault value)
551
552              Performance   attempts to use the Performance CPU governor
553
554              PowerSave     attempts to use the PowerSave CPU governor
555
556              UserSpace     attempts to use the UserSpace CPU governor
557
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
561                      SLURM_CPU_FREQ_REQ
562
563              This environment variable can also be used to supply  the  value
564              for  the CPU frequency request if it is set when the 'srun' com‐
565              mand is issued.  The --cpu-freq on the command line  will  over‐
566              ride  the  environment variable value.  The form on the environ‐
567              ment variable is the same as the command line.  See the ENVIRON‐
568              MENT    VARIABLES    section    for   a   description   of   the
569              SLURM_CPU_FREQ_REQ variable.
570
571              NOTE: This parameter is treated as a request, not a requirement.
572              If  the  job  step's  node does not support setting the CPU fre‐
573              quency, or the requested value is outside the bounds of the  le‐
574              gal frequencies, an error is logged, but the job step is allowed
575              to continue.
576
577              NOTE: Setting the frequency for just the CPUs of  the  job  step
578              implies that the tasks are confined to those CPUs.  If task con‐
579              finement (i.e. the task/affinity TaskPlugin is enabled,  or  the
580              task/cgroup  TaskPlugin is enabled with "ConstrainCores=yes" set
581              in cgroup.conf) is not configured, this parameter is ignored.
582
583              NOTE: When the step completes, the  frequency  and  governor  of
584              each selected CPU is reset to the previous values.
585
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all accounting information will be present.
590
591              This option applies to job and step allocations.
592
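              Two brief sketches (frequencies and the program name are
              placeholders): the first requests a fixed 2.4 GHz, expressed in
              kilohertz, under the UserSpace governor; the second requests a
              1.8-3.0 GHz range managed by the OnDemand governor:

                 srun --cpu-freq=2400000 -n4 ./my_app
                 srun --cpu-freq=1800000-3000000:OnDemand -n4 ./my_app
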
593       --cpus-per-gpu=<ncpus>
594              Advise Slurm that ensuing job steps will require  ncpus  proces‐
595              sors per allocated GPU.  Not compatible with the --cpus-per-task
596              option.
597
598       -c, --cpus-per-task=<ncpus>
599              Request that ncpus be allocated per process. This may be  useful
600              if  the  job is multithreaded and requires more than one CPU per
601              task for optimal performance. The default is one CPU per process
602              and  does  not imply --exact.  If -c is specified without -n, as
603              many tasks will be allocated per node as possible while satisfy‐
604              ing  the  -c  restriction. For instance on a cluster with 8 CPUs
605              per node, a job request for 4 nodes and 3 CPUs per task  may  be
606              allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending
607              upon resource consumption by other jobs. Such a job may  be  un‐
608              able to execute more than a total of 4 tasks.
609
610              WARNING:  There  are configurations and options interpreted dif‐
611              ferently by job and job step requests which can result in incon‐
612              sistencies    for   this   option.    For   example   srun   -c2
613              --threads-per-core=1 prog may allocate two cores  for  the  job,
614              but if each of those cores contains two threads, the job alloca‐
615              tion will include four CPUs. The job step allocation  will  then
616              launch two threads per CPU for a total of two tasks.
617
618              WARNING:  When  srun  is  executed from within salloc or sbatch,
619              there are configurations and options which can result in  incon‐
620              sistent  allocations when -c has a value greater than -c on sal‐
621              loc or sbatch.
622
623              This option applies to job and step allocations.
624
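              For example (the counts and program name are illustrative), a
              launch that gives each of four tasks three CPUs for its threads
              might look like:

                 export OMP_NUM_THREADS=3    # assumes a threaded (OpenMP) program
                 srun -n4 -c3 ./hybrid_app
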
625       --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
              (start > (deadline - time[-min])).  Default is no deadline.
              Valid time formats are:
629              HH:MM[:SS] [AM|PM]
630              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
631              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
633              now[+count[seconds(default)|minutes|hours|days|weeks]]
634
635              This option applies only to job allocations.
636
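              As a sketch (the times are arbitrary and the program name is a
              placeholder), the job below is removed if it cannot complete its
              30-minute time limit before 6 PM:

                 srun --deadline=18:00 --time=30 -n1 ./my_app
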
637       --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
639              specification  if the job has been eligible to run for less than
640              this time period.  If the job has waited for less than the spec‐
641              ified  period,  it  will  use  only nodes which already have the
642              specified features.  The argument is in units of minutes.  A de‐
643              fault  value  may be set by a system administrator using the de‐
644              lay_boot option of the SchedulerParameters configuration parame‐
645              ter  in the slurm.conf file, otherwise the default value is zero
646              (no delay).
647
648              This option applies only to job allocations.
649
650       -d, --dependency=<dependency_list>
              Defer the start of this job until the specified dependencies
              have been satisfied.  This option does not apply to job
653              steps (executions of srun within an existing  salloc  or  sbatch
654              allocation)  only  to  job allocations.  <dependency_list> is of
655              the   form   <type:job_id[:job_id][,type:job_id[:job_id]]>    or
656              <type:job_id[:job_id][?type:job_id[:job_id]]>.  All dependencies
657              must be satisfied if the "," separator is used.  Any  dependency
658              may be satisfied if the "?" separator is used.  Only one separa‐
659              tor may be used.  Many jobs can share the  same  dependency  and
660              these  jobs  may even belong to different  users. The  value may
661              be changed after job submission using the scontrol command.  De‐
662              pendencies  on  remote jobs are allowed in a federation.  Once a
663              job dependency fails due to the termination state of a preceding
664              job,  the dependent job will never be run, even if the preceding
665              job is requeued and has a different termination state in a  sub‐
666              sequent execution. This option applies to job allocations.
667
668              after:job_id[[+time][:jobid[+time]...]]
669                     After  the  specified  jobs  start  or  are cancelled and
670                     'time' in minutes from job start or cancellation happens,
671                     this  job can begin execution. If no 'time' is given then
672                     there is no delay after start or cancellation.
673
674              afterany:job_id[:jobid...]
675                     This job can begin execution  after  the  specified  jobs
676                     have terminated.
677
678              afterburstbuffer:job_id[:jobid...]
679                     This  job  can  begin  execution after the specified jobs
680                     have terminated and any associated burst buffer stage out
681                     operations have completed.
682
683              aftercorr:job_id[:jobid...]
684                     A  task  of  this job array can begin execution after the
685                     corresponding task ID in the specified job has  completed
686                     successfully  (ran  to  completion  with  an exit code of
687                     zero).
688
689              afternotok:job_id[:jobid...]
690                     This job can begin execution  after  the  specified  jobs
691                     have terminated in some failed state (non-zero exit code,
692                     node failure, timed out, etc).
693
694              afterok:job_id[:jobid...]
695                     This job can begin execution  after  the  specified  jobs
696                     have  successfully  executed  (ran  to completion with an
697                     exit code of zero).
698
699              singleton
700                     This  job  can  begin  execution  after  any   previously
701                     launched  jobs  sharing  the  same job name and user have
702                     terminated.  In other words, only one job  by  that  name
703                     and owned by that user can be running or suspended at any
704                     point in time.  In a federation, a  singleton  dependency
705                     must be fulfilled on all clusters unless DependencyParam‐
706                     eters=disable_remote_singleton is used in slurm.conf.
707
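              For illustration (the job ID 12345 and the program name are
              placeholders), the job below starts only after job 12345 has
              completed successfully:

                 srun --dependency=afterok:12345 -n1 ./postprocess
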
708       -X, --disable-status
709              Disable the display of task status when srun receives  a  single
710              SIGINT  (Ctrl-C).  Instead immediately forward the SIGINT to the
              running job.  Without this option, a second Ctrl-C within one
              second is required to forcibly terminate the job and srun will
              immediately exit.  May also be set via the environment variable
              SLURM_DISABLE_STATUS.  This option applies to job allocations.
715
716       -m,                                --distribution={*|block|cyclic|arbi‐
717       trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
718
719              Specify  alternate  distribution  methods  for remote processes.
720              For job allocation, this sets environment variables that will be
721              used  by  subsequent  srun requests and also affects which cores
722              will be selected for job allocation.
723
724              This option controls the distribution of tasks to the  nodes  on
725              which  resources  have  been  allocated, and the distribution of
726              those resources to tasks for binding (task affinity). The  first
727              distribution  method (before the first ":") controls the distri‐
728              bution of tasks to nodes.  The second distribution method (after
729              the  first  ":")  controls  the  distribution  of allocated CPUs
730              across sockets for binding  to  tasks.  The  third  distribution
731              method (after the second ":") controls the distribution of allo‐
732              cated CPUs across cores for binding to tasks.   The  second  and
733              third distributions apply only if task affinity is enabled.  The
734              third distribution is supported only if the  task/cgroup  plugin
735              is  configured.  The default value for each distribution type is
736              specified by *.
737
738              Note that with select/cons_res and select/cons_tres, the  number
739              of  CPUs allocated to each socket and node may be different. Re‐
740              fer to https://slurm.schedmd.com/mc_support.html for more infor‐
741              mation  on  resource allocation, distribution of tasks to nodes,
742              and binding of tasks to CPUs.
743              First distribution method (distribution of tasks across nodes):
744
745
746              *      Use the default method for distributing  tasks  to  nodes
747                     (block).
748
749              block  The  block distribution method will distribute tasks to a
750                     node such that consecutive tasks share a node. For  exam‐
751                     ple,  consider an allocation of three nodes each with two
752                     cpus. A four-task block distribution  request  will  dis‐
753                     tribute  those  tasks to the nodes with tasks one and two
754                     on the first node, task three on  the  second  node,  and
755                     task  four  on the third node.  Block distribution is the
756                     default behavior if the number of tasks exceeds the  num‐
757                     ber of allocated nodes.
758
759              cyclic The cyclic distribution method will distribute tasks to a
760                     node such that consecutive  tasks  are  distributed  over
761                     consecutive  nodes  (in a round-robin fashion). For exam‐
762                     ple, consider an allocation of three nodes each with  two
763                     cpus.  A  four-task cyclic distribution request will dis‐
764                     tribute those tasks to the nodes with tasks one and  four
765                     on  the first node, task two on the second node, and task
766                     three on the third node.  Note that  when  SelectType  is
767                     select/cons_res, the same number of CPUs may not be allo‐
768                     cated on each node. Task distribution will be round-robin
769                     among  all  the  nodes  with  CPUs  yet to be assigned to
770                     tasks.  Cyclic distribution is the  default  behavior  if
771                     the number of tasks is no larger than the number of allo‐
772                     cated nodes.
773
774              plane  The tasks are distributed in blocks of size  <size>.  The
775                     size  must  be given or SLURM_DIST_PLANESIZE must be set.
776                     The number of tasks distributed to each node is the  same
777                     as  for  cyclic distribution, but the taskids assigned to
778                     each node depend on the plane size. Additional  distribu‐
779                     tion  specifications cannot be combined with this option.
780                     For  more  details  (including  examples  and  diagrams),
781                     please  see https://slurm.schedmd.com/mc_support.html and
782                     https://slurm.schedmd.com/dist_plane.html
783
784              arbitrary
                     The arbitrary method of distribution will allocate
                     processes in order as listed in the file designated by
                     the environment variable SLURM_HOSTFILE.  If this
                     variable is set, it will override any other method
                     specified.  If it is not set, the method will default to
                     block.  The hostfile must contain at a minimum the number
                     of hosts requested, listed one per line or comma
                     separated.  If specifying a task count (-n,
                     --ntasks=<number>), your tasks will be laid out on the
                     nodes in the order of the file.
794                     NOTE: The arbitrary distribution option on a job  alloca‐
795                     tion  only  controls the nodes to be allocated to the job
796                     and not the allocation of CPUs on those nodes.  This  op‐
797                     tion is meant primarily to control a job step's task lay‐
798                     out in an existing job allocation for the srun command.
799                     NOTE: If the number of tasks is given and a list  of  re‐
800                     quested  nodes  is  also  given, the number of nodes used
801                     from that list will be reduced to match that of the  num‐
802                     ber  of  tasks  if  the  number  of  nodes in the list is
803                     greater than the number of tasks.
804
805              Second distribution method (distribution of CPUs across  sockets
806              for binding):
807
808
809              *      Use the default method for distributing CPUs across sock‐
810                     ets (cyclic).
811
812              block  The block distribution method will  distribute  allocated
813                     CPUs  consecutively  from  the same socket for binding to
814                     tasks, before using the next consecutive socket.
815
816              cyclic The cyclic distribution method will distribute  allocated
817                     CPUs  for  binding to a given task consecutively from the
818                     same socket, and from the next consecutive socket for the
819                     next  task,  in  a  round-robin  fashion  across sockets.
820                     Tasks requiring more than one CPU will have all of  those
821                     CPUs allocated on a single socket if possible.
822
823              fcyclic
824                     The fcyclic distribution method will distribute allocated
825                     CPUs for binding to tasks from consecutive sockets  in  a
826                     round-robin  fashion across the sockets.  Tasks requiring
                     more than one CPU will have each of those CPUs allocated
                     in a cyclic fashion across sockets.
829
830              Third distribution method (distribution of CPUs across cores for
831              binding):
832
833
834              *      Use the default method for distributing CPUs across cores
835                     (inherited from second distribution method).
836
837              block  The  block  distribution method will distribute allocated
838                     CPUs consecutively from the  same  core  for  binding  to
839                     tasks, before using the next consecutive core.
840
841              cyclic The  cyclic distribution method will distribute allocated
842                     CPUs for binding to a given task consecutively  from  the
843                     same  core,  and  from  the next consecutive core for the
844                     next task, in a round-robin fashion across cores.
845
846              fcyclic
847                     The fcyclic distribution method will distribute allocated
848                     CPUs  for  binding  to  tasks from consecutive cores in a
849                     round-robin fashion across the cores.
850
851              Optional control for task distribution over nodes:
852
853
              Pack   Rather than distributing a job step's tasks evenly
                     across its allocated nodes, pack them as tightly as pos‐
856                     sible on the nodes.  This only applies when  the  "block"
857                     task distribution method is used.
858
859              NoPack Rather than packing a job step's tasks as tightly as pos‐
860                     sible on the nodes, distribute them  evenly.   This  user
861                     option    will    supersede    the   SelectTypeParameters
862                     CR_Pack_Nodes configuration parameter.
863
864              This option applies to job and step allocations.
865
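              A short sketch (node and task counts are illustrative):
              distribute tasks to the three nodes in a round-robin fashion,
              and allocate each task's CPUs block-wise from the same socket
              for binding:

                 srun -N3 -n6 -m cyclic:block ./my_app
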
866       --epilog={none|<executable>}
867              srun will run executable just after the job step completes.  The
868              command  line  arguments  for executable will be the command and
869              arguments of the job step.  If none is specified, then  no  srun
870              epilog  will be run. This parameter overrides the SrunEpilog pa‐
871              rameter in slurm.conf. This parameter is completely  independent
872              from  the Epilog parameter in slurm.conf. This option applies to
873              job allocations.
874
875       -e, --error=<filename_pattern>
876              Specify how stderr is to be redirected. By default  in  interac‐
877              tive  mode, srun redirects stderr to the same file as stdout, if
878              one is specified. The --error option is provided to allow stdout
879              and  stderr to be redirected to different locations.  See IO Re‐
880              direction below for more options.  If the specified file already
881              exists,  it  will be overwritten. This option applies to job and
882              step allocations.
883
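              For example (file names and the program name are placeholders),
              sending the stdout and stderr of a step to separate files:

                 srun -n4 -o app.out -e app.err ./my_app
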
884       --exact
885              Allow a step access to only  the  resources  requested  for  the
886              step.   By  default,  all non-GRES resources on each node in the
887              step allocation will be used. This option only applies  to  step
888              allocations.
889              NOTE:  Parallel  steps  will either be blocked or rejected until
890              requested step resources are available unless --overlap is spec‐
891              ified. Job resources can be held after the completion of an srun
892              command while Slurm does job cleanup. Step epilogs and/or  SPANK
893              plugins can further delay the release of step resources.
894
895       -x, --exclude={<host1[,<host2>...]|<filename>}
896              Request that a specific list of hosts not be included in the re‐
897              sources allocated to this job. The host list will be assumed  to
898              be  a  filename  if it contains a "/" character. This option ap‐
899              plies to job and step allocations.
900
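              As a sketch (node names are hypothetical and use Slurm hostlist
              syntax), the job below avoids two specific nodes:

                 srun -N2 --exclude=node[03-04] ./my_app
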
901       --exclusive[={user|mcs}]
902              This option applies to job and job step allocations, and has two
903              slightly different meanings for each one.  When used to initiate
904              a job, the job allocation cannot share nodes with other  running
905              jobs  (or just other users with the "=user" option or "=mcs" op‐
906              tion).  If user/mcs are not specified (i.e. the  job  allocation
907              can  not  share nodes with other running jobs), the job is allo‐
908              cated all CPUs and GRES on all nodes in the allocation,  but  is
909              only allocated as much memory as it requested. This is by design
910              to support gang scheduling, because suspended jobs still  reside
911              in  memory.  To  request  all the memory on a node, use --mem=0.
912              The default shared/exclusive behavior depends on system configu‐
913              ration and the partition's OverSubscribe option takes precedence
914              over the job's option.  NOTE: Since shared GRES (MPS) cannot  be
915              allocated  at  the same time as a sharing GRES (GPU) this option
916              only allocates all sharing GRES and no underlying shared GRES.
917
918              This option can also be used when initiating more than  one  job
919              step within an existing resource allocation (default), where you
920              want separate processors to be dedicated to each  job  step.  If
921              sufficient  processors  are  not  available  to initiate the job
922              step, it will be deferred. This can be thought of as providing a
923              mechanism  for resource management to the job within its alloca‐
924              tion (--exact implied).
925
926              The exclusive allocation of CPUs applies to  job  steps  by  de‐
927              fault,  but  --exact is NOT the default. In other words, the de‐
928              fault behavior is this: job steps will not share CPUs,  but  job
929              steps  will  be  allocated  all CPUs available to the job on all
930              nodes allocated to the steps.
931
932              In order to share the resources use the --overlap option.
933
934              See EXAMPLE below.
935
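              A minimal sketch of the job step case (binary names are
              placeholders): inside an existing four-CPU allocation, the two
              steps below receive disjoint CPUs rather than sharing all CPUs
              of the allocation:

                 # run within an existing salloc or sbatch allocation
                 srun --exclusive -n2 ./step_a &
                 srun --exclusive -n2 ./step_b &
                 wait
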
936       --export={[ALL,]<environment_variables>|ALL|NONE}
937              Identify which environment variables from the  submission  envi‐
938              ronment are propagated to the launched application.
939
940              --export=ALL
941                        Default  mode if --export is not specified. All of the
942                        user's environment will be loaded  from  the  caller's
943                        environment.
944
945              --export=NONE
                        None of the user environment will be defined.  The
                        user must use an absolute path to the binary to be
                        executed, and that binary must define its own
                        environment.  The user cannot specify explicit
                        environment variables with "NONE".
950
951                        This option is particularly important  for  jobs  that
952                        are  submitted on one cluster and execute on a differ‐
953                        ent cluster (e.g. with  different  paths).   To  avoid
                        steps inheriting environment export settings (e.g.
                        "NONE") from the sbatch command, either set
                        --export=ALL or set the environment variable
                        SLURM_EXPORT_ENV to "ALL".
958
959              --export=[ALL,]<environment_variables>
960                        Exports all SLURM* environment  variables  along  with
961                        explicitly  defined  variables.  Multiple  environment
962                        variable names should be comma separated.  Environment
963                        variable  names may be specified to propagate the cur‐
964                        rent value (e.g. "--export=EDITOR") or specific values
965                        may  be  exported (e.g. "--export=EDITOR=/bin/emacs").
966                        If "ALL" is specified, then all user environment vari‐
967                        ables will be loaded and will take precedence over any
968                        explicitly given environment variables.
969
970                   Example: --export=EDITOR,ARG1=test
971                        In this example, the propagated environment will  only
972                        contain  the  variable EDITOR from the user's environ‐
973                        ment, SLURM_* environment variables, and ARG1=test.
974
975                   Example: --export=ALL,EDITOR=/bin/emacs
976                        There are two possible outcomes for this  example.  If
977                        the  caller  has  the  EDITOR environment variable de‐
978                        fined, then the job's  environment  will  inherit  the
979                        variable from the caller's environment.  If the caller
980                        doesn't have an environment variable defined for  EDI‐
981                        TOR,  then  the  job's  environment will use the value
982                        given by --export.
983
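                  For illustration, hedged command-line sketches of the modes
                  described above (the program name my_app is a hypothetical
                  placeholder):

                       srun --export=NONE /usr/bin/env
                       srun --export=ALL,EDITOR=/bin/emacs -n 1 ./my_app
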
984       -B, --extra-node-info=<sockets>[:cores[:threads]]
985              Restrict node selection to nodes with  at  least  the  specified
986              number of sockets, cores per socket and/or threads per core.
987              NOTE: These options do not specify the resource allocation size.
988              Each value specified is considered a minimum.  An  asterisk  (*)
989              can  be  used as a placeholder indicating that all available re‐
990              sources of that type are to be  utilized.  Values  can  also  be
991              specified  as  min-max. The individual levels can also be speci‐
992              fied in separate options if desired:
993
994                  --sockets-per-node=<sockets>
995                  --cores-per-socket=<cores>
996                  --threads-per-core=<threads>
997              If task/affinity plugin is enabled, then specifying  an  alloca‐
998              tion  in  this  manner  also sets a default --cpu-bind option of
999              threads if the -B option specifies a thread count, otherwise  an
1000              option  of  cores if a core count is specified, otherwise an op‐
1001              tion  of  sockets.   If  SelectType   is   configured   to   se‐
1002              lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1003              ory, CR_Socket, or CR_Socket_Memory for this option to  be  hon‐
1004              ored.   If  not  specified,  the  scontrol show job will display
1005              'ReqS:C:T=*:*:*'. This option applies to job allocations.
1006              NOTE:  This  option   is   mutually   exclusive   with   --hint,
1007              --threads-per-core and --ntasks-per-core.
1008              NOTE: If the number of sockets, cores and threads were all spec‐
1009              ified, the number of nodes was specified (as a fixed number, not
1010              a  range)  and  the number of tasks was NOT specified, srun will
1011              implicitly calculate the number of tasks as one task per thread.
1012
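                  For illustration, a hedged sketch (the 2:4:2 shape and the
                  program name my_app are hypothetical); per the NOTE above,
                  with a fixed node count and no task count srun would
                  implicitly run one task per thread (2 x 4 x 2 = 16 tasks
                  here):

                       srun -B 2:4:2 -N 1 ./my_app
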
1013       --gid=<group>
1014              If srun is run as root, and the --gid option is used, submit the
1015              job  with  group's  group  access permissions.  group may be the
1016              group name or the numerical group ID. This option applies to job
1017              allocations.
1018
1019       --gpu-bind=[verbose,]<type>
1020              Bind  tasks to specific GPUs.  By default every spawned task can
1021              access every GPU allocated to the step.  If "verbose," is speci‐
1022              fied before <type>, then print out GPU binding debug information
1023              to the stderr of the tasks. GPU binding is ignored if  there  is
1024              only one task.
1025
1026              Supported type options:
1027
1028              closest   Bind  each task to the GPU(s) which are closest.  In a
1029                        NUMA environment, each task may be bound to more  than
1030                        one GPU (i.e.  all GPUs in that NUMA environment).
1031
1032              map_gpu:<list>
1033                        Bind by mapping GPU IDs to tasks (or ranks) as spec‐
1034                        ified            where            <list>            is
1035                        <gpu_id_for_task_0>,<gpu_id_for_task_1>,...   GPU  IDs
1036                        are interpreted as decimal values unless they are pre‐
1037                        ceded with '0x', in which case they are interpreted as
1038                        hexadecimal values. If the number of tasks (or  ranks)
1039                        exceeds  the number of elements in this list, elements
1040                        in the list will be reused as needed starting from the
1041                        beginning  of  the list. To simplify support for large
1042                        task counts, the lists may follow a map with an aster‐
1043                        isk     and    repetition    count.     For    example
1044                        "map_gpu:0*4,1*4".  If the task/cgroup plugin is  used
1045                        and  ConstrainDevices  is set in cgroup.conf, then the
1046                        GPU IDs are zero-based indexes relative  to  the  GPUs
1047                        allocated to the job (e.g. the first GPU is 0, even if
1048                        the global ID is 3). Otherwise, the GPU IDs are global
1049                        IDs,  and  all  GPUs on each node in the job should be
1050                        allocated for predictable binding results.
1051
1052              mask_gpu:<list>
1053                        Bind by setting GPU masks on tasks (or ranks) as spec‐
1054                        ified            where            <list>            is
1055                        <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,...    The
1056                        mapping  is specified for a node and identical mapping
1057                        is applied to the tasks on every node (i.e. the lowest
1058                        task ID on each node is mapped to the first mask spec‐
1059                        ified in the list, etc.). GPU masks are always  inter‐
1060                        preted  as hexadecimal values but can be preceded with
1061                        an optional '0x'. To simplify support for  large  task
1062                        counts,  the  lists  may follow a map with an asterisk
1063                        and      repetition      count.       For      example
1064                        "mask_gpu:0x0f*4,0xf0*4".   If  the task/cgroup plugin
1065                        is used and ConstrainDevices is  set  in  cgroup.conf,
1066                        then  the  GPU  IDs are zero-based indexes relative to
1067                        the GPUs allocated to the job (e.g. the first  GPU  is
1068                        0, even if the global ID is 3). Otherwise, the GPU IDs
1069                        are global IDs, and all GPUs on each node in  the  job
1070                        should be allocated for predictable binding results.
1071
1072              none      Do  not  bind  tasks  to  GPUs  (turns  off binding if
1073                        --gpus-per-task is requested).
1074
1075              per_task:<gpus_per_task>
1076                        Each task will be bound to the number of  gpus  speci‐
1077                        fied in <gpus_per_task>. GPUs are assigned in order to
1078                        tasks: the first task will be assigned the first
1079                        <gpus_per_task> GPUs on the node, and so on.
1080
1081              single:<tasks_per_gpu>
1082                        Like  --gpu-bind=closest,  except  that  each task can
1083                        only be bound to a single GPU, even  when  it  can  be
1084                        bound  to  multiple  GPUs that are equally close.  The
1085                        GPU to bind to is determined by <tasks_per_gpu>, where
1086                        the first <tasks_per_gpu> tasks are bound to the first
1087                        GPU available, the second  <tasks_per_gpu>  tasks  are
1088                        bound to the second GPU available, etc.  This is basi‐
1089                        cally a block distribution  of  tasks  onto  available
1090                        GPUs,  where  the available GPUs are determined by the
1091                        socket affinity of the task and the socket affinity of
1092                        the GPUs as specified in gres.conf's Cores parameter.
1093
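                  For illustration, hedged sketches (the program name my_app
                  is a hypothetical placeholder):

                       srun -n 4 --gpus=4 --gpu-bind=verbose,map_gpu:0,1,2,3 ./my_app
                       srun -n 8 --gpus=4 --gpu-bind=single:2 ./my_app
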
1094       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1095              Request  that GPUs allocated to the job are configured with spe‐
1096              cific frequency values.  This option can  be  used  to  indepen‐
1097              dently  configure the GPU and its memory frequencies.  After the
1098              job is completed, the frequencies of all affected GPUs  will  be
1099              reset  to  the  highest  possible values.  In some cases, system
1100              power caps may override the requested values.   The  field  type
1101              can be "memory".  If type is not specified, the GPU frequency is
1102              implied.  The value field can either be "low", "medium", "high",
1103              "highm1"  or  a numeric value in megahertz (MHz).  If the speci‐
1104              fied numeric value is not possible, a value as close as possible
1105              will  be used. See below for definition of the values.  The ver‐
1106              bose option causes  current  GPU  frequency  information  to  be
1107              logged.  Examples of use include "--gpu-freq=medium,memory=high"
1108              and "--gpu-freq=450".
1109
1110              Supported value definitions:
1111
1112              low       the lowest available frequency.
1113
1114              medium    attempts to set a  frequency  in  the  middle  of  the
1115                        available range.
1116
1117              high      the highest available frequency.
1118
1119              highm1    (high  minus  one) will select the next highest avail‐
1120                        able frequency.
1121
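                  For illustration, a hedged sketch combining the documented
                  examples with the verbose flag (the program name my_app is
                  a hypothetical placeholder):

                       srun --gpus=1 --gpu-freq=medium,memory=high,verbose ./my_app
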
1122       -G, --gpus=[type:]<number>
1123              Specify the total number of GPUs required for the job.   An  op‐
1124              tional  GPU  type  specification  can  be supplied.  For example
1125              "--gpus=volta:3".  Multiple options can be requested in a  comma
1126              separated  list,  for  example:  "--gpus=volta:3,kepler:1".  See
1127              also the --gpus-per-node, --gpus-per-socket and  --gpus-per-task
1128              options.
1129              NOTE: The allocation has to contain at least one GPU per node.
1130
1131       --gpus-per-node=[type:]<number>
1132              Specify the number of GPUs required for the job on each node in‐
1133              cluded in the job's resource allocation.  An optional  GPU  type
1134              specification      can     be     supplied.      For     example
1135              "--gpus-per-node=volta:3".  Multiple options can be requested in
1136              a       comma       separated       list,      for      example:
1137              "--gpus-per-node=volta:3,kepler:1".   See   also   the   --gpus,
1138              --gpus-per-socket and --gpus-per-task options.
1139
1140       --gpus-per-socket=[type:]<number>
1141              Specify  the  number of GPUs required for the job on each socket
1142              included in the job's resource allocation.  An optional GPU type
1143              specification      can     be     supplied.      For     example
1144              "--gpus-per-socket=volta:3".  Multiple options can be  requested
1145              in      a     comma     separated     list,     for     example:
1146              "--gpus-per-socket=volta:3,kepler:1".  Requires the job to
1147              specify a sockets-per-node count (--sockets-per-node).  See also the
1148              --gpus, --gpus-per-node and --gpus-per-task options.   This  op‐
1149              tion applies to job allocations.
1150
1151       --gpus-per-task=[type:]<number>
1152              Specify  the number of GPUs required for the job on each task to
1153              be spawned in the job's resource allocation.   An  optional  GPU
1154              type    specification    can    be    supplied.    For   example
1155              "--gpus-per-task=volta:1". Multiple options can be requested  in
1156              a       comma       separated       list,      for      example:
1157              "--gpus-per-task=volta:3,kepler:1".   See   also   the   --gpus,
1158              --gpus-per-socket  and --gpus-per-node options.  This option re‐
1159              quires an explicit task count, e.g. -n,  --ntasks  or  "--gpus=X
1160              --gpus-per-task=Y"  rather than an ambiguous range of nodes with
1161              -N,    --nodes.     This    option    will    implicitly     set
1162              --gpu-bind=per_task:<gpus_per_task>,  but that can be overridden
1163              with an explicit --gpu-bind specification.
1164
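                  For illustration, a hedged sketch that supplies the required
                  explicit task count (the GPU type volta is taken from the
                  examples above; the program name my_app is a hypothetical
                  placeholder):

                       srun -n 8 --gpus-per-task=volta:1 ./my_app
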
1165       --gres=<list>
1166              Specifies a  comma-delimited  list  of  generic  consumable  re‐
1167              sources.    The   format   of   each   entry   on  the  list  is
1168              "name[[:type]:count]".  The name is that of the  consumable  re‐
1169              source.   The  count is the number of those resources with a de‐
1170              fault value of 1.  The count can have a suffix  of  "k"  or  "K"
1171              (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1172              "G" (multiple of 1024 x 1024 x 1024), "t" or  "T"  (multiple  of
1173              1024  x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1174              x 1024 x 1024 x 1024).  The specified resources  will  be  allo‐
1175              cated to the job on each node.  The available generic consumable
1176              resources is configurable by the system administrator.   A  list
1177              of  available  generic  consumable resources will be printed and
1178              the command will exit if the option argument is  "help".   Exam‐
1179              ples  of  use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1180              "--gres=help".  NOTE: This option applies to job and step  allo‐
1181              cations.  By default, a job step is allocated all of the generic
1182              resources that have been allocated to the job.   To  change  the
1183              behavior  so  that  each  job  step  is allocated no generic re‐
1184              sources, explicitly set the value  of  --gres  to  specify  zero
1185              counts for each generic resource OR set "--gres=none" OR set the
1186              SLURM_STEP_GRES environment variable to "none".
1187
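                  For illustration, hedged sketches (the program name my_app
                  is a hypothetical placeholder); the second line launches a
                  step with no generic resources inside an existing
                  allocation:

                       srun --gres=gpu:kepler:2 ./my_app
                       srun --gres=none hostname
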
1188       --gres-flags=<type>
1189              Specify generic resource task binding options.  This option  ap‐
1190              plies to job allocations.
1191
1192              disable-binding
1193                     Disable  filtering  of  CPUs  with respect to generic re‐
1194                     source locality.  This option is  currently  required  to
1195                     use  more CPUs than are bound to a GRES (i.e. if a GPU is
1196                     bound to the CPUs on one socket, but  resources  on  more
1197                     than  one  socket are required to run the job).  This op‐
1198                     tion may permit a job to be  allocated  resources  sooner
1199                     than otherwise possible, but may result in lower job per‐
1200                     formance.
1201                     NOTE: This option is specific to SelectType=cons_res.
1202
1203              enforce-binding
1204                     The only CPUs available to the job will be those bound to
1205                     the  selected  GRES  (i.e.  the  CPUs  identified  in the
1206                     gres.conf file will be strictly  enforced).  This  option
1207                     may result in delayed initiation of a job.  For example a
1208                     job requiring two GPUs and one CPU will be delayed  until
1209                     both  GPUs  on  a single socket are available rather than
1210                     using GPUs bound to separate sockets; however, the
1211                     application performance may be improved due to the
1212                     faster communication speed.  Requires the node to be configured with
1213                     more  than one socket and resource filtering will be per‐
1214                     formed on a per-socket basis.
1215                     NOTE: This option is specific to SelectType=cons_tres.
1216
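                  For illustration, a hedged sketch (the program name my_app
                  is a hypothetical placeholder):

                       srun --gres=gpu:2 --gres-flags=enforce-binding ./my_app
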
1217       -h, --help
1218              Display help information and exit.
1219
1220       --het-group=<expr>
1221              Identify each component in a heterogeneous  job  allocation  for
1222              which a step is to be created. Applies only to srun commands is‐
1223              sued inside a salloc allocation or sbatch script.  <expr>  is  a
1224              set  of integers corresponding to one or more options offsets on
1225              the salloc or sbatch command line.   Examples:  "--het-group=2",
1226              "--het-group=0,4",  "--het-group=1,3-5".   The  default value is
1227              --het-group=0.
1228
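                  For illustration, a hedged sketch issued inside a
                  heterogeneous salloc allocation or sbatch script (the
                  program name my_app is a hypothetical placeholder):

                       srun --het-group=0,1 ./my_app
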
1229       --hint=<type>
1230              Bind tasks according to application hints.
1231              NOTE: This option cannot be used  in  conjunction  with  any  of
1232              --ntasks-per-core,  --threads-per-core,  --cpu-bind  (other than
1233              --cpu-bind=verbose) or -B. If --hint is specified as  a  command
1234              line argument, it will take precedence over the environment.
1235
1236              compute_bound
1237                     Select  settings  for compute bound applications: use all
1238                     cores in each socket, one thread per core.
1239
1240              memory_bound
1241                     Select settings for memory bound applications:  use  only
1242                     one core in each socket, one thread per core.
1243
1244              [no]multithread
1245                     [don't]  use  extra  threads with in-core multi-threading
1246                     which can benefit communication  intensive  applications.
1247                     Only supported with the task/affinity plugin.
1248
1249              help   show this help message
1250
1251              This option applies to job allocations.
1252
1253       -H, --hold
1254              Specify  the job is to be submitted in a held state (priority of
1255              zero).  A held job can now be released using scontrol  to  reset
1256              its priority (e.g. "scontrol release <job_id>"). This option ap‐
1257              plies to job allocations.
1258
1259       -I, --immediate[=<seconds>]
1260              exit if resources are not available within the time period spec‐
1261              ified.   If  no  argument  is given (seconds defaults to 1), re‐
1262              sources must be available immediately for the  request  to  suc‐
1263              ceed.  If  defer  is  configured in SchedulerParameters and sec‐
1264              onds=1 the allocation request will fail immediately; defer  con‐
1265              flicts and takes precedence over this option.  By default, --im‐
1266              mediate is off, and the command will block until  resources  be‐
1267              come  available.  Since  this option's argument is optional, for
1268              proper parsing the single letter option must be followed immedi‐
1269              ately  with  the value and not include a space between them. For
1270              example "-I60" and not "-I 60". This option applies to  job  and
1271              step allocations.
1272
1273       -i, --input=<mode>
1274              Specify how stdin is to be redirected. By default, srun redirects
1275              stdin from the terminal to all tasks. See IO  Redirection  below
1276              for  more  options.  For OS X, the poll() function does not sup‐
1277              port stdin, so input from a terminal is not possible.  This  op‐
1278              tion applies to job and step allocations.
1279
1280       -J, --job-name=<jobname>
1281              Specify a name for the job. The specified name will appear along
1282              with the job id number when querying running jobs on the system.
1283              The  default  is  the  supplied executable program's name. NOTE:
1284              This information may be written to the  slurm_jobacct.log  file.
1285              This file is space delimited, so if a space is used in the job
1286              name it will cause problems in properly displaying the con‐
1287              tents  of  the  slurm_jobacct.log file when the sacct command is
1288              used. This option applies to job and step allocations.
1289
1290       --jobid=<jobid>
1291              Initiate a job step under an already allocated job with job id
1292              <jobid>.  Using this option will cause srun to behave exactly as if
1293              the SLURM_JOB_ID environment variable was set. This  option  ap‐
1294              plies to step allocations.
1295
1296       -K, --kill-on-bad-exit[=0|1]
1297              Controls  whether  or  not to terminate a step if any task exits
1298              with a non-zero exit code. If this option is not specified,  the
1299              default action will be based upon the Slurm configuration param‐
1300              eter of KillOnBadExit. If this option is specified, it will take
1301              precedence  over  KillOnBadExit. An option argument of zero will
1302              not terminate the job. A non-zero argument or no  argument  will
1303              terminate  the job.  Note: This option takes precedence over the
1304              -W, --wait option to terminate the job immediately if a task ex‐
1305              its  with a non-zero exit code.  Since this option's argument is
1306              optional, for proper parsing the single letter  option  must  be
1307              followed  immediately with the value and not include a space be‐
1308              tween them. For example "-K1" and not "-K 1".
1309
1310       -l, --label
1311              Prepend task number to lines of stdout/err.  The --label  option
1312              will  prepend  lines of output with the remote task id. This op‐
1313              tion applies to step allocations.
1314
1315       -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1316              Specification of licenses (or other resources available  on  all
1317              nodes  of the cluster) which must be allocated to this job.  Li‐
1318              cense names can be followed by a colon and  count  (the  default
1319              count is one).  Multiple license names should be comma separated
1320              (e.g.  "--licenses=foo:4,bar"). This option applies to job allo‐
1321              cations.
1322
1323              NOTE:  When submitting heterogeneous jobs, license requests only
1324              work correctly when made on the first component job.  For  exam‐
1325              ple "srun -L ansys:2 : myexecutable".
1326
1327       --mail-type=<type>
1328              Notify user by email when certain event types occur.  Valid type
1329              values are NONE, BEGIN, END, FAIL, REQUEUE, ALL  (equivalent  to
1330              BEGIN,  END,  FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1331              VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1332              fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1333              (reached 90 percent of time limit),  TIME_LIMIT_80  (reached  80
1334              percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1335              time limit).  Multiple type values may be specified in  a  comma
1336              separated  list.   The  user  to  be  notified is indicated with
1337              --mail-user. This option applies to job allocations.
1338
1339       --mail-user=<user>
1340              User to receive email notification of state changes  as  defined
1341              by  --mail-type.  The default value is the submitting user. This
1342              option applies to job allocations.
1343
1344       --mcs-label=<mcs>
1345              Used only when the mcs/group plugin is enabled.  This  parameter
1346              is a group among the groups of the user.  Default value is cal‐
1347              culated by the mcs plugin if it's enabled. This option applies
1348              to job allocations.
1349
1350       --mem=<size>[units]
1351              Specify  the  real  memory required per node.  Default units are
1352              megabytes.  Different units can be specified  using  the  suffix
1353              [K|M|G|T].  Default value is DefMemPerNode and the maximum value
1354              is MaxMemPerNode. If configured, both of these parameters can be seen
1355              using  the  scontrol  show config command.  This parameter would
1356              generally be used if whole nodes are allocated to jobs  (Select‐
1357              Type=select/linear).   Specifying  a  memory limit of zero for a
1358              job step will restrict the job step to the amount of memory  al‐
1359              located to the job, but not remove any of the job's memory allo‐
1360              cation from being  available  to  other  job  steps.   Also  see
1361              --mem-per-cpu  and  --mem-per-gpu.  The --mem, --mem-per-cpu and
1362              --mem-per-gpu  options  are  mutually   exclusive.   If   --mem,
1363              --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1364              guments, then they will take  precedence  over  the  environment
1365              (potentially inherited from salloc or sbatch).
1366
1367              NOTE:  A  memory size specification of zero is treated as a spe‐
1368              cial case and grants the job access to all of the memory on each
1369              node  for  newly  submitted jobs and all available job memory to
1370              new job steps.
1371
1372              Specifying new memory limits for job steps is only advisory.
1373
1374              If the job is allocated multiple nodes in a heterogeneous  clus‐
1375              ter,  the  memory limit on each node will be that of the node in
1376              the allocation with the smallest memory size  (same  limit  will
1377              apply to every node in the job's allocation).
1378
1379              NOTE:  Enforcement  of  memory  limits currently relies upon the
1380              task/cgroup plugin or enabling of accounting, which samples mem‐
1381              ory  use on a periodic basis (data need not be stored, just col‐
1382              lected). In both cases memory use is based upon the job's  Resi‐
1383              dent  Set  Size  (RSS). A task may exceed the memory limit until
1384              the next periodic accounting sample.
1385
1386              This option applies to job and step allocations.
1387
1388       --mem-bind=[{quiet|verbose},]<type>
1389              Bind tasks to memory. Used only when the task/affinity plugin is
1390              enabled  and the NUMA memory functions are available.  Note that
1391              the resolution of CPU and memory binding may differ on some  ar‐
1392              chitectures.  For  example,  CPU binding may be performed at the
1393              level of the cores within a processor while memory binding  will
1394              be  performed  at  the  level  of nodes, where the definition of
1395              "nodes" may differ from system to system.  By default no  memory
1396              binding is performed; any task using any CPU can use any memory.
1397              This option is typically used to ensure that each task is  bound
1398              to  the  memory closest to its assigned CPU. The use of any type
1399              other than "none" or "local" is not recommended.   If  you  want
1400              greater control, try running a simple test code with the options
1401              "--cpu-bind=verbose,none --mem-bind=verbose,none"  to  determine
1402              the specific configuration.
1403
1404              NOTE: To have Slurm always report on the selected memory binding
1405              for all commands executed in a shell,  you  can  enable  verbose
1406              mode by setting the SLURM_MEM_BIND environment variable value to
1407              "verbose".
1408
1409              The following informational environment variables are  set  when
1410              --mem-bind is in use:
1411
1412                   SLURM_MEM_BIND_LIST
1413                   SLURM_MEM_BIND_PREFER
1414                   SLURM_MEM_BIND_SORT
1415                   SLURM_MEM_BIND_TYPE
1416                   SLURM_MEM_BIND_VERBOSE
1417
1418              See  the  ENVIRONMENT  VARIABLES section for a more detailed de‐
1419              scription of the individual SLURM_MEM_BIND* variables.
1420
1421              Supported options include:
1422
1423              help   show this help message
1424
1425              local  Use memory local to the processor in use
1426
1427              map_mem:<list>
1428                     Bind by mapping NUMA IDs to tasks (or ranks) as spec‐
1429                     ified             where             <list>             is
1430                     <numa_id_for_task_0>,<numa_id_for_task_1>,...   The  map‐
1431                     ping is specified for a node and identical mapping is ap‐
1432                     plied to the tasks on every node (i.e. the lowest task ID
1433                     on  each  node is mapped to the first ID specified in the
1434                     list, etc.).  NUMA IDs are interpreted as decimal  values
1435                     unless they are preceded with '0x', in which case they are
1436                     interpreted as hexadecimal values.  If the number of tasks
1437                     (or  ranks)  exceeds the number of elements in this list,
1438                     elements in the list will be reused  as  needed  starting
1439                     from  the beginning of the list.  To simplify support for
1440                     large task counts, the lists may follow a map with an as‐
1441                     terisk     and    repetition    count.     For    example
1442                     "map_mem:0x0f*4,0xf0*4".   For  predictable  binding  re‐
1443                     sults,  all CPUs for each node in the job should be allo‐
1444                     cated to the job.
1445
1446              mask_mem:<list>
1447                     Bind by setting memory masks on tasks (or ranks) as spec‐
1448                     ified             where             <list>             is
1449                     <numa_mask_for_task_0>,<numa_mask_for_task_1>,...     The
1450                     mapping  is specified for a node and identical mapping is
1451                     applied to the tasks on every node (i.e. the lowest  task
1452                     ID  on each node is mapped to the first mask specified in
1453                     the list, etc.).  NUMA masks are  always  interpreted  as
1454                     hexadecimal  values.   Note  that  masks must be preceded
1455                     with a '0x' if they don't begin with [0-9]  so  they  are
1456                     seen  as  numerical  values.   If the number of tasks (or
1457                     ranks) exceeds the number of elements in this list,  ele‐
1458                     ments  in the list will be reused as needed starting from
1459                     the beginning of the list.  To simplify support for large
1460                     task counts, the lists may follow a mask with an asterisk
1461                     and repetition count.   For  example  "mask_mem:0*4,1*4".
1462                     For  predictable  binding results, all CPUs for each node
1463                     in the job should be allocated to the job.
1464
1465              no[ne] don't bind tasks to memory (default)
1466
1467              nosort avoid sorting free cache pages (default, LaunchParameters
1468                     configuration parameter can override this default)
1469
1470              p[refer]
1471                     Prefer use of the first specified NUMA node, but permit
1472                     use of other available NUMA nodes.
1473
1474              q[uiet]
1475                     quietly bind before task runs (default)
1476
1477              rank   bind by task rank (not recommended)
1478
1479              sort   sort free cache pages (run zonesort on Intel KNL nodes)
1480
1481              v[erbose]
1482                     verbosely report binding before task runs
1483
1484              This option applies to job and step allocations.
1485
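                  For illustration, hedged sketches (the program name my_app
                  is a hypothetical placeholder); the first line is the test
                  invocation suggested above:

                       srun -n 4 --cpu-bind=verbose,none --mem-bind=verbose,none ./my_app
                       srun -n 4 --mem-bind=verbose,local ./my_app
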
1486       --mem-per-cpu=<size>[units]
1487              Minimum  memory  required  per allocated CPU.  Default units are
1488              megabytes.  Different units can be specified  using  the  suffix
1489              [K|M|G|T].   The  default  value is DefMemPerCPU and the maximum
1490              value is MaxMemPerCPU (see exception below). If configured, both
1491              parameters  can  be seen using the scontrol show config command.
1492              Note that if the job's --mem-per-cpu value exceeds  the  config‐
1493              ured  MaxMemPerCPU,  then  the user's limit will be treated as a
1494              memory limit per task; --mem-per-cpu will be reduced to a  value
1495              no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1496              value of --cpus-per-task multiplied  by  the  new  --mem-per-cpu
1497              value  will  equal the original --mem-per-cpu value specified by
1498              the user.  This parameter would generally be used if  individual
1499              processors  are  allocated to jobs (SelectType=select/cons_res).
1500              If resources are allocated by core, socket, or whole nodes, then
1501              the  number  of  CPUs  allocated to a job may be higher than the
1502              task count and the value of --mem-per-cpu should be adjusted ac‐
1503              cordingly.   Specifying  a  memory  limit of zero for a job step
1504              will restrict the job step to the amount of memory allocated  to
1505              the  job, but not remove any of the job's memory allocation from
1506              being  available  to  other  job  steps.   Also  see  --mem  and
1507              --mem-per-gpu.   The  --mem, --mem-per-cpu and --mem-per-gpu op‐
1508              tions are mutually exclusive.
1509
1510              NOTE: If the final amount of memory requested by a job can't  be
1511              satisfied  by  any of the nodes configured in the partition, the
1512              job will be rejected.  This could  happen  if  --mem-per-cpu  is
1513              used  with  the  --exclusive  option  for  a  job allocation and
1514              --mem-per-cpu times the number of CPUs on a node is greater than
1515              the total memory of that node.
1516
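                  For illustration, a hedged numeric sketch of the adjustment
                  described above, assuming a hypothetical MaxMemPerCPU of 2G:
                  a request of --mem-per-cpu=8G would be rewritten so that
                  --mem-per-cpu=2G and --cpus-per-task=4, since 4 x 2G equals
                  the originally requested 8G per task (the program name
                  my_app is a hypothetical placeholder):

                       srun --mem-per-cpu=8G -n 1 ./my_app
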
1517       --mem-per-gpu=<size>[units]
1518              Minimum  memory  required  per allocated GPU.  Default units are
1519              megabytes.  Different units can be specified  using  the  suffix
1520              [K|M|G|T].   Default  value  is DefMemPerGPU and is available on
1521              both a global and per partition basis.  If configured,  the  pa‐
1522              rameters can be seen using the scontrol show config and scontrol
1523              show  partition  commands.   Also   see   --mem.    The   --mem,
1524              --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1525
1526       --mincpus=<n>
1527              Specify  a  minimum  number of logical cpus/processors per node.
1528              This option applies to job allocations.
1529
1530       --mpi=<mpi_type>
1531              Identify the type of MPI to be used. May result in unique initi‐
1532              ation procedures.
1533
1534              list   Lists available mpi types to choose from.
1535
1536              pmi2   To  enable  PMI2 support. The PMI2 support in Slurm works
1537                     only if the MPI  implementation  supports  it,  in  other
1538                     words, if the MPI has the PMI2 interface implemented. The
1539                     --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1540                     which  provides  the  server  side  functionality but the
1541                     client side must implement PMI2_Init() and the other  in‐
1542                     terface calls.
1543
1544              pmix   To enable PMIx support (https://pmix.github.io). The PMIx
1545                     support in Slurm can be used to launch parallel  applica‐
1546                     tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1547                     must  be  configured  with  pmix   support   by   passing
1548                     "--with-pmix=<PMIx  installation  path>"  option  to  its
1549                     "./configure" script.
1550
1551                     At the time of writing PMIx  is  supported  in  Open  MPI
1552                     starting  from  version 2.0.  PMIx also supports backward
1553                     compatibility with PMI1 and PMI2 and can be used  if  MPI
1554                     was  configured  with  PMI2/PMI1  support pointing to the
1555                     PMIx library ("libpmix").  If MPI supports PMI1/PMI2  but
1556                     doesn't  provide the way to point to a specific implemen‐
1557                     tation, a hack'ish solution leveraging LD_PRELOAD can  be
1558                     used to force "libpmix" usage.
1559
1560              none   No  special MPI processing. This is the default and works
1561                     with many other versions of MPI.
1562
1563              This option applies to step allocations.
1564
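                  For illustration, hedged sketches (the program name mpi_app
                  is a hypothetical placeholder):

                       srun --mpi=list
                       srun --mpi=pmix -n 16 ./mpi_app
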
1565       --msg-timeout=<seconds>
1566              Modify the job launch message timeout.   The  default  value  is
1567              MessageTimeout  in  the  Slurm  configuration  file  slurm.conf.
1568              Changes to this are typically not recommended, but could be use‐
1569              ful  to  diagnose  problems.  This option applies to job alloca‐
1570              tions.
1571
1572       --multi-prog
1573              Run a job with different programs and  different  arguments  for
1574              each task. In this case, the executable program specified is ac‐
1575              tually a configuration file specifying the executable and  argu‐
1576              ments  for  each  task. See MULTIPLE PROGRAM CONFIGURATION below
1577              for details on the configuration file contents. This option  ap‐
1578              plies to step allocations.
1579
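                  For illustration, a hedged sketch (the configuration file
                  name multi.conf is a hypothetical placeholder; its contents
                  follow the format described in MULTIPLE PROGRAM
                  CONFIGURATION below):

                       srun -n 4 --multi-prog multi.conf
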
1580       --network=<type>
1581              Specify  information  pertaining  to the switch or network.  The
1582              interpretation of type is system dependent.  This option is sup‐
1583              ported when running Slurm on a Cray natively.  It is used to re‐
1584              quest using Network Performance Counters.  Only  one  value  per
1585              request is valid.  All options are case-insensitive.  In this
1586              configuration supported values include:
1587
1588              system
1589                    Use the system-wide  network  performance  counters.  Only
1590                    nodes  requested will be marked in use for the job alloca‐
1591                    tion.  If the job does not fill up the entire system, the
1592                    rest of the nodes are not able to be used by other jobs
1593                    using NPC; if idle, their state will appear as PerfCnts.
1594                    These  nodes  are still available for other jobs not using
1595                    NPC.
1596
1597              blade Use the blade network performance counters. Only nodes re‐
1598                    quested  will be marked in use for the job allocation.  If
1599                    the job does not fill up the entire blade(s) allocated to
1600                    the job, those blade(s) are not able to be used by other
1601                    jobs using NPC; if idle, their state will appear as PerfC‐
1602                    nts.   These  nodes are still available for other jobs not
1603                    using NPC.
1604
1605       In all cases the job allocation request must  specify  the  --exclusive
1606       option  and the step cannot specify the --overlap option. Otherwise the
1607       request will be denied.
1608
1609       Also with any of these options steps are not allowed to  share  blades,
1610       so resources would remain idle inside an allocation if the step running
1611       on a blade does not take up all the nodes on the blade.
1612
1613       The network option is also supported on systems with IBM's Parallel En‐
1614       vironment  (PE).   See IBM's LoadLeveler job command keyword documenta‐
1615       tion about the keyword "network" for more information.  Multiple values
1616       may be specified in a comma separated list.  All options are case-
1617       insensitive.  Supported values include:
1618
1619              BULK_XFER[=<resources>]
1620                          Enable  bulk  transfer  of  data  using  Remote  Di‐
1621                          rect-Memory  Access  (RDMA).  The optional resources
1622                          specification is a numeric value which  can  have  a
1623                          suffix  of  "k", "K", "m", "M", "g" or "G" for kilo‐
1624                          bytes, megabytes or gigabytes.  NOTE: The  resources
1625                          specification is not supported by the underlying IBM
1626                          infrastructure as of  Parallel  Environment  version
1627                          2.2  and  no value should be specified at this time.
1628                          The devices allocated to a job must all  be  of  the
1629                          same type.  The default value depends upon what
1630                          hardware is available and, in order of preference,
1631                          is IPONLY (which is not considered in User Space
1632                          mode), HFI, IB, HPCE, and KMUX.
1633
1634              CAU=<count> Number of Collective Acceleration  Units  (CAU)  re‐
1635                          quired.   Applies  only to IBM Power7-IH processors.
1636                          Default value is zero.  Independent CAU will be  al‐
1637                          located  for  each programming interface (MPI, LAPI,
1638                          etc.)
1639
1640              DEVNAME=<name>
1641                          Specify the device name to  use  for  communications
1642                          (e.g. "eth0" or "mlx4_0").
1643
1644              DEVTYPE=<type>
1645                          Specify  the  device type to use for communications.
1646                          The supported values of type are: "IB" (InfiniBand),
1647                          "HFI"  (P7 Host Fabric Interface), "IPONLY" (IP-Only
1648                          interfaces), "HPCE" (HPC Ethernet), and "KMUX"
1649                          (Kernel Emulation of HPCE).  The devices allocated
1650                          to a job must all be of the same type.  The default
1651                          value depends upon what hardware is available and,
1652                          in order of preference, is IPONLY (which is not
1653                          considered in User Space mode), HFI, IB, HPCE, and
1654                          KMUX.
1656
1657              IMMED=<count>
1658                          Number  of immediate send slots per window required.
1659                          Applies only to IBM Power7-IH  processors.   Default
1660                          value is zero.
1661
1662              INSTANCES=<count>
1663                          Specify  number of network connections for each task
1664                          on each network connection.   The  default  instance
1665                          count is 1.
1666
1667              IPV4        Use  Internet Protocol (IP) version 4 communications
1668                          (default).
1669
1670              IPV6        Use Internet Protocol (IP) version 6 communications.
1671
1672              LAPI        Use the LAPI programming interface.
1673
1674              MPI         Use the MPI programming interface.  MPI is  the  de‐
1675                          fault interface.
1676
1677              PAMI        Use the PAMI programming interface.
1678
1679              SHMEM       Use the OpenSHMEM programming interface.
1680
1681              SN_ALL      Use all available switch networks (default).
1682
1683              SN_SINGLE   Use one available switch network.
1684
1685              UPC         Use the UPC programming interface.
1686
1687              US          Use User Space communications.
1688
1689       Some examples of network specifications:
1690
1691              Instances=2,US,MPI,SN_ALL
1692                     Create  two user space connections for MPI communications
1693                     on every switch network for each task.
1694
1695              US,MPI,Instances=3,Devtype=IB
1696                     Create three user space connections  for  MPI  communica‐
1697                     tions on every InfiniBand network for each task.
1698
1699              IPV4,LAPI,SN_Single
1700                     Create an IP version 4 connection for LAPI communications
1701                     on one switch network for each task.
1702
1703              Instances=2,US,LAPI,MPI
1704                     Create two user space connections each for LAPI  and  MPI
1705                     communications  on  every  switch  network for each task.
1706                     Note that SN_ALL is the default option  so  every  switch
1707                     network  is  used.  Also  note that Instances=2 specifies
1708                     that two connections are established  for  each  protocol
1709                     (LAPI  and MPI) and each task.  If there are two networks
1710                     and four tasks on the node then a total of 32 connections
1711                     are established (2 instances x 2 protocols x 2 networks x
1712                     4 tasks).
1713
1714              This option applies to job and step allocations.
1715
1716       --nice[=adjustment]
1717              Run the job with an adjusted scheduling priority  within  Slurm.
1718              With no adjustment value the scheduling priority is decreased by
1719              100. A negative nice value increases the priority, while a
1720              positive value decreases it. The adjustment range is +/-
1721              2147483645. Only privileged users can specify a negative adjustment.
1722
1723       -Z, --no-allocate
1724              Run the specified tasks on a set of  nodes  without  creating  a
1725              Slurm  "job"  in the Slurm queue structure, bypassing the normal
1726              resource allocation step.  The list of nodes must  be  specified
1727              with  the  -w,  --nodelist  option.  This is a privileged option
1728              only available for the users "SlurmUser" and "root". This option
1729              applies to job allocations.
1730
1731       -k, --no-kill[=off]
1732              Do  not automatically terminate a job if one of the nodes it has
1733              been allocated fails. This option applies to job and step  allo‐
1734              cations.    The   job   will  assume  all  responsibilities  for
1735              fault-tolerance.  Tasks launched using this option will not be
1736              considered  terminated  (e.g.  -K,  --kill-on-bad-exit  and  -W,
1737              --wait options will have no effect upon the job step).  The  ac‐
1738              tive  job  step  (MPI job) will likely suffer a fatal error, but
1739              subsequent job steps may be run if this option is specified.
1740
1741              Specify an optional argument of "off" to disable the effect of the
1742              SLURM_NO_KILL environment variable.
1743
1744              The default action is to terminate the job upon node failure.
1745
1746       -F, --nodefile=<node_file>
1747              Much like --nodelist, but the list is contained in a file named
1748              node_file.  The node names in the list may also span multiple
1749              lines in the file.  Duplicate node names in the file will
1750              be ignored.  The order of the node names in the list is not  im‐
1751              portant; the node names will be sorted by Slurm.
1752
1753       -w, --nodelist={<node_name_list>|<filename>}
1754              Request  a  specific list of hosts.  The job will contain all of
1755              these hosts and possibly additional hosts as needed  to  satisfy
1756              resource   requirements.    The  list  may  be  specified  as  a
1757              comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1758              for  example),  or a filename.  The host list will be assumed to
1759              be a filename if it contains a "/" character.  If you specify  a
1760              minimum  node or processor count larger than can be satisfied by
1761              the supplied host list, additional resources will  be  allocated
1762              on  other  nodes  as  needed.  Rather than repeating a host name
1763              multiple times, an asterisk and a repetition count  may  be  ap‐
1764              pended  to  a host name. For example "host1,host1" and "host1*2"
1765              are equivalent. If the number of tasks is given and  a  list  of
1766              requested  nodes  is  also  given, the number of nodes used from
1767              that list will be reduced to match that of the number  of  tasks
1768              if the number of nodes in the list is greater than the number of
1769              tasks. This option applies to job and step allocations.
1770
1771       -N, --nodes=<minnodes>[-maxnodes]
1772              Request that a minimum of minnodes nodes be  allocated  to  this
1773              job.   A maximum node count may also be specified with maxnodes.
1774              If only one number is specified, this is used as both the  mini‐
1775              mum  and maximum node count.  The partition's node limits super‐
1776              sede those of the job.  If a job's node limits  are  outside  of
1777              the  range  permitted for its associated partition, the job will
1778              be left in a PENDING state.  This permits possible execution  at
1779              a  later  time,  when  the partition limit is changed.  If a job
1780              node limit exceeds the number of nodes configured in the  parti‐
1781              tion, the job will be rejected.  Note that the environment vari‐
1782              able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1783              ibility) will be set to the count of nodes actually allocated to
1784              the job. See the ENVIRONMENT VARIABLES section for more informa‐
1785              tion.   If -N is not specified, the default behavior is to allo‐
1786              cate enough nodes to satisfy  the  requested  resources  as  ex‐
1787              pressed  by  per-job  specification  options,  e.g.  -n,  -c and
1788              --gpus.  The job will be allocated as  many  nodes  as  possible
1789              within  the  range specified and without delaying the initiation
1790              of the job.  If the number of tasks is given and a number of re‐
1791              quested  nodes is also given, the number of nodes used from that
1792              request will be reduced to match that of the number of tasks  if
1793              the number of nodes in the request is greater than the number of
1794              tasks.  The node count specification may include a numeric value
1795              followed  by a suffix of "k" (multiplies numeric value by 1,024)
1796              or "m" (multiplies numeric value by 1,048,576). This option  ap‐
1797              plies to job and step allocations.
1798
1799       -n, --ntasks=<number>
1800              Specify  the  number of tasks to run. Request that srun allocate
1801              resources for ntasks tasks.  The default is one task  per  node,
1802              but  note  that  the --cpus-per-task option will change this de‐
1803              fault. This option applies to job and step allocations.
1804
1805       --ntasks-per-core=<ntasks>
1806              Request the maximum ntasks be invoked on each core.  This option
1807              applies  to  the  job  allocation,  but not to step allocations.
1808              Meant  to  be  used  with  the  --ntasks  option.   Related   to
1809              --ntasks-per-node  except  at the core level instead of the node
1810              level.  Masks will automatically be generated to bind the  tasks
1811              to  specific  cores  unless --cpu-bind=none is specified.  NOTE:
1812              This option is not supported when  using  SelectType=select/lin‐
1813              ear.
1814
1815       --ntasks-per-gpu=<ntasks>
1816              Request that there are ntasks tasks invoked for every GPU.  This
1817              option can work in two ways: 1) either specify --ntasks in addi‐
1818              tion,  in which case a type-less GPU specification will be auto‐
1819              matically determined to satisfy --ntasks-per-gpu, or 2)  specify
1820              the  GPUs  wanted (e.g. via --gpus or --gres) without specifying
1821              --ntasks, and the total task count will be automatically  deter‐
1822              mined.   The  number  of  CPUs  needed will be automatically in‐
1823              creased if necessary to allow for  any  calculated  task  count.
1824              This  option will implicitly set --gpu-bind=single:<ntasks>, but
1825              that can be overridden with an  explicit  --gpu-bind  specifica‐
1826              tion.   This  option  is  not compatible with a node range (i.e.
1827              -N<minnodes-maxnodes>).  This  option  is  not  compatible  with
1828              --gpus-per-task,  --gpus-per-socket, or --ntasks-per-node.  This
1829              option is not supported unless SelectType=cons_tres  is  config‐
1830              ured (either directly or indirectly on Cray systems).
1831
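                  For illustration, hedged sketches of the two usage modes
                  described above (the program name my_app is a hypothetical
                  placeholder): the first specifies --ntasks and lets the GPU
                  count be derived; the second specifies the GPUs and lets the
                  task count be derived (4 GPUs x 2 tasks per GPU = 8 tasks).

                       srun --ntasks=8 --ntasks-per-gpu=2 ./my_app
                       srun --gpus=4 --ntasks-per-gpu=2 ./my_app
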
1832       --ntasks-per-node=<ntasks>
1833              Request  that  ntasks be invoked on each node.  If used with the
1834              --ntasks option, the --ntasks option will  take  precedence  and
1835              the  --ntasks-per-node  will  be  treated  as a maximum count of
1836              tasks per node.  Meant to be used with the --nodes option.  This
1837              is related to --cpus-per-task=ncpus, but does not require knowl‐
1838              edge of the actual number of cpus on each node.  In some  cases,
1839              it  is more convenient to be able to request that no more than a
1840              specific number of tasks be invoked on each node.   Examples  of
1841              this  include  submitting a hybrid MPI/OpenMP app where only one
1842              MPI "task/rank" should be assigned to each node  while  allowing
1843              the  OpenMP portion to utilize all of the parallelism present in
1844              the node, or submitting a single setup/cleanup/monitoring job to
1845              each  node  of a pre-existing allocation as one step in a larger
1846              job script. This option applies to job allocations.
1847
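                  For illustration, a hedged sketch of the hybrid MPI/OpenMP
                  case mentioned above (the program name hybrid_app and the
                  per-task CPU count are hypothetical placeholders):

                       srun -N 4 --ntasks-per-node=1 --cpus-per-task=8 ./hybrid_app
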
1848       --ntasks-per-socket=<ntasks>
1849              Request the maximum ntasks be invoked on each socket.  This  op‐
1850              tion applies to the job allocation, but not to step allocations.
1851              Meant  to  be  used  with  the  --ntasks  option.   Related   to
1852              --ntasks-per-node except at the socket level instead of the node
1853              level.  Masks will automatically be generated to bind the  tasks
1854              to  specific sockets unless --cpu-bind=none is specified.  NOTE:
1855              This option is not supported when  using  SelectType=select/lin‐
1856              ear.
1857
1858       --open-mode={append|truncate}
1859              Open the output and error files using append or truncate mode as
1860              specified.  For heterogeneous job steps  the  default  value  is
1861              "append".   Otherwise the default value is specified by the sys‐
1862              tem configuration parameter JobFileAppend. This  option  applies
1863              to job and step allocations.
1864
1865       -o, --output=<filename_pattern>
1866              Specify  the  "filename  pattern" for stdout redirection. By de‐
1867              fault in interactive mode, srun collects stdout from  all  tasks
1868              and  sends this output via TCP/IP to the attached terminal. With
1869              --output stdout may be redirected to a file,  to  one  file  per
1870              task,  or to /dev/null. See section IO Redirection below for the
1871              various forms of filename pattern.  If the  specified  file  al‐
1872              ready exists, it will be overwritten.
1873
1874              If  --error is not also specified on the command line, both std‐
1875              out and stderr will be directed to the file specified by
1876              --output. This option applies to job and step allocations.
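
                  For example, one output file per task can be requested with a
                  filename pattern as described in the IO Redirection section
                  (the pattern shown is illustrative):

                      $ srun -N2 -n4 --output=job%j-task%t.out ./a.out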
1877
1878       -O, --overcommit
1879              Overcommit  resources. This option applies to job and step allo‐
1880              cations.
1881
1882              When applied to a job allocation (not including jobs  requesting
1883              exclusive access to the nodes) the resources are allocated as if
1884              only one task per node is requested. This  means  that  the  re‐
1885              quested  number of cpus per task (-c, --cpus-per-task) are allo‐
1886              cated per node rather than being multiplied  by  the  number  of
1887              tasks.  Options  used  to  specify the number of tasks per node,
1888              socket, core, etc. are ignored.
1889
1890              When applied to job step allocations (the srun command when exe‐
1891              cuted  within  an  existing  job allocation), this option can be
1892              used to launch more than one task per CPU.  Normally, srun  will
1893              not  allocate  more  than  one  process  per CPU.  By specifying
1894              --overcommit you are explicitly allowing more than  one  process
1895              per  CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1896              mitted to execute per node.  NOTE: MAX_TASKS_PER_NODE is defined
1897              in  the  file  slurm.h and is not a variable, it is set at Slurm
1898              build time.
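
                  For example, within an existing allocation a step may launch
                  more tasks than allocated CPUs (the task count is
                  illustrative):

                      $ srun -n16 --overcommit ./io_bound_app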
1899
1900       --overlap
1901              Allow steps to overlap each other on the CPUs.  By default steps
1902              do not share CPUs with other parallel steps.
1903
1904       -s, --oversubscribe
1905              The  job allocation can over-subscribe resources with other run‐
1906              ning jobs.  The resources to be over-subscribed  can  be  nodes,
1907              sockets,  cores,  and/or  hyperthreads depending upon configura‐
1908              tion.  The default over-subscribe  behavior  depends  on  system
1909              configuration  and  the  partition's  OverSubscribe option takes
1910              precedence over the job's option.  This option may result in the
1911              allocation  being granted sooner than if the --oversubscribe op‐
1912              tion was not set and allow higher system utilization, but appli‐
1913              cation performance will likely suffer due to competition for re‐
1914              sources.  This option applies to step allocations.
1915
1916       -p, --partition=<partition_names>
1917              Request a specific partition for the  resource  allocation.   If
1918              not  specified,  the default behavior is to allow the slurm con‐
1919              troller to select the default partition  as  designated  by  the
1920              system  administrator.  If  the job can use more than one parti‐
1921              tion, specify their names in a comma separated list and the one
1922              offering  earliest  initiation will be used with no regard given
1923              to the partition name ordering (although higher priority  parti‐
1924              tions will be considered first).  When the job is initiated, the
1925              name of the partition used will  be  placed  first  in  the  job
1926              record partition string. This option applies to job allocations.
1927
1928       --power=<flags>
1929              Comma  separated  list of power management plugin options.  Cur‐
1930              rently available flags include: level (all  nodes  allocated  to
1931              the job should have identical power caps, may be disabled by the
1932              Slurm configuration option PowerParameters=job_no_level).   This
1933              option applies to job allocations.
1934
1935       -E, --preserve-env
1936              Pass    the    current    values    of   environment   variables
1937              SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the  executable,
1938              rather  than  computing  them from command line parameters. This
1939              option applies to job allocations.
1940
1941       --priority=<value>
1942              Request a specific job priority.  May be subject  to  configura‐
1943              tion  specific  constraints.   value  should either be a numeric
1944              value or "TOP" (for highest possible value).  Only Slurm  opera‐
1945              tors and administrators can set the priority of a job.  This op‐
1946              tion applies to job allocations only.
1947
1948       --profile={all|none|<type>[,<type>...]}
1949              Enables detailed  data  collection  by  the  acct_gather_profile
1950              plugin.  Detailed data are typically time-series that are stored
1951              in an HDF5 file for the job or an InfluxDB database depending on
1952              the  configured plugin.  This option applies to job and step al‐
1953              locations.
1954
1955              All       All data types are collected. (Cannot be combined with
1956                        other values.)
1957
1958              None      No data types are collected. This is the default.
1959                         (Cannot be combined with other values.)
1960
1961       Valid type values are:
1962
1963              Energy Energy data is collected.
1964
1965              Task   Task (I/O, Memory, ...) data is collected.
1966
1967              Filesystem
1968                     Filesystem data is collected.
1969
1970              Network
1971                     Network (InfiniBand) data is collected.
1972
1973       --prolog=<executable>
1974              srun  will  run  executable  just before launching the job step.
1975              The command line arguments for executable will  be  the  command
1976              and arguments of the job step.  If executable is "none", then no
1977              srun prolog will be run. This parameter overrides the SrunProlog
1978              parameter  in  slurm.conf. This parameter is completely indepen‐
1979              dent from the Prolog parameter in slurm.conf.  This  option  ap‐
1980              plies to job allocations.
1981
1982       --propagate[=rlimit[,rlimit...]]
1983              Allows  users to specify which of the modifiable (soft) resource
1984              limits to propagate to the compute  nodes  and  apply  to  their
1985              jobs.  If  no rlimit is specified, then all resource limits will
1986              be propagated.  The following  rlimit  names  are  supported  by
1987              Slurm  (although  some options may not be supported on some sys‐
1988              tems):
1989
1990              ALL       All limits listed below (default)
1991
1992              NONE      No limits listed below
1993
1994              AS        The maximum  address  space  (virtual  memory)  for  a
1995                        process.
1996
1997              CORE      The maximum size of core file
1998
1999              CPU       The maximum amount of CPU time
2000
2001              DATA      The maximum size of a process's data segment
2002
2003              FSIZE     The  maximum  size  of files created. Note that if the
2004                        user sets FSIZE to less than the current size  of  the
2005                        slurmd.log,  job  launches will fail with a 'File size
2006                        limit exceeded' error.
2007
2008              MEMLOCK   The maximum size that may be locked into memory
2009
2010              NOFILE    The maximum number of open files
2011
2012              NPROC     The maximum number of processes available
2013
2014              RSS       The maximum resident set size. Note that this only has
2015                        effect with Linux kernels 2.4.30 or older or BSD.
2016
2017              STACK     The maximum stack size
2018
2019              This option applies to job allocations.
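
                  For example, to propagate only the open file and core file
                  size limits (an illustrative selection):

                      $ srun --propagate=NOFILE,CORE ./a.out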
2020
2021       --pty  Execute  task  zero  in  pseudo  terminal mode.  Implicitly sets
2022              --unbuffered.  Implicitly sets --error and --output to /dev/null
2023              for  all  tasks except task zero, which may cause those tasks to
2024              exit immediately (e.g. shells will typically exit immediately in
2025              that situation).  This option applies to step allocations.
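
                  A common use is an interactive shell inside an allocation
                  (the choice of shell is illustrative):

                      $ srun -N1 -n1 --pty bash -i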
2026
2027       -q, --qos=<qos>
2028              Request a quality of service for the job.  QOS values can be de‐
2029              fined for each user/cluster/account  association  in  the  Slurm
2030              database.   Users will be limited to their association's defined
2031              set of qos's when the Slurm  configuration  parameter,  Account‐
2032              ingStorageEnforce, includes "qos" in its definition. This option
2033              applies to job allocations.
2034
2035       -Q, --quiet
2036              Suppress informational messages from srun. Errors will still  be
2037              displayed. This option applies to job and step allocations.
2038
2039       --quit-on-interrupt
2040              Quit  immediately  on single SIGINT (Ctrl-C). Use of this option
2041              disables the status feature normally  available  when  srun  re‐
2042              ceives  a  single  Ctrl-C and causes srun to instead immediately
2043              terminate the running job. This option applies to  step  alloca‐
2044              tions.
2045
2046       --reboot
2047              Force  the  allocated  nodes  to reboot before starting the job.
2048              This is only supported with some system configurations and  will
2049              otherwise  be  silently  ignored. Only root, SlurmUser or admins
2050              can reboot nodes. This option applies to job allocations.
2051
2052       -r, --relative=<n>
2053              Run a job step relative to node n  of  the  current  allocation.
2054              This  option  may  be used to spread several job steps out among
2055              the nodes of the current job. If -r is  used,  the  current  job
2056              step  will  begin at node n of the allocated nodelist, where the
2057              first node is considered node 0.  The -r option is not permitted
2058              with  -w  or -x option and will result in a fatal error when not
2059              running within a prior allocation (i.e. when SLURM_JOB_ID is not
2060              set).  The  default  for n is 0. If the value of --nodes exceeds
2061              the number of nodes identified with  the  --relative  option,  a
2062              warning  message  will be printed and the --relative option will
2063              take precedence. This option applies to step allocations.
2064
2065       --reservation=<reservation_names>
2066              Allocate resources for the job from the  named  reservation.  If
2067              the  job  can use more than one reservation, specify their names
2068              in a comma separated list and the one offering the earliest initia‐
2069              tion will be used. Each reservation will be considered in the order
2070              it was requested. All reservations will be listed in scontrol/squeue
2071              through  the  life of the job.  In accounting the first reserva‐
2072              tion will be seen and after the job starts the reservation  used
2073              will replace it.
2074
2075       --resv-ports[=count]
2076              Reserve  communication ports for this job. Users can specify the
2077              number of ports they want to reserve. The parameter Mpi‐
2078              Params=ports=12000-12999 must be specified in slurm.conf. If not
2079              specified and Slurm's OpenMPI plugin is used, then by default the
2080              number of reserved ports is equal to the highest number of tasks
2081              on any node in the job step allocation. If the number of reserved
2082              ports is zero then no ports are reserved. Used for OpenMPI. This
2083              option applies to job and step allocations.
2084
2085       --send-libs[=yes|no]
2086              If set to yes (or no argument), autodetect and broadcast the ex‐
2087              ecutable's  shared  object  dependencies  to  allocated  compute
2088              nodes. The files are placed in a directory  alongside  the  exe‐
2089              cutable. The LD_LIBRARY_PATH is automatically updated to include
2090              this cache directory as well. This overrides the default  behav‐
2091              ior  configured  in  slurm.conf SbcastParameters send_libs. This
2092              option  only  works  in  conjunction  with  --bcast.  See   also
2093              --bcast-exclude.
2094
2095       --signal=[R:]<sig_num>[@sig_time]
2096              When  a  job is within sig_time seconds of its end time, send it
2097              the signal sig_num.  Due to the resolution of event handling  by
2098              Slurm,  the  signal  may  be  sent up to 60 seconds earlier than
2099              specified.  sig_num may either be a signal number or name  (e.g.
2100              "10"  or "USR1").  sig_time must have an integer value between 0
2101              and 65535.  By default, no signal is sent before the  job's  end
2102              time.   If  a sig_num is specified without any sig_time, the de‐
2103              fault time will be 60 seconds. This option applies to job  allo‐
2104              cations.   Use the "R:" option to allow this job to overlap with
2105              a reservation with MaxStartDelay set.  To have the  signal  sent
2106              at preemption time see the preempt_send_user_signal SlurmctldPa‐
2107              rameter.
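
                  For example, to send SIGUSR1 roughly five minutes before the
                  time limit expires (the times shown are illustrative):

                      $ srun --signal=USR1@300 --time=30:00 ./checkpointing_app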
2108
2109       --slurmd-debug=<level>
2110              Specify a debug level for slurmd(8). The level may be specified
2111              either as an integer value between 0 [quiet, only errors are dis‐
2112              played] and 4 [verbose operation] or as one of the SlurmdDebug tags.
2113
2114              quiet     Log nothing
2115
2116              fatal     Log only fatal errors
2117
2118              error     Log only errors
2119
2120              info      Log errors and general informational messages
2121
2122              verbose   Log errors and verbose informational messages
2123
2124              The slurmd debug information is copied onto the  stderr  of  the
2125              job.  By  default only errors are displayed. This option applies
2126              to job and step allocations.
2127
2128       --sockets-per-node=<sockets>
2129              Restrict node selection to nodes with  at  least  the  specified
2130              number  of  sockets.  See additional information under -B option
2131              above when task/affinity plugin is enabled. This option  applies
2132              to job allocations.
2133              NOTE:  This  option may implicitly impact the number of tasks if
2134              -n was not specified.
2135
2136       --spread-job
2137              Spread the job allocation over as many nodes as possible and at‐
2138              tempt  to  evenly  distribute  tasks across the allocated nodes.
2139              This option disables the topology/tree plugin.  This option  ap‐
2140              plies to job allocations.
2141
2142       --switches=<count>[@max-time]
2143              When  a tree topology is used, this defines the maximum count of
2144              leaf switches desired for the job allocation and optionally  the
2145              maximum time to wait for that number of switches. If Slurm finds
2146              an allocation containing more switches than the count specified,
2147              the job remains pending until it either finds an allocation with
2148              desired switch count or the time limit expires. If there is no
2149              switch  count limit, there is no delay in starting the job.  Ac‐
2150              ceptable  time  formats  include  "minutes",  "minutes:seconds",
2151              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2152              "days-hours:minutes:seconds".  The job's maximum time delay  may
2153              be limited by the system administrator using the SchedulerParam‐
2154              eters configuration parameter with the max_switch_wait parameter
2155              option.   On a dragonfly network the only switch count supported
2156              is 1 since communication performance will be highest when a  job
2157              is allocated resources on one leaf switch or more than 2 leaf
2158              switches. The default max-time is the max_switch_wait Schedul‐
2159              erParameters value. This option applies to job allocations.
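
                  For example, to prefer a single leaf switch but wait at most
                  60 minutes for one to become available (the counts and time
                  are illustrative):

                      $ srun -N16 --switches=1@60 ./tightly_coupled_app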
2160
2161       --task-epilog=<executable>
2162              The  slurmstepd  daemon will run executable just after each task
2163              terminates. This will be executed before any TaskEpilog  parame‐
2164              ter  in  slurm.conf  is  executed.  This  is  meant to be a very
2165              short-lived program. If it fails to terminate within a few  sec‐
2166              onds,  it  will  be  killed along with any descendant processes.
2167              This option applies to step allocations.
2168
2169       --task-prolog=<executable>
2170              The slurmstepd daemon will run executable just before  launching
2171              each  task. This will be executed after any TaskProlog parameter
2172              in slurm.conf is executed.  Besides the normal environment vari‐
2173              ables, this has SLURM_TASK_PID available to identify the process
2174              ID of the task being started.  Standard output from this program
2175              of  the form "export NAME=value" will be used to set environment
2176              variables for the task being spawned.  This  option  applies  to
2177              step allocations.
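
                  A minimal task prolog sketch (the file name and variable are
                  illustrative); each "export NAME=value" line it prints
                  becomes an environment variable in the spawned task:

                      $ cat my_task_prolog.sh
                      #!/bin/sh
                      # SLURM_TASK_PID identifies the task being started (see above).
                      echo "export MY_TASK_PID=$SLURM_TASK_PID"

                      $ srun -n4 --task-prolog=$PWD/my_task_prolog.sh ./a.out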
2178
2179       --test-only
2180              Returns  an  estimate  of  when  a job would be scheduled to run
2181              given the current job queue and all  the  other  srun  arguments
2182              specifying  the job.  This limits srun's behavior to just return
2183              information; no job is actually submitted.  The program will  be
2184              executed  directly  by the slurmd daemon. This option applies to
2185              job allocations.
2186
2187       --thread-spec=<num>
2188              Count of specialized threads per node reserved by  the  job  for
2189              system  operations and not used by the application. The applica‐
2190              tion will not use these threads, but will be charged  for  their
2191              allocation.   This  option  can not be used with the --core-spec
2192              option. This option applies to job allocations.
2193
2194       -T, --threads=<nthreads>
2195              Allows limiting the number of concurrent threads  used  to  send
2196              the job request from the srun process to the slurmd processes on
2197              the allocated nodes. Default is to use one thread per  allocated
2198              node  up  to a maximum of 60 concurrent threads. Specifying this
2199              option limits the number of concurrent threads to nthreads (less
2200              than  or  equal  to  60).  This should only be used to set a low
2201              thread count for testing on very small  memory  computers.  This
2202              option applies to job allocations.
2203
2204       --threads-per-core=<threads>
2205              Restrict  node  selection  to  nodes with at least the specified
2206              number of threads per core. In task layout,  use  the  specified
2207              maximum  number  of threads per core. Implies --cpu-bind=threads
2208              unless overridden by command line or environment options.  NOTE:
2209              "Threads"  refers to the number of processing units on each core
2210              rather than the number of application tasks to be  launched  per
2211              core.  See  additional  information  under  -B option above when
2212              task/affinity plugin is enabled. This option applies to job  and
2213              step allocations.
2214              NOTE:  This  option may implicitly impact the number of tasks if
2215              -n was not specified.
2216
2217       -t, --time=<time>
2218              Set a limit on the total run time of the job allocation.  If the
2219              requested time limit exceeds the partition's time limit, the job
2220              will be left in a PENDING state  (possibly  indefinitely).   The
2221              default  time limit is the partition's default time limit.  When
2222              the time limit is reached, each task in each job  step  is  sent
2223              SIGTERM  followed  by  SIGKILL.  The interval between signals is
2224              specified by the Slurm configuration  parameter  KillWait.   The
2225              OverTimeLimit  configuration parameter may permit the job to run
2226              longer than scheduled.  Time resolution is one minute and second
2227              values are rounded up to the next minute.
2228
2229              A  time  limit  of  zero requests that no time limit be imposed.
2230              Acceptable time formats  include  "minutes",  "minutes:seconds",
2231              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2232              "days-hours:minutes:seconds". This option  applies  to  job  and
2233              step allocations.
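
                  For example, to request a limit of one day and twelve hours
                  (the value is illustrative):

                      $ srun --time=1-12:00:00 -N2 ./long_running_app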
2234
2235       --time-min=<time>
2236              Set  a  minimum time limit on the job allocation.  If specified,
2237              the job may have its --time limit lowered to a  value  no  lower
2238              than  --time-min  if doing so permits the job to begin execution
2239              earlier than otherwise possible.  The job's time limit will  not
2240              be  changed  after the job is allocated resources.  This is per‐
2241              formed by a backfill scheduling algorithm to allocate  resources
2242              otherwise  reserved  for  higher priority jobs.  Acceptable time
2243              formats  include   "minutes",   "minutes:seconds",   "hours:min‐
2244              utes:seconds",     "days-hours",     "days-hours:minutes"    and
2245              "days-hours:minutes:seconds". This option applies to job alloca‐
2246              tions.
2247
2248       --tmp=<size>[units]
2249              Specify  a minimum amount of temporary disk space per node.  De‐
2250              fault units are megabytes.  Different units can be specified us‐
2251              ing  the  suffix  [K|M|G|T].  This option applies to job alloca‐
2252              tions.
2253
2254       --uid=<user>
2255              Attempt to submit and/or run a job as user instead of the invok‐
2256              ing  user  id.  The  invoking user's credentials will be used to
2257              check access permissions for the target partition. User root may
2258              use  this option to run jobs as a normal user in a RootOnly par‐
2259              tition for example. If run as root, srun will drop  its  permis‐
2260              sions  to the uid specified after node allocation is successful.
2261              user may be the user name or numerical user ID. This option  ap‐
2262              plies to job and step allocations.
2263
2264       -u, --unbuffered
2265              By   default,   the   connection   between  slurmstepd  and  the
2266              user-launched application is over a pipe. The stdio output writ‐
2267              ten  by  the  application  is  buffered by the glibc until it is
2268              flushed or the output is set as unbuffered.  See  setbuf(3).  If
2269              this  option  is  specified the tasks are executed with a pseudo
2270              terminal so that the application output is unbuffered. This  op‐
2271              tion applies to step allocations.
2272
2273       --usage
2274              Display brief help message and exit.
2275
2276       --use-min-nodes
2277              If a range of node counts is given, prefer the smaller count.
2278
2279       -v, --verbose
2280              Increase the verbosity of srun's informational messages.  Multi‐
2281              ple -v's will further increase  srun's  verbosity.   By  default
2282              only  errors  will  be displayed. This option applies to job and
2283              step allocations.
2284
2285       -V, --version
2286              Display version information and exit.
2287
2288       -W, --wait=<seconds>
2289              Specify how long to wait after the first task terminates  before
2290              terminating  all  remaining tasks. A value of 0 indicates an un‐
2291              limited wait (a warning will be issued after  60  seconds).  The
2292              default value is set by the WaitTime parameter in the slurm con‐
2293              figuration file (see slurm.conf(5)). This option can  be  useful
2294              to  ensure  that  a job is terminated in a timely fashion in the
2295              event that one or more tasks terminate prematurely.   Note:  The
2296              -K,  --kill-on-bad-exit  option takes precedence over -W, --wait
2297              to terminate the job immediately if a task exits with a non-zero
2298              exit code. This option applies to job allocations.
2299
2300       --wckey=<wckey>
2301              Specify  wckey  to be used with job.  If TrackWCKey=no (default)
2302              in the slurm.conf this value is ignored. This option applies  to
2303              job allocations.
2304
2305       --x11[={all|first|last}]
2306              Sets  up  X11  forwarding on "all", "first" or "last" node(s) of
2307              the allocation.  This option is only enabled if Slurm  was  com‐
2308              piled  with  X11  support  and PrologFlags=x11 is defined in the
2309              slurm.conf. Default is "all".
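
                  For example, to forward X11 from the first allocated node
                  (xclock is an illustrative client):

                      $ srun --x11=first xclock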
2310
2311       srun will submit the job request to the slurm job controller, then ini‐
2312       tiate  all  processes on the remote nodes. If the request cannot be met
2313       immediately, srun will block until the resources are free  to  run  the
2314       job. If the -I (--immediate) option is specified srun will terminate if
2315       resources are not immediately available.
2316
2317       When initiating remote processes srun will propagate the current  work‐
2318       ing  directory,  unless --chdir=<path> is specified, in which case path
2319       will become the working directory for the remote processes.
2320
2321       The -n, -c, and -N options control how CPUs  and nodes  will  be  allo‐
2322       cated  to  the job. When specifying only the number of processes to run
2323       with -n, a default of one CPU per process is allocated.  By  specifying
2324       the number of CPUs required per task (-c), more than one CPU may be al‐
2325       located per process. If the number of nodes is specified with -N,  srun
2326       will attempt to allocate at least the number of nodes specified.
2327
2328       Combinations  of the above three options may be used to change how pro‐
2329       cesses are distributed across nodes and cpus. For instance, by specify‐
2330       ing  both  the number of processes and number of nodes on which to run,
2331       the number of processes per node is implied. However, if the number  of
2332       CPUs per process is more important, then the number of processes
2333       (-n) and the number of CPUs per process (-c) should be specified.
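
           For example, the following requests 32 tasks with four CPUs each and
           leaves the node count to Slurm (the values and program name are
           illustrative):

               $ srun -n32 -c4 ./multithreaded_app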
2334
2335       srun will refuse to  allocate more than  one  process  per  CPU  unless
2336       --overcommit (-O) is also specified.
2337
2338       srun will attempt to meet the above specifications "at a minimum." That
2339       is, if 16 nodes are requested for 32 processes, and some nodes  do  not
2340       have 2 CPUs, the allocation of nodes will be increased in order to meet
2341       the demand for CPUs. In other words, a minimum of 16  nodes  are  being
2342       requested.  However,  if  16 nodes are requested for 15 processes, srun
2343       will consider this an error, as  15  processes  cannot  run  across  16
2344       nodes.
2345
2346
2347       IO Redirection
2348
2349       By  default, stdout and stderr will be redirected from all tasks to the
2350       stdout and stderr of srun, and stdin will be redirected from the  stan‐
2351       dard input of srun to all remote tasks.  If stdin is only to be read by
2352       a subset of the spawned tasks, specifying a file to  read  from  rather
2353       than  forwarding  stdin  from  the srun command may be preferable as it
2354       avoids moving and storing data that will never be read.
2355
2356       For OS X, the poll() function does not support stdin, so input  from  a
2357       terminal is not possible.
2358
2359       This  behavior  may  be changed with the --output, --error, and --input
2360       (-o, -e, -i) options. Valid format specifications for these options are
2361
2362
2363       all       stdout stderr is redirected from all tasks to srun.  stdin is
2364                 broadcast  to  all remote tasks.  (This is the default behav‐
2365                 ior)
2366
2367       none      stdout and stderr is not received from any  task.   stdin  is
2368                 not sent to any task (stdin is closed).
2369
2370       taskid    stdout  and/or  stderr are redirected from only the task with
2371                 relative id equal to taskid, where 0 <= taskid < ntasks,
2372                 where  ntasks is the total number of tasks in the current job
2373                 step.  stdin is redirected from the stdin  of  srun  to  this
2374                 same  task.   This file will be written on the node executing
2375                 the task.
2376
2377       filename  srun will redirect stdout and/or stderr  to  the  named  file
2378                 from all tasks.  stdin will be redirected from the named file
2379                 and broadcast to all tasks in the job.  filename refers to  a
2380                 path  on the host that runs srun.  Depending on the cluster's
2381                 file system layout, this may result in the  output  appearing
2382                 in  different  places  depending on whether the job is run in
2383                 batch mode.
2384
2385       filename pattern
2386                 srun allows for a filename pattern to be used to generate the
2387                 named  IO  file described above. The following list of format
2388                 specifiers may be used in the format  string  to  generate  a
2389                 filename  that will be unique to a given jobid, stepid, node,
2390                 or task. In each case, the appropriate number  of  files  are
2391                 opened and associated with the corresponding tasks. Note that
2392                 any format string containing %t, %n, and/or %N will be  writ‐
2393                 ten on the node executing the task rather than the node where
2394                 srun executes. These format specifiers are not supported on a
2395                 BGQ system.
2396
2397                 \\     Do not process any of the replacement symbols.
2398
2399                 %%     The character "%".
2400
2401                 %A     Job array's master job allocation number.
2402
2403                 %a     Job array ID (index) number.
2404
2405                 %J     jobid.stepid of the running job. (e.g. "128.0")
2406
2407                 %j     jobid of the running job.
2408
2409                 %s     stepid of the running job.
2410
2411                 %N     short  hostname.  This  will create a separate IO file
2412                        per node.
2413
2414                 %n     Node identifier relative to current job (e.g.  "0"  is
2415                        the  first node of the running job) This will create a
2416                        separate IO file per node.
2417
2418                 %t     task identifier (rank) relative to current  job.  This
2419                        will create a separate IO file per task.
2420
2421                 %u     User name.
2422
2423                 %x     Job name.
2424
2425                 A  number  placed  between  the  percent character and format
2426                 specifier may be used to zero-pad the result in the IO  file‐
2427                 name.  This  number is ignored if the format specifier corre‐
2428                 sponds to  non-numeric data (%N for example).
2429
2430                 Some examples of how the format string may be used  for  a  4
2431                 task  job  step with a Job ID of 128 and step id of 0 are in‐
2432                 cluded below:
2433
2434
2435                 job%J.out      job128.0.out
2436
2437                 job%4j.out     job0128.out
2438
2439                 job%j-%2t.out  job128-00.out, job128-01.out, ...
2440

PERFORMANCE

2442       Executing srun sends a remote procedure call to  slurmctld.  If  enough
2443       calls  from srun or other Slurm client commands that send remote proce‐
2444       dure calls to the slurmctld daemon come in at once, it can result in  a
2445       degradation  of performance of the slurmctld daemon, possibly resulting
2446       in a denial of service.
2447
2448       Do not run srun or other Slurm client commands that send remote  proce‐
2449       dure  calls to slurmctld from loops in shell scripts or other programs.
2450       Ensure that programs limit calls to srun to the minimum  necessary  for
2451       the information you are trying to gather.
2452
2453

INPUT ENVIRONMENT VARIABLES

2455       Upon  startup, srun will read and handle the options set in the follow‐
2456       ing environment variables. The majority of these variables are set  the
2457       same  way  the options are set, as defined above. For flag options that
2458       are defined to expect no argument, the option can be enabled by setting
2459       the  environment  variable  without a value (empty or NULL string), the
2460       string 'yes', or a non-zero number. Any other value for the environment
2461       variable  will  result in the option not being set.  There are a couple
2462       exceptions to these rules that are noted below.
2463       NOTE: Command line options always override  environment  variable  set‐
2464       tings.
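
       For example, the following two invocations request the same task layout
       (the values and program name are illustrative):

           $ SLURM_NTASKS_PER_NODE=2 srun -N2 ./a.out
           $ srun -N2 --ntasks-per-node=2 ./a.out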
2465
2466
2467       PMI_FANOUT            This  is  used  exclusively  with PMI (MPICH2 and
2468                             MVAPICH2) and controls the fanout of data  commu‐
2469                             nications. The srun command sends messages to ap‐
2470                             plication programs  (via  the  PMI  library)  and
2471                             those  applications may be called upon to forward
2472                             that data to up  to  this  number  of  additional
2473                             tasks.  Higher  values offload work from the srun
2474                             command to the applications and  likely  increase
2475                             the vulnerability to failures.  The default value
2476                             is 32.
2477
2478       PMI_FANOUT_OFF_HOST   This is used exclusively  with  PMI  (MPICH2  and
2479                             MVAPICH2)  and controls the fanout of data commu‐
2480                             nications.  The srun command  sends  messages  to
2481                             application  programs  (via  the PMI library) and
2482                             those applications may be called upon to  forward
2483                             that  data  to additional tasks. By default, srun
2484                             sends one message per host and one task  on  that
2485                             host  forwards  the  data  to other tasks on that
2486                             host up to PMI_FANOUT.  If PMI_FANOUT_OFF_HOST is
2487                             defined, the user task may be required to forward
2488                             the  data  to  tasks  on  other  hosts.   Setting
2489                             PMI_FANOUT_OFF_HOST   may  increase  performance.
2490                             Since more work is performed by the  PMI  library
2491                             loaded by the user application, failures also can
2492                             be more common and more  difficult  to  diagnose.
2493                             Should be disabled/enabled by setting to 0 or 1.
2494
2495       PMI_TIME              This  is  used  exclusively  with PMI (MPICH2 and
2496                             MVAPICH2) and controls how  much  the  communica‐
2497                             tions  from  the tasks to the srun are spread out
2498                             in time in order to avoid overwhelming  the  srun
2499                             command  with work. The default value is 500 (mi‐
2500                             croseconds) per task. On relatively slow  proces‐
2501                             sors  or systems with very large processor counts
2502                             (and large PMI data sets), higher values  may  be
2503                             required.
2504
2505       SLURM_ACCOUNT         Same as -A, --account
2506
2507       SLURM_ACCTG_FREQ      Same as --acctg-freq
2508
2509       SLURM_BCAST           Same as --bcast
2510
2511       SLURM_BCAST_EXCLUDE   Same as --bcast-exclude
2512
2513       SLURM_BURST_BUFFER    Same as --bb
2514
2515       SLURM_CLUSTERS        Same as -M, --clusters
2516
2517       SLURM_COMPRESS        Same as --compress
2518
2519       SLURM_CONF            The location of the Slurm configuration file.
2520
2521       SLURM_CONSTRAINT      Same as -C, --constraint
2522
2523       SLURM_CORE_SPEC       Same as --core-spec
2524
2525       SLURM_CPU_BIND        Same as --cpu-bind
2526
2527       SLURM_CPU_FREQ_REQ    Same as --cpu-freq.
2528
2529       SLURM_CPUS_PER_GPU    Same as --cpus-per-gpu
2530
2531       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task
2532
2533       SLURM_DEBUG           Same  as  -v, --verbose. Must be set to 0 or 1 to
2534                             disable or enable the option.
2535
2536       SLURM_DELAY_BOOT      Same as --delay-boot
2537
2538       SLURM_DEPENDENCY      Same as -d, --dependency=<jobid>
2539
2540       SLURM_DISABLE_STATUS  Same as -X, --disable-status
2541
2542       SLURM_DIST_PLANESIZE  Plane distribution size. Only used if --distribu‐
2543                             tion=plane, without =<size>, is set.
2544
2545       SLURM_DISTRIBUTION    Same as -m, --distribution
2546
2547       SLURM_EPILOG          Same as --epilog
2548
2549       SLURM_EXACT           Same as --exact
2550
2551       SLURM_EXCLUSIVE       Same as --exclusive
2552
2553       SLURM_EXIT_ERROR      Specifies  the  exit  code generated when a Slurm
2554                             error occurs (e.g. invalid options).  This can be
2555                             used  by a script to distinguish application exit
2556                             codes from various Slurm error conditions.   Also
2557                             see SLURM_EXIT_IMMEDIATE.
2558
2559       SLURM_EXIT_IMMEDIATE  Specifies  the exit code generated when the --im‐
2560                             mediate option is used and resources are not cur‐
2561                             rently  available.   This can be used by a script
2562                             to distinguish application exit codes from  vari‐
2563                             ous    Slurm    error   conditions.    Also   see
2564                             SLURM_EXIT_ERROR.
2565
2566       SLURM_EXPORT_ENV      Same as --export
2567
2568       SLURM_GPU_BIND        Same as --gpu-bind
2569
2570       SLURM_GPU_FREQ        Same as --gpu-freq
2571
2572       SLURM_GPUS            Same as -G, --gpus
2573
2574       SLURM_GPUS_PER_NODE   Same as --gpus-per-node
2575
2576       SLURM_GPUS_PER_TASK   Same as --gpus-per-task
2577
2578       SLURM_GRES            Same as --gres. Also see SLURM_STEP_GRES
2579
2580       SLURM_GRES_FLAGS      Same as --gres-flags
2581
2582       SLURM_HINT            Same as --hint
2583
2584       SLURM_IMMEDIATE       Same as -I, --immediate
2585
2586       SLURM_JOB_ID          Same as --jobid
2587
2588       SLURM_JOB_NAME        Same as -J, --job-name except within an  existing
2589                             allocation,  in which case it is ignored to avoid
2590                             using the batch job's name as the  name  of  each
2591                             job step.
2592
2593       SLURM_JOB_NUM_NODES   Same  as  -N,  --nodes.  Total number of nodes in
2594                             the job’s resource allocation.
2595
2596       SLURM_KILL_BAD_EXIT   Same as -K, --kill-on-bad-exit. Must be set to  0
2597                             or 1 to disable or enable the option.
2598
2599       SLURM_LABELIO         Same as -l, --label
2600
2601       SLURM_MEM_BIND        Same as --mem-bind
2602
2603       SLURM_MEM_PER_CPU     Same as --mem-per-cpu
2604
2605       SLURM_MEM_PER_GPU     Same as --mem-per-gpu
2606
2607       SLURM_MEM_PER_NODE    Same as --mem
2608
2609       SLURM_MPI_TYPE        Same as --mpi
2610
2611       SLURM_NETWORK         Same as --network
2612
2613       SLURM_NNODES          Same as -N, --nodes. Total number of nodes in the
2614                             job’s       resource       allocation.        See
2615                             SLURM_JOB_NUM_NODES.  Included for backwards com‐
2616                             patibility.
2617
2618       SLURM_NO_KILL         Same as -k, --no-kill
2619
2620       SLURM_NPROCS          Same as -n, --ntasks. See SLURM_NTASKS.  Included
2621                             for backwards compatibility.
2622
2623       SLURM_NTASKS          Same as -n, --ntasks
2624
2625       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2626
2627       SLURM_NTASKS_PER_GPU  Same as --ntasks-per-gpu
2628
2629       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2630
2631       SLURM_NTASKS_PER_SOCKET
2632                             Same as --ntasks-per-socket
2633
2634       SLURM_OPEN_MODE       Same as --open-mode
2635
2636       SLURM_OVERCOMMIT      Same as -O, --overcommit
2637
2638       SLURM_OVERLAP         Same as --overlap
2639
2640       SLURM_PARTITION       Same as -p, --partition
2641
2642       SLURM_PMI_KVS_NO_DUP_KEYS
2643                             If set, then PMI key-pairs will contain no dupli‐
2644                             cate keys. MPI can use this  variable  to  inform
2645                             the  PMI  library  that it will not use duplicate
2646                             keys so PMI can  skip  the  check  for  duplicate
2647                             keys.   This  is  the case for MPICH2 and reduces
2648                             overhead in testing for duplicates  for  improved
2649                             performance.
2650
2651       SLURM_POWER           Same as --power
2652
2653       SLURM_PROFILE         Same as --profile
2654
2655       SLURM_PROLOG          Same as --prolog
2656
2657       SLURM_QOS             Same as --qos
2658
2659       SLURM_REMOTE_CWD      Same as -D, --chdir=
2660
2661       SLURM_REQ_SWITCH      When  a  tree  topology is used, this defines the
2662                             maximum count of switches desired for the job al‐
2663                             location  and optionally the maximum time to wait
2664                             for that number of switches. See --switches
2665
2666       SLURM_RESERVATION     Same as --reservation
2667
2668       SLURM_RESV_PORTS      Same as --resv-ports
2669
2670       SLURM_SEND_LIBS       Same as --send-libs
2671
2672       SLURM_SIGNAL          Same as --signal
2673
2674       SLURM_SPREAD_JOB      Same as --spread-job
2675
2676       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2677                             If set and non-zero, successive task exit mes‐
2678                             sages  with  the  same  exit code will be printed
2679                             only once.
2680
2681       SLURM_STDERRMODE      Same as -e, --error
2682
2683       SLURM_STDINMODE       Same as -i, --input
2684
2685       SLURM_STDOUTMODE      Same as -o, --output
2686
2687       SLURM_STEP_GRES       Same as --gres (only applies to job steps, not to
2688                             job allocations).  Also see SLURM_GRES
2689
2690       SLURM_STEP_KILLED_MSG_NODE_ID=ID
2691                             If set, only the specified node will log when the
2692                             job or step are killed by a signal.
2693
2694       SLURM_TASK_EPILOG     Same as --task-epilog
2695
2696       SLURM_TASK_PROLOG     Same as --task-prolog
2697
2698       SLURM_TEST_EXEC       If defined, srun will verify existence of the ex‐
2699                             ecutable  program along with user execute permis‐
2700                             sion on the node where srun was called before at‐
2701                             tempting to launch it on nodes in the step.
2702
2703       SLURM_THREAD_SPEC     Same as --thread-spec
2704
2705       SLURM_THREADS         Same as -T, --threads
2706
2707       SLURM_THREADS_PER_CORE
2708                             Same as --threads-per-core
2709
2710       SLURM_TIMELIMIT       Same as -t, --time
2711
2712       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered
2713
2714       SLURM_USE_MIN_NODES   Same as --use-min-nodes
2715
2716       SLURM_WAIT            Same as -W, --wait
2717
2718       SLURM_WAIT4SWITCH     Max  time  waiting  for  requested  switches. See
2719                             --switches
2720
2721       SLURM_WCKEY           Same as --wckey
2722
2723       SLURM_WORKING_DIR     Same as -D, --chdir
2724
2725       SLURMD_DEBUG          Same as --slurmd-debug. Must be set to 0 or 1
2726                             to disable or enable the option.
2727
2728       SRUN_CONTAINER        Same as --container.
2729
2730       SRUN_EXPORT_ENV       Same  as  --export, and will override any setting
2731                             for SLURM_EXPORT_ENV.
2732

OUTPUT ENVIRONMENT VARIABLES

2734       srun will set some environment variables in the environment of the exe‐
2735       cuting  tasks on the remote compute nodes.  These environment variables
2736       are:
2737
2738
2739       SLURM_*_HET_GROUP_#   For a heterogeneous job allocation, the  environ‐
2740                             ment variables are set separately for each compo‐
2741                             nent.
2742
2743       SLURM_CLUSTER_NAME    Name of the cluster on which the job  is  execut‐
2744                             ing.
2745
2746       SLURM_CPU_BIND_LIST   --cpu-bind  map  or  mask list (list of Slurm CPU
2747                             IDs or masks for this node, CPU_ID =  Board_ID  x
2748                             threads_per_board       +       Socket_ID       x
2749                             threads_per_socket + Core_ID x threads_per_core +
2750                             Thread_ID).
2751
2752       SLURM_CPU_BIND_TYPE   --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2753
2754       SLURM_CPU_BIND_VERBOSE
2755                             --cpu-bind verbosity (quiet,verbose).
2756
2757       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency on
2758                             the srun command  as  a  numerical  frequency  in
2759                             kilohertz, or a coded value for a request of low,
2760                             medium, highm1 or high for the frequency. See the
2761                             description  of  the  --cpu-freq  option  or  the
2762                             SLURM_CPU_FREQ_REQ input environment variable.
2763
2764       SLURM_CPUS_ON_NODE    Number of CPUs available  to  the  step  on  this
2765                             node.   NOTE:  The select/linear plugin allocates
2766                             entire nodes to jobs, so the value indicates  the
2767                             total  count  of  CPUs  on the node.  For the se‐
2768                             lect/cons_res and select/cons_tres plugins, this number
2769                             indicates  the  number of CPUs on this node allo‐
2770                             cated to the step.
2771
2772       SLURM_CPUS_PER_TASK   Number of cpus requested per task.  Only  set  if
2773                             the --cpus-per-task option is specified.
2774
2775       SLURM_DISTRIBUTION    Distribution type for the allocated jobs. Set the
2776                             distribution with -m, --distribution.
2777
2778       SLURM_GPUS_ON_NODE    Number of GPUs available  to  the  step  on  this
2779                             node.
2780
2781       SLURM_GTIDS           Global  task IDs running on this node.  Zero ori‐
2782                             gin and comma separated.  It is  read  internally
2783                             by pmi if Slurm was built with pmi support. Leav‐
2784                             ing the variable set may cause problems when  us‐
2785                             ing external packages from within the job (Abaqus
2786                             and Ansys have been known to have  problems  when
2787                             it is set - consult the appropriate documentation
2788                             for 3rd party software).
2789
2790       SLURM_HET_SIZE        Set to count of components in heterogeneous job.
2791
2792       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2793
2794       SLURM_JOB_CPUS_PER_NODE
2795                             Count of CPUs available to the job on  the  nodes
2796                             in    the    allocation,    using    the   format
2797                             CPU_count[(xnumber_of_nodes)][,CPU_count  [(xnum‐
2798                             ber_of_nodes)]      ...].       For      example:
2799                             SLURM_JOB_CPUS_PER_NODE='72(x2),36'     indicates
2800                             that  on the first and second nodes (as listed by
2801                             SLURM_JOB_NODELIST) the allocation has  72  CPUs,
2802                             while  the third node has 36 CPUs.  NOTE: The se‐
2803                             lect/linear  plugin  allocates  entire  nodes  to
2804                             jobs,  so  the value indicates the total count of
2805                             CPUs on allocated nodes. The select/cons_res  and
2806                             select/cons_tres plugins allocate individual CPUs
2807                             to jobs, so this number indicates the  number  of
2808                             CPUs allocated to the job.
2809
2810       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.
2811
2812       SLURM_JOB_ID          Job id of the executing job.
2813
2814       SLURM_JOB_NAME        Set  to the value of the --job-name option or the
2815                             command name when srun is used to  create  a  new
2816                             job allocation. Not set when srun is used only to
2817                             create a job step (i.e. within  an  existing  job
2818                             allocation).
2819
2820       SLURM_JOB_NODELIST    List of nodes allocated to the job.
2821
2822       SLURM_JOB_NODES       Total number of nodes in the job's resource allo‐
2823                             cation.
2824
2825       SLURM_JOB_PARTITION   Name of the partition in which the  job  is  run‐
2826                             ning.
2827
2828       SLURM_JOB_QOS         Quality Of Service (QOS) of the job allocation.
2829
2830       SLURM_JOB_RESERVATION Advanced  reservation  containing the job alloca‐
2831                             tion, if any.
2832
2833       SLURM_JOBID           Job id of the executing  job.  See  SLURM_JOB_ID.
2834                             Included for backwards compatibility.
2835
2836       SLURM_LAUNCH_NODE_IPADDR
2837                             IP address of the node from which the task launch
2838                             was initiated (where the srun command ran from).
2839
2840       SLURM_LOCALID         Node local task ID for the process within a job.
2841
2842       SLURM_MEM_BIND_LIST   --mem-bind map or mask  list  (<list  of  IDs  or
2843                             masks for this node>).
2844
2845       SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2846
2847       SLURM_MEM_BIND_SORT   Sort  free cache pages (run zonesort on Intel KNL
2848                             nodes).
2849
2850       SLURM_MEM_BIND_TYPE   --mem-bind type (none,rank,map_mem:,mask_mem:).
2851
2852       SLURM_MEM_BIND_VERBOSE
2853                             --mem-bind verbosity (quiet,verbose).
2854
2855       SLURM_NODE_ALIASES    Sets of  node  name,  communication  address  and
2856                             hostname  for nodes allocated to the job from the
2857                             cloud. Each element in the set is colon separated
2858                             and each set is comma separated. For example:
2859                             SLURM_NODE_ALIASES=
2860                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2861
2862       SLURM_NODEID          The relative node ID of the current node.
2863
2864       SLURM_NPROCS          Total  number  of processes in the current job or
2865                             job step. See SLURM_NTASKS.  Included  for  back‐
2866                             wards compatibility.
2867
2868       SLURM_NTASKS          Total  number  of processes in the current job or
2869                             job step.
2870
2871       SLURM_OVERCOMMIT      Set to 1 if --overcommit was specified.
2872
2873       SLURM_PRIO_PROCESS    The scheduling priority (nice value) at the  time
2874                             of  job  submission.  This value is propagated to
2875                             the spawned processes.
2876
2877       SLURM_PROCID          The MPI rank (or relative process ID) of the cur‐
2878                             rent process.
2879
2880       SLURM_SRUN_COMM_HOST  IP address of srun communication host.
2881
2882       SLURM_SRUN_COMM_PORT  srun communication port.
2883
2884       SLURM_CONTAINER       OCI  Bundle  for job.  Only set if --container is
2885                             specified.
2886
2887       SLURM_STEP_ID         The step ID of the current job.
2888
2889       SLURM_STEP_LAUNCHER_PORT
2890                             Step launcher port.
2891
2892       SLURM_STEP_NODELIST   List of nodes allocated to the step.
2893
2894       SLURM_STEP_NUM_NODES  Number of nodes allocated to the step.
2895
2896       SLURM_STEP_NUM_TASKS  Number of processes in the job step or whole het‐
2897                             erogeneous job step.
2898
2899       SLURM_STEP_TASKS_PER_NODE
2900                             Number of processes per node within the step.
2901
2902       SLURM_STEPID          The   step   ID   of   the   current   job.   See
2903                             SLURM_STEP_ID. Included for backwards compatibil‐
2904                             ity.
2905
2906       SLURM_SUBMIT_DIR      The directory from which the allocation was  in‐
2907                             voked.
2908
2909       SLURM_SUBMIT_HOST     The hostname of the computer from which the allo‐
2910                             cation was invoked.
2911
2912       SLURM_TASK_PID        The process ID of the task being started.
2913
2914       SLURM_TASKS_PER_NODE  Number  of  tasks  to  be initiated on each node.
2915                             Values are comma separated and in the same  order
2916                             as  SLURM_JOB_NODELIST.   If two or more consecu‐
2917                             tive nodes are to have the same task count,  that
2918                             count is followed by "(x#)" where "#" is the rep‐
2919                             etition        count.        For         example,
2920                             "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2921                             first three nodes will each execute two tasks and
2922                             the fourth node will execute one task.
2923
2924       SLURM_TOPOLOGY_ADDR   This  is  set  only  if the system has the topol‐
2925                             ogy/tree plugin configured.  The  value  will  be
2926                             set to the names of the network switches which may
2927                             be involved in the job's communications, from the
2928                             system's top level switch down to the leaf switch
2929                             and ending with the node name.  A period is  used
2930                             to separate each hardware component name.
2931
2932       SLURM_TOPOLOGY_ADDR_PATTERN
2933                             This  is  set  only  if the system has the topol‐
2934                             ogy/tree plugin configured.  The  value  will  be
2935                             set to the component types listed in SLURM_TOPOL‐
2936                             OGY_ADDR.  Each component will be  identified  as
2937                             either  "switch"  or "node".  A period is used to
2938                             separate each hardware component type.
2939
2940       SLURM_UMASK           The umask in effect when the job was submitted.
2941
2942       SLURMD_NODENAME       Name of the node running the task. In the case of
2943                             a  parallel  job  executing  on  multiple compute
2944                             nodes, the various tasks will have this  environ‐
2945                             ment  variable  set  to  different values on each
2946                             compute node.
2947
2948       SRUN_DEBUG            Set to the logging level  of  the  srun  command.
2949                             Default  value  is  3 (info level).  The value is
2950                             incremented or decremented based upon the  --ver‐
2951                             bose and --quiet options.
2952
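       For example, a short script can report several of the variables
       described above from within a job step. This is a minimal sketch:
       the script name "show_env.sh" is hypothetical, and the node names
       and values shown are illustrative and will differ between sites.

       $ cat show_env.sh
       #!/bin/bash
       # Each task prints its rank, the total task count and the node on
       # which it is running.
       echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME" \
            "(local id $SLURM_LOCALID, node id $SLURM_NODEID)"

       $ srun -N2 -n4 -l ./show_env.sh
       0: task 0 of 4 on dev0 (local id 0, node id 0)
       1: task 1 of 4 on dev0 (local id 1, node id 0)
       2: task 2 of 4 on dev1 (local id 0, node id 1)
       3: task 3 of 4 on dev1 (local id 1, node id 1)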

SIGNALS AND ESCAPE SEQUENCES

2954       Signals  sent  to  the  srun command are automatically forwarded to the
2955       tasks it is controlling with a  few  exceptions.  The  escape  sequence
2956       <control-c> will report the state of all tasks associated with the srun
2957       command. If <control-c> is entered twice within one  second,  then  the
2958       associated  SIGINT  signal  will be sent to all tasks and a termination
2959       sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL  to  all
2960       spawned  tasks.   If  a third <control-c> is received, the srun program
2961       will be terminated without waiting for remote tasks to  exit  or  their
2962       I/O to complete.
2963
2964       The escape sequence <control-z> is presently ignored.
2965
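       For example, a task that traps the forwarded signals can be observed
       with a script like the following. This is a minimal sketch: the
       script name "trap.sh" is hypothetical and the exact behavior depends
       on site configuration.

       $ cat trap.sh
       #!/bin/bash
       # Report forwarded signals, then exit. "sleep ... & wait" is used
       # so that the shell can run its trap handlers while waiting.
       trap 'echo "task $SLURM_PROCID caught SIGINT";  exit 0' INT
       trap 'echo "task $SLURM_PROCID caught SIGTERM"; exit 0' TERM
       sleep 600 &
       wait

       $ srun -n2 ./trap.sh

       A single <control-c> reports the state of both tasks; a second one
       entered within one second forwards SIGINT to the tasks, which should
       then print the message from the handler above and exit.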
2966

MPI SUPPORT

2968       MPI  use depends upon the type of MPI being used.  There are three fun‐
2969       damentally different modes of operation used by these various  MPI  im‐
2970       plementations.
2971
2972       1.  Slurm  directly  launches  the tasks and performs initialization of
2973       communications through the PMI2 or PMIx APIs.  For example: "srun  -n16
2974       a.out".
2975
2976       2.  Slurm  creates  a  resource  allocation for the job and then mpirun
2977       launches tasks using Slurm's infrastructure (OpenMPI).
2978
2979       3. Slurm creates a resource allocation for  the  job  and  then  mpirun
2980       launches  tasks  using  some mechanism other than Slurm, such as SSH or
2981       RSH.  These tasks are initiated outside of Slurm's monitoring  or  con‐
2982       trol. Slurm's epilog should be configured to purge these tasks when the
2983       job's allocation is relinquished; alternatively, the use  of  the
2984       pam_slurm_adopt PAM module is highly recommended.
2985
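       For example, modes 1 and 2 above might be used as follows. This is a
       minimal sketch: it assumes an MPI application named "a.out", a Slurm
       build with PMIx support, and an Open MPI installation built with
       Slurm support.

       # Mode 1: srun launches the tasks directly and initializes
       # communications through PMIx.
       $ srun --mpi=pmix -n16 a.out

       # Mode 2: create the allocation, then let mpirun launch the tasks
       # using Slurm's infrastructure.
       $ salloc -n16 mpirun a.out
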
2986       See  https://slurm.schedmd.com/mpi_guide.html  for  more information on
2987       use of these various MPI implementations with Slurm.
2988
2989

MULTIPLE PROGRAM CONFIGURATION

2991       Comments in the configuration file must have a "#" in column one.   The
2992       configuration  file  contains  the  following fields separated by white
2993       space:
2994
2995
2996       Task rank
2997              One or more task ranks to use this configuration.  Multiple val‐
2998              ues  may  be  comma separated.  Ranges may be indicated with two
2999              numbers separated with a '-' with the smaller number first (e.g.
3000              "0-4" and not "4-0").  To indicate all tasks not otherwise spec‐
3001              ified, specify a rank of '*' as the last line of the  file.   If
3002              an  attempt  is  made to initiate a task for which no executable
3003              program is defined, the following error message will be produced
3004              "No executable program specified for this task".
3005
3006       Executable
3007              The  name  of  the  program  to execute.  May be fully qualified
3008              pathname if desired.
3009
3010       Arguments
3011              Program arguments.  The expression "%t" will  be  replaced  with
3012              the  task's  number.   The expression "%o" will be replaced with
3013              the task's offset within this range (e.g. a configured task rank
3014              value  of  "1-5"  would  have  offset  values of "0-4").  Single
3015              quotes may be used to avoid having the  enclosed  values  inter‐
3016              preted.   This field is optional.  Any arguments for the program
3017              entered on the command line will be added to the arguments spec‐
3018              ified in the configuration file.
3019
3020       For example:
3021
3022       $ cat silly.conf
3023       ###################################################################
3024       # srun multiple program configuration file
3025       #
3026       # srun -n8 -l --multi-prog silly.conf
3027       ###################################################################
3028       4-6       hostname
3029       1,7       echo  task:%t
3030       0,2-3     echo  offset:%o
3031
3032       $ srun -n8 -l --multi-prog silly.conf
3033       0: offset:0
3034       1: task:1
3035       2: offset:1
3036       3: offset:2
3037       4: linux15.llnl.gov
3038       5: linux16.llnl.gov
3039       6: linux17.llnl.gov
3040       7: task:7
3041
3042

EXAMPLES

3044       This  simple example demonstrates the execution of the command hostname
3045       in eight tasks. At least eight processors will be allocated to the  job
3046       (the same as the task count) on however many nodes are required to sat‐
3047       isfy the request. The output of each task will be  preceded  by  its
3048       task  number.  (The machine "dev" in the example below has a total of
3049       two CPUs per node.)
3050
3051       $ srun -n8 -l hostname
3052       0: dev0
3053       1: dev0
3054       2: dev1
3055       3: dev1
3056       4: dev2
3057       5: dev2
3058       6: dev3
3059       7: dev3
3060
3061
3062       The srun -r option is used within a job script to run two job steps  on
3063       disjoint  nodes  in the following example. The script is run in allo‐
3064       cation mode (via salloc) rather than as a batch job in this case.
3065
3066       $ cat test.sh
3067       #!/bin/sh
3068       echo $SLURM_JOB_NODELIST
3069       srun -lN2 -r2 hostname
3070       srun -lN2 hostname
3071
3072       $ salloc -N4 test.sh
3073       dev[7-10]
3074       0: dev9
3075       1: dev10
3076       0: dev7
3077       1: dev8
3078
3079
3080       The following script runs two job steps in parallel within an allocated
3081       set of nodes.
3082
3083       $ cat test.sh
3084       #!/bin/bash
3085       srun -lN2 -n4 -r 2 sleep 60 &
3086       srun -lN2 -r 0 sleep 60 &
3087       sleep 1
3088       squeue
3089       squeue -s
3090       wait
3091
3092       $ salloc -N4 test.sh
3093         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
3094         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]
3095
3096       STEPID     PARTITION     USER      TIME NODELIST
3097       65641.0        batch   grondo      0:01 dev[7-8]
3098       65641.1        batch   grondo      0:01 dev[9-10]
3099
3100
3101       This  example  demonstrates  how one executes a simple MPI job.  We use
3102       srun to build a list of machines (nodes) to be used by  mpirun  in  its
3103       required  format.  A  sample command line and the script to be executed
3104       follow.
3105
3106       $ cat test.sh
3107       #!/bin/sh
3108       MACHINEFILE="nodes.$SLURM_JOB_ID"
3109
3110       # Generate Machinefile for mpi such that hosts are in the same
3111       #  order as if run via srun
3112       #
3113       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3114
3115       # Run using generated Machine file:
3116       mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3117
3118       rm $MACHINEFILE
3119
3120       $ salloc -N2 -n4 test.sh
3121
3122
3123       This simple example demonstrates the execution  of  different  jobs  on
3124       different nodes within the same srun invocation.  You can do  this  for
3125       any number of nodes or any number of jobs.  The executable run on  each
3126       node is selected by the SLURM_NODEID environment variable, which ranges
3127       from 0 to one less than the number of nodes given on the srun command line.
3128
3129       $ cat test.sh
3130       case $SLURM_NODEID in
3131           0) echo "I am running on "
3132              hostname ;;
3133           1) hostname
3134              echo "is where I am running" ;;
3135       esac
3136
3137       $ srun -N2 test.sh
3138       dev0
3139       is where I am running
3140       I am running on
3141       dev1
3142
3143
3144       This example demonstrates use of multi-core options to  control  layout
3145       of  tasks.   We  request  that  four sockets per node and two cores per
3146       socket be dedicated to the job.
3147
3148       $ srun -N2 -B 4-4:2-2 a.out
3149
3150
3151       This example shows a script in which Slurm is used to provide  resource
3152       management  for  a job by executing the various job steps as processors
3153       become available for their dedicated use.
3154
3155       $ cat my.script
3156       #!/bin/bash
3157       srun -n4 prog1 &
3158       srun -n3 prog2 &
3159       srun -n1 prog3 &
3160       srun -n1 prog4 &
3161       wait
3162
3163
3164       This example shows how to launch an application  called  "server"  with
3165       one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3166       cation called "client" with 16 tasks, 1 CPU per task (the default)  and
3167       1 GB of memory per task.
3168
3169       $ srun -n1 -c8 --mem-per-cpu=2gb server : -n16 --mem-per-cpu=1gb client
3170
3171

COPYING

3173       Copyright  (C)  2006-2007  The Regents of the University of California.
3174       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3175       Copyright (C) 2008-2010 Lawrence Livermore National Security.
3176       Copyright (C) 2010-2022 SchedMD LLC.
3177
3178       This file is part of Slurm, a resource  management  program.   For  de‐
3179       tails, see <https://slurm.schedmd.com/>.
3180
3181       Slurm  is free software; you can redistribute it and/or modify it under
3182       the terms of the GNU General Public License as published  by  the  Free
3183       Software  Foundation;  either version 2 of the License, or (at your op‐
3184       tion) any later version.
3185
3186       Slurm is distributed in the hope that it will be  useful,  but  WITHOUT
3187       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
3188       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
3189       for more details.
3190
3191

SEE ALSO

3193       salloc(1),  sattach(1),  sbatch(1), sbcast(1), scancel(1), scontrol(1),
3194       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3195
3196
3197
3198April 2022                      Slurm Commands                         srun(1)