srun(1)                         Slurm Commands                         srun(1)

NAME

       srun - Run parallel jobs

SYNOPSIS

       srun  [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
       executable(N) [args(N)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html
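
       For example, the following hypothetical command (the program name
       "my_app" and the resource counts are illustrative only) requests a
       two-component heterogeneous job whose first component runs one task
       with four CPUs and whose second component runs eight single-CPU
       tasks:

          srun -n1 --cpus-per-task=4 : -n8 --cpus-per-task=1 my_app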
16
17

DESCRIPTION

       Run a parallel job on a cluster managed by Slurm.  If necessary, srun
       will first create a resource allocation in which to run the parallel
       job.
22
23       The  following  document  describes the influence of various options on
24       the allocation of cpus to jobs and tasks.
25       https://slurm.schedmd.com/cpu_management.html
26
27

RETURN VALUE

29       srun will return the highest exit code of all tasks run or the  highest
30       signal  (with  the high-order bit set in an 8-bit integer -- e.g. 128 +
31       signal) of any task that exited with a signal.
32       The value 253 is reserved for out-of-memory errors.
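
       For example, assuming a hypothetical program "my_app", if the highest
       exit code among its tasks were 2, srun would return 2; if one of its
       tasks were instead terminated by SIGSEGV (signal 11), srun would
       return 139 (128 + 11):

          srun -n4 my_app
          echo $?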
33
34

EXECUTABLE PATH RESOLUTION

36       The executable is resolved in the following order:
37
38       1. If executable starts with ".", then path is constructed as:  current
39       working directory / executable
40       2. If executable starts with a "/", then path is considered absolute.
41       3. If executable can be resolved through PATH. See path_resolution(7).
42       4. If executable is in current working directory.
43
44       Current  working directory is the calling process working directory un‐
45       less the --chdir argument is passed, which will  override  the  current
46       working directory.
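
       For example, with a hypothetical script "run.sh" located in
       /scratch/demo, the leading "." causes the executable to be resolved
       relative to the directory given by --chdir rather than the caller's
       working directory:

          srun --chdir=/scratch/demo ./run.sh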
47
48

OPTIONS

50       --accel-bind=<options>
51              Control how tasks are bound to generic resources of type gpu and
52              nic.  Multiple options may be specified. Supported  options  in‐
53              clude:
54
55              g      Bind each task to GPUs which are closest to the allocated
56                     CPUs.
57
58              n      Bind each task to NICs which are closest to the allocated
59                     CPUs.
60
61              v      Verbose  mode. Log how tasks are bound to GPU and NIC de‐
62                     vices.
63
64              This option applies to job allocations.
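
              For example, the following hypothetical step launch (the
              program name "my_app" is illustrative) binds each task to the
              GPUs closest to its allocated CPUs; the v option could be
              added for verbose logging as described above:

                 srun --accel-bind=g -n4 my_app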
65
66
67       -A, --account=<account>
68              Charge resources used by this job to specified account.  The ac‐
69              count  is  an  arbitrary string. The account name may be changed
70              after job submission using the scontrol command. This option ap‐
71              plies to job allocations.
72
73
74       --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
75              Define  the  job  accounting and profiling sampling intervals in
76              seconds.  This can be used  to  override  the  JobAcctGatherFre‐
77              quency  parameter  in the slurm.conf file. <datatype>=<interval>
78              specifies the task  sampling  interval  for  the  jobacct_gather
79              plugin  or  a  sampling  interval  for  a  profiling type by the
80              acct_gather_profile     plugin.     Multiple     comma-separated
81              <datatype>=<interval> pairs may be specified. Supported datatype
82              values are:
83
84              task        Sampling interval for the jobacct_gather plugins and
85                          for   task   profiling  by  the  acct_gather_profile
86                          plugin.
87                          NOTE: This frequency is used to monitor  memory  us‐
88                          age.  If memory limits are enforced the highest fre‐
89                          quency a user can request is what is  configured  in
90                          the slurm.conf file.  It can not be disabled.
91
92              energy      Sampling  interval  for  energy  profiling using the
93                          acct_gather_energy plugin.
94
95              network     Sampling interval for infiniband profiling using the
96                          acct_gather_interconnect plugin.
97
98              filesystem  Sampling interval for filesystem profiling using the
99                          acct_gather_filesystem plugin.
100
101
102              The default value for the task sampling interval is 30  seconds.
103              The  default value for all other intervals is 0.  An interval of
104              0 disables sampling of the specified type.  If the task sampling
105              interval  is  0, accounting information is collected only at job
106              termination (reducing Slurm interference with the job).
107              Smaller (non-zero) values have a greater impact upon job perfor‐
108              mance,  but a value of 30 seconds is not likely to be noticeable
109              for applications having less than 10,000 tasks. This option  ap‐
110              plies to job allocations.
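
              For example, assuming the relevant accounting plugins are
              configured, the following hypothetical invocation (the program
              name "my_app" is illustrative) samples task usage every 10
              seconds and energy usage every 60 seconds:

                 srun --acctg-freq=task=10,energy=60 -n2 my_app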
111
112
113       --bb=<spec>
114              Burst  buffer  specification.  The  form of the specification is
115              system dependent.  Also see --bbf. This option  applies  to  job
116              allocations.   When  the  --bb option is used, Slurm parses this
117              option and creates a temporary burst buffer script file that  is
118              used  internally  by the burst buffer plugins. See Slurm's burst
119              buffer guide for more information and examples:
120              https://slurm.schedmd.com/burst_buffer.html
121
122
123       --bbf=<file_name>
124              Path of file containing burst buffer specification.  The form of
125              the  specification is system dependent.  Also see --bb. This op‐
126              tion applies to job allocations.  See Slurm's burst buffer guide
127              for more information and examples:
128              https://slurm.schedmd.com/burst_buffer.html
129
130
131       --bcast[=<dest_path>]
132              Copy executable file to allocated compute nodes.  If a file name
133              is specified, copy the executable to the  specified  destination
134              file path.  If the path specified ends with '/' it is treated as
135              a target directory,  and  the  destination  file  name  will  be
136              slurm_bcast_<job_id>.<step_id>_<nodename>.   If  no dest_path is
137              specified and the slurm.conf BcastParameters DestDir is  config‐
138              ured  then  it  is used, and the filename follows the above pat‐
139              tern. If none of the previous  is  specified,  then  --chdir  is
140              used, and the filename follows the above pattern too.  For exam‐
141              ple, "srun --bcast=/tmp/mine  -N3  a.out"  will  copy  the  file
142              "a.out"  from  your current directory to the file "/tmp/mine" on
143              each of the three allocated compute nodes and execute that file.
144              This option applies to step allocations.
145
146
147       --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
148              Comma-separated  list of absolute directory paths to be excluded
149              when autodetecting and broadcasting executable shared object de‐
150              pendencies through --bcast. If the keyword "NONE" is configured,
151              no directory paths will be excluded. The default value  is  that
152              of  slurm.conf  BcastExclude  and  this option overrides it. See
153              also --bcast and --send-libs.
154
155
156       -b, --begin=<time>
157              Defer initiation of this job until the specified time.   It  ac‐
158              cepts times of the form HH:MM:SS to run a job at a specific time
159              of day (seconds are optional).  (If that time is  already  past,
160              the  next day is assumed.)  You may also specify midnight, noon,
161              fika (3 PM) or teatime (4 PM) and you  can  have  a  time-of-day
162              suffixed  with  AM  or  PM  for  running  in  the morning or the
163              evening.  You can also say what day the  job  will  be  run,  by
              specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
165              Combine   date   and   time   using   the    following    format
166              YYYY-MM-DD[THH:MM[:SS]].  You  can  also  give  times like now +
167              count time-units, where the time-units can be seconds (default),
168              minutes, hours, days, or weeks and you can tell Slurm to run the
169              job today with the keyword today and to  run  the  job  tomorrow
170              with  the  keyword tomorrow.  The value may be changed after job
171              submission using the scontrol command.  For example:
172                 --begin=16:00
173                 --begin=now+1hour
174                 --begin=now+60           (seconds by default)
175                 --begin=2010-01-20T12:34:00
176
177
178              Notes on date/time specifications:
179               - Although the 'seconds' field of the HH:MM:SS time  specifica‐
180              tion  is  allowed  by  the  code, note that the poll time of the
181              Slurm scheduler is not precise enough to guarantee  dispatch  of
182              the  job on the exact second.  The job will be eligible to start
183              on the next poll following the specified time.  The  exact  poll
184              interval  depends  on the Slurm scheduler (e.g., 60 seconds with
185              the default sched/builtin).
186               -  If  no  time  (HH:MM:SS)  is  specified,  the   default   is
187              (00:00:00).
188               -  If a date is specified without a year (e.g., MM/DD) then the
189              current year is assumed, unless the  combination  of  MM/DD  and
190              HH:MM:SS  has  already  passed  for that year, in which case the
191              next year is used.
192              This option applies to job allocations.
193
194
195       -D, --chdir=<path>
196              Have the remote processes do a chdir to  path  before  beginning
197              execution. The default is to chdir to the current working direc‐
198              tory of the srun process. The path can be specified as full path
199              or relative path to the directory where the command is executed.
200              This option applies to job allocations.
201
202
203       --cluster-constraint=<list>
204              Specifies features that a federated cluster must have to have  a
205              sibling job submitted to it. Slurm will attempt to submit a sib‐
206              ling job to a cluster if it has at least one  of  the  specified
207              features.
208
209
210       -M, --clusters=<string>
211              Clusters  to  issue  commands to.  Multiple cluster names may be
212              comma separated.  The job will be submitted to the  one  cluster
213              providing the earliest expected job initiation time. The default
214              value is the current cluster. A value of 'all' will query to run
215              on  all  clusters.  Note the --export option to control environ‐
216              ment variables exported between clusters.  This  option  applies
217              only  to job allocations.  Note that the SlurmDBD must be up for
218              this option to work properly.
219
220
221       --comment=<string>
222              An arbitrary comment. This option applies to job allocations.
223
224
225       --compress[=type]
226              Compress file before sending it to compute hosts.  The  optional
227              argument specifies the data compression library to be used.  The
228              default is BcastParameters Compression= if set or  "lz4"  other‐
229              wise.   Supported  values are "lz4".  Some compression libraries
230              may be unavailable on some systems.  For use  with  the  --bcast
231              option. This option applies to step allocations.
232
233
234       -C, --constraint=<list>
235              Nodes  can  have features assigned to them by the Slurm adminis‐
236              trator.  Users can specify which of these features are  required
237              by  their  job  using  the constraint option.  Only nodes having
238              features matching the job constraints will be  used  to  satisfy
239              the  request.   Multiple  constraints may be specified with AND,
240              OR, matching OR, resource counts, etc. (some operators  are  not
241              supported  on  all  system types).  Supported constraint options
242              include:
243
244              Single Name
245                     Only nodes which have the specified feature will be used.
246                     For example, --constraint="intel"
247
248              Node Count
249                     A  request  can  specify  the number of nodes needed with
250                     some feature by appending an asterisk and count after the
251                     feature    name.     For   example,   --nodes=16   --con‐
252                     straint="graphics*4 ..."  indicates that the job requires
253                     16  nodes and that at least four of those nodes must have
254                     the feature "graphics."
255
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
                     example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR opera‐
                     tor.  For example, --constraint="intel|amd"
263
264              Matching OR
265                     If only one of a set of possible options should  be  used
266                     for all allocated nodes, then use the OR operator and en‐
267                     close the options within square brackets.   For  example,
268                     --constraint="[rack1|rack2|rack3|rack4]" might be used to
269                     specify that all nodes must be allocated on a single rack
270                     of the cluster, but any of those four racks can be used.
271
272              Multiple Counts
273                     Specific counts of multiple resources may be specified by
274                     using the AND operator and enclosing the  options  within
275                     square      brackets.       For      example,      --con‐
276                     straint="[rack1*2&rack2*4]" might be used to specify that
277                     two  nodes  must be allocated from nodes with the feature
278                     of "rack1" and four nodes must be  allocated  from  nodes
279                     with the feature "rack2".
280
281                     NOTE:  This construct does not support multiple Intel KNL
282                     NUMA  or  MCDRAM  modes.  For   example,   while   --con‐
283                     straint="[(knl&quad)*2&(knl&hemi)*4]"  is  not supported,
284                     --constraint="[haswell*2&(knl&hemi)*4]"   is   supported.
285                     Specification of multiple KNL modes requires the use of a
286                     heterogeneous job.
287
288              Brackets
289                     Brackets can be used to indicate that you are looking for
290                     a  set of nodes with the different requirements contained
291                     within    the    brackets.    For     example,     --con‐
292                     straint="[(rack1|rack2)*1&(rack3)*2]"  will  get  you one
293                     node with either the "rack1" or "rack2" features and  two
294                     nodes with the "rack3" feature.  The same request without
295                     the brackets will try to find a single  node  that  meets
296                     those requirements.
297
298                     NOTE:  Brackets are only reserved for Multiple Counts and
299                     Matching OR syntax.  AND operators require  a  count  for
300                     each     feature    inside    square    brackets    (i.e.
301                     "[quad*2&hemi*1]").
302
              Parentheses
                     Parentheses can be used to group like node features to‐
                     gether.  For example, --con‐
                     straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
                     specify that four nodes with the features "knl", "snc4"
                     and "flat" plus one node with the feature "haswell" are
                     required.  All options within parentheses should be
                     grouped with AND (e.g. "&") operators.
311
312              WARNING: When srun is executed from within salloc or sbatch, the
313              constraint value can only contain a single feature name. None of
314              the other operators are currently supported for job steps.
315              This option applies to job and step allocations.
316
317
318       --container=<path_to_container>
319              Absolute path to OCI container bundle.
320
321
322       --contiguous
323              If set, then the allocated nodes must form a contiguous set.
324
325              NOTE: If SelectPlugin=cons_res this option won't be honored with
326              the  topology/tree  or  topology/3d_torus plugins, both of which
327              can modify the node ordering. This option applies to job alloca‐
328              tions.
329
330
331       -S, --core-spec=<num>
332              Count of specialized cores per node reserved by the job for sys‐
333              tem operations and not used by the application. The  application
334              will  not use these cores, but will be charged for their alloca‐
335              tion.  Default value is dependent  upon  the  node's  configured
336              CoreSpecCount  value.   If a value of zero is designated and the
337              Slurm configuration option AllowSpecResourcesUsage  is  enabled,
338              the  job  will  be allowed to override CoreSpecCount and use the
339              specialized resources on nodes it is allocated.  This option can
340              not  be  used with the --thread-spec option. This option applies
341              to job allocations.
342              NOTE: This option may implicitly impact the number of  tasks  if
343              -n was not specified.
344
345
346       --cores-per-socket=<cores>
347              Restrict  node  selection  to  nodes with at least the specified
              number of cores per socket.  See additional information under
              the -B option below when the task/affinity plugin is enabled.
              This option applies to job allocations.
351
352
353       --cpu-bind=[{quiet|verbose},]<type>
354              Bind tasks  to  CPUs.   Used  only  when  the  task/affinity  or
355              task/cgroup  plugin  is enabled.  NOTE: To have Slurm always re‐
356              port on the selected CPU binding for all commands executed in  a
357              shell, you can enable verbose mode by setting the SLURM_CPU_BIND
358              environment variable value to "verbose".
359
360              The following informational environment variables are  set  when
361              --cpu-bind is in use:
362                   SLURM_CPU_BIND_VERBOSE
363                   SLURM_CPU_BIND_TYPE
364                   SLURM_CPU_BIND_LIST
365
366              See  the  ENVIRONMENT  VARIABLES section for a more detailed de‐
367              scription of  the  individual  SLURM_CPU_BIND  variables.  These
              variables are available only if the task/affinity plugin is con‐
369              figured.
370
371              When using --cpus-per-task to run multithreaded tasks, be  aware
372              that  CPU  binding  is inherited from the parent of the process.
373              This means that the multithreaded task should either specify  or
374              clear  the CPU binding itself to avoid having all threads of the
375              multithreaded task use the same mask/CPU as the parent.   Alter‐
376              natively,  fat  masks (masks which specify more than one allowed
377              CPU) could be used for the tasks in order  to  provide  multiple
378              CPUs for the multithreaded tasks.
379
380              Note  that a job step can be allocated different numbers of CPUs
381              on each node or be allocated CPUs not starting at location zero.
382              Therefore  one  of  the options which automatically generate the
383              task binding is  recommended.   Explicitly  specified  masks  or
384              bindings  are  only honored when the job step has been allocated
385              every available CPU on the node.
386
387              Binding a task to a NUMA locality domain means to bind the  task
388              to  the  set  of CPUs that belong to the NUMA locality domain or
389              "NUMA node".  If NUMA locality domain options are used  on  sys‐
390              tems  with no NUMA support, then each socket is considered a lo‐
391              cality domain.
392
393              If the --cpu-bind option is not used, the default  binding  mode
394              will  depend  upon Slurm's configuration and the step's resource
395              allocation.  If all allocated nodes  have  the  same  configured
396              CpuBind  mode, that will be used.  Otherwise if the job's Parti‐
397              tion has a configured CpuBind mode, that will be  used.   Other‐
398              wise  if Slurm has a configured TaskPluginParam value, that mode
399              will be used.  Otherwise automatic binding will be performed  as
400              described below.
401
402
403              Auto Binding
404                     Applies  only  when  task/affinity is enabled. If the job
405                     step allocation includes an allocation with a  number  of
406                     sockets,  cores,  or threads equal to the number of tasks
407                     times cpus-per-task, then the tasks will  by  default  be
408                     bound  to  the appropriate resources (auto binding). Dis‐
409                     able  this  mode  of  operation  by  explicitly   setting
410                     "--cpu-bind=none".        Use       TaskPluginParam=auto‐
411                     bind=[threads|cores|sockets] to set a default cpu binding
412                     in case "auto binding" doesn't find a match.
413
414              Supported options include:
415
416                     q[uiet]
417                            Quietly bind before task runs (default)
418
419                     v[erbose]
420                            Verbosely report binding before task runs
421
422                     no[ne] Do  not  bind  tasks  to CPUs (default unless auto
423                            binding is applied)
424
425                     rank   Automatically bind by task rank.  The lowest  num‐
426                            bered  task  on  each  node is bound to socket (or
427                            core or thread) zero, etc.  Not  supported  unless
428                            the entire node is allocated to the job.
429
430                     map_cpu:<list>
431                            Bind  by  setting CPU masks on tasks (or ranks) as
432                            specified         where         <list>          is
433                            <cpu_id_for_task_0>,<cpu_id_for_task_1>,...    CPU
434                            IDs are interpreted as decimal values unless  they
                            are preceded with '0x' in which case they are in‐
                            terpreted as hexadecimal values.  If the number of
437                            tasks (or ranks) exceeds the number of elements in
438                            this list, elements in the list will be reused  as
439                            needed  starting  from  the beginning of the list.
440                            To simplify support for  large  task  counts,  the
441                            lists  may follow a map with an asterisk and repe‐
442                            tition         count.          For         example
443                            "map_cpu:0x0f*4,0xf0*4".
444
445                     mask_cpu:<list>
446                            Bind  by  setting CPU masks on tasks (or ranks) as
447                            specified         where         <list>          is
448                            <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
449                            The mapping is specified for a node and  identical
450                            mapping  is  applied  to  the  tasks on every node
451                            (i.e. the lowest task ID on each node is mapped to
452                            the  first mask specified in the list, etc.).  CPU
453                            masks are always interpreted as hexadecimal values
454                            but can be preceded with an optional '0x'.  If the
455                            number of tasks (or ranks) exceeds the  number  of
456                            elements  in  this list, elements in the list will
457                            be reused as needed starting from the beginning of
458                            the  list.   To  simplify  support  for large task
459                            counts, the lists may follow a map with an  aster‐
460                            isk    and    repetition   count.    For   example
461                            "mask_cpu:0x0f*4,0xf0*4".
462
463                     rank_ldom
464                            Bind to a NUMA locality domain by rank.  Not  sup‐
465                            ported  unless the entire node is allocated to the
466                            job.
467
468                     map_ldom:<list>
469                            Bind by mapping NUMA locality domain IDs to  tasks
470                            as       specified       where      <list>      is
471                            <ldom1>,<ldom2>,...<ldomN>.  The  locality  domain
472                            IDs  are interpreted as decimal values unless they
473                            are preceded with '0x' in which case they are  in‐
474                            terpreted  as  hexadecimal  values.  Not supported
475                            unless the entire node is allocated to the job.
476
477                     mask_ldom:<list>
478                            Bind by setting  NUMA  locality  domain  masks  on
479                            tasks     as    specified    where    <list>    is
480                            <mask1>,<mask2>,...<maskN>.  NUMA locality  domain
481                            masks are always interpreted as hexadecimal values
482                            but can be preceded with an  optional  '0x'.   Not
483                            supported  unless  the entire node is allocated to
484                            the job.
485
486                     sockets
487                            Automatically  generate  masks  binding  tasks  to
488                            sockets.   Only  the CPUs on the socket which have
489                            been allocated to the job will be  used.   If  the
490                            number  of  tasks differs from the number of allo‐
491                            cated sockets this can result in sub-optimal bind‐
492                            ing.
493
494                     cores  Automatically  generate  masks  binding  tasks  to
495                            cores.  If the number of tasks  differs  from  the
496                            number  of  allocated  cores  this  can  result in
497                            sub-optimal binding.
498
499                     threads
500                            Automatically  generate  masks  binding  tasks  to
501                            threads.   If the number of tasks differs from the
502                            number of allocated threads  this  can  result  in
503                            sub-optimal binding.
504
505                     ldoms  Automatically generate masks binding tasks to NUMA
506                            locality domains.  If the number of tasks  differs
507                            from the number of allocated locality domains this
508                            can result in sub-optimal binding.
509
510                     help   Show help message for cpu-bind
511
512              This option applies to job and step allocations.
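
              For example, assuming the task/affinity or task/cgroup plugin
              is enabled, the following hypothetical launch binds each task
              to its own core and reports the generated bindings (the
              program name "my_app" is illustrative):

                 srun --cpu-bind=verbose,cores -n4 my_app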
513
514
515       --cpu-freq=<p1>[-p2[:p3]]
516
517              Request that the job step initiated by this srun command be  run
518              at  some  requested  frequency if possible, on the CPUs selected
519              for the step on the compute node(s).
520
521              p1 can be  [#### | low | medium | high | highm1] which will  set
522              the  frequency scaling_speed to the corresponding value, and set
523              the frequency scaling_governor to UserSpace. See below for defi‐
524              nition of the values.
525
526              p1  can  be  [Conservative | OnDemand | Performance | PowerSave]
527              which will set the scaling_governor to the corresponding  value.
528              The  governor has to be in the list set by the slurm.conf option
529              CpuFreqGovernors.
530
531              When p2 is present, p1 will be the minimum scaling frequency and
532              p2 will be the maximum scaling frequency.
533
              p2 can be [#### | medium | high | highm1].  p2 must be greater
              than p1.
536
537              p3 can be [Conservative | OnDemand | Performance |  PowerSave  |
538              SchedUtil | UserSpace] which will set the governor to the corre‐
539              sponding value.
540
541              If p3 is UserSpace, the frequency scaling_speed will be set by a
542              power  or energy aware scheduling strategy to a value between p1
543              and p2 that lets the job run within the site's power  goal.  The
544              job  may be delayed if p1 is higher than a frequency that allows
545              the job to run within the goal.
546
547              If the current frequency is < min, it will be set to min.  Like‐
548              wise, if the current frequency is > max, it will be set to max.
549
550              Acceptable values at present include:
551
552              ####          frequency in kilohertz
553
554              Low           the lowest available frequency
555
556              High          the highest available frequency
557
558              HighM1        (high  minus  one)  will  select  the next highest
559                            available frequency
560
561              Medium        attempts to set a frequency in the middle  of  the
562                            available range
563
564              Conservative  attempts to use the Conservative CPU governor
565
566              OnDemand      attempts to use the OnDemand CPU governor (the de‐
567                            fault value)
568
569              Performance   attempts to use the Performance CPU governor
570
571              PowerSave     attempts to use the PowerSave CPU governor
572
573              UserSpace     attempts to use the UserSpace CPU governor
574
575
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
                      SLURM_CPU_FREQ_REQ
580
581              This  environment  variable can also be used to supply the value
582              for the CPU frequency request if it is set when the 'srun'  com‐
583              mand  is  issued.  The --cpu-freq on the command line will over‐
              ride the environment variable value.  The form of the environ‐
              ment variable is the same as the command line.  See the ENVIRON‐
586              MENT   VARIABLES   section   for   a    description    of    the
587              SLURM_CPU_FREQ_REQ variable.
588
589              NOTE: This parameter is treated as a request, not a requirement.
590              If the job step's node does not support  setting  the  CPU  fre‐
591              quency,  or the requested value is outside the bounds of the le‐
592              gal frequencies, an error is logged, but the job step is allowed
593              to continue.
594
595              NOTE:  Setting  the  frequency for just the CPUs of the job step
596              implies that the tasks are confined to those CPUs.  If task con‐
597              finement    (i.e.,    TaskPlugin=task/affinity    or    TaskPlu‐
598              gin=task/cgroup with the "ConstrainCores" option) is not config‐
599              ured, this parameter is ignored.
600
601              NOTE:  When  the  step  completes, the frequency and governor of
602              each selected CPU is reset to the previous values.
603
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all accounting information will be present.
608
609              This option applies to job and step allocations.
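
              For example, the following hypothetical invocations (the
              program name "my_app" is illustrative) request, but do not
              require, that the step's CPUs run at 2400000 kilohertz under
              the UserSpace governor, or under the OnDemand governor bounded
              to the low-to-medium frequency range, respectively:

                 srun --cpu-freq=2400000 -n2 my_app
                 srun --cpu-freq=low-medium:OnDemand -n2 my_app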
610
611
612       --cpus-per-gpu=<ncpus>
613              Advise  Slurm  that ensuing job steps will require ncpus proces‐
614              sors per allocated GPU.  Not compatible with the --cpus-per-task
615              option.
616
617
618       -c, --cpus-per-task=<ncpus>
619              Request  that ncpus be allocated per process. This may be useful
620              if the job is multithreaded and requires more than one  CPU  per
621              task  for optimal performance. Explicitly requesting this option
622              implies --exact. The default is one CPU per process and does not
623              imply  --exact.   If  -c  is specified without -n, as many tasks
624              will be allocated per node as possible while satisfying  the  -c
625              restriction.  For  instance on a cluster with 8 CPUs per node, a
626              job request for 4 nodes and 3 CPUs per task may be  allocated  3
627              or  6  CPUs  per node (1 or 2 tasks per node) depending upon re‐
628              source consumption by other jobs. Such a job may  be  unable  to
629              execute more than a total of 4 tasks.
630
631              WARNING:  There  are configurations and options interpreted dif‐
632              ferently by job and job step requests which can result in incon‐
633              sistencies    for   this   option.    For   example   srun   -c2
634              --threads-per-core=1 prog may allocate two cores  for  the  job,
635              but if each of those cores contains two threads, the job alloca‐
636              tion will include four CPUs. The job step allocation  will  then
637              launch two threads per CPU for a total of two tasks.
638
639              WARNING:  When  srun  is  executed from within salloc or sbatch,
640              there are configurations and options which can result in  incon‐
641              sistent  allocations when -c has a value greater than -c on sal‐
642              loc or sbatch.
643
644              This option applies to job and step allocations.
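
              For example, the following hypothetical invocation (the
              program name "my_app" is illustrative) launches eight tasks
              with two CPUs allocated to each task; because -c is given
              explicitly, --exact is implied for the step:

                 srun -n8 -c2 my_app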
645
646
647       --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
              (start > (deadline - time[-min])).  Default is no deadline.
              Valid time formats are:
              HH:MM[:SS] [AM|PM]
              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
              now[+count[seconds(default)|minutes|hours|days|weeks]]

              This option applies only to job allocations.
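
              For example, the following hypothetical invocation (the
              program name "my_app" is illustrative) asks Slurm to remove
              the job if it cannot complete before a deadline two hours from
              now:

                 srun --deadline=now+2hours -n1 my_app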
658
659
660       --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
662              specification  if the job has been eligible to run for less than
663              this time period.  If the job has waited for less than the spec‐
664              ified  period,  it  will  use  only nodes which already have the
665              specified features.  The argument is in units of minutes.  A de‐
666              fault  value  may be set by a system administrator using the de‐
667              lay_boot option of the SchedulerParameters configuration parame‐
668              ter  in the slurm.conf file, otherwise the default value is zero
669              (no delay).
670
671              This option applies only to job allocations.
672
673
674       -d, --dependency=<dependency_list>
              Defer the start of this job until the specified dependencies
              have been satisfied. This option does not apply to job steps
              (executions of srun within an existing salloc or sbatch
              allocation), only to job allocations.  <dependency_list> is of
679              the   form   <type:job_id[:job_id][,type:job_id[:job_id]]>    or
680              <type:job_id[:job_id][?type:job_id[:job_id]]>.  All dependencies
681              must be satisfied if the "," separator is used.  Any  dependency
682              may be satisfied if the "?" separator is used.  Only one separa‐
683              tor may be used.  Many jobs can share the  same  dependency  and
684              these  jobs  may even belong to different  users. The  value may
685              be changed after job submission using the scontrol command.  De‐
686              pendencies  on  remote jobs are allowed in a federation.  Once a
687              job dependency fails due to the termination state of a preceding
688              job,  the dependent job will never be run, even if the preceding
689              job is requeued and has a different termination state in a  sub‐
690              sequent execution. This option applies to job allocations.
691
692              after:job_id[[+time][:jobid[+time]...]]
                     This job can begin execution after the specified jobs
                     start or are cancelled, once 'time' minutes have elapsed
                     from their start or cancellation. If no 'time' is given
                     then there is no delay after start or cancellation.
697
698              afterany:job_id[:jobid...]
699                     This job can begin execution  after  the  specified  jobs
700                     have terminated.
701
702              afterburstbuffer:job_id[:jobid...]
703                     This  job  can  begin  execution after the specified jobs
704                     have terminated and any associated burst buffer stage out
705                     operations have completed.
706
707              aftercorr:job_id[:jobid...]
708                     A  task  of  this job array can begin execution after the
709                     corresponding task ID in the specified job has  completed
710                     successfully  (ran  to  completion  with  an exit code of
711                     zero).
712
713              afternotok:job_id[:jobid...]
714                     This job can begin execution  after  the  specified  jobs
715                     have terminated in some failed state (non-zero exit code,
716                     node failure, timed out, etc).
717
718              afterok:job_id[:jobid...]
719                     This job can begin execution  after  the  specified  jobs
720                     have  successfully  executed  (ran  to completion with an
721                     exit code of zero).
722
723              singleton
724                     This  job  can  begin  execution  after  any   previously
725                     launched  jobs  sharing  the  same job name and user have
726                     terminated.  In other words, only one job  by  that  name
727                     and owned by that user can be running or suspended at any
728                     point in time.  In a federation, a  singleton  dependency
729                     must be fulfilled on all clusters unless DependencyParam‐
730                     eters=disable_remote_singleton is used in slurm.conf.
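
              For example, the following hypothetical invocation (the job ID
              12345 and program name "my_app" are illustrative) defers the
              job until job 12345 has completed successfully:

                 srun --dependency=afterok:12345 -n1 my_app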
731
732
733       -X, --disable-status
734              Disable the display of task status when srun receives  a  single
735              SIGINT  (Ctrl-C).  Instead immediately forward the SIGINT to the
736              running job.  Without this option a second Ctrl-C in one  second
737              is  required to forcibly terminate the job and srun will immedi‐
738              ately exit.  May  also  be  set  via  the  environment  variable
739              SLURM_DISABLE_STATUS. This option applies to job allocations.
740
741
742       -m,                                --distribution={*|block|cyclic|arbi‐
743       trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
744
745              Specify  alternate  distribution  methods  for remote processes.
746              For job allocation, this sets environment variables that will be
747              used  by  subsequent  srun requests and also affects which cores
748              will be selected for job allocation.
749
750              This option controls the distribution of tasks to the  nodes  on
751              which  resources  have  been  allocated, and the distribution of
752              those resources to tasks for binding (task affinity). The  first
753              distribution  method (before the first ":") controls the distri‐
754              bution of tasks to nodes.  The second distribution method (after
755              the  first  ":")  controls  the  distribution  of allocated CPUs
756              across sockets for binding  to  tasks.  The  third  distribution
757              method (after the second ":") controls the distribution of allo‐
758              cated CPUs across cores for binding to tasks.   The  second  and
759              third distributions apply only if task affinity is enabled.  The
760              third distribution is supported only if the  task/cgroup  plugin
761              is  configured.  The default value for each distribution type is
762              specified by *.
763
764              Note that with select/cons_res and select/cons_tres, the  number
765              of  CPUs allocated to each socket and node may be different. Re‐
766              fer to https://slurm.schedmd.com/mc_support.html for more infor‐
767              mation  on  resource allocation, distribution of tasks to nodes,
768              and binding of tasks to CPUs.
769              First distribution method (distribution of tasks across nodes):
770
771
772              *      Use the default method for distributing  tasks  to  nodes
773                     (block).
774
775              block  The  block distribution method will distribute tasks to a
776                     node such that consecutive tasks share a node. For  exam‐
777                     ple,  consider an allocation of three nodes each with two
778                     cpus. A four-task block distribution  request  will  dis‐
779                     tribute  those  tasks to the nodes with tasks one and two
780                     on the first node, task three on  the  second  node,  and
781                     task  four  on the third node.  Block distribution is the
782                     default behavior if the number of tasks exceeds the  num‐
783                     ber of allocated nodes.
784
785              cyclic The cyclic distribution method will distribute tasks to a
786                     node such that consecutive  tasks  are  distributed  over
787                     consecutive  nodes  (in a round-robin fashion). For exam‐
788                     ple, consider an allocation of three nodes each with  two
789                     cpus.  A  four-task cyclic distribution request will dis‐
790                     tribute those tasks to the nodes with tasks one and  four
791                     on  the first node, task two on the second node, and task
792                     three on the third node.  Note that  when  SelectType  is
793                     select/cons_res, the same number of CPUs may not be allo‐
794                     cated on each node. Task distribution will be round-robin
795                     among  all  the  nodes  with  CPUs  yet to be assigned to
796                     tasks.  Cyclic distribution is the  default  behavior  if
797                     the number of tasks is no larger than the number of allo‐
798                     cated nodes.
799
800              plane  The tasks are distributed in blocks of size  <size>.  The
801                     size  must  be given or SLURM_DIST_PLANESIZE must be set.
802                     The number of tasks distributed to each node is the  same
803                     as  for  cyclic distribution, but the taskids assigned to
804                     each node depend on the plane size. Additional  distribu‐
805                     tion  specifications cannot be combined with this option.
806                     For  more  details  (including  examples  and  diagrams),
807                     please  see https://slurm.schedmd.com/mc_support.html and
808                     https://slurm.schedmd.com/dist_plane.html
809
810              arbitrary
                     The arbitrary method of distribution will allocate
                     processes in order as listed in the file designated by
                     the environment variable SLURM_HOSTFILE.  If this
                     variable is set it will override any other method
                     specified.  If not set the method will default to block.
                     The hostfile must contain at minimum the number of hosts
                     requested, either one per line or comma separated.  If
                     specifying a task count (-n, --ntasks=<number>), your
                     tasks will be laid out on the nodes in the order of the
                     file.
820                     NOTE: The arbitrary distribution option on a job  alloca‐
821                     tion  only  controls the nodes to be allocated to the job
822                     and not the allocation of CPUs on those nodes.  This  op‐
823                     tion is meant primarily to control a job step's task lay‐
824                     out in an existing job allocation for the srun command.
825                     NOTE: If the number of tasks is given and a list  of  re‐
826                     quested  nodes  is  also  given, the number of nodes used
827                     from that list will be reduced to match that of the  num‐
828                     ber  of  tasks  if  the  number  of  nodes in the list is
829                     greater than the number of tasks.
830
831
832              Second distribution method (distribution of CPUs across  sockets
833              for binding):
834
835
836              *      Use the default method for distributing CPUs across sock‐
837                     ets (cyclic).
838
839              block  The block distribution method will  distribute  allocated
840                     CPUs  consecutively  from  the same socket for binding to
841                     tasks, before using the next consecutive socket.
842
843              cyclic The cyclic distribution method will distribute  allocated
844                     CPUs  for  binding to a given task consecutively from the
845                     same socket, and from the next consecutive socket for the
846                     next  task,  in  a  round-robin  fashion  across sockets.
847                     Tasks requiring more than one CPU will have all of  those
848                     CPUs allocated on a single socket if possible.
849
850              fcyclic
851                     The fcyclic distribution method will distribute allocated
852                     CPUs for binding to tasks from consecutive sockets  in  a
853                     round-robin  fashion across the sockets.  Tasks requiring
                     more than one CPU will have each of those CPUs allocated
                     in a cyclic fashion across sockets.
856
857
858              Third distribution method (distribution of CPUs across cores for
859              binding):
860
861
862              *      Use the default method for distributing CPUs across cores
863                     (inherited from second distribution method).
864
865              block  The  block  distribution method will distribute allocated
866                     CPUs consecutively from the  same  core  for  binding  to
867                     tasks, before using the next consecutive core.
868
869              cyclic The  cyclic distribution method will distribute allocated
870                     CPUs for binding to a given task consecutively  from  the
871                     same  core,  and  from  the next consecutive core for the
872                     next task, in a round-robin fashion across cores.
873
874              fcyclic
875                     The fcyclic distribution method will distribute allocated
876                     CPUs  for  binding  to  tasks from consecutive cores in a
877                     round-robin fashion across the cores.
878
879
880
881              Optional control for task distribution over nodes:
882
883
              Pack   Rather than distributing a job step's tasks evenly
                     across its allocated nodes, pack them as tightly as pos‐
886                     sible on the nodes.  This only applies when  the  "block"
887                     task distribution method is used.
888
889              NoPack Rather than packing a job step's tasks as tightly as pos‐
890                     sible on the nodes, distribute them  evenly.   This  user
891                     option    will    supersede    the   SelectTypeParameters
892                     CR_Pack_Nodes configuration parameter.
893
894              This option applies to job and step allocations.
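
              For example, the following hypothetical launch (the program
              name "my_app" is illustrative) places consecutive tasks on
              consecutive nodes (cyclic across nodes) and binds each task's
              CPUs consecutively from the same socket (block across
              sockets):

                 srun -m cyclic:block -n8 my_app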
895
896
897       --epilog={none|<executable>}
898              srun will run executable just after the job step completes.  The
899              command  line  arguments  for executable will be the command and
900              arguments of the job step.  If none is specified, then  no  srun
901              epilog  will be run. This parameter overrides the SrunEpilog pa‐
902              rameter in slurm.conf. This parameter is completely  independent
903              from  the Epilog parameter in slurm.conf. This option applies to
904              job allocations.
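
              For example, assuming a hypothetical script
              /usr/local/bin/step_epilog.sh exists, the following runs it
              just after the step completes, with the step's command and
              arguments passed to it as described above:

                 srun --epilog=/usr/local/bin/step_epilog.sh -n2 my_app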
905
906
907       -e, --error=<filename_pattern>
908              Specify how stderr is to be redirected. By default  in  interac‐
909              tive  mode, srun redirects stderr to the same file as stdout, if
910              one is specified. The --error option is provided to allow stdout
911              and  stderr to be redirected to different locations.  See IO Re‐
912              direction below for more options.  If the specified file already
913              exists,  it  will be overwritten. This option applies to job and
914              step allocations.
915
916
917       --exact
918              Allow a step access to only  the  resources  requested  for  the
919              step.   By  default,  all non-GRES resources on each node in the
920              step allocation will be used. This option only applies  to  step
921              allocations.
922              NOTE:  Parallel  steps  will either be blocked or rejected until
923              requested step resources are available unless --overlap is spec‐
924              ified. Job resources can be held after the completion of an srun
925              command while Slurm does job cleanup. Step epilogs and/or  SPANK
926              plugins can further delay the release of step resources.
927
928
       -x, --exclude={<host1>[,<host2>...]|<filename>}
930              Request that a specific list of hosts not be included in the re‐
931              sources allocated to this job. The host list will be assumed  to
932              be  a  filename  if it contains a "/" character. This option ap‐
933              plies to job and step allocations.
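
              For example, the following hypothetical invocations (the host
              names and file path are illustrative) keep the job off two
              specific nodes, either listed directly or read from a file:

                 srun --exclude=node12,node13 -N4 my_app
                 srun --exclude=/home/user/badnodes.txt -N4 my_app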
934
935
936       --exclusive[={user|mcs}]
937              This option applies to job and job step allocations, and has two
938              slightly different meanings for each one.  When used to initiate
939              a job, the job allocation cannot share nodes with other  running
940              jobs  (or just other users with the "=user" option or "=mcs" op‐
941              tion).  If user/mcs are not specified (i.e. the  job  allocation
942              can  not  share nodes with other running jobs), the job is allo‐
943              cated all CPUs and GRES on all nodes in the allocation,  but  is
944              only allocated as much memory as it requested. This is by design
945              to support gang scheduling, because suspended jobs still  reside
946              in  memory.  To  request  all the memory on a node, use --mem=0.
947              The default shared/exclusive behavior depends on system configu‐
948              ration and the partition's OverSubscribe option takes precedence
949              over the job's option.
950
951              This option can also be used when initiating more than  one  job
952              step within an existing resource allocation (default), where you
953              want separate processors to be dedicated to each  job  step.  If
954              sufficient  processors  are  not  available  to initiate the job
955              step, it will be deferred. This can be thought of as providing a
956              mechanism  for resource management to the job within its alloca‐
957              tion (--exact implied).
958
959              The exclusive allocation of CPUs applies to  job  steps  by  de‐
960              fault. In order to share the resources use the --overlap option.
961
962              See EXAMPLE below.
963
964
965       --export={[ALL,]<environment_variables>|ALL|NONE}
966              Identify  which  environment variables from the submission envi‐
967              ronment are propagated to the launched application.
968
969              --export=ALL
970                        Default mode if --export is not specified. All of  the
971                        user's  environment  will  be loaded from the caller's
972                        environment.
973
974
975              --export=NONE
976                        None of the user environment  will  be  defined.  User
977                        must  use  absolute  path to the binary to be executed
978                        that will define the environment. User can not specify
979                        explicit environment variables with "NONE".
980
981                        This  option  is  particularly important for jobs that
982                        are submitted on one cluster and execute on a  differ‐
983                        ent  cluster  (e.g.  with  different paths).  To avoid
984                        steps inheriting  environment  export  settings  (e.g.
985                        "NONE")  from  sbatch command, either set --export=ALL
986                        or the environment variable SLURM_EXPORT_ENV should be
987                        set to "ALL".
988
989              --export=[ALL,]<environment_variables>
990                        Exports  all  SLURM*  environment variables along with
991                        explicitly  defined  variables.  Multiple  environment
992                        variable names should be comma separated.  Environment
993                        variable names may be specified to propagate the  cur‐
994                        rent value (e.g. "--export=EDITOR") or specific values
995                        may be exported  (e.g.  "--export=EDITOR=/bin/emacs").
996                        If "ALL" is specified, then all user environment vari‐
997                        ables will be loaded and will take precedence over any
998                        explicitly given environment variables.
999
1000                   Example: --export=EDITOR,ARG1=test
1001                        In  this example, the propagated environment will only
1002                        contain the variable EDITOR from the  user's  environ‐
1003                        ment, SLURM_* environment variables, and ARG1=test.
1004
1005                   Example: --export=ALL,EDITOR=/bin/emacs
1006                        There  are  two possible outcomes for this example. If
1007                        the caller has the  EDITOR  environment  variable  de‐
1008                        fined,  then  the  job's  environment will inherit the
1009                        variable from the caller's environment.  If the caller
1010                        doesn't  have an environment variable defined for EDI‐
1011                        TOR, then the job's environment  will  use  the  value
1012                        given by --export.
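
              As a further sketch for the "NONE" mode (the binary path below
              is hypothetical), nothing from the submission environment is
              propagated and the program must set up its own environment:

                   srun --export=NONE /opt/apps/bin/solver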
1013
1014
1015       -B, --extra-node-info=<sockets>[:cores[:threads]]
1016              Restrict  node  selection  to  nodes with at least the specified
1017              number of sockets, cores per socket and/or threads per core.
1018              NOTE: These options do not specify the resource allocation size.
1019              Each  value  specified is considered a minimum.  An asterisk (*)
1020              can be used as a placeholder indicating that all  available  re‐
1021              sources  of  that  type  are  to be utilized. Values can also be
1022              specified as min-max. The individual levels can also  be  speci‐
1023              fied in separate options if desired:
1024                  --sockets-per-node=<sockets>
1025                  --cores-per-socket=<cores>
1026                  --threads-per-core=<threads>
1027              If  task/affinity  plugin is enabled, then specifying an alloca‐
1028              tion in this manner also sets a  default  --cpu-bind  option  of
1029              threads  if the -B option specifies a thread count, otherwise an
1030              option of cores if a core count is specified, otherwise  an  op‐
1031              tion   of   sockets.    If   SelectType  is  configured  to  se‐
1032              lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1033              ory,  CR_Socket,  or CR_Socket_Memory for this option to be hon‐
1034              ored.  If not specified, the  scontrol  show  job  will  display
1035              'ReqS:C:T=*:*:*'. This option applies to job allocations.
1036              NOTE:   This   option   is   mutually   exclusive  with  --hint,
1037              --threads-per-core and --ntasks-per-core.
1038              NOTE: If the number of sockets, cores and threads were all spec‐
1039              ified, the number of nodes was specified (as a fixed number, not
1040              a range) and the number of tasks was NOT  specified,  srun  will
1041              implicitly calculate the number of tasks as one task per thread.
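
              As an illustrative sketch (./my_app is a hypothetical program),
              restrict selection to nodes with at least two sockets and four
              cores per socket, using either form:

                   srun -B 2:4 -n 8 ./my_app
                   srun --sockets-per-node=2 --cores-per-socket=4 -n 8 ./my_app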
1042
1043
1044       --gid=<group>
1045              If srun is run as root, and the --gid option is used, submit the
1046              job with group's group access permissions.   group  may  be  the
1047              group name or the numerical group ID. This option applies to job
1048              allocations.
1049
1050
1051       --gpu-bind=[verbose,]<type>
1052              Bind tasks to specific GPUs.  By default every spawned task  can
1053              access every GPU allocated to the step.  If "verbose," is speci‐
1054              fied before <type>, then print out GPU binding debug information
1055              to  the  stderr of the tasks. GPU binding is ignored if there is
1056              only one task.
1057
1058              Supported type options:
1059
1060              closest   Bind each task to the GPU(s) which are closest.  In  a
1061                        NUMA  environment, each task may be bound to more than
1062                        one GPU (i.e.  all GPUs in that NUMA environment).
1063
1064              map_gpu:<list>
1065                        Bind by setting GPU masks on tasks (or ranks) as spec‐
1066                        ified            where            <list>            is
1067                        <gpu_id_for_task_0>,<gpu_id_for_task_1>,...  GPU   IDs
1068                        are interpreted as decimal values unless they are pre‐
1069                        ceded with '0x' in which case they are interpreted as
1070                        hexadecimal  values. If the number of tasks (or ranks)
1071                        exceeds the number of elements in this list,  elements
1072                        in the list will be reused as needed starting from the
1073                        beginning of the list. To simplify support  for  large
1074                        task counts, the lists may follow a map with an aster‐
1075                        isk    and    repetition    count.     For     example
1076                        "map_gpu:0*4,1*4".   If the task/cgroup plugin is used
1077                        and ConstrainDevices is set in cgroup.conf,  then  the
1078                        GPU  IDs  are  zero-based indexes relative to the GPUs
1079                        allocated to the job (e.g. the first GPU is 0, even if
1080                        the global ID is 3). Otherwise, the GPU IDs are global
1081                        IDs, and all GPUs on each node in the  job  should  be
1082                        allocated for predictable binding results.
1083
1084              mask_gpu:<list>
1085                        Bind by setting GPU masks on tasks (or ranks) as spec‐
1086                        ified            where            <list>            is
1087                        <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,...    The
1088                        mapping is specified for a node and identical  mapping
1089                        is applied to the tasks on every node (i.e. the lowest
1090                        task ID on each node is mapped to the first mask spec‐
1091                        ified  in the list, etc.). GPU masks are always inter‐
1092                        preted as hexadecimal values but can be preceded  with
1093                        an  optional  '0x'. To simplify support for large task
1094                        counts, the lists may follow a map  with  an  asterisk
1095                        and      repetition      count.       For      example
1096                        "mask_gpu:0x0f*4,0xf0*4".  If the  task/cgroup  plugin
1097                        is  used  and  ConstrainDevices is set in cgroup.conf,
1098                        then the GPU IDs are zero-based  indexes  relative  to
1099                        the  GPUs  allocated to the job (e.g. the first GPU is
1100                        0, even if the global ID is 3). Otherwise, the GPU IDs
1101                        are  global  IDs, and all GPUs on each node in the job
1102                        should be allocated for predictable binding results.
1103
1104              none      Do not bind  tasks  to  GPUs  (turns  off  binding  if
1105                        --gpus-per-task is requested).
1106
1107              per_task:<gpus_per_task>
1108                        Each task will be bound to the number of GPUs  speci‐
1109                        fied in <gpus_per_task>. GPUs are assigned to tasks in
1110                        order: the first task gets the first <gpus_per_task>
1111                        GPUs on the node, the next task the following ones, etc.
1112
1113              single:<tasks_per_gpu>
1114                        Like --gpu-bind=closest, except  that  each  task  can
1115                        only  be  bound  to  a single GPU, even when it can be
1116                        bound to multiple GPUs that are  equally  close.   The
1117                        GPU to bind to is determined by <tasks_per_gpu>, where
1118                        the first <tasks_per_gpu> tasks are bound to the first
1119                        GPU  available,  the  second <tasks_per_gpu> tasks are
1120                        bound to the second GPU available, etc.  This is basi‐
1121                        cally  a  block  distribution  of tasks onto available
1122                        GPUs, where the available GPUs are determined  by  the
1123                        socket affinity of the task and the socket affinity of
1124                        the GPUs as specified in gres.conf's Cores parameter.
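
              For example, a minimal sketch (./gpu_app is a hypothetical
              program) binding one GPU to each of four tasks and logging
              the resulting binding:

                   srun -n 4 --gpus-per-task=1 \
                        --gpu-bind=verbose,per_task:1 ./gpu_app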
1125
1126
1127       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1128              Request that GPUs allocated to the job are configured with  spe‐
1129              cific  frequency  values.   This  option can be used to indepen‐
1130              dently configure the GPU and its memory frequencies.  After  the
1131              job  is  completed, the frequencies of all affected GPUs will be
1132              reset to the highest possible values.   In  some  cases,  system
1133              power  caps  may  override the requested values.  The field type
1134              can be "memory".  If type is not specified, the GPU frequency is
1135              implied.  The value field can either be "low", "medium", "high",
1136              "highm1" or a numeric value in megahertz (MHz).  If  the  speci‐
1137              fied numeric value is not possible, a value as close as possible
1138              will be used. See below for definition of the values.  The  ver‐
1139              bose  option  causes  current  GPU  frequency  information to be
1140              logged.  Examples of use include "--gpu-freq=medium,memory=high"
1141              and "--gpu-freq=450".
1142
1143              Supported value definitions:
1144
1145              low       the lowest available frequency.
1146
1147              medium    attempts  to  set  a  frequency  in  the middle of the
1148                        available range.
1149
1150              high      the highest available frequency.
1151
1152              highm1    (high minus one) will select the next  highest  avail‐
1153                        able frequency.
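
              For example, a sketch (./gpu_app is a hypothetical program)
              requesting a medium GPU clock, the highest memory clock, and
              logging of the frequencies actually selected:

                   srun --gpus=1 \
                        --gpu-freq=medium,memory=high,verbose ./gpu_app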
1154
1155
1156       -G, --gpus=[type:]<number>
1157              Specify  the  total number of GPUs required for the job.  An op‐
1158              tional GPU type specification  can  be  supplied.   For  example
1159              "--gpus=volta:3".   Multiple options can be requested in a comma
1160              separated list,  for  example:  "--gpus=volta:3,kepler:1".   See
1161              also  the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1162              options.
1163
1164
1165       --gpus-per-node=[type:]<number>
1166              Specify the number of GPUs required for the job on each node in‐
1167              cluded  in  the job's resource allocation.  An optional GPU type
1168              specification     can     be     supplied.      For      example
1169              "--gpus-per-node=volta:3".  Multiple options can be requested in
1170              a      comma      separated       list,       for       example:
1171              "--gpus-per-node=volta:3,kepler:1".    See   also   the  --gpus,
1172              --gpus-per-socket and --gpus-per-task options.
1173
1174
1175       --gpus-per-socket=[type:]<number>
1176              Specify the number of GPUs required for the job on  each  socket
1177              included in the job's resource allocation.  An optional GPU type
1178              specification     can     be     supplied.      For      example
1179              "--gpus-per-socket=volta:3".   Multiple options can be requested
1180              in     a     comma     separated     list,     for      example:
1181              "--gpus-per-socket=volta:3,kepler:1".  Requires the job to spec‐
1182              ify a sockets-per-node count (--sockets-per-node).  See also  the
1183              --gpus,  --gpus-per-node  and --gpus-per-task options.  This op‐
1184              tion applies to job allocations.
1185
1186
1187       --gpus-per-task=[type:]<number>
1188              Specify the number of GPUs required for the job on each task  to
1189              be  spawned  in  the job's resource allocation.  An optional GPU
1190              type   specification   can    be    supplied.     For    example
1191              "--gpus-per-task=volta:1".  Multiple options can be requested in
1192              a      comma      separated       list,       for       example:
1193              "--gpus-per-task=volta:3,kepler:1".   See   also   the   --gpus,
1194              --gpus-per-socket and --gpus-per-node options.  This option  re‐
1195              quires  an  explicit  task count, e.g. -n, --ntasks or "--gpus=X
1196              --gpus-per-task=Y" rather than an ambiguous range of nodes  with
1197              -N,     --nodes.     This    option    will    implicitly    set
1198              --gpu-bind=per_task:<gpus_per_task>, but that can be  overridden
1199              with an explicit --gpu-bind specification.
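
              For example, a sketch (./mpi_gpu_app is a hypothetical program)
              giving each of four explicitly requested tasks one GPU of type
              volta:

                   srun -n 4 --gpus-per-task=volta:1 ./mpi_gpu_app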
1200
1201
1202       --gres=<list>
1203              Specifies  a  comma-delimited  list  of  generic  consumable re‐
1204              sources.   The  format  of   each   entry   on   the   list   is
1205              "name[[:type]:count]".   The  name is that of the consumable re‐
1206              source.  The count is the number of those resources with  a  de‐
1207              fault  value  of  1.   The count can have a suffix of "k" or "K"
1208              (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1209              "G"  (multiple  of  1024 x 1024 x 1024), "t" or "T" (multiple of
1210              1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x  1024
1211              x  1024  x  1024 x 1024).  The specified resources will be allo‐
1212              cated to the job on each node.  The available generic consumable
1213              resources are configurable by the system administrator.  A  list
1214              of available generic consumable resources will  be  printed  and
1215              the  command  will exit if the option argument is "help".  Exam‐
1216              ples of use include "--gres=gpu:2",  "--gres=gpu:kepler:2",  and
1217              "--gres=help".   NOTE: This option applies to job and step allo‐
1218              cations. By default, a job step is allocated all of the  generic
1219              resources  that  have  been allocated to the job.  To change the
1220              behavior so that each job  step  is  allocated  no  generic  re‐
1221              sources,  explicitly  set  the  value  of --gres to specify zero
1222              counts for each generic resource OR set "--gres=none" OR set the
1223              SLURM_STEP_GRES environment variable to "none".
1224
1225
1226       --gres-flags=<type>
1227              Specify  generic resource task binding options.  This option ap‐
1228              plies to job allocations.
1229
1230              disable-binding
1231                     Disable filtering of CPUs with  respect  to  generic  re‐
1232                     source  locality.   This  option is currently required to
1233                     use more CPUs than are bound to a GRES (i.e. if a GPU  is
1234                     bound  to  the  CPUs on one socket, but resources on more
1235                     than one socket are required to run the job).   This  op‐
1236                     tion  may  permit  a job to be allocated resources sooner
1237                     than otherwise possible, but may result in lower job per‐
1238                     formance.
1239                     NOTE: This option is specific to SelectType=cons_res.
1240
1241              enforce-binding
1242                     The only CPUs available to the job will be those bound to
1243                     the selected  GRES  (i.e.  the  CPUs  identified  in  the
1244                     gres.conf  file  will  be strictly enforced). This option
1245                     may result in delayed initiation of a job. For example, a
1246                     job requiring two GPUs and one CPU will be delayed  until
1247                     both GPUs on a single socket are  available  rather  than
1248                     using GPUs bound to separate sockets; however, the appli‐
1249                     cation performance may be improved due to the improved
1250                     communication speed.  Requires the node to be configured
1251                     with more than one socket; resource filtering will be
1252                     performed on a per-socket basis.
1253                     NOTE: This option is specific to SelectType=cons_tres.
1254
1255
1256       -h, --help
1257              Display help information and exit.
1258
1259
1260       --het-group=<expr>
1261              Identify  each  component  in a heterogeneous job allocation for
1262              which a step is to be created. Applies only to srun commands is‐
1263              sued  inside  a salloc allocation or sbatch script.  <expr> is a
1264              set of integers corresponding to one or more option  offsets  on
1265              the  salloc  or sbatch command line.  Examples: "--het-group=2",
1266              "--het-group=0,4", "--het-group=1,3-5".  The  default  value  is
1267              --het-group=0.
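
              For example, a sketch inside an sbatch script that submitted a
              two-component heterogeneous job (./coupled_app is hypothetical),
              launching a single step spanning both components:

                   srun --het-group=0,1 ./coupled_app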
1268
1269
1270       --hint=<type>
1271              Bind tasks according to application hints.
1272              NOTE:  This  option  cannot  be  used in conjunction with any of
1273              --ntasks-per-core, --threads-per-core,  --cpu-bind  (other  than
1274              --cpu-bind=verbose)  or  -B. If --hint is specified as a command
1275              line argument, it will take precedence over the environment.
1276
1277              compute_bound
1278                     Select settings for compute bound applications:  use  all
1279                     cores in each socket, one thread per core.
1280
1281              memory_bound
1282                     Select  settings  for memory bound applications: use only
1283                     one core in each socket, one thread per core.
1284
1285              [no]multithread
1286                     [don't] use extra threads  with  in-core  multi-threading
1287                     which  can  benefit communication intensive applications.
1288                     Only supported with the task/affinity plugin.
1289
1290              help   show this help message
1291
1292              This option applies to job allocations.
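
              For example, a sketch for a memory-bandwidth-limited code (the
              program name ./stream_like is hypothetical):

                   srun --hint=memory_bound -n 8 ./stream_like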
1293
1294
1295       -H, --hold
1296              Specify the job is to be submitted in a held state (priority  of
1297              zero).   A  held job can now be released using scontrol to reset
1298              its priority (e.g. "scontrol release <job_id>"). This option ap‐
1299              plies to job allocations.
1300
1301
1302       -I, --immediate[=<seconds>]
1303              Exit if resources are not available within the time period spec‐
1304              ified.  If no argument is given (seconds  defaults  to  1),  re‐
1305              sources  must  be  available immediately for the request to suc‐
1306              ceed. If defer is configured  in  SchedulerParameters  and  sec‐
1307              onds=1  the allocation request will fail immediately; defer con‐
1308              flicts and takes precedence over this option.  By default, --im‐
1309              mediate  is  off, and the command will block until resources be‐
1310              come available. Since this option's argument  is  optional,  for
1311              proper parsing the single letter option must be followed immedi‐
1312              ately with the value and not include a space between  them.  For
1313              example  "-I60"  and not "-I 60". This option applies to job and
1314              step allocations.
1315
1316
1317       -i, --input=<mode>
1318              Specify how stdin is to be redirected. By default, srun redirects
1319              stdin from the terminal to all tasks. See IO Redirection below for
1320              more options.  For OS X, the poll() function  does  not  support
1321              stdin, so input from a terminal is not possible. This option ap‐
1322              plies to job and step allocations.
1323
1324
1325       -J, --job-name=<jobname>
1326              Specify a name for the job. The specified name will appear along
1327              with the job id number when querying running jobs on the system.
1328              The default is the supplied  executable  program's  name.  NOTE:
1329              This  information  may be written to the slurm_jobacct.log file.
1330              This file is space delimited, so if a space is used in  the  job
1331              name it will cause problems in properly displaying  the  con‐
1332              tents of the slurm_jobacct.log file when the  sacct  command  is
1333              used. This option applies to job and step allocations.
1334
1335
1336       --jobid=<jobid>
1337              Initiate a job step under an already allocated job with the given
1338              job id.  Using this option will cause srun to behave exactly as if
1339              the  SLURM_JOB_ID  environment variable was set. This option ap‐
1340              plies to step allocations.
1341
1342
1343       -K, --kill-on-bad-exit[=0|1]
1344              Controls whether or not to terminate a step if  any  task  exits
1345              with  a non-zero exit code. If this option is not specified, the
1346              default action will be based upon the Slurm configuration param‐
1347              eter of KillOnBadExit. If this option is specified, it will take
1348              precedence over KillOnBadExit. An option argument of  zero  will
1349              not  terminate  the job. A non-zero argument or no argument will
1350              terminate the job.  Note: This option takes precedence over  the
1351              -W, --wait option to terminate the job immediately if a task ex‐
1352              its with a non-zero exit code.  Since this option's argument  is
1353              optional,  for  proper  parsing the single letter option must be
1354              followed immediately with the value and not include a space  be‐
1355              tween them. For example "-K1" and not "-K 1".
1356
1357
1358       -l, --label
1359              Prepend  task number to lines of stdout/err.  The --label option
1360              will prepend lines of output with the remote task id.  This  op‐
1361              tion applies to step allocations.
1362
1363
1364       -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1365              Specification  of  licenses (or other resources available on all
1366              nodes of the cluster) which must be allocated to this job.   Li‐
1367              cense  names  can  be followed by a colon and count (the default
1368              count is one).  Multiple license names should be comma separated
1369              (e.g.  "--licenses=foo:4,bar"). This option applies to job allo‐
1370              cations.
1371
1372
1373       --mail-type=<type>
1374              Notify user by email when certain event types occur.  Valid type
1375              values  are  NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1376              BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and  STAGE_OUT),  IN‐
1377              VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1378              fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1379              (reached  90  percent  of time limit), TIME_LIMIT_80 (reached 80
1380              percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1381              time  limit).   Multiple type values may be specified in a comma
1382              separated list.  The user  to  be  notified  is  indicated  with
1383              --mail-user. This option applies to job allocations.
1384
1385
1386       --mail-user=<user>
1387              User  to  receive email notification of state changes as defined
1388              by --mail-type.  The default value is the submitting user.  This
1389              option applies to job allocations.
1390
1391
1392       --mcs-label=<mcs>
1393              Used  only when the mcs/group plugin is enabled.  This parameter
1394              is a group among the groups of the user.  The default  value  is
1395              calculated by the mcs plugin if it is enabled. This option applies
1396              to job allocations.
1397
1398
1399       --mem=<size>[units]
1400              Specify the real memory required per node.   Default  units  are
1401              megabytes.   Different  units  can be specified using the suffix
1402              [K|M|G|T].  Default value is DefMemPerNode and the maximum value
1403              is MaxMemPerNode. If configured, both parameters  can  be  seen
1404              using the scontrol show config command.   This  parameter  would
1405              generally  be used if whole nodes are allocated to jobs (Select‐
1406              Type=select/linear).  Specifying a memory limit of  zero  for  a
1407              job  step will restrict the job step to the amount of memory al‐
1408              located to the job, but not remove any of the job's memory allo‐
1409              cation  from  being  available  to  other  job  steps.  Also see
1410              --mem-per-cpu and --mem-per-gpu.  The --mem,  --mem-per-cpu  and
1411              --mem-per-gpu   options   are   mutually  exclusive.  If  --mem,
1412              --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1413              guments,  then  they  will  take precedence over the environment
1414              (potentially inherited from salloc or sbatch).
1415
1416              NOTE: A memory size specification of zero is treated as  a  spe‐
1417              cial case and grants the job access to all of the memory on each
1418              node for newly submitted jobs and all available  job  memory  to
1419              new job steps.
1420
1421              Specifying new memory limits for job steps is only advisory.
1422
1423              If  the job is allocated multiple nodes in a heterogeneous clus‐
1424              ter, the memory limit on each node will be that of the  node  in
1425              the  allocation  with  the smallest memory size (same limit will
1426              apply to every node in the job's allocation).
1427
1428              NOTE: Enforcement of memory limits  currently  relies  upon  the
1429              task/cgroup plugin or enabling of accounting, which samples mem‐
1430              ory use on a periodic basis (data need not be stored, just  col‐
1431              lected).  In both cases memory use is based upon the job's Resi‐
1432              dent Set Size (RSS). A task may exceed the  memory  limit  until
1433              the next periodic accounting sample.
1434
1435              This option applies to job and step allocations.
1436
1437
1438       --mem-bind=[{quiet|verbose},]<type>
1439              Bind tasks to memory. Used only when the task/affinity plugin is
1440              enabled and the NUMA memory functions are available.  Note  that
1441              the  resolution of CPU and memory binding may differ on some ar‐
1442              chitectures. For example, CPU binding may be  performed  at  the
1443              level  of the cores within a processor while memory binding will
1444              be performed at the level of  nodes,  where  the  definition  of
1445              "nodes"  may differ from system to system.  By default no memory
1446              binding is performed; any task using any CPU can use any memory.
1447              This  option is typically used to ensure that each task is bound
1448              to the memory closest to its assigned CPU. The use of  any  type
1449              other  than  "none"  or "local" is not recommended.  If you want
1450              greater control, try running a simple test code with the options
1451              "--cpu-bind=verbose,none  --mem-bind=verbose,none"  to determine
1452              the specific configuration.
1453
1454              NOTE: To have Slurm always report on the selected memory binding
1455              for  all  commands  executed  in a shell, you can enable verbose
1456              mode by setting the SLURM_MEM_BIND environment variable value to
1457              "verbose".
1458
1459              The  following  informational environment variables are set when
1460              --mem-bind is in use:
1461
1462                   SLURM_MEM_BIND_LIST
1463                   SLURM_MEM_BIND_PREFER
1464                   SLURM_MEM_BIND_SORT
1465                   SLURM_MEM_BIND_TYPE
1466                   SLURM_MEM_BIND_VERBOSE
1467
1468              See the ENVIRONMENT VARIABLES section for a  more  detailed  de‐
1469              scription of the individual SLURM_MEM_BIND* variables.
1470
1471              Supported options include:
1472
1473              help   show this help message
1474
1475              local  Use memory local to the processor in use
1476
1477              map_mem:<list>
1478                     Bind by setting memory masks on tasks (or ranks) as spec‐
1479                     ified             where             <list>             is
1480                     <numa_id_for_task_0>,<numa_id_for_task_1>,...   The  map‐
1481                     ping is specified for a node and identical mapping is ap‐
1482                     plied to the tasks on every node (i.e. the lowest task ID
1483                     on each node is mapped to the first ID specified  in  the
1484                     list,  etc.).  NUMA IDs are interpreted as decimal values
1485                     unless they are preceded with '0x', in which case they are
1486                     interpreted as hexadecimal values.  If the number of tasks
1487                     (or ranks) exceeds the number of elements in  this  list,
1488                     elements  in  the  list will be reused as needed starting
1489                     from the beginning of the list.  To simplify support  for
1490                     large task counts, the lists may follow a map with an as‐
1491                     terisk    and    repetition    count.     For     example
1492                     "map_mem:0x0f*4,0xf0*4".   For  predictable  binding  re‐
1493                     sults, all CPUs for each node in the job should be  allo‐
1494                     cated to the job.
1495
1496              mask_mem:<list>
1497                     Bind by setting memory masks on tasks (or ranks) as spec‐
1498                     ified             where             <list>             is
1499                     <numa_mask_for_task_0>,<numa_mask_for_task_1>,...     The
1500                     mapping is specified for a node and identical mapping  is
1501                     applied  to the tasks on every node (i.e. the lowest task
1502                     ID on each node is mapped to the first mask specified  in
1503                     the  list,  etc.).   NUMA masks are always interpreted as
1504                     hexadecimal values.  Note that  masks  must  be  preceded
1505                     with  a  '0x'  if they don't begin with [0-9] so they are
1506                     seen as numerical values.  If the  number  of  tasks  (or
1507                     ranks)  exceeds the number of elements in this list, ele‐
1508                     ments in the list will be reused as needed starting  from
1509                     the beginning of the list.  To simplify support for large
1510                     task counts, the lists may follow a mask with an asterisk
1511                     and  repetition  count.   For example "mask_mem:0*4,1*4".
1512                     For predictable binding results, all CPUs for  each  node
1513                     in the job should be allocated to the job.
1514
1515              no[ne] don't bind tasks to memory (default)
1516
1517              nosort avoid sorting free cache pages (default, LaunchParameters
1518                     configuration parameter can override this default)
1519
1520              p[refer]
1521                     Prefer use of the first specified NUMA node, but permit
1522                     use of other available NUMA nodes.
1523
1524              q[uiet]
1525                     quietly bind before task runs (default)
1526
1527              rank   bind by task rank (not recommended)
1528
1529              sort   sort free cache pages (run zonesort on Intel KNL nodes)
1530
1531              v[erbose]
1532                     verbosely report binding before task runs
1533
1534              This option applies to job and step allocations.
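
              For example, a sketch (./numa_app is a hypothetical program)
              binding each task to the memory local to its CPUs and
              reporting the binding:

                   srun -n 4 --mem-bind=verbose,local ./numa_app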
1535
1536
1537       --mem-per-cpu=<size>[units]
1538              Minimum memory required per allocated CPU.   Default  units  are
1539              megabytes.   Different  units  can be specified using the suffix
1540              [K|M|G|T].  The default value is DefMemPerCPU  and  the  maximum
1541              value is MaxMemPerCPU (see exception below). If configured, both
1542              parameters can be seen using the scontrol show  config  command.
1543              Note  that  if the job's --mem-per-cpu value exceeds the config‐
1544              ured MaxMemPerCPU, then the user's limit will be  treated  as  a
1545              memory  limit per task; --mem-per-cpu will be reduced to a value
1546              no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1547              value  of  --cpus-per-task  multiplied  by the new --mem-per-cpu
1548              value will equal the original --mem-per-cpu value  specified  by
1549              the  user.  This parameter would generally be used if individual
1550              processors are allocated to  jobs  (SelectType=select/cons_res).
1551              If resources are allocated by core, socket, or whole nodes, then
1552              the number of CPUs allocated to a job may  be  higher  than  the
1553              task count and the value of --mem-per-cpu should be adjusted ac‐
1554              cordingly.  Specifying a memory limit of zero  for  a  job  step
1555              will  restrict the job step to the amount of memory allocated to
1556              the job, but not remove any of the job's memory allocation  from
1557              being  available  to  other  job  steps.   Also  see  --mem  and
1558              --mem-per-gpu.  The --mem, --mem-per-cpu and  --mem-per-gpu  op‐
1559              tions are mutually exclusive.
1560
1561              NOTE:  If the final amount of memory requested by a job can't be
1562              satisfied by any of the nodes configured in the  partition,  the
1563              job  will  be  rejected.   This could happen if --mem-per-cpu is
1564              used with the  --exclusive  option  for  a  job  allocation  and
1565              --mem-per-cpu times the number of CPUs on a node is greater than
1566              the total memory of that node.
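
              For example, a sketch (./solver is a hypothetical program)
              requesting 2 GB of memory for every allocated CPU:

                   srun -n 16 --mem-per-cpu=2G ./solver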
1567
1568
1569       --mem-per-gpu=<size>[units]
1570              Minimum memory required per allocated GPU.   Default  units  are
1571              megabytes.   Different  units  can be specified using the suffix
1572              [K|M|G|T].  Default value is DefMemPerGPU and  is  available  on
1573              both  a  global and per partition basis.  If configured, the pa‐
1574              rameters can be seen using the scontrol show config and scontrol
1575              show   partition   commands.    Also   see  --mem.   The  --mem,
1576              --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1577
1578
1579       --mincpus=<n>
1580              Specify a minimum number of logical  cpus/processors  per  node.
1581              This option applies to job allocations.
1582
1583
1584       --mpi=<mpi_type>
1585              Identify the type of MPI to be used. May result in unique initi‐
1586              ation procedures.
1587
1588              list   Lists available mpi types to choose from.
1589
1590              pmi2   To enable PMI2 support. The PMI2 support in  Slurm  works
1591                     only  if  the  MPI  implementation  supports it, in other
1592                     words if the MPI has the PMI2 interface implemented.  The
1593                     --mpi=pmi2  will  load  the library lib/slurm/mpi_pmi2.so
1594                     which provides the  server  side  functionality  but  the
1595                     client  side must implement PMI2_Init() and the other in‐
1596                     terface calls.
1597
1598              pmix   To enable PMIx support (https://pmix.github.io). The PMIx
1599                     support  in Slurm can be used to launch parallel applica‐
1600                     tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1601                     must   be   configured   with  pmix  support  by  passing
1602                     "--with-pmix=<PMIx  installation  path>"  option  to  its
1603                     "./configure" script.
1604
1605                     At  the  time  of  writing  PMIx is supported in Open MPI
1606                     starting from version 2.0.  PMIx also  supports  backward
1607                     compatibility  with  PMI1 and PMI2 and can be used if MPI
1608                     was configured with PMI2/PMI1  support  pointing  to  the
1609                     PMIx  library ("libpmix").  If MPI supports PMI1/PMI2 but
1610                     doesn't provide a way to point to a  specific  implemen‐
1611                     tation, a hackish solution leveraging LD_PRELOAD  can  be
1612                     used to force "libpmix" usage.
1613
1614
1615              none   No special MPI processing. This is the default and  works
1616                     with many other versions of MPI.
1617
1618              This option applies to step allocations.
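
              For example, a sketch launching a PMIx-enabled MPI application
              (./mpi_app is a hypothetical program):

                   srun --mpi=pmix -n 64 ./mpi_app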
1619
1620
1621       --msg-timeout=<seconds>
1622              Modify  the  job  launch  message timeout.  The default value is
1623              MessageTimeout  in  the  Slurm  configuration  file  slurm.conf.
1624              Changes to this are typically not recommended, but could be use‐
1625              ful to diagnose problems.  This option applies  to  job  alloca‐
1626              tions.
1627
1628
1629       --multi-prog
1630              Run  a  job  with different programs and different arguments for
1631              each task. In this case, the executable program specified is ac‐
1632              tually  a configuration file specifying the executable and argu‐
1633              ments for each task. See MULTIPLE  PROGRAM  CONFIGURATION  below
1634              for  details on the configuration file contents. This option ap‐
1635              plies to step allocations.
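
              For example, a minimal sketch using a hypothetical configura‐
              tion file multi.conf that runs one program as task 0 and
              another as tasks 1-3:

                   $ cat multi.conf
                   0    ./controller
                   1-3  ./worker
                   $ srun -n 4 --multi-prog multi.conf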
1636
1637
1638       --network=<type>
1639              Specify information pertaining to the switch  or  network.   The
1640              interpretation of type is system dependent.  This option is sup‐
1641              ported when running Slurm natively on a Cray.  It is used to re‐
1642              quest the use of Network Performance Counters.  Only one value per
1643              request is valid.  All options are case-insensitive.  In  this
1644              configuration supported values include:
1645
1646              system
1647                    Use  the  system-wide  network  performance counters. Only
1648                    nodes requested will be marked in use for the job  alloca‐
1649                    tion.  If the job does not fill up the entire system,  the
1650                    rest of the nodes are not able to be used  by  other  jobs
1651                    using NPC; if idle, their state will appear  as  PerfCnts.
1652                    These nodes are still available for other jobs  not  using
1653                    NPC.
1654
1655              blade Use the blade network performance counters. Only nodes re‐
1656                    quested will be marked in use for the job allocation.   If
1657                    the job does not fill up the entire blade(s)  allocated  to
1658                    the job, those blade(s) are not able to be  used  by  other
1659                    jobs using NPC; if idle, their state will appear  as  PerfC‐
1660                    nts.  These nodes are still available for other  jobs  not
1661                    using NPC.
1662
1663
1664              In all cases the job allocation request must specify the
1665              --exclusive option and the step cannot specify the --overlap op‐
1666              tion. Otherwise the request will be denied.
1667
1668              Also, with any of these options, steps are not allowed  to  share
1669              blades,  so  resources would remain idle inside an allocation if
1670              the step running on a blade does not take up all  the  nodes  on
1671              the blade.
1672
1673              The  network option is also supported on systems with IBM's Par‐
1674              allel Environment (PE).  See IBM's LoadLeveler job command  key‐
1675              word documentation about the keyword "network" for more informa‐
1676              tion.  Multiple values may be specified  in  a  comma  separated
1677              list.   All options are case in-sensitive.  Supported values in‐
1678              clude:
1679
1680              BULK_XFER[=<resources>]
1681                          Enable  bulk  transfer  of  data  using  Remote  Di‐
1682                          rect-Memory  Access  (RDMA).  The optional resources
1683                          specification is a numeric value which  can  have  a
1684                          suffix  of  "k", "K", "m", "M", "g" or "G" for kilo‐
1685                          bytes, megabytes or gigabytes.  NOTE: The  resources
1686                          specification is not supported by the underlying IBM
1687                          infrastructure as of  Parallel  Environment  version
1688                          2.2  and  no value should be specified at this time.
1689                          The devices allocated to a job must all  be  of  the
1690                          same type.  The default value depends upon what hard‐
1691                          ware is available and, in order  of  preference,  is
1692                          IPONLY (which is not considered in
1693                          User Space mode), HFI, IB, HPCE, and KMUX.
1694
1695              CAU=<count> Number of Collective Acceleration  Units  (CAU)  re‐
1696                          quired.   Applies  only to IBM Power7-IH processors.
1697                          Default value is zero.  Independent CAU will be  al‐
1698                          located  for  each programming interface (MPI, LAPI,
1699                          etc.)
1700
1701              DEVNAME=<name>
1702                          Specify the device name to  use  for  communications
1703                          (e.g. "eth0" or "mlx4_0").
1704
1705              DEVTYPE=<type>
1706                          Specify  the  device type to use for communications.
1707                          The supported values of type are: "IB" (InfiniBand),
1708                          "HFI"  (P7 Host Fabric Interface), "IPONLY" (IP-Only
1709                          interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1710                          nel  Emulation of HPCE).  The devices allocated to a
1711                          job must all be of the same type.  The default value
1712                          depends upon what hardware is available and, in order
1713                          of preference, is IPONLY  (which  is  not
1714                          considered  in  User Space mode), HFI, IB, HPCE, and
1715                          KMUX.
1716
1717              IMMED=<count>
1718                          Number of immediate send slots per window  required.
1719                          Applies  only  to IBM Power7-IH processors.  Default
1720                          value is zero.
1721
1722              INSTANCES=<count>
1723                          Specify number of network connections for each  task
1724                          on  each  network  connection.  The default instance
1725                          count is 1.
1726
1727              IPV4        Use Internet Protocol (IP) version 4  communications
1728                          (default).
1729
1730              IPV6        Use Internet Protocol (IP) version 6 communications.
1731
1732              LAPI        Use the LAPI programming interface.
1733
1734              MPI         Use  the  MPI programming interface.  MPI is the de‐
1735                          fault interface.
1736
1737              PAMI        Use the PAMI programming interface.
1738
1739              SHMEM       Use the OpenSHMEM programming interface.
1740
1741              SN_ALL      Use all available switch networks (default).
1742
1743              SN_SINGLE   Use one available switch network.
1744
1745              UPC         Use the UPC programming interface.
1746
1747              US          Use User Space communications.
1748
1749
1750              Some examples of network specifications:
1751
1752              Instances=2,US,MPI,SN_ALL
1753                          Create two user space connections for MPI communica‐
1754                          tions on every switch network for each task.
1755
1756              US,MPI,Instances=3,Devtype=IB
1757                          Create three user space connections for MPI communi‐
1758                          cations on every InfiniBand network for each task.
1759
1760              IPV4,LAPI,SN_Single
1761                          Create an IP version 4 connection for LAPI communica‐
1762                          tions on one switch network for each task.
1763
1764              Instances=2,US,LAPI,MPI
1765                          Create  two user space connections each for LAPI and
1766                          MPI communications on every switch network for  each
1767                          task.  Note that SN_ALL is the default option so ev‐
1768                          ery switch network  is  used.  Also  note  that  In‐
1769                          stances=2  specifies that two connections are estab‐
1770                          lished for each protocol (LAPI  and  MPI)  and  each
1771                          task.   If  there are two networks and four tasks on
1772                          the node then a total of 32 connections  are  estab‐
1773                          lished  (2  instances x 2 protocols x 2 networks x 4
1774                          tasks).
1775
1776              This option applies to job and step allocations.
1777
1778
1779       --nice[=adjustment]
1780              Run the job with an adjusted scheduling priority  within  Slurm.
1781              With no adjustment value the scheduling priority is decreased by
1782              100. A negative nice value increases the priority, otherwise de‐
1783              creases  it. The adjustment range is +/- 2147483645. Only privi‐
1784              leged users can specify a negative adjustment.
1785
1786
1787       -Z, --no-allocate
1788              Run the specified tasks on a set of  nodes  without  creating  a
1789              Slurm  "job"  in the Slurm queue structure, bypassing the normal
1790              resource allocation step.  The list of nodes must  be  specified
1791              with  the  -w,  --nodelist  option.  This is a privileged option
1792              only available for the users "SlurmUser" and "root". This option
1793              applies to job allocations.
1794
1795
1796       -k, --no-kill[=off]
1797              Do  not automatically terminate a job if one of the nodes it has
1798              been allocated fails. This option applies to job and step  allo‐
1799              cations.    The   job   will  assume  all  responsibilities  for
1800              fault-tolerance.  Tasks launched using this option will  not  be
1801              considered  terminated  (e.g.  -K,  --kill-on-bad-exit  and  -W,
1802              --wait options will have no effect upon the job step).  The  ac‐
1803              tive  job  step  (MPI job) will likely suffer a fatal error, but
1804              subsequent job steps may be run if this option is specified.
1805
1806              Specify an optional argument of "off" to disable the effect of the
1807              SLURM_NO_KILL environment variable.
1808
1809              The default action is to terminate the job upon node failure.
1810
1811
1812       -F, --nodefile=<node_file>
1813              Much like --nodelist, but the list is contained in a file  named
1814              node_file.  The node names in the list may also span  multi‐
1815              ple lines in the file.  Duplicate node names in the  file  will
1816              be ignored.  The order of the node names in the list is not  im‐
1817              portant; the node names will be sorted by Slurm.
1818
1819
1820       -w, --nodelist={<node_name_list>|<filename>}
1821              Request  a  specific list of hosts.  The job will contain all of
1822              these hosts and possibly additional hosts as needed  to  satisfy
1823              resource   requirements.    The  list  may  be  specified  as  a
1824              comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1825              for  example),  or a filename.  The host list will be assumed to
1826              be a filename if it contains a "/" character.  If you specify  a
1827              minimum  node or processor count larger than can be satisfied by
1828              the supplied host list, additional resources will  be  allocated
1829              on  other  nodes  as  needed.  Rather than repeating a host name
1830              multiple times, an asterisk and a repetition count  may  be  ap‐
1831              pended  to  a host name. For example "host1,host1" and "host1*2"
1832              are equivalent. If the number of tasks is given and  a  list  of
1833              requested  nodes  is  also  given, the number of nodes used from
1834              that list will be reduced to match that of the number  of  tasks
1835              if the number of nodes in the list is greater than the number of
1836              tasks. This option applies to job and step allocations.
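
              For example, a sketch requesting a specific set of hosts (the
              host names are hypothetical):

                   srun -w node[01-03],node07 -n 4 hostname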
1837
1838
1839       -N, --nodes=<minnodes>[-maxnodes]
1840              Request that a minimum of minnodes nodes be  allocated  to  this
1841              job.   A maximum node count may also be specified with maxnodes.
1842              If only one number is specified, this is used as both the  mini‐
1843              mum  and maximum node count.  The partition's node limits super‐
1844              sede those of the job.  If a job's node limits  are  outside  of
1845              the  range  permitted for its associated partition, the job will
1846              be left in a PENDING state.  This permits possible execution  at
1847              a  later  time,  when  the partition limit is changed.  If a job
1848              node limit exceeds the number of nodes configured in the  parti‐
1849              tion, the job will be rejected.  Note that the environment vari‐
1850              able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1851              ibility) will be set to the count of nodes actually allocated to
1852              the job. See the ENVIRONMENT VARIABLES section for more informa‐
1853              tion.   If -N is not specified, the default behavior is to allo‐
1854              cate enough nodes to satisfy the requirements of the -n  and  -c
1855              options.   The  job  will be allocated as many nodes as possible
1856              within the range specified and without delaying  the  initiation
1857              of the job.  If the number of tasks is given and a number of re‐
1858              quested nodes is also given, the number of nodes used from  that
1859              request  will be reduced to match that of the number of tasks if
1860              the number of nodes in the request is greater than the number of
1861              tasks.  The node count specification may include a numeric value
1862              followed by a suffix of "k" (multiplies numeric value by  1,024)
1863              or  "m" (multiplies numeric value by 1,048,576). This option ap‐
1864              plies to job and step allocations.
1865
1866
1867       -n, --ntasks=<number>
1868              Specify the number of tasks to run. Request that  srun  allocate
1869              resources  for  ntasks tasks.  The default is one task per node,
1870              but note that the --cpus-per-task option will  change  this  de‐
1871              fault. This option applies to job and step allocations.
1872
1873
1874       --ntasks-per-core=<ntasks>
1875              Request the maximum ntasks be invoked on each core.  This option
1876              applies to the job allocation,  but  not  to  step  allocations.
1877              Meant   to  be  used  with  the  --ntasks  option.   Related  to
1878              --ntasks-per-node except at the core level instead of  the  node
1879              level.   Masks will automatically be generated to bind the tasks
1880              to specific cores unless --cpu-bind=none  is  specified.   NOTE:
1881              This  option  is not supported when using SelectType=select/lin‐
1882              ear.
1883
1884
1885       --ntasks-per-gpu=<ntasks>
1886              Request that there are ntasks tasks invoked for every GPU.  This
1887              option can work in two ways: 1) either specify --ntasks in addi‐
1888              tion, in which case a type-less GPU specification will be  auto‐
1889              matically  determined to satisfy --ntasks-per-gpu, or 2) specify
1890              the GPUs wanted (e.g. via --gpus or --gres)  without  specifying
1891              --ntasks,  and the total task count will be automatically deter‐
1892              mined.  The number of CPUs  needed  will  be  automatically  in‐
1893              creased  if  necessary  to  allow for any calculated task count.
1894              This option will implicitly set --gpu-bind=single:<ntasks>,  but
1895              that  can  be  overridden with an explicit --gpu-bind specifica‐
1896              tion.  This option is not compatible with  a  node  range  (i.e.
1897              -N<minnodes-maxnodes>).   This  option  is  not  compatible with
1898              --gpus-per-task, --gpus-per-socket, or --ntasks-per-node.   This
1899              option  is  not supported unless SelectType=cons_tres is config‐
1900              ured (either directly or indirectly on Cray systems).
1901
1902
1903       --ntasks-per-node=<ntasks>
1904              Request that ntasks be invoked on each node.  If used  with  the
1905              --ntasks  option,  the  --ntasks option will take precedence and
1906              the --ntasks-per-node will be treated  as  a  maximum  count  of
1907              tasks per node.  Meant to be used with the --nodes option.  This
1908              is related to --cpus-per-task=ncpus, but does not require knowl‐
1909              edge  of the actual number of cpus on each node.  In some cases,
1910              it is more convenient to be able to request that no more than  a
1911              specific  number  of tasks be invoked on each node.  Examples of
1912              this include submitting a hybrid MPI/OpenMP app where  only  one
1913              MPI  "task/rank"  should be assigned to each node while allowing
1914              the OpenMP portion to utilize all of the parallelism present  in
1915              the node, or submitting a single setup/cleanup/monitoring job to
1916              each node of a pre-existing allocation as one step in  a  larger
1917              job script. This option applies to job allocations.
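
              For example, a sketch for a hybrid MPI/OpenMP run (./hybrid is
              a hypothetical program; 16 CPUs per node are assumed): one task
              per node on four nodes, leaving the remaining parallelism to
              OpenMP threads:

                   srun -N 4 --ntasks-per-node=1 -c 16 ./hybrid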
1918
1919
1920       --ntasks-per-socket=<ntasks>
1921              Request  the maximum ntasks be invoked on each socket.  This op‐
1922              tion applies to the job allocation, but not to step allocations.
1923              Meant   to  be  used  with  the  --ntasks  option.   Related  to
1924              --ntasks-per-node except at the socket level instead of the node
1925              level.   Masks will automatically be generated to bind the tasks
1926              to specific sockets unless --cpu-bind=none is specified.   NOTE:
1927              This  option  is not supported when using SelectType=select/lin‐
1928              ear.
1929
1930
1931       --open-mode={append|truncate}
1932              Open the output and error files using append or truncate mode as
1933              specified.   For  heterogeneous  job  steps the default value is
1934              "append".  Otherwise the default value is specified by the  sys‐
1935              tem  configuration  parameter JobFileAppend. This option applies
1936              to job and step allocations.
1937
1938
1939       -o, --output=<filename_pattern>
1940              Specify the "filename pattern" for stdout  redirection.  By  de‐
1941              fault  in  interactive mode, srun collects stdout from all tasks
1942              and sends this output via TCP/IP to the attached terminal.  With
1943              --output  stdout  may  be  redirected to a file, to one file per
1944              task, or to /dev/null. See section IO Redirection below for  the
1945              various  forms  of  filename pattern.  If the specified file al‐
1946              ready exists, it will be overwritten.
1947
1948              If --error is not also specified on the command line, both  std‐
1949              out and stderr will be directed to the file specified by --output.
1950              This option applies to job and step allocations.
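
                  For example (an illustrative sketch; ./app is a placeholder),
                  the filename pattern specifiers described in IO Redirection
                  below can produce one output file per task, named after the
                  job id and the task rank:

                         srun -n4 --output=job%j.task%t.out ./app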
1951
1952
1953       -O, --overcommit
1954              Overcommit resources. This option applies to job and step  allo‐
1955              cations.
1956
1957              When  applied to a job allocation (not including jobs requesting
1958              exclusive access to the nodes) the resources are allocated as if
1959              only  one  task  per  node is requested. This means that the re‐
1960              quested number of cpus per task (-c, --cpus-per-task) are  allo‐
1961              cated  per  node  rather  than being multiplied by the number of
1962              tasks. Options used to specify the number  of  tasks  per  node,
1963              socket, core, etc. are ignored.
1964
1965              When applied to job step allocations (the srun command when exe‐
1966              cuted within an existing job allocation),  this  option  can  be
1967              used  to launch more than one task per CPU.  Normally, srun will
1968              not allocate more than  one  process  per  CPU.   By  specifying
1969              --overcommit  you  are explicitly allowing more than one process
1970              per CPU. However no more than MAX_TASKS_PER_NODE tasks are  per‐
1971              mitted to execute per node.  NOTE: MAX_TASKS_PER_NODE is defined
1972              in the file slurm.h and is not a variable, it is  set  at  Slurm
1973              build time.
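
                  For example (an illustrative sketch; ./app is a placeholder),
                  a step launched inside an existing allocation may start more
                  tasks than it has CPUs when overcommit is allowed:

                         srun -n4 --overcommit ./app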
1974
1975
1976       --overlap
1977              Allow steps to overlap each other on the CPUs.  By default steps
1978              do not share CPUs with other parallel steps.
1979
1980
1981       -s, --oversubscribe
1982              The job allocation can over-subscribe resources with other  run‐
1983              ning  jobs.   The  resources to be over-subscribed can be nodes,
1984              sockets, cores, and/or hyperthreads  depending  upon  configura‐
1985              tion.   The  default  over-subscribe  behavior depends on system
1986              configuration and the  partition's  OverSubscribe  option  takes
1987              precedence over the job's option.  This option may result in the
1988              allocation being granted sooner than if the --oversubscribe  op‐
1989              tion was not set and allow higher system utilization, but appli‐
1990              cation performance will likely suffer due to competition for re‐
1991              sources.  This option applies to step allocations.
1992
1993
1994       -p, --partition=<partition_names>
1995              Request  a  specific  partition for the resource allocation.  If
1996              not specified, the default behavior is to allow the  slurm  con‐
1997              troller  to  select  the  default partition as designated by the
1998              system administrator. If the job can use more  than  one  parti‐
1999              tion, specify their names in a comma separated list and the one
2000              offering earliest initiation will be used with no  regard  given
2001              to  the partition name ordering (although higher priority parti‐
2002              tions will be considered first).  When the job is initiated, the
2003              name  of  the  partition  used  will  be placed first in the job
2004              record partition string. This option applies to job allocations.
2005
2006
2007       --power=<flags>
2008              Comma separated list of power management plugin  options.   Cur‐
2009              rently  available  flags  include: level (all nodes allocated to
2010              the job should have identical power caps, may be disabled by the
2011              Slurm  configuration option PowerParameters=job_no_level).  This
2012              option applies to job allocations.
2013
2014
2015       -E, --preserve-env
2016              Pass   the   current    values    of    environment    variables
2017              SLURM_JOB_NUM_NODES  and SLURM_NTASKS through to the executable,
2018              rather than computing them from command  line  parameters.  This
2019              option applies to job allocations.
2020
2021
2022
2023       --priority=<value>
2024              Request  a  specific job priority.  May be subject to configura‐
2025              tion specific constraints.  value should  either  be  a  numeric
2026              value  or "TOP" (for highest possible value).  Only Slurm opera‐
2027              tors and administrators can set the priority of a job.  This op‐
2028              tion applies to job allocations only.
2029
2030
2031       --profile={all|none|<type>[,<type>...]}
2032              Enables  detailed  data  collection  by  the acct_gather_profile
2033              plugin.  Detailed data are typically time-series that are stored
2034              in an HDF5 file for the job or an InfluxDB database depending on
2035              the configured plugin.  This option applies to job and step  al‐
2036              locations.
2037
2038
2039              All       All data types are collected. (Cannot be combined with
2040                        other values.)
2041
2042
2043              None      No data types are collected. This is the default.
2044                         (Cannot be combined with other values.)
2045
2046
2047              Valid type values are:
2048
2049
2050                  Energy Energy data is collected.
2051
2052
2053                  Task   Task (I/O, Memory, ...) data is collected.
2054
2055
2056                  Filesystem
2057                         Filesystem data is collected.
2058
2059
2060                  Network
2061                         Network (InfiniBand) data is collected.
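
                  For example (an illustrative sketch; ./app is a placeholder,
                  and the acct_gather_profile plugin must be configured), task
                  and energy time-series can be collected with:

                         srun --profile=task,energy ./app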
2062
2063
2064       --prolog=<executable>
2065              srun will run executable just before  launching  the  job  step.
2066              The  command  line  arguments for executable will be the command
2067              and arguments of the job step.  If executable is "none", then no
2068              srun prolog will be run. This parameter overrides the SrunProlog
2069              parameter in slurm.conf. This parameter is  completely  indepen‐
2070              dent  from  the  Prolog parameter in slurm.conf. This option ap‐
2071              plies to job allocations.
2072
2073
2074       --propagate[=rlimit[,rlimit...]]
2075              Allows users to specify which of the modifiable (soft)  resource
2076              limits  to  propagate  to  the  compute nodes and apply to their
2077              jobs. If no rlimit is specified, then all resource  limits  will
2078              be  propagated.   The  following  rlimit  names are supported by
2079              Slurm (although some options may not be supported on  some  sys‐
2080              tems):
2081
2082              ALL       All limits listed below (default)
2083
2084              NONE      No limits listed below
2085
2086              AS        The  maximum  address  space  (virtual  memory)  for a
2087                        process.
2088
2089              CORE      The maximum size of core file
2090
2091              CPU       The maximum amount of CPU time
2092
2093              DATA      The maximum size of a process's data segment
2094
2095              FSIZE     The maximum size of files created. Note  that  if  the
2096                        user  sets  FSIZE to less than the current size of the
2097                        slurmd.log, job launches will fail with a  'File  size
2098                        limit exceeded' error.
2099
2100              MEMLOCK   The maximum size that may be locked into memory
2101
2102              NOFILE    The maximum number of open files
2103
2104              NPROC     The maximum number of processes available
2105
2106              RSS       The maximum resident set size. Note that this only has
2107                        effect with Linux kernels 2.4.30 or older or BSD.
2108
2109              STACK     The maximum stack size
2110
2111              This option applies to job allocations.
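
                  For example (an illustrative sketch; ./app is a placeholder),
                  the following propagates only the soft limits for stack size
                  and open files to the compute nodes:

                         srun --propagate=STACK,NOFILE ./app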
2112
2113
2114       --pty  Execute task zero in  pseudo  terminal  mode.   Implicitly  sets
2115              --unbuffered.  Implicitly sets --error and --output to /dev/null
2116              for all tasks except task zero, which may cause those  tasks  to
2117              exit immediately (e.g. shells will typically exit immediately in
2118              that situation).  This option applies to step allocations.
2119
2120
2121       -q, --qos=<qos>
2122              Request a quality of service for the job.  QOS values can be de‐
2123              fined  for  each  user/cluster/account  association in the Slurm
2124              database.  Users will be limited to their association's  defined
2125              set  of  qos's  when the Slurm configuration parameter, Account‐
2126              ingStorageEnforce, includes "qos" in its definition. This option
2127              applies to job allocations.
2128
2129
2130       -Q, --quiet
2131              Suppress  informational messages from srun. Errors will still be
2132              displayed. This option applies to job and step allocations.
2133
2134
2135       --quit-on-interrupt
2136              Quit immediately on single SIGINT (Ctrl-C). Use of  this  option
2137              disables  the  status  feature  normally available when srun re‐
2138              ceives a single Ctrl-C and causes srun  to  instead  immediately
2139              terminate  the  running job. This option applies to step alloca‐
2140              tions.
2141
2142
2143       --reboot
2144              Force the allocated nodes to reboot  before  starting  the  job.
2145              This  is only supported with some system configurations and will
2146              otherwise be silently ignored. Only root,  SlurmUser  or  admins
2147              can reboot nodes. This option applies to job allocations.
2148
2149
2150       -r, --relative=<n>
2151              Run  a  job  step  relative to node n of the current allocation.
2152              This option may be used to spread several job  steps  out  among
2153              the  nodes  of  the  current job. If -r is used, the current job
2154              step will begin at node n of the allocated nodelist,  where  the
2155              first node is considered node 0.  The -r option is not permitted
2156              with -w or -x option and will result in a fatal error  when  not
2157              running within a prior allocation (i.e. when SLURM_JOB_ID is not
2158              set). The default for n is 0. If the value  of  --nodes  exceeds
2159              the  number  of  nodes  identified with the --relative option, a
2160              warning message will be printed and the --relative  option  will
2161              take precedence. This option applies to step allocations.
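
                  For example (an illustrative sketch, run from within an
                  existing allocation; ./app is a placeholder), two single-node
                  steps can be placed on the first and third nodes of the
                  allocation:

                         srun -N1 -n1 -r0 ./app &
                         srun -N1 -n1 -r2 ./app &
                         wait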
2162
2163
2164       --reservation=<reservation_names>
2165              Allocate  resources  for  the job from the named reservation. If
2166              the job can use more than one reservation, specify  their  names
2167              in a comma separated list and the one offering the earliest
2168              initiation will be used. Each reservation will be considered in
2169              the order it was requested.  All reservations will be listed in
2170              scontrol/squeue through the life of the job.  In accounting, the
2171              first reservation will be seen and, after the job starts, the
2172              reservation actually used will replace it.
2173
2174
2175       --resv-ports[=count]
2176              Reserve communication ports for this job. Users can specify  the
2177              number of ports they want to reserve. The parameter Mpi‐
2178              Params=ports=12000-12999 must be specified in slurm.conf. If not
2179              specified and Slurm's OpenMPI plugin is used, then by default
2180              the number of reserved ports equals the highest number of tasks
2181              on any node in the job step allocation.  If the number of
2182              reserved ports is zero, no ports are reserved.  Used for
2183              OpenMPI. This option applies to job and step allocations.
2184
2185
2186       --send-libs[=yes|no]
2187              If set to yes (or no argument), autodetect and broadcast the ex‐
2188              ecutable's  shared  object  dependencies  to  allocated  compute
2189              nodes.  The  files  are placed in a directory alongside the exe‐
2190              cutable. The LD_LIBRARY_PATH is automatically updated to include
2191              this  cache directory as well. This overrides the default behav‐
2192              ior configured in slurm.conf  SbcastParameters  send_libs.  This
2193              option   only  works  in  conjunction  with  --bcast.  See  also
2194              --bcast-exclude.
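
                  For example (an illustrative sketch; the destination path
                  and ./app are placeholders), the executable and its shared
                  library dependencies can be copied to node-local storage
                  before launch with:

                         srun --bcast=/tmp/app --send-libs=yes ./app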
2195
2196
2197       --signal=[R:]<sig_num>[@sig_time]
2198              When a job is within sig_time seconds of its end time,  send  it
2199              the  signal sig_num.  Due to the resolution of event handling by
2200              Slurm, the signal may be sent up  to  60  seconds  earlier  than
2201              specified.   sig_num may either be a signal number or name (e.g.
2202              "10" or "USR1").  sig_time must have an integer value between  0
2203              and  65535.   By default, no signal is sent before the job's end
2204              time.  If a sig_num is specified without any sig_time,  the  de‐
2205              fault  time will be 60 seconds. This option applies to job allo‐
2206              cations.  Use the "R:" option to allow this job to overlap  with
2207              a  reservation  with MaxStartDelay set.  To have the signal sent
2208              at preemption time see the preempt_send_user_signal SlurmctldPa‐
2209              rameter.
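
                  For example (an illustrative sketch; ./app is a placeholder),
                  the following asks that SIGUSR1 be delivered roughly two
                  minutes before the job's 30 minute time limit expires:

                         srun --time=30 --signal=USR1@120 ./app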
2210
2211
2212       --slurmd-debug=<level>
2213              Specify  a debug level for slurmd(8). The level may be specified
2214              either as an integer value between 0 [quiet, only errors are
2215              displayed] and 4 [verbose operation] or as a SlurmdDebug tag:
2216
2217              quiet     Log nothing
2218
2219              fatal     Log only fatal errors
2220
2221              error     Log only errors
2222
2223              info      Log errors and general informational messages
2224
2225              verbose   Log errors and verbose informational messages
2226
2227
2228              The slurmd debug information is copied onto the stderr of
2229              the  job.  By default only errors are displayed. This option ap‐
2230              plies to job and step allocations.
2231
2232
2233       --sockets-per-node=<sockets>
2234              Restrict node selection to nodes with  at  least  the  specified
2235              number  of  sockets.  See additional information under -B option
2236              above when task/affinity plugin is enabled. This option  applies
2237              to job allocations.
2238              NOTE:  This  option may implicitly impact the number of tasks if
2239              -n was not specified.
2240
2241
2242       --spread-job
2243              Spread the job allocation over as many nodes as possible and at‐
2244              tempt  to  evenly  distribute  tasks across the allocated nodes.
2245              This option disables the topology/tree plugin.  This option  ap‐
2246              plies to job allocations.
2247
2248
2249       --switches=<count>[@max-time]
2250              When  a tree topology is used, this defines the maximum count of
2251              leaf switches desired for the job allocation and optionally  the
2252              maximum time to wait for that number of switches. If Slurm finds
2253              an allocation containing more switches than the count specified,
2254              the job remains pending until it either finds an allocation with
2255              desired switch count or the time limit expires.  If there is no
2256              switch  count limit, there is no delay in starting the job.  Ac‐
2257              ceptable  time  formats  include  "minutes",  "minutes:seconds",
2258              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2259              "days-hours:minutes:seconds".  The job's maximum time delay  may
2260              be limited by the system administrator using the SchedulerParam‐
2261              eters configuration parameter with the max_switch_wait parameter
2262              option.   On a dragonfly network the only switch count supported
2263              is 1, since communication performance will be highest when a job
2264              is allocated resources on one leaf switch or more than 2 leaf
2265              switches.  The default max-time is the max_switch_wait Sched‐
2266              ulerParameters value. This option applies to job allocations.
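
                  For example (an illustrative sketch; ./app is a placeholder),
                  the following asks for an allocation confined to one leaf
                  switch and waits up to 60 minutes for it:

                         srun -N16 --switches=1@60 ./app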
2267
2268
2269       --task-epilog=<executable>
2270              The  slurmstepd  daemon will run executable just after each task
2271              terminates. This will be executed before any TaskEpilog  parame‐
2272              ter  in  slurm.conf  is  executed.  This  is  meant to be a very
2273              short-lived program. If it fails to terminate within a few  sec‐
2274              onds,  it  will  be  killed along with any descendant processes.
2275              This option applies to step allocations.
2276
2277
2278       --task-prolog=<executable>
2279              The slurmstepd daemon will run executable just before  launching
2280              each  task. This will be executed after any TaskProlog parameter
2281              in slurm.conf is executed.  Besides the normal environment vari‐
2282              ables, this has SLURM_TASK_PID available to identify the process
2283              ID of the task being started.  Standard output from this program
2284              of  the form "export NAME=value" will be used to set environment
2285              variables for the task being spawned.  This  option  applies  to
2286              step allocations.
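
                  For example (an illustrative sketch; the script path and the
                  variable name are placeholders), a task prolog can set an
                  environment variable for the task it precedes by printing an
                  export statement on its standard output:

                         #!/bin/sh
                         # my_task_prolog.sh: lines of the form
                         # "export NAME=value" written to stdout become
                         # environment variables of the spawned task.
                         echo "export MY_TASK_INFO=pid_${SLURM_TASK_PID}"

                  The step is then launched with:

                         srun --task-prolog=/path/to/my_task_prolog.sh ./app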
2287
2288
2289       --test-only
2290              Returns  an  estimate  of  when  a job would be scheduled to run
2291              given the current job queue and all  the  other  srun  arguments
2292              specifying  the job.  This limits srun's behavior to just return
2293              information; no job is actually submitted.  The program will  be
2294              executed  directly  by the slurmd daemon. This option applies to
2295              job allocations.
2296
2297
2298       --thread-spec=<num>
2299              Count of specialized threads per node reserved by  the  job  for
2300              system  operations and not used by the application. The applica‐
2301              tion will not use these threads, but will be charged  for  their
2302              allocation.   This  option  can not be used with the --core-spec
2303              option. This option applies to job allocations.
2304
2305
2306       -T, --threads=<nthreads>
2307              Allows limiting the number of concurrent threads  used  to  send
2308              the job request from the srun process to the slurmd processes on
2309              the allocated nodes. Default is to use one thread per  allocated
2310              node  up  to a maximum of 60 concurrent threads. Specifying this
2311              option limits the number of concurrent threads to nthreads (less
2312              than  or  equal  to  60).  This should only be used to set a low
2313              thread count for testing on very small  memory  computers.  This
2314              option applies to job allocations.
2315
2316
2317       --threads-per-core=<threads>
2318              Restrict  node  selection  to  nodes with at least the specified
2319              number of threads per core. In task layout,  use  the  specified
2320              maximum   number  of  threads  per  core.  Implies  --exact  and
2321              --cpu-bind=threads unless overridden by command line or environ‐
2322              ment  options.  NOTE: "Threads" refers to the number of process‐
2323              ing units on each core rather than  the  number  of  application
2324              tasks  to be launched per core. See additional information under
2325              -B option above when task/affinity plugin is enabled.  This  op‐
2326              tion applies to job and step allocations.
2327              NOTE:  This  option may implicitly impact the number of tasks if
2328              -n was not specified.
2329
2330
2331       -t, --time=<time>
2332              Set a limit on the total run time of the job allocation.  If the
2333              requested time limit exceeds the partition's time limit, the job
2334              will be left in a PENDING state  (possibly  indefinitely).   The
2335              default  time limit is the partition's default time limit.  When
2336              the time limit is reached, each task in each job  step  is  sent
2337              SIGTERM  followed  by  SIGKILL.  The interval between signals is
2338              specified by the Slurm configuration  parameter  KillWait.   The
2339              OverTimeLimit  configuration parameter may permit the job to run
2340              longer than scheduled.  Time resolution is one minute and second
2341              values are rounded up to the next minute.
2342
2343              A  time  limit  of  zero requests that no time limit be imposed.
2344              Acceptable time formats  include  "minutes",  "minutes:seconds",
2345              "hours:minutes:seconds",  "days-hours", "days-hours:minutes" and
2346              "days-hours:minutes:seconds". This option  applies  to  job  and
2347              step allocations.
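
                  For example (an illustrative sketch; ./app is a placeholder),
                  the "days-hours:minutes" format below requests a limit of
                  one day, two hours and thirty minutes:

                         srun --time=1-02:30 ./app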
2348
2349
2350       --time-min=<time>
2351              Set  a  minimum time limit on the job allocation.  If specified,
2352              the job may have its --time limit lowered to a  value  no  lower
2353              than  --time-min  if doing so permits the job to begin execution
2354              earlier than otherwise possible.  The job's time limit will  not
2355              be  changed  after the job is allocated resources.  This is per‐
2356              formed by a backfill scheduling algorithm to allocate  resources
2357              otherwise  reserved  for  higher priority jobs.  Acceptable time
2358              formats  include   "minutes",   "minutes:seconds",   "hours:min‐
2359              utes:seconds",     "days-hours",     "days-hours:minutes"    and
2360              "days-hours:minutes:seconds". This option applies to job alloca‐
2361              tions.
2362
2363
2364       --tmp=<size>[units]
2365              Specify  a minimum amount of temporary disk space per node.  De‐
2366              fault units are megabytes.  Different units can be specified us‐
2367              ing  the  suffix  [K|M|G|T].  This option applies to job alloca‐
2368              tions.
2369
2370
2371       --uid=<user>
2372              Attempt to submit and/or run a job as user instead of the invok‐
2373              ing  user  id.  The  invoking user's credentials will be used to
2374              check access permissions for the target partition. User root may
2375              use  this option to run jobs as a normal user in a RootOnly par‐
2376              tition for example. If run as root, srun will drop  its  permis‐
2377              sions  to the uid specified after node allocation is successful.
2378              user may be the user name or numerical user ID. This option  ap‐
2379              plies to job and step allocations.
2380
2381
2382       -u, --unbuffered
2383              By   default,   the   connection   between  slurmstepd  and  the
2384              user-launched application is over a pipe. The stdio output writ‐
2385              ten  by  the  application  is  buffered by the glibc until it is
2386              flushed or the output is set as unbuffered.  See  setbuf(3).  If
2387              this  option  is  specified the tasks are executed with a pseudo
2388              terminal so that the application output is unbuffered. This  op‐
2389              tion applies to step allocations.
2390
2391       --usage
2392              Display brief help message and exit.
2393
2394
2395       --use-min-nodes
2396              If a range of node counts is given, prefer the smaller count.
2397
2398
2399       -v, --verbose
2400              Increase the verbosity of srun's informational messages.  Multi‐
2401              ple -v's will further increase  srun's  verbosity.   By  default
2402              only  errors  will  be displayed. This option applies to job and
2403              step allocations.
2404
2405
2406       -V, --version
2407              Display version information and exit.
2408
2409
2410       -W, --wait=<seconds>
2411              Specify how long to wait after the first task terminates  before
2412              terminating  all  remaining tasks. A value of 0 indicates an un‐
2413              limited wait (a warning will be issued after  60  seconds).  The
2414              default value is set by the WaitTime parameter in the slurm con‐
2415              figuration file (see slurm.conf(5)). This option can  be  useful
2416              to  ensure  that  a job is terminated in a timely fashion in the
2417              event that one or more tasks terminate prematurely.   Note:  The
2418              -K,  --kill-on-bad-exit  option takes precedence over -W, --wait
2419              to terminate the job immediately if a task exits with a non-zero
2420              exit code. This option applies to job allocations.
2421
2422
2423       --wckey=<wckey>
2424              Specify  wckey  to be used with job.  If TrackWCKey=no (default)
2425              in the slurm.conf this value is ignored. This option applies  to
2426              job allocations.
2427
2428
2429       --x11[={all|first|last}]
2430              Sets  up  X11  forwarding on "all", "first" or "last" node(s) of
2431              the allocation.  This option is only enabled if Slurm  was  com‐
2432              piled  with  X11  support  and PrologFlags=x11 is defined in the
2433              slurm.conf. Default is "all".
2434
2435
2436       srun will submit the job request to the slurm job controller, then ini‐
2437       tiate  all  processes on the remote nodes. If the request cannot be met
2438       immediately, srun will block until the resources are free  to  run  the
2439       job. If the -I (--immediate) option is specified srun will terminate if
2440       resources are not immediately available.
2441
2442       When initiating remote processes srun will propagate the current  work‐
2443       ing  directory,  unless --chdir=<path> is specified, in which case path
2444       will become the working directory for the remote processes.
2445
2446       The -n, -c, and -N options control how CPUs  and nodes  will  be  allo‐
2447       cated  to  the job. When specifying only the number of processes to run
2448       with -n, a default of one CPU per process is allocated.  By  specifying
2449       the number of CPUs required per task (-c), more than one CPU may be al‐
2450       located per process. If the number of nodes is specified with -N,  srun
2451       will attempt to allocate at least the number of nodes specified.
2452
2453       Combinations  of the above three options may be used to change how pro‐
2454       cesses are distributed across nodes and cpus. For instance, by specify‐
2455       ing  both  the number of processes and number of nodes on which to run,
2456       the number of processes per node is implied. However, if the number  of
2457       CPUs per process is more important, then the number of processes (-n) and
2458       the number of CPUs per process (-c) should be specified.
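
           As an illustrative sketch (./app is a placeholder), the following
           requests 16 processes spread over 4 nodes, with 2 CPUs allocated to
           each process:

                  srun -N4 -n16 -c2 ./app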
2459
2460       srun will refuse to  allocate more than  one  process  per  CPU  unless
2461       --overcommit (-O) is also specified.
2462
2463       srun will attempt to meet the above specifications "at a minimum." That
2464       is, if 16 nodes are requested for 32 processes, and some nodes  do  not
2465       have 2 CPUs, the allocation of nodes will be increased in order to meet
2466       the demand for CPUs. In other words, a minimum of 16  nodes  are  being
2467       requested.  However,  if  16 nodes are requested for 15 processes, srun
2468       will consider this an error, as  15  processes  cannot  run  across  16
2469       nodes.
2470
2471
2472       IO Redirection
2473
2474       By  default, stdout and stderr will be redirected from all tasks to the
2475       stdout and stderr of srun, and stdin will be redirected from the  stan‐
2476       dard input of srun to all remote tasks.  If stdin is only to be read by
2477       a subset of the spawned tasks, specifying a file to  read  from  rather
2478       than  forwarding  stdin  from  the srun command may be preferable as it
2479       avoids moving and storing data that will never be read.
2480
2481       For OS X, the poll() function does not support stdin, so input  from  a
2482       terminal is not possible.
2483
2484       This  behavior  may  be changed with the --output, --error, and --input
2485       (-o, -e, -i) options. Valid format specifications for these options are
2486
2487       all       stdout and stderr are redirected from all tasks to srun.
2488                 stdin is broadcast to all remote tasks.  (This is the
2489                 default behavior.)
2490
2491       none      stdout and stderr are not received from any task.  stdin
2492                 is not sent to any task (stdin is closed).
2493
2494       taskid    stdout  and/or  stderr are redirected from only the task with
2495                 relative id equal to taskid, where 0 <= taskid < ntasks,
2496                 where  ntasks is the total number of tasks in the current job
2497                 step.  stdin is redirected from the stdin  of  srun  to  this
2498                 same  task.   This file will be written on the node executing
2499                 the task.
2500
2501       filename  srun will redirect stdout and/or stderr  to  the  named  file
2502                 from all tasks.  stdin will be redirected from the named file
2503                 and broadcast to all tasks in the job.  filename refers to  a
2504                 path  on the host that runs srun.  Depending on the cluster's
2505                 file system layout, this may result in the  output  appearing
2506                 in  different  places  depending on whether the job is run in
2507                 batch mode.
2508
2509       filename pattern
2510                 srun allows for a filename pattern to be used to generate the
2511                 named  IO  file described above. The following list of format
2512                 specifiers may be used in the format  string  to  generate  a
2513                 filename  that will be unique to a given jobid, stepid, node,
2514                 or task. In each case, the appropriate number  of  files  are
2515                 opened and associated with the corresponding tasks. Note that
2516                 any format string containing %t, %n, and/or %N will be  writ‐
2517                 ten on the node executing the task rather than the node where
2518                 srun executes. These format specifiers are not supported on a
2519                 BGQ system.
2520
2521                 \\     Do not process any of the replacement symbols.
2522
2523                 %%     The character "%".
2524
2525                 %A     Job array's master job allocation number.
2526
2527                 %a     Job array ID (index) number.
2528
2529                 %J     jobid.stepid of the running job. (e.g. "128.0")
2530
2531                 %j     jobid of the running job.
2532
2533                 %s     stepid of the running job.
2534
2535                 %N     short  hostname.  This  will create a separate IO file
2536                        per node.
2537
2538                 %n     Node identifier relative to current job (e.g.  "0"  is
2539                        the  first node of the running job) This will create a
2540                        separate IO file per node.
2541
2542                 %t     task identifier (rank) relative to current  job.  This
2543                        will create a separate IO file per task.
2544
2545                 %u     User name.
2546
2547                 %x     Job name.
2548
2549                 A  number  placed  between  the  percent character and format
2550                 specifier may be used to zero-pad the result in the IO  file‐
2551                 name.  This  number is ignored if the format specifier corre‐
2552                 sponds to  non-numeric data (%N for example).
2553
2554                 Some examples of how the format string may be used  for  a  4
2555                 task  job  step with a Job ID of 128 and step id of 0 are in‐
2556                 cluded below:
2557
2558                 job%J.out      job128.0.out
2559
2560                 job%4j.out     job0128.out
2561
2562                 job%j-%2t.out  job128-00.out, job128-01.out, ...
2563

PERFORMANCE

2565       Executing srun sends a remote procedure call to  slurmctld.  If  enough
2566       calls  from srun or other Slurm client commands that send remote proce‐
2567       dure calls to the slurmctld daemon come in at once, it can result in  a
2568       degradation  of performance of the slurmctld daemon, possibly resulting
2569       in a denial of service.
2570
2571       Do not run srun or other Slurm client commands that send remote  proce‐
2572       dure  calls to slurmctld from loops in shell scripts or other programs.
2573       Ensure that programs limit calls to srun to the minimum  necessary  for
2574       the information you are trying to gather.
2575
2576

INPUT ENVIRONMENT VARIABLES

2578       Upon  startup, srun will read and handle the options set in the follow‐
2579       ing environment variables. The majority of these variables are set  the
2580       same  way  the options are set, as defined above. For flag options that
2581       are defined to expect no argument, the option can be enabled by setting
2582       the  environment  variable  without a value (empty or NULL string), the
2583       string 'yes', or a non-zero number. Any other value for the environment
2584       variable  will  result in the option not being set.  There are a couple
2585       exceptions to these rules that are noted below.
2586       NOTE: Command line options always override  environment  variable  set‐
2587       tings.
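
           As an illustrative sketch (./app is a placeholder), setting an input
           environment variable has the same effect as the corresponding option
           unless that option is also given on the command line:

                  export SLURM_NTASKS=4
                  srun ./app          # four tasks are launched
                  srun -n2 ./app      # the command line overrides: two tasks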
2588
2589
2590       PMI_FANOUT            This  is  used  exclusively  with PMI (MPICH2 and
2591                             MVAPICH2) and controls the fanout of data  commu‐
2592                             nications. The srun command sends messages to ap‐
2593                             plication programs  (via  the  PMI  library)  and
2594                             those  applications may be called upon to forward
2595                             that data to up  to  this  number  of  additional
2596                             tasks.  Higher  values offload work from the srun
2597                             command to the applications and  likely  increase
2598                             the vulnerability to failures.  The default value
2599                             is 32.
2600
2601       PMI_FANOUT_OFF_HOST   This is used exclusively  with  PMI  (MPICH2  and
2602                             MVAPICH2)  and controls the fanout of data commu‐
2603                             nications.  The srun command  sends  messages  to
2604                             application  programs  (via  the PMI library) and
2605                             those applications may be called upon to  forward
2606                             that  data  to additional tasks. By default, srun
2607                             sends one message per host and one task  on  that
2608                             host  forwards  the  data  to other tasks on that
2609                             host up to PMI_FANOUT.  If PMI_FANOUT_OFF_HOST is
2610                             defined, the user task may be required to forward
2611                             the  data  to  tasks  on  other  hosts.   Setting
2612                             PMI_FANOUT_OFF_HOST   may  increase  performance.
2613                             Since more work is performed by the  PMI  library
2614                             loaded by the user application, failures also can
2615                             be more common and more  difficult  to  diagnose.
2616                             Should be disabled/enabled by setting to 0 or 1.
2617
2618       PMI_TIME              This  is  used  exclusively  with PMI (MPICH2 and
2619                             MVAPICH2) and controls how  much  the  communica‐
2620                             tions  from  the tasks to the srun are spread out
2621                             in time in order to avoid overwhelming  the  srun
2622                             command  with work. The default value is 500 (mi‐
2623                             croseconds) per task. On relatively slow  proces‐
2624                             sors  or systems with very large processor counts
2625                             (and large PMI data sets), higher values  may  be
2626                             required.
2627
2628       SLURM_ACCOUNT         Same as -A, --account
2629
2630       SLURM_ACCTG_FREQ      Same as --acctg-freq
2631
2632       SLURM_BCAST           Same as --bcast
2633
2634       SLURM_BCAST_EXCLUDE   Same as --bcast-exclude
2635
2636       SLURM_BURST_BUFFER    Same as --bb
2637
2638       SLURM_CLUSTERS        Same as -M, --clusters
2639
2640       SLURM_COMPRESS        Same as --compress
2641
2642       SLURM_CONF            The location of the Slurm configuration file.
2643
2644       SLURM_CONSTRAINT      Same as -C, --constraint
2645
2646       SLURM_CORE_SPEC       Same as --core-spec
2647
2648       SLURM_CPU_BIND        Same as --cpu-bind
2649
2650       SLURM_CPU_FREQ_REQ    Same as --cpu-freq.
2651
2652       SLURM_CPUS_PER_GPU    Same as --cpus-per-gpu
2653
2654       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task
2655
2656       SLURM_DEBUG           Same  as  -v, --verbose. Must be set to 0 or 1 to
2657                             disable or enable the option.
2658
2659       SLURM_DELAY_BOOT      Same as --delay-boot
2660
2661       SLURM_DEPENDENCY      Same as -d, --dependency=<jobid>
2662
2663       SLURM_DISABLE_STATUS  Same as -X, --disable-status
2664
2665       SLURM_DIST_PLANESIZE  Plane distribution size. Only used if --distribu‐
2666                             tion=plane, without =<size>, is set.
2667
2668       SLURM_DISTRIBUTION    Same as -m, --distribution
2669
2670       SLURM_EPILOG          Same as --epilog
2671
2672       SLURM_EXACT           Same as --exact
2673
2674       SLURM_EXCLUSIVE       Same as --exclusive
2675
2676       SLURM_EXIT_ERROR      Specifies  the  exit  code generated when a Slurm
2677                             error occurs (e.g. invalid options).  This can be
2678                             used  by a script to distinguish application exit
2679                             codes from various Slurm error conditions.   Also
2680                             see SLURM_EXIT_IMMEDIATE.
2681
2682       SLURM_EXIT_IMMEDIATE  Specifies  the exit code generated when the --im‐
2683                             mediate option is used and resources are not cur‐
2684                             rently  available.   This can be used by a script
2685                             to distinguish application exit codes from  vari‐
2686                             ous    Slurm    error   conditions.    Also   see
2687                             SLURM_EXIT_ERROR.
2688
2689       SLURM_EXPORT_ENV      Same as --export
2690
2691       SLURM_GPU_BIND        Same as --gpu-bind
2692
2693       SLURM_GPU_FREQ        Same as --gpu-freq
2694
2695       SLURM_GPUS            Same as -G, --gpus
2696
2697       SLURM_GPUS_PER_NODE   Same as --gpus-per-node
2698
2699       SLURM_GPUS_PER_TASK   Same as --gpus-per-task
2700
2701       SLURM_GRES            Same as --gres. Also see SLURM_STEP_GRES
2702
2703       SLURM_GRES_FLAGS      Same as --gres-flags
2704
2705       SLURM_HINT            Same as --hint
2706
2707       SLURM_IMMEDIATE       Same as -I, --immediate
2708
2709       SLURM_JOB_ID          Same as --jobid
2710
2711       SLURM_JOB_NAME        Same as -J, --job-name except within an  existing
2712                             allocation,  in which case it is ignored to avoid
2713                             using the batch job's name as the  name  of  each
2714                             job step.
2715
2716       SLURM_JOB_NUM_NODES   Same  as  -N,  --nodes.  Total number of nodes in
2717                             the job’s resource allocation.
2718
2719       SLURM_KILL_BAD_EXIT   Same as -K, --kill-on-bad-exit. Must be set to  0
2720                             or 1 to disable or enable the option.
2721
2722       SLURM_LABELIO         Same as -l, --label
2723
2724       SLURM_MEM_BIND        Same as --mem-bind
2725
2726       SLURM_MEM_PER_CPU     Same as --mem-per-cpu
2727
2728       SLURM_MEM_PER_GPU     Same as --mem-per-gpu
2729
2730       SLURM_MEM_PER_NODE    Same as --mem
2731
2732       SLURM_MPI_TYPE        Same as --mpi
2733
2734       SLURM_NETWORK         Same as --network
2735
2736       SLURM_NNODES          Same as -N, --nodes. Total number of nodes in the
2737                             job’s       resource       allocation.        See
2738                             SLURM_JOB_NUM_NODES.  Included for backwards com‐
2739                             patibility.
2740
2741       SLURM_NO_KILL         Same as -k, --no-kill
2742
2743       SLURM_NPROCS          Same as -n, --ntasks. See SLURM_NTASKS.  Included
2744                             for backwards compatibility.
2745
2746       SLURM_NTASKS          Same as -n, --ntasks
2747
2748       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2749
2750       SLURM_NTASKS_PER_GPU  Same as --ntasks-per-gpu
2751
2752       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2753
2754       SLURM_NTASKS_PER_SOCKET
2755                             Same as --ntasks-per-socket
2756
2757       SLURM_OPEN_MODE       Same as --open-mode
2758
2759       SLURM_OVERCOMMIT      Same as -O, --overcommit
2760
2761       SLURM_OVERLAP         Same as --overlap
2762
2763       SLURM_PARTITION       Same as -p, --partition
2764
2765       SLURM_PMI_KVS_NO_DUP_KEYS
2766                             If set, then PMI key-pairs will contain no dupli‐
2767                             cate keys. MPI can use this  variable  to  inform
2768                             the  PMI  library  that it will not use duplicate
2769                             keys so PMI can  skip  the  check  for  duplicate
2770                             keys.   This  is  the case for MPICH2 and reduces
2771                             overhead in testing for duplicates  for  improved
2772                             performance.
2773
2774       SLURM_POWER           Same as --power
2775
2776       SLURM_PROFILE         Same as --profile
2777
2778       SLURM_PROLOG          Same as --prolog
2779
2780       SLURM_QOS             Same as --qos
2781
2782       SLURM_REMOTE_CWD      Same as -D, --chdir=
2783
2784       SLURM_REQ_SWITCH      When  a  tree  topology is used, this defines the
2785                             maximum count of switches desired for the job al‐
2786                             location  and optionally the maximum time to wait
2787                             for that number of switches. See --switches
2788
2789       SLURM_RESERVATION     Same as --reservation
2790
2791       SLURM_RESV_PORTS      Same as --resv-ports
2792
2793       SLURM_SEND_LIBS       Same as --send-libs
2794
2795       SLURM_SIGNAL          Same as --signal
2796
2797       SLURM_SPREAD_JOB      Same as --spread-job
2798
2799       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2800                             if set and non-zero, successive  task  exit  mes‐
2801                             sages  with  the  same  exit code will be printed
2802                             only once.
2803
2804       SLURM_STDERRMODE      Same as -e, --error
2805
2806       SLURM_STDINMODE       Same as -i, --input
2807
2808       SLURM_STDOUTMODE      Same as -o, --output
2809
2810       SLURM_STEP_GRES       Same as --gres (only applies to job steps, not to
2811                             job allocations).  Also see SLURM_GRES
2812
2813       SLURM_STEP_KILLED_MSG_NODE_ID=ID
2814                             If set, only the specified node will log when the
2815                             job or step are killed by a signal.
2816
2817       SLURM_TASK_EPILOG     Same as --task-epilog
2818
2819       SLURM_TASK_PROLOG     Same as --task-prolog
2820
2821       SLURM_TEST_EXEC       If defined, srun will verify existence of the ex‐
2822                             ecutable  program along with user execute permis‐
2823                             sion on the node where srun was called before at‐
2824                             tempting to launch it on nodes in the step.
2825
2826       SLURM_THREAD_SPEC     Same as --thread-spec
2827
2828       SLURM_THREADS         Same as -T, --threads
2829
2830       SLURM_THREADS_PER_CORE
2831                             Same as --threads-per-core
2832
2833       SLURM_TIMELIMIT       Same as -t, --time
2834
2835       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered
2836
2837       SLURM_USE_MIN_NODES   Same as --use-min-nodes
2838
2839       SLURM_WAIT            Same as -W, --wait
2840
2841       SLURM_WAIT4SWITCH     Max  time  waiting  for  requested  switches. See
2842                             --switches
2843
2844       SLURM_WCKEY           Same as --wckey
2845
2846       SLURM_WORKING_DIR     Same as -D, --chdir
2847
2848       SLURMD_DEBUG          Same as --slurmd-debug. Must be set to 0 or 1
2849                             to disable or enable the option.
2850
2851       SRUN_CONTAINER        Same as --container.
2852
2853       SRUN_EXPORT_ENV       Same  as  --export, and will override any setting
2854                             for SLURM_EXPORT_ENV.
2855
2856
2857

OUTPUT ENVIRONMENT VARIABLES

2859       srun will set some environment variables in the environment of the exe‐
2860       cuting  tasks on the remote compute nodes.  These environment variables
2861       are:
2862
2863
2864       SLURM_*_HET_GROUP_#   For a heterogeneous job allocation, the  environ‐
2865                             ment variables are set separately for each compo‐
2866                             nent.
2867
2868       SLURM_CLUSTER_NAME    Name of the cluster on which the job  is  execut‐
2869                             ing.
2870
2871       SLURM_CPU_BIND_LIST   --cpu-bind  map  or  mask list (list of Slurm CPU
2872                             IDs or masks for this node, CPU_ID =  Board_ID  x
2873                             threads_per_board       +       Socket_ID       x
2874                             threads_per_socket + Core_ID x threads_per_core +
2875                             Thread_ID).
2876
2877       SLURM_CPU_BIND_TYPE   --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2878
2879       SLURM_CPU_BIND_VERBOSE
2880                             --cpu-bind verbosity (quiet,verbose).
2881
2882       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency on
2883                             the srun command  as  a  numerical  frequency  in
2884                             kilohertz, or a coded value for a request of low,
2885                             medium, highm1 or high for the frequency.  See the
2886                             description  of  the  --cpu-freq  option  or  the
2887                             SLURM_CPU_FREQ_REQ input environment variable.
2888
2889       SLURM_CPUS_ON_NODE    Number of CPUs available  to  the  step  on  this
2890                             node.   NOTE:  The select/linear plugin allocates
2891                             entire nodes to jobs, so the value indicates  the
2892                             total  count  of  CPUs  on the node.  For the se‐
2893                             lect/cons_res and select/cons_tres plugins, this number
2894                             indicates  the  number of CPUs on this node allo‐
2895                             cated to the step.
2896
2897       SLURM_CPUS_PER_TASK   Number of cpus requested per task.  Only  set  if
2898                             the --cpus-per-task option is specified.
2899
2900       SLURM_DISTRIBUTION    Distribution type for the allocated jobs. Set the
2901                             distribution with -m, --distribution.
2902
2903       SLURM_GPUS_ON_NODE    Number of GPUs available  to  the  step  on  this
2904                             node.
2905
2906       SLURM_GTIDS           Global  task IDs running on this node.  Zero ori‐
2907                             gin and comma separated.  It is  read  internally
2908                             by pmi if Slurm was built with pmi support. Leav‐
2909                             ing the variable set may cause problems when  us‐
2910                             ing external packages from within the job (Abaqus
2911                             and Ansys have been known to have  problems  when
2912                             it is set - consult the appropriate documentation
2913                             for 3rd party software).
2914
2915       SLURM_HET_SIZE        Set to count of components in heterogeneous job.
2916
2917       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2918
2919       SLURM_JOB_CPUS_PER_NODE
2920                             Count of CPUs available to the job on  the  nodes
2921                             in    the    allocation,    using    the   format
2922                             CPU_count[(xnumber_of_nodes)][,CPU_count  [(xnum‐
2923                             ber_of_nodes)]      ...].       For      example:
2924                             SLURM_JOB_CPUS_PER_NODE='72(x2),36'     indicates
2925                             that  on the first and second nodes (as listed by
2926                             SLURM_JOB_NODELIST) the allocation has  72  CPUs,
2927                             while  the third node has 36 CPUs.  NOTE: The se‐
2928                             lect/linear  plugin  allocates  entire  nodes  to
2929                             jobs,  so  the value indicates the total count of
2930                             CPUs on allocated nodes. The select/cons_res  and
2931                             select/cons_tres plugins allocate individual CPUs
2932                             to jobs, so this number indicates the  number  of
2933                             CPUs allocated to the job.
2934
2935       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.
2936
2937       SLURM_JOB_ID          Job id of the executing job.
2938
2939       SLURM_JOB_NAME        Set  to the value of the --job-name option or the
2940                             command name when srun is used to  create  a  new
2941                             job allocation. Not set when srun is used only to
2942                             create a job step (i.e. within  an  existing  job
2943                             allocation).
2944
2945       SLURM_JOB_NODELIST    List of nodes allocated to the job.
2946
2947       SLURM_JOB_NODES       Total number of nodes in the job's resource allo‐
2948                             cation.
2949
2950       SLURM_JOB_PARTITION   Name of the partition in which the  job  is  run‐
2951                             ning.
2952
2953       SLURM_JOB_QOS         Quality Of Service (QOS) of the job allocation.
2954
2955       SLURM_JOB_RESERVATION Advanced  reservation  containing the job alloca‐
2956                             tion, if any.
2957
2958       SLURM_JOBID           Job id of the executing  job.  See  SLURM_JOB_ID.
2959                             Included for backwards compatibility.
2960
2961       SLURM_LAUNCH_NODE_IPADDR
2962                             IP address of the node from which the task launch
2963                             was initiated (where the srun command ran from).
2964
2965       SLURM_LOCALID         Node local task ID for the process within a job.
2966
2967       SLURM_MEM_BIND_LIST   --mem-bind map or mask  list  (<list  of  IDs  or
2968                             masks for this node>).
2969
2970       SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2971
2972       SLURM_MEM_BIND_SORT   Sort  free cache pages (run zonesort on Intel KNL
2973                             nodes).
2974
2975       SLURM_MEM_BIND_TYPE   --mem-bind type (none,rank,map_mem:,mask_mem:).
2976
2977       SLURM_MEM_BIND_VERBOSE
2978                             --mem-bind verbosity (quiet,verbose).
2979
2980       SLURM_NODE_ALIASES    Sets of  node  name,  communication  address  and
2981                             hostname  for nodes allocated to the job from the
2982                             cloud. Each element in the set is colon separated
2983                             and each set is comma separated. For example:
2984                             SLURM_NODE_ALIASES=
2985                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2986
2987       SLURM_NODEID          The relative node ID of the current node.
2988
2989       SLURM_NPROCS          Total  number  of processes in the current job or
2990                             job step. See SLURM_NTASKS.  Included  for  back‐
2991                             wards compatibility.
2992
2993       SLURM_NTASKS          Total  number  of processes in the current job or
2994                             job step.
2995
2996       SLURM_OVERCOMMIT      Set to 1 if --overcommit was specified.
2997
2998       SLURM_PRIO_PROCESS    The scheduling priority (nice value) at the  time
2999                             of  job  submission.  This value is propagated to
3000                             the spawned processes.
3001
3002       SLURM_PROCID          The MPI rank (or relative process ID) of the cur‐
3003                             rent process.
3004
3005       SLURM_SRUN_COMM_HOST  IP address of srun communication host.
3006
3007       SLURM_SRUN_COMM_PORT  srun communication port.
3008
3009       SLURM_CONTAINER       OCI  Bundle  for job.  Only set if --container is
3010                             specified.
3011
3012       SLURM_STEP_ID         The step ID of the current job.
3013
3014       SLURM_STEP_LAUNCHER_PORT
3015                             Step launcher port.
3016
3017       SLURM_STEP_NODELIST   List of nodes allocated to the step.
3018
3019       SLURM_STEP_NUM_NODES  Number of nodes allocated to the step.
3020
3021       SLURM_STEP_NUM_TASKS  Number of processes in the job step or whole het‐
3022                             erogeneous job step.
3023
3024       SLURM_STEP_TASKS_PER_NODE
3025                             Number of processes per node within the step.
3026
3027       SLURM_STEPID          The   step   ID   of   the   current   job.   See
3028                             SLURM_STEP_ID. Included for backwards compatibil‐
3029                             ity.
3030
3031       SLURM_SUBMIT_DIR      The  directory  from  which  the  allocation  was
3032                             invoked.
3033
3034       SLURM_SUBMIT_HOST     The hostname of the computer from which the allo‐
3035                             cation was invoked.
3036
3037       SLURM_TASK_PID        The process ID of the task being started.
3038
3039       SLURM_TASKS_PER_NODE  Number  of  tasks  to  be initiated on each node.
3040                             Values are comma separated and in the same  order
3041                             as  SLURM_JOB_NODELIST.   If two or more consecu‐
3042                             tive nodes are to have the same task count,  that
3043                             count is followed by "(x#)" where "#" is the rep‐
3044                             etition        count.        For         example,
3045                             "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
3046                             first three nodes will each execute two tasks and
3047                             the fourth node will execute one task.  (See  the
3048                             expansion sketch after this list.)
3049
3050       SLURM_TOPOLOGY_ADDR   This  is  set  only  if the system has the topol‐
3051                             ogy/tree plugin configured.  The  value  will  be
3052                             set to the names of the network switches which may
3053                             be involved in the job's communications  from  the
3054                             system's top level switch down to the leaf switch
3055                             and ending with node name. A period  is  used  to
3056                             separate each hardware component name.
3057
3058       SLURM_TOPOLOGY_ADDR_PATTERN
3059                             This  is  set  only  if the system has the topol‐
3060                             set to the component types listed in SLURM_TOPOL‐
3061                             set   component   types  listed  in  SLURM_TOPOL‐
3062                             OGY_ADDR.  Each component will be  identified  as
3063                             either  "switch"  or "node".  A period is used to
3064                             separate each hardware component type.
3065
3066       SLURM_UMASK           The umask in effect when the job was submitted.
3067
3068       SLURMD_NODENAME       Name of the node running the task. In the case of
3069                             a  parallel  job  executing  on  multiple compute
3070                             nodes, the various tasks will have this  environ‐
3071                             ment  variable  set  to  different values on each
3072                             compute node.
3073
3074       SRUN_DEBUG            Set to the logging level  of  the  srun  command.
3075                             Default  value  is  3 (info level).  The value is
3076                             incremented or decremented based upon the  --ver‐
3077                             bose and --quiet options.
3078
3079
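       As noted above for SLURM_TASKS_PER_NODE, consecutive nodes  with  the
       same task count are compressed using a "(x#)" repetition suffix.  The
       following is a minimal shell sketch, not part of Slurm itself and us‐
       ing an arbitrary script name, for expanding that  notation  into  one
       value per allocated node:

       $ cat expand_tasks_per_node.sh
       #!/bin/bash
       # Expand e.g. "2(x3),1" into "2 2 2 1" (one entry per node).
       expanded=""
       IFS=',' read -ra groups <<< "$SLURM_TASKS_PER_NODE"
       for group in "${groups[@]}"; do
           if [[ $group == *"(x"* ]]; then
               count=${group%%\(*}                 # task count, e.g. "2"
               reps=${group##*x}; reps=${reps%\)}  # repetition count, e.g. "3"
               for ((i = 0; i < reps; i++)); do
                   expanded+="$count "
               done
           else
               expanded+="$group "
           fi
       done
       echo $expanded

       Given SLURM_TASKS_PER_NODE=2(x3),1, the sketch prints "2 2 2 1".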

SIGNALS AND ESCAPE SEQUENCES

3081       Signals  sent  to  the  srun command are automatically forwarded to the
3082       tasks it is controlling with a  few  exceptions.  The  escape  sequence
3083       <control-c> will report the state of all tasks associated with the srun
3084       command. If <control-c> is entered twice within one  second,  then  the
3085       associated  SIGINT  signal  will be sent to all tasks and a termination
3086       sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL  to  all
3087       spawned  tasks.   If  a third <control-c> is received, the srun program
3088       will be terminated without waiting for remote tasks to  exit  or  their
3089       I/O to complete.
3090
3091       The escape sequence <control-z> is presently ignored.
3092
3093

MPI SUPPORT

3095       MPI  use depends upon the type of MPI being used.  There are three fun‐
3096       damentally different modes of operation used by these various  MPI  im‐
3097       plementations.
3098
3099       1.  Slurm  directly  launches  the tasks and performs initialization of
3100       communications through the PMI2 or PMIx APIs.  For example: "srun  -n16
3101       a.out".
3102
3103       2.  Slurm  creates  a  resource  allocation for the job and then mpirun
3104       launches tasks using Slurm's infrastructure (OpenMPI).
3105
3106       3. Slurm creates a resource allocation for  the  job  and  then  mpirun
3107       launches  tasks  using  some mechanism other than Slurm, such as SSH or
3108       RSH.  These tasks are initiated outside of Slurm's monitoring  or  con‐
3109       trol. Slurm's epilog should be configured to purge these tasks when the
3110       job's allocation is relinquished, or  the  use  of  pam_slurm_adopt  is
3111       highly recommended.
3112
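       As an illustration of the second mode, a job script might simply call
       mpirun inside an allocation created by salloc.  The  sketch  below  is
       only  an  example;  it assumes an Open MPI build with Slurm support,
       and "mpi_app" is a placeholder for the application binary:

       $ cat mpi_test.sh
       #!/bin/bash
       # mpirun detects the enclosing Slurm allocation and launches one task
       # per allocated slot using Slurm's own launch infrastructure.
       mpirun ./mpi_app

       $ salloc -N2 -n8 mpi_test.sh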
3113       See  https://slurm.schedmd.com/mpi_guide.html  for  more information on
3114       use of these various MPI implementations with Slurm.
3115
3116

MULTIPLE PROGRAM CONFIGURATION

3118       Comments in the configuration file must have a "#" in column one.   The
3119       configuration  file  contains  the  following fields separated by white
3120       space:
3121
3122       Task rank
3123              One or more task ranks to use this configuration.  Multiple val‐
3124              ues  may  be  comma separated.  Ranges may be indicated with two
3125              numbers separated with a '-' with the smaller number first (e.g.
3126              "0-4" and not "4-0").  To indicate all tasks not otherwise spec‐
3127              ified, specify a rank of '*' as the last line of the  file.   If
3128              an  attempt  is  made to initiate a task for which no executable
3129              program is defined, the following error message will be produced
3130              "No executable program specified for this task".
3131
3132       Executable
3133              The  name  of  the  program  to execute.  May be fully qualified
3134              pathname if desired.
3135
3136       Arguments
3137              Program arguments.  The expression "%t" will  be  replaced  with
3138              the  task's  number.   The expression "%o" will be replaced with
3139              the task's offset within this range (e.g. a configured task rank
3140              value  of  "1-5"  would  have  offset  values of "0-4").  Single
3141              quotes may be used to avoid having the  enclosed  values  inter‐
3142              preted.   This field is optional.  Any arguments for the program
3143              entered on the command line will be added to the arguments spec‐
3144              ified in the configuration file.
3145
3146       For example:
3147       $ cat silly.conf
3148       ###################################################################
3149       # srun multiple program configuration file
3150       #
3151       # srun -n8 -l --multi-prog silly.conf
3152       ###################################################################
3153       4-6       hostname
3154       1,7       echo  task:%t
3155       0,2-3     echo  offset:%o
3156
3157       $ srun -n8 -l --multi-prog silly.conf
3158       0: offset:0
3159       1: task:1
3160       2: offset:1
3161       3: offset:2
3162       4: linux15.llnl.gov
3163       5: linux16.llnl.gov
3164       6: linux17.llnl.gov
3165       7: task:7
3166
3167

EXAMPLES

3169       This  simple example demonstrates the execution of the command hostname
3170       in eight tasks. At least eight processors will be allocated to the  job
3171       (the same as the task count) on however many nodes are required to sat‐
3172       isfy the request. The output of each task will be  preceded  by  its
3173       task  number.   (The  machine "dev" in the example below has a total of
3174       two CPUs per node)
3175
3176       $ srun -n8 -l hostname
3177       0: dev0
3178       1: dev0
3179       2: dev1
3180       3: dev1
3181       4: dev2
3182       5: dev2
3183       6: dev3
3184       7: dev3
3185
3186
3187       The srun -r option is used within a job script to run two job steps  on
3188       disjoint  nodes in the following example. The script is run using allo‐
3189       cate mode instead of as a batch job in this case.
3190
3191       $ cat test.sh
3192       #!/bin/sh
3193       echo $SLURM_JOB_NODELIST
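       # -l prefixes each line of output with its task number.
       # The first step below runs on two nodes starting at relative node 2
       # of the allocation (-r2); the second starts at relative node 0 (the
       # default), so the two steps run on disjoint pairs of nodes.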
3194       srun -lN2 -r2 hostname
3195       srun -lN2 hostname
3196
3197       $ salloc -N4 test.sh
3198       dev[7-10]
3199       0: dev9
3200       1: dev10
3201       0: dev7
3202       1: dev8
3203
3204
3205       The following script runs two job steps in parallel within an allocated
3206       set of nodes.
3207
3208       $ cat test.sh
3209       #!/bin/bash
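       # Launch two job steps in the background on disjoint pairs of nodes:
       # the first on relative nodes 2-3 (-r 2), the second on relative
       # nodes 0-1 (-r 0).  squeue -s lists the running job steps.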
3210       srun -lN2 -n4 -r 2 sleep 60 &
3211       srun -lN2 -r 0 sleep 60 &
3212       sleep 1
3213       squeue
3214       squeue -s
3215       wait
3216
3217       $ salloc -N4 test.sh
3218         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
3219         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]
3220
3221       STEPID     PARTITION     USER      TIME NODELIST
3222       65641.0        batch   grondo      0:01 dev[7-8]
3223       65641.1        batch   grondo      0:01 dev[9-10]
3224
3225
3226       This  example  demonstrates  how one executes a simple MPI job.  We use
3227       srun to build a list of machines (nodes) to be used by  mpirun  in  its
3228       required  format.  A  sample command line and the script to be executed
3229       follow.
3230
3231       $ cat test.sh
3232       #!/bin/sh
3233       MACHINEFILE="nodes.$SLURM_JOB_ID"
3234
3235       # Generate Machinefile for mpi such that hosts are in the same
3236       #  order as if run via srun
3237       #
3238       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3239
3240       # Run using generated Machine file:
3241       mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3242
3243       rm $MACHINEFILE
3244
3245       $ salloc -N2 -n4 test.sh
3246
3247
3248       This simple example demonstrates the execution  of  different  jobs  on
3249       different  nodes  in  the same srun.  You can do this for any number of
3250       nodes or any number of jobs.  Which executable runs on each node is se‐
3251       lected by the SLURM_NODEID env var, which starts at 0 and runs up to one
3252       less than the number of nodes specified on the srun command line.
3253
3254       $ cat test.sh
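       # SLURM_NODEID is 0 on the first allocated node and 1 on the second.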
3255       case $SLURM_NODEID in
3256           0) echo "I am running on "
3257              hostname ;;
3258           1) hostname
3259              echo "is where I am running" ;;
3260       esac
3261
3262       $ srun -N2 test.sh
3263       dev0
3264       is where I am running
3265       I am running on
3266       dev1
3267
3268
3269       This example demonstrates use of multi-core options to  control  layout
3270       of  tasks.   We  request  that  four sockets per node and two cores per
3271       socket be dedicated to the job.
3272
3273       $ srun -N2 -B 4-4:2-2 a.out
3274
3275
3276       This example shows a script in which Slurm is used to provide  resource
3277       management  for  a job by executing the various job steps as processors
3278       become available for their dedicated use.
3279
3280       $ cat my.script
3281       #!/bin/bash
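       # Four job steps are launched in the background; each srun starts as
       # soon as enough processors in the allocation are free for its step.
       # wait returns after all of the steps have completed.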
3282       srun -n4 prog1 &
3283       srun -n3 prog2 &
3284       srun -n1 prog3 &
3285       srun -n1 prog4 &
3286       wait
3287
3288
3289       This example shows how to launch an application  called  "server"  with
3290       one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3291       cation called "client" with 16 tasks, 1 CPU per task (the default)  and
3292       1 GB of memory per task.
3293
3294       $ srun -n1 -c8 --mem-per-cpu=2gb server : -n16 --mem-per-cpu=1gb client
3295
3296

COPYING

3298       Copyright  (C)  2006-2007  The Regents of the University of California.
3299       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3300       Copyright (C) 2008-2010 Lawrence Livermore National Security.
3301       Copyright (C) 2010-2021 SchedMD LLC.
3302
3303       This file is part of Slurm, a resource  management  program.   For  de‐
3304       tails, see <https://slurm.schedmd.com/>.
3305
3306       Slurm  is free software; you can redistribute it and/or modify it under
3307       the terms of the GNU General Public License as published  by  the  Free
3308       Software  Foundation;  either version 2 of the License, or (at your op‐
3309       tion) any later version.
3310
3311       Slurm is distributed in the hope that it will be  useful,  but  WITHOUT
3312       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
3313       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
3314       for more details.
3315
3316

SEE ALSO

3318       salloc(1),  sattach(1),  sbatch(1), sbcast(1), scancel(1), scontrol(1),
3319       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3320
3321
3322
3323November 2021                   Slurm Commands                         srun(1)