1 srun(1)                        Slurm Commands                        srun(1)
2
3
4
5 NAME
6 srun - Run parallel jobs
7
8
9 SYNOPSIS
10 srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
11 executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
18 DESCRIPTION
19 Run a parallel job on a cluster managed by Slurm. If necessary, srun
20 will first create a resource allocation in which to run the parallel
21 job.
22
23 The following document describes the influence of various options on
24 the allocation of cpus to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
28 RETURN VALUE
29 srun will return the highest exit code of all tasks run or the highest
30 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
31 signal) of any task that exited with a signal.
32 The value 253 is reserved for out-of-memory errors.
33
34
35 EXECUTABLE PATH RESOLUTION
36 The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
44 Current working directory is the calling process working directory un‐
45 less the --chdir argument is passed, which will override the current
46 working directory.
47
48
49 OPTIONS
50 --accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
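
       For example, a minimal sketch (the GRES request and the executable
       name ./my_app are placeholders, not defined by this page):

              srun -n8 --gres=gpu:2 --accel-bind=g ./my_app

       Each task is bound to the allocated GPU(s) closest to its CPUs;
       adding the v option would also log the resulting bindings.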
65
66 -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command. This option ap‐
70 plies to job allocations.
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks. This option ap‐
108 plies to job allocations.
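
       For example, an illustrative sketch (the intervals and the
       executable name ./my_app are arbitrary placeholders):

              srun --acctg-freq=task=15,network=30 -n16 ./my_app

       This samples task accounting (including memory usage) every 15
       seconds and InfiniBand counters every 30 seconds, subject to the
       limits configured in slurm.conf.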
109
110 --bb=<spec>
111 Burst buffer specification. The form of the specification is
112 system dependent. Also see --bbf. This option applies to job
113 allocations. When the --bb option is used, Slurm parses this
114 option and creates a temporary burst buffer script file that is
115 used internally by the burst buffer plugins. See Slurm's burst
116 buffer guide for more information and examples:
117 https://slurm.schedmd.com/burst_buffer.html
118
119 --bbf=<file_name>
120 Path of file containing burst buffer specification. The form of
121 the specification is system dependent. Also see --bb. This op‐
122 tion applies to job allocations. See Slurm's burst buffer guide
123 for more information and examples:
124 https://slurm.schedmd.com/burst_buffer.html
125
126 --bcast[=<dest_path>]
127 Copy executable file to allocated compute nodes. If a file name
128 is specified, copy the executable to the specified destination
129 file path. If the path specified ends with '/' it is treated as
130 a target directory, and the destination file name will be
131 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
132 specified and the slurm.conf BcastParameters DestDir is config‐
133 ured then it is used, and the filename follows the above pat‐
134 tern. If none of the previous is specified, then --chdir is
135 used, and the filename follows the above pattern too. For exam‐
136 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
137 "a.out" from your current directory to the file "/tmp/mine" on
138 each of the three allocated compute nodes and execute that file.
139 This option applies to step allocations.
140
141 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142 Comma-separated list of absolute directory paths to be excluded
143 when autodetecting and broadcasting executable shared object de‐
144 pendencies through --bcast. If the keyword "NONE" is configured,
145 no directory paths will be excluded. The default value is that
146 of slurm.conf BcastExclude and this option overrides it. See
147 also --bcast and --send-libs.
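
       For example, an illustrative sketch (the node count and executable
       name ./my_app are placeholders):

              srun -N4 --bcast=/tmp/ --bcast-exclude=NONE ./my_app

       The executable is copied to /tmp/ on each node, and no library
       directories are excluded if shared object dependencies are also
       being broadcast (see --send-libs).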
148
149 -b, --begin=<time>
150 Defer initiation of this job until the specified time. It ac‐
151 cepts times of the form HH:MM:SS to run a job at a specific time
152 of day (seconds are optional). (If that time is already past,
153 the next day is assumed.) You may also specify midnight, noon,
154 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
155 suffixed with AM or PM for running in the morning or the
156 evening. You can also say what day the job will be run, by
157 specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
158 Combine date and time using the following format
159 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
160 count time-units, where the time-units can be seconds (default),
161 minutes, hours, days, or weeks and you can tell Slurm to run the
162 job today with the keyword today and to run the job tomorrow
163 with the keyword tomorrow. The value may be changed after job
164 submission using the scontrol command. For example:
165
166 --begin=16:00
167 --begin=now+1hour
168 --begin=now+60 (seconds by default)
169 --begin=2010-01-20T12:34:00
170
171
172 Notes on date/time specifications:
173 - Although the 'seconds' field of the HH:MM:SS time specifica‐
174 tion is allowed by the code, note that the poll time of the
175 Slurm scheduler is not precise enough to guarantee dispatch of
176 the job on the exact second. The job will be eligible to start
177 on the next poll following the specified time. The exact poll
178 interval depends on the Slurm scheduler (e.g., 60 seconds with
179 the default sched/builtin).
180 - If no time (HH:MM:SS) is specified, the default is
181 (00:00:00).
182 - If a date is specified without a year (e.g., MM/DD) then the
183 current year is assumed, unless the combination of MM/DD and
184 HH:MM:SS has already passed for that year, in which case the
185 next year is used.
186 This option applies to job allocations.
187
188 -D, --chdir=<path>
189 Have the remote processes do a chdir to path before beginning
190 execution. The default is to chdir to the current working direc‐
191 tory of the srun process. The path can be specified as full path
192 or relative path to the directory where the command is executed.
193 This option applies to job allocations.
194
195 --cluster-constraint=<list>
196 Specifies features that a federated cluster must have to have a
197 sibling job submitted to it. Slurm will attempt to submit a sib‐
198 ling job to a cluster if it has at least one of the specified
199 features.
200
201 -M, --clusters=<string>
202 Clusters to issue commands to. Multiple cluster names may be
203 comma separated. The job will be submitted to the one cluster
204 providing the earliest expected job initiation time. The default
205 value is the current cluster. A value of 'all' will query all
206 clusters. Note the --export option to control environ‐
207 ment variables exported between clusters. This option applies
208 only to job allocations. Note that the SlurmDBD must be up for
209 this option to work properly.
210
211 --comment=<string>
212 An arbitrary comment. This option applies to job allocations.
213
214 --compress[=type]
215 Compress file before sending it to compute hosts. The optional
216 argument specifies the data compression library to be used. The
217 default is BcastParameters Compression= if set or "lz4" other‐
218 wise. Supported values are "lz4". Some compression libraries
219 may be unavailable on some systems. For use with the --bcast
220 option. This option applies to step allocations.
221
222 -C, --constraint=<list>
223 Nodes can have features assigned to them by the Slurm adminis‐
224 trator. Users can specify which of these features are required
225 by their job using the constraint option. If you are looking for
226 'soft' constraints, please see --prefer for more information.
227 Only nodes having features matching the job constraints will be
228 used to satisfy the request. Multiple constraints may be speci‐
229 fied with AND, OR, matching OR, resource counts, etc. (some op‐
230 erators are not supported on all system types).
231
232 NOTE: If features that are part of the node_features/helpers
233 plugin are requested, then only the Single Name and AND options
234 are supported.
235
236 Supported --constraint options include:
237
238 Single Name
239 Only nodes which have the specified feature will be used.
240 For example, --constraint="intel"
241
242 Node Count
243 A request can specify the number of nodes needed with
244 some feature by appending an asterisk and count after the
245 feature name. For example, --nodes=16 --con‐
246 straint="graphics*4 ..." indicates that the job requires
247 16 nodes and that at least four of those nodes must have
248 the feature "graphics."
249
250 AND Only nodes with all of the specified features will be
251 used. The ampersand is used for an AND operator. For
252 example, --constraint="intel&gpu"
253
254 OR Only nodes with at least one of the specified features
255 will be used. The vertical bar is used for an OR opera‐
256 tor. For example, --constraint="intel|amd"
257
258 Matching OR
259 If only one of a set of possible options should be used
260 for all allocated nodes, then use the OR operator and en‐
261 close the options within square brackets. For example,
262 --constraint="[rack1|rack2|rack3|rack4]" might be used to
263 specify that all nodes must be allocated on a single rack
264 of the cluster, but any of those four racks can be used.
265
266 Multiple Counts
267 Specific counts of multiple resources may be specified by
268 using the AND operator and enclosing the options within
269 square brackets. For example, --con‐
270 straint="[rack1*2&rack2*4]" might be used to specify that
271 two nodes must be allocated from nodes with the feature
272 of "rack1" and four nodes must be allocated from nodes
273 with the feature "rack2".
274
275 NOTE: This construct does not support multiple Intel KNL
276 NUMA or MCDRAM modes. For example, while --con‐
277 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
278 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
279 Specification of multiple KNL modes requires the use of a
280 heterogeneous job.
281
282 NOTE: Multiple Counts can cause jobs to be allocated with
283 a non-optimal network layout.
284
285 Brackets
286 Brackets can be used to indicate that you are looking for
287 a set of nodes with the different requirements contained
288 within the brackets. For example, --con‐
289 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
290 node with either the "rack1" or "rack2" features and two
291 nodes with the "rack3" feature. The same request without
292 the brackets will try to find a single node that meets
293 those requirements.
294
295 NOTE: Brackets are only reserved for Multiple Counts and
296 Matching OR syntax. AND operators require a count for
297 each feature inside square brackets (i.e.
298 "[quad*2&hemi*1]"). Slurm will only allow a single set of
299 bracketed constraints per job.
300
301 Parentheses
302 Parentheses can be used to group like node features to‐
303 gether. For example, --con‐
304 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
305 specify that four nodes with the features "knl", "snc4"
306 and "flat" plus one node with the feature "haswell" are
307 required. All options within parentheses should be
308 grouped with AND (e.g. "&") operators.
309
310 WARNING: When srun is executed from within salloc or sbatch, the
311 constraint value can only contain a single feature name. None of
312 the other operators are currently supported for job steps.
313 This option applies to job and step allocations.
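
       For example, an illustrative sketch of the Multiple Counts form (the
       feature names rack1 and rack2 and the executable ./my_app are
       placeholders):

              srun -N6 --constraint="[rack1*2&rack2*4]" ./my_app

       This requests six nodes in total, two with the rack1 feature and
       four with the rack2 feature. As noted in the warning above, this
       form is only usable when srun itself creates the job allocation.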
314
315 --container=<path_to_container>
316 Absolute path to OCI container bundle.
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326 -S, --core-spec=<num>
327 Count of specialized cores per node reserved by the job for sys‐
328 tem operations and not used by the application. The application
329 will not use these cores, but will be charged for their alloca‐
330 tion. Default value is dependent upon the node's configured
331 CoreSpecCount value. If a value of zero is designated and the
332 Slurm configuration option AllowSpecResourcesUsage is enabled,
333 the job will be allowed to override CoreSpecCount and use the
334 specialized resources on nodes it is allocated. This option can
335 not be used with the --thread-spec option. This option applies
336 to job allocations.
337
338 NOTE: This option may implicitly impact the number of tasks if
339 -n was not specified.
340
341 NOTE: Explicitly setting a job's specialized core value implic‐
342 itly sets its --exclusive option, reserving entire nodes for the
343 job.
344
345 --cores-per-socket=<cores>
346 Restrict node selection to nodes with at least the specified
347 number of cores per socket. See additional information under -B
348 option above when task/affinity plugin is enabled. This option
349 applies to job allocations.
350
351 --cpu-bind=[{quiet|verbose},]<type>
352 Bind tasks to CPUs. Used only when the task/affinity plugin is
353 enabled. NOTE: To have Slurm always report on the selected CPU
354 binding for all commands executed in a shell, you can enable
355 verbose mode by setting the SLURM_CPU_BIND environment variable
356 value to "verbose".
357
358 The following informational environment variables are set when
359 --cpu-bind is in use:
360
361 SLURM_CPU_BIND_VERBOSE
362 SLURM_CPU_BIND_TYPE
363 SLURM_CPU_BIND_LIST
364
365 See the ENVIRONMENT VARIABLES section for a more detailed de‐
366 scription of the individual SLURM_CPU_BIND variables. These
367 variables are available only if the task/affinity plugin is con‐
368 figured.
369
370 When using --cpus-per-task to run multithreaded tasks, be aware
371 that CPU binding is inherited from the parent of the process.
372 This means that the multithreaded task should either specify or
373 clear the CPU binding itself to avoid having all threads of the
374 multithreaded task use the same mask/CPU as the parent. Alter‐
375 natively, fat masks (masks which specify more than one allowed
376 CPU) could be used for the tasks in order to provide multiple
377 CPUs for the multithreaded tasks.
378
379 Note that a job step can be allocated different numbers of CPUs
380 on each node or be allocated CPUs not starting at location zero.
381 Therefore one of the options which automatically generate the
382 task binding is recommended. Explicitly specified masks or
383 bindings are only honored when the job step has been allocated
384 every available CPU on the node.
385
386 Binding a task to a NUMA locality domain means to bind the task
387 to the set of CPUs that belong to the NUMA locality domain or
388 "NUMA node". If NUMA locality domain options are used on sys‐
389 tems with no NUMA support, then each socket is considered a lo‐
390 cality domain.
391
392 If the --cpu-bind option is not used, the default binding mode
393 will depend upon Slurm's configuration and the step's resource
394 allocation. If all allocated nodes have the same configured
395 CpuBind mode, that will be used. Otherwise if the job's Parti‐
396 tion has a configured CpuBind mode, that will be used. Other‐
397 wise if Slurm has a configured TaskPluginParam value, that mode
398 will be used. Otherwise automatic binding will be performed as
399 described below.
400
401 Auto Binding
402 Applies only when task/affinity is enabled. If the job
403 step allocation includes an allocation with a number of
404 sockets, cores, or threads equal to the number of tasks
405 times cpus-per-task, then the tasks will by default be
406 bound to the appropriate resources (auto binding). Dis‐
407 able this mode of operation by explicitly setting
408 "--cpu-bind=none". Use TaskPluginParam=auto‐
409 bind=[threads|cores|sockets] to set a default cpu binding
410 in case "auto binding" doesn't find a match.
411
412 Supported options include:
413
414 q[uiet]
415 Quietly bind before task runs (default)
416
417 v[erbose]
418 Verbosely report binding before task runs
419
420 no[ne] Do not bind tasks to CPUs (default unless auto
421 binding is applied)
422
423 rank Automatically bind by task rank. The lowest num‐
424 bered task on each node is bound to socket (or
425 core or thread) zero, etc. Not supported unless
426 the entire node is allocated to the job.
427
428 map_cpu:<list>
429 Bind by setting CPU masks on tasks (or ranks) as
430 specified where <list> is
431 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... If
432 the number of tasks (or ranks) exceeds the number
433 of elements in this list, elements in the list
434 will be reused as needed starting from the begin‐
435 ning of the list. To simplify support for large
436 task counts, the lists may follow a map with an
437 asterisk and repetition count. For example
438 "map_cpu:0*4,3*4".
439
440 mask_cpu:<list>
441 Bind by setting CPU masks on tasks (or ranks) as
442 specified where <list> is
443 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
444 The mapping is specified for a node and identical
445 mapping is applied to the tasks on every node
446 (i.e. the lowest task ID on each node is mapped to
447 the first mask specified in the list, etc.). CPU
448 masks are always interpreted as hexadecimal values
449 but can be preceded with an optional '0x'. If the
450 number of tasks (or ranks) exceeds the number of
451 elements in this list, elements in the list will
452 be reused as needed starting from the beginning of
453 the list. To simplify support for large task
454 counts, the lists may follow a map with an aster‐
455 isk and repetition count. For example
456 "mask_cpu:0x0f*4,0xf0*4".
457
458 rank_ldom
459 Bind to a NUMA locality domain by rank. Not sup‐
460 ported unless the entire node is allocated to the
461 job.
462
463 map_ldom:<list>
464 Bind by mapping NUMA locality domain IDs to tasks
465 as specified where <list> is
466 <ldom1>,<ldom2>,...<ldomN>. The locality domain
467 IDs are interpreted as decimal values unless they
468 are preceded with '0x' in which case they are in‐
469 terpreted as hexadecimal values. Not supported
470 unless the entire node is allocated to the job.
471
472 mask_ldom:<list>
473 Bind by setting NUMA locality domain masks on
474 tasks as specified where <list> is
475 <mask1>,<mask2>,...<maskN>. NUMA locality domain
476 masks are always interpreted as hexadecimal values
477 but can be preceded with an optional '0x'. Not
478 supported unless the entire node is allocated to
479 the job.
480
481 sockets
482 Automatically generate masks binding tasks to
483 sockets. Only the CPUs on the socket which have
484 been allocated to the job will be used. If the
485 number of tasks differs from the number of allo‐
486 cated sockets this can result in sub-optimal bind‐
487 ing.
488
489 cores Automatically generate masks binding tasks to
490 cores. If the number of tasks differs from the
491 number of allocated cores this can result in
492 sub-optimal binding.
493
494 threads
495 Automatically generate masks binding tasks to
496 threads. If the number of tasks differs from the
497 number of allocated threads this can result in
498 sub-optimal binding.
499
500 ldoms Automatically generate masks binding tasks to NUMA
501 locality domains. If the number of tasks differs
502 from the number of allocated locality domains this
503 can result in sub-optimal binding.
504
505 help Show help message for cpu-bind
506
507 This option applies to job and step allocations.
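
       For example, an illustrative sketch (the executable ./my_app is a
       placeholder and the explicit CPU IDs assume the step was allocated
       every CPU on the node):

              srun -n8 --cpu-bind=verbose,cores ./my_app
              srun -n4 --cpu-bind=verbose,map_cpu:0,2,4,6 ./my_app

       The first line lets Slurm generate per-core masks and report them;
       the second explicitly pins tasks 0-3 to CPUs 0, 2, 4 and 6.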
508
509 --cpu-freq=<p1>[-p2[:p3]]
510
511 Request that the job step initiated by this srun command be run
512 at some requested frequency if possible, on the CPUs selected
513 for the step on the compute node(s).
514
515 p1 can be [#### | low | medium | high | highm1] which will set
516 the frequency scaling_speed to the corresponding value, and set
517 the frequency scaling_governor to UserSpace. See below for defi‐
518 nition of the values.
519
520 p1 can be [Conservative | OnDemand | Performance | PowerSave]
521 which will set the scaling_governor to the corresponding value.
522 The governor has to be in the list set by the slurm.conf option
523 CpuFreqGovernors.
524
525 When p2 is present, p1 will be the minimum scaling frequency and
526 p2 will be the maximum scaling frequency.
527
528 p2 can be [#### | medium | high | highm1]. p2 must be greater
529 than p1.
530
531 p3 can be [Conservative | OnDemand | Performance | PowerSave |
532 SchedUtil | UserSpace] which will set the governor to the corre‐
533 sponding value.
534
535 If p3 is UserSpace, the frequency scaling_speed will be set by a
536 power or energy aware scheduling strategy to a value between p1
537 and p2 that lets the job run within the site's power goal. The
538 job may be delayed if p1 is higher than a frequency that allows
539 the job to run within the goal.
540
541 If the current frequency is < min, it will be set to min. Like‐
542 wise, if the current frequency is > max, it will be set to max.
543
544 Acceptable values at present include:
545
546 #### frequency in kilohertz
547
548 Low the lowest available frequency
549
550 High the highest available frequency
551
552 HighM1 (high minus one) will select the next highest
553 available frequency
554
555 Medium attempts to set a frequency in the middle of the
556 available range
557
558 Conservative attempts to use the Conservative CPU governor
559
560 OnDemand attempts to use the OnDemand CPU governor (the de‐
561 fault value)
562
563 Performance attempts to use the Performance CPU governor
564
565 PowerSave attempts to use the PowerSave CPU governor
566
567 UserSpace attempts to use the UserSpace CPU governor
568
569 The following informational environment variable is set in the
570 job step when the --cpu-freq option is requested:
571
572 SLURM_CPU_FREQ_REQ
573
574 This environment variable can also be used to supply the value
575 for the CPU frequency request if it is set when the 'srun' com‐
576 mand is issued. The --cpu-freq on the command line will over‐
577 ride the environment variable value. The form of the environ‐
578 ment variable is the same as the command line. See the ENVIRON‐
579 MENT VARIABLES section for a description of the
580 SLURM_CPU_FREQ_REQ variable.
581
582 NOTE: This parameter is treated as a request, not a requirement.
583 If the job step's node does not support setting the CPU fre‐
584 quency, or the requested value is outside the bounds of the le‐
585 gal frequencies, an error is logged, but the job step is allowed
586 to continue.
587
588 NOTE: Setting the frequency for just the CPUs of the job step
589 implies that the tasks are confined to those CPUs. If task con‐
590 finement (i.e. the task/affinity TaskPlugin is enabled, or the
591 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
592 in cgroup.conf) is not configured, this parameter is ignored.
593
594 NOTE: When the step completes, the frequency and governor of
595 each selected CPU is reset to the previous values.
596
597 NOTE: Submitting jobs with the --cpu-freq option when linuxproc
598 is the ProctrackType can cause jobs to run too quickly, complet‐
599 ing before accounting is able to poll for job information. As a
600 result, not all of the accounting information will be present.
601
602 This option applies to job and step allocations.
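
       For example, an illustrative sketch (frequencies are in kilohertz;
       the values, governor availability and the executable ./my_app are
       placeholders):

              srun -n4 --cpu-freq=Performance ./my_app
              srun -n4 --cpu-freq=2000000-2600000:OnDemand ./my_app

       The first requests the Performance governor; the second requests a
       2.0-2.6 GHz range managed by the OnDemand governor, provided that
       governor is listed in CpuFreqGovernors.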
603
604 --cpus-per-gpu=<ncpus>
605 Advise Slurm that ensuing job steps will require ncpus proces‐
606 sors per allocated GPU. Not compatible with the --cpus-per-task
607 option.
608
609 -c, --cpus-per-task=<ncpus>
610 Request that ncpus be allocated per process. This may be useful
611 if the job is multithreaded and requires more than one CPU per
612 task for optimal performance. Explicitly requesting this option
613 implies --exact. The default is one CPU per process and does not
614 imply --exact. If -c is specified without -n, as many tasks
615 will be allocated per node as possible while satisfying the -c
616 restriction. For instance on a cluster with 8 CPUs per node, a
617 job request for 4 nodes and 3 CPUs per task may be allocated 3
618 or 6 CPUs per node (1 or 2 tasks per node) depending upon re‐
619 source consumption by other jobs. Such a job may be unable to
620 execute more than a total of 4 tasks.
621
622 WARNING: There are configurations and options interpreted dif‐
623 ferently by job and job step requests which can result in incon‐
624 sistencies for this option. For example srun -c2
625 --threads-per-core=1 prog may allocate two cores for the job,
626 but if each of those cores contains two threads, the job alloca‐
627 tion will include four CPUs. The job step allocation will then
628 launch two threads per CPU for a total of two tasks.
629
630 WARNING: When srun is executed from within salloc or sbatch,
631 there are configurations and options which can result in incon‐
632 sistent allocations when -c has a value greater than -c on sal‐
633 loc or sbatch. The number of cpus per task specified for salloc
634 or sbatch is not automatically inherited by srun and, if de‐
635 sired, must be requested again, either by specifying
636 --cpus-per-task when calling srun, or by setting the
637 SRUN_CPUS_PER_TASK environment variable.
638
639 This option applies to job and step allocations.
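
       For example, an illustrative sketch inside an sbatch script that was
       submitted with --cpus-per-task=8 (the executable ./my_threaded_app
       is a placeholder):

              export SRUN_CPUS_PER_TASK=8
              srun -n4 ./my_threaded_app

       Setting SRUN_CPUS_PER_TASK (or passing -c8 directly to srun) is
       needed because the batch job's --cpus-per-task value is not
       automatically inherited by the step.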
640
641 --deadline=<OPT>
642 remove the job if no ending is possible before this deadline
643 (start > (deadline - time[-min])). Default is no deadline.
644 Valid time formats are:
645 HH:MM[:SS] [AM|PM]
646 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
647 MM/DD[/YY]-HH:MM[:SS]
648 YYYY-MM-DD[THH:MM[:SS]]
649 now[+count[seconds(default)|minutes|hours|days|weeks]]
650
651 This option applies only to job allocations.
652
653 --delay-boot=<minutes>
654 Do not reboot nodes in order to satisfy this job's feature
655 specification if the job has been eligible to run for less than
656 this time period. If the job has waited for less than the spec‐
657 ified period, it will use only nodes which already have the
658 specified features. The argument is in units of minutes. A de‐
659 fault value may be set by a system administrator using the de‐
660 lay_boot option of the SchedulerParameters configuration parame‐
661 ter in the slurm.conf file, otherwise the default value is zero
662 (no delay).
663
664 This option applies only to job allocations.
665
666 -d, --dependency=<dependency_list>
667 Defer the start of this job until the specified dependencies
668 have been satisfied. This option does not apply to job
669 steps (executions of srun within an existing salloc or sbatch
670 allocation), only to job allocations. <dependency_list> is of
671 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
672 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
673 must be satisfied if the "," separator is used. Any dependency
674 may be satisfied if the "?" separator is used. Only one separa‐
675 tor may be used. For instance:
676 -d afterok:20:21,afterany:23
677 means that the job can run only after a 0 return code of jobs 20
678 and 21 AND the completion of job 23. However:
679 -d afterok:20:21?afterany:23
680 means that any of the conditions (afterok:20 OR afterok:21 OR
681 afterany:23) will be enough to release the job. Many jobs can
682 share the same dependency and these jobs may even belong to dif‐
683 ferent users. The value may be changed after job submission
684 using the scontrol command. Dependencies on remote jobs are al‐
685 lowed in a federation. Once a job dependency fails due to the
686 termination state of a preceding job, the dependent job will
687 never be run, even if the preceding job is requeued and has a
688 different termination state in a subsequent execution. This op‐
689 tion applies to job allocations.
690
691 after:job_id[[+time][:jobid[+time]...]]
692 After the specified jobs start or are cancelled and
693 'time' in minutes from job start or cancellation happens,
694 this job can begin execution. If no 'time' is given then
695 there is no delay after start or cancellation.
696
697 afterany:job_id[:jobid...]
698 This job can begin execution after the specified jobs
699 have terminated. This is the default dependency type.
700
701 afterburstbuffer:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have terminated and any associated burst buffer stage out
704 operations have completed.
705
706 aftercorr:job_id[:jobid...]
707 A task of this job array can begin execution after the
708 corresponding task ID in the specified job has completed
709 successfully (ran to completion with an exit code of
710 zero).
711
712 afternotok:job_id[:jobid...]
713 This job can begin execution after the specified jobs
714 have terminated in some failed state (non-zero exit code,
715 node failure, timed out, etc).
716
717 afterok:job_id[:jobid...]
718 This job can begin execution after the specified jobs
719 have successfully executed (ran to completion with an
720 exit code of zero).
721
722 singleton
723 This job can begin execution after any previously
724 launched jobs sharing the same job name and user have
725 terminated. In other words, only one job by that name
726 and owned by that user can be running or suspended at any
727 point in time. In a federation, a singleton dependency
728 must be fulfilled on all clusters unless DependencyParam‐
729 eters=disable_remote_singleton is used in slurm.conf.
730
731 -X, --disable-status
732 Disable the display of task status when srun receives a single
733 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
734 running job. Without this option a second Ctrl-C in one second
735 is required to forcibly terminate the job and srun will immedi‐
736 ately exit. May also be set via the environment variable
737 SLURM_DISABLE_STATUS. This option applies to job allocations.
738
739 -m, --distribution={*|block|cyclic|arbi‐
740 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
741
742 Specify alternate distribution methods for remote processes.
743 For job allocation, this sets environment variables that will be
744 used by subsequent srun requests. Task distribution affects job
745 allocation at the last stage of the evaluation of available re‐
746 sources by the cons_res and cons_tres plugins. Consequently,
747 other options (e.g. --ntasks-per-node, --cpus-per-task) may af‐
748 fect resource selection prior to task distribution. To ensure a
749 specific task distribution jobs should have access to whole
750 nodes, for instance by using the --exclusive flag.
751
752 This option controls the distribution of tasks to the nodes on
753 which resources have been allocated, and the distribution of
754 those resources to tasks for binding (task affinity). The first
755 distribution method (before the first ":") controls the distri‐
756 bution of tasks to nodes. The second distribution method (after
757 the first ":") controls the distribution of allocated CPUs
758 across sockets for binding to tasks. The third distribution
759 method (after the second ":") controls the distribution of allo‐
760 cated CPUs across cores for binding to tasks. The second and
761 third distributions apply only if task affinity is enabled. The
762 third distribution is supported only if the task/cgroup plugin
763 is configured. The default value for each distribution type is
764 specified by *.
765
766 Note that with select/cons_res and select/cons_tres, the number
767 of CPUs allocated to each socket and node may be different. Re‐
768 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
769 mation on resource allocation, distribution of tasks to nodes,
770 and binding of tasks to CPUs.
771 First distribution method (distribution of tasks across nodes):
772
773
774 * Use the default method for distributing tasks to nodes
775 (block).
776
777 block The block distribution method will distribute tasks to a
778 node such that consecutive tasks share a node. For exam‐
779 ple, consider an allocation of three nodes each with two
780 cpus. A four-task block distribution request will dis‐
781 tribute those tasks to the nodes with tasks one and two
782 on the first node, task three on the second node, and
783 task four on the third node. Block distribution is the
784 default behavior if the number of tasks exceeds the num‐
785 ber of allocated nodes.
786
787 cyclic The cyclic distribution method will distribute tasks to a
788 node such that consecutive tasks are distributed over
789 consecutive nodes (in a round-robin fashion). For exam‐
790 ple, consider an allocation of three nodes each with two
791 cpus. A four-task cyclic distribution request will dis‐
792 tribute those tasks to the nodes with tasks one and four
793 on the first node, task two on the second node, and task
794 three on the third node. Note that when SelectType is
795 select/cons_res, the same number of CPUs may not be allo‐
796 cated on each node. Task distribution will be round-robin
797 among all the nodes with CPUs yet to be assigned to
798 tasks. Cyclic distribution is the default behavior if
799 the number of tasks is no larger than the number of allo‐
800 cated nodes.
801
802 plane The tasks are distributed in blocks of size <size>. The
803 size must be given or SLURM_DIST_PLANESIZE must be set.
804 The number of tasks distributed to each node is the same
805 as for cyclic distribution, but the taskids assigned to
806 each node depend on the plane size. Additional distribu‐
807 tion specifications cannot be combined with this option.
808 For more details (including examples and diagrams),
809 please see https://slurm.schedmd.com/mc_support.html and
810 https://slurm.schedmd.com/dist_plane.html
811
812 arbitrary
813 The arbitrary method of distribution will allocate pro‐
814 cesses in-order as listed in file designated by the envi‐
815 ronment variable SLURM_HOSTFILE. If this variable is
816 listed it will override any other method specified. If not
817 set, the method will default to block. The hostfile must
818 contain at least the number of hosts requested, one per
819 line or comma separated. If spec‐
820 ifying a task count (-n, --ntasks=<number>), your tasks
821 will be laid out on the nodes in the order of the file.
822 NOTE: The arbitrary distribution option on a job alloca‐
823 tion only controls the nodes to be allocated to the job
824 and not the allocation of CPUs on those nodes. This op‐
825 tion is meant primarily to control a job step's task lay‐
826 out in an existing job allocation for the srun command.
827 NOTE: If the number of tasks is given and a list of re‐
828 quested nodes is also given, the number of nodes used
829 from that list will be reduced to match that of the num‐
830 ber of tasks if the number of nodes in the list is
831 greater than the number of tasks.
832
833 Second distribution method (distribution of CPUs across sockets
834 for binding):
835
836
837 * Use the default method for distributing CPUs across sock‐
838 ets (cyclic).
839
840 block The block distribution method will distribute allocated
841 CPUs consecutively from the same socket for binding to
842 tasks, before using the next consecutive socket.
843
844 cyclic The cyclic distribution method will distribute allocated
845 CPUs for binding to a given task consecutively from the
846 same socket, and from the next consecutive socket for the
847 next task, in a round-robin fashion across sockets.
848 Tasks requiring more than one CPU will have all of those
849 CPUs allocated on a single socket if possible.
850
851 fcyclic
852 The fcyclic distribution method will distribute allocated
853 CPUs for binding to tasks from consecutive sockets in a
854 round-robin fashion across the sockets. Tasks requiring
855 more than one CPU will have each CPU allocated in a
856 cyclic fashion across sockets.
857
858 Third distribution method (distribution of CPUs across cores for
859 binding):
860
861
862 * Use the default method for distributing CPUs across cores
863 (inherited from second distribution method).
864
865 block The block distribution method will distribute allocated
866 CPUs consecutively from the same core for binding to
867 tasks, before using the next consecutive core.
868
869 cyclic The cyclic distribution method will distribute allocated
870 CPUs for binding to a given task consecutively from the
871 same core, and from the next consecutive core for the
872 next task, in a round-robin fashion across cores.
873
874 fcyclic
875 The fcyclic distribution method will distribute allocated
876 CPUs for binding to tasks from consecutive cores in a
877 round-robin fashion across the cores.
878
879 Optional control for task distribution over nodes:
880
881
882 Pack Rather than distributing a job step's tasks evenly
883 across its allocated nodes, pack them as tightly as pos‐
884 sible on the nodes. This only applies when the "block"
885 task distribution method is used.
886
887 NoPack Rather than packing a job step's tasks as tightly as pos‐
888 sible on the nodes, distribute them evenly. This user
889 option will supersede the SelectTypeParameters
890 CR_Pack_Nodes configuration parameter.
891
892 This option applies to job and step allocations.
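
       For example, an illustrative sketch (the node and task counts and
       the executable ./my_app are placeholders):

              srun -N2 -n8 --distribution=cyclic:block ./my_app

       Tasks are assigned to the two nodes in round-robin order, and the
       CPUs bound to each task are taken consecutively from one socket
       before the next socket is used.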
893
894 --epilog={none|<executable>}
895 srun will run executable just after the job step completes. The
896 command line arguments for executable will be the command and
897 arguments of the job step. If none is specified, then no srun
898 epilog will be run. This parameter overrides the SrunEpilog pa‐
899 rameter in slurm.conf. This parameter is completely independent
900 from the Epilog parameter in slurm.conf. This option applies to
901 job allocations.
902
903 -e, --error=<filename_pattern>
904 Specify how stderr is to be redirected. By default in interac‐
905 tive mode, srun redirects stderr to the same file as stdout, if
906 one is specified. The --error option is provided to allow stdout
907 and stderr to be redirected to different locations. See IO Re‐
908 direction below for more options. If the specified file already
909 exists, it will be overwritten. This option applies to job and
910 step allocations.
911
912 --exact
913 Allow a step access to only the resources requested for the
914 step. By default, all non-GRES resources on each node in the
915 step allocation will be used. This option only applies to step
916 allocations.
917 NOTE: Parallel steps will either be blocked or rejected until
918 requested step resources are available unless --overlap is spec‐
919 ified. Job resources can be held after the completion of an srun
920 command while Slurm does job cleanup. Step epilogs and/or SPANK
921 plugins can further delay the release of step resources.
922
923 -x, --exclude={<host1[,<host2>...]|<filename>}
924 Request that a specific list of hosts not be included in the re‐
925 sources allocated to this job. The host list will be assumed to
926 be a filename if it contains a "/" character. This option ap‐
927 plies to job and step allocations.
928
929 --exclusive[={user|mcs}]
930 This option applies to job and job step allocations, and has two
931 slightly different meanings for each one. When used to initiate
932 a job, the job allocation cannot share nodes with other running
933 jobs (or just other users with the "=user" option or "=mcs" op‐
934 tion). If user/mcs are not specified (i.e. the job allocation
935 can not share nodes with other running jobs), the job is allo‐
936 cated all CPUs and GRES on all nodes in the allocation, but is
937 only allocated as much memory as it requested. This is by design
938 to support gang scheduling, because suspended jobs still reside
939 in memory. To request all the memory on a node, use --mem=0.
940 The default shared/exclusive behavior depends on system configu‐
941 ration and the partition's OverSubscribe option takes precedence
942 over the job's option. NOTE: Since shared GRES (MPS) cannot be
943 allocated at the same time as a sharing GRES (GPU) this option
944 only allocates all sharing GRES and no underlying shared GRES.
945
946 This option can also be used when initiating more than one job
947 step within an existing resource allocation (default), where you
948 want separate processors to be dedicated to each job step. If
949 sufficient processors are not available to initiate the job
950 step, it will be deferred. This can be thought of as providing a
951 mechanism for resource management to the job within its alloca‐
952 tion (--exact implied).
953
954 The exclusive allocation of CPUs applies to job steps by de‐
955 fault, but --exact is NOT the default. In other words, the de‐
956 fault behavior is this: job steps will not share CPUs, but job
957 steps will be allocated all CPUs available to the job on all
958 nodes allocated to the steps.
959
960 In order to share the resources use the --overlap option.
961
962 See EXAMPLE below.
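
       For example, an illustrative sketch of running two steps side by
       side inside an existing 8-CPU allocation (./step_a and ./step_b are
       placeholders):

              srun -n4 --exact ./step_a &
              srun -n4 --exact ./step_b &
              wait

       With --exact each step is limited to the CPUs it requested, so both
       steps can run concurrently instead of the second being deferred.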
963
964 --export={[ALL,]<environment_variables>|ALL|NONE}
965 Identify which environment variables from the submission envi‐
966 ronment are propagated to the launched application.
967
968 --export=ALL
969 Default mode if --export is not specified. All of the
970 user's environment will be loaded from the caller's
971 environment.
972
973 --export=NONE
974 None of the user environment will be defined. User
975 must use absolute path to the binary to be executed
976 that will define the environment. User can not specify
977 explicit environment variables with "NONE".
978
979 This option is particularly important for jobs that
980 are submitted on one cluster and execute on a differ‐
981 ent cluster (e.g. with different paths). To avoid
982 steps inheriting environment export settings (e.g.
983 "NONE") from sbatch command, either set --export=ALL
984 or the environment variable SLURM_EXPORT_ENV should be
985 set to "ALL".
986
987 --export=[ALL,]<environment_variables>
988 Exports all SLURM* environment variables along with
989 explicitly defined variables. Multiple environment
990 variable names should be comma separated. Environment
991 variable names may be specified to propagate the cur‐
992 rent value (e.g. "--export=EDITOR") or specific values
993 may be exported (e.g. "--export=EDITOR=/bin/emacs").
994 If "ALL" is specified, then all user environment vari‐
995 ables will be loaded and will take precedence over any
996 explicitly given environment variables.
997
998 Example: --export=EDITOR,ARG1=test
999 In this example, the propagated environment will only
1000 contain the variable EDITOR from the user's environ‐
1001 ment, SLURM_* environment variables, and ARG1=test.
1002
1003 Example: --export=ALL,EDITOR=/bin/emacs
1004 There are two possible outcomes for this example. If
1005 the caller has the EDITOR environment variable de‐
1006 fined, then the job's environment will inherit the
1007 variable from the caller's environment. If the caller
1008 doesn't have an environment variable defined for EDI‐
1009 TOR, then the job's environment will use the value
1010 given by --export.
1011
1012 -B, --extra-node-info=<sockets>[:cores[:threads]]
1013 Restrict node selection to nodes with at least the specified
1014 number of sockets, cores per socket and/or threads per core.
1015 NOTE: These options do not specify the resource allocation size.
1016 Each value specified is considered a minimum. An asterisk (*)
1017 can be used as a placeholder indicating that all available re‐
1018 sources of that type are to be utilized. Values can also be
1019 specified as min-max. The individual levels can also be speci‐
1020 fied in separate options if desired:
1021
1022 --sockets-per-node=<sockets>
1023 --cores-per-socket=<cores>
1024 --threads-per-core=<threads>
1025 If task/affinity plugin is enabled, then specifying an alloca‐
1026 tion in this manner also sets a default --cpu-bind option of
1027 threads if the -B option specifies a thread count, otherwise an
1028 option of cores if a core count is specified, otherwise an op‐
1029 tion of sockets. If SelectType is configured to se‐
1030 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1031 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1032 ored. If not specified, the scontrol show job will display
1033 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1034 NOTE: This option is mutually exclusive with --hint,
1035 --threads-per-core and --ntasks-per-core.
1036 NOTE: If the number of sockets, cores and threads were all spec‐
1037 ified, the number of nodes was specified (as a fixed number, not
1038 a range) and the number of tasks was NOT specified, srun will
1039 implicitly calculate the number of tasks as one task per thread.
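
       For example, an illustrative sketch (the socket and core counts and
       the executable ./my_app are placeholders):

              srun -N2 -B 2:4 ./my_app

       This restricts selection to nodes with at least two sockets and at
       least four cores per socket and, with the task/affinity plugin,
       implies a default of --cpu-bind=cores.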
1040
1041 --gid=<group>
1042 If srun is run as root, and the --gid option is used, submit the
1043 job with group's group access permissions. group may be the
1044 group name or the numerical group ID. This option applies to job
1045 allocations.
1046
1047 --gpu-bind=[verbose,]<type>
1048 Bind tasks to specific GPUs. By default every spawned task can
1049 access every GPU allocated to the step. If "verbose," is speci‐
1050 fied before <type>, then print out GPU binding debug information
1051 to the stderr of the tasks. GPU binding is ignored if there is
1052 only one task.
1053
1054 Supported type options:
1055
1056 closest Bind each task to the GPU(s) which are closest. In a
1057 NUMA environment, each task may be bound to more than
1058 one GPU (i.e. all GPUs in that NUMA environment).
1059
1060 map_gpu:<list>
1061 Bind by setting GPU masks on tasks (or ranks) as spec‐
1062 ified where <list> is
1063 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
1064 are interpreted as decimal values. If the number of
1065 tasks (or ranks) exceeds the number of elements in
1066 this list, elements in the list will be reused as
1067 needed starting from the beginning of the list. To
1068 simplify support for large task counts, the lists may
1069 follow a map with an asterisk and repetition count.
1070 For example "map_gpu:0*4,1*4". If the task/cgroup
1071 plugin is used and ConstrainDevices is set in
1072 cgroup.conf, then the GPU IDs are zero-based indexes
1073 relative to the GPUs allocated to the job (e.g. the
1074 first GPU is 0, even if the global ID is 3). Other‐
1075 wise, the GPU IDs are global IDs, and all GPUs on each
1076 node in the job should be allocated for predictable
1077 binding results.
1078
1079 mask_gpu:<list>
1080 Bind by setting GPU masks on tasks (or ranks) as spec‐
1081 ified where <list> is
1082 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1083 mapping is specified for a node and identical mapping
1084 is applied to the tasks on every node (i.e. the lowest
1085 task ID on each node is mapped to the first mask spec‐
1086 ified in the list, etc.). GPU masks are always inter‐
1087 preted as hexadecimal values but can be preceded with
1088 an optional '0x'. To simplify support for large task
1089 counts, the lists may follow a map with an asterisk
1090 and repetition count. For example
1091 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1092 is used and ConstrainDevices is set in cgroup.conf,
1093 then the GPU IDs are zero-based indexes relative to
1094 the GPUs allocated to the job (e.g. the first GPU is
1095 0, even if the global ID is 3). Otherwise, the GPU IDs
1096 are global IDs, and all GPUs on each node in the job
1097 should be allocated for predictable binding results.
1098
1099 none Do not bind tasks to GPUs (turns off binding if
1100 --gpus-per-task is requested).
1101
1102 per_task:<gpus_per_task>
1103 Each task will be bound to the number of gpus speci‐
1104 fied in <gpus_per_task>. Gpus are assigned in order to
1105 tasks. The first task will be assigned the first x
1106 number of gpus on the node, etc.
1107
1108 single:<tasks_per_gpu>
1109 Like --gpu-bind=closest, except that each task can
1110 only be bound to a single GPU, even when it can be
1111 bound to multiple GPUs that are equally close. The
1112 GPU to bind to is determined by <tasks_per_gpu>, where
1113 the first <tasks_per_gpu> tasks are bound to the first
1114 GPU available, the second <tasks_per_gpu> tasks are
1115 bound to the second GPU available, etc. This is basi‐
1116 cally a block distribution of tasks onto available
1117 GPUs, where the available GPUs are determined by the
1118 socket affinity of the task and the socket affinity of
1119 the GPUs as specified in gres.conf's Cores parameter.
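
       For example, an illustrative sketch (the GPU count and the
       executable ./gpu_app are placeholders):

              srun -n4 --gpus-per-node=4 --gpu-bind=verbose,per_task:1 ./gpu_app

       Each of the four tasks is bound to one of the node's four GPUs and
       the chosen binding is written to each task's stderr.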
1120
1121 --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1122 Request that GPUs allocated to the job are configured with spe‐
1123 cific frequency values. This option can be used to indepen‐
1124 dently configure the GPU and its memory frequencies. After the
1125 job is completed, the frequencies of all affected GPUs will be
1126 reset to the highest possible values. In some cases, system
1127 power caps may override the requested values. The field type
1128 can be "memory". If type is not specified, the GPU frequency is
1129 implied. The value field can either be "low", "medium", "high",
1130 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1131 fied numeric value is not possible, a value as close as possible
1132 will be used. See below for definition of the values. The ver‐
1133 bose option causes current GPU frequency information to be
1134 logged. Examples of use include "--gpu-freq=medium,memory=high"
1135 and "--gpu-freq=450".
1136
1137 Supported value definitions:
1138
1139 low the lowest available frequency.
1140
1141 medium attempts to set a frequency in the middle of the
1142 available range.
1143
1144 high the highest available frequency.
1145
1146 highm1 (high minus one) will select the next highest avail‐
1147 able frequency.
1148
1149 -G, --gpus=[type:]<number>
1150 Specify the total number of GPUs required for the job. An op‐
1151 tional GPU type specification can be supplied. For example
1152 "--gpus=volta:3". Multiple options can be requested in a comma
1153 separated list, for example: "--gpus=volta:3,kepler:1". See
1154 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1155 options.
1156 NOTE: The allocation has to contain at least one GPU per node.
1157
1158 --gpus-per-node=[type:]<number>
1159 Specify the number of GPUs required for the job on each node in‐
1160 cluded in the job's resource allocation. An optional GPU type
1161 specification can be supplied. For example
1162 "--gpus-per-node=volta:3". Multiple options can be requested in
1163 a comma separated list, for example:
1164 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1165 --gpus-per-socket and --gpus-per-task options.
1166
1167 --gpus-per-socket=[type:]<number>
1168 Specify the number of GPUs required for the job on each socket
1169 included in the job's resource allocation. An optional GPU type
1170 specification can be supplied. For example
1171 "--gpus-per-socket=volta:3". Multiple options can be requested
1172 in a comma separated list, for example:
1173 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
1174 sockets per node count (--sockets-per-node). See also the
1175 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1176 tion applies to job allocations.
1177
1178 --gpus-per-task=[type:]<number>
1179 Specify the number of GPUs required for the job on each task to
1180 be spawned in the job's resource allocation. An optional GPU
1181 type specification can be supplied. For example
1182 "--gpus-per-task=volta:1". Multiple options can be requested in
1183 a comma separated list, for example:
1184 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1185 --gpus-per-socket and --gpus-per-node options. This option re‐
1186 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1187 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1188 -N, --nodes. This option will implicitly set
1189 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1190 with an explicit --gpu-bind specification.
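
       For example, an illustrative sketch (the GPU type volta and the
       executable ./gpu_app are placeholders):

              srun -n2 --gpus-per-task=volta:1 ./gpu_app

       The explicit task count satisfies the requirement described above,
       and each task is implicitly bound to its own GPU via
       --gpu-bind=per_task:1.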
1191
1192 --gres=<list>
1193 Specifies a comma-delimited list of generic consumable re‐
1194 sources. The format of each entry on the list is
1195 "name[[:type]:count]". The name is that of the consumable re‐
1196 source. The count is the number of those resources with a de‐
1197 fault value of 1. The count can have a suffix of "k" or "K"
1198 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1199 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1200 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1201 x 1024 x 1024 x 1024). The specified resources will be allo‐
1202 cated to the job on each node. The available generic consumable
1203 resources are configurable by the system administrator. A list
1204 of available generic consumable resources will be printed and
1205 the command will exit if the option argument is "help". Exam‐
1206 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1207 "--gres=help". NOTE: This option applies to job and step allo‐
1208 cations. By default, a job step is allocated all of the generic
1209 resources that have been requested by the job, except those im‐
1210 plicitly requested when a job is exclusive. To change the be‐
1211 havior so that each job step is allocated no generic resources,
1212 explicitly set the value of --gres to specify zero counts for
1213 each generic resource OR set "--gres=none" OR set the
1214 SLURM_STEP_GRES environment variable to "none".
1215
1216 --gres-flags=<type>
1217 Specify generic resource task binding options.
1218
1219 disable-binding
1220 Disable filtering of CPUs with respect to generic re‐
1221 source locality. This option is currently required to
1222 use more CPUs than are bound to a GRES (i.e. if a GPU is
1223 bound to the CPUs on one socket, but resources on more
1224 than one socket are required to run the job). This op‐
1225 tion may permit a job to be allocated resources sooner
1226 than otherwise possible, but may result in lower job per‐
1227 formance. This option applies to job allocations.
1228 NOTE: This option is specific to SelectType=cons_res.
1229
1230 enforce-binding
1231 The only CPUs available to the job/step will be those
1232 bound to the selected GRES (i.e. the CPUs identified in
1233 the gres.conf file will be strictly enforced). This op‐
1234 tion may result in delayed initiation of a job. For
1235 example, a job requiring two GPUs and one CPU will be
1236 delayed until both GPUs on a single socket are available
1237 rather than using GPUs bound to separate sockets;
1238 however, application performance may be improved due to
1239 the faster communication. This requires the node to be
1240 configured with more than one socket, and resource
1241 filtering will be performed on a per-socket basis. NOTE: Job
1242 steps that don't use --exact will not be affected.
1243 NOTE: This option is specific to SelectType=cons_tres for
1244 job allocations.
1245
1246 -h, --help
1247 Display help information and exit.
1248
1249 --het-group=<expr>
1250 Identify each component in a heterogeneous job allocation for
1251 which a step is to be created. Applies only to srun commands is‐
1252 sued inside a salloc allocation or sbatch script. <expr> is a
1253 set of integers corresponding to one or more option offsets on
1254 the salloc or sbatch command line. Examples: "--het-group=2",
1255 "--het-group=0,4", "--het-group=1,3-5". The default value is
1256 --het-group=0.
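
For example, a sketch of launching steps inside a heterogeneous allocation created with something like "salloc -n1 : -n16" (my_app is a placeholder executable):

# run a step spanning components 0 and 1
$ srun --het-group=0,1 ./my_app
# run a step only on the second component
$ srun --het-group=1 ./my_app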
1257
1258 --hint=<type>
1259 Bind tasks according to application hints.
1260 NOTE: This option cannot be used in conjunction with any of
1261 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1262 --cpu-bind=verbose) or -B. If --hint is specified as a command
1263 line argument, it will take precedence over the environment.
1264
1265 compute_bound
1266 Select settings for compute bound applications: use all
1267 cores in each socket, one thread per core.
1268
1269 memory_bound
1270 Select settings for memory bound applications: use only
1271 one core in each socket, one thread per core.
1272
1273 [no]multithread
1274 [don't] use extra threads with in-core multi-threading
1275 which can benefit communication intensive applications.
1276 Only supported with the task/affinity plugin.
1277
1278 help show this help message
1279
1280 This option applies to job allocations.
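
For example (my_app is a placeholder executable):

# memory bound run: one core per socket, one thread per core
$ srun -n4 --hint=memory_bound ./my_app
# do not use extra hardware threads
$ srun -n4 --hint=nomultithread ./my_app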
1281
1282 -H, --hold
1283 Specify the job is to be submitted in a held state (priority of
1284 zero). A held job can now be released using scontrol to reset
1285 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1286 plies to job allocations.
1287
1288 -I, --immediate[=<seconds>]
1289 Exit if resources are not available within the time period
1290 specified. If no argument is given (seconds defaults to 1), re‐
1291 sources must be available immediately for the request to suc‐
1292 ceed. If defer is configured in SchedulerParameters and sec‐
1293 onds=1 the allocation request will fail immediately; defer con‐
1294 flicts and takes precedence over this option. By default, --im‐
1295 mediate is off, and the command will block until resources be‐
1296 come available. Since this option's argument is optional, for
1297 proper parsing the single letter option must be followed immedi‐
1298 ately with the value and not include a space between them. For
1299 example "-I60" and not "-I 60". This option applies to job and
1300 step allocations.
1301
1302 -i, --input=<mode>
1303 Specify how stdin is to be redirected. By default, srun redi‐
1304 rects stdin from the terminal to all tasks. See IO Redirection
1305 below for more options. For OS X, the poll() function does not
1306 support stdin, so input from a terminal is not possible. This
1307 option applies to job and step allocations.
1308
1309 -J, --job-name=<jobname>
1310 Specify a name for the job. The specified name will appear along
1311 with the job id number when querying running jobs on the system.
1312 The default is the supplied executable program's name. NOTE:
1313 This information may be written to the slurm_jobacct.log file.
1314 This file is space delimited, so if a space is used in the
1315 jobname it will cause problems in properly displaying the con‐
1316 tents of the slurm_jobacct.log file when the sacct command is
1317 used. This option applies to job and step allocations.
1318
1319 --jobid=<jobid>
1320 Initiate a job step under an already allocated job with the
1321 specified job id. Using this option will cause srun to behave exactly as if
1322 the SLURM_JOB_ID environment variable was set. This option ap‐
1323 plies to step allocations.
1324
1325 -K, --kill-on-bad-exit[=0|1]
1326 Controls whether or not to terminate a step if any task exits
1327 with a non-zero exit code. If this option is not specified, the
1328 default action will be based upon the Slurm configuration param‐
1329 eter of KillOnBadExit. If this option is specified, it will take
1330 precedence over KillOnBadExit. An option argument of zero will
1331 not terminate the job. A non-zero argument or no argument will
1332 terminate the job. Note: This option takes precedence over the
1333 -W, --wait option to terminate the job immediately if a task ex‐
1334 its with a non-zero exit code. Since this option's argument is
1335 optional, for proper parsing the single letter option must be
1336 followed immediately with the value and not include a space be‐
1337 tween them. For example "-K1" and not "-K 1".
1338
1339 -l, --label
1340 Prepend task number to lines of stdout/err. The --label option
1341 will prepend lines of output with the remote task id. This op‐
1342 tion applies to step allocations.
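
For example, a two-task step might produce output such as the following (the node names are hypothetical):

$ srun -n2 -l hostname
0: node01
1: node02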
1343
1344 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1345 Specification of licenses (or other resources available on all
1346 nodes of the cluster) which must be allocated to this job. Li‐
1347 cense names can be followed by a colon and count (the default
1348 count is one). Multiple license names should be comma separated
1349 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1350 cations.
1351
1352 NOTE: When submitting heterogeneous jobs, license requests only
1353 work correctly when made on the first component job. For exam‐
1354 ple "srun -L ansys:2 : myexecutable".
1355
1356 --mail-type=<type>
1357 Notify user by email when certain event types occur. Valid type
1358 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1359 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1360 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1361 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1362 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1363 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1364 time limit). Multiple type values may be specified in a comma
1365 separated list. The user to be notified is indicated with
1366 --mail-user. This option applies to job allocations.
1367
1368 --mail-user=<user>
1369 User to receive email notification of state changes as defined
1370 by --mail-type. The default value is the submitting user. This
1371 option applies to job allocations.
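
For example (the address and my_app are placeholders):

# send mail when the job ends or fails
$ srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app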
1372
1373 --mcs-label=<mcs>
1374 is a group among the groups of the user. The default value is
1375 calculated by the mcs plugin if it is enabled. This option applies
1376 culated by the Plugin mcs if it's enabled. This option applies
1377 to job allocations.
1378
1379 --mem=<size>[units]
1380 Specify the real memory required per node. Default units are
1381 megabytes. Different units can be specified using the suffix
1382 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1383 is MaxMemPerNode. If configured, both parameters can be seen
1384 using the scontrol show config command. This parameter would
1385 generally be used if whole nodes are allocated to jobs (Select‐
1386 Type=select/linear). Specifying a memory limit of zero for a
1387 job step will restrict the job step to the amount of memory al‐
1388 located to the job, but not remove any of the job's memory allo‐
1389 cation from being available to other job steps. Also see
1390 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1391 --mem-per-gpu options are mutually exclusive. If --mem,
1392 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1393 guments, then they will take precedence over the environment
1394 (potentially inherited from salloc or sbatch).
1395
1396 NOTE: A memory size specification of zero is treated as a spe‐
1397 cial case and grants the job access to all of the memory on each
1398 node for newly submitted jobs and all available job memory to
1399 new job steps.
1400
1401 NOTE: Enforcement of memory limits currently relies upon the
1402 task/cgroup plugin or enabling of accounting, which samples mem‐
1403 ory use on a periodic basis (data need not be stored, just col‐
1404 lected). In both cases memory use is based upon the job's Resi‐
1405 dent Set Size (RSS). A task may exceed the memory limit until
1406 the next periodic accounting sample.
1407
1408 This option applies to job and step allocations.
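
For example (my_app and my_step are placeholder executables):

# request 16 GB of real memory on each of two nodes
$ srun -N2 --mem=16G ./my_app
# launch a step limited only by the memory already allocated to the job
$ srun --mem=0 ./my_step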
1409
1410 --mem-bind=[{quiet|verbose},]<type>
1411 Bind tasks to memory. Used only when the task/affinity plugin is
1412 enabled and the NUMA memory functions are available. Note that
1413 the resolution of CPU and memory binding may differ on some ar‐
1414 chitectures. For example, CPU binding may be performed at the
1415 level of the cores within a processor while memory binding will
1416 be performed at the level of nodes, where the definition of
1417 "nodes" may differ from system to system. By default no memory
1418 binding is performed; any task using any CPU can use any memory.
1419 This option is typically used to ensure that each task is bound
1420 to the memory closest to its assigned CPU. The use of any type
1421 other than "none" or "local" is not recommended. If you want
1422 greater control, try running a simple test code with the options
1423 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1424 the specific configuration.
1425
1426 NOTE: To have Slurm always report on the selected memory binding
1427 for all commands executed in a shell, you can enable verbose
1428 mode by setting the SLURM_MEM_BIND environment variable value to
1429 "verbose".
1430
1431 The following informational environment variables are set when
1432 --mem-bind is in use:
1433
1434 SLURM_MEM_BIND_LIST
1435 SLURM_MEM_BIND_PREFER
1436 SLURM_MEM_BIND_SORT
1437 SLURM_MEM_BIND_TYPE
1438 SLURM_MEM_BIND_VERBOSE
1439
1440 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1441 scription of the individual SLURM_MEM_BIND* variables.
1442
1443 Supported options include:
1444
1445 help show this help message
1446
1447 local Use memory local to the processor in use
1448
1449 map_mem:<list>
1450 Bind by setting memory masks on tasks (or ranks) as spec‐
1451 ified where <list> is
1452 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1453 ping is specified for a node and identical mapping is ap‐
1454 plied to the tasks on every node (i.e. the lowest task ID
1455 on each node is mapped to the first ID specified in the
1456 list, etc.). NUMA IDs are interpreted as decimal values
1457 unless they are preceded with '0x' in which case they are
1458 interpreted as hexadecimal values. If the number of tasks
1459 (or ranks) exceeds the number of elements in this list,
1460 elements in the list will be reused as needed starting
1461 from the beginning of the list. To simplify support for
1462 large task counts, the lists may follow a map with an as‐
1463 terisk and repetition count. For example
1464 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1465 sults, all CPUs for each node in the job should be allo‐
1466 cated to the job.
1467
1468 mask_mem:<list>
1469 Bind by setting memory masks on tasks (or ranks) as spec‐
1470 ified where <list> is
1471 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1472 mapping is specified for a node and identical mapping is
1473 applied to the tasks on every node (i.e. the lowest task
1474 ID on each node is mapped to the first mask specified in
1475 the list, etc.). NUMA masks are always interpreted as
1476 hexadecimal values. Note that masks must be preceded
1477 with a '0x' if they don't begin with [0-9] so they are
1478 seen as numerical values. If the number of tasks (or
1479 ranks) exceeds the number of elements in this list, ele‐
1480 ments in the list will be reused as needed starting from
1481 the beginning of the list. To simplify support for large
1482 task counts, the lists may follow a mask with an asterisk
1483 and repetition count. For example "mask_mem:0*4,1*4".
1484 For predictable binding results, all CPUs for each node
1485 in the job should be allocated to the job.
1486
1487 no[ne] don't bind tasks to memory (default)
1488
1489 nosort avoid sorting free cache pages (default, LaunchParameters
1490 configuration parameter can override this default)
1491
1492 p[refer]
1493 Prefer use of first specified NUMA node, but permit
1494 use of other available NUMA nodes.
1495
1496 q[uiet]
1497 quietly bind before task runs (default)
1498
1499 rank bind by task rank (not recommended)
1500
1501 sort sort free cache pages (run zonesort on Intel KNL nodes)
1502
1503 v[erbose]
1504 verbosely report binding before task runs
1505
1506 This option applies to job and step allocations.
1507
1508 --mem-per-cpu=<size>[units]
1509 Minimum memory required per usable allocated CPU. Default units
1510 are megabytes. Different units can be specified using the suf‐
1511 fix [K|M|G|T]. The default value is DefMemPerCPU and the maxi‐
1512 mum value is MaxMemPerCPU (see exception below). If configured,
1513 both parameters can be seen using the scontrol show config com‐
1514 mand. Note that if the job's --mem-per-cpu value exceeds the
1515 configured MaxMemPerCPU, then the user's limit will be treated
1516 as a memory limit per task; --mem-per-cpu will be reduced to a
1517 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1518 and the value of --cpus-per-task multiplied by the new
1519 --mem-per-cpu value will equal the original --mem-per-cpu value
1520 specified by the user. This parameter would generally be used
1521 if individual processors are allocated to jobs (SelectType=se‐
1522 lect/cons_res). If resources are allocated by core, socket, or
1523 whole nodes, then the number of CPUs allocated to a job may be
1524 higher than the task count and the value of --mem-per-cpu should
1525 be adjusted accordingly. Specifying a memory limit of zero for
1526 a job step will restrict the job step to the amount of memory
1527 allocated to the job, but not remove any of the job's memory al‐
1528 location from being available to other job steps. Also see
1529 --mem and --mem-per-gpu. The --mem, --mem-per-cpu and
1530 --mem-per-gpu options are mutually exclusive.
1531
1532 NOTE: If the final amount of memory requested by a job can't be
1533 satisfied by any of the nodes configured in the partition, the
1534 job will be rejected. This could happen if --mem-per-cpu is
1535 used with the --exclusive option for a job allocation and
1536 --mem-per-cpu times the number of CPUs on a node is greater than
1537 the total memory of that node.
1538
1539 NOTE: This applies to usable allocated CPUs in a job allocation.
1540 This is important when more than one thread per core is config‐
1541 ured. If a job requests --threads-per-core with fewer threads
1542 on a core than exist on the core (or --hint=nomultithread which
1543 implies --threads-per-core=1), the job will be unable to use
1544 those extra threads on the core and those threads will not be
1545 included in the memory per CPU calculation. But if the job has
1546 access to all threads on the core, those threads will be in‐
1547 cluded in the memory per CPU calculation even if the job did not
1548 explicitly request those threads.
1549
1550 In the following examples, each core has two threads.
1551
1552 In this first example, two tasks can run on separate hyper‐
1553 threads in the same core because --threads-per-core is not used.
1554 The third task uses both threads of the second core. The allo‐
1555 cated memory per cpu includes all threads:
1556
1557 $ salloc -n3 --mem-per-cpu=100
1558 salloc: Granted job allocation 17199
1559 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1560 JobID ReqTRES AllocTRES
1561 ------- ----------------------------------- -----------------------------------
1562 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1563
1564 In this second example, because of --threads-per-core=1, each
1565 task is allocated an entire core but is only able to use one
1566 thread per core. Allocated CPUs includes all threads on each
1567 core. However, allocated memory per cpu includes only the usable
1568 thread in each core.
1569
1570 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1571 salloc: Granted job allocation 17200
1572 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1573 JobID ReqTRES AllocTRES
1574 ------- ----------------------------------- -----------------------------------
1575 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1576
1577 --mem-per-gpu=<size>[units]
1578 Minimum memory required per allocated GPU. Default units are
1579 megabytes. Different units can be specified using the suffix
1580 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1581 both a global and per partition basis. If configured, the pa‐
1582 rameters can be seen using the scontrol show config and scontrol
1583 show partition commands. Also see --mem. The --mem,
1584 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
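
For example (my_app is a placeholder executable):

# two GPUs with 8 GB of memory reserved per GPU
$ srun --gpus=2 --mem-per-gpu=8G ./my_app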
1585
1586 --mincpus=<n>
1587 Specify a minimum number of logical cpus/processors per node.
1588 This option applies to job allocations.
1589
1590 --mpi=<mpi_type>
1591 Identify the type of MPI to be used. May result in unique initi‐
1592 ation procedures.
1593
1594 cray_shasta
1595 To enable Cray PMI support. This is for applications
1596 built with the Cray Programming Environment. The PMI Con‐
1597 trol Port can be specified with the --resv-ports option
1598 or with the MpiParams=ports=<port range> parameter in
1599 your slurm.conf. This plugin does not have support for
1600 heterogeneous jobs. Support for cray_shasta is included
1601 by default.
1602
1603 list Lists available mpi types to choose from.
1604
1605 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1606 only if the MPI implementation supports it, in other
1607 words if the MPI has the PMI2 interface implemented. The
1608 --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1609 which provides the server side functionality but the
1610 client side must implement PMI2_Init() and the other in‐
1611 terface calls.
1612
1613 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1614 support in Slurm can be used to launch parallel applica‐
1615 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1616 must be configured with pmix support by passing
1617 "--with-pmix=<PMIx installation path>" option to its
1618 "./configure" script.
1619
1620 At the time of writing PMIx is supported in Open MPI
1621 starting from version 2.0. PMIx also supports backward
1622 compatibility with PMI1 and PMI2 and can be used if MPI
1623 was configured with PMI2/PMI1 support pointing to the
1624 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1625 doesn't provide the way to point to a specific implemen‐
1626 tation, a hack'ish solution leveraging LD_PRELOAD can be
1627 used to force "libpmix" usage.
1628
1629 none No special MPI processing. This is the default and works
1630 with many other versions of MPI.
1631
1632 This option applies to step allocations.
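
For example, a sketch assuming an MPI library built with PMIx support (mpi_app is a placeholder executable):

# list the MPI plugin types available on this installation
$ srun --mpi=list
# launch an 8-task MPI application using PMIx
$ srun --mpi=pmix -n8 ./mpi_app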
1633
1634 --msg-timeout=<seconds>
1635 Modify the job launch message timeout. The default value is
1636 MessageTimeout in the Slurm configuration file slurm.conf.
1637 Changes to this are typically not recommended, but could be use‐
1638 ful to diagnose problems. This option applies to job alloca‐
1639 tions.
1640
1641 --multi-prog
1642 Run a job with different programs and different arguments for
1643 each task. In this case, the executable program specified is ac‐
1644 tually a configuration file specifying the executable and argu‐
1645 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1646 for details on the configuration file contents. This option ap‐
1647 plies to step allocations.
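
For example, a minimal sketch (multi.conf is a hypothetical configuration file; see MULTIPLE PROGRAM CONFIGURATION below for the full syntax):

$ cat multi.conf
0 echo I am the leader
1-3 echo I am worker %t
$ srun -n4 -l --multi-prog multi.conf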
1648
1649 --network=<type>
1650 Specify information pertaining to the switch or network. The
1651 interpretation of type is system dependent. This option is sup‐
1652 ported when running Slurm on a Cray natively. It is used to re‐
1653 quest using Network Performance Counters. Only one value per
1654 request is valid. All options are case insensitive. In this
1655 configuration supported values include:
1656
1657
1658 system
1659 Use the system-wide network performance counters. Only
1660 nodes requested will be marked in use for the job alloca‐
1661 tion. If the job does not fill up the entire system, the
1662 rest of the nodes cannot be used by other jobs
1663 using NPC; if idle, their state will appear as PerfCnts.
1664 These nodes are still available for other jobs not using
1665 NPC.
1666
1667 blade Use the blade network performance counters. Only nodes re‐
1668 quested will be marked in use for the job allocation. If
1669 the job does not fill up the entire blade(s) allocated to
1670 the job, those blade(s) cannot be used by other
1671 jobs using NPC; if idle, their state will appear as PerfC‐
1672 nts. These nodes are still available for other jobs not
1673 using NPC.
1674
1675 In all cases the job allocation request must specify the --ex‐
1676 clusive option and the step cannot specify the --overlap option.
1677 Otherwise the request will be denied.
1678
1679 Also with any of these options steps are not allowed to share
1680 blades, so resources would remain idle inside an allocation if
1681 the step running on a blade does not take up all the nodes on
1682 the blade.
1683
1684 The network option is also available on systems with HPE Sling‐
1685 shot networks. It can be used to override the default network
1686 resources allocated for the job step. Multiple values may be
1687 specified in a comma-separated list.
1688
1689 def_<rsrc>=<val>
1690 Per-CPU reserved allocation for this resource.
1691
1692 res_<rsrc>=<val>
1693 Per-node reserved allocation for this resource. If
1694 set, overrides the per-CPU allocation.
1695
1696 max_<rsrc>=<val>
1697 Maximum per-node limit for this resource.
1698
1699 depth=<depth>
1700 Multiplier for per-CPU resource allocation. Default
1701 is the number of reserved CPUs on the node.
1702
1703 The resources that may be requested are:
1704
1705 txqs Transmit command queues. The default is 3 per-CPU,
1706 maximum 1024 per-node.
1707
1708 tgqs Target command queues. The default is 2 per-CPU, max‐
1709 imum 512 per-node.
1710
1711 eqs Event queues. The default is 8 per-CPU, maximum 2048
1712 per-node.
1713
1714 cts Counters. The default is 2 per-CPU, maximum 2048 per-
1715 node.
1716
1717 tles Trigger list entries. The default is 1 per-CPU, maxi‐
1718 mum 2048 per-node.
1719
1720 ptes Portals table entries. The default is 8 per-CPU,
1721 maximum 2048 per-node.
1722
1723 les List entries. The default is 134 per-CPU, maximum
1724 65535 per-node.
1725
1726 acs Addressing contexts. The default is 4 per-CPU, maxi‐
1727 mum 1024 per-node.
1728
1729 This option applies to job and step allocations.
1730
1731 --nice[=adjustment]
1732 Run the job with an adjusted scheduling priority within Slurm.
1733 With no adjustment value the scheduling priority is decreased by
1734 100. A negative nice value increases the priority, otherwise de‐
1735 creases it. The adjustment range is +/- 2147483645. Only privi‐
1736 leged users can specify a negative adjustment.
1737
1738 -Z, --no-allocate
1739 Run the specified tasks on a set of nodes without creating a
1740 Slurm "job" in the Slurm queue structure, bypassing the normal
1741 resource allocation step. The list of nodes must be specified
1742 with the -w, --nodelist option. This is a privileged option
1743 only available for the users "SlurmUser" and "root". This option
1744 applies to job allocations.
1745
1746 -k, --no-kill[=off]
1747 Do not automatically terminate a job if one of the nodes it has
1748 been allocated fails. This option applies to job and step allo‐
1749 cations. The job will assume all responsibilities for
1750 fault-tolerance. Tasks launched using this option will not be
1751 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1752 --wait options will have no effect upon the job step). The ac‐
1753 tive job step (MPI job) will likely suffer a fatal error, but
1754 subsequent job steps may be run if this option is specified.
1755
1756 Specify an optional argument of "off" to disable the effect of the
1757 SLURM_NO_KILL environment variable.
1758
1759 The default action is to terminate the job upon node failure.
1760
1761 -F, --nodefile=<node_file>
1762 Much like --nodelist, but the list is contained in a file of
1763 the specified name. The node names in the list may also span multi‐
1764 ple lines in the file. Duplicate node names in the file will
1765 be ignored. The order of the node names in the list is not im‐
1766 portant; the node names will be sorted by Slurm.
1767
1768 -w, --nodelist={<node_name_list>|<filename>}
1769 Request a specific list of hosts. The job will contain all of
1770 these hosts and possibly additional hosts as needed to satisfy
1771 resource requirements. The list may be specified as a
1772 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1773 for example), or a filename. The host list will be assumed to
1774 be a filename if it contains a "/" character. If you specify a
1775 minimum node or processor count larger than can be satisfied by
1776 the supplied host list, additional resources will be allocated
1777 on other nodes as needed. Rather than repeating a host name
1778 multiple times, an asterisk and a repetition count may be ap‐
1779 pended to a host name. For example "host1,host1" and "host1*2"
1780 are equivalent. If the number of tasks is given and a list of
1781 requested nodes is also given, the number of nodes used from
1782 that list will be reduced to match that of the number of tasks
1783 if the number of nodes in the list is greater than the number of
1784 tasks. This option applies to job and step allocations.
1785
1786 -N, --nodes=<minnodes>[-maxnodes]
1787 Request that a minimum of minnodes nodes be allocated to this
1788 job. A maximum node count may also be specified with maxnodes.
1789 If only one number is specified, this is used as both the mini‐
1790 mum and maximum node count. The partition's node limits super‐
1791 sede those of the job. If a job's node limits are outside of
1792 the range permitted for its associated partition, the job will
1793 be left in a PENDING state. This permits possible execution at
1794 a later time, when the partition limit is changed. If a job
1795 node limit exceeds the number of nodes configured in the parti‐
1796 tion, the job will be rejected. Note that the environment vari‐
1797 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1798 ibility) will be set to the count of nodes actually allocated to
1799 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1800 tion. If -N is not specified, the default behavior is to allo‐
1801 cate enough nodes to satisfy the requested resources as ex‐
1802 pressed by per-job specification options, e.g. -n, -c and
1803 --gpus. The job will be allocated as many nodes as possible
1804 within the range specified and without delaying the initiation
1805 of the job. If the number of tasks is given and a number of re‐
1806 quested nodes is also given, the number of nodes used from that
1807 request will be reduced to match that of the number of tasks if
1808 the number of nodes in the request is greater than the number of
1809 tasks. The node count specification may include a numeric value
1810 followed by a suffix of "k" (multiplies numeric value by 1,024)
1811 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1812 plies to job and step allocations.
1813
1814 -n, --ntasks=<number>
1815 Specify the number of tasks to run. Request that srun allocate
1816 resources for ntasks tasks. The default is one task per node,
1817 but note that the --cpus-per-task option will change this de‐
1818 fault. This option applies to job and step allocations.
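
For example (my_app is a placeholder executable):

# 8 tasks on exactly two nodes (four tasks per node)
$ srun -N2 -n8 ./my_app
# 8 tasks spread over at least two and at most four nodes
$ srun -N2-4 -n8 ./my_app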
1819
1820 --ntasks-per-core=<ntasks>
1821 Request the maximum ntasks be invoked on each core. This option
1822 applies to the job allocation, but not to step allocations.
1823 Meant to be used with the --ntasks option. Related to
1824 --ntasks-per-node except at the core level instead of the node
1825 level. Masks will automatically be generated to bind the tasks
1826 to specific cores unless --cpu-bind=none is specified. NOTE:
1827 This option is not supported when using SelectType=select/lin‐
1828 ear.
1829
1830 --ntasks-per-gpu=<ntasks>
1831 Request that there are ntasks tasks invoked for every GPU. This
1832 option can work in two ways: 1) either specify --ntasks in addi‐
1833 tion, in which case a type-less GPU specification will be auto‐
1834 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1835 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1836 --ntasks, and the total task count will be automatically deter‐
1837 mined. The number of CPUs needed will be automatically in‐
1838 creased if necessary to allow for any calculated task count.
1839 This option will implicitly set --gpu-bind=single:<ntasks>, but
1840 that can be overridden with an explicit --gpu-bind specifica‐
1841 tion. This option is not compatible with a node range (i.e.
1842 -N<minnodes-maxnodes>). This option is not compatible with
1843 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1844 option is not supported unless SelectType=cons_tres is config‐
1845 ured (either directly or indirectly on Cray systems).
1846
1847 --ntasks-per-node=<ntasks>
1848 Request that ntasks be invoked on each node. If used with the
1849 --ntasks option, the --ntasks option will take precedence and
1850 the --ntasks-per-node will be treated as a maximum count of
1851 tasks per node. Meant to be used with the --nodes option. This
1852 is related to --cpus-per-task=ncpus, but does not require knowl‐
1853 edge of the actual number of cpus on each node. In some cases,
1854 it is more convenient to be able to request that no more than a
1855 specific number of tasks be invoked on each node. Examples of
1856 this include submitting a hybrid MPI/OpenMP app where only one
1857 MPI "task/rank" should be assigned to each node while allowing
1858 the OpenMP portion to utilize all of the parallelism present in
1859 the node, or submitting a single setup/cleanup/monitoring job to
1860 each node of a pre-existing allocation as one step in a larger
1861 job script. This option applies to job allocations.
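
For example, a sketch of the hybrid MPI/OpenMP case described above (hybrid_app is a placeholder and the CPU count is an assumption about the nodes):

# one MPI rank per node, with 16 CPUs per rank for OpenMP threads
$ OMP_NUM_THREADS=16 srun -N4 --ntasks-per-node=1 -c16 ./hybrid_app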
1862
1863 --ntasks-per-socket=<ntasks>
1864 Request the maximum ntasks be invoked on each socket. This op‐
1865 tion applies to the job allocation, but not to step allocations.
1866 Meant to be used with the --ntasks option. Related to
1867 --ntasks-per-node except at the socket level instead of the node
1868 level. Masks will automatically be generated to bind the tasks
1869 to specific sockets unless --cpu-bind=none is specified. NOTE:
1870 This option is not supported when using SelectType=select/lin‐
1871 ear.
1872
1873 --open-mode={append|truncate}
1874 Open the output and error files using append or truncate mode as
1875 specified. For heterogeneous job steps the default value is
1876 "append". Otherwise the default value is specified by the sys‐
1877 tem configuration parameter JobFileAppend. This option applies
1878 to job and step allocations.
1879
1880 -o, --output=<filename_pattern>
1881 Specify the "filename pattern" for stdout redirection. By de‐
1882 fault in interactive mode, srun collects stdout from all tasks
1883 and sends this output via TCP/IP to the attached terminal. With
1884 --output stdout may be redirected to a file, to one file per
1885 task, or to /dev/null. See section IO Redirection below for the
1886 various forms of filename pattern. If the specified file al‐
1887 ready exists, it will be overwritten.
1888
1889 If --error is not also specified on the command line, both std‐
1890 out and stderr will be directed to the file specified by --output.
1891 This option applies to job and step allocations.
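
For example, using the filename pattern described in IO Redirection below (my_app is a placeholder executable):

# one output file per task, named with the job id and a zero-padded task id
$ srun -n4 --output=job%j-task%2t.out ./my_app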
1892
1893 -O, --overcommit
1894 Overcommit resources. This option applies to job and step allo‐
1895 cations.
1896
1897 When applied to a job allocation (not including jobs requesting
1898 exclusive access to the nodes) the resources are allocated as if
1899 only one task per node is requested. This means that the re‐
1900 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1901 cated per node rather than being multiplied by the number of
1902 tasks. Options used to specify the number of tasks per node,
1903 socket, core, etc. are ignored.
1904
1905 When applied to job step allocations (the srun command when exe‐
1906 cuted within an existing job allocation), this option can be
1907 used to launch more than one task per CPU. Normally, srun will
1908 not allocate more than one process per CPU. By specifying
1909 --overcommit you are explicitly allowing more than one process
1910 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1911 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1912 in the file slurm.h and is not a variable, it is set at Slurm
1913 build time.
1914
1915 --overlap
1916 Specifying --overlap allows steps to share all resources (CPUs,
1917 memory, and GRES) with all other steps. A step using this option
1918 will overlap all other steps, even those that did not specify
1919 --overlap.
1920
1921 By default steps do not share resources with other parallel
1922 steps. This option applies to step allocations.
1923
1924 -s, --oversubscribe
1925 The job allocation can over-subscribe resources with other run‐
1926 ning jobs. The resources to be over-subscribed can be nodes,
1927 sockets, cores, and/or hyperthreads depending upon configura‐
1928 tion. The default over-subscribe behavior depends on system
1929 configuration and the partition's OverSubscribe option takes
1930 precedence over the job's option. This option may result in the
1931 allocation being granted sooner than if the --oversubscribe op‐
1932 tion was not set and allow higher system utilization, but appli‐
1933 cation performance will likely suffer due to competition for re‐
1934 sources. This option applies to job allocations.
1935
1936 -p, --partition=<partition_names>
1937 Request a specific partition for the resource allocation. If
1938 not specified, the default behavior is to allow the slurm con‐
1939 troller to select the default partition as designated by the
1940 system administrator. If the job can use more than one parti‐
1941 tion, specify their names in a comma separated list and the one
1942 offering earliest initiation will be used with no regard given
1943 to the partition name ordering (although higher priority parti‐
1944 tions will be considered first). When the job is initiated, the
1945 name of the partition used will be placed first in the job
1946 record partition string. This option applies to job allocations.
1947
1948 --power=<flags>
1949 Comma separated list of power management plugin options. Cur‐
1950 rently available flags include: level (all nodes allocated to
1951 the job should have identical power caps, may be disabled by the
1952 Slurm configuration option PowerParameters=job_no_level). This
1953 option applies to job allocations.
1954
1955 --prefer=<list>
1956 Nodes can have features assigned to them by the Slurm adminis‐
1957 trator. Users can specify which of these features are desired
1958 but not required by their job using the prefer option. This op‐
1959 tion operates independently from --constraint and will override
1960 whatever is set there if possible. When scheduling, the features
1961 in --prefer are tried first; if a node set isn't available with
1962 those features, then --constraint is attempted. See --constraint
1963 for more information; this option behaves the same way.
1964
1965
1966 -E, --preserve-env
1967 Pass the current values of environment variables
1968 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
1969 rather than computing them from command line parameters. This
1970 option applies to job allocations.
1971
1972 --priority=<value>
1973 Request a specific job priority. May be subject to configura‐
1974 tion specific constraints. value should either be a numeric
1975 value or "TOP" (for highest possible value). Only Slurm opera‐
1976 tors and administrators can set the priority of a job. This op‐
1977 tion applies to job allocations only.
1978
1979 --profile={all|none|<type>[,<type>...]}
1980 Enables detailed data collection by the acct_gather_profile
1981 plugin. Detailed data are typically time-series that are stored
1982 in an HDF5 file for the job or an InfluxDB database depending on
1983 the configured plugin. This option applies to job and step al‐
1984 locations.
1985
1986 All All data types are collected. (Cannot be combined with
1987 other values.)
1988
1989 None No data types are collected. This is the default.
1990 (Cannot be combined with other values.)
1991
1992 Valid type values are:
1993
1994 Energy Energy data is collected.
1995
1996 Task Task (I/O, Memory, ...) data is collected.
1997
1998 Filesystem
1999 Filesystem data is collected.
2000
2001 Network
2002 Network (InfiniBand) data is collected.
2003
2004 --prolog=<executable>
2005 srun will run executable just before launching the job step.
2006 The command line arguments for executable will be the command
2007 and arguments of the job step. If executable is "none", then no
2008 srun prolog will be run. This parameter overrides the SrunProlog
2009 parameter in slurm.conf. This parameter is completely indepen‐
2010 dent from the Prolog parameter in slurm.conf. This option ap‐
2011 plies to job allocations.
2012
2013 --propagate[=rlimit[,rlimit...]]
2014 Allows users to specify which of the modifiable (soft) resource
2015 limits to propagate to the compute nodes and apply to their
2016 jobs. If no rlimit is specified, then all resource limits will
2017 be propagated. The following rlimit names are supported by
2018 Slurm (although some options may not be supported on some sys‐
2019 tems):
2020
2021 ALL All limits listed below (default)
2022
2023 NONE No limits listed below
2024
2025 AS The maximum address space (virtual memory) for a
2026 process.
2027
2028 CORE The maximum size of core file
2029
2030 CPU The maximum amount of CPU time
2031
2032 DATA The maximum size of a process's data segment
2033
2034 FSIZE The maximum size of files created. Note that if the
2035 user sets FSIZE to less than the current size of the
2036 slurmd.log, job launches will fail with a 'File size
2037 limit exceeded' error.
2038
2039 MEMLOCK The maximum size that may be locked into memory
2040
2041 NOFILE The maximum number of open files
2042
2043 NPROC The maximum number of processes available
2044
2045 RSS The maximum resident set size. Note that this only has
2046 effect with Linux kernels 2.4.30 or older or BSD.
2047
2048 STACK The maximum stack size
2049
2050 This option applies to job allocations.
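
For example (my_app is a placeholder executable):

# propagate only the core file size and open file limits to the compute nodes
$ srun --propagate=CORE,NOFILE ./my_app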
2051
2052 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2053 --unbuffered. Implicitly sets --error and --output to /dev/null
2054 for all tasks except task zero, which may cause those tasks to
2055 exit immediately (e.g. shells will typically exit immediately in
2056 that situation). This option applies to step allocations.
2057
2058 -q, --qos=<qos>
2059 Request a quality of service for the job. QOS values can be de‐
2060 fined for each user/cluster/account association in the Slurm
2061 database. Users will be limited to their association's defined
2062 set of qos's when the Slurm configuration parameter, Account‐
2063 ingStorageEnforce, includes "qos" in its definition. This option
2064 applies to job allocations.
2065
2066 -Q, --quiet
2067 Suppress informational messages from srun. Errors will still be
2068 displayed. This option applies to job and step allocations.
2069
2070 --quit-on-interrupt
2071 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2072 disables the status feature normally available when srun re‐
2073 ceives a single Ctrl-C and causes srun to instead immediately
2074 terminate the running job. This option applies to step alloca‐
2075 tions.
2076
2077 --reboot
2078 Force the allocated nodes to reboot before starting the job.
2079 This is only supported with some system configurations and will
2080 otherwise be silently ignored. Only root, SlurmUser or admins
2081 can reboot nodes. This option applies to job allocations.
2082
2083 -r, --relative=<n>
2084 Run a job step relative to node n of the current allocation.
2085 This option may be used to spread several job steps out among
2086 the nodes of the current job. If -r is used, the current job
2087 step will begin at node n of the allocated nodelist, where the
2088 first node is considered node 0. The -r option is not permitted
2089 with -w or -x option and will result in a fatal error when not
2090 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2091 set). The default for n is 0. If the value of --nodes exceeds
2092 the number of nodes identified with the --relative option, a
2093 warning message will be printed and the --relative option will
2094 take precedence. This option applies to step allocations.
2095
2096 --reservation=<reservation_names>
2097 Allocate resources for the job from the named reservation. If
2098 the job can use more than one reservation, specify their names
2099 in a comma separated list; the one offering the earliest initia‐
2100 tion will be used. Each reservation will be considered in the order it was
2101 requested. All reservations will be listed in scontrol/squeue
2102 through the life of the job. In accounting, the first reserva‐
2103 tion will be seen, and after the job starts, the reservation actually used
2104 will replace it.
2105
2106 --resv-ports[=count]
2107 Reserve communication ports for this job. Users can specify the
2108 number of ports they want to reserve. The parameter Mpi‐
2109 Params=ports=12000-12999 must be specified in slurm.conf. If the
2110 number of reserved ports is zero then no ports are reserved.
2111 Used only for Cray's native PMI. This option applies to job and
2112 step allocations.
2113
2114 --send-libs[=yes|no]
2115 If set to yes (or no argument), autodetect and broadcast the ex‐
2116 ecutable's shared object dependencies to allocated compute
2117 nodes. The files are placed in a directory alongside the exe‐
2118 cutable. The LD_LIBRARY_PATH is automatically updated to include
2119 this cache directory as well. This overrides the default behav‐
2120 ior configured in slurm.conf SbcastParameters send_libs. This
2121 option only works in conjunction with --bcast. See also
2122 --bcast-exclude.
2123
2124 --signal=[R:]<sig_num>[@sig_time]
2125 When a job is within sig_time seconds of its end time, send it
2126 the signal sig_num. Due to the resolution of event handling by
2127 Slurm, the signal may be sent up to 60 seconds earlier than
2128 specified. sig_num may either be a signal number or name (e.g.
2129 "10" or "USR1"). sig_time must have an integer value between 0
2130 and 65535. By default, no signal is sent before the job's end
2131 time. If a sig_num is specified without any sig_time, the de‐
2132 fault time will be 60 seconds. This option applies to job allo‐
2133 cations. Use the "R:" option to allow this job to overlap with
2134 a reservation with MaxStartDelay set. To have the signal sent
2135 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2136 rameter.
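
For example (my_app is a placeholder executable):

# send SIGUSR1 roughly 120 seconds before the 30-minute limit is reached
$ srun -t 30 --signal=USR1@120 ./my_app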
2137
2138 --slurmd-debug=<level>
2139 Specify a debug level for slurmd(8). The level may be specified
2140 either as an integer value between 0 [quiet, only errors are dis‐
2141 played] and 4 [verbose operation] or as one of the SlurmdDebug tags.
2142
2143 quiet Log nothing
2144
2145 fatal Log only fatal errors
2146
2147 error Log only errors
2148
2149 info Log errors and general informational messages
2150
2151 verbose Log errors and verbose informational messages
2152
2153 The slurmd debug information is copied onto the stderr of the
2154 job. By default only errors are displayed. This option applies
2155 to job and step allocations.
2156
2157 --sockets-per-node=<sockets>
2158 Restrict node selection to nodes with at least the specified
2159 number of sockets. See additional information under -B option
2160 above when task/affinity plugin is enabled. This option applies
2161 to job allocations.
2162 NOTE: This option may implicitly impact the number of tasks if
2163 -n was not specified.
2164
2165 --spread-job
2166 Spread the job allocation over as many nodes as possible and at‐
2167 tempt to evenly distribute tasks across the allocated nodes.
2168 This option disables the topology/tree plugin. This option ap‐
2169 plies to job allocations.
2170
2171 --switches=<count>[@max-time]
2172 When a tree topology is used, this defines the maximum count of
2173 leaf switches desired for the job allocation and optionally the
2174 maximum time to wait for that number of switches. If Slurm finds
2175 an allocation containing more switches than the count specified,
2176 the job remains pending until it either finds an allocation with
2177 desired switch count or the time limit expires. If there is no
2178 switch count limit, there is no delay in starting the job. Ac‐
2179 ceptable time formats include "minutes", "minutes:seconds",
2180 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2181 "days-hours:minutes:seconds". The job's maximum time delay may
2182 be limited by the system administrator using the SchedulerParam‐
2183 eters configuration parameter with the max_switch_wait parameter
2184 option. On a dragonfly network the only switch count supported
2185 is 1 since communication performance will be highest when a job
2186 is allocated resources on one leaf switch or more than 2 leaf
2187 switches. The default max-time is the max_switch_wait Sched‐
2188 ulerParameters. This option applies to job allocations.
2189
2190 --task-epilog=<executable>
2191 The slurmstepd daemon will run executable just after each task
2192 terminates. This will be executed before any TaskEpilog parame‐
2193 ter in slurm.conf is executed. This is meant to be a very
2194 short-lived program. If it fails to terminate within a few sec‐
2195 onds, it will be killed along with any descendant processes.
2196 This option applies to step allocations.
2197
2198 --task-prolog=<executable>
2199 The slurmstepd daemon will run executable just before launching
2200 each task. This will be executed after any TaskProlog parameter
2201 in slurm.conf is executed. Besides the normal environment vari‐
2202 ables, this has SLURM_TASK_PID available to identify the process
2203 ID of the task being started. Standard output from this program
2204 of the form "export NAME=value" will be used to set environment
2205 variables for the task being spawned. This option applies to
2206 step allocations.
2207
2208 --test-only
2209 Returns an estimate of when a job would be scheduled to run
2210 given the current job queue and all the other srun arguments
2211 specifying the job. This limits srun's behavior to just return
2212 information; no job is actually submitted. The program will be
2213 executed directly by the slurmd daemon. This option applies to
2214 job allocations.
2215
2216 --thread-spec=<num>
2217 Count of specialized threads per node reserved by the job for
2218 system operations and not used by the application. The applica‐
2219 tion will not use these threads, but will be charged for their
2220 allocation. This option can not be used with the --core-spec
2221 option. This option applies to job allocations.
2222
2223 NOTE: Explicitly setting a job's specialized thread value im‐
2224 plicitly sets its --exclusive option, reserving entire nodes for
2225 the job.
2226
2227 -T, --threads=<nthreads>
2228 Allows limiting the number of concurrent threads used to send
2229 the job request from the srun process to the slurmd processes on
2230 the allocated nodes. Default is to use one thread per allocated
2231 node up to a maximum of 60 concurrent threads. Specifying this
2232 option limits the number of concurrent threads to nthreads (less
2233 than or equal to 60). This should only be used to set a low
2234 thread count for testing on very small memory computers. This
2235 option applies to job allocations.
2236
2237 --threads-per-core=<threads>
2238 Restrict node selection to nodes with at least the specified
2239 number of threads per core. In task layout, use the specified
2240 maximum number of threads per core. Implies --cpu-bind=threads
2241 unless overridden by command line or environment options. NOTE:
2242 "Threads" refers to the number of processing units on each core
2243 rather than the number of application tasks to be launched per
2244 core. See additional information under -B option above when
2245 task/affinity plugin is enabled. This option applies to job and
2246 step allocations.
2247 NOTE: This option may implicitly impact the number of tasks if
2248 -n was not specified.
2249
2250 -t, --time=<time>
2251 Set a limit on the total run time of the job allocation. If the
2252 requested time limit exceeds the partition's time limit, the job
2253 will be left in a PENDING state (possibly indefinitely). The
2254 default time limit is the partition's default time limit. When
2255 the time limit is reached, each task in each job step is sent
2256 SIGTERM followed by SIGKILL. The interval between signals is
2257 specified by the Slurm configuration parameter KillWait. The
2258 OverTimeLimit configuration parameter may permit the job to run
2259 longer than scheduled. Time resolution is one minute and second
2260 values are rounded up to the next minute.
2261
2262 A time limit of zero requests that no time limit be imposed.
2263 Acceptable time formats include "minutes", "minutes:seconds",
2264 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2265 "days-hours:minutes:seconds". This option applies to job and
2266 step allocations.
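
For example (my_app is a placeholder executable):

# 90 minutes
$ srun -t 90 ./my_app
# 1 day and 12 hours
$ srun -t 1-12:00:00 ./my_app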
2267
2268 --time-min=<time>
2269 Set a minimum time limit on the job allocation. If specified,
2270 the job may have its --time limit lowered to a value no lower
2271 than --time-min if doing so permits the job to begin execution
2272 earlier than otherwise possible. The job's time limit will not
2273 be changed after the job is allocated resources. This is per‐
2274 formed by a backfill scheduling algorithm to allocate resources
2275 otherwise reserved for higher priority jobs. Acceptable time
2276 formats include "minutes", "minutes:seconds", "hours:min‐
2277 utes:seconds", "days-hours", "days-hours:minutes" and
2278 "days-hours:minutes:seconds". This option applies to job alloca‐
2279 tions.
2280
2281 --tmp=<size>[units]
2282 Specify a minimum amount of temporary disk space per node. De‐
2283 fault units are megabytes. Different units can be specified us‐
2284 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2285 tions.
2286
2287 --uid=<user>
2288 Attempt to submit and/or run a job as user instead of the invok‐
2289 ing user id. The invoking user's credentials will be used to
2290 check access permissions for the target partition. User root may
2291 use this option to run jobs as a normal user in a RootOnly par‐
2292 tition for example. If run as root, srun will drop its permis‐
2293 sions to the uid specified after node allocation is successful.
2294 user may be the user name or numerical user ID. This option ap‐
2295 plies to job and step allocations.
2296
2297 -u, --unbuffered
2298 By default, the connection between slurmstepd and the
2299 user-launched application is over a pipe. The stdio output writ‐
2300 ten by the application is buffered by glibc until it is
2301 flushed or the output is set as unbuffered. See setbuf(3). If
2302 this option is specified the tasks are executed with a pseudo
2303 terminal so that the application output is unbuffered. This op‐
2304 tion applies to step allocations.
2305
2306 --usage
2307 Display brief help message and exit.
2308
2309 --use-min-nodes
2310 If a range of node counts is given, prefer the smaller count.
2311
2312 -v, --verbose
2313 Increase the verbosity of srun's informational messages. Multi‐
2314 ple -v's will further increase srun's verbosity. By default
2315 only errors will be displayed. This option applies to job and
2316 step allocations.
2317
2318 -V, --version
2319 Display version information and exit.
2320
2321 -W, --wait=<seconds>
2322 Specify how long to wait after the first task terminates before
2323 terminating all remaining tasks. A value of 0 indicates an un‐
2324 limited wait (a warning will be issued after 60 seconds). The
2325 default value is set by the WaitTime parameter in the slurm con‐
2326 figuration file (see slurm.conf(5)). This option can be useful
2327 to ensure that a job is terminated in a timely fashion in the
2328 event that one or more tasks terminate prematurely. Note: The
2329 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2330 to terminate the job immediately if a task exits with a non-zero
2331 exit code. This option applies to job allocations.
2332
2333 --wckey=<wckey>
2334 Specify wckey to be used with job. If TrackWCKey=no (default)
2335 in the slurm.conf this value is ignored. This option applies to
2336 job allocations.
2337
2338 --x11[={all|first|last}]
2339 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2340 the allocation. This option is only enabled if Slurm was com‐
2341 piled with X11 support and PrologFlags=x11 is defined in the
2342 slurm.conf. Default is "all".
2343
2344 srun will submit the job request to the slurm job controller, then ini‐
2345 tiate all processes on the remote nodes. If the request cannot be met
2346 immediately, srun will block until the resources are free to run the
2347 job. If the -I (--immediate) option is specified srun will terminate if
2348 resources are not immediately available.
2349
2350 When initiating remote processes srun will propagate the current work‐
2351 ing directory, unless --chdir=<path> is specified, in which case path
2352 will become the working directory for the remote processes.
2353
2354 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2355 cated to the job. When specifying only the number of processes to run
2356 with -n, a default of one CPU per process is allocated. By specifying
2357 the number of CPUs required per task (-c), more than one CPU may be al‐
2358 located per process. If the number of nodes is specified with -N, srun
2359 will attempt to allocate at least the number of nodes specified.
2360
2361 Combinations of the above three options may be used to change how pro‐
2362 cesses are distributed across nodes and cpus. For instance, by specify‐
2363 ing both the number of processes and number of nodes on which to run,
2364 the number of processes per node is implied. However, if the number of
2365 CPUs per process is more important, then the number of processes (-n) and
2366 the number of CPUs per process (-c) should be specified.
2367
2368 srun will refuse to allocate more than one process per CPU unless
2369 --overcommit (-O) is also specified.
2370
2371 srun will attempt to meet the above specifications "at a minimum." That
2372 is, if 16 nodes are requested for 32 processes, and some nodes do not
2373 have 2 CPUs, the allocation of nodes will be increased in order to meet
2374 the demand for CPUs. In other words, a minimum of 16 nodes are being
2375 requested. However, if 16 nodes are requested for 15 processes, srun
2376 will consider this an error, as 15 processes cannot run across 16
2377 nodes.
2378
2379
2380 IO Redirection
2381
2382 By default, stdout and stderr will be redirected from all tasks to the
2383 stdout and stderr of srun, and stdin will be redirected from the stan‐
2384 dard input of srun to all remote tasks. If stdin is only to be read by
2385 a subset of the spawned tasks, specifying a file to read from rather
2386 than forwarding stdin from the srun command may be preferable as it
2387 avoids moving and storing data that will never be read.
2388
2389 For OS X, the poll() function does not support stdin, so input from a
2390 terminal is not possible.
2391
2392 This behavior may be changed with the --output, --error, and --input
2393 (-o, -e, -i) options. Valid format specifications for these options are
2394
2395
2396 all stdout and stderr are redirected from all tasks to srun. stdin is
2397 broadcast to all remote tasks. (This is the default behav‐
2398 ior)
2399
2400 none stdout and stderr are not received from any task. stdin is
2401 not sent to any task (stdin is closed).
2402
2403 taskid stdout and/or stderr are redirected from only the task with
2404 relative id equal to taskid, where 0 <= taskid < ntasks,
2405 where ntasks is the total number of tasks in the current job
2406 step. stdin is redirected from the stdin of srun to this
2407 same task. This file will be written on the node executing
2408 the task.
2409
2410 filename srun will redirect stdout and/or stderr to the named file
2411 from all tasks. stdin will be redirected from the named file
2412 and broadcast to all tasks in the job. filename refers to a
2413 path on the host that runs srun. Depending on the cluster's
2414 file system layout, this may result in the output appearing
2415 in different places according to whether the job is run in
2416 batch mode.
2417
2418 filename pattern
2419 srun allows for a filename pattern to be used to generate the
2420 named IO file described above. The following list of format
2421 specifiers may be used in the format string to generate a
2422 filename that will be unique to a given jobid, stepid, node,
2423 or task. In each case, the appropriate number of files are
2424 opened and associated with the corresponding tasks. Note that
2425 any format string containing %t, %n, and/or %N will be writ‐
2426 ten on the node executing the task rather than the node where
2427 srun executes. These format specifiers are not supported on a
2428 BGQ system.
2429
2430 \\ Do not process any of the replacement symbols.
2431
2432 %% The character "%".
2433
2434 %A Job array's master job allocation number.
2435
2436 %a Job array ID (index) number.
2437
2438 %J jobid.stepid of the running job. (e.g. "128.0")
2439
2440 %j jobid of the running job.
2441
2442 %s stepid of the running job.
2443
2444 %N short hostname. This will create a separate IO file
2445 per node.
2446
2447 %n Node identifier relative to current job (e.g. "0" is
2448 the first node of the running job) This will create a
2449 separate IO file per node.
2450
2451 %t task identifier (rank) relative to current job. This
2452 will create a separate IO file per task.
2453
2454 %u User name.
2455
2456 %x Job name.
2457
2458 A number placed between the percent character and format
2459 specifier may be used to zero-pad the result in the IO file‐
2460 name. This number is ignored if the format specifier corre‐
2461 sponds to non-numeric data (%N for example).
2462
2463 Some examples of how the format string may be used for a 4
2464 task job step with a Job ID of 128 and step id of 0 are in‐
2465 cluded below:
2466
2467
2468 job%J.out job128.0.out
2469
2470 job%4j.out job0128.out
2471
2472 job%j-%2t.out job128-00.out, job128-01.out, ...
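
       As a usage sketch building on the table above (hostname is simply a
       convenient command), a four-task step could write one zero-padded
       file per task; because the pattern contains %t, each file is written
       on the node executing the corresponding task:

           $ srun -n4 --output=job%j-%2t.out hostname

       For a job ID of 128 this would produce job128-00.out through
       job128-03.out.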
2473
2474 PERFORMANCE
2475 Executing srun sends a remote procedure call to slurmctld. If enough
2476 calls from srun or other Slurm client commands that send remote proce‐
2477 dure calls to the slurmctld daemon come in at once, the performance of
2478 the slurmctld daemon can degrade, possibly resulting in a denial of
2479 service.
2480
2481 Do not run srun or other Slurm client commands that send remote proce‐
2482 dure calls to slurmctld from loops in shell scripts or other programs.
2483 Ensure that programs limit calls to srun to the minimum necessary for
2484 the information you are trying to gather.
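
       As an illustrative sketch only (the file names and the "process"
       command are placeholders), prefer a single step that handles many
       work items over a shell loop that invokes srun once per item:

           # generates one RPC to slurmctld per iteration
           $ for f in *.dat; do srun -n1 process "$f"; done

           # a single srun; the loop runs inside one job step
           $ srun -n1 bash -c 'for f in *.dat; do process "$f"; done'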
2485
2486
2487 INPUT ENVIRONMENT VARIABLES
2488 Upon startup, srun will read and handle the options set in the follow‐
2489 ing environment variables. The majority of these variables are set the
2490 same way the options are set, as defined above. For flag options that
2491 are defined to expect no argument, the option can be enabled by setting
2492 the environment variable without a value (an empty or NULL string), to
2493 the string 'yes', or to a non-zero number. Any other value for the
2494 environment variable will result in the option not being set. There are
2495 a couple of exceptions to these rules, noted below.
2496 NOTE: Command line options always override environment variable set‐
2497 tings.
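
       A brief sketch of these rules (the variables shown are examples from
       the list below):

           $ export SLURM_OVERLAP=yes     # same effect as passing --overlap
           $ export SLURM_NTASKS=8
           $ srun -n2 hostname            # -n2 on the command line overrides SLURM_NTASKS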
2498
2499
2500 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2501 MVAPICH2) and controls the fanout of data commu‐
2502 nications. The srun command sends messages to ap‐
2503 plication programs (via the PMI library) and
2504 those applications may be called upon to forward
2505 that data to up to this number of additional
2506 tasks. Higher values offload work from the srun
2507 command to the applications and likely increase
2508 the vulnerability to failures. The default value
2509 is 32.
2510
2511 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2512 MVAPICH2) and controls the fanout of data commu‐
2513 nications. The srun command sends messages to
2514 application programs (via the PMI library) and
2515 those applications may be called upon to forward
2516 that data to additional tasks. By default, srun
2517 sends one message per host and one task on that
2518 host forwards the data to other tasks on that
2519 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2520 defined, the user task may be required to forward
2521 the data to tasks on other hosts. Setting
2522 PMI_FANOUT_OFF_HOST may increase performance.
2523 Since more work is performed by the PMI library
2524 loaded by the user application, failures also can
2525 be more common and more difficult to diagnose.
2526 Disable or enable it by setting the variable to 0 or 1.
2527
2528 PMI_TIME This is used exclusively with PMI (MPICH2 and
2529 MVAPICH2) and controls how much the communica‐
2530 tions from the tasks to the srun are spread out
2531 in time in order to avoid overwhelming the srun
2532 command with work. The default value is 500 (mi‐
2533 croseconds) per task. On relatively slow proces‐
2534 sors or systems with very large processor counts
2535 (and large PMI data sets), higher values may be
2536 required.
2537
2538 SLURM_ACCOUNT Same as -A, --account
2539
2540 SLURM_ACCTG_FREQ Same as --acctg-freq
2541
2542 SLURM_BCAST Same as --bcast
2543
2544 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2545
2546 SLURM_BURST_BUFFER Same as --bb
2547
2548 SLURM_CLUSTERS Same as -M, --clusters
2549
2550 SLURM_COMPRESS Same as --compress
2551
2552 SLURM_CONF The location of the Slurm configuration file.
2553
2554 SLURM_CONSTRAINT Same as -C, --constraint
2555
2556 SLURM_CORE_SPEC Same as --core-spec
2557
2558 SLURM_CPU_BIND Same as --cpu-bind
2559
2560 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2561
2562 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2563
2564 SRUN_CPUS_PER_TASK Same as -c, --cpus-per-task
2565
2566 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2567 disable or enable the option.
2568
2569 SLURM_DELAY_BOOT Same as --delay-boot
2570
2571 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2572
2573 SLURM_DISABLE_STATUS Same as -X, --disable-status
2574
2575 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2576 tion=plane, without =<size>, is set.
2577
2578 SLURM_DISTRIBUTION Same as -m, --distribution
2579
2580 SLURM_EPILOG Same as --epilog
2581
2582 SLURM_EXACT Same as --exact
2583
2584 SLURM_EXCLUSIVE Same as --exclusive
2585
2586 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2587 error occurs (e.g. invalid options). This can be
2588 used by a script to distinguish application exit
2589 codes from various Slurm error conditions. Also
2590 see SLURM_EXIT_IMMEDIATE.
2591
2592 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2593 mediate option is used and resources are not cur‐
2594 rently available. This can be used by a script
2595 to distinguish application exit codes from vari‐
2596 ous Slurm error conditions. Also see
2597 SLURM_EXIT_ERROR.
2598
2599 SLURM_EXPORT_ENV Same as --export
2600
2601 SLURM_GPU_BIND Same as --gpu-bind
2602
2603 SLURM_GPU_FREQ Same as --gpu-freq
2604
2605 SLURM_GPUS Same as -G, --gpus
2606
2607 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2608
2609 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2610
2611 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2612
2613 SLURM_GRES_FLAGS Same as --gres-flags
2614
2615 SLURM_HINT Same as --hint
2616
2617 SLURM_IMMEDIATE Same as -I, --immediate
2618
2619 SLURM_JOB_ID Same as --jobid
2620
2621 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2622 allocation, in which case it is ignored to avoid
2623 using the batch job's name as the name of each
2624 job step.
2625
2626 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2627 the job’s resource allocation.
2628
2629 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2630 or 1 to disable or enable the option.
2631
2632 SLURM_LABELIO Same as -l, --label
2633
2634 SLURM_MEM_BIND Same as --mem-bind
2635
2636 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2637
2638 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2639
2640 SLURM_MEM_PER_NODE Same as --mem
2641
2642 SLURM_MPI_TYPE Same as --mpi
2643
2644 SLURM_NETWORK Same as --network
2645
2646 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2647 job’s resource allocation. See
2648 SLURM_JOB_NUM_NODES. Included for backwards com‐
2649 patibility.
2650
2651 SLURM_NO_KILL Same as -k, --no-kill
2652
2653 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2654 for backwards compatibility.
2655
2656 SLURM_NTASKS Same as -n, --ntasks
2657
2658 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2659
2660 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2661
2662 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2663
2664 SLURM_NTASKS_PER_SOCKET
2665 Same as --ntasks-per-socket
2666
2667 SLURM_OPEN_MODE Same as --open-mode
2668
2669 SLURM_OVERCOMMIT Same as -O, --overcommit
2670
2671 SLURM_OVERLAP Same as --overlap
2672
2673 SLURM_PARTITION Same as -p, --partition
2674
2675 SLURM_PMI_KVS_NO_DUP_KEYS
2676 If set, then PMI key-pairs will contain no dupli‐
2677 cate keys. MPI can use this variable to inform
2678 the PMI library that it will not use duplicate
2679 keys so PMI can skip the check for duplicate
2680 keys. This is the case for MPICH2 and reduces
2681 overhead in testing for duplicates for improved
2682 performance.
2683
2684 SLURM_POWER Same as --power
2685
2686 SLURM_PROFILE Same as --profile
2687
2688 SLURM_PROLOG Same as --prolog
2689
2690 SLURM_QOS Same as --qos
2691
2692 SLURM_REMOTE_CWD Same as -D, --chdir=
2693
2694 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2695 maximum count of switches desired for the job al‐
2696 location and optionally the maximum time to wait
2697 for that number of switches. See --switches
2698
2699 SLURM_RESERVATION Same as --reservation
2700
2701 SLURM_RESV_PORTS Same as --resv-ports
2702
2703 SLURM_SEND_LIBS Same as --send-libs
2704
2705 SLURM_SIGNAL Same as --signal
2706
2707 SLURM_SPREAD_JOB Same as --spread-job
2708
2709 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2710 If set and non-zero, successive task exit mes‐
2711 sages with the same exit code will be printed
2712 only once.
2713
2714 SLURM_STDERRMODE Same as -e, --error
2715
2716 SLURM_STDINMODE Same as -i, --input
2717
2718 SLURM_STDOUTMODE Same as -o, --output
2719
2720 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2721 job allocations). Also see SLURM_GRES
2722
2723 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2724 If set, only the specified node will log when the
2725 job or step is killed by a signal.
2726
2727 SLURM_TASK_EPILOG Same as --task-epilog
2728
2729 SLURM_TASK_PROLOG Same as --task-prolog
2730
2731 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2732 ecutable program along with user execute permis‐
2733 sion on the node where srun was called before at‐
2734 tempting to launch it on nodes in the step.
2735
2736 SLURM_THREAD_SPEC Same as --thread-spec
2737
2738 SLURM_THREADS Same as -T, --threads
2739
2740 SLURM_THREADS_PER_CORE
2741 Same as --threads-per-core
2742
2743 SLURM_TIMELIMIT Same as -t, --time
2744
2745 SLURM_UMASK If defined, Slurm will use the defined umask to
2746 set permissions when creating the output/error
2747 files for the job.
2748
2749 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2750
2751 SLURM_USE_MIN_NODES Same as --use-min-nodes
2752
2753 SLURM_WAIT Same as -W, --wait
2754
2755 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2756 --switches
2757
2758 SLURM_WCKEY Same as -W, --wckey
2759
2760 SLURM_WORKING_DIR       Same as -D, --chdir
2761
2762 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2763 to disable or enable the option.
2764
2765 SRUN_CONTAINER Same as --container.
2766
2767 SRUN_EXPORT_ENV Same as --export, and will override any setting
2768 for SLURM_EXPORT_ENV.
2769
2770 OUTPUT ENVIRONMENT VARIABLES
2771 srun will set some environment variables in the environment of the exe‐
2772 cuting tasks on the remote compute nodes. These environment variables
2773 are:
2774
2775
2776 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2777 ment variables are set separately for each compo‐
2778 nent.
2779
2780 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2781 ing.
2782
2783 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2784 IDs or masks for this node, CPU_ID = Board_ID x
2785 threads_per_board + Socket_ID x
2786 threads_per_socket + Core_ID x threads_per_core +
2787 Thread_ID).
2788
2789 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2790
2791 SLURM_CPU_BIND_VERBOSE
2792 --cpu-bind verbosity (quiet,verbose).
2793
2794 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2795 the srun command as a numerical frequency in
2796 kilohertz, or a coded value for a request of low,
2797 medium, highm1 or high for the frequency. See the
2798 description of the --cpu-freq option or the
2799 SLURM_CPU_FREQ_REQ input environment variable.
2800
2801 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2802 node. NOTE: The select/linear plugin allocates
2803 entire nodes to jobs, so the value indicates the
2804 total count of CPUs on the node. For the se‐
2805 lect/cons_res and select/cons_tres plugins, this number
2806 indicates the number of CPUs on this node allo‐
2807 cated to the step.
2808
2809 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2810 the --cpus-per-task option is specified.
2811
2812 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2813 distribution with -m, --distribution.
2814
2815 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2816 node.
2817
2818 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2819 gin and comma separated. It is read internally
2820 by pmi if Slurm was built with pmi support. Leav‐
2821 ing the variable set may cause problems when us‐
2822 ing external packages from within the job (Abaqus
2823 and Ansys have been known to have problems when
2824 it is set - consult the appropriate documentation
2825 for 3rd party software).
2826
2827 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2828
2829 SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2830
2831 SLURM_JOB_CPUS_PER_NODE
2832 Count of CPUs available to the job on the nodes
2833 in the allocation, using the format
2834 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2835 ber_of_nodes)] ...]. For example:
2836 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2837 that on the first and second nodes (as listed by
2838 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2839 while the third node has 36 CPUs. NOTE: The se‐
2840 lect/linear plugin allocates entire nodes to
2841 jobs, so the value indicates the total count of
2842 CPUs on allocated nodes. The select/cons_res and
2843 select/cons_tres plugins allocate individual CPUs
2844 to jobs, so this number indicates the number of
2845 CPUs allocated to the job.
2846
2847 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2848
2849 SLURM_JOB_GPUS The global GPU IDs of the GPUs allocated to this
2850 job. The GPU IDs are not relative to any device
2851 cgroup, even if devices are constrained with
2852 task/cgroup. Only set in batch and interactive
2853 jobs.
2854
2855 SLURM_JOB_ID Job id of the executing job.
2856
2857 SLURM_JOB_NAME Set to the value of the --job-name option or the
2858 command name when srun is used to create a new
2859 job allocation. Not set when srun is used only to
2860 create a job step (i.e. within an existing job
2861 allocation).
2862
2863 SLURM_JOB_NODELIST List of nodes allocated to the job.
2864
2865 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2866 cation.
2867
2868 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2869 ning.
2870
2871 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2872
2873 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2874 tion, if any.
2875
2876 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2877 Included for backwards compatibility.
2878
2879 SLURM_LAUNCH_NODE_IPADDR
2880 IP address of the node from which the task launch
2881 was initiated (where the srun command ran from).
2882
2883 SLURM_LOCALID Node local task ID for the process within a job.
2884
2885 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2886 masks for this node>).
2887
2888 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2889
2890 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2891 nodes).
2892
2893 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2894
2895 SLURM_MEM_BIND_VERBOSE
2896 --mem-bind verbosity (quiet,verbose).
2897
2898 SLURM_NODE_ALIASES Sets of node name, communication address and
2899 hostname for nodes allocated to the job from the
2900 cloud. Each element in the set is colon separated
2901 and each set is comma separated. For example:
2902 SLURM_NODE_ALIASES=
2903 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2904
2905 SLURM_NODEID The relative node ID of the current node.
2906
2907 SLURM_NPROCS Total number of processes in the current job or
2908 job step. See SLURM_NTASKS. Included for back‐
2909 wards compatibility.
2910
2911 SLURM_NTASKS Total number of processes in the current job or
2912 job step.
2913
2914 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2915
2916 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2917 of job submission. This value is propagated to
2918 the spawned processes.
2919
2920 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2921 rent process.
2922
2923 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2924
2925 SLURM_SRUN_COMM_PORT srun communication port.
2926
2927 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
2928 specified.
2929
2930 SLURM_SHARDS_ON_NODE Number of GPU Shards available to the step on
2931 this node.
2932
2933 SLURM_STEP_GPUS The global GPU IDs of the GPUs allocated to this
2934 step (excluding batch and interactive steps). The
2935 GPU IDs are not relative to any device cgroup,
2936 even if devices are constrained with task/cgroup.
2937
2938 SLURM_STEP_ID The step ID of the current job.
2939
2940 SLURM_STEP_LAUNCHER_PORT
2941 Step launcher port.
2942
2943 SLURM_STEP_NODELIST List of nodes allocated to the step.
2944
2945 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2946
2947 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2948 erogeneous job step.
2949
2950 SLURM_STEP_TASKS_PER_NODE
2951 Number of processes per node within the step.
2952
2953 SLURM_STEPID The step ID of the current job. See
2954 SLURM_STEP_ID. Included for backwards compatibil‐
2955 ity.
2956
2957 SLURM_SUBMIT_DIR The directory from which the allocation was in‐
2958 voked.
2959
2960 SLURM_SUBMIT_HOST The hostname of the computer from which the allo‐
2961 cation was invoked.
2962
2963 SLURM_TASK_PID The process ID of the task being started.
2964
2965 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2966 Values are comma separated and in the same order
2967 as SLURM_JOB_NODELIST. If two or more consecu‐
2968 tive nodes are to have the same task count, that
2969 count is followed by "(x#)" where "#" is the rep‐
2970 etition count. For example,
2971 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2972 first three nodes will each execute two tasks and
2973 the fourth node will execute one task.
2974
2975 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2976 ogy/tree plugin configured. The value will be
2977 set to the names of the network switches which may be
2978 involved in the job's communications from the
2979 system's top level switch down to the leaf switch
2980 and ending with node name. A period is used to
2981 separate each hardware component name.
2982
2983 SLURM_TOPOLOGY_ADDR_PATTERN
2984 This is set only if the system has the topol‐
2985 ogy/tree plugin configured. The value will be
2986 set to the component types listed in SLURM_TOPOL‐
2987 OGY_ADDR. Each component will be identified as
2988 either "switch" or "node". A period is used to
2989 separate each hardware component type.
2990
2991 SLURM_UMASK The umask in effect when the job was submitted.
2992
2993 SLURMD_NODENAME Name of the node running the task. In the case of
2994 a parallel job executing on multiple compute
2995 nodes, the various tasks will have this environ‐
2996 ment variable set to different values on each
2997 compute node.
2998
2999 SRUN_DEBUG Set to the logging level of the srun command.
3000 Default value is 3 (info level). The value is
3001 incremented or decremented based upon the --ver‐
3002 bose and --quiet options.
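
       As an illustrative sketch (the script name is arbitrary), a task
       launched by srun can read several of these variables to label its
       own output:

           $ cat where.sh
           #!/bin/sh
           echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME (node $SLURM_NODEID)"

           $ srun -N2 -n4 where.sh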
3003
3004 SIGNALS AND ESCAPE SEQUENCES
3005 Signals sent to the srun command are automatically forwarded to the
3006 tasks it is controlling with a few exceptions. The escape sequence
3007 <control-c> will report the state of all tasks associated with the srun
3008 command. If <control-c> is entered twice within one second, then the
3009 associated SIGINT signal will be sent to all tasks and a termination
3010 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
3011 spawned tasks. If a third <control-c> is received, the srun program
3012 will be terminated without waiting for remote tasks to exit or their
3013 I/O to complete.
3014
3015 The escape sequence <control-z> is presently ignored.
3016
3017
3018 MPI SUPPORT
3019 MPI use depends upon the type of MPI being used. There are three fun‐
3020 damentally different modes of operation used by these various MPI im‐
3021 plementations.
3022
3023 1. Slurm directly launches the tasks and performs initialization of
3024 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3025 a.out".
3026
3027 2. Slurm creates a resource allocation for the job and then mpirun
3028 launches tasks using Slurm's infrastructure (OpenMPI).
3029
3030 3. Slurm creates a resource allocation for the job and then mpirun
3031 launches tasks using some mechanism other than Slurm, such as SSH or
3032 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3033 trol. Slurm's epilog should be configured to purge these tasks when the
3034 job's allocation is relinquished, or the use of pam_slurm_adopt is
3035 highly recommended.
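
       As a hedged sketch of the second mode (assuming an Open MPI
       installation built with Slurm support, with "a.out" as a placeholder
       application), mpirun runs inside an allocation created by salloc and
       relies on Slurm's infrastructure to launch the tasks:

           $ salloc -N2 -n8 mpirun ./a.out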
3036
3037 See https://slurm.schedmd.com/mpi_guide.html for more information on
3038 use of these various MPI implementations with Slurm.
3039
3040
3041 MULTIPLE PROGRAM CONFIGURATION
3042 Comments in the configuration file must have a "#" in column one. The
3043 configuration file contains the following fields separated by white
3044 space:
3045
3046
3047 Task rank
3048 One or more task ranks to use this configuration. Multiple val‐
3049 ues may be comma separated. Ranges may be indicated with two
3050 numbers separated with a '-' with the smaller number first (e.g.
3051 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3052 ified, specify a rank of '*' as the last line of the file. If
3053 an attempt is made to initiate a task for which no executable
3054 program is defined, the following error message will be produced:
3055 "No executable program specified for this task".
3056
3057 Executable
3058 The name of the program to execute. May be a fully qualified
3059 pathname if desired.
3060
3061 Arguments
3062 Program arguments. The expression "%t" will be replaced with
3063 the task's number. The expression "%o" will be replaced with
3064 the task's offset within this range (e.g. a configured task rank
3065 value of "1-5" would have offset values of "0-4"). Single
3066 quotes may be used to avoid having the enclosed values inter‐
3067 preted. This field is optional. Any arguments for the program
3068 entered on the command line will be added to the arguments spec‐
3069 ified in the configuration file.
3070
3071 For example:
3072
3073 $ cat silly.conf
3074 ###################################################################
3075 # srun multiple program configuration file
3076 #
3077 # srun -n8 -l --multi-prog silly.conf
3078 ###################################################################
3079 4-6 hostname
3080 1,7 echo task:%t
3081 0,2-3 echo offset:%o
3082
3083 $ srun -n8 -l --multi-prog silly.conf
3084 0: offset:0
3085 1: task:1
3086 2: offset:1
3087 3: offset:2
3088 4: linux15.llnl.gov
3089 5: linux16.llnl.gov
3090 6: linux17.llnl.gov
3091 7: task:7
3092
3093
3094 EXAMPLES
3095 This simple example demonstrates the execution of the command hostname
3096 in eight tasks. At least eight processors will be allocated to the job
3097 (the same as the task count) on however many nodes are required to sat‐
3098 isfy the request. The output of each task will be preceded by its
3099 task number. (The machine "dev" in the example below has a total of
3100 two CPUs per node)
3101
3102 $ srun -n8 -l hostname
3103 0: dev0
3104 1: dev0
3105 2: dev1
3106 3: dev1
3107 4: dev2
3108 5: dev2
3109 6: dev3
3110 7: dev3
3111
3112
3113 The srun -r option is used within a job script to run two job steps on
3114 disjoint nodes in the following example. The script is run using allo‐
3115 cate mode instead of as a batch job in this case.
3116
3117 $ cat test.sh
3118 #!/bin/sh
3119 echo $SLURM_JOB_NODELIST
3120 srun -lN2 -r2 hostname
3121 srun -lN2 hostname
3122
3123 $ salloc -N4 test.sh
3124 dev[7-10]
3125 0: dev9
3126 1: dev10
3127 0: dev7
3128 1: dev8
3129
3130
3131 The following script runs two job steps in parallel within an allocated
3132 set of nodes.
3133
3134 $ cat test.sh
3135 #!/bin/bash
3136 srun -lN2 -n4 -r 2 sleep 60 &
3137 srun -lN2 -r 0 sleep 60 &
3138 sleep 1
3139 squeue
3140 squeue -s
3141 wait
3142
3143 $ salloc -N4 test.sh
3144 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3145 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3146
3147 STEPID PARTITION USER TIME NODELIST
3148 65641.0 batch grondo 0:01 dev[7-8]
3149 65641.1 batch grondo 0:01 dev[9-10]
3150
3151
3152 This example demonstrates how one executes a simple MPI job. We use
3153 srun to build a list of machines (nodes) to be used by mpirun in its
3154 required format. A sample command line and the script to be executed
3155 follow.
3156
3157 $ cat test.sh
3158 #!/bin/sh
3159 MACHINEFILE="nodes.$SLURM_JOB_ID"
3160
3161 # Generate Machinefile for mpi such that hosts are in the same
3162 # order as if run via srun
3163 #
3164 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3165
3166 # Run using generated Machine file:
3167 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3168
3169 rm $MACHINEFILE
3170
3171 $ salloc -N2 -n4 test.sh
3172
3173
3174 This simple example demonstrates the execution of different jobs on
3175 different nodes in the same srun. You can do this for any number of
3176 nodes or any number of jobs. The executables are placed on the nodes
3177 identified by the SLURM_NODEID env var, counting from 0 up to the
3178 number of nodes specified on the srun command line.
3179
3180 $ cat test.sh
3181 case $SLURM_NODEID in
3182 0) echo "I am running on "
3183 hostname ;;
3184 1) hostname
3185 echo "is where I am running" ;;
3186 esac
3187
3188 $ srun -N2 test.sh
3189 dev0
3190 is where I am running
3191 I am running on
3192 dev1
3193
3194
3195 This example demonstrates use of multi-core options to control layout
3196 of tasks. We request that four sockets per node and two cores per
3197 socket be dedicated to the job.
3198
3199 $ srun -N2 -B 4-4:2-2 a.out
3200
3201
3202 This example shows a script in which Slurm is used to provide resource
3203 management for a job by executing the various job steps as processors
3204 become available for their dedicated use.
3205
3206 $ cat my.script
3207 #!/bin/bash
3208 srun -n4 prog1 &
3209 srun -n3 prog2 &
3210 srun -n1 prog3 &
3211 srun -n1 prog4 &
3212 wait
3213
3214
3215 This example shows how to launch an application called "server" with
3216 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3217 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3218 1 GB of memory per task.
3219
3220 $ srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3221
3222
3223 COPYING
3224 Copyright (C) 2006-2007 The Regents of the University of California.
3225 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3226 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3227 Copyright (C) 2010-2022 SchedMD LLC.
3228
3229 This file is part of Slurm, a resource management program. For de‐
3230 tails, see <https://slurm.schedmd.com/>.
3231
3232 Slurm is free software; you can redistribute it and/or modify it under
3233 the terms of the GNU General Public License as published by the Free
3234 Software Foundation; either version 2 of the License, or (at your op‐
3235 tion) any later version.
3236
3237 Slurm is distributed in the hope that it will be useful, but WITHOUT
3238 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3239 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3240 for more details.
3241
3242
3243 SEE ALSO
3244 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3245 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3246
3247
3248
3249December 2022 Slurm Commands srun(1)