srun(1)                          Slurm Commands                          srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
11 executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
20 will first create a resource allocation in which to run the parallel
21 job.
22
23 The following document describes the influence of various options on
the allocation of CPUs to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
30 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
31 signal) of any task that exited with a signal.
32 The value 253 is reserved for out-of-memory errors.
33
34
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
The current working directory is the calling process's working
directory unless the --chdir argument is passed, which overrides
it.
47
48
OPTIONS
--accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
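
For example (a sketch only; a.out is a placeholder and it is
assumed the single-letter options may be concatenated), the
following binds each task to the nearest GPU and NIC and logs
the chosen bindings:

    srun -n4 --gpus-per-node=4 --accel-bind=gnv a.out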
65
66 -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command. This option ap‐
70 plies to job allocations.
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks. This option ap‐
108 plies to job allocations.
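
For example, the following samples task accounting every 15
seconds and energy data every minute (the intervals and the
executable ./my_app are illustrative, and energy sampling
assumes an acct_gather_energy plugin is configured):

    srun --acctg-freq=task=15,energy=60 -n8 ./my_app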
109
110 --bb=<spec>
111 Burst buffer specification. The form of the specification is
112 system dependent. Also see --bbf. This option applies to job
113 allocations. When the --bb option is used, Slurm parses this
114 option and creates a temporary burst buffer script file that is
115 used internally by the burst buffer plugins. See Slurm's burst
116 buffer guide for more information and examples:
117 https://slurm.schedmd.com/burst_buffer.html
118
119 --bbf=<file_name>
120 Path of file containing burst buffer specification. The form of
121 the specification is system dependent. Also see --bb. This op‐
122 tion applies to job allocations. See Slurm's burst buffer guide
123 for more information and examples:
124 https://slurm.schedmd.com/burst_buffer.html
125
126 --bcast[=<dest_path>]
127 Copy executable file to allocated compute nodes. If a file name
128 is specified, copy the executable to the specified destination
129 file path. If the path specified ends with '/' it is treated as
130 a target directory, and the destination file name will be
131 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
132 specified and the slurm.conf BcastParameters DestDir is config‐
133 ured then it is used, and the filename follows the above pat‐
134 tern. If none of the previous is specified, then --chdir is
135 used, and the filename follows the above pattern too. For exam‐
136 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
137 "a.out" from your current directory to the file "/tmp/mine" on
138 each of the three allocated compute nodes and execute that file.
139 This option applies to step allocations.
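
As a further illustration, a trailing '/' makes the destination
a directory, so the following (paths are placeholders) copies
a.out to /tmp/ on each of two nodes as
slurm_bcast_<job_id>.<step_id>_<nodename> and then runs it:

    srun --bcast=/tmp/ -N2 a.out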
140
141 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142 Comma-separated list of absolute directory paths to be excluded
143 when autodetecting and broadcasting executable shared object de‐
144 pendencies through --bcast. If the keyword "NONE" is configured,
145 no directory paths will be excluded. The default value is that
146 of slurm.conf BcastExclude and this option overrides it. See
147 also --bcast and --send-libs.
148
149 -b, --begin=<time>
150 Defer initiation of this job until the specified time. It ac‐
151 cepts times of the form HH:MM:SS to run a job at a specific time
152 of day (seconds are optional). (If that time is already past,
153 the next day is assumed.) You may also specify midnight, noon,
154 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
155 suffixed with AM or PM for running in the morning or the
156 evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY, MM/DD/YY or YYYY-MM-DD.
158 Combine date and time using the following format
159 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
160 count time-units, where the time-units can be seconds (default),
161 minutes, hours, days, or weeks and you can tell Slurm to run the
162 job today with the keyword today and to run the job tomorrow
163 with the keyword tomorrow. The value may be changed after job
164 submission using the scontrol command. For example:
165
166 --begin=16:00
167 --begin=now+1hour
168 --begin=now+60 (seconds by default)
169 --begin=2010-01-20T12:34:00
170
171
172 Notes on date/time specifications:
173 - Although the 'seconds' field of the HH:MM:SS time specifica‐
174 tion is allowed by the code, note that the poll time of the
175 Slurm scheduler is not precise enough to guarantee dispatch of
176 the job on the exact second. The job will be eligible to start
177 on the next poll following the specified time. The exact poll
178 interval depends on the Slurm scheduler (e.g., 60 seconds with
179 the default sched/builtin).
180 - If no time (HH:MM:SS) is specified, the default is
181 (00:00:00).
182 - If a date is specified without a year (e.g., MM/DD) then the
183 current year is assumed, unless the combination of MM/DD and
184 HH:MM:SS has already passed for that year, in which case the
185 next year is used.
186 This option applies to job allocations.
187
188 -D, --chdir=<path>
189 Have the remote processes do a chdir to path before beginning
190 execution. The default is to chdir to the current working direc‐
191 tory of the srun process. The path can be specified as full path
192 or relative path to the directory where the command is executed.
193 This option applies to job allocations.
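
For example, assuming a directory /scratch/myproject exists on
every allocated node (the path and executable are placeholders),
the tasks can be started in that directory:

    srun --chdir=/scratch/myproject -n4 ./analyze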
194
195 --cluster-constraint=<list>
196 Specifies features that a federated cluster must have to have a
197 sibling job submitted to it. Slurm will attempt to submit a sib‐
198 ling job to a cluster if it has at least one of the specified
199 features.
200
201 -M, --clusters=<string>
202 Clusters to issue commands to. Multiple cluster names may be
203 comma separated. The job will be submitted to the one cluster
204 providing the earliest expected job initiation time. The default
205 value is the current cluster. A value of 'all' will query to run
206 on all clusters. Note the --export option to control environ‐
207 ment variables exported between clusters. This option applies
208 only to job allocations. Note that the SlurmDBD must be up for
209 this option to work properly.
210
211 --comment=<string>
212 An arbitrary comment. This option applies to job allocations.
213
214 --compress[=type]
215 Compress file before sending it to compute hosts. The optional
216 argument specifies the data compression library to be used. The
217 default is BcastParameters Compression= if set or "lz4" other‐
218 wise. Supported values are "lz4". Some compression libraries
219 may be unavailable on some systems. For use with the --bcast
220 option. This option applies to step allocations.
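
A typical combination with --bcast (the path and a.out are
placeholders) might look like:

    srun --bcast=/tmp/mine --compress=lz4 -N4 a.out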
221
222 -C, --constraint=<list>
223 Nodes can have features assigned to them by the Slurm adminis‐
224 trator. Users can specify which of these features are required
225 by their job using the constraint option. If you are looking for
'soft' constraints please see --prefer for more information.
227 Only nodes having features matching the job constraints will be
228 used to satisfy the request. Multiple constraints may be speci‐
229 fied with AND, OR, matching OR, resource counts, etc. (some op‐
230 erators are not supported on all system types).
231
232 NOTE: If features that are part of the node_features/helpers
233 plugin are requested, then only the Single Name and AND options
234 are supported.
235
236 Supported --constraint options include:
237
238 Single Name
239 Only nodes which have the specified feature will be used.
240 For example, --constraint="intel"
241
242 Node Count
243 A request can specify the number of nodes needed with
244 some feature by appending an asterisk and count after the
245 feature name. For example, --nodes=16 --con‐
246 straint="graphics*4 ..." indicates that the job requires
247 16 nodes and that at least four of those nodes must have
248 the feature "graphics."
249
AND Only nodes with all of the specified features will be
used. The ampersand is used as the AND operator. For
example, --constraint="intel&gpu"
253
OR Only nodes with at least one of the specified features
will be used. The vertical bar is used as the OR operator.
For example, --constraint="intel|amd"
257
258 Matching OR
259 If only one of a set of possible options should be used
260 for all allocated nodes, then use the OR operator and en‐
261 close the options within square brackets. For example,
262 --constraint="[rack1|rack2|rack3|rack4]" might be used to
263 specify that all nodes must be allocated on a single rack
264 of the cluster, but any of those four racks can be used.
265
266 Multiple Counts
267 Specific counts of multiple resources may be specified by
268 using the AND operator and enclosing the options within
269 square brackets. For example, --con‐
270 straint="[rack1*2&rack2*4]" might be used to specify that
271 two nodes must be allocated from nodes with the feature
272 of "rack1" and four nodes must be allocated from nodes
273 with the feature "rack2".
274
275 NOTE: This construct does not support multiple Intel KNL
276 NUMA or MCDRAM modes. For example, while --con‐
277 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
278 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
279 Specification of multiple KNL modes requires the use of a
280 heterogeneous job.
281
282 NOTE: Multiple Counts can cause jobs to be allocated with
283 a non-optimal network layout.
284
285 Brackets
286 Brackets can be used to indicate that you are looking for
287 a set of nodes with the different requirements contained
288 within the brackets. For example, --con‐
289 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
290 node with either the "rack1" or "rack2" features and two
291 nodes with the "rack3" feature. The same request without
292 the brackets will try to find a single node that meets
293 those requirements.
294
295 NOTE: Brackets are only reserved for Multiple Counts and
296 Matching OR syntax. AND operators require a count for
297 each feature inside square brackets (i.e.
298 "[quad*2&hemi*1]"). Slurm will only allow a single set of
299 bracketed constraints per job.
300
Parentheses
Parentheses can be used to group like node features together.
For example, --constraint="[(knl&snc4&flat)*4&haswell*1]"
might be used to specify that four nodes with the features
"knl", "snc4" and "flat" plus one node with the feature
"haswell" are required. All options within parentheses should
be grouped with the AND (e.g. "&") operator.
309
310 WARNING: When srun is executed from within salloc or sbatch, the
311 constraint value can only contain a single feature name. None of
312 the other operators are currently supported for job steps.
313 This option applies to job and step allocations.
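
Putting the operators together, a job allocation (not a step
within salloc or sbatch, per the warning above) that needs 8
nodes carrying both of the hypothetical features "intel" and
"gpu" could be requested as:

    srun -N8 -n64 --constraint="intel&gpu" ./app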
314
315 --container=<path_to_container>
316 Absolute path to OCI container bundle.
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326 -S, --core-spec=<num>
327 Count of specialized cores per node reserved by the job for sys‐
328 tem operations and not used by the application. The application
329 will not use these cores, but will be charged for their alloca‐
330 tion. Default value is dependent upon the node's configured
331 CoreSpecCount value. If a value of zero is designated and the
332 Slurm configuration option AllowSpecResourcesUsage is enabled,
333 the job will be allowed to override CoreSpecCount and use the
334 specialized resources on nodes it is allocated. This option can
335 not be used with the --thread-spec option. This option applies
336 to job allocations.
337
338 NOTE: This option may implicitly impact the number of tasks if
339 -n was not specified.
340
341 NOTE: Explicitly setting a job's specialized core value implic‐
342 itly sets its --exclusive option, reserving entire nodes for the
343 job.
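
For example (a sketch only; ./app is a placeholder and the node
is assumed to have enough cores), the following reserves two
specialized cores per node and, as noted above, implicitly makes
the job exclusive:

    srun -N1 -n6 --core-spec=2 ./app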
344
345 --cores-per-socket=<cores>
346 Restrict node selection to nodes with at least the specified
number of cores per socket. See additional information under the
-B option when the task/affinity plugin is enabled. This option
349 applies to job allocations.
350
351 --cpu-bind=[{quiet|verbose},]<type>
352 Bind tasks to CPUs. Used only when the task/affinity plugin is
353 enabled. NOTE: To have Slurm always report on the selected CPU
354 binding for all commands executed in a shell, you can enable
355 verbose mode by setting the SLURM_CPU_BIND environment variable
356 value to "verbose".
357
358 The following informational environment variables are set when
359 --cpu-bind is in use:
360
361 SLURM_CPU_BIND_VERBOSE
362 SLURM_CPU_BIND_TYPE
363 SLURM_CPU_BIND_LIST
364
365 See the ENVIRONMENT VARIABLES section for a more detailed de‐
366 scription of the individual SLURM_CPU_BIND variables. These
variables are available only if the task/affinity plugin is
configured.
369
370 When using --cpus-per-task to run multithreaded tasks, be aware
371 that CPU binding is inherited from the parent of the process.
372 This means that the multithreaded task should either specify or
373 clear the CPU binding itself to avoid having all threads of the
374 multithreaded task use the same mask/CPU as the parent. Alter‐
375 natively, fat masks (masks which specify more than one allowed
376 CPU) could be used for the tasks in order to provide multiple
377 CPUs for the multithreaded tasks.
378
379 Note that a job step can be allocated different numbers of CPUs
380 on each node or be allocated CPUs not starting at location zero.
381 Therefore one of the options which automatically generate the
382 task binding is recommended. Explicitly specified masks or
383 bindings are only honored when the job step has been allocated
384 every available CPU on the node.
385
386 Binding a task to a NUMA locality domain means to bind the task
387 to the set of CPUs that belong to the NUMA locality domain or
388 "NUMA node". If NUMA locality domain options are used on sys‐
389 tems with no NUMA support, then each socket is considered a lo‐
390 cality domain.
391
392 If the --cpu-bind option is not used, the default binding mode
393 will depend upon Slurm's configuration and the step's resource
394 allocation. If all allocated nodes have the same configured
395 CpuBind mode, that will be used. Otherwise if the job's Parti‐
396 tion has a configured CpuBind mode, that will be used. Other‐
397 wise if Slurm has a configured TaskPluginParam value, that mode
398 will be used. Otherwise automatic binding will be performed as
399 described below.
400
401 Auto Binding
402 Applies only when task/affinity is enabled. If the job
403 step allocation includes an allocation with a number of
404 sockets, cores, or threads equal to the number of tasks
405 times cpus-per-task, then the tasks will by default be
406 bound to the appropriate resources (auto binding). Dis‐
407 able this mode of operation by explicitly setting
408 "--cpu-bind=none". Use TaskPluginParam=auto‐
409 bind=[threads|cores|sockets] to set a default cpu binding
410 in case "auto binding" doesn't find a match.
411
412 Supported options include:
413
414 q[uiet]
415 Quietly bind before task runs (default)
416
417 v[erbose]
418 Verbosely report binding before task runs
419
420 no[ne] Do not bind tasks to CPUs (default unless auto
421 binding is applied)
422
423 rank Automatically bind by task rank. The lowest num‐
424 bered task on each node is bound to socket (or
425 core or thread) zero, etc. Not supported unless
426 the entire node is allocated to the job.
427
428 map_cpu:<list>
429 Bind by setting CPU masks on tasks (or ranks) as
430 specified where <list> is
431 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... If
432 the number of tasks (or ranks) exceeds the number
433 of elements in this list, elements in the list
434 will be reused as needed starting from the begin‐
435 ning of the list. To simplify support for large
436 task counts, the lists may follow a map with an
437 asterisk and repetition count. For example
438 "map_cpu:0*4,3*4".
439
440 mask_cpu:<list>
441 Bind by setting CPU masks on tasks (or ranks) as
442 specified where <list> is
443 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
444 The mapping is specified for a node and identical
445 mapping is applied to the tasks on every node
446 (i.e. the lowest task ID on each node is mapped to
447 the first mask specified in the list, etc.). CPU
448 masks are always interpreted as hexadecimal values
449 but can be preceded with an optional '0x'. If the
450 number of tasks (or ranks) exceeds the number of
451 elements in this list, elements in the list will
452 be reused as needed starting from the beginning of
453 the list. To simplify support for large task
454 counts, the lists may follow a map with an aster‐
455 isk and repetition count. For example
456 "mask_cpu:0x0f*4,0xf0*4".
457
458 rank_ldom
459 Bind to a NUMA locality domain by rank. Not sup‐
460 ported unless the entire node is allocated to the
461 job.
462
463 map_ldom:<list>
464 Bind by mapping NUMA locality domain IDs to tasks
465 as specified where <list> is
466 <ldom1>,<ldom2>,...<ldomN>. The locality domain
467 IDs are interpreted as decimal values unless they
468 are preceded with '0x' in which case they are in‐
469 terpreted as hexadecimal values. Not supported
470 unless the entire node is allocated to the job.
471
472 mask_ldom:<list>
473 Bind by setting NUMA locality domain masks on
474 tasks as specified where <list> is
475 <mask1>,<mask2>,...<maskN>. NUMA locality domain
476 masks are always interpreted as hexadecimal values
477 but can be preceded with an optional '0x'. Not
478 supported unless the entire node is allocated to
479 the job.
480
481 sockets
482 Automatically generate masks binding tasks to
483 sockets. Only the CPUs on the socket which have
484 been allocated to the job will be used. If the
485 number of tasks differs from the number of allo‐
486 cated sockets this can result in sub-optimal bind‐
487 ing.
488
489 cores Automatically generate masks binding tasks to
490 cores. If the number of tasks differs from the
491 number of allocated cores this can result in
492 sub-optimal binding.
493
494 threads
495 Automatically generate masks binding tasks to
496 threads. If the number of tasks differs from the
497 number of allocated threads this can result in
498 sub-optimal binding.
499
500 ldoms Automatically generate masks binding tasks to NUMA
501 locality domains. If the number of tasks differs
502 from the number of allocated locality domains this
503 can result in sub-optimal binding.
504
505 help Show help message for cpu-bind
506
507 This option applies to job and step allocations.
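
For instance, automatic binding to cores with verbose reporting,
or explicit per-task masks on a node whose layout is known (the
masks, task counts and ./app are illustrative; explicit masks
are only honored when the step holds every CPU on the node), can
be requested as:

    srun -n8 --cpu-bind=verbose,cores ./app
    srun -n4 --cpu-bind=mask_cpu:0x1,0x2,0x4,0x8 ./app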
508
509 --cpu-freq=<p1>[-p2[:p3]]
510
511 Request that the job step initiated by this srun command be run
512 at some requested frequency if possible, on the CPUs selected
513 for the step on the compute node(s).
514
515 p1 can be [#### | low | medium | high | highm1] which will set
516 the frequency scaling_speed to the corresponding value, and set
517 the frequency scaling_governor to UserSpace. See below for defi‐
518 nition of the values.
519
520 p1 can be [Conservative | OnDemand | Performance | PowerSave]
521 which will set the scaling_governor to the corresponding value.
522 The governor has to be in the list set by the slurm.conf option
523 CpuFreqGovernors.
524
525 When p2 is present, p1 will be the minimum scaling frequency and
526 p2 will be the maximum scaling frequency.
527
p2 can be [#### | medium | high | highm1]. p2 must be greater
than p1.
530
531 p3 can be [Conservative | OnDemand | Performance | PowerSave |
532 SchedUtil | UserSpace] which will set the governor to the corre‐
533 sponding value.
534
535 If p3 is UserSpace, the frequency scaling_speed will be set by a
536 power or energy aware scheduling strategy to a value between p1
537 and p2 that lets the job run within the site's power goal. The
538 job may be delayed if p1 is higher than a frequency that allows
539 the job to run within the goal.
540
541 If the current frequency is < min, it will be set to min. Like‐
542 wise, if the current frequency is > max, it will be set to max.
543
544 Acceptable values at present include:
545
546 #### frequency in kilohertz
547
548 Low the lowest available frequency
549
550 High the highest available frequency
551
552 HighM1 (high minus one) will select the next highest
553 available frequency
554
555 Medium attempts to set a frequency in the middle of the
556 available range
557
558 Conservative attempts to use the Conservative CPU governor
559
560 OnDemand attempts to use the OnDemand CPU governor (the de‐
561 fault value)
562
563 Performance attempts to use the Performance CPU governor
564
565 PowerSave attempts to use the PowerSave CPU governor
566
567 UserSpace attempts to use the UserSpace CPU governor
568
The following informational environment variable is set in the
job step when the --cpu-freq option is requested:
    SLURM_CPU_FREQ_REQ
573
574 This environment variable can also be used to supply the value
575 for the CPU frequency request if it is set when the 'srun' com‐
576 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
578 ment variable is the same as the command line. See the ENVIRON‐
579 MENT VARIABLES section for a description of the
580 SLURM_CPU_FREQ_REQ variable.
581
582 NOTE: This parameter is treated as a request, not a requirement.
583 If the job step's node does not support setting the CPU fre‐
584 quency, or the requested value is outside the bounds of the le‐
585 gal frequencies, an error is logged, but the job step is allowed
586 to continue.
587
588 NOTE: Setting the frequency for just the CPUs of the job step
589 implies that the tasks are confined to those CPUs. If task con‐
590 finement (i.e. the task/affinity TaskPlugin is enabled, or the
591 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
592 in cgroup.conf) is not configured, this parameter is ignored.
593
594 NOTE: When the step completes, the frequency and governor of
595 each selected CPU is reset to the previous values.
596
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result, not
all of the accounting information will be present.
601
602 This option applies to job and step allocations.
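
Two illustrative requests (frequencies are in kilohertz, the
governors must be permitted by CpuFreqGovernors, and ./app is a
placeholder):

    srun --cpu-freq=Performance -n8 ./app
    srun --cpu-freq=1800000-2400000:OnDemand -n8 ./app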
603
604 --cpus-per-gpu=<ncpus>
605 Advise Slurm that ensuing job steps will require ncpus proces‐
606 sors per allocated GPU. Not compatible with the --cpus-per-task
607 option.
608
609 -c, --cpus-per-task=<ncpus>
610 Request that ncpus be allocated per process. This may be useful
611 if the job is multithreaded and requires more than one CPU per
612 task for optimal performance. Explicitly requesting this option
613 implies --exact. The default is one CPU per process and does not
614 imply --exact. If -c is specified without -n, as many tasks
615 will be allocated per node as possible while satisfying the -c
616 restriction. For instance on a cluster with 8 CPUs per node, a
617 job request for 4 nodes and 3 CPUs per task may be allocated 3
618 or 6 CPUs per node (1 or 2 tasks per node) depending upon re‐
619 source consumption by other jobs. Such a job may be unable to
620 execute more than a total of 4 tasks.
621
622 WARNING: There are configurations and options interpreted dif‐
623 ferently by job and job step requests which can result in incon‐
624 sistencies for this option. For example srun -c2
625 --threads-per-core=1 prog may allocate two cores for the job,
626 but if each of those cores contains two threads, the job alloca‐
627 tion will include four CPUs. The job step allocation will then
628 launch two threads per CPU for a total of two tasks.
629
630 WARNING: When srun is executed from within salloc or sbatch,
631 there are configurations and options which can result in incon‐
632 sistent allocations when -c has a value greater than -c on sal‐
633 loc or sbatch. The number of cpus per task specified for salloc
634 or sbatch is not automatically inherited by srun and, if de‐
635 sired, must be requested again, either by specifying
636 --cpus-per-task when calling srun, or by setting the
637 SRUN_CPUS_PER_TASK environment variable.
638
639 This option applies to job and step allocations.
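
For example, a multithreaded application that wants 8 CPUs for
each of 4 tasks could be launched as below (./hybrid_app is a
placeholder); inside an existing salloc or sbatch allocation the
-c value, or SRUN_CPUS_PER_TASK, must be repeated on the srun
line as described above:

    srun -n4 -c8 ./hybrid_app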
640
641 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
643 (start > (deadline - time[-min])). Default is no deadline.
644 Valid time formats are:
645 HH:MM[:SS] [AM|PM]
646 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
647 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
649 now[+count[seconds(default)|minutes|hours|days|weeks]]
650
651 This option applies only to job allocations.
652
653 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
655 specification if the job has been eligible to run for less than
656 this time period. If the job has waited for less than the spec‐
657 ified period, it will use only nodes which already have the
658 specified features. The argument is in units of minutes. A de‐
659 fault value may be set by a system administrator using the de‐
660 lay_boot option of the SchedulerParameters configuration parame‐
661 ter in the slurm.conf file, otherwise the default value is zero
662 (no delay).
663
664 This option applies only to job allocations.
665
666 -d, --dependency=<dependency_list>
667 Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job
steps (executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
671 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
672 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
673 must be satisfied if the "," separator is used. Any dependency
674 may be satisfied if the "?" separator is used. Only one separa‐
675 tor may be used. Many jobs can share the same dependency and
676 these jobs may even belong to different users. The value may
677 be changed after job submission using the scontrol command. De‐
678 pendencies on remote jobs are allowed in a federation. Once a
679 job dependency fails due to the termination state of a preceding
680 job, the dependent job will never be run, even if the preceding
681 job is requeued and has a different termination state in a sub‐
682 sequent execution. This option applies to job allocations.
683
684 after:job_id[[+time][:jobid[+time]...]]
685 After the specified jobs start or are cancelled and
686 'time' in minutes from job start or cancellation happens,
687 this job can begin execution. If no 'time' is given then
688 there is no delay after start or cancellation.
689
690 afterany:job_id[:jobid...]
691 This job can begin execution after the specified jobs
692 have terminated.
693
694 afterburstbuffer:job_id[:jobid...]
695 This job can begin execution after the specified jobs
696 have terminated and any associated burst buffer stage out
697 operations have completed.
698
699 aftercorr:job_id[:jobid...]
700 A task of this job array can begin execution after the
701 corresponding task ID in the specified job has completed
702 successfully (ran to completion with an exit code of
703 zero).
704
705 afternotok:job_id[:jobid...]
706 This job can begin execution after the specified jobs
707 have terminated in some failed state (non-zero exit code,
708 node failure, timed out, etc).
709
710 afterok:job_id[:jobid...]
711 This job can begin execution after the specified jobs
712 have successfully executed (ran to completion with an
713 exit code of zero).
714
715 singleton
716 This job can begin execution after any previously
717 launched jobs sharing the same job name and user have
718 terminated. In other words, only one job by that name
719 and owned by that user can be running or suspended at any
720 point in time. In a federation, a singleton dependency
721 must be fulfilled on all clusters unless DependencyParam‐
722 eters=disable_remote_singleton is used in slurm.conf.
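
For example, a post-processing step that should start only after
a hypothetical job 123456 completes successfully could be
submitted as:

    srun --dependency=afterok:123456 ./postprocess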
723
724 -X, --disable-status
725 Disable the display of task status when srun receives a single
726 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
727 running job. Without this option a second Ctrl-C in one second
728 is required to forcibly terminate the job and srun will immedi‐
729 ately exit. May also be set via the environment variable
730 SLURM_DISABLE_STATUS. This option applies to job allocations.
731
732 -m, --distribution={*|block|cyclic|arbi‐
733 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
734
735 Specify alternate distribution methods for remote processes.
736 For job allocation, this sets environment variables that will be
737 used by subsequent srun requests. Task distribution affects job
738 allocation at the last stage of the evaluation of available re‐
739 sources by the cons_res and cons_tres plugins. Consequently,
740 other options (e.g. --ntasks-per-node, --cpus-per-task) may af‐
741 fect resource selection prior to task distribution. To ensure a
742 specific task distribution jobs should have access to whole
743 nodes, for instance by using the --exclusive flag.
744
745 This option controls the distribution of tasks to the nodes on
746 which resources have been allocated, and the distribution of
747 those resources to tasks for binding (task affinity). The first
748 distribution method (before the first ":") controls the distri‐
749 bution of tasks to nodes. The second distribution method (after
750 the first ":") controls the distribution of allocated CPUs
751 across sockets for binding to tasks. The third distribution
752 method (after the second ":") controls the distribution of allo‐
753 cated CPUs across cores for binding to tasks. The second and
754 third distributions apply only if task affinity is enabled. The
755 third distribution is supported only if the task/cgroup plugin
756 is configured. The default value for each distribution type is
757 specified by *.
758
759 Note that with select/cons_res and select/cons_tres, the number
760 of CPUs allocated to each socket and node may be different. Re‐
761 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
762 mation on resource allocation, distribution of tasks to nodes,
763 and binding of tasks to CPUs.
764 First distribution method (distribution of tasks across nodes):
765
766
767 * Use the default method for distributing tasks to nodes
768 (block).
769
770 block The block distribution method will distribute tasks to a
771 node such that consecutive tasks share a node. For exam‐
772 ple, consider an allocation of three nodes each with two
773 cpus. A four-task block distribution request will dis‐
774 tribute those tasks to the nodes with tasks one and two
775 on the first node, task three on the second node, and
776 task four on the third node. Block distribution is the
777 default behavior if the number of tasks exceeds the num‐
778 ber of allocated nodes.
779
780 cyclic The cyclic distribution method will distribute tasks to a
781 node such that consecutive tasks are distributed over
782 consecutive nodes (in a round-robin fashion). For exam‐
783 ple, consider an allocation of three nodes each with two
784 cpus. A four-task cyclic distribution request will dis‐
785 tribute those tasks to the nodes with tasks one and four
786 on the first node, task two on the second node, and task
787 three on the third node. Note that when SelectType is
788 select/cons_res, the same number of CPUs may not be allo‐
789 cated on each node. Task distribution will be round-robin
790 among all the nodes with CPUs yet to be assigned to
791 tasks. Cyclic distribution is the default behavior if
792 the number of tasks is no larger than the number of allo‐
793 cated nodes.
794
795 plane The tasks are distributed in blocks of size <size>. The
796 size must be given or SLURM_DIST_PLANESIZE must be set.
797 The number of tasks distributed to each node is the same
798 as for cyclic distribution, but the taskids assigned to
799 each node depend on the plane size. Additional distribu‐
800 tion specifications cannot be combined with this option.
801 For more details (including examples and diagrams),
802 please see https://slurm.schedmd.com/mc_support.html and
803 https://slurm.schedmd.com/dist_plane.html
804
805 arbitrary
The arbitrary method of distribution will allocate processes
in order as listed in the file designated by the environment
variable SLURM_HOSTFILE. If this variable is set it will
override any other method specified. If not set the method
will default to block. The hostfile must contain at minimum
the number of hosts requested and be one per line or comma
separated. If specifying a task count (-n,
--ntasks=<number>), your tasks will be laid out on the nodes
in the order of the file.
815 NOTE: The arbitrary distribution option on a job alloca‐
816 tion only controls the nodes to be allocated to the job
817 and not the allocation of CPUs on those nodes. This op‐
818 tion is meant primarily to control a job step's task lay‐
819 out in an existing job allocation for the srun command.
820 NOTE: If the number of tasks is given and a list of re‐
821 quested nodes is also given, the number of nodes used
822 from that list will be reduced to match that of the num‐
823 ber of tasks if the number of nodes in the list is
824 greater than the number of tasks.
825
826 Second distribution method (distribution of CPUs across sockets
827 for binding):
828
829
830 * Use the default method for distributing CPUs across sock‐
831 ets (cyclic).
832
833 block The block distribution method will distribute allocated
834 CPUs consecutively from the same socket for binding to
835 tasks, before using the next consecutive socket.
836
837 cyclic The cyclic distribution method will distribute allocated
838 CPUs for binding to a given task consecutively from the
839 same socket, and from the next consecutive socket for the
840 next task, in a round-robin fashion across sockets.
841 Tasks requiring more than one CPU will have all of those
842 CPUs allocated on a single socket if possible.
843
844 fcyclic
845 The fcyclic distribution method will distribute allocated
846 CPUs for binding to tasks from consecutive sockets in a
847 round-robin fashion across the sockets. Tasks requiring
848 more than one CPU will have each CPUs allocated in a
849 cyclic fashion across sockets.
850
851 Third distribution method (distribution of CPUs across cores for
852 binding):
853
854
855 * Use the default method for distributing CPUs across cores
856 (inherited from second distribution method).
857
858 block The block distribution method will distribute allocated
859 CPUs consecutively from the same core for binding to
860 tasks, before using the next consecutive core.
861
862 cyclic The cyclic distribution method will distribute allocated
863 CPUs for binding to a given task consecutively from the
864 same core, and from the next consecutive core for the
865 next task, in a round-robin fashion across cores.
866
867 fcyclic
868 The fcyclic distribution method will distribute allocated
869 CPUs for binding to tasks from consecutive cores in a
870 round-robin fashion across the cores.
871
872 Optional control for task distribution over nodes:
873
874
Pack Rather than distributing a job step's tasks evenly
876 across its allocated nodes, pack them as tightly as pos‐
877 sible on the nodes. This only applies when the "block"
878 task distribution method is used.
879
880 NoPack Rather than packing a job step's tasks as tightly as pos‐
881 sible on the nodes, distribute them evenly. This user
882 option will supersede the SelectTypeParameters
883 CR_Pack_Nodes configuration parameter.
884
885 This option applies to job and step allocations.
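
For instance (./app is a placeholder), distributing 8 tasks
cyclically across 2 nodes while spreading each task's CPUs
across sockets can be requested as:

    srun -N2 -n8 -m cyclic:fcyclic ./app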
886
887 --epilog={none|<executable>}
888 srun will run executable just after the job step completes. The
889 command line arguments for executable will be the command and
890 arguments of the job step. If none is specified, then no srun
891 epilog will be run. This parameter overrides the SrunEpilog pa‐
892 rameter in slurm.conf. This parameter is completely independent
893 from the Epilog parameter in slurm.conf. This option applies to
894 job allocations.
895
896 -e, --error=<filename_pattern>
897 Specify how stderr is to be redirected. By default in interac‐
898 tive mode, srun redirects stderr to the same file as stdout, if
899 one is specified. The --error option is provided to allow stdout
900 and stderr to be redirected to different locations. See IO Re‐
901 direction below for more options. If the specified file already
902 exists, it will be overwritten. This option applies to job and
903 step allocations.
904
905 --exact
906 Allow a step access to only the resources requested for the
907 step. By default, all non-GRES resources on each node in the
908 step allocation will be used. This option only applies to step
909 allocations.
910 NOTE: Parallel steps will either be blocked or rejected until
911 requested step resources are available unless --overlap is spec‐
912 ified. Job resources can be held after the completion of an srun
913 command while Slurm does job cleanup. Step epilogs and/or SPANK
914 plugins can further delay the release of step resources.
915
-x, --exclude={<host1>[,<host2>...]|<filename>}
917 Request that a specific list of hosts not be included in the re‐
918 sources allocated to this job. The host list will be assumed to
919 be a filename if it contains a "/" character. This option ap‐
920 plies to job and step allocations.
921
922 --exclusive[={user|mcs}]
923 This option applies to job and job step allocations, and has two
924 slightly different meanings for each one. When used to initiate
925 a job, the job allocation cannot share nodes with other running
926 jobs (or just other users with the "=user" option or "=mcs" op‐
927 tion). If user/mcs are not specified (i.e. the job allocation
928 can not share nodes with other running jobs), the job is allo‐
929 cated all CPUs and GRES on all nodes in the allocation, but is
930 only allocated as much memory as it requested. This is by design
931 to support gang scheduling, because suspended jobs still reside
932 in memory. To request all the memory on a node, use --mem=0.
933 The default shared/exclusive behavior depends on system configu‐
934 ration and the partition's OverSubscribe option takes precedence
935 over the job's option. NOTE: Since shared GRES (MPS) cannot be
936 allocated at the same time as a sharing GRES (GPU) this option
937 only allocates all sharing GRES and no underlying shared GRES.
938
939 This option can also be used when initiating more than one job
940 step within an existing resource allocation (default), where you
941 want separate processors to be dedicated to each job step. If
942 sufficient processors are not available to initiate the job
943 step, it will be deferred. This can be thought of as providing a
944 mechanism for resource management to the job within its alloca‐
945 tion (--exact implied).
946
947 The exclusive allocation of CPUs applies to job steps by de‐
948 fault, but --exact is NOT the default. In other words, the de‐
949 fault behavior is this: job steps will not share CPUs, but job
950 steps will be allocated all CPUs available to the job on all
951 nodes allocated to the steps.
952
953 In order to share the resources use the --overlap option.
954
955 See EXAMPLE below.
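
As a minimal sketch of using --exclusive for resource management
within an existing allocation (the step commands are
placeholders), two single-task steps can be given dedicated CPUs
and run concurrently:

    srun --exclusive -n1 ./step_a &
    srun --exclusive -n1 ./step_b &
    wait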
956
957 --export={[ALL,]<environment_variables>|ALL|NONE}
958 Identify which environment variables from the submission envi‐
959 ronment are propagated to the launched application.
960
961 --export=ALL
962 Default mode if --export is not specified. All of the
963 user's environment will be loaded from the caller's
964 environment.
965
966 --export=NONE
967 None of the user environment will be defined. User
968 must use absolute path to the binary to be executed
969 that will define the environment. User can not specify
970 explicit environment variables with "NONE".
971
972 This option is particularly important for jobs that
973 are submitted on one cluster and execute on a differ‐
974 ent cluster (e.g. with different paths). To avoid
steps inheriting environment export settings (e.g. "NONE")
from the sbatch command, either set --export=ALL or set the
environment variable SLURM_EXPORT_ENV to "ALL".
979
980 --export=[ALL,]<environment_variables>
981 Exports all SLURM* environment variables along with
982 explicitly defined variables. Multiple environment
983 variable names should be comma separated. Environment
984 variable names may be specified to propagate the cur‐
985 rent value (e.g. "--export=EDITOR") or specific values
986 may be exported (e.g. "--export=EDITOR=/bin/emacs").
987 If "ALL" is specified, then all user environment vari‐
988 ables will be loaded and will take precedence over any
989 explicitly given environment variables.
990
991 Example: --export=EDITOR,ARG1=test
992 In this example, the propagated environment will only
993 contain the variable EDITOR from the user's environ‐
994 ment, SLURM_* environment variables, and ARG1=test.
995
996 Example: --export=ALL,EDITOR=/bin/emacs
997 There are two possible outcomes for this example. If
998 the caller has the EDITOR environment variable de‐
999 fined, then the job's environment will inherit the
1000 variable from the caller's environment. If the caller
1001 doesn't have an environment variable defined for EDI‐
1002 TOR, then the job's environment will use the value
1003 given by --export.
1004
1005 -B, --extra-node-info=<sockets>[:cores[:threads]]
1006 Restrict node selection to nodes with at least the specified
1007 number of sockets, cores per socket and/or threads per core.
1008 NOTE: These options do not specify the resource allocation size.
1009 Each value specified is considered a minimum. An asterisk (*)
1010 can be used as a placeholder indicating that all available re‐
1011 sources of that type are to be utilized. Values can also be
1012 specified as min-max. The individual levels can also be speci‐
1013 fied in separate options if desired:
1014
1015 --sockets-per-node=<sockets>
1016 --cores-per-socket=<cores>
1017 --threads-per-core=<threads>
1018 If task/affinity plugin is enabled, then specifying an alloca‐
1019 tion in this manner also sets a default --cpu-bind option of
1020 threads if the -B option specifies a thread count, otherwise an
1021 option of cores if a core count is specified, otherwise an op‐
1022 tion of sockets. If SelectType is configured to se‐
1023 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1024 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1025 ored. If not specified, the scontrol show job will display
1026 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1027 NOTE: This option is mutually exclusive with --hint,
1028 --threads-per-core and --ntasks-per-core.
1029 NOTE: If the number of sockets, cores and threads were all spec‐
1030 ified, the number of nodes was specified (as a fixed number, not
1031 a range) and the number of tasks was NOT specified, srun will
1032 implicitly calculate the number of tasks as one task per thread.
1033
1034 --gid=<group>
1035 If srun is run as root, and the --gid option is used, submit the
1036 job with group's group access permissions. group may be the
1037 group name or the numerical group ID. This option applies to job
1038 allocations.
1039
1040 --gpu-bind=[verbose,]<type>
1041 Bind tasks to specific GPUs. By default every spawned task can
1042 access every GPU allocated to the step. If "verbose," is speci‐
1043 fied before <type>, then print out GPU binding debug information
1044 to the stderr of the tasks. GPU binding is ignored if there is
1045 only one task.
1046
1047 Supported type options:
1048
1049 closest Bind each task to the GPU(s) which are closest. In a
1050 NUMA environment, each task may be bound to more than
1051 one GPU (i.e. all GPUs in that NUMA environment).
1052
1053 map_gpu:<list>
1054 Bind by setting GPU masks on tasks (or ranks) as spec‐
1055 ified where <list> is
1056 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
1057 are interpreted as decimal values. If the number of
1058 tasks (or ranks) exceeds the number of elements in
1059 this list, elements in the list will be reused as
1060 needed starting from the beginning of the list. To
1061 simplify support for large task counts, the lists may
1062 follow a map with an asterisk and repetition count.
1063 For example "map_gpu:0*4,1*4". If the task/cgroup
1064 plugin is used and ConstrainDevices is set in
1065 cgroup.conf, then the GPU IDs are zero-based indexes
1066 relative to the GPUs allocated to the job (e.g. the
1067 first GPU is 0, even if the global ID is 3). Other‐
1068 wise, the GPU IDs are global IDs, and all GPUs on each
1069 node in the job should be allocated for predictable
1070 binding results.
1071
1072 mask_gpu:<list>
1073 Bind by setting GPU masks on tasks (or ranks) as spec‐
1074 ified where <list> is
1075 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1076 mapping is specified for a node and identical mapping
1077 is applied to the tasks on every node (i.e. the lowest
1078 task ID on each node is mapped to the first mask spec‐
1079 ified in the list, etc.). GPU masks are always inter‐
1080 preted as hexadecimal values but can be preceded with
1081 an optional '0x'. To simplify support for large task
1082 counts, the lists may follow a map with an asterisk
1083 and repetition count. For example
1084 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1085 is used and ConstrainDevices is set in cgroup.conf,
1086 then the GPU IDs are zero-based indexes relative to
1087 the GPUs allocated to the job (e.g. the first GPU is
1088 0, even if the global ID is 3). Otherwise, the GPU IDs
1089 are global IDs, and all GPUs on each node in the job
1090 should be allocated for predictable binding results.
1091
1092 none Do not bind tasks to GPUs (turns off binding if
1093 --gpus-per-task is requested).
1094
1095 per_task:<gpus_per_task>
Each task will be bound to the number of GPUs specified in
<gpus_per_task>. GPUs are assigned to tasks in order: the
first task is bound to the first <gpus_per_task> GPUs on the
node, the second task to the next <gpus_per_task> GPUs, and
so on.
1100
1101 single:<tasks_per_gpu>
1102 Like --gpu-bind=closest, except that each task can
1103 only be bound to a single GPU, even when it can be
1104 bound to multiple GPUs that are equally close. The
1105 GPU to bind to is determined by <tasks_per_gpu>, where
1106 the first <tasks_per_gpu> tasks are bound to the first
1107 GPU available, the second <tasks_per_gpu> tasks are
1108 bound to the second GPU available, etc. This is basi‐
1109 cally a block distribution of tasks onto available
1110 GPUs, where the available GPUs are determined by the
1111 socket affinity of the task and the socket affinity of
1112 the GPUs as specified in gres.conf's Cores parameter.
1113
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1115 Request that GPUs allocated to the job are configured with spe‐
1116 cific frequency values. This option can be used to indepen‐
1117 dently configure the GPU and its memory frequencies. After the
1118 job is completed, the frequencies of all affected GPUs will be
1119 reset to the highest possible values. In some cases, system
1120 power caps may override the requested values. The field type
1121 can be "memory". If type is not specified, the GPU frequency is
1122 implied. The value field can either be "low", "medium", "high",
1123 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1124 fied numeric value is not possible, a value as close as possible
1125 will be used. See below for definition of the values. The ver‐
1126 bose option causes current GPU frequency information to be
1127 logged. Examples of use include "--gpu-freq=medium,memory=high"
1128 and "--gpu-freq=450".
1129
1130 Supported value definitions:
1131
1132 low the lowest available frequency.
1133
1134 medium attempts to set a frequency in the middle of the
1135 available range.
1136
1137 high the highest available frequency.
1138
1139 highm1 (high minus one) will select the next highest avail‐
1140 able frequency.
1141
1142 -G, --gpus=[type:]<number>
1143 Specify the total number of GPUs required for the job. An op‐
1144 tional GPU type specification can be supplied. For example
1145 "--gpus=volta:3". Multiple options can be requested in a comma
1146 separated list, for example: "--gpus=volta:3,kepler:1". See
1147 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1148 options.
1149 NOTE: The allocation has to contain at least one GPU per node.
1150
1151 --gpus-per-node=[type:]<number>
1152 Specify the number of GPUs required for the job on each node in‐
1153 cluded in the job's resource allocation. An optional GPU type
1154 specification can be supplied. For example
1155 "--gpus-per-node=volta:3". Multiple options can be requested in
1156 a comma separated list, for example:
1157 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1158 --gpus-per-socket and --gpus-per-task options.
1159
1160 --gpus-per-socket=[type:]<number>
1161 Specify the number of GPUs required for the job on each socket
1162 included in the job's resource allocation. An optional GPU type
1163 specification can be supplied. For example
1164 "--gpus-per-socket=volta:3". Multiple options can be requested
1165 in a comma separated list, for example:
1166 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
1167 sockets per node count ( --sockets-per-node). See also the
1168 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1169 tion applies to job allocations.
1170
1171 --gpus-per-task=[type:]<number>
1172 Specify the number of GPUs required for the job on each task to
1173 be spawned in the job's resource allocation. An optional GPU
1174 type specification can be supplied. For example
1175 "--gpus-per-task=volta:1". Multiple options can be requested in
1176 a comma separated list, for example:
1177 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1178 --gpus-per-socket and --gpus-per-node options. This option re‐
1179 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1180 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1181 -N, --nodes. This option will implicitly set
1182 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1183 with an explicit --gpu-bind specification.
1184
1185 --gres=<list>
1186 Specifies a comma-delimited list of generic consumable re‐
1187 sources. The format of each entry on the list is
1188 "name[[:type]:count]". The name is that of the consumable re‐
1189 source. The count is the number of those resources with a de‐
1190 fault value of 1. The count can have a suffix of "k" or "K"
1191 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1192 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1193 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1194 x 1024 x 1024 x 1024). The specified resources will be allo‐
1195 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
1197 of available generic consumable resources will be printed and
1198 the command will exit if the option argument is "help". Exam‐
1199 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1200 "--gres=help". NOTE: This option applies to job and step allo‐
1201 cations. By default, a job step is allocated all of the generic
1202 resources that have been requested by the job, except those im‐
1203 plicitly requested when a job is exclusive. To change the be‐
1204 havior so that each job step is allocated no generic resources,
1205 explicitly set the value of --gres to specify zero counts for
1206 each generic resource OR set "--gres=none" OR set the
1207 SLURM_STEP_GRES environment variable to "none".
1208
1209 --gres-flags=<type>
1210 Specify generic resource task binding options.
1211
1212 disable-binding
1213 Disable filtering of CPUs with respect to generic re‐
1214 source locality. This option is currently required to
1215 use more CPUs than are bound to a GRES (i.e. if a GPU is
1216 bound to the CPUs on one socket, but resources on more
1217 than one socket are required to run the job). This op‐
1218 tion may permit a job to be allocated resources sooner
1219 than otherwise possible, but may result in lower job per‐
1220 formance. This option applies to job allocations.
1221 NOTE: This option is specific to SelectType=cons_res.
1222
1223 enforce-binding
1224 The only CPUs available to the job/step will be those
1225 bound to the selected GRES (i.e. the CPUs identified in
1226 the gres.conf file will be strictly enforced). This op‐
1227 tion may result in delayed initiation of a job. For ex‐
1228 ample a job requiring two GPUs and one CPU will be de‐
1229 layed until both GPUs on a single socket are available
1230 rather than using GPUs bound to separate sockets, how‐
1231 ever, the application performance may be improved due to
1232 improved communication speed. Requires the node to be
1233 configured with more than one socket and resource filter‐
1234 ing will be performed on a per-socket basis. NOTE: Job
1235 steps that don't use --exact will not be affected.
1236 NOTE: This option is specific to SelectType=cons_tres for
1237 job allocations.
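
For example (the GPU count and ./app are illustrative), to
require that only CPUs local to the selected GPUs be used:

    srun --gres=gpu:2 --gres-flags=enforce-binding ./app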
1238
1239 -h, --help
1240 Display help information and exit.
1241
1242 --het-group=<expr>
1243 Identify each component in a heterogeneous job allocation for
1244 which a step is to be created. Applies only to srun commands is‐
1245 sued inside a salloc allocation or sbatch script. <expr> is a
1246 set of integers corresponding to one or more option offsets on
1247 the salloc or sbatch command line. Examples: "--het-group=2",
1248 "--het-group=0,4", "--het-group=1,3-5". The default value is
1249 --het-group=0.
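
For example, assuming a heterogeneous allocation with two components
was created by salloc or sbatch, a step could be launched on the
second component (offset 1) with:

$ srun --het-group=1 hostname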
1250
1251 --hint=<type>
1252 Bind tasks according to application hints.
1253 NOTE: This option cannot be used in conjunction with any of
1254 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1255 --cpu-bind=verbose) or -B. If --hint is specified as a command
1256 line argument, it will take precedence over the environment.
1257
1258 compute_bound
1259 Select settings for compute bound applications: use all
1260 cores in each socket, one thread per core.
1261
1262 memory_bound
1263 Select settings for memory bound applications: use only
1264 one core in each socket, one thread per core.
1265
1266 [no]multithread
1267 [don't] use extra threads with in-core multi-threading
1268 which can benefit communication intensive applications.
1269 Only supported with the task/affinity plugin.
1270
1271 help show this help message
1272
1273 This option applies to job allocations.
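
For example, to disable the use of extra hardware threads for a
compute bound application (./a.out is a placeholder executable):

$ srun -n8 --hint=nomultithread ./a.out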
1274
1275 -H, --hold
1276 Specify the job is to be submitted in a held state (priority of
1277 zero). A held job can then be released using scontrol to reset
1278 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1279 plies to job allocations.
1280
1281 -I, --immediate[=<seconds>]
1282 Exit if resources are not available within the time period spec‐
1283 ified. If no argument is given (seconds defaults to 1), re‐
1284 sources must be available immediately for the request to suc‐
1285 ceed. If defer is configured in SchedulerParameters and sec‐
1286 onds=1 the allocation request will fail immediately; defer con‐
1287 flicts and takes precedence over this option. By default, --im‐
1288 mediate is off, and the command will block until resources be‐
1289 come available. Since this option's argument is optional, for
1290 proper parsing the single letter option must be followed immedi‐
1291 ately with the value and not include a space between them. For
1292 example "-I60" and not "-I 60". This option applies to job and
1293 step allocations.
1294
1295 -i, --input=<mode>
1296 Specify how stdin is to be redirected. By default, srun redi‐
1297 rects stdin from the terminal to all tasks. See IO Redirection
1298 below for more options. For OS X, the poll() function does not
1299 support stdin, so input from a terminal is not possible. This
1300 option applies to job and step allocations.
1301
1302 -J, --job-name=<jobname>
1303 Specify a name for the job. The specified name will appear along
1304 with the job id number when querying running jobs on the system.
1305 The default is the supplied executable program's name. NOTE:
1306 This information may be written to the slurm_jobacct.log file.
1307 This file is space delimited so if a space is used in the job‐
1308 name it will cause problems in properly displaying the con‐
1309 tents of the slurm_jobacct.log file when the sacct command is
1310 used. This option applies to job and step allocations.
1311
1312 --jobid=<jobid>
1313 Initiate a job step under an already allocated job with the
1314 specified job id. Using this option will cause srun to behave exactly as if
1315 the SLURM_JOB_ID environment variable was set. This option ap‐
1316 plies to step allocations.
1317
1318 -K, --kill-on-bad-exit[=0|1]
1319 Controls whether or not to terminate a step if any task exits
1320 with a non-zero exit code. If this option is not specified, the
1321 default action will be based upon the Slurm configuration param‐
1322 eter of KillOnBadExit. If this option is specified, it will take
1323 precedence over KillOnBadExit. An option argument of zero will
1324 not terminate the job. A non-zero argument or no argument will
1325 terminate the job. Note: This option takes precedence over the
1326 -W, --wait option to terminate the job immediately if a task ex‐
1327 its with a non-zero exit code. Since this option's argument is
1328 optional, for proper parsing the single letter option must be
1329 followed immediately with the value and not include a space be‐
1330 tween them. For example "-K1" and not "-K 1".
1331
1332 -l, --label
1333 Prepend task number to lines of stdout/err. The --label option
1334 will prepend lines of output with the remote task id. This op‐
1335 tion applies to step allocations.
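
For example, running hostname as two tasks with --label prefixes each
output line with the task id (hostnames will vary):

$ srun -n2 -l hostname
0: node01
1: node02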
1336
1337 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1338 Specification of licenses (or other resources available on all
1339 nodes of the cluster) which must be allocated to this job. Li‐
1340 cense names can be followed by a colon and count (the default
1341 count is one). Multiple license names should be comma separated
1342 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1343 cations.
1344
1345 NOTE: When submitting heterogeneous jobs, license requests only
1346 work correctly when made on the first component job. For exam‐
1347 ple "srun -L ansys:2 : myexecutable".
1348
1349 --mail-type=<type>
1350 Notify user by email when certain event types occur. Valid type
1351 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1352 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1353 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1354 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1355 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1356 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1357 time limit). Multiple type values may be specified in a comma
1358 separated list. The user to be notified is indicated with
1359 --mail-user. This option applies to job allocations.
1360
1361 --mail-user=<user>
1362 User to receive email notification of state changes as defined
1363 by --mail-type. The default value is the submitting user. This
1364 option applies to job allocations.
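
For example, to be notified by email when the job ends or fails
(the address and ./a.out are placeholders):

$ srun --mail-type=END,FAIL --mail-user=user@example.com ./a.out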
1365
1366 --mcs-label=<mcs>
1367 Used only when the mcs/group plugin is enabled. This parameter
1368 is a group among the groups of the user. Default value is cal‐
1369 culated by the mcs plugin if it is enabled. This option applies
1370 to job allocations.
1371
1372 --mem=<size>[units]
1373 Specify the real memory required per node. Default units are
1374 megabytes. Different units can be specified using the suffix
1375 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1376 is MaxMemPerNode. If configured, both parameters can be seen
1377 using the scontrol show config command. This parameter would
1378 generally be used if whole nodes are allocated to jobs (Select‐
1379 Type=select/linear). Specifying a memory limit of zero for a
1380 job step will restrict the job step to the amount of memory al‐
1381 located to the job, but not remove any of the job's memory allo‐
1382 cation from being available to other job steps. Also see
1383 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1384 --mem-per-gpu options are mutually exclusive. If --mem,
1385 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1386 guments, then they will take precedence over the environment
1387 (potentially inherited from salloc or sbatch).
1388
1389 NOTE: A memory size specification of zero is treated as a spe‐
1390 cial case and grants the job access to all of the memory on each
1391 node for newly submitted jobs and all available job memory to
1392 new job steps.
1393
1394 NOTE: Enforcement of memory limits currently relies upon the
1395 task/cgroup plugin or enabling of accounting, which samples mem‐
1396 ory use on a periodic basis (data need not be stored, just col‐
1397 lected). In both cases memory use is based upon the job's Resi‐
1398 dent Set Size (RSS). A task may exceed the memory limit until
1399 the next periodic accounting sample.
1400
1401 This option applies to job and step allocations.
1402
1403 --mem-bind=[{quiet|verbose},]<type>
1404 Bind tasks to memory. Used only when the task/affinity plugin is
1405 enabled and the NUMA memory functions are available. Note that
1406 the resolution of CPU and memory binding may differ on some ar‐
1407 chitectures. For example, CPU binding may be performed at the
1408 level of the cores within a processor while memory binding will
1409 be performed at the level of nodes, where the definition of
1410 "nodes" may differ from system to system. By default no memory
1411 binding is performed; any task using any CPU can use any memory.
1412 This option is typically used to ensure that each task is bound
1413 to the memory closest to its assigned CPU. The use of any type
1414 other than "none" or "local" is not recommended. If you want
1415 greater control, try running a simple test code with the options
1416 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1417 the specific configuration.
1418
1419 NOTE: To have Slurm always report on the selected memory binding
1420 for all commands executed in a shell, you can enable verbose
1421 mode by setting the SLURM_MEM_BIND environment variable value to
1422 "verbose".
1423
1424 The following informational environment variables are set when
1425 --mem-bind is in use:
1426
1427 SLURM_MEM_BIND_LIST
1428 SLURM_MEM_BIND_PREFER
1429 SLURM_MEM_BIND_SORT
1430 SLURM_MEM_BIND_TYPE
1431 SLURM_MEM_BIND_VERBOSE
1432
1433 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1434 scription of the individual SLURM_MEM_BIND* variables.
1435
1436 Supported options include:
1437
1438 help show this help message
1439
1440 local Use memory local to the processor in use
1441
1442 map_mem:<list>
1443 Bind by setting memory masks on tasks (or ranks) as spec‐
1444 ified where <list> is
1445 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1446 ping is specified for a node and identical mapping is ap‐
1447 plied to the tasks on every node (i.e. the lowest task ID
1448 on each node is mapped to the first ID specified in the
1449 list, etc.). NUMA IDs are interpreted as decimal values
1450 unless they are preceded with '0x' in which case they are in‐
1451 terpreted as hexadecimal values. If the number of tasks
1452 (or ranks) exceeds the number of elements in this list,
1453 elements in the list will be reused as needed starting
1454 from the beginning of the list. To simplify support for
1455 large task counts, the lists may follow a map with an as‐
1456 terisk and repetition count. For example
1457 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1458 sults, all CPUs for each node in the job should be allo‐
1459 cated to the job.
1460
1461 mask_mem:<list>
1462 Bind by setting memory masks on tasks (or ranks) as spec‐
1463 ified where <list> is
1464 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1465 mapping is specified for a node and identical mapping is
1466 applied to the tasks on every node (i.e. the lowest task
1467 ID on each node is mapped to the first mask specified in
1468 the list, etc.). NUMA masks are always interpreted as
1469 hexadecimal values. Note that masks must be preceded
1470 with a '0x' if they don't begin with [0-9] so they are
1471 seen as numerical values. If the number of tasks (or
1472 ranks) exceeds the number of elements in this list, ele‐
1473 ments in the list will be reused as needed starting from
1474 the beginning of the list. To simplify support for large
1475 task counts, the lists may follow a mask with an asterisk
1476 and repetition count. For example "mask_mem:0*4,1*4".
1477 For predictable binding results, all CPUs for each node
1478 in the job should be allocated to the job.
1479
1480 no[ne] don't bind tasks to memory (default)
1481
1482 nosort avoid sorting free cache pages (default, LaunchParameters
1483 configuration parameter can override this default)
1484
1485 p[refer]
1486 Prefer use of first specified NUMA node, but permit
1487 use of other available NUMA nodes.
1488
1489 q[uiet]
1490 quietly bind before task runs (default)
1491
1492 rank bind by task rank (not recommended)
1493
1494 sort sort free cache pages (run zonesort on Intel KNL nodes)
1495
1496 v[erbose]
1497 verbosely report binding before task runs
1498
1499 This option applies to job and step allocations.
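
For example, to bind each task to the memory local to its assigned
CPUs and report the resulting binding (./a.out is a placeholder):

$ srun -n4 --mem-bind=verbose,local ./a.out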
1500
1501 --mem-per-cpu=<size>[units]
1502 Minimum memory required per usable allocated CPU. Default units
1503 are megabytes. Different units can be specified using the suf‐
1504 fix [K|M|G|T]. The default value is DefMemPerCPU and the maxi‐
1505 mum value is MaxMemPerCPU (see exception below). If configured,
1506 both parameters can be seen using the scontrol show config com‐
1507 mand. Note that if the job's --mem-per-cpu value exceeds the
1508 configured MaxMemPerCPU, then the user's limit will be treated
1509 as a memory limit per task; --mem-per-cpu will be reduced to a
1510 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1511 and the value of --cpus-per-task multiplied by the new
1512 --mem-per-cpu value will equal the original --mem-per-cpu value
1513 specified by the user. This parameter would generally be used
1514 if individual processors are allocated to jobs (SelectType=se‐
1515 lect/cons_res). If resources are allocated by core, socket, or
1516 whole nodes, then the number of CPUs allocated to a job may be
1517 higher than the task count and the value of --mem-per-cpu should
1518 be adjusted accordingly. Specifying a memory limit of zero for
1519 a job step will restrict the job step to the amount of memory
1520 allocated to the job, but not remove any of the job's memory al‐
1521 location from being available to other job steps. Also see
1522 --mem and --mem-per-gpu. The --mem, --mem-per-cpu and
1523 --mem-per-gpu options are mutually exclusive.
1524
1525 NOTE: If the final amount of memory requested by a job can't be
1526 satisfied by any of the nodes configured in the partition, the
1527 job will be rejected. This could happen if --mem-per-cpu is
1528 used with the --exclusive option for a job allocation and
1529 --mem-per-cpu times the number of CPUs on a node is greater than
1530 the total memory of that node.
1531
1532 NOTE: This applies to usable allocated CPUs in a job allocation.
1533 This is important when more than one thread per core is config‐
1534 ured. If a job requests --threads-per-core with fewer threads
1535 than exist on the core (or --hint=nomultithread, which
1536 implies --threads-per-core=1), the job will be unable to use
1537 those extra threads on the core and those threads will not be
1538 included in the memory per CPU calculation. But if the job has
1539 access to all threads on the core, those threads will be in‐
1540 cluded in the memory per CPU calculation even if the job did not
1541 explicitly request those threads.
1542
1543 In the following examples, each core has two threads.
1544
1545 In this first example, two tasks can run on separate hyper‐
1546 threads in the same core because --threads-per-core is not used.
1547 The third task uses both threads of the second core. The allo‐
1548 cated memory per cpu includes all threads:
1549
1550 $ salloc -n3 --mem-per-cpu=100
1551 salloc: Granted job allocation 17199
1552 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1553 JobID ReqTRES AllocTRES
1554 ------- ----------------------------------- -----------------------------------
1555 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1556
1557 In this second example, because of --threads-per-core=1, each
1558 task is allocated an entire core but is only able to use one
1559 thread per core. The allocated CPUs include all threads on each
1560 core. However, allocated memory per cpu includes only the usable
1561 thread in each core.
1562
1563 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1564 salloc: Granted job allocation 17200
1565 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1566 JobID ReqTRES AllocTRES
1567 ------- ----------------------------------- -----------------------------------
1568 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1569
1570 --mem-per-gpu=<size>[units]
1571 Minimum memory required per allocated GPU. Default units are
1572 megabytes. Different units can be specified using the suffix
1573 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1574 both a global and per partition basis. If configured, the pa‐
1575 rameters can be seen using the scontrol show config and scontrol
1576 show partition commands. Also see --mem. The --mem,
1577 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1578
1579 --mincpus=<n>
1580 Specify a minimum number of logical cpus/processors per node.
1581 This option applies to job allocations.
1582
1583 --mpi=<mpi_type>
1584 Identify the type of MPI to be used. May result in unique initi‐
1585 ation procedures.
1586
1587 cray_shasta
1588 To enable Cray PMI support. This is for applications
1589 built with the Cray Programming Environment. The PMI Con‐
1590 trol Port can be specified with the --resv-ports option
1591 or with the MpiParams=ports=<port range> parameter in
1592 your slurm.conf. This plugin does not have support for
1593 heterogeneous jobs. Support for cray_shasta is included
1594 by default.
1595
1596 list Lists available mpi types to choose from.
1597
1598 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1599 only if the MPI implementation supports it, in other
1600 words if the MPI has the PMI2 interface implemented. The
1601 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1602 which provides the server side functionality but the
1603 client side must implement PMI2_Init() and the other in‐
1604 terface calls.
1605
1606 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1607 support in Slurm can be used to launch parallel applica‐
1608 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1609 must be configured with pmix support by passing
1610 "--with-pmix=<PMIx installation path>" option to its
1611 "./configure" script.
1612
1613 At the time of writing PMIx is supported in Open MPI
1614 starting from version 2.0. PMIx also supports backward
1615 compatibility with PMI1 and PMI2 and can be used if MPI
1616 was configured with PMI2/PMI1 support pointing to the
1617 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1618 doesn't provide the way to point to a specific implemen‐
1619 tation, a hack'ish solution leveraging LD_PRELOAD can be
1620 used to force "libpmix" usage.
1621
1622 none No special MPI processing. This is the default and works
1623 with many other versions of MPI.
1624
1625 This option applies to step allocations.
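
For example, to launch an application built against a PMIx enabled
MPI library (./mpi_app is a placeholder executable):

$ srun -n16 --mpi=pmix ./mpi_app

The MPI plugin types available on a given installation can be listed
with "srun --mpi=list".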
1626
1627 --msg-timeout=<seconds>
1628 Modify the job launch message timeout. The default value is
1629 MessageTimeout in the Slurm configuration file slurm.conf.
1630 Changes to this are typically not recommended, but could be use‐
1631 ful to diagnose problems. This option applies to job alloca‐
1632 tions.
1633
1634 --multi-prog
1635 Run a job with different programs and different arguments for
1636 each task. In this case, the executable program specified is ac‐
1637 tually a configuration file specifying the executable and argu‐
1638 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1639 for details on the configuration file contents. This option ap‐
1640 plies to step allocations.
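
A minimal sketch (see MULTIPLE PROGRAM CONFIGURATION below for the
full file syntax): given a configuration file multi.conf containing

0      hostname
1-3    echo task:%t

the step could be launched with

$ srun -n4 --multi-prog multi.conf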
1641
1642 --network=<type>
1643 Specify information pertaining to the switch or network. The
1644 interpretation of type is system dependent. This option is sup‐
1645 ported when running Slurm natively on a Cray. It is used to re‐
1646 quest the use of Network Performance Counters. Only one value per
1647 request is valid. All options are case-insensitive. In this
1648 configuration the supported values include:
1649
1650
1651 system
1652 Use the system-wide network performance counters. Only
1653 nodes requested will be marked in use for the job alloca‐
1654 tion. If the job does not fill up the entire system, the
1655 rest of the nodes are not able to be used by other jobs
1656 using NPC; if idle, their state will appear as PerfCnts.
1657 These nodes are still available for other jobs not using
1658 NPC.
1659
1660 blade Use the blade network performance counters. Only nodes re‐
1661 quested will be marked in use for the job allocation. If
1662 the job does not fill up the entire blade(s) allocated to
1663 the job, those blade(s) are not able to be used by other
1664 jobs using NPC; if idle, their state will appear as PerfC‐
1665 nts. These nodes are still available for other jobs not
1666 using NPC.
1667
1668 In all cases the job allocation request must specify the --ex‐
1669 clusive option and the step cannot specify the --overlap option.
1670 Otherwise the request will be denied.
1671
1672 Also with any of these options steps are not allowed to share
1673 blades, so resources would remain idle inside an allocation if
1674 the step running on a blade does not take up all the nodes on
1675 the blade.
1676
1677 The network option is also available on systems with HPE Sling‐
1678 shot networks. It can be used to override the default network
1679 resources allocated for the job step. Multiple values may be
1680 specified in a comma-separated list.
1681
1682 def_<rsrc>=<val>
1683 Per-CPU reserved allocation for this resource.
1684
1685 res_<rsrc>=<val>
1686 Per-node reserved allocation for this resource. If
1687 set, overrides the per-CPU allocation.
1688
1689 max_<rsrc>=<val>
1690 Maximum per-node limit for this resource.
1691
1692 depth=<depth>
1693 Multiplier for per-CPU resource allocation. Default
1694 is the number of reserved CPUs on the node.
1695
1696 The resources that may be requested are:
1697
1698 txqs Transmit command queues. The default is 3 per-CPU,
1699 maximum 1024 per-node.
1700
1701 tgqs Target command queues. The default is 2 per-CPU, max‐
1702 imum 512 per-node.
1703
1704 eqs Event queues. The default is 8 per-CPU, maximum 2048
1705 per-node.
1706
1707 cts Counters. The default is 2 per-CPU, maximum 2048 per-
1708 node.
1709
1710 tles Trigger list entries. The default is 1 per-CPU, maxi‐
1711 mum 2048 per-node.
1712
1713 ptes Portals table entries. The default is 8 per-CPU,
1714 maximum 2048 per-node.
1715
1716 les List entries. The default is 134 per-CPU, maximum
1717 65535 per-node.
1718
1719 acs Addressing contexts. The default is 4 per-CPU, maxi‐
1720 mum 1024 per-node.
1721
1722 This option applies to job and step allocations.
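
For example, on an HPE Slingshot system a step might reserve four
transmit command queues per CPU and cap event queues at 16 per node
(the values and ./a.out are illustrative placeholders):

$ srun --network=def_txqs=4,max_eqs=16 ./a.out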
1723
1724 --nice[=adjustment]
1725 Run the job with an adjusted scheduling priority within Slurm.
1726 With no adjustment value the scheduling priority is decreased by
1727 100. A negative nice value increases the priority, otherwise de‐
1728 creases it. The adjustment range is +/- 2147483645. Only privi‐
1729 leged users can specify a negative adjustment.
1730
1731 -Z, --no-allocate
1732 Run the specified tasks on a set of nodes without creating a
1733 Slurm "job" in the Slurm queue structure, bypassing the normal
1734 resource allocation step. The list of nodes must be specified
1735 with the -w, --nodelist option. This is a privileged option
1736 only available for the users "SlurmUser" and "root". This option
1737 applies to job allocations.
1738
1739 -k, --no-kill[=off]
1740 Do not automatically terminate a job if one of the nodes it has
1741 been allocated fails. This option applies to job and step allo‐
1742 cations. The job will assume all responsibilities for
1743 fault-tolerance. Tasks launched using this option will not be
1744 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1745 --wait options will have no effect upon the job step). The ac‐
1746 tive job step (MPI job) will likely suffer a fatal error, but
1747 subsequent job steps may be run if this option is specified.
1748
1749 Specify an optional argument of "off" to disable the effect of the
1750 SLURM_NO_KILL environment variable.
1751
1752 The default action is to terminate the job upon node failure.
1753
1754 -F, --nodefile=<node_file>
1755 Much like --nodelist, but the list is contained in the file
1756 node_file. The node names of the list may also span multi‐
1757 ple lines in the file. Duplicate node names in the file will
1758 be ignored. The order of the node names in the list is not im‐
1759 portant; the node names will be sorted by Slurm.
1760
1761 -w, --nodelist={<node_name_list>|<filename>}
1762 Request a specific list of hosts. The job will contain all of
1763 these hosts and possibly additional hosts as needed to satisfy
1764 resource requirements. The list may be specified as a
1765 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1766 for example), or a filename. The host list will be assumed to
1767 be a filename if it contains a "/" character. If you specify a
1768 minimum node or processor count larger than can be satisfied by
1769 the supplied host list, additional resources will be allocated
1770 on other nodes as needed. Rather than repeating a host name
1771 multiple times, an asterisk and a repetition count may be ap‐
1772 pended to a host name. For example "host1,host1" and "host1*2"
1773 are equivalent. If the number of tasks is given and a list of
1774 requested nodes is also given, the number of nodes used from
1775 that list will be reduced to match that of the number of tasks
1776 if the number of nodes in the list is greater than the number of
1777 tasks. This option applies to job and step allocations.
1778
1779 -N, --nodes=<minnodes>[-maxnodes]
1780 Request that a minimum of minnodes nodes be allocated to this
1781 job. A maximum node count may also be specified with maxnodes.
1782 If only one number is specified, this is used as both the mini‐
1783 mum and maximum node count. The partition's node limits super‐
1784 sede those of the job. If a job's node limits are outside of
1785 the range permitted for its associated partition, the job will
1786 be left in a PENDING state. This permits possible execution at
1787 a later time, when the partition limit is changed. If a job
1788 node limit exceeds the number of nodes configured in the parti‐
1789 tion, the job will be rejected. Note that the environment vari‐
1790 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1791 ibility) will be set to the count of nodes actually allocated to
1792 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1793 tion. If -N is not specified, the default behavior is to allo‐
1794 cate enough nodes to satisfy the requested resources as ex‐
1795 pressed by per-job specification options, e.g. -n, -c and
1796 --gpus. The job will be allocated as many nodes as possible
1797 within the range specified and without delaying the initiation
1798 of the job. If the number of tasks is given and a number of re‐
1799 quested nodes is also given, the number of nodes used from that
1800 request will be reduced to match that of the number of tasks if
1801 the number of nodes in the request is greater than the number of
1802 tasks. The node count specification may include a numeric value
1803 followed by a suffix of "k" (multiplies numeric value by 1,024)
1804 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1805 plies to job and step allocations.
1806
1807 -n, --ntasks=<number>
1808 Specify the number of tasks to run. Request that srun allocate
1809 resources for ntasks tasks. The default is one task per node,
1810 but note that the --cpus-per-task option will change this de‐
1811 fault. This option applies to job and step allocations.
1812
1813 --ntasks-per-core=<ntasks>
1814 Request the maximum ntasks be invoked on each core. This option
1815 applies to the job allocation, but not to step allocations.
1816 Meant to be used with the --ntasks option. Related to
1817 --ntasks-per-node except at the core level instead of the node
1818 level. Masks will automatically be generated to bind the tasks
1819 to specific cores unless --cpu-bind=none is specified. NOTE:
1820 This option is not supported when using SelectType=select/lin‐
1821 ear.
1822
1823 --ntasks-per-gpu=<ntasks>
1824 Request that there are ntasks tasks invoked for every GPU. This
1825 option can work in two ways: 1) either specify --ntasks in addi‐
1826 tion, in which case a type-less GPU specification will be auto‐
1827 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1828 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1829 --ntasks, and the total task count will be automatically deter‐
1830 mined. The number of CPUs needed will be automatically in‐
1831 creased if necessary to allow for any calculated task count.
1832 This option will implicitly set --gpu-bind=single:<ntasks>, but
1833 that can be overridden with an explicit --gpu-bind specifica‐
1834 tion. This option is not compatible with a node range (i.e.
1835 -N<minnodes-maxnodes>). This option is not compatible with
1836 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1837 option is not supported unless SelectType=cons_tres is config‐
1838 ured (either directly or indirectly on Cray systems).
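
For example, requesting four GPUs with two tasks per GPU results in
eight tasks in total (./a.out is a placeholder executable):

$ srun --gpus=4 --ntasks-per-gpu=2 ./a.out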
1839
1840 --ntasks-per-node=<ntasks>
1841 Request that ntasks be invoked on each node. If used with the
1842 --ntasks option, the --ntasks option will take precedence and
1843 the --ntasks-per-node will be treated as a maximum count of
1844 tasks per node. Meant to be used with the --nodes option. This
1845 is related to --cpus-per-task=ncpus, but does not require knowl‐
1846 edge of the actual number of cpus on each node. In some cases,
1847 it is more convenient to be able to request that no more than a
1848 specific number of tasks be invoked on each node. Examples of
1849 this include submitting a hybrid MPI/OpenMP app where only one
1850 MPI "task/rank" should be assigned to each node while allowing
1851 the OpenMP portion to utilize all of the parallelism present in
1852 the node, or submitting a single setup/cleanup/monitoring job to
1853 each node of a pre-existing allocation as one step in a larger
1854 job script. This option applies to job allocations.
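
For example, a hybrid MPI/OpenMP job might place one task on each of
four nodes and give each task several CPUs for its OpenMP threads
(the CPU count and ./hybrid_app are placeholders):

$ srun -N4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app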
1855
1856 --ntasks-per-socket=<ntasks>
1857 Request the maximum ntasks be invoked on each socket. This op‐
1858 tion applies to the job allocation, but not to step allocations.
1859 Meant to be used with the --ntasks option. Related to
1860 --ntasks-per-node except at the socket level instead of the node
1861 level. Masks will automatically be generated to bind the tasks
1862 to specific sockets unless --cpu-bind=none is specified. NOTE:
1863 This option is not supported when using SelectType=select/lin‐
1864 ear.
1865
1866 --open-mode={append|truncate}
1867 Open the output and error files using append or truncate mode as
1868 specified. For heterogeneous job steps the default value is
1869 "append". Otherwise the default value is specified by the sys‐
1870 tem configuration parameter JobFileAppend. This option applies
1871 to job and step allocations.
1872
1873 -o, --output=<filename_pattern>
1874 Specify the "filename pattern" for stdout redirection. By de‐
1875 fault in interactive mode, srun collects stdout from all tasks
1876 and sends this output via TCP/IP to the attached terminal. With
1877 --output stdout may be redirected to a file, to one file per
1878 task, or to /dev/null. See section IO Redirection below for the
1879 various forms of filename pattern. If the specified file al‐
1880 ready exists, it will be overwritten.
1881
1882 If --error is not also specified on the command line, both std‐
1883 out and stderr will be directed to the file specified by --output.
1884 This option applies to job and step allocations.
1885
1886 -O, --overcommit
1887 Overcommit resources. This option applies to job and step allo‐
1888 cations.
1889
1890 When applied to a job allocation (not including jobs requesting
1891 exclusive access to the nodes) the resources are allocated as if
1892 only one task per node is requested. This means that the re‐
1893 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1894 cated per node rather than being multiplied by the number of
1895 tasks. Options used to specify the number of tasks per node,
1896 socket, core, etc. are ignored.
1897
1898 When applied to job step allocations (the srun command when exe‐
1899 cuted within an existing job allocation), this option can be
1900 used to launch more than one task per CPU. Normally, srun will
1901 not allocate more than one process per CPU. By specifying
1902 --overcommit you are explicitly allowing more than one process
1903 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1904 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1905 in the file slurm.h and is not a variable, it is set at Slurm
1906 build time.
1907
1908 --overlap
1909 Specifying --overlap allows steps to share all resources (CPUs,
1910 memory, and GRES) with all other steps. A step using this option
1911 will overlap all other steps, even those that did not specify
1912 --overlap.
1913
1914 By default steps do not share resources with other parallel
1915 steps. This option applies to step allocations.
1916
1917 -s, --oversubscribe
1918 The job allocation can over-subscribe resources with other run‐
1919 ning jobs. The resources to be over-subscribed can be nodes,
1920 sockets, cores, and/or hyperthreads depending upon configura‐
1921 tion. The default over-subscribe behavior depends on system
1922 configuration and the partition's OverSubscribe option takes
1923 precedence over the job's option. This option may result in the
1924 allocation being granted sooner than if the --oversubscribe op‐
1925 tion was not set and allow higher system utilization, but appli‐
1926 cation performance will likely suffer due to competition for re‐
1927 sources. This option applies to job allocations.
1928
1929 -p, --partition=<partition_names>
1930 Request a specific partition for the resource allocation. If
1931 not specified, the default behavior is to allow the slurm con‐
1932 troller to select the default partition as designated by the
1933 system administrator. If the job can use more than one parti‐
1934 tion, specify their names in a comma separated list and the one
1935 offering earliest initiation will be used with no regard given
1936 to the partition name ordering (although higher priority parti‐
1937 tions will be considered first). When the job is initiated, the
1938 name of the partition used will be placed first in the job
1939 record partition string. This option applies to job allocations.
1940
1941 --power=<flags>
1942 Comma separated list of power management plugin options. Cur‐
1943 rently available flags include: level (all nodes allocated to
1944 the job should have identical power caps, may be disabled by the
1945 Slurm configuration option PowerParameters=job_no_level). This
1946 option applies to job allocations.
1947
1948 --prefer=<list>
1949 Nodes can have features assigned to them by the Slurm adminis‐
1950 trator. Users can specify which of these features are desired
1951 but not required by their job using the prefer option. This op‐
1952 tion operates independently from --constraint and will override
1953 whatever is set there if possible. When scheduling, the features
1954 in --prefer are tried first; if a node set isn't available with
1955 those features, then --constraint is attempted. See --constraint
1956 for more information; this option behaves the same way.
1957
1958
1959 -E, --preserve-env
1960 Pass the current values of environment variables
1961 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
1962 rather than computing them from command line parameters. This
1963 option applies to job allocations.
1964
1965 --priority=<value>
1966 Request a specific job priority. May be subject to configura‐
1967 tion specific constraints. value should either be a numeric
1968 value or "TOP" (for highest possible value). Only Slurm opera‐
1969 tors and administrators can set the priority of a job. This op‐
1970 tion applies to job allocations only.
1971
1972 --profile={all|none|<type>[,<type>...]}
1973 Enables detailed data collection by the acct_gather_profile
1974 plugin. Detailed data are typically time-series that are stored
1975 in an HDF5 file for the job or an InfluxDB database depending on
1976 the configured plugin. This option applies to job and step al‐
1977 locations.
1978
1979 All All data types are collected. (Cannot be combined with
1980 other values.)
1981
1982 None No data types are collected. This is the default.
1983 (Cannot be combined with other values.)
1984
1985 Valid type values are:
1986
1987 Energy Energy data is collected.
1988
1989 Task Task (I/O, Memory, ...) data is collected.
1990
1991 Filesystem
1992 Filesystem data is collected.
1993
1994 Network
1995 Network (InfiniBand) data is collected.
1996
1997 --prolog=<executable>
1998 srun will run executable just before launching the job step.
1999 The command line arguments for executable will be the command
2000 and arguments of the job step. If executable is "none", then no
2001 srun prolog will be run. This parameter overrides the SrunProlog
2002 parameter in slurm.conf. This parameter is completely indepen‐
2003 dent from the Prolog parameter in slurm.conf. This option ap‐
2004 plies to job allocations.
2005
2006 --propagate[=rlimit[,rlimit...]]
2007 Allows users to specify which of the modifiable (soft) resource
2008 limits to propagate to the compute nodes and apply to their
2009 jobs. If no rlimit is specified, then all resource limits will
2010 be propagated. The following rlimit names are supported by
2011 Slurm (although some options may not be supported on some sys‐
2012 tems):
2013
2014 ALL All limits listed below (default)
2015
2016 NONE No limits listed below
2017
2018 AS The maximum address space (virtual memory) for a
2019 process.
2020
2021 CORE The maximum size of core file
2022
2023 CPU The maximum amount of CPU time
2024
2025 DATA The maximum size of a process's data segment
2026
2027 FSIZE The maximum size of files created. Note that if the
2028 user sets FSIZE to less than the current size of the
2029 slurmd.log, job launches will fail with a 'File size
2030 limit exceeded' error.
2031
2032 MEMLOCK The maximum size that may be locked into memory
2033
2034 NOFILE The maximum number of open files
2035
2036 NPROC The maximum number of processes available
2037
2038 RSS The maximum resident set size. Note that this only has
2039 effect with Linux kernels 2.4.30 or older or BSD.
2040
2041 STACK The maximum stack size
2042
2043 This option applies to job allocations.
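
For example, to propagate only the core file size and open file
limits of the submitting shell to the job (./a.out is a placeholder):

$ srun --propagate=CORE,NOFILE ./a.out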
2044
2045 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2046 --unbuffered. Implicitly sets --error and --output to /dev/null
2047 for all tasks except task zero, which may cause those tasks to
2048 exit immediately (e.g. shells will typically exit immediately in
2049 that situation). This option applies to step allocations.
2050
2051 -q, --qos=<qos>
2052 Request a quality of service for the job. QOS values can be de‐
2053 fined for each user/cluster/account association in the Slurm
2054 database. Users will be limited to their association's defined
2055 set of qos's when the Slurm configuration parameter, Account‐
2056 ingStorageEnforce, includes "qos" in its definition. This option
2057 applies to job allocations.
2058
2059 -Q, --quiet
2060 Suppress informational messages from srun. Errors will still be
2061 displayed. This option applies to job and step allocations.
2062
2063 --quit-on-interrupt
2064 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2065 disables the status feature normally available when srun re‐
2066 ceives a single Ctrl-C and causes srun to instead immediately
2067 terminate the running job. This option applies to step alloca‐
2068 tions.
2069
2070 --reboot
2071 Force the allocated nodes to reboot before starting the job.
2072 This is only supported with some system configurations and will
2073 otherwise be silently ignored. Only root, SlurmUser or admins
2074 can reboot nodes. This option applies to job allocations.
2075
2076 -r, --relative=<n>
2077 Run a job step relative to node n of the current allocation.
2078 This option may be used to spread several job steps out among
2079 the nodes of the current job. If -r is used, the current job
2080 step will begin at node n of the allocated nodelist, where the
2081 first node is considered node 0. The -r option is not permitted
2082 with the -w or -x options and will result in a fatal error when not
2083 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2084 set). The default for n is 0. If the value of --nodes exceeds
2085 the number of nodes identified with the --relative option, a
2086 warning message will be printed and the --relative option will
2087 take precedence. This option applies to step allocations.
2088
2089 --reservation=<reservation_names>
2090 Allocate resources for the job from the named reservation. If
2091 the job can use more than one reservation, specify their names
2092 in a comma separated list and the one offering earliest initia‐
2093 tion will be used. Each reservation will be considered in the order it was
2094 requested. All reservations will be listed in scontrol/squeue
2095 through the life of the job. In accounting the first reserva‐
2096 tion will be seen and after the job starts the reservation used
2097 will replace it.
2098
2099 --resv-ports[=count]
2100 Reserve communication ports for this job. Users can specify the
2101 number of ports they want to reserve. The parameter Mpi‐
2102 Params=ports=12000-12999 must be specified in slurm.conf. If the
2103 number of reserved ports is zero then no ports are reserved.
2104 Used only for Cray's native PMI. This option applies to job and
2105 step allocations.
2106
2107 --send-libs[=yes|no]
2108 If set to yes (or no argument), autodetect and broadcast the ex‐
2109 ecutable's shared object dependencies to allocated compute
2110 nodes. The files are placed in a directory alongside the exe‐
2111 cutable. The LD_LIBRARY_PATH is automatically updated to include
2112 this cache directory as well. This overrides the default behav‐
2113 ior configured in slurm.conf SbcastParameters send_libs. This
2114 option only works in conjunction with --bcast. See also
2115 --bcast-exclude.
2116
2117 --signal=[R:]<sig_num>[@sig_time]
2118 When a job is within sig_time seconds of its end time, send it
2119 the signal sig_num. Due to the resolution of event handling by
2120 Slurm, the signal may be sent up to 60 seconds earlier than
2121 specified. sig_num may either be a signal number or name (e.g.
2122 "10" or "USR1"). sig_time must have an integer value between 0
2123 and 65535. By default, no signal is sent before the job's end
2124 time. If a sig_num is specified without any sig_time, the de‐
2125 fault time will be 60 seconds. This option applies to job allo‐
2126 cations. Use the "R:" option to allow this job to overlap with
2127 a reservation with MaxStartDelay set. To have the signal sent
2128 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2129 rameter.
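
For example, to have Slurm send SIGUSR1 to the job roughly five
minutes before its time limit is reached (./a.out is a placeholder):

$ srun -t 30:00 --signal=USR1@300 ./a.out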
2130
2131 --slurmd-debug=<level>
2132 Specify a debug level for slurmd(8). The level may be specified
2133 as either an integer value between 0 [quiet, only errors are dis‐
2134 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2135
2136 quiet Log nothing
2137
2138 fatal Log only fatal errors
2139
2140 error Log only errors
2141
2142 info Log errors and general informational messages
2143
2144 verbose Log errors and verbose informational messages
2145
2146 The slurmd debug information is copied onto the stderr of the
2147 job. By default only errors are displayed. This option applies
2148 to job and step allocations.
2149
2150 --sockets-per-node=<sockets>
2151 Restrict node selection to nodes with at least the specified
2152 number of sockets. See additional information under -B option
2153 above when task/affinity plugin is enabled. This option applies
2154 to job allocations.
2155 NOTE: This option may implicitly impact the number of tasks if
2156 -n was not specified.
2157
2158 --spread-job
2159 Spread the job allocation over as many nodes as possible and at‐
2160 tempt to evenly distribute tasks across the allocated nodes.
2161 This option disables the topology/tree plugin. This option ap‐
2162 plies to job allocations.
2163
2164 --switches=<count>[@max-time]
2165 When a tree topology is used, this defines the maximum count of
2166 leaf switches desired for the job allocation and optionally the
2167 maximum time to wait for that number of switches. If Slurm finds
2168 an allocation containing more switches than the count specified,
2169 the job remains pending until it either finds an allocation with
2170 desired switch count or the time limit expires. If there is no
2171 switch count limit, there is no delay in starting the job. Ac‐
2172 ceptable time formats include "minutes", "minutes:seconds",
2173 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2174 "days-hours:minutes:seconds". The job's maximum time delay may
2175 be limited by the system administrator using the SchedulerParam‐
2176 eters configuration parameter with the max_switch_wait parameter
2177 option. On a dragonfly network the only switch count supported
2178 is 1 since communication performance will be highest when a job
2179 is allocated resources on one leaf switch or more than 2 leaf
2180 switches. The default max-time is the max_switch_wait Sched‐
2181 ulerParameters value. This option applies to job allocations.
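
For example, to request that all allocated nodes share a single leaf
switch, but accept any allocation after waiting at most 60 minutes
(./a.out is a placeholder):

$ srun -N8 --switches=1@60 ./a.out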
2182
2183 --task-epilog=<executable>
2184 The slurmstepd daemon will run executable just after each task
2185 terminates. This will be executed before any TaskEpilog parame‐
2186 ter in slurm.conf is executed. This is meant to be a very
2187 short-lived program. If it fails to terminate within a few sec‐
2188 onds, it will be killed along with any descendant processes.
2189 This option applies to step allocations.
2190
2191 --task-prolog=<executable>
2192 The slurmstepd daemon will run executable just before launching
2193 each task. This will be executed after any TaskProlog parameter
2194 in slurm.conf is executed. Besides the normal environment vari‐
2195 ables, this has SLURM_TASK_PID available to identify the process
2196 ID of the task being started. Standard output from this program
2197 of the form "export NAME=value" will be used to set environment
2198 variables for the task being spawned. This option applies to
2199 step allocations.
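
A minimal sketch (the file name and variable name are placeholders):
a script task_prolog.sh containing

#!/bin/sh
echo "export MY_TASK_PID=$SLURM_TASK_PID"

could be used as

$ srun --task-prolog=./task_prolog.sh ./a.out

so that each spawned task has MY_TASK_PID set in its environment.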
2200
2201 --test-only
2202 Returns an estimate of when a job would be scheduled to run
2203 given the current job queue and all the other srun arguments
2204 specifying the job. This limits srun's behavior to just return
2205 information; no job is actually submitted. The program will be
2206 executed directly by the slurmd daemon. This option applies to
2207 job allocations.
2208
2209 --thread-spec=<num>
2210 Count of specialized threads per node reserved by the job for
2211 system operations and not used by the application. The applica‐
2212 tion will not use these threads, but will be charged for their
2213 allocation. This option can not be used with the --core-spec
2214 option. This option applies to job allocations.
2215
2216 NOTE: Explicitly setting a job's specialized thread value im‐
2217 plicitly sets its --exclusive option, reserving entire nodes for
2218 the job.
2219
2220 -T, --threads=<nthreads>
2221 Allows limiting the number of concurrent threads used to send
2222 the job request from the srun process to the slurmd processes on
2223 the allocated nodes. Default is to use one thread per allocated
2224 node up to a maximum of 60 concurrent threads. Specifying this
2225 option limits the number of concurrent threads to nthreads (less
2226 than or equal to 60). This should only be used to set a low
2227 thread count for testing on very small memory computers. This
2228 option applies to job allocations.
2229
2230 --threads-per-core=<threads>
2231 Restrict node selection to nodes with at least the specified
2232 number of threads per core. In task layout, use the specified
2233 maximum number of threads per core. Implies --cpu-bind=threads
2234 unless overridden by command line or environment options. NOTE:
2235 "Threads" refers to the number of processing units on each core
2236 rather than the number of application tasks to be launched per
2237 core. See additional information under -B option above when
2238 task/affinity plugin is enabled. This option applies to job and
2239 step allocations.
2240 NOTE: This option may implicitly impact the number of tasks if
2241 -n was not specified.
2242
2243 -t, --time=<time>
2244 Set a limit on the total run time of the job allocation. If the
2245 requested time limit exceeds the partition's time limit, the job
2246 will be left in a PENDING state (possibly indefinitely). The
2247 default time limit is the partition's default time limit. When
2248 the time limit is reached, each task in each job step is sent
2249 SIGTERM followed by SIGKILL. The interval between signals is
2250 specified by the Slurm configuration parameter KillWait. The
2251 OverTimeLimit configuration parameter may permit the job to run
2252 longer than scheduled. Time resolution is one minute and second
2253 values are rounded up to the next minute.
2254
2255 A time limit of zero requests that no time limit be imposed.
2256 Acceptable time formats include "minutes", "minutes:seconds",
2257 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2258 "days-hours:minutes:seconds". This option applies to job and
2259 step allocations.
2260
2261 --time-min=<time>
2262 Set a minimum time limit on the job allocation. If specified,
2263 the job may have its --time limit lowered to a value no lower
2264 than --time-min if doing so permits the job to begin execution
2265 earlier than otherwise possible. The job's time limit will not
2266 be changed after the job is allocated resources. This is per‐
2267 formed by a backfill scheduling algorithm to allocate resources
2268 otherwise reserved for higher priority jobs. Acceptable time
2269 formats include "minutes", "minutes:seconds", "hours:min‐
2270 utes:seconds", "days-hours", "days-hours:minutes" and
2271 "days-hours:minutes:seconds". This option applies to job alloca‐
2272 tions.
2273
2274 --tmp=<size>[units]
2275 Specify a minimum amount of temporary disk space per node. De‐
2276 fault units are megabytes. Different units can be specified us‐
2277 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2278 tions.
2279
2280 --uid=<user>
2281 Attempt to submit and/or run a job as user instead of the invok‐
2282 ing user id. The invoking user's credentials will be used to
2283 check access permissions for the target partition. User root may
2284 use this option to run jobs as a normal user in a RootOnly par‐
2285 tition for example. If run as root, srun will drop its permis‐
2286 sions to the uid specified after node allocation is successful.
2287 user may be the user name or numerical user ID. This option ap‐
2288 plies to job and step allocations.
2289
2290 -u, --unbuffered
2291 By default, the connection between slurmstepd and the
2292 user-launched application is over a pipe. The stdio output writ‐
2293 ten by the application is buffered by glibc until it is
2294 flushed or the output is set as unbuffered. See setbuf(3). If
2295 this option is specified the tasks are executed with a pseudo
2296 terminal so that the application output is unbuffered. This op‐
2297 tion applies to step allocations.
2298
2299 --usage
2300 Display brief help message and exit.
2301
2302 --use-min-nodes
2303 If a range of node counts is given, prefer the smaller count.
2304
2305 -v, --verbose
2306 Increase the verbosity of srun's informational messages. Multi‐
2307 ple -v's will further increase srun's verbosity. By default
2308 only errors will be displayed. This option applies to job and
2309 step allocations.
2310
2311 -V, --version
2312 Display version information and exit.
2313
2314 -W, --wait=<seconds>
2315 Specify how long to wait after the first task terminates before
2316 terminating all remaining tasks. A value of 0 indicates an un‐
2317 limited wait (a warning will be issued after 60 seconds). The
2318 default value is set by the WaitTime parameter in the slurm con‐
2319 figuration file (see slurm.conf(5)). This option can be useful
2320 to ensure that a job is terminated in a timely fashion in the
2321 event that one or more tasks terminate prematurely. Note: The
2322 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2323 to terminate the job immediately if a task exits with a non-zero
2324 exit code. This option applies to job allocations.
2325
2326 --wckey=<wckey>
2327 Specify wckey to be used with job. If TrackWCKey=no (default)
2328 in the slurm.conf this value is ignored. This option applies to
2329 job allocations.
2330
2331 --x11[={all|first|last}]
2332 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2333 the allocation. This option is only enabled if Slurm was com‐
2334 piled with X11 support and PrologFlags=x11 is defined in the
2335 slurm.conf. Default is "all".
2336
2337 srun will submit the job request to the slurm job controller, then ini‐
2338 tiate all processes on the remote nodes. If the request cannot be met
2339 immediately, srun will block until the resources are free to run the
2340 job. If the -I (--immediate) option is specified srun will terminate if
2341 resources are not immediately available.
2342
2343 When initiating remote processes srun will propagate the current work‐
2344 ing directory, unless --chdir=<path> is specified, in which case path
2345 will become the working directory for the remote processes.
2346
2347 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2348 cated to the job. When specifying only the number of processes to run
2349 with -n, a default of one CPU per process is allocated. By specifying
2350 the number of CPUs required per task (-c), more than one CPU may be al‐
2351 located per process. If the number of nodes is specified with -N, srun
2352 will attempt to allocate at least the number of nodes specified.
2353
2354 Combinations of the above three options may be used to change how pro‐
2355 cesses are distributed across nodes and cpus. For instance, by specify‐
2356 ing both the number of processes and number of nodes on which to run,
2357 the number of processes per node is implied. However, if the number of
2358 CPUs per process is more important, then the number of processes (-n) and
2359 the number of CPUs per process (-c) should be specified.
2360
2361 srun will refuse to allocate more than one process per CPU unless
2362 --overcommit (-O) is also specified.
2363
2364 srun will attempt to meet the above specifications "at a minimum." That
2365 is, if 16 nodes are requested for 32 processes, and some nodes do not
2366 have 2 CPUs, the allocation of nodes will be increased in order to meet
2367 the demand for CPUs. In other words, a minimum of 16 nodes are being
2368 requested. However, if 16 nodes are requested for 15 processes, srun
2369 will consider this an error, as 15 processes cannot run across 16
2370 nodes.
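
For example (./a.out is a placeholder executable), the following requests 32
tasks with 2 CPUs each, spread over at least 16 nodes:

$ srun -N16 -n32 -c2 ./a.out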
2371
2372
2373 IO Redirection
2374
2375 By default, stdout and stderr will be redirected from all tasks to the
2376 stdout and stderr of srun, and stdin will be redirected from the stan‐
2377 dard input of srun to all remote tasks. If stdin is only to be read by
2378 a subset of the spawned tasks, specifying a file to read from rather
2379 than forwarding stdin from the srun command may be preferable as it
2380 avoids moving and storing data that will never be read.
2381
2382 For OS X, the poll() function does not support stdin, so input from a
2383 terminal is not possible.
2384
2385 This behavior may be changed with the --output, --error, and --input
2386 (-o, -e, -i) options. Valid format specifications for these options are
2387
2388
2389 all stdout and stderr are redirected from all tasks to srun. stdin is
2390 broadcast to all remote tasks. (This is the default behav‐
2391 ior)
2392
2393 none stdout and stderr are not received from any task. stdin is
2394 not sent to any task (stdin is closed).
2395
2396 taskid stdout and/or stderr are redirected from only the task with
2397 relative id equal to taskid, where 0 <= taskid <= ntasks,
2398 where ntasks is the total number of tasks in the current job
2399 step. stdin is redirected from the stdin of srun to this
2400 same task. This file will be written on the node executing
2401 the task.
2402
2403 filename srun will redirect stdout and/or stderr to the named file
2404 from all tasks. stdin will be redirected from the named file
2405 and broadcast to all tasks in the job. filename refers to a
2406 path on the host that runs srun. The cluster's file system
2407 layout may therefore cause the output to appear in different
2408 places depending on whether the job is run in batch mode.
2410
2411 filename pattern
2412 srun allows for a filename pattern to be used to generate the
2413 named IO file described above. The following list of format
2414 specifiers may be used in the format string to generate a
2415 filename that will be unique to a given jobid, stepid, node,
2416 or task. In each case, the appropriate number of files are
2417 opened and associated with the corresponding tasks. Note that
2418 any format string containing %t, %n, and/or %N will be writ‐
2419 ten on the node executing the task rather than the node where
2420 srun executes. These format specifiers are not supported on a
2421 BGQ system.
2422
2423 \\ Do not process any of the replacement symbols.
2424
2425 %% The character "%".
2426
2427 %A Job array's master job allocation number.
2428
2429 %a Job array ID (index) number.
2430
2431 %J jobid.stepid of the running job. (e.g. "128.0")
2432
2433 %j jobid of the running job.
2434
2435 %s stepid of the running job.
2436
2437 %N short hostname. This will create a separate IO file
2438 per node.
2439
2440 %n Node identifier relative to current job (e.g. "0" is
2441 the first node of the running job). This will create a
2442 separate IO file per node.
2443
2444 %t task identifier (rank) relative to current job. This
2445 will create a separate IO file per task.
2446
2447 %u User name.
2448
2449 %x Job name.
2450
2451 A number placed between the percent character and format
2452 specifier may be used to zero-pad the result in the IO file‐
2453 name. This number is ignored if the format specifier corre‐
2454 sponds to non-numeric data (%N for example).
2455
2456 Some examples of how the format string may be used for a 4
2457 task job step with a Job ID of 128 and step id of 0 are in‐
2458 cluded below:
2459
2460
2461 job%J.out job128.0.out
2462
2463 job%4j.out job0128.out
2464
2465 job%j-%2t.out job128-00.out, job128-01.out, ...
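As a concrete sketch (the program and pattern are arbitrary), the step below writes one zero-padded output file per task; because the pattern contains %t, each file is created on the node running that task:

    # produces job128-00.out through job128-03.out for job 128
    $ srun -n4 --output=job%j-%2t.out hostname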
2466
2467 PERFORMANCE
2468 Executing srun sends a remote procedure call to slurmctld. If enough
2469 calls from srun or other Slurm client commands that send remote proce‐
2470 dure calls to the slurmctld daemon come in at once, it can result in a
2471 degradation of performance of the slurmctld daemon, possibly resulting
2472 in a denial of service.
2473
2474 Do not run srun or other Slurm client commands that send remote proce‐
2475 dure calls to slurmctld from loops in shell scripts or other programs.
2476 Ensure that programs limit calls to srun to the minimum necessary for
2477 the information you are trying to gather.
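As a rough illustration (the program name is a placeholder), a single step that launches many tasks generates far fewer remote procedure calls than a shell loop around srun; per-task arguments can instead be derived from SLURM_PROCID inside the program or supplied via --multi-prog:

    # Avoid: every iteration sends its own RPCs to slurmctld
    $ for i in $(seq 1 100); do srun -n1 ./work "$i"; done

    # Prefer: one step launches all tasks with a single request;
    # each task can read SLURM_PROCID to find its own index
    $ srun -n100 ./work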
2478
2479
2480 INPUT ENVIRONMENT VARIABLES
2481 Upon startup, srun will read and handle the options set in the follow‐
2482 ing environment variables. The majority of these variables are set the
2483 same way the options are set, as defined above. For flag options that
2484 are defined to expect no argument, the option can be enabled by setting
2485 the environment variable without a value (empty or NULL string), the
2486 string 'yes', or a non-zero number. Any other value for the environment
2487 variable will result in the option not being set. There are a couple of
2488 exceptions to these rules, which are noted below.
2489 NOTE: Command line options always override environment variable set‐
2490 tings.
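For example (the partition name and time limit below are purely illustrative), options may be supplied through the environment, with any explicit command line option still taking precedence:

    # equivalent to: srun -p debug -t 10 -n4 hostname
    $ SLURM_PARTITION=debug SLURM_TIMELIMIT=10 srun -n4 hostname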
2491
2492
2493 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2494 MVAPICH2) and controls the fanout of data commu‐
2495 nications. The srun command sends messages to ap‐
2496 plication programs (via the PMI library) and
2497 those applications may be called upon to forward
2498 that data to up to this number of additional
2499 tasks. Higher values offload work from the srun
2500 command to the applications and likely increase
2501 the vulnerability to failures. The default value
2502 is 32.
2503
2504 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2505 MVAPICH2) and controls the fanout of data commu‐
2506 nications. The srun command sends messages to
2507 application programs (via the PMI library) and
2508 those applications may be called upon to forward
2509 that data to additional tasks. By default, srun
2510 sends one message per host and one task on that
2511 host forwards the data to other tasks on that
2512 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2513 defined, the user task may be required to forward
2514 the data to tasks on other hosts. Setting
2515 PMI_FANOUT_OFF_HOST may increase performance.
2516 Since more work is performed by the PMI library
2517 loaded by the user application, failures also can
2518 be more common and more difficult to diagnose.
2519 Should be disabled/enabled by setting to 0 or 1.
2520
2521 PMI_TIME This is used exclusively with PMI (MPICH2 and
2522 MVAPICH2) and controls how much the communica‐
2523 tions from the tasks to the srun are spread out
2524 in time in order to avoid overwhelming the srun
2525 command with work. The default value is 500 (mi‐
2526 croseconds) per task. On relatively slow proces‐
2527 sors or systems with very large processor counts
2528 (and large PMI data sets), higher values may be
2529 required.
2530
2531 SLURM_ACCOUNT Same as -A, --account
2532
2533 SLURM_ACCTG_FREQ Same as --acctg-freq
2534
2535 SLURM_BCAST Same as --bcast
2536
2537 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2538
2539 SLURM_BURST_BUFFER Same as --bb
2540
2541 SLURM_CLUSTERS Same as -M, --clusters
2542
2543 SLURM_COMPRESS Same as --compress
2544
2545 SLURM_CONF The location of the Slurm configuration file.
2546
2547 SLURM_CONSTRAINT Same as -C, --constraint
2548
2549 SLURM_CORE_SPEC Same as --core-spec
2550
2551 SLURM_CPU_BIND Same as --cpu-bind
2552
2553 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2554
2555 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2556
2557 SRUN_CPUS_PER_TASK Same as -c, --cpus-per-task
2558
2559 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2560 disable or enable the option.
2561
2562 SLURM_DEBUG_FLAGS Specify debug flags for srun to use. See De‐
2563 bugFlags in the slurm.conf(5) man page for a full
2564 list of flags. The environment variable takes
2565 precedence over the setting in the slurm.conf.
2566
2567 SLURM_DELAY_BOOT Same as --delay-boot
2568
2569 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2570
2571 SLURM_DISABLE_STATUS Same as -X, --disable-status
2572
2573 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2574 tion=plane, without =<size>, is set.
2575
2576 SLURM_DISTRIBUTION Same as -m, --distribution
2577
2578 SLURM_EPILOG Same as --epilog
2579
2580 SLURM_EXACT Same as --exact
2581
2582 SLURM_EXCLUSIVE Same as --exclusive
2583
2584 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2585 error occurs (e.g. invalid options). This can be
2586 used by a script to distinguish application exit
2587 codes from various Slurm error conditions. Also
2588 see SLURM_EXIT_IMMEDIATE.
2589
2590 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2591 mediate option is used and resources are not cur‐
2592 rently available. This can be used by a script
2593 to distinguish application exit codes from vari‐
2594 ous Slurm error conditions. Also see
2595 SLURM_EXIT_ERROR.
2596
2597 SLURM_EXPORT_ENV Same as --export
2598
2599 SLURM_GPU_BIND Same as --gpu-bind
2600
2601 SLURM_GPU_FREQ Same as --gpu-freq
2602
2603 SLURM_GPUS Same as -G, --gpus
2604
2605 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2606
2607 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2608
2609 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2610
2611 SLURM_GRES_FLAGS Same as --gres-flags
2612
2613 SLURM_HINT Same as --hint
2614
2615 SLURM_IMMEDIATE Same as -I, --immediate
2616
2617 SLURM_JOB_ID Same as --jobid
2618
2619 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2620 allocation, in which case it is ignored to avoid
2621 using the batch job's name as the name of each
2622 job step.
2623
2624 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2625 the job’s resource allocation.
2626
2627 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2628 or 1 to disable or enable the option.
2629
2630 SLURM_LABELIO Same as -l, --label
2631
2632 SLURM_MEM_BIND Same as --mem-bind
2633
2634 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2635
2636 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2637
2638 SLURM_MEM_PER_NODE Same as --mem
2639
2640 SLURM_MPI_TYPE Same as --mpi
2641
2642 SLURM_NETWORK Same as --network
2643
2644 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2645 job’s resource allocation. See
2646 SLURM_JOB_NUM_NODES. Included for backwards com‐
2647 patibility.
2648
2649 SLURM_NO_KILL Same as -k, --no-kill
2650
2651 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2652 for backwards compatibility.
2653
2654 SLURM_NTASKS Same as -n, --ntasks
2655
2656 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2657
2658 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2659
2660 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2661
2662 SLURM_NTASKS_PER_SOCKET
2663 Same as --ntasks-per-socket
2664
2665 SLURM_OPEN_MODE Same as --open-mode
2666
2667 SLURM_OVERCOMMIT Same as -O, --overcommit
2668
2669 SLURM_OVERLAP Same as --overlap
2670
2671 SLURM_PARTITION Same as -p, --partition
2672
2673 SLURM_PMI_KVS_NO_DUP_KEYS
2674 If set, then PMI key-pairs will contain no dupli‐
2675 cate keys. MPI can use this variable to inform
2676 the PMI library that it will not use duplicate
2677 keys so PMI can skip the check for duplicate
2678 keys. This is the case for MPICH2 and reduces
2679 overhead in testing for duplicates for improved
2680 performance.
2681
2682 SLURM_POWER Same as --power
2683
2684 SLURM_PROFILE Same as --profile
2685
2686 SLURM_PROLOG Same as --prolog
2687
2688 SLURM_QOS Same as --qos
2689
2690 SLURM_REMOTE_CWD Same as -D, --chdir=
2691
2692 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2693 maximum count of switches desired for the job al‐
2694 location and optionally the maximum time to wait
2695 for that number of switches. See --switches
2696
2697 SLURM_RESERVATION Same as --reservation
2698
2699 SLURM_RESV_PORTS Same as --resv-ports
2700
2701 SLURM_SEND_LIBS Same as --send-libs
2702
2703 SLURM_SIGNAL Same as --signal
2704
2705 SLURM_SPREAD_JOB Same as --spread-job
2706
2707 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2708 If set and non-zero, successive task exit mes‐
2709 sages with the same exit code will be printed
2710 only once.
2711
2712 SLURM_STDERRMODE Same as -e, --error
2713
2714 SLURM_STDINMODE Same as -i, --input
2715
2716 SLURM_STDOUTMODE Same as -o, --output
2717
2718 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2719 job allocations). Also see SLURM_GRES
2720
2721 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2722 If set, only the specified node will log when the
2723 job or step are killed by a signal.
2724
2725 SLURM_TASK_EPILOG Same as --task-epilog
2726
2727 SLURM_TASK_PROLOG Same as --task-prolog
2728
2729 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2730 ecutable program along with user execute permis‐
2731 sion on the node where srun was called before at‐
2732 tempting to launch it on nodes in the step.
2733
2734 SLURM_THREAD_SPEC Same as --thread-spec
2735
2736 SLURM_THREADS Same as -T, --threads
2737
2738 SLURM_THREADS_PER_CORE
2739 Same as --threads-per-core
2740
2741 SLURM_TIMELIMIT Same as -t, --time
2742
2743 SLURM_UMASK If defined, Slurm will use the defined umask to
2744 set permissions when creating the output/error
2745 files for the job.
2746
2747 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2748
2749 SLURM_USE_MIN_NODES Same as --use-min-nodes
2750
2751 SLURM_WAIT Same as -W, --wait
2752
2753 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2754 --switches
2755
2756 SLURM_WCKEY Same as -W, --wckey
2757
2758 SLURM_WORKING_DIR Same as -D, --chdir
2759
2760 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2761 to disable or enable the option.
2762
2763 SRUN_CONTAINER Same as --container.
2764
2765 SRUN_EXPORT_ENV Same as --export, and will override any setting
2766 for SLURM_EXPORT_ENV.
2767
2768 OUTPUT ENVIRONMENT VARIABLES
2769 srun will set some environment variables in the environment of the exe‐
2770 cuting tasks on the remote compute nodes. These environment variables
2771 are:
2772
2773
2774 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2775 ment variables are set separately for each compo‐
2776 nent.
2777
2778 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2779 ing.
2780
2781 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2782 IDs or masks for this node, CPU_ID = Board_ID x
2783 threads_per_board + Socket_ID x
2784 threads_per_socket + Core_ID x threads_per_core +
2785 Thread_ID).
2786
2787 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2788
2789 SLURM_CPU_BIND_VERBOSE
2790 --cpu-bind verbosity (quiet,verbose).
2791
2792 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2793 the srun command as a numerical frequency in
2794 kilohertz, or a coded value for a request of low,
2795 medium, highm1 or high for the frequency. See the
2796 description of the --cpu-freq option or the
2797 SLURM_CPU_FREQ_REQ input environment variable.
2798
2799 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2800 node. NOTE: The select/linear plugin allocates
2801 entire nodes to jobs, so the value indicates the
2802 total count of CPUs on the node. For the se‐
2803 lect/cons_res and select/cons_tres plugins, this number
2804 indicates the number of CPUs on this node allo‐
2805 cated to the step.
2806
2807 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2808 the --cpus-per-task option is specified.
2809
2810 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2811 distribution with -m, --distribution.
2812
2813 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2814 node.
2815
2816 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2817 gin and comma separated. It is read internally
2818 by pmi if Slurm was built with pmi support. Leav‐
2819 ing the variable set may cause problems when us‐
2820 ing external packages from within the job (Abaqus
2821 and Ansys have been known to have problems when
2822 it is set - consult the appropriate documentation
2823 for 3rd party software).
2824
2825 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2826
2827 SLURM_JOB_ACCOUNT Account name associated with the job allocation.
2828
2829 SLURM_JOB_CPUS_PER_NODE
2830 Count of CPUs available to the job on the nodes
2831 in the allocation, using the format
2832 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2833 ber_of_nodes)] ...]. For example:
2834 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2835 that on the first and second nodes (as listed by
2836 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2837 while the third node has 36 CPUs. NOTE: The se‐
2838 lect/linear plugin allocates entire nodes to
2839 jobs, so the value indicates the total count of
2840 CPUs on allocated nodes. The select/cons_res and
2841 select/cons_tres plugins allocate individual CPUs
2842 to jobs, so this number indicates the number of
2843 CPUs allocated to the job.
2844
2845 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2846
2847 SLURM_JOB_GPUS The global GPU IDs of the GPUs allocated to this
2848 job. The GPU IDs are not relative to any device
2849 cgroup, even if devices are constrained with
2850 task/cgroup. Only set in batch and interactive
2851 jobs.
2852
2853 SLURM_JOB_ID Job id of the executing job.
2854
2855 SLURM_JOB_NAME Set to the value of the --job-name option or the
2856 command name when srun is used to create a new
2857 job allocation. Not set when srun is used only to
2858 create a job step (i.e. within an existing job
2859 allocation).
2860
2861 SLURM_JOB_NODELIST List of nodes allocated to the job.
2862
2863 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2864 cation.
2865
2866 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2867 ning.
2868
2869 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2870
2871 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2872 tion, if any.
2873
2874 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2875 Included for backwards compatibility.
2876
2877 SLURM_LAUNCH_NODE_IPADDR
2878 IP address of the node from which the task launch
2879 was initiated (where the srun command ran from).
2880
2881 SLURM_LOCALID Node local task ID for the process within a job.
2882
2883 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2884 masks for this node>).
2885
2886 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2887
2888 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2889 nodes).
2890
2891 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2892
2893 SLURM_MEM_BIND_VERBOSE
2894 --mem-bind verbosity (quiet,verbose).
2895
2896 SLURM_NODE_ALIASES Sets of node name, communication address and
2897 hostname for nodes allocated to the job from the
2898 cloud. Each element in the set is colon separated
2899 and each set is comma separated. For example:
2900 SLURM_NODE_ALIASES=
2901 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2902
2903 SLURM_NODEID The relative node ID of the current node.
2904
2905 SLURM_NPROCS Total number of processes in the current job or
2906 job step. See SLURM_NTASKS. Included for back‐
2907 wards compatibility.
2908
2909 SLURM_NTASKS Total number of processes in the current job or
2910 job step.
2911
2912 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2913
2914 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2915 of job submission. This value is propagated to
2916 the spawned processes.
2917
2918 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2919 rent process.
2920
2921 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2922
2923 SLURM_SRUN_COMM_PORT srun communication port.
2924
2925 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
2926 specified.
2927
2928 SLURM_SHARDS_ON_NODE Number of GPU Shards available to the step on
2929 this node.
2930
2931 SLURM_STEP_GPUS The global GPU IDs of the GPUs allocated to this
2932 step (excluding batch and interactive steps). The
2933 GPU IDs are not relative to any device cgroup,
2934 even if devices are constrained with task/cgroup.
2935
2936 SLURM_STEP_ID The step ID of the current job.
2937
2938 SLURM_STEP_LAUNCHER_PORT
2939 Step launcher port.
2940
2941 SLURM_STEP_NODELIST List of nodes allocated to the step.
2942
2943 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2944
2945 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2946 erogeneous job step.
2947
2948 SLURM_STEP_TASKS_PER_NODE
2949 Number of processes per node within the step.
2950
2951 SLURM_STEPID The step ID of the current job. See
2952 SLURM_STEP_ID. Included for backwards compatibil‐
2953 ity.
2954
2955 SLURM_SUBMIT_DIR The directory from which the allocation was in‐
2956 voked.
2957
2958 SLURM_SUBMIT_HOST The hostname of the computer from which the allo‐
2959 cation was invoked.
2960
2961 SLURM_TASK_PID The process ID of the task being started.
2962
2963 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2964 Values are comma separated and in the same order
2965 as SLURM_JOB_NODELIST. If two or more consecu‐
2966 tive nodes are to have the same task count, that
2967 count is followed by "(x#)" where "#" is the rep‐
2968 etition count. For example,
2969 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2970 first three nodes will each execute two tasks and
2971 the fourth node will execute one task.
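A minimal shell sketch (not part of Slurm, shown only to clarify the format) that expands this compressed form into one count per node:

    # expand e.g. "2(x3),1" into "2 2 2 1"
    expand_tasks_per_node() {
        for item in $(echo "$1" | tr ',' ' '); do
            count=${item%%(*}
            reps=$(echo "$item" | sed -n 's/.*(x\([0-9]*\)).*/\1/p')
            for _ in $(seq "${reps:-1}"); do printf '%s ' "$count"; done
        done
        echo
    }
    $ expand_tasks_per_node "$SLURM_TASKS_PER_NODE"
    2 2 2 1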
2972
2973 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2974 ogy/tree plugin configured. The value will be
2975 set to the names of the network switches which may be
2976 involved in the job's communications from the
2977 system's top level switch down to the leaf switch
2978 and ending with the node name. A period is used to
2979 separate each hardware component name.
2980
2981 SLURM_TOPOLOGY_ADDR_PATTERN
2982 This is set only if the system has the topol‐
2983 ogy/tree plugin configured. The value will be
2984 set to the component types listed in SLURM_TOPOL‐
2985 OGY_ADDR. Each component will be identified as
2986 either "switch" or "node". A period is used to
2987 separate each hardware component type.
2988
2989 SLURM_UMASK The umask in effect when the job was submitted.
2990
2991 SLURMD_NODENAME Name of the node running the task. In the case of
2992 a parallel job executing on multiple compute
2993 nodes, the various tasks will have this environ‐
2994 ment variable set to different values on each
2995 compute node.
2996
2997 SRUN_DEBUG Set to the logging level of the srun command.
2998 Default value is 3 (info level). The value is
2999 incremented or decremented based upon the --ver‐
3000 bose and --quiet options.
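As a quick check (hostnames and counts will differ on your system), a job step can print several of these variables directly:

    $ srun -N2 -n4 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME"'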
3001
3002 SIGNALS AND ESCAPE SEQUENCES
3003 Signals sent to the srun command are automatically forwarded to the
3004 tasks it is controlling with a few exceptions. The escape sequence
3005 <control-c> will report the state of all tasks associated with the srun
3006 command. If <control-c> is entered twice within one second, then the
3007 associated SIGINT signal will be sent to all tasks and a termination
3008 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
3009 spawned tasks. If a third <control-c> is received, the srun program
3010 will be terminated without waiting for remote tasks to exit or their
3011 I/O to complete.
3012
3013 The escape sequence <control-z> is presently ignored.
3014
3015
3016 MPI SUPPORT
3017 MPI use depends upon the type of MPI being used. There are three fun‐
3018 damentally different modes of operation used by these various MPI im‐
3019 plementations.
3020
3021 1. Slurm directly launches the tasks and performs initialization of
3022 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3023 a.out".
3024
3025 2. Slurm creates a resource allocation for the job and then mpirun
3026 launches tasks using Slurm's infrastructure (OpenMPI).
3027
3028 3. Slurm creates a resource allocation for the job and then mpirun
3029 launches tasks using some mechanism other than Slurm, such as SSH or
3030 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3031 trol. Slurm's epilog should be configured to purge these tasks when the
3032 job's allocation is relinquished, or the use of pam_slurm_adopt is
3033 highly recommended.
3034
3035 See https://slurm.schedmd.com/mpi_guide.html for more information on
3036 use of these various MPI implementations with Slurm.
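The first two modes might look like the following sketches; the MPI plugin name (pmix here) and whether mpirun is used depend on how Slurm and the MPI library were built:

    # Mode 1: srun launches the tasks and initializes communications via PMIx
    $ srun --mpi=pmix -n16 a.out

    # Mode 2: Slurm provides the allocation, mpirun launches the tasks
    $ salloc -n16
    $ mpirun a.out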
3037
3038
3039 MULTIPLE PROGRAM CONFIGURATION
3040 Comments in the configuration file must have a "#" in column one. The
3041 configuration file contains the following fields separated by white
3042 space:
3043
3044
3045 Task rank
3046 One or more task ranks to use this configuration. Multiple val‐
3047 ues may be comma separated. Ranges may be indicated with two
3048 numbers separated with a '-' with the smaller number first (e.g.
3049 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3050 ified, specify a rank of '*' as the last line of the file. If
3051 an attempt is made to initiate a task for which no executable
3052 program is defined, the following error message will be produced:
3053 "No executable program specified for this task".
3054
3055 Executable
3056 The name of the program to execute. May be a fully qualified
3057 pathname if desired.
3058
3059 Arguments
3060 Program arguments. The expression "%t" will be replaced with
3061 the task's number. The expression "%o" will be replaced with
3062 the task's offset within this range (e.g. a configured task rank
3063 value of "1-5" would have offset values of "0-4"). Single
3064 quotes may be used to avoid having the enclosed values inter‐
3065 preted. This field is optional. Any arguments for the program
3066 entered on the command line will be added to the arguments spec‐
3067 ified in the configuration file.
3068
3069 For example:
3070
3071 $ cat silly.conf
3072 ###################################################################
3073 # srun multiple program configuration file
3074 #
3075 # srun -n8 -l --multi-prog silly.conf
3076 ###################################################################
3077 4-6 hostname
3078 1,7 echo task:%t
3079 0,2-3 echo offset:%o
3080
3081 $ srun -n8 -l --multi-prog silly.conf
3082 0: offset:0
3083 1: task:1
3084 2: offset:1
3085 3: offset:2
3086 4: linux15.llnl.gov
3087 5: linux16.llnl.gov
3088 6: linux17.llnl.gov
3089 7: task:7
3090
3091
3092 EXAMPLES
3093 This simple example demonstrates the execution of the command hostname
3094 in eight tasks. At least eight processors will be allocated to the job
3095 (the same as the task count) on however many nodes are required to sat‐
3096 isfy the request. The output of each task will be preceded by its
3097 task number. (The machine "dev" in the example below has a total of
3098 two CPUs per node)
3099
3100 $ srun -n8 -l hostname
3101 0: dev0
3102 1: dev0
3103 2: dev1
3104 3: dev1
3105 4: dev2
3106 5: dev2
3107 6: dev3
3108 7: dev3
3109
3110
3111 The srun -r option is used within a job script to run two job steps on
3112 disjoint nodes in the following example. The script is run using allo‐
3113 cate mode instead of as a batch job in this case.
3114
3115 $ cat test.sh
3116 #!/bin/sh
3117 echo $SLURM_JOB_NODELIST
3118 srun -lN2 -r2 hostname
3119 srun -lN2 hostname
3120
3121 $ salloc -N4 test.sh
3122 dev[7-10]
3123 0: dev9
3124 1: dev10
3125 0: dev7
3126 1: dev8
3127
3128
3129 The following script runs two job steps in parallel within an allocated
3130 set of nodes.
3131
3132 $ cat test.sh
3133 #!/bin/bash
3134 srun -lN2 -n4 -r 2 sleep 60 &
3135 srun -lN2 -r 0 sleep 60 &
3136 sleep 1
3137 squeue
3138 squeue -s
3139 wait
3140
3141 $ salloc -N4 test.sh
3142 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3143 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3144
3145 STEPID PARTITION USER TIME NODELIST
3146 65641.0 batch grondo 0:01 dev[7-8]
3147 65641.1 batch grondo 0:01 dev[9-10]
3148
3149
3150 This example demonstrates how one executes a simple MPI job. We use
3151 srun to build a list of machines (nodes) to be used by mpirun in its
3152 required format. A sample command line and the script to be executed
3153 follow.
3154
3155 $ cat test.sh
3156 #!/bin/sh
3157 MACHINEFILE="nodes.$SLURM_JOB_ID"
3158
3159 # Generate Machinefile for mpi such that hosts are in the same
3160 # order as if run via srun
3161 #
3162 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3163
3164 # Run using generated Machine file:
3165 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3166
3167 rm $MACHINEFILE
3168
3169 $ salloc -N2 -n4 test.sh
3170
3171
3172 This simple example demonstrates the execution of different jobs on
3173 different nodes in the same srun. You can do this for any number of
3174 nodes or any number of jobs. The executables are placed on the nodes
3175 identified by the SLURM_NODEID env var, which ranges from 0 up to one
3176 less than the number of nodes specified on the srun command line.
3177
3178 $ cat test.sh
3179 case $SLURM_NODEID in
3180 0) echo "I am running on "
3181 hostname ;;
3182 1) hostname
3183 echo "is where I am running" ;;
3184 esac
3185
3186 $ srun -N2 test.sh
3187 dev0
3188 is where I am running
3189 I am running on
3190 dev1
3191
3192
3193 This example demonstrates use of multi-core options to control layout
3194 of tasks. We request that four sockets per node and two cores per
3195 socket be dedicated to the job.
3196
3197 $ srun -N2 -B 4-4:2-2 a.out
3198
3199
3200 This example shows a script in which Slurm is used to provide resource
3201 management for a job by executing the various job steps as processors
3202 become available for their dedicated use.
3203
3204 $ cat my.script
3205 #!/bin/bash
3206 srun -n4 prog1 &
3207 srun -n3 prog2 &
3208 srun -n1 prog3 &
3209 srun -n1 prog4 &
3210 wait
3211
3212
3213 This example shows how to launch an application called "server" with
3214 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3215 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3216 1 GB of memory per task.
3217
3218 $ srun -n1 -c8 --mem-per-cpu=2gb server : -n16 --mem-per-cpu=1gb client
3219
3220
3221 COPYING
3222 Copyright (C) 2006-2007 The Regents of the University of California.
3223 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3224 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3225 Copyright (C) 2010-2022 SchedMD LLC.
3226
3227 This file is part of Slurm, a resource management program. For de‐
3228 tails, see <https://slurm.schedmd.com/>.
3229
3230 Slurm is free software; you can redistribute it and/or modify it under
3231 the terms of the GNU General Public License as published by the Free
3232 Software Foundation; either version 2 of the License, or (at your op‐
3233 tion) any later version.
3234
3235 Slurm is distributed in the hope that it will be useful, but WITHOUT
3236 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3237 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3238 for more details.
3239
3240
3241 SEE ALSO
3242 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3243 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3244
3245
3246
3247October 2022 Slurm Commands srun(1)