1 srun(1)                        Slurm Commands                        srun(1)
2
3
4
5 NAME
6 srun - Run parallel jobs
7
8
9 SYNOPSIS
10 srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
11 executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
18 DESCRIPTION
19 Run a parallel job on a cluster managed by Slurm. If necessary, srun
20 will first create a resource allocation in which to run the parallel
21 job.
22
23 The following document describes the influence of various options on
24 the allocation of cpus to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
28 RETURN VALUE
29 srun will return the highest exit code of all tasks run or the highest
30 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
31 signal) of any task that exited with a signal.
32 The value 253 is reserved for out-of-memory errors.
33
34
35 EXECUTABLE PATH RESOLUTION
36 The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
44 Current working directory is the calling process working directory un‐
45 less the --chdir argument is passed, which will override the current
46 working directory.
47
48
49 OPTIONS
50 --accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
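
       For example, a minimal sketch (the GRES request and the executable
       name ./my_app are placeholders, not defined by this page):

              srun -n8 --gres=gpu:2 --accel-bind=g ./my_app

       Each task is bound to the allocated GPU(s) closest to its CPUs;
       adding the v option would also log the resulting bindings.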
65
66 -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command. This option ap‐
70 plies to job allocations.
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks. This option ap‐
108 plies to job allocations.
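
       For example, an illustrative sketch (the intervals and the
       executable name ./my_app are arbitrary placeholders):

              srun --acctg-freq=task=15,network=30 -n16 ./my_app

       This samples task accounting (including memory usage) every 15
       seconds and InfiniBand counters every 30 seconds, subject to the
       limits configured in slurm.conf.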
109
110 --bb=<spec>
111 Burst buffer specification. The form of the specification is
112 system dependent. Also see --bbf. This option applies to job
113 allocations. When the --bb option is used, Slurm parses this
114 option and creates a temporary burst buffer script file that is
115 used internally by the burst buffer plugins. See Slurm's burst
116 buffer guide for more information and examples:
117 https://slurm.schedmd.com/burst_buffer.html
118
119 --bbf=<file_name>
120 Path of file containing burst buffer specification. The form of
121 the specification is system dependent. Also see --bb. This op‐
122 tion applies to job allocations. See Slurm's burst buffer guide
123 for more information and examples:
124 https://slurm.schedmd.com/burst_buffer.html
125
126 --bcast[=<dest_path>]
127 Copy executable file to allocated compute nodes. If a file name
128 is specified, copy the executable to the specified destination
129 file path. If the path specified ends with '/' it is treated as
130 a target directory, and the destination file name will be
131 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
132 specified and the slurm.conf BcastParameters DestDir is config‐
133 ured then it is used, and the filename follows the above pat‐
134 tern. If none of the previous is specified, then --chdir is
135 used, and the filename follows the above pattern too. For exam‐
136 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
137 "a.out" from your current directory to the file "/tmp/mine" on
138 each of the three allocated compute nodes and execute that file.
139 This option applies to step allocations.
140
141 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142 Comma-separated list of absolute directory paths to be excluded
143 when autodetecting and broadcasting executable shared object de‐
144 pendencies through --bcast. If the keyword "NONE" is configured,
145 no directory paths will be excluded. The default value is that
146 of slurm.conf BcastExclude and this option overrides it. See
147 also --bcast and --send-libs.
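
       For example, an illustrative sketch (the node count and executable
       name ./my_app are placeholders):

              srun -N4 --bcast=/tmp/ --bcast-exclude=NONE ./my_app

       The executable is copied to /tmp/ on each node, and no library
       directories are excluded if shared object dependencies are also
       being broadcast (see --send-libs).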
148
149 -b, --begin=<time>
150 Defer initiation of this job until the specified time. It ac‐
151 cepts times of the form HH:MM:SS to run a job at a specific time
152 of day (seconds are optional). (If that time is already past,
153 the next day is assumed.) You may also specify midnight, noon,
154 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
155 suffixed with AM or PM for running in the morning or the
156 evening. You can also say what day the job will be run, by
157 specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
158 Combine date and time using the following format
159 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
160 count time-units, where the time-units can be seconds (default),
161 minutes, hours, days, or weeks and you can tell Slurm to run the
162 job today with the keyword today and to run the job tomorrow
163 with the keyword tomorrow. The value may be changed after job
164 submission using the scontrol command. For example:
165
166 --begin=16:00
167 --begin=now+1hour
168 --begin=now+60 (seconds by default)
169 --begin=2010-01-20T12:34:00
170
171
172 Notes on date/time specifications:
173 - Although the 'seconds' field of the HH:MM:SS time specifica‐
174 tion is allowed by the code, note that the poll time of the
175 Slurm scheduler is not precise enough to guarantee dispatch of
176 the job on the exact second. The job will be eligible to start
177 on the next poll following the specified time. The exact poll
178 interval depends on the Slurm scheduler (e.g., 60 seconds with
179 the default sched/builtin).
180 - If no time (HH:MM:SS) is specified, the default is
181 (00:00:00).
182 - If a date is specified without a year (e.g., MM/DD) then the
183 current year is assumed, unless the combination of MM/DD and
184 HH:MM:SS has already passed for that year, in which case the
185 next year is used.
186 This option applies to job allocations.
187
188 -D, --chdir=<path>
189 Have the remote processes do a chdir to path before beginning
190 execution. The default is to chdir to the current working direc‐
191 tory of the srun process. The path can be specified as full path
192 or relative path to the directory where the command is executed.
193 This option applies to job allocations.
194
195 --cluster-constraint=<list>
196 Specifies features that a federated cluster must have to have a
197 sibling job submitted to it. Slurm will attempt to submit a sib‐
198 ling job to a cluster if it has at least one of the specified
199 features.
200
201 -M, --clusters=<string>
202 Clusters to issue commands to. Multiple cluster names may be
203 comma separated. The job will be submitted to the one cluster
204 providing the earliest expected job initiation time. The default
205 value is the current cluster. A value of 'all' will query all
206 clusters. Note the --export option to control environ‐
207 ment variables exported between clusters. This option applies
208 only to job allocations. Note that the SlurmDBD must be up for
209 this option to work properly.
210
211 --comment=<string>
212 An arbitrary comment. This option applies to job allocations.
213
214 --compress[=type]
215 Compress file before sending it to compute hosts. The optional
216 argument specifies the data compression library to be used. The
217 default is BcastParameters Compression= if set or "lz4" other‐
218 wise. Supported values are "lz4". Some compression libraries
219 may be unavailable on some systems. For use with the --bcast
220 option. This option applies to step allocations.
221
222 -C, --constraint=<list>
223 Nodes can have features assigned to them by the Slurm adminis‐
224 trator. Users can specify which of these features are required
225 by their job using the constraint option. If you are looking for
226 'soft' constraints, please see --prefer for more information.
227 Only nodes having features matching the job constraints will be
228 used to satisfy the request. Multiple constraints may be speci‐
229 fied with AND, OR, matching OR, resource counts, etc. (some op‐
230 erators are not supported on all system types).
231
232 NOTE: If features that are part of the node_features/helpers
233 plugin are requested, then only the Single Name and AND options
234 are supported.
235
236 Supported --constraint options include:
237
238 Single Name
239 Only nodes which have the specified feature will be used.
240 For example, --constraint="intel"
241
242 Node Count
243 A request can specify the number of nodes needed with
244 some feature by appending an asterisk and count after the
245 feature name. For example, --nodes=16 --con‐
246 straint="graphics*4 ..." indicates that the job requires
247 16 nodes and that at least four of those nodes must have
248 the feature "graphics."
249
250 AND Only nodes with all of the specified features will be
251 used. The ampersand is used for an AND operator. For
252 example, --constraint="intel&gpu"
253
254 OR Only nodes with at least one of the specified features
255 will be used. The vertical bar is used for an OR opera‐
256 tor. For example, --constraint="intel|amd"
257
258 Matching OR
259 If only one of a set of possible options should be used
260 for all allocated nodes, then use the OR operator and en‐
261 close the options within square brackets. For example,
262 --constraint="[rack1|rack2|rack3|rack4]" might be used to
263 specify that all nodes must be allocated on a single rack
264 of the cluster, but any of those four racks can be used.
265
266 Multiple Counts
267 Specific counts of multiple resources may be specified by
268 using the AND operator and enclosing the options within
269 square brackets. For example, --con‐
270 straint="[rack1*2&rack2*4]" might be used to specify that
271 two nodes must be allocated from nodes with the feature
272 of "rack1" and four nodes must be allocated from nodes
273 with the feature "rack2".
274
275 NOTE: This construct does not support multiple Intel KNL
276 NUMA or MCDRAM modes. For example, while --con‐
277 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
278 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
279 Specification of multiple KNL modes requires the use of a
280 heterogeneous job.
281
282 NOTE: Multiple Counts can cause jobs to be allocated with
283 a non-optimal network layout.
284
285 Brackets
286 Brackets can be used to indicate that you are looking for
287 a set of nodes with the different requirements contained
288 within the brackets. For example, --con‐
289 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
290 node with either the "rack1" or "rack2" features and two
291 nodes with the "rack3" feature. The same request without
292 the brackets will try to find a single node that meets
293 those requirements.
294
295 NOTE: Brackets are only reserved for Multiple Counts and
296 Matching OR syntax. AND operators require a count for
297 each feature inside square brackets (i.e.
298 "[quad*2&hemi*1]"). Slurm will only allow a single set of
299 bracketed constraints per job.
300
301 Parentheses
302 Parentheses can be used to group like node features to‐
303 gether. For example, --con‐
304 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
305 specify that four nodes with the features "knl", "snc4"
306 and "flat" plus one node with the feature "haswell" are
307 required. All options within parentheses should be
308 grouped with AND (e.g. "&") operators.
309
310 WARNING: When srun is executed from within salloc or sbatch, the
311 constraint value can only contain a single feature name. None of
312 the other operators are currently supported for job steps.
313 This option applies to job and step allocations.
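
       For example, an illustrative sketch of the Multiple Counts form (the
       feature names rack1 and rack2 and the executable ./my_app are
       placeholders):

              srun -N6 --constraint="[rack1*2&rack2*4]" ./my_app

       This requests six nodes in total, two with the rack1 feature and
       four with the rack2 feature. As noted in the warning above, this
       form is only usable when srun itself creates the job allocation.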
314
315 --container=<path_to_container>
316 Absolute path to OCI container bundle.
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326 -S, --core-spec=<num>
327 Count of specialized cores per node reserved by the job for sys‐
328 tem operations and not used by the application. The application
329 will not use these cores, but will be charged for their alloca‐
330 tion. Default value is dependent upon the node's configured
331 CoreSpecCount value. If a value of zero is designated and the
332 Slurm configuration option AllowSpecResourcesUsage is enabled,
333 the job will be allowed to override CoreSpecCount and use the
334 specialized resources on nodes it is allocated. This option can
335 not be used with the --thread-spec option. This option applies
336 to job allocations.
337
338 NOTE: This option may implicitly impact the number of tasks if
339 -n was not specified.
340
341 NOTE: Explicitly setting a job's specialized core value implic‐
342 itly sets its --exclusive option, reserving entire nodes for the
343 job.
344
345 --cores-per-socket=<cores>
346 Restrict node selection to nodes with at least the specified
347 number of cores per socket. See additional information under -B
348 option above when task/affinity plugin is enabled. This option
349 applies to job allocations.
350
351 --cpu-bind=[{quiet|verbose},]<type>
352 Bind tasks to CPUs. Used only when the task/affinity plugin is
353 enabled. NOTE: To have Slurm always report on the selected CPU
354 binding for all commands executed in a shell, you can enable
355 verbose mode by setting the SLURM_CPU_BIND environment variable
356 value to "verbose".
357
358 The following informational environment variables are set when
359 --cpu-bind is in use:
360
361 SLURM_CPU_BIND_VERBOSE
362 SLURM_CPU_BIND_TYPE
363 SLURM_CPU_BIND_LIST
364
365 See the ENVIRONMENT VARIABLES section for a more detailed de‐
366 scription of the individual SLURM_CPU_BIND variables. These
367 variables are available only if the task/affinity plugin is con‐
368 figured.
369
370 When using --cpus-per-task to run multithreaded tasks, be aware
371 that CPU binding is inherited from the parent of the process.
372 This means that the multithreaded task should either specify or
373 clear the CPU binding itself to avoid having all threads of the
374 multithreaded task use the same mask/CPU as the parent. Alter‐
375 natively, fat masks (masks which specify more than one allowed
376 CPU) could be used for the tasks in order to provide multiple
377 CPUs for the multithreaded tasks.
378
379 Note that a job step can be allocated different numbers of CPUs
380 on each node or be allocated CPUs not starting at location zero.
381 Therefore one of the options which automatically generate the
382 task binding is recommended. Explicitly specified masks or
383 bindings are only honored when the job step has been allocated
384 every available CPU on the node.
385
386 Binding a task to a NUMA locality domain means to bind the task
387 to the set of CPUs that belong to the NUMA locality domain or
388 "NUMA node". If NUMA locality domain options are used on sys‐
389 tems with no NUMA support, then each socket is considered a lo‐
390 cality domain.
391
392 If the --cpu-bind option is not used, the default binding mode
393 will depend upon Slurm's configuration and the step's resource
394 allocation. If all allocated nodes have the same configured
395 CpuBind mode, that will be used. Otherwise if the job's Parti‐
396 tion has a configured CpuBind mode, that will be used. Other‐
397 wise if Slurm has a configured TaskPluginParam value, that mode
398 will be used. Otherwise automatic binding will be performed as
399 described below.
400
401 Auto Binding
402 Applies only when task/affinity is enabled. If the job
403 step allocation includes an allocation with a number of
404 sockets, cores, or threads equal to the number of tasks
405 times cpus-per-task, then the tasks will by default be
406 bound to the appropriate resources (auto binding). Dis‐
407 able this mode of operation by explicitly setting
408 "--cpu-bind=none". Use TaskPluginParam=auto‐
409 bind=[threads|cores|sockets] to set a default cpu binding
410 in case "auto binding" doesn't find a match.
411
412 Supported options include:
413
414 q[uiet]
415 Quietly bind before task runs (default)
416
417 v[erbose]
418 Verbosely report binding before task runs
419
420 no[ne] Do not bind tasks to CPUs (default unless auto
421 binding is applied)
422
423 rank Automatically bind by task rank. The lowest num‐
424 bered task on each node is bound to socket (or
425 core or thread) zero, etc. Not supported unless
426 the entire node is allocated to the job.
427
428 map_cpu:<list>
429 Bind by setting CPU masks on tasks (or ranks) as
430 specified where <list> is
431 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... If
432 the number of tasks (or ranks) exceeds the number
433 of elements in this list, elements in the list
434 will be reused as needed starting from the begin‐
435 ning of the list. To simplify support for large
436 task counts, the lists may follow a map with an
437 asterisk and repetition count. For example
438 "map_cpu:0*4,3*4".
439
440 mask_cpu:<list>
441 Bind by setting CPU masks on tasks (or ranks) as
442 specified where <list> is
443 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
444 The mapping is specified for a node and identical
445 mapping is applied to the tasks on every node
446 (i.e. the lowest task ID on each node is mapped to
447 the first mask specified in the list, etc.). CPU
448 masks are always interpreted as hexadecimal values
449 but can be preceded with an optional '0x'. If the
450 number of tasks (or ranks) exceeds the number of
451 elements in this list, elements in the list will
452 be reused as needed starting from the beginning of
453 the list. To simplify support for large task
454 counts, the lists may follow a map with an aster‐
455 isk and repetition count. For example
456 "mask_cpu:0x0f*4,0xf0*4".
457
458 rank_ldom
459 Bind to a NUMA locality domain by rank. Not sup‐
460 ported unless the entire node is allocated to the
461 job.
462
463 map_ldom:<list>
464 Bind by mapping NUMA locality domain IDs to tasks
465 as specified where <list> is
466 <ldom1>,<ldom2>,...<ldomN>. The locality domain
467 IDs are interpreted as decimal values unless they
468 are preceded with '0x' in which case they are in‐
469 terpreted as hexadecimal values. Not supported
470 unless the entire node is allocated to the job.
471
472 mask_ldom:<list>
473 Bind by setting NUMA locality domain masks on
474 tasks as specified where <list> is
475 <mask1>,<mask2>,...<maskN>. NUMA locality domain
476 masks are always interpreted as hexadecimal values
477 but can be preceded with an optional '0x'. Not
478 supported unless the entire node is allocated to
479 the job.
480
481 sockets
482 Automatically generate masks binding tasks to
483 sockets. Only the CPUs on the socket which have
484 been allocated to the job will be used. If the
485 number of tasks differs from the number of allo‐
486 cated sockets this can result in sub-optimal bind‐
487 ing.
488
489 cores Automatically generate masks binding tasks to
490 cores. If the number of tasks differs from the
491 number of allocated cores this can result in
492 sub-optimal binding.
493
494 threads
495 Automatically generate masks binding tasks to
496 threads. If the number of tasks differs from the
497 number of allocated threads this can result in
498 sub-optimal binding.
499
500 ldoms Automatically generate masks binding tasks to NUMA
501 locality domains. If the number of tasks differs
502 from the number of allocated locality domains this
503 can result in sub-optimal binding.
504
505 help Show help message for cpu-bind
506
507 This option applies to job and step allocations.
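
       For example, an illustrative sketch (the executable ./my_app is a
       placeholder and the explicit CPU IDs assume the step was allocated
       every CPU on the node):

              srun -n8 --cpu-bind=verbose,cores ./my_app
              srun -n4 --cpu-bind=verbose,map_cpu:0,2,4,6 ./my_app

       The first line lets Slurm generate per-core masks and report them;
       the second explicitly pins tasks 0-3 to CPUs 0, 2, 4 and 6.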
508
509 --cpu-freq=<p1>[-p2[:p3]]
510
511 Request that the job step initiated by this srun command be run
512 at some requested frequency if possible, on the CPUs selected
513 for the step on the compute node(s).
514
515 p1 can be [#### | low | medium | high | highm1] which will set
516 the frequency scaling_speed to the corresponding value, and set
517 the frequency scaling_governor to UserSpace. See below for defi‐
518 nition of the values.
519
520 p1 can be [Conservative | OnDemand | Performance | PowerSave]
521 which will set the scaling_governor to the corresponding value.
522 The governor has to be in the list set by the slurm.conf option
523 CpuFreqGovernors.
524
525 When p2 is present, p1 will be the minimum scaling frequency and
526 p2 will be the maximum scaling frequency.
527
528 p2 can be [#### | medium | high | highm1]. p2 must be greater
529 than p1.
530
531 p3 can be [Conservative | OnDemand | Performance | PowerSave |
532 SchedUtil | UserSpace] which will set the governor to the corre‐
533 sponding value.
534
535 If p3 is UserSpace, the frequency scaling_speed will be set by a
536 power or energy aware scheduling strategy to a value between p1
537 and p2 that lets the job run within the site's power goal. The
538 job may be delayed if p1 is higher than a frequency that allows
539 the job to run within the goal.
540
541 If the current frequency is < min, it will be set to min. Like‐
542 wise, if the current frequency is > max, it will be set to max.
543
544 Acceptable values at present include:
545
546 #### frequency in kilohertz
547
548 Low the lowest available frequency
549
550 High the highest available frequency
551
552 HighM1 (high minus one) will select the next highest
553 available frequency
554
555 Medium attempts to set a frequency in the middle of the
556 available range
557
558 Conservative attempts to use the Conservative CPU governor
559
560 OnDemand attempts to use the OnDemand CPU governor (the de‐
561 fault value)
562
563 Performance attempts to use the Performance CPU governor
564
565 PowerSave attempts to use the PowerSave CPU governor
566
567 UserSpace attempts to use the UserSpace CPU governor
568
569 The following informational environment variable is set in the
570 job step when the --cpu-freq option is requested:
571
572 SLURM_CPU_FREQ_REQ
573
574 This environment variable can also be used to supply the value
575 for the CPU frequency request if it is set when the 'srun' com‐
576 mand is issued. The --cpu-freq on the command line will over‐
577 ride the environment variable value. The form of the environ‐
578 ment variable is the same as the command line. See the ENVIRON‐
579 MENT VARIABLES section for a description of the
580 SLURM_CPU_FREQ_REQ variable.
581
582 NOTE: This parameter is treated as a request, not a requirement.
583 If the job step's node does not support setting the CPU fre‐
584 quency, or the requested value is outside the bounds of the le‐
585 gal frequencies, an error is logged, but the job step is allowed
586 to continue.
587
588 NOTE: Setting the frequency for just the CPUs of the job step
589 implies that the tasks are confined to those CPUs. If task con‐
590 finement (i.e. the task/affinity TaskPlugin is enabled, or the
591 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
592 in cgroup.conf) is not configured, this parameter is ignored.
593
594 NOTE: When the step completes, the frequency and governor of
595 each selected CPU is reset to the previous values.
596
597 NOTE: Submitting jobs with the --cpu-freq option when linuxproc
598 is the ProctrackType can cause jobs to run too quickly, complet‐
599 ing before accounting is able to poll for job information. As a
600 result, not all of the accounting information will be present.
601
602 This option applies to job and step allocations.
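
       For example, an illustrative sketch (frequencies are in kilohertz;
       the values, governor availability and the executable ./my_app are
       placeholders):

              srun -n4 --cpu-freq=Performance ./my_app
              srun -n4 --cpu-freq=2000000-2600000:OnDemand ./my_app

       The first requests the Performance governor; the second requests a
       2.0-2.6 GHz range managed by the OnDemand governor, provided that
       governor is listed in CpuFreqGovernors.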
603
604 --cpus-per-gpu=<ncpus>
605 Advise Slurm that ensuing job steps will require ncpus proces‐
606 sors per allocated GPU. Not compatible with the --cpus-per-task
607 option.
608
609 -c, --cpus-per-task=<ncpus>
610 Request that ncpus be allocated per process. This may be useful
611 if the job is multithreaded and requires more than one CPU per
612 task for optimal performance. Explicitly requesting this option
613 implies --exact. The default is one CPU per process and does not
614 imply --exact. If -c is specified without -n, as many tasks
615 will be allocated per node as possible while satisfying the -c
616 restriction. For instance on a cluster with 8 CPUs per node, a
617 job request for 4 nodes and 3 CPUs per task may be allocated 3
618 or 6 CPUs per node (1 or 2 tasks per node) depending upon re‐
619 source consumption by other jobs. Such a job may be unable to
620 execute more than a total of 4 tasks.
621
622 WARNING: There are configurations and options interpreted dif‐
623 ferently by job and job step requests which can result in incon‐
624 sistencies for this option. For example srun -c2
625 --threads-per-core=1 prog may allocate two cores for the job,
626 but if each of those cores contains two threads, the job alloca‐
627 tion will include four CPUs. The job step allocation will then
628 launch two threads per CPU for a total of two tasks.
629
630 WARNING: When srun is executed from within salloc or sbatch,
631 there are configurations and options which can result in incon‐
632 sistent allocations when -c has a value greater than -c on sal‐
633 loc or sbatch. The number of cpus per task specified for salloc
634 or sbatch is not automatically inherited by srun and, if de‐
635 sired, must be requested again, either by specifying
636 --cpus-per-task when calling srun, or by setting the
637 SRUN_CPUS_PER_TASK environment variable.
638
639 This option applies to job and step allocations.
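
       For example, an illustrative sketch inside an sbatch script that was
       submitted with --cpus-per-task=8 (the executable ./my_threaded_app
       is a placeholder):

              export SRUN_CPUS_PER_TASK=8
              srun -n4 ./my_threaded_app

       Setting SRUN_CPUS_PER_TASK (or passing -c8 directly to srun) is
       needed because the batch job's --cpus-per-task value is not
       automatically inherited by the step.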
640
641 --deadline=<OPT>
642 remove the job if no ending is possible before this deadline
643 (start > (deadline - time[-min])). Default is no deadline.
644 Valid time formats are:
645 HH:MM[:SS] [AM|PM]
646 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
647 MM/DD[/YY]-HH:MM[:SS]
648 YYYY-MM-DD[THH:MM[:SS]]
649 now[+count[seconds(default)|minutes|hours|days|weeks]]
650
651 This option applies only to job allocations.
652
653 --delay-boot=<minutes>
654 Do not reboot nodes in order to satisfy this job's feature
655 specification if the job has been eligible to run for less than
656 this time period. If the job has waited for less than the spec‐
657 ified period, it will use only nodes which already have the
658 specified features. The argument is in units of minutes. A de‐
659 fault value may be set by a system administrator using the de‐
660 lay_boot option of the SchedulerParameters configuration parame‐
661 ter in the slurm.conf file, otherwise the default value is zero
662 (no delay).
663
664 This option applies only to job allocations.
665
666 -d, --dependency=<dependency_list>
667 Defer the start of this job until the specified dependencies
668 have been satisfied. This option does not apply to job
669 steps (executions of srun within an existing salloc or sbatch
670 allocation), only to job allocations. <dependency_list> is of
671 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
672 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
673 must be satisfied if the "," separator is used. Any dependency
674 may be satisfied if the "?" separator is used. Only one separa‐
675 tor may be used. For instance:
676 -d afterok:20:21,afterany:23
677 means that the job can run only after a 0 return code of jobs 20
678 and 21 AND the completion of job 23. However:
679 -d afterok:20:21?afterany:23
680 means that any of the conditions (afterok:20 OR afterok:21 OR
681 afterany:23) will be enough to release the job. Many jobs can
682 share the same dependency and these jobs may even belong to dif‐
683 ferent users. The value may be changed after job submission
684 using the scontrol command. Dependencies on remote jobs are al‐
685 lowed in a federation. Once a job dependency fails due to the
686 termination state of a preceding job, the dependent job will
687 never be run, even if the preceding job is requeued and has a
688 different termination state in a subsequent execution. This op‐
689 tion applies to job allocations.
690
691 after:job_id[[+time][:jobid[+time]...]]
692 After the specified jobs start or are cancelled and
693 'time' in minutes from job start or cancellation happens,
694 this job can begin execution. If no 'time' is given then
695 there is no delay after start or cancellation.
696
697 afterany:job_id[:jobid...]
698 This job can begin execution after the specified jobs
699 have terminated. This is the default dependency type.
700
701 afterburstbuffer:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have terminated and any associated burst buffer stage out
704 operations have completed.
705
706 aftercorr:job_id[:jobid...]
707 A task of this job array can begin execution after the
708 corresponding task ID in the specified job has completed
709 successfully (ran to completion with an exit code of
710 zero).
711
712 afternotok:job_id[:jobid...]
713 This job can begin execution after the specified jobs
714 have terminated in some failed state (non-zero exit code,
715 node failure, timed out, etc).
716
717 afterok:job_id[:jobid...]
718 This job can begin execution after the specified jobs
719 have successfully executed (ran to completion with an
720 exit code of zero).
721
722 singleton
723 This job can begin execution after any previously
724 launched jobs sharing the same job name and user have
725 terminated. In other words, only one job by that name
726 and owned by that user can be running or suspended at any
727 point in time. In a federation, a singleton dependency
728 must be fulfilled on all clusters unless DependencyParam‐
729 eters=disable_remote_singleton is used in slurm.conf.
730
731 -X, --disable-status
732 Disable the display of task status when srun receives a single
733 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
734 running job. Without this option a second Ctrl-C in one second
735 is required to forcibly terminate the job and srun will immedi‐
736 ately exit. May also be set via the environment variable
737 SLURM_DISABLE_STATUS. This option applies to job allocations.
738
739 -m, --distribution={*|block|cyclic|arbi‐
740 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
741
742 Specify alternate distribution methods for remote processes.
743 For job allocation, this sets environment variables that will be
744 used by subsequent srun requests. Task distribution affects job
745 allocation at the last stage of the evaluation of available re‐
746 sources by the cons_res and cons_tres plugins. Consequently,
747 other options (e.g. --ntasks-per-node, --cpus-per-task) may af‐
748 fect resource selection prior to task distribution. To ensure a
749 specific task distribution jobs should have access to whole
750 nodes, for instance by using the --exclusive flag.
751
752 This option controls the distribution of tasks to the nodes on
753 which resources have been allocated, and the distribution of
754 those resources to tasks for binding (task affinity). The first
755 distribution method (before the first ":") controls the distri‐
756 bution of tasks to nodes. The second distribution method (after
757 the first ":") controls the distribution of allocated CPUs
758 across sockets for binding to tasks. The third distribution
759 method (after the second ":") controls the distribution of allo‐
760 cated CPUs across cores for binding to tasks. The second and
761 third distributions apply only if task affinity is enabled. The
762 third distribution is supported only if the task/cgroup plugin
763 is configured. The default value for each distribution type is
764 specified by *.
765
766 Note that with select/cons_res and select/cons_tres, the number
767 of CPUs allocated to each socket and node may be different. Re‐
768 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
769 mation on resource allocation, distribution of tasks to nodes,
770 and binding of tasks to CPUs.
771 First distribution method (distribution of tasks across nodes):
772
773
774 * Use the default method for distributing tasks to nodes
775 (block).
776
777 block The block distribution method will distribute tasks to a
778 node such that consecutive tasks share a node. For exam‐
779 ple, consider an allocation of three nodes each with two
780 cpus. A four-task block distribution request will dis‐
781 tribute those tasks to the nodes with tasks one and two
782 on the first node, task three on the second node, and
783 task four on the third node. Block distribution is the
784 default behavior if the number of tasks exceeds the num‐
785 ber of allocated nodes.
786
787 cyclic The cyclic distribution method will distribute tasks to a
788 node such that consecutive tasks are distributed over
789 consecutive nodes (in a round-robin fashion). For exam‐
790 ple, consider an allocation of three nodes each with two
791 cpus. A four-task cyclic distribution request will dis‐
792 tribute those tasks to the nodes with tasks one and four
793 on the first node, task two on the second node, and task
794 three on the third node. Note that when SelectType is
795 select/cons_res, the same number of CPUs may not be allo‐
796 cated on each node. Task distribution will be round-robin
797 among all the nodes with CPUs yet to be assigned to
798 tasks. Cyclic distribution is the default behavior if
799 the number of tasks is no larger than the number of allo‐
800 cated nodes.
801
802 plane The tasks are distributed in blocks of size <size>. The
803 size must be given or SLURM_DIST_PLANESIZE must be set.
804 The number of tasks distributed to each node is the same
805 as for cyclic distribution, but the taskids assigned to
806 each node depend on the plane size. Additional distribu‐
807 tion specifications cannot be combined with this option.
808 For more details (including examples and diagrams),
809 please see https://slurm.schedmd.com/mc_support.html and
810 https://slurm.schedmd.com/dist_plane.html
811
812 arbitrary
813 The arbitrary method of distribution will allocate pro‐
814 cesses in-order as listed in file designated by the envi‐
815 ronment variable SLURM_HOSTFILE. If this variable is
816 listed it will override any other method specified. If not
817 set, the method will default to block. The hostfile must
818 contain at least the number of hosts requested, one per
819 line or comma separated. If spec‐
820 ifying a task count (-n, --ntasks=<number>), your tasks
821 will be laid out on the nodes in the order of the file.
822 NOTE: The arbitrary distribution option on a job alloca‐
823 tion only controls the nodes to be allocated to the job
824 and not the allocation of CPUs on those nodes. This op‐
825 tion is meant primarily to control a job step's task lay‐
826 out in an existing job allocation for the srun command.
827 NOTE: If the number of tasks is given and a list of re‐
828 quested nodes is also given, the number of nodes used
829 from that list will be reduced to match that of the num‐
830 ber of tasks if the number of nodes in the list is
831 greater than the number of tasks.
832
833 Second distribution method (distribution of CPUs across sockets
834 for binding):
835
836
837 * Use the default method for distributing CPUs across sock‐
838 ets (cyclic).
839
840 block The block distribution method will distribute allocated
841 CPUs consecutively from the same socket for binding to
842 tasks, before using the next consecutive socket.
843
844 cyclic The cyclic distribution method will distribute allocated
845 CPUs for binding to a given task consecutively from the
846 same socket, and from the next consecutive socket for the
847 next task, in a round-robin fashion across sockets.
848 Tasks requiring more than one CPU will have all of those
849 CPUs allocated on a single socket if possible.
850
851 fcyclic
852 The fcyclic distribution method will distribute allocated
853 CPUs for binding to tasks from consecutive sockets in a
854 round-robin fashion across the sockets. Tasks requiring
855 more than one CPU will have each CPU allocated in a
856 cyclic fashion across sockets.
857
858 Third distribution method (distribution of CPUs across cores for
859 binding):
860
861
862 * Use the default method for distributing CPUs across cores
863 (inherited from second distribution method).
864
865 block The block distribution method will distribute allocated
866 CPUs consecutively from the same core for binding to
867 tasks, before using the next consecutive core.
868
869 cyclic The cyclic distribution method will distribute allocated
870 CPUs for binding to a given task consecutively from the
871 same core, and from the next consecutive core for the
872 next task, in a round-robin fashion across cores.
873
874 fcyclic
875 The fcyclic distribution method will distribute allocated
876 CPUs for binding to tasks from consecutive cores in a
877 round-robin fashion across the cores.
878
879 Optional control for task distribution over nodes:
880
881
882 Pack Rather than distributing a job step's tasks evenly
883 across its allocated nodes, pack them as tightly as pos‐
884 sible on the nodes. This only applies when the "block"
885 task distribution method is used.
886
887 NoPack Rather than packing a job step's tasks as tightly as pos‐
888 sible on the nodes, distribute them evenly. This user
889 option will supersede the SelectTypeParameters
890 CR_Pack_Nodes configuration parameter.
891
892 This option applies to job and step allocations.
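
       For example, an illustrative sketch (the node and task counts and
       the executable ./my_app are placeholders):

              srun -N2 -n8 --distribution=cyclic:block ./my_app

       Tasks are assigned to the two nodes in round-robin order, and the
       CPUs bound to each task are taken consecutively from one socket
       before the next socket is used.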
893
894 --epilog={none|<executable>}
895 srun will run executable just after the job step completes. The
896 command line arguments for executable will be the command and
897 arguments of the job step. If none is specified, then no srun
898 epilog will be run. This parameter overrides the SrunEpilog pa‐
899 rameter in slurm.conf. This parameter is completely independent
900 from the Epilog parameter in slurm.conf. This option applies to
901 job allocations.
902
903 -e, --error=<filename_pattern>
904 Specify how stderr is to be redirected. By default in interac‐
905 tive mode, srun redirects stderr to the same file as stdout, if
906 one is specified. The --error option is provided to allow stdout
907 and stderr to be redirected to different locations. See IO Re‐
908 direction below for more options. If the specified file already
909 exists, it will be overwritten. This option applies to job and
910 step allocations.
911
912 --exact
913 Allow a step access to only the resources requested for the
914 step. By default, all non-GRES resources on each node in the
915 step allocation will be used. This option only applies to step
916 allocations.
917 NOTE: Parallel steps will either be blocked or rejected until
918 requested step resources are available unless --overlap is spec‐
919 ified. Job resources can be held after the completion of an srun
920 command while Slurm does job cleanup. Step epilogs and/or SPANK
921 plugins can further delay the release of step resources.
922
923 -x, --exclude={<host1[,<host2>...]|<filename>}
924 Request that a specific list of hosts not be included in the re‐
925 sources allocated to this job. The host list will be assumed to
926 be a filename if it contains a "/" character. This option ap‐
927 plies to job and step allocations.
928
929 --exclusive[={user|mcs}]
930 This option applies to job and job step allocations, and has two
931 slightly different meanings for each one. When used to initiate
932 a job, the job allocation cannot share nodes with other running
933 jobs (or just other users with the "=user" option or "=mcs" op‐
934 tion). If user/mcs are not specified (i.e. the job allocation
935 can not share nodes with other running jobs), the job is allo‐
936 cated all CPUs and GRES on all nodes in the allocation, but is
937 only allocated as much memory as it requested. This is by design
938 to support gang scheduling, because suspended jobs still reside
939 in memory. To request all the memory on a node, use --mem=0.
940 The default shared/exclusive behavior depends on system configu‐
941 ration and the partition's OverSubscribe option takes precedence
942 over the job's option. NOTE: Since shared GRES (MPS) cannot be
943 allocated at the same time as a sharing GRES (GPU) this option
944 only allocates all sharing GRES and no underlying shared GRES.
945
946 This option can also be used when initiating more than one job
947 step within an existing resource allocation (default), where you
948 want separate processors to be dedicated to each job step. If
949 sufficient processors are not available to initiate the job
950 step, it will be deferred. This can be thought of as providing a
951 mechanism for resource management to the job within its alloca‐
952 tion (--exact implied).
953
954 The exclusive allocation of CPUs applies to job steps by de‐
955 fault, but --exact is NOT the default. In other words, the de‐
956 fault behavior is this: job steps will not share CPUs, but job
957 steps will be allocated all CPUs available to the job on all
958 nodes allocated to the steps.
959
960 In order to share the resources use the --overlap option.
961
962 See EXAMPLE below.
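
       For example, an illustrative sketch of running two steps side by
       side inside an existing 8-CPU allocation (./step_a and ./step_b are
       placeholders):

              srun -n4 --exact ./step_a &
              srun -n4 --exact ./step_b &
              wait

       With --exact each step is limited to the CPUs it requested, so both
       steps can run concurrently instead of the second being deferred.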
963
964 --export={[ALL,]<environment_variables>|ALL|NONE}
965 Identify which environment variables from the submission envi‐
966 ronment are propagated to the launched application.
967
968 --export=ALL
969 Default mode if --export is not specified. All of the
970 user's environment will be loaded from the caller's
971 environment.
972
973 --export=NONE
974 None of the user environment will be defined. User
975 must use absolute path to the binary to be executed
976 that will define the environment. User can not specify
977 explicit environment variables with "NONE".
978
979 This option is particularly important for jobs that
980 are submitted on one cluster and execute on a differ‐
981 ent cluster (e.g. with different paths). To avoid
982 steps inheriting environment export settings (e.g.
983 "NONE") from sbatch command, either set --export=ALL
984 or the environment variable SLURM_EXPORT_ENV should be
985 set to "ALL".
986
987 --export=[ALL,]<environment_variables>
988 Exports all SLURM* environment variables along with
989 explicitly defined variables. Multiple environment
990 variable names should be comma separated. Environment
991 variable names may be specified to propagate the cur‐
992 rent value (e.g. "--export=EDITOR") or specific values
993 may be exported (e.g. "--export=EDITOR=/bin/emacs").
994 If "ALL" is specified, then all user environment vari‐
995 ables will be loaded and will take precedence over any
996 explicitly given environment variables.
997
998 Example: --export=EDITOR,ARG1=test
999 In this example, the propagated environment will only
1000 contain the variable EDITOR from the user's environ‐
1001 ment, SLURM_* environment variables, and ARG1=test.
1002
1003 Example: --export=ALL,EDITOR=/bin/emacs
1004 There are two possible outcomes for this example. If
1005 the caller has the EDITOR environment variable de‐
1006 fined, then the job's environment will inherit the
1007 variable from the caller's environment. If the caller
1008 doesn't have an environment variable defined for EDI‐
1009 TOR, then the job's environment will use the value
1010 given by --export.
1011
1012 -B, --extra-node-info=<sockets>[:cores[:threads]]
1013 Restrict node selection to nodes with at least the specified
1014 number of sockets, cores per socket and/or threads per core.
1015 NOTE: These options do not specify the resource allocation size.
1016 Each value specified is considered a minimum. An asterisk (*)
1017 can be used as a placeholder indicating that all available re‐
1018 sources of that type are to be utilized. Values can also be
1019 specified as min-max. The individual levels can also be speci‐
1020 fied in separate options if desired:
1021
1022 --sockets-per-node=<sockets>
1023 --cores-per-socket=<cores>
1024 --threads-per-core=<threads>
1025 If task/affinity plugin is enabled, then specifying an alloca‐
1026 tion in this manner also sets a default --cpu-bind option of
1027 threads if the -B option specifies a thread count, otherwise an
1028 option of cores if a core count is specified, otherwise an op‐
1029 tion of sockets. If SelectType is configured to se‐
1030 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1031 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1032 ored. If not specified, the scontrol show job will display
1033 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1034 NOTE: This option is mutually exclusive with --hint,
1035 --threads-per-core and --ntasks-per-core.
1036 NOTE: If the number of sockets, cores and threads were all spec‐
1037 ified, the number of nodes was specified (as a fixed number, not
1038 a range) and the number of tasks was NOT specified, srun will
1039 implicitly calculate the number of tasks as one task per thread.
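
       For example, an illustrative sketch (the socket and core counts and
       the executable ./my_app are placeholders):

              srun -N2 -B 2:4 ./my_app

       This restricts selection to nodes with at least two sockets and at
       least four cores per socket and, with the task/affinity plugin,
       implies a default of --cpu-bind=cores.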
1040
1041 --gid=<group>
1042 If srun is run as root, and the --gid option is used, submit the
1043 job with group's group access permissions. group may be the
1044 group name or the numerical group ID. This option applies to job
1045 allocations.
1046
1047 --gpu-bind=[verbose,]<type>
1048 Bind tasks to specific GPUs. By default every spawned task can
1049 access every GPU allocated to the step. If "verbose," is speci‐
1050 fied before <type>, then print out GPU binding debug information
1051 to the stderr of the tasks. GPU binding is ignored if there is
1052 only one task.
1053
1054 Supported type options:
1055
1056 closest Bind each task to the GPU(s) which are closest. In a
1057 NUMA environment, each task may be bound to more than
1058 one GPU (i.e. all GPUs in that NUMA environment).
1059
1060 map_gpu:<list>
1061 Bind by setting GPU masks on tasks (or ranks) as spec‐
1062 ified where <list> is
1063 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
1064 are interpreted as decimal values. If the number of
1065 tasks (or ranks) exceeds the number of elements in
1066 this list, elements in the list will be reused as
1067 needed starting from the beginning of the list. To
1068 simplify support for large task counts, the lists may
1069 follow a map with an asterisk and repetition count.
1070 For example "map_gpu:0*4,1*4". If the task/cgroup
1071 plugin is used and ConstrainDevices is set in
1072 cgroup.conf, then the GPU IDs are zero-based indexes
1073 relative to the GPUs allocated to the job (e.g. the
1074 first GPU is 0, even if the global ID is 3). Other‐
1075 wise, the GPU IDs are global IDs, and all GPUs on each
1076 node in the job should be allocated for predictable
1077 binding results.
1078
1079 mask_gpu:<list>
1080 Bind by setting GPU masks on tasks (or ranks) as spec‐
1081 ified where <list> is
1082 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1083 mapping is specified for a node and identical mapping
1084 is applied to the tasks on every node (i.e. the lowest
1085 task ID on each node is mapped to the first mask spec‐
1086 ified in the list, etc.). GPU masks are always inter‐
1087 preted as hexadecimal values but can be preceded with
1088 an optional '0x'. To simplify support for large task
1089 counts, the lists may follow a map with an asterisk
1090 and repetition count. For example
1091 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1092 is used and ConstrainDevices is set in cgroup.conf,
1093 then the GPU IDs are zero-based indexes relative to
1094 the GPUs allocated to the job (e.g. the first GPU is
1095 0, even if the global ID is 3). Otherwise, the GPU IDs
1096 are global IDs, and all GPUs on each node in the job
1097 should be allocated for predictable binding results.
1098
1099 none Do not bind tasks to GPUs (turns off binding if
1100 --gpus-per-task is requested).
1101
1102 per_task:<gpus_per_task>
1103 Each task will be bound to the number of gpus speci‐
1104 fied in <gpus_per_task>. Gpus are assigned in order to
1105 tasks. The first task will be assigned the first x
1106 number of gpus on the node, etc.
1107
1108 single:<tasks_per_gpu>
1109 Like --gpu-bind=closest, except that each task can
1110 only be bound to a single GPU, even when it can be
1111 bound to multiple GPUs that are equally close. The
1112 GPU to bind to is determined by <tasks_per_gpu>, where
1113 the first <tasks_per_gpu> tasks are bound to the first
1114 GPU available, the second <tasks_per_gpu> tasks are
1115 bound to the second GPU available, etc. This is basi‐
1116 cally a block distribution of tasks onto available
1117 GPUs, where the available GPUs are determined by the
1118 socket affinity of the task and the socket affinity of
1119 the GPUs as specified in gres.conf's Cores parameter.
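
       For example, an illustrative sketch (the GPU count and the
       executable ./gpu_app are placeholders):

              srun -n4 --gpus-per-node=4 --gpu-bind=verbose,per_task:1 ./gpu_app

       Each of the four tasks is bound to one of the node's four GPUs and
       the chosen binding is written to each task's stderr.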
1120
1121 --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1122 Request that GPUs allocated to the job are configured with spe‐
1123 cific frequency values. This option can be used to indepen‐
1124 dently configure the GPU and its memory frequencies. After the
1125 job is completed, the frequencies of all affected GPUs will be
1126 reset to the highest possible values. In some cases, system
1127 power caps may override the requested values. The field type
1128 can be "memory". If type is not specified, the GPU frequency is
1129 implied. The value field can either be "low", "medium", "high",
1130 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1131 fied numeric value is not possible, a value as close as possible
1132 will be used. See below for definition of the values. The ver‐
1133 bose option causes current GPU frequency information to be
1134 logged. Examples of use include "--gpu-freq=medium,memory=high"
1135 and "--gpu-freq=450".
1136
1137 Supported value definitions:
1138
1139 low the lowest available frequency.
1140
1141 medium attempts to set a frequency in the middle of the
1142 available range.
1143
1144 high the highest available frequency.
1145
1146 highm1 (high minus one) will select the next highest avail‐
1147 able frequency.
1148
1149 -G, --gpus=[type:]<number>
1150 Specify the total number of GPUs required for the job. An op‐
1151 tional GPU type specification can be supplied. For example
1152 "--gpus=volta:3". Multiple options can be requested in a comma
1153 separated list, for example: "--gpus=volta:3,kepler:1". See
1154 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1155 options.
1156 NOTE: The allocation has to contain at least one GPU per node.
1157
1158 --gpus-per-node=[type:]<number>
1159 Specify the number of GPUs required for the job on each node in‐
1160 cluded in the job's resource allocation. An optional GPU type
1161 specification can be supplied. For example
1162 "--gpus-per-node=volta:3". Multiple options can be requested in
1163 a comma separated list, for example:
1164 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1165 --gpus-per-socket and --gpus-per-task options.
1166
1167 --gpus-per-socket=[type:]<number>
1168 Specify the number of GPUs required for the job on each socket
1169 included in the job's resource allocation. An optional GPU type
1170 specification can be supplied. For example
1171 "--gpus-per-socket=volta:3". Multiple options can be requested
1172 in a comma separated list, for example:
1173 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
1174 sockets per node count (--sockets-per-node). See also the
1175 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1176 tion applies to job allocations.
1177
1178 --gpus-per-task=[type:]<number>
1179 Specify the number of GPUs required for the job on each task to
1180 be spawned in the job's resource allocation. An optional GPU
1181 type specification can be supplied. For example
1182 "--gpus-per-task=volta:1". Multiple options can be requested in
1183 a comma separated list, for example:
1184 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1185 --gpus-per-socket and --gpus-per-node options. This option re‐
1186 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1187 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1188 -N, --nodes. This option will implicitly set
1189 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1190 with an explicit --gpu-bind specification.
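
       For example, an illustrative sketch (the GPU type volta and the
       executable ./gpu_app are placeholders):

              srun -n2 --gpus-per-task=volta:1 ./gpu_app

       The explicit task count satisfies the requirement described above,
       and each task is implicitly bound to its own GPU via
       --gpu-bind=per_task:1.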
1191
1192 --gres=<list>
1193 Specifies a comma-delimited list of generic consumable re‐
1194 sources. The format of each entry on the list is
1195 "name[[:type]:count]". The name is that of the consumable re‐
1196 source. The count is the number of those resources with a de‐
1197 fault value of 1. The count can have a suffix of "k" or "K"
1198 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1199 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1200 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1201 x 1024 x 1024 x 1024). The specified resources will be allo‐
1202 cated to the job on each node. The available generic consumable
1203 resources are configurable by the system administrator. A list
1204 of available generic consumable resources will be printed and
1205 the command will exit if the option argument is "help". Exam‐
1206 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1207 "--gres=help". NOTE: This option applies to job and step allo‐
1208 cations. By default, a job step is allocated all of the generic
1209 resources that have been requested by the job, except those im‐
1210 plicitly requested when a job is exclusive. To change the be‐
1211 havior so that each job step is allocated no generic resources,
1212 explicitly set the value of --gres to specify zero counts for
1213 each generic resource OR set "--gres=none" OR set the
1214 SLURM_STEP_GRES environment variable to "none".
1215
1216 --gres-flags=<type>
1217 Specify generic resource task binding options.
1218
1219 disable-binding
1220 Disable filtering of CPUs with respect to generic re‐
1221 source locality. This option is currently required to
1222 use more CPUs than are bound to a GRES (i.e. if a GPU is
1223 bound to the CPUs on one socket, but resources on more
1224 than one socket are required to run the job). This op‐
1225 tion may permit a job to be allocated resources sooner
1226 than otherwise possible, but may result in lower job per‐
1227 formance. This option applies to job allocations.
1228 NOTE: This option is specific to SelectType=cons_res.
1229
1230 enforce-binding
1231 The only CPUs available to the job/step will be those
1232 bound to the selected GRES (i.e. the CPUs identified in
1233 the gres.conf file will be strictly enforced). This op‐
1234 tion may result in delayed initiation of a job. For
1235 example, a job requiring two GPUs and one CPU will be
1236 delayed until both GPUs on a single socket are available
1237 rather than using GPUs bound to separate sockets;
1238 however, application performance may be improved due to
1239 the faster communication. This requires the node to be
1240 configured with more than one socket, and resource
1241 filtering will be performed on a per-socket basis. NOTE: Job
1242 steps that don't use --exact will not be affected.
1243 NOTE: This option is specific to SelectType=cons_tres for
1244 job allocations.
1245
1246 -h, --help
1247 Display help information and exit.
1248
1249 --het-group=<expr>
1250 Identify each component in a heterogeneous job allocation for
1251 which a step is to be created. Applies only to srun commands is‐
1252 sued inside a salloc allocation or sbatch script. <expr> is a
1253 set of integers corresponding to one or more option offsets on
1254 the salloc or sbatch command line. Examples: "--het-group=2",
1255 "--het-group=0,4", "--het-group=1,3-5". The default value is
1256 --het-group=0.
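
For example, a sketch of launching steps inside a heterogeneous allocation created with something like "salloc -n1 : -n16" (my_app is a placeholder executable):

# run a step spanning components 0 and 1
$ srun --het-group=0,1 ./my_app
# run a step only on the second component
$ srun --het-group=1 ./my_app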
1257
1258 --hint=<type>
1259 Bind tasks according to application hints.
1260 NOTE: This option cannot be used in conjunction with any of
1261 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1262 --cpu-bind=verbose) or -B. If --hint is specified as a command
1263 line argument, it will take precedence over the environment.
1264
1265 compute_bound
1266 Select settings for compute bound applications: use all
1267 cores in each socket, one thread per core.
1268
1269 memory_bound
1270 Select settings for memory bound applications: use only
1271 one core in each socket, one thread per core.
1272
1273 [no]multithread
1274 [don't] use extra threads with in-core multi-threading
1275 which can benefit communication intensive applications.
1276 Only supported with the task/affinity plugin.
1277
1278 help show this help message
1279
1280 This option applies to job allocations.
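
For example (my_app is a placeholder executable):

# memory bound run: one core per socket, one thread per core
$ srun -n4 --hint=memory_bound ./my_app
# do not use extra hardware threads
$ srun -n4 --hint=nomultithread ./my_app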
1281
1282 -H, --hold
1283 Specify the job is to be submitted in a held state (priority of
1284 zero). A held job can now be released using scontrol to reset
1285 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1286 plies to job allocations.
1287
1288 -I, --immediate[=<seconds>]
1289 Exit if resources are not available within the time period
1290 specified. If no argument is given (seconds defaults to 1), re‐
1291 sources must be available immediately for the request to suc‐
1292 ceed. If defer is configured in SchedulerParameters and sec‐
1293 onds=1 the allocation request will fail immediately; defer con‐
1294 flicts and takes precedence over this option. By default, --im‐
1295 mediate is off, and the command will block until resources be‐
1296 come available. Since this option's argument is optional, for
1297 proper parsing the single letter option must be followed immedi‐
1298 ately with the value and not include a space between them. For
1299 example "-I60" and not "-I 60". This option applies to job and
1300 step allocations.
1301
1302 -i, --input=<mode>
1303 Specify how stdin is to be redirected. By default, srun redi‐
1304 rects stdin from the terminal to all tasks. See IO Redirection
1305 below for more options. For OS X, the poll() function does not
1306 support stdin, so input from a terminal is not possible. This
1307 option applies to job and step allocations.
1308
1309 -J, --job-name=<jobname>
1310 Specify a name for the job. The specified name will appear along
1311 with the job id number when querying running jobs on the system.
1312 The default is the supplied executable program's name. NOTE:
1313 This information may be written to the slurm_jobacct.log file.
1314 This file is space delimited, so if a space is used in the
1315 jobname it will cause problems in properly displaying the con‐
1316 tents of the slurm_jobacct.log file when the sacct command is
1317 used. This option applies to job and step allocations.
1318
1319 --jobid=<jobid>
1320 Initiate a job step under an already allocated job with the
1321 specified job id. Using this option will cause srun to behave exactly as if
1322 the SLURM_JOB_ID environment variable was set. This option ap‐
1323 plies to step allocations.
1324
1325 -K, --kill-on-bad-exit[=0|1]
1326 Controls whether or not to terminate a step if any task exits
1327 with a non-zero exit code. If this option is not specified, the
1328 default action will be based upon the Slurm configuration param‐
1329 eter of KillOnBadExit. If this option is specified, it will take
1330 precedence over KillOnBadExit. An option argument of zero will
1331 not terminate the job. A non-zero argument or no argument will
1332 terminate the job. Note: This option takes precedence over the
1333 -W, --wait option to terminate the job immediately if a task ex‐
1334 its with a non-zero exit code. Since this option's argument is
1335 optional, for proper parsing the single letter option must be
1336 followed immediately with the value and not include a space be‐
1337 tween them. For example "-K1" and not "-K 1".
1338
1339 -l, --label
1340 Prepend task number to lines of stdout/err. The --label option
1341 will prepend lines of output with the remote task id. This op‐
1342 tion applies to step allocations.
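
For example, a two-task step might produce output such as the following (the node names are hypothetical):

$ srun -n2 -l hostname
0: node01
1: node02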
1343
1344 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1345 Specification of licenses (or other resources available on all
1346 nodes of the cluster) which must be allocated to this job. Li‐
1347 cense names can be followed by a colon and count (the default
1348 count is one). Multiple license names should be comma separated
1349 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1350 cations.
1351
1352 NOTE: When submitting heterogeneous jobs, license requests only
1353 work correctly when made on the first component job. For exam‐
1354 ple "srun -L ansys:2 : myexecutable".
1355
1356 --mail-type=<type>
1357 Notify user by email when certain event types occur. Valid type
1358 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1359 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1360 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1361 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1362 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1363 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1364 time limit). Multiple type values may be specified in a comma
1365 separated list. The user to be notified is indicated with
1366 --mail-user. This option applies to job allocations.
1367
1368 --mail-user=<user>
1369 User to receive email notification of state changes as defined
1370 by --mail-type. The default value is the submitting user. This
1371 option applies to job allocations.
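
For example (the address and my_app are placeholders):

# send mail when the job ends or fails
$ srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app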
1372
1373 --mcs-label=<mcs>
1374 is a group among the groups of the user. The default value is
1375 calculated by the mcs plugin if it is enabled. This option applies
1376 culated by the Plugin mcs if it's enabled. This option applies
1377 to job allocations.
1378
1379 --mem=<size>[units]
1380 Specify the real memory required per node. Default units are
1381 megabytes. Different units can be specified using the suffix
1382 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1383 is MaxMemPerNode. If configured, both parameters can be seen
1384 using the scontrol show config command. This parameter would
1385 generally be used if whole nodes are allocated to jobs (Select‐
1386 Type=select/linear). Specifying a memory limit of zero for a
1387 job step will restrict the job step to the amount of memory al‐
1388 located to the job, but not remove any of the job's memory allo‐
1389 cation from being available to other job steps. Also see
1390 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1391 --mem-per-gpu options are mutually exclusive. If --mem,
1392 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1393 guments, then they will take precedence over the environment
1394 (potentially inherited from salloc or sbatch).
1395
1396 NOTE: A memory size specification of zero is treated as a spe‐
1397 cial case and grants the job access to all of the memory on each
1398 node for newly submitted jobs and all available job memory to
1399 new job steps.
1400
1401 NOTE: Enforcement of memory limits currently relies upon the
1402 task/cgroup plugin or enabling of accounting, which samples mem‐
1403 ory use on a periodic basis (data need not be stored, just col‐
1404 lected). In both cases memory use is based upon the job's Resi‐
1405 dent Set Size (RSS). A task may exceed the memory limit until
1406 the next periodic accounting sample.
1407
1408 This option applies to job and step allocations.
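
For example (my_app and my_step are placeholder executables):

# request 16 GB of real memory on each of two nodes
$ srun -N2 --mem=16G ./my_app
# launch a step limited only by the memory already allocated to the job
$ srun --mem=0 ./my_step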
1409
1410 --mem-bind=[{quiet|verbose},]<type>
1411 Bind tasks to memory. Used only when the task/affinity plugin is
1412 enabled and the NUMA memory functions are available. Note that
1413 the resolution of CPU and memory binding may differ on some ar‐
1414 chitectures. For example, CPU binding may be performed at the
1415 level of the cores within a processor while memory binding will
1416 be performed at the level of nodes, where the definition of
1417 "nodes" may differ from system to system. By default no memory
1418 binding is performed; any task using any CPU can use any memory.
1419 This option is typically used to ensure that each task is bound
1420 to the memory closest to its assigned CPU. The use of any type
1421 other than "none" or "local" is not recommended. If you want
1422 greater control, try running a simple test code with the options
1423 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1424 the specific configuration.
1425
1426 NOTE: To have Slurm always report on the selected memory binding
1427 for all commands executed in a shell, you can enable verbose
1428 mode by setting the SLURM_MEM_BIND environment variable value to
1429 "verbose".
1430
1431 The following informational environment variables are set when
1432 --mem-bind is in use:
1433
1434 SLURM_MEM_BIND_LIST
1435 SLURM_MEM_BIND_PREFER
1436 SLURM_MEM_BIND_SORT
1437 SLURM_MEM_BIND_TYPE
1438 SLURM_MEM_BIND_VERBOSE
1439
1440 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1441 scription of the individual SLURM_MEM_BIND* variables.
1442
1443 Supported options include:
1444
1445 help show this help message
1446
1447 local Use memory local to the processor in use
1448
1449 map_mem:<list>
1450 Bind by setting memory masks on tasks (or ranks) as spec‐
1451 ified where <list> is
1452 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1453 ping is specified for a node and identical mapping is ap‐
1454 plied to the tasks on every node (i.e. the lowest task ID
1455 on each node is mapped to the first ID specified in the
1456 list, etc.). NUMA IDs are interpreted as decimal values
1457 unless they are preceded with '0x' in which case they are
1458 interpreted as hexadecimal values. If the number of tasks
1459 (or ranks) exceeds the number of elements in this list,
1460 elements in the list will be reused as needed starting
1461 from the beginning of the list. To simplify support for
1462 large task counts, the lists may follow a map with an as‐
1463 terisk and repetition count. For example
1464 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1465 sults, all CPUs for each node in the job should be allo‐
1466 cated to the job.
1467
1468 mask_mem:<list>
1469 Bind by setting memory masks on tasks (or ranks) as spec‐
1470 ified where <list> is
1471 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1472 mapping is specified for a node and identical mapping is
1473 applied to the tasks on every node (i.e. the lowest task
1474 ID on each node is mapped to the first mask specified in
1475 the list, etc.). NUMA masks are always interpreted as
1476 hexadecimal values. Note that masks must be preceded
1477 with a '0x' if they don't begin with [0-9] so they are
1478 seen as numerical values. If the number of tasks (or
1479 ranks) exceeds the number of elements in this list, ele‐
1480 ments in the list will be reused as needed starting from
1481 the beginning of the list. To simplify support for large
1482 task counts, the lists may follow a mask with an asterisk
1483 and repetition count. For example "mask_mem:0*4,1*4".
1484 For predictable binding results, all CPUs for each node
1485 in the job should be allocated to the job.
1486
1487 no[ne] don't bind tasks to memory (default)
1488
1489 nosort avoid sorting free cache pages (default, LaunchParameters
1490 configuration parameter can override this default)
1491
1492 p[refer]
1493 Prefer use of first specified NUMA node, but permit
1494 use of other available NUMA nodes.
1495
1496 q[uiet]
1497 quietly bind before task runs (default)
1498
1499 rank bind by task rank (not recommended)
1500
1501 sort sort free cache pages (run zonesort on Intel KNL nodes)
1502
1503 v[erbose]
1504 verbosely report binding before task runs
1505
1506 This option applies to job and step allocations.
1507
1508 --mem-per-cpu=<size>[units]
1509 Minimum memory required per usable allocated CPU. Default units
1510 are megabytes. Different units can be specified using the suf‐
1511 fix [K|M|G|T]. The default value is DefMemPerCPU and the maxi‐
1512 mum value is MaxMemPerCPU (see exception below). If configured,
1513 both parameters can be seen using the scontrol show config com‐
1514 mand. Note that if the job's --mem-per-cpu value exceeds the
1515 configured MaxMemPerCPU, then the user's limit will be treated
1516 as a memory limit per task; --mem-per-cpu will be reduced to a
1517 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1518 and the value of --cpus-per-task multiplied by the new
1519 --mem-per-cpu value will equal the original --mem-per-cpu value
1520 specified by the user. This parameter would generally be used
1521 if individual processors are allocated to jobs (SelectType=se‐
1522 lect/cons_res). If resources are allocated by core, socket, or
1523 whole nodes, then the number of CPUs allocated to a job may be
1524 higher than the task count and the value of --mem-per-cpu should
1525 be adjusted accordingly. Specifying a memory limit of zero for
1526 a job step will restrict the job step to the amount of memory
1527 allocated to the job, but not remove any of the job's memory al‐
1528 location from being available to other job steps. Also see
1529 --mem and --mem-per-gpu. The --mem, --mem-per-cpu and
1530 --mem-per-gpu options are mutually exclusive.
1531
1532 NOTE: If the final amount of memory requested by a job can't be
1533 satisfied by any of the nodes configured in the partition, the
1534 job will be rejected. This could happen if --mem-per-cpu is
1535 used with the --exclusive option for a job allocation and
1536 --mem-per-cpu times the number of CPUs on a node is greater than
1537 the total memory of that node.
1538
1539 NOTE: This applies to usable allocated CPUs in a job allocation.
1540 This is important when more than one thread per core is config‐
1541 ured. If a job requests --threads-per-core with fewer threads
1542 on a core than exist on the core (or --hint=nomultithread which
1543 implies --threads-per-core=1), the job will be unable to use
1544 those extra threads on the core and those threads will not be
1545 included in the memory per CPU calculation. But if the job has
1546 access to all threads on the core, those threads will be in‐
1547 cluded in the memory per CPU calculation even if the job did not
1548 explicitly request those threads.
1549
1550 In the following examples, each core has two threads.
1551
1552 In this first example, two tasks can run on separate hyper‐
1553 threads in the same core because --threads-per-core is not used.
1554 The third task uses both threads of the second core. The allo‐
1555 cated memory per cpu includes all threads:
1556
1557 $ salloc -n3 --mem-per-cpu=100
1558 salloc: Granted job allocation 17199
1559 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1560 JobID ReqTRES AllocTRES
1561 ------- ----------------------------------- -----------------------------------
1562 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1563
1564 In this second example, because of --threads-per-core=1, each
1565 task is allocated an entire core but is only able to use one
1566 thread per core. Allocated CPUs includes all threads on each
1567 core. However, allocated memory per cpu includes only the usable
1568 thread in each core.
1569
1570 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1571 salloc: Granted job allocation 17200
1572 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1573 JobID ReqTRES AllocTRES
1574 ------- ----------------------------------- -----------------------------------
1575 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1576
1577 --mem-per-gpu=<size>[units]
1578 Minimum memory required per allocated GPU. Default units are
1579 megabytes. Different units can be specified using the suffix
1580 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1581 both a global and per partition basis. If configured, the pa‐
1582 rameters can be seen using the scontrol show config and scontrol
1583 show partition commands. Also see --mem. The --mem,
1584 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
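
For example (my_app is a placeholder executable):

# two GPUs with 8 GB of memory reserved per GPU
$ srun --gpus=2 --mem-per-gpu=8G ./my_app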
1585
1586 --mincpus=<n>
1587 Specify a minimum number of logical cpus/processors per node.
1588 This option applies to job allocations.
1589
1590 --mpi=<mpi_type>
1591 Identify the type of MPI to be used. May result in unique initi‐
1592 ation procedures.
1593
1594 cray_shasta
1595 To enable Cray PMI support. This is for applications
1596 built with the Cray Programming Environment. The PMI Con‐
1597 trol Port can be specified with the --resv-ports option
1598 or with the MpiParams=ports=<port range> parameter in
1599 your slurm.conf. This plugin does not have support for
1600 heterogeneous jobs. Support for cray_shasta is included
1601 by default.
1602
1603 list Lists available mpi types to choose from.
1604
1605 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1606 only if the MPI implementation supports it, in other
1607 words if the MPI has the PMI2 interface implemented. The
1608 --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1609 which provides the server side functionality but the
1610 client side must implement PMI2_Init() and the other in‐
1611 terface calls.
1612
1613 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1614 support in Slurm can be used to launch parallel applica‐
1615 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1616 must be configured with pmix support by passing
1617 "--with-pmix=<PMIx installation path>" option to its
1618 "./configure" script.
1619
1620 At the time of writing PMIx is supported in Open MPI
1621 starting from version 2.0. PMIx also supports backward
1622 compatibility with PMI1 and PMI2 and can be used if MPI
1623 was configured with PMI2/PMI1 support pointing to the
1624 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1625 doesn't provide the way to point to a specific implemen‐
1626 tation, a hack'ish solution leveraging LD_PRELOAD can be
1627 used to force "libpmix" usage.
1628
1629 none No special MPI processing. This is the default and works
1630 with many other versions of MPI.
1631
1632 This option applies to step allocations.
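
For example, a sketch assuming an MPI library built with PMIx support (mpi_app is a placeholder executable):

# list the MPI plugin types available on this installation
$ srun --mpi=list
# launch an 8-task MPI application using PMIx
$ srun --mpi=pmix -n8 ./mpi_app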
1633
1634 --msg-timeout=<seconds>
1635 Modify the job launch message timeout. The default value is
1636 MessageTimeout in the Slurm configuration file slurm.conf.
1637 Changes to this are typically not recommended, but could be use‐
1638 ful to diagnose problems. This option applies to job alloca‐
1639 tions.
1640
1641 --multi-prog
1642 Run a job with different programs and different arguments for
1643 each task. In this case, the executable program specified is ac‐
1644 tually a configuration file specifying the executable and argu‐
1645 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1646 for details on the configuration file contents. This option ap‐
1647 plies to step allocations.
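
For example, a minimal sketch (multi.conf is a hypothetical configuration file; see MULTIPLE PROGRAM CONFIGURATION below for the full syntax):

$ cat multi.conf
0 echo I am the leader
1-3 echo I am worker %t
$ srun -n4 -l --multi-prog multi.conf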
1648
1649 --network=<type>
1650 Specify information pertaining to the switch or network. The
1651 interpretation of type is system dependent. This option is sup‐
1652 ported when running Slurm on a Cray natively. It is used to re‐
1653 quest using Network Performance Counters. Only one value per
1654 request is valid. All options are case insensitive. In this
1655 configuration supported values include:
1656
1657
1658 system
1659 Use the system-wide network performance counters. Only
1660 nodes requested will be marked in use for the job alloca‐
1661 tion. If the job does not fill up the entire system, the
1662 rest of the nodes cannot be used by other jobs
1663 using NPC; if idle, their state will appear as PerfCnts.
1664 These nodes are still available for other jobs not using
1665 NPC.
1666
1667 blade Use the blade network performance counters. Only nodes re‐
1668 quested will be marked in use for the job allocation. If
1669 the job does not fill up the entire blade(s) allocated to
1670 the job, those blade(s) cannot be used by other
1671 jobs using NPC; if idle, their state will appear as PerfC‐
1672 nts. These nodes are still available for other jobs not
1673 using NPC.
1674
1675 In all cases the job allocation request must specify the --ex‐
1676 clusive option and the step cannot specify the --overlap option.
1677 Otherwise the request will be denied.
1678
1679 Also with any of these options steps are not allowed to share
1680 blades, so resources would remain idle inside an allocation if
1681 the step running on a blade does not take up all the nodes on
1682 the blade.
1683
1684 The network option is also available on systems with HPE Sling‐
1685 shot networks. It can be used to override the default network
1686 resources allocated for the job step. Multiple values may be
1687 specified in a comma-separated list.
1688
1689 def_<rsrc>=<val>
1690 Per-CPU reserved allocation for this resource.
1691
1692 res_<rsrc>=<val>
1693 Per-node reserved allocation for this resource. If
1694 set, overrides the per-CPU allocation.
1695
1696 max_<rsrc>=<val>
1697 Maximum per-node limit for this resource.
1698
1699 depth=<depth>
1700 Multiplier for per-CPU resource allocation. Default
1701 is the number of reserved CPUs on the node.
1702
1703 The resources that may be requested are:
1704
1705 txqs Transmit command queues. The default is 3 per-CPU,
1706 maximum 1024 per-node.
1707
1708 tgqs Target command queues. The default is 2 per-CPU, max‐
1709 imum 512 per-node.
1710
1711 eqs Event queues. The default is 8 per-CPU, maximum 2048
1712 per-node.
1713
1714 cts Counters. The default is 2 per-CPU, maximum 2048 per-
1715 node.
1716
1717 tles Trigger list entries. The default is 1 per-CPU, maxi‐
1718 mum 2048 per-node.
1719
1720 ptes Portals table entries. The default is 8 per-CPU,
1721 maximum 2048 per-node.
1722
1723 les List entries. The default is 134 per-CPU, maximum
1724 65535 per-node.
1725
1726 acs Addressing contexts. The default is 4 per-CPU, maxi‐
1727 mum 1024 per-node.
1728
1729 This option applies to job and step allocations.
1730
1731 --nice[=adjustment]
1732 Run the job with an adjusted scheduling priority within Slurm.
1733 With no adjustment value the scheduling priority is decreased by
1734 100. A negative nice value increases the priority, otherwise de‐
1735 creases it. The adjustment range is +/- 2147483645. Only privi‐
1736 leged users can specify a negative adjustment.
1737
1738 -Z, --no-allocate
1739 Run the specified tasks on a set of nodes without creating a
1740 Slurm "job" in the Slurm queue structure, bypassing the normal
1741 resource allocation step. The list of nodes must be specified
1742 with the -w, --nodelist option. This is a privileged option
1743 only available for the users "SlurmUser" and "root". This option
1744 applies to job allocations.
1745
1746 -k, --no-kill[=off]
1747 Do not automatically terminate a job if one of the nodes it has
1748 been allocated fails. This option applies to job and step allo‐
1749 cations. The job will assume all responsibilities for
1750 fault-tolerance. Tasks launched using this option will not be
1751 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1752 --wait options will have no effect upon the job step). The ac‐
1753 tive job step (MPI job) will likely suffer a fatal error, but
1754 subsequent job steps may be run if this option is specified.
1755
1756 Specify an optional argument of "off" to disable the effect of the
1757 SLURM_NO_KILL environment variable.
1758
1759 The default action is to terminate the job upon node failure.
1760
1761 -F, --nodefile=<node_file>
1762 Much like --nodelist, but the list is contained in a file of
1763 the specified name. The node names in the list may also span multi‐
1764 ple lines in the file. Duplicate node names in the file will
1765 be ignored. The order of the node names in the list is not im‐
1766 portant; the node names will be sorted by Slurm.
1767
1768 -w, --nodelist={<node_name_list>|<filename>}
1769 Request a specific list of hosts. The job will contain all of
1770 these hosts and possibly additional hosts as needed to satisfy
1771 resource requirements. The list may be specified as a
1772 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1773 for example), or a filename. The host list will be assumed to
1774 be a filename if it contains a "/" character. If you specify a
1775 minimum node or processor count larger than can be satisfied by
1776 the supplied host list, additional resources will be allocated
1777 on other nodes as needed. Rather than repeating a host name
1778 multiple times, an asterisk and a repetition count may be ap‐
1779 pended to a host name. For example "host1,host1" and "host1*2"
1780 are equivalent. If the number of tasks is given and a list of
1781 requested nodes is also given, the number of nodes used from
1782 that list will be reduced to match that of the number of tasks
1783 if the number of nodes in the list is greater than the number of
1784 tasks. This option applies to job and step allocations.
1785
1786 -N, --nodes=<minnodes>[-maxnodes]
1787 Request that a minimum of minnodes nodes be allocated to this
1788 job. A maximum node count may also be specified with maxnodes.
1789 If only one number is specified, this is used as both the mini‐
1790 mum and maximum node count. The partition's node limits super‐
1791 sede those of the job. If a job's node limits are outside of
1792 the range permitted for its associated partition, the job will
1793 be left in a PENDING state. This permits possible execution at
1794 a later time, when the partition limit is changed. If a job
1795 node limit exceeds the number of nodes configured in the parti‐
1796 tion, the job will be rejected. Note that the environment vari‐
1797 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1798 ibility) will be set to the count of nodes actually allocated to
1799 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1800 tion. If -N is not specified, the default behavior is to allo‐
1801 cate enough nodes to satisfy the requested resources as ex‐
1802 pressed by per-job specification options, e.g. -n, -c and
1803 --gpus. The job will be allocated as many nodes as possible
1804 within the range specified and without delaying the initiation
1805 of the job. If the number of tasks is given and a number of re‐
1806 quested nodes is also given, the number of nodes used from that
1807 request will be reduced to match that of the number of tasks if
1808 the number of nodes in the request is greater than the number of
1809 tasks. The node count specification may include a numeric value
1810 followed by a suffix of "k" (multiplies numeric value by 1,024)
1811 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1812 plies to job and step allocations.
1813
1814 -n, --ntasks=<number>
1815 Specify the number of tasks to run. Request that srun allocate
1816 resources for ntasks tasks. The default is one task per node,
1817 but note that the --cpus-per-task option will change this de‐
1818 fault. This option applies to job and step allocations.
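
For example (my_app is a placeholder executable):

# 8 tasks on exactly two nodes (four tasks per node)
$ srun -N2 -n8 ./my_app
# 8 tasks spread over at least two and at most four nodes
$ srun -N2-4 -n8 ./my_app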
1819
1820 --ntasks-per-core=<ntasks>
1821 Request the maximum ntasks be invoked on each core. This option
1822 applies to the job allocation, but not to step allocations.
1823 Meant to be used with the --ntasks option. Related to
1824 --ntasks-per-node except at the core level instead of the node
1825 level. Masks will automatically be generated to bind the tasks
1826 to specific cores unless --cpu-bind=none is specified. NOTE:
1827 This option is not supported when using SelectType=select/lin‐
1828 ear.
1829
1830 --ntasks-per-gpu=<ntasks>
1831 Request that there are ntasks tasks invoked for every GPU. This
1832 option can work in two ways: 1) either specify --ntasks in addi‐
1833 tion, in which case a type-less GPU specification will be auto‐
1834 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1835 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1836 --ntasks, and the total task count will be automatically deter‐
1837 mined. The number of CPUs needed will be automatically in‐
1838 creased if necessary to allow for any calculated task count.
1839 This option will implicitly set --gpu-bind=single:<ntasks>, but
1840 that can be overridden with an explicit --gpu-bind specifica‐
1841 tion. This option is not compatible with a node range (i.e.
1842 -N<minnodes-maxnodes>). This option is not compatible with
1843 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1844 option is not supported unless SelectType=cons_tres is config‐
1845 ured (either directly or indirectly on Cray systems).
1846
1847 --ntasks-per-node=<ntasks>
1848 Request that ntasks be invoked on each node. If used with the
1849 --ntasks option, the --ntasks option will take precedence and
1850 the --ntasks-per-node will be treated as a maximum count of
1851 tasks per node. Meant to be used with the --nodes option. This
1852 is related to --cpus-per-task=ncpus, but does not require knowl‐
1853 edge of the actual number of cpus on each node. In some cases,
1854 it is more convenient to be able to request that no more than a
1855 specific number of tasks be invoked on each node. Examples of
1856 this include submitting a hybrid MPI/OpenMP app where only one
1857 MPI "task/rank" should be assigned to each node while allowing
1858 the OpenMP portion to utilize all of the parallelism present in
1859 the node, or submitting a single setup/cleanup/monitoring job to
1860 each node of a pre-existing allocation as one step in a larger
1861 job script. This option applies to job allocations.
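
For example, a sketch of the hybrid MPI/OpenMP case described above (hybrid_app is a placeholder and the CPU count is an assumption about the nodes):

# one MPI rank per node, with 16 CPUs per rank for OpenMP threads
$ OMP_NUM_THREADS=16 srun -N4 --ntasks-per-node=1 -c16 ./hybrid_app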
1862
1863 --ntasks-per-socket=<ntasks>
1864 Request the maximum ntasks be invoked on each socket. This op‐
1865 tion applies to the job allocation, but not to step allocations.
1866 Meant to be used with the --ntasks option. Related to
1867 --ntasks-per-node except at the socket level instead of the node
1868 level. Masks will automatically be generated to bind the tasks
1869 to specific sockets unless --cpu-bind=none is specified. NOTE:
1870 This option is not supported when using SelectType=select/lin‐
1871 ear.
1872
1873 --open-mode={append|truncate}
1874 Open the output and error files using append or truncate mode as
1875 specified. For heterogeneous job steps the default value is
1876 "append". Otherwise the default value is specified by the sys‐
1877 tem configuration parameter JobFileAppend. This option applies
1878 to job and step allocations.
1879
1880 -o, --output=<filename_pattern>
1881 Specify the "filename pattern" for stdout redirection. By de‐
1882 fault in interactive mode, srun collects stdout from all tasks
1883 and sends this output via TCP/IP to the attached terminal. With
1884 --output stdout may be redirected to a file, to one file per
1885 task, or to /dev/null. See section IO Redirection below for the
1886 various forms of filename pattern. If the specified file al‐
1887 ready exists, it will be overwritten.
1888
1889 If --error is not also specified on the command line, both std‐
1890 out and stderr will be directed to the file specified by --output.
1891 This option applies to job and step allocations.
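
For example, using the filename pattern described in IO Redirection below (my_app is a placeholder executable):

# one output file per task, named with the job id and a zero-padded task id
$ srun -n4 --output=job%j-task%2t.out ./my_app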
1892
1893 -O, --overcommit
1894 Overcommit resources. This option applies to job and step allo‐
1895 cations.
1896
1897 When applied to a job allocation (not including jobs requesting
1898 exclusive access to the nodes) the resources are allocated as if
1899 only one task per node is requested. This means that the re‐
1900 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1901 cated per node rather than being multiplied by the number of
1902 tasks. Options used to specify the number of tasks per node,
1903 socket, core, etc. are ignored.
1904
1905 When applied to job step allocations (the srun command when exe‐
1906 cuted within an existing job allocation), this option can be
1907 used to launch more than one task per CPU. Normally, srun will
1908 not allocate more than one process per CPU. By specifying
1909 --overcommit you are explicitly allowing more than one process
1910 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1911 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1912 in the file slurm.h and is not a variable, it is set at Slurm
1913 build time.
1914
1915 --overlap
1916 Specifying --overlap allows steps to share all resources (CPUs,
1917 memory, and GRES) with all other steps. A step using this option
1918 will overlap all other steps, even those that did not specify
1919 --overlap.
1920
1921 By default steps do not share resources with other parallel
1922 steps. This option applies to step allocations.
1923
1924 -s, --oversubscribe
1925 The job allocation can over-subscribe resources with other run‐
1926 ning jobs. The resources to be over-subscribed can be nodes,
1927 sockets, cores, and/or hyperthreads depending upon configura‐
1928 tion. The default over-subscribe behavior depends on system
1929 configuration and the partition's OverSubscribe option takes
1930 precedence over the job's option. This option may result in the
1931 allocation being granted sooner than if the --oversubscribe op‐
1932 tion was not set and allow higher system utilization, but appli‐
1933 cation performance will likely suffer due to competition for re‐
1934 sources. This option applies to job allocations.
1935
1936 -p, --partition=<partition_names>
1937 Request a specific partition for the resource allocation. If
1938 not specified, the default behavior is to allow the slurm con‐
1939 troller to select the default partition as designated by the
1940 system administrator. If the job can use more than one parti‐
1941 tion, specify their names in a comma separated list and the one
1942 offering earliest initiation will be used with no regard given
1943 to the partition name ordering (although higher priority parti‐
1944 tions will be considered first). When the job is initiated, the
1945 name of the partition used will be placed first in the job
1946 record partition string. This option applies to job allocations.
1947
1948 --power=<flags>
1949 Comma separated list of power management plugin options. Cur‐
1950 rently available flags include: level (all nodes allocated to
1951 the job should have identical power caps, may be disabled by the
1952 Slurm configuration option PowerParameters=job_no_level). This
1953 option applies to job allocations.
1954
1955 --prefer=<list>
1956 Nodes can have features assigned to them by the Slurm adminis‐
1957 trator. Users can specify which of these features are desired
1958 but not required by their job using the prefer option. This op‐
1959 tion operates independently from --constraint and will override
1960 whatever is set there if possible. When scheduling, the features
1961 in --prefer are tried first; if a node set isn't available with
1962 those features, then --constraint is attempted. See --constraint
1963 for more information; this option behaves the same way.
1964
1965
1966 -E, --preserve-env
1967 Pass the current values of environment variables
1968 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
1969 rather than computing them from command line parameters. This
1970 option applies to job allocations.
1971
1972 --priority=<value>
1973 Request a specific job priority. May be subject to configura‐
1974 tion specific constraints. value should either be a numeric
1975 value or "TOP" (for highest possible value). Only Slurm opera‐
1976 tors and administrators can set the priority of a job. This op‐
1977 tion applies to job allocations only.
1978
1979 --profile={all|none|<type>[,<type>...]}
1980 Enables detailed data collection by the acct_gather_profile
1981 plugin. Detailed data are typically time-series that are stored
1982 in an HDF5 file for the job or an InfluxDB database depending on
1983 the configured plugin. This option applies to job and step al‐
1984 locations.
1985
1986 All All data types are collected. (Cannot be combined with
1987 other values.)
1988
1989 None No data types are collected. This is the default.
1990 (Cannot be combined with other values.)
1991
1992 Valid type values are:
1993
1994 Energy Energy data is collected.
1995
1996 Task Task (I/O, Memory, ...) data is collected.
1997
1998 Filesystem
1999 Filesystem data is collected.
2000
2001 Network
2002 Network (InfiniBand) data is collected.
2003
2004 --prolog=<executable>
2005 srun will run executable just before launching the job step.
2006 The command line arguments for executable will be the command
2007 and arguments of the job step. If executable is "none", then no
2008 srun prolog will be run. This parameter overrides the SrunProlog
2009 parameter in slurm.conf. This parameter is completely indepen‐
2010 dent from the Prolog parameter in slurm.conf. This option ap‐
2011 plies to job allocations.
2012
2013 --propagate[=rlimit[,rlimit...]]
2014 Allows users to specify which of the modifiable (soft) resource
2015 limits to propagate to the compute nodes and apply to their
2016 jobs. If no rlimit is specified, then all resource limits will
2017 be propagated. The following rlimit names are supported by
2018 Slurm (although some options may not be supported on some sys‐
2019 tems):
2020
2021 ALL All limits listed below (default)
2022
2023 NONE No limits listed below
2024
2025 AS The maximum address space (virtual memory) for a
2026 process.
2027
2028 CORE The maximum size of core file
2029
2030 CPU The maximum amount of CPU time
2031
2032 DATA The maximum size of a process's data segment
2033
2034 FSIZE The maximum size of files created. Note that if the
2035 user sets FSIZE to less than the current size of the
2036 slurmd.log, job launches will fail with a 'File size
2037 limit exceeded' error.
2038
2039 MEMLOCK The maximum size that may be locked into memory
2040
2041 NOFILE The maximum number of open files
2042
2043 NPROC The maximum number of processes available
2044
2045 RSS The maximum resident set size. Note that this only has
2046 effect with Linux kernels 2.4.30 or older or BSD.
2047
2048 STACK The maximum stack size
2049
2050 This option applies to job allocations.
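
For example (my_app is a placeholder executable):

# propagate only the core file size and open file limits to the compute nodes
$ srun --propagate=CORE,NOFILE ./my_app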
2051
2052 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2053 --unbuffered. Implicitly sets --error and --output to /dev/null
2054 for all tasks except task zero, which may cause those tasks to
2055 exit immediately (e.g. shells will typically exit immediately in
2056 that situation). This option applies to step allocations.
2057
2058 -q, --qos=<qos>
2059 Request a quality of service for the job. QOS values can be de‐
2060 fined for each user/cluster/account association in the Slurm
2061 database. Users will be limited to their association's defined
2062 set of qos's when the Slurm configuration parameter, Account‐
2063 ingStorageEnforce, includes "qos" in its definition. This option
2064 applies to job allocations.
2065
2066 -Q, --quiet
2067 Suppress informational messages from srun. Errors will still be
2068 displayed. This option applies to job and step allocations.
2069
2070 --quit-on-interrupt
2071 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2072 disables the status feature normally available when srun re‐
2073 ceives a single Ctrl-C and causes srun to instead immediately
2074 terminate the running job. This option applies to step alloca‐
2075 tions.
2076
2077 --reboot
2078 Force the allocated nodes to reboot before starting the job.
2079 This is only supported with some system configurations and will
2080 otherwise be silently ignored. Only root, SlurmUser or admins
2081 can reboot nodes. This option applies to job allocations.
2082
2083 -r, --relative=<n>
2084 Run a job step relative to node n of the current allocation.
2085 This option may be used to spread several job steps out among
2086 the nodes of the current job. If -r is used, the current job
2087 step will begin at node n of the allocated nodelist, where the
2088 first node is considered node 0. The -r option is not permitted
2089 with -w or -x option and will result in a fatal error when not
2090 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2091 set). The default for n is 0. If the value of --nodes exceeds
2092 the number of nodes identified with the --relative option, a
2093 warning message will be printed and the --relative option will
2094 take precedence. This option applies to step allocations.
2095
2096 --reservation=<reservation_names>
2097 Allocate resources for the job from the named reservation. If
2098 the job can use more than one reservation, specify their names
2099 in a comma separated list; the one offering the earliest initia‐
2100 tion will be used. Each reservation will be considered in the order it was
2101 requested. All reservations will be listed in scontrol/squeue
2102 through the life of the job. In accounting, the first reserva‐
2103 tion will be seen, and after the job starts, the reservation actually used
2104 will replace it.
2105
2106 --resv-ports[=count]
2107 Reserve communication ports for this job. Users can specify the
2108 number of ports they want to reserve. The parameter Mpi‐
2109 Params=ports=12000-12999 must be specified in slurm.conf. If the
2110 number of reserved ports is zero then no ports are reserved.
2111 Used only for Cray's native PMI. This option applies to job and
2112 step allocations.
2113
2114 --send-libs[=yes|no]
2115 If set to yes (or no argument), autodetect and broadcast the ex‐
2116 ecutable's shared object dependencies to allocated compute
2117 nodes. The files are placed in a directory alongside the exe‐
2118 cutable. The LD_LIBRARY_PATH is automatically updated to include
2119 this cache directory as well. This overrides the default behav‐
2120 ior configured in slurm.conf SbcastParameters send_libs. This
2121 option only works in conjunction with --bcast. See also
2122 --bcast-exclude.
2123
2124 --signal=[R:]<sig_num>[@sig_time]
2125 When a job is within sig_time seconds of its end time, send it
2126 the signal sig_num. Due to the resolution of event handling by
2127 Slurm, the signal may be sent up to 60 seconds earlier than
2128 specified. sig_num may either be a signal number or name (e.g.
2129 "10" or "USR1"). sig_time must have an integer value between 0
2130 and 65535. By default, no signal is sent before the job's end
2131 time. If a sig_num is specified without any sig_time, the de‐
2132 fault time will be 60 seconds. This option applies to job allo‐
2133 cations. Use the "R:" option to allow this job to overlap with
2134 a reservation with MaxStartDelay set. To have the signal sent
2135 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2136 rameter.
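
For example (my_app is a placeholder executable):

# send SIGUSR1 roughly 120 seconds before the 30-minute limit is reached
$ srun -t 30 --signal=USR1@120 ./my_app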
2137
2138 --slurmd-debug=<level>
2139 Specify a debug level for slurmd(8). The level may be specified
2140 either as an integer value between 0 [quiet, only errors are dis‐
2141 played] and 4 [verbose operation] or as one of the SlurmdDebug tags.
2142
2143 quiet Log nothing
2144
2145 fatal Log only fatal errors
2146
2147 error Log only errors
2148
2149 info Log errors and general informational messages
2150
2151 verbose Log errors and verbose informational messages
2152
2153 The slurmd debug information is copied onto the stderr of the
2154 job. By default only errors are displayed. This option applies
2155 to job and step allocations.
2156
2157 --sockets-per-node=<sockets>
2158 Restrict node selection to nodes with at least the specified
2159 number of sockets. See additional information under -B option
2160 above when task/affinity plugin is enabled. This option applies
2161 to job allocations.
2162 NOTE: This option may implicitly impact the number of tasks if
2163 -n was not specified.
2164
2165 --spread-job
2166 Spread the job allocation over as many nodes as possible and at‐
2167 tempt to evenly distribute tasks across the allocated nodes.
2168 This option disables the topology/tree plugin. This option ap‐
2169 plies to job allocations.
2170
2171 --switches=<count>[@max-time]
2172 When a tree topology is used, this defines the maximum count of
2173 leaf switches desired for the job allocation and optionally the
2174 maximum time to wait for that number of switches. If Slurm finds
2175 an allocation containing more switches than the count specified,
2176 the job remains pending until it either finds an allocation with
2177 desired switch count or the time limit expires. If there is no
2178 switch count limit, there is no delay in starting the job. Ac‐
2179 ceptable time formats include "minutes", "minutes:seconds",
2180 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2181 "days-hours:minutes:seconds". The job's maximum time delay may
2182 be limited by the system administrator using the SchedulerParam‐
2183 eters configuration parameter with the max_switch_wait parameter
2184 option. On a dragonfly network the only switch count supported
2185 is 1 since communication performance will be highest when a job
2186 is allocated resources on one leaf switch or more than 2 leaf
2187 switches. The default max-time is the max_switch_wait Sched‐
2188 ulerParameters. This option applies to job allocations.
2189
2190 --task-epilog=<executable>
2191 The slurmstepd daemon will run executable just after each task
2192 terminates. This will be executed before any TaskEpilog parame‐
2193 ter in slurm.conf is executed. This is meant to be a very
2194 short-lived program. If it fails to terminate within a few sec‐
2195 onds, it will be killed along with any descendant processes.
2196 This option applies to step allocations.
2197
2198 --task-prolog=<executable>
2199 The slurmstepd daemon will run executable just before launching
2200 each task. This will be executed after any TaskProlog parameter
2201 in slurm.conf is executed. Besides the normal environment vari‐
2202 ables, this has SLURM_TASK_PID available to identify the process
2203 ID of the task being started. Standard output from this program
2204 of the form "export NAME=value" will be used to set environment
2205 variables for the task being spawned. This option applies to
2206 step allocations.
2207
2208 --test-only
2209 Returns an estimate of when a job would be scheduled to run
2210 given the current job queue and all the other srun arguments
2211 specifying the job. This limits srun's behavior to just return
2212 information; no job is actually submitted. The program will be
2213 executed directly by the slurmd daemon. This option applies to
2214 job allocations.
2215
2216 --thread-spec=<num>
2217 Count of specialized threads per node reserved by the job for
2218 system operations and not used by the application. The applica‐
2219 tion will not use these threads, but will be charged for their
2220 allocation. This option can not be used with the --core-spec
2221 option. This option applies to job allocations.
2222
2223 NOTE: Explicitly setting a job's specialized thread value im‐
2224 plicitly sets its --exclusive option, reserving entire nodes for
2225 the job.
2226
2227 -T, --threads=<nthreads>
2228 Allows limiting the number of concurrent threads used to send
2229 the job request from the srun process to the slurmd processes on
2230 the allocated nodes. Default is to use one thread per allocated
2231 node up to a maximum of 60 concurrent threads. Specifying this
2232 option limits the number of concurrent threads to nthreads (less
2233 than or equal to 60). This should only be used to set a low
2234 thread count for testing on very small memory computers. This
2235 option applies to job allocations.
2236
2237 --threads-per-core=<threads>
2238 Restrict node selection to nodes with at least the specified
2239 number of threads per core. In task layout, use the specified
2240 maximum number of threads per core. Implies --cpu-bind=threads
2241 unless overridden by command line or environment options. NOTE:
2242 "Threads" refers to the number of processing units on each core
2243 rather than the number of application tasks to be launched per
2244 core. See additional information under -B option above when
2245 task/affinity plugin is enabled. This option applies to job and
2246 step allocations.
2247 NOTE: This option may implicitly impact the number of tasks if
2248 -n was not specified.
2249
2250 -t, --time=<time>
2251 Set a limit on the total run time of the job allocation. If the
2252 requested time limit exceeds the partition's time limit, the job
2253 will be left in a PENDING state (possibly indefinitely). The
2254 default time limit is the partition's default time limit. When
2255 the time limit is reached, each task in each job step is sent
2256 SIGTERM followed by SIGKILL. The interval between signals is
2257 specified by the Slurm configuration parameter KillWait. The
2258 OverTimeLimit configuration parameter may permit the job to run
2259 longer than scheduled. Time resolution is one minute and second
2260 values are rounded up to the next minute.
2261
2262 A time limit of zero requests that no time limit be imposed.
2263 Acceptable time formats include "minutes", "minutes:seconds",
2264 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2265 "days-hours:minutes:seconds". This option applies to job and
2266 step allocations.
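
For example (my_app is a placeholder executable):

# 90 minutes
$ srun -t 90 ./my_app
# 1 day and 12 hours
$ srun -t 1-12:00:00 ./my_app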
2267
2268 --time-min=<time>
2269 Set a minimum time limit on the job allocation. If specified,
2270 the job may have its --time limit lowered to a value no lower
2271 than --time-min if doing so permits the job to begin execution
2272 earlier than otherwise possible. The job's time limit will not
2273 be changed after the job is allocated resources. This is per‐
2274 formed by a backfill scheduling algorithm to allocate resources
2275 otherwise reserved for higher priority jobs. Acceptable time
2276 formats include "minutes", "minutes:seconds", "hours:min‐
2277 utes:seconds", "days-hours", "days-hours:minutes" and
2278 "days-hours:minutes:seconds". This option applies to job alloca‐
2279 tions.
2280
2281 --tmp=<size>[units]
2282 Specify a minimum amount of temporary disk space per node. De‐
2283 fault units are megabytes. Different units can be specified us‐
2284 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2285 tions.
2286
2287 --uid=<user>
2288 Attempt to submit and/or run a job as user instead of the invok‐
2289 ing user id. The invoking user's credentials will be used to
2290 check access permissions for the target partition. User root may
2291 use this option to run jobs as a normal user in a RootOnly par‐
2292 tition for example. If run as root, srun will drop its permis‐
2293 sions to the uid specified after node allocation is successful.
2294 user may be the user name or numerical user ID. This option ap‐
2295 plies to job and step allocations.
2296
2297 -u, --unbuffered
2298 By default, the connection between slurmstepd and the
2299 user-launched application is over a pipe. The stdio output writ‐
2300 ten by the application is buffered by glibc until it is
2301 flushed or the output is set as unbuffered. See setbuf(3). If
2302 this option is specified the tasks are executed with a pseudo
2303 terminal so that the application output is unbuffered. This op‐
2304 tion applies to step allocations.
2305
2306 --usage
2307 Display brief help message and exit.
2308
2309 --use-min-nodes
2310 If a range of node counts is given, prefer the smaller count.
2311
2312 -v, --verbose
2313 Increase the verbosity of srun's informational messages. Multi‐
2314 ple -v's will further increase srun's verbosity. By default
2315 only errors will be displayed. This option applies to job and
2316 step allocations.
2317
2318 -V, --version
2319 Display version information and exit.
2320
2321 -W, --wait=<seconds>
2322 Specify how long to wait after the first task terminates before
2323 terminating all remaining tasks. A value of 0 indicates an un‐
2324 limited wait (a warning will be issued after 60 seconds). The
2325 default value is set by the WaitTime parameter in the slurm con‐
2326 figuration file (see slurm.conf(5)). This option can be useful
2327 to ensure that a job is terminated in a timely fashion in the
2328 event that one or more tasks terminate prematurely. Note: The
2329 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2330 to terminate the job immediately if a task exits with a non-zero
2331 exit code. This option applies to job allocations.
2332
2333 --wckey=<wckey>
2334 Specify wckey to be used with job. If TrackWCKey=no (default)
2335 in the slurm.conf this value is ignored. This option applies to
2336 job allocations.
2337
2338 --x11[={all|first|last}]
2339 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2340 the allocation. This option is only enabled if Slurm was com‐
2341 piled with X11 support and PrologFlags=x11 is defined in the
2342 slurm.conf. Default is "all".
2343
2344 srun will submit the job request to the slurm job controller, then ini‐
2345 tiate all processes on the remote nodes. If the request cannot be met
2346 immediately, srun will block until the resources are free to run the
2347 job. If the -I (--immediate) option is specified srun will terminate if
2348 resources are not immediately available.
2349
2350 When initiating remote processes srun will propagate the current work‐
2351 ing directory, unless --chdir=<path> is specified, in which case path
2352 will become the working directory for the remote processes.
2353
2354 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2355 cated to the job. When specifying only the number of processes to run
2356 with -n, a default of one CPU per process is allocated. By specifying
2357 the number of CPUs required per task (-c), more than one CPU may be al‐
2358 located per process. If the number of nodes is specified with -N, srun
2359 will attempt to allocate at least the number of nodes specified.
2360
2361 Combinations of the above three options may be used to change how pro‐
2362 cesses are distributed across nodes and cpus. For instance, by specify‐
2363 ing both the number of processes and number of nodes on which to run,
2364 the number of processes per node is implied. However, if the number of
2365 CPUs per process is more important, then the number of processes (-n) and
2366 the number of CPUs per process (-c) should be specified.
2367
2368 srun will refuse to allocate more than one process per CPU unless
2369 --overcommit (-O) is also specified.
2370
2371 srun will attempt to meet the above specifications "at a minimum." That
2372 is, if 16 nodes are requested for 32 processes, and some nodes do not
2373 have 2 CPUs, the allocation of nodes will be increased in order to meet
2374 the demand for CPUs. In other words, a minimum of 16 nodes are being
2375 requested. However, if 16 nodes are requested for 15 processes, srun
2376 will consider this an error, as 15 processes cannot run across 16
2377 nodes.
2378
2379
2380 IO Redirection
2381
2382 By default, stdout and stderr will be redirected from all tasks to the
2383 stdout and stderr of srun, and stdin will be redirected from the stan‐
2384 dard input of srun to all remote tasks. If stdin is only to be read by
2385 a subset of the spawned tasks, specifying a file to read from rather
2386 than forwarding stdin from the srun command may be preferable as it
2387 avoids moving and storing data that will never be read.
2388
2389 For OS X, the poll() function does not support stdin, so input from a
2390 terminal is not possible.
2391
2392 This behavior may be changed with the --output, --error, and --input
2393 (-o, -e, -i) options. Valid format specifications for these options are
2394
2395
2396 all stdout and stderr are redirected from all tasks to srun. stdin is
2397 broadcast to all remote tasks. (This is the default behav‐
2398 ior)
2399
2400 none stdout and stderr are not received from any task. stdin is
2401 not sent to any task (stdin is closed).
2402
2403 taskid stdout and/or stderr are redirected from only the task with
2404 relative id equal to taskid, where 0 <= taskid < ntasks,
2405 where ntasks is the total number of tasks in the current job
2406 step. stdin is redirected from the stdin of srun to this
2407 same task. This file will be written on the node executing
2408 the task.
2409
2410 filename srun will redirect stdout and/or stderr to the named file
2411 from all tasks. stdin will be redirected from the named file
2412 and broadcast to all tasks in the job. filename refers to a
2413 path on the host that runs srun. Depending on the cluster's
2414 file system layout, this may result in the output appearing
2415 in different places according to whether the job is run in
2416 batch mode.
2417
2418 filename pattern
2419 srun allows for a filename pattern to be used to generate the
2420 named IO file described above. The following list of format
2421 specifiers may be used in the format string to generate a
2422 filename that will be unique to a given jobid, stepid, node,
2423 or task. In each case, the appropriate number of files are
2424 opened and associated with the corresponding tasks. Note that
2425 any format string containing %t, %n, and/or %N will be writ‐
2426 ten on the node executing the task rather than the node where
2427 srun executes. These format specifiers are not supported on a
2428 BGQ system.
2429
2430 \\ Do not process any of the replacement symbols.
2431
2432 %% The character "%".
2433
2434 %A Job array's master job allocation number.
2435
2436 %a Job array ID (index) number.
2437
2438 %J jobid.stepid of the running job. (e.g. "128.0")
2439
2440 %j jobid of the running job.
2441
2442 %s stepid of the running job.
2443
2444 %N short hostname. This will create a separate IO file
2445 per node.
2446
2447 %n Node identifier relative to current job (e.g. "0" is
2448 the first node of the running job) This will create a
2449 separate IO file per node.
2450
2451 %t task identifier (rank) relative to current job. This
2452 will create a separate IO file per task.
2453
2454 %u User name.
2455
2456 %x Job name.
2457
2458 A number placed between the percent character and format
2459 specifier may be used to zero-pad the result in the IO file‐
2460 name. This number is ignored if the format specifier corre‐
2461 sponds to non-numeric data (%N for example).
2462
2463 Some examples of how the format string may be used for a 4
2464 task job step with a Job ID of 128 and step id of 0 are in‐
2465 cluded below:
2466
2467
2468 job%J.out job128.0.out
2469
2470 job%4j.out job0128.out
2471
2472 job%j-%2t.out job128-00.out, job128-01.out, ...
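
       As a usage sketch building on the table above (hostname is simply a
       convenient command), a four-task step could write one zero-padded
       file per task; because the pattern contains %t, each file is written
       on the node executing the corresponding task:

           $ srun -n4 --output=job%j-%2t.out hostname

       For a job ID of 128 this would produce job128-00.out through
       job128-03.out.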
2473
2474 PERFORMANCE
2475 Executing srun sends a remote procedure call to slurmctld. If enough
2476 calls from srun or other Slurm client commands that send remote proce‐
2477 dure calls to the slurmctld daemon come in at once, the performance of
2478 the slurmctld daemon can degrade, possibly resulting in a denial of
2479 service.
2480
2481 Do not run srun or other Slurm client commands that send remote proce‐
2482 dure calls to slurmctld from loops in shell scripts or other programs.
2483 Ensure that programs limit calls to srun to the minimum necessary for
2484 the information you are trying to gather.
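
       As an illustrative sketch only (the file names and the "process"
       command are placeholders), prefer a single step that handles many
       work items over a shell loop that invokes srun once per item:

           # generates one RPC to slurmctld per iteration
           $ for f in *.dat; do srun -n1 process "$f"; done

           # a single srun; the loop runs inside one job step
           $ srun -n1 bash -c 'for f in *.dat; do process "$f"; done'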
2485
2486
2487 INPUT ENVIRONMENT VARIABLES
2488 Upon startup, srun will read and handle the options set in the follow‐
2489 ing environment variables. The majority of these variables are set the
2490 same way the options are set, as defined above. For flag options that
2491 are defined to expect no argument, the option can be enabled by setting
2492 the environment variable without a value (an empty or NULL string), to
2493 the string 'yes', or to a non-zero number. Any other value for the
2494 environment variable will result in the option not being set. There are
2495 a couple of exceptions to these rules, noted below.
2496 NOTE: Command line options always override environment variable set‐
2497 tings.
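
       A brief sketch of these rules (the variables shown are examples from
       the list below):

           $ export SLURM_OVERLAP=yes     # same effect as passing --overlap
           $ export SLURM_NTASKS=8
           $ srun -n2 hostname            # -n2 on the command line overrides SLURM_NTASKS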
2498
2499
2500 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2501 MVAPICH2) and controls the fanout of data commu‐
2502 nications. The srun command sends messages to ap‐
2503 plication programs (via the PMI library) and
2504 those applications may be called upon to forward
2505 that data to up to this number of additional
2506 tasks. Higher values offload work from the srun
2507 command to the applications and likely increase
2508 the vulnerability to failures. The default value
2509 is 32.
2510
2511 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2512 MVAPICH2) and controls the fanout of data commu‐
2513 nications. The srun command sends messages to
2514 application programs (via the PMI library) and
2515 those applications may be called upon to forward
2516 that data to additional tasks. By default, srun
2517 sends one message per host and one task on that
2518 host forwards the data to other tasks on that
2519 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2520 defined, the user task may be required to forward
2521 the data to tasks on other hosts. Setting
2522 PMI_FANOUT_OFF_HOST may increase performance.
2523 Since more work is performed by the PMI library
2524 loaded by the user application, failures also can
2525 be more common and more difficult to diagnose.
2526 Disable or enable it by setting the variable to 0 or 1.
2527
2528 PMI_TIME This is used exclusively with PMI (MPICH2 and
2529 MVAPICH2) and controls how much the communica‐
2530 tions from the tasks to the srun are spread out
2531 in time in order to avoid overwhelming the srun
2532 command with work. The default value is 500 (mi‐
2533 croseconds) per task. On relatively slow proces‐
2534 sors or systems with very large processor counts
2535 (and large PMI data sets), higher values may be
2536 required.
2537
2538 SLURM_ACCOUNT Same as -A, --account
2539
2540 SLURM_ACCTG_FREQ Same as --acctg-freq
2541
2542 SLURM_BCAST Same as --bcast
2543
2544 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2545
2546 SLURM_BURST_BUFFER Same as --bb
2547
2548 SLURM_CLUSTERS Same as -M, --clusters
2549
2550 SLURM_COMPRESS Same as --compress
2551
2552 SLURM_CONF The location of the Slurm configuration file.
2553
2554 SLURM_CONSTRAINT Same as -C, --constraint
2555
2556 SLURM_CORE_SPEC Same as --core-spec
2557
2558 SLURM_CPU_BIND Same as --cpu-bind
2559
2560 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2561
2562 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2563
2564 SRUN_CPUS_PER_TASK Same as -c, --cpus-per-task
2565
2566 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2567 disable or enable the option.
2568
2569 SLURM_DELAY_BOOT Same as --delay-boot
2570
2571 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2572
2573 SLURM_DISABLE_STATUS Same as -X, --disable-status
2574
2575 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2576 tion=plane, without =<size>, is set.
2577
2578 SLURM_DISTRIBUTION Same as -m, --distribution
2579
2580 SLURM_EPILOG Same as --epilog
2581
2582 SLURM_EXACT Same as --exact
2583
2584 SLURM_EXCLUSIVE Same as --exclusive
2585
2586 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2587 error occurs (e.g. invalid options). This can be
2588 used by a script to distinguish application exit
2589 codes from various Slurm error conditions. Also
2590 see SLURM_EXIT_IMMEDIATE.
2591
2592 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2593 mediate option is used and resources are not cur‐
2594 rently available. This can be used by a script
2595 to distinguish application exit codes from vari‐
2596 ous Slurm error conditions. Also see
2597 SLURM_EXIT_ERROR.
2598
2599 SLURM_EXPORT_ENV Same as --export
2600
2601 SLURM_GPU_BIND Same as --gpu-bind
2602
2603 SLURM_GPU_FREQ Same as --gpu-freq
2604
2605 SLURM_GPUS Same as -G, --gpus
2606
2607 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2608
2609 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2610
2611 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2612
2613 SLURM_GRES_FLAGS Same as --gres-flags
2614
2615 SLURM_HINT Same as --hint
2616
2617 SLURM_IMMEDIATE Same as -I, --immediate
2618
2619 SLURM_JOB_ID Same as --jobid
2620
2621 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2622 allocation, in which case it is ignored to avoid
2623 using the batch job's name as the name of each
2624 job step.
2625
2626 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2627 the job’s resource allocation.
2628
2629 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2630 or 1 to disable or enable the option.
2631
2632 SLURM_LABELIO Same as -l, --label
2633
2634 SLURM_MEM_BIND Same as --mem-bind
2635
2636 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2637
2638 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2639
2640 SLURM_MEM_PER_NODE Same as --mem
2641
2642 SLURM_MPI_TYPE Same as --mpi
2643
2644 SLURM_NETWORK Same as --network
2645
2646 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2647 job’s resource allocation. See
2648 SLURM_JOB_NUM_NODES. Included for backwards com‐
2649 patibility.
2650
2651 SLURM_NO_KILL Same as -k, --no-kill
2652
2653 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2654 for backwards compatibility.
2655
2656 SLURM_NTASKS Same as -n, --ntasks
2657
2658 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2659
2660 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2661
2662 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2663
2664 SLURM_NTASKS_PER_SOCKET
2665 Same as --ntasks-per-socket
2666
2667 SLURM_OPEN_MODE Same as --open-mode
2668
2669 SLURM_OVERCOMMIT Same as -O, --overcommit
2670
2671 SLURM_OVERLAP Same as --overlap
2672
2673 SLURM_PARTITION Same as -p, --partition
2674
2675 SLURM_PMI_KVS_NO_DUP_KEYS
2676 If set, then PMI key-pairs will contain no dupli‐
2677 cate keys. MPI can use this variable to inform
2678 the PMI library that it will not use duplicate
2679 keys so PMI can skip the check for duplicate
2680 keys. This is the case for MPICH2 and reduces
2681 overhead in testing for duplicates for improved
2682 performance.
2683
2684 SLURM_POWER Same as --power
2685
2686 SLURM_PROFILE Same as --profile
2687
2688 SLURM_PROLOG Same as --prolog
2689
2690 SLURM_QOS Same as --qos
2691
2692 SLURM_REMOTE_CWD Same as -D, --chdir=
2693
2694 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2695 maximum count of switches desired for the job al‐
2696 location and optionally the maximum time to wait
2697 for that number of switches. See --switches
2698
2699 SLURM_RESERVATION Same as --reservation
2700
2701 SLURM_RESV_PORTS Same as --resv-ports
2702
2703 SLURM_SEND_LIBS Same as --send-libs
2704
2705 SLURM_SIGNAL Same as --signal
2706
2707 SLURM_SPREAD_JOB Same as --spread-job
2708
2709 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2710 If set and non-zero, successive task exit mes‐
2711 sages with the same exit code will be printed
2712 only once.
2713
2714 SLURM_STDERRMODE Same as -e, --error
2715
2716 SLURM_STDINMODE Same as -i, --input
2717
2718 SLURM_STDOUTMODE Same as -o, --output
2719
2720 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2721 job allocations). Also see SLURM_GRES
2722
2723 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2724 If set, only the specified node will log when the
2725 job or step is killed by a signal.
2726
2727 SLURM_TASK_EPILOG Same as --task-epilog
2728
2729 SLURM_TASK_PROLOG Same as --task-prolog
2730
2731 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2732 ecutable program along with user execute permis‐
2733 sion on the node where srun was called before at‐
2734 tempting to launch it on nodes in the step.
2735
2736 SLURM_THREAD_SPEC Same as --thread-spec
2737
2738 SLURM_THREADS Same as -T, --threads
2739
2740 SLURM_THREADS_PER_CORE
2741 Same as --threads-per-core
2742
2743 SLURM_TIMELIMIT Same as -t, --time
2744
2745 SLURM_UMASK If defined, Slurm will use the defined umask to
2746 set permissions when creating the output/error
2747 files for the job.
2748
2749 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2750
2751 SLURM_USE_MIN_NODES Same as --use-min-nodes
2752
2753 SLURM_WAIT Same as -W, --wait
2754
2755 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2756 --switches
2757
2758 SLURM_WCKEY Same as -W, --wckey
2759
2760 SLURM_WORKING_DIR       Same as -D, --chdir
2761
2762 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2763 to disable or enable the option.
2764
2765 SRUN_CONTAINER Same as --container.
2766
2767 SRUN_EXPORT_ENV Same as --export, and will override any setting
2768 for SLURM_EXPORT_ENV.
2769
2770 OUTPUT ENVIRONMENT VARIABLES
2771 srun will set some environment variables in the environment of the exe‐
2772 cuting tasks on the remote compute nodes. These environment variables
2773 are:
2774
2775
2776 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2777 ment variables are set separately for each compo‐
2778 nent.
2779
2780 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2781 ing.
2782
2783 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2784 IDs or masks for this node, CPU_ID = Board_ID x
2785 threads_per_board + Socket_ID x
2786 threads_per_socket + Core_ID x threads_per_core +
2787 Thread_ID).
2788
2789 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2790
2791 SLURM_CPU_BIND_VERBOSE
2792 --cpu-bind verbosity (quiet,verbose).
2793
2794 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2795 the srun command as a numerical frequency in
2796 kilohertz, or a coded value for a request of low,
2797 medium, highm1 or high for the frequency. See the
2798 description of the --cpu-freq option or the
2799 SLURM_CPU_FREQ_REQ input environment variable.
2800
2801 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2802 node. NOTE: The select/linear plugin allocates
2803 entire nodes to jobs, so the value indicates the
2804 total count of CPUs on the node. For the se‐
2805 lect/cons_res and select/cons_tres plugins, this number
2806 indicates the number of CPUs on this node allo‐
2807 cated to the step.
2808
2809 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2810 the --cpus-per-task option is specified.
2811
2812 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2813 distribution with -m, --distribution.
2814
2815 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2816 node.
2817
2818 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2819 gin and comma separated. It is read internally
2820 by pmi if Slurm was built with pmi support. Leav‐
2821 ing the variable set may cause problems when us‐
2822 ing external packages from within the job (Abaqus
2823 and Ansys have been known to have problems when
2824 it is set - consult the appropriate documentation
2825 for 3rd party software).
2826
2827 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2828
2829 SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2830
2831 SLURM_JOB_CPUS_PER_NODE
2832 Count of CPUs available to the job on the nodes
2833 in the allocation, using the format
2834 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2835 ber_of_nodes)] ...]. For example:
2836 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2837 that on the first and second nodes (as listed by
2838 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2839 while the third node has 36 CPUs. NOTE: The se‐
2840 lect/linear plugin allocates entire nodes to
2841 jobs, so the value indicates the total count of
2842 CPUs on allocated nodes. The select/cons_res and
2843 select/cons_tres plugins allocate individual CPUs
2844 to jobs, so this number indicates the number of
2845 CPUs allocated to the job.
2846
2847 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2848
2849 SLURM_JOB_GPUS The global GPU IDs of the GPUs allocated to this
2850 job. The GPU IDs are not relative to any device
2851 cgroup, even if devices are constrained with
2852 task/cgroup. Only set in batch and interactive
2853 jobs.
2854
2855 SLURM_JOB_ID Job id of the executing job.
2856
2857 SLURM_JOB_NAME Set to the value of the --job-name option or the
2858 command name when srun is used to create a new
2859 job allocation. Not set when srun is used only to
2860 create a job step (i.e. within an existing job
2861 allocation).
2862
2863 SLURM_JOB_NODELIST List of nodes allocated to the job.
2864
2865 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2866 cation.
2867
2868 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2869 ning.
2870
2871 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2872
2873 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2874 tion, if any.
2875
2876 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2877 Included for backwards compatibility.
2878
2879 SLURM_LAUNCH_NODE_IPADDR
2880 IP address of the node from which the task launch
2881 was initiated (where the srun command ran from).
2882
2883 SLURM_LOCALID Node local task ID for the process within a job.
2884
2885 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2886 masks for this node>).
2887
2888 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2889
2890 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2891 nodes).
2892
2893 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2894
2895 SLURM_MEM_BIND_VERBOSE
2896 --mem-bind verbosity (quiet,verbose).
2897
2898 SLURM_NODE_ALIASES Sets of node name, communication address and
2899 hostname for nodes allocated to the job from the
2900 cloud. Each element in the set is colon separated
2901 and each set is comma separated. For example:
2902 SLURM_NODE_ALIASES=
2903 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2904
2905 SLURM_NODEID The relative node ID of the current node.
2906
2907 SLURM_NPROCS Total number of processes in the current job or
2908 job step. See SLURM_NTASKS. Included for back‐
2909 wards compatibility.
2910
2911 SLURM_NTASKS Total number of processes in the current job or
2912 job step.
2913
2914 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2915
2916 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2917 of job submission. This value is propagated to
2918 the spawned processes.
2919
2920 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2921 rent process.
2922
2923 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2924
2925 SLURM_SRUN_COMM_PORT srun communication port.
2926
2927 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
2928 specified.
2929
2930 SLURM_SHARDS_ON_NODE Number of GPU Shards available to the step on
2931 this node.
2932
2933 SLURM_STEP_GPUS The global GPU IDs of the GPUs allocated to this
2934 step (excluding batch and interactive steps). The
2935 GPU IDs are not relative to any device cgroup,
2936 even if devices are constrained with task/cgroup.
2937
2938 SLURM_STEP_ID The step ID of the current job.
2939
2940 SLURM_STEP_LAUNCHER_PORT
2941 Step launcher port.
2942
2943 SLURM_STEP_NODELIST List of nodes allocated to the step.
2944
2945 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2946
2947 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2948 erogeneous job step.
2949
2950 SLURM_STEP_TASKS_PER_NODE
2951 Number of processes per node within the step.
2952
2953 SLURM_STEPID The step ID of the current job. See
2954 SLURM_STEP_ID. Included for backwards compatibil‐
2955 ity.
2956
2957 SLURM_SUBMIT_DIR The directory from which the allocation was in‐
2958 voked.
2959
2960 SLURM_SUBMIT_HOST The hostname of the computer from which the allo‐
2961 cation was invoked.
2962
2963 SLURM_TASK_PID The process ID of the task being started.
2964
2965 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2966 Values are comma separated and in the same order
2967 as SLURM_JOB_NODELIST. If two or more consecu‐
2968 tive nodes are to have the same task count, that
2969 count is followed by "(x#)" where "#" is the rep‐
2970 etition count. For example,
2971 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2972 first three nodes will each execute two tasks and
2973 the fourth node will execute one task.
2974
2975 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2976 ogy/tree plugin configured. The value will be
2977 set to the names of the network switches which may be
2978 involved in the job's communications from the
2979 system's top level switch down to the leaf switch
2980 and ending with node name. A period is used to
2981 separate each hardware component name.
2982
2983 SLURM_TOPOLOGY_ADDR_PATTERN
2984 This is set only if the system has the topol‐
2985 ogy/tree plugin configured. The value will be
2986 set to the component types listed in SLURM_TOPOL‐
2987 OGY_ADDR. Each component will be identified as
2988 either "switch" or "node". A period is used to
2989 separate each hardware component type.
2990
2991 SLURM_UMASK The umask in effect when the job was submitted.
2992
2993 SLURMD_NODENAME Name of the node running the task. In the case of
2994 a parallel job executing on multiple compute
2995 nodes, the various tasks will have this environ‐
2996 ment variable set to different values on each
2997 compute node.
2998
2999 SRUN_DEBUG Set to the logging level of the srun command.
3000 Default value is 3 (info level). The value is
3001 incremented or decremented based upon the --ver‐
3002 bose and --quiet options.
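
       As an illustrative sketch (the script name is arbitrary), a task
       launched by srun can read several of these variables to label its
       own output:

           $ cat where.sh
           #!/bin/sh
           echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME (node $SLURM_NODEID)"

           $ srun -N2 -n4 where.sh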
3003
3004 SIGNALS AND ESCAPE SEQUENCES
3005 Signals sent to the srun command are automatically forwarded to the
3006 tasks it is controlling with a few exceptions. The escape sequence
3007 <control-c> will report the state of all tasks associated with the srun
3008 command. If <control-c> is entered twice within one second, then the
3009 associated SIGINT signal will be sent to all tasks and a termination
3010 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
3011 spawned tasks. If a third <control-c> is received, the srun program
3012 will be terminated without waiting for remote tasks to exit or their
3013 I/O to complete.
3014
3015 The escape sequence <control-z> is presently ignored.
3016
3017
3018 MPI SUPPORT
3019 MPI use depends upon the type of MPI being used. There are three fun‐
3020 damentally different modes of operation used by these various MPI im‐
3021 plementations.
3022
3023 1. Slurm directly launches the tasks and performs initialization of
3024 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3025 a.out".
3026
3027 2. Slurm creates a resource allocation for the job and then mpirun
3028 launches tasks using Slurm's infrastructure (OpenMPI).
3029
3030 3. Slurm creates a resource allocation for the job and then mpirun
3031 launches tasks using some mechanism other than Slurm, such as SSH or
3032 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3033 trol. Slurm's epilog should be configured to purge these tasks when the
3034 job's allocation is relinquished, or the use of pam_slurm_adopt is
3035 highly recommended.
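
       As a hedged sketch of the second mode (assuming an Open MPI
       installation built with Slurm support, with "a.out" as a placeholder
       application), mpirun runs inside an allocation created by salloc and
       relies on Slurm's infrastructure to launch the tasks:

           $ salloc -N2 -n8 mpirun ./a.out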
3036
3037 See https://slurm.schedmd.com/mpi_guide.html for more information on
3038 use of these various MPI implementations with Slurm.
3039
3040
3041 MULTIPLE PROGRAM CONFIGURATION
3042 Comments in the configuration file must have a "#" in column one. The
3043 configuration file contains the following fields separated by white
3044 space:
3045
3046
3047 Task rank
3048 One or more task ranks to use this configuration. Multiple val‐
3049 ues may be comma separated. Ranges may be indicated with two
3050 numbers separated with a '-' with the smaller number first (e.g.
3051 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3052 ified, specify a rank of '*' as the last line of the file. If
3053 an attempt is made to initiate a task for which no executable
3054 program is defined, the following error message will be produced:
3055 "No executable program specified for this task".
3056
3057 Executable
3058 The name of the program to execute. May be a fully qualified
3059 pathname if desired.
3060
3061 Arguments
3062 Program arguments. The expression "%t" will be replaced with
3063 the task's number. The expression "%o" will be replaced with
3064 the task's offset within this range (e.g. a configured task rank
3065 value of "1-5" would have offset values of "0-4"). Single
3066 quotes may be used to avoid having the enclosed values inter‐
3067 preted. This field is optional. Any arguments for the program
3068 entered on the command line will be added to the arguments spec‐
3069 ified in the configuration file.
3070
3071 For example:
3072
3073 $ cat silly.conf
3074 ###################################################################
3075 # srun multiple program configuration file
3076 #
3077 # srun -n8 -l --multi-prog silly.conf
3078 ###################################################################
3079 4-6 hostname
3080 1,7 echo task:%t
3081 0,2-3 echo offset:%o
3082
3083 $ srun -n8 -l --multi-prog silly.conf
3084 0: offset:0
3085 1: task:1
3086 2: offset:1
3087 3: offset:2
3088 4: linux15.llnl.gov
3089 5: linux16.llnl.gov
3090 6: linux17.llnl.gov
3091 7: task:7
3092
3093
3094 EXAMPLES
3095 This simple example demonstrates the execution of the command hostname
3096 in eight tasks. At least eight processors will be allocated to the job
3097 (the same as the task count) on however many nodes are required to sat‐
3098 isfy the request. The output of each task will be preceded by its
3099 task number. (The machine "dev" in the example below has a total of
3100 two CPUs per node)
3101
3102 $ srun -n8 -l hostname
3103 0: dev0
3104 1: dev0
3105 2: dev1
3106 3: dev1
3107 4: dev2
3108 5: dev2
3109 6: dev3
3110 7: dev3
3111
3112
3113 The srun -r option is used within a job script to run two job steps on
3114 disjoint nodes in the following example. The script is run using allo‐
3115 cate mode instead of as a batch job in this case.
3116
3117 $ cat test.sh
3118 #!/bin/sh
3119 echo $SLURM_JOB_NODELIST
3120 srun -lN2 -r2 hostname
3121 srun -lN2 hostname
3122
3123 $ salloc -N4 test.sh
3124 dev[7-10]
3125 0: dev9
3126 1: dev10
3127 0: dev7
3128 1: dev8
3129
3130
3131 The following script runs two job steps in parallel within an allocated
3132 set of nodes.
3133
3134 $ cat test.sh
3135 #!/bin/bash
3136 srun -lN2 -n4 -r 2 sleep 60 &
3137 srun -lN2 -r 0 sleep 60 &
3138 sleep 1
3139 squeue
3140 squeue -s
3141 wait
3142
3143 $ salloc -N4 test.sh
3144 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3145 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3146
3147 STEPID PARTITION USER TIME NODELIST
3148 65641.0 batch grondo 0:01 dev[7-8]
3149 65641.1 batch grondo 0:01 dev[9-10]
3150
3151
3152 This example demonstrates how one executes a simple MPI job. We use
3153 srun to build a list of machines (nodes) to be used by mpirun in its
3154 required format. A sample command line and the script to be executed
3155 follow.
3156
3157 $ cat test.sh
3158 #!/bin/sh
3159 MACHINEFILE="nodes.$SLURM_JOB_ID"
3160
3161 # Generate Machinefile for mpi such that hosts are in the same
3162 # order as if run via srun
3163 #
3164 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3165
3166 # Run using generated Machine file:
3167 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3168
3169 rm $MACHINEFILE
3170
3171 $ salloc -N2 -n4 test.sh
3172
3173
3174 This simple example demonstrates the execution of different jobs on
3175 different nodes in the same srun. You can do this for any number of
3176 nodes or any number of jobs. The executables are placed on the nodes
3177 identified by the SLURM_NODEID env var, counting from 0 up to the
3178 number of nodes specified on the srun command line.
3179
3180 $ cat test.sh
3181 case $SLURM_NODEID in
3182 0) echo "I am running on "
3183 hostname ;;
3184 1) hostname
3185 echo "is where I am running" ;;
3186 esac
3187
3188 $ srun -N2 test.sh
3189 dev0
3190 is where I am running
3191 I am running on
3192 dev1
3193
3194
3195 This example demonstrates use of multi-core options to control layout
3196 of tasks. We request that four sockets per node and two cores per
3197 socket be dedicated to the job.
3198
3199 $ srun -N2 -B 4-4:2-2 a.out
3200
3201
3202 This example shows a script in which Slurm is used to provide resource
3203 management for a job by executing the various job steps as processors
3204 become available for their dedicated use.
3205
3206 $ cat my.script
3207 #!/bin/bash
3208 srun -n4 prog1 &
3209 srun -n3 prog2 &
3210 srun -n1 prog3 &
3211 srun -n1 prog4 &
3212 wait
3213
3214
3215 This example shows how to launch an application called "server" with
3216 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3217 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3218 1 GB of memory per task.
3219
3220 $ srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3221
3222
3223 COPYING
3224 Copyright (C) 2006-2007 The Regents of the University of California.
3225 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3226 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3227 Copyright (C) 2010-2022 SchedMD LLC.
3228
3229 This file is part of Slurm, a resource management program. For de‐
3230 tails, see <https://slurm.schedmd.com/>.
3231
3232 Slurm is free software; you can redistribute it and/or modify it under
3233 the terms of the GNU General Public License as published by the Free
3234 Software Foundation; either version 2 of the License, or (at your op‐
3235 tion) any later version.
3236
3237 Slurm is distributed in the hope that it will be useful, but WITHOUT
3238 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3239 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3240 for more details.
3241
3242
3243 SEE ALSO
3244 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3245 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3246
3247
3248
3249December 2022 Slurm Commands srun(1)