srun(1)                          Slurm Commands                          srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
11 executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
20 will first create a resource allocation in which to run the parallel
21 job.
22
23 The following document describes the influence of various options on
the allocation of CPUs to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
30 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
31 signal) of any task that exited with a signal.
32 The value 253 is reserved for out-of-memory errors.
33
34
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
The current working directory is the calling process's working
directory unless the --chdir argument is passed, which overrides
it.
47
48
OPTIONS
--accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
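
For example (a sketch only; a.out is a placeholder and it is
assumed the single-letter options may be concatenated), the
following binds each task to the nearest GPU and NIC and logs
the chosen bindings:

    srun -n4 --gpus-per-node=4 --accel-bind=gnv a.out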
65
66 -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command. This option ap‐
70 plies to job allocations.
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks. This option ap‐
108 plies to job allocations.
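
For example, the following samples task accounting every 15
seconds and energy data every minute (the intervals and the
executable ./my_app are illustrative, and energy sampling
assumes an acct_gather_energy plugin is configured):

    srun --acctg-freq=task=15,energy=60 -n8 ./my_app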
109
110 --bb=<spec>
111 Burst buffer specification. The form of the specification is
112 system dependent. Also see --bbf. This option applies to job
113 allocations. When the --bb option is used, Slurm parses this
114 option and creates a temporary burst buffer script file that is
115 used internally by the burst buffer plugins. See Slurm's burst
116 buffer guide for more information and examples:
117 https://slurm.schedmd.com/burst_buffer.html
118
119 --bbf=<file_name>
120 Path of file containing burst buffer specification. The form of
121 the specification is system dependent. Also see --bb. This op‐
122 tion applies to job allocations. See Slurm's burst buffer guide
123 for more information and examples:
124 https://slurm.schedmd.com/burst_buffer.html
125
126 --bcast[=<dest_path>]
127 Copy executable file to allocated compute nodes. If a file name
128 is specified, copy the executable to the specified destination
129 file path. If the path specified ends with '/' it is treated as
130 a target directory, and the destination file name will be
131 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
132 specified and the slurm.conf BcastParameters DestDir is config‐
133 ured then it is used, and the filename follows the above pat‐
134 tern. If none of the previous is specified, then --chdir is
135 used, and the filename follows the above pattern too. For exam‐
136 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
137 "a.out" from your current directory to the file "/tmp/mine" on
138 each of the three allocated compute nodes and execute that file.
139 This option applies to step allocations.
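
As a further illustration, a trailing '/' makes the destination
a directory, so the following (paths are placeholders) copies
a.out to /tmp/ on each of two nodes as
slurm_bcast_<job_id>.<step_id>_<nodename> and then runs it:

    srun --bcast=/tmp/ -N2 a.out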
140
141 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142 Comma-separated list of absolute directory paths to be excluded
143 when autodetecting and broadcasting executable shared object de‐
144 pendencies through --bcast. If the keyword "NONE" is configured,
145 no directory paths will be excluded. The default value is that
146 of slurm.conf BcastExclude and this option overrides it. See
147 also --bcast and --send-libs.
148
149 -b, --begin=<time>
150 Defer initiation of this job until the specified time. It ac‐
151 cepts times of the form HH:MM:SS to run a job at a specific time
152 of day (seconds are optional). (If that time is already past,
153 the next day is assumed.) You may also specify midnight, noon,
154 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
155 suffixed with AM or PM for running in the morning or the
156 evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY, MM/DD/YY or YYYY-MM-DD.
158 Combine date and time using the following format
159 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
160 count time-units, where the time-units can be seconds (default),
161 minutes, hours, days, or weeks and you can tell Slurm to run the
162 job today with the keyword today and to run the job tomorrow
163 with the keyword tomorrow. The value may be changed after job
164 submission using the scontrol command. For example:
165
166 --begin=16:00
167 --begin=now+1hour
168 --begin=now+60 (seconds by default)
169 --begin=2010-01-20T12:34:00
170
171
172 Notes on date/time specifications:
173 - Although the 'seconds' field of the HH:MM:SS time specifica‐
174 tion is allowed by the code, note that the poll time of the
175 Slurm scheduler is not precise enough to guarantee dispatch of
176 the job on the exact second. The job will be eligible to start
177 on the next poll following the specified time. The exact poll
178 interval depends on the Slurm scheduler (e.g., 60 seconds with
179 the default sched/builtin).
180 - If no time (HH:MM:SS) is specified, the default is
181 (00:00:00).
182 - If a date is specified without a year (e.g., MM/DD) then the
183 current year is assumed, unless the combination of MM/DD and
184 HH:MM:SS has already passed for that year, in which case the
185 next year is used.
186 This option applies to job allocations.
187
188 -D, --chdir=<path>
189 Have the remote processes do a chdir to path before beginning
190 execution. The default is to chdir to the current working direc‐
191 tory of the srun process. The path can be specified as full path
192 or relative path to the directory where the command is executed.
193 This option applies to job allocations.
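
For example, assuming a directory /scratch/myproject exists on
every allocated node (the path and executable are placeholders),
the tasks can be started in that directory:

    srun --chdir=/scratch/myproject -n4 ./analyze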
194
195 --cluster-constraint=<list>
196 Specifies features that a federated cluster must have to have a
197 sibling job submitted to it. Slurm will attempt to submit a sib‐
198 ling job to a cluster if it has at least one of the specified
199 features.
200
201 -M, --clusters=<string>
202 Clusters to issue commands to. Multiple cluster names may be
203 comma separated. The job will be submitted to the one cluster
204 providing the earliest expected job initiation time. The default
205 value is the current cluster. A value of 'all' will query to run
206 on all clusters. Note the --export option to control environ‐
207 ment variables exported between clusters. This option applies
208 only to job allocations. Note that the SlurmDBD must be up for
209 this option to work properly.
210
211 --comment=<string>
212 An arbitrary comment. This option applies to job allocations.
213
214 --compress[=type]
215 Compress file before sending it to compute hosts. The optional
216 argument specifies the data compression library to be used. The
217 default is BcastParameters Compression= if set or "lz4" other‐
218 wise. Supported values are "lz4". Some compression libraries
219 may be unavailable on some systems. For use with the --bcast
220 option. This option applies to step allocations.
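
A typical combination with --bcast (the path and a.out are
placeholders) might look like:

    srun --bcast=/tmp/mine --compress=lz4 -N4 a.out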
221
222 -C, --constraint=<list>
223 Nodes can have features assigned to them by the Slurm adminis‐
224 trator. Users can specify which of these features are required
225 by their job using the constraint option. If you are looking for
'soft' constraints please see --prefer for more information.
227 Only nodes having features matching the job constraints will be
228 used to satisfy the request. Multiple constraints may be speci‐
229 fied with AND, OR, matching OR, resource counts, etc. (some op‐
230 erators are not supported on all system types).
231
232 NOTE: If features that are part of the node_features/helpers
233 plugin are requested, then only the Single Name and AND options
234 are supported.
235
236 Supported --constraint options include:
237
238 Single Name
239 Only nodes which have the specified feature will be used.
240 For example, --constraint="intel"
241
242 Node Count
243 A request can specify the number of nodes needed with
244 some feature by appending an asterisk and count after the
245 feature name. For example, --nodes=16 --con‐
246 straint="graphics*4 ..." indicates that the job requires
247 16 nodes and that at least four of those nodes must have
248 the feature "graphics."
249
AND Only nodes with all of the specified features will be
used. The ampersand is used as the AND operator. For
example, --constraint="intel&gpu"
253
OR Only nodes with at least one of the specified features
will be used. The vertical bar is used as the OR operator.
For example, --constraint="intel|amd"
257
258 Matching OR
259 If only one of a set of possible options should be used
260 for all allocated nodes, then use the OR operator and en‐
261 close the options within square brackets. For example,
262 --constraint="[rack1|rack2|rack3|rack4]" might be used to
263 specify that all nodes must be allocated on a single rack
264 of the cluster, but any of those four racks can be used.
265
266 Multiple Counts
267 Specific counts of multiple resources may be specified by
268 using the AND operator and enclosing the options within
269 square brackets. For example, --con‐
270 straint="[rack1*2&rack2*4]" might be used to specify that
271 two nodes must be allocated from nodes with the feature
272 of "rack1" and four nodes must be allocated from nodes
273 with the feature "rack2".
274
275 NOTE: This construct does not support multiple Intel KNL
276 NUMA or MCDRAM modes. For example, while --con‐
277 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
278 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
279 Specification of multiple KNL modes requires the use of a
280 heterogeneous job.
281
282 NOTE: Multiple Counts can cause jobs to be allocated with
283 a non-optimal network layout.
284
285 Brackets
286 Brackets can be used to indicate that you are looking for
287 a set of nodes with the different requirements contained
288 within the brackets. For example, --con‐
289 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
290 node with either the "rack1" or "rack2" features and two
291 nodes with the "rack3" feature. The same request without
292 the brackets will try to find a single node that meets
293 those requirements.
294
295 NOTE: Brackets are only reserved for Multiple Counts and
296 Matching OR syntax. AND operators require a count for
297 each feature inside square brackets (i.e.
298 "[quad*2&hemi*1]"). Slurm will only allow a single set of
299 bracketed constraints per job.
300
Parentheses
Parentheses can be used to group like node features together.
For example, --constraint="[(knl&snc4&flat)*4&haswell*1]"
might be used to specify that four nodes with the features
"knl", "snc4" and "flat" plus one node with the feature
"haswell" are required. All options within parentheses should
be grouped with the AND (e.g. "&") operator.
309
310 WARNING: When srun is executed from within salloc or sbatch, the
311 constraint value can only contain a single feature name. None of
312 the other operators are currently supported for job steps.
313 This option applies to job and step allocations.
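
Putting the operators together, a job allocation (not a step
within salloc or sbatch, per the warning above) that needs 8
nodes carrying both of the hypothetical features "intel" and
"gpu" could be requested as:

    srun -N8 -n64 --constraint="intel&gpu" ./app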
314
315 --container=<path_to_container>
316 Absolute path to OCI container bundle.
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326 -S, --core-spec=<num>
327 Count of specialized cores per node reserved by the job for sys‐
328 tem operations and not used by the application. The application
329 will not use these cores, but will be charged for their alloca‐
330 tion. Default value is dependent upon the node's configured
331 CoreSpecCount value. If a value of zero is designated and the
332 Slurm configuration option AllowSpecResourcesUsage is enabled,
333 the job will be allowed to override CoreSpecCount and use the
334 specialized resources on nodes it is allocated. This option can
335 not be used with the --thread-spec option. This option applies
336 to job allocations.
337
338 NOTE: This option may implicitly impact the number of tasks if
339 -n was not specified.
340
341 NOTE: Explicitly setting a job's specialized core value implic‐
342 itly sets its --exclusive option, reserving entire nodes for the
343 job.
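
For example (a sketch only; ./app is a placeholder and the node
is assumed to have enough cores), the following reserves two
specialized cores per node and, as noted above, implicitly makes
the job exclusive:

    srun -N1 -n6 --core-spec=2 ./app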
344
345 --cores-per-socket=<cores>
346 Restrict node selection to nodes with at least the specified
number of cores per socket. See additional information under the
-B option when the task/affinity plugin is enabled. This option
349 applies to job allocations.
350
351 --cpu-bind=[{quiet|verbose},]<type>
352 Bind tasks to CPUs. Used only when the task/affinity plugin is
353 enabled. NOTE: To have Slurm always report on the selected CPU
354 binding for all commands executed in a shell, you can enable
355 verbose mode by setting the SLURM_CPU_BIND environment variable
356 value to "verbose".
357
358 The following informational environment variables are set when
359 --cpu-bind is in use:
360
361 SLURM_CPU_BIND_VERBOSE
362 SLURM_CPU_BIND_TYPE
363 SLURM_CPU_BIND_LIST
364
365 See the ENVIRONMENT VARIABLES section for a more detailed de‐
366 scription of the individual SLURM_CPU_BIND variables. These
variables are available only if the task/affinity plugin is
configured.
369
370 When using --cpus-per-task to run multithreaded tasks, be aware
371 that CPU binding is inherited from the parent of the process.
372 This means that the multithreaded task should either specify or
373 clear the CPU binding itself to avoid having all threads of the
374 multithreaded task use the same mask/CPU as the parent. Alter‐
375 natively, fat masks (masks which specify more than one allowed
376 CPU) could be used for the tasks in order to provide multiple
377 CPUs for the multithreaded tasks.
378
379 Note that a job step can be allocated different numbers of CPUs
380 on each node or be allocated CPUs not starting at location zero.
381 Therefore one of the options which automatically generate the
382 task binding is recommended. Explicitly specified masks or
383 bindings are only honored when the job step has been allocated
384 every available CPU on the node.
385
386 Binding a task to a NUMA locality domain means to bind the task
387 to the set of CPUs that belong to the NUMA locality domain or
388 "NUMA node". If NUMA locality domain options are used on sys‐
389 tems with no NUMA support, then each socket is considered a lo‐
390 cality domain.
391
392 If the --cpu-bind option is not used, the default binding mode
393 will depend upon Slurm's configuration and the step's resource
394 allocation. If all allocated nodes have the same configured
395 CpuBind mode, that will be used. Otherwise if the job's Parti‐
396 tion has a configured CpuBind mode, that will be used. Other‐
397 wise if Slurm has a configured TaskPluginParam value, that mode
398 will be used. Otherwise automatic binding will be performed as
399 described below.
400
401 Auto Binding
402 Applies only when task/affinity is enabled. If the job
403 step allocation includes an allocation with a number of
404 sockets, cores, or threads equal to the number of tasks
405 times cpus-per-task, then the tasks will by default be
406 bound to the appropriate resources (auto binding). Dis‐
407 able this mode of operation by explicitly setting
408 "--cpu-bind=none". Use TaskPluginParam=auto‐
409 bind=[threads|cores|sockets] to set a default cpu binding
410 in case "auto binding" doesn't find a match.
411
412 Supported options include:
413
414 q[uiet]
415 Quietly bind before task runs (default)
416
417 v[erbose]
418 Verbosely report binding before task runs
419
420 no[ne] Do not bind tasks to CPUs (default unless auto
421 binding is applied)
422
423 rank Automatically bind by task rank. The lowest num‐
424 bered task on each node is bound to socket (or
425 core or thread) zero, etc. Not supported unless
426 the entire node is allocated to the job.
427
428 map_cpu:<list>
429 Bind by setting CPU masks on tasks (or ranks) as
430 specified where <list> is
431 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... If
432 the number of tasks (or ranks) exceeds the number
433 of elements in this list, elements in the list
434 will be reused as needed starting from the begin‐
435 ning of the list. To simplify support for large
436 task counts, the lists may follow a map with an
437 asterisk and repetition count. For example
438 "map_cpu:0*4,3*4".
439
440 mask_cpu:<list>
441 Bind by setting CPU masks on tasks (or ranks) as
442 specified where <list> is
443 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
444 The mapping is specified for a node and identical
445 mapping is applied to the tasks on every node
446 (i.e. the lowest task ID on each node is mapped to
447 the first mask specified in the list, etc.). CPU
448 masks are always interpreted as hexadecimal values
449 but can be preceded with an optional '0x'. If the
450 number of tasks (or ranks) exceeds the number of
451 elements in this list, elements in the list will
452 be reused as needed starting from the beginning of
453 the list. To simplify support for large task
454 counts, the lists may follow a map with an aster‐
455 isk and repetition count. For example
456 "mask_cpu:0x0f*4,0xf0*4".
457
458 rank_ldom
459 Bind to a NUMA locality domain by rank. Not sup‐
460 ported unless the entire node is allocated to the
461 job.
462
463 map_ldom:<list>
464 Bind by mapping NUMA locality domain IDs to tasks
465 as specified where <list> is
466 <ldom1>,<ldom2>,...<ldomN>. The locality domain
467 IDs are interpreted as decimal values unless they
468 are preceded with '0x' in which case they are in‐
469 terpreted as hexadecimal values. Not supported
470 unless the entire node is allocated to the job.
471
472 mask_ldom:<list>
473 Bind by setting NUMA locality domain masks on
474 tasks as specified where <list> is
475 <mask1>,<mask2>,...<maskN>. NUMA locality domain
476 masks are always interpreted as hexadecimal values
477 but can be preceded with an optional '0x'. Not
478 supported unless the entire node is allocated to
479 the job.
480
481 sockets
482 Automatically generate masks binding tasks to
483 sockets. Only the CPUs on the socket which have
484 been allocated to the job will be used. If the
485 number of tasks differs from the number of allo‐
486 cated sockets this can result in sub-optimal bind‐
487 ing.
488
489 cores Automatically generate masks binding tasks to
490 cores. If the number of tasks differs from the
491 number of allocated cores this can result in
492 sub-optimal binding.
493
494 threads
495 Automatically generate masks binding tasks to
496 threads. If the number of tasks differs from the
497 number of allocated threads this can result in
498 sub-optimal binding.
499
500 ldoms Automatically generate masks binding tasks to NUMA
501 locality domains. If the number of tasks differs
502 from the number of allocated locality domains this
503 can result in sub-optimal binding.
504
505 help Show help message for cpu-bind
506
507 This option applies to job and step allocations.
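
For instance, automatic binding to cores with verbose reporting,
or explicit per-task masks on a node whose layout is known (the
masks, task counts and ./app are illustrative; explicit masks
are only honored when the step holds every CPU on the node), can
be requested as:

    srun -n8 --cpu-bind=verbose,cores ./app
    srun -n4 --cpu-bind=mask_cpu:0x1,0x2,0x4,0x8 ./app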
508
509 --cpu-freq=<p1>[-p2[:p3]]
510
511 Request that the job step initiated by this srun command be run
512 at some requested frequency if possible, on the CPUs selected
513 for the step on the compute node(s).
514
515 p1 can be [#### | low | medium | high | highm1] which will set
516 the frequency scaling_speed to the corresponding value, and set
517 the frequency scaling_governor to UserSpace. See below for defi‐
518 nition of the values.
519
520 p1 can be [Conservative | OnDemand | Performance | PowerSave]
521 which will set the scaling_governor to the corresponding value.
522 The governor has to be in the list set by the slurm.conf option
523 CpuFreqGovernors.
524
525 When p2 is present, p1 will be the minimum scaling frequency and
526 p2 will be the maximum scaling frequency.
527
p2 can be [#### | medium | high | highm1]. p2 must be greater
than p1.
530
531 p3 can be [Conservative | OnDemand | Performance | PowerSave |
532 SchedUtil | UserSpace] which will set the governor to the corre‐
533 sponding value.
534
535 If p3 is UserSpace, the frequency scaling_speed will be set by a
536 power or energy aware scheduling strategy to a value between p1
537 and p2 that lets the job run within the site's power goal. The
538 job may be delayed if p1 is higher than a frequency that allows
539 the job to run within the goal.
540
541 If the current frequency is < min, it will be set to min. Like‐
542 wise, if the current frequency is > max, it will be set to max.
543
544 Acceptable values at present include:
545
546 #### frequency in kilohertz
547
548 Low the lowest available frequency
549
550 High the highest available frequency
551
552 HighM1 (high minus one) will select the next highest
553 available frequency
554
555 Medium attempts to set a frequency in the middle of the
556 available range
557
558 Conservative attempts to use the Conservative CPU governor
559
560 OnDemand attempts to use the OnDemand CPU governor (the de‐
561 fault value)
562
563 Performance attempts to use the Performance CPU governor
564
565 PowerSave attempts to use the PowerSave CPU governor
566
567 UserSpace attempts to use the UserSpace CPU governor
568
The following informational environment variable is set in the
job step when the --cpu-freq option is requested:
    SLURM_CPU_FREQ_REQ
573
574 This environment variable can also be used to supply the value
575 for the CPU frequency request if it is set when the 'srun' com‐
576 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
578 ment variable is the same as the command line. See the ENVIRON‐
579 MENT VARIABLES section for a description of the
580 SLURM_CPU_FREQ_REQ variable.
581
582 NOTE: This parameter is treated as a request, not a requirement.
583 If the job step's node does not support setting the CPU fre‐
584 quency, or the requested value is outside the bounds of the le‐
585 gal frequencies, an error is logged, but the job step is allowed
586 to continue.
587
588 NOTE: Setting the frequency for just the CPUs of the job step
589 implies that the tasks are confined to those CPUs. If task con‐
590 finement (i.e. the task/affinity TaskPlugin is enabled, or the
591 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
592 in cgroup.conf) is not configured, this parameter is ignored.
593
594 NOTE: When the step completes, the frequency and governor of
595 each selected CPU is reset to the previous values.
596
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result, not
all of the accounting information will be present.
601
602 This option applies to job and step allocations.
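
Two illustrative requests (frequencies are in kilohertz, the
governors must be permitted by CpuFreqGovernors, and ./app is a
placeholder):

    srun --cpu-freq=Performance -n8 ./app
    srun --cpu-freq=1800000-2400000:OnDemand -n8 ./app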
603
604 --cpus-per-gpu=<ncpus>
605 Advise Slurm that ensuing job steps will require ncpus proces‐
606 sors per allocated GPU. Not compatible with the --cpus-per-task
607 option.
608
609 -c, --cpus-per-task=<ncpus>
610 Request that ncpus be allocated per process. This may be useful
611 if the job is multithreaded and requires more than one CPU per
612 task for optimal performance. Explicitly requesting this option
613 implies --exact. The default is one CPU per process and does not
614 imply --exact. If -c is specified without -n, as many tasks
615 will be allocated per node as possible while satisfying the -c
616 restriction. For instance on a cluster with 8 CPUs per node, a
617 job request for 4 nodes and 3 CPUs per task may be allocated 3
618 or 6 CPUs per node (1 or 2 tasks per node) depending upon re‐
619 source consumption by other jobs. Such a job may be unable to
620 execute more than a total of 4 tasks.
621
622 WARNING: There are configurations and options interpreted dif‐
623 ferently by job and job step requests which can result in incon‐
624 sistencies for this option. For example srun -c2
625 --threads-per-core=1 prog may allocate two cores for the job,
626 but if each of those cores contains two threads, the job alloca‐
627 tion will include four CPUs. The job step allocation will then
628 launch two threads per CPU for a total of two tasks.
629
630 WARNING: When srun is executed from within salloc or sbatch,
631 there are configurations and options which can result in incon‐
632 sistent allocations when -c has a value greater than -c on sal‐
633 loc or sbatch. The number of cpus per task specified for salloc
634 or sbatch is not automatically inherited by srun and, if de‐
635 sired, must be requested again, either by specifying
636 --cpus-per-task when calling srun, or by setting the
637 SRUN_CPUS_PER_TASK environment variable.
638
639 This option applies to job and step allocations.
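
For example, a multithreaded application that wants 8 CPUs for
each of 4 tasks could be launched as below (./hybrid_app is a
placeholder); inside an existing salloc or sbatch allocation the
-c value, or SRUN_CPUS_PER_TASK, must be repeated on the srun
line as described above:

    srun -n4 -c8 ./hybrid_app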
640
641 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
643 (start > (deadline - time[-min])). Default is no deadline.
644 Valid time formats are:
645 HH:MM[:SS] [AM|PM]
646 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
647 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
649 now[+count[seconds(default)|minutes|hours|days|weeks]]
650
651 This option applies only to job allocations.
652
653 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
655 specification if the job has been eligible to run for less than
656 this time period. If the job has waited for less than the spec‐
657 ified period, it will use only nodes which already have the
658 specified features. The argument is in units of minutes. A de‐
659 fault value may be set by a system administrator using the de‐
660 lay_boot option of the SchedulerParameters configuration parame‐
661 ter in the slurm.conf file, otherwise the default value is zero
662 (no delay).
663
664 This option applies only to job allocations.
665
666 -d, --dependency=<dependency_list>
667 Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job
steps (executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
671 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
672 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
673 must be satisfied if the "," separator is used. Any dependency
674 may be satisfied if the "?" separator is used. Only one separa‐
675 tor may be used. Many jobs can share the same dependency and
676 these jobs may even belong to different users. The value may
677 be changed after job submission using the scontrol command. De‐
678 pendencies on remote jobs are allowed in a federation. Once a
679 job dependency fails due to the termination state of a preceding
680 job, the dependent job will never be run, even if the preceding
681 job is requeued and has a different termination state in a sub‐
682 sequent execution. This option applies to job allocations.
683
684 after:job_id[[+time][:jobid[+time]...]]
685 After the specified jobs start or are cancelled and
686 'time' in minutes from job start or cancellation happens,
687 this job can begin execution. If no 'time' is given then
688 there is no delay after start or cancellation.
689
690 afterany:job_id[:jobid...]
691 This job can begin execution after the specified jobs
692 have terminated.
693
694 afterburstbuffer:job_id[:jobid...]
695 This job can begin execution after the specified jobs
696 have terminated and any associated burst buffer stage out
697 operations have completed.
698
699 aftercorr:job_id[:jobid...]
700 A task of this job array can begin execution after the
701 corresponding task ID in the specified job has completed
702 successfully (ran to completion with an exit code of
703 zero).
704
705 afternotok:job_id[:jobid...]
706 This job can begin execution after the specified jobs
707 have terminated in some failed state (non-zero exit code,
708 node failure, timed out, etc).
709
710 afterok:job_id[:jobid...]
711 This job can begin execution after the specified jobs
712 have successfully executed (ran to completion with an
713 exit code of zero).
714
715 singleton
716 This job can begin execution after any previously
717 launched jobs sharing the same job name and user have
718 terminated. In other words, only one job by that name
719 and owned by that user can be running or suspended at any
720 point in time. In a federation, a singleton dependency
721 must be fulfilled on all clusters unless DependencyParam‐
722 eters=disable_remote_singleton is used in slurm.conf.
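
For example, a post-processing step that should start only after
a hypothetical job 123456 completes successfully could be
submitted as:

    srun --dependency=afterok:123456 ./postprocess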
723
724 -X, --disable-status
725 Disable the display of task status when srun receives a single
726 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
727 running job. Without this option a second Ctrl-C in one second
728 is required to forcibly terminate the job and srun will immedi‐
729 ately exit. May also be set via the environment variable
730 SLURM_DISABLE_STATUS. This option applies to job allocations.
731
732 -m, --distribution={*|block|cyclic|arbi‐
733 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
734
735 Specify alternate distribution methods for remote processes.
736 For job allocation, this sets environment variables that will be
737 used by subsequent srun requests. Task distribution affects job
738 allocation at the last stage of the evaluation of available re‐
739 sources by the cons_res and cons_tres plugins. Consequently,
740 other options (e.g. --ntasks-per-node, --cpus-per-task) may af‐
741 fect resource selection prior to task distribution. To ensure a
742 specific task distribution jobs should have access to whole
743 nodes, for instance by using the --exclusive flag.
744
745 This option controls the distribution of tasks to the nodes on
746 which resources have been allocated, and the distribution of
747 those resources to tasks for binding (task affinity). The first
748 distribution method (before the first ":") controls the distri‐
749 bution of tasks to nodes. The second distribution method (after
750 the first ":") controls the distribution of allocated CPUs
751 across sockets for binding to tasks. The third distribution
752 method (after the second ":") controls the distribution of allo‐
753 cated CPUs across cores for binding to tasks. The second and
754 third distributions apply only if task affinity is enabled. The
755 third distribution is supported only if the task/cgroup plugin
756 is configured. The default value for each distribution type is
757 specified by *.
758
759 Note that with select/cons_res and select/cons_tres, the number
760 of CPUs allocated to each socket and node may be different. Re‐
761 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
762 mation on resource allocation, distribution of tasks to nodes,
763 and binding of tasks to CPUs.
764 First distribution method (distribution of tasks across nodes):
765
766
767 * Use the default method for distributing tasks to nodes
768 (block).
769
770 block The block distribution method will distribute tasks to a
771 node such that consecutive tasks share a node. For exam‐
772 ple, consider an allocation of three nodes each with two
773 cpus. A four-task block distribution request will dis‐
774 tribute those tasks to the nodes with tasks one and two
775 on the first node, task three on the second node, and
776 task four on the third node. Block distribution is the
777 default behavior if the number of tasks exceeds the num‐
778 ber of allocated nodes.
779
780 cyclic The cyclic distribution method will distribute tasks to a
781 node such that consecutive tasks are distributed over
782 consecutive nodes (in a round-robin fashion). For exam‐
783 ple, consider an allocation of three nodes each with two
784 cpus. A four-task cyclic distribution request will dis‐
785 tribute those tasks to the nodes with tasks one and four
786 on the first node, task two on the second node, and task
787 three on the third node. Note that when SelectType is
788 select/cons_res, the same number of CPUs may not be allo‐
789 cated on each node. Task distribution will be round-robin
790 among all the nodes with CPUs yet to be assigned to
791 tasks. Cyclic distribution is the default behavior if
792 the number of tasks is no larger than the number of allo‐
793 cated nodes.
794
795 plane The tasks are distributed in blocks of size <size>. The
796 size must be given or SLURM_DIST_PLANESIZE must be set.
797 The number of tasks distributed to each node is the same
798 as for cyclic distribution, but the taskids assigned to
799 each node depend on the plane size. Additional distribu‐
800 tion specifications cannot be combined with this option.
801 For more details (including examples and diagrams),
802 please see https://slurm.schedmd.com/mc_support.html and
803 https://slurm.schedmd.com/dist_plane.html
804
805 arbitrary
The arbitrary method of distribution will allocate processes
in order as listed in the file designated by the environment
variable SLURM_HOSTFILE. If this variable is set it will
override any other method specified. If not set the method
will default to block. The hostfile must contain at minimum
the number of hosts requested and be one per line or comma
separated. If specifying a task count (-n,
--ntasks=<number>), your tasks will be laid out on the nodes
in the order of the file.
815 NOTE: The arbitrary distribution option on a job alloca‐
816 tion only controls the nodes to be allocated to the job
817 and not the allocation of CPUs on those nodes. This op‐
818 tion is meant primarily to control a job step's task lay‐
819 out in an existing job allocation for the srun command.
820 NOTE: If the number of tasks is given and a list of re‐
821 quested nodes is also given, the number of nodes used
822 from that list will be reduced to match that of the num‐
823 ber of tasks if the number of nodes in the list is
824 greater than the number of tasks.
825
826 Second distribution method (distribution of CPUs across sockets
827 for binding):
828
829
830 * Use the default method for distributing CPUs across sock‐
831 ets (cyclic).
832
833 block The block distribution method will distribute allocated
834 CPUs consecutively from the same socket for binding to
835 tasks, before using the next consecutive socket.
836
837 cyclic The cyclic distribution method will distribute allocated
838 CPUs for binding to a given task consecutively from the
839 same socket, and from the next consecutive socket for the
840 next task, in a round-robin fashion across sockets.
841 Tasks requiring more than one CPU will have all of those
842 CPUs allocated on a single socket if possible.
843
844 fcyclic
845 The fcyclic distribution method will distribute allocated
846 CPUs for binding to tasks from consecutive sockets in a
847 round-robin fashion across the sockets. Tasks requiring
848 more than one CPU will have each CPUs allocated in a
849 cyclic fashion across sockets.
850
851 Third distribution method (distribution of CPUs across cores for
852 binding):
853
854
855 * Use the default method for distributing CPUs across cores
856 (inherited from second distribution method).
857
858 block The block distribution method will distribute allocated
859 CPUs consecutively from the same core for binding to
860 tasks, before using the next consecutive core.
861
862 cyclic The cyclic distribution method will distribute allocated
863 CPUs for binding to a given task consecutively from the
864 same core, and from the next consecutive core for the
865 next task, in a round-robin fashion across cores.
866
867 fcyclic
868 The fcyclic distribution method will distribute allocated
869 CPUs for binding to tasks from consecutive cores in a
870 round-robin fashion across the cores.
871
872 Optional control for task distribution over nodes:
873
874
Pack Rather than distributing a job step's tasks evenly
876 across its allocated nodes, pack them as tightly as pos‐
877 sible on the nodes. This only applies when the "block"
878 task distribution method is used.
879
880 NoPack Rather than packing a job step's tasks as tightly as pos‐
881 sible on the nodes, distribute them evenly. This user
882 option will supersede the SelectTypeParameters
883 CR_Pack_Nodes configuration parameter.
884
885 This option applies to job and step allocations.
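
For instance (./app is a placeholder), distributing 8 tasks
cyclically across 2 nodes while spreading each task's CPUs
across sockets can be requested as:

    srun -N2 -n8 -m cyclic:fcyclic ./app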
886
887 --epilog={none|<executable>}
888 srun will run executable just after the job step completes. The
889 command line arguments for executable will be the command and
890 arguments of the job step. If none is specified, then no srun
891 epilog will be run. This parameter overrides the SrunEpilog pa‐
892 rameter in slurm.conf. This parameter is completely independent
893 from the Epilog parameter in slurm.conf. This option applies to
894 job allocations.
895
896 -e, --error=<filename_pattern>
897 Specify how stderr is to be redirected. By default in interac‐
898 tive mode, srun redirects stderr to the same file as stdout, if
899 one is specified. The --error option is provided to allow stdout
900 and stderr to be redirected to different locations. See IO Re‐
901 direction below for more options. If the specified file already
902 exists, it will be overwritten. This option applies to job and
903 step allocations.
904
905 --exact
906 Allow a step access to only the resources requested for the
907 step. By default, all non-GRES resources on each node in the
908 step allocation will be used. This option only applies to step
909 allocations.
910 NOTE: Parallel steps will either be blocked or rejected until
911 requested step resources are available unless --overlap is spec‐
912 ified. Job resources can be held after the completion of an srun
913 command while Slurm does job cleanup. Step epilogs and/or SPANK
914 plugins can further delay the release of step resources.
915
-x, --exclude={<host1>[,<host2>...]|<filename>}
917 Request that a specific list of hosts not be included in the re‐
918 sources allocated to this job. The host list will be assumed to
919 be a filename if it contains a "/" character. This option ap‐
920 plies to job and step allocations.
921
922 --exclusive[={user|mcs}]
923 This option applies to job and job step allocations, and has two
924 slightly different meanings for each one. When used to initiate
925 a job, the job allocation cannot share nodes with other running
926 jobs (or just other users with the "=user" option or "=mcs" op‐
927 tion). If user/mcs are not specified (i.e. the job allocation
928 can not share nodes with other running jobs), the job is allo‐
929 cated all CPUs and GRES on all nodes in the allocation, but is
930 only allocated as much memory as it requested. This is by design
931 to support gang scheduling, because suspended jobs still reside
932 in memory. To request all the memory on a node, use --mem=0.
933 The default shared/exclusive behavior depends on system configu‐
934 ration and the partition's OverSubscribe option takes precedence
935 over the job's option. NOTE: Since shared GRES (MPS) cannot be
936 allocated at the same time as a sharing GRES (GPU) this option
937 only allocates all sharing GRES and no underlying shared GRES.
938
939 This option can also be used when initiating more than one job
940 step within an existing resource allocation (default), where you
941 want separate processors to be dedicated to each job step. If
942 sufficient processors are not available to initiate the job
943 step, it will be deferred. This can be thought of as providing a
944 mechanism for resource management to the job within its alloca‐
945 tion (--exact implied).
946
947 The exclusive allocation of CPUs applies to job steps by de‐
948 fault, but --exact is NOT the default. In other words, the de‐
949 fault behavior is this: job steps will not share CPUs, but job
950 steps will be allocated all CPUs available to the job on all
951 nodes allocated to the steps.
952
953 In order to share the resources use the --overlap option.
954
955 See EXAMPLE below.
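
As a minimal sketch of using --exclusive for resource management
within an existing allocation (the step commands are
placeholders), two single-task steps can be given dedicated CPUs
and run concurrently:

    srun --exclusive -n1 ./step_a &
    srun --exclusive -n1 ./step_b &
    wait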
956
957 --export={[ALL,]<environment_variables>|ALL|NONE}
958 Identify which environment variables from the submission envi‐
959 ronment are propagated to the launched application.
960
961 --export=ALL
962 Default mode if --export is not specified. All of the
963 user's environment will be loaded from the caller's
964 environment.
965
966 --export=NONE
967 None of the user environment will be defined. User
968 must use absolute path to the binary to be executed
969 that will define the environment. User can not specify
970 explicit environment variables with "NONE".
971
972 This option is particularly important for jobs that
973 are submitted on one cluster and execute on a differ‐
974 ent cluster (e.g. with different paths). To avoid
steps inheriting environment export settings (e.g. "NONE")
from the sbatch command, either set --export=ALL or set the
environment variable SLURM_EXPORT_ENV to "ALL".
979
980 --export=[ALL,]<environment_variables>
981 Exports all SLURM* environment variables along with
982 explicitly defined variables. Multiple environment
983 variable names should be comma separated. Environment
984 variable names may be specified to propagate the cur‐
985 rent value (e.g. "--export=EDITOR") or specific values
986 may be exported (e.g. "--export=EDITOR=/bin/emacs").
987 If "ALL" is specified, then all user environment vari‐
988 ables will be loaded and will take precedence over any
989 explicitly given environment variables.
990
991 Example: --export=EDITOR,ARG1=test
992 In this example, the propagated environment will only
993 contain the variable EDITOR from the user's environ‐
994 ment, SLURM_* environment variables, and ARG1=test.
995
996 Example: --export=ALL,EDITOR=/bin/emacs
997 There are two possible outcomes for this example. If
998 the caller has the EDITOR environment variable de‐
999 fined, then the job's environment will inherit the
1000 variable from the caller's environment. If the caller
1001 doesn't have an environment variable defined for EDI‐
1002 TOR, then the job's environment will use the value
1003 given by --export.
1004
1005 -B, --extra-node-info=<sockets>[:cores[:threads]]
1006 Restrict node selection to nodes with at least the specified
1007 number of sockets, cores per socket and/or threads per core.
1008 NOTE: These options do not specify the resource allocation size.
1009 Each value specified is considered a minimum. An asterisk (*)
1010 can be used as a placeholder indicating that all available re‐
1011 sources of that type are to be utilized. Values can also be
1012 specified as min-max. The individual levels can also be speci‐
1013 fied in separate options if desired:
1014
1015 --sockets-per-node=<sockets>
1016 --cores-per-socket=<cores>
1017 --threads-per-core=<threads>
1018 If task/affinity plugin is enabled, then specifying an alloca‐
1019 tion in this manner also sets a default --cpu-bind option of
1020 threads if the -B option specifies a thread count, otherwise an
1021 option of cores if a core count is specified, otherwise an op‐
1022 tion of sockets. If SelectType is configured to se‐
1023 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1024 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1025 ored. If not specified, the scontrol show job will display
1026 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1027 NOTE: This option is mutually exclusive with --hint,
1028 --threads-per-core and --ntasks-per-core.
1029 NOTE: If the number of sockets, cores and threads were all spec‐
1030 ified, the number of nodes was specified (as a fixed number, not
1031 a range) and the number of tasks was NOT specified, srun will
1032 implicitly calculate the number of tasks as one task per thread.
1033
1034 --gid=<group>
1035 If srun is run as root, and the --gid option is used, submit the
1036 job with group's group access permissions. group may be the
1037 group name or the numerical group ID. This option applies to job
1038 allocations.
1039
1040 --gpu-bind=[verbose,]<type>
1041 Bind tasks to specific GPUs. By default every spawned task can
1042 access every GPU allocated to the step. If "verbose," is speci‐
1043 fied before <type>, then print out GPU binding debug information
1044 to the stderr of the tasks. GPU binding is ignored if there is
1045 only one task.
1046
1047 Supported type options:
1048
1049 closest Bind each task to the GPU(s) which are closest. In a
1050 NUMA environment, each task may be bound to more than
1051 one GPU (i.e. all GPUs in that NUMA environment).
1052
1053 map_gpu:<list>
1054 Bind by setting GPU masks on tasks (or ranks) as spec‐
1055 ified where <list> is
1056 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
1057 are interpreted as decimal values. If the number of
1058 tasks (or ranks) exceeds the number of elements in
1059 this list, elements in the list will be reused as
1060 needed starting from the beginning of the list. To
1061 simplify support for large task counts, the lists may
1062 follow a map with an asterisk and repetition count.
1063 For example "map_gpu:0*4,1*4". If the task/cgroup
1064 plugin is used and ConstrainDevices is set in
1065 cgroup.conf, then the GPU IDs are zero-based indexes
1066 relative to the GPUs allocated to the job (e.g. the
1067 first GPU is 0, even if the global ID is 3). Other‐
1068 wise, the GPU IDs are global IDs, and all GPUs on each
1069 node in the job should be allocated for predictable
1070 binding results.
1071
1072 mask_gpu:<list>
1073 Bind by setting GPU masks on tasks (or ranks) as spec‐
1074 ified where <list> is
1075 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1076 mapping is specified for a node and identical mapping
1077 is applied to the tasks on every node (i.e. the lowest
1078 task ID on each node is mapped to the first mask spec‐
1079 ified in the list, etc.). GPU masks are always inter‐
1080 preted as hexadecimal values but can be preceded with
1081 an optional '0x'. To simplify support for large task
1082 counts, the lists may follow a map with an asterisk
1083 and repetition count. For example
1084 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1085 is used and ConstrainDevices is set in cgroup.conf,
1086 then the GPU IDs are zero-based indexes relative to
1087 the GPUs allocated to the job (e.g. the first GPU is
1088 0, even if the global ID is 3). Otherwise, the GPU IDs
1089 are global IDs, and all GPUs on each node in the job
1090 should be allocated for predictable binding results.
1091
1092 none Do not bind tasks to GPUs (turns off binding if
1093 --gpus-per-task is requested).
1094
1095 per_task:<gpus_per_task>
Each task will be bound to the number of GPUs specified in
<gpus_per_task>. GPUs are assigned to tasks in order: the
first task is bound to the first <gpus_per_task> GPUs on the
node, the second task to the next <gpus_per_task> GPUs, and
so on.
1100
1101 single:<tasks_per_gpu>
1102 Like --gpu-bind=closest, except that each task can
1103 only be bound to a single GPU, even when it can be
1104 bound to multiple GPUs that are equally close. The
1105 GPU to bind to is determined by <tasks_per_gpu>, where
1106 the first <tasks_per_gpu> tasks are bound to the first
1107 GPU available, the second <tasks_per_gpu> tasks are
1108 bound to the second GPU available, etc. This is basi‐
1109 cally a block distribution of tasks onto available
1110 GPUs, where the available GPUs are determined by the
1111 socket affinity of the task and the socket affinity of
1112 the GPUs as specified in gres.conf's Cores parameter.
1113
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1115 Request that GPUs allocated to the job are configured with spe‐
1116 cific frequency values. This option can be used to indepen‐
1117 dently configure the GPU and its memory frequencies. After the
1118 job is completed, the frequencies of all affected GPUs will be
1119 reset to the highest possible values. In some cases, system
1120 power caps may override the requested values. The field type
1121 can be "memory". If type is not specified, the GPU frequency is
1122 implied. The value field can either be "low", "medium", "high",
1123 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1124 fied numeric value is not possible, a value as close as possible
1125 will be used. See below for definition of the values. The ver‐
1126 bose option causes current GPU frequency information to be
1127 logged. Examples of use include "--gpu-freq=medium,memory=high"
1128 and "--gpu-freq=450".
1129
1130 Supported value definitions:
1131
1132 low the lowest available frequency.
1133
1134 medium attempts to set a frequency in the middle of the
1135 available range.
1136
1137 high the highest available frequency.
1138
1139 highm1 (high minus one) will select the next highest avail‐
1140 able frequency.
1141
1142 -G, --gpus=[type:]<number>
1143 Specify the total number of GPUs required for the job. An op‐
1144 tional GPU type specification can be supplied. For example
1145 "--gpus=volta:3". Multiple options can be requested in a comma
1146 separated list, for example: "--gpus=volta:3,kepler:1". See
1147 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1148 options.
1149 NOTE: The allocation has to contain at least one GPU per node.
1150
1151 --gpus-per-node=[type:]<number>
1152 Specify the number of GPUs required for the job on each node in‐
1153 cluded in the job's resource allocation. An optional GPU type
1154 specification can be supplied. For example
1155 "--gpus-per-node=volta:3". Multiple options can be requested in
1156 a comma separated list, for example:
1157 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1158 --gpus-per-socket and --gpus-per-task options.
1159
1160 --gpus-per-socket=[type:]<number>
1161 Specify the number of GPUs required for the job on each socket
1162 included in the job's resource allocation. An optional GPU type
1163 specification can be supplied. For example
1164 "--gpus-per-socket=volta:3". Multiple options can be requested
1165 in a comma separated list, for example:
1166 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
1167 sockets per node count ( --sockets-per-node). See also the
1168 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1169 tion applies to job allocations.
1170
1171 --gpus-per-task=[type:]<number>
1172 Specify the number of GPUs required for the job on each task to
1173 be spawned in the job's resource allocation. An optional GPU
1174 type specification can be supplied. For example
1175 "--gpus-per-task=volta:1". Multiple options can be requested in
1176 a comma separated list, for example:
1177 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1178 --gpus-per-socket and --gpus-per-node options. This option re‐
1179 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1180 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1181 -N, --nodes. This option will implicitly set
1182 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1183 with an explicit --gpu-bind specification.
1184
1185 --gres=<list>
1186 Specifies a comma-delimited list of generic consumable re‐
1187 sources. The format of each entry on the list is
1188 "name[[:type]:count]". The name is that of the consumable re‐
1189 source. The count is the number of those resources with a de‐
1190 fault value of 1. The count can have a suffix of "k" or "K"
1191 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1192 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1193 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1194 x 1024 x 1024 x 1024). The specified resources will be allo‐
1195 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
1197 of available generic consumable resources will be printed and
1198 the command will exit if the option argument is "help". Exam‐
1199 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1200 "--gres=help". NOTE: This option applies to job and step allo‐
1201 cations. By default, a job step is allocated all of the generic
1202 resources that have been requested by the job, except those im‐
1203 plicitly requested when a job is exclusive. To change the be‐
1204 havior so that each job step is allocated no generic resources,
1205 explicitly set the value of --gres to specify zero counts for
1206 each generic resource OR set "--gres=none" OR set the
1207 SLURM_STEP_GRES environment variable to "none".
1208
1209 --gres-flags=<type>
1210 Specify generic resource task binding options.
1211
1212 disable-binding
1213 Disable filtering of CPUs with respect to generic re‐
1214 source locality. This option is currently required to
1215 use more CPUs than are bound to a GRES (i.e. if a GPU is
1216 bound to the CPUs on one socket, but resources on more
1217 than one socket are required to run the job). This op‐
1218 tion may permit a job to be allocated resources sooner
1219 than otherwise possible, but may result in lower job per‐
1220 formance. This option applies to job allocations.
1221 NOTE: This option is specific to SelectType=cons_res.
1222
1223 enforce-binding
1224 The only CPUs available to the job/step will be those
1225 bound to the selected GRES (i.e. the CPUs identified in
1226 the gres.conf file will be strictly enforced). This op‐
1227 tion may result in delayed initiation of a job. For ex‐
1228 ample a job requiring two GPUs and one CPU will be de‐
1229 layed until both GPUs on a single socket are available
1230 rather than using GPUs bound to separate sockets, how‐
1231 ever, the application performance may be improved due to
1232 improved communication speed. Requires the node to be
1233 configured with more than one socket and resource filter‐
1234 ing will be performed on a per-socket basis. NOTE: Job
1235 steps that don't use --exact will not be affected.
1236 NOTE: This option is specific to SelectType=cons_tres for
1237 job allocations.
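
For example (the GPU count and ./app are illustrative), to
require that only CPUs local to the selected GPUs be used:

    srun --gres=gpu:2 --gres-flags=enforce-binding ./app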
1238
1239 -h, --help
1240 Display help information and exit.
1241
1242 --het-group=<expr>
1243 Identify each component in a heterogeneous job allocation for
1244 which a step is to be created. Applies only to srun commands is‐
1245 sued inside a salloc allocation or sbatch script. <expr> is a
1246 set of integers corresponding to one or more option offsets on
1247 the salloc or sbatch command line. Examples: "--het-group=2",
1248 "--het-group=0,4", "--het-group=1,3-5". The default value is
1249 --het-group=0.
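
For example, assuming a heterogeneous allocation with two components
was created by salloc or sbatch, a step could be launched on the
second component (offset 1) with:

$ srun --het-group=1 hostname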
1250
1251 --hint=<type>
1252 Bind tasks according to application hints.
1253 NOTE: This option cannot be used in conjunction with any of
1254 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1255 --cpu-bind=verbose) or -B. If --hint is specified as a command
1256 line argument, it will take precedence over the environment.
1257
1258 compute_bound
1259 Select settings for compute bound applications: use all
1260 cores in each socket, one thread per core.
1261
1262 memory_bound
1263 Select settings for memory bound applications: use only
1264 one core in each socket, one thread per core.
1265
1266 [no]multithread
1267 [don't] use extra threads with in-core multi-threading
1268 which can benefit communication intensive applications.
1269 Only supported with the task/affinity plugin.
1270
1271 help show this help message
1272
1273 This option applies to job allocations.
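
For example, to disable the use of extra hardware threads for a
compute bound application (./a.out is a placeholder executable):

$ srun -n8 --hint=nomultithread ./a.out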
1274
1275 -H, --hold
1276 Specify the job is to be submitted in a held state (priority of
1277 zero). A held job can then be released using scontrol to reset
1278 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1279 plies to job allocations.
1280
1281 -I, --immediate[=<seconds>]
1282 Exit if resources are not available within the time period spec‐
1283 ified. If no argument is given (seconds defaults to 1), re‐
1284 sources must be available immediately for the request to suc‐
1285 ceed. If defer is configured in SchedulerParameters and sec‐
1286 onds=1 the allocation request will fail immediately; defer con‐
1287 flicts and takes precedence over this option. By default, --im‐
1288 mediate is off, and the command will block until resources be‐
1289 come available. Since this option's argument is optional, for
1290 proper parsing the single letter option must be followed immedi‐
1291 ately with the value and not include a space between them. For
1292 example "-I60" and not "-I 60". This option applies to job and
1293 step allocations.
1294
1295 -i, --input=<mode>
1296 Specify how stdin is to be redirected. By default, srun redi‐
1297 rects stdin from the terminal to all tasks. See IO Redirection
1298 below for more options. For OS X, the poll() function does not
1299 support stdin, so input from a terminal is not possible. This
1300 option applies to job and step allocations.
1301
1302 -J, --job-name=<jobname>
1303 Specify a name for the job. The specified name will appear along
1304 with the job id number when querying running jobs on the system.
1305 The default is the supplied executable program's name. NOTE:
1306 This information may be written to the slurm_jobacct.log file.
1307 This file is space delimited so if a space is used in the job‐
1308 name it will cause problems in properly displaying the con‐
1309 tents of the slurm_jobacct.log file when the sacct command is
1310 used. This option applies to job and step allocations.
1311
1312 --jobid=<jobid>
1313 Initiate a job step under an already allocated job with the
1314 specified job id. Using this option will cause srun to behave exactly as if
1315 the SLURM_JOB_ID environment variable was set. This option ap‐
1316 plies to step allocations.
1317
1318 -K, --kill-on-bad-exit[=0|1]
1319 Controls whether or not to terminate a step if any task exits
1320 with a non-zero exit code. If this option is not specified, the
1321 default action will be based upon the Slurm configuration param‐
1322 eter of KillOnBadExit. If this option is specified, it will take
1323 precedence over KillOnBadExit. An option argument of zero will
1324 not terminate the job. A non-zero argument or no argument will
1325 terminate the job. Note: This option takes precedence over the
1326 -W, --wait option to terminate the job immediately if a task ex‐
1327 its with a non-zero exit code. Since this option's argument is
1328 optional, for proper parsing the single letter option must be
1329 followed immediately with the value and not include a space be‐
1330 tween them. For example "-K1" and not "-K 1".
1331
1332 -l, --label
1333 Prepend task number to lines of stdout/err. The --label option
1334 will prepend lines of output with the remote task id. This op‐
1335 tion applies to step allocations.
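
For example, running hostname as two tasks with --label prefixes each
output line with the task id (hostnames will vary):

$ srun -n2 -l hostname
0: node01
1: node02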
1336
1337 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1338 Specification of licenses (or other resources available on all
1339 nodes of the cluster) which must be allocated to this job. Li‐
1340 cense names can be followed by a colon and count (the default
1341 count is one). Multiple license names should be comma separated
1342 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1343 cations.
1344
1345 NOTE: When submitting heterogeneous jobs, license requests only
1346 work correctly when made on the first component job. For exam‐
1347 ple "srun -L ansys:2 : myexecutable".
1348
1349 --mail-type=<type>
1350 Notify user by email when certain event types occur. Valid type
1351 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1352 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1353 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1354 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1355 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1356 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1357 time limit). Multiple type values may be specified in a comma
1358 separated list. The user to be notified is indicated with
1359 --mail-user. This option applies to job allocations.
1360
1361 --mail-user=<user>
1362 User to receive email notification of state changes as defined
1363 by --mail-type. The default value is the submitting user. This
1364 option applies to job allocations.
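
For example, to be notified by email when the job ends or fails
(the address and ./a.out are placeholders):

$ srun --mail-type=END,FAIL --mail-user=user@example.com ./a.out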
1365
1366 --mcs-label=<mcs>
1367 Used only when the mcs/group plugin is enabled. This parameter
1368 is a group among the groups of the user. Default value is cal‐
1369 culated by the mcs plugin if it is enabled. This option applies
1370 to job allocations.
1371
1372 --mem=<size>[units]
1373 Specify the real memory required per node. Default units are
1374 megabytes. Different units can be specified using the suffix
1375 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1376 is MaxMemPerNode. If configured, both parameters can be seen
1377 using the scontrol show config command. This parameter would
1378 generally be used if whole nodes are allocated to jobs (Select‐
1379 Type=select/linear). Specifying a memory limit of zero for a
1380 job step will restrict the job step to the amount of memory al‐
1381 located to the job, but not remove any of the job's memory allo‐
1382 cation from being available to other job steps. Also see
1383 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1384 --mem-per-gpu options are mutually exclusive. If --mem,
1385 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1386 guments, then they will take precedence over the environment
1387 (potentially inherited from salloc or sbatch).
1388
1389 NOTE: A memory size specification of zero is treated as a spe‐
1390 cial case and grants the job access to all of the memory on each
1391 node for newly submitted jobs and all available job memory to
1392 new job steps.
1393
1394 NOTE: Enforcement of memory limits currently relies upon the
1395 task/cgroup plugin or enabling of accounting, which samples mem‐
1396 ory use on a periodic basis (data need not be stored, just col‐
1397 lected). In both cases memory use is based upon the job's Resi‐
1398 dent Set Size (RSS). A task may exceed the memory limit until
1399 the next periodic accounting sample.
1400
1401 This option applies to job and step allocations.
1402
1403 --mem-bind=[{quiet|verbose},]<type>
1404 Bind tasks to memory. Used only when the task/affinity plugin is
1405 enabled and the NUMA memory functions are available. Note that
1406 the resolution of CPU and memory binding may differ on some ar‐
1407 chitectures. For example, CPU binding may be performed at the
1408 level of the cores within a processor while memory binding will
1409 be performed at the level of nodes, where the definition of
1410 "nodes" may differ from system to system. By default no memory
1411 binding is performed; any task using any CPU can use any memory.
1412 This option is typically used to ensure that each task is bound
1413 to the memory closest to its assigned CPU. The use of any type
1414 other than "none" or "local" is not recommended. If you want
1415 greater control, try running a simple test code with the options
1416 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1417 the specific configuration.
1418
1419 NOTE: To have Slurm always report on the selected memory binding
1420 for all commands executed in a shell, you can enable verbose
1421 mode by setting the SLURM_MEM_BIND environment variable value to
1422 "verbose".
1423
1424 The following informational environment variables are set when
1425 --mem-bind is in use:
1426
1427 SLURM_MEM_BIND_LIST
1428 SLURM_MEM_BIND_PREFER
1429 SLURM_MEM_BIND_SORT
1430 SLURM_MEM_BIND_TYPE
1431 SLURM_MEM_BIND_VERBOSE
1432
1433 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1434 scription of the individual SLURM_MEM_BIND* variables.
1435
1436 Supported options include:
1437
1438 help show this help message
1439
1440 local Use memory local to the processor in use
1441
1442 map_mem:<list>
1443 Bind by setting memory masks on tasks (or ranks) as spec‐
1444 ified where <list> is
1445 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1446 ping is specified for a node and identical mapping is ap‐
1447 plied to the tasks on every node (i.e. the lowest task ID
1448 on each node is mapped to the first ID specified in the
1449 list, etc.). NUMA IDs are interpreted as decimal values
1450 unless they are preceded with '0x' in which case they are in‐
1451 terpreted as hexadecimal values. If the number of tasks
1452 (or ranks) exceeds the number of elements in this list,
1453 elements in the list will be reused as needed starting
1454 from the beginning of the list. To simplify support for
1455 large task counts, the lists may follow a map with an as‐
1456 terisk and repetition count. For example
1457 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1458 sults, all CPUs for each node in the job should be allo‐
1459 cated to the job.
1460
1461 mask_mem:<list>
1462 Bind by setting memory masks on tasks (or ranks) as spec‐
1463 ified where <list> is
1464 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1465 mapping is specified for a node and identical mapping is
1466 applied to the tasks on every node (i.e. the lowest task
1467 ID on each node is mapped to the first mask specified in
1468 the list, etc.). NUMA masks are always interpreted as
1469 hexadecimal values. Note that masks must be preceded
1470 with a '0x' if they don't begin with [0-9] so they are
1471 seen as numerical values. If the number of tasks (or
1472 ranks) exceeds the number of elements in this list, ele‐
1473 ments in the list will be reused as needed starting from
1474 the beginning of the list. To simplify support for large
1475 task counts, the lists may follow a mask with an asterisk
1476 and repetition count. For example "mask_mem:0*4,1*4".
1477 For predictable binding results, all CPUs for each node
1478 in the job should be allocated to the job.
1479
1480 no[ne] don't bind tasks to memory (default)
1481
1482 nosort avoid sorting free cache pages (default, LaunchParameters
1483 configuration parameter can override this default)
1484
1485 p[refer]
1486 Prefer use of first specified NUMA node, but permit
1487 use of other available NUMA nodes.
1488
1489 q[uiet]
1490 quietly bind before task runs (default)
1491
1492 rank bind by task rank (not recommended)
1493
1494 sort sort free cache pages (run zonesort on Intel KNL nodes)
1495
1496 v[erbose]
1497 verbosely report binding before task runs
1498
1499 This option applies to job and step allocations.
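
For example, to bind each task to the memory local to its assigned
CPUs and report the resulting binding (./a.out is a placeholder):

$ srun -n4 --mem-bind=verbose,local ./a.out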
1500
1501 --mem-per-cpu=<size>[units]
1502 Minimum memory required per usable allocated CPU. Default units
1503 are megabytes. Different units can be specified using the suf‐
1504 fix [K|M|G|T]. The default value is DefMemPerCPU and the maxi‐
1505 mum value is MaxMemPerCPU (see exception below). If configured,
1506 both parameters can be seen using the scontrol show config com‐
1507 mand. Note that if the job's --mem-per-cpu value exceeds the
1508 configured MaxMemPerCPU, then the user's limit will be treated
1509 as a memory limit per task; --mem-per-cpu will be reduced to a
1510 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1511 and the value of --cpus-per-task multiplied by the new
1512 --mem-per-cpu value will equal the original --mem-per-cpu value
1513 specified by the user. This parameter would generally be used
1514 if individual processors are allocated to jobs (SelectType=se‐
1515 lect/cons_res). If resources are allocated by core, socket, or
1516 whole nodes, then the number of CPUs allocated to a job may be
1517 higher than the task count and the value of --mem-per-cpu should
1518 be adjusted accordingly. Specifying a memory limit of zero for
1519 a job step will restrict the job step to the amount of memory
1520 allocated to the job, but not remove any of the job's memory al‐
1521 location from being available to other job steps. Also see
1522 --mem and --mem-per-gpu. The --mem, --mem-per-cpu and
1523 --mem-per-gpu options are mutually exclusive.
1524
1525 NOTE: If the final amount of memory requested by a job can't be
1526 satisfied by any of the nodes configured in the partition, the
1527 job will be rejected. This could happen if --mem-per-cpu is
1528 used with the --exclusive option for a job allocation and
1529 --mem-per-cpu times the number of CPUs on a node is greater than
1530 the total memory of that node.
1531
1532 NOTE: This applies to usable allocated CPUs in a job allocation.
1533 This is important when more than one thread per core is config‐
1534 ured. If a job requests --threads-per-core with fewer threads
1535 than exist on the core (or --hint=nomultithread, which
1536 implies --threads-per-core=1), the job will be unable to use
1537 those extra threads on the core and those threads will not be
1538 included in the memory per CPU calculation. But if the job has
1539 access to all threads on the core, those threads will be in‐
1540 cluded in the memory per CPU calculation even if the job did not
1541 explicitly request those threads.
1542
1543 In the following examples, each core has two threads.
1544
1545 In this first example, two tasks can run on separate hyper‐
1546 threads in the same core because --threads-per-core is not used.
1547 The third task uses both threads of the second core. The allo‐
1548 cated memory per cpu includes all threads:
1549
1550 $ salloc -n3 --mem-per-cpu=100
1551 salloc: Granted job allocation 17199
1552 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1553 JobID ReqTRES AllocTRES
1554 ------- ----------------------------------- -----------------------------------
1555 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1556
1557 In this second example, because of --threads-per-core=1, each
1558 task is allocated an entire core but is only able to use one
1559 thread per core. The allocated CPUs include all threads on each
1560 core. However, allocated memory per cpu includes only the usable
1561 thread in each core.
1562
1563 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1564 salloc: Granted job allocation 17200
1565 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1566 JobID ReqTRES AllocTRES
1567 ------- ----------------------------------- -----------------------------------
1568 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1569
1570 --mem-per-gpu=<size>[units]
1571 Minimum memory required per allocated GPU. Default units are
1572 megabytes. Different units can be specified using the suffix
1573 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1574 both a global and per partition basis. If configured, the pa‐
1575 rameters can be seen using the scontrol show config and scontrol
1576 show partition commands. Also see --mem. The --mem,
1577 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1578
1579 --mincpus=<n>
1580 Specify a minimum number of logical cpus/processors per node.
1581 This option applies to job allocations.
1582
1583 --mpi=<mpi_type>
1584 Identify the type of MPI to be used. May result in unique initi‐
1585 ation procedures.
1586
1587 cray_shasta
1588 To enable Cray PMI support. This is for applications
1589 built with the Cray Programming Environment. The PMI Con‐
1590 trol Port can be specified with the --resv-ports option
1591 or with the MpiParams=ports=<port range> parameter in
1592 your slurm.conf. This plugin does not have support for
1593 heterogeneous jobs. Support for cray_shasta is included
1594 by default.
1595
1596 list Lists available mpi types to choose from.
1597
1598 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1599 only if the MPI implementation supports it, in other
1600 words if the MPI has the PMI2 interface implemented. The
1601 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1602 which provides the server side functionality but the
1603 client side must implement PMI2_Init() and the other in‐
1604 terface calls.
1605
1606 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1607 support in Slurm can be used to launch parallel applica‐
1608 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1609 must be configured with pmix support by passing
1610 "--with-pmix=<PMIx installation path>" option to its
1611 "./configure" script.
1612
1613 At the time of writing PMIx is supported in Open MPI
1614 starting from version 2.0. PMIx also supports backward
1615 compatibility with PMI1 and PMI2 and can be used if MPI
1616 was configured with PMI2/PMI1 support pointing to the
1617 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1618 doesn't provide the way to point to a specific implemen‐
1619 tation, a hack'ish solution leveraging LD_PRELOAD can be
1620 used to force "libpmix" usage.
1621
1622 none No special MPI processing. This is the default and works
1623 with many other versions of MPI.
1624
1625 This option applies to step allocations.
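
For example, to launch an application built against a PMIx enabled
MPI library (./mpi_app is a placeholder executable):

$ srun -n16 --mpi=pmix ./mpi_app

The MPI plugin types available on a given installation can be listed
with "srun --mpi=list".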
1626
1627 --msg-timeout=<seconds>
1628 Modify the job launch message timeout. The default value is
1629 MessageTimeout in the Slurm configuration file slurm.conf.
1630 Changes to this are typically not recommended, but could be use‐
1631 ful to diagnose problems. This option applies to job alloca‐
1632 tions.
1633
1634 --multi-prog
1635 Run a job with different programs and different arguments for
1636 each task. In this case, the executable program specified is ac‐
1637 tually a configuration file specifying the executable and argu‐
1638 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1639 for details on the configuration file contents. This option ap‐
1640 plies to step allocations.
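
A minimal sketch (see MULTIPLE PROGRAM CONFIGURATION below for the
full file syntax): given a configuration file multi.conf containing

0      hostname
1-3    echo task:%t

the step could be launched with

$ srun -n4 --multi-prog multi.conf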
1641
1642 --network=<type>
1643 Specify information pertaining to the switch or network. The
1644 interpretation of type is system dependent. This option is sup‐
1645 ported when running Slurm natively on a Cray. It is used to re‐
1646 quest the use of Network Performance Counters. Only one value per
1647 request is valid. All options are case-insensitive. In this
1648 configuration the supported values include:
1649
1650
1651 system
1652 Use the system-wide network performance counters. Only
1653 nodes requested will be marked in use for the job alloca‐
1654 tion. If the job does not fill up the entire system, the
1655 rest of the nodes are not able to be used by other jobs
1656 using NPC; if idle, their state will appear as PerfCnts.
1657 These nodes are still available for other jobs not using
1658 NPC.
1659
1660 blade Use the blade network performance counters. Only nodes re‐
1661 quested will be marked in use for the job allocation. If
1662 the job does not fill up the entire blade(s) allocated to
1663 the job, those blade(s) are not able to be used by other
1664 jobs using NPC; if idle, their state will appear as PerfC‐
1665 nts. These nodes are still available for other jobs not
1666 using NPC.
1667
1668 In all cases the job allocation request must specify the --ex‐
1669 clusive option and the step cannot specify the --overlap option.
1670 Otherwise the request will be denied.
1671
1672 Also with any of these options steps are not allowed to share
1673 blades, so resources would remain idle inside an allocation if
1674 the step running on a blade does not take up all the nodes on
1675 the blade.
1676
1677 The network option is also available on systems with HPE Sling‐
1678 shot networks. It can be used to override the default network
1679 resources allocated for the job step. Multiple values may be
1680 specified in a comma-separated list.
1681
1682 def_<rsrc>=<val>
1683 Per-CPU reserved allocation for this resource.
1684
1685 res_<rsrc>=<val>
1686 Per-node reserved allocation for this resource. If
1687 set, overrides the per-CPU allocation.
1688
1689 max_<rsrc>=<val>
1690 Maximum per-node limit for this resource.
1691
1692 depth=<depth>
1693 Multiplier for per-CPU resource allocation. Default
1694 is the number of reserved CPUs on the node.
1695
1696 The resources that may be requested are:
1697
1698 txqs Transmit command queues. The default is 3 per-CPU,
1699 maximum 1024 per-node.
1700
1701 tgqs Target command queues. The default is 2 per-CPU, max‐
1702 imum 512 per-node.
1703
1704 eqs Event queues. The default is 8 per-CPU, maximum 2048
1705 per-node.
1706
1707 cts Counters. The default is 2 per-CPU, maximum 2048 per-
1708 node.
1709
1710 tles Trigger list entries. The default is 1 per-CPU, maxi‐
1711 mum 2048 per-node.
1712
1713 ptes Portals table entries. The default is 8 per-CPU,
1714 maximum 2048 per-node.
1715
1716 les List entries. The default is 134 per-CPU, maximum
1717 65535 per-node.
1718
1719 acs Addressing contexts. The default is 4 per-CPU, maxi‐
1720 mum 1024 per-node.
1721
1722 This option applies to job and step allocations.
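
For example, on an HPE Slingshot system a step might reserve four
transmit command queues per CPU and cap event queues at 16 per node
(the values and ./a.out are illustrative placeholders):

$ srun --network=def_txqs=4,max_eqs=16 ./a.out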
1723
1724 --nice[=adjustment]
1725 Run the job with an adjusted scheduling priority within Slurm.
1726 With no adjustment value the scheduling priority is decreased by
1727 100. A negative nice value increases the priority, otherwise de‐
1728 creases it. The adjustment range is +/- 2147483645. Only privi‐
1729 leged users can specify a negative adjustment.
1730
1731 -Z, --no-allocate
1732 Run the specified tasks on a set of nodes without creating a
1733 Slurm "job" in the Slurm queue structure, bypassing the normal
1734 resource allocation step. The list of nodes must be specified
1735 with the -w, --nodelist option. This is a privileged option
1736 only available for the users "SlurmUser" and "root". This option
1737 applies to job allocations.
1738
1739 -k, --no-kill[=off]
1740 Do not automatically terminate a job if one of the nodes it has
1741 been allocated fails. This option applies to job and step allo‐
1742 cations. The job will assume all responsibilities for
1743 fault-tolerance. Tasks launched using this option will not be
1744 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1745 --wait options will have no effect upon the job step). The ac‐
1746 tive job step (MPI job) will likely suffer a fatal error, but
1747 subsequent job steps may be run if this option is specified.
1748
1749 Specify an optional argument of "off" to disable the effect of the
1750 SLURM_NO_KILL environment variable.
1751
1752 The default action is to terminate the job upon node failure.
1753
1754 -F, --nodefile=<node_file>
1755 Much like --nodelist, but the list is contained in the file
1756 node_file. The node names of the list may also span multi‐
1757 ple lines in the file. Duplicate node names in the file will
1758 be ignored. The order of the node names in the list is not im‐
1759 portant; the node names will be sorted by Slurm.
1760
1761 -w, --nodelist={<node_name_list>|<filename>}
1762 Request a specific list of hosts. The job will contain all of
1763 these hosts and possibly additional hosts as needed to satisfy
1764 resource requirements. The list may be specified as a
1765 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1766 for example), or a filename. The host list will be assumed to
1767 be a filename if it contains a "/" character. If you specify a
1768 minimum node or processor count larger than can be satisfied by
1769 the supplied host list, additional resources will be allocated
1770 on other nodes as needed. Rather than repeating a host name
1771 multiple times, an asterisk and a repetition count may be ap‐
1772 pended to a host name. For example "host1,host1" and "host1*2"
1773 are equivalent. If the number of tasks is given and a list of
1774 requested nodes is also given, the number of nodes used from
1775 that list will be reduced to match that of the number of tasks
1776 if the number of nodes in the list is greater than the number of
1777 tasks. This option applies to job and step allocations.
1778
1779 -N, --nodes=<minnodes>[-maxnodes]
1780 Request that a minimum of minnodes nodes be allocated to this
1781 job. A maximum node count may also be specified with maxnodes.
1782 If only one number is specified, this is used as both the mini‐
1783 mum and maximum node count. The partition's node limits super‐
1784 sede those of the job. If a job's node limits are outside of
1785 the range permitted for its associated partition, the job will
1786 be left in a PENDING state. This permits possible execution at
1787 a later time, when the partition limit is changed. If a job
1788 node limit exceeds the number of nodes configured in the parti‐
1789 tion, the job will be rejected. Note that the environment vari‐
1790 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1791 ibility) will be set to the count of nodes actually allocated to
1792 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1793 tion. If -N is not specified, the default behavior is to allo‐
1794 cate enough nodes to satisfy the requested resources as ex‐
1795 pressed by per-job specification options, e.g. -n, -c and
1796 --gpus. The job will be allocated as many nodes as possible
1797 within the range specified and without delaying the initiation
1798 of the job. If the number of tasks is given and a number of re‐
1799 quested nodes is also given, the number of nodes used from that
1800 request will be reduced to match that of the number of tasks if
1801 the number of nodes in the request is greater than the number of
1802 tasks. The node count specification may include a numeric value
1803 followed by a suffix of "k" (multiplies numeric value by 1,024)
1804 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1805 plies to job and step allocations.
1806
1807 -n, --ntasks=<number>
1808 Specify the number of tasks to run. Request that srun allocate
1809 resources for ntasks tasks. The default is one task per node,
1810 but note that the --cpus-per-task option will change this de‐
1811 fault. This option applies to job and step allocations.
1812
1813 --ntasks-per-core=<ntasks>
1814 Request the maximum ntasks be invoked on each core. This option
1815 applies to the job allocation, but not to step allocations.
1816 Meant to be used with the --ntasks option. Related to
1817 --ntasks-per-node except at the core level instead of the node
1818 level. Masks will automatically be generated to bind the tasks
1819 to specific cores unless --cpu-bind=none is specified. NOTE:
1820 This option is not supported when using SelectType=select/lin‐
1821 ear.
1822
1823 --ntasks-per-gpu=<ntasks>
1824 Request that there are ntasks tasks invoked for every GPU. This
1825 option can work in two ways: 1) either specify --ntasks in addi‐
1826 tion, in which case a type-less GPU specification will be auto‐
1827 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1828 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1829 --ntasks, and the total task count will be automatically deter‐
1830 mined. The number of CPUs needed will be automatically in‐
1831 creased if necessary to allow for any calculated task count.
1832 This option will implicitly set --gpu-bind=single:<ntasks>, but
1833 that can be overridden with an explicit --gpu-bind specifica‐
1834 tion. This option is not compatible with a node range (i.e.
1835 -N<minnodes-maxnodes>). This option is not compatible with
1836 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1837 option is not supported unless SelectType=cons_tres is config‐
1838 ured (either directly or indirectly on Cray systems).
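
For example, requesting four GPUs with two tasks per GPU results in
eight tasks in total (./a.out is a placeholder executable):

$ srun --gpus=4 --ntasks-per-gpu=2 ./a.out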
1839
1840 --ntasks-per-node=<ntasks>
1841 Request that ntasks be invoked on each node. If used with the
1842 --ntasks option, the --ntasks option will take precedence and
1843 the --ntasks-per-node will be treated as a maximum count of
1844 tasks per node. Meant to be used with the --nodes option. This
1845 is related to --cpus-per-task=ncpus, but does not require knowl‐
1846 edge of the actual number of cpus on each node. In some cases,
1847 it is more convenient to be able to request that no more than a
1848 specific number of tasks be invoked on each node. Examples of
1849 this include submitting a hybrid MPI/OpenMP app where only one
1850 MPI "task/rank" should be assigned to each node while allowing
1851 the OpenMP portion to utilize all of the parallelism present in
1852 the node, or submitting a single setup/cleanup/monitoring job to
1853 each node of a pre-existing allocation as one step in a larger
1854 job script. This option applies to job allocations.
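
For example, a hybrid MPI/OpenMP job might place one task on each of
four nodes and give each task several CPUs for its OpenMP threads
(the CPU count and ./hybrid_app are placeholders):

$ srun -N4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app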
1855
1856 --ntasks-per-socket=<ntasks>
1857 Request the maximum ntasks be invoked on each socket. This op‐
1858 tion applies to the job allocation, but not to step allocations.
1859 Meant to be used with the --ntasks option. Related to
1860 --ntasks-per-node except at the socket level instead of the node
1861 level. Masks will automatically be generated to bind the tasks
1862 to specific sockets unless --cpu-bind=none is specified. NOTE:
1863 This option is not supported when using SelectType=select/lin‐
1864 ear.
1865
1866 --open-mode={append|truncate}
1867 Open the output and error files using append or truncate mode as
1868 specified. For heterogeneous job steps the default value is
1869 "append". Otherwise the default value is specified by the sys‐
1870 tem configuration parameter JobFileAppend. This option applies
1871 to job and step allocations.
1872
1873 -o, --output=<filename_pattern>
1874 Specify the "filename pattern" for stdout redirection. By de‐
1875 fault in interactive mode, srun collects stdout from all tasks
1876 and sends this output via TCP/IP to the attached terminal. With
1877 --output stdout may be redirected to a file, to one file per
1878 task, or to /dev/null. See section IO Redirection below for the
1879 various forms of filename pattern. If the specified file al‐
1880 ready exists, it will be overwritten.
1881
1882 If --error is not also specified on the command line, both std‐
1883 out and stderr will be directed to the file specified by --output.
1884 This option applies to job and step allocations.
1885
1886 -O, --overcommit
1887 Overcommit resources. This option applies to job and step allo‐
1888 cations.
1889
1890 When applied to a job allocation (not including jobs requesting
1891 exclusive access to the nodes) the resources are allocated as if
1892 only one task per node is requested. This means that the re‐
1893 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1894 cated per node rather than being multiplied by the number of
1895 tasks. Options used to specify the number of tasks per node,
1896 socket, core, etc. are ignored.
1897
1898 When applied to job step allocations (the srun command when exe‐
1899 cuted within an existing job allocation), this option can be
1900 used to launch more than one task per CPU. Normally, srun will
1901 not allocate more than one process per CPU. By specifying
1902 --overcommit you are explicitly allowing more than one process
1903 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1904 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1905 in the file slurm.h and is not a variable, it is set at Slurm
1906 build time.
1907
1908 --overlap
1909 Specifying --overlap allows steps to share all resources (CPUs,
1910 memory, and GRES) with all other steps. A step using this option
1911 will overlap all other steps, even those that did not specify
1912 --overlap.
1913
1914 By default steps do not share resources with other parallel
1915 steps. This option applies to step allocations.
1916
1917 -s, --oversubscribe
1918 The job allocation can over-subscribe resources with other run‐
1919 ning jobs. The resources to be over-subscribed can be nodes,
1920 sockets, cores, and/or hyperthreads depending upon configura‐
1921 tion. The default over-subscribe behavior depends on system
1922 configuration and the partition's OverSubscribe option takes
1923 precedence over the job's option. This option may result in the
1924 allocation being granted sooner than if the --oversubscribe op‐
1925 tion was not set and allow higher system utilization, but appli‐
1926 cation performance will likely suffer due to competition for re‐
1927 sources. This option applies to job allocations.
1928
1929 -p, --partition=<partition_names>
1930 Request a specific partition for the resource allocation. If
1931 not specified, the default behavior is to allow the slurm con‐
1932 troller to select the default partition as designated by the
1933 system administrator. If the job can use more than one parti‐
1934 tion, specify their names in a comma separated list and the one
1935 offering earliest initiation will be used with no regard given
1936 to the partition name ordering (although higher priority parti‐
1937 tions will be considered first). When the job is initiated, the
1938 name of the partition used will be placed first in the job
1939 record partition string. This option applies to job allocations.
1940
1941 --power=<flags>
1942 Comma separated list of power management plugin options. Cur‐
1943 rently available flags include: level (all nodes allocated to
1944 the job should have identical power caps, may be disabled by the
1945 Slurm configuration option PowerParameters=job_no_level). This
1946 option applies to job allocations.
1947
1948 --prefer=<list>
1949 Nodes can have features assigned to them by the Slurm adminis‐
1950 trator. Users can specify which of these features are desired
1951 but not required by their job using the prefer option. This op‐
1952 tion operates independently from --constraint and will override
1953 whatever is set there if possible. When scheduling, the features
1954 in --prefer are tried first; if a node set isn't available with
1955 those features, then --constraint is attempted. See --constraint
1956 for more information; this option behaves the same way.
1957
1958
1959 -E, --preserve-env
1960 Pass the current values of environment variables
1961 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
1962 rather than computing them from command line parameters. This
1963 option applies to job allocations.
1964
1965 --priority=<value>
1966 Request a specific job priority. May be subject to configura‐
1967 tion specific constraints. value should either be a numeric
1968 value or "TOP" (for highest possible value). Only Slurm opera‐
1969 tors and administrators can set the priority of a job. This op‐
1970 tion applies to job allocations only.
1971
1972 --profile={all|none|<type>[,<type>...]}
1973 Enables detailed data collection by the acct_gather_profile
1974 plugin. Detailed data are typically time-series that are stored
1975 in an HDF5 file for the job or an InfluxDB database depending on
1976 the configured plugin. This option applies to job and step al‐
1977 locations.
1978
1979 All All data types are collected. (Cannot be combined with
1980 other values.)
1981
1982 None No data types are collected. This is the default.
1983 (Cannot be combined with other values.)
1984
1985 Valid type values are:
1986
1987 Energy Energy data is collected.
1988
1989 Task Task (I/O, Memory, ...) data is collected.
1990
1991 Filesystem
1992 Filesystem data is collected.
1993
1994 Network
1995 Network (InfiniBand) data is collected.
1996
1997 --prolog=<executable>
1998 srun will run executable just before launching the job step.
1999 The command line arguments for executable will be the command
2000 and arguments of the job step. If executable is "none", then no
2001 srun prolog will be run. This parameter overrides the SrunProlog
2002 parameter in slurm.conf. This parameter is completely indepen‐
2003 dent from the Prolog parameter in slurm.conf. This option ap‐
2004 plies to job allocations.
2005
2006 --propagate[=rlimit[,rlimit...]]
2007 Allows users to specify which of the modifiable (soft) resource
2008 limits to propagate to the compute nodes and apply to their
2009 jobs. If no rlimit is specified, then all resource limits will
2010 be propagated. The following rlimit names are supported by
2011 Slurm (although some options may not be supported on some sys‐
2012 tems):
2013
2014 ALL All limits listed below (default)
2015
2016 NONE No limits listed below
2017
2018 AS The maximum address space (virtual memory) for a
2019 process.
2020
2021 CORE The maximum size of core file
2022
2023 CPU The maximum amount of CPU time
2024
2025 DATA The maximum size of a process's data segment
2026
2027 FSIZE The maximum size of files created. Note that if the
2028 user sets FSIZE to less than the current size of the
2029 slurmd.log, job launches will fail with a 'File size
2030 limit exceeded' error.
2031
2032 MEMLOCK The maximum size that may be locked into memory
2033
2034 NOFILE The maximum number of open files
2035
2036 NPROC The maximum number of processes available
2037
2038 RSS The maximum resident set size. Note that this only has
2039 effect with Linux kernels 2.4.30 or older or BSD.
2040
2041 STACK The maximum stack size
2042
2043 This option applies to job allocations.
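
For example, to propagate only the core file size and open file
limits of the submitting shell to the job (./a.out is a placeholder):

$ srun --propagate=CORE,NOFILE ./a.out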
2044
2045 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2046 --unbuffered. Implicitly sets --error and --output to /dev/null
2047 for all tasks except task zero, which may cause those tasks to
2048 exit immediately (e.g. shells will typically exit immediately in
2049 that situation). This option applies to step allocations.
2050
2051 -q, --qos=<qos>
2052 Request a quality of service for the job. QOS values can be de‐
2053 fined for each user/cluster/account association in the Slurm
2054 database. Users will be limited to their association's defined
2055 set of qos's when the Slurm configuration parameter, Account‐
2056 ingStorageEnforce, includes "qos" in its definition. This option
2057 applies to job allocations.
2058
2059 -Q, --quiet
2060 Suppress informational messages from srun. Errors will still be
2061 displayed. This option applies to job and step allocations.
2062
2063 --quit-on-interrupt
2064 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2065 disables the status feature normally available when srun re‐
2066 ceives a single Ctrl-C and causes srun to instead immediately
2067 terminate the running job. This option applies to step alloca‐
2068 tions.
2069
2070 --reboot
2071 Force the allocated nodes to reboot before starting the job.
2072 This is only supported with some system configurations and will
2073 otherwise be silently ignored. Only root, SlurmUser or admins
2074 can reboot nodes. This option applies to job allocations.
2075
2076 -r, --relative=<n>
2077 Run a job step relative to node n of the current allocation.
2078 This option may be used to spread several job steps out among
2079 the nodes of the current job. If -r is used, the current job
2080 step will begin at node n of the allocated nodelist, where the
2081 first node is considered node 0. The -r option is not permitted
2082 with the -w or -x options and will result in a fatal error when not
2083 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2084 set). The default for n is 0. If the value of --nodes exceeds
2085 the number of nodes identified with the --relative option, a
2086 warning message will be printed and the --relative option will
2087 take precedence. This option applies to step allocations.
2088
2089 --reservation=<reservation_names>
2090 Allocate resources for the job from the named reservation. If
2091 the job can use more than one reservation, specify their names
2092 in a comma separated list and the one offering earliest initia‐
2093 tion will be used. Each reservation will be considered in the order it was
2094 requested. All reservations will be listed in scontrol/squeue
2095 through the life of the job. In accounting the first reserva‐
2096 tion will be seen and after the job starts the reservation used
2097 will replace it.
2098
2099 --resv-ports[=count]
2100 Reserve communication ports for this job. Users can specify the
2101 number of ports they want to reserve. The parameter Mpi‐
2102 Params=ports=12000-12999 must be specified in slurm.conf. If the
2103 number of reserved ports is zero then no ports are reserved.
2104 Used only for Cray's native PMI. This option applies to job and
2105 step allocations.
2106
2107 --send-libs[=yes|no]
2108 If set to yes (or no argument), autodetect and broadcast the ex‐
2109 ecutable's shared object dependencies to allocated compute
2110 nodes. The files are placed in a directory alongside the exe‐
2111 cutable. The LD_LIBRARY_PATH is automatically updated to include
2112 this cache directory as well. This overrides the default behav‐
2113 ior configured in slurm.conf SbcastParameters send_libs. This
2114 option only works in conjunction with --bcast. See also
2115 --bcast-exclude.
2116
2117 --signal=[R:]<sig_num>[@sig_time]
2118 When a job is within sig_time seconds of its end time, send it
2119 the signal sig_num. Due to the resolution of event handling by
2120 Slurm, the signal may be sent up to 60 seconds earlier than
2121 specified. sig_num may either be a signal number or name (e.g.
2122 "10" or "USR1"). sig_time must have an integer value between 0
2123 and 65535. By default, no signal is sent before the job's end
2124 time. If a sig_num is specified without any sig_time, the de‐
2125 fault time will be 60 seconds. This option applies to job allo‐
2126 cations. Use the "R:" option to allow this job to overlap with
2127 a reservation with MaxStartDelay set. To have the signal sent
2128 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2129 rameter.
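
For example, to have Slurm send SIGUSR1 to the job roughly five
minutes before its time limit is reached (./a.out is a placeholder):

$ srun -t 30:00 --signal=USR1@300 ./a.out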
2130
2131 --slurmd-debug=<level>
2132 Specify a debug level for slurmd(8). The level may be specified
2133 as either an integer value between 0 [quiet, only errors are dis‐
2134 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2135
2136 quiet Log nothing
2137
2138 fatal Log only fatal errors
2139
2140 error Log only errors
2141
2142 info Log errors and general informational messages
2143
2144 verbose Log errors and verbose informational messages
2145
2146 The slurmd debug information is copied onto the stderr of the
2147 job. By default only errors are displayed. This option applies
2148 to job and step allocations.
2149
2150 --sockets-per-node=<sockets>
2151 Restrict node selection to nodes with at least the specified
2152 number of sockets. See additional information under -B option
2153 above when task/affinity plugin is enabled. This option applies
2154 to job allocations.
2155 NOTE: This option may implicitly impact the number of tasks if
2156 -n was not specified.
2157
2158 --spread-job
2159 Spread the job allocation over as many nodes as possible and at‐
2160 tempt to evenly distribute tasks across the allocated nodes.
2161 This option disables the topology/tree plugin. This option ap‐
2162 plies to job allocations.
2163
2164 --switches=<count>[@max-time]
2165 When a tree topology is used, this defines the maximum count of
2166 leaf switches desired for the job allocation and optionally the
2167 maximum time to wait for that number of switches. If Slurm finds
2168 an allocation containing more switches than the count specified,
2169 the job remains pending until it either finds an allocation with
2170 desired switch count or the time limit expires. If there is no
2171 switch count limit, there is no delay in starting the job. Ac‐
2172 ceptable time formats include "minutes", "minutes:seconds",
2173 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2174 "days-hours:minutes:seconds". The job's maximum time delay may
2175 be limited by the system administrator using the SchedulerParam‐
2176 eters configuration parameter with the max_switch_wait parameter
2177 option. On a dragonfly network the only switch count supported
2178 is 1 since communication performance will be highest when a job
2179 is allocated resources on one leaf switch or more than 2 leaf
2180 switches. The default max-time is the max_switch_wait Sched‐
2181 ulerParameters value. This option applies to job allocations.
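
For example, to request that all allocated nodes share a single leaf
switch, but accept any allocation after waiting at most 60 minutes
(./a.out is a placeholder):

$ srun -N8 --switches=1@60 ./a.out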
2182
2183 --task-epilog=<executable>
2184 The slurmstepd daemon will run executable just after each task
2185 terminates. This will be executed before any TaskEpilog parame‐
2186 ter in slurm.conf is executed. This is meant to be a very
2187 short-lived program. If it fails to terminate within a few sec‐
2188 onds, it will be killed along with any descendant processes.
2189 This option applies to step allocations.
2190
2191 --task-prolog=<executable>
2192 The slurmstepd daemon will run executable just before launching
2193 each task. This will be executed after any TaskProlog parameter
2194 in slurm.conf is executed. Besides the normal environment vari‐
2195 ables, this has SLURM_TASK_PID available to identify the process
2196 ID of the task being started. Standard output from this program
2197 of the form "export NAME=value" will be used to set environment
2198 variables for the task being spawned. This option applies to
2199 step allocations.
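
A minimal sketch (the file name and variable name are placeholders):
a script task_prolog.sh containing

#!/bin/sh
echo "export MY_TASK_PID=$SLURM_TASK_PID"

could be used as

$ srun --task-prolog=./task_prolog.sh ./a.out

so that each spawned task has MY_TASK_PID set in its environment.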
2200
2201 --test-only
2202 Returns an estimate of when a job would be scheduled to run
2203 given the current job queue and all the other srun arguments
2204 specifying the job. This limits srun's behavior to just return
2205 information; no job is actually submitted. The program will be
2206 executed directly by the slurmd daemon. This option applies to
2207 job allocations.
2208
2209 --thread-spec=<num>
2210 Count of specialized threads per node reserved by the job for
2211 system operations and not used by the application. The applica‐
2212 tion will not use these threads, but will be charged for their
2213 allocation. This option can not be used with the --core-spec
2214 option. This option applies to job allocations.
2215
2216 NOTE: Explicitly setting a job's specialized thread value im‐
2217 plicitly sets its --exclusive option, reserving entire nodes for
2218 the job.
2219
2220 -T, --threads=<nthreads>
2221 Allows limiting the number of concurrent threads used to send
2222 the job request from the srun process to the slurmd processes on
2223 the allocated nodes. Default is to use one thread per allocated
2224 node up to a maximum of 60 concurrent threads. Specifying this
2225 option limits the number of concurrent threads to nthreads (less
2226 than or equal to 60). This should only be used to set a low
2227 thread count for testing on very small memory computers. This
2228 option applies to job allocations.
2229
2230 --threads-per-core=<threads>
2231 Restrict node selection to nodes with at least the specified
2232 number of threads per core. In task layout, use the specified
2233 maximum number of threads per core. Implies --cpu-bind=threads
2234 unless overridden by command line or environment options. NOTE:
2235 "Threads" refers to the number of processing units on each core
2236 rather than the number of application tasks to be launched per
2237 core. See additional information under -B option above when
2238 task/affinity plugin is enabled. This option applies to job and
2239 step allocations.
2240 NOTE: This option may implicitly impact the number of tasks if
2241 -n was not specified.
2242
2243 -t, --time=<time>
2244 Set a limit on the total run time of the job allocation. If the
2245 requested time limit exceeds the partition's time limit, the job
2246 will be left in a PENDING state (possibly indefinitely). The
2247 default time limit is the partition's default time limit. When
2248 the time limit is reached, each task in each job step is sent
2249 SIGTERM followed by SIGKILL. The interval between signals is
2250 specified by the Slurm configuration parameter KillWait. The
2251 OverTimeLimit configuration parameter may permit the job to run
2252 longer than scheduled. Time resolution is one minute and second
2253 values are rounded up to the next minute.
2254
2255 A time limit of zero requests that no time limit be imposed.
2256 Acceptable time formats include "minutes", "minutes:seconds",
2257 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2258 "days-hours:minutes:seconds". This option applies to job and
2259 step allocations.
2260
2261 --time-min=<time>
2262 Set a minimum time limit on the job allocation. If specified,
2263 the job may have its --time limit lowered to a value no lower
2264 than --time-min if doing so permits the job to begin execution
2265 earlier than otherwise possible. The job's time limit will not
2266 be changed after the job is allocated resources. This is per‐
2267 formed by a backfill scheduling algorithm to allocate resources
2268 otherwise reserved for higher priority jobs. Acceptable time
2269 formats include "minutes", "minutes:seconds", "hours:min‐
2270 utes:seconds", "days-hours", "days-hours:minutes" and
2271 "days-hours:minutes:seconds". This option applies to job alloca‐
2272 tions.
2273
2274 --tmp=<size>[units]
2275 Specify a minimum amount of temporary disk space per node. De‐
2276 fault units are megabytes. Different units can be specified us‐
2277 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2278 tions.
2279
2280 --uid=<user>
2281 Attempt to submit and/or run a job as user instead of the invok‐
2282 ing user id. The invoking user's credentials will be used to
2283 check access permissions for the target partition. User root may
2284 use this option to run jobs as a normal user in a RootOnly par‐
2285 tition for example. If run as root, srun will drop its permis‐
2286 sions to the uid specified after node allocation is successful.
2287 user may be the user name or numerical user ID. This option ap‐
2288 plies to job and step allocations.
2289
2290 -u, --unbuffered
2291 By default, the connection between slurmstepd and the
2292 user-launched application is over a pipe. The stdio output writ‐
2293 ten by the application is buffered by glibc until it is
2294 flushed or the output is set as unbuffered. See setbuf(3). If
2295 this option is specified the tasks are executed with a pseudo
2296 terminal so that the application output is unbuffered. This op‐
2297 tion applies to step allocations.
2298
2299 --usage
2300 Display brief help message and exit.
2301
2302 --use-min-nodes
2303 If a range of node counts is given, prefer the smaller count.
2304
2305 -v, --verbose
2306 Increase the verbosity of srun's informational messages. Multi‐
2307 ple -v's will further increase srun's verbosity. By default
2308 only errors will be displayed. This option applies to job and
2309 step allocations.
2310
2311 -V, --version
2312 Display version information and exit.
2313
2314 -W, --wait=<seconds>
2315 Specify how long to wait after the first task terminates before
2316 terminating all remaining tasks. A value of 0 indicates an un‐
2317 limited wait (a warning will be issued after 60 seconds). The
2318 default value is set by the WaitTime parameter in the slurm con‐
2319 figuration file (see slurm.conf(5)). This option can be useful
2320 to ensure that a job is terminated in a timely fashion in the
2321 event that one or more tasks terminate prematurely. Note: The
2322 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2323 to terminate the job immediately if a task exits with a non-zero
2324 exit code. This option applies to job allocations.
2325
2326 --wckey=<wckey>
2327 Specify wckey to be used with job. If TrackWCKey=no (default)
2328 in the slurm.conf this value is ignored. This option applies to
2329 job allocations.
2330
2331 --x11[={all|first|last}]
2332 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2333 the allocation. This option is only enabled if Slurm was com‐
2334 piled with X11 support and PrologFlags=x11 is defined in the
2335 slurm.conf. Default is "all".
2336
2337 srun will submit the job request to the slurm job controller, then ini‐
2338 tiate all processes on the remote nodes. If the request cannot be met
2339 immediately, srun will block until the resources are free to run the
2340 job. If the -I (--immediate) option is specified srun will terminate if
2341 resources are not immediately available.
2342
2343 When initiating remote processes srun will propagate the current work‐
2344 ing directory, unless --chdir=<path> is specified, in which case path
2345 will become the working directory for the remote processes.
2346
2347 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2348 cated to the job. When specifying only the number of processes to run
2349 with -n, a default of one CPU per process is allocated. By specifying
2350 the number of CPUs required per task (-c), more than one CPU may be al‐
2351 located per process. If the number of nodes is specified with -N, srun
2352 will attempt to allocate at least the number of nodes specified.
2353
2354 Combinations of the above three options may be used to change how pro‐
2355 cesses are distributed across nodes and cpus. For instance, by specify‐
2356 ing both the number of processes and number of nodes on which to run,
2357 the number of processes per node is implied. However, if the number of
2358 CPUs per process is more important, then the number of processes (-n) and
2359 the number of CPUs per process (-c) should be specified.
2360
2361 srun will refuse to allocate more than one process per CPU unless
2362 --overcommit (-O) is also specified.
2363
2364 srun will attempt to meet the above specifications "at a minimum." That
2365 is, if 16 nodes are requested for 32 processes, and some nodes do not
2366 have 2 CPUs, the allocation of nodes will be increased in order to meet
2367 the demand for CPUs. In other words, a minimum of 16 nodes are being
2368 requested. However, if 16 nodes are requested for 15 processes, srun
2369 will consider this an error, as 15 processes cannot run across 16
2370 nodes.
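
For example (./a.out is a placeholder executable), the following requests 32
tasks with 2 CPUs each, spread over at least 16 nodes:

$ srun -N16 -n32 -c2 ./a.out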
2371
2372
2373 IO Redirection
2374
2375 By default, stdout and stderr will be redirected from all tasks to the
2376 stdout and stderr of srun, and stdin will be redirected from the stan‐
2377 dard input of srun to all remote tasks. If stdin is only to be read by
2378 a subset of the spawned tasks, specifying a file to read from rather
2379 than forwarding stdin from the srun command may be preferable as it
2380 avoids moving and storing data that will never be read.
2381
2382 For OS X, the poll() function does not support stdin, so input from a
2383 terminal is not possible.
2384
2385 This behavior may be changed with the --output, --error, and --input
2386 (-o, -e, -i) options. Valid format specifications for these options are
2387
2388
2389 all stdout and stderr are redirected from all tasks to srun. stdin is
2390 broadcast to all remote tasks. (This is the default behav‐
2391 ior)
2392
2393 none stdout and stderr are not received from any task. stdin is
2394 not sent to any task (stdin is closed).
2395
2396 taskid stdout and/or stderr are redirected from only the task with
2397 relative id equal to taskid, where 0 <= taskid <= ntasks,
2398 where ntasks is the total number of tasks in the current job
2399 step. stdin is redirected from the stdin of srun to this
2400 same task. This file will be written on the node executing
2401 the task.
2402
2403 filename srun will redirect stdout and/or stderr to the named file
2404 from all tasks. stdin will be redirected from the named file
2405 and broadcast to all tasks in the job. filename refers to a
2406 path on the host that runs srun. The cluster's file system
2407 layout may therefore cause the output to appear in different
2408 places depending on whether the job is run in batch mode.
2410
2411 filename pattern
2412 srun allows for a filename pattern to be used to generate the
2413 named IO file described above. The following list of format
2414 specifiers may be used in the format string to generate a
2415 filename that will be unique to a given jobid, stepid, node,
2416 or task. In each case, the appropriate number of files are
2417 opened and associated with the corresponding tasks. Note that
2418 any format string containing %t, %n, and/or %N will be writ‐
2419 ten on the node executing the task rather than the node where
2420 srun executes. These format specifiers are not supported on a
2421 BGQ system.
2422
2423 \\ Do not process any of the replacement symbols.
2424
2425 %% The character "%".
2426
2427 %A Job array's master job allocation number.
2428
2429 %a Job array ID (index) number.
2430
2431 %J jobid.stepid of the running job. (e.g. "128.0")
2432
2433 %j jobid of the running job.
2434
2435 %s stepid of the running job.
2436
2437 %N short hostname. This will create a separate IO file
2438 per node.
2439
2440 %n Node identifier relative to current job (e.g. "0" is
2441 the first node of the running job). This will create a
2442 separate IO file per node.
2443
2444 %t task identifier (rank) relative to current job. This
2445 will create a separate IO file per task.
2446
2447 %u User name.
2448
2449 %x Job name.
2450
2451 A number placed between the percent character and format
2452 specifier may be used to zero-pad the result in the IO file‐
2453 name. This number is ignored if the format specifier corre‐
2454 sponds to non-numeric data (%N for example).
2455
2456 Some examples of how the format string may be used for a 4
2457 task job step with a Job ID of 128 and step id of 0 are in‐
2458 cluded below:
2459
2460
2461 job%J.out job128.0.out
2462
2463 job%4j.out job0128.out
2464
2465 job%j-%2t.out job128-00.out, job128-01.out, ...
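As a concrete sketch (the program and pattern are arbitrary), the step below writes one zero-padded output file per task; because the pattern contains %t, each file is created on the node running that task:

    # produces job128-00.out through job128-03.out for job 128
    $ srun -n4 --output=job%j-%2t.out hostname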
2466
2467 PERFORMANCE
2468 Executing srun sends a remote procedure call to slurmctld. If enough
2469 calls from srun or other Slurm client commands that send remote proce‐
2470 dure calls to the slurmctld daemon come in at once, it can result in a
2471 degradation of performance of the slurmctld daemon, possibly resulting
2472 in a denial of service.
2473
2474 Do not run srun or other Slurm client commands that send remote proce‐
2475 dure calls to slurmctld from loops in shell scripts or other programs.
2476 Ensure that programs limit calls to srun to the minimum necessary for
2477 the information you are trying to gather.
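As a rough illustration (the program name is a placeholder), a single step that launches many tasks generates far fewer remote procedure calls than a shell loop around srun; per-task arguments can instead be derived from SLURM_PROCID inside the program or supplied via --multi-prog:

    # Avoid: every iteration sends its own RPCs to slurmctld
    $ for i in $(seq 1 100); do srun -n1 ./work "$i"; done

    # Prefer: one step launches all tasks with a single request;
    # each task can read SLURM_PROCID to find its own index
    $ srun -n100 ./work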
2478
2479
2480 INPUT ENVIRONMENT VARIABLES
2481 Upon startup, srun will read and handle the options set in the follow‐
2482 ing environment variables. The majority of these variables are set the
2483 same way the options are set, as defined above. For flag options that
2484 are defined to expect no argument, the option can be enabled by setting
2485 the environment variable without a value (empty or NULL string), the
2486 string 'yes', or a non-zero number. Any other value for the environment
2487 variable will result in the option not being set. There are a couple of
2488 exceptions to these rules, which are noted below.
2489 NOTE: Command line options always override environment variable set‐
2490 tings.
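For example (the partition name and time limit below are purely illustrative), options may be supplied through the environment, with any explicit command line option still taking precedence:

    # equivalent to: srun -p debug -t 10 -n4 hostname
    $ SLURM_PARTITION=debug SLURM_TIMELIMIT=10 srun -n4 hostname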
2491
2492
2493 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2494 MVAPICH2) and controls the fanout of data commu‐
2495 nications. The srun command sends messages to ap‐
2496 plication programs (via the PMI library) and
2497 those applications may be called upon to forward
2498 that data to up to this number of additional
2499 tasks. Higher values offload work from the srun
2500 command to the applications and likely increase
2501 the vulnerability to failures. The default value
2502 is 32.
2503
2504 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2505 MVAPICH2) and controls the fanout of data commu‐
2506 nications. The srun command sends messages to
2507 application programs (via the PMI library) and
2508 those applications may be called upon to forward
2509 that data to additional tasks. By default, srun
2510 sends one message per host and one task on that
2511 host forwards the data to other tasks on that
2512 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2513 defined, the user task may be required to forward
2514 the data to tasks on other hosts. Setting
2515 PMI_FANOUT_OFF_HOST may increase performance.
2516 Since more work is performed by the PMI library
2517 loaded by the user application, failures also can
2518 be more common and more difficult to diagnose.
2519 Should be disabled/enabled by setting to 0 or 1.
2520
2521 PMI_TIME This is used exclusively with PMI (MPICH2 and
2522 MVAPICH2) and controls how much the communica‐
2523 tions from the tasks to the srun are spread out
2524 in time in order to avoid overwhelming the srun
2525 command with work. The default value is 500 (mi‐
2526 croseconds) per task. On relatively slow proces‐
2527 sors or systems with very large processor counts
2528 (and large PMI data sets), higher values may be
2529 required.
2530
2531 SLURM_ACCOUNT Same as -A, --account
2532
2533 SLURM_ACCTG_FREQ Same as --acctg-freq
2534
2535 SLURM_BCAST Same as --bcast
2536
2537 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2538
2539 SLURM_BURST_BUFFER Same as --bb
2540
2541 SLURM_CLUSTERS Same as -M, --clusters
2542
2543 SLURM_COMPRESS Same as --compress
2544
2545 SLURM_CONF The location of the Slurm configuration file.
2546
2547 SLURM_CONSTRAINT Same as -C, --constraint
2548
2549 SLURM_CORE_SPEC Same as --core-spec
2550
2551 SLURM_CPU_BIND Same as --cpu-bind
2552
2553 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2554
2555 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2556
2557 SRUN_CPUS_PER_TASK Same as -c, --cpus-per-task
2558
2559 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2560 disable or enable the option.
2561
2562 SLURM_DEBUG_FLAGS Specify debug flags for srun to use. See De‐
2563 bugFlags in the slurm.conf(5) man page for a full
2564 list of flags. The environment variable takes
2565 precedence over the setting in the slurm.conf.
2566
2567 SLURM_DELAY_BOOT Same as --delay-boot
2568
2569 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2570
2571 SLURM_DISABLE_STATUS Same as -X, --disable-status
2572
2573 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2574 tion=plane, without =<size>, is set.
2575
2576 SLURM_DISTRIBUTION Same as -m, --distribution
2577
2578 SLURM_EPILOG Same as --epilog
2579
2580 SLURM_EXACT Same as --exact
2581
2582 SLURM_EXCLUSIVE Same as --exclusive
2583
2584 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2585 error occurs (e.g. invalid options). This can be
2586 used by a script to distinguish application exit
2587 codes from various Slurm error conditions. Also
2588 see SLURM_EXIT_IMMEDIATE.
2589
2590 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2591 mediate option is used and resources are not cur‐
2592 rently available. This can be used by a script
2593 to distinguish application exit codes from vari‐
2594 ous Slurm error conditions. Also see
2595 SLURM_EXIT_ERROR.
2596
2597 SLURM_EXPORT_ENV Same as --export
2598
2599 SLURM_GPU_BIND Same as --gpu-bind
2600
2601 SLURM_GPU_FREQ Same as --gpu-freq
2602
2603 SLURM_GPUS Same as -G, --gpus
2604
2605 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2606
2607 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2608
2609 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2610
2611 SLURM_GRES_FLAGS Same as --gres-flags
2612
2613 SLURM_HINT Same as --hint
2614
2615 SLURM_IMMEDIATE Same as -I, --immediate
2616
2617 SLURM_JOB_ID Same as --jobid
2618
2619 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2620 allocation, in which case it is ignored to avoid
2621 using the batch job's name as the name of each
2622 job step.
2623
2624 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2625 the job’s resource allocation.
2626
2627 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2628 or 1 to disable or enable the option.
2629
2630 SLURM_LABELIO Same as -l, --label
2631
2632 SLURM_MEM_BIND Same as --mem-bind
2633
2634 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2635
2636 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2637
2638 SLURM_MEM_PER_NODE Same as --mem
2639
2640 SLURM_MPI_TYPE Same as --mpi
2641
2642 SLURM_NETWORK Same as --network
2643
2644 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2645 job’s resource allocation. See
2646 SLURM_JOB_NUM_NODES. Included for backwards com‐
2647 patibility.
2648
2649 SLURM_NO_KILL Same as -k, --no-kill
2650
2651 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2652 for backwards compatibility.
2653
2654 SLURM_NTASKS Same as -n, --ntasks
2655
2656 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2657
2658 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2659
2660 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2661
2662 SLURM_NTASKS_PER_SOCKET
2663 Same as --ntasks-per-socket
2664
2665 SLURM_OPEN_MODE Same as --open-mode
2666
2667 SLURM_OVERCOMMIT Same as -O, --overcommit
2668
2669 SLURM_OVERLAP Same as --overlap
2670
2671 SLURM_PARTITION Same as -p, --partition
2672
2673 SLURM_PMI_KVS_NO_DUP_KEYS
2674 If set, then PMI key-pairs will contain no dupli‐
2675 cate keys. MPI can use this variable to inform
2676 the PMI library that it will not use duplicate
2677 keys so PMI can skip the check for duplicate
2678 keys. This is the case for MPICH2 and reduces
2679 overhead in testing for duplicates for improved
2680 performance.
2681
2682 SLURM_POWER Same as --power
2683
2684 SLURM_PROFILE Same as --profile
2685
2686 SLURM_PROLOG Same as --prolog
2687
2688 SLURM_QOS Same as --qos
2689
2690 SLURM_REMOTE_CWD Same as -D, --chdir=
2691
2692 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2693 maximum count of switches desired for the job al‐
2694 location and optionally the maximum time to wait
2695 for that number of switches. See --switches
2696
2697 SLURM_RESERVATION Same as --reservation
2698
2699 SLURM_RESV_PORTS Same as --resv-ports
2700
2701 SLURM_SEND_LIBS Same as --send-libs
2702
2703 SLURM_SIGNAL Same as --signal
2704
2705 SLURM_SPREAD_JOB Same as --spread-job
2706
2707 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2708 If set and non-zero, successive task exit mes‐
2709 sages with the same exit code will be printed
2710 only once.
2711
2712 SLURM_STDERRMODE Same as -e, --error
2713
2714 SLURM_STDINMODE Same as -i, --input
2715
2716 SLURM_STDOUTMODE Same as -o, --output
2717
2718 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2719 job allocations). Also see SLURM_GRES
2720
2721 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2722 If set, only the specified node will log when the
2723 job or step are killed by a signal.
2724
2725 SLURM_TASK_EPILOG Same as --task-epilog
2726
2727 SLURM_TASK_PROLOG Same as --task-prolog
2728
2729 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2730 ecutable program along with user execute permis‐
2731 sion on the node where srun was called before at‐
2732 tempting to launch it on nodes in the step.
2733
2734 SLURM_THREAD_SPEC Same as --thread-spec
2735
2736 SLURM_THREADS Same as -T, --threads
2737
2738 SLURM_THREADS_PER_CORE
2739 Same as --threads-per-core
2740
2741 SLURM_TIMELIMIT Same as -t, --time
2742
2743 SLURM_UMASK If defined, Slurm will use the defined umask to
2744 set permissions when creating the output/error
2745 files for the job.
2746
2747 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2748
2749 SLURM_USE_MIN_NODES Same as --use-min-nodes
2750
2751 SLURM_WAIT Same as -W, --wait
2752
2753 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2754 --switches
2755
2756 SLURM_WCKEY Same as -W, --wckey
2757
2758 SLURM_WORKING_DIR Same as -D, --chdir
2759
2760 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2761 to disable or enable the option.
2762
2763 SRUN_CONTAINER Same as --container.
2764
2765 SRUN_EXPORT_ENV Same as --export, and will override any setting
2766 for SLURM_EXPORT_ENV.
2767
2768 OUTPUT ENVIRONMENT VARIABLES
2769 srun will set some environment variables in the environment of the exe‐
2770 cuting tasks on the remote compute nodes. These environment variables
2771 are:
2772
2773
2774 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2775 ment variables are set separately for each compo‐
2776 nent.
2777
2778 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2779 ing.
2780
2781 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2782 IDs or masks for this node, CPU_ID = Board_ID x
2783 threads_per_board + Socket_ID x
2784 threads_per_socket + Core_ID x threads_per_core +
2785 Thread_ID).
2786
2787 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2788
2789 SLURM_CPU_BIND_VERBOSE
2790 --cpu-bind verbosity (quiet,verbose).
2791
2792 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2793 the srun command as a numerical frequency in
2794 kilohertz, or a coded value for a request of low,
2795 medium, highm1 or high for the frequency. See the
2796 description of the --cpu-freq option or the
2797 SLURM_CPU_FREQ_REQ input environment variable.
2798
2799 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2800 node. NOTE: The select/linear plugin allocates
2801 entire nodes to jobs, so the value indicates the
2802 total count of CPUs on the node. For the se‐
2803 lect/cons_res and select/cons_tres plugins, this number
2804 indicates the number of CPUs on this node allo‐
2805 cated to the step.
2806
2807 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2808 the --cpus-per-task option is specified.
2809
2810 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2811 distribution with -m, --distribution.
2812
2813 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2814 node.
2815
2816 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2817 gin and comma separated. It is read internally
2818 by pmi if Slurm was built with pmi support. Leav‐
2819 ing the variable set may cause problems when us‐
2820 ing external packages from within the job (Abaqus
2821 and Ansys have been known to have problems when
2822 it is set - consult the appropriate documentation
2823 for 3rd party software).
2824
2825 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2826
2827 SLURM_JOB_ACCOUNT Account name associated with the job allocation.
2828
2829 SLURM_JOB_CPUS_PER_NODE
2830 Count of CPUs available to the job on the nodes
2831 in the allocation, using the format
2832 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2833 ber_of_nodes)] ...]. For example:
2834 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2835 that on the first and second nodes (as listed by
2836 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2837 while the third node has 36 CPUs. NOTE: The se‐
2838 lect/linear plugin allocates entire nodes to
2839 jobs, so the value indicates the total count of
2840 CPUs on allocated nodes. The select/cons_res and
2841 select/cons_tres plugins allocate individual CPUs
2842 to jobs, so this number indicates the number of
2843 CPUs allocated to the job.
2844
2845 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2846
2847 SLURM_JOB_GPUS The global GPU IDs of the GPUs allocated to this
2848 job. The GPU IDs are not relative to any device
2849 cgroup, even if devices are constrained with
2850 task/cgroup. Only set in batch and interactive
2851 jobs.
2852
2853 SLURM_JOB_ID Job id of the executing job.
2854
2855 SLURM_JOB_NAME Set to the value of the --job-name option or the
2856 command name when srun is used to create a new
2857 job allocation. Not set when srun is used only to
2858 create a job step (i.e. within an existing job
2859 allocation).
2860
2861 SLURM_JOB_NODELIST List of nodes allocated to the job.
2862
2863 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2864 cation.
2865
2866 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2867 ning.
2868
2869 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2870
2871 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2872 tion, if any.
2873
2874 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2875 Included for backwards compatibility.
2876
2877 SLURM_LAUNCH_NODE_IPADDR
2878 IP address of the node from which the task launch
2879 was initiated (where the srun command ran from).
2880
2881 SLURM_LOCALID Node local task ID for the process within a job.
2882
2883 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2884 masks for this node>).
2885
2886 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2887
2888 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2889 nodes).
2890
2891 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2892
2893 SLURM_MEM_BIND_VERBOSE
2894 --mem-bind verbosity (quiet,verbose).
2895
2896 SLURM_NODE_ALIASES Sets of node name, communication address and
2897 hostname for nodes allocated to the job from the
2898 cloud. Each element in the set is colon separated
2899 and each set is comma separated. For example:
2900 SLURM_NODE_ALIASES=
2901 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2902
2903 SLURM_NODEID The relative node ID of the current node.
2904
2905 SLURM_NPROCS Total number of processes in the current job or
2906 job step. See SLURM_NTASKS. Included for back‐
2907 wards compatibility.
2908
2909 SLURM_NTASKS Total number of processes in the current job or
2910 job step.
2911
2912 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2913
2914 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2915 of job submission. This value is propagated to
2916 the spawned processes.
2917
2918 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2919 rent process.
2920
2921 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2922
2923 SLURM_SRUN_COMM_PORT srun communication port.
2924
2925 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
2926 specified.
2927
2928 SLURM_SHARDS_ON_NODE Number of GPU Shards available to the step on
2929 this node.
2930
2931 SLURM_STEP_GPUS The global GPU IDs of the GPUs allocated to this
2932 step (excluding batch and interactive steps). The
2933 GPU IDs are not relative to any device cgroup,
2934 even if devices are constrained with task/cgroup.
2935
2936 SLURM_STEP_ID The step ID of the current job.
2937
2938 SLURM_STEP_LAUNCHER_PORT
2939 Step launcher port.
2940
2941 SLURM_STEP_NODELIST List of nodes allocated to the step.
2942
2943 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2944
2945 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2946 erogeneous job step.
2947
2948 SLURM_STEP_TASKS_PER_NODE
2949 Number of processes per node within the step.
2950
2951 SLURM_STEPID The step ID of the current job. See
2952 SLURM_STEP_ID. Included for backwards compatibil‐
2953 ity.
2954
2955 SLURM_SUBMIT_DIR The directory from which the allocation was in‐
2956 voked.
2957
2958 SLURM_SUBMIT_HOST The hostname of the computer from which the allo‐
2959 cation was invoked.
2960
2961 SLURM_TASK_PID The process ID of the task being started.
2962
2963 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2964 Values are comma separated and in the same order
2965 as SLURM_JOB_NODELIST. If two or more consecu‐
2966 tive nodes are to have the same task count, that
2967 count is followed by "(x#)" where "#" is the rep‐
2968 etition count. For example,
2969 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2970 first three nodes will each execute two tasks and
2971 the fourth node will execute one task.
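A minimal shell sketch (not part of Slurm, shown only to clarify the format) that expands this compressed form into one count per node:

    # expand e.g. "2(x3),1" into "2 2 2 1"
    expand_tasks_per_node() {
        for item in $(echo "$1" | tr ',' ' '); do
            count=${item%%(*}
            reps=$(echo "$item" | sed -n 's/.*(x\([0-9]*\)).*/\1/p')
            for _ in $(seq "${reps:-1}"); do printf '%s ' "$count"; done
        done
        echo
    }
    $ expand_tasks_per_node "$SLURM_TASKS_PER_NODE"
    2 2 2 1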
2972
2973 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2974 ogy/tree plugin configured. The value will be
2975 set to the names of the network switches which may be
2976 involved in the job's communications from the
2977 system's top level switch down to the leaf switch
2978 and ending with the node name. A period is used to
2979 separate each hardware component name.
2980
2981 SLURM_TOPOLOGY_ADDR_PATTERN
2982 This is set only if the system has the topol‐
2983 ogy/tree plugin configured. The value will be
2984 set to the component types listed in SLURM_TOPOL‐
2985 OGY_ADDR. Each component will be identified as
2986 either "switch" or "node". A period is used to
2987 separate each hardware component type.
2988
2989 SLURM_UMASK The umask in effect when the job was submitted.
2990
2991 SLURMD_NODENAME Name of the node running the task. In the case of
2992 a parallel job executing on multiple compute
2993 nodes, the various tasks will have this environ‐
2994 ment variable set to different values on each
2995 compute node.
2996
2997 SRUN_DEBUG Set to the logging level of the srun command.
2998 Default value is 3 (info level). The value is
2999 incremented or decremented based upon the --ver‐
3000 bose and --quiet options.
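As a quick check (hostnames and counts will differ on your system), a job step can print several of these variables directly:

    $ srun -N2 -n4 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME"'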
3001
3002 SIGNALS AND ESCAPE SEQUENCES
3003 Signals sent to the srun command are automatically forwarded to the
3004 tasks it is controlling with a few exceptions. The escape sequence
3005 <control-c> will report the state of all tasks associated with the srun
3006 command. If <control-c> is entered twice within one second, then the
3007 associated SIGINT signal will be sent to all tasks and a termination
3008 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
3009 spawned tasks. If a third <control-c> is received, the srun program
3010 will be terminated without waiting for remote tasks to exit or their
3011 I/O to complete.
3012
3013 The escape sequence <control-z> is presently ignored.
3014
3015
3016 MPI SUPPORT
3017 MPI use depends upon the type of MPI being used. There are three fun‐
3018 damentally different modes of operation used by these various MPI im‐
3019 plementations.
3020
3021 1. Slurm directly launches the tasks and performs initialization of
3022 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3023 a.out".
3024
3025 2. Slurm creates a resource allocation for the job and then mpirun
3026 launches tasks using Slurm's infrastructure (OpenMPI).
3027
3028 3. Slurm creates a resource allocation for the job and then mpirun
3029 launches tasks using some mechanism other than Slurm, such as SSH or
3030 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3031 trol. Slurm's epilog should be configured to purge these tasks when the
3032 job's allocation is relinquished, or the use of pam_slurm_adopt is
3033 highly recommended.
3034
3035 See https://slurm.schedmd.com/mpi_guide.html for more information on
3036 use of these various MPI implementations with Slurm.
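The first two modes might look like the following sketches; the MPI plugin name (pmix here) and whether mpirun is used depend on how Slurm and the MPI library were built:

    # Mode 1: srun launches the tasks and initializes communications via PMIx
    $ srun --mpi=pmix -n16 a.out

    # Mode 2: Slurm provides the allocation, mpirun launches the tasks
    $ salloc -n16
    $ mpirun a.out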
3037
3038
3039 MULTIPLE PROGRAM CONFIGURATION
3040 Comments in the configuration file must have a "#" in column one. The
3041 configuration file contains the following fields separated by white
3042 space:
3043
3044
3045 Task rank
3046 One or more task ranks to use this configuration. Multiple val‐
3047 ues may be comma separated. Ranges may be indicated with two
3048 numbers separated with a '-' with the smaller number first (e.g.
3049 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3050 ified, specify a rank of '*' as the last line of the file. If
3051 an attempt is made to initiate a task for which no executable
3052 program is defined, the following error message will be produced:
3053 "No executable program specified for this task".
3054
3055 Executable
3056 The name of the program to execute. May be a fully qualified
3057 pathname if desired.
3058
3059 Arguments
3060 Program arguments. The expression "%t" will be replaced with
3061 the task's number. The expression "%o" will be replaced with
3062 the task's offset within this range (e.g. a configured task rank
3063 value of "1-5" would have offset values of "0-4"). Single
3064 quotes may be used to avoid having the enclosed values inter‐
3065 preted. This field is optional. Any arguments for the program
3066 entered on the command line will be added to the arguments spec‐
3067 ified in the configuration file.
3068
3069 For example:
3070
3071 $ cat silly.conf
3072 ###################################################################
3073 # srun multiple program configuration file
3074 #
3075 # srun -n8 -l --multi-prog silly.conf
3076 ###################################################################
3077 4-6 hostname
3078 1,7 echo task:%t
3079 0,2-3 echo offset:%o
3080
3081 $ srun -n8 -l --multi-prog silly.conf
3082 0: offset:0
3083 1: task:1
3084 2: offset:1
3085 3: offset:2
3086 4: linux15.llnl.gov
3087 5: linux16.llnl.gov
3088 6: linux17.llnl.gov
3089 7: task:7
3090
3091
3092 EXAMPLES
3093 This simple example demonstrates the execution of the command hostname
3094 in eight tasks. At least eight processors will be allocated to the job
3095 (the same as the task count) on however many nodes are required to sat‐
3096 isfy the request. The output of each task will be preceded by its
3097 task number. (The machine "dev" in the example below has a total of
3098 two CPUs per node)
3099
3100 $ srun -n8 -l hostname
3101 0: dev0
3102 1: dev0
3103 2: dev1
3104 3: dev1
3105 4: dev2
3106 5: dev2
3107 6: dev3
3108 7: dev3
3109
3110
3111 The srun -r option is used within a job script to run two job steps on
3112 disjoint nodes in the following example. The script is run using allo‐
3113 cate mode instead of as a batch job in this case.
3114
3115 $ cat test.sh
3116 #!/bin/sh
3117 echo $SLURM_JOB_NODELIST
3118 srun -lN2 -r2 hostname
3119 srun -lN2 hostname
3120
3121 $ salloc -N4 test.sh
3122 dev[7-10]
3123 0: dev9
3124 1: dev10
3125 0: dev7
3126 1: dev8
3127
3128
3129 The following script runs two job steps in parallel within an allocated
3130 set of nodes.
3131
3132 $ cat test.sh
3133 #!/bin/bash
3134 srun -lN2 -n4 -r 2 sleep 60 &
3135 srun -lN2 -r 0 sleep 60 &
3136 sleep 1
3137 squeue
3138 squeue -s
3139 wait
3140
3141 $ salloc -N4 test.sh
3142 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3143 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3144
3145 STEPID PARTITION USER TIME NODELIST
3146 65641.0 batch grondo 0:01 dev[7-8]
3147 65641.1 batch grondo 0:01 dev[9-10]
3148
3149
3150 This example demonstrates how one executes a simple MPI job. We use
3151 srun to build a list of machines (nodes) to be used by mpirun in its
3152 required format. A sample command line and the script to be executed
3153 follow.
3154
3155 $ cat test.sh
3156 #!/bin/sh
3157 MACHINEFILE="nodes.$SLURM_JOB_ID"
3158
3159 # Generate Machinefile for mpi such that hosts are in the same
3160 # order as if run via srun
3161 #
3162 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3163
3164 # Run using generated Machine file:
3165 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3166
3167 rm $MACHINEFILE
3168
3169 $ salloc -N2 -n4 test.sh
3170
3171
3172 This simple example demonstrates the execution of different jobs on
3173 different nodes in the same srun. You can do this for any number of
3174 nodes or any number of jobs. The executables are placed on the nodes
3175 identified by the SLURM_NODEID env var, which ranges from 0 up to one
3176 less than the number of nodes specified on the srun command line.
3177
3178 $ cat test.sh
3179 case $SLURM_NODEID in
3180 0) echo "I am running on "
3181 hostname ;;
3182 1) hostname
3183 echo "is where I am running" ;;
3184 esac
3185
3186 $ srun -N2 test.sh
3187 dev0
3188 is where I am running
3189 I am running on
3190 dev1
3191
3192
3193 This example demonstrates use of multi-core options to control layout
3194 of tasks. We request that four sockets per node and two cores per
3195 socket be dedicated to the job.
3196
3197 $ srun -N2 -B 4-4:2-2 a.out
3198
3199
3200 This example shows a script in which Slurm is used to provide resource
3201 management for a job by executing the various job steps as processors
3202 become available for their dedicated use.
3203
3204 $ cat my.script
3205 #!/bin/bash
3206 srun -n4 prog1 &
3207 srun -n3 prog2 &
3208 srun -n1 prog3 &
3209 srun -n1 prog4 &
3210 wait
3211
3212
3213 This example shows how to launch an application called "server" with
3214 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another appli‐
3215 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3216 1 GB of memory per task.
3217
3218 $ srun -n1 -c8 --mem-per-cpu=2gb server : -n16 --mem-per-cpu=1gb client
3219
3220
3221 COPYING
3222 Copyright (C) 2006-2007 The Regents of the University of California.
3223 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3224 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3225 Copyright (C) 2010-2022 SchedMD LLC.
3226
3227 This file is part of Slurm, a resource management program. For de‐
3228 tails, see <https://slurm.schedmd.com/>.
3229
3230 Slurm is free software; you can redistribute it and/or modify it under
3231 the terms of the GNU General Public License as published by the Free
3232 Software Foundation; either version 2 of the License, or (at your op‐
3233 tion) any later version.
3234
3235 Slurm is distributed in the hope that it will be useful, but WITHOUT
3236 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3237 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3238 for more details.
3239
3240
3241 SEE ALSO
3242 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3243 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3244
3245
3246
3247October 2022 Slurm Commands srun(1)