srun(1)                          Slurm Commands                          srun(1)
2
3
4
NAME
       srun - Run parallel jobs
7
8
SYNOPSIS
       srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
            executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
       Run a parallel job on a cluster managed by Slurm. If necessary, srun
       will first create a resource allocation in which to run the parallel
       job.
22
23 The following document describes the influence of various options on
24 the allocation of cpus to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
RETURN VALUE
       srun will return the highest exit code of all tasks run or the highest
       signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
       signal) of any task that exited with a signal.
       The value 253 is reserved for out-of-memory errors.
33
34
EXECUTABLE PATH RESOLUTION
       The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
44 Current working directory is the calling process working directory un‐
45 less the --chdir argument is passed, which will override the current
46 working directory.
47
48
OPTIONS
       --accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
65
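              For example, the following illustrative command (the executable
              name my_app is a placeholder) launches four tasks with four GPUs
              per node and binds each task to the GPUs closest to its
              allocated CPUs:

                     srun -n4 --gres=gpu:4 --accel-bind=g ./my_app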
66
67 -A, --account=<account>
68 Charge resources used by this job to specified account. The ac‐
69 count is an arbitrary string. The account name may be changed
70 after job submission using the scontrol command. This option ap‐
71 plies to job allocations.
72
73
74 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
75 Define the job accounting and profiling sampling intervals in
76 seconds. This can be used to override the JobAcctGatherFre‐
77 quency parameter in the slurm.conf file. <datatype>=<interval>
78 specifies the task sampling interval for the jobacct_gather
79 plugin or a sampling interval for a profiling type by the
80 acct_gather_profile plugin. Multiple comma-separated
81 <datatype>=<interval> pairs may be specified. Supported datatype
82 values are:
83
84 task Sampling interval for the jobacct_gather plugins and
85 for task profiling by the acct_gather_profile
86 plugin.
87 NOTE: This frequency is used to monitor memory us‐
88 age. If memory limits are enforced the highest fre‐
89 quency a user can request is what is configured in
90 the slurm.conf file. It can not be disabled.
91
92 energy Sampling interval for energy profiling using the
93 acct_gather_energy plugin.
94
95 network Sampling interval for infiniband profiling using the
96 acct_gather_interconnect plugin.
97
98 filesystem Sampling interval for filesystem profiling using the
99 acct_gather_filesystem plugin.
100
101
102 The default value for the task sampling interval is 30 seconds.
103 The default value for all other intervals is 0. An interval of
104 0 disables sampling of the specified type. If the task sampling
105 interval is 0, accounting information is collected only at job
106 termination (reducing Slurm interference with the job).
107 Smaller (non-zero) values have a greater impact upon job perfor‐
108 mance, but a value of 30 seconds is not likely to be noticeable
109 for applications having less than 10,000 tasks. This option ap‐
110 plies to job allocations.
111
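              For example, assuming the relevant acct_gather plugins are
              configured, the following illustrative command (my_app is a
              placeholder) samples task usage every 10 seconds and energy
              usage every 30 seconds:

                     srun -n16 --acctg-freq=task=10,energy=30 ./my_app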
112
113 --bb=<spec>
114 Burst buffer specification. The form of the specification is
115 system dependent. Also see --bbf. This option applies to job
116 allocations. When the --bb option is used, Slurm parses this
117 option and creates a temporary burst buffer script file that is
118 used internally by the burst buffer plugins. See Slurm's burst
119 buffer guide for more information and examples:
120 https://slurm.schedmd.com/burst_buffer.html
121
122
123 --bbf=<file_name>
124 Path of file containing burst buffer specification. The form of
125 the specification is system dependent. Also see --bb. This op‐
126 tion applies to job allocations. See Slurm's burst buffer guide
127 for more information and examples:
128 https://slurm.schedmd.com/burst_buffer.html
129
130
131 --bcast[=<dest_path>]
132 Copy executable file to allocated compute nodes. If a file name
133 is specified, copy the executable to the specified destination
134 file path. If the path specified ends with '/' it is treated as
135 a target directory, and the destination file name will be
136 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
137 specified and the slurm.conf BcastParameters DestDir is config‐
138 ured then it is used, and the filename follows the above pat‐
139 tern. If none of the previous is specified, then --chdir is
140 used, and the filename follows the above pattern too. For exam‐
141 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
142 "a.out" from your current directory to the file "/tmp/mine" on
143 each of the three allocated compute nodes and execute that file.
144 This option applies to step allocations.
145
146
147 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
148 Comma-separated list of absolute directory paths to be excluded
149 when autodetecting and broadcasting executable shared object de‐
150 pendencies through --bcast. If the keyword "NONE" is configured,
151 no directory paths will be excluded. The default value is that
152 of slurm.conf BcastExclude and this option overrides it. See
153 also --bcast and --send-libs.
154
155
156 -b, --begin=<time>
157 Defer initiation of this job until the specified time. It ac‐
158 cepts times of the form HH:MM:SS to run a job at a specific time
159 of day (seconds are optional). (If that time is already past,
160 the next day is assumed.) You may also specify midnight, noon,
161 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
162 suffixed with AM or PM for running in the morning or the
163 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
165 Combine date and time using the following format
166 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
167 count time-units, where the time-units can be seconds (default),
168 minutes, hours, days, or weeks and you can tell Slurm to run the
169 job today with the keyword today and to run the job tomorrow
170 with the keyword tomorrow. The value may be changed after job
171 submission using the scontrol command. For example:
172 --begin=16:00
173 --begin=now+1hour
174 --begin=now+60 (seconds by default)
175 --begin=2010-01-20T12:34:00
176
177
178 Notes on date/time specifications:
179 - Although the 'seconds' field of the HH:MM:SS time specifica‐
180 tion is allowed by the code, note that the poll time of the
181 Slurm scheduler is not precise enough to guarantee dispatch of
182 the job on the exact second. The job will be eligible to start
183 on the next poll following the specified time. The exact poll
184 interval depends on the Slurm scheduler (e.g., 60 seconds with
185 the default sched/builtin).
186 - If no time (HH:MM:SS) is specified, the default is
187 (00:00:00).
188 - If a date is specified without a year (e.g., MM/DD) then the
189 current year is assumed, unless the combination of MM/DD and
190 HH:MM:SS has already passed for that year, in which case the
191 next year is used.
192 This option applies to job allocations.
193
194
195 -D, --chdir=<path>
196 Have the remote processes do a chdir to path before beginning
197 execution. The default is to chdir to the current working direc‐
198 tory of the srun process. The path can be specified as full path
199 or relative path to the directory where the command is executed.
200 This option applies to job allocations.
201
202
203 --cluster-constraint=<list>
204 Specifies features that a federated cluster must have to have a
205 sibling job submitted to it. Slurm will attempt to submit a sib‐
206 ling job to a cluster if it has at least one of the specified
207 features.
208
209
210 -M, --clusters=<string>
211 Clusters to issue commands to. Multiple cluster names may be
212 comma separated. The job will be submitted to the one cluster
213 providing the earliest expected job initiation time. The default
214 value is the current cluster. A value of 'all' will query to run
215 on all clusters. Note the --export option to control environ‐
216 ment variables exported between clusters. This option applies
217 only to job allocations. Note that the SlurmDBD must be up for
218 this option to work properly.
219
220
221 --comment=<string>
222 An arbitrary comment. This option applies to job allocations.
223
224
225 --compress[=type]
226 Compress file before sending it to compute hosts. The optional
227 argument specifies the data compression library to be used. The
228 default is BcastParameters Compression= if set or "lz4" other‐
229 wise. Supported values are "lz4". Some compression libraries
230 may be unavailable on some systems. For use with the --bcast
231 option. This option applies to step allocations.
232
233
234 -C, --constraint=<list>
235 Nodes can have features assigned to them by the Slurm adminis‐
236 trator. Users can specify which of these features are required
237 by their job using the constraint option. Only nodes having
238 features matching the job constraints will be used to satisfy
239 the request. Multiple constraints may be specified with AND,
240 OR, matching OR, resource counts, etc. (some operators are not
241 supported on all system types). Supported constraint options
242 include:
243
244 Single Name
245 Only nodes which have the specified feature will be used.
246 For example, --constraint="intel"
247
248 Node Count
249 A request can specify the number of nodes needed with
250 some feature by appending an asterisk and count after the
251 feature name. For example, --nodes=16 --con‐
252 straint="graphics*4 ..." indicates that the job requires
253 16 nodes and that at least four of those nodes must have
254 the feature "graphics."
255
              AND    Only nodes with all of the specified features will be
                     used. The ampersand is used as an AND operator. For
                     example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features
                     will be used. The vertical bar is used as an OR opera‐
                     tor. For example, --constraint="intel|amd"
263
264 Matching OR
265 If only one of a set of possible options should be used
266 for all allocated nodes, then use the OR operator and en‐
267 close the options within square brackets. For example,
268 --constraint="[rack1|rack2|rack3|rack4]" might be used to
269 specify that all nodes must be allocated on a single rack
270 of the cluster, but any of those four racks can be used.
271
272 Multiple Counts
273 Specific counts of multiple resources may be specified by
274 using the AND operator and enclosing the options within
275 square brackets. For example, --con‐
276 straint="[rack1*2&rack2*4]" might be used to specify that
277 two nodes must be allocated from nodes with the feature
278 of "rack1" and four nodes must be allocated from nodes
279 with the feature "rack2".
280
281 NOTE: This construct does not support multiple Intel KNL
282 NUMA or MCDRAM modes. For example, while --con‐
283 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
284 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
285 Specification of multiple KNL modes requires the use of a
286 heterogeneous job.
287
288 Brackets
289 Brackets can be used to indicate that you are looking for
290 a set of nodes with the different requirements contained
291 within the brackets. For example, --con‐
292 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
293 node with either the "rack1" or "rack2" features and two
294 nodes with the "rack3" feature. The same request without
295 the brackets will try to find a single node that meets
296 those requirements.
297
298 NOTE: Brackets are only reserved for Multiple Counts and
299 Matching OR syntax. AND operators require a count for
300 each feature inside square brackets (i.e.
301 "[quad*2&hemi*1]").
302
              Parentheses
                     Parentheses can be used to group like node features to‐
                     gether. For example, --con‐
                     straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
                     specify that four nodes with the features "knl", "snc4"
                     and "flat" plus one node with the feature "haswell" are
                     required. All options within parentheses should be
                     grouped with AND (e.g. "&") operators.
311
312 WARNING: When srun is executed from within salloc or sbatch, the
313 constraint value can only contain a single feature name. None of
314 the other operators are currently supported for job steps.
315 This option applies to job and step allocations.
316
317
318 --container=<path_to_container>
319 Absolute path to OCI container bundle.
320
321
322 --contiguous
323 If set, then the allocated nodes must form a contiguous set.
324
              NOTE: If SelectType=cons_res this option won't be honored with
              the topology/tree or topology/3d_torus plugins, both of which
              can modify the node ordering. This option applies to job alloca‐
              tions.
329
330
331 -S, --core-spec=<num>
332 Count of specialized cores per node reserved by the job for sys‐
333 tem operations and not used by the application. The application
334 will not use these cores, but will be charged for their alloca‐
335 tion. Default value is dependent upon the node's configured
336 CoreSpecCount value. If a value of zero is designated and the
337 Slurm configuration option AllowSpecResourcesUsage is enabled,
338 the job will be allowed to override CoreSpecCount and use the
339 specialized resources on nodes it is allocated. This option can
340 not be used with the --thread-spec option. This option applies
341 to job allocations.
342 NOTE: This option may implicitly impact the number of tasks if
343 -n was not specified.
344
345
346 --cores-per-socket=<cores>
347 Restrict node selection to nodes with at least the specified
              number of cores per socket. See additional information under the
              -B option below when the task/affinity plugin is enabled. This option
350 applies to job allocations.
351
352
353 --cpu-bind=[{quiet|verbose},]<type>
354 Bind tasks to CPUs. Used only when the task/affinity or
355 task/cgroup plugin is enabled. NOTE: To have Slurm always re‐
356 port on the selected CPU binding for all commands executed in a
357 shell, you can enable verbose mode by setting the SLURM_CPU_BIND
358 environment variable value to "verbose".
359
360 The following informational environment variables are set when
361 --cpu-bind is in use:
362 SLURM_CPU_BIND_VERBOSE
363 SLURM_CPU_BIND_TYPE
364 SLURM_CPU_BIND_LIST
365
366 See the ENVIRONMENT VARIABLES section for a more detailed de‐
367 scription of the individual SLURM_CPU_BIND variables. These
              variables are available only if the task/affinity plugin is con‐
369 figured.
370
371 When using --cpus-per-task to run multithreaded tasks, be aware
372 that CPU binding is inherited from the parent of the process.
373 This means that the multithreaded task should either specify or
374 clear the CPU binding itself to avoid having all threads of the
375 multithreaded task use the same mask/CPU as the parent. Alter‐
376 natively, fat masks (masks which specify more than one allowed
377 CPU) could be used for the tasks in order to provide multiple
378 CPUs for the multithreaded tasks.
379
380 Note that a job step can be allocated different numbers of CPUs
381 on each node or be allocated CPUs not starting at location zero.
382 Therefore one of the options which automatically generate the
383 task binding is recommended. Explicitly specified masks or
384 bindings are only honored when the job step has been allocated
385 every available CPU on the node.
386
387 Binding a task to a NUMA locality domain means to bind the task
388 to the set of CPUs that belong to the NUMA locality domain or
389 "NUMA node". If NUMA locality domain options are used on sys‐
390 tems with no NUMA support, then each socket is considered a lo‐
391 cality domain.
392
393 If the --cpu-bind option is not used, the default binding mode
394 will depend upon Slurm's configuration and the step's resource
395 allocation. If all allocated nodes have the same configured
396 CpuBind mode, that will be used. Otherwise if the job's Parti‐
397 tion has a configured CpuBind mode, that will be used. Other‐
398 wise if Slurm has a configured TaskPluginParam value, that mode
399 will be used. Otherwise automatic binding will be performed as
400 described below.
401
402
403 Auto Binding
404 Applies only when task/affinity is enabled. If the job
405 step allocation includes an allocation with a number of
406 sockets, cores, or threads equal to the number of tasks
407 times cpus-per-task, then the tasks will by default be
408 bound to the appropriate resources (auto binding). Dis‐
409 able this mode of operation by explicitly setting
410 "--cpu-bind=none". Use TaskPluginParam=auto‐
411 bind=[threads|cores|sockets] to set a default cpu binding
412 in case "auto binding" doesn't find a match.
413
414 Supported options include:
415
416 q[uiet]
417 Quietly bind before task runs (default)
418
419 v[erbose]
420 Verbosely report binding before task runs
421
422 no[ne] Do not bind tasks to CPUs (default unless auto
423 binding is applied)
424
425 rank Automatically bind by task rank. The lowest num‐
426 bered task on each node is bound to socket (or
427 core or thread) zero, etc. Not supported unless
428 the entire node is allocated to the job.
429
430 map_cpu:<list>
                            Bind by mapping CPU IDs to tasks (or ranks) as
432 specified where <list> is
433 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
434 IDs are interpreted as decimal values unless they
                            are preceded with '0x' in which case they are in‐
                            terpreted as hexadecimal values. If the number of
437 tasks (or ranks) exceeds the number of elements in
438 this list, elements in the list will be reused as
439 needed starting from the beginning of the list.
440 To simplify support for large task counts, the
441 lists may follow a map with an asterisk and repe‐
442 tition count. For example
443 "map_cpu:0x0f*4,0xf0*4".
444
445 mask_cpu:<list>
446 Bind by setting CPU masks on tasks (or ranks) as
447 specified where <list> is
448 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
449 The mapping is specified for a node and identical
450 mapping is applied to the tasks on every node
451 (i.e. the lowest task ID on each node is mapped to
452 the first mask specified in the list, etc.). CPU
453 masks are always interpreted as hexadecimal values
454 but can be preceded with an optional '0x'. If the
455 number of tasks (or ranks) exceeds the number of
456 elements in this list, elements in the list will
457 be reused as needed starting from the beginning of
458 the list. To simplify support for large task
459 counts, the lists may follow a map with an aster‐
460 isk and repetition count. For example
461 "mask_cpu:0x0f*4,0xf0*4".
462
463 rank_ldom
464 Bind to a NUMA locality domain by rank. Not sup‐
465 ported unless the entire node is allocated to the
466 job.
467
468 map_ldom:<list>
469 Bind by mapping NUMA locality domain IDs to tasks
470 as specified where <list> is
471 <ldom1>,<ldom2>,...<ldomN>. The locality domain
472 IDs are interpreted as decimal values unless they
473 are preceded with '0x' in which case they are in‐
474 terpreted as hexadecimal values. Not supported
475 unless the entire node is allocated to the job.
476
477 mask_ldom:<list>
478 Bind by setting NUMA locality domain masks on
479 tasks as specified where <list> is
480 <mask1>,<mask2>,...<maskN>. NUMA locality domain
481 masks are always interpreted as hexadecimal values
482 but can be preceded with an optional '0x'. Not
483 supported unless the entire node is allocated to
484 the job.
485
486 sockets
487 Automatically generate masks binding tasks to
488 sockets. Only the CPUs on the socket which have
489 been allocated to the job will be used. If the
490 number of tasks differs from the number of allo‐
491 cated sockets this can result in sub-optimal bind‐
492 ing.
493
494 cores Automatically generate masks binding tasks to
495 cores. If the number of tasks differs from the
496 number of allocated cores this can result in
497 sub-optimal binding.
498
499 threads
500 Automatically generate masks binding tasks to
501 threads. If the number of tasks differs from the
502 number of allocated threads this can result in
503 sub-optimal binding.
504
505 ldoms Automatically generate masks binding tasks to NUMA
506 locality domains. If the number of tasks differs
507 from the number of allocated locality domains this
508 can result in sub-optimal binding.
509
510 help Show help message for cpu-bind
511
512 This option applies to job and step allocations.
513
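              For example, the following illustrative command (my_app is a
              placeholder) launches eight tasks, binds each task to a core,
              and reports the resulting binding before the tasks run:

                     srun -n8 --cpu-bind=verbose,cores ./my_app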
514
515 --cpu-freq=<p1>[-p2[:p3]]
516
517 Request that the job step initiated by this srun command be run
518 at some requested frequency if possible, on the CPUs selected
519 for the step on the compute node(s).
520
521 p1 can be [#### | low | medium | high | highm1] which will set
522 the frequency scaling_speed to the corresponding value, and set
523 the frequency scaling_governor to UserSpace. See below for defi‐
524 nition of the values.
525
526 p1 can be [Conservative | OnDemand | Performance | PowerSave]
527 which will set the scaling_governor to the corresponding value.
528 The governor has to be in the list set by the slurm.conf option
529 CpuFreqGovernors.
530
531 When p2 is present, p1 will be the minimum scaling frequency and
532 p2 will be the maximum scaling frequency.
533
              p2 can be [#### | medium | high | highm1]. p2 must be greater
              than p1.
536
537 p3 can be [Conservative | OnDemand | Performance | PowerSave |
538 SchedUtil | UserSpace] which will set the governor to the corre‐
539 sponding value.
540
541 If p3 is UserSpace, the frequency scaling_speed will be set by a
542 power or energy aware scheduling strategy to a value between p1
543 and p2 that lets the job run within the site's power goal. The
544 job may be delayed if p1 is higher than a frequency that allows
545 the job to run within the goal.
546
547 If the current frequency is < min, it will be set to min. Like‐
548 wise, if the current frequency is > max, it will be set to max.
549
550 Acceptable values at present include:
551
552 #### frequency in kilohertz
553
554 Low the lowest available frequency
555
556 High the highest available frequency
557
558 HighM1 (high minus one) will select the next highest
559 available frequency
560
561 Medium attempts to set a frequency in the middle of the
562 available range
563
564 Conservative attempts to use the Conservative CPU governor
565
566 OnDemand attempts to use the OnDemand CPU governor (the de‐
567 fault value)
568
569 Performance attempts to use the Performance CPU governor
570
571 PowerSave attempts to use the PowerSave CPU governor
572
573 UserSpace attempts to use the UserSpace CPU governor
574
575
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
                     SLURM_CPU_FREQ_REQ
580
581 This environment variable can also be used to supply the value
582 for the CPU frequency request if it is set when the 'srun' com‐
583 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value. The form of the environ‐
              ment variable is the same as the command line. See the ENVIRON‐
586 MENT VARIABLES section for a description of the
587 SLURM_CPU_FREQ_REQ variable.
588
589 NOTE: This parameter is treated as a request, not a requirement.
590 If the job step's node does not support setting the CPU fre‐
591 quency, or the requested value is outside the bounds of the le‐
592 gal frequencies, an error is logged, but the job step is allowed
593 to continue.
594
595 NOTE: Setting the frequency for just the CPUs of the job step
596 implies that the tasks are confined to those CPUs. If task con‐
597 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
598 gin=task/cgroup with the "ConstrainCores" option) is not config‐
599 ured, this parameter is ignored.
600
601 NOTE: When the step completes, the frequency and governor of
602 each selected CPU is reset to the previous values.
603
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all of the accounting information will be
              present.
608
609 This option applies to job and step allocations.
610
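              For example, assuming the compute nodes permit CPU frequency
              control, the following illustrative command (my_app is a
              placeholder) requests a scaling range of 1.0-2.0 GHz (values are
              in kilohertz) under the UserSpace governor:

                     srun -n4 --cpu-freq=1000000-2000000:UserSpace ./my_app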
611
612 --cpus-per-gpu=<ncpus>
613 Advise Slurm that ensuing job steps will require ncpus proces‐
614 sors per allocated GPU. Not compatible with the --cpus-per-task
615 option.
616
617
618 -c, --cpus-per-task=<ncpus>
619 Request that ncpus be allocated per process. This may be useful
620 if the job is multithreaded and requires more than one CPU per
621 task for optimal performance. Explicitly requesting this option
622 implies --exact. The default is one CPU per process and does not
623 imply --exact. If -c is specified without -n, as many tasks
624 will be allocated per node as possible while satisfying the -c
625 restriction. For instance on a cluster with 8 CPUs per node, a
626 job request for 4 nodes and 3 CPUs per task may be allocated 3
627 or 6 CPUs per node (1 or 2 tasks per node) depending upon re‐
628 source consumption by other jobs. Such a job may be unable to
629 execute more than a total of 4 tasks.
630
631 WARNING: There are configurations and options interpreted dif‐
632 ferently by job and job step requests which can result in incon‐
633 sistencies for this option. For example srun -c2
634 --threads-per-core=1 prog may allocate two cores for the job,
635 but if each of those cores contains two threads, the job alloca‐
636 tion will include four CPUs. The job step allocation will then
637 launch two threads per CPU for a total of two tasks.
638
639 WARNING: When srun is executed from within salloc or sbatch,
640 there are configurations and options which can result in incon‐
641 sistent allocations when -c has a value greater than -c on sal‐
642 loc or sbatch.
643
644 This option applies to job and step allocations.
645
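              For example, the following illustrative command (my_app is a
              placeholder) launches four multithreaded tasks and reserves four
              CPUs for each of them:

                     srun -n4 -c4 ./my_app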
646
647 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
649 (start > (deadline - time[-min])). Default is no deadline.
650 Valid time formats are:
651 HH:MM[:SS] [AM|PM]
652 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
653 MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
655 now[+count[seconds(default)|minutes|hours|days|weeks]]
656
657 This option applies only to job allocations.
658
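              For example, the following illustrative command (my_app is a
              placeholder) requests a 30-minute time limit and asks Slurm to
              remove the job if it cannot finish within two hours from now:

                     srun -n1 --time=30 --deadline=now+2hours ./my_app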
659
660 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
662 specification if the job has been eligible to run for less than
663 this time period. If the job has waited for less than the spec‐
664 ified period, it will use only nodes which already have the
665 specified features. The argument is in units of minutes. A de‐
666 fault value may be set by a system administrator using the de‐
667 lay_boot option of the SchedulerParameters configuration parame‐
668 ter in the slurm.conf file, otherwise the default value is zero
669 (no delay).
670
671 This option applies only to job allocations.
672
673
674 -d, --dependency=<dependency_list>
675 Defer the start of this job until the specified dependencies
              have been satisfied. This option does not apply to job steps
              (executions of srun within an existing salloc or sbatch
              allocation), only to job allocations. <dependency_list> is of
679 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
680 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
681 must be satisfied if the "," separator is used. Any dependency
682 may be satisfied if the "?" separator is used. Only one separa‐
683 tor may be used. Many jobs can share the same dependency and
684 these jobs may even belong to different users. The value may
685 be changed after job submission using the scontrol command. De‐
686 pendencies on remote jobs are allowed in a federation. Once a
687 job dependency fails due to the termination state of a preceding
688 job, the dependent job will never be run, even if the preceding
689 job is requeued and has a different termination state in a sub‐
690 sequent execution. This option applies to job allocations.
691
692 after:job_id[[+time][:jobid[+time]...]]
693 After the specified jobs start or are cancelled and
694 'time' in minutes from job start or cancellation happens,
695 this job can begin execution. If no 'time' is given then
696 there is no delay after start or cancellation.
697
698 afterany:job_id[:jobid...]
699 This job can begin execution after the specified jobs
700 have terminated.
701
702 afterburstbuffer:job_id[:jobid...]
703 This job can begin execution after the specified jobs
704 have terminated and any associated burst buffer stage out
705 operations have completed.
706
707 aftercorr:job_id[:jobid...]
708 A task of this job array can begin execution after the
709 corresponding task ID in the specified job has completed
710 successfully (ran to completion with an exit code of
711 zero).
712
713 afternotok:job_id[:jobid...]
714 This job can begin execution after the specified jobs
715 have terminated in some failed state (non-zero exit code,
716 node failure, timed out, etc).
717
718 afterok:job_id[:jobid...]
719 This job can begin execution after the specified jobs
720 have successfully executed (ran to completion with an
721 exit code of zero).
722
723 singleton
724 This job can begin execution after any previously
725 launched jobs sharing the same job name and user have
726 terminated. In other words, only one job by that name
727 and owned by that user can be running or suspended at any
728 point in time. In a federation, a singleton dependency
729 must be fulfilled on all clusters unless DependencyParam‐
730 eters=disable_remote_singleton is used in slurm.conf.
731
732
733 -X, --disable-status
734 Disable the display of task status when srun receives a single
735 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
736 running job. Without this option a second Ctrl-C in one second
737 is required to forcibly terminate the job and srun will immedi‐
738 ately exit. May also be set via the environment variable
739 SLURM_DISABLE_STATUS. This option applies to job allocations.
740
741
742 -m, --distribution={*|block|cyclic|arbi‐
743 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
744
745 Specify alternate distribution methods for remote processes.
746 For job allocation, this sets environment variables that will be
747 used by subsequent srun requests and also affects which cores
748 will be selected for job allocation.
749
750 This option controls the distribution of tasks to the nodes on
751 which resources have been allocated, and the distribution of
752 those resources to tasks for binding (task affinity). The first
753 distribution method (before the first ":") controls the distri‐
754 bution of tasks to nodes. The second distribution method (after
755 the first ":") controls the distribution of allocated CPUs
756 across sockets for binding to tasks. The third distribution
757 method (after the second ":") controls the distribution of allo‐
758 cated CPUs across cores for binding to tasks. The second and
759 third distributions apply only if task affinity is enabled. The
760 third distribution is supported only if the task/cgroup plugin
761 is configured. The default value for each distribution type is
762 specified by *.
763
764 Note that with select/cons_res and select/cons_tres, the number
765 of CPUs allocated to each socket and node may be different. Re‐
766 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
767 mation on resource allocation, distribution of tasks to nodes,
768 and binding of tasks to CPUs.
769 First distribution method (distribution of tasks across nodes):
770
771
772 * Use the default method for distributing tasks to nodes
773 (block).
774
775 block The block distribution method will distribute tasks to a
776 node such that consecutive tasks share a node. For exam‐
777 ple, consider an allocation of three nodes each with two
778 cpus. A four-task block distribution request will dis‐
779 tribute those tasks to the nodes with tasks one and two
780 on the first node, task three on the second node, and
781 task four on the third node. Block distribution is the
782 default behavior if the number of tasks exceeds the num‐
783 ber of allocated nodes.
784
785 cyclic The cyclic distribution method will distribute tasks to a
786 node such that consecutive tasks are distributed over
787 consecutive nodes (in a round-robin fashion). For exam‐
788 ple, consider an allocation of three nodes each with two
789 cpus. A four-task cyclic distribution request will dis‐
790 tribute those tasks to the nodes with tasks one and four
791 on the first node, task two on the second node, and task
792 three on the third node. Note that when SelectType is
793 select/cons_res, the same number of CPUs may not be allo‐
794 cated on each node. Task distribution will be round-robin
795 among all the nodes with CPUs yet to be assigned to
796 tasks. Cyclic distribution is the default behavior if
797 the number of tasks is no larger than the number of allo‐
798 cated nodes.
799
800 plane The tasks are distributed in blocks of size <size>. The
801 size must be given or SLURM_DIST_PLANESIZE must be set.
802 The number of tasks distributed to each node is the same
803 as for cyclic distribution, but the taskids assigned to
804 each node depend on the plane size. Additional distribu‐
805 tion specifications cannot be combined with this option.
806 For more details (including examples and diagrams),
807 please see https://slurm.schedmd.com/mc_support.html and
808 https://slurm.schedmd.com/dist_plane.html
809
810 arbitrary
811 The arbitrary method of distribution will allocate pro‐
                     cesses in order as listed in the file designated by the
                     environment variable SLURM_HOSTFILE. If this variable is
                     set, it will override any other method specified. If it
                     is not set, the method will default to block. The host‐
                     file must contain at minimum the number of hosts re‐
                     quested, one per line or comma separated. If spec‐
818 ifying a task count (-n, --ntasks=<number>), your tasks
819 will be laid out on the nodes in the order of the file.
820 NOTE: The arbitrary distribution option on a job alloca‐
821 tion only controls the nodes to be allocated to the job
822 and not the allocation of CPUs on those nodes. This op‐
823 tion is meant primarily to control a job step's task lay‐
824 out in an existing job allocation for the srun command.
825 NOTE: If the number of tasks is given and a list of re‐
826 quested nodes is also given, the number of nodes used
827 from that list will be reduced to match that of the num‐
828 ber of tasks if the number of nodes in the list is
829 greater than the number of tasks.
830
831
832 Second distribution method (distribution of CPUs across sockets
833 for binding):
834
835
836 * Use the default method for distributing CPUs across sock‐
837 ets (cyclic).
838
839 block The block distribution method will distribute allocated
840 CPUs consecutively from the same socket for binding to
841 tasks, before using the next consecutive socket.
842
843 cyclic The cyclic distribution method will distribute allocated
844 CPUs for binding to a given task consecutively from the
845 same socket, and from the next consecutive socket for the
846 next task, in a round-robin fashion across sockets.
847 Tasks requiring more than one CPU will have all of those
848 CPUs allocated on a single socket if possible.
849
850 fcyclic
851 The fcyclic distribution method will distribute allocated
852 CPUs for binding to tasks from consecutive sockets in a
853 round-robin fashion across the sockets. Tasks requiring
854 more than one CPU will have each CPUs allocated in a
855 cyclic fashion across sockets.
856
857
858 Third distribution method (distribution of CPUs across cores for
859 binding):
860
861
862 * Use the default method for distributing CPUs across cores
863 (inherited from second distribution method).
864
865 block The block distribution method will distribute allocated
866 CPUs consecutively from the same core for binding to
867 tasks, before using the next consecutive core.
868
869 cyclic The cyclic distribution method will distribute allocated
870 CPUs for binding to a given task consecutively from the
871 same core, and from the next consecutive core for the
872 next task, in a round-robin fashion across cores.
873
874 fcyclic
875 The fcyclic distribution method will distribute allocated
876 CPUs for binding to tasks from consecutive cores in a
877 round-robin fashion across the cores.
878
879
880
881 Optional control for task distribution over nodes:
882
883
              Pack   Rather than distributing a job step's tasks evenly
885 across its allocated nodes, pack them as tightly as pos‐
886 sible on the nodes. This only applies when the "block"
887 task distribution method is used.
888
889 NoPack Rather than packing a job step's tasks as tightly as pos‐
890 sible on the nodes, distribute them evenly. This user
891 option will supersede the SelectTypeParameters
892 CR_Pack_Nodes configuration parameter.
893
894 This option applies to job and step allocations.
895
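              For example, assuming a three-node allocation, the following
              illustrative command (my_app is a placeholder) distributes six
              tasks across the nodes cyclically and selects the CPUs bound to
              each task consecutively from the same socket:

                     srun -N3 -n6 --distribution=cyclic:block ./my_app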
896
897 --epilog={none|<executable>}
898 srun will run executable just after the job step completes. The
899 command line arguments for executable will be the command and
900 arguments of the job step. If none is specified, then no srun
901 epilog will be run. This parameter overrides the SrunEpilog pa‐
902 rameter in slurm.conf. This parameter is completely independent
903 from the Epilog parameter in slurm.conf. This option applies to
904 job allocations.
905
906
907 -e, --error=<filename_pattern>
908 Specify how stderr is to be redirected. By default in interac‐
909 tive mode, srun redirects stderr to the same file as stdout, if
910 one is specified. The --error option is provided to allow stdout
911 and stderr to be redirected to different locations. See IO Re‐
912 direction below for more options. If the specified file already
913 exists, it will be overwritten. This option applies to job and
914 step allocations.
915
916
917 --exact
918 Allow a step access to only the resources requested for the
919 step. By default, all non-GRES resources on each node in the
920 step allocation will be used. This option only applies to step
921 allocations.
922 NOTE: Parallel steps will either be blocked or rejected until
923 requested step resources are available unless --overlap is spec‐
924 ified. Job resources can be held after the completion of an srun
925 command while Slurm does job cleanup. Step epilogs and/or SPANK
926 plugins can further delay the release of step resources.
927
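              For example, inside an existing allocation with eight CPUs, the
              following illustrative commands (step_a and step_b are
              placeholders) run two four-task steps side by side, each limited
              to exactly the resources it requested:

                     srun -n4 --exact ./step_a &
                     srun -n4 --exact ./step_b &
                     wait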
928
       -x, --exclude={<host1>[,<host2>...]|<filename>}
930 Request that a specific list of hosts not be included in the re‐
931 sources allocated to this job. The host list will be assumed to
932 be a filename if it contains a "/" character. This option ap‐
933 plies to job and step allocations.
934
935
936 --exclusive[={user|mcs}]
937 This option applies to job and job step allocations, and has two
938 slightly different meanings for each one. When used to initiate
939 a job, the job allocation cannot share nodes with other running
940 jobs (or just other users with the "=user" option or "=mcs" op‐
941 tion). If user/mcs are not specified (i.e. the job allocation
942 can not share nodes with other running jobs), the job is allo‐
943 cated all CPUs and GRES on all nodes in the allocation, but is
944 only allocated as much memory as it requested. This is by design
945 to support gang scheduling, because suspended jobs still reside
946 in memory. To request all the memory on a node, use --mem=0.
947 The default shared/exclusive behavior depends on system configu‐
948 ration and the partition's OverSubscribe option takes precedence
949 over the job's option.
950
951 This option can also be used when initiating more than one job
952 step within an existing resource allocation (default), where you
953 want separate processors to be dedicated to each job step. If
954 sufficient processors are not available to initiate the job
955 step, it will be deferred. This can be thought of as providing a
956 mechanism for resource management to the job within its alloca‐
957 tion (--exact implied).
958
959 The exclusive allocation of CPUs applies to job steps by de‐
960 fault. In order to share the resources use the --overlap option.
961
962 See EXAMPLE below.
963
964
965 --export={[ALL,]<environment_variables>|ALL|NONE}
966 Identify which environment variables from the submission envi‐
967 ronment are propagated to the launched application.
968
969 --export=ALL
970 Default mode if --export is not specified. All of the
971 user's environment will be loaded from the caller's
972 environment.
973
974
975 --export=NONE
976 None of the user environment will be defined. User
977 must use absolute path to the binary to be executed
978 that will define the environment. User can not specify
979 explicit environment variables with "NONE".
980
981 This option is particularly important for jobs that
982 are submitted on one cluster and execute on a differ‐
983 ent cluster (e.g. with different paths). To avoid
984 steps inheriting environment export settings (e.g.
985 "NONE") from sbatch command, either set --export=ALL
986 or the environment variable SLURM_EXPORT_ENV should be
987 set to "ALL".
988
989 --export=[ALL,]<environment_variables>
990 Exports all SLURM* environment variables along with
991 explicitly defined variables. Multiple environment
992 variable names should be comma separated. Environment
993 variable names may be specified to propagate the cur‐
994 rent value (e.g. "--export=EDITOR") or specific values
995 may be exported (e.g. "--export=EDITOR=/bin/emacs").
996 If "ALL" is specified, then all user environment vari‐
997 ables will be loaded and will take precedence over any
998 explicitly given environment variables.
999
1000 Example: --export=EDITOR,ARG1=test
1001 In this example, the propagated environment will only
1002 contain the variable EDITOR from the user's environ‐
1003 ment, SLURM_* environment variables, and ARG1=test.
1004
1005 Example: --export=ALL,EDITOR=/bin/emacs
1006 There are two possible outcomes for this example. If
1007 the caller has the EDITOR environment variable de‐
1008 fined, then the job's environment will inherit the
1009 variable from the caller's environment. If the caller
1010 doesn't have an environment variable defined for EDI‐
1011 TOR, then the job's environment will use the value
1012 given by --export.
1013
1014
1015 -B, --extra-node-info=<sockets>[:cores[:threads]]
1016 Restrict node selection to nodes with at least the specified
1017 number of sockets, cores per socket and/or threads per core.
1018 NOTE: These options do not specify the resource allocation size.
1019 Each value specified is considered a minimum. An asterisk (*)
1020 can be used as a placeholder indicating that all available re‐
1021 sources of that type are to be utilized. Values can also be
1022 specified as min-max. The individual levels can also be speci‐
1023 fied in separate options if desired:
1024 --sockets-per-node=<sockets>
1025 --cores-per-socket=<cores>
1026 --threads-per-core=<threads>
1027 If task/affinity plugin is enabled, then specifying an alloca‐
1028 tion in this manner also sets a default --cpu-bind option of
1029 threads if the -B option specifies a thread count, otherwise an
1030 option of cores if a core count is specified, otherwise an op‐
1031 tion of sockets. If SelectType is configured to se‐
1032 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1033 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1034 ored. If not specified, the scontrol show job will display
1035 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1036 NOTE: This option is mutually exclusive with --hint,
1037 --threads-per-core and --ntasks-per-core.
1038 NOTE: If the number of sockets, cores and threads were all spec‐
1039 ified, the number of nodes was specified (as a fixed number, not
1040 a range) and the number of tasks was NOT specified, srun will
1041 implicitly calculate the number of tasks as one task per thread.
1042
1043
1044 --gid=<group>
1045 If srun is run as root, and the --gid option is used, submit the
1046 job with group's group access permissions. group may be the
1047 group name or the numerical group ID. This option applies to job
1048 allocations.
1049
1050
1051 --gpu-bind=[verbose,]<type>
1052 Bind tasks to specific GPUs. By default every spawned task can
1053 access every GPU allocated to the step. If "verbose," is speci‐
1054 fied before <type>, then print out GPU binding debug information
1055 to the stderr of the tasks. GPU binding is ignored if there is
1056 only one task.
1057
1058 Supported type options:
1059
1060 closest Bind each task to the GPU(s) which are closest. In a
1061 NUMA environment, each task may be bound to more than
1062 one GPU (i.e. all GPUs in that NUMA environment).
1063
1064 map_gpu:<list>
                     Bind by mapping GPU IDs to tasks (or ranks) as spec‐
                     ified where <list> is
                     <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
                     are interpreted as decimal values unless they are pre‐
                     ceded with '0x' in which case they are interpreted as
                     hexadecimal values. If the number of tasks (or ranks)
1071 exceeds the number of elements in this list, elements
1072 in the list will be reused as needed starting from the
1073 beginning of the list. To simplify support for large
1074 task counts, the lists may follow a map with an aster‐
1075 isk and repetition count. For example
1076 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
1077 and ConstrainDevices is set in cgroup.conf, then the
1078 GPU IDs are zero-based indexes relative to the GPUs
1079 allocated to the job (e.g. the first GPU is 0, even if
1080 the global ID is 3). Otherwise, the GPU IDs are global
1081 IDs, and all GPUs on each node in the job should be
1082 allocated for predictable binding results.
1083
1084 mask_gpu:<list>
1085 Bind by setting GPU masks on tasks (or ranks) as spec‐
1086 ified where <list> is
1087 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1088 mapping is specified for a node and identical mapping
1089 is applied to the tasks on every node (i.e. the lowest
1090 task ID on each node is mapped to the first mask spec‐
1091 ified in the list, etc.). GPU masks are always inter‐
1092 preted as hexadecimal values but can be preceded with
1093 an optional '0x'. To simplify support for large task
1094 counts, the lists may follow a map with an asterisk
1095 and repetition count. For example
1096 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1097 is used and ConstrainDevices is set in cgroup.conf,
1098 then the GPU IDs are zero-based indexes relative to
1099 the GPUs allocated to the job (e.g. the first GPU is
1100 0, even if the global ID is 3). Otherwise, the GPU IDs
1101 are global IDs, and all GPUs on each node in the job
1102 should be allocated for predictable binding results.
1103
1104 none Do not bind tasks to GPUs (turns off binding if
1105 --gpus-per-task is requested).
1106
1107 per_task:<gpus_per_task>
1108 Each task will be bound to the number of gpus speci‐
                     fied in <gpus_per_task>. GPUs are assigned to tasks in
                     order: the first task is assigned the first
                     <gpus_per_task> GPUs on the node, and so on.
1112
1113 single:<tasks_per_gpu>
1114 Like --gpu-bind=closest, except that each task can
1115 only be bound to a single GPU, even when it can be
1116 bound to multiple GPUs that are equally close. The
1117 GPU to bind to is determined by <tasks_per_gpu>, where
1118 the first <tasks_per_gpu> tasks are bound to the first
1119 GPU available, the second <tasks_per_gpu> tasks are
1120 bound to the second GPU available, etc. This is basi‐
1121 cally a block distribution of tasks onto available
1122 GPUs, where the available GPUs are determined by the
1123 socket affinity of the task and the socket affinity of
1124 the GPUs as specified in gres.conf's Cores parameter.
1125
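              For example, the following illustrative command (my_app is a
              placeholder) launches four tasks on a node with four allocated
              GPUs and binds each task to its own single GPU:

                     srun -n4 --gpus=4 --gpu-bind=single:1 ./my_app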
1126
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1128 Request that GPUs allocated to the job are configured with spe‐
1129 cific frequency values. This option can be used to indepen‐
1130 dently configure the GPU and its memory frequencies. After the
1131 job is completed, the frequencies of all affected GPUs will be
1132 reset to the highest possible values. In some cases, system
1133 power caps may override the requested values. The field type
1134 can be "memory". If type is not specified, the GPU frequency is
1135 implied. The value field can either be "low", "medium", "high",
1136 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1137 fied numeric value is not possible, a value as close as possible
1138 will be used. See below for definition of the values. The ver‐
1139 bose option causes current GPU frequency information to be
1140 logged. Examples of use include "--gpu-freq=medium,memory=high"
1141 and "--gpu-freq=450".
1142
1143 Supported value definitions:
1144
1145 low the lowest available frequency.
1146
1147 medium attempts to set a frequency in the middle of the
1148 available range.
1149
1150 high the highest available frequency.
1151
1152 highm1 (high minus one) will select the next highest avail‐
1153 able frequency.
1154
1155
1156 -G, --gpus=[type:]<number>
1157 Specify the total number of GPUs required for the job. An op‐
1158 tional GPU type specification can be supplied. For example
1159 "--gpus=volta:3". Multiple options can be requested in a comma
1160 separated list, for example: "--gpus=volta:3,kepler:1". See
1161 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1162 options.
1163
1164
1165 --gpus-per-node=[type:]<number>
1166 Specify the number of GPUs required for the job on each node in‐
1167 cluded in the job's resource allocation. An optional GPU type
1168 specification can be supplied. For example
1169 "--gpus-per-node=volta:3". Multiple options can be requested in
1170 a comma separated list, for example:
1171 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1172 --gpus-per-socket and --gpus-per-task options.
1173
1174
1175 --gpus-per-socket=[type:]<number>
1176 Specify the number of GPUs required for the job on each socket
1177 included in the job's resource allocation. An optional GPU type
1178 specification can be supplied. For example
1179 "--gpus-per-socket=volta:3". Multiple options can be requested
1180 in a comma separated list, for example:
1181 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
1182 sockets per node count ( --sockets-per-node). See also the
1183 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1184 tion applies to job allocations.
1185
1186
1187 --gpus-per-task=[type:]<number>
1188 Specify the number of GPUs required for the job on each task to
1189 be spawned in the job's resource allocation. An optional GPU
1190 type specification can be supplied. For example
1191 "--gpus-per-task=volta:1". Multiple options can be requested in
1192 a comma separated list, for example:
1193 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1194 --gpus-per-socket and --gpus-per-node options. This option re‐
1195 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1196 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1197 -N, --nodes. This option will implicitly set
1198 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1199 with an explicit --gpu-bind specification.
1200
1201
1202 --gres=<list>
1203 Specifies a comma-delimited list of generic consumable re‐
1204 sources. The format of each entry on the list is
1205 "name[[:type]:count]". The name is that of the consumable re‐
1206 source. The count is the number of those resources with a de‐
1207 fault value of 1. The count can have a suffix of "k" or "K"
1208 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1209 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1210 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1211 x 1024 x 1024 x 1024). The specified resources will be allo‐
1212 cated to the job on each node. The available generic consumable
1213 resources is configurable by the system administrator. A list
1214 of available generic consumable resources will be printed and
1215 the command will exit if the option argument is "help". Exam‐
1216 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1217 "--gres=help". NOTE: This option applies to job and step allo‐
1218 cations. By default, a job step is allocated all of the generic
1219 resources that have been allocated to the job. To change the
1220 behavior so that each job step is allocated no generic re‐
1221 sources, explicitly set the value of --gres to specify zero
1222 counts for each generic resource OR set "--gres=none" OR set the
1223 SLURM_STEP_GRES environment variable to "none".
1224
1225
1226 --gres-flags=<type>
1227 Specify generic resource task binding options. This option ap‐
1228 plies to job allocations.
1229
1230 disable-binding
1231 Disable filtering of CPUs with respect to generic re‐
1232 source locality. This option is currently required to
1233 use more CPUs than are bound to a GRES (i.e. if a GPU is
1234 bound to the CPUs on one socket, but resources on more
1235 than one socket are required to run the job). This op‐
1236 tion may permit a job to be allocated resources sooner
1237 than otherwise possible, but may result in lower job per‐
1238 formance.
1239 NOTE: This option is specific to SelectType=cons_res.
1240
1241 enforce-binding
1242 The only CPUs available to the job will be those bound to
1243 the selected GRES (i.e. the CPUs identified in the
1244 gres.conf file will be strictly enforced). This option
1245 may result in delayed initiation of a job. For example a
1246 job requiring two GPUs and one CPU will be delayed until
1247 both GPUs on a single socket are available rather than
1248 using GPUs bound to separate sockets, however, the appli‐
1249 cation performance may be improved due to improved commu‐
1250 nication speed. Requires the node to be configured with
1251 more than one socket and resource filtering will be per‐
1252 formed on a per-socket basis.
1253 NOTE: This option is specific to SelectType=cons_tres.
1254
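              For example, assuming SelectType=cons_tres, the following
              illustrative command (my_app is a placeholder) requests two GPUs
              and restricts the job to the CPUs bound to those GPUs in
              gres.conf:

                     srun -n1 --gres=gpu:2 --gres-flags=enforce-binding ./my_app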
1255
1256 -h, --help
1257 Display help information and exit.
1258
1259
1260 --het-group=<expr>
1261 Identify each component in a heterogeneous job allocation for
1262 which a step is to be created. Applies only to srun commands is‐
1263 sued inside a salloc allocation or sbatch script. <expr> is a
              set of integers corresponding to one or more option offsets on
1265 the salloc or sbatch command line. Examples: "--het-group=2",
1266 "--het-group=0,4", "--het-group=1,3-5". The default value is
1267 --het-group=0.
1268
1269
1270 --hint=<type>
1271 Bind tasks according to application hints.
1272 NOTE: This option cannot be used in conjunction with any of
1273 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1274 --cpu-bind=verbose) or -B. If --hint is specified as a command
1275 line argument, it will take precedence over the environment.
1276
1277 compute_bound
1278 Select settings for compute bound applications: use all
1279 cores in each socket, one thread per core.
1280
1281 memory_bound
1282 Select settings for memory bound applications: use only
1283 one core in each socket, one thread per core.
1284
1285 [no]multithread
1286 [don't] use extra threads with in-core multi-threading
1287 which can benefit communication intensive applications.
1288 Only supported with the task/affinity plugin.
1289
1290 help show this help message
1291
1292 This option applies to job allocations.
1293
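              For example, the following illustrative command (my_app is a
              placeholder) applies the binding hints for a compute bound
              application, using all cores in each socket with one thread per
              core:

                     srun -n8 --hint=compute_bound ./my_app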
1294
1295 -H, --hold
1296 Specify the job is to be submitted in a held state (priority of
1297 zero). A held job can now be released using scontrol to reset
1298 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1299 plies to job allocations.
1300
1301
1302 -I, --immediate[=<seconds>]
              Exit if resources are not available within the time period spec‐
1304 ified. If no argument is given (seconds defaults to 1), re‐
1305 sources must be available immediately for the request to suc‐
1306 ceed. If defer is configured in SchedulerParameters and sec‐
1307 onds=1 the allocation request will fail immediately; defer con‐
1308 flicts and takes precedence over this option. By default, --im‐
1309 mediate is off, and the command will block until resources be‐
1310 come available. Since this option's argument is optional, for
1311 proper parsing the single letter option must be followed immedi‐
1312 ately with the value and not include a space between them. For
1313 example "-I60" and not "-I 60". This option applies to job and
1314 step allocations.
1315
1316
1317 -i, --input=<mode>
              Specify how stdin is to be redirected. By default, srun redirects
              stdin from the terminal to all tasks. See IO Redirection below for
1320 more options. For OS X, the poll() function does not support
1321 stdin, so input from a terminal is not possible. This option ap‐
1322 plies to job and step allocations.
1323
1324
1325 -J, --job-name=<jobname>
1326 Specify a name for the job. The specified name will appear along
1327 with the job id number when querying running jobs on the system.
1328 The default is the supplied executable program's name. NOTE:
1329 This information may be written to the slurm_jobacct.log file.
              This file is space delimited, so if a space is used in the job
              name it will cause problems in properly displaying the con‐
1332 tents of the slurm_jobacct.log file when the sacct command is
1333 used. This option applies to job and step allocations.
1334
1335
1336 --jobid=<jobid>
              Initiate a job step under an already allocated job with job id
              <jobid>. Using this option will cause srun to behave exactly as if
1339 the SLURM_JOB_ID environment variable was set. This option ap‐
1340 plies to step allocations.
1341
1342
1343 -K, --kill-on-bad-exit[=0|1]
1344 Controls whether or not to terminate a step if any task exits
1345 with a non-zero exit code. If this option is not specified, the
1346 default action will be based upon the Slurm configuration param‐
1347 eter of KillOnBadExit. If this option is specified, it will take
1348 precedence over KillOnBadExit. An option argument of zero will
1349 not terminate the job. A non-zero argument or no argument will
1350 terminate the job. Note: This option takes precedence over the
1351 -W, --wait option to terminate the job immediately if a task ex‐
1352 its with a non-zero exit code. Since this option's argument is
1353 optional, for proper parsing the single letter option must be
1354 followed immediately with the value and not include a space be‐
1355 tween them. For example "-K1" and not "-K 1".
1356
1357
1358 -l, --label
1359 Prepend task number to lines of stdout/err. The --label option
1360 will prepend lines of output with the remote task id. This op‐
1361 tion applies to step allocations.
1362
1363
1364 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1365 Specification of licenses (or other resources available on all
1366 nodes of the cluster) which must be allocated to this job. Li‐
1367 cense names can be followed by a colon and count (the default
1368 count is one). Multiple license names should be comma separated
1369 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1370 cations.
1371
1372
1373 --mail-type=<type>
1374 Notify user by email when certain event types occur. Valid type
1375 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1376 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1377 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1378 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1379 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1380 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1381 time limit). Multiple type values may be specified in a comma
1382 separated list. The user to be notified is indicated with
1383 --mail-user. This option applies to job allocations.
1384
1385
1386 --mail-user=<user>
1387 User to receive email notification of state changes as defined
1388 by --mail-type. The default value is the submitting user. This
1389 option applies to job allocations.
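
              For example, to be notified by email when the job ends or fails
              (the address and program name are illustrative):

                   srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app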
1390
1391
1392 --mcs-label=<mcs>
1393 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the user's groups. The default value is calcu‐
              lated by the mcs plugin if it is enabled. This option applies
1396 to job allocations.
1397
1398
1399 --mem=<size>[units]
1400 Specify the real memory required per node. Default units are
1401 megabytes. Different units can be specified using the suffix
1402 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
              is MaxMemPerNode. If configured, both parameters can be seen
1404 using the scontrol show config command. This parameter would
1405 generally be used if whole nodes are allocated to jobs (Select‐
1406 Type=select/linear). Specifying a memory limit of zero for a
1407 job step will restrict the job step to the amount of memory al‐
1408 located to the job, but not remove any of the job's memory allo‐
1409 cation from being available to other job steps. Also see
1410 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1411 --mem-per-gpu options are mutually exclusive. If --mem,
1412 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1413 guments, then they will take precedence over the environment
1414 (potentially inherited from salloc or sbatch).
1415
1416 NOTE: A memory size specification of zero is treated as a spe‐
1417 cial case and grants the job access to all of the memory on each
1418 node for newly submitted jobs and all available job memory to
1419 new job steps.
1420
              Specifying new memory limits for job steps is only advisory.
1422
1423 If the job is allocated multiple nodes in a heterogeneous clus‐
1424 ter, the memory limit on each node will be that of the node in
1425 the allocation with the smallest memory size (same limit will
1426 apply to every node in the job's allocation).
1427
1428 NOTE: Enforcement of memory limits currently relies upon the
1429 task/cgroup plugin or enabling of accounting, which samples mem‐
1430 ory use on a periodic basis (data need not be stored, just col‐
1431 lected). In both cases memory use is based upon the job's Resi‐
1432 dent Set Size (RSS). A task may exceed the memory limit until
1433 the next periodic accounting sample.
1434
1435 This option applies to job and step allocations.
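
              For example, a job allocation requesting 16 gigabytes per node,
              and, inside an existing allocation, a step limited to the memory
              already allocated to the job (program names are illustrative):

                   srun -N2 --mem=16G ./my_app
                   srun --mem=0 ./post_process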
1436
1437
1438 --mem-bind=[{quiet|verbose},]<type>
1439 Bind tasks to memory. Used only when the task/affinity plugin is
1440 enabled and the NUMA memory functions are available. Note that
1441 the resolution of CPU and memory binding may differ on some ar‐
1442 chitectures. For example, CPU binding may be performed at the
1443 level of the cores within a processor while memory binding will
1444 be performed at the level of nodes, where the definition of
1445 "nodes" may differ from system to system. By default no memory
1446 binding is performed; any task using any CPU can use any memory.
1447 This option is typically used to ensure that each task is bound
1448 to the memory closest to its assigned CPU. The use of any type
1449 other than "none" or "local" is not recommended. If you want
1450 greater control, try running a simple test code with the options
1451 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1452 the specific configuration.
1453
1454 NOTE: To have Slurm always report on the selected memory binding
1455 for all commands executed in a shell, you can enable verbose
1456 mode by setting the SLURM_MEM_BIND environment variable value to
1457 "verbose".
1458
1459 The following informational environment variables are set when
1460 --mem-bind is in use:
1461
1462 SLURM_MEM_BIND_LIST
1463 SLURM_MEM_BIND_PREFER
1464 SLURM_MEM_BIND_SORT
1465 SLURM_MEM_BIND_TYPE
1466 SLURM_MEM_BIND_VERBOSE
1467
1468 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1469 scription of the individual SLURM_MEM_BIND* variables.
1470
1471 Supported options include:
1472
1473 help show this help message
1474
1475 local Use memory local to the processor in use
1476
1477 map_mem:<list>
1478 Bind by setting memory masks on tasks (or ranks) as spec‐
1479 ified where <list> is
1480 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1481 ping is specified for a node and identical mapping is ap‐
1482 plied to the tasks on every node (i.e. the lowest task ID
1483 on each node is mapped to the first ID specified in the
1484 list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x', in which case they are
                     interpreted as hexadecimal values. If the number of tasks
1487 (or ranks) exceeds the number of elements in this list,
1488 elements in the list will be reused as needed starting
1489 from the beginning of the list. To simplify support for
1490 large task counts, the lists may follow a map with an as‐
1491 terisk and repetition count. For example
1492 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1493 sults, all CPUs for each node in the job should be allo‐
1494 cated to the job.
1495
1496 mask_mem:<list>
1497 Bind by setting memory masks on tasks (or ranks) as spec‐
1498 ified where <list> is
1499 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1500 mapping is specified for a node and identical mapping is
1501 applied to the tasks on every node (i.e. the lowest task
1502 ID on each node is mapped to the first mask specified in
1503 the list, etc.). NUMA masks are always interpreted as
1504 hexadecimal values. Note that masks must be preceded
1505 with a '0x' if they don't begin with [0-9] so they are
1506 seen as numerical values. If the number of tasks (or
1507 ranks) exceeds the number of elements in this list, ele‐
1508 ments in the list will be reused as needed starting from
1509 the beginning of the list. To simplify support for large
1510 task counts, the lists may follow a mask with an asterisk
1511 and repetition count. For example "mask_mem:0*4,1*4".
1512 For predictable binding results, all CPUs for each node
1513 in the job should be allocated to the job.
1514
1515 no[ne] don't bind tasks to memory (default)
1516
1517 nosort avoid sorting free cache pages (default, LaunchParameters
1518 configuration parameter can override this default)
1519
1520 p[refer]
1521 Prefer use of first specified NUMA node, but permit
1522 use of other available NUMA nodes.
1523
1524 q[uiet]
1525 quietly bind before task runs (default)
1526
1527 rank bind by task rank (not recommended)
1528
1529 sort sort free cache pages (run zonesort on Intel KNL nodes)
1530
1531 v[erbose]
1532 verbosely report binding before task runs
1533
1534 This option applies to job and step allocations.
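
              For example, to bind each task to the memory local to its
              assigned CPUs and report the binding (the program name is
              illustrative):

                   srun -n 16 --mem-bind=verbose,local ./my_app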
1535
1536
1537 --mem-per-cpu=<size>[units]
1538 Minimum memory required per allocated CPU. Default units are
1539 megabytes. Different units can be specified using the suffix
1540 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1541 value is MaxMemPerCPU (see exception below). If configured, both
1542 parameters can be seen using the scontrol show config command.
1543 Note that if the job's --mem-per-cpu value exceeds the config‐
1544 ured MaxMemPerCPU, then the user's limit will be treated as a
1545 memory limit per task; --mem-per-cpu will be reduced to a value
1546 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1547 value of --cpus-per-task multiplied by the new --mem-per-cpu
1548 value will equal the original --mem-per-cpu value specified by
1549 the user. This parameter would generally be used if individual
1550 processors are allocated to jobs (SelectType=select/cons_res).
1551 If resources are allocated by core, socket, or whole nodes, then
1552 the number of CPUs allocated to a job may be higher than the
1553 task count and the value of --mem-per-cpu should be adjusted ac‐
1554 cordingly. Specifying a memory limit of zero for a job step
1555 will restrict the job step to the amount of memory allocated to
1556 the job, but not remove any of the job's memory allocation from
1557 being available to other job steps. Also see --mem and
1558 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu op‐
1559 tions are mutually exclusive.
1560
1561 NOTE: If the final amount of memory requested by a job can't be
1562 satisfied by any of the nodes configured in the partition, the
1563 job will be rejected. This could happen if --mem-per-cpu is
1564 used with the --exclusive option for a job allocation and
1565 --mem-per-cpu times the number of CPUs on a node is greater than
1566 the total memory of that node.
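
              For example, to request 2 gigabytes of memory for each of 8
              allocated CPUs, one CPU per task (the program name is
              illustrative):

                   srun -n 8 --mem-per-cpu=2G ./my_app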
1567
1568
1569 --mem-per-gpu=<size>[units]
1570 Minimum memory required per allocated GPU. Default units are
1571 megabytes. Different units can be specified using the suffix
1572 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1573 both a global and per partition basis. If configured, the pa‐
1574 rameters can be seen using the scontrol show config and scontrol
1575 show partition commands. Also see --mem. The --mem,
1576 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1577
1578
1579 --mincpus=<n>
1580 Specify a minimum number of logical cpus/processors per node.
1581 This option applies to job allocations.
1582
1583
1584 --mpi=<mpi_type>
1585 Identify the type of MPI to be used. May result in unique initi‐
1586 ation procedures.
1587
1588 list Lists available mpi types to choose from.
1589
1590 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1591 only if the MPI implementation supports it, in other
                     words, if the MPI has the PMI2 interface implemented.
                     Specifying --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1594 which provides the server side functionality but the
1595 client side must implement PMI2_Init() and the other in‐
1596 terface calls.
1597
1598 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1599 support in Slurm can be used to launch parallel applica‐
1600 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1601 must be configured with pmix support by passing
1602 "--with-pmix=<PMIx installation path>" option to its
1603 "./configure" script.
1604
1605 At the time of writing PMIx is supported in Open MPI
1606 starting from version 2.0. PMIx also supports backward
1607 compatibility with PMI1 and PMI2 and can be used if MPI
1608 was configured with PMI2/PMI1 support pointing to the
1609 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1610 doesn't provide the way to point to a specific implemen‐
1611 tation, a hack'ish solution leveraging LD_PRELOAD can be
1612 used to force "libpmix" usage.
1613
1614
1615 none No special MPI processing. This is the default and works
1616 with many other versions of MPI.
1617
1618 This option applies to step allocations.
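
              For example, to list the MPI plugins available on a system and
              then launch a PMIx-enabled MPI application (the program name is
              illustrative and assumes Slurm was built with PMIx support):

                   srun --mpi=list
                   srun --mpi=pmix -n 16 ./mpi_app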
1619
1620
1621 --msg-timeout=<seconds>
1622 Modify the job launch message timeout. The default value is
1623 MessageTimeout in the Slurm configuration file slurm.conf.
1624 Changes to this are typically not recommended, but could be use‐
1625 ful to diagnose problems. This option applies to job alloca‐
1626 tions.
1627
1628
1629 --multi-prog
1630 Run a job with different programs and different arguments for
1631 each task. In this case, the executable program specified is ac‐
1632 tually a configuration file specifying the executable and argu‐
1633 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1634 for details on the configuration file contents. This option ap‐
1635 plies to step allocations.
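
              A minimal sketch of such a configuration file (the file and
              program names are illustrative; see MULTIPLE PROGRAM
              CONFIGURATION below for the full syntax):

                   0      ./master
                   1-3    ./worker --rank=%t

              which could then be launched with:

                   srun -n 4 --multi-prog ./multi.conf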
1636
1637
1638 --network=<type>
1639 Specify information pertaining to the switch or network. The
1640 interpretation of type is system dependent. This option is sup‐
1641 ported when running Slurm on a Cray natively. It is used to re‐
1642 quest using Network Performance Counters. Only one value per
              request is valid. All options are case-insensitive. In this
1644 configuration supported values include:
1645
1646 system
1647 Use the system-wide network performance counters. Only
1648 nodes requested will be marked in use for the job alloca‐
1649 tion. If the job does not fill up the entire system the
1650 rest of the nodes are not able to be used by other jobs
1651 using NPC, if idle their state will appear as PerfCnts.
1652 These nodes are still available for other jobs not using
1653 NPC.
1654
1655 blade Use the blade network performance counters. Only nodes re‐
1656 quested will be marked in use for the job allocation. If
1657 the job does not fill up the entire blade(s) allocated to
1658 the job those blade(s) are not able to be used by other
1659 jobs using NPC, if idle their state will appear as PerfC‐
1660 nts. These nodes are still available for other jobs not
1661 using NPC.
1662
1663
1664 In all cases the job allocation request must specify the
1665 --exclusive option and the step cannot specify the --overlap op‐
1666 tion. Otherwise the request will be denied.
1667
1668 Also with any of these options steps are not allowed to share
1669 blades, so resources would remain idle inside an allocation if
1670 the step running on a blade does not take up all the nodes on
1671 the blade.
1672
1673 The network option is also supported on systems with IBM's Par‐
1674 allel Environment (PE). See IBM's LoadLeveler job command key‐
1675 word documentation about the keyword "network" for more informa‐
1676 tion. Multiple values may be specified in a comma separated
              list. All options are case-insensitive. Supported values in‐
1678 clude:
1679
1680 BULK_XFER[=<resources>]
1681 Enable bulk transfer of data using Remote Di‐
1682 rect-Memory Access (RDMA). The optional resources
1683 specification is a numeric value which can have a
1684 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1685 bytes, megabytes or gigabytes. NOTE: The resources
1686 specification is not supported by the underlying IBM
1687 infrastructure as of Parallel Environment version
1688 2.2 and no value should be specified at this time.
1689 The devices allocated to a job must all be of the
                          same type. The default value depends upon what
                          hardware is available; in order of preference it is
                          IPONLY (which is not considered in User Space mode),
                          HFI, IB, HPCE, and KMUX.
1694
1695 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1696 quired. Applies only to IBM Power7-IH processors.
1697 Default value is zero. Independent CAU will be al‐
1698 located for each programming interface (MPI, LAPI,
1699 etc.)
1700
1701 DEVNAME=<name>
1702 Specify the device name to use for communications
1703 (e.g. "eth0" or "mlx4_0").
1704
1705 DEVTYPE=<type>
1706 Specify the device type to use for communications.
1707 The supported values of type are: "IB" (InfiniBand),
1708 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1709 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1710 nel Emulation of HPCE). The devices allocated to a
                          job must all be of the same type. The default value
                          depends upon what hardware is available; in order of
                          preference it is IPONLY (which is not considered in
                          User Space mode), HFI, IB, HPCE, and KMUX.
1716
1717 IMMED =<count>
1718 Number of immediate send slots per window required.
1719 Applies only to IBM Power7-IH processors. Default
1720 value is zero.
1721
1722 INSTANCES =<count>
1723 Specify number of network connections for each task
1724 on each network connection. The default instance
1725 count is 1.
1726
1727 IPV4 Use Internet Protocol (IP) version 4 communications
1728 (default).
1729
1730 IPV6 Use Internet Protocol (IP) version 6 communications.
1731
1732 LAPI Use the LAPI programming interface.
1733
1734 MPI Use the MPI programming interface. MPI is the de‐
1735 fault interface.
1736
1737 PAMI Use the PAMI programming interface.
1738
1739 SHMEM Use the OpenSHMEM programming interface.
1740
1741 SN_ALL Use all available switch networks (default).
1742
1743 SN_SINGLE Use one available switch network.
1744
1745 UPC Use the UPC programming interface.
1746
1747 US Use User Space communications.
1748
1749
1750 Some examples of network specifications:
1751
1752 Instances=2,US,MPI,SN_ALL
1753 Create two user space connections for MPI communica‐
1754 tions on every switch network for each task.
1755
1756 US,MPI,Instances=3,Devtype=IB
1757 Create three user space connections for MPI communi‐
1758 cations on every InfiniBand network for each task.
1759
1760 IPV4,LAPI,SN_Single
                          Create an IP version 4 connection for LAPI communica‐
1762 tions on one switch network for each task.
1763
1764 Instances=2,US,LAPI,MPI
1765 Create two user space connections each for LAPI and
1766 MPI communications on every switch network for each
1767 task. Note that SN_ALL is the default option so ev‐
1768 ery switch network is used. Also note that In‐
1769 stances=2 specifies that two connections are estab‐
1770 lished for each protocol (LAPI and MPI) and each
1771 task. If there are two networks and four tasks on
1772 the node then a total of 32 connections are estab‐
1773 lished (2 instances x 2 protocols x 2 networks x 4
1774 tasks).
1775
1776 This option applies to job and step allocations.
1777
1778
1779 --nice[=adjustment]
1780 Run the job with an adjusted scheduling priority within Slurm.
1781 With no adjustment value the scheduling priority is decreased by
1782 100. A negative nice value increases the priority, otherwise de‐
1783 creases it. The adjustment range is +/- 2147483645. Only privi‐
1784 leged users can specify a negative adjustment.
1785
1786
1787 -Z, --no-allocate
1788 Run the specified tasks on a set of nodes without creating a
1789 Slurm "job" in the Slurm queue structure, bypassing the normal
1790 resource allocation step. The list of nodes must be specified
1791 with the -w, --nodelist option. This is a privileged option
1792 only available for the users "SlurmUser" and "root". This option
1793 applies to job allocations.
1794
1795
1796 -k, --no-kill[=off]
1797 Do not automatically terminate a job if one of the nodes it has
1798 been allocated fails. This option applies to job and step allo‐
1799 cations. The job will assume all responsibilities for
              fault-tolerance. Tasks launched using this option will not be
1801 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1802 --wait options will have no effect upon the job step). The ac‐
1803 tive job step (MPI job) will likely suffer a fatal error, but
1804 subsequent job steps may be run if this option is specified.
1805
              Specify an optional argument of "off" to disable the effect of the
1807 SLURM_NO_KILL environment variable.
1808
1809 The default action is to terminate the job upon node failure.
1810
1811
1812 -F, --nodefile=<node_file>
              Much like --nodelist, but the list is contained in a file of
              name node_file. The node names in the list may also span multi‐
1815 ple lines in the file. Duplicate node names in the file will
1816 be ignored. The order of the node names in the list is not im‐
1817 portant; the node names will be sorted by Slurm.
1818
1819
1820 -w, --nodelist={<node_name_list>|<filename>}
1821 Request a specific list of hosts. The job will contain all of
1822 these hosts and possibly additional hosts as needed to satisfy
1823 resource requirements. The list may be specified as a
1824 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1825 for example), or a filename. The host list will be assumed to
1826 be a filename if it contains a "/" character. If you specify a
1827 minimum node or processor count larger than can be satisfied by
1828 the supplied host list, additional resources will be allocated
1829 on other nodes as needed. Rather than repeating a host name
1830 multiple times, an asterisk and a repetition count may be ap‐
1831 pended to a host name. For example "host1,host1" and "host1*2"
1832 are equivalent. If the number of tasks is given and a list of
1833 requested nodes is also given, the number of nodes used from
1834 that list will be reduced to match that of the number of tasks
1835 if the number of nodes in the list is greater than the number of
1836 tasks. This option applies to job and step allocations.
1837
1838
1839 -N, --nodes=<minnodes>[-maxnodes]
1840 Request that a minimum of minnodes nodes be allocated to this
1841 job. A maximum node count may also be specified with maxnodes.
1842 If only one number is specified, this is used as both the mini‐
1843 mum and maximum node count. The partition's node limits super‐
1844 sede those of the job. If a job's node limits are outside of
1845 the range permitted for its associated partition, the job will
1846 be left in a PENDING state. This permits possible execution at
1847 a later time, when the partition limit is changed. If a job
1848 node limit exceeds the number of nodes configured in the parti‐
1849 tion, the job will be rejected. Note that the environment vari‐
1850 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1851 ibility) will be set to the count of nodes actually allocated to
1852 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1853 tion. If -N is not specified, the default behavior is to allo‐
1854 cate enough nodes to satisfy the requirements of the -n and -c
1855 options. The job will be allocated as many nodes as possible
1856 within the range specified and without delaying the initiation
1857 of the job. If the number of tasks is given and a number of re‐
1858 quested nodes is also given, the number of nodes used from that
1859 request will be reduced to match that of the number of tasks if
1860 the number of nodes in the request is greater than the number of
1861 tasks. The node count specification may include a numeric value
1862 followed by a suffix of "k" (multiplies numeric value by 1,024)
1863 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1864 plies to job and step allocations.
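
              For example, to request between two and four nodes for eight
              tasks (the program name is illustrative):

                   srun -N 2-4 -n 8 ./my_app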
1865
1866
1867 -n, --ntasks=<number>
1868 Specify the number of tasks to run. Request that srun allocate
1869 resources for ntasks tasks. The default is one task per node,
1870 but note that the --cpus-per-task option will change this de‐
1871 fault. This option applies to job and step allocations.
1872
1873
1874 --ntasks-per-core=<ntasks>
1875 Request the maximum ntasks be invoked on each core. This option
1876 applies to the job allocation, but not to step allocations.
1877 Meant to be used with the --ntasks option. Related to
1878 --ntasks-per-node except at the core level instead of the node
1879 level. Masks will automatically be generated to bind the tasks
1880 to specific cores unless --cpu-bind=none is specified. NOTE:
1881 This option is not supported when using SelectType=select/lin‐
1882 ear.
1883
1884
1885 --ntasks-per-gpu=<ntasks>
1886 Request that there are ntasks tasks invoked for every GPU. This
1887 option can work in two ways: 1) either specify --ntasks in addi‐
1888 tion, in which case a type-less GPU specification will be auto‐
1889 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1890 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1891 --ntasks, and the total task count will be automatically deter‐
1892 mined. The number of CPUs needed will be automatically in‐
1893 creased if necessary to allow for any calculated task count.
1894 This option will implicitly set --gpu-bind=single:<ntasks>, but
1895 that can be overridden with an explicit --gpu-bind specifica‐
1896 tion. This option is not compatible with a node range (i.e.
1897 -N<minnodes-maxnodes>). This option is not compatible with
1898 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1899 option is not supported unless SelectType=cons_tres is config‐
1900 ured (either directly or indirectly on Cray systems).
1901
1902
1903 --ntasks-per-node=<ntasks>
1904 Request that ntasks be invoked on each node. If used with the
1905 --ntasks option, the --ntasks option will take precedence and
1906 the --ntasks-per-node will be treated as a maximum count of
1907 tasks per node. Meant to be used with the --nodes option. This
1908 is related to --cpus-per-task=ncpus, but does not require knowl‐
1909 edge of the actual number of cpus on each node. In some cases,
1910 it is more convenient to be able to request that no more than a
1911 specific number of tasks be invoked on each node. Examples of
1912 this include submitting a hybrid MPI/OpenMP app where only one
1913 MPI "task/rank" should be assigned to each node while allowing
1914 the OpenMP portion to utilize all of the parallelism present in
1915 the node, or submitting a single setup/cleanup/monitoring job to
1916 each node of a pre-existing allocation as one step in a larger
1917 job script. This option applies to job allocations.
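
              For example, a hybrid MPI/OpenMP launch placing one task on each
              of four nodes, with the remaining CPUs given to that task's
              threads (the CPU count and program name are illustrative):

                   srun -N 4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app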
1918
1919
1920 --ntasks-per-socket=<ntasks>
1921 Request the maximum ntasks be invoked on each socket. This op‐
1922 tion applies to the job allocation, but not to step allocations.
1923 Meant to be used with the --ntasks option. Related to
1924 --ntasks-per-node except at the socket level instead of the node
1925 level. Masks will automatically be generated to bind the tasks
1926 to specific sockets unless --cpu-bind=none is specified. NOTE:
1927 This option is not supported when using SelectType=select/lin‐
1928 ear.
1929
1930
1931 --open-mode={append|truncate}
1932 Open the output and error files using append or truncate mode as
1933 specified. For heterogeneous job steps the default value is
1934 "append". Otherwise the default value is specified by the sys‐
1935 tem configuration parameter JobFileAppend. This option applies
1936 to job and step allocations.
1937
1938
1939 -o, --output=<filename_pattern>
1940 Specify the "filename pattern" for stdout redirection. By de‐
1941 fault in interactive mode, srun collects stdout from all tasks
1942 and sends this output via TCP/IP to the attached terminal. With
1943 --output stdout may be redirected to a file, to one file per
1944 task, or to /dev/null. See section IO Redirection below for the
1945 various forms of filename pattern. If the specified file al‐
1946 ready exists, it will be overwritten.
1947
1948 If --error is not also specified on the command line, both std‐
1949 out and stderr will directed to the file specified by --output.
1950 This option applies to job and step allocations.
1951
1952
1953 -O, --overcommit
1954 Overcommit resources. This option applies to job and step allo‐
1955 cations.
1956
1957 When applied to a job allocation (not including jobs requesting
1958 exclusive access to the nodes) the resources are allocated as if
1959 only one task per node is requested. This means that the re‐
1960 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1961 cated per node rather than being multiplied by the number of
1962 tasks. Options used to specify the number of tasks per node,
1963 socket, core, etc. are ignored.
1964
1965 When applied to job step allocations (the srun command when exe‐
1966 cuted within an existing job allocation), this option can be
1967 used to launch more than one task per CPU. Normally, srun will
1968 not allocate more than one process per CPU. By specifying
1969 --overcommit you are explicitly allowing more than one process
1970 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1971 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1972 in the file slurm.h and is not a variable, it is set at Slurm
1973 build time.
1974
1975
1976 --overlap
1977 Allow steps to overlap each other on the CPUs. By default steps
1978 do not share CPUs with other parallel steps.
1979
1980
1981 -s, --oversubscribe
1982 The job allocation can over-subscribe resources with other run‐
1983 ning jobs. The resources to be over-subscribed can be nodes,
1984 sockets, cores, and/or hyperthreads depending upon configura‐
1985 tion. The default over-subscribe behavior depends on system
1986 configuration and the partition's OverSubscribe option takes
1987 precedence over the job's option. This option may result in the
1988 allocation being granted sooner than if the --oversubscribe op‐
1989 tion was not set and allow higher system utilization, but appli‐
1990 cation performance will likely suffer due to competition for re‐
1991 sources. This option applies to step allocations.
1992
1993
1994 -p, --partition=<partition_names>
1995 Request a specific partition for the resource allocation. If
1996 not specified, the default behavior is to allow the slurm con‐
1997 troller to select the default partition as designated by the
1998 system administrator. If the job can use more than one parti‐
              tion, specify their names in a comma separated list and the one
2000 offering earliest initiation will be used with no regard given
2001 to the partition name ordering (although higher priority parti‐
2002 tions will be considered first). When the job is initiated, the
2003 name of the partition used will be placed first in the job
2004 record partition string. This option applies to job allocations.
2005
2006
2007 --power=<flags>
2008 Comma separated list of power management plugin options. Cur‐
2009 rently available flags include: level (all nodes allocated to
2010 the job should have identical power caps, may be disabled by the
2011 Slurm configuration option PowerParameters=job_no_level). This
2012 option applies to job allocations.
2013
2014
2015 -E, --preserve-env
2016 Pass the current values of environment variables
2017 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
2018 rather than computing them from command line parameters. This
2019 option applies to job allocations.
2020
2021
2022
2023 --priority=<value>
2024 Request a specific job priority. May be subject to configura‐
2025 tion specific constraints. value should either be a numeric
2026 value or "TOP" (for highest possible value). Only Slurm opera‐
2027 tors and administrators can set the priority of a job. This op‐
2028 tion applies to job allocations only.
2029
2030
2031 --profile={all|none|<type>[,<type>...]}
2032 Enables detailed data collection by the acct_gather_profile
2033 plugin. Detailed data are typically time-series that are stored
2034 in an HDF5 file for the job or an InfluxDB database depending on
2035 the configured plugin. This option applies to job and step al‐
2036 locations.
2037
2038
2039 All All data types are collected. (Cannot be combined with
2040 other values.)
2041
2042
2043 None No data types are collected. This is the default.
2044 (Cannot be combined with other values.)
2045
2046
2047 Valid type values are:
2048
2049
2050 Energy Energy data is collected.
2051
2052
2053 Task Task (I/O, Memory, ...) data is collected.
2054
2055
2056 Filesystem
2057 Filesystem data is collected.
2058
2059
2060 Network
2061 Network (InfiniBand) data is collected.
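
              For example, to collect task and energy time-series for a step
              (assuming the corresponding acct_gather plugins are configured;
              the program name is illustrative):

                   srun --profile=Task,Energy ./my_app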
2062
2063
2064 --prolog=<executable>
2065 srun will run executable just before launching the job step.
2066 The command line arguments for executable will be the command
2067 and arguments of the job step. If executable is "none", then no
2068 srun prolog will be run. This parameter overrides the SrunProlog
2069 parameter in slurm.conf. This parameter is completely indepen‐
2070 dent from the Prolog parameter in slurm.conf. This option ap‐
2071 plies to job allocations.
2072
2073
2074 --propagate[=rlimit[,rlimit...]]
2075 Allows users to specify which of the modifiable (soft) resource
2076 limits to propagate to the compute nodes and apply to their
2077 jobs. If no rlimit is specified, then all resource limits will
2078 be propagated. The following rlimit names are supported by
2079 Slurm (although some options may not be supported on some sys‐
2080 tems):
2081
2082 ALL All limits listed below (default)
2083
2084 NONE No limits listed below
2085
2086 AS The maximum address space (virtual memory) for a
2087 process.
2088
2089 CORE The maximum size of core file
2090
2091 CPU The maximum amount of CPU time
2092
2093 DATA The maximum size of a process's data segment
2094
2095 FSIZE The maximum size of files created. Note that if the
2096 user sets FSIZE to less than the current size of the
2097 slurmd.log, job launches will fail with a 'File size
2098 limit exceeded' error.
2099
2100 MEMLOCK The maximum size that may be locked into memory
2101
2102 NOFILE The maximum number of open files
2103
2104 NPROC The maximum number of processes available
2105
2106 RSS The maximum resident set size. Note that this only has
2107 effect with Linux kernels 2.4.30 or older or BSD.
2108
2109 STACK The maximum stack size
2110
2111 This option applies to job allocations.
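
              For example, to propagate only the locked-memory and open-file
              limits of the submitting shell (the program name is
              illustrative):

                   srun --propagate=MEMLOCK,NOFILE ./my_app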
2112
2113
2114 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2115 --unbuffered. Implicitly sets --error and --output to /dev/null
2116 for all tasks except task zero, which may cause those tasks to
2117 exit immediately (e.g. shells will typically exit immediately in
2118 that situation). This option applies to step allocations.
2119
2120
2121 -q, --qos=<qos>
2122 Request a quality of service for the job. QOS values can be de‐
2123 fined for each user/cluster/account association in the Slurm
2124 database. Users will be limited to their association's defined
2125 set of qos's when the Slurm configuration parameter, Account‐
2126 ingStorageEnforce, includes "qos" in its definition. This option
2127 applies to job allocations.
2128
2129
2130 -Q, --quiet
2131 Suppress informational messages from srun. Errors will still be
2132 displayed. This option applies to job and step allocations.
2133
2134
2135 --quit-on-interrupt
2136 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2137 disables the status feature normally available when srun re‐
2138 ceives a single Ctrl-C and causes srun to instead immediately
2139 terminate the running job. This option applies to step alloca‐
2140 tions.
2141
2142
2143 --reboot
2144 Force the allocated nodes to reboot before starting the job.
2145 This is only supported with some system configurations and will
2146 otherwise be silently ignored. Only root, SlurmUser or admins
2147 can reboot nodes. This option applies to job allocations.
2148
2149
2150 -r, --relative=<n>
2151 Run a job step relative to node n of the current allocation.
2152 This option may be used to spread several job steps out among
2153 the nodes of the current job. If -r is used, the current job
2154 step will begin at node n of the allocated nodelist, where the
2155 first node is considered node 0. The -r option is not permitted
2156 with -w or -x option and will result in a fatal error when not
2157 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2158 set). The default for n is 0. If the value of --nodes exceeds
2159 the number of nodes identified with the --relative option, a
2160 warning message will be printed and the --relative option will
2161 take precedence. This option applies to step allocations.
2162
2163
2164 --reservation=<reservation_names>
2165 Allocate resources for the job from the named reservation. If
2166 the job can use more than one reservation, specify their names
              in a comma separated list and the one offering the earliest ini‐
              tiation will be used. Each reservation will be considered in the order it was
2169 requested. All reservations will be listed in scontrol/squeue
2170 through the life of the job. In accounting the first reserva‐
2171 tion will be seen and after the job starts the reservation used
2172 will replace it.
2173
2174
2175 --resv-ports[=count]
              Reserve communication ports for this job. Users can specify the
              number of ports they want to reserve. The parameter Mpi‐
              Params=ports=12000-12999 must be specified in slurm.conf. If not
              specified and Slurm's OpenMPI plugin is used, then by default
              the number of reserved ports will equal the highest number of tasks on
              any node in the job step allocation. If the number of reserved
              ports is zero then no ports are reserved. Used for OpenMPI. This
2183 option applies to job and step allocations.
2184
2185
2186 --send-libs[=yes|no]
2187 If set to yes (or no argument), autodetect and broadcast the ex‐
2188 ecutable's shared object dependencies to allocated compute
2189 nodes. The files are placed in a directory alongside the exe‐
2190 cutable. The LD_LIBRARY_PATH is automatically updated to include
2191 this cache directory as well. This overrides the default behav‐
2192 ior configured in slurm.conf SbcastParameters send_libs. This
2193 option only works in conjunction with --bcast. See also
2194 --bcast-exclude.
2195
2196
2197 --signal=[R:]<sig_num>[@sig_time]
2198 When a job is within sig_time seconds of its end time, send it
2199 the signal sig_num. Due to the resolution of event handling by
2200 Slurm, the signal may be sent up to 60 seconds earlier than
2201 specified. sig_num may either be a signal number or name (e.g.
2202 "10" or "USR1"). sig_time must have an integer value between 0
2203 and 65535. By default, no signal is sent before the job's end
2204 time. If a sig_num is specified without any sig_time, the de‐
2205 fault time will be 60 seconds. This option applies to job allo‐
2206 cations. Use the "R:" option to allow this job to overlap with
2207 a reservation with MaxStartDelay set. To have the signal sent
2208 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2209 rameter.
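
              For example, to ask Slurm to send SIGUSR1 roughly five minutes
              before a one hour time limit expires (the program name is
              illustrative):

                   srun --signal=USR1@300 --time=60 ./my_app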
2210
2211
2212 --slurmd-debug=<level>
2213 Specify a debug level for slurmd(8). The level may be specified
              as either an integer value between 0 [quiet, only errors are dis‐
2215 played] and 4 [verbose operation] or the SlurmdDebug tags.
2216
2217 quiet Log nothing
2218
2219 fatal Log only fatal errors
2220
2221 error Log only errors
2222
2223 info Log errors and general informational messages
2224
2225 verbose Log errors and verbose informational messages
2226
2227
2228 The slurmd debug information is copied onto the stderr of
2229 the job. By default only errors are displayed. This option ap‐
2230 plies to job and step allocations.
2231
2232
2233 --sockets-per-node=<sockets>
2234 Restrict node selection to nodes with at least the specified
2235 number of sockets. See additional information under -B option
2236 above when task/affinity plugin is enabled. This option applies
2237 to job allocations.
2238 NOTE: This option may implicitly impact the number of tasks if
2239 -n was not specified.
2240
2241
2242 --spread-job
2243 Spread the job allocation over as many nodes as possible and at‐
2244 tempt to evenly distribute tasks across the allocated nodes.
2245 This option disables the topology/tree plugin. This option ap‐
2246 plies to job allocations.
2247
2248
2249 --switches=<count>[@max-time]
2250 When a tree topology is used, this defines the maximum count of
2251 leaf switches desired for the job allocation and optionally the
2252 maximum time to wait for that number of switches. If Slurm finds
2253 an allocation containing more switches than the count specified,
2254 the job remains pending until it either finds an allocation with
              desired switch count or the time limit expires. If there is no
2256 switch count limit, there is no delay in starting the job. Ac‐
2257 ceptable time formats include "minutes", "minutes:seconds",
2258 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2259 "days-hours:minutes:seconds". The job's maximum time delay may
2260 be limited by the system administrator using the SchedulerParam‐
2261 eters configuration parameter with the max_switch_wait parameter
2262 option. On a dragonfly network the only switch count supported
2263 is 1 since communication performance will be highest when a job
              is allocated resources on one leaf switch or more than 2 leaf
              switches. The default max-time is the max_switch_wait Sched‐
              ulerParameters value. This option applies to job allocations.
2267
2268
2269 --task-epilog=<executable>
2270 The slurmstepd daemon will run executable just after each task
2271 terminates. This will be executed before any TaskEpilog parame‐
2272 ter in slurm.conf is executed. This is meant to be a very
2273 short-lived program. If it fails to terminate within a few sec‐
2274 onds, it will be killed along with any descendant processes.
2275 This option applies to step allocations.
2276
2277
2278 --task-prolog=<executable>
2279 The slurmstepd daemon will run executable just before launching
2280 each task. This will be executed after any TaskProlog parameter
2281 in slurm.conf is executed. Besides the normal environment vari‐
2282 ables, this has SLURM_TASK_PID available to identify the process
2283 ID of the task being started. Standard output from this program
2284 of the form "export NAME=value" will be used to set environment
2285 variables for the task being spawned. This option applies to
2286 step allocations.
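
              A minimal sketch of a task prolog script (the path, file name
              and variable name are illustrative); any "export NAME=value"
              line printed on its standard output is applied to the task's
              environment:

                   #!/bin/sh
                   # Runs once per task, just before the task is launched.
                   echo "export MY_TASK_INFO=pid-$SLURM_TASK_PID"

              which could be used as:

                   srun --task-prolog=/path/to/task_prolog.sh -n 4 ./my_app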
2287
2288
2289 --test-only
2290 Returns an estimate of when a job would be scheduled to run
2291 given the current job queue and all the other srun arguments
2292 specifying the job. This limits srun's behavior to just return
2293 information; no job is actually submitted. The program will be
2294 executed directly by the slurmd daemon. This option applies to
2295 job allocations.
2296
2297
2298 --thread-spec=<num>
2299 Count of specialized threads per node reserved by the job for
2300 system operations and not used by the application. The applica‐
2301 tion will not use these threads, but will be charged for their
2302 allocation. This option can not be used with the --core-spec
2303 option. This option applies to job allocations.
2304
2305
2306 -T, --threads=<nthreads>
2307 Allows limiting the number of concurrent threads used to send
2308 the job request from the srun process to the slurmd processes on
2309 the allocated nodes. Default is to use one thread per allocated
2310 node up to a maximum of 60 concurrent threads. Specifying this
2311 option limits the number of concurrent threads to nthreads (less
2312 than or equal to 60). This should only be used to set a low
2313 thread count for testing on very small memory computers. This
2314 option applies to job allocations.
2315
2316
2317 --threads-per-core=<threads>
2318 Restrict node selection to nodes with at least the specified
2319 number of threads per core. In task layout, use the specified
2320 maximum number of threads per core. Implies --exact and
2321 --cpu-bind=threads unless overridden by command line or environ‐
2322 ment options. NOTE: "Threads" refers to the number of process‐
2323 ing units on each core rather than the number of application
2324 tasks to be launched per core. See additional information under
2325 -B option above when task/affinity plugin is enabled. This op‐
2326 tion applies to job and step allocations.
2327 NOTE: This option may implicitly impact the number of tasks if
2328 -n was not specified.
2329
2330
2331 -t, --time=<time>
2332 Set a limit on the total run time of the job allocation. If the
2333 requested time limit exceeds the partition's time limit, the job
2334 will be left in a PENDING state (possibly indefinitely). The
2335 default time limit is the partition's default time limit. When
2336 the time limit is reached, each task in each job step is sent
2337 SIGTERM followed by SIGKILL. The interval between signals is
2338 specified by the Slurm configuration parameter KillWait. The
2339 OverTimeLimit configuration parameter may permit the job to run
2340 longer than scheduled. Time resolution is one minute and second
2341 values are rounded up to the next minute.
2342
2343 A time limit of zero requests that no time limit be imposed.
2344 Acceptable time formats include "minutes", "minutes:seconds",
2345 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2346 "days-hours:minutes:seconds". This option applies to job and
2347 step allocations.
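
              For example, two equivalent ways to request a 90 minute limit,
              followed by a request in day-hour form (the program name is
              illustrative):

                   srun --time=90 ./my_app
                   srun --time=1:30:00 ./my_app
                   srun --time=1-12:00:00 ./my_app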
2348
2349
2350 --time-min=<time>
2351 Set a minimum time limit on the job allocation. If specified,
2352 the job may have its --time limit lowered to a value no lower
2353 than --time-min if doing so permits the job to begin execution
2354 earlier than otherwise possible. The job's time limit will not
2355 be changed after the job is allocated resources. This is per‐
2356 formed by a backfill scheduling algorithm to allocate resources
2357 otherwise reserved for higher priority jobs. Acceptable time
2358 formats include "minutes", "minutes:seconds", "hours:min‐
2359 utes:seconds", "days-hours", "days-hours:minutes" and
2360 "days-hours:minutes:seconds". This option applies to job alloca‐
2361 tions.
2362
2363
2364 --tmp=<size>[units]
2365 Specify a minimum amount of temporary disk space per node. De‐
2366 fault units are megabytes. Different units can be specified us‐
2367 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2368 tions.
2369
2370
2371 --uid=<user>
2372 Attempt to submit and/or run a job as user instead of the invok‐
2373 ing user id. The invoking user's credentials will be used to
2374 check access permissions for the target partition. User root may
2375 use this option to run jobs as a normal user in a RootOnly par‐
2376 tition for example. If run as root, srun will drop its permis‐
2377 sions to the uid specified after node allocation is successful.
2378 user may be the user name or numerical user ID. This option ap‐
2379 plies to job and step allocations.
2380
2381
2382 -u, --unbuffered
2383 By default, the connection between slurmstepd and the
2384 user-launched application is over a pipe. The stdio output writ‐
              ten by the application is buffered by glibc until it is
2386 flushed or the output is set as unbuffered. See setbuf(3). If
2387 this option is specified the tasks are executed with a pseudo
2388 terminal so that the application output is unbuffered. This op‐
2389 tion applies to step allocations.
2390
2391 --usage
2392 Display brief help message and exit.
2393
2394
2395 --use-min-nodes
2396 If a range of node counts is given, prefer the smaller count.
2397
2398
2399 -v, --verbose
2400 Increase the verbosity of srun's informational messages. Multi‐
2401 ple -v's will further increase srun's verbosity. By default
2402 only errors will be displayed. This option applies to job and
2403 step allocations.
2404
2405
2406 -V, --version
2407 Display version information and exit.
2408
2409
2410 -W, --wait=<seconds>
2411 Specify how long to wait after the first task terminates before
2412 terminating all remaining tasks. A value of 0 indicates an un‐
2413 limited wait (a warning will be issued after 60 seconds). The
2414 default value is set by the WaitTime parameter in the slurm con‐
2415 figuration file (see slurm.conf(5)). This option can be useful
2416 to ensure that a job is terminated in a timely fashion in the
2417 event that one or more tasks terminate prematurely. Note: The
2418 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2419 to terminate the job immediately if a task exits with a non-zero
2420 exit code. This option applies to job allocations.
2421
2422
2423 --wckey=<wckey>
2424 Specify wckey to be used with job. If TrackWCKey=no (default)
2425 in the slurm.conf this value is ignored. This option applies to
2426 job allocations.
2427
2428
2429 --x11[={all|first|last}]
2430 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2431 the allocation. This option is only enabled if Slurm was com‐
2432 piled with X11 support and PrologFlags=x11 is defined in the
2433 slurm.conf. Default is "all".
2434
2435
2436 srun will submit the job request to the slurm job controller, then ini‐
2437 tiate all processes on the remote nodes. If the request cannot be met
2438 immediately, srun will block until the resources are free to run the
2439 job. If the -I (--immediate) option is specified srun will terminate if
2440 resources are not immediately available.
2441
2442 When initiating remote processes srun will propagate the current work‐
2443 ing directory, unless --chdir=<path> is specified, in which case path
2444 will become the working directory for the remote processes.
2445
2446 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2447 cated to the job. When specifying only the number of processes to run
2448 with -n, a default of one CPU per process is allocated. By specifying
2449 the number of CPUs required per task (-c), more than one CPU may be al‐
2450 located per process. If the number of nodes is specified with -N, srun
2451 will attempt to allocate at least the number of nodes specified.
2452
2453 Combinations of the above three options may be used to change how pro‐
2454 cesses are distributed across nodes and cpus. For instance, by specify‐
2455 ing both the number of processes and number of nodes on which to run,
2456 the number of processes per node is implied. However, if the number of
       CPUs per process is more important, then the number of processes (-n)
       and the number of CPUs per process (-c) should be specified.
2459
2460 srun will refuse to allocate more than one process per CPU unless
2461 --overcommit (-O) is also specified.
2462
2463 srun will attempt to meet the above specifications "at a minimum." That
2464 is, if 16 nodes are requested for 32 processes, and some nodes do not
2465 have 2 CPUs, the allocation of nodes will be increased in order to meet
2466 the demand for CPUs. In other words, a minimum of 16 nodes are being
2467 requested. However, if 16 nodes are requested for 15 processes, srun
2468 will consider this an error, as 15 processes cannot run across 16
2469 nodes.
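
       For example, a sketch requesting 8 processes with 4 CPUs each across 2
       nodes, which implies 4 processes per node (the program name is
       illustrative; each node must provide at least 16 CPUs for exactly two
       nodes to satisfy the request):

            srun -N 2 -n 8 -c 4 ./my_app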
2470
2471
2472 IO Redirection
2473
2474 By default, stdout and stderr will be redirected from all tasks to the
2475 stdout and stderr of srun, and stdin will be redirected from the stan‐
2476 dard input of srun to all remote tasks. If stdin is only to be read by
2477 a subset of the spawned tasks, specifying a file to read from rather
2478 than forwarding stdin from the srun command may be preferable as it
2479 avoids moving and storing data that will never be read.
2480
2481 For OS X, the poll() function does not support stdin, so input from a
2482 terminal is not possible.
2483
2484 This behavior may be changed with the --output, --error, and --input
2485 (-o, -e, -i) options. Valid format specifications for these options are
2486
2487 all stdout stderr is redirected from all tasks to srun. stdin is
2488 broadcast to all remote tasks. (This is the default behav‐
2489 ior)
2490
2491 none stdout and stderr is not received from any task. stdin is
2492 not sent to any task (stdin is closed).
2493
2494 taskid stdout and/or stderr are redirected from only the task with
                  relative id equal to taskid, where 0 <= taskid < ntasks,
2496 where ntasks is the total number of tasks in the current job
2497 step. stdin is redirected from the stdin of srun to this
2498 same task. This file will be written on the node executing
2499 the task.
2500
2501 filename srun will redirect stdout and/or stderr to the named file
2502 from all tasks. stdin will be redirected from the named file
2503 and broadcast to all tasks in the job. filename refers to a
2504 path on the host that runs srun. Depending on the cluster's
2505 file system layout, this may result in the output appearing
2506 in different places depending on whether the job is run in
2507 batch mode.
2508
2509 filename pattern
2510 srun allows for a filename pattern to be used to generate the
2511 named IO file described above. The following list of format
2512 specifiers may be used in the format string to generate a
2513 filename that will be unique to a given jobid, stepid, node,
2514 or task. In each case, the appropriate number of files are
2515 opened and associated with the corresponding tasks. Note that
2516 any format string containing %t, %n, and/or %N will be writ‐
2517 ten on the node executing the task rather than the node where
              srun executes. These format specifiers are not supported on a
              BGQ system.
2520
2521 \\ Do not process any of the replacement symbols.
2522
2523 %% The character "%".
2524
2525 %A Job array's master job allocation number.
2526
2527 %a Job array ID (index) number.
2528
2529 %J jobid.stepid of the running job. (e.g. "128.0")
2530
2531 %j jobid of the running job.
2532
2533 %s stepid of the running job.
2534
2535 %N short hostname. This will create a separate IO file
2536 per node.
2537
2538 %n Node identifier relative to current job (e.g. "0" is
2539 the first node of the running job) This will create a
2540 separate IO file per node.
2541
2542 %t task identifier (rank) relative to current job. This
2543 will create a separate IO file per task.
2544
2545 %u User name.
2546
2547 %x Job name.
2548
2549 A number placed between the percent character and format
2550 specifier may be used to zero-pad the result in the IO file‐
2551 name. This number is ignored if the format specifier corre‐
2552 sponds to non-numeric data (%N for example).
2553
2554 Some examples of how the format string may be used for a 4
2555 task job step with a Job ID of 128 and step id of 0 are in‐
2556 cluded below:
2557
2558 job%J.out job128.0.out
2559
2560 job%4j.out job0128.out
2561
2562 job%j-%2t.out job128-00.out, job128-01.out, ...
2563
2564PERFORMANCE
2565 Executing srun sends a remote procedure call to slurmctld. If enough
2566 calls from srun or other Slurm client commands that send remote proce‐
2567 dure calls to the slurmctld daemon come in at once, it can result in a
2568 degradation of performance of the slurmctld daemon, possibly resulting
2569 in a denial of service.
2570
2571 Do not run srun or other Slurm client commands that send remote proce‐
2572 dure calls to slurmctld from loops in shell scripts or other programs.
2573 Ensure that programs limit calls to srun to the minimum necessary for
2574 the information you are trying to gather.
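
     For example, a shell loop such as the first command below issues at
     least one remote procedure call per iteration and should be avoided; a
     single launch (the program name work is a placeholder) is preferable,
     using --multi-prog if each task needs different arguments:

        # Avoid: one call to slurmctld per iteration
        $ for i in $(seq 1 100); do srun -n1 ./work $i; done

        # Prefer: one call that launches all tasks at once
        $ srun -n100 ./work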
2575
2576
2577INPUT ENVIRONMENT VARIABLES
2578 Upon startup, srun will read and handle the options set in the follow‐
2579 ing environment variables. The majority of these variables are set the
2580 same way the options are set, as defined above. For flag options that
2581 are defined to expect no argument, the option can be enabled by setting
2582 the environment variable without a value (empty or NULL string), the
2583 string 'yes', or a non-zero number. Any other value for the environment
2584 variable will result in the option not being set. There are a couple of
2585 exceptions to these rules, noted below.
2586 NOTE: Command line options always override environment variable set‐
2587 tings.
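
     For example (hostname is used only as a trivial test program), an envi‐
     ronment variable behaves like the corresponding option, and the command
     line takes precedence:

        $ SLURM_NTASKS=4 srun hostname       # behaves like "srun -n4 hostname"
        $ SLURM_NTASKS=4 srun -n2 hostname   # -n2 overrides SLURM_NTASKS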
2588
2589
2590 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2591 MVAPICH2) and controls the fanout of data commu‐
2592 nications. The srun command sends messages to ap‐
2593 plication programs (via the PMI library) and
2594 those applications may be called upon to forward
2595 that data to up to this number of additional
2596 tasks. Higher values offload work from the srun
2597 command to the applications and likely increase
2598 the vulnerability to failures. The default value
2599 is 32.
2600
2601 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2602 MVAPICH2) and controls the fanout of data commu‐
2603 nications. The srun command sends messages to
2604 application programs (via the PMI library) and
2605 those applications may be called upon to forward
2606 that data to additional tasks. By default, srun
2607 sends one message per host and one task on that
2608 host forwards the data to other tasks on that
2609 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2610 defined, the user task may be required to forward
2611 the data to tasks on other hosts. Setting
2612 PMI_FANOUT_OFF_HOST may increase performance.
2613 Since more work is performed by the PMI library
2614 loaded by the user application, failures also can
2615 be more common and more difficult to diagnose.
2616 Should be disabled/enabled by setting to 0 or 1.
2617
2618 PMI_TIME This is used exclusively with PMI (MPICH2 and
2619 MVAPICH2) and controls how much the communica‐
2620 tions from the tasks to the srun are spread out
2621 in time in order to avoid overwhelming the srun
2622 command with work. The default value is 500 (mi‐
2623 croseconds) per task. On relatively slow proces‐
2624 sors or systems with very large processor counts
2625 (and large PMI data sets), higher values may be
2626 required.
2627
2628 SLURM_ACCOUNT Same as -A, --account
2629
2630 SLURM_ACCTG_FREQ Same as --acctg-freq
2631
2632 SLURM_BCAST Same as --bcast
2633
2634 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2635
2636 SLURM_BURST_BUFFER Same as --bb
2637
2638 SLURM_CLUSTERS Same as -M, --clusters
2639
2640 SLURM_COMPRESS Same as --compress
2641
2642 SLURM_CONF The location of the Slurm configuration file.
2643
2644 SLURM_CONSTRAINT Same as -C, --constraint
2645
2646 SLURM_CORE_SPEC Same as --core-spec
2647
2648 SLURM_CPU_BIND Same as --cpu-bind
2649
2650 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2651
2652 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2653
2654 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2655
2656 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2657 disable or enable the option.
2658
2659 SLURM_DELAY_BOOT Same as --delay-boot
2660
2661 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2662
2663 SLURM_DISABLE_STATUS Same as -X, --disable-status
2664
2665 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2666 tion=plane, without =<size>, is set.
2667
2668 SLURM_DISTRIBUTION Same as -m, --distribution
2669
2670 SLURM_EPILOG Same as --epilog
2671
2672 SLURM_EXACT Same as --exact
2673
2674 SLURM_EXCLUSIVE Same as --exclusive
2675
2676 SLURM_EXIT_ERROR      Specifies the exit code generated when a Slurm
2677                       error occurs (e.g. invalid options). This can be
2678                       used by a script to distinguish application exit
2679                       codes from various Slurm error conditions (see the
2680                       sketch after this list). Also see SLURM_EXIT_IMMEDIATE.
2681
2682 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2683 mediate option is used and resources are not cur‐
2684 rently available. This can be used by a script
2685 to distinguish application exit codes from vari‐
2686 ous Slurm error conditions. Also see
2687 SLURM_EXIT_ERROR.
2688
2689 SLURM_EXPORT_ENV Same as --export
2690
2691 SLURM_GPU_BIND Same as --gpu-bind
2692
2693 SLURM_GPU_FREQ Same as --gpu-freq
2694
2695 SLURM_GPUS Same as -G, --gpus
2696
2697 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2698
2699 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2700
2701 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2702
2703 SLURM_GRES_FLAGS Same as --gres-flags
2704
2705 SLURM_HINT Same as --hint
2706
2707 SLURM_IMMEDIATE Same as -I, --immediate
2708
2709 SLURM_JOB_ID Same as --jobid
2710
2711 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2712 allocation, in which case it is ignored to avoid
2713 using the batch job's name as the name of each
2714 job step.
2715
2716 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2717 the job’s resource allocation.
2718
2719 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2720 or 1 to disable or enable the option.
2721
2722 SLURM_LABELIO Same as -l, --label
2723
2724 SLURM_MEM_BIND Same as --mem-bind
2725
2726 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2727
2728 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2729
2730 SLURM_MEM_PER_NODE Same as --mem
2731
2732 SLURM_MPI_TYPE Same as --mpi
2733
2734 SLURM_NETWORK Same as --network
2735
2736 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2737 job’s resource allocation. See
2738 SLURM_JOB_NUM_NODES. Included for backwards com‐
2739 patibility.
2740
2741 SLURM_NO_KILL Same as -k, --no-kill
2742
2743 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2744 for backwards compatibility.
2745
2746 SLURM_NTASKS Same as -n, --ntasks
2747
2748 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2749
2750 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2751
2752 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2753
2754 SLURM_NTASKS_PER_SOCKET
2755 Same as --ntasks-per-socket
2756
2757 SLURM_OPEN_MODE Same as --open-mode
2758
2759 SLURM_OVERCOMMIT Same as -O, --overcommit
2760
2761 SLURM_OVERLAP Same as --overlap
2762
2763 SLURM_PARTITION Same as -p, --partition
2764
2765 SLURM_PMI_KVS_NO_DUP_KEYS
2766 If set, then PMI key-pairs will contain no dupli‐
2767 cate keys. MPI can use this variable to inform
2768 the PMI library that it will not use duplicate
2769 keys so PMI can skip the check for duplicate
2770                       keys. This is the case for MPICH2 and reduces
2771                       the overhead of testing for duplicates, improving
2772                       performance.
2773
2774 SLURM_POWER Same as --power
2775
2776 SLURM_PROFILE Same as --profile
2777
2778 SLURM_PROLOG Same as --prolog
2779
2780 SLURM_QOS Same as --qos
2781
2782 SLURM_REMOTE_CWD Same as -D, --chdir=
2783
2784 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2785 maximum count of switches desired for the job al‐
2786 location and optionally the maximum time to wait
2787 for that number of switches. See --switches
2788
2789 SLURM_RESERVATION Same as --reservation
2790
2791 SLURM_RESV_PORTS Same as --resv-ports
2792
2793 SLURM_SEND_LIBS Same as --send-libs
2794
2795 SLURM_SIGNAL Same as --signal
2796
2797 SLURM_SPREAD_JOB Same as --spread-job
2798
2799 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2800                       If set and non-zero, successive task exit mes‐
2801 sages with the same exit code will be printed
2802 only once.
2803
2804 SLURM_STDERRMODE Same as -e, --error
2805
2806 SLURM_STDINMODE Same as -i, --input
2807
2808 SLURM_STDOUTMODE Same as -o, --output
2809
2810 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2811 job allocations). Also see SLURM_GRES
2812
2813 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2814 If set, only the specified node will log when the
2815                       job or step is killed by a signal.
2816
2817 SLURM_TASK_EPILOG Same as --task-epilog
2818
2819 SLURM_TASK_PROLOG Same as --task-prolog
2820
2821 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2822 ecutable program along with user execute permis‐
2823 sion on the node where srun was called before at‐
2824 tempting to launch it on nodes in the step.
2825
2826 SLURM_THREAD_SPEC Same as --thread-spec
2827
2828 SLURM_THREADS Same as -T, --threads
2829
2830 SLURM_THREADS_PER_CORE
2831 Same as --threads-per-core
2832
2833 SLURM_TIMELIMIT Same as -t, --time
2834
2835 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2836
2837 SLURM_USE_MIN_NODES Same as --use-min-nodes
2838
2839 SLURM_WAIT Same as -W, --wait
2840
2841 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2842 --switches
2843
2844 SLURM_WCKEY           Same as --wckey
2845
2846 SLURM_WORKING_DIR     Same as -D, --chdir
2847
2848 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2849 to disable or enable the option.
2850
2851 SRUN_CONTAINER Same as --container.
2852
2853 SRUN_EXPORT_ENV Same as --export, and will override any setting
2854 for SLURM_EXPORT_ENV.
2855
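     The following sketch (referenced from SLURM_EXIT_ERROR above) shows one
     way a script might separate Slurm failures from application exit codes.
     The exit code values and the program name app are arbitrary choices for
     this illustration:

        #!/bin/sh
        # Choose codes the application itself is known not to return.
        export SLURM_EXIT_ERROR=213
        export SLURM_EXIT_IMMEDIATE=214

        srun --immediate -n1 ./app
        rc=$?
        if [ "$rc" -eq 213 ]; then
            echo "srun reported a Slurm error" >&2
        elif [ "$rc" -eq 214 ]; then
            echo "resources were not immediately available" >&2
        else
            echo "application exited with code $rc"
        fi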
2856
2857
2858OUTPUT ENVIRONMENT VARIABLES
2859 srun will set some environment variables in the environment of the exe‐
2860 cuting tasks on the remote compute nodes. A short usage sketch follows
2861 the list. These environment variables are:
2862
2863
2864 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2865 ment variables are set separately for each compo‐
2866 nent.
2867
2868 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2869 ing.
2870
2871 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2872 IDs or masks for this node, CPU_ID = Board_ID x
2873 threads_per_board + Socket_ID x
2874 threads_per_socket + Core_ID x threads_per_core +
2875 Thread_ID).
2876
2877 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2878
2879 SLURM_CPU_BIND_VERBOSE
2880 --cpu-bind verbosity (quiet,verbose).
2881
2882 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2883 the srun command as a numerical frequency in
2884 kilohertz, or a coded value for a request of low,
2885                       medium, highm1 or high for the frequency. See the
2886 description of the --cpu-freq option or the
2887 SLURM_CPU_FREQ_REQ input environment variable.
2888
2889 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2890 node. NOTE: The select/linear plugin allocates
2891 entire nodes to jobs, so the value indicates the
2892 total count of CPUs on the node. For the se‐
2893                       lect/cons_res and select/cons_tres plugins, this number
2894 indicates the number of CPUs on this node allo‐
2895 cated to the step.
2896
2897 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2898 the --cpus-per-task option is specified.
2899
2900 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2901 distribution with -m, --distribution.
2902
2903 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2904 node.
2905
2906 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2907 gin and comma separated. It is read internally
2908 by pmi if Slurm was built with pmi support. Leav‐
2909 ing the variable set may cause problems when us‐
2910 ing external packages from within the job (Abaqus
2911 and Ansys have been known to have problems when
2912 it is set - consult the appropriate documentation
2913 for 3rd party software).
2914
2915 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2916
2917 SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2918
2919 SLURM_JOB_CPUS_PER_NODE
2920 Count of CPUs available to the job on the nodes
2921 in the allocation, using the format
2922 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2923 ber_of_nodes)] ...]. For example:
2924 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2925 that on the first and second nodes (as listed by
2926 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2927 while the third node has 36 CPUs. NOTE: The se‐
2928 lect/linear plugin allocates entire nodes to
2929 jobs, so the value indicates the total count of
2930 CPUs on allocated nodes. The select/cons_res and
2931 select/cons_tres plugins allocate individual CPUs
2932 to jobs, so this number indicates the number of
2933 CPUs allocated to the job.
2934
2935 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2936
2937 SLURM_JOB_ID Job id of the executing job.
2938
2939 SLURM_JOB_NAME Set to the value of the --job-name option or the
2940 command name when srun is used to create a new
2941 job allocation. Not set when srun is used only to
2942 create a job step (i.e. within an existing job
2943 allocation).
2944
2945 SLURM_JOB_NODELIST List of nodes allocated to the job.
2946
2947 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2948 cation.
2949
2950 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2951 ning.
2952
2953 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2954
2955 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2956 tion, if any.
2957
2958 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2959 Included for backwards compatibility.
2960
2961 SLURM_LAUNCH_NODE_IPADDR
2962 IP address of the node from which the task launch
2963 was initiated (where the srun command ran from).
2964
2965 SLURM_LOCALID Node local task ID for the process within a job.
2966
2967 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2968 masks for this node>).
2969
2970 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2971
2972 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2973 nodes).
2974
2975 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2976
2977 SLURM_MEM_BIND_VERBOSE
2978 --mem-bind verbosity (quiet,verbose).
2979
2980 SLURM_NODE_ALIASES Sets of node name, communication address and
2981 hostname for nodes allocated to the job from the
2982                       cloud. Each element in the set is colon separated
2983 and each set is comma separated. For example:
2984 SLURM_NODE_ALIASES=
2985 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2986
2987 SLURM_NODEID The relative node ID of the current node.
2988
2989 SLURM_NPROCS Total number of processes in the current job or
2990 job step. See SLURM_NTASKS. Included for back‐
2991 wards compatibility.
2992
2993 SLURM_NTASKS Total number of processes in the current job or
2994 job step.
2995
2996 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2997
2998 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2999 of job submission. This value is propagated to
3000 the spawned processes.
3001
3002 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
3003 rent process.
3004
3005 SLURM_SRUN_COMM_HOST IP address of srun communication host.
3006
3007 SLURM_SRUN_COMM_PORT srun communication port.
3008
3009 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
3010 specified.
3011
3012 SLURM_STEP_ID The step ID of the current job.
3013
3014 SLURM_STEP_LAUNCHER_PORT
3015 Step launcher port.
3016
3017 SLURM_STEP_NODELIST List of nodes allocated to the step.
3018
3019 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
3020
3021 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
3022 erogeneous job step.
3023
3024 SLURM_STEP_TASKS_PER_NODE
3025 Number of processes per node within the step.
3026
3027 SLURM_STEPID The step ID of the current job. See
3028 SLURM_STEP_ID. Included for backwards compatibil‐
3029 ity.
3030
3031 SLURM_SUBMIT_DIR The directory from which the allocation was in‐
3032                       voked.
3033
3034 SLURM_SUBMIT_HOST The hostname of the computer from which the allo‐
3035                       cation was invoked.
3036
3037 SLURM_TASK_PID The process ID of the task being started.
3038
3039 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
3040 Values are comma separated and in the same order
3041 as SLURM_JOB_NODELIST. If two or more consecu‐
3042 tive nodes are to have the same task count, that
3043 count is followed by "(x#)" where "#" is the rep‐
3044 etition count. For example,
3045 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
3046 first three nodes will each execute two tasks and
3047 the fourth node will execute one task.
3048
3049
3050 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
3051 ogy/tree plugin configured. The value will be
3052                       set to the names of the network switches which may be
3053 involved in the job's communications from the
3054 system's top level switch down to the leaf switch
3055 and ending with node name. A period is used to
3056 separate each hardware component name.
3057
3058 SLURM_TOPOLOGY_ADDR_PATTERN
3059 This is set only if the system has the topol‐
3060 ogy/tree plugin configured. The value will be
3061                       set to the component types listed in SLURM_TOPOL‐
3062 OGY_ADDR. Each component will be identified as
3063 either "switch" or "node". A period is used to
3064 separate each hardware component type.
3065
3066 SLURM_UMASK The umask in effect when the job was submitted.
3067
3068 SLURMD_NODENAME Name of the node running the task. In the case of
3069 a parallel job executing on multiple compute
3070 nodes, the various tasks will have this environ‐
3071 ment variable set to different values on each
3072 compute node.
3073
3074 SRUN_DEBUG Set to the logging level of the srun command.
3075 Default value is 3 (info level). The value is
3076 incremented or decremented based upon the --ver‐
3077 bose and --quiet options.
3078
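     As a short usage sketch (node names and output ordering are illustra‐
     tive), each task can inspect its own rank and location through these
     variables:

        $ srun -N2 -n4 -l sh -c 'echo rank=$SLURM_PROCID node=$SLURMD_NODENAME'
        0: rank=0 node=dev0
        1: rank=1 node=dev0
        2: rank=2 node=dev1
        3: rank=3 node=dev1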
3079
3080SIGNALS AND ESCAPE SEQUENCES
3081 Signals sent to the srun command are automatically forwarded to the
3082 tasks it is controlling with a few exceptions. The escape sequence
3083 <control-c> will report the state of all tasks associated with the srun
3084 command. If <control-c> is entered twice within one second, then the
3085 associated SIGINT signal will be sent to all tasks and a termination
3086 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
3087 spawned tasks. If a third <control-c> is received, the srun program
3088 will be terminated without waiting for remote tasks to exit or their
3089 I/O to complete.
3090
3091 The escape sequence <control-z> is presently ignored.
3092
3093
3094MPI SUPPORT
3095 MPI use depends upon the type of MPI being used. There are three fun‐
3096 damentally different modes of operation used by these various MPI im‐
3097 plementations.
3098
3099 1. Slurm directly launches the tasks and performs initialization of
3100 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3101 a.out".
3102
3103 2. Slurm creates a resource allocation for the job and then mpirun
3104 launches tasks using Slurm's infrastructure (OpenMPI).
3105
3106 3. Slurm creates a resource allocation for the job and then mpirun
3107 launches tasks using some mechanism other than Slurm, such as SSH or
3108 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3109 trol. Slurm's epilog should be configured to purge these tasks when the
3110 job's allocation is relinquished, or the use of pam_slurm_adopt is
3111 highly recommended.
3112
3113 See https://slurm.schedmd.com/mpi_guide.html for more information on
3114 use of these various MPI implementations with Slurm.
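
     As a sketch of the first two modes (assuming an MPI program a.out, an
     MPI library built with Slurm support, and a Slurm build that includes
     the PMIx plugin):

        # Mode 1: srun launches the MPI tasks directly
        $ srun --mpi=pmix -n16 ./a.out

        # Mode 2: Slurm creates the allocation and mpirun launches the tasks
        $ salloc -N2 -n16 mpirun ./a.out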
3115
3116
3117MULTIPLE PROGRAM CONFIGURATION
3118 Comments in the configuration file must have a "#" in column one. The
3119 configuration file contains the following fields separated by white
3120 space:
3121
3122 Task rank
3123 One or more task ranks to use this configuration. Multiple val‐
3124 ues may be comma separated. Ranges may be indicated with two
3125 numbers separated with a '-' with the smaller number first (e.g.
3126 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3127 ified, specify a rank of '*' as the last line of the file. If
3128 an attempt is made to initiate a task for which no executable
3129 program is defined, the following error message will be produced
3130 "No executable program specified for this task".
3131
3132 Executable
3133 The name of the program to execute. May be fully qualified
3134 pathname if desired.
3135
3136 Arguments
3137 Program arguments. The expression "%t" will be replaced with
3138 the task's number. The expression "%o" will be replaced with
3139 the task's offset within this range (e.g. a configured task rank
3140 value of "1-5" would have offset values of "0-4"). Single
3141 quotes may be used to avoid having the enclosed values inter‐
3142 preted. This field is optional. Any arguments for the program
3143 entered on the command line will be added to the arguments spec‐
3144 ified in the configuration file.
3145
3146 For example:
3147 $ cat silly.conf
3148 ###################################################################
3149 # srun multiple program configuration file
3150 #
3151 # srun -n8 -l --multi-prog silly.conf
3152 ###################################################################
3153 4-6 hostname
3154 1,7 echo task:%t
3155 0,2-3 echo offset:%o
3156
3157 $ srun -n8 -l --multi-prog silly.conf
3158 0: offset:0
3159 1: task:1
3160 2: offset:1
3161 3: offset:2
3162 4: linux15.llnl.gov
3163 5: linux16.llnl.gov
3164 6: linux17.llnl.gov
3165 7: task:7
3166
3167
3168EXAMPLES
3169 This simple example demonstrates the execution of the command hostname
3170 in eight tasks. At least eight processors will be allocated to the job
3171 (the same as the task count) on however many nodes are required to sat‐
3172 isfy the request. The output of each task will be preceded by its
3173 task number. (The machine "dev" in the example below has a total of
3174 two CPUs per node.)
3175
3176 $ srun -n8 -l hostname
3177 0: dev0
3178 1: dev0
3179 2: dev1
3180 3: dev1
3181 4: dev2
3182 5: dev2
3183 6: dev3
3184 7: dev3
3185
3186
3187 The srun -r option is used within a job script to run two job steps on
3188 disjoint nodes in the following example. In this case the script is
3189 run in allocation mode (via salloc) rather than as a batch job.
3190
3191 $ cat test.sh
3192 #!/bin/sh
3193 echo $SLURM_JOB_NODELIST
3194 srun -lN2 -r2 hostname
3195 srun -lN2 hostname
3196
3197 $ salloc -N4 test.sh
3198 dev[7-10]
3199 0: dev9
3200 1: dev10
3201 0: dev7
3202 1: dev8
3203
3204
3205 The following script runs two job steps in parallel within an allocated
3206 set of nodes.
3207
3208 $ cat test.sh
3209 #!/bin/bash
3210 srun -lN2 -n4 -r 2 sleep 60 &
3211 srun -lN2 -r 0 sleep 60 &
3212 sleep 1
3213 squeue
3214 squeue -s
3215 wait
3216
3217 $ salloc -N4 test.sh
3218 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3219 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3220
3221 STEPID PARTITION USER TIME NODELIST
3222 65641.0 batch grondo 0:01 dev[7-8]
3223 65641.1 batch grondo 0:01 dev[9-10]
3224
3225
3226 This example demonstrates how one executes a simple MPI job. We use
3227 srun to build a list of machines (nodes) to be used by mpirun in its
3228 required format. A sample command line and the script to be executed
3229 follow.
3230
3231 $ cat test.sh
3232 #!/bin/sh
3233 MACHINEFILE="nodes.$SLURM_JOB_ID"
3234
3235 # Generate Machinefile for mpi such that hosts are in the same
3236 # order as if run via srun
3237 #
3238 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3239
3240 # Run using generated Machine file:
3241 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3242
3243 rm $MACHINEFILE
3244
3245 $ salloc -N2 -n4 test.sh
3246
3247
3248 This simple example demonstrates the execution of different jobs on
3249 different nodes in the same srun. You can do this for any number of
3250 nodes or any number of jobs. The executable run on each node is se‐
3251 lected by the SLURM_NODEID env var, which ranges from 0 up to one less
3252 than the number of nodes specified on the srun command line.
3253
3254 $ cat test.sh
     #!/bin/sh
3255 case $SLURM_NODEID in
3256 0) echo "I am running on "
3257 hostname ;;
3258 1) hostname
3259 echo "is where I am running" ;;
3260 esac
3261
3262 $ srun -N2 test.sh
3263 dev0
3264 is where I am running
3265 I am running on
3266 dev1
3267
3268
3269 This example demonstrates use of multi-core options to control layout
3270 of tasks. We request that four sockets per node and two cores per
3271 socket be dedicated to the job.
3272
3273 $ srun -N2 -B 4-4:2-2 a.out
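
     The same layout request can also be expressed, roughly equivalently,
     with the long-form options:

        $ srun -N2 --sockets-per-node=4 --cores-per-socket=2 a.out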
3274
3275
3276 This example shows a script in which Slurm is used to provide resource
3277 management for a job by executing the various job steps as processors
3278 become available for their dedicated use.
3279
3280 $ cat my.script
3281 #!/bin/bash
3282 srun -n4 prog1 &
3283 srun -n3 prog2 &
3284 srun -n1 prog3 &
3285 srun -n1 prog4 &
3286 wait
3287
3288
3289 This example shows how to launch an application called "server" with
3290 one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another appli‐
3291 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3292 1 GB of memory per task.
3293
3294 $ srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3295
3296
3297COPYING
3298 Copyright (C) 2006-2007 The Regents of the University of California.
3299 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3300 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3301 Copyright (C) 2010-2021 SchedMD LLC.
3302
3303 This file is part of Slurm, a resource management program. For de‐
3304 tails, see <https://slurm.schedmd.com/>.
3305
3306 Slurm is free software; you can redistribute it and/or modify it under
3307 the terms of the GNU General Public License as published by the Free
3308 Software Foundation; either version 2 of the License, or (at your op‐
3309 tion) any later version.
3310
3311 Slurm is distributed in the hope that it will be useful, but WITHOUT
3312 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3313 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3314 for more details.
3315
3316
3317SEE ALSO
3318 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3319 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3320
3321
3322
3323November 2021 Slurm Commands srun(1)