srun(1)                         Slurm Commands                        srun(1)


NAME
       srun - Run parallel jobs

SYNOPSIS
       srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
       executable(N) [args(N)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html

DESCRIPTION
       Run a parallel job on a cluster managed by Slurm. If necessary, srun
       will first create a resource allocation in which to run the parallel
       job.

       The following document describes the influence of various options on
       the allocation of CPUs to jobs and tasks:
       https://slurm.schedmd.com/cpu_management.html

RETURN VALUE
       srun will return the highest exit code of all tasks run or the highest
       signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
       signal) of any task that exited with a signal.
       The value 253 is reserved for out-of-memory errors.

EXECUTABLE PATH RESOLUTION
       The executable is resolved in the following order:

       1. If executable starts with ".", then path is constructed as: current
          working directory / executable
       2. If executable starts with a "/", then path is considered absolute.
       3. If executable can be resolved through PATH. See path_resolution(7).
       4. If executable is in current working directory.

       The current working directory is the calling process's working
       directory unless the --chdir argument is passed, which overrides it.

OPTIONS
50 --accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu and
52 nic. Multiple options may be specified. Supported options in‐
53 clude:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC de‐
62 vices.
63
64 This option applies to job allocations.
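
       For example, a sketch that binds each task to the GPUs closest to its
       allocated CPUs (the node, task and GPU counts and the executable
       ./my_app are illustrative placeholders, not taken from this page):

              srun -N2 -n8 --gres=gpu:4 --accel-bind=g ./my_app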
65
66 -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command. This option ap‐
70 plies to job allocations.
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks. This option ap‐
108 plies to job allocations.
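
       As an illustrative sketch, the following samples task (memory) data
       every 15 seconds and energy data every 60 seconds; ./my_app is a
       placeholder, and a site-configured JobAcctGatherFrequency may cap the
       task interval:

              srun --acctg-freq=task=15,energy=60 ./my_app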
109
110 --bb=<spec>
111 Burst buffer specification. The form of the specification is
112 system dependent. Also see --bbf. This option applies to job
113 allocations. When the --bb option is used, Slurm parses this
114 option and creates a temporary burst buffer script file that is
115 used internally by the burst buffer plugins. See Slurm's burst
116 buffer guide for more information and examples:
117 https://slurm.schedmd.com/burst_buffer.html
118
119 --bbf=<file_name>
120 Path of file containing burst buffer specification. The form of
121 the specification is system dependent. Also see --bb. This op‐
122 tion applies to job allocations. See Slurm's burst buffer guide
123 for more information and examples:
124 https://slurm.schedmd.com/burst_buffer.html
125
126 --bcast[=<dest_path>]
127 Copy executable file to allocated compute nodes. If a file name
128 is specified, copy the executable to the specified destination
129 file path. If the path specified ends with '/' it is treated as
130 a target directory, and the destination file name will be
131 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
132 specified and the slurm.conf BcastParameters DestDir is config‐
133 ured then it is used, and the filename follows the above pat‐
134 tern. If none of the previous is specified, then --chdir is
135 used, and the filename follows the above pattern too. For exam‐
136 ple, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
137 "a.out" from your current directory to the file "/tmp/mine" on
138 each of the three allocated compute nodes and execute that file.
139 This option applies to step allocations.
140
141 --bcast-exclude={NONE|<exclude_path>[,<exclude_path>...]}
142 Comma-separated list of absolute directory paths to be excluded
143 when autodetecting and broadcasting executable shared object de‐
144 pendencies through --bcast. If the keyword "NONE" is configured,
145 no directory paths will be excluded. The default value is that
146 of slurm.conf BcastExclude and this option overrides it. See
147 also --bcast and --send-libs.
148
149 -b, --begin=<time>
150 Defer initiation of this job until the specified time. It ac‐
151 cepts times of the form HH:MM:SS to run a job at a specific time
152 of day (seconds are optional). (If that time is already past,
153 the next day is assumed.) You may also specify midnight, noon,
154 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
155 suffixed with AM or PM for running in the morning or the
156 evening. You can also say what day the job will be run, by
       specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
158 Combine date and time using the following format
159 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
160 count time-units, where the time-units can be seconds (default),
161 minutes, hours, days, or weeks and you can tell Slurm to run the
162 job today with the keyword today and to run the job tomorrow
163 with the keyword tomorrow. The value may be changed after job
164 submission using the scontrol command. For example:
165
166 --begin=16:00
167 --begin=now+1hour
168 --begin=now+60 (seconds by default)
169 --begin=2010-01-20T12:34:00
170
171
172 Notes on date/time specifications:
173 - Although the 'seconds' field of the HH:MM:SS time specifica‐
174 tion is allowed by the code, note that the poll time of the
175 Slurm scheduler is not precise enough to guarantee dispatch of
176 the job on the exact second. The job will be eligible to start
177 on the next poll following the specified time. The exact poll
178 interval depends on the Slurm scheduler (e.g., 60 seconds with
179 the default sched/builtin).
180 - If no time (HH:MM:SS) is specified, the default is
181 (00:00:00).
182 - If a date is specified without a year (e.g., MM/DD) then the
183 current year is assumed, unless the combination of MM/DD and
184 HH:MM:SS has already passed for that year, in which case the
185 next year is used.
186 This option applies to job allocations.
187
188 -D, --chdir=<path>
189 Have the remote processes do a chdir to path before beginning
190 execution. The default is to chdir to the current working direc‐
191 tory of the srun process. The path can be specified as full path
192 or relative path to the directory where the command is executed.
193 This option applies to job allocations.
194
195 --cluster-constraint=<list>
196 Specifies features that a federated cluster must have to have a
197 sibling job submitted to it. Slurm will attempt to submit a sib‐
198 ling job to a cluster if it has at least one of the specified
199 features.
200
201 -M, --clusters=<string>
202 Clusters to issue commands to. Multiple cluster names may be
203 comma separated. The job will be submitted to the one cluster
204 providing the earliest expected job initiation time. The default
205 value is the current cluster. A value of 'all' will query to run
206 on all clusters. Note the --export option to control environ‐
207 ment variables exported between clusters. This option applies
208 only to job allocations. Note that the SlurmDBD must be up for
209 this option to work properly.
210
211 --comment=<string>
212 An arbitrary comment. This option applies to job allocations.
213
214 --compress[=type]
215 Compress file before sending it to compute hosts. The optional
216 argument specifies the data compression library to be used. The
217 default is BcastParameters Compression= if set or "lz4" other‐
218 wise. Supported values are "lz4". Some compression libraries
219 may be unavailable on some systems. For use with the --bcast
220 option. This option applies to step allocations.
221
222 -C, --constraint=<list>
223 Nodes can have features assigned to them by the Slurm adminis‐
224 trator. Users can specify which of these features are required
225 by their job using the constraint option. Only nodes having
226 features matching the job constraints will be used to satisfy
227 the request. Multiple constraints may be specified with AND,
228 OR, matching OR, resource counts, etc. (some operators are not
229 supported on all system types). Supported constraint options
230 include:
231
232 Single Name
233 Only nodes which have the specified feature will be used.
234 For example, --constraint="intel"
235
236 Node Count
237 A request can specify the number of nodes needed with
238 some feature by appending an asterisk and count after the
239 feature name. For example, --nodes=16 --con‐
240 straint="graphics*4 ..." indicates that the job requires
241 16 nodes and that at least four of those nodes must have
242 the feature "graphics."
243
       AND    Only nodes with all of the specified features will be
              used. The ampersand is used for an AND operator. For
              example, --constraint="intel&gpu"

       OR     Only nodes with at least one of the specified features
              will be used. The vertical bar is used for an OR opera‐
              tor. For example, --constraint="intel|amd"
251
252 Matching OR
253 If only one of a set of possible options should be used
254 for all allocated nodes, then use the OR operator and en‐
255 close the options within square brackets. For example,
256 --constraint="[rack1|rack2|rack3|rack4]" might be used to
257 specify that all nodes must be allocated on a single rack
258 of the cluster, but any of those four racks can be used.
259
260 Multiple Counts
261 Specific counts of multiple resources may be specified by
262 using the AND operator and enclosing the options within
263 square brackets. For example, --con‐
264 straint="[rack1*2&rack2*4]" might be used to specify that
265 two nodes must be allocated from nodes with the feature
266 of "rack1" and four nodes must be allocated from nodes
267 with the feature "rack2".
268
269 NOTE: This construct does not support multiple Intel KNL
270 NUMA or MCDRAM modes. For example, while --con‐
271 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
272 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
273 Specification of multiple KNL modes requires the use of a
274 heterogeneous job.
275
276 Brackets
277 Brackets can be used to indicate that you are looking for
278 a set of nodes with the different requirements contained
279 within the brackets. For example, --con‐
280 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
281 node with either the "rack1" or "rack2" features and two
282 nodes with the "rack3" feature. The same request without
283 the brackets will try to find a single node that meets
284 those requirements.
285
286 NOTE: Brackets are only reserved for Multiple Counts and
287 Matching OR syntax. AND operators require a count for
288 each feature inside square brackets (i.e.
289 "[quad*2&hemi*1]"). Slurm will only allow a single set of
290 bracketed constraints per job.
291
       Parentheses
              Parentheses can be used to group like node features to‐
294 gether. For example, --con‐
295 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
296 specify that four nodes with the features "knl", "snc4"
297 and "flat" plus one node with the feature "haswell" are
298 required. All options within parenthesis should be
299 grouped with AND (e.g. "&") operands.
300
301 WARNING: When srun is executed from within salloc or sbatch, the
302 constraint value can only contain a single feature name. None of
303 the other operators are currently supported for job steps.
304 This option applies to job and step allocations.
305
306 --container=<path_to_container>
307 Absolute path to OCI container bundle.
308
309 --contiguous
310 If set, then the allocated nodes must form a contiguous set.
311
312 NOTE: If SelectPlugin=cons_res this option won't be honored with
313 the topology/tree or topology/3d_torus plugins, both of which
314 can modify the node ordering. This option applies to job alloca‐
315 tions.
316
317 -S, --core-spec=<num>
318 Count of specialized cores per node reserved by the job for sys‐
319 tem operations and not used by the application. The application
320 will not use these cores, but will be charged for their alloca‐
321 tion. Default value is dependent upon the node's configured
322 CoreSpecCount value. If a value of zero is designated and the
323 Slurm configuration option AllowSpecResourcesUsage is enabled,
324 the job will be allowed to override CoreSpecCount and use the
325 specialized resources on nodes it is allocated. This option can
326 not be used with the --thread-spec option. This option applies
327 to job allocations.
328 NOTE: This option may implicitly impact the number of tasks if
329 -n was not specified.
330
331 --cores-per-socket=<cores>
332 Restrict node selection to nodes with at least the specified
       number of cores per socket. See additional information under the
       -B option below when the task/affinity plugin is enabled. This option
335 applies to job allocations.
336
337 --cpu-bind=[{quiet|verbose},]<type>
338 Bind tasks to CPUs. Used only when the task/affinity plugin is
339 enabled. NOTE: To have Slurm always report on the selected CPU
340 binding for all commands executed in a shell, you can enable
341 verbose mode by setting the SLURM_CPU_BIND environment variable
342 value to "verbose".
343
344 The following informational environment variables are set when
345 --cpu-bind is in use:
346
347 SLURM_CPU_BIND_VERBOSE
348 SLURM_CPU_BIND_TYPE
349 SLURM_CPU_BIND_LIST
350
351 See the ENVIRONMENT VARIABLES section for a more detailed de‐
352 scription of the individual SLURM_CPU_BIND variables. These
       variables are available only if the task/affinity plugin is con‐
354 figured.
355
356 When using --cpus-per-task to run multithreaded tasks, be aware
357 that CPU binding is inherited from the parent of the process.
358 This means that the multithreaded task should either specify or
359 clear the CPU binding itself to avoid having all threads of the
360 multithreaded task use the same mask/CPU as the parent. Alter‐
361 natively, fat masks (masks which specify more than one allowed
362 CPU) could be used for the tasks in order to provide multiple
363 CPUs for the multithreaded tasks.
364
365 Note that a job step can be allocated different numbers of CPUs
366 on each node or be allocated CPUs not starting at location zero.
367 Therefore one of the options which automatically generate the
368 task binding is recommended. Explicitly specified masks or
369 bindings are only honored when the job step has been allocated
370 every available CPU on the node.
371
372 Binding a task to a NUMA locality domain means to bind the task
373 to the set of CPUs that belong to the NUMA locality domain or
374 "NUMA node". If NUMA locality domain options are used on sys‐
375 tems with no NUMA support, then each socket is considered a lo‐
376 cality domain.
377
378 If the --cpu-bind option is not used, the default binding mode
379 will depend upon Slurm's configuration and the step's resource
380 allocation. If all allocated nodes have the same configured
381 CpuBind mode, that will be used. Otherwise if the job's Parti‐
382 tion has a configured CpuBind mode, that will be used. Other‐
383 wise if Slurm has a configured TaskPluginParam value, that mode
384 will be used. Otherwise automatic binding will be performed as
385 described below.
386
387 Auto Binding
388 Applies only when task/affinity is enabled. If the job
389 step allocation includes an allocation with a number of
390 sockets, cores, or threads equal to the number of tasks
391 times cpus-per-task, then the tasks will by default be
392 bound to the appropriate resources (auto binding). Dis‐
393 able this mode of operation by explicitly setting
394 "--cpu-bind=none". Use TaskPluginParam=auto‐
395 bind=[threads|cores|sockets] to set a default cpu binding
396 in case "auto binding" doesn't find a match.
397
398 Supported options include:
399
400 q[uiet]
401 Quietly bind before task runs (default)
402
403 v[erbose]
404 Verbosely report binding before task runs
405
406 no[ne] Do not bind tasks to CPUs (default unless auto
407 binding is applied)
408
409 rank Automatically bind by task rank. The lowest num‐
410 bered task on each node is bound to socket (or
411 core or thread) zero, etc. Not supported unless
412 the entire node is allocated to the job.
413
414 map_cpu:<list>
415 Bind by setting CPU masks on tasks (or ranks) as
416 specified where <list> is
417 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
418 IDs are interpreted as decimal values unless they
                     are preceded with '0x' in which case they are inter‐
420 preted as hexadecimal values. If the number of
421 tasks (or ranks) exceeds the number of elements in
422 this list, elements in the list will be reused as
423 needed starting from the beginning of the list.
424 To simplify support for large task counts, the
425 lists may follow a map with an asterisk and repe‐
426 tition count. For example
427 "map_cpu:0x0f*4,0xf0*4".
428
429 mask_cpu:<list>
430 Bind by setting CPU masks on tasks (or ranks) as
431 specified where <list> is
432 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
433 The mapping is specified for a node and identical
434 mapping is applied to the tasks on every node
435 (i.e. the lowest task ID on each node is mapped to
436 the first mask specified in the list, etc.). CPU
437 masks are always interpreted as hexadecimal values
438 but can be preceded with an optional '0x'. If the
439 number of tasks (or ranks) exceeds the number of
440 elements in this list, elements in the list will
441 be reused as needed starting from the beginning of
442 the list. To simplify support for large task
443 counts, the lists may follow a map with an aster‐
444 isk and repetition count. For example
445 "mask_cpu:0x0f*4,0xf0*4".
446
447 rank_ldom
448 Bind to a NUMA locality domain by rank. Not sup‐
449 ported unless the entire node is allocated to the
450 job.
451
452 map_ldom:<list>
453 Bind by mapping NUMA locality domain IDs to tasks
454 as specified where <list> is
455 <ldom1>,<ldom2>,...<ldomN>. The locality domain
456 IDs are interpreted as decimal values unless they
457 are preceded with '0x' in which case they are in‐
458 terpreted as hexadecimal values. Not supported
459 unless the entire node is allocated to the job.
460
461 mask_ldom:<list>
462 Bind by setting NUMA locality domain masks on
463 tasks as specified where <list> is
464 <mask1>,<mask2>,...<maskN>. NUMA locality domain
465 masks are always interpreted as hexadecimal values
466 but can be preceded with an optional '0x'. Not
467 supported unless the entire node is allocated to
468 the job.
469
470 sockets
471 Automatically generate masks binding tasks to
472 sockets. Only the CPUs on the socket which have
473 been allocated to the job will be used. If the
474 number of tasks differs from the number of allo‐
475 cated sockets this can result in sub-optimal bind‐
476 ing.
477
478 cores Automatically generate masks binding tasks to
479 cores. If the number of tasks differs from the
480 number of allocated cores this can result in
481 sub-optimal binding.
482
483 threads
484 Automatically generate masks binding tasks to
485 threads. If the number of tasks differs from the
486 number of allocated threads this can result in
487 sub-optimal binding.
488
489 ldoms Automatically generate masks binding tasks to NUMA
490 locality domains. If the number of tasks differs
491 from the number of allocated locality domains this
492 can result in sub-optimal binding.
493
494 help Show help message for cpu-bind
495
496 This option applies to job and step allocations.
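
       As an illustrative sketch (the CPU IDs and the executable ./my_app are
       placeholders, and explicit maps assume the whole node is allocated to
       the job), the following binds four tasks to explicit CPUs and logs the
       resulting binding:

              srun -N1 -n4 --cpu-bind=verbose,map_cpu:0,2,4,6 ./my_app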
497
498 --cpu-freq=<p1>[-p2[:p3]]
499
500 Request that the job step initiated by this srun command be run
501 at some requested frequency if possible, on the CPUs selected
502 for the step on the compute node(s).
503
504 p1 can be [#### | low | medium | high | highm1] which will set
505 the frequency scaling_speed to the corresponding value, and set
506 the frequency scaling_governor to UserSpace. See below for defi‐
507 nition of the values.
508
509 p1 can be [Conservative | OnDemand | Performance | PowerSave]
510 which will set the scaling_governor to the corresponding value.
511 The governor has to be in the list set by the slurm.conf option
512 CpuFreqGovernors.
513
514 When p2 is present, p1 will be the minimum scaling frequency and
515 p2 will be the maximum scaling frequency.
516
       p2 can be [#### | medium | high | highm1]. p2 must be greater
       than p1.
519
520 p3 can be [Conservative | OnDemand | Performance | PowerSave |
521 SchedUtil | UserSpace] which will set the governor to the corre‐
522 sponding value.
523
524 If p3 is UserSpace, the frequency scaling_speed will be set by a
525 power or energy aware scheduling strategy to a value between p1
526 and p2 that lets the job run within the site's power goal. The
527 job may be delayed if p1 is higher than a frequency that allows
528 the job to run within the goal.
529
530 If the current frequency is < min, it will be set to min. Like‐
531 wise, if the current frequency is > max, it will be set to max.
532
533 Acceptable values at present include:
534
535 #### frequency in kilohertz
536
537 Low the lowest available frequency
538
539 High the highest available frequency
540
541 HighM1 (high minus one) will select the next highest
542 available frequency
543
544 Medium attempts to set a frequency in the middle of the
545 available range
546
547 Conservative attempts to use the Conservative CPU governor
548
549 OnDemand attempts to use the OnDemand CPU governor (the de‐
550 fault value)
551
552 Performance attempts to use the Performance CPU governor
553
554 PowerSave attempts to use the PowerSave CPU governor
555
556 UserSpace attempts to use the UserSpace CPU governor
557
       The following informational environment variable is set in the
       job step when the --cpu-freq option is requested:
              SLURM_CPU_FREQ_REQ
562
563 This environment variable can also be used to supply the value
564 for the CPU frequency request if it is set when the 'srun' com‐
565 mand is issued. The --cpu-freq on the command line will over‐
       ride the environment variable value. The form of the environ‐
567 ment variable is the same as the command line. See the ENVIRON‐
568 MENT VARIABLES section for a description of the
569 SLURM_CPU_FREQ_REQ variable.
570
571 NOTE: This parameter is treated as a request, not a requirement.
572 If the job step's node does not support setting the CPU fre‐
573 quency, or the requested value is outside the bounds of the le‐
574 gal frequencies, an error is logged, but the job step is allowed
575 to continue.
576
577 NOTE: Setting the frequency for just the CPUs of the job step
578 implies that the tasks are confined to those CPUs. If task con‐
579 finement (i.e. the task/affinity TaskPlugin is enabled, or the
580 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
581 in cgroup.conf) is not configured, this parameter is ignored.
582
583 NOTE: When the step completes, the frequency and governor of
584 each selected CPU is reset to the previous values.
585
       NOTE: Submitting jobs with the --cpu-freq option when linuxproc
       is configured as the ProctrackType can cause jobs to run too
       quickly, before accounting is able to poll for job information.
       As a result, not all of the accounting information will be
       present.
590
591 This option applies to job and step allocations.
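
       For example, a sketch requesting the Performance governor, and another
       requesting a frequency range with the UserSpace governor (the values
       and ./my_app are illustrative; requested governors must appear in the
       slurm.conf CpuFreqGovernors list):

              srun --cpu-freq=Performance ./my_app
              srun --cpu-freq=1800000-2400000:UserSpace ./my_app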
592
593 --cpus-per-gpu=<ncpus>
594 Advise Slurm that ensuing job steps will require ncpus proces‐
595 sors per allocated GPU. Not compatible with the --cpus-per-task
596 option.
597
598 -c, --cpus-per-task=<ncpus>
599 Request that ncpus be allocated per process. This may be useful
600 if the job is multithreaded and requires more than one CPU per
601 task for optimal performance. The default is one CPU per process
602 and does not imply --exact. If -c is specified without -n, as
603 many tasks will be allocated per node as possible while satisfy‐
604 ing the -c restriction. For instance on a cluster with 8 CPUs
605 per node, a job request for 4 nodes and 3 CPUs per task may be
606 allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending
607 upon resource consumption by other jobs. Such a job may be un‐
608 able to execute more than a total of 4 tasks.
609
610 WARNING: There are configurations and options interpreted dif‐
611 ferently by job and job step requests which can result in incon‐
612 sistencies for this option. For example srun -c2
613 --threads-per-core=1 prog may allocate two cores for the job,
614 but if each of those cores contains two threads, the job alloca‐
615 tion will include four CPUs. The job step allocation will then
616 launch two threads per CPU for a total of two tasks.
617
618 WARNING: When srun is executed from within salloc or sbatch,
619 there are configurations and options which can result in incon‐
620 sistent allocations when -c has a value greater than -c on sal‐
621 loc or sbatch.
622
623 This option applies to job and step allocations.
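
       For example, a sketch for a multithreaded program that runs four tasks
       with four CPUs each (./omp_app and the use of OMP_NUM_THREADS are
       illustrative assumptions, not taken from this page):

              OMP_NUM_THREADS=4 srun -n4 -c4 ./omp_app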
624
625 --deadline=<OPT>
       Remove the job if no ending is possible before this deadline
627 (start > (deadline - time[-min])). Default is no deadline.
628 Valid time formats are:
629 HH:MM[:SS] [AM|PM]
630 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
631 MM/DD[/YY]-HH:MM[:SS]
           YYYY-MM-DD[THH:MM[:SS]]
633 now[+count[seconds(default)|minutes|hours|days|weeks]]
634
635 This option applies only to job allocations.
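
       For example, a sketch that removes the job if it cannot finish within
       the next two hours, assuming a 30-minute time limit set with -t (the
       command hostname is only a stand-in workload):

              srun --deadline=now+2hours -t 30 hostname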
636
637 --delay-boot=<minutes>
       Do not reboot nodes in order to satisfy this job's feature
639 specification if the job has been eligible to run for less than
640 this time period. If the job has waited for less than the spec‐
641 ified period, it will use only nodes which already have the
642 specified features. The argument is in units of minutes. A de‐
643 fault value may be set by a system administrator using the de‐
644 lay_boot option of the SchedulerParameters configuration parame‐
645 ter in the slurm.conf file, otherwise the default value is zero
646 (no delay).
647
648 This option applies only to job allocations.
649
650 -d, --dependency=<dependency_list>
651 Defer the start of this job until the specified dependencies
       have been satisfied. This option does not apply to job steps
       (executions of srun within an existing salloc or sbatch
       allocation), only to job allocations. <dependency_list> is of
655 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
656 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
657 must be satisfied if the "," separator is used. Any dependency
658 may be satisfied if the "?" separator is used. Only one separa‐
659 tor may be used. Many jobs can share the same dependency and
660 these jobs may even belong to different users. The value may
661 be changed after job submission using the scontrol command. De‐
662 pendencies on remote jobs are allowed in a federation. Once a
663 job dependency fails due to the termination state of a preceding
664 job, the dependent job will never be run, even if the preceding
665 job is requeued and has a different termination state in a sub‐
666 sequent execution. This option applies to job allocations.
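
       For example, a sketch that starts a post-processing job only after a
       previously submitted job has finished successfully, using the afterok
       type described below (the job id 123456 and ./postprocess are
       placeholders):

              srun --dependency=afterok:123456 ./postprocess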
667
668 after:job_id[[+time][:jobid[+time]...]]
669 After the specified jobs start or are cancelled and
670 'time' in minutes from job start or cancellation happens,
671 this job can begin execution. If no 'time' is given then
672 there is no delay after start or cancellation.
673
674 afterany:job_id[:jobid...]
675 This job can begin execution after the specified jobs
676 have terminated.
677
678 afterburstbuffer:job_id[:jobid...]
679 This job can begin execution after the specified jobs
680 have terminated and any associated burst buffer stage out
681 operations have completed.
682
683 aftercorr:job_id[:jobid...]
684 A task of this job array can begin execution after the
685 corresponding task ID in the specified job has completed
686 successfully (ran to completion with an exit code of
687 zero).
688
689 afternotok:job_id[:jobid...]
690 This job can begin execution after the specified jobs
691 have terminated in some failed state (non-zero exit code,
692 node failure, timed out, etc).
693
694 afterok:job_id[:jobid...]
695 This job can begin execution after the specified jobs
696 have successfully executed (ran to completion with an
697 exit code of zero).
698
699 singleton
700 This job can begin execution after any previously
701 launched jobs sharing the same job name and user have
702 terminated. In other words, only one job by that name
703 and owned by that user can be running or suspended at any
704 point in time. In a federation, a singleton dependency
705 must be fulfilled on all clusters unless DependencyParam‐
706 eters=disable_remote_singleton is used in slurm.conf.
707
708 -X, --disable-status
709 Disable the display of task status when srun receives a single
710 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
711 running job. Without this option a second Ctrl-C in one second
712 is required to forcibly terminate the job and srun will immedi‐
713 ately exit. May also be set via the environment variable
714 SLURM_DISABLE_STATUS. This option applies to job allocations.
715
716 -m, --distribution={*|block|cyclic|arbi‐
717 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
718
719 Specify alternate distribution methods for remote processes.
720 For job allocation, this sets environment variables that will be
721 used by subsequent srun requests and also affects which cores
722 will be selected for job allocation.
723
724 This option controls the distribution of tasks to the nodes on
725 which resources have been allocated, and the distribution of
726 those resources to tasks for binding (task affinity). The first
727 distribution method (before the first ":") controls the distri‐
728 bution of tasks to nodes. The second distribution method (after
729 the first ":") controls the distribution of allocated CPUs
730 across sockets for binding to tasks. The third distribution
731 method (after the second ":") controls the distribution of allo‐
732 cated CPUs across cores for binding to tasks. The second and
733 third distributions apply only if task affinity is enabled. The
734 third distribution is supported only if the task/cgroup plugin
735 is configured. The default value for each distribution type is
736 specified by *.
737
738 Note that with select/cons_res and select/cons_tres, the number
739 of CPUs allocated to each socket and node may be different. Re‐
740 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
741 mation on resource allocation, distribution of tasks to nodes,
742 and binding of tasks to CPUs.
743 First distribution method (distribution of tasks across nodes):
744
745
746 * Use the default method for distributing tasks to nodes
747 (block).
748
749 block The block distribution method will distribute tasks to a
750 node such that consecutive tasks share a node. For exam‐
751 ple, consider an allocation of three nodes each with two
752 cpus. A four-task block distribution request will dis‐
753 tribute those tasks to the nodes with tasks one and two
754 on the first node, task three on the second node, and
755 task four on the third node. Block distribution is the
756 default behavior if the number of tasks exceeds the num‐
757 ber of allocated nodes.
758
759 cyclic The cyclic distribution method will distribute tasks to a
760 node such that consecutive tasks are distributed over
761 consecutive nodes (in a round-robin fashion). For exam‐
762 ple, consider an allocation of three nodes each with two
763 cpus. A four-task cyclic distribution request will dis‐
764 tribute those tasks to the nodes with tasks one and four
765 on the first node, task two on the second node, and task
766 three on the third node. Note that when SelectType is
767 select/cons_res, the same number of CPUs may not be allo‐
768 cated on each node. Task distribution will be round-robin
769 among all the nodes with CPUs yet to be assigned to
770 tasks. Cyclic distribution is the default behavior if
771 the number of tasks is no larger than the number of allo‐
772 cated nodes.
773
774 plane The tasks are distributed in blocks of size <size>. The
775 size must be given or SLURM_DIST_PLANESIZE must be set.
776 The number of tasks distributed to each node is the same
777 as for cyclic distribution, but the taskids assigned to
778 each node depend on the plane size. Additional distribu‐
779 tion specifications cannot be combined with this option.
780 For more details (including examples and diagrams),
781 please see https://slurm.schedmd.com/mc_support.html and
782 https://slurm.schedmd.com/dist_plane.html
783
784 arbitrary
785 The arbitrary method of distribution will allocate pro‐
              cesses in order as listed in the file designated by the
              environment variable SLURM_HOSTFILE. If this variable is
              set, it will override any other method specified. If it
              is not set, the method will default to block. The host‐
              file must contain at minimum the number of hosts re‐
              quested, either one per line or comma separated. If spec‐
792 ifying a task count (-n, --ntasks=<number>), your tasks
793 will be laid out on the nodes in the order of the file.
794 NOTE: The arbitrary distribution option on a job alloca‐
795 tion only controls the nodes to be allocated to the job
796 and not the allocation of CPUs on those nodes. This op‐
797 tion is meant primarily to control a job step's task lay‐
798 out in an existing job allocation for the srun command.
799 NOTE: If the number of tasks is given and a list of re‐
800 quested nodes is also given, the number of nodes used
801 from that list will be reduced to match that of the num‐
802 ber of tasks if the number of nodes in the list is
803 greater than the number of tasks.
804
805 Second distribution method (distribution of CPUs across sockets
806 for binding):
807
808
809 * Use the default method for distributing CPUs across sock‐
810 ets (cyclic).
811
812 block The block distribution method will distribute allocated
813 CPUs consecutively from the same socket for binding to
814 tasks, before using the next consecutive socket.
815
816 cyclic The cyclic distribution method will distribute allocated
817 CPUs for binding to a given task consecutively from the
818 same socket, and from the next consecutive socket for the
819 next task, in a round-robin fashion across sockets.
820 Tasks requiring more than one CPU will have all of those
821 CPUs allocated on a single socket if possible.
822
823 fcyclic
824 The fcyclic distribution method will distribute allocated
825 CPUs for binding to tasks from consecutive sockets in a
826 round-robin fashion across the sockets. Tasks requiring
              more than one CPU will have each of those CPUs allocated in a
828 cyclic fashion across sockets.
829
830 Third distribution method (distribution of CPUs across cores for
831 binding):
832
833
834 * Use the default method for distributing CPUs across cores
835 (inherited from second distribution method).
836
837 block The block distribution method will distribute allocated
838 CPUs consecutively from the same core for binding to
839 tasks, before using the next consecutive core.
840
841 cyclic The cyclic distribution method will distribute allocated
842 CPUs for binding to a given task consecutively from the
843 same core, and from the next consecutive core for the
844 next task, in a round-robin fashion across cores.
845
846 fcyclic
847 The fcyclic distribution method will distribute allocated
848 CPUs for binding to tasks from consecutive cores in a
849 round-robin fashion across the cores.
850
851 Optional control for task distribution over nodes:
852
853
       Pack   Rather than distributing a job step's tasks evenly
855 across its allocated nodes, pack them as tightly as pos‐
856 sible on the nodes. This only applies when the "block"
857 task distribution method is used.
858
859 NoPack Rather than packing a job step's tasks as tightly as pos‐
860 sible on the nodes, distribute them evenly. This user
861 option will supersede the SelectTypeParameters
862 CR_Pack_Nodes configuration parameter.
863
864 This option applies to job and step allocations.
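
       For example, a sketch that distributes tasks round-robin across the
       allocated nodes and distributes allocated CPUs block-wise within each
       socket for binding (./my_app and the node/task counts are
       illustrative):

              srun -N2 -n8 --distribution=cyclic:block ./my_app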
865
866 --epilog={none|<executable>}
867 srun will run executable just after the job step completes. The
868 command line arguments for executable will be the command and
869 arguments of the job step. If none is specified, then no srun
870 epilog will be run. This parameter overrides the SrunEpilog pa‐
871 rameter in slurm.conf. This parameter is completely independent
872 from the Epilog parameter in slurm.conf. This option applies to
873 job allocations.
874
875 -e, --error=<filename_pattern>
876 Specify how stderr is to be redirected. By default in interac‐
877 tive mode, srun redirects stderr to the same file as stdout, if
878 one is specified. The --error option is provided to allow stdout
879 and stderr to be redirected to different locations. See IO Re‐
880 direction below for more options. If the specified file already
881 exists, it will be overwritten. This option applies to job and
882 step allocations.
883
884 --exact
885 Allow a step access to only the resources requested for the
886 step. By default, all non-GRES resources on each node in the
887 step allocation will be used. This option only applies to step
888 allocations.
889 NOTE: Parallel steps will either be blocked or rejected until
890 requested step resources are available unless --overlap is spec‐
891 ified. Job resources can be held after the completion of an srun
892 command while Slurm does job cleanup. Step epilogs and/or SPANK
893 plugins can further delay the release of step resources.
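
       For example, a hedged sketch of two steps running side by side inside
       an existing allocation (e.g. from an sbatch script), each limited to
       the CPUs it asked for; sleep is just a stand-in workload:

              srun -n2 -c1 --exact sleep 60 &
              srun -n2 -c1 --exact sleep 60 &
              wait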
894
895 -x, --exclude={<host1[,<host2>...]|<filename>}
896 Request that a specific list of hosts not be included in the re‐
897 sources allocated to this job. The host list will be assumed to
898 be a filename if it contains a "/" character. This option ap‐
899 plies to job and step allocations.
900
901 --exclusive[={user|mcs}]
902 This option applies to job and job step allocations, and has two
903 slightly different meanings for each one. When used to initiate
904 a job, the job allocation cannot share nodes with other running
905 jobs (or just other users with the "=user" option or "=mcs" op‐
906 tion). If user/mcs are not specified (i.e. the job allocation
907 can not share nodes with other running jobs), the job is allo‐
908 cated all CPUs and GRES on all nodes in the allocation, but is
909 only allocated as much memory as it requested. This is by design
910 to support gang scheduling, because suspended jobs still reside
911 in memory. To request all the memory on a node, use --mem=0.
912 The default shared/exclusive behavior depends on system configu‐
913 ration and the partition's OverSubscribe option takes precedence
914 over the job's option. NOTE: Since shared GRES (MPS) cannot be
915 allocated at the same time as a sharing GRES (GPU) this option
916 only allocates all sharing GRES and no underlying shared GRES.
917
918 This option can also be used when initiating more than one job
919 step within an existing resource allocation (default), where you
920 want separate processors to be dedicated to each job step. If
921 sufficient processors are not available to initiate the job
922 step, it will be deferred. This can be thought of as providing a
923 mechanism for resource management to the job within its alloca‐
924 tion (--exact implied).
925
926 The exclusive allocation of CPUs applies to job steps by de‐
927 fault, but --exact is NOT the default. In other words, the de‐
928 fault behavior is this: job steps will not share CPUs, but job
929 steps will be allocated all CPUs available to the job on all
930 nodes allocated to the steps.
931
932 In order to share the resources use the --overlap option.
933
934 See EXAMPLE below.
935
936 --export={[ALL,]<environment_variables>|ALL|NONE}
937 Identify which environment variables from the submission envi‐
938 ronment are propagated to the launched application.
939
940 --export=ALL
941 Default mode if --export is not specified. All of the
942 user's environment will be loaded from the caller's
943 environment.
944
945 --export=NONE
946 None of the user environment will be defined. User
947 must use absolute path to the binary to be executed
948 that will define the environment. User can not specify
949 explicit environment variables with "NONE".
950
951 This option is particularly important for jobs that
952 are submitted on one cluster and execute on a differ‐
953 ent cluster (e.g. with different paths). To avoid
954 steps inheriting environment export settings (e.g.
955 "NONE") from sbatch command, either set --export=ALL
956 or the environment variable SLURM_EXPORT_ENV should be
957 set to "ALL".
958
959 --export=[ALL,]<environment_variables>
960 Exports all SLURM* environment variables along with
961 explicitly defined variables. Multiple environment
962 variable names should be comma separated. Environment
963 variable names may be specified to propagate the cur‐
964 rent value (e.g. "--export=EDITOR") or specific values
965 may be exported (e.g. "--export=EDITOR=/bin/emacs").
966 If "ALL" is specified, then all user environment vari‐
967 ables will be loaded and will take precedence over any
968 explicitly given environment variables.
969
970 Example: --export=EDITOR,ARG1=test
971 In this example, the propagated environment will only
972 contain the variable EDITOR from the user's environ‐
973 ment, SLURM_* environment variables, and ARG1=test.
974
975 Example: --export=ALL,EDITOR=/bin/emacs
976 There are two possible outcomes for this example. If
977 the caller has the EDITOR environment variable de‐
978 fined, then the job's environment will inherit the
979 variable from the caller's environment. If the caller
980 doesn't have an environment variable defined for EDI‐
981 TOR, then the job's environment will use the value
982 given by --export.
983
984 -B, --extra-node-info=<sockets>[:cores[:threads]]
985 Restrict node selection to nodes with at least the specified
986 number of sockets, cores per socket and/or threads per core.
987 NOTE: These options do not specify the resource allocation size.
988 Each value specified is considered a minimum. An asterisk (*)
989 can be used as a placeholder indicating that all available re‐
990 sources of that type are to be utilized. Values can also be
991 specified as min-max. The individual levels can also be speci‐
992 fied in separate options if desired:
993
994 --sockets-per-node=<sockets>
995 --cores-per-socket=<cores>
996 --threads-per-core=<threads>
997 If task/affinity plugin is enabled, then specifying an alloca‐
998 tion in this manner also sets a default --cpu-bind option of
999 threads if the -B option specifies a thread count, otherwise an
1000 option of cores if a core count is specified, otherwise an op‐
1001 tion of sockets. If SelectType is configured to se‐
1002 lect/cons_res, it must have a parameter of CR_Core, CR_Core_Mem‐
1003 ory, CR_Socket, or CR_Socket_Memory for this option to be hon‐
1004 ored. If not specified, the scontrol show job will display
1005 'ReqS:C:T=*:*:*'. This option applies to job allocations.
1006 NOTE: This option is mutually exclusive with --hint,
1007 --threads-per-core and --ntasks-per-core.
1008 NOTE: If the number of sockets, cores and threads were all spec‐
1009 ified, the number of nodes was specified (as a fixed number, not
1010 a range) and the number of tasks was NOT specified, srun will
1011 implicitly calculate the number of tasks as one task per thread.
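
       For example, a sketch that restricts node selection to nodes with at
       least two sockets and eight cores per socket (./my_app is a
       placeholder):

              srun -N1 -B 2:8 ./my_app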
1012
1013 --gid=<group>
1014 If srun is run as root, and the --gid option is used, submit the
       job with the specified group's access permissions. group may be the
1016 group name or the numerical group ID. This option applies to job
1017 allocations.
1018
1019 --gpu-bind=[verbose,]<type>
1020 Bind tasks to specific GPUs. By default every spawned task can
1021 access every GPU allocated to the step. If "verbose," is speci‐
1022 fied before <type>, then print out GPU binding debug information
1023 to the stderr of the tasks. GPU binding is ignored if there is
1024 only one task.
1025
1026 Supported type options:
1027
1028 closest Bind each task to the GPU(s) which are closest. In a
1029 NUMA environment, each task may be bound to more than
1030 one GPU (i.e. all GPUs in that NUMA environment).
1031
1032 map_gpu:<list>
1033 Bind by setting GPU masks on tasks (or ranks) as spec‐
1034 ified where <list> is
1035 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
1036 are interpreted as decimal values unless they are pre‐
                     ceded with '0x' in which case they are interpreted as
1038 hexadecimal values. If the number of tasks (or ranks)
1039 exceeds the number of elements in this list, elements
1040 in the list will be reused as needed starting from the
1041 beginning of the list. To simplify support for large
1042 task counts, the lists may follow a map with an aster‐
1043 isk and repetition count. For example
1044 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
1045 and ConstrainDevices is set in cgroup.conf, then the
1046 GPU IDs are zero-based indexes relative to the GPUs
1047 allocated to the job (e.g. the first GPU is 0, even if
1048 the global ID is 3). Otherwise, the GPU IDs are global
1049 IDs, and all GPUs on each node in the job should be
1050 allocated for predictable binding results.
1051
1052 mask_gpu:<list>
1053 Bind by setting GPU masks on tasks (or ranks) as spec‐
1054 ified where <list> is
1055 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
1056 mapping is specified for a node and identical mapping
1057 is applied to the tasks on every node (i.e. the lowest
1058 task ID on each node is mapped to the first mask spec‐
1059 ified in the list, etc.). GPU masks are always inter‐
1060 preted as hexadecimal values but can be preceded with
1061 an optional '0x'. To simplify support for large task
1062 counts, the lists may follow a map with an asterisk
1063 and repetition count. For example
1064 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
1065 is used and ConstrainDevices is set in cgroup.conf,
1066 then the GPU IDs are zero-based indexes relative to
1067 the GPUs allocated to the job (e.g. the first GPU is
1068 0, even if the global ID is 3). Otherwise, the GPU IDs
1069 are global IDs, and all GPUs on each node in the job
1070 should be allocated for predictable binding results.
1071
1072 none Do not bind tasks to GPUs (turns off binding if
1073 --gpus-per-task is requested).
1074
1075 per_task:<gpus_per_task>
                     Each task will be bound to the number of GPUs speci‐
                     fied in <gpus_per_task>. GPUs are assigned to tasks
                     in order: the first task is assigned the first
                     <gpus_per_task> GPUs on the node, and so on.
1080
1081 single:<tasks_per_gpu>
1082 Like --gpu-bind=closest, except that each task can
1083 only be bound to a single GPU, even when it can be
1084 bound to multiple GPUs that are equally close. The
1085 GPU to bind to is determined by <tasks_per_gpu>, where
1086 the first <tasks_per_gpu> tasks are bound to the first
1087 GPU available, the second <tasks_per_gpu> tasks are
1088 bound to the second GPU available, etc. This is basi‐
1089 cally a block distribution of tasks onto available
1090 GPUs, where the available GPUs are determined by the
1091 socket affinity of the task and the socket affinity of
1092 the GPUs as specified in gres.conf's Cores parameter.
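
       Building on the binding types above, an illustrative sketch that maps
       four tasks one-to-one onto four GPUs and prints the resulting binding
       (the GPU IDs and ./my_app are placeholders; whether the IDs are global
       or job-relative depends on ConstrainDevices as described above):

              srun -N1 -n4 --gpus=4 --gpu-bind=verbose,map_gpu:0,1,2,3 ./my_app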
1093
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
1095 Request that GPUs allocated to the job are configured with spe‐
1096 cific frequency values. This option can be used to indepen‐
1097 dently configure the GPU and its memory frequencies. After the
1098 job is completed, the frequencies of all affected GPUs will be
1099 reset to the highest possible values. In some cases, system
1100 power caps may override the requested values. The field type
1101 can be "memory". If type is not specified, the GPU frequency is
1102 implied. The value field can either be "low", "medium", "high",
1103 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1104 fied numeric value is not possible, a value as close as possible
1105 will be used. See below for definition of the values. The ver‐
1106 bose option causes current GPU frequency information to be
1107 logged. Examples of use include "--gpu-freq=medium,memory=high"
1108 and "--gpu-freq=450".
1109
1110 Supported value definitions:
1111
1112 low the lowest available frequency.
1113
1114 medium attempts to set a frequency in the middle of the
1115 available range.
1116
1117 high the highest available frequency.
1118
1119 highm1 (high minus one) will select the next highest avail‐
1120 able frequency.
1121
1122 -G, --gpus=[type:]<number>
1123 Specify the total number of GPUs required for the job. An op‐
1124 tional GPU type specification can be supplied. For example
1125 "--gpus=volta:3". Multiple options can be requested in a comma
1126 separated list, for example: "--gpus=volta:3,kepler:1". See
1127 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
1128 options.
1129 NOTE: The allocation has to contain at least one GPU per node.
1130
1131 --gpus-per-node=[type:]<number>
1132 Specify the number of GPUs required for the job on each node in‐
1133 cluded in the job's resource allocation. An optional GPU type
1134 specification can be supplied. For example
1135 "--gpus-per-node=volta:3". Multiple options can be requested in
1136 a comma separated list, for example:
1137 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
1138 --gpus-per-socket and --gpus-per-task options.
1139
1140 --gpus-per-socket=[type:]<number>
1141 Specify the number of GPUs required for the job on each socket
1142 included in the job's resource allocation. An optional GPU type
1143 specification can be supplied. For example
1144 "--gpus-per-socket=volta:3". Multiple options can be requested
1145 in a comma separated list, for example:
1146 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
       sockets per node count (--sockets-per-node). See also the
1148 --gpus, --gpus-per-node and --gpus-per-task options. This op‐
1149 tion applies to job allocations.
1150
1151 --gpus-per-task=[type:]<number>
1152 Specify the number of GPUs required for the job on each task to
1153 be spawned in the job's resource allocation. An optional GPU
1154 type specification can be supplied. For example
1155 "--gpus-per-task=volta:1". Multiple options can be requested in
1156 a comma separated list, for example:
1157 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
1158 --gpus-per-socket and --gpus-per-node options. This option re‐
1159 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
1160 --gpus-per-task=Y" rather than an ambiguous range of nodes with
1161 -N, --nodes. This option will implicitly set
1162 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
1163 with an explicit --gpu-bind specification.
1164
1165 --gres=<list>
1166 Specifies a comma-delimited list of generic consumable re‐
1167 sources. The format of each entry on the list is
1168 "name[[:type]:count]". The name is that of the consumable re‐
1169 source. The count is the number of those resources with a de‐
1170 fault value of 1. The count can have a suffix of "k" or "K"
1171 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1172 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1173 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1174 x 1024 x 1024 x 1024). The specified resources will be allo‐
1175 cated to the job on each node. The available generic consumable
1176 resources is configurable by the system administrator. A list
1177 of available generic consumable resources will be printed and
1178 the command will exit if the option argument is "help". Exam‐
1179 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1180 "--gres=help". NOTE: This option applies to job and step allo‐
1181 cations. By default, a job step is allocated all of the generic
1182 resources that have been allocated to the job. To change the
1183 behavior so that each job step is allocated no generic re‐
1184 sources, explicitly set the value of --gres to specify zero
1185 counts for each generic resource OR set "--gres=none" OR set the
1186 SLURM_STEP_GRES environment variable to "none".
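
       For example, a sketch of one job step that uses two of the GPUs
       allocated to the job and a second step that deliberately takes none of
       them (a surrounding job allocation from salloc or sbatch is assumed,
       and ./gpu_app is a placeholder):

              srun --gres=gpu:2 ./gpu_app
              srun --gres=none hostname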
1187
1188 --gres-flags=<type>
1189 Specify generic resource task binding options. This option ap‐
1190 plies to job allocations.
1191
1192 disable-binding
1193 Disable filtering of CPUs with respect to generic re‐
1194 source locality. This option is currently required to
1195 use more CPUs than are bound to a GRES (i.e. if a GPU is
1196 bound to the CPUs on one socket, but resources on more
1197 than one socket are required to run the job). This op‐
1198 tion may permit a job to be allocated resources sooner
1199 than otherwise possible, but may result in lower job per‐
1200 formance.
1201 NOTE: This option is specific to SelectType=cons_res.
1202
1203 enforce-binding
1204 The only CPUs available to the job will be those bound to
1205 the selected GRES (i.e. the CPUs identified in the
1206 gres.conf file will be strictly enforced). This option
1207 may result in delayed initiation of a job. For example a
1208 job requiring two GPUs and one CPU will be delayed until
1209 both GPUs on a single socket are available rather than
              using GPUs bound to separate sockets; however, the appli‐
1211 cation performance may be improved due to improved commu‐
1212 nication speed. Requires the node to be configured with
1213 more than one socket and resource filtering will be per‐
1214 formed on a per-socket basis.
1215 NOTE: This option is specific to SelectType=cons_tres.
1216
1217 -h, --help
1218 Display help information and exit.
1219
1220 --het-group=<expr>
1221 Identify each component in a heterogeneous job allocation for
1222 which a step is to be created. Applies only to srun commands is‐
1223 sued inside a salloc allocation or sbatch script. <expr> is a
       set of integers corresponding to one or more option offsets on
1225 the salloc or sbatch command line. Examples: "--het-group=2",
1226 "--het-group=0,4", "--het-group=1,3-5". The default value is
1227 --het-group=0.
1228
1229 --hint=<type>
1230 Bind tasks according to application hints.
1231 NOTE: This option cannot be used in conjunction with any of
1232 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1233 --cpu-bind=verbose) or -B. If --hint is specified as a command
1234 line argument, it will take precedence over the environment.
1235
1236 compute_bound
1237 Select settings for compute bound applications: use all
1238 cores in each socket, one thread per core.
1239
1240 memory_bound
1241 Select settings for memory bound applications: use only
1242 one core in each socket, one thread per core.
1243
1244 [no]multithread
1245 [don't] use extra threads with in-core multi-threading
1246 which can benefit communication intensive applications.
1247 Only supported with the task/affinity plugin.
1248
1249 help show this help message
1250
1251 This option applies to job allocations.
1252
1253 -H, --hold
1254 Specify the job is to be submitted in a held state (priority of
zero). A held job can later be released using scontrol to reset
1256 its priority (e.g. "scontrol release <job_id>"). This option ap‐
1257 plies to job allocations.
1258
1259 -I, --immediate[=<seconds>]
Exit if resources are not available within the time period spec‐
1261 ified. If no argument is given (seconds defaults to 1), re‐
1262 sources must be available immediately for the request to suc‐
1263 ceed. If defer is configured in SchedulerParameters and sec‐
1264 onds=1 the allocation request will fail immediately; defer con‐
1265 flicts and takes precedence over this option. By default, --im‐
1266 mediate is off, and the command will block until resources be‐
1267 come available. Since this option's argument is optional, for
1268 proper parsing the single letter option must be followed immedi‐
1269 ately with the value and not include a space between them. For
1270 example "-I60" and not "-I 60". This option applies to job and
1271 step allocations.
1272
1273 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
1275 stdin from the terminal to all tasks. See IO Redirection below
1276 for more options. For OS X, the poll() function does not sup‐
1277 port stdin, so input from a terminal is not possible. This op‐
1278 tion applies to job and step allocations.
1279
1280 -J, --job-name=<jobname>
1281 Specify a name for the job. The specified name will appear along
1282 with the job id number when querying running jobs on the system.
1283 The default is the supplied executable program's name. NOTE:
1284 This information may be written to the slurm_jobacct.log file.
This file is space delimited so if a space is used in the
jobname it will cause problems in properly displaying the con‐
1287 tents of the slurm_jobacct.log file when the sacct command is
1288 used. This option applies to job and step allocations.
1289
1290 --jobid=<jobid>
Initiate a job step under an already allocated job with the
specified job id. Using this option will cause srun to behave
exactly as if
1293 the SLURM_JOB_ID environment variable was set. This option ap‐
1294 plies to step allocations.
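
For example, to start a one-task step inside an existing
allocation (the job id below is a placeholder):

       srun --jobid=123456 -n1 hostname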
1295
1296 -K, --kill-on-bad-exit[=0|1]
1297 Controls whether or not to terminate a step if any task exits
1298 with a non-zero exit code. If this option is not specified, the
1299 default action will be based upon the Slurm configuration param‐
1300 eter of KillOnBadExit. If this option is specified, it will take
1301 precedence over KillOnBadExit. An option argument of zero will
1302 not terminate the job. A non-zero argument or no argument will
1303 terminate the job. Note: This option takes precedence over the
1304 -W, --wait option to terminate the job immediately if a task ex‐
1305 its with a non-zero exit code. Since this option's argument is
1306 optional, for proper parsing the single letter option must be
1307 followed immediately with the value and not include a space be‐
1308 tween them. For example "-K1" and not "-K 1".
1309
1310 -l, --label
1311 Prepend task number to lines of stdout/err. The --label option
1312 will prepend lines of output with the remote task id. This op‐
1313 tion applies to step allocations.
1314
1315 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1316 Specification of licenses (or other resources available on all
1317 nodes of the cluster) which must be allocated to this job. Li‐
1318 cense names can be followed by a colon and count (the default
1319 count is one). Multiple license names should be comma separated
1320 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1321 cations.
1322
1323 NOTE: When submitting heterogeneous jobs, license requests only
1324 work correctly when made on the first component job. For exam‐
1325 ple "srun -L ansys:2 : myexecutable".
1326
1327 --mail-type=<type>
1328 Notify user by email when certain event types occur. Valid type
1329 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1330 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1331 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1332 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1333 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1334 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1335 time limit). Multiple type values may be specified in a comma
1336 separated list. The user to be notified is indicated with
1337 --mail-user. This option applies to job allocations.
1338
1339 --mail-user=<user>
1340 User to receive email notification of state changes as defined
1341 by --mail-type. The default value is the submitting user. This
1342 option applies to job allocations.
1343
1344 --mcs-label=<mcs>
1345 Used only when the mcs/group plugin is enabled. This parameter
is a group among the groups of the user. Default value is cal‐
culated by the mcs plugin if it is enabled. This option applies
1348 to job allocations.
1349
1350 --mem=<size>[units]
1351 Specify the real memory required per node. Default units are
1352 megabytes. Different units can be specified using the suffix
1353 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
is MaxMemPerNode. If configured, both parameters can be seen
1355 using the scontrol show config command. This parameter would
1356 generally be used if whole nodes are allocated to jobs (Select‐
1357 Type=select/linear). Specifying a memory limit of zero for a
1358 job step will restrict the job step to the amount of memory al‐
1359 located to the job, but not remove any of the job's memory allo‐
1360 cation from being available to other job steps. Also see
1361 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1362 --mem-per-gpu options are mutually exclusive. If --mem,
1363 --mem-per-cpu or --mem-per-gpu are specified as command line ar‐
1364 guments, then they will take precedence over the environment
1365 (potentially inherited from salloc or sbatch).
1366
1367 NOTE: A memory size specification of zero is treated as a spe‐
1368 cial case and grants the job access to all of the memory on each
1369 node for newly submitted jobs and all available job memory to
1370 new job steps.
1371
Specifying new memory limits for job steps is only advisory.
1373
1374 If the job is allocated multiple nodes in a heterogeneous clus‐
1375 ter, the memory limit on each node will be that of the node in
1376 the allocation with the smallest memory size (same limit will
1377 apply to every node in the job's allocation).
1378
1379 NOTE: Enforcement of memory limits currently relies upon the
1380 task/cgroup plugin or enabling of accounting, which samples mem‐
1381 ory use on a periodic basis (data need not be stored, just col‐
1382 lected). In both cases memory use is based upon the job's Resi‐
1383 dent Set Size (RSS). A task may exceed the memory limit until
1384 the next periodic accounting sample.
1385
1386 This option applies to job and step allocations.
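
For example, a minimal sketch (the executable name is a
placeholder): a job allocated with 16 GB per node can run a step
that is permitted to use all of the memory allocated to the job by
requesting a memory size of zero:

       salloc -N1 --mem=16G
       srun --mem=0 -n1 ./my_step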
1387
1388 --mem-bind=[{quiet|verbose},]<type>
1389 Bind tasks to memory. Used only when the task/affinity plugin is
1390 enabled and the NUMA memory functions are available. Note that
1391 the resolution of CPU and memory binding may differ on some ar‐
1392 chitectures. For example, CPU binding may be performed at the
1393 level of the cores within a processor while memory binding will
1394 be performed at the level of nodes, where the definition of
1395 "nodes" may differ from system to system. By default no memory
1396 binding is performed; any task using any CPU can use any memory.
1397 This option is typically used to ensure that each task is bound
1398 to the memory closest to its assigned CPU. The use of any type
1399 other than "none" or "local" is not recommended. If you want
1400 greater control, try running a simple test code with the options
1401 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1402 the specific configuration.
1403
1404 NOTE: To have Slurm always report on the selected memory binding
1405 for all commands executed in a shell, you can enable verbose
1406 mode by setting the SLURM_MEM_BIND environment variable value to
1407 "verbose".
1408
1409 The following informational environment variables are set when
1410 --mem-bind is in use:
1411
1412 SLURM_MEM_BIND_LIST
1413 SLURM_MEM_BIND_PREFER
1414 SLURM_MEM_BIND_SORT
1415 SLURM_MEM_BIND_TYPE
1416 SLURM_MEM_BIND_VERBOSE
1417
1418 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1419 scription of the individual SLURM_MEM_BIND* variables.
1420
1421 Supported options include:
1422
1423 help show this help message
1424
1425 local Use memory local to the processor in use
1426
1427 map_mem:<list>
1428 Bind by setting memory masks on tasks (or ranks) as spec‐
1429 ified where <list> is
1430 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1431 ping is specified for a node and identical mapping is ap‐
1432 plied to the tasks on every node (i.e. the lowest task ID
1433 on each node is mapped to the first ID specified in the
1434 list, etc.). NUMA IDs are interpreted as decimal values
unless they are preceded with '0x' in which case they are in‐
terpreted as hexadecimal values. If the number of tasks
1437 (or ranks) exceeds the number of elements in this list,
1438 elements in the list will be reused as needed starting
1439 from the beginning of the list. To simplify support for
1440 large task counts, the lists may follow a map with an as‐
1441 terisk and repetition count. For example
1442 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1443 sults, all CPUs for each node in the job should be allo‐
1444 cated to the job.
1445
1446 mask_mem:<list>
1447 Bind by setting memory masks on tasks (or ranks) as spec‐
1448 ified where <list> is
1449 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1450 mapping is specified for a node and identical mapping is
1451 applied to the tasks on every node (i.e. the lowest task
1452 ID on each node is mapped to the first mask specified in
1453 the list, etc.). NUMA masks are always interpreted as
1454 hexadecimal values. Note that masks must be preceded
1455 with a '0x' if they don't begin with [0-9] so they are
1456 seen as numerical values. If the number of tasks (or
1457 ranks) exceeds the number of elements in this list, ele‐
1458 ments in the list will be reused as needed starting from
1459 the beginning of the list. To simplify support for large
1460 task counts, the lists may follow a mask with an asterisk
1461 and repetition count. For example "mask_mem:0*4,1*4".
1462 For predictable binding results, all CPUs for each node
1463 in the job should be allocated to the job.
1464
1465 no[ne] don't bind tasks to memory (default)
1466
1467 nosort avoid sorting free cache pages (default, LaunchParameters
1468 configuration parameter can override this default)
1469
1470 p[refer]
1471 Prefer use of first specified NUMA node, but permit
1472 use of other available NUMA nodes.
1473
1474 q[uiet]
1475 quietly bind before task runs (default)
1476
1477 rank bind by task rank (not recommended)
1478
1479 sort sort free cache pages (run zonesort on Intel KNL nodes)
1480
1481 v[erbose]
1482 verbosely report binding before task runs
1483
1484 This option applies to job and step allocations.
1485
1486 --mem-per-cpu=<size>[units]
1487 Minimum memory required per allocated CPU. Default units are
1488 megabytes. Different units can be specified using the suffix
1489 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1490 value is MaxMemPerCPU (see exception below). If configured, both
1491 parameters can be seen using the scontrol show config command.
1492 Note that if the job's --mem-per-cpu value exceeds the config‐
1493 ured MaxMemPerCPU, then the user's limit will be treated as a
1494 memory limit per task; --mem-per-cpu will be reduced to a value
1495 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1496 value of --cpus-per-task multiplied by the new --mem-per-cpu
1497 value will equal the original --mem-per-cpu value specified by
1498 the user. This parameter would generally be used if individual
1499 processors are allocated to jobs (SelectType=select/cons_res).
1500 If resources are allocated by core, socket, or whole nodes, then
1501 the number of CPUs allocated to a job may be higher than the
1502 task count and the value of --mem-per-cpu should be adjusted ac‐
1503 cordingly. Specifying a memory limit of zero for a job step
1504 will restrict the job step to the amount of memory allocated to
1505 the job, but not remove any of the job's memory allocation from
1506 being available to other job steps. Also see --mem and
1507 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu op‐
1508 tions are mutually exclusive.
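
As a worked example of that adjustment (the limit below is purely
illustrative): with MaxMemPerCPU=2G, a request of --mem-per-cpu=8G
for a single-CPU task would be converted to --cpus-per-task=4 and
--mem-per-cpu=2G, since 4 x 2G equals the 8G per task originally
requested.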
1509
1510 NOTE: If the final amount of memory requested by a job can't be
1511 satisfied by any of the nodes configured in the partition, the
1512 job will be rejected. This could happen if --mem-per-cpu is
1513 used with the --exclusive option for a job allocation and
1514 --mem-per-cpu times the number of CPUs on a node is greater than
1515 the total memory of that node.
1516
1517 --mem-per-gpu=<size>[units]
1518 Minimum memory required per allocated GPU. Default units are
1519 megabytes. Different units can be specified using the suffix
1520 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1521 both a global and per partition basis. If configured, the pa‐
1522 rameters can be seen using the scontrol show config and scontrol
1523 show partition commands. Also see --mem. The --mem,
1524 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1525
1526 --mincpus=<n>
1527 Specify a minimum number of logical cpus/processors per node.
1528 This option applies to job allocations.
1529
1530 --mpi=<mpi_type>
1531 Identify the type of MPI to be used. May result in unique initi‐
1532 ation procedures.
1533
1534 list Lists available mpi types to choose from.
1535
1536 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1537 only if the MPI implementation supports it, in other
1538 words if the MPI has the PMI2 interface implemented. The
1539 --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1540 which provides the server side functionality but the
1541 client side must implement PMI2_Init() and the other in‐
1542 terface calls.
1543
1544 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1545 support in Slurm can be used to launch parallel applica‐
1546 tions (e.g. MPI) if it supports PMIx, PMI2 or PMI1. Slurm
1547 must be configured with pmix support by passing
1548 "--with-pmix=<PMIx installation path>" option to its
1549 "./configure" script.
1550
1551 At the time of writing PMIx is supported in Open MPI
1552 starting from version 2.0. PMIx also supports backward
1553 compatibility with PMI1 and PMI2 and can be used if MPI
1554 was configured with PMI2/PMI1 support pointing to the
1555 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1556 doesn't provide the way to point to a specific implemen‐
1557 tation, a hack'ish solution leveraging LD_PRELOAD can be
1558 used to force "libpmix" usage.
1559
1560 none No special MPI processing. This is the default and works
1561 with many other versions of MPI.
1562
1563 This option applies to step allocations.
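
For example (the executable name is a placeholder, and pmix is
only usable if Slurm was built with PMIx support):

       srun --mpi=list
       srun --mpi=pmix -n 64 ./mpi_app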
1564
1565 --msg-timeout=<seconds>
1566 Modify the job launch message timeout. The default value is
1567 MessageTimeout in the Slurm configuration file slurm.conf.
1568 Changes to this are typically not recommended, but could be use‐
1569 ful to diagnose problems. This option applies to job alloca‐
1570 tions.
1571
1572 --multi-prog
1573 Run a job with different programs and different arguments for
1574 each task. In this case, the executable program specified is ac‐
1575 tually a configuration file specifying the executable and argu‐
1576 ments for each task. See MULTIPLE PROGRAM CONFIGURATION below
1577 for details on the configuration file contents. This option ap‐
1578 plies to step allocations.
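
A minimal sketch (file and program names are placeholders; each
line gives a task rank or rank range followed by the program to
run). The exact file format is described in MULTIPLE PROGRAM
CONFIGURATION below. With a configuration file multi.conf
containing

       0      ./controller
       1-3    ./worker

the four tasks could be launched with

       srun -n4 --multi-prog multi.conf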
1579
1580 --network=<type>
1581 Specify information pertaining to the switch or network. The
1582 interpretation of type is system dependent. This option is sup‐
1583 ported when running Slurm on a Cray natively. It is used to re‐
1584 quest using Network Performance Counters. Only one value per
request is valid. All options are case-insensitive. In this
1586 configuration supported values include:
1587
1588 system
1589 Use the system-wide network performance counters. Only
1590 nodes requested will be marked in use for the job alloca‐
1591 tion. If the job does not fill up the entire system the
1592 rest of the nodes are not able to be used by other jobs
using NPC; if idle, their state will appear as PerfCnts.
1594 These nodes are still available for other jobs not using
1595 NPC.
1596
1597 blade Use the blade network performance counters. Only nodes re‐
1598 quested will be marked in use for the job allocation. If
1599 the job does not fill up the entire blade(s) allocated to
1600 the job those blade(s) are not able to be used by other
jobs using NPC; if idle, their state will appear as PerfC‐
1602 nts. These nodes are still available for other jobs not
1603 using NPC.
1604
1605 In all cases the job allocation request must specify the --exclusive
1606 option and the step cannot specify the --overlap option. Otherwise the
1607 request will be denied.
1608
1609 Also with any of these options steps are not allowed to share blades,
1610 so resources would remain idle inside an allocation if the step running
1611 on a blade does not take up all the nodes on the blade.
1612
1613 The network option is also supported on systems with IBM's Parallel En‐
1614 vironment (PE). See IBM's LoadLeveler job command keyword documenta‐
1615 tion about the keyword "network" for more information. Multiple values
may be specified in a comma separated list. All options are
case-insensitive. Supported values include:
1618
1619 BULK_XFER[=<resources>]
1620 Enable bulk transfer of data using Remote Di‐
1621 rect-Memory Access (RDMA). The optional resources
1622 specification is a numeric value which can have a
1623 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1624 bytes, megabytes or gigabytes. NOTE: The resources
1625 specification is not supported by the underlying IBM
1626 infrastructure as of Parallel Environment version
1627 2.2 and no value should be specified at this time.
1628 The devices allocated to a job must all be of the
same type. The default value depends upon what
hardware is available and in order of
1631 preferences is IPONLY (which is not considered in
1632 User Space mode), HFI, IB, HPCE, and KMUX.
1633
1634 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1635 quired. Applies only to IBM Power7-IH processors.
1636 Default value is zero. Independent CAU will be al‐
1637 located for each programming interface (MPI, LAPI,
1638 etc.)
1639
1640 DEVNAME=<name>
1641 Specify the device name to use for communications
1642 (e.g. "eth0" or "mlx4_0").
1643
1644 DEVTYPE=<type>
1645 Specify the device type to use for communications.
1646 The supported values of type are: "IB" (InfiniBand),
1647 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Kernel
Emulation of HPCE). The devices allocated to a job must
all be of the same type. The default value depends upon
what hardware is available and in order of preferences is
1654 IPONLY (which is not considered in User Space mode),
1655 HFI, IB, HPCE, and KMUX.
1656
1657 IMMED =<count>
1658 Number of immediate send slots per window required.
1659 Applies only to IBM Power7-IH processors. Default
1660 value is zero.
1661
1662 INSTANCES =<count>
Specify the number of network connections for each
task on each network. The default instance
1665 count is 1.
1666
1667 IPV4 Use Internet Protocol (IP) version 4 communications
1668 (default).
1669
1670 IPV6 Use Internet Protocol (IP) version 6 communications.
1671
1672 LAPI Use the LAPI programming interface.
1673
1674 MPI Use the MPI programming interface. MPI is the de‐
1675 fault interface.
1676
1677 PAMI Use the PAMI programming interface.
1678
1679 SHMEM Use the OpenSHMEM programming interface.
1680
1681 SN_ALL Use all available switch networks (default).
1682
1683 SN_SINGLE Use one available switch network.
1684
1685 UPC Use the UPC programming interface.
1686
1687 US Use User Space communications.
1688
1689 Some examples of network specifications:
1690
1691 Instances=2,US,MPI,SN_ALL
1692 Create two user space connections for MPI communications
1693 on every switch network for each task.
1694
1695 US,MPI,Instances=3,Devtype=IB
1696 Create three user space connections for MPI communica‐
1697 tions on every InfiniBand network for each task.
1698
1699 IPV4,LAPI,SN_Single
Create an IP version 4 connection for LAPI communications
1701 on one switch network for each task.
1702
1703 Instances=2,US,LAPI,MPI
1704 Create two user space connections each for LAPI and MPI
1705 communications on every switch network for each task.
1706 Note that SN_ALL is the default option so every switch
1707 network is used. Also note that Instances=2 specifies
1708 that two connections are established for each protocol
1709 (LAPI and MPI) and each task. If there are two networks
1710 and four tasks on the node then a total of 32 connections
1711 are established (2 instances x 2 protocols x 2 networks x
1712 4 tasks).
1713
1714 This option applies to job and step allocations.
1715
1716 --nice[=adjustment]
1717 Run the job with an adjusted scheduling priority within Slurm.
1718 With no adjustment value the scheduling priority is decreased by
1719 100. A negative nice value increases the priority, otherwise de‐
1720 creases it. The adjustment range is +/- 2147483645. Only privi‐
1721 leged users can specify a negative adjustment.
1722
1723 -Z, --no-allocate
1724 Run the specified tasks on a set of nodes without creating a
1725 Slurm "job" in the Slurm queue structure, bypassing the normal
1726 resource allocation step. The list of nodes must be specified
1727 with the -w, --nodelist option. This is a privileged option
1728 only available for the users "SlurmUser" and "root". This option
1729 applies to job allocations.
1730
1731 -k, --no-kill[=off]
1732 Do not automatically terminate a job if one of the nodes it has
1733 been allocated fails. This option applies to job and step allo‐
1734 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1736 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1737 --wait options will have no effect upon the job step). The ac‐
1738 tive job step (MPI job) will likely suffer a fatal error, but
1739 subsequent job steps may be run if this option is specified.
1740
Specify an optional argument of "off" to disable the effect of
the SLURM_NO_KILL environment variable.
1743
1744 The default action is to terminate the job upon node failure.
1745
1746 -F, --nodefile=<node_file>
1747 Much like --nodelist, but the list is contained in a file of
name node_file. The node names of the list may also span multi‐
1749 ple lines in the file. Duplicate node names in the file will
1750 be ignored. The order of the node names in the list is not im‐
1751 portant; the node names will be sorted by Slurm.
1752
1753 -w, --nodelist={<node_name_list>|<filename>}
1754 Request a specific list of hosts. The job will contain all of
1755 these hosts and possibly additional hosts as needed to satisfy
1756 resource requirements. The list may be specified as a
1757 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1758 for example), or a filename. The host list will be assumed to
1759 be a filename if it contains a "/" character. If you specify a
1760 minimum node or processor count larger than can be satisfied by
1761 the supplied host list, additional resources will be allocated
1762 on other nodes as needed. Rather than repeating a host name
1763 multiple times, an asterisk and a repetition count may be ap‐
1764 pended to a host name. For example "host1,host1" and "host1*2"
1765 are equivalent. If the number of tasks is given and a list of
1766 requested nodes is also given, the number of nodes used from
1767 that list will be reduced to match that of the number of tasks
1768 if the number of nodes in the list is greater than the number of
1769 tasks. This option applies to job and step allocations.
1770
1771 -N, --nodes=<minnodes>[-maxnodes]
1772 Request that a minimum of minnodes nodes be allocated to this
1773 job. A maximum node count may also be specified with maxnodes.
1774 If only one number is specified, this is used as both the mini‐
1775 mum and maximum node count. The partition's node limits super‐
1776 sede those of the job. If a job's node limits are outside of
1777 the range permitted for its associated partition, the job will
1778 be left in a PENDING state. This permits possible execution at
1779 a later time, when the partition limit is changed. If a job
1780 node limit exceeds the number of nodes configured in the parti‐
1781 tion, the job will be rejected. Note that the environment vari‐
1782 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1783 ibility) will be set to the count of nodes actually allocated to
1784 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1785 tion. If -N is not specified, the default behavior is to allo‐
1786 cate enough nodes to satisfy the requested resources as ex‐
1787 pressed by per-job specification options, e.g. -n, -c and
1788 --gpus. The job will be allocated as many nodes as possible
1789 within the range specified and without delaying the initiation
1790 of the job. If the number of tasks is given and a number of re‐
1791 quested nodes is also given, the number of nodes used from that
1792 request will be reduced to match that of the number of tasks if
1793 the number of nodes in the request is greater than the number of
1794 tasks. The node count specification may include a numeric value
1795 followed by a suffix of "k" (multiplies numeric value by 1,024)
1796 or "m" (multiplies numeric value by 1,048,576). This option ap‐
1797 plies to job and step allocations.
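
For example (the executable name is a placeholder), request
between two and four nodes for eight tasks:

       srun -N2-4 -n8 ./my_app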
1798
1799 -n, --ntasks=<number>
1800 Specify the number of tasks to run. Request that srun allocate
1801 resources for ntasks tasks. The default is one task per node,
1802 but note that the --cpus-per-task option will change this de‐
1803 fault. This option applies to job and step allocations.
1804
1805 --ntasks-per-core=<ntasks>
1806 Request the maximum ntasks be invoked on each core. This option
1807 applies to the job allocation, but not to step allocations.
1808 Meant to be used with the --ntasks option. Related to
1809 --ntasks-per-node except at the core level instead of the node
1810 level. Masks will automatically be generated to bind the tasks
1811 to specific cores unless --cpu-bind=none is specified. NOTE:
1812 This option is not supported when using SelectType=select/lin‐
1813 ear.
1814
1815 --ntasks-per-gpu=<ntasks>
1816 Request that there are ntasks tasks invoked for every GPU. This
1817 option can work in two ways: 1) either specify --ntasks in addi‐
1818 tion, in which case a type-less GPU specification will be auto‐
1819 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1820 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1821 --ntasks, and the total task count will be automatically deter‐
1822 mined. The number of CPUs needed will be automatically in‐
1823 creased if necessary to allow for any calculated task count.
1824 This option will implicitly set --gpu-bind=single:<ntasks>, but
1825 that can be overridden with an explicit --gpu-bind specifica‐
1826 tion. This option is not compatible with a node range (i.e.
1827 -N<minnodes-maxnodes>). This option is not compatible with
1828 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1829 option is not supported unless SelectType=cons_tres is config‐
1830 ured (either directly or indirectly on Cray systems).
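
For example, a sketch assuming SelectType=cons_tres and a
placeholder executable: requesting four GPUs with two tasks per
GPU results in eight tasks in total:

       srun --gpus=4 --ntasks-per-gpu=2 ./my_app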
1831
1832 --ntasks-per-node=<ntasks>
1833 Request that ntasks be invoked on each node. If used with the
1834 --ntasks option, the --ntasks option will take precedence and
1835 the --ntasks-per-node will be treated as a maximum count of
1836 tasks per node. Meant to be used with the --nodes option. This
1837 is related to --cpus-per-task=ncpus, but does not require knowl‐
1838 edge of the actual number of cpus on each node. In some cases,
1839 it is more convenient to be able to request that no more than a
1840 specific number of tasks be invoked on each node. Examples of
1841 this include submitting a hybrid MPI/OpenMP app where only one
1842 MPI "task/rank" should be assigned to each node while allowing
1843 the OpenMP portion to utilize all of the parallelism present in
1844 the node, or submitting a single setup/cleanup/monitoring job to
1845 each node of a pre-existing allocation as one step in a larger
1846 job script. This option applies to job allocations.
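
For example, a sketch of the hybrid MPI/OpenMP case described
above (executable name and CPU count are placeholders): one task
per node on four nodes, with 16 CPUs available to each task for
its OpenMP threads:

       srun -N4 --ntasks-per-node=1 -c 16 ./hybrid_app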
1847
1848 --ntasks-per-socket=<ntasks>
1849 Request the maximum ntasks be invoked on each socket. This op‐
1850 tion applies to the job allocation, but not to step allocations.
1851 Meant to be used with the --ntasks option. Related to
1852 --ntasks-per-node except at the socket level instead of the node
1853 level. Masks will automatically be generated to bind the tasks
1854 to specific sockets unless --cpu-bind=none is specified. NOTE:
1855 This option is not supported when using SelectType=select/lin‐
1856 ear.
1857
1858 --open-mode={append|truncate}
1859 Open the output and error files using append or truncate mode as
1860 specified. For heterogeneous job steps the default value is
1861 "append". Otherwise the default value is specified by the sys‐
1862 tem configuration parameter JobFileAppend. This option applies
1863 to job and step allocations.
1864
1865 -o, --output=<filename_pattern>
1866 Specify the "filename pattern" for stdout redirection. By de‐
1867 fault in interactive mode, srun collects stdout from all tasks
1868 and sends this output via TCP/IP to the attached terminal. With
1869 --output stdout may be redirected to a file, to one file per
1870 task, or to /dev/null. See section IO Redirection below for the
1871 various forms of filename pattern. If the specified file al‐
1872 ready exists, it will be overwritten.
1873
1874 If --error is not also specified on the command line, both std‐
out and stderr will be directed to the file specified by --output.
1876 This option applies to job and step allocations.
1877
1878 -O, --overcommit
1879 Overcommit resources. This option applies to job and step allo‐
1880 cations.
1881
1882 When applied to a job allocation (not including jobs requesting
1883 exclusive access to the nodes) the resources are allocated as if
1884 only one task per node is requested. This means that the re‐
1885 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1886 cated per node rather than being multiplied by the number of
1887 tasks. Options used to specify the number of tasks per node,
1888 socket, core, etc. are ignored.
1889
1890 When applied to job step allocations (the srun command when exe‐
1891 cuted within an existing job allocation), this option can be
1892 used to launch more than one task per CPU. Normally, srun will
1893 not allocate more than one process per CPU. By specifying
1894 --overcommit you are explicitly allowing more than one process
1895 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1896 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
in the file slurm.h and is not a variable; it is set at Slurm
1898 build time.
1899
1900 --overlap
1901 Allow steps to overlap each other on the CPUs. By default steps
1902 do not share CPUs with other parallel steps.
1903
1904 -s, --oversubscribe
1905 The job allocation can over-subscribe resources with other run‐
1906 ning jobs. The resources to be over-subscribed can be nodes,
1907 sockets, cores, and/or hyperthreads depending upon configura‐
1908 tion. The default over-subscribe behavior depends on system
1909 configuration and the partition's OverSubscribe option takes
1910 precedence over the job's option. This option may result in the
1911 allocation being granted sooner than if the --oversubscribe op‐
1912 tion was not set and allow higher system utilization, but appli‐
1913 cation performance will likely suffer due to competition for re‐
1914 sources. This option applies to step allocations.
1915
1916 -p, --partition=<partition_names>
1917 Request a specific partition for the resource allocation. If
1918 not specified, the default behavior is to allow the slurm con‐
1919 troller to select the default partition as designated by the
1920 system administrator. If the job can use more than one parti‐
tion, specify their names in a comma separated list and the one
1922 offering earliest initiation will be used with no regard given
1923 to the partition name ordering (although higher priority parti‐
1924 tions will be considered first). When the job is initiated, the
1925 name of the partition used will be placed first in the job
1926 record partition string. This option applies to job allocations.
1927
1928 --power=<flags>
1929 Comma separated list of power management plugin options. Cur‐
1930 rently available flags include: level (all nodes allocated to
1931 the job should have identical power caps, may be disabled by the
1932 Slurm configuration option PowerParameters=job_no_level). This
1933 option applies to job allocations.
1934
1935 -E, --preserve-env
1936 Pass the current values of environment variables
1937 SLURM_JOB_NUM_NODES and SLURM_NTASKS through to the executable,
1938 rather than computing them from command line parameters. This
1939 option applies to job allocations.
1940
1941 --priority=<value>
1942 Request a specific job priority. May be subject to configura‐
1943 tion specific constraints. value should either be a numeric
1944 value or "TOP" (for highest possible value). Only Slurm opera‐
1945 tors and administrators can set the priority of a job. This op‐
1946 tion applies to job allocations only.
1947
1948 --profile={all|none|<type>[,<type>...]}
1949 Enables detailed data collection by the acct_gather_profile
1950 plugin. Detailed data are typically time-series that are stored
1951 in an HDF5 file for the job or an InfluxDB database depending on
1952 the configured plugin. This option applies to job and step al‐
1953 locations.
1954
1955 All All data types are collected. (Cannot be combined with
1956 other values.)
1957
1958 None No data types are collected. This is the default.
1959 (Cannot be combined with other values.)
1960
1961 Valid type values are:
1962
1963 Energy Energy data is collected.
1964
1965 Task Task (I/O, Memory, ...) data is collected.
1966
1967 Filesystem
1968 Filesystem data is collected.
1969
1970 Network
1971 Network (InfiniBand) data is collected.
1972
1973 --prolog=<executable>
1974 srun will run executable just before launching the job step.
1975 The command line arguments for executable will be the command
1976 and arguments of the job step. If executable is "none", then no
1977 srun prolog will be run. This parameter overrides the SrunProlog
1978 parameter in slurm.conf. This parameter is completely indepen‐
1979 dent from the Prolog parameter in slurm.conf. This option ap‐
1980 plies to job allocations.
1981
1982 --propagate[=rlimit[,rlimit...]]
1983 Allows users to specify which of the modifiable (soft) resource
1984 limits to propagate to the compute nodes and apply to their
1985 jobs. If no rlimit is specified, then all resource limits will
1986 be propagated. The following rlimit names are supported by
1987 Slurm (although some options may not be supported on some sys‐
1988 tems):
1989
1990 ALL All limits listed below (default)
1991
1992 NONE No limits listed below
1993
1994 AS The maximum address space (virtual memory) for a
1995 process.
1996
1997 CORE The maximum size of core file
1998
1999 CPU The maximum amount of CPU time
2000
2001 DATA The maximum size of a process's data segment
2002
2003 FSIZE The maximum size of files created. Note that if the
2004 user sets FSIZE to less than the current size of the
2005 slurmd.log, job launches will fail with a 'File size
2006 limit exceeded' error.
2007
2008 MEMLOCK The maximum size that may be locked into memory
2009
2010 NOFILE The maximum number of open files
2011
2012 NPROC The maximum number of processes available
2013
2014 RSS The maximum resident set size. Note that this only has
2015 effect with Linux kernels 2.4.30 or older or BSD.
2016
2017 STACK The maximum stack size
2018
2019 This option applies to job allocations.
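
For example (the executable name is a placeholder), propagate only
the locked-memory and open-file limits:

       srun --propagate=MEMLOCK,NOFILE ./my_app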
2020
2021 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2022 --unbuffered. Implicitly sets --error and --output to /dev/null
2023 for all tasks except task zero, which may cause those tasks to
2024 exit immediately (e.g. shells will typically exit immediately in
2025 that situation). This option applies to step allocations.
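
For example, to obtain an interactive shell on an allocated node
(assuming bash is available there):

       srun -n1 --pty bash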
2026
2027 -q, --qos=<qos>
2028 Request a quality of service for the job. QOS values can be de‐
2029 fined for each user/cluster/account association in the Slurm
2030 database. Users will be limited to their association's defined
2031 set of qos's when the Slurm configuration parameter, Account‐
2032 ingStorageEnforce, includes "qos" in its definition. This option
2033 applies to job allocations.
2034
2035 -Q, --quiet
2036 Suppress informational messages from srun. Errors will still be
2037 displayed. This option applies to job and step allocations.
2038
2039 --quit-on-interrupt
2040 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2041 disables the status feature normally available when srun re‐
2042 ceives a single Ctrl-C and causes srun to instead immediately
2043 terminate the running job. This option applies to step alloca‐
2044 tions.
2045
2046 --reboot
2047 Force the allocated nodes to reboot before starting the job.
2048 This is only supported with some system configurations and will
2049 otherwise be silently ignored. Only root, SlurmUser or admins
2050 can reboot nodes. This option applies to job allocations.
2051
2052 -r, --relative=<n>
2053 Run a job step relative to node n of the current allocation.
2054 This option may be used to spread several job steps out among
2055 the nodes of the current job. If -r is used, the current job
2056 step will begin at node n of the allocated nodelist, where the
2057 first node is considered node 0. The -r option is not permitted
2058 with -w or -x option and will result in a fatal error when not
2059 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2060 set). The default for n is 0. If the value of --nodes exceeds
2061 the number of nodes identified with the --relative option, a
2062 warning message will be printed and the --relative option will
2063 take precedence. This option applies to step allocations.
2064
2065 --reservation=<reservation_names>
2066 Allocate resources for the job from the named reservation. If
2067 the job can use more than one reservation, specify their names
in a comma separated list and the one offering the earliest
initiation will be used. Each reservation will be considered
in the order it was
2070 requested. All reservations will be listed in scontrol/squeue
2071 through the life of the job. In accounting the first reserva‐
2072 tion will be seen and after the job starts the reservation used
2073 will replace it.
2074
2075 --resv-ports[=count]
2076 Reserve communication ports for this job. Users can specify the
number of ports they want to reserve. The parameter Mpi‐
Params=ports=12000-12999 must be specified in slurm.conf. If not
specified and Slurm's OpenMPI plugin is used, then by default
the number of reserved ports is equal to the highest number of
tasks on any node in the job step allocation. If the number of
reserved ports is zero then no ports are reserved. Used for
OpenMPI. This
2083 option applies to job and step allocations.
2084
2085 --send-libs[=yes|no]
2086 If set to yes (or no argument), autodetect and broadcast the ex‐
2087 ecutable's shared object dependencies to allocated compute
2088 nodes. The files are placed in a directory alongside the exe‐
2089 cutable. The LD_LIBRARY_PATH is automatically updated to include
2090 this cache directory as well. This overrides the default behav‐
2091 ior configured in slurm.conf SbcastParameters send_libs. This
2092 option only works in conjunction with --bcast. See also
2093 --bcast-exclude.
2094
2095 --signal=[R:]<sig_num>[@sig_time]
2096 When a job is within sig_time seconds of its end time, send it
2097 the signal sig_num. Due to the resolution of event handling by
2098 Slurm, the signal may be sent up to 60 seconds earlier than
2099 specified. sig_num may either be a signal number or name (e.g.
2100 "10" or "USR1"). sig_time must have an integer value between 0
2101 and 65535. By default, no signal is sent before the job's end
2102 time. If a sig_num is specified without any sig_time, the de‐
2103 fault time will be 60 seconds. This option applies to job allo‐
2104 cations. Use the "R:" option to allow this job to overlap with
2105 a reservation with MaxStartDelay set. To have the signal sent
2106 at preemption time see the preempt_send_user_signal SlurmctldPa‐
2107 rameter.
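
For example (the executable name is a placeholder), ask Slurm to
send SIGUSR1 roughly five minutes before the job's end time:

       srun --signal=USR1@300 ./my_app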
2108
2109 --slurmd-debug=<level>
2110 Specify a debug level for slurmd(8). The level may be specified
as either an integer value between 0 [quiet, only errors are dis‐
played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2113
2114 quiet Log nothing
2115
2116 fatal Log only fatal errors
2117
2118 error Log only errors
2119
2120 info Log errors and general informational messages
2121
2122 verbose Log errors and verbose informational messages
2123
2124 The slurmd debug information is copied onto the stderr of the
2125 job. By default only errors are displayed. This option applies
2126 to job and step allocations.
2127
2128 --sockets-per-node=<sockets>
2129 Restrict node selection to nodes with at least the specified
2130 number of sockets. See additional information under -B option
2131 above when task/affinity plugin is enabled. This option applies
2132 to job allocations.
2133 NOTE: This option may implicitly impact the number of tasks if
2134 -n was not specified.
2135
2136 --spread-job
2137 Spread the job allocation over as many nodes as possible and at‐
2138 tempt to evenly distribute tasks across the allocated nodes.
2139 This option disables the topology/tree plugin. This option ap‐
2140 plies to job allocations.
2141
2142 --switches=<count>[@max-time]
2143 When a tree topology is used, this defines the maximum count of
2144 leaf switches desired for the job allocation and optionally the
2145 maximum time to wait for that number of switches. If Slurm finds
2146 an allocation containing more switches than the count specified,
2147 the job remains pending until it either finds an allocation with
desired switch count or the time limit expires. If there is no
2149 switch count limit, there is no delay in starting the job. Ac‐
2150 ceptable time formats include "minutes", "minutes:seconds",
2151 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2152 "days-hours:minutes:seconds". The job's maximum time delay may
2153 be limited by the system administrator using the SchedulerParam‐
2154 eters configuration parameter with the max_switch_wait parameter
2155 option. On a dragonfly network the only switch count supported
2156 is 1 since communication performance will be highest when a job
is allocated resources on one leaf switch or more than 2 leaf
switches. The default max-time is the max_switch_wait Sched‐
ulerParameters value. This option applies to job allocations.
2160
2161 --task-epilog=<executable>
2162 The slurmstepd daemon will run executable just after each task
2163 terminates. This will be executed before any TaskEpilog parame‐
2164 ter in slurm.conf is executed. This is meant to be a very
2165 short-lived program. If it fails to terminate within a few sec‐
2166 onds, it will be killed along with any descendant processes.
2167 This option applies to step allocations.
2168
2169 --task-prolog=<executable>
2170 The slurmstepd daemon will run executable just before launching
2171 each task. This will be executed after any TaskProlog parameter
2172 in slurm.conf is executed. Besides the normal environment vari‐
2173 ables, this has SLURM_TASK_PID available to identify the process
2174 ID of the task being started. Standard output from this program
2175 of the form "export NAME=value" will be used to set environment
2176 variables for the task being spawned. This option applies to
2177 step allocations.
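
A minimal sketch of a task prolog (the script path and variable
name are placeholders); any "export NAME=value" line it prints on
standard output becomes an environment variable for the task.
With a script task_prolog.sh containing

       #!/bin/sh
       echo "export MY_TASK_PID=$SLURM_TASK_PID"

the tasks could be launched with

       srun --task-prolog=./task_prolog.sh -n4 ./my_app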
2178
2179 --test-only
2180 Returns an estimate of when a job would be scheduled to run
2181 given the current job queue and all the other srun arguments
2182 specifying the job. This limits srun's behavior to just return
2183 information; no job is actually submitted. The program will be
2184 executed directly by the slurmd daemon. This option applies to
2185 job allocations.
2186
2187 --thread-spec=<num>
2188 Count of specialized threads per node reserved by the job for
2189 system operations and not used by the application. The applica‐
2190 tion will not use these threads, but will be charged for their
2191 allocation. This option can not be used with the --core-spec
2192 option. This option applies to job allocations.
2193
2194 -T, --threads=<nthreads>
2195 Allows limiting the number of concurrent threads used to send
2196 the job request from the srun process to the slurmd processes on
2197 the allocated nodes. Default is to use one thread per allocated
2198 node up to a maximum of 60 concurrent threads. Specifying this
2199 option limits the number of concurrent threads to nthreads (less
2200 than or equal to 60). This should only be used to set a low
2201 thread count for testing on very small memory computers. This
2202 option applies to job allocations.
2203
2204 --threads-per-core=<threads>
2205 Restrict node selection to nodes with at least the specified
2206 number of threads per core. In task layout, use the specified
2207 maximum number of threads per core. Implies --cpu-bind=threads
2208 unless overridden by command line or environment options. NOTE:
2209 "Threads" refers to the number of processing units on each core
2210 rather than the number of application tasks to be launched per
2211 core. See additional information under -B option above when
2212 task/affinity plugin is enabled. This option applies to job and
2213 step allocations.
2214 NOTE: This option may implicitly impact the number of tasks if
2215 -n was not specified.
2216
2217 -t, --time=<time>
2218 Set a limit on the total run time of the job allocation. If the
2219 requested time limit exceeds the partition's time limit, the job
2220 will be left in a PENDING state (possibly indefinitely). The
2221 default time limit is the partition's default time limit. When
2222 the time limit is reached, each task in each job step is sent
2223 SIGTERM followed by SIGKILL. The interval between signals is
2224 specified by the Slurm configuration parameter KillWait. The
2225 OverTimeLimit configuration parameter may permit the job to run
2226 longer than scheduled. Time resolution is one minute and second
2227 values are rounded up to the next minute.
2228
2229 A time limit of zero requests that no time limit be imposed.
2230 Acceptable time formats include "minutes", "minutes:seconds",
2231 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2232 "days-hours:minutes:seconds". This option applies to job and
2233 step allocations.
2234
2235 --time-min=<time>
2236 Set a minimum time limit on the job allocation. If specified,
2237 the job may have its --time limit lowered to a value no lower
2238 than --time-min if doing so permits the job to begin execution
2239 earlier than otherwise possible. The job's time limit will not
2240 be changed after the job is allocated resources. This is per‐
2241 formed by a backfill scheduling algorithm to allocate resources
2242 otherwise reserved for higher priority jobs. Acceptable time
2243 formats include "minutes", "minutes:seconds", "hours:min‐
2244 utes:seconds", "days-hours", "days-hours:minutes" and
2245 "days-hours:minutes:seconds". This option applies to job alloca‐
2246 tions.
2247
2248 --tmp=<size>[units]
2249 Specify a minimum amount of temporary disk space per node. De‐
2250 fault units are megabytes. Different units can be specified us‐
2251 ing the suffix [K|M|G|T]. This option applies to job alloca‐
2252 tions.
2253
2254 --uid=<user>
2255 Attempt to submit and/or run a job as user instead of the invok‐
2256 ing user id. The invoking user's credentials will be used to
2257 check access permissions for the target partition. User root may
2258 use this option to run jobs as a normal user in a RootOnly par‐
2259 tition for example. If run as root, srun will drop its permis‐
2260 sions to the uid specified after node allocation is successful.
2261 user may be the user name or numerical user ID. This option ap‐
2262 plies to job and step allocations.
2263
2264 -u, --unbuffered
2265 By default, the connection between slurmstepd and the
2266 user-launched application is over a pipe. The stdio output writ‐
2267 ten by the application is buffered by the glibc until it is
2268 flushed or the output is set as unbuffered. See setbuf(3). If
2269 this option is specified the tasks are executed with a pseudo
2270 terminal so that the application output is unbuffered. This op‐
2271 tion applies to step allocations.
2272
2273 --usage
2274 Display brief help message and exit.
2275
2276 --use-min-nodes
2277 If a range of node counts is given, prefer the smaller count.
2278
2279 -v, --verbose
2280 Increase the verbosity of srun's informational messages. Multi‐
2281 ple -v's will further increase srun's verbosity. By default
2282 only errors will be displayed. This option applies to job and
2283 step allocations.
2284
2285 -V, --version
2286 Display version information and exit.
2287
2288 -W, --wait=<seconds>
2289 Specify how long to wait after the first task terminates before
2290 terminating all remaining tasks. A value of 0 indicates an un‐
2291 limited wait (a warning will be issued after 60 seconds). The
2292 default value is set by the WaitTime parameter in the slurm con‐
2293 figuration file (see slurm.conf(5)). This option can be useful
2294 to ensure that a job is terminated in a timely fashion in the
2295 event that one or more tasks terminate prematurely. Note: The
2296 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2297 to terminate the job immediately if a task exits with a non-zero
2298 exit code. This option applies to job allocations.
2299
2300 --wckey=<wckey>
2301 Specify wckey to be used with job. If TrackWCKey=no (default)
2302 in the slurm.conf this value is ignored. This option applies to
2303 job allocations.
2304
2305 --x11[={all|first|last}]
2306 Sets up X11 forwarding on "all", "first" or "last" node(s) of
2307 the allocation. This option is only enabled if Slurm was com‐
2308 piled with X11 support and PrologFlags=x11 is defined in the
2309 slurm.conf. Default is "all".
2310
2311 srun will submit the job request to the slurm job controller, then ini‐
2312 tiate all processes on the remote nodes. If the request cannot be met
2313 immediately, srun will block until the resources are free to run the
2314 job. If the -I (--immediate) option is specified srun will terminate if
2315 resources are not immediately available.
2316
2317 When initiating remote processes srun will propagate the current work‐
2318 ing directory, unless --chdir=<path> is specified, in which case path
2319 will become the working directory for the remote processes.
2320
2321 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2322 cated to the job. When specifying only the number of processes to run
2323 with -n, a default of one CPU per process is allocated. By specifying
2324 the number of CPUs required per task (-c), more than one CPU may be al‐
2325 located per process. If the number of nodes is specified with -N, srun
2326 will attempt to allocate at least the number of nodes specified.
2327
2328 Combinations of the above three options may be used to change how pro‐
2329 cesses are distributed across nodes and cpus. For instance, by specify‐
2330 ing both the number of processes and number of nodes on which to run,
2331 the number of processes per node is implied. However, if the number of
CPUs per process is more important, then the number of processes (-n) and
2333 the number of CPUs per process (-c) should be specified.
2334
2335 srun will refuse to allocate more than one process per CPU unless
2336 --overcommit (-O) is also specified.
2337
2338 srun will attempt to meet the above specifications "at a minimum." That
2339 is, if 16 nodes are requested for 32 processes, and some nodes do not
2340 have 2 CPUs, the allocation of nodes will be increased in order to meet
2341 the demand for CPUs. In other words, a minimum of 16 nodes are being
2342 requested. However, if 16 nodes are requested for 15 processes, srun
2343 will consider this an error, as 15 processes cannot run across 16
2344 nodes.
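
For example (./a.out is a placeholder executable):

       srun -N16 -n32 ./a.out    # at least 16 nodes for 32 processes
       srun -n32 -c4 ./a.out     # 32 processes, 4 CPUs allocated to each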
2345
2346
2347 IO Redirection
2348
2349 By default, stdout and stderr will be redirected from all tasks to the
2350 stdout and stderr of srun, and stdin will be redirected from the stan‐
2351 dard input of srun to all remote tasks. If stdin is only to be read by
2352 a subset of the spawned tasks, specifying a file to read from rather
2353 than forwarding stdin from the srun command may be preferable as it
2354 avoids moving and storing data that will never be read.
2355
2356 For OS X, the poll() function does not support stdin, so input from a
2357 terminal is not possible.
2358
2359 This behavior may be changed with the --output, --error, and --input
2360 (-o, -e, -i) options. Valid format specifications for these options are
2361
2362
all stdout and stderr are redirected from all tasks to srun. stdin is
2364 broadcast to all remote tasks. (This is the default behav‐
2365 ior)
2366
none stdout and stderr are not received from any task. stdin is
2368 not sent to any task (stdin is closed).
2369
2370 taskid stdout and/or stderr are redirected from only the task with
relative id equal to taskid, where 0 <= taskid < ntasks,
2372 where ntasks is the total number of tasks in the current job
2373 step. stdin is redirected from the stdin of srun to this
2374 same task. This file will be written on the node executing
2375 the task.
2376
2377 filename srun will redirect stdout and/or stderr to the named file
2378 from all tasks. stdin will be redirected from the named file
2379 and broadcast to all tasks in the job. filename refers to a
2380 path on the host that runs srun. Depending on the cluster's
2381 file system layout, this may result in the output appearing
2382 in different places depending on whether the job is run in
2383 batch mode.
2384
2385 filename pattern
2386 srun allows for a filename pattern to be used to generate the
2387 named IO file described above. The following list of format
2388 specifiers may be used in the format string to generate a
2389 filename that will be unique to a given jobid, stepid, node,
2390 or task. In each case, the appropriate number of files are
2391 opened and associated with the corresponding tasks. Note that
2392 any format string containing %t, %n, and/or %N will be writ‐
2393 ten on the node executing the task rather than the node where
srun executes. These format specifiers are not supported on a
2395 BGQ system.
2396
2397 \\ Do not process any of the replacement symbols.
2398
2399 %% The character "%".
2400
2401 %A Job array's master job allocation number.
2402
2403 %a Job array ID (index) number.
2404
2405 %J jobid.stepid of the running job. (e.g. "128.0")
2406
2407 %j jobid of the running job.
2408
2409 %s stepid of the running job.
2410
2411 %N short hostname. This will create a separate IO file
2412 per node.
2413
2414 %n Node identifier relative to current job (e.g. "0" is
2415 the first node of the running job) This will create a
2416 separate IO file per node.
2417
2418 %t task identifier (rank) relative to current job. This
2419 will create a separate IO file per task.
2420
2421 %u User name.
2422
2423 %x Job name.
2424
2425 A number placed between the percent character and format
2426 specifier may be used to zero-pad the result in the IO file‐
2427 name. This number is ignored if the format specifier corre‐
2428 sponds to non-numeric data (%N for example).
2429
2430 Some examples of how the format string may be used for a 4
2431 task job step with a Job ID of 128 and step id of 0 are in‐
2432 cluded below:
2433
2434
2435 job%J.out job128.0.out
2436
2437 job%4j.out job0128.out
2438
2439 job%j-%2t.out job128-00.out, job128-01.out, ...
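
                 As an illustration of these patterns, a command such as the
                 following (file names are hypothetical) creates one output
                 file per task, zero-padded to two digits:

                 $ srun -n4 --output=job%j-%2t.out hostname

                 With a job id of 128 this would produce job128-00.out through
                 job128-03.out, each file being written on the node executing
                 the corresponding task because the pattern contains %t.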
2440
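2441 PERFORMANCE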
2442       Executing srun sends a remote procedure call to slurmctld. If enough
2443       calls from srun or other Slurm client commands that send remote proce‐
2444       dure calls to the slurmctld daemon arrive at once, performance of the
2445       slurmctld daemon can degrade, possibly resulting in a denial of ser‐
2446       vice.
2447
2448 Do not run srun or other Slurm client commands that send remote proce‐
2449 dure calls to slurmctld from loops in shell scripts or other programs.
2450 Ensure that programs limit calls to srun to the minimum necessary for
2451 the information you are trying to gather.
2452
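       For example, the sketch below (prog is a hypothetical program) replaces
       a shell loop that invokes srun repeatedly with a single srun call that
       launches all tasks in one job step:

       # Avoid: each iteration sends a separate remote procedure call to slurmctld
       for i in 1 2 3 4; do
           srun -n1 ./prog "$i"
       done

       # Prefer: one srun launches all four tasks in a single job step;
       # each task can read SLURM_PROCID to select its own share of the work.
       srun -n4 ./prog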
2453
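2454 INPUT ENVIRONMENT VARIABLES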
2455 Upon startup, srun will read and handle the options set in the follow‐
2456 ing environment variables. The majority of these variables are set the
2457 same way the options are set, as defined above. For flag options that
2458 are defined to expect no argument, the option can be enabled by setting
2459 the environment variable without a value (empty or NULL string), the
2460 string 'yes', or a non-zero number. Any other value for the environment
2461 variable will result in the option not being set. There are a couple
2462       of exceptions to these rules, which are noted below.
2463 NOTE: Command line options always override environment variable set‐
2464 tings.
2465
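       For example (a minimal sketch), setting SLURM_NTASKS is equivalent to
       passing -n, and an explicit command line option still takes precedence:

       $ export SLURM_NTASKS=4
       $ srun hostname        # behaves as if -n4 had been given
       $ srun -n2 hostname    # command line overrides: two tasks are launched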
2466
2467 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2468 MVAPICH2) and controls the fanout of data commu‐
2469 nications. The srun command sends messages to ap‐
2470 plication programs (via the PMI library) and
2471 those applications may be called upon to forward
2472 that data to up to this number of additional
2473 tasks. Higher values offload work from the srun
2474 command to the applications and likely increase
2475 the vulnerability to failures. The default value
2476 is 32.
2477
2478 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2479 MVAPICH2) and controls the fanout of data commu‐
2480 nications. The srun command sends messages to
2481 application programs (via the PMI library) and
2482 those applications may be called upon to forward
2483 that data to additional tasks. By default, srun
2484 sends one message per host and one task on that
2485 host forwards the data to other tasks on that
2486 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2487 defined, the user task may be required to forward
2488 the data to tasks on other hosts. Setting
2489 PMI_FANOUT_OFF_HOST may increase performance.
2490 Since more work is performed by the PMI library
2491 loaded by the user application, failures also can
2492 be more common and more difficult to diagnose.
2493                             Disable or enable this behavior by setting it to 0 or 1.
2494
2495 PMI_TIME This is used exclusively with PMI (MPICH2 and
2496 MVAPICH2) and controls how much the communica‐
2497 tions from the tasks to the srun are spread out
2498 in time in order to avoid overwhelming the srun
2499 command with work. The default value is 500 (mi‐
2500 croseconds) per task. On relatively slow proces‐
2501 sors or systems with very large processor counts
2502 (and large PMI data sets), higher values may be
2503 required.
2504
2505 SLURM_ACCOUNT Same as -A, --account
2506
2507 SLURM_ACCTG_FREQ Same as --acctg-freq
2508
2509 SLURM_BCAST Same as --bcast
2510
2511 SLURM_BCAST_EXCLUDE Same as --bcast-exclude
2512
2513 SLURM_BURST_BUFFER Same as --bb
2514
2515 SLURM_CLUSTERS Same as -M, --clusters
2516
2517 SLURM_COMPRESS Same as --compress
2518
2519 SLURM_CONF The location of the Slurm configuration file.
2520
2521 SLURM_CONSTRAINT Same as -C, --constraint
2522
2523 SLURM_CORE_SPEC Same as --core-spec
2524
2525 SLURM_CPU_BIND Same as --cpu-bind
2526
2527 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2528
2529 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2530
2531 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2532
2533 SLURM_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2534 disable or enable the option.
2535
2536 SLURM_DELAY_BOOT Same as --delay-boot
2537
2538 SLURM_DEPENDENCY Same as -d, --dependency=<jobid>
2539
2540 SLURM_DISABLE_STATUS Same as -X, --disable-status
2541
2542 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2543 tion=plane, without =<size>, is set.
2544
2545 SLURM_DISTRIBUTION Same as -m, --distribution
2546
2547 SLURM_EPILOG Same as --epilog
2548
2549 SLURM_EXACT Same as --exact
2550
2551 SLURM_EXCLUSIVE Same as --exclusive
2552
2553 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2554 error occurs (e.g. invalid options). This can be
2555 used by a script to distinguish application exit
2556 codes from various Slurm error conditions. Also
2557 see SLURM_EXIT_IMMEDIATE.
2558
2559 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
2560 mediate option is used and resources are not cur‐
2561 rently available. This can be used by a script
2562 to distinguish application exit codes from vari‐
2563 ous Slurm error conditions. Also see
2564 SLURM_EXIT_ERROR.
2565
2566 SLURM_EXPORT_ENV Same as --export
2567
2568 SLURM_GPU_BIND Same as --gpu-bind
2569
2570 SLURM_GPU_FREQ Same as --gpu-freq
2571
2572 SLURM_GPUS Same as -G, --gpus
2573
2574 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2575
2576 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2577
2578 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2579
2580 SLURM_GRES_FLAGS Same as --gres-flags
2581
2582 SLURM_HINT Same as --hint
2583
2584 SLURM_IMMEDIATE Same as -I, --immediate
2585
2586 SLURM_JOB_ID Same as --jobid
2587
2588 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2589 allocation, in which case it is ignored to avoid
2590 using the batch job's name as the name of each
2591 job step.
2592
2593 SLURM_JOB_NUM_NODES Same as -N, --nodes. Total number of nodes in
2594 the job’s resource allocation.
2595
2596 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit. Must be set to 0
2597 or 1 to disable or enable the option.
2598
2599 SLURM_LABELIO Same as -l, --label
2600
2601 SLURM_MEM_BIND Same as --mem-bind
2602
2603 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2604
2605 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2606
2607 SLURM_MEM_PER_NODE Same as --mem
2608
2609 SLURM_MPI_TYPE Same as --mpi
2610
2611 SLURM_NETWORK Same as --network
2612
2613 SLURM_NNODES Same as -N, --nodes. Total number of nodes in the
2614 job’s resource allocation. See
2615 SLURM_JOB_NUM_NODES. Included for backwards com‐
2616 patibility.
2617
2618 SLURM_NO_KILL Same as -k, --no-kill
2619
2620 SLURM_NPROCS Same as -n, --ntasks. See SLURM_NTASKS. Included
2621 for backwards compatibility.
2622
2623 SLURM_NTASKS Same as -n, --ntasks
2624
2625 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2626
2627 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2628
2629 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2630
2631 SLURM_NTASKS_PER_SOCKET
2632 Same as --ntasks-per-socket
2633
2634 SLURM_OPEN_MODE Same as --open-mode
2635
2636 SLURM_OVERCOMMIT Same as -O, --overcommit
2637
2638 SLURM_OVERLAP Same as --overlap
2639
2640 SLURM_PARTITION Same as -p, --partition
2641
2642 SLURM_PMI_KVS_NO_DUP_KEYS
2643 If set, then PMI key-pairs will contain no dupli‐
2644 cate keys. MPI can use this variable to inform
2645 the PMI library that it will not use duplicate
2646 keys so PMI can skip the check for duplicate
2647 keys. This is the case for MPICH2 and reduces
2648 overhead in testing for duplicates for improved
2649                             performance.
2650
2651 SLURM_POWER Same as --power
2652
2653 SLURM_PROFILE Same as --profile
2654
2655 SLURM_PROLOG Same as --prolog
2656
2657 SLURM_QOS Same as --qos
2658
2659 SLURM_REMOTE_CWD Same as -D, --chdir=
2660
2661 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2662 maximum count of switches desired for the job al‐
2663 location and optionally the maximum time to wait
2664 for that number of switches. See --switches
2665
2666 SLURM_RESERVATION Same as --reservation
2667
2668 SLURM_RESV_PORTS Same as --resv-ports
2669
2670 SLURM_SEND_LIBS Same as --send-libs
2671
2672 SLURM_SIGNAL Same as --signal
2673
2674 SLURM_SPREAD_JOB Same as --spread-job
2675
2676 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2677                             If set and non-zero, successive task exit mes‐
2678 sages with the same exit code will be printed
2679 only once.
2680
2681 SLURM_STDERRMODE Same as -e, --error
2682
2683 SLURM_STDINMODE Same as -i, --input
2684
2685 SLURM_STDOUTMODE Same as -o, --output
2686
2687 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2688 job allocations). Also see SLURM_GRES
2689
2690 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2691 If set, only the specified node will log when the
2692                             job or step is killed by a signal.
2693
2694 SLURM_TASK_EPILOG Same as --task-epilog
2695
2696 SLURM_TASK_PROLOG Same as --task-prolog
2697
2698 SLURM_TEST_EXEC If defined, srun will verify existence of the ex‐
2699 ecutable program along with user execute permis‐
2700 sion on the node where srun was called before at‐
2701 tempting to launch it on nodes in the step.
2702
2703 SLURM_THREAD_SPEC Same as --thread-spec
2704
2705 SLURM_THREADS Same as -T, --threads
2706
2707 SLURM_THREADS_PER_CORE
2708 Same as --threads-per-core
2709
2710 SLURM_TIMELIMIT Same as -t, --time
2711
2712 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2713
2714 SLURM_USE_MIN_NODES Same as --use-min-nodes
2715
2716 SLURM_WAIT Same as -W, --wait
2717
2718 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2719 --switches
2720
2721       SLURM_WCKEY             Same as --wckey
2722
2723       SLURM_WORKING_DIR       Same as -D, --chdir
2724
2725 SLURMD_DEBUG Same as -d, --slurmd-debug. Must be set to 0 or 1
2726 to disable or enable the option.
2727
2728 SRUN_CONTAINER Same as --container.
2729
2730 SRUN_EXPORT_ENV Same as --export, and will override any setting
2731 for SLURM_EXPORT_ENV.
2732
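2733 OUTPUT ENVIRONMENT VARIABLES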
2734 srun will set some environment variables in the environment of the exe‐
2735 cuting tasks on the remote compute nodes. These environment variables
2736 are:
2737
2738
2739 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2740 ment variables are set separately for each compo‐
2741 nent.
2742
2743 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2744 ing.
2745
2746 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2747 IDs or masks for this node, CPU_ID = Board_ID x
2748 threads_per_board + Socket_ID x
2749 threads_per_socket + Core_ID x threads_per_core +
2750 Thread_ID).
2751
2752 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2753
2754 SLURM_CPU_BIND_VERBOSE
2755 --cpu-bind verbosity (quiet,verbose).
2756
2757 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2758 the srun command as a numerical frequency in
2759 kilohertz, or a coded value for a request of low,
2760                             medium, highm1 or high for the frequency. See the
2761 description of the --cpu-freq option or the
2762 SLURM_CPU_FREQ_REQ input environment variable.
2763
2764 SLURM_CPUS_ON_NODE Number of CPUs available to the step on this
2765 node. NOTE: The select/linear plugin allocates
2766 entire nodes to jobs, so the value indicates the
2767                             total count of CPUs on the node. For the select/cons_res
2768                             and select/cons_tres plugins, this number
2769 indicates the number of CPUs on this node allo‐
2770 cated to the step.
2771
2772 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2773 the --cpus-per-task option is specified.
2774
2775 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2776 distribution with -m, --distribution.
2777
2778 SLURM_GPUS_ON_NODE Number of GPUs available to the step on this
2779 node.
2780
2781 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2782 gin and comma separated. It is read internally
2783 by pmi if Slurm was built with pmi support. Leav‐
2784 ing the variable set may cause problems when us‐
2785 ing external packages from within the job (Abaqus
2786 and Ansys have been known to have problems when
2787 it is set - consult the appropriate documentation
2788 for 3rd party software).
2789
2790 SLURM_HET_SIZE Set to count of components in heterogeneous job.
2791
2792       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2793
2794 SLURM_JOB_CPUS_PER_NODE
2795 Count of CPUs available to the job on the nodes
2796 in the allocation, using the format
2797 CPU_count[(xnumber_of_nodes)][,CPU_count [(xnum‐
2798 ber_of_nodes)] ...]. For example:
2799 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates
2800 that on the first and second nodes (as listed by
2801 SLURM_JOB_NODELIST) the allocation has 72 CPUs,
2802 while the third node has 36 CPUs. NOTE: The se‐
2803 lect/linear plugin allocates entire nodes to
2804 jobs, so the value indicates the total count of
2805 CPUs on allocated nodes. The select/cons_res and
2806 select/cons_tres plugins allocate individual CPUs
2807 to jobs, so this number indicates the number of
2808 CPUs allocated to the job.
2809
2810 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2811
2812 SLURM_JOB_ID Job id of the executing job.
2813
2814 SLURM_JOB_NAME Set to the value of the --job-name option or the
2815 command name when srun is used to create a new
2816 job allocation. Not set when srun is used only to
2817 create a job step (i.e. within an existing job
2818 allocation).
2819
2820 SLURM_JOB_NODELIST List of nodes allocated to the job.
2821
2822 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2823 cation.
2824
2825 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2826 ning.
2827
2828 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2829
2830 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2831 tion, if any.
2832
2833 SLURM_JOBID Job id of the executing job. See SLURM_JOB_ID.
2834 Included for backwards compatibility.
2835
2836 SLURM_LAUNCH_NODE_IPADDR
2837 IP address of the node from which the task launch
2838 was initiated (where the srun command ran from).
2839
2840 SLURM_LOCALID Node local task ID for the process within a job.
2841
2842 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2843 masks for this node>).
2844
2845 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2846
2847 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2848 nodes).
2849
2850 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2851
2852 SLURM_MEM_BIND_VERBOSE
2853 --mem-bind verbosity (quiet,verbose).
2854
2855 SLURM_NODE_ALIASES Sets of node name, communication address and
2856 hostname for nodes allocated to the job from the
2857                             cloud. Each element in the set is colon separated
2858 and each set is comma separated. For example:
2859 SLURM_NODE_ALIASES=
2860 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2861
2862 SLURM_NODEID The relative node ID of the current node.
2863
2864 SLURM_NPROCS Total number of processes in the current job or
2865 job step. See SLURM_NTASKS. Included for back‐
2866 wards compatibility.
2867
2868 SLURM_NTASKS Total number of processes in the current job or
2869 job step.
2870
2871 SLURM_OVERCOMMIT Set to 1 if --overcommit was specified.
2872
2873 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2874 of job submission. This value is propagated to
2875 the spawned processes.
2876
2877 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2878 rent process.
2879
2880 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2881
2882 SLURM_SRUN_COMM_PORT srun communication port.
2883
2884 SLURM_CONTAINER OCI Bundle for job. Only set if --container is
2885 specified.
2886
2887 SLURM_STEP_ID The step ID of the current job.
2888
2889 SLURM_STEP_LAUNCHER_PORT
2890 Step launcher port.
2891
2892 SLURM_STEP_NODELIST List of nodes allocated to the step.
2893
2894 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2895
2896 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2897 erogeneous job step.
2898
2899 SLURM_STEP_TASKS_PER_NODE
2900 Number of processes per node within the step.
2901
2902 SLURM_STEPID The step ID of the current job. See
2903 SLURM_STEP_ID. Included for backwards compatibil‐
2904 ity.
2905
2906       SLURM_SUBMIT_DIR      The directory from which the allocation was
2907                             invoked.
2908
2909       SLURM_SUBMIT_HOST     The hostname of the computer from which the
2910                             allocation was invoked.
2911
2912 SLURM_TASK_PID The process ID of the task being started.
2913
2914 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2915 Values are comma separated and in the same order
2916 as SLURM_JOB_NODELIST. If two or more consecu‐
2917 tive nodes are to have the same task count, that
2918 count is followed by "(x#)" where "#" is the rep‐
2919 etition count. For example,
2920 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2921 first three nodes will each execute two tasks and
2922 the fourth node will execute one task.
2923
2924 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2925 ogy/tree plugin configured. The value will be
2926                             set to the names of the network switches which may be
2927                             involved in the job's communications, from the system's
2928                             top level switch down to the leaf switch and ending with
2929                             the node name. A period is used to
2930 separate each hardware component name.
2931
2932 SLURM_TOPOLOGY_ADDR_PATTERN
2933 This is set only if the system has the topol‐
2934 ogy/tree plugin configured. The value will be
2935                             set to the component types listed in SLURM_TOPOL‐
2936 OGY_ADDR. Each component will be identified as
2937 either "switch" or "node". A period is used to
2938 separate each hardware component type.
2939
2940 SLURM_UMASK The umask in effect when the job was submitted.
2941
2942 SLURMD_NODENAME Name of the node running the task. In the case of
2943 a parallel job executing on multiple compute
2944 nodes, the various tasks will have this environ‐
2945 ment variable set to different values on each
2946 compute node.
2947
2948 SRUN_DEBUG Set to the logging level of the srun command.
2949 Default value is 3 (info level). The value is
2950 incremented or decremented based upon the --ver‐
2951 bose and --quiet options.
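
       The following sketch shows a few of these variables as seen by the
       tasks themselves. On a hypothetical two-node allocation the output
       might look like this (line order may vary):

       $ srun -N2 -n4 -l bash -c 'echo "task $SLURM_PROCID/$SLURM_NTASKS node $SLURM_NODEID"'
       0: task 0/4 node 0
       1: task 1/4 node 0
       2: task 2/4 node 1
       3: task 3/4 node 1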
2952
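2953 SIGNALS AND ESCAPE SEQUENCES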
2954 Signals sent to the srun command are automatically forwarded to the
2955 tasks it is controlling with a few exceptions. The escape sequence
2956 <control-c> will report the state of all tasks associated with the srun
2957 command. If <control-c> is entered twice within one second, then the
2958 associated SIGINT signal will be sent to all tasks and a termination
2959 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2960 spawned tasks. If a third <control-c> is received, the srun program
2961 will be terminated without waiting for remote tasks to exit or their
2962 I/O to complete.
2963
2964 The escape sequence <control-z> is presently ignored.
2965
2966
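2967 MPI SUPPORT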
2968 MPI use depends upon the type of MPI being used. There are three fun‐
2969 damentally different modes of operation used by these various MPI im‐
2970 plementations.
2971
2972 1. Slurm directly launches the tasks and performs initialization of
2973 communications through the PMI2 or PMIx APIs. For example: "srun -n16
2974 a.out".
2975
2976 2. Slurm creates a resource allocation for the job and then mpirun
2977 launches tasks using Slurm's infrastructure (OpenMPI).
2978
2979 3. Slurm creates a resource allocation for the job and then mpirun
2980 launches tasks using some mechanism other than Slurm, such as SSH or
2981 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
2982 trol. Slurm's epilog should be configured to purge these tasks when the
2983 job's allocation is relinquished, or the use of pam_slurm_adopt is
2984 highly recommended.
2985
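       For the first mode, the PMI implementation can be selected explicitly
       with the --mpi option, assuming the corresponding plugin (pmix in this
       sketch) is configured on the system. The available plugin types can be
       listed with --mpi=list:

       $ srun --mpi=list
       $ srun --mpi=pmix -n16 a.out
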
2986 See https://slurm.schedmd.com/mpi_guide.html for more information on
2987 use of these various MPI implementations with Slurm.
2988
2989
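2990 MULTIPLE PROGRAM CONFIGURATION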
2991 Comments in the configuration file must have a "#" in column one. The
2992 configuration file contains the following fields separated by white
2993 space:
2994
2995
2996 Task rank
2997 One or more task ranks to use this configuration. Multiple val‐
2998 ues may be comma separated. Ranges may be indicated with two
2999 numbers separated with a '-' with the smaller number first (e.g.
3000 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3001 ified, specify a rank of '*' as the last line of the file. If
3002 an attempt is made to initiate a task for which no executable
3003 program is defined, the following error message will be produced
3004 "No executable program specified for this task".
3005
3006 Executable
3007              The name of the program to execute. May be a fully qualified
3008 pathname if desired.
3009
3010 Arguments
3011 Program arguments. The expression "%t" will be replaced with
3012 the task's number. The expression "%o" will be replaced with
3013 the task's offset within this range (e.g. a configured task rank
3014 value of "1-5" would have offset values of "0-4"). Single
3015 quotes may be used to avoid having the enclosed values inter‐
3016 preted. This field is optional. Any arguments for the program
3017 entered on the command line will be added to the arguments spec‐
3018 ified in the configuration file.
3019
3020 For example:
3021
3022 $ cat silly.conf
3023 ###################################################################
3024 # srun multiple program configuration file
3025 #
3026 # srun -n8 -l --multi-prog silly.conf
3027 ###################################################################
3028 4-6 hostname
3029 1,7 echo task:%t
3030 0,2-3 echo offset:%o
3031
3032 $ srun -n8 -l --multi-prog silly.conf
3033 0: offset:0
3034 1: task:1
3035 2: offset:1
3036 3: offset:2
3037 4: linux15.llnl.gov
3038 5: linux16.llnl.gov
3039 6: linux17.llnl.gov
3040 7: task:7
3041
3042
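3043 EXAMPLES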
3044 This simple example demonstrates the execution of the command hostname
3045 in eight tasks. At least eight processors will be allocated to the job
3046 (the same as the task count) on however many nodes are required to sat‐
3047       isfy the request. The output of each task will be preceded by its
3048 task number. (The machine "dev" in the example below has a total of
3049 two CPUs per node)
3050
3051 $ srun -n8 -l hostname
3052 0: dev0
3053 1: dev0
3054 2: dev1
3055 3: dev1
3056 4: dev2
3057 5: dev2
3058 6: dev3
3059 7: dev3
3060
3061
3062 The srun -r option is used within a job script to run two job steps on
3063 disjoint nodes in the following example. The script is run using allo‐
3064 cate mode instead of as a batch job in this case.
3065
3066 $ cat test.sh
3067 #!/bin/sh
3068 echo $SLURM_JOB_NODELIST
3069 srun -lN2 -r2 hostname
3070 srun -lN2 hostname
3071
3072 $ salloc -N4 test.sh
3073 dev[7-10]
3074 0: dev9
3075 1: dev10
3076 0: dev7
3077 1: dev8
3078
3079
3080 The following script runs two job steps in parallel within an allocated
3081 set of nodes.
3082
3083 $ cat test.sh
3084 #!/bin/bash
3085 srun -lN2 -n4 -r 2 sleep 60 &
3086 srun -lN2 -r 0 sleep 60 &
3087 sleep 1
3088 squeue
3089 squeue -s
3090 wait
3091
3092 $ salloc -N4 test.sh
3093 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3094 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3095
3096 STEPID PARTITION USER TIME NODELIST
3097 65641.0 batch grondo 0:01 dev[7-8]
3098 65641.1 batch grondo 0:01 dev[9-10]
3099
3100
3101 This example demonstrates how one executes a simple MPI job. We use
3102 srun to build a list of machines (nodes) to be used by mpirun in its
3103 required format. A sample command line and the script to be executed
3104 follow.
3105
3106 $ cat test.sh
3107 #!/bin/sh
3108 MACHINEFILE="nodes.$SLURM_JOB_ID"
3109
3110 # Generate Machinefile for mpi such that hosts are in the same
3111 # order as if run via srun
3112 #
3113 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3114
3115 # Run using generated Machine file:
3116 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3117
3118 rm $MACHINEFILE
3119
3120 $ salloc -N2 -n4 test.sh
3121
3122
3123 This simple example demonstrates the execution of different jobs on
3124 different nodes in the same srun. You can do this for any number of
3125 nodes or any number of jobs. The executables are placed on the nodes
3126       selected by the SLURM_NODEID env var, which ranges from 0 up to one
3127       less than the number of nodes specified on the srun command line.
3128
3129 $ cat test.sh
3130 case $SLURM_NODEID in
3131 0) echo "I am running on "
3132 hostname ;;
3133 1) hostname
3134 echo "is where I am running" ;;
3135 esac
3136
3137 $ srun -N2 test.sh
3138 dev0
3139 is where I am running
3140 I am running on
3141 dev1
3142
3143
3144 This example demonstrates use of multi-core options to control layout
3145 of tasks. We request that four sockets per node and two cores per
3146 socket be dedicated to the job.
3147
3148 $ srun -N2 -B 4-4:2-2 a.out
3149
3150
3151 This example shows a script in which Slurm is used to provide resource
3152 management for a job by executing the various job steps as processors
3153 become available for their dedicated use.
3154
3155 $ cat my.script
3156 #!/bin/bash
3157 srun -n4 prog1 &
3158 srun -n3 prog2 &
3159 srun -n1 prog3 &
3160 srun -n1 prog4 &
3161 wait
3162
3163
3164 This example shows how to launch an application called "server" with
3165       one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another appli‐
3166 cation called "client" with 16 tasks, 1 CPU per task (the default) and
3167 1 GB of memory per task.
3168
3169 $ srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3170
3171
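3172 COPYING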
3173 Copyright (C) 2006-2007 The Regents of the University of California.
3174 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3175 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3176 Copyright (C) 2010-2022 SchedMD LLC.
3177
3178 This file is part of Slurm, a resource management program. For de‐
3179 tails, see <https://slurm.schedmd.com/>.
3180
3181 Slurm is free software; you can redistribute it and/or modify it under
3182 the terms of the GNU General Public License as published by the Free
3183 Software Foundation; either version 2 of the License, or (at your op‐
3184 tion) any later version.
3185
3186 Slurm is distributed in the hope that it will be useful, but WITHOUT
3187 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3188 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3189 for more details.
3190
3191
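3192 SEE ALSO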
3193 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3194       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3195
3196
3197
3198April 2022 Slurm Commands srun(1)