srun(1)                          Slurm Commands                          srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)...] [ : [OPTIONS(N)...]] executable(0) [args(0)...]
11
12 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
13 For more details about heterogeneous jobs see the document
14 https://slurm.schedmd.com/heterogeneous_jobs.html
15
16
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
will first create a resource allocation in which to run the parallel
job.
21
22 The following document describes the influence of various options on
23 the allocation of cpus to jobs and tasks.
24 https://slurm.schedmd.com/cpu_management.html
25
26
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
29 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
30 signal) of any task that exited with a signal.
31
32
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
35
36 1. If executable starts with ".", then path is constructed as: current
37 working directory / executable
38
39 2. If executable starts with a "/", then path is considered absolute.
40
41 3. If executable can be resolved through PATH. See path_resolution(7).
42
43 4. If executable is in current working directory.
44
The current working directory is the calling process's working
directory unless the --chdir argument is passed, which will override
the current working directory.
48
49
OPTIONS

--accel-bind=<options>
52 Control how tasks are bound to generic resources of type gpu,
53 mic and nic. Multiple options may be specified. Supported
54 options include:
55
56 g Bind each task to GPUs which are closest to the allocated
57 CPUs.
58
59 m Bind each task to MICs which are closest to the allocated
60 CPUs.
61
62 n Bind each task to NICs which are closest to the allocated
63 CPUs.
64
65 v Verbose mode. Log how tasks are bound to GPU and NIC
66 devices.
67
68 This option applies to job allocations.
69
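For example, an illustrative invocation (./my_app, the node, task and
GPU counts are placeholders, and the single-letter flags are assumed
to be concatenated into one string):

    # Bind each task to its closest GPU, and log the chosen bindings.
    srun -N2 -n8 --gres=gpu:4 --accel-bind=gv ./my_app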
70
71 -A, --account=<account>
72 Charge resources used by this job to specified account. The
73 account is an arbitrary string. The account name may be changed
74 after job submission using the scontrol command. This option
75 applies to job allocations.
76
77
78 --acctg-freq
79 Define the job accounting and profiling sampling intervals.
80 This can be used to override the JobAcctGatherFrequency parame‐
81 ter in Slurm's configuration file, slurm.conf. The supported
format is as follows:
83
84 --acctg-freq=<datatype>=<interval>
85 where <datatype>=<interval> specifies the task sam‐
86 pling interval for the jobacct_gather plugin or a
87 sampling interval for a profiling type by the
88 acct_gather_profile plugin. Multiple, comma-sepa‐
89 rated <datatype>=<interval> intervals may be speci‐
90 fied. Supported datatypes are as follows:
91
92 task=<interval>
93 where <interval> is the task sampling inter‐
94 val in seconds for the jobacct_gather plugins
95 and for task profiling by the
96 acct_gather_profile plugin. NOTE: This fre‐
97 quency is used to monitor memory usage. If
98 memory limits are enforced the highest fre‐
99 quency a user can request is what is config‐
100 ured in the slurm.conf file. They can not
101 turn it off (=0) either.
102
103 energy=<interval>
104 where <interval> is the sampling interval in
105 seconds for energy profiling using the
106 acct_gather_energy plugin
107
108 network=<interval>
109 where <interval> is the sampling interval in
110 seconds for infiniband profiling using the
111 acct_gather_infiniband plugin.
112
113 filesystem=<interval>
114 where <interval> is the sampling interval in
115 seconds for filesystem profiling using the
116 acct_gather_filesystem plugin.
117
The default value for the task sampling interval is 30. The
default value for all other intervals is 0. An
121 interval of 0 disables sampling of the specified type. If the
122 task sampling interval is 0, accounting information is collected
123 only at job termination (reducing Slurm interference with the
124 job).
125 Smaller (non-zero) values have a greater impact upon job perfor‐
126 mance, but a value of 30 seconds is not likely to be noticeable
127 for applications having less than 10,000 tasks. This option
applies to job allocations.
129
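For example, an illustrative invocation (./my_app and the intervals
are placeholders):

    # Sample task (memory) usage every 15 seconds and energy usage
    # every 60 seconds for this job.
    srun --acctg-freq=task=15,energy=60 -n4 ./my_app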
130
-B, --extra-node-info=<sockets[:cores[:threads]]>
132 Restrict node selection to nodes with at least the specified
133 number of sockets, cores per socket and/or threads per core.
134 NOTE: These options do not specify the resource allocation size.
135 Each value specified is considered a minimum. An asterisk (*)
136 can be used as a placeholder indicating that all available
137 resources of that type are to be utilized. Values can also be
138 specified as min-max. The individual levels can also be speci‐
139 fied in separate options if desired:
140 --sockets-per-node=<sockets>
141 --cores-per-socket=<cores>
142 --threads-per-core=<threads>
143 If task/affinity plugin is enabled, then specifying an alloca‐
144 tion in this manner also sets a default --cpu-bind option of
145 threads if the -B option specifies a thread count, otherwise an
146 option of cores if a core count is specified, otherwise an
147 option of sockets. If SelectType is configured to
148 select/cons_res, it must have a parameter of CR_Core,
149 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
150 to be honored. If not specified, the scontrol show job will
151 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
152 tions.
153
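For example, an illustrative invocation (./my_app and the socket and
core counts are placeholders):

    # Restrict selection to nodes with at least 2 sockets and at
    # least 8 cores per socket (equivalent to -B 2:8).
    srun -N1 --sockets-per-node=2 --cores-per-socket=8 ./my_app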
154
155 --bb=<spec>
156 Burst buffer specification. The form of the specification is
157 system dependent. Also see --bbf. This option applies to job
158 allocations.
159
160
161 --bbf=<file_name>
162 Path of file containing burst buffer specification. The form of
163 the specification is system dependent. Also see --bb. This
164 option applies to job allocations.
165
166
167 --bcast[=<dest_path>]
168 Copy executable file to allocated compute nodes. If a file name
169 is specified, copy the executable to the specified destination
170 file path. If no path is specified, copy the file to a file
named "slurm_bcast_<job_id>.<step_id>" in the current working directory.
172 For example, "srun --bcast=/tmp/mine -N3 a.out" will copy the
173 file "a.out" from your current directory to the file "/tmp/mine"
174 on each of the three allocated compute nodes and execute that
175 file. This option applies to step allocations.
176
177
178 -b, --begin=<time>
179 Defer initiation of this job until the specified time. It
180 accepts times of the form HH:MM:SS to run a job at a specific
181 time of day (seconds are optional). (If that time is already
182 past, the next day is assumed.) You may also specify midnight,
183 noon, fika (3 PM) or teatime (4 PM) and you can have a
184 time-of-day suffixed with AM or PM for running in the morning or
185 the evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY, MM/DD/YY or YYYY-MM-DD.
187 Combine date and time using the following format
188 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
189 count time-units, where the time-units can be seconds (default),
190 minutes, hours, days, or weeks and you can tell Slurm to run the
191 job today with the keyword today and to run the job tomorrow
192 with the keyword tomorrow. The value may be changed after job
193 submission using the scontrol command. For example:
194 --begin=16:00
195 --begin=now+1hour
196 --begin=now+60 (seconds by default)
197 --begin=2010-01-20T12:34:00
198
199
200 Notes on date/time specifications:
201 - Although the 'seconds' field of the HH:MM:SS time specifica‐
202 tion is allowed by the code, note that the poll time of the
203 Slurm scheduler is not precise enough to guarantee dispatch of
204 the job on the exact second. The job will be eligible to start
205 on the next poll following the specified time. The exact poll
206 interval depends on the Slurm scheduler (e.g., 60 seconds with
207 the default sched/builtin).
208 - If no time (HH:MM:SS) is specified, the default is
209 (00:00:00).
210 - If a date is specified without a year (e.g., MM/DD) then the
211 current year is assumed, unless the combination of MM/DD and
212 HH:MM:SS has already passed for that year, in which case the
213 next year is used.
214 This option applies to job allocations.
215
216
217 --checkpoint=<time>
218 Specifies the interval between creating checkpoints of the job
219 step. By default, the job step will have no checkpoints cre‐
220 ated. Acceptable time formats include "minutes", "minutes:sec‐
221 onds", "hours:minutes:seconds", "days-hours", "days-hours:min‐
222 utes" and "days-hours:minutes:seconds". This option applies to
223 job and step allocations.
224
225
226 --cluster-constraint=<list>
227 Specifies features that a federated cluster must have to have a
228 sibling job submitted to it. Slurm will attempt to submit a sib‐
229 ling job to a cluster if it has at least one of the specified
230 features.
231
232
233 --comment=<string>
234 An arbitrary comment. This option applies to job allocations.
235
236
237 --compress[=type]
238 Compress file before sending it to compute hosts. The optional
239 argument specifies the data compression library to be used.
240 Supported values are "lz4" (default) and "zlib". Some compres‐
241 sion libraries may be unavailable on some systems. For use with
242 the --bcast option. This option applies to step allocations.
243
244
245 -C, --constraint=<list>
246 Nodes can have features assigned to them by the Slurm adminis‐
247 trator. Users can specify which of these features are required
248 by their job using the constraint option. Only nodes having
249 features matching the job constraints will be used to satisfy
250 the request. Multiple constraints may be specified with AND,
251 OR, matching OR, resource counts, etc. (some operators are not
252 supported on all system types). Supported constraint options
253 include:
254
255 Single Name
256 Only nodes which have the specified feature will be used.
257 For example, --constraint="intel"
258
259 Node Count
260 A request can specify the number of nodes needed with
261 some feature by appending an asterisk and count after the
262 feature name. For example "--nodes=16 --con‐
263 straint=graphics*4 ..." indicates that the job requires
264 16 nodes and that at least four of those nodes must have
265 the feature "graphics."
266
AND Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"
270
OR Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR opera‐
tor. For example, --constraint="intel|amd"
274
275 Matching OR
276 If only one of a set of possible options should be used
277 for all allocated nodes, then use the OR operator and
278 enclose the options within square brackets. For example:
279 "--constraint=[rack1|rack2|rack3|rack4]" might be used to
280 specify that all nodes must be allocated on a single rack
281 of the cluster, but any of those four racks can be used.
282
283 Multiple Counts
284 Specific counts of multiple resources may be specified by
285 using the AND operator and enclosing the options within
286 square brackets. For example: "--con‐
287 straint=[rack1*2&rack2*4]" might be used to specify that
288 two nodes must be allocated from nodes with the feature
289 of "rack1" and four nodes must be allocated from nodes
290 with the feature "rack2".
291
292 NOTE: This construct does not support multiple Intel KNL
293 NUMA or MCDRAM modes. For example, while "--con‐
294 straint=[(knl&quad)*2&(knl&hemi)*4]" is not supported,
295 "--constraint=[haswell*2&(knl&hemi)*4]" is supported.
296 Specification of multiple KNL modes requires the use of a
297 heterogeneous job.
298
299
Parentheses
Parentheses can be used to group like node features
302 together. For example "--con‐
303 straint=[(knl&snc4&flat)*4&haswell*1]" might be used to
304 specify that four nodes with the features "knl", "snc4"
305 and "flat" plus one node with the feature "haswell" are
required. All options within parentheses should be
grouped with AND (e.g. "&") operands.
308
309 WARNING: When srun is executed from within salloc or sbatch, the con‐
310 straint value can only contain a single feature name. None of the other
311 operators are currently supported for job steps.
312 This option applies to job and step allocations.
313
314
315 --contiguous
316 If set, then the allocated nodes must form a contiguous set.
317 Not honored with the topology/tree or topology/3d_torus plugins,
318 both of which can modify the node ordering. This option applies
319 to job allocations.
320
321
322 --cores-per-socket=<cores>
323 Restrict node selection to nodes with at least the specified
324 number of cores per socket. See additional information under -B
325 option above when task/affinity plugin is enabled. This option
326 applies to job allocations.
327
328
329 --cpu-bind=[{quiet,verbose},]type
330 Bind tasks to CPUs. Used only when the task/affinity or
331 task/cgroup plugin is enabled. NOTE: To have Slurm always
332 report on the selected CPU binding for all commands executed in
333 a shell, you can enable verbose mode by setting the
334 SLURM_CPU_BIND environment variable value to "verbose".
335
336 The following informational environment variables are set when
337 --cpu-bind is in use:
338 SLURM_CPU_BIND_VERBOSE
339 SLURM_CPU_BIND_TYPE
340 SLURM_CPU_BIND_LIST
341
342 See the ENVIRONMENT VARIABLES section for a more detailed
343 description of the individual SLURM_CPU_BIND variables. These
344 variable are available only if the task/affinity plugin is con‐
345 figured.
346
347 When using --cpus-per-task to run multithreaded tasks, be aware
348 that CPU binding is inherited from the parent of the process.
349 This means that the multithreaded task should either specify or
350 clear the CPU binding itself to avoid having all threads of the
351 multithreaded task use the same mask/CPU as the parent. Alter‐
352 natively, fat masks (masks which specify more than one allowed
353 CPU) could be used for the tasks in order to provide multiple
354 CPUs for the multithreaded tasks.
355
356 By default, a job step has access to every CPU allocated to the
357 job. To ensure that distinct CPUs are allocated to each job
358 step, use the --exclusive option.
359
360 Note that a job step can be allocated different numbers of CPUs
361 on each node or be allocated CPUs not starting at location zero.
362 Therefore one of the options which automatically generate the
363 task binding is recommended. Explicitly specified masks or
364 bindings are only honored when the job step has been allocated
365 every available CPU on the node.
366
367 Binding a task to a NUMA locality domain means to bind the task
368 to the set of CPUs that belong to the NUMA locality domain or
369 "NUMA node". If NUMA locality domain options are used on sys‐
370 tems with no NUMA support, then each socket is considered a
371 locality domain.
372
373 If the --cpu-bind option is not used, the default binding mode
374 will depend upon Slurm's configuration and the step's resource
375 allocation. If all allocated nodes have the same configured
376 CpuBind mode, that will be used. Otherwise if the job's Parti‐
377 tion has a configured CpuBind mode, that will be used. Other‐
378 wise if Slurm has a configured TaskPluginParam value, that mode
379 will be used. Otherwise automatic binding will be performed as
380 described below.
381
382
383 Auto Binding
384 Applies only when task/affinity is enabled. If the job
385 step allocation includes an allocation with a number of
386 sockets, cores, or threads equal to the number of tasks
387 times cpus-per-task, then the tasks will by default be
388 bound to the appropriate resources (auto binding). Dis‐
389 able this mode of operation by explicitly setting
390 "--cpu-bind=none". Use TaskPluginParam=auto‐
391 bind=[threads|cores|sockets] to set a default cpu binding
392 in case "auto binding" doesn't find a match.
393
394 Supported options include:
395
396 q[uiet]
397 Quietly bind before task runs (default)
398
399 v[erbose]
400 Verbosely report binding before task runs
401
402 no[ne] Do not bind tasks to CPUs (default unless auto
403 binding is applied)
404
405 rank Automatically bind by task rank. The lowest num‐
406 bered task on each node is bound to socket (or
407 core or thread) zero, etc. Not supported unless
408 the entire node is allocated to the job.
409
410 map_cpu:<list>
Bind by mapping CPU IDs to tasks (or ranks) as
specified where <list> is
413 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
414 IDs are interpreted as decimal values unless they
are preceded with '0x' in which case they are inter‐
416 preted as hexadecimal values. If the number of
417 tasks (or ranks) exceeds the number of elements in
418 this list, elements in the list will be reused as
419 needed starting from the beginning of the list.
420 To simplify support for large task counts, the
421 lists may follow a map with an asterisk and repe‐
tition count. For example "map_cpu:0x0f*4,0xf0*4".
423 Not supported unless the entire node is allocated
424 to the job.
425
426 mask_cpu:<list>
427 Bind by setting CPU masks on tasks (or ranks) as
428 specified where <list> is
429 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
430 The mapping is specified for a node and identical
431 mapping is applied to the tasks on every node
432 (i.e. the lowest task ID on each node is mapped to
433 the first mask specified in the list, etc.). CPU
434 masks are always interpreted as hexadecimal values
435 but can be preceded with an optional '0x'. Not
436 supported unless the entire node is allocated to
437 the job. To simplify support for large task
438 counts, the lists may follow a map with an aster‐
isk and repetition count. For example
440 "mask_cpu:0x0f*4,0xf0*4". Not supported unless
441 the entire node is allocated to the job.
442
443 rank_ldom
444 Bind to a NUMA locality domain by rank. Not sup‐
445 ported unless the entire node is allocated to the
446 job.
447
448 map_ldom:<list>
449 Bind by mapping NUMA locality domain IDs to tasks
450 as specified where <list> is
451 <ldom1>,<ldom2>,...<ldomN>. The locality domain
452 IDs are interpreted as decimal values unless they
453 are preceded with '0x' in which case they are
454 interpreted as hexadecimal values. Not supported
455 unless the entire node is allocated to the job.
456
457 mask_ldom:<list>
458 Bind by setting NUMA locality domain masks on
459 tasks as specified where <list> is
460 <mask1>,<mask2>,...<maskN>. NUMA locality domain
461 masks are always interpreted as hexadecimal values
462 but can be preceded with an optional '0x'. Not
463 supported unless the entire node is allocated to
464 the job.
465
466 sockets
467 Automatically generate masks binding tasks to
468 sockets. Only the CPUs on the socket which have
469 been allocated to the job will be used. If the
470 number of tasks differs from the number of allo‐
471 cated sockets this can result in sub-optimal bind‐
472 ing.
473
474 cores Automatically generate masks binding tasks to
475 cores. If the number of tasks differs from the
476 number of allocated cores this can result in
477 sub-optimal binding.
478
479 threads
480 Automatically generate masks binding tasks to
481 threads. If the number of tasks differs from the
482 number of allocated threads this can result in
483 sub-optimal binding.
484
485 ldoms Automatically generate masks binding tasks to NUMA
486 locality domains. If the number of tasks differs
487 from the number of allocated locality domains this
488 can result in sub-optimal binding.
489
490 boards Automatically generate masks binding tasks to
491 boards. If the number of tasks differs from the
492 number of allocated boards this can result in
493 sub-optimal binding. This option is supported by
494 the task/cgroup plugin only.
495
496 help Show help message for cpu-bind
497
498 This option applies to job and step allocations.
499
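For example, illustrative invocations (./my_app and the task counts
are placeholders):

    # Bind one task per core and report the generated binding masks
    # before the tasks start.
    srun -n8 --cpu-bind=verbose,cores ./my_app

    # Explicit masks: task 0 on CPUs 0-3, task 1 on CPUs 4-7
    # (requires the entire node to be allocated to the job).
    srun -n2 --cpu-bind=mask_cpu:0x0f,0xf0 ./my_app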
500
--cpu-freq=<p1[-p2[:p3]]>
502
503 Request that the job step initiated by this srun command be run
504 at some requested frequency if possible, on the CPUs selected
505 for the step on the compute node(s).
506
507 p1 can be [#### | low | medium | high | highm1] which will set
508 the frequency scaling_speed to the corresponding value, and set
509 the frequency scaling_governor to UserSpace. See below for defi‐
510 nition of the values.
511
512 p1 can be [Conservative | OnDemand | Performance | PowerSave]
513 which will set the scaling_governor to the corresponding value.
514 The governor has to be in the list set by the slurm.conf option
515 CpuFreqGovernors.
516
517 When p2 is present, p1 will be the minimum scaling frequency and
518 p2 will be the maximum scaling frequency.
519
p2 can be [#### | medium | high | highm1]. p2 must be greater
521 than p1.
522
523 p3 can be [Conservative | OnDemand | Performance | PowerSave |
524 UserSpace] which will set the governor to the corresponding
525 value.
526
527 If p3 is UserSpace, the frequency scaling_speed will be set by a
528 power or energy aware scheduling strategy to a value between p1
529 and p2 that lets the job run within the site's power goal. The
530 job may be delayed if p1 is higher than a frequency that allows
531 the job to run within the goal.
532
533 If the current frequency is < min, it will be set to min. Like‐
534 wise, if the current frequency is > max, it will be set to max.
535
536 Acceptable values at present include:
537
538 #### frequency in kilohertz
539
540 Low the lowest available frequency
541
542 High the highest available frequency
543
544 HighM1 (high minus one) will select the next highest
545 available frequency
546
547 Medium attempts to set a frequency in the middle of the
548 available range
549
550 Conservative attempts to use the Conservative CPU governor
551
552 OnDemand attempts to use the OnDemand CPU governor (the
553 default value)
554
555 Performance attempts to use the Performance CPU governor
556
557 PowerSave attempts to use the PowerSave CPU governor
558
559 UserSpace attempts to use the UserSpace CPU governor
560
561
The following informational environment variable is set in the
job step when the --cpu-freq option is requested.
565 SLURM_CPU_FREQ_REQ
566
567 This environment variable can also be used to supply the value
568 for the CPU frequency request if it is set when the 'srun' com‐
569 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
ment variable value is the same as on the command line. See the ENVIRON‐
572 MENT VARIABLES section for a description of the
573 SLURM_CPU_FREQ_REQ variable.
574
575 NOTE: This parameter is treated as a request, not a requirement.
576 If the job step's node does not support setting the CPU fre‐
577 quency, or the requested value is outside the bounds of the
578 legal frequencies, an error is logged, but the job step is
579 allowed to continue.
580
581 NOTE: Setting the frequency for just the CPUs of the job step
582 implies that the tasks are confined to those CPUs. If task con‐
583 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
584 gin=task/cgroup with the "ConstrainCores" option) is not config‐
585 ured, this parameter is ignored.
586
587 NOTE: When the step completes, the frequency and governor of
588 each selected CPU is reset to the previous values.
589
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is configured as the ProctrackType can cause jobs to run too
quickly, before accounting is able to poll for job information.
As a result, not all of the accounting information will be
present.
594
595 This option applies to job and step allocations.
596
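For example, illustrative invocations (./my_app and the task counts
are placeholders):

    # Run the step with the selected CPUs at their highest available
    # frequency under the UserSpace governor.
    srun --cpu-freq=high -n4 ./my_app

    # Allow scaling between the lowest and highest frequencies using
    # the OnDemand governor.
    srun --cpu-freq=low-high:OnDemand -n4 ./my_app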
597
598 --cpus-per-gpu=<ncpus>
599 Advise Slurm that ensuing job steps will require ncpus proces‐
600 sors per allocated GPU. Requires the --gpus option. Not com‐
601 patible with the --cpus-per-task option.
602
603
604 -c, --cpus-per-task=<ncpus>
605 Request that ncpus be allocated per process. This may be useful
606 if the job is multithreaded and requires more than one CPU per
607 task for optimal performance. The default is one CPU per
608 process. If -c is specified without -n, as many tasks will be
609 allocated per node as possible while satisfying the -c restric‐
610 tion. For instance on a cluster with 8 CPUs per node, a job
611 request for 4 nodes and 3 CPUs per task may be allocated 3 or 6
612 CPUs per node (1 or 2 tasks per node) depending upon resource
613 consumption by other jobs. Such a job may be unable to execute
614 more than a total of 4 tasks. This option may also be useful to
615 spawn tasks without allocating resources to the job step from
616 the job's allocation when running multiple job steps with the
617 --exclusive option.
618
619 WARNING: There are configurations and options interpreted dif‐
620 ferently by job and job step requests which can result in incon‐
621 sistencies for this option. For example srun -c2
622 --threads-per-core=1 prog may allocate two cores for the job,
623 but if each of those cores contains two threads, the job alloca‐
624 tion will include four CPUs. The job step allocation will then
625 launch two threads per CPU for a total of two tasks.
626
627 WARNING: When srun is executed from within salloc or sbatch,
628 there are configurations and options which can result in incon‐
629 sistent allocations when -c has a value greater than -c on sal‐
630 loc or sbatch.
631
632 This option applies to job allocations.
633
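For example, an illustrative invocation (./my_omp_app is a
placeholder for a multithreaded program):

    # Four tasks, each allocated two CPUs for its threads
    # (e.g. with OMP_NUM_THREADS=2).
    srun -n4 -c2 ./my_omp_app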
634
635 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
(start > (deadline - time[-min])). The default is no deadline.
638 Valid time formats are:
639 HH:MM[:SS] [AM|PM]
640 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
641 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
643
644 This option applies only to job allocations.
645
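For example, an illustrative invocation (./my_app is a placeholder;
--time is the job's time limit option, not described in this
excerpt):

    # Remove the job if its 2 hour time limit cannot complete
    # before 18:00 today.
    srun --time=02:00:00 --deadline=18:00 ./my_app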
646
647 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
649 specification if the job has been eligible to run for less than
650 this time period. If the job has waited for less than the spec‐
651 ified period, it will use only nodes which already have the
652 specified features. The argument is in units of minutes. A
653 default value may be set by a system administrator using the
654 delay_boot option of the SchedulerParameters configuration
655 parameter in the slurm.conf file, otherwise the default value is
656 zero (no delay).
657
658 This option applies only to job allocations.
659
660
661 -d, --dependency=<dependency_list>
662 Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job
steps (executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
666 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
667 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
668 must be satisfied if the "," separator is used. Any dependency
669 may be satisfied if the "?" separator is used. Many jobs can
670 share the same dependency and these jobs may even belong to dif‐
671 ferent users. The value may be changed after job submission
672 using the scontrol command. Once a job dependency fails due to
673 the termination state of a preceding job, the dependent job will
674 never be run, even if the preceding job is requeued and has a
675 different termination state in a subsequent execution. This
676 option applies to job allocations.
677
678 after:job_id[:jobid...]
679 This job can begin execution after the specified jobs
680 have begun execution.
681
682 afterany:job_id[:jobid...]
683 This job can begin execution after the specified jobs
684 have terminated.
685
686 afterburstbuffer:job_id[:jobid...]
687 This job can begin execution after the specified jobs
688 have terminated and any associated burst buffer stage out
689 operations have completed.
690
691 aftercorr:job_id[:jobid...]
692 A task of this job array can begin execution after the
693 corresponding task ID in the specified job has completed
694 successfully (ran to completion with an exit code of
695 zero).
696
697 afternotok:job_id[:jobid...]
698 This job can begin execution after the specified jobs
699 have terminated in some failed state (non-zero exit code,
700 node failure, timed out, etc).
701
702 afterok:job_id[:jobid...]
703 This job can begin execution after the specified jobs
704 have successfully executed (ran to completion with an
705 exit code of zero).
706
707 expand:job_id
708 Resources allocated to this job should be used to expand
709 the specified job. The job to expand must share the same
710 QOS (Quality of Service) and partition. Gang scheduling
711 of resources in the partition is also not supported.
712
713 singleton
714 This job can begin execution after any previously
715 launched jobs sharing the same job name and user have
716 terminated. In other words, only one job by that name
717 and owned by that user can be running or suspended at any
718 point in time.
719
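For example, illustrative invocations (the job ids and script names
are placeholders):

    # Start only after job 12345 has completed successfully.
    srun --dependency=afterok:12345 ./postprocess.sh

    # Start after either of the listed jobs has terminated.
    srun --dependency="afterany:12345?afterany:67890" ./cleanup.sh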
720
721 -D, --chdir=<path>
722 Have the remote processes do a chdir to path before beginning
723 execution. The default is to chdir to the current working direc‐
724 tory of the srun process. The path can be specified as full path
725 or relative path to the directory where the command is executed.
726 This option applies to job allocations.
727
728
729 -e, --error=<filename pattern>
730 Specify how stderr is to be redirected. By default in interac‐
731 tive mode, srun redirects stderr to the same file as stdout, if
732 one is specified. The --error option is provided to allow stdout
733 and stderr to be redirected to different locations. See IO Re‐
734 direction below for more options. If the specified file already
735 exists, it will be overwritten. This option applies to job and
736 step allocations.
737
738
739 -E, --preserve-env
740 Pass the current values of environment variables SLURM_JOB_NODES
741 and SLURM_NTASKS through to the executable, rather than comput‐
742 ing them from commandline parameters. This option applies to job
743 allocations.
744
745
746 --epilog=<executable>
747 srun will run executable just after the job step completes. The
748 command line arguments for executable will be the command and
749 arguments of the job step. If executable is "none", then no
750 srun epilog will be run. This parameter overrides the SrunEpilog
751 parameter in slurm.conf. This parameter is completely indepen‐
752 dent from the Epilog parameter in slurm.conf. This option
753 applies to job allocations.
754
755
756
757 --exclusive[=user|mcs]
758 This option applies to job and job step allocations, and has two
759 slightly different meanings for each one. When used to initiate
760 a job, the job allocation cannot share nodes with other running
761 jobs (or just other users with the "=user" option or "=mcs"
762 option). The default shared/exclusive behavior depends on sys‐
763 tem configuration and the partition's OverSubscribe option takes
764 precedence over the job's option.
765
766 This option can also be used when initiating more than one job
767 step within an existing resource allocation, where you want sep‐
768 arate processors to be dedicated to each job step. If sufficient
769 processors are not available to initiate the job step, it will
770 be deferred. This can be thought of as providing a mechanism for
resource management to the job within its allocation.
772
773 The exclusive allocation of CPUs only applies to job steps
774 explicitly invoked with the --exclusive option. For example, a
775 job might be allocated one node with four CPUs and a remote
776 shell invoked on the allocated node. If that shell is not
777 invoked with the --exclusive option, then it may create a job
778 step with four tasks using the --exclusive option and not con‐
flict with the remote shell's resource allocation. Invoke
every job step with the --exclusive option to ensure distinct
resources for each step.
782
783 Note that all CPUs allocated to a job are available to each job
784 step unless the --exclusive option is used plus task affinity is
785 configured. Since resource management is provided by processor,
786 the --ntasks option must be specified, but the following options
787 should NOT be specified --relative, --distribution=arbitrary.
788 See EXAMPLE below.
789
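An illustrative sketch of dedicating CPUs to concurrent job steps
inside an existing allocation (the step names and task counts are
placeholders):

    # Inside an allocation with four CPUs: each step requests
    # dedicated processors, so a step is deferred until enough
    # CPUs are free if the steps cannot run side by side.
    srun --exclusive -n2 ./step_a &
    srun --exclusive -n2 ./step_b &
    wait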
790
791 --export=<environment variables [ALL] | NONE>
792 Identify which environment variables are propagated to the
793 launched application. By default, all are propagated. Multiple
794 environment variable names should be comma separated. Environ‐
795 ment variable names may be specified to propagate the current
796 value (e.g. "--export=EDITOR") or specific values may be
797 exported (e.g. "--export=EDITOR=/bin/emacs"). In these two exam‐
798 ples, the propagated environment will only contain the variable
799 EDITOR. If one desires to add to the environment instead of
800 replacing it, have the argument include ALL (e.g.
801 "--export=ALL,EDITOR=/bin/emacs"). This will propagate EDITOR
802 along with the current environment. Unlike sbatch, if ALL is
803 specified, any additional specified environment variables are
804 ignored. If one desires no environment variables be propagated,
805 use the argument NONE. Regardless of this setting, the appro‐
806 priate SLURM_* task environment variables are always exported to
807 the environment. srun may deviate from the above behavior if
808 the default launch plugin, launch/slurm, is not used.
809
810
811 -F, --nodefile=<node file>
812 Much like --nodelist, but the list is contained in a file of
813 name node file. The node names of the list may also span multi‐
814 ple lines in the file. Duplicate node names in the file will
815 be ignored. The order of the node names in the list is not
816 important; the node names will be sorted by Slurm.
817
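For example, an illustrative sequence (node01 and node02 are
placeholder node names):

    # Create a node file with one node name per line, then request
    # those nodes for the job.
    printf 'node01\nnode02\n' > nodes.txt
    srun --nodefile=nodes.txt -n2 hostname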
818
819 --gid=<group>
820 If srun is run as root, and the --gid option is used, submit the
821 job with group's group access permissions. group may be the
822 group name or the numerical group ID. This option applies to job
823 allocations.
824
825
826 -G, --gpus=[<type>:]<number>
827 Specify the total number of GPUs required for the job. An
828 optional GPU type specification can be supplied. For example
829 "--gpus=volta:3". Multiple options can be requested in a comma
830 separated list, for example: "--gpus=volta:3,kepler:1". See
831 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
832 options.
833
834
835 --gpu-bind=<type>
836 Bind tasks to specific GPUs. By default every spawned task can
837 access every GPU allocated to the job.
838
839 Supported type options:
840
841 closest Bind each task to the GPU(s) which are closest. In a
842 NUMA environment, each task may be bound to more than
843 one GPU (i.e. all GPUs in that NUMA environment).
844
845 map_gpu:<list>
Bind by mapping GPU IDs to tasks (or ranks) as spec‐
ified where <list> is
848 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
849 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
851 hexadecimal values. If the number of tasks (or ranks)
852 exceeds the number of elements in this list, elements
853 in the list will be reused as needed starting from the
854 beginning of the list. To simplify support for large
855 task counts, the lists may follow a map with an aster‐
856 isk and repetition count. For example
857 "map_gpu:0*4,1*4". Not supported unless the entire
858 node is allocated to the job.
859
860 mask_gpu:<list>
861 Bind by setting GPU masks on tasks (or ranks) as spec‐
862 ified where <list> is
863 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
864 mapping is specified for a node and identical mapping
865 is applied to the tasks on every node (i.e. the lowest
866 task ID on each node is mapped to the first mask spec‐
867 ified in the list, etc.). GPU masks are always inter‐
868 preted as hexadecimal values but can be preceded with
869 an optional '0x'. Not supported unless the entire node
870 is allocated to the job. To simplify support for large
871 task counts, the lists may follow a map with an aster‐
872 isk and repetition count. For example
873 "mask_gpu:0x0f*4,0xf0*4". Not supported unless the
874 entire node is allocated to the job.
875
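For example, illustrative invocations (./my_gpu_app and the GPU
counts are placeholders):

    # Two tasks on one node: task 0 bound to GPU 0, task 1 to GPU 1
    # (requires the entire node to be allocated to the job).
    srun -n2 --gpus=2 --gpu-bind=map_gpu:0,1 ./my_gpu_app

    # Let each task use the GPU(s) closest to its allocated CPUs.
    srun -n4 --gpus-per-node=4 --gpu-bind=closest ./my_gpu_app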
876
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
878 Request that GPUs allocated to the job are configured with spe‐
879 cific frequency values. This option can be used to indepen‐
880 dently configure the GPU and its memory frequencies. After the
881 job is completed, the frequencies of all affected GPUs will be
882 reset to the highest possible values. In some cases, system
883 power caps may override the requested values. The field type
884 can be "memory". If type is not specified, the GPU frequency is
885 implied. The value field can either be "low", "medium", "high",
886 "highm1" or a numeric value in megahertz (MHz). If the speci‐
887 fied numeric value is not possible, a value as close as possible
888 will be used. See below for definition of the values. The ver‐
889 bose option causes current GPU frequency information to be
890 logged. Examples of use include "--gpu-freq=medium,memory=high"
891 and "--gpu-freq=450".
892
893 Supported value definitions:
894
895 low the lowest available frequency.
896
897 medium attempts to set a frequency in the middle of the
898 available range.
899
900 high the highest available frequency.
901
902 highm1 (high minus one) will select the next highest avail‐
903 able frequency.
904
905
906 --gpus-per-node=[<type>:]<number>
907 Specify the number of GPUs required for the job on each node
908 included in the job's resource allocation. An optional GPU type
909 specification can be supplied. For example
910 "--gpus-per-node=volta:3". Multiple options can be requested in
911 a comma separated list, for example:
912 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
913 --gpus-per-socket and --gpus-per-task options.
914
915
916 --gpus-per-socket=[<type>:]<number>
917 Specify the number of GPUs required for the job on each socket
918 included in the job's resource allocation. An optional GPU type
919 specification can be supplied. For example
920 "--gpus-per-socket=volta:3". Multiple options can be requested
921 in a comma separated list, for example:
922 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
923 sockets per node count ( --sockets-per-node). See also the
924 --gpus, --gpus-per-node and --gpus-per-task options. This
925 option applies to job allocations.
926
927
928 --gpus-per-task=[<type>:]<number>
929 Specify the number of GPUs required for the job on each task to
930 be spawned in the job's resource allocation. An optional GPU
931 type specification can be supplied. This option requires the
932 specification of a task count. For example
933 "--gpus-per-task=volta:1". Multiple options can be requested in
934 a comma separated list, for example:
"--gpus-per-task=volta:3,kepler:1". Requires job to specify a
task count (--ntasks). See also the --gpus, --gpus-per-socket
937 and --gpus-per-node options.
938
939
940 --gres=<list>
941 Specifies a comma delimited list of generic consumable
942 resources. The format of each entry on the list is
943 "name[[:type]:count]". The name is that of the consumable
944 resource. The count is the number of those resources with a
945 default value of 1. The count can have a suffix of "k" or "K"
946 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
947 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
948 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
949 x 1024 x 1024 x 1024). The specified resources will be allo‐
950 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
952 of available generic consumable resources will be printed and
953 the command will exit if the option argument is "help". Exam‐
954 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
955 and "--gres=help". NOTE: This option applies to job and step
956 allocations. By default, a job step is allocated all of the
generic resources that have been allocated to the job. To change the
958 behavior so that each job step is allocated no generic
959 resources, explicitly set the value of --gres to specify zero
960 counts for each generic resource OR set "--gres=none" OR set the
961 SLURM_STEP_GRES environment variable to "none".
962
963
964 --gres-flags=<type>
965 Specify generic resource task binding options. This option
966 applies to job allocations.
967
968 disable-binding
969 Disable filtering of CPUs with respect to generic
970 resource locality. This option is currently required to
971 use more CPUs than are bound to a GRES (i.e. if a GPU is
972 bound to the CPUs on one socket, but resources on more
973 than one socket are required to run the job). This
974 option may permit a job to be allocated resources sooner
975 than otherwise possible, but may result in lower job per‐
976 formance.
977
978 enforce-binding
979 The only CPUs available to the job will be those bound to
980 the selected GRES (i.e. the CPUs identified in the
981 gres.conf file will be strictly enforced). This option
982 may result in delayed initiation of a job. For example a
983 job requiring two GPUs and one CPU will be delayed until
984 both GPUs on a single socket are available rather than
985 using GPUs bound to separate sockets, however the appli‐
986 cation performance may be improved due to improved commu‐
987 nication speed. Requires the node to be configured with
988 more than one socket and resource filtering will be per‐
989 formed on a per-socket basis.
990
991
992 -H, --hold
993 Specify the job is to be submitted in a held state (priority of
994 zero). A held job can now be released using scontrol to reset
995 its priority (e.g. "scontrol release <job_id>"). This option
996 applies to job allocations.
997
998
999 -h, --help
1000 Display help information and exit.
1001
1002
1003 --hint=<type>
1004 Bind tasks according to application hints.
1005
1006 compute_bound
1007 Select settings for compute bound applications: use all
1008 cores in each socket, one thread per core.
1009
1010 memory_bound
1011 Select settings for memory bound applications: use only
1012 one core in each socket, one thread per core.
1013
1014 [no]multithread
1015 [don't] use extra threads with in-core multi-threading
1016 which can benefit communication intensive applications.
1017 Only supported with the task/affinity plugin.
1018
1019 help show this help message
1020
1021 This option applies to job allocations.
1022
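For example, an illustrative invocation (./my_app and the task count
are placeholders):

    # Use all cores in each socket, one thread per core.
    srun -n16 --hint=compute_bound ./my_app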
1023
1024 -I, --immediate[=<seconds>]
1025 exit if resources are not available within the time period spec‐
1026 ified. If no argument is given (seconds defaults to 1),
1027 resources must be available immediately for the request to suc‐
1028 ceed. If defer is configured in SchedulerParameters and sec‐
1029 onds=1 the allocation request will fail immediately; defer con‐
1030 flicts and takes precedence over this option. By default,
1031 --immediate is off, and the command will block until resources
1032 become available. Since this option's argument is optional, for
1033 proper parsing the single letter option must be followed immedi‐
1034 ately with the value and not include a space between them. For
1035 example "-I60" and not "-I 60". This option applies to job and
1036 step allocations.
1037
1038
1039 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
stdin from the terminal to all tasks. See IO Redirection below for
1042 more options. For OS X, the poll() function does not support
1043 stdin, so input from a terminal is not possible. This option
1044 applies to job and step allocations.
1045
1046
1047 -J, --job-name=<jobname>
1048 Specify a name for the job. The specified name will appear along
1049 with the job id number when querying running jobs on the system.
1050 The default is the supplied executable program's name. NOTE:
1051 This information may be written to the slurm_jobacct.log file.
This file is space delimited so if a space is used in the job‐
name it will cause problems in properly displaying the con‐
1054 tents of the slurm_jobacct.log file when the sacct command is
1055 used. This option applies to job and step allocations.
1056
1057
1058 --jobid=<jobid>
Initiate a job step under an already allocated job with the
specified job id. Using this option will cause srun to behave
exactly as if
1061 the SLURM_JOB_ID environment variable was set. This option
1062 applies to step allocations.
1063
1064
1065 -K, --kill-on-bad-exit[=0|1]
1066 Controls whether or not to terminate a step if any task exits
1067 with a non-zero exit code. If this option is not specified, the
1068 default action will be based upon the Slurm configuration param‐
1069 eter of KillOnBadExit. If this option is specified, it will take
1070 precedence over KillOnBadExit. An option argument of zero will
1071 not terminate the job. A non-zero argument or no argument will
1072 terminate the job. Note: This option takes precedence over the
1073 -W, --wait option to terminate the job immediately if a task
1074 exits with a non-zero exit code. Since this option's argument
1075 is optional, for proper parsing the single letter option must be
1076 followed immediately with the value and not include a space
1077 between them. For example "-K1" and not "-K 1".
1078
1079
1080 -k, --no-kill [=off]
1081 Do not automatically terminate a job if one of the nodes it has
1082 been allocated fails. This option applies to job and step allo‐
1083 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1085 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1086 --wait options will have no effect upon the job step). The
1087 active job step (MPI job) will likely suffer a fatal error, but
1088 subsequent job steps may be run if this option is specified.
1089
Specify an optional argument of "off" to disable the effect of the
1091 SLURM_NO_KILL environment variable.
1092
1093 The default action is to terminate the job upon node failure.
1094
1095
1096 -l, --label
1097 Prepend task number to lines of stdout/err. The --label option
1098 will prepend lines of output with the remote task id. This
1099 option applies to step allocations.
1100
1101
1102 -L, --licenses=<license>
1103 Specification of licenses (or other resources available on all
1104 nodes of the cluster) which must be allocated to this job.
1105 License names can be followed by a colon and count (the default
1106 count is one). Multiple license names should be comma separated
1107 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1108 cations.
1109
1110
1111 -M, --clusters=<string>
1112 Clusters to issue commands to. Multiple cluster names may be
1113 comma separated. The job will be submitted to the one cluster
1114 providing the earliest expected job initiation time. The default
1115 value is the current cluster. A value of 'all' will query to run
1116 on all clusters. Note the --export option to control environ‐
1117 ment variables exported between clusters. This option applies
1118 only to job allocations. Note that the SlurmDBD must be up for
1119 this option to work properly.
1120
1121
-m, --distribution=*|block|cyclic|arbitrary|plane=<options>
[:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]][,Pack|NoPack]
1126
1127 Specify alternate distribution methods for remote processes.
1128 This option controls the distribution of tasks to the nodes on
1129 which resources have been allocated, and the distribution of
1130 those resources to tasks for binding (task affinity). The first
1131 distribution method (before the first ":") controls the distri‐
1132 bution of tasks to nodes. The second distribution method (after
1133 the first ":") controls the distribution of allocated CPUs
1134 across sockets for binding to tasks. The third distribution
1135 method (after the second ":") controls the distribution of allo‐
1136 cated CPUs across cores for binding to tasks. The second and
1137 third distributions apply only if task affinity is enabled. The
1138 third distribution is supported only if the task/cgroup plugin
1139 is configured. The default value for each distribution type is
1140 specified by *.
1141
1142 Note that with select/cons_res, the number of CPUs allocated on
1143 each socket and node may be different. Refer to
1144 https://slurm.schedmd.com/mc_support.html for more information
1145 on resource allocation, distribution of tasks to nodes, and
1146 binding of tasks to CPUs.
1147 First distribution method (distribution of tasks across nodes):
1148
1149
1150 * Use the default method for distributing tasks to nodes
1151 (block).
1152
1153 block The block distribution method will distribute tasks to a
1154 node such that consecutive tasks share a node. For exam‐
1155 ple, consider an allocation of three nodes each with two
1156 cpus. A four-task block distribution request will dis‐
1157 tribute those tasks to the nodes with tasks one and two
1158 on the first node, task three on the second node, and
1159 task four on the third node. Block distribution is the
1160 default behavior if the number of tasks exceeds the num‐
1161 ber of allocated nodes.
1162
1163 cyclic The cyclic distribution method will distribute tasks to a
1164 node such that consecutive tasks are distributed over
1165 consecutive nodes (in a round-robin fashion). For exam‐
1166 ple, consider an allocation of three nodes each with two
1167 cpus. A four-task cyclic distribution request will dis‐
1168 tribute those tasks to the nodes with tasks one and four
1169 on the first node, task two on the second node, and task
1170 three on the third node. Note that when SelectType is
1171 select/cons_res, the same number of CPUs may not be allo‐
1172 cated on each node. Task distribution will be round-robin
1173 among all the nodes with CPUs yet to be assigned to
1174 tasks. Cyclic distribution is the default behavior if
1175 the number of tasks is no larger than the number of allo‐
1176 cated nodes.
1177
1178 plane The tasks are distributed in blocks of a specified size.
1179 The options include a number representing the size of the
1180 task block. This is followed by an optional specifica‐
1181 tion of the task distribution scheme within a block of
1182 tasks and between the blocks of tasks. The number of
1183 tasks distributed to each node is the same as for cyclic
1184 distribution, but the taskids assigned to each node
1185 depend on the plane size. For more details (including
1186 examples and diagrams), please see
1187 https://slurm.schedmd.com/mc_support.html
1188 and
1189 https://slurm.schedmd.com/dist_plane.html
1190
1191 arbitrary
The arbitrary method of distribution will allocate pro‐
cesses in order as listed in the file designated by the envi‐
ronment variable SLURM_HOSTFILE. If this variable is
set it will override any other method specified. If it is
not set the method will default to block. The
hostfile must contain at minimum the number of hosts
requested, one per line or comma separated. If
1199 specifying a task count (-n, --ntasks=<number>), your
1200 tasks will be laid out on the nodes in the order of the
1201 file.
1202 NOTE: The arbitrary distribution option on a job alloca‐
1203 tion only controls the nodes to be allocated to the job
1204 and not the allocation of CPUs on those nodes. This
1205 option is meant primarily to control a job step's task
1206 layout in an existing job allocation for the srun com‐
1207 mand.
1208 NOTE: If number of tasks is given and a list of requested
1209 nodes is also given the number of nodes used from that
1210 list will be reduced to match that of the number of tasks
1211 if the number of nodes in the list is greater than the
1212 number of tasks.
1213
1214
1215 Second distribution method (distribution of CPUs across sockets
1216 for binding):
1217
1218
1219 * Use the default method for distributing CPUs across sock‐
1220 ets (cyclic).
1221
1222 block The block distribution method will distribute allocated
1223 CPUs consecutively from the same socket for binding to
1224 tasks, before using the next consecutive socket.
1225
1226 cyclic The cyclic distribution method will distribute allocated
1227 CPUs for binding to a given task consecutively from the
1228 same socket, and from the next consecutive socket for the
1229 next task, in a round-robin fashion across sockets.
1230
1231 fcyclic
1232 The fcyclic distribution method will distribute allocated
1233 CPUs for binding to tasks from consecutive sockets in a
1234 round-robin fashion across the sockets.
1235
1236
1237 Third distribution method (distribution of CPUs across cores for
1238 binding):
1239
1240
1241 * Use the default method for distributing CPUs across cores
1242 (inherited from second distribution method).
1243
1244 block The block distribution method will distribute allocated
1245 CPUs consecutively from the same core for binding to
1246 tasks, before using the next consecutive core.
1247
1248 cyclic The cyclic distribution method will distribute allocated
1249 CPUs for binding to a given task consecutively from the
1250 same core, and from the next consecutive core for the
1251 next task, in a round-robin fashion across cores.
1252
1253 fcyclic
1254 The fcyclic distribution method will distribute allocated
1255 CPUs for binding to tasks from consecutive cores in a
1256 round-robin fashion across the cores.
1257
1258
1259
1260 Optional control for task distribution over nodes:
1261
1262
Pack Rather than distributing a job step's tasks evenly
across its allocated nodes, pack them as tightly as pos‐
1265 sible on the nodes.
1266
1267 NoPack Rather than packing a job step's tasks as tightly as pos‐
1268 sible on the nodes, distribute them evenly. This user
1269 option will supersede the SelectTypeParameters
1270 CR_Pack_Nodes configuration parameter.
1271
1272 This option applies to job and step allocations.
1273
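For example, illustrative invocations (./my_app and the node and
task counts are placeholders):

    # Distribute tasks over the nodes round-robin and distribute
    # each task's CPUs over the sockets in blocks.
    srun -N3 -n6 -m cyclic:block ./my_app

    # Inside an existing multi-node allocation, pack this step's
    # tasks onto as few of the allocated nodes as possible.
    srun -n4 -m block,Pack ./my_app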
1274
1275 --mail-type=<type>
1276 Notify user by email when certain event types occur. Valid type
1277 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1278 BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buf‐
1279 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1280 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1281 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1282 time limit). Multiple type values may be specified in a comma
1283 separated list. The user to be notified is indicated with
1284 --mail-user. This option applies to job allocations.
1285
1286
1287 --mail-user=<user>
1288 User to receive email notification of state changes as defined
1289 by --mail-type. The default value is the submitting user. This
1290 option applies to job allocations.
1291
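For example, an illustrative invocation (user@example.com and
./my_app are placeholders):

    # Send mail to the given address when the job ends or fails.
    srun --mail-type=END,FAIL --mail-user=user@example.com ./my_app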
1292
1293 --mcs-label=<mcs>
1294 Used only when the mcs/group plugin is enabled. This parameter
1295 is a group among the groups of the user. Default value is cal‐
1296 culated by the Plugin mcs if it's enabled. This option applies
1297 to job allocations.
1298
1299
1300 --mem=<size[units]>
1301 Specify the real memory required per node. Default units are
1302 megabytes unless the SchedulerParameters configuration parameter
1303 includes the "default_gbytes" option for gigabytes. Different
1304 units can be specified using the suffix [K|M|G|T]. Default
1305 value is DefMemPerNode and the maximum value is MaxMemPerNode.
If configured, both parameters can be seen using the scontrol
1307 show config command. This parameter would generally be used if
1308 whole nodes are allocated to jobs (SelectType=select/linear).
1309 Specifying a memory limit of zero for a job step will restrict
1310 the job step to the amount of memory allocated to the job, but
1311 not remove any of the job's memory allocation from being avail‐
1312 able to other job steps. Also see --mem-per-cpu and
1313 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1314 options are mutually exclusive. If --mem, --mem-per-cpu or
1315 --mem-per-gpu are specified as command line arguments, then they
1316 will take precedence over the environment (potentially inherited
1317 from salloc or sbatch).
1318
1319 NOTE: A memory size specification of zero is treated as a spe‐
1320 cial case and grants the job access to all of the memory on each
node for newly submitted jobs and all available job memory to
new job steps.
1323
1324 Specifying new memory limits for job steps is only advisory.
1325
1326 If the job is allocated multiple nodes in a heterogeneous clus‐
1327 ter, the memory limit on each node will be that of the node in
1328 the allocation with the smallest memory size (same limit will
1329 apply to every node in the job's allocation).
1330
1331 NOTE: Enforcement of memory limits currently relies upon the
1332 task/cgroup plugin or enabling of accounting, which samples mem‐
1333 ory use on a periodic basis (data need not be stored, just col‐
1334 lected). In both cases memory use is based upon the job's Resi‐
1335 dent Set Size (RSS). A task may exceed the memory limit until
1336 the next periodic accounting sample.
1337
1338 This option applies to job and step allocations.
1339
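A brief sketch, with the executable names and sizes as placeholders:
request 16 gigabytes per node for the job, and let a later step use all
of the job's memory by specifying zero:

       srun -N 2 --mem=16G ./a.out       # 16 GB on each allocated node
       srun --mem=0 ./step_prog          # job step inside an existing allocation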
1340
1341 --mem-per-cpu=<size[units]>
1342 Minimum memory required per allocated CPU. Default units are
1343 megabytes unless the SchedulerParameters configuration parameter
1344 includes the "default_gbytes" option for gigabytes. Different
1345 units can be specified using the suffix [K|M|G|T]. Default
1346 value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
1347 exception below). If configured, both parameters can be seen
1348 using the scontrol show config command. Note that if the job's
1349 --mem-per-cpu value exceeds the configured MaxMemPerCPU, then
1350 the user's limit will be treated as a memory limit per task;
1351 --mem-per-cpu will be reduced to a value no larger than MaxMem‐
1352 PerCPU; --cpus-per-task will be set and the value of
1353 --cpus-per-task multiplied by the new --mem-per-cpu value will
1354 equal the original --mem-per-cpu value specified by the user.
1355 This parameter would generally be used if individual processors
1356 are allocated to jobs (SelectType=select/cons_res). If
1357 resources are allocated by the core, socket or whole nodes; the
1358 number of CPUs allocated to a job may be higher than the task
1359 count and the value of --mem-per-cpu should be adjusted accord‐
1360 ingly. Specifying a memory limit of zero for a job step will
1361 restrict the job step to the amount of memory allocated to the
1362 job, but not remove any of the job's memory allocation from
1363 being available to other job steps. Also see --mem and
1364 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1365 options are mutually exclusive.
1366
1367 NOTE: If the final amount of memory requested by a job (e.g. when
1368 --mem-per-cpu is used with the --exclusive option) cannot be satisfied
1369 by any of the nodes configured in the partition, the job will be
1370 rejected.
1371
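For example (all names and sizes are placeholders), 8 tasks with 2 CPUs
each and 2 gigabytes per CPU request a total of 32 gigabytes:

       srun -n 8 -c 2 --mem-per-cpu=2G ./a.out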
1372
1373 --mem-per-gpu=<size[units]>
1374 Minimum memory required per allocated GPU. Default units are
1375 megabytes unless the SchedulerParameters configuration parameter
1376 includes the "default_gbytes" option for gigabytes. Different
1377 units can be specified using the suffix [K|M|G|T]. Default
1378 value is DefMemPerGPU and is available on both a global and per
1379 partition basis. If configured, the parameters can be seen
1380 using the scontrol show config and scontrol show partition com‐
1381 mands. Also see --mem. The --mem, --mem-per-cpu and
1382 --mem-per-gpu options are mutually exclusive.
1383
1384
1385 --mem-bind=[{quiet,verbose},]type
1386 Bind tasks to memory. Used only when the task/affinity plugin is
1387 enabled and the NUMA memory functions are available. Note that
1388 the resolution of CPU and memory binding may differ on some
1389 architectures. For example, CPU binding may be performed at the
1390 level of the cores within a processor while memory binding will
1391 be performed at the level of nodes, where the definition of
1392 "nodes" may differ from system to system. By default no memory
1393 binding is performed; any task using any CPU can use any memory.
1394 This option is typically used to ensure that each task is bound
1395 to the memory closest to its assigned CPU. The use of any type
1396 other than "none" or "local" is not recommended. If you want
1397 greater control, try running a simple test code with the options
1398 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1399 the specific configuration.
1400
1401 NOTE: To have Slurm always report on the selected memory binding
1402 for all commands executed in a shell, you can enable verbose
1403 mode by setting the SLURM_MEM_BIND environment variable value to
1404 "verbose".
1405
1406 The following informational environment variables are set when
1407 --mem-bind is in use:
1408
1409 SLURM_MEM_BIND_LIST
1410 SLURM_MEM_BIND_PREFER
1411 SLURM_MEM_BIND_SORT
1412 SLURM_MEM_BIND_TYPE
1413 SLURM_MEM_BIND_VERBOSE
1414
1415 See the ENVIRONMENT VARIABLES section for a more detailed
1416 description of the individual SLURM_MEM_BIND* variables.
1417
1418 Supported options include:
1419
1420 help show this help message
1421
1422 local Use memory local to the processor in use
1423
1424 map_mem:<list>
1425 Bind by setting memory masks on tasks (or ranks) as spec‐
1426 ified where <list> is
1427 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1428 ping is specified for a node and identical mapping is
1429 applied to the tasks on every node (i.e. the lowest task
1430 ID on each node is mapped to the first ID specified in
1431 the list, etc.). NUMA IDs are interpreted as decimal
1432 values unless they are preceded with '0x' in which case
1433 they are interpreted as hexadecimal values. If the number of
1434 tasks (or ranks) exceeds the number of elements in this
1435 list, elements in the list will be reused as needed
1436 starting from the beginning of the list. To simplify
1437 support for large task counts, the lists may follow a map
1438 with an asterisk and repetition count. For example
1439 "map_mem:0x0f*4,0xf0*4". Not supported unless the entire
1440 node is allocated to the job.
1441
1442 mask_mem:<list>
1443 Bind by setting memory masks on tasks (or ranks) as spec‐
1444 ified where <list> is
1445 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1446 mapping is specified for a node and identical mapping is
1447 applied to the tasks on every node (i.e. the lowest task
1448 ID on each node is mapped to the first mask specified in
1449 the list, etc.). NUMA masks are always interpreted as
1450 hexadecimal values. Note that masks must be preceded
1451 with a '0x' if they don't begin with [0-9] so they are
1452 seen as numerical values. If the number of tasks (or
1453 ranks) exceeds the number of elements in this list, ele‐
1454 ments in the list will be reused as needed starting from
1455 the beginning of the list. To simplify support for large
1456 task counts, the lists may follow a mask with an asterisk
1457 and repetition count. For example "mask_mem:0*4,1*4". Not
1458 supported unless the entire node is allocated to the job.
1459
1460 no[ne] don't bind tasks to memory (default)
1461
1462 nosort avoid sorting free cache pages (default, LaunchParameters
1463 configuration parameter can override this default)
1464
1465 p[refer]
1466 Prefer use of first specified NUMA node, but permit
1467 use of other available NUMA nodes.
1468
1469 q[uiet]
1470 quietly bind before task runs (default)
1471
1472 rank bind by task rank (not recommended)
1473
1474 sort sort free cache pages (run zonesort on Intel KNL nodes)
1475
1476 v[erbose]
1477 verbosely report binding before task runs
1478
1479 This option applies to job and step allocations.
1480
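A sketch of the test suggested above (the executable is a placeholder):
first report the default placement verbosely, then bind each task's
memory to its local NUMA node:

       srun -n 4 --cpu-bind=verbose,none --mem-bind=verbose,none ./a.out
       srun -n 4 --mem-bind=verbose,local ./a.out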
1481
1482 --mincpus=<n>
1483 Specify a minimum number of logical cpus/processors per node.
1484 This option applies to job allocations.
1485
1486
1487 --msg-timeout=<seconds>
1488 Modify the job launch message timeout. The default value is
1489 MessageTimeout in the Slurm configuration file slurm.conf.
1490 Changes to this are typically not recommended, but could be use‐
1491 ful to diagnose problems. This option applies to job alloca‐
1492 tions.
1493
1494
1495 --mpi=<mpi_type>
1496 Identify the type of MPI to be used. May result in unique initi‐
1497 ation procedures.
1498
1499 list Lists available mpi types to choose from.
1500
1501 openmpi
1502 For use with OpenMPI.
1503
1504 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1505 only if the MPI implementation supports it, in other
1506 words if the MPI has the PMI2 interface implemented. The
1507 --mpi=pmi2 will load the library lib/slurm/mpi_pmi2.so
1508 which provides the server side functionality but the
1509 client side must implement PMI2_Init() and the other
1510 interface calls.
1511
1512 pmix To enable PMIx support (http://pmix.github.io/master).
1513 The PMIx support in Slurm can be used to launch parallel
1514 applications (e.g. MPI) if it supports PMIx, PMI2 or
1515 PMI1. Slurm must be configured with pmix support by pass‐
1516 ing "--with-pmix=<PMIx installation path>" option to its
1517 "./configure" script.
1518
1519 At the time of writing PMIx is supported in Open MPI
1520 starting from version 2.0. PMIx also supports backward
1521 compatibility with PMI1 and PMI2 and can be used if MPI
1522 was configured with PMI2/PMI1 support pointing to the
1523 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1524 doesn't provide the way to point to a specific implemen‐
1525 tation, a hack'ish solution leveraging LD_PRELOAD can be
1526 used to force "libpmix" usage.
1527
1528
1529 none No special MPI processing. This is the default and works
1530 with many other versions of MPI.
1531
1532 This option applies to step allocations.
1533
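For example, list the MPI plugin types available in this installation,
then launch a PMIx-enabled MPI application (the executable name is a
placeholder):

       srun --mpi=list
       srun --mpi=pmix -n 64 ./mpi_app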
1534
1535 --multi-prog
1536 Run a job with different programs and different arguments for
1537 each task. In this case, the executable program specified is
1538 actually a configuration file specifying the executable and
1539 arguments for each task. See MULTIPLE PROGRAM CONFIGURATION
1540 below for details on the configuration file contents. This
1541 option applies to step allocations.
1542
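A hypothetical sketch (file and program names are placeholders); the
configuration file follows the task-rank/executable layout described in
MULTIPLE PROGRAM CONFIGURATION below:

       # multi.conf: rank 0 runs the master, ranks 1-3 run workers
       0    ./master
       1-3  ./worker

       srun -n 4 --multi-prog multi.conf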
1543
1544 -N, --nodes=<minnodes[-maxnodes]>
1545 Request that a minimum of minnodes nodes be allocated to this
1546 job. A maximum node count may also be specified with maxnodes.
1547 If only one number is specified, this is used as both the mini‐
1548 mum and maximum node count. The partition's node limits super‐
1549 sede those of the job. If a job's node limits are outside of
1550 the range permitted for its associated partition, the job will
1551 be left in a PENDING state. This permits possible execution at
1552 a later time, when the partition limit is changed. If a job
1553 node limit exceeds the number of nodes configured in the parti‐
1554 tion, the job will be rejected. Note that the environment vari‐
1555 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1556 ibility) will be set to the count of nodes actually allocated to
1557 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1558 tion. If -N is not specified, the default behavior is to allo‐
1559 cate enough nodes to satisfy the requirements of the -n and -c
1560 options. The job will be allocated as many nodes as possible
1561 within the range specified and without delaying the initiation
1562 of the job. If the number of tasks is given and a number of
1563 requested nodes is also given, the number of nodes used from that
1564 request will be reduced to match the number of tasks if
1565 the number of nodes in the request is greater than the number of
1566 tasks. The node count specification may include a numeric value
1567 followed by a suffix of "k" (multiplies numeric value by 1,024)
1568 or "m" (multiplies numeric value by 1,048,576). This option
1569 applies to job and step allocations.
1570
1571
1572 -n, --ntasks=<number>
1573 Specify the number of tasks to run. Request that srun allocate
1574 resources for ntasks tasks. The default is one task per node,
1575 but note that the --cpus-per-task option will change this
1576 default. This option applies to job and step allocations.
1577
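For instance (the executable is a placeholder), run 8 tasks spread over
exactly 2 nodes:

       srun -N 2 -n 8 ./a.out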
1578
1579 --network=<type>
1580 Specify information pertaining to the switch or network. The
1581 interpretation of type is system dependent. This option is sup‐
1582 ported when running Slurm on a Cray natively. It is used to
1583 request using Network Performance Counters. Only one value per
1584 request is valid. All options are case-insensitive. In this
1585 configuration the supported values include:
1586
1587 system
1588 Use the system-wide network performance counters. Only
1589 nodes requested will be marked in use for the job alloca‐
1590 tion. If the job does not fill up the entire system the
1591 rest of the nodes are not able to be used by other jobs
1592 using NPC; if idle, their state will appear as PerfCnts.
1593 These nodes are still available for other jobs not using
1594 NPC.
1595
1596 blade Use the blade network performance counters. Only nodes
1597 requested will be marked in use for the job allocation.
1598 If the job does not fill up the entire blade(s) allocated
1599 to the job, those blade(s) are not able to be used by other
1600 jobs using NPC; if idle, their state will appear as PerfC‐
1601 nts. These nodes are still available for other jobs not
1602 using NPC.
1603
1604
1605 In all cases the job or step allocation request must
1606 specify the --exclusive option. Otherwise the request
1607 will be denied.
1608
1609 Also with any of these options, steps are not allowed to share
1610 blades, so resources would remain idle inside an allocation if
1611 the step running on a blade does not take up all the nodes on
1612 the blade.
1613
1614 The network option is also supported on systems with IBM's Par‐
1615 allel Environment (PE). See IBM's LoadLeveler job command key‐
1616 word documentation about the keyword "network" for more informa‐
1617 tion. Multiple values may be specified in a comma separated
1618 list. All options are case-insensitive. Supported values
1619 include:
1620
1621 BULK_XFER[=<resources>]
1622 Enable bulk transfer of data using Remote Direct-
1623 Memory Access (RDMA). The optional resources speci‐
1624 fication is a numeric value which can have a suffix
1625 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1626 megabytes or gigabytes. NOTE: The resources speci‐
1627 fication is not supported by the underlying IBM in‐
1628 frastructure as of Parallel Environment version 2.2
1629 and no value should be specified at this time. The
1630 devices allocated to a job must all be of the same
1631 type. The default value depends upon
1632 what hardware is available and in order of prefer‐
1633 ence is IPONLY (which is not considered in User
1634 Space mode), HFI, IB, HPCE, and KMUX.
1635
1636 CAU=<count> Number of Collective Acceleration Units (CAU)
1637 required. Applies only to IBM Power7-IH processors.
1638 Default value is zero. Independent CAU will be
1639 allocated for each programming interface (MPI, LAPI,
1640 etc.)
1641
1642 DEVNAME=<name>
1643 Specify the device name to use for communications
1644 (e.g. "eth0" or "mlx4_0").
1645
1646 DEVTYPE=<type>
1647 Specify the device type to use for communications.
1648 The supported values of type are: "IB" (InfiniBand),
1649 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1650 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1651 nel Emulation of HPCE). The devices allocated to a
1652 job must all be of the same type. The default value
1653 depends upon what hardware is available
1654 and in order of preference is IPONLY (which is not
1655 considered in User Space mode), HFI, IB, HPCE, and
1656 KMUX.
1657
1658 IMMED =<count>
1659 Number of immediate send slots per window required.
1660 Applies only to IBM Power7-IH processors. Default
1661 value is zero.
1662
1663 INSTANCES =<count>
1664 Specify number of network connections for each task
1665 on each network. The default instance
1666 count is 1.
1667
1668 IPV4 Use Internet Protocol (IP) version 4 communications
1669 (default).
1670
1671 IPV6 Use Internet Protocol (IP) version 6 communications.
1672
1673 LAPI Use the LAPI programming interface.
1674
1675 MPI Use the MPI programming interface. MPI is the
1676 default interface.
1677
1678 PAMI Use the PAMI programming interface.
1679
1680 SHMEM Use the OpenSHMEM programming interface.
1681
1682 SN_ALL Use all available switch networks (default).
1683
1684 SN_SINGLE Use one available switch network.
1685
1686 UPC Use the UPC programming interface.
1687
1688 US Use User Space communications.
1689
1690
1691 Some examples of network specifications:
1692
1693 Instances=2,US,MPI,SN_ALL
1694 Create two user space connections for MPI communica‐
1695 tions on every switch network for each task.
1696
1697 US,MPI,Instances=3,Devtype=IB
1698 Create three user space connections for MPI communi‐
1699 cations on every InfiniBand network for each task.
1700
1701 IPV4,LAPI,SN_Single
1702 Create an IP version 4 connection for LAPI communica‐
1703 tions on one switch network for each task.
1704
1705 Instances=2,US,LAPI,MPI
1706 Create two user space connections each for LAPI and
1707 MPI communications on every switch network for each
1708 task. Note that SN_ALL is the default option so
1709 every switch network is used. Also note that
1710 Instances=2 specifies that two connections are
1711 established for each protocol (LAPI and MPI) and
1712 each task. If there are two networks and four tasks
1713 on the node then a total of 32 connections are
1714 established (2 instances x 2 protocols x 2 networks
1715 x 4 tasks).
1716
1717 This option applies to job and step allocations.
1718
1719
1720 --nice[=adjustment]
1721 Run the job with an adjusted scheduling priority within Slurm.
1722 With no adjustment value the scheduling priority is decreased by
1723 100. A negative nice value increases the priority, otherwise
1724 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1725 ileged users can specify a negative adjustment.
1726
1727
1728 --ntasks-per-core=<ntasks>
1729 Request the maximum ntasks be invoked on each core. This option
1730 applies to the job allocation, but not to step allocations.
1731 Meant to be used with the --ntasks option. Related to
1732 --ntasks-per-node except at the core level instead of the node
1733 level. Masks will automatically be generated to bind the tasks
1734 to specific cores unless --cpu-bind=none is specified. NOTE:
1735 This option is not supported unless SelectType=cons_res is con‐
1736 figured (either directly or indirectly on Cray systems) along
1737 with the node's core count.
1738
1739
1740 --ntasks-per-node=<ntasks>
1741 Request that ntasks be invoked on each node. If used with the
1742 --ntasks option, the --ntasks option will take precedence and
1743 the --ntasks-per-node will be treated as a maximum count of
1744 tasks per node. Meant to be used with the --nodes option. This
1745 is related to --cpus-per-task=ncpus, but does not require knowl‐
1746 edge of the actual number of cpus on each node. In some cases,
1747 it is more convenient to be able to request that no more than a
1748 specific number of tasks be invoked on each node. Examples of
1749 this include submitting a hybrid MPI/OpenMP app where only one
1750 MPI "task/rank" should be assigned to each node while allowing
1751 the OpenMP portion to utilize all of the parallelism present in
1752 the node, or submitting a single setup/cleanup/monitoring job to
1753 each node of a pre-existing allocation as one step in a larger
1754 job script. This option applies to job allocations.
1755
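A sketch of the hybrid MPI/OpenMP case described above (application
name and sizes are placeholders): one task per node, each task given
16 CPUs for its OpenMP threads:

       OMP_NUM_THREADS=16 srun -N 4 --ntasks-per-node=1 -c 16 ./hybrid_app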
1756
1757 --ntasks-per-socket=<ntasks>
1758 Request the maximum ntasks be invoked on each socket. This
1759 option applies to the job allocation, but not to step alloca‐
1760 tions. Meant to be used with the --ntasks option. Related to
1761 --ntasks-per-node except at the socket level instead of the node
1762 level. Masks will automatically be generated to bind the tasks
1763 to specific sockets unless --cpu-bind=none is specified. NOTE:
1764 This option is not supported unless SelectType=cons_res is con‐
1765 figured (either directly or indirectly on Cray systems) along
1766 with the node's socket count.
1767
1768
1769 -O, --overcommit
1770 Overcommit resources. This option applies to job and step allo‐
1771 cations. When applied to job allocation, only one CPU is allo‐
1772 cated to the job per node and options used to specify the number
1773 of tasks per node, socket, core, etc. are ignored. When
1774 applied to job step allocations (the srun command when executed
1775 within an existing job allocation), this option can be used to
1776 launch more than one task per CPU. Normally, srun will not
1777 allocate more than one process per CPU. By specifying --over‐
1778 commit you are explicitly allowing more than one process per
1779 CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1780 to execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1781 file slurm.h and is not a variable, it is set at Slurm build
1782 time.
1783
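For example (task count, node count and executable are placeholders),
allow more tasks than allocated CPUs:

       srun -N 2 -n 16 --overcommit ./a.out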
1784
1785 -o, --output=<filename pattern>
1786 Specify the "filename pattern" for stdout redirection. By
1787 default in interactive mode, srun collects stdout from all tasks
1788 and sends this output via TCP/IP to the attached terminal. With
1789 --output stdout may be redirected to a file, to one file per
1790 task, or to /dev/null. See section IO Redirection below for the
1791 various forms of filename pattern. If the specified file
1792 already exists, it will be overwritten.
1793
1794 If --error is not also specified on the command line, both std‐
1795 out and stderr will be directed to the file specified by --output.
1796 This option applies to job and step allocations.
1797
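For example, using the filename patterns described under IO Redirection
below (the executable is a placeholder), write one output file per task:

       srun -n 4 --output=job%j-task%t.out ./a.out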
1798
1799 --open-mode=<append|truncate>
1800 Open the output and error files using append or truncate mode as
1801 specified. For heterogeneous job steps the default value is
1802 "append". Otherwise the default value is specified by the sys‐
1803 tem configuration parameter JobFileAppend. This option applies
1804 to job and step allocations.
1805
1806
1807 --pack-group=<expr>
1808 Identify each job in a heterogeneous job allocation for which a
1809 step is to be created. Applies only to srun commands issued
1810 inside a salloc allocation or sbatch script. <expr> is a set of
1811 integers corresponding to one or more options indexes on the
1812 salloc or sbatch command line. Examples: "--pack-group=2",
1813 "--pack-group=0,4", "--pack-group=1,3-5". The default value is
1814 --pack-group=0.
1815
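A minimal sketch: inside a heterogeneous allocation with two components
(created beforehand with salloc or sbatch), run a single step spanning
both components:

       srun --pack-group=0,1 hostname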
1816
1817 -p, --partition=<partition_names>
1818 Request a specific partition for the resource allocation. If
1819 not specified, the default behavior is to allow the slurm con‐
1820 troller to select the default partition as designated by the
1821 system administrator. If the job can use more than one parti‐
1822 tion, specify their names in a comma separated list and the one
1823 offering earliest initiation will be used with no regard given
1824 to the partition name ordering (although higher priority parti‐
1825 tions will be considered first). When the job is initiated, the
1826 name of the partition used will be placed first in the job
1827 record partition string. This option applies to job allocations.
1828
1829
1830 --power=<flags>
1831 Comma separated list of power management plugin options. Cur‐
1832 rently available flags include: level (all nodes allocated to
1833 the job should have identical power caps, may be disabled by the
1834 Slurm configuration option PowerParameters=job_no_level). This
1835 option applies to job allocations.
1836
1837
1838 --priority=<value>
1839 Request a specific job priority. May be subject to configura‐
1840 tion specific constraints. value should either be a numeric
1841 value or "TOP" (for highest possible value). Only Slurm opera‐
1842 tors and administrators can set the priority of a job. This
1843 option applies to job allocations only.
1844
1845
1846 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1847 Enables detailed data collection by the acct_gather_profile
1848 plugin. Detailed data are typically time-series that are stored
1849 in an HDF5 file for the job or an InfluxDB database depending on
1850 the configured plugin.
1851
1852
1853 All All data types are collected. (Cannot be combined with
1854 other values.)
1855
1856
1857 None No data types are collected. This is the default.
1858 (Cannot be combined with other values.)
1859
1860
1861 Energy Energy data is collected.
1862
1863
1864 Task Task (I/O, Memory, ...) data is collected.
1865
1866
1867 Filesystem
1868 Filesystem data is collected.
1869
1870
1871 Network Network (InfiniBand) data is collected.
1872
1873
1874 This option applies to job and step allocations.
1875
1876
1877 --prolog=<executable>
1878 srun will run executable just before launching the job step.
1879 The command line arguments for executable will be the command
1880 and arguments of the job step. If executable is "none", then no
1881 srun prolog will be run. This parameter overrides the SrunProlog
1882 parameter in slurm.conf. This parameter is completely indepen‐
1883 dent from the Prolog parameter in slurm.conf. This option
1884 applies to job allocations.
1885
1886
1887 --propagate[=rlimit[,rlimit...]]
1888 Allows users to specify which of the modifiable (soft) resource
1889 limits to propagate to the compute nodes and apply to their
1890 jobs. If no rlimit is specified, then all resource limits will
1891 be propagated. The following rlimit names are supported by
1892 Slurm (although some options may not be supported on some sys‐
1893 tems):
1894
1895 ALL All limits listed below (default)
1896
1897 NONE No limits listed below
1898
1899 AS The maximum address space for a process
1900
1901 CORE The maximum size of core file
1902
1903 CPU The maximum amount of CPU time
1904
1905 DATA The maximum size of a process's data segment
1906
1907 FSIZE The maximum size of files created. Note that if the
1908 user sets FSIZE to less than the current size of the
1909 slurmd.log, job launches will fail with a 'File size
1910 limit exceeded' error.
1911
1912 MEMLOCK The maximum size that may be locked into memory
1913
1914 NOFILE The maximum number of open files
1915
1916 NPROC The maximum number of processes available
1917
1918 RSS The maximum resident set size
1919
1920 STACK The maximum stack size
1921
1922 This option applies to job allocations.
1923
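For example (the executable is a placeholder), propagate only the
locked-memory and stack limits from the submission shell:

       srun --propagate=MEMLOCK,STACK ./a.out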
1924
1925 --pty Execute task zero in pseudo terminal mode. Implicitly sets
1926 --unbuffered. Implicitly sets --error and --output to /dev/null
1927 for all tasks except task zero, which may cause those tasks to
1928 exit immediately (e.g. shells will typically exit immediately in
1929 that situation). This option applies to step allocations.
1930
1931
1932 -q, --qos=<qos>
1933 Request a quality of service for the job. QOS values can be
1934 defined for each user/cluster/account association in the Slurm
1935 database. Users will be limited to their association's defined
1936 set of qos's when the Slurm configuration parameter, Account‐
1937 ingStorageEnforce, includes "qos" in its definition. This
1938 option applies to job allocations.
1939
1940
1941 -Q, --quiet
1942 Suppress informational messages from srun. Errors will still be
1943 displayed. This option applies to job and step allocations.
1944
1945
1946 --quit-on-interrupt
1947 Quit immediately on single SIGINT (Ctrl-C). Use of this option
1948 disables the status feature normally available when srun
1949 receives a single Ctrl-C and causes srun to instead immediately
1950 terminate the running job. This option applies to step alloca‐
1951 tions.
1952
1953
1954 -r, --relative=<n>
1955 Run a job step relative to node n of the current allocation.
1956 This option may be used to spread several job steps out among
1957 the nodes of the current job. If -r is used, the current job
1958 step will begin at node n of the allocated nodelist, where the
1959 first node is considered node 0. The -r option is not permitted
1960 with the -w or -x options and will result in a fatal error when not
1961 running within a prior allocation (i.e. when SLURM_JOB_ID is not
1962 set). The default for n is 0. If the value of --nodes exceeds
1963 the number of nodes identified with the --relative option, a
1964 warning message will be printed and the --relative option will
1965 take precedence. This option applies to step allocations.
1966
1967
1968 --reboot
1969 Force the allocated nodes to reboot before starting the job.
1970 This is only supported with some system configurations and will
1971 otherwise be silently ignored. This option applies to job allo‐
1972 cations.
1973
1974
1975 --resv-ports[=count]
1976 Reserve communication ports for this job. Users can specify the
1977 number of ports they want to reserve. The parameter Mpi‐
1978 Params=ports=12000-12999 must be specified in slurm.conf. If not
1979 specified and Slurm's OpenMPI plugin is used, then by default
1980 the number of reserved ports equals the highest number of tasks on
1981 any node in the job step allocation. If the number of reserved
1982 ports is zero then no ports are reserved. Used for OpenMPI. This
1983 option applies to job and step allocations.
1984
1985
1986 --reservation=<name>
1987 Allocate resources for the job from the named reservation. This
1988 option applies to job allocations.
1989
1990
1991 -s, --oversubscribe
1992 The job allocation can over-subscribe resources with other run‐
1993 ning jobs. The resources to be over-subscribed can be nodes,
1994 sockets, cores, and/or hyperthreads depending upon configura‐
1995 tion. The default over-subscribe behavior depends on system
1996 configuration and the partition's OverSubscribe option takes
1997 precedence over the job's option. This option may result in the
1998 allocation being granted sooner than if the --oversubscribe
1999 option was not set and allow higher system utilization, but
2000 application performance will likely suffer due to competition
2001 for resources. Also see the --exclusive option. This option
2002 applies to step allocations.
2003
2004
2005 -S, --core-spec=<num>
2006 Count of specialized cores per node reserved by the job for sys‐
2007 tem operations and not used by the application. The application
2008 will not use these cores, but will be charged for their alloca‐
2009 tion. Default value is dependent upon the node's configured
2010 CoreSpecCount value. If a value of zero is designated and the
2011 Slurm configuration option AllowSpecResourcesUsage is enabled,
2012 the job will be allowed to override CoreSpecCount and use the
2013 specialized resources on nodes it is allocated. This option can
2014 not be used with the --thread-spec option. This option applies
2015 to job allocations.
2016
2017
2018 --signal=<sig_num>[@<sig_time>]
2019 When a job is within sig_time seconds of its end time, send it
2020 the signal sig_num. Due to the resolution of event handling by
2021 Slurm, the signal may be sent up to 60 seconds earlier than
2022 specified. sig_num may either be a signal number or name (e.g.
2023 "10" or "USR1"). sig_time must have an integer value between 0
2024 and 65535. By default, no signal is sent before the job's end
2025 time. If a sig_num is specified without any sig_time, the
2026 default time will be 60 seconds. This option applies to job
2027 allocations. To have the signal sent at preemption time see the
2028 preempt_send_user_signal SlurmctldParameter.
2029
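For example (the executable is a placeholder), request SIGUSR1 roughly
two minutes before a one-hour limit expires:

       srun --time=60 --signal=USR1@120 ./a.out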
2030
2031 --slurmd-debug=<level>
2032 Specify a debug level for slurmd(8). The level may be specified
2033 as either an integer value between 0 [quiet, only errors are dis‐
2034 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2035
2036 quiet Log nothing
2037
2038 fatal Log only fatal errors
2039
2040 error Log only errors
2041
2042 info Log errors and general informational messages
2043
2044 verbose Log errors and verbose informational messages
2045
2046
2047 The slurmd debug information is copied onto the stderr of
2048 the job. By default only errors are displayed. This option
2049 applies to job and step allocations.
2050
2051
2052 --sockets-per-node=<sockets>
2053 Restrict node selection to nodes with at least the specified
2054 number of sockets. See additional information under -B option
2055 above when task/affinity plugin is enabled. This option applies
2056 to job allocations.
2057
2058
2059 --spread-job
2060 Spread the job allocation over as many nodes as possible and
2061 attempt to evenly distribute tasks across the allocated nodes.
2062 This option disables the topology/tree plugin. This option
2063 applies to job allocations.
2064
2065
2066 --switches=<count>[@<max-time>]
2067 When a tree topology is used, this defines the maximum count of
2068 switches desired for the job allocation and optionally the maxi‐
2069 mum time to wait for that number of switches. If Slurm finds an
2070 allocation containing more switches than the count specified,
2071 the job remains pending until it either finds an allocation with
2072 desired switch count or the time limit expires. If there is no
2073 switch count limit, there is no delay in starting the job.
2074 Acceptable time formats include "minutes", "minutes:seconds",
2075 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2076 "days-hours:minutes:seconds". The job's maximum time delay may
2077 be limited by the system administrator using the SchedulerParam‐
2078 eters configuration parameter with the max_switch_wait parameter
2079 option. On a dragonfly network the only switch count supported
2080 is 1 since communication performance will be highest when a job
2081 is allocated resources on one leaf switch or more than 2 leaf
2082 switches. The default max-time is the max_switch_wait Sched‐
2083 ulerParameters value. This option applies to job allocations.
2084
2085
2086 -T, --threads=<nthreads>
2087 Allows limiting the number of concurrent threads used to send
2088 the job request from the srun process to the slurmd processes on
2089 the allocated nodes. Default is to use one thread per allocated
2090 node up to a maximum of 60 concurrent threads. Specifying this
2091 option limits the number of concurrent threads to nthreads (less
2092 than or equal to 60). This should only be used to set a low
2093 thread count for testing on very small memory computers. This
2094 option applies to job allocations.
2095
2096
2097 -t, --time=<time>
2098 Set a limit on the total run time of the job allocation. If the
2099 requested time limit exceeds the partition's time limit, the job
2100 will be left in a PENDING state (possibly indefinitely). The
2101 default time limit is the partition's default time limit. When
2102 the time limit is reached, each task in each job step is sent
2103 SIGTERM followed by SIGKILL. The interval between signals is
2104 specified by the Slurm configuration parameter KillWait. The
2105 OverTimeLimit configuration parameter may permit the job to run
2106 longer than scheduled. Time resolution is one minute and second
2107 values are rounded up to the next minute.
2108
2109 A time limit of zero requests that no time limit be imposed.
2110 Acceptable time formats include "minutes", "minutes:seconds",
2111 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2112 "days-hours:minutes:seconds". This option applies to job and
2113 step allocations.
2114
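For example (the executable is a placeholder), request a limit of one
day, twelve hours and thirty minutes using the days-hours:minutes
format:

       srun --time=1-12:30 ./a.out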
2115
2116 --task-epilog=<executable>
2117 The slurmstepd daemon will run executable just after each task
2118 terminates. This will be executed before any TaskEpilog parame‐
2119 ter in slurm.conf is executed. This is meant to be a very
2120 short-lived program. If it fails to terminate within a few sec‐
2121 onds, it will be killed along with any descendant processes.
2122 This option applies to step allocations.
2123
2124
2125 --task-prolog=<executable>
2126 The slurmstepd daemon will run executable just before launching
2127 each task. This will be executed after any TaskProlog parameter
2128 in slurm.conf is executed. Besides the normal environment vari‐
2129 ables, this has SLURM_TASK_PID available to identify the process
2130 ID of the task being started. Standard output from this program
2131 of the form "export NAME=value" will be used to set environment
2132 variables for the task being spawned. This option applies to
2133 step allocations.
2134
2135
2136 --test-only
2137 Returns an estimate of when a job would be scheduled to run
2138 given the current job queue and all the other srun arguments
2139 specifying the job. This limits srun's behavior to just return
2140 information; no job is actually submitted. The program will be
2141 executed directly by the slurmd daemon. This option applies to
2142 job allocations.
2143
2144
2145 --thread-spec=<num>
2146 Count of specialized threads per node reserved by the job for
2147 system operations and not used by the application. The applica‐
2148 tion will not use these threads, but will be charged for their
2149 allocation. This option can not be used with the --core-spec
2150 option. This option applies to job allocations.
2151
2152
2153 --threads-per-core=<threads>
2154 Restrict node selection to nodes with at least the specified
2155 number of threads per core. NOTE: "Threads" refers to the num‐
2156 ber of processing units on each core rather than the number of
2157 application tasks to be launched per core. See additional
2158 information under -B option above when task/affinity plugin is
2159 enabled. This option applies to job allocations.
2160
2161
2162 --time-min=<time>
2163 Set a minimum time limit on the job allocation. If specified,
2164 the job may have its --time limit lowered to a value no lower
2165 than --time-min if doing so permits the job to begin execution
2166 earlier than otherwise possible. The job's time limit will not
2167 be changed after the job is allocated resources. This is per‐
2168 formed by a backfill scheduling algorithm to allocate resources
2169 otherwise reserved for higher priority jobs. Acceptable time
2170 formats include "minutes", "minutes:seconds", "hours:min‐
2171 utes:seconds", "days-hours", "days-hours:minutes" and
2172 "days-hours:minutes:seconds". This option applies to job alloca‐
2173 tions.
2174
2175
2176 --tmp=<size[units]>
2177 Specify a minimum amount of temporary disk space per node.
2178 Default units are megabytes unless the SchedulerParameters con‐
2179 figuration parameter includes the "default_gbytes" option for
2180 gigabytes. Different units can be specified using the suffix
2181 [K|M|G|T]. This option applies to job allocations.
2182
2183
2184 -u, --unbuffered
2185 By default the connection between slurmstepd and the user
2186 launched application is over a pipe. The stdio output written by
2187 the application is buffered by glibc until it is flushed or
2188 the output is set as unbuffered. See setbuf(3). If this option
2189 is specified the tasks are executed with a pseudo terminal so
2190 that the application output is unbuffered. This option applies
2191 to step allocations.
2192
2193 --usage
2194 Display brief help message and exit.
2195
2196
2197 --uid=<user>
2198 Attempt to submit and/or run a job as user instead of the invok‐
2199 ing user id. The invoking user's credentials will be used to
2200 check access permissions for the target partition. User root may
2201 use this option to run jobs as a normal user in a RootOnly par‐
2202 tition for example. If run as root, srun will drop its permis‐
2203 sions to the uid specified after node allocation is successful.
2204 user may be the user name or numerical user ID. This option
2205 applies to job and step allocations.
2206
2207
2208 --use-min-nodes
2209 If a range of node counts is given, prefer the smaller count.
2210
2211
2212 -V, --version
2213 Display version information and exit.
2214
2215
2216 -v, --verbose
2217 Increase the verbosity of srun's informational messages. Multi‐
2218 ple -v's will further increase srun's verbosity. By default
2219 only errors will be displayed. This option applies to job and
2220 step allocations.
2221
2222
2223 -W, --wait=<seconds>
2224 Specify how long to wait after the first task terminates before
2225 terminating all remaining tasks. A value of 0 indicates an
2226 unlimited wait (a warning will be issued after 60 seconds). The
2227 default value is set by the WaitTime parameter in the slurm con‐
2228 figuration file (see slurm.conf(5)). This option can be useful
2229 to ensure that a job is terminated in a timely fashion in the
2230 event that one or more tasks terminate prematurely. Note: The
2231 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2232 to terminate the job immediately if a task exits with a non-zero
2233 exit code. This option applies to job allocations.
2234
2235
2236 -w, --nodelist=<host1,host2,... or filename>
2237 Request a specific list of hosts. The job will contain all of
2238 these hosts and possibly additional hosts as needed to satisfy
2239 resource requirements. The list may be specified as a
2240 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2241 for example), or a filename. The host list will be assumed to
2242 be a filename if it contains a "/" character. If you specify a
2243 minimum node or processor count larger than can be satisfied by
2244 the supplied host list, additional resources will be allocated
2245 on other nodes as needed. Rather than repeating a host name
2246 multiple times, an asterisk and a repetition count may be
2247 appended to a host name. For example "host1,host1" and "host1*2"
2248 are equivalent. If the number of tasks is given and a list of
2249 requested nodes is also given, the number of nodes used from that
2250 list will be reduced to match the number of tasks if the
2251 number of nodes in the list is greater than the number of tasks.
2252 This option applies to job and step allocations.
2253
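For example (host names and executable are placeholders), request
specific hosts either inline or from a file listing one host per line;
the second form is treated as a filename because it contains a "/":

       srun -N 3 -w "host[1-2],host7" ./a.out
       srun -w ./hosts.txt ./a.out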
2254
2255 --wckey=<wckey>
2256 Specify wckey to be used with job. If TrackWCKey=no (default)
2257 in the slurm.conf this value is ignored. This option applies to
2258 job allocations.
2259
2260
2261 -X, --disable-status
2262 Disable the display of task status when srun receives a single
2263 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2264 running job. Without this option a second Ctrl-C in one second
2265 is required to forcibly terminate the job and srun will immedi‐
2266 ately exit. May also be set via the environment variable
2267 SLURM_DISABLE_STATUS. This option applies to job allocations.
2268
2269
2270 -x, --exclude=<host1,host2,... or filename>
2271 Request that a specific list of hosts not be included in the
2272 resources allocated to this job. The host list will be assumed
2273 to be a filename if it contains a "/" character. This option
2274 applies to job allocations.
2275
2276
2277 --x11[=<all|first|last>]
2278 Sets up X11 forwarding on all, first or last node(s) of the
2279 allocation. This option is only enabled if Slurm was compiled
2280 with X11 support and PrologFlags=x11 is defined in the
2281 slurm.conf. Default is all.
2282
2283
2284 -Z, --no-allocate
2285 Run the specified tasks on a set of nodes without creating a
2286 Slurm "job" in the Slurm queue structure, bypassing the normal
2287 resource allocation step. The list of nodes must be specified
2288 with the -w, --nodelist option. This is a privileged option
2289 only available for the users "SlurmUser" and "root". This option
2290 applies to job allocations.
2291
2292
2293 srun will submit the job request to the slurm job controller, then ini‐
2294 tiate all processes on the remote nodes. If the request cannot be met
2295 immediately, srun will block until the resources are free to run the
2296 job. If the -I (--immediate) option is specified srun will terminate if
2297 resources are not immediately available.
2298
2299 When initiating remote processes srun will propagate the current work‐
2300 ing directory, unless --chdir=<path> is specified, in which case path
2301 will become the working directory for the remote processes.
2302
2303 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2304 cated to the job. When specifying only the number of processes to run
2305 with -n, a default of one CPU per process is allocated. By specifying
2306 the number of CPUs required per task (-c), more than one CPU may be
2307 allocated per process. If the number of nodes is specified with -N,
2308 srun will attempt to allocate at least the number of nodes specified.
2309
2310 Combinations of the above three options may be used to change how pro‐
2311 cesses are distributed across nodes and cpus. For instance, by specify‐
2312 ing both the number of processes and number of nodes on which to run,
2313 the number of processes per node is implied. However, if the number of
2314 CPUs per process is more important, then the number of processes (-n) and
2315 the number of CPUs per process (-c) should be specified.
2316
2317 srun will refuse to allocate more than one process per CPU unless
2318 --overcommit (-O) is also specified.
2319
2320 srun will attempt to meet the above specifications "at a minimum." That
2321 is, if 16 nodes are requested for 32 processes, and some nodes do not
2322 have 2 CPUs, the allocation of nodes will be increased in order to meet
2323 the demand for CPUs. In other words, a minimum of 16 nodes are being
2324 requested. However, if 16 nodes are requested for 15 processes, srun
2325 will consider this an error, as 15 processes cannot run across 16
2326 nodes.
2327
2328
2329 IO Redirection
2330
2331 By default, stdout and stderr will be redirected from all tasks to the
2332 stdout and stderr of srun, and stdin will be redirected from the stan‐
2333 dard input of srun to all remote tasks. If stdin is only to be read by
2334 a subset of the spawned tasks, specifying a file to read from rather
2335 than forwarding stdin from the srun command may be preferable as it
2336 avoids moving and storing data that will never be read.
2337
2338 For OS X, the poll() function does not support stdin, so input from a
2339 terminal is not possible.
2340
2341 This behavior may be changed with the --output, --error, and --input
2342 (-o, -e, -i) options. Valid format specifications for these options are
2343
2344 all stdout and stderr are redirected from all tasks to srun. stdin is
2345 broadcast to all remote tasks. (This is the default behav‐
2346 ior)
2347
2348 none stdout and stderr are not received from any task. stdin is
2349 not sent to any task (stdin is closed).
2350
2351 taskid stdout and/or stderr are redirected from only the task with
2352 relative id equal to taskid, where 0 <= taskid <= ntasks,
2353 where ntasks is the total number of tasks in the current job
2354 step. stdin is redirected from the stdin of srun to this
2355 same task. This file will be written on the node executing
2356 the task.
2357
2358 filename srun will redirect stdout and/or stderr to the named file
2359 from all tasks. stdin will be redirected from the named file
2360 and broadcast to all tasks in the job. filename refers to a
2361 path on the host that runs srun. Depending on the cluster's
2362 file system layout, this may result in the output appearing
2363 in different places depending on whether the job is run in
2364 batch mode.
2365
2366 filename pattern
2367 srun allows for a filename pattern to be used to generate the
2368 named IO file described above. The following list of format
2369 specifiers may be used in the format string to generate a
2370 filename that will be unique to a given jobid, stepid, node,
2371 or task. In each case, the appropriate number of files are
2372 opened and associated with the corresponding tasks. Note that
2373 any format string containing %t, %n, and/or %N will be writ‐
2374 ten on the node executing the task rather than the node where
2375 srun executes; these format specifiers are not supported on a
2376 BGQ system.
2377
2378 \\ Do not process any of the replacement symbols.
2379
2380 %% The character "%".
2381
2382 %A Job array's master job allocation number.
2383
2384 %a Job array ID (index) number.
2385
2386 %J jobid.stepid of the running job. (e.g. "128.0")
2387
2388 %j jobid of the running job.
2389
2390 %s stepid of the running job.
2391
2392 %N short hostname. This will create a separate IO file
2393 per node.
2394
2395 %n Node identifier relative to current job (e.g. "0" is
2396 the first node of the running job) This will create a
2397 separate IO file per node.
2398
2399 %t task identifier (rank) relative to current job. This
2400 will create a separate IO file per task.
2401
2402 %u User name.
2403
2404 %x Job name.
2405
2406 A number placed between the percent character and format
2407 specifier may be used to zero-pad the result in the IO file‐
2408 name. This number is ignored if the format specifier corre‐
2409 sponds to non-numeric data (%N for example).
2410
2411 Some examples of how the format string may be used for a 4
2412 task job step with a Job ID of 128 and step id of 0 are
2413 included below:
2414
2415 job%J.out job128.0.out
2416
2417 job%4j.out job0128.out
2418
2419 job%j-%2t.out job128-00.out, job128-01.out, ...
2420
2421 INPUT ENVIRONMENT VARIABLES
2422 Some srun options may be set via environment variables. These environ‐
2423 ment variables, along with their corresponding options, are listed
2424 below. Note: Command line options will always override these settings.
2425
2426 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2427 MVAPICH2) and controls the fanout of data commu‐
2428 nications. The srun command sends messages to
2429 application programs (via the PMI library) and
2430 those applications may be called upon to forward
2431 that data to up to this number of additional
2432 tasks. Higher values offload work from the srun
2433 command to the applications and likely increase
2434 the vulnerability to failures. The default value
2435 is 32.
2436
2437 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2438 MVAPICH2) and controls the fanout of data commu‐
2439 nications. The srun command sends messages to
2440 application programs (via the PMI library) and
2441 those applications may be called upon to forward
2442 that data to additional tasks. By default, srun
2443 sends one message per host and one task on that
2444 host forwards the data to other tasks on that
2445 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2446 defined, the user task may be required to forward
2447 the data to tasks on other hosts. Setting
2448 PMI_FANOUT_OFF_HOST may increase performance.
2449 Since more work is performed by the PMI library
2450 loaded by the user application, failures also can
2451 be more common and more difficult to diagnose.
2452
2453 PMI_TIME This is used exclusively with PMI (MPICH2 and
2454 MVAPICH2) and controls how much the communica‐
2455 tions from the tasks to the srun are spread out
2456 in time in order to avoid overwhelming the srun
2457 command with work. The default value is 500
2458 (microseconds) per task. On relatively slow pro‐
2459 cessors or systems with very large processor
2460 counts (and large PMI data sets), higher values
2461 may be required.
2462
2463 SLURM_CONF The location of the Slurm configuration file.
2464
2465 SLURM_ACCOUNT Same as -A, --account
2466
2467 SLURM_ACCTG_FREQ Same as --acctg-freq
2468
2469 SLURM_BCAST Same as --bcast
2470
2471 SLURM_BURST_BUFFER Same as --bb
2472
2473 SLURM_CHECKPOINT Same as --checkpoint
2474
2475 SLURM_COMPRESS Same as --compress
2476
2477 SLURM_CONSTRAINT Same as -C, --constraint
2478
2479 SLURM_CORE_SPEC Same as --core-spec
2480
2481 SLURM_CPU_BIND Same as --cpu-bind
2482
2483 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2484
2485 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2486
2487 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2488
2489 SLURM_DEBUG Same as -v, --verbose
2490
2491 SLURM_DELAY_BOOT Same as --delay-boot
2492
2493 SLURMD_DEBUG Same as -d, --slurmd-debug
2494
2495 SLURM_DEPENDENCY Same as -P, --dependency=<jobid>
2496
2497 SLURM_DISABLE_STATUS Same as -X, --disable-status
2498
2499 SLURM_DIST_PLANESIZE Same as -m plane
2500
2501 SLURM_DISTRIBUTION Same as -m, --distribution
2502
2503 SLURM_EPILOG Same as --epilog
2504
2505 SLURM_EXCLUSIVE Same as --exclusive
2506
2507 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2508 error occurs (e.g. invalid options). This can be
2509 used by a script to distinguish application exit
2510 codes from various Slurm error conditions. Also
2511 see SLURM_EXIT_IMMEDIATE.
2512
2513 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
2514 --immediate option is used and resources are not
2515 currently available. This can be used by a
2516 script to distinguish application exit codes from
2517 various Slurm error conditions. Also see
2518 SLURM_EXIT_ERROR.
2519
2520 SLURM_EXPORT_ENV Same as --export
2521
2522 SLURM_GPUS Same as -G, --gpus
2523
2524 SLURM_GPU_BIND Same as --gpu-bind
2525
2526 SLURM_GPU_FREQ Same as --gpu-freq
2527
2528 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2529
2530 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2531
2532 SLURM_GRES_FLAGS Same as --gres-flags
2533
2534 SLURM_HINT Same as --hint
2535
2536 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2537
2538 SLURM_IMMEDIATE Same as -I, --immediate
2539
2540 SLURM_JOB_ID Same as --jobid
2541
2542 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2543 allocation, in which case it is ignored to avoid
2544 using the batch job's name as the name of each
2545 job step.
2546
2547 SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
2548 Same as -N, --nodes. Total number of nodes in the
2549 job’s resource allocation.
2550
2551 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2552
2553 SLURM_LABELIO Same as -l, --label
2554
2555 SLURM_MEM_BIND Same as --mem-bind
2556
2557 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2558
2559 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2560
2561 SLURM_MEM_PER_NODE Same as --mem
2562
2563 SLURM_MPI_TYPE Same as --mpi
2564
2565 SLURM_NETWORK Same as --network
2566
2567 SLURM_NO_KILL Same as -k, --no-kill
2568
2569 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2570 Same as -n, --ntasks
2571
2572 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2573
2574 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2575
2576 SLURM_NTASKS_PER_SOCKET
2577 Same as --ntasks-per-socket
2578
2579 SLURM_OPEN_MODE Same as --open-mode
2580
2581 SLURM_OVERCOMMIT Same as -O, --overcommit
2582
2583 SLURM_PARTITION Same as -p, --partition
2584
2585 SLURM_PMI_KVS_NO_DUP_KEYS
2586 If set, then PMI key-pairs will contain no dupli‐
2587 cate keys. MPI can use this variable to inform
2588 the PMI library that it will not use duplicate
2589 keys so PMI can skip the check for duplicate
2590 keys. This is the case for MPICH2 and reduces
2591 overhead in testing for duplicates for improved
2592 performance
2593
2594 SLURM_POWER Same as --power
2595
2596 SLURM_PROFILE Same as --profile
2597
2598 SLURM_PROLOG Same as --prolog
2599
2600 SLURM_QOS Same as --qos
2601
2602 SLURM_REMOTE_CWD Same as -D, --chdir=
2603
2604 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2605 maximum count of switches desired for the job
2606 allocation and optionally the maximum time to
2607 wait for that number of switches. See --switches
2608
2609 SLURM_RESERVATION Same as --reservation
2610
2611 SLURM_RESV_PORTS Same as --resv-ports
2612
2613 SLURM_SIGNAL Same as --signal
2614
2615 SLURM_STDERRMODE Same as -e, --error
2616
2617 SLURM_STDINMODE Same as -i, --input
2618
2619 SLURM_SPREAD_JOB Same as --spread-job
2620
2621 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2622 if set and non-zero, successive task exit mes‐
2623 sages with the same exit code will be printed
2624 only once.
2625
2626 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2627 job allocations). Also see SLURM_GRES
2628
2629 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2630 If set, only the specified node will log when the
2631 job or step are killed by a signal.
2632
2633 SLURM_STDOUTMODE Same as -o, --output
2634
2635 SLURM_TASK_EPILOG Same as --task-epilog
2636
2637 SLURM_TASK_PROLOG Same as --task-prolog
2638
2639 SLURM_TEST_EXEC If defined, srun will verify existence of the
2640 executable program along with user execute per‐
2641 mission on the node where srun was called before
2642 attempting to launch it on nodes in the step.
2643
2644 SLURM_THREAD_SPEC Same as --thread-spec
2645
2646 SLURM_THREADS Same as -T, --threads
2647
2648 SLURM_TIMELIMIT Same as -t, --time
2649
2650 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2651
2652 SLURM_USE_MIN_NODES Same as --use-min-nodes
2653
2654 SLURM_WAIT Same as -W, --wait
2655
2656 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2657 --switches
2658
2659 SLURM_WCKEY           Same as --wckey
2660
2661 SLURM_WORKING_DIR     Same as -D, --chdir
2662
2663 SRUN_EXPORT_ENV Same as --export, and will override any setting
2664 for SLURM_EXPORT_ENV.
2665
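       As an illustration (the option values here are arbitrary), these input
       environment variables can supply defaults that would otherwise be
       given on the command line:

          > export SLURM_NTASKS=4
          > export SLURM_TIMELIMIT=10
          > srun hostname

       is expected to behave like "srun -n4 -t10 hostname". Explicit command
       line options always override these environment variables.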
2666
2667
2668OUTPUT ENVIRONMENT VARIABLES
2669 srun will set some environment variables in the environment of the exe‐
2670 cuting tasks on the remote compute nodes. These environment variables
2671 are:
2672
2673
2674 SLURM_*_PACK_GROUP_# For a heterogeneous job allocation, the environ‐
2675 ment variables are set separately for each compo‐
2676 nent.
2677
2678 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2679 ing.
2680
2681 SLURM_CPU_BIND_VERBOSE
2682 --cpu-bind verbosity (quiet,verbose).
2683
2684 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2685
2686 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2687 IDs or masks for this node, CPU_ID = Board_ID x
2688 threads_per_board + Socket_ID x
2689 threads_per_socket + Core_ID x threads_per_core +
2690 Thread_ID).
2691
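       As a worked illustration of this formula (the hardware counts
       below are hypothetical): on a node with one board, two sockets,
       eight cores per socket and two threads per core,
       threads_per_board = 32, threads_per_socket = 16 and
       threads_per_core = 2, so socket 1, core 3 of that socket,
       thread 0 maps to CPU_ID = 0 x 32 + 1 x 16 + 3 x 2 + 0 = 22.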
2692
2693 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2694 the srun command as a numerical frequency in
2695 kilohertz, or a coded value for a request of low,
2696 medium, highm1 or high for the frequency. See the
2697 description of the --cpu-freq option or the
2698 SLURM_CPU_FREQ_REQ input environment variable.
2699
2700 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2701 node. Note the select/linear plugin allocates
2702 entire nodes to jobs, so the value indicates the
2703 total count of CPUs on the node. For the
2704 select/cons_res plugin, this number indicates the
2705 number of cores on this node allocated to the
2706 job.
2707
2708 SLURM_CPUS_PER_GPU Number of CPUs requested per allocated GPU. Only
2709 set if the --cpus-per-gpu option is specified.
2710
2711 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2712 the --cpus-per-task option is specified.
2713
2714 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2715 distribution with -m, --distribution.
2716
2717 SLURM_GPUS Number of GPUs requested. Only set if the -G,
2718 --gpus option is specified.
2719
2720 SLURM_GPU_BIND Requested binding of tasks to GPU. Only set if
2721 the --gpu-bind option is specified.
2722
2723 SLURM_GPU_FREQ Requested GPU frequency. Only set if the
2724 --gpu-freq option is specified.
2725
2726 SLURM_GPUS_PER_NODE Requested GPU count per allocated node. Only set
2727 if the --gpus-per-node option is specified.
2728
2729 SLURM_GPUS_PER_SOCKET Requested GPU count per allocated socket. Only
2730 set if the --gpus-per-socket option is specified.
2731
2732 SLURM_GPUS_PER_TASK Requested GPU count per allocated task. Only set
2733 if the --gpus-per-task option is specified.
2734
2735 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2736 gin and comma separated.
2737
2738 SLURM_JOB_ACCOUNT Account name associated with the job allocation.
2739
2740 SLURM_JOB_CPUS_PER_NODE
2741 Number of CPUs per node.
2742
2743 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2744
2745 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2746 Job id of the executing job.
2747
2748
2749 SLURM_JOB_NAME Set to the value of the --job-name option or the
2750 command name when srun is used to create a new
2751 job allocation. Not set when srun is used only to
2752 create a job step (i.e. within an existing job
2753 allocation).
2754
2755
2756 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2757 ning.
2758
2759
2760 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2761
2762 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2763 tion, if any.
2764
2765
2766 SLURM_LAUNCH_NODE_IPADDR
2767 IP address of the node from which the task launch
2768 was initiated (where the srun command ran from).
2769
2770 SLURM_LOCALID Node local task ID for the process within a job.
2771
2772
2773 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2774 masks for this node>).
2775
2776 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2777
2778 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2779 nodes).
2780
2781 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2782
2783 SLURM_MEM_BIND_VERBOSE
2784 --mem-bind verbosity (quiet,verbose).
2785
2786 SLURM_MEM_PER_GPU Requested memory per allocated GPU. Only set if
2787 the --mem-per-gpu option is specified.
2788
2789 SLURM_JOB_NUM_NODES Total number of nodes in the job's resource allo‐
2790 cation.
2791
2792 SLURM_NODE_ALIASES Sets of node name, communication address and
2793 hostname for nodes allocated to the job from the
2794 cloud. Each element in the set is colon separated
2795 and each set is comma separated. For example:
2796 SLURM_NODE_ALIASES=
2797 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2798
2799 SLURM_NODEID The relative node ID of the current node.
2800
2801 SLURM_JOB_NODELIST List of nodes allocated to the job.
2802
2803 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2804 Total number of processes in the current job or
2805 job step.
2806
2807 SLURM_PACK_SIZE Set to count of components in heterogeneous job.
2808
2809 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2810 of job submission. This value is propagated to
2811 the spawned processes.
2812
2813 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2814 rent process.
2815
2816 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2817
2818 SLURM_SRUN_COMM_PORT srun communication port.
2819
2820 SLURM_STEP_LAUNCHER_PORT
2821 Step launcher port.
2822
2823 SLURM_STEP_NODELIST List of nodes allocated to the step.
2824
2825 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2826
2827 SLURM_STEP_NUM_TASKS Number of processes in the step.
2828
2829 SLURM_STEP_TASKS_PER_NODE
2830 Number of processes per node within the step.
2831
2832 SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2833 The step ID of the current job.
2834
2835 SLURM_SUBMIT_DIR The directory from which srun was invoked or, if
2836 applicable, the directory specified by the -D,
2837 --chdir option.
2838
2839 SLURM_SUBMIT_HOST The hostname of the computer from which salloc
2840 was invoked.
2841
2842 SLURM_TASK_PID The process ID of the task being started.
2843
2844 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2845 Values are comma separated and in the same order
2846 as SLURM_JOB_NODELIST. If two or more consecu‐
2847 tive nodes are to have the same task count, that
2848 count is followed by "(x#)" where "#" is the rep‐
2849 etition count. For example,
2850 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2851 first three nodes will each execute two tasks
2852 and the fourth node will execute one task.
2853
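       The compressed form can be expanded into one count per allocated
       node with standard shell tools. The following is only a sketch
       (the awk command is not part of Slurm):

          > echo "$SLURM_TASKS_PER_NODE" | tr ',' '\n' | \
               awk -F'[(x)]' '{n = ($3 == "") ? 1 : $3; for (i = 0; i < n; i++) print $1}'

       For the value "2(x3),1" above this prints 2, 2, 2 and 1 on sepa‐
       rate lines, in the same order as SLURM_JOB_NODELIST.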
2854
2855 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2856 ogy/tree plugin configured. The value will be
2857 set to the names of the network switches which may be
2858 involved in the job's communications from the
2859 system's top level switch down to the leaf switch
2860 and ending with node name. A period is used to
2861 separate each hardware component name.
2862
2863 SLURM_TOPOLOGY_ADDR_PATTERN
2864 This is set only if the system has the topol‐
2865 ogy/tree plugin configured. The value will be
2866 set to the component types listed in SLURM_TOPOL‐
2867 OGY_ADDR. Each component will be identified as
2868 either "switch" or "node". A period is used to
2869 separate each hardware component type.
2870
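       As a hypothetical illustration (the switch and node names are
       invented), a node reached through a top level switch "s0" and a
       leaf switch "s3" could see:

          SLURM_TOPOLOGY_ADDR=s0.s3.node12
          SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node
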
2871 SLURM_UMASK The umask in effect when the job was submitted.
2872
2873 SLURMD_NODENAME Name of the node running the task. In the case of
2874 a parallel job executing on multiple compute
2875 nodes, the various tasks will have this environ‐
2876 ment variable set to different values on each
2877 compute node.
2878
2879 SRUN_DEBUG Set to the logging level of the srun command.
2880 Default value is 3 (info level). The value is
2881 incremented or decremented based upon the --ver‐
2882 bose and --quiet options.
2883
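       For example, with the default level of 3, tasks launched with
       "srun -vv ..." are expected to see SRUN_DEBUG=5, while tasks
       launched with "srun --quiet ..." are expected to see SRUN_DEBUG=2.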
2884
2885SIGNALS AND ESCAPE SEQUENCES
2886 Signals sent to the srun command are automatically forwarded to the
2887 tasks it is controlling with a few exceptions. The escape sequence
2888 <control-c> will report the state of all tasks associated with the srun
2889 command. If <control-c> is entered twice within one second, then the
2890 associated SIGINT signal will be sent to all tasks and a termination
2891 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2892 spawned tasks. If a third <control-c> is received, the srun program
2893 will be terminated without waiting for remote tasks to exit or their
2894 I/O to complete.
2895
2896 The escape sequence <control-z> is presently ignored. Our intent is for
2897 this to put the srun command into a mode where various special actions may
2898 be invoked.
2899
2900
2901MPI SUPPORT
2902 MPI use depends upon the type of MPI being used. There are three fun‐
2903 damentally different modes of operation used by these various MPI
2904 implementations.
2905
2906 1. Slurm directly launches the tasks and performs initialization of
2907 communications through the PMI2 or PMIx APIs. For example: "srun -n16
2908 a.out".
2909
2910 2. Slurm creates a resource allocation for the job and then mpirun
2911 launches tasks using Slurm's infrastructure (OpenMPI).
2912
2913 3. Slurm creates a resource allocation for the job and then mpirun
2914 launches tasks using some mechanism other than Slurm, such as SSH or
2915 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
2916 trol. Slurm's epilog should be configured to purge these tasks when the
2917 job's allocation is relinquished; the use of pam_slurm_adopt is also
2918 highly recommended.
2919
2920 See https://slurm.schedmd.com/mpi_guide.html for more information on
2921 use of these various MPI implementations with Slurm.
2922
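       As a sketch of the second mode above (this assumes an Open MPI
       installation built with Slurm support; "a.out" is only a placeholder
       application):

          > salloc -N2 -n16 mpirun a.out

       Here salloc creates the resource allocation and mpirun is expected to
       detect that allocation from the environment and launch the 16 tasks
       on the allocated nodes using Slurm's infrastructure.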
2923
2924MULTIPLE PROGRAM CONFIGURATION
2925 Comments in the configuration file must have a "#" in column one. The
2926 configuration file contains the following fields separated by white
2927 space:
2928
2929 Task rank
2930 One or more task ranks to use this configuration. Multiple val‐
2931 ues may be comma separated. Ranges may be indicated with two
2932 numbers separated with a '-' with the smaller number first (e.g.
2933 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
2934 ified, specify a rank of '*' as the last line of the file. If
2935 an attempt is made to initiate a task for which no executable
2936 program is defined, the following error message will be produced
2937 "No executable program specified for this task".
2938
2939 Executable
2940 The name of the program to execute. May be fully qualified
2941 pathname if desired.
2942
2943 Arguments
2944 Program arguments. The expression "%t" will be replaced with
2945 the task's number. The expression "%o" will be replaced with
2946 the task's offset within this range (e.g. a configured task rank
2947 value of "1-5" would have offset values of "0-4"). Single
2948 quotes may be used to avoid having the enclosed values inter‐
2949 preted. This field is optional. Any arguments for the program
2950 entered on the command line will be added to the arguments spec‐
2951 ified in the configuration file.
2952
2953 For example:
2954 ###################################################################
2955 # srun multiple program configuration file
2956 #
2957 # srun -n8 -l --multi-prog silly.conf
2958 ###################################################################
2959 4-6 hostname
2960 1,7 echo task:%t
2961 0,2-3 echo offset:%o
2962
2963 > srun -n8 -l --multi-prog silly.conf
2964 0: offset:0
2965 1: task:1
2966 2: offset:1
2967 3: offset:2
2968 4: linux15.llnl.gov
2969 5: linux16.llnl.gov
2970 6: linux17.llnl.gov
2971 7: task:7
2972
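       A rank of '*' may be used as a catch-all for any task not otherwise
       listed. For example, the following sketch (the echoed strings are
       arbitrary) runs one program on task 0 and another on every remaining
       task:

          0   echo leader
          *   echo worker:%t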
2973
2974
2975
2976EXAMPLES
2977 This simple example demonstrates the execution of the command hostname
2978 in eight tasks. At least eight processors will be allocated to the job
2979 (the same as the task count) on however many nodes are required to sat‐
2980 isfy the request. The output of each task will be preceded by its
2981 task number. (The machine "dev" in the example below has a total of
2982 two CPUs per node)
2983
2984
2985 > srun -n8 -l hostname
2986 0: dev0
2987 1: dev0
2988 2: dev1
2989 3: dev1
2990 4: dev2
2991 5: dev2
2992 6: dev3
2993 7: dev3
2994
2995
2996 The srun -r option is used within a job script to run two job steps on
2997 disjoint nodes in the following example. The script is run using allo‐
2998 cate mode instead of as a batch job in this case.
2999
3000
3001 > cat test.sh
3002 #!/bin/sh
3003 echo $SLURM_JOB_NODELIST
3004 srun -lN2 -r2 hostname
3005 srun -lN2 hostname
3006
3007 > salloc -N4 test.sh
3008 dev[7-10]
3009 0: dev9
3010 1: dev10
3011 0: dev7
3012 1: dev8
3013
3014
3015 The following script runs two job steps in parallel within an allocated
3016 set of nodes.
3017
3018
3019 > cat test.sh
3020 #!/bin/bash
3021 srun -lN2 -n4 -r 2 sleep 60 &
3022 srun -lN2 -r 0 sleep 60 &
3023 sleep 1
3024 squeue
3025 squeue -s
3026 wait
3027
3028 > salloc -N4 test.sh
3029 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3030 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3031
3032 STEPID PARTITION USER TIME NODELIST
3033 65641.0 batch grondo 0:01 dev[7-8]
3034 65641.1 batch grondo 0:01 dev[9-10]
3035
3036
3037 This example demonstrates how one executes a simple MPI job. We use
3038 srun to build a list of machines (nodes) to be used by mpirun in its
3039 required format. A sample command line and the script to be executed
3040 follow.
3041
3042
3043 > cat test.sh
3044 #!/bin/sh
3045 MACHINEFILE="nodes.$SLURM_JOB_ID"
3046
3047 # Generate Machinefile for mpi such that hosts are in the same
3048 # order as if run via srun
3049 #
3050 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3051
3052 # Run using generated Machine file:
3053 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3054
3055 rm $MACHINEFILE
3056
3057 > salloc -N2 -n4 test.sh
3058
3059
3060 This simple example demonstrates the execution of different jobs on
3061 different nodes in the same srun. You can do this for any number of
3062 nodes or any number of jobs. The executables are placed on the nodes
3063 indicated by the SLURM_NODEID env var, starting at 0 and going up to
3064 the number of nodes specified on the srun command line.
3065
3066
3067 > cat test.sh
3068 case $SLURM_NODEID in
3069 0) echo "I am running on "
3070 hostname ;;
3071 1) hostname
3072 echo "is where I am running" ;;
3073 esac
3074
3075 > srun -N2 test.sh
3076 dev0
3077 is where I am running
3078 I am running on
3079 dev1
3080
3081
3082 This example demonstrates use of multi-core options to control layout
3083 of tasks. We request that four sockets per node and two cores per
3084 socket be dedicated to the job.
3085
3086
3087 > srun -N2 -B 4-4:2-2 a.out
3088
3089 This example shows a script in which Slurm is used to provide resource
3090 management for a job by executing the various job steps as processors
3091 become available for their dedicated use.
3092
3093
3094 > cat my.script
3095 #!/bin/bash
3096 srun --exclusive -n4 prog1 &
3097 srun --exclusive -n3 prog2 &
3098 srun --exclusive -n1 prog3 &
3099 srun --exclusive -n1 prog4 &
3100 wait
3101
3102
3103 This example shows how to launch an application called "master" with
3104 one task, 8 CPUs and 16 GB of memory (2 GB per CPU) plus another
3105 application called "slave" with 16 tasks, 1 CPU per task (the default)
3106 and 1 GB of memory per task.
3107
3108
3109 > srun -n1 -c8 --mem-per-cpu=2gb master : -n16 --mem-per-cpu=1gb slave
3110
3111
3112COPYING
3113 Copyright (C) 2006-2007 The Regents of the University of California.
3114 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3115 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3116 Copyright (C) 2010-2015 SchedMD LLC.
3117
3118 This file is part of Slurm, a resource management program. For
3119 details, see <https://slurm.schedmd.com/>.
3120
3121 Slurm is free software; you can redistribute it and/or modify it under
3122 the terms of the GNU General Public License as published by the Free
3123 Software Foundation; either version 2 of the License, or (at your
3124 option) any later version.
3125
3126 Slurm is distributed in the hope that it will be useful, but WITHOUT
3127 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3128 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3129 for more details.
3130
3131
3132SEE ALSO
3133 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
3134 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3135
3136
3137
3138December 2019 Slurm Commands srun(1)