srun(1)                          Slurm Commands                         srun(1)
2
3
4
NAME
       srun - Run parallel jobs
7
8
SYNOPSIS
       srun [OPTIONS(0)...] [ : [OPTIONS(n)...]] executable(0) [args(0)...]
11
12 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
13 For more details about heterogeneous jobs see the document
14 https://slurm.schedmd.com/heterogeneous_jobs.html
15
16
DESCRIPTION
       Run a parallel job on a cluster managed by Slurm.  If necessary, srun
19 will first create a resource allocation in which to run the parallel
20 job.
21
22 The following document describes the influence of various options on
23 the allocation of cpus to jobs and tasks.
24 https://slurm.schedmd.com/cpu_management.html
25
26
RETURN VALUE
       srun will return the highest exit code of all tasks run or the highest
29 signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
30 signal) of any task that exited with a signal.
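
       For example, if the largest exit code among the tasks is 2, srun exits
       with 2; if a task is instead killed by SIGKILL (signal 9), srun would
       be expected to exit with 128 + 9 = 137.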
31
32
EXECUTABLE PATH RESOLUTION
       The executable is resolved in the following order:
35
36 1. If executable starts with ".", then path is constructed as: current
37 working directory / executable
38
39 2. If executable starts with a "/", then path is considered absolute.
40
41 3. If executable can be resolved through PATH. See path_resolution(7).
42
43 4. If executable is in current working directory.
44
45
OPTIONS
       --accel-bind=<options>
48 Control how tasks are bound to generic resources of type gpu,
49 mic and nic. Multiple options may be specified. Supported
50 options include:
51
52 g Bind each task to GPUs which are closest to the allocated
53 CPUs.
54
55 m Bind each task to MICs which are closest to the allocated
56 CPUs.
57
58 n Bind each task to NICs which are closest to the allocated
59 CPUs.
60
61 v Verbose mode. Log how tasks are bound to GPU and NIC
62 devices.
63
64 This option applies to job allocations.
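
              For example, the following sketch (the executable name ./app is
              a placeholder) combines the g and v options to bind each task
              to its closest GPUs and log the resulting bindings:

                     srun -n8 --accel-bind=gv ./app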
65
66
67 -A, --account=<account>
68 Charge resources used by this job to specified account. The
69 account is an arbitrary string. The account name may be changed
70 after job submission using the scontrol command. This option
71 applies to job allocations.
72
73
74 --acctg-freq
75 Define the job accounting and profiling sampling intervals.
76 This can be used to override the JobAcctGatherFrequency parame‐
77 ter in Slurm's configuration file, slurm.conf. The supported
              format is as follows:
79
80 --acctg-freq=<datatype>=<interval>
81 where <datatype>=<interval> specifies the task sam‐
82 pling interval for the jobacct_gather plugin or a
83 sampling interval for a profiling type by the
84 acct_gather_profile plugin. Multiple, comma-sepa‐
85 rated <datatype>=<interval> intervals may be speci‐
86 fied. Supported datatypes are as follows:
87
88 task=<interval>
89 where <interval> is the task sampling inter‐
90 val in seconds for the jobacct_gather plugins
91 and for task profiling by the
92 acct_gather_profile plugin. NOTE: This fre‐
93 quency is used to monitor memory usage. If
94 memory limits are enforced the highest fre‐
95 quency a user can request is what is config‐
96 ured in the slurm.conf file. They can not
97 turn it off (=0) either.
98
99 energy=<interval>
100 where <interval> is the sampling interval in
101 seconds for energy profiling using the
102 acct_gather_energy plugin
103
104 network=<interval>
105 where <interval> is the sampling interval in
106 seconds for infiniband profiling using the
107 acct_gather_infiniband plugin.
108
109 filesystem=<interval>
110 where <interval> is the sampling interval in
111 seconds for filesystem profiling using the
112 acct_gather_filesystem plugin.
113
              The default value for the task sampling interval is 30.  The
              default value for all other intervals is 0.  An
117 interval of 0 disables sampling of the specified type. If the
118 task sampling interval is 0, accounting information is collected
119 only at job termination (reducing Slurm interference with the
120 job).
121 Smaller (non-zero) values have a greater impact upon job perfor‐
122 mance, but a value of 30 seconds is not likely to be noticeable
123 for applications having less than 10,000 tasks. This option
              applies to job allocations.
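
              For example, the following sketch samples task (memory) data
              every 10 seconds and energy data every 60 seconds, assuming the
              corresponding gather plugins are configured:

                     srun --acctg-freq=task=10,energy=60 -n4 ./a.out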
125
126
127 -B --extra-node-info=<sockets[:cores[:threads]]>
128 Restrict node selection to nodes with at least the specified
129 number of sockets, cores per socket and/or threads per core.
130 NOTE: These options do not specify the resource allocation size.
131 Each value specified is considered a minimum. An asterisk (*)
132 can be used as a placeholder indicating that all available
133 resources of that type are to be utilized. Values can also be
134 specified as min-max. The individual levels can also be speci‐
135 fied in separate options if desired:
136 --sockets-per-node=<sockets>
137 --cores-per-socket=<cores>
138 --threads-per-core=<threads>
139 If task/affinity plugin is enabled, then specifying an alloca‐
140 tion in this manner also sets a default --cpu-bind option of
141 threads if the -B option specifies a thread count, otherwise an
142 option of cores if a core count is specified, otherwise an
143 option of sockets. If SelectType is configured to
144 select/cons_res, it must have a parameter of CR_Core,
145 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
146 to be honored. If not specified, the scontrol show job will
147 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
148 tions.
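
              For example, the following sketch restricts selection to nodes
              with at least two sockets and at least four cores per socket
              (the values are illustrative only):

                     srun -N1 -B 2:4 ./a.out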
149
150
151 --bb=<spec>
152 Burst buffer specification. The form of the specification is
153 system dependent. Also see --bbf. This option applies to job
154 allocations.
155
156
157 --bbf=<file_name>
158 Path of file containing burst buffer specification. The form of
159 the specification is system dependent. Also see --bb. This
160 option applies to job allocations.
161
162
163 --bcast[=<dest_path>]
164 Copy executable file to allocated compute nodes. If a file name
165 is specified, copy the executable to the specified destination
166 file path. If no path is specified, copy the file to a file
              named "slurm_bcast_<job_id>.<step_id>" in the current working
              directory.
168 For example, "srun --bcast=/tmp/mine -N3 a.out" will copy the
169 file "a.out" from your current directory to the file "/tmp/mine"
170 on each of the three allocated compute nodes and execute that
171 file. This option applies to step allocations.
172
173
174 --begin=<time>
175 Defer initiation of this job until the specified time. It
176 accepts times of the form HH:MM:SS to run a job at a specific
177 time of day (seconds are optional). (If that time is already
178 past, the next day is assumed.) You may also specify midnight,
179 noon, fika (3 PM) or teatime (4 PM) and you can have a
180 time-of-day suffixed with AM or PM for running in the morning or
181 the evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
183 Combine date and time using the following format
184 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
185 count time-units, where the time-units can be seconds (default),
186 minutes, hours, days, or weeks and you can tell Slurm to run the
187 job today with the keyword today and to run the job tomorrow
188 with the keyword tomorrow. The value may be changed after job
189 submission using the scontrol command. For example:
190 --begin=16:00
191 --begin=now+1hour
192 --begin=now+60 (seconds by default)
193 --begin=2010-01-20T12:34:00
194
195
196 Notes on date/time specifications:
197 - Although the 'seconds' field of the HH:MM:SS time specifica‐
198 tion is allowed by the code, note that the poll time of the
199 Slurm scheduler is not precise enough to guarantee dispatch of
200 the job on the exact second. The job will be eligible to start
201 on the next poll following the specified time. The exact poll
202 interval depends on the Slurm scheduler (e.g., 60 seconds with
203 the default sched/builtin).
204 - If no time (HH:MM:SS) is specified, the default is
205 (00:00:00).
206 - If a date is specified without a year (e.g., MM/DD) then the
207 current year is assumed, unless the combination of MM/DD and
208 HH:MM:SS has already passed for that year, in which case the
209 next year is used.
210 This option applies to job allocations.
211
212
213 --checkpoint=<time>
214 Specifies the interval between creating checkpoints of the job
215 step. By default, the job step will have no checkpoints cre‐
216 ated. Acceptable time formats include "minutes", "minutes:sec‐
217 onds", "hours:minutes:seconds", "days-hours", "days-hours:min‐
218 utes" and "days-hours:minutes:seconds". This option applies to
219 job and step allocations.
220
221
222 --checkpoint-dir=<directory>
223 Specifies the directory into which the job or job step's check‐
224 point should be written (used by the checkpoint/blcr and check‐
225 point/xlch plugins only). The default value is the current
226 working directory. Checkpoint files will be of the form
227 "<job_id>.ckpt" for jobs and "<job_id>.<step_id>.ckpt" for job
228 steps. This option applies to job and step allocations.
229
230
231 --cluster-constraint=<list>
232 Specifies features that a federated cluster must have to have a
233 sibling job submitted to it. Slurm will attempt to submit a sib‐
234 ling job to a cluster if it has at least one of the specified
235 features.
236
237
238 --comment=<string>
239 An arbitrary comment. This option applies to job allocations.
240
241
242 --compress[=type]
243 Compress file before sending it to compute hosts. The optional
244 argument specifies the data compression library to be used.
245 Supported values are "lz4" (default) and "zlib". Some compres‐
246 sion libraries may be unavailable on some systems. For use with
247 the --bcast option. This option applies to step allocations.
248
249
250 -C, --constraint=<list>
251 Nodes can have features assigned to them by the Slurm adminis‐
252 trator. Users can specify which of these features are required
253 by their job using the constraint option. Only nodes having
254 features matching the job constraints will be used to satisfy
255 the request. Multiple constraints may be specified with AND,
256 OR, matching OR, resource counts, etc. (some operators are not
257 supported on all system types). Supported constraint options
258 include:
259
260 Single Name
261 Only nodes which have the specified feature will be used.
262 For example, --constraint="intel"
263
264 Node Count
265 A request can specify the number of nodes needed with
266 some feature by appending an asterisk and count after the
267 feature name. For example "--nodes=16 --con‐
268 straint=graphics*4 ..." indicates that the job requires
269 16 nodes and that at least four of those nodes must have
270 the feature "graphics."
271
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
                     example, --constraint="intel&gpu"
275
              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR
                     operator.  For example, --constraint="intel|amd"
279
280 Matching OR
281 If only one of a set of possible options should be used
282 for all allocated nodes, then use the OR operator and
283 enclose the options within square brackets. For example:
284 "--constraint=[rack1|rack2|rack3|rack4]" might be used to
285 specify that all nodes must be allocated on a single rack
286 of the cluster, but any of those four racks can be used.
287
288 Multiple Counts
289 Specific counts of multiple resources may be specified by
290 using the AND operator and enclosing the options within
291 square brackets. For example: "--con‐
292 straint=[rack1*2&rack2*4]" might be used to specify that
293 two nodes must be allocated from nodes with the feature
294 of "rack1" and four nodes must be allocated from nodes
295 with the feature "rack2".
296
297 NOTE: This construct does not support multiple Intel KNL
298 NUMA or MCDRAM modes. For example, while "--con‐
299 straint=[(knl&quad)*2&(knl&hemi)*4]" is not supported,
300 "--constraint=[haswell*2&(knl&hemi)*4]" is supported.
301 Specification of multiple KNL modes requires the use of a
302 heterogeneous job.
303
304
              Parentheses
                     Parentheses can be used to group like node features
307 together. For example "--con‐
308 straint=[(knl&snc4&flat)*4&haswell*1]" might be used to
309 specify that four nodes with the features "knl", "snc4"
310 and "flat" plus one node with the feature "haswell" are
                     required.  All options within parentheses should be
                     grouped with AND (e.g. "&") operators.
313
314 WARNING: When srun is executed from within salloc or sbatch, the con‐
315 straint value can only contain a single feature name. None of the other
316 operators are currently supported for job steps.
317 This option applies to job and step allocations.
318
319
320 --contiguous
321 If set, then the allocated nodes must form a contiguous set.
322 Not honored with the topology/tree or topology/3d_torus plugins,
323 both of which can modify the node ordering. This option applies
324 to job allocations.
325
326
327 --cores-per-socket=<cores>
328 Restrict node selection to nodes with at least the specified
329 number of cores per socket. See additional information under -B
330 option above when task/affinity plugin is enabled. This option
331 applies to job allocations.
332
333
334 --cpu-bind=[{quiet,verbose},]type
335 Bind tasks to CPUs. Used only when the task/affinity or
336 task/cgroup plugin is enabled. NOTE: To have Slurm always
337 report on the selected CPU binding for all commands executed in
338 a shell, you can enable verbose mode by setting the
339 SLURM_CPU_BIND environment variable value to "verbose".
340
341 The following informational environment variables are set when
342 --cpu-bind is in use:
343 SLURM_CPU_BIND_VERBOSE
344 SLURM_CPU_BIND_TYPE
345 SLURM_CPU_BIND_LIST
346
347 See the ENVIRONMENT VARIABLES section for a more detailed
348 description of the individual SLURM_CPU_BIND variables. These
              variables are available only if the task/affinity plugin is con‐
350 figured.
351
352 When using --cpus-per-task to run multithreaded tasks, be aware
353 that CPU binding is inherited from the parent of the process.
354 This means that the multithreaded task should either specify or
355 clear the CPU binding itself to avoid having all threads of the
356 multithreaded task use the same mask/CPU as the parent. Alter‐
357 natively, fat masks (masks which specify more than one allowed
358 CPU) could be used for the tasks in order to provide multiple
359 CPUs for the multithreaded tasks.
360
361 By default, a job step has access to every CPU allocated to the
362 job. To ensure that distinct CPUs are allocated to each job
363 step, use the --exclusive option.
364
365 Note that a job step can be allocated different numbers of CPUs
366 on each node or be allocated CPUs not starting at location zero.
367 Therefore one of the options which automatically generate the
368 task binding is recommended. Explicitly specified masks or
369 bindings are only honored when the job step has been allocated
370 every available CPU on the node.
371
372 Binding a task to a NUMA locality domain means to bind the task
373 to the set of CPUs that belong to the NUMA locality domain or
374 "NUMA node". If NUMA locality domain options are used on sys‐
375 tems with no NUMA support, then each socket is considered a
376 locality domain.
377
378 If the --cpu-bind option is not used, the default binding mode
379 will depend upon Slurm's configuration and the step's resource
380 allocation. If all allocated nodes have the same configured
381 CpuBind mode, that will be used. Otherwise if the job's Parti‐
382 tion has a configured CpuBind mode, that will be used. Other‐
383 wise if Slurm has a configured TaskPluginParam value, that mode
384 will be used. Otherwise automatic binding will be performed as
385 described below.
386
387
388 Auto Binding
389 Applies only when task/affinity is enabled. If the job
390 step allocation includes an allocation with a number of
391 sockets, cores, or threads equal to the number of tasks
392 times cpus-per-task, then the tasks will by default be
393 bound to the appropriate resources (auto binding). Dis‐
394 able this mode of operation by explicitly setting
395 "--cpu-bind=none". Use TaskPluginParam=auto‐
396 bind=[threads|cores|sockets] to set a default cpu binding
397 in case "auto binding" doesn't find a match.
398
399 Supported options include:
400
401 q[uiet]
402 Quietly bind before task runs (default)
403
404 v[erbose]
405 Verbosely report binding before task runs
406
407 no[ne] Do not bind tasks to CPUs (default unless auto
408 binding is applied)
409
410 rank Automatically bind by task rank. The lowest num‐
411 bered task on each node is bound to socket (or
412 core or thread) zero, etc. Not supported unless
413 the entire node is allocated to the job.
414
415 map_cpu:<list>
416 Bind by setting CPU masks on tasks (or ranks) as
417 specified where <list> is
418 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
419 IDs are interpreted as decimal values unless they
                            are preceded with '0x', in which case they are inter‐
421 preted as hexadecimal values. If the number of
422 tasks (or ranks) exceeds the number of elements in
423 this list, elements in the list will be reused as
424 needed starting from the beginning of the list.
425 To simplify support for large task counts, the
426 lists may follow a map with an asterisk and repe‐
                            tition count.  For example "map_cpu:0x0f*4,0xf0*4".
428 Not supported unless the entire node is allocated
429 to the job.
430
431 mask_cpu:<list>
432 Bind by setting CPU masks on tasks (or ranks) as
433 specified where <list> is
434 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
435 The mapping is specified for a node and identical
436 mapping is applied to the tasks on every node
437 (i.e. the lowest task ID on each node is mapped to
438 the first mask specified in the list, etc.). CPU
439 masks are always interpreted as hexadecimal values
440 but can be preceded with an optional '0x'. Not
441 supported unless the entire node is allocated to
442 the job. To simplify support for large task
                            counts, the lists may follow a mask with an aster‐
                            isk and repetition count.  For example
                            "mask_cpu:0x0f*4,0xf0*4".
447
448 rank_ldom
449 Bind to a NUMA locality domain by rank. Not sup‐
450 ported unless the entire node is allocated to the
451 job.
452
453 map_ldom:<list>
454 Bind by mapping NUMA locality domain IDs to tasks
455 as specified where <list> is
456 <ldom1>,<ldom2>,...<ldomN>. The locality domain
457 IDs are interpreted as decimal values unless they
458 are preceded with '0x' in which case they are
459 interpreted as hexadecimal values. Not supported
460 unless the entire node is allocated to the job.
461
462 mask_ldom:<list>
463 Bind by setting NUMA locality domain masks on
464 tasks as specified where <list> is
465 <mask1>,<mask2>,...<maskN>. NUMA locality domain
466 masks are always interpreted as hexadecimal values
467 but can be preceded with an optional '0x'. Not
468 supported unless the entire node is allocated to
469 the job.
470
471 sockets
472 Automatically generate masks binding tasks to
473 sockets. Only the CPUs on the socket which have
474 been allocated to the job will be used. If the
475 number of tasks differs from the number of allo‐
476 cated sockets this can result in sub-optimal bind‐
477 ing.
478
479 cores Automatically generate masks binding tasks to
480 cores. If the number of tasks differs from the
481 number of allocated cores this can result in
482 sub-optimal binding.
483
484 threads
485 Automatically generate masks binding tasks to
486 threads. If the number of tasks differs from the
487 number of allocated threads this can result in
488 sub-optimal binding.
489
490 ldoms Automatically generate masks binding tasks to NUMA
491 locality domains. If the number of tasks differs
492 from the number of allocated locality domains this
493 can result in sub-optimal binding.
494
495 boards Automatically generate masks binding tasks to
496 boards. If the number of tasks differs from the
497 number of allocated boards this can result in
498 sub-optimal binding. This option is supported by
499 the task/cgroup plugin only.
500
501 help Show help message for cpu-bind
502
503 This option applies to job and step allocations.
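
              For example, assuming whole nodes are allocated, the first
              sketch below reports the automatically generated core bindings,
              while the second pins four tasks to explicit CPU IDs:

                     srun -n8 --cpu-bind=verbose,cores ./a.out
                     srun -n4 --cpu-bind=map_cpu:0,4,8,12 ./a.out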
504
505
       --cpu-freq=<p1[-p2[:p3]]>
507
508 Request that the job step initiated by this srun command be run
509 at some requested frequency if possible, on the CPUs selected
510 for the step on the compute node(s).
511
512 p1 can be [#### | low | medium | high | highm1] which will set
513 the frequency scaling_speed to the corresponding value, and set
514 the frequency scaling_governor to UserSpace. See below for defi‐
515 nition of the values.
516
517 p1 can be [Conservative | OnDemand | Performance | PowerSave]
518 which will set the scaling_governor to the corresponding value.
519 The governor has to be in the list set by the slurm.conf option
520 CpuFreqGovernors.
521
522 When p2 is present, p1 will be the minimum scaling frequency and
523 p2 will be the maximum scaling frequency.
524
              p2 can be [#### | medium | high | highm1].  p2 must be greater
526 than p1.
527
528 p3 can be [Conservative | OnDemand | Performance | PowerSave |
529 UserSpace] which will set the governor to the corresponding
530 value.
531
532 If p3 is UserSpace, the frequency scaling_speed will be set by a
533 power or energy aware scheduling strategy to a value between p1
534 and p2 that lets the job run within the site's power goal. The
535 job may be delayed if p1 is higher than a frequency that allows
536 the job to run within the goal.
537
538 If the current frequency is < min, it will be set to min. Like‐
539 wise, if the current frequency is > max, it will be set to max.
540
541 Acceptable values at present include:
542
543 #### frequency in kilohertz
544
545 Low the lowest available frequency
546
547 High the highest available frequency
548
549 HighM1 (high minus one) will select the next highest
550 available frequency
551
552 Medium attempts to set a frequency in the middle of the
553 available range
554
555 Conservative attempts to use the Conservative CPU governor
556
557 OnDemand attempts to use the OnDemand CPU governor (the
558 default value)
559
560 Performance attempts to use the Performance CPU governor
561
562 PowerSave attempts to use the PowerSave CPU governor
563
564 UserSpace attempts to use the UserSpace CPU governor
565
566
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
570 SLURM_CPU_FREQ_REQ
571
572 This environment variable can also be used to supply the value
573 for the CPU frequency request if it is set when the 'srun' com‐
574 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value.  The form of the environ‐
576 ment variable is the same as the command line. See the ENVIRON‐
577 MENT VARIABLES section for a description of the
578 SLURM_CPU_FREQ_REQ variable.
579
580 NOTE: This parameter is treated as a request, not a requirement.
581 If the job step's node does not support setting the CPU fre‐
582 quency, or the requested value is outside the bounds of the
583 legal frequencies, an error is logged, but the job step is
584 allowed to continue.
585
586 NOTE: Setting the frequency for just the CPUs of the job step
587 implies that the tasks are confined to those CPUs. If task con‐
588 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
589 gin=task/cgroup with the "ConstrainCores" option) is not config‐
590 ured, this parameter is ignored.
591
592 NOTE: When the step completes, the frequency and governor of
593 each selected CPU is reset to the previous values.
594
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all of the accounting information will be
              present.
599
600 This option applies to job and step allocations.
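
              For example, the first sketch below requests a fixed 2.4 GHz
              (2400000 kHz) frequency, implying the UserSpace governor, while
              the second requests scaling between the lowest and highest
              available frequencies under the OnDemand governor (subject to
              the NOTEs above):

                     srun --cpu-freq=2400000 -n16 ./a.out
                     srun --cpu-freq=low-high:OnDemand -n16 ./a.out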
601
602
603 -c, --cpus-per-task=<ncpus>
604 Request that ncpus be allocated per process. This may be useful
605 if the job is multithreaded and requires more than one CPU per
606 task for optimal performance. The default is one CPU per
607 process. If -c is specified without -n, as many tasks will be
608 allocated per node as possible while satisfying the -c restric‐
609 tion. For instance on a cluster with 8 CPUs per node, a job
610 request for 4 nodes and 3 CPUs per task may be allocated 3 or 6
611 CPUs per node (1 or 2 tasks per node) depending upon resource
612 consumption by other jobs. Such a job may be unable to execute
613 more than a total of 4 tasks. This option may also be useful to
614 spawn tasks without allocating resources to the job step from
615 the job's allocation when running multiple job steps with the
616 --exclusive option.
617
618 WARNING: There are configurations and options interpreted dif‐
619 ferently by job and job step requests which can result in incon‐
620 sistencies for this option. For example srun -c2
621 --threads-per-core=1 prog may allocate two cores for the job,
622 but if each of those cores contains two threads, the job alloca‐
623 tion will include four CPUs. The job step allocation will then
624 launch two threads per CPU for a total of two tasks.
625
626 WARNING: When srun is executed from within salloc or sbatch,
627 there are configurations and options which can result in incon‐
628 sistent allocations when -c has a value greater than -c on sal‐
629 loc or sbatch.
630
631 This option applies to job allocations.
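
              For example, assuming nodes with 8 CPUs each and no competing
              jobs, the following sketch would be expected to run two tasks
              per node across four nodes:

                     srun -n8 -c4 ./a.out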
632
633
634 --deadline=<OPT>
635 remove the job if no ending is possible before this deadline
636 (start > (deadline - time[-min])). Default is no deadline.
637 Valid time formats are:
638 HH:MM[:SS] [AM|PM]
639 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
640 MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
642
643 This option applies only to job allocations.
644
645
646 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
648 specification if the job has been eligible to run for less than
649 this time period. If the job has waited for less than the spec‐
650 ified period, it will use only nodes which already have the
651 specified features. The argument is in units of minutes. A
652 default value may be set by a system administrator using the
653 delay_boot option of the SchedulerParameters configuration
654 parameter in the slurm.conf file, otherwise the default value is
655 zero (no delay).
656
657 This option applies only to job allocations.
658
659
660 -d, --dependency=<dependency_list>
661 Defer the start of this job until the specified dependencies
              have been satisfied.  This option does not apply to job
663 steps (executions of srun within an existing salloc or sbatch
664 allocation) only to job allocations. <dependency_list> is of
665 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
666 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
667 must be satisfied if the "," separator is used. Any dependency
668 may be satisfied if the "?" separator is used. Many jobs can
669 share the same dependency and these jobs may even belong to dif‐
670 ferent users. The value may be changed after job submission
671 using the scontrol command. Once a job dependency fails due to
672 the termination state of a preceding job, the dependent job will
673 never be run, even if the preceding job is requeued and has a
674 different termination state in a subsequent execution. This
675 option applies to job allocations.
676
677 after:job_id[:jobid...]
678 This job can begin execution after the specified jobs
679 have begun execution.
680
681 afterany:job_id[:jobid...]
682 This job can begin execution after the specified jobs
683 have terminated.
684
685 afterburstbuffer:job_id[:jobid...]
686 This job can begin execution after the specified jobs
687 have terminated and any associated burst buffer stage out
688 operations have completed.
689
690 aftercorr:job_id[:jobid...]
691 A task of this job array can begin execution after the
692 corresponding task ID in the specified job has completed
693 successfully (ran to completion with an exit code of
694 zero).
695
696 afternotok:job_id[:jobid...]
697 This job can begin execution after the specified jobs
698 have terminated in some failed state (non-zero exit code,
699 node failure, timed out, etc).
700
701 afterok:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have successfully executed (ran to completion with an
704 exit code of zero).
705
706 expand:job_id
707 Resources allocated to this job should be used to expand
708 the specified job. The job to expand must share the same
709 QOS (Quality of Service) and partition. Gang scheduling
710 of resources in the partition is also not supported.
711
712 singleton
713 This job can begin execution after any previously
714 launched jobs sharing the same job name and user have
715 terminated. In other words, only one job by that name
716 and owned by that user can be running or suspended at any
717 point in time.
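
              For example, assuming job 1234 already exists, the following
              sketch (./postprocess is a placeholder) starts only after job
              1234 completes successfully:

                     srun --dependency=afterok:1234 -n1 ./postprocess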
718
719
720 -D, --chdir=<path>
721 Have the remote processes do a chdir to path before beginning
722 execution. The default is to chdir to the current working direc‐
723 tory of the srun process. The path can be specified as full path
724 or relative path to the directory where the command is executed.
725 This option applies to job allocations.
726
727
728 -e, --error=<filename pattern>
729 Specify how stderr is to be redirected. By default in interac‐
730 tive mode, srun redirects stderr to the same file as stdout, if
731 one is specified. The --error option is provided to allow stdout
732 and stderr to be redirected to different locations. See IO Re‐
733 direction below for more options. If the specified file already
734 exists, it will be overwritten. This option applies to job and
735 step allocations.
736
737
738 -E, --preserve-env
739 Pass the current values of environment variables SLURM_JOB_NODES
740 and SLURM_NTASKS through to the executable, rather than comput‐
741 ing them from commandline parameters. This option applies to job
742 allocations.
743
744
745 --epilog=<executable>
746 srun will run executable just after the job step completes. The
747 command line arguments for executable will be the command and
748 arguments of the job step. If executable is "none", then no
749 srun epilog will be run. This parameter overrides the SrunEpilog
750 parameter in slurm.conf. This parameter is completely indepen‐
751 dent from the Epilog parameter in slurm.conf. This option
752 applies to job allocations.
753
754
755
756 --exclusive[=user|mcs]
757 This option applies to job and job step allocations, and has two
758 slightly different meanings for each one. When used to initiate
759 a job, the job allocation cannot share nodes with other running
760 jobs (or just other users with the "=user" option or "=mcs"
761 option). The default shared/exclusive behavior depends on sys‐
762 tem configuration and the partition's OverSubscribe option takes
763 precedence over the job's option.
764
765 This option can also be used when initiating more than one job
766 step within an existing resource allocation, where you want sep‐
767 arate processors to be dedicated to each job step. If sufficient
768 processors are not available to initiate the job step, it will
769 be deferred. This can be thought of as providing a mechanism for
              resource management to the job within its allocation.
771
772 The exclusive allocation of CPUs only applies to job steps
773 explicitly invoked with the --exclusive option. For example, a
774 job might be allocated one node with four CPUs and a remote
775 shell invoked on the allocated node. If that shell is not
776 invoked with the --exclusive option, then it may create a job
777 step with four tasks using the --exclusive option and not con‐
              flict with the remote shell's resource allocation.  Invoke
              every job step with the --exclusive option to ensure distinct
              resources for each step.
781
782 Note that all CPUs allocated to a job are available to each job
783 step unless the --exclusive option is used plus task affinity is
784 configured. Since resource management is provided by processor,
785 the --ntasks option must be specified, but the following options
              should NOT be specified: --relative, --distribution=arbitrary.
787 See EXAMPLE below.
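
              As a sketch, within an existing four-CPU allocation (e.g. under
              salloc or sbatch) the following two steps would run on distinct
              CPUs, with the second deferred until CPUs become free if
              necessary:

                     srun -n2 --exclusive ./step_a &
                     srun -n2 --exclusive ./step_b &
                     wait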
788
789
790 --export=<environment variables [ALL] | NONE>
791 Identify which environment variables are propagated to the
792 launched application. By default, all are propagated. Multiple
793 environment variable names should be comma separated. Environ‐
794 ment variable names may be specified to propagate the current
795 value (e.g. "--export=EDITOR") or specific values may be
796 exported (e.g. "--export=EDITOR=/bin/emacs"). In these two exam‐
797 ples, the propagated environment will only contain the variable
798 EDITOR. If one desires to add to the environment instead of
799 replacing it, have the argument include ALL (e.g.
800 "--export=ALL,EDITOR=/bin/emacs"). This will propagate EDITOR
801 along with the current environment. Unlike sbatch, if ALL is
802 specified, any additional specified environment variables are
803 ignored. If one desires no environment variables be propagated,
804 use the argument NONE. Regardless of this setting, the appro‐
805 priate SLURM_* task environment variables are always exported to
806 the environment. srun may deviate from the above behavior if
807 the default launch plugin, launch/slurm, is not used.
808
809
810 --gid=<group>
811 If srun is run as root, and the --gid option is used, submit the
812 job with group's group access permissions. group may be the
813 group name or the numerical group ID. This option applies to job
814 allocations.
815
816
817 --gres=<list>
818 Specifies a comma delimited list of generic consumable
819 resources. The format of each entry on the list is
820 "name[[:type]:count]". The name is that of the consumable
821 resource. The count is the number of those resources with a
822 default value of 1. The specified resources will be allocated
823 to the job on each node. The available generic consumable
              resources are configurable by the system administrator.  A list
825 of available generic consumable resources will be printed and
826 the command will exit if the option argument is "help". Exam‐
827 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
828 and "--gres=help". NOTE: This option applies to job and step
829 allocations. By default, a job step is allocated all of the
              generic resources that have been allocated to the job.  To change the
831 behavior so that each job step is allocated no generic
832 resources, explicitly set the value of --gres to specify zero
833 counts for each generic resource OR set "--gres=none" OR set the
834 SLURM_STEP_GRES environment variable to "none".
835
836
837 --gres-flags=<type>
838 Specify generic resource task binding options. This option
839 applies to job allocations.
840
841 disable-binding
842 Disable filtering of CPUs with respect to generic
843 resource locality. This option is currently required to
844 use more CPUs than are bound to a GRES (i.e. if a GPU is
845 bound to the CPUs on one socket, but resources on more
846 than one socket are required to run the job). This
847 option may permit a job to be allocated resources sooner
848 than otherwise possible, but may result in lower job per‐
849 formance.
850
851 enforce-binding
852 The only CPUs available to the job will be those bound to
853 the selected GRES (i.e. the CPUs identified in the
854 gres.conf file will be strictly enforced). This option
855 may result in delayed initiation of a job. For example a
856 job requiring two GPUs and one CPU will be delayed until
857 both GPUs on a single socket are available rather than
858 using GPUs bound to separate sockets, however the appli‐
859 cation performance may be improved due to improved commu‐
860 nication speed. Requires the node to be configured with
861 more than one socket and resource filtering will be per‐
862 formed on a per-socket basis.
863
864
865 -H, --hold
866 Specify the job is to be submitted in a held state (priority of
867 zero). A held job can now be released using scontrol to reset
868 its priority (e.g. "scontrol release <job_id>"). This option
869 applies to job allocations.
870
871
872 -h, --help
873 Display help information and exit.
874
875
876 --hint=<type>
877 Bind tasks according to application hints.
878
879 compute_bound
880 Select settings for compute bound applications: use all
881 cores in each socket, one thread per core.
882
883 memory_bound
884 Select settings for memory bound applications: use only
885 one core in each socket, one thread per core.
886
887 [no]multithread
888 [don't] use extra threads with in-core multi-threading
889 which can benefit communication intensive applications.
890 Only supported with the task/affinity plugin.
891
892 help show this help message
893
894 This option applies to job allocations.
895
896
897 -I, --immediate[=<seconds>]
898 exit if resources are not available within the time period spec‐
899 ified. If no argument is given, resources must be available
900 immediately for the request to succeed. By default, --immediate
901 is off, and the command will block until resources become avail‐
902 able. Since this option's argument is optional, for proper pars‐
903 ing the single letter option must be followed immediately with
904 the value and not include a space between them. For example
905 "-I60" and not "-I 60". This option applies to job and step
906 allocations.
907
908
909 -i, --input=<mode>
              Specify how stdin is to be redirected.  By default, srun
              redirects stdin from the terminal to all tasks.  See IO
              Redirection below for
912 more options. For OS X, the poll() function does not support
913 stdin, so input from a terminal is not possible. This option
914 applies to job and step allocations.
915
916
917 -J, --job-name=<jobname>
918 Specify a name for the job. The specified name will appear along
919 with the job id number when querying running jobs on the system.
920 The default is the supplied executable program's name. NOTE:
921 This information may be written to the slurm_jobacct.log file.
922 This file is space delimited so if a space is used in the job‐
              name it will cause problems in properly displaying the con‐
924 tents of the slurm_jobacct.log file when the sacct command is
925 used. This option applies to job and step allocations.
926
927
928 --jobid=<jobid>
              Initiate a job step under an already allocated job with the
              given job id.  Using this option will cause srun to behave
              exactly as if
931 the SLURM_JOB_ID environment variable was set. This option
932 applies to job and step allocations. NOTE: For job allocations,
933 this is only valid for users root and SlurmUser.
934
935
936 -K, --kill-on-bad-exit[=0|1]
937 Controls whether or not to terminate a step if any task exits
938 with a non-zero exit code. If this option is not specified, the
939 default action will be based upon the Slurm configuration param‐
940 eter of KillOnBadExit. If this option is specified, it will take
941 precedence over KillOnBadExit. An option argument of zero will
942 not terminate the job. A non-zero argument or no argument will
943 terminate the job. Note: This option takes precedence over the
944 -W, --wait option to terminate the job immediately if a task
945 exits with a non-zero exit code. Since this option's argument
946 is optional, for proper parsing the single letter option must be
947 followed immediately with the value and not include a space
948 between them. For example "-K1" and not "-K 1".
949
950
951 -k, --no-kill
952 Do not automatically terminate a job if one of the nodes it has
953 been allocated fails. This option applies to job and step allo‐
954 cations. The job will assume all responsibilities for
              fault-tolerance.  Tasks launched using this option will not be
956 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
957 --wait options will have no effect upon the job step). The
958 active job step (MPI job) will likely suffer a fatal error, but
959 subsequent job steps may be run if this option is specified.
960 The default action is to terminate the job upon node failure.
961
962
963 --launch-cmd
964 Print external launch command instead of running job normally
965 through Slurm. This option is only valid if using something
966 other than the launch/slurm plugin. This option applies to step
967 allocations.
968
969
970 --launcher-opts=<options>
971 Options for the external launcher if using something other than
972 the launch/slurm plugin. This option applies to step alloca‐
973 tions.
974
975
976 -l, --label
977 Prepend task number to lines of stdout/err. The --label option
978 will prepend lines of output with the remote task id. This
979 option applies to step allocations.
980
981
982 -L, --licenses=<license>
983 Specification of licenses (or other resources available on all
984 nodes of the cluster) which must be allocated to this job.
985 License names can be followed by a colon and count (the default
986 count is one). Multiple license names should be comma separated
987 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
988 cations.
989
990
991 -M, --clusters=<string>
992 Clusters to issue commands to. Multiple cluster names may be
993 comma separated. The job will be submitted to the one cluster
994 providing the earliest expected job initiation time. The default
995 value is the current cluster. A value of 'all' will query to run
996 on all clusters. Note the --export option to control environ‐
997 ment variables exported between clusters. This option applies
998 only to job allocations. Note that the SlurmDBD must be up for
999 this option to work properly.
1000
1001
       -m, --distribution=
              *|block|cyclic|arbitrary|plane=<options>[:*|block|cyclic|fcyclic
              [:*|block|cyclic|fcyclic]][,Pack|NoPack]
1006
1007 Specify alternate distribution methods for remote processes.
1008 This option controls the distribution of tasks to the nodes on
1009 which resources have been allocated, and the distribution of
1010 those resources to tasks for binding (task affinity). The first
1011 distribution method (before the first ":") controls the distri‐
1012 bution of tasks to nodes. The second distribution method (after
1013 the first ":") controls the distribution of allocated CPUs
1014 across sockets for binding to tasks. The third distribution
1015 method (after the second ":") controls the distribution of allo‐
1016 cated CPUs across cores for binding to tasks. The second and
1017 third distributions apply only if task affinity is enabled. The
1018 third distribution is supported only if the task/cgroup plugin
1019 is configured. The default value for each distribution type is
1020 specified by *.
1021
1022 Note that with select/cons_res, the number of CPUs allocated on
1023 each socket and node may be different. Refer to
1024 https://slurm.schedmd.com/mc_support.html for more information
1025 on resource allocation, distribution of tasks to nodes, and
1026 binding of tasks to CPUs.
1027 First distribution method (distribution of tasks across nodes):
1028
1029
1030 * Use the default method for distributing tasks to nodes
1031 (block).
1032
1033 block The block distribution method will distribute tasks to a
1034 node such that consecutive tasks share a node. For exam‐
1035 ple, consider an allocation of three nodes each with two
1036 cpus. A four-task block distribution request will dis‐
1037 tribute those tasks to the nodes with tasks one and two
1038 on the first node, task three on the second node, and
1039 task four on the third node. Block distribution is the
1040 default behavior if the number of tasks exceeds the num‐
1041 ber of allocated nodes.
1042
1043 cyclic The cyclic distribution method will distribute tasks to a
1044 node such that consecutive tasks are distributed over
1045 consecutive nodes (in a round-robin fashion). For exam‐
1046 ple, consider an allocation of three nodes each with two
1047 cpus. A four-task cyclic distribution request will dis‐
1048 tribute those tasks to the nodes with tasks one and four
1049 on the first node, task two on the second node, and task
1050 three on the third node. Note that when SelectType is
1051 select/cons_res, the same number of CPUs may not be allo‐
1052 cated on each node. Task distribution will be round-robin
1053 among all the nodes with CPUs yet to be assigned to
1054 tasks. Cyclic distribution is the default behavior if
1055 the number of tasks is no larger than the number of allo‐
1056 cated nodes.
1057
1058 plane The tasks are distributed in blocks of a specified size.
1059 The options include a number representing the size of the
1060 task block. This is followed by an optional specifica‐
1061 tion of the task distribution scheme within a block of
1062 tasks and between the blocks of tasks. The number of
1063 tasks distributed to each node is the same as for cyclic
1064 distribution, but the taskids assigned to each node
1065 depend on the plane size. For more details (including
1066 examples and diagrams), please see
1067 https://slurm.schedmd.com/mc_support.html
1068 and
1069 https://slurm.schedmd.com/dist_plane.html
1070
1071 arbitrary
              arbitrary
                     The arbitrary method of distribution will allocate
                     processes in order as listed in the file designated by
                     the environment variable SLURM_HOSTFILE.  If this
                     variable is set it will override any other method
                     specified.  If not set the method will default to block.
                     The hostfile must contain at a minimum the number of
                     hosts requested, listed one per line or comma separated.
                     If
1079 specifying a task count (-n, --ntasks=<number>), your
1080 tasks will be laid out on the nodes in the order of the
1081 file.
1082 NOTE: The arbitrary distribution option on a job alloca‐
1083 tion only controls the nodes to be allocated to the job
1084 and not the allocation of CPUs on those nodes. This
1085 option is meant primarily to control a job step's task
1086 layout in an existing job allocation for the srun com‐
1087 mand.
                     NOTE: If the number of tasks is given and a list of
                     requested nodes is also given, the number of nodes used
                     from that list will be reduced to match the number of
                     tasks if the number of nodes in the list is greater than
                     the number of tasks.
1093
1094
1095 Second distribution method (distribution of CPUs across sockets
1096 for binding):
1097
1098
1099 * Use the default method for distributing CPUs across sock‐
1100 ets (cyclic).
1101
1102 block The block distribution method will distribute allocated
1103 CPUs consecutively from the same socket for binding to
1104 tasks, before using the next consecutive socket.
1105
1106 cyclic The cyclic distribution method will distribute allocated
1107 CPUs for binding to a given task consecutively from the
1108 same socket, and from the next consecutive socket for the
1109 next task, in a round-robin fashion across sockets.
1110
1111 fcyclic
1112 The fcyclic distribution method will distribute allocated
1113 CPUs for binding to tasks from consecutive sockets in a
1114 round-robin fashion across the sockets.
1115
1116
1117 Third distribution method (distribution of CPUs across cores for
1118 binding):
1119
1120
1121 * Use the default method for distributing CPUs across cores
1122 (inherited from second distribution method).
1123
1124 block The block distribution method will distribute allocated
1125 CPUs consecutively from the same core for binding to
1126 tasks, before using the next consecutive core.
1127
1128 cyclic The cyclic distribution method will distribute allocated
1129 CPUs for binding to a given task consecutively from the
1130 same core, and from the next consecutive core for the
1131 next task, in a round-robin fashion across cores.
1132
1133 fcyclic
1134 The fcyclic distribution method will distribute allocated
1135 CPUs for binding to tasks from consecutive cores in a
1136 round-robin fashion across the cores.
1137
1138
1139
1140 Optional control for task distribution over nodes:
1141
1142
              Pack   Rather than distributing a job step's tasks evenly
                     across its allocated nodes, pack them as tightly as pos‐
1145 sible on the nodes.
1146
1147 NoPack Rather than packing a job step's tasks as tightly as pos‐
1148 sible on the nodes, distribute them evenly. This user
1149 option will supersede the SelectTypeParameters
1150 CR_Pack_Nodes configuration parameter.
1151
1152 This option applies to job and step allocations.
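
              For example, assuming task affinity is enabled, the following
              sketch distributes tasks to nodes in a round-robin (cyclic)
              fashion and distributes each task's CPUs across sockets with
              the fcyclic method:

                     srun -n16 -m cyclic:fcyclic ./a.out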
1153
1154
1155 --mail-type=<type>
1156 Notify user by email when certain event types occur. Valid type
1157 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1158 BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buf‐
1159 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1160 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1161 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1162 time limit). Multiple type values may be specified in a comma
1163 separated list. The user to be notified is indicated with
1164 --mail-user. This option applies to job allocations.
1165
1166
1167 --mail-user=<user>
1168 User to receive email notification of state changes as defined
1169 by --mail-type. The default value is the submitting user. This
1170 option applies to job allocations.
1171
1172
1173 --mcs-label=<mcs>
1174 Used only when the mcs/group plugin is enabled. This parameter
1175 is a group among the groups of the user. Default value is cal‐
1176 culated by the Plugin mcs if it's enabled. This option applies
1177 to job allocations.
1178
1179
1180 --mem=<size[units]>
1181 Specify the real memory required per node. Default units are
1182 megabytes unless the SchedulerParameters configuration parameter
1183 includes the "default_gbytes" option for gigabytes. Different
1184 units can be specified using the suffix [K|M|G|T]. Default
1185 value is DefMemPerNode and the maximum value is MaxMemPerNode.
              If configured, both parameters can be seen using the scontrol
1187 show config command. This parameter would generally be used if
1188 whole nodes are allocated to jobs (SelectType=select/linear).
1189 Specifying a memory limit of zero for a job step will restrict
1190 the job step to the amount of memory allocated to the job, but
1191 not remove any of the job's memory allocation from being avail‐
1192 able to other job steps. Also see --mem-per-cpu. --mem and
1193 --mem-per-cpu are mutually exclusive.
1194
1195 NOTE: A memory size specification of zero is treated as a spe‐
1196 cial case and grants the job access to all of the memory on each
              node for newly submitted jobs and all available job memory to
              new job steps.
1199
              Specifying new memory limits for job steps is only advisory.
1201
1202 If the job is allocated multiple nodes in a heterogeneous clus‐
1203 ter, the memory limit on each node will be that of the node in
1204 the allocation with the smallest memory size (same limit will
1205 apply to every node in the job's allocation).
1206
1207 NOTE: Enforcement of memory limits currently relies upon the
1208 task/cgroup plugin or enabling of accounting, which samples mem‐
1209 ory use on a periodic basis (data need not be stored, just col‐
1210 lected). In both cases memory use is based upon the job's Resi‐
1211 dent Set Size (RSS). A task may exceed the memory limit until
1212 the next periodic accounting sample.
1213
1214 This option applies to job and step allocations.
1215
1216
1217 --mem-per-cpu=<size[units]>
1218 Minimum memory required per allocated CPU. Default units are
1219 megabytes unless the SchedulerParameters configuration parameter
1220 includes the "default_gbytes" option for gigabytes. Different
1221 units can be specified using the suffix [K|M|G|T]. Default
1222 value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
              exception below).  If configured, both parameters can be seen
1224 using the scontrol show config command. Note that if the job's
1225 --mem-per-cpu value exceeds the configured MaxMemPerCPU, then
1226 the user's limit will be treated as a memory limit per task;
1227 --mem-per-cpu will be reduced to a value no larger than MaxMem‐
1228 PerCPU; --cpus-per-task will be set and the value of
1229 --cpus-per-task multiplied by the new --mem-per-cpu value will
1230 equal the original --mem-per-cpu value specified by the user.
1231 This parameter would generally be used if individual processors
1232 are allocated to jobs (SelectType=select/cons_res). If
1233 resources are allocated by the core, socket or whole nodes; the
1234 number of CPUs allocated to a job may be higher than the task
1235 count and the value of --mem-per-cpu should be adjusted accord‐
1236 ingly. Specifying a memory limit of zero for a job step will
1237 restrict the job step to the amount of memory allocated to the
1238 job, but not remove any of the job's memory allocation from
1239 being available to other job steps. Also see --mem. --mem and
1240 --mem-per-cpu are mutually exclusive. This option applies to job
1241 and step allocations.
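
              As a worked example of the adjustment described above, with a
              configured MaxMemPerCPU of 2 GB, the following sketch requesting
              --mem-per-cpu=8G would be expected to be rewritten to
              --mem-per-cpu=2G with --cpus-per-task=4, so that 4 x 2 GB still
              provides 8 GB per task:

                     srun -n1 --mem-per-cpu=8G ./a.out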
1242
1243
1244 --mem-bind=[{quiet,verbose},]type
1245 Bind tasks to memory. Used only when the task/affinity plugin is
1246 enabled and the NUMA memory functions are available. Note that
1247 the resolution of CPU and memory binding may differ on some
1248 architectures. For example, CPU binding may be performed at the
1249 level of the cores within a processor while memory binding will
1250 be performed at the level of nodes, where the definition of
1251 "nodes" may differ from system to system. By default no memory
1252 binding is performed; any task using any CPU can use any memory.
1253 This option is typically used to ensure that each task is bound
              to the memory closest to its assigned CPU.  The use of any type
1255 other than "none" or "local" is not recommended. If you want
1256 greater control, try running a simple test code with the options
1257 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1258 the specific configuration.
1259
1260 NOTE: To have Slurm always report on the selected memory binding
1261 for all commands executed in a shell, you can enable verbose
1262 mode by setting the SLURM_MEM_BIND environment variable value to
1263 "verbose".
1264
1265 The following informational environment variables are set when
1266 --mem-bind is in use:
1267
1268 SLURM_MEM_BIND_LIST
1269 SLURM_MEM_BIND_PREFER
1270 SLURM_MEM_BIND_SORT
1271 SLURM_MEM_BIND_TYPE
1272 SLURM_MEM_BIND_VERBOSE
1273
1274 See the ENVIRONMENT VARIABLES section for a more detailed
1275 description of the individual SLURM_MEM_BIND* variables.
1276
1277 Supported options include:
1278
1279 help show this help message
1280
1281 local Use memory local to the processor in use
1282
1283 map_mem:<list>
1284 Bind by setting memory masks on tasks (or ranks) as spec‐
1285 ified where <list> is
1286 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1287 ping is specified for a node and identical mapping is
1288 applied to the tasks on every node (i.e. the lowest task
1289 ID on each node is mapped to the first ID specified in
1290 the list, etc.). NUMA IDs are interpreted as decimal
1291 values unless they are preceded with '0x' in which case
                     they are interpreted as hexadecimal values.  If the number of
1293 tasks (or ranks) exceeds the number of elements in this
1294 list, elements in the list will be reused as needed
1295 starting from the beginning of the list. To simplify
1296 support for large task counts, the lists may follow a map
                     with an asterisk and repetition count.  For example
1298 "map_mem:0x0f*4,0xf0*4". Not supported unless the entire
1299 node is allocated to the job.
1300
1301 mask_mem:<list>
1302 Bind by setting memory masks on tasks (or ranks) as spec‐
1303 ified where <list> is
1304 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1305 mapping is specified for a node and identical mapping is
1306 applied to the tasks on every node (i.e. the lowest task
1307 ID on each node is mapped to the first mask specified in
1308 the list, etc.). NUMA masks are always interpreted as
1309 hexadecimal values. Note that masks must be preceded
1310 with a '0x' if they don't begin with [0-9] so they are
1311 seen as numerical values. If the number of tasks (or
1312 ranks) exceeds the number of elements in this list, ele‐
1313 ments in the list will be reused as needed starting from
1314 the beginning of the list. To simplify support for large
1315 task counts, the lists may follow a mask with an asterisk
1316 and repetition count. For example "mask_mem:0*4,1*4". Not
1317 supported unless the entire node is allocated to the job.
1318
1319 no[ne] don't bind tasks to memory (default)
1320
1321 nosort avoid sorting free cache pages (default, LaunchParameters
1322 configuration parameter can override this default)
1323
1324 p[refer]
1325 Prefer use of first specified NUMA node, but permit
1326 use of other available NUMA nodes.
1327
1328 q[uiet]
1329 quietly bind before task runs (default)
1330
1331 rank bind by task rank (not recommended)
1332
1333 sort sort free cache pages (run zonesort on Intel KNL nodes)
1334
1335 v[erbose]
1336 verbosely report binding before task runs
1337
1338 This option applies to job and step allocations.
1339
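As an illustration, a minimal sketch of a common invocation (the
program name ./a.out is only a placeholder) that binds each task
to cores and to the memory local to those cores, reporting the
chosen binding:

       srun -n 8 --cpu-bind=cores --mem-bind=verbose,local ./a.out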
1340
1341 --mincpus=<n>
1342 Specify a minimum number of logical cpus/processors per node.
1343 This option applies to job allocations.
1344
1345
1346 --msg-timeout=<seconds>
1347 Modify the job launch message timeout. The default value is
1348 MessageTimeout in the Slurm configuration file slurm.conf.
1349 Changes to this are typically not recommended, but could be use‐
1350 ful to diagnose problems. This option applies to job alloca‐
1351 tions.
1352
1353
1354 --mpi=<mpi_type>
1355 Identify the type of MPI to be used. May result in unique initi‐
1356 ation procedures.
1357
1358 list Lists available mpi types to choose from.
1359
1360 openmpi
1361 For use with OpenMPI.
1362
1363 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1364 only if the MPI implementation supports it, in other
1365 words if the MPI has the PMI2 interface implemented. The
1366 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1367 which provides the server side functionality but the
1368 client side must implement PMI2_Init() and the other
1369 interface calls.
1370
1371 pmix To enable PMIx support (http://pmix.github.io/master).
1372 The PMIx support in Slurm can be used to launch parallel
1373 applications (e.g. MPI) if it supports PMIx, PMI2 or
1374 PMI1. Slurm must be configured with pmix support by pass‐
1375 ing "--with-pmix=<PMIx installation path>" option to its
1376 "./configure" script.
1377
1378 At the time of writing PMIx is supported in Open MPI
1379 starting from version 2.0. PMIx also supports backward
1380 compatibility with PMI1 and PMI2 and can be used if MPI
1381 was configured with PMI2/PMI1 support pointing to the
1382 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1383 doesn't provide the way to point to a specific implemen‐
1384 tation, a hackish solution leveraging LD_PRELOAD can be
1385 used to force "libpmix" usage.
1386
1387
1388 none No special MPI processing. This is the default and works
1389 with many other versions of MPI.
1390
1391 This option applies to step allocations.
1392
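As an illustration, and assuming the MPI library on the system was
built with PMI2 or PMIx support (./a.out is a placeholder MPI
program):

       srun --mpi=pmi2 -n 16 ./a.out
       srun --mpi=pmix -n 16 ./a.out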
1393
1394 --multi-prog
1395 Run a job with different programs and different arguments for
1396 each task. In this case, the executable program specified is
1397 actually a configuration file specifying the executable and
1398 arguments for each task. See MULTIPLE PROGRAM CONFIGURATION
1399 below for details on the configuration file contents. This
1400 option applies to step allocations.
1401
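A minimal sketch, assuming a configuration file named multi.conf
exists in the current working directory (see MULTIPLE PROGRAM
CONFIGURATION below for an example of its contents):

       srun -n 4 --multi-prog multi.conf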
1402
1403 -N, --nodes=<minnodes[-maxnodes]>
1404 Request that a minimum of minnodes nodes be allocated to this
1405 job. A maximum node count may also be specified with maxnodes.
1406 If only one number is specified, this is used as both the mini‐
1407 mum and maximum node count. The partition's node limits super‐
1408 sede those of the job. If a job's node limits are outside of
1409 the range permitted for its associated partition, the job will
1410 be left in a PENDING state. This permits possible execution at
1411 a later time, when the partition limit is changed. If a job
1412 node limit exceeds the number of nodes configured in the parti‐
1413 tion, the job will be rejected. Note that the environment vari‐
1414 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1415 ibility) will be set to the count of nodes actually allocated to
1416 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1417 tion. If -N is not specified, the default behavior is to allo‐
1418 cate enough nodes to satisfy the requirements of the -n and -c
1419 options. The job will be allocated as many nodes as possible
1420 within the range specified and without delaying the initiation
1421 of the job. If the number of tasks is given and a number of
1422 requested nodes is also given, the number of nodes used from that
1423 request will be reduced to match the number of tasks if the
1424 number of nodes in the request is greater than the number of
1425 tasks. The node count specification may include a numeric value
1426 followed by a suffix of "k" (multiplies numeric value by 1,024)
1427 or "m" (multiplies numeric value by 1,048,576). This option
1428 applies to job and step allocations.
1429
1430
1431 -n, --ntasks=<number>
1432 Specify the number of tasks to run. Request that srun allocate
1433 resources for ntasks tasks. The default is one task per node,
1434 but note that the --cpus-per-task option will change this
1435 default. This option applies to job and step allocations.
1436
1437
1438 --network=<type>
1439 Specify information pertaining to the switch or network. The
1440 interpretation of type is system dependent. This option is sup‐
1441 ported when running Slurm on a Cray natively. It is used to
1442 request using Network Performance Counters. Only one value per
1443 request is valid. All options are case-insensitive. In this
1444 configuration supported values include:
1445
1446 system
1447 Use the system-wide network performance counters. Only
1448 nodes requested will be marked in use for the job alloca‐
1449 tion. If the job does not fill up the entire system the
1450 rest of the nodes are not able to be used by other jobs
1451 using NPC, if idle their state will appear as PerfCnts.
1452 These nodes are still available for other jobs not using
1453 NPC.
1454
1455 blade Use the blade network performance counters. Only nodes
1456 requested will be marked in use for the job allocation.
1457 If the job does not fill up the entire blade(s) allocated
1458 to the job those blade(s) are not able to be used by other
1459 jobs using NPC, if idle their state will appear as PerfC‐
1460 nts. These nodes are still available for other jobs not
1461 using NPC.
1462
1463
1464 In all cases the job or step allocation request must
1465 specify the --exclusive option. Otherwise the request
1466 will be denied.
1467
1468 Also with any of these options steps are not allowed to share
1469 blades, so resources would remain idle inside an allocation if
1470 the step running on a blade does not take up all the nodes on
1471 the blade.
1472
1473 The network option is also supported on systems with IBM's Par‐
1474 allel Environment (PE). See IBM's LoadLeveler job command key‐
1475 word documentation about the keyword "network" for more informa‐
1476 tion. Multiple values may be specified in a comma separated
1477 list. All options are case-insensitive. Supported values
1478 include:
1479
1480 BULK_XFER[=<resources>]
1481 Enable bulk transfer of data using Remote Direct-
1482 Memory Access (RDMA). The optional resources speci‐
1483 fication is a numeric value which can have a suffix
1484 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1485 megabytes or gigabytes. NOTE: The resources speci‐
1486 fication is not supported by the underlying IBM in‐
1487 frastructure as of Parallel Environment version 2.2
1488 and no value should be specified at this time. The
1489 devices allocated to a job must all be of the same
1490 type. The default value depends upon what hardware is
1491 available and, in order of preference, is IPONLY (which
1492 is not considered in User Space mode), HFI, IB, HPCE,
1493 and KMUX.
1494
1495 CAU=<count> Number of Collective Acceleration Units (CAU)
1496 required. Applies only to IBM Power7-IH processors.
1497 Default value is zero. Independent CAU will be
1498 allocated for each programming interface (MPI, LAPI,
1499 etc.)
1500
1501 DEVNAME=<name>
1502 Specify the device name to use for communications
1503 (e.g. "eth0" or "mlx4_0").
1504
1505 DEVTYPE=<type>
1506 Specify the device type to use for communications.
1507 The supported values of type are: "IB" (InfiniBand),
1508 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1509 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1510 nel Emulation of HPCE). The devices allocated to a
1511 job must all be of the same type. The default value
1512 depends upon what hardware is available and, in order of
1513 preference, is IPONLY (which is not
1514 considered in User Space mode), HFI, IB, HPCE, and
1515 KMUX.
1516
1517 IMMED =<count>
1518 Number of immediate send slots per window required.
1519 Applies only to IBM Power7-IH processors. Default
1520 value is zero.
1521
1522 INSTANCES =<count>
1523 Specify the number of network connections for each task
1524 on each network. The default instance
1525 count is 1.
1526
1527 IPV4 Use Internet Protocol (IP) version 4 communications
1528 (default).
1529
1530 IPV6 Use Internet Protocol (IP) version 6 communications.
1531
1532 LAPI Use the LAPI programming interface.
1533
1534 MPI Use the MPI programming interface. MPI is the
1535 default interface.
1536
1537 PAMI Use the PAMI programming interface.
1538
1539 SHMEM Use the OpenSHMEM programming interface.
1540
1541 SN_ALL Use all available switch networks (default).
1542
1543 SN_SINGLE Use one available switch network.
1544
1545 UPC Use the UPC programming interface.
1546
1547 US Use User Space communications.
1548
1549
1550 Some examples of network specifications:
1551
1552 Instances=2,US,MPI,SN_ALL
1553 Create two user space connections for MPI communica‐
1554 tions on every switch network for each task.
1555
1556 US,MPI,Instances=3,Devtype=IB
1557 Create three user space connections for MPI communi‐
1558 cations on every InfiniBand network for each task.
1559
1560 IPV4,LAPI,SN_Single
1561 Create an IP version 4 connection for LAPI communica‐
1562 tions on one switch network for each task.
1563
1564 Instances=2,US,LAPI,MPI
1565 Create two user space connections each for LAPI and
1566 MPI communications on every switch network for each
1567 task. Note that SN_ALL is the default option so
1568 every switch network is used. Also note that
1569 Instances=2 specifies that two connections are
1570 established for each protocol (LAPI and MPI) and
1571 each task. If there are two networks and four tasks
1572 on the node then a total of 32 connections are
1573 established (2 instances x 2 protocols x 2 networks
1574 x 4 tasks).
1575
1576 This option applies to job and step allocations.
1577
1578
1579 --nice[=adjustment]
1580 Run the job with an adjusted scheduling priority within Slurm.
1581 With no adjustment value the scheduling priority is decreased by
1582 100. A negative nice value increases the priority, otherwise
1583 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1584 ileged users can specify a negative adjustment.
1585
1586
1587 --ntasks-per-core=<ntasks>
1588 Request the maximum ntasks be invoked on each core. This option
1589 applies to the job allocation, but not to step allocations.
1590 Meant to be used with the --ntasks option. Related to
1591 --ntasks-per-node except at the core level instead of the node
1592 level. Masks will automatically be generated to bind the tasks
1593 to specific cores unless --cpu-bind=none is specified. NOTE:
1594 This option is not supported unless SelectType=cons_res is con‐
1595 figured (either directly or indirectly on Cray systems) along
1596 with the node's core count.
1597
1598
1599 --ntasks-per-node=<ntasks>
1600 Request that ntasks be invoked on each node. If used with the
1601 --ntasks option, the --ntasks option will take precedence and
1602 the --ntasks-per-node will be treated as a maximum count of
1603 tasks per node. Meant to be used with the --nodes option. This
1604 is related to --cpus-per-task=ncpus, but does not require knowl‐
1605 edge of the actual number of cpus on each node. In some cases,
1606 it is more convenient to be able to request that no more than a
1607 specific number of tasks be invoked on each node. Examples of
1608 this include submitting a hybrid MPI/OpenMP app where only one
1609 MPI "task/rank" should be assigned to each node while allowing
1610 the OpenMP portion to utilize all of the parallelism present in
1611 the node, or submitting a single setup/cleanup/monitoring job to
1612 each node of a pre-existing allocation as one step in a larger
1613 job script. This option applies to job allocations.
1614
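For instance, a sketch of the hybrid MPI/OpenMP case described
above (./hybrid is a placeholder program and the CPU count is
illustrative):

       srun -N 4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid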
1615
1616 --ntasks-per-socket=<ntasks>
1617 Request the maximum ntasks be invoked on each socket. This
1618 option applies to the job allocation, but not to step alloca‐
1619 tions. Meant to be used with the --ntasks option. Related to
1620 --ntasks-per-node except at the socket level instead of the node
1621 level. Masks will automatically be generated to bind the tasks
1622 to specific sockets unless --cpu-bind=none is specified. NOTE:
1623 This option is not supported unless SelectType=cons_res is con‐
1624 figured (either directly or indirectly on Cray systems) along
1625 with the node's socket count.
1626
1627
1628 -O, --overcommit
1629 Overcommit resources. This option applies to job and step allo‐
1630 cations. When applied to job allocation, only one CPU is allo‐
1631 cated to the job per node and options used to specify the number
1632 of tasks per node, socket, core, etc. are ignored. When
1633 applied to job step allocations (the srun command when executed
1634 within an existing job allocation), this option can be used to
1635 launch more than one task per CPU. Normally, srun will not
1636 allocate more than one process per CPU. By specifying --over‐
1637 commit you are explicitly allowing more than one process per
1638 CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1639 to execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1640 file slurm.h and is not a variable, it is set at Slurm build
1641 time.
1642
1643
1644 -o, --output=<filename pattern>
1645 Specify the "filename pattern" for stdout redirection. By
1646 default in interactive mode, srun collects stdout from all tasks
1647 and sends this output via TCP/IP to the attached terminal. With
1648 --output stdout may be redirected to a file, to one file per
1649 task, or to /dev/null. See section IO Redirection below for the
1650 various forms of filename pattern. If the specified file
1651 already exists, it will be overwritten.
1652
1653 If --error is not also specified on the command line, both std‐
1654 out and stderr will be directed to the file specified by --output.
1655 This option applies to job and step allocations.
1656
1657
1658 --open-mode=<append|truncate>
1659 Open the output and error files using append or truncate mode as
1660 specified. For heterogeneous job steps the default value is
1661 "append". Otherwise the default value is specified by the sys‐
1662 tem configuration parameter JobFileAppend. This option applies
1663 to job and step allocations.
1664
1665
1666 --pack-group=<expr>
1667 Identify each job in a heterogeneous job allocation for which a
1668 step is to be created. Applies only to srun commands issued
1669 inside a salloc allocation or sbatch script. <expr> is a set of
1670 integers corresponding to one or more options indexes on the
1671 salloc or sbatch command line. Examples: "--pack-group=2",
1672 "--pack-group=0,4", "--pack-group=1,3-5". The default value is
1673 --pack-group=0.
1674
1675
1676 -p, --partition=<partition_names>
1677 Request a specific partition for the resource allocation. If
1678 not specified, the default behavior is to allow the slurm con‐
1679 troller to select the default partition as designated by the
1680 system administrator. If the job can use more than one parti‐
1681 tion, specify their names in a comma separate list and the one
1682 offering earliest initiation will be used with no regard given
1683 to the partition name ordering (although higher priority parti‐
1684 tions will be considered first). When the job is initiated, the
1685 name of the partition used will be placed first in the job
1686 record partition string. This option applies to job allocations.
1687
1688
1689 --power=<flags>
1690 Comma separated list of power management plugin options. Cur‐
1691 rently available flags include: level (all nodes allocated to
1692 the job should have identical power caps, may be disabled by the
1693 Slurm configuration option PowerParameters=job_no_level). This
1694 option applies to job allocations.
1695
1696
1697 --priority=<value>
1698 Request a specific job priority. May be subject to configura‐
1699 tion specific constraints. value should either be a numeric
1700 value or "TOP" (for highest possible value). Only Slurm opera‐
1701 tors and administrators can set the priority of a job. This
1702 option applies to job allocations only.
1703
1704
1705 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1706 enables detailed data collection by the acct_gather_profile
1707 plugin. Detailed data are typically time-series that are stored
1708 in an HDF5 file for the job or an InfluxDB database depending on
1709 the configured plugin.
1710
1711
1712 All All data types are collected. (Cannot be combined with
1713 other values.)
1714
1715
1716 None No data types are collected. This is the default.
1717 (Cannot be combined with other values.)
1718
1719
1720 Energy Energy data is collected.
1721
1722
1723 Task Task (I/O, Memory, ...) data is collected.
1724
1725
1726 Filesystem
1727 Filesystem data is collected.
1728
1729
1730 Network Network (InfiniBand) data is collected.
1731
1732
1733 This option applies to job and step allocations.
1734
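An illustrative invocation, assuming the acct_gather plugins have
been configured by the administrator (./a.out is a placeholder):

       srun --profile=task,energy -n 8 ./a.out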
1735
1736 --prolog=<executable>
1737 srun will run executable just before launching the job step.
1738 The command line arguments for executable will be the command
1739 and arguments of the job step. If executable is "none", then no
1740 srun prolog will be run. This parameter overrides the SrunProlog
1741 parameter in slurm.conf. This parameter is completely indepen‐
1742 dent from the Prolog parameter in slurm.conf. This option
1743 applies to job allocations.
1744
1745
1746 --propagate[=rlimit[,rlimit...]]
1747 Allows users to specify which of the modifiable (soft) resource
1748 limits to propagate to the compute nodes and apply to their
1749 jobs. If no rlimit is specified, then all resource limits will
1750 be propagated. The following rlimit names are supported by
1751 Slurm (although some options may not be supported on some sys‐
1752 tems):
1753
1754 ALL All limits listed below (default)
1755
1756 NONE No limits listed below
1757
1758 AS The maximum address space for a process
1759
1760 CORE The maximum size of core file
1761
1762 CPU The maximum amount of CPU time
1763
1764 DATA The maximum size of a process's data segment
1765
1766 FSIZE The maximum size of files created. Note that if the
1767 user sets FSIZE to less than the current size of the
1768 slurmd.log, job launches will fail with a 'File size
1769 limit exceeded' error.
1770
1771 MEMLOCK The maximum size that may be locked into memory
1772
1773 NOFILE The maximum number of open files
1774
1775 NPROC The maximum number of processes available
1776
1777 RSS The maximum resident set size
1778
1779 STACK The maximum stack size
1780
1781 This option applies to job allocations.
1782
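For example, to propagate only the soft core file size and stack
limits in effect in the submitting shell (./a.out is a
placeholder):

       srun --propagate=CORE,STACK -n 4 ./a.out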
1783
1784 --pty Execute task zero in pseudo terminal mode. Implicitly sets
1785 --unbuffered. Implicitly sets --error and --output to /dev/null
1786 for all tasks except task zero, which may cause those tasks to
1787 exit immediately (e.g. shells will typically exit immediately in
1788 that situation). This option applies to step allocations.
1789
1790
1791 -q, --qos=<qos>
1792 Request a quality of service for the job. QOS values can be
1793 defined for each user/cluster/account association in the Slurm
1794 database. Users will be limited to their association's defined
1795 set of qos's when the Slurm configuration parameter, Account‐
1796 ingStorageEnforce, includes "qos" in it's definition. This
1797 option applies to job allocations.
1798
1799
1800 -Q, --quiet
1801 Suppress informational messages from srun. Errors will still be
1802 displayed. This option applies to job and step allocations.
1803
1804
1805 --quit-on-interrupt
1806 Quit immediately on single SIGINT (Ctrl-C). Use of this option
1807 disables the status feature normally available when srun
1808 receives a single Ctrl-C and causes srun to instead immediately
1809 terminate the running job. This option applies to step alloca‐
1810 tions.
1811
1812
1813 -r, --relative=<n>
1814 Run a job step relative to node n of the current allocation.
1815 This option may be used to spread several job steps out among
1816 the nodes of the current job. If -r is used, the current job
1817 step will begin at node n of the allocated nodelist, where the
1818 first node is considered node 0. The -r option is not permitted
1819 with the -w or -x options and will result in a fatal error when not
1820 running within a prior allocation (i.e. when SLURM_JOB_ID is not
1821 set). The default for n is 0. If the value of --nodes exceeds
1822 the number of nodes identified with the --relative option, a
1823 warning message will be printed and the --relative option will
1824 take precedence. This option applies to step allocations.
1825
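A sketch of spreading two single-node steps across an existing
two-node allocation (run from within salloc or a batch script;
./step1 and ./step2 are placeholder programs):

       srun -N 1 -n 1 -r 0 ./step1 &
       srun -N 1 -n 1 -r 1 ./step2 &
       wait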
1826
1827 --reboot
1828 Force the allocated nodes to reboot before starting the job.
1829 This is only supported with some system configurations and will
1830 otherwise be silently ignored. This option applies to job allo‐
1831 cations.
1832
1833
1834 --resv-ports[=count]
1835 Reserve communication ports for this job. Users can specify the
1836 number of ports they want to reserve. The parameter Mpi‐
1837 Params=ports=12000-12999 must be specified in slurm.conf. If not
1838 specified and Slurm's OpenMPI plugin is used, then by default
1839 the number of reserved ports is equal to the highest number of
1840 tasks on any node in the job step allocation. If the number of
1841 reserved ports is zero then no ports are reserved. Used for OpenMPI. This
1842 option applies to job and step allocations.
1843
1844
1845 --reservation=<name>
1846 Allocate resources for the job from the named reservation. This
1847 option applies to job allocations.
1848
1849
1850 --restart-dir=<directory>
1851 Specifies the directory from which the job or job step's check‐
1852 point should be read (used by the checkpoint/blcr and check‐
1853 point/xlch plugins only). This option applies to job alloca‐
1854 tions.
1855
1856 --share The --share option has been replaced by the --oversub‐
1857 scribe option described below.
1858
1859
1860 -s, --oversubscribe
1861 The job allocation can over-subscribe resources with other run‐
1862 ning jobs. The resources to be over-subscribed can be nodes,
1863 sockets, cores, and/or hyperthreads depending upon configura‐
1864 tion. The default over-subscribe behavior depends on system
1865 configuration and the partition's OverSubscribe option takes
1866 precedence over the job's option. This option may result in the
1867 allocation being granted sooner than if the --oversubscribe
1868 option was not set and allow higher system utilization, but
1869 application performance will likely suffer due to competition
1870 for resources. Also see the --exclusive option. This option
1871 applies to step allocations.
1872
1873
1874 -S, --core-spec=<num>
1875 Count of specialized cores per node reserved by the job for sys‐
1876 tem operations and not used by the application. The application
1877 will not use these cores, but will be charged for their alloca‐
1878 tion. Default value is dependent upon the node's configured
1879 CoreSpecCount value. If a value of zero is designated and the
1880 Slurm configuration option AllowSpecResourcesUsage is enabled,
1881 the job will be allowed to override CoreSpecCount and use the
1882 specialized resources on nodes it is allocated. This option can
1883 not be used with the --thread-spec option. This option applies
1884 to job allocations.
1885
1886
1887 --signal=<sig_num>[@<sig_time>]
1888 When a job is within sig_time seconds of its end time, send it
1889 the signal sig_num. Due to the resolution of event handling by
1890 Slurm, the signal may be sent up to 60 seconds earlier than
1891 specified. sig_num may either be a signal number or name (e.g.
1892 "10" or "USR1"). sig_time must have an integer value between 0
1893 and 65535. By default, no signal is sent before the job's end
1894 time. If a sig_num is specified without any sig_time, the
1895 default time will be 60 seconds. This option applies to job
1896 allocations.
1897
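For example, to have Slurm deliver SIGUSR1 roughly five minutes
before the job's end time (subject to the 60 second resolution
noted above; ./a.out is a placeholder):

       srun --signal=USR1@300 -n 4 ./a.out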
1898
1899 --slurmd-debug=<level>
1900 Specify a debug level for slurmd(8). The level may be specified
1901 as either an integer value between 0 [quiet, only errors are dis‐
1902 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
1903
1904 quiet Log nothing
1905
1906 fatal Log only fatal errors
1907
1908 error Log only errors
1909
1910 info Log errors and general informational messages
1911
1912 verbose Log errors and verbose informational messages
1913
1914
1915 The slurmd debug information is copied onto the stderr of
1916 the job. By default only errors are displayed. This option
1917 applies to job and step allocations.
1918
1919
1920 --sockets-per-node=<sockets>
1921 Restrict node selection to nodes with at least the specified
1922 number of sockets. See additional information under -B option
1923 above when task/affinity plugin is enabled. This option applies
1924 to job allocations.
1925
1926
1927 --spread-job
1928 Spread the job allocation over as many nodes as possible and
1929 attempt to evenly distribute tasks across the allocated nodes.
1930 This option disables the topology/tree plugin. This option
1931 applies to job allocations.
1932
1933
1934 --switches=<count>[@<max-time>]
1935 When a tree topology is used, this defines the maximum count of
1936 switches desired for the job allocation and optionally the maxi‐
1937 mum time to wait for that number of switches. If Slurm finds an
1938 allocation containing more switches than the count specified,
1939 the job remains pending until it either finds an allocation with
1940 desired switch count or the time limit expires. If there is no
1941 switch count limit, there is no delay in starting the job.
1942 Acceptable time formats include "minutes", "minutes:seconds",
1943 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1944 "days-hours:minutes:seconds". The job's maximum time delay may
1945 be limited by the system administrator using the SchedulerParam‐
1946 eters configuration parameter with the max_switch_wait parameter
1947 option. On a dragonfly network the only switch count supported
1948 is 1 since communication performance will be highest when a job
1949 is allocated resources on one leaf switch or more than 2 leaf
1950 switches. The default max-time is the max_switch_wait Sched‐
1951 ulerParameters value. This option applies to job allocations.
1952
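An illustrative request for a single leaf switch, waiting at most
30 minutes for such an allocation before accepting one spanning
more switches (./a.out is a placeholder):

       srun --switches=1@30:00 -N 8 ./a.out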
1953
1954 -T, --threads=<nthreads>
1955 Allows limiting the number of concurrent threads used to send
1956 the job request from the srun process to the slurmd processes on
1957 the allocated nodes. Default is to use one thread per allocated
1958 node up to a maximum of 60 concurrent threads. Specifying this
1959 option limits the number of concurrent threads to nthreads (less
1960 than or equal to 60). This should only be used to set a low
1961 thread count for testing on very small memory computers. This
1962 option applies to job allocations.
1963
1964
1965 -t, --time=<time>
1966 Set a limit on the total run time of the job allocation. If the
1967 requested time limit exceeds the partition's time limit, the job
1968 will be left in a PENDING state (possibly indefinitely). The
1969 default time limit is the partition's default time limit. When
1970 the time limit is reached, each task in each job step is sent
1971 SIGTERM followed by SIGKILL. The interval between signals is
1972 specified by the Slurm configuration parameter KillWait. The
1973 OverTimeLimit configuration parameter may permit the job to run
1974 longer than scheduled. Time resolution is one minute and second
1975 values are rounded up to the next minute.
1976
1977 A time limit of zero requests that no time limit be imposed.
1978 Acceptable time formats include "minutes", "minutes:seconds",
1979 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1980 "days-hours:minutes:seconds". This option applies to job and
1981 step allocations.
1982
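Examples using the time formats listed above; both request the
same 90 minute limit (./a.out is a placeholder):

       srun -t 90 -n 4 ./a.out
       srun --time=1:30:00 -n 4 ./a.out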
1983
1984 --task-epilog=<executable>
1985 The slurmstepd daemon will run executable just after each task
1986 terminates. This will be executed before any TaskEpilog parame‐
1987 ter in slurm.conf is executed. This is meant to be a very
1988 short-lived program. If it fails to terminate within a few sec‐
1989 onds, it will be killed along with any descendant processes.
1990 This option applies to step allocations.
1991
1992
1993 --task-prolog=<executable>
1994 The slurmstepd daemon will run executable just before launching
1995 each task. This will be executed after any TaskProlog parameter
1996 in slurm.conf is executed. Besides the normal environment vari‐
1997 ables, this has SLURM_TASK_PID available to identify the process
1998 ID of the task being started. Standard output from this program
1999 of the form "export NAME=value" will be used to set environment
2000 variables for the task being spawned. This option applies to
2001 step allocations.
2002
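A minimal sketch of a task prolog (the script name and variable
are illustrative). A line printed in the form "export NAME=value"
becomes an environment variable of the task being spawned:

       #!/bin/sh
       # task_prolog.sh: runs just before each task is launched
       echo "export MY_TASK_SCRATCH=/tmp/task.$SLURM_TASK_PID"

and a matching invocation:

       srun --task-prolog=./task_prolog.sh -n 4 ./a.out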
2003
2004 --test-only
2005 Returns an estimate of when a job would be scheduled to run
2006 given the current job queue and all the other srun arguments
2007 specifying the job. This limits srun's behavior to just return
2008 information; no job is actually submitted. The program will be
2009 executed directly by the slurmd daemon. This option applies to
2010 job allocations.
2011
2012
2013 --thread-spec=<num>
2014 Count of specialized threads per node reserved by the job for
2015 system operations and not used by the application. The applica‐
2016 tion will not use these threads, but will be charged for their
2017 allocation. This option can not be used with the --core-spec
2018 option. This option applies to job allocations.
2019
2020
2021 --threads-per-core=<threads>
2022 Restrict node selection to nodes with at least the specified
2023 number of threads per core. NOTE: "Threads" refers to the num‐
2024 ber of processing units on each core rather than the number of
2025 application tasks to be launched per core. See additional
2026 information under -B option above when task/affinity plugin is
2027 enabled. This option applies to job allocations.
2028
2029
2030 --time-min=<time>
2031 Set a minimum time limit on the job allocation. If specified,
2032 the job may have its --time limit lowered to a value no lower
2033 than --time-min if doing so permits the job to begin execution
2034 earlier than otherwise possible. The job's time limit will not
2035 be changed after the job is allocated resources. This is per‐
2036 formed by a backfill scheduling algorithm to allocate resources
2037 otherwise reserved for higher priority jobs. Acceptable time
2038 formats include "minutes", "minutes:seconds", "hours:min‐
2039 utes:seconds", "days-hours", "days-hours:minutes" and
2040 "days-hours:minutes:seconds". This option applies to job alloca‐
2041 tions.
2042
2043
2044 --tmp=<size[units]>
2045 Specify a minimum amount of temporary disk space per node.
2046 Default units are megabytes unless the SchedulerParameters con‐
2047 figuration parameter includes the "default_gbytes" option for
2048 gigabytes. Different units can be specified using the suffix
2049 [K|M|G|T]. This option applies to job allocations.
2050
2051
2052 -u, --unbuffered
2053 By default the connection between slurmstepd and the user
2054 launched application is over a pipe. The stdio output written by
2055 the application is buffered by glibc until it is flushed or
2056 the output is set as unbuffered. See setbuf(3). If this option
2057 is specified the tasks are executed with a pseudo terminal so
2058 that the application output is unbuffered. This option applies
2059 to step allocations.
2060
2061 --usage
2062 Display brief help message and exit.
2063
2064
2065 --uid=<user>
2066 Attempt to submit and/or run a job as user instead of the invok‐
2067 ing user id. The invoking user's credentials will be used to
2068 check access permissions for the target partition. User root may
2069 use this option to run jobs as a normal user in a RootOnly par‐
2070 tition for example. If run as root, srun will drop its permis‐
2071 sions to the uid specified after node allocation is successful.
2072 user may be the user name or numerical user ID. This option
2073 applies to job and step allocations.
2074
2075
2076 --use-min-nodes
2077 If a range of node counts is given, prefer the smaller count.
2078
2079
2080 -V, --version
2081 Display version information and exit.
2082
2083
2084 -v, --verbose
2085 Increase the verbosity of srun's informational messages. Multi‐
2086 ple -v's will further increase srun's verbosity. By default
2087 only errors will be displayed. This option applies to job and
2088 step allocations.
2089
2090
2091 -W, --wait=<seconds>
2092 Specify how long to wait after the first task terminates before
2093 terminating all remaining tasks. A value of 0 indicates an
2094 unlimited wait (a warning will be issued after 60 seconds). The
2095 default value is set by the WaitTime parameter in the slurm con‐
2096 figuration file (see slurm.conf(5)). This option can be useful
2097 to ensure that a job is terminated in a timely fashion in the
2098 event that one or more tasks terminate prematurely. Note: The
2099 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2100 to terminate the job immediately if a task exits with a non-zero
2101 exit code. This option applies to job allocations.
2102
2103
2104 -w, --nodelist=<host1,host2,... or filename>
2105 Request a specific list of hosts. The job will contain all of
2106 these hosts and possibly additional hosts as needed to satisfy
2107 resource requirements. The list may be specified as a
2108 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2109 for example), or a filename. The host list will be assumed to
2110 be a filename if it contains a "/" character. If you specify a
2111 minimum node or processor count larger than can be satisfied by
2112 the supplied host list, additional resources will be allocated
2113 on other nodes as needed. Rather than repeating a host name
2114 multiple times, an asterisk and a repetition count may be
2115 appended to a host name. For example "host1,host1" and "host1*2"
2116 are equivalent. If number of tasks is given and a list of
2117 requested nodes is also given the number of nodes used from that
2118 list will be reduced to match that of the number of tasks if the
2119 number of nodes in the list is greater than the number of tasks.
2120 This option applies to job and step allocations.
2121
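For example (host names are illustrative; the second form is read
from a file because the argument contains a "/"):

       srun -N 3 -w "host[1-2],host7" ./a.out
       srun -w ./hosts.txt ./a.out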
2122
2123 --wckey=<wckey>
2124 Specify wckey to be used with job. If TrackWCKey=no (default)
2125 in the slurm.conf this value is ignored. This option applies to
2126 job allocations.
2127
2128
2129 -X, --disable-status
2130 Disable the display of task status when srun receives a single
2131 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2132 running job. Without this option a second Ctrl-C in one second
2133 is required to forcibly terminate the job and srun will immedi‐
2134 ately exit. May also be set via the environment variable
2135 SLURM_DISABLE_STATUS. This option applies to job allocations.
2136
2137
2138 -x, --exclude=<host1,host2,... or filename>
2139 Request that a specific list of hosts not be included in the
2140 resources allocated to this job. The host list will be assumed
2141 to be a filename if it contains a "/" character. This option
2142 applies to job allocations.
2143
2144
2145 --x11[=<all|first|last>]
2146 Sets up X11 forwarding on all, first or last node(s) of the
2147 allocation. This option is only enabled if Slurm was compiled
2148 with X11 support and PrologFlags=x11 is defined in the
2149 slurm.conf. Default is all.
2150
2151
2152 -Z, --no-allocate
2153 Run the specified tasks on a set of nodes without creating a
2154 Slurm "job" in the Slurm queue structure, bypassing the normal
2155 resource allocation step. The list of nodes must be specified
2156 with the -w, --nodelist option. This is a privileged option
2157 only available for the users "SlurmUser" and "root". This option
2158 applies to job allocations.
2159
2160
2161 srun will submit the job request to the slurm job controller, then ini‐
2162 tiate all processes on the remote nodes. If the request cannot be met
2163 immediately, srun will block until the resources are free to run the
2164 job. If the -I (--immediate) option is specified srun will terminate if
2165 resources are not immediately available.
2166
2167 When initiating remote processes srun will propagate the current work‐
2168 ing directory, unless --chdir=<path> is specified, in which case path
2169 will become the working directory for the remote processes.
2170
2171 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2172 cated to the job. When specifying only the number of processes to run
2173 with -n, a default of one CPU per process is allocated. By specifying
2174 the number of CPUs required per task (-c), more than one CPU may be
2175 allocated per process. If the number of nodes is specified with -N,
2176 srun will attempt to allocate at least the number of nodes specified.
2177
2178 Combinations of the above three options may be used to change how pro‐
2179 cesses are distributed across nodes and cpus. For instance, by specify‐
2180 ing both the number of processes and number of nodes on which to run,
2181 the number of processes per node is implied. However, if the number of
2182 CPUs per process is more important, then the number of processes (-n) and
2183 the number of CPUs per process (-c) should be specified.
2184
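A few illustrative combinations (./a.out is a placeholder; actual
placement also depends on the cluster configuration):

       srun -n 32 ./a.out             # 32 tasks, one CPU each
       srun -n 16 -c 2 ./a.out        # 16 tasks, two CPUs per task
       srun -N 4 -n 32 ./a.out        # 32 tasks on at least 4 nodes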
2185 srun will refuse to allocate more than one process per CPU unless
2186 --overcommit (-O) is also specified.
2187
2188 srun will attempt to meet the above specifications "at a minimum." That
2189 is, if 16 nodes are requested for 32 processes, and some nodes do not
2190 have 2 CPUs, the allocation of nodes will be increased in order to meet
2191 the demand for CPUs. In other words, a minimum of 16 nodes are being
2192 requested. However, if 16 nodes are requested for 15 processes, srun
2193 will consider this an error, as 15 processes cannot run across 16
2194 nodes.
2195
2196
2197 IO Redirection
2198
2199 By default, stdout and stderr will be redirected from all tasks to the
2200 stdout and stderr of srun, and stdin will be redirected from the stan‐
2201 dard input of srun to all remote tasks. If stdin is only to be read by
2202 a subset of the spawned tasks, specifying a file to read from rather
2203 than forwarding stdin from the srun command may be preferable as it
2204 avoids moving and storing data that will never be read.
2205
2206 For OS X, the poll() function does not support stdin, so input from a
2207 terminal is not possible.
2208
2209 For BGQ, srun only supports stdin to one task running on the system. By
2210 default it is taskid 0, but it can be changed with the -i<taskid> option as
2211 described below, or --launcher-opts="--stdinrank=<taskid>".
2212
2213 This behavior may be changed with the --output, --error, and --input
2214 (-o, -e, -i) options. Valid format specifications for these options are
2215
2216 all stdout and stderr are redirected from all tasks to srun. stdin is
2217 broadcast to all remote tasks. (This is the default behav‐
2218 ior)
2219
2220 none stdout and stderr are not received from any task. stdin is
2221 not sent to any task (stdin is closed).
2222
2223 taskid stdout and/or stderr are redirected from only the task with
2224 relative id equal to taskid, where 0 <= taskid < ntasks,
2225 where ntasks is the total number of tasks in the current job
2226 step. stdin is redirected from the stdin of srun to this
2227 same task. This file will be written on the node executing
2228 the task.
2229
2230 filename srun will redirect stdout and/or stderr to the named file
2231 from all tasks. stdin will be redirected from the named file
2232 and broadcast to all tasks in the job. filename refers to a
2233 path on the host that runs srun. Depending on the cluster's
2234 file system layout, this may result in the output appearing
2235 in different places depending on whether the job is run in
2236 batch mode.
2237
2238 filename pattern
2239 srun allows for a filename pattern to be used to generate the
2240 named IO file described above. The following list of format
2241 specifiers may be used in the format string to generate a
2242 filename that will be unique to a given jobid, stepid, node,
2243 or task. In each case, the appropriate number of files are
2244 opened and associated with the corresponding tasks. Note that
2245 any format string containing %t, %n, and/or %N will be writ‐
2246 ten on the node executing the task rather than the node where
2247 srun executes; these format specifiers are not supported on a
2248 BGQ system.
2249
2250 \\ Do not process any of the replacement symbols.
2251
2252 %% The character "%".
2253
2254 %A Job array's master job allocation number.
2255
2256 %a Job array ID (index) number.
2257
2258 %J jobid.stepid of the running job. (e.g. "128.0")
2259
2260 %j jobid of the running job.
2261
2262 %s stepid of the running job.
2263
2264 %N short hostname. This will create a separate IO file
2265 per node.
2266
2267 %n Node identifier relative to current job (e.g. "0" is
2268 the first node of the running job) This will create a
2269 separate IO file per node.
2270
2271 %t task identifier (rank) relative to current job. This
2272 will create a separate IO file per task.
2273
2274 %u User name.
2275
2276 %x Job name.
2277
2278 A number placed between the percent character and format
2279 specifier may be used to zero-pad the result in the IO file‐
2280 name. This number is ignored if the format specifier corre‐
2281 sponds to non-numeric data (%N for example).
2282
2283 Some examples of how the format string may be used for a 4
2284 task job step with a Job ID of 128 and step id of 0 are
2285 included below:
2286
2287 job%J.out job128.0.out
2288
2289 job%4j.out job0128.out
2290
2291 job%j-%2t.out job128-00.out, job128-01.out, ...
2292
2294 Some srun options may be set via environment variables. These environ‐
2295 ment variables, along with their corresponding options, are listed
2296 below. Note: Command line options will always override these settings.
2297
2298 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2299 MVAPICH2) and controls the fanout of data commu‐
2300 nications. The srun command sends messages to
2301 application programs (via the PMI library) and
2302 those applications may be called upon to forward
2303 that data to up to this number of additional
2304 tasks. Higher values offload work from the srun
2305 command to the applications and likely increase
2306 the vulnerability to failures. The default value
2307 is 32.
2308
2309 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2310 MVAPICH2) and controls the fanout of data commu‐
2311 nications. The srun command sends messages to
2312 application programs (via the PMI library) and
2313 those applications may be called upon to forward
2314 that data to additional tasks. By default, srun
2315 sends one message per host and one task on that
2316 host forwards the data to other tasks on that
2317 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2318 defined, the user task may be required to forward
2319 the data to tasks on other hosts. Setting
2320 PMI_FANOUT_OFF_HOST may increase performance.
2321 Since more work is performed by the PMI library
2322 loaded by the user application, failures also can
2323 be more common and more difficult to diagnose.
2324
2325 PMI_TIME This is used exclusively with PMI (MPICH2 and
2326 MVAPICH2) and controls how much the communica‐
2327 tions from the tasks to the srun are spread out
2328 in time in order to avoid overwhelming the srun
2329 command with work. The default value is 500
2330 (microseconds) per task. On relatively slow pro‐
2331 cessors or systems with very large processor
2332 counts (and large PMI data sets), higher values
2333 may be required.
2334
2335 SLURM_CONF The location of the Slurm configuration file.
2336
2337 SLURM_ACCOUNT Same as -A, --account
2338
2339 SLURM_ACCTG_FREQ Same as --acctg-freq
2340
2341 SLURM_BCAST Same as --bcast
2342
2343 SLURM_BURST_BUFFER Same as --bb
2344
2345 SLURM_CHECKPOINT Same as --checkpoint
2346
2347 SLURM_CHECKPOINT_DIR Same as --checkpoint-dir
2348
2349 SLURM_COMPRESS Same as --compress
2350
2351 SLURM_CONSTRAINT Same as -C, --constraint
2352
2353 SLURM_CORE_SPEC Same as --core-spec
2354
2355 SLURM_CPU_BIND Same as --cpu-bind
2356
2357 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2358
2359 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2360
2361 SLURM_DEBUG Same as -v, --verbose
2362
2363 SLURM_DELAY_BOOT Same as --delay-boot
2364
2365 SLURMD_DEBUG Same as -d, --slurmd-debug
2366
2367 SLURM_DEPENDENCY -P, --dependency=<jobid>
2368
2369 SLURM_DISABLE_STATUS Same as -X, --disable-status
2370
2371 SLURM_DIST_PLANESIZE Same as -m plane
2372
2373 SLURM_DISTRIBUTION Same as -m, --distribution
2374
2375 SLURM_EPILOG Same as --epilog
2376
2377 SLURM_EXCLUSIVE Same as --exclusive
2378
2379 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2380 error occurs (e.g. invalid options). This can be
2381 used by a script to distinguish application exit
2382 codes from various Slurm error conditions. Also
2383 see SLURM_EXIT_IMMEDIATE.
2384
2385 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
2386 --immediate option is used and resources are not
2387 currently available. This can be used by a
2388 script to distinguish application exit codes from
2389 various Slurm error conditions. Also see
2390 SLURM_EXIT_ERROR.
2391
2392 SLURM_GRES_FLAGS Same as --gres-flags
2393
2394 SLURM_HINT Same as --hint
2395
2396 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2397
2398 SLURM_IMMEDIATE Same as -I, --immediate
2399
2400 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2401 Same as --jobid
2402
2403 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2404 allocation, in which case it is ignored to avoid
2405 using the batch job's name as the name of each
2406 job step.
2407
2408 SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
2409 Same as -N, --nodes. Total number of nodes in the
2410 job’s resource allocation.
2411
2412 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2413
2414 SLURM_LABELIO Same as -l, --label
2415
2416 SLURM_MEM_BIND Same as --mem-bind
2417
2418 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2419
2420 SLURM_MEM_PER_NODE Same as --mem
2421
2422 SLURM_MPI_TYPE Same as --mpi
2423
2424 SLURM_NETWORK Same as --network
2425
2426 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2427 Same as -n, --ntasks
2428
2429 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2430
2431 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2432
2433 SLURM_NTASKS_PER_SOCKET
2434 Same as --ntasks-per-socket
2435
2436 SLURM_OPEN_MODE Same as --open-mode
2437
2438 SLURM_OVERCOMMIT Same as -O, --overcommit
2439
2440 SLURM_PARTITION Same as -p, --partition
2441
2442 SLURM_PMI_KVS_NO_DUP_KEYS
2443 If set, then PMI key-pairs will contain no dupli‐
2444 cate keys. MPI can use this variable to inform
2445 the PMI library that it will not use duplicate
2446 keys so PMI can skip the check for duplicate
2447 keys. This is the case for MPICH2 and reduces
2448 overhead in testing for duplicates for improved
2449 performance.
2450
2451 SLURM_POWER Same as --power
2452
2453 SLURM_PROFILE Same as --profile
2454
2455 SLURM_PROLOG Same as --prolog
2456
2457 SLURM_QOS Same as --qos
2458
2459 SLURM_REMOTE_CWD Same as -D, --chdir=
2460
2461 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2462 maximum count of switches desired for the job
2463 allocation and optionally the maximum time to
2464 wait for that number of switches. See --switches
2465
2466 SLURM_RESERVATION Same as --reservation
2467
2468 SLURM_RESTART_DIR Same as --restart-dir
2469
2470 SLURM_RESV_PORTS Same as --resv-ports
2471
2472 SLURM_SIGNAL Same as --signal
2473
2474 SLURM_STDERRMODE Same as -e, --error
2475
2476 SLURM_STDINMODE Same as -i, --input
2477
2478 SLURM_SPREAD_JOB Same as --spread-job
2479
2480 SLURM_SRUN_REDUCE_TASK_EXIT_MSG
2481 if set and non-zero, successive task exit mes‐
2482 sages with the same exit code will be printed
2483 only once.
2484
2485 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2486 job allocations). Also see SLURM_GRES
2487
2488 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2489 If set, only the specified node will log when the
2490 job or step are killed by a signal.
2491
2492 SLURM_STDOUTMODE Same as -o, --output
2493
2494 SLURM_TASK_EPILOG Same as --task-epilog
2495
2496 SLURM_TASK_PROLOG Same as --task-prolog
2497
2498 SLURM_TEST_EXEC If defined, srun will verify existence of the
2499 executable program along with user execute per‐
2500 mission on the node where srun was called before
2501 attempting to launch it on nodes in the step.
2502
2503 SLURM_THREAD_SPEC Same as --thread-spec
2504
2505 SLURM_THREADS Same as -T, --threads
2506
2507 SLURM_TIMELIMIT Same as -t, --time
2508
2509 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2510
2511 SLURM_USE_MIN_NODES Same as --use-min-nodes
2512
2513 SLURM_WAIT Same as -W, --wait
2514
2515 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2516 --switches
2517
2518 SLURM_WCKEY Same as --wckey
2519
2520 SLURM_WORKING_DIR -D, --chdir
2521
2522
2523
2525 srun will set some environment variables in the environment of the exe‐
2526 cuting tasks on the remote compute nodes. These environment variables
2527 are:
2528
2529
2530 SLURM_*_PACK_GROUP_# For a heterogenous job allocation, the environ‐
2531 ment variables are set separately for each compo‐
2532 nent.
2533
2534 SLURM_CHECKPOINT_IMAGE_DIR
2535 Directory into which checkpoint images should be
2536 written if specified on the execute line.
2537
2538 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2539 ing.
2540
2541 SLURM_CPU_BIND_VERBOSE
2542 --cpu-bind verbosity (quiet,verbose).
2543
2544 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2545
2546 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2547 IDs or masks for this node, CPU_ID = Board_ID x
2548 threads_per_board + Socket_ID x
2549 threads_per_socket + Core_ID x threads_per_core +
2550 Thread_ID).
2551
2552
2553 SLURM_CPU_FREQ_REQ Contains the value requested for cpu frequency on
2554 the srun command as a numerical frequency in
2555 kilohertz, or a coded value for a request of low,
2556 medium, highm1 or high for the frequency. See the
2557 description of the --cpu-freq option or the
2558 SLURM_CPU_FREQ_REQ input environment variable.
2559
2560 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2561 node. Note the select/linear plugin allocates
2562 entire nodes to jobs, so the value indicates the
2563 total count of CPUs on the node. For the
2564 select/cons_res plugin, this number indicates the
2565 number of cores on this node allocated to the
2566 job.
2567
2568 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2569 the --cpus-per-task option is specified.
2570
2571 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2572 distribution with -m, --distribution.
2573
2574 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2575 gin and comma separated.
2576
2577 SLURM_JOB_ACCOUNT Account name associated with the job allocation.
2578
2579 SLURM_JOB_CPUS_PER_NODE
2580 Number of CPUS per node.
2581
2582 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2583
2584 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2585 Job id of the executing job.
2586
2587
2588 SLURM_JOB_NAME Set to the value of the --job-name option or the
2589 command name when srun is used to create a new
2590 job allocation. Not set when srun is used only to
2591 create a job step (i.e. within an existing job
2592 allocation).
2593
2594
2595 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2596 ning.
2597
2598
2599 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2600
2601 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2602 tion, if any.
2603
2604
2605 SLURM_LAUNCH_NODE_IPADDR
2606 IP address of the node from which the task launch
2607 was initiated (where the srun command ran from).
2608
2609 SLURM_LOCALID Node local task ID for the process within a job.
2610
2611
2612 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2613 masks for this node>).
2614
2615 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2616
2617 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2618 nodes).
2619
2620 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2621
2622 SLURM_MEM_BIND_VERBOSE
2623 --mem-bind verbosity (quiet,verbose).
2624
2625 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2626 cation.
2627
2628 SLURM_NODE_ALIASES Sets of node name, communication address and
2629 hostname for nodes allocated to the job from the
2630 cloud. Each element in the set is colon separated
2631 and each set is comma separated. For example:
2632 SLURM_NODE_ALIASES=
2633 ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2634
2635 SLURM_NODEID The relative node ID of the current node.
2636
2637 SLURM_JOB_NODELIST List of nodes allocated to the job.
2638
2639 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2640 Total number of processes in the current job.
2641
2642 SLURM_PACK_SIZE Set to the count of components in a heterogeneous job.
2643
2644 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2645 of job submission. This value is propagated to
2646 the spawned processes.
2647
2648 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2649 rent process.
2650
2651 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2652
2653 SLURM_SRUN_COMM_PORT srun communication port.
2654
2655 SLURM_STEP_LAUNCHER_PORT
2656 Step launcher port.
2657
2658 SLURM_STEP_NODELIST List of nodes allocated to the step.
2659
2660 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2661
2662 SLURM_STEP_NUM_TASKS Number of processes in the step.
2663
2664 SLURM_STEP_TASKS_PER_NODE
2665 Number of processes per node within the step.
2666
2667 SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2668 The step ID of the current job.
2669
2670 SLURM_SUBMIT_DIR The directory from which srun was invoked.
2671
2672 SLURM_SUBMIT_HOST The hostname of the computer from which srun
2673 was invoked.
2674
2675 SLURM_TASK_PID The process ID of the task being started.
2676
2677 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2678 Values are comma separated and in the same order
2679 as SLURM_JOB_NODELIST. If two or more consecu‐
2680 tive nodes are to have the same task count, that
2681 count is followed by "(x#)" where "#" is the rep‐
2682 etition count. For example,
2683 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2684 first three nodes will each execute two tasks
2685 and the fourth node will execute one task.
2686
2687
2688 SLURM_TOPOLOGY_ADDR This is set only if the system has the topol‐
2689 ogy/tree plugin configured. The value will be
2690 set to the names of the network switches which may be
2691 involved in the job's communications from the
2692 system's top level switch down to the leaf switch
2693 and ending with node name. A period is used to
2694 separate each hardware component name.
2695
2696 SLURM_TOPOLOGY_ADDR_PATTERN
2697 This is set only if the system has the topol‐
2698 ogy/tree plugin configured. The value will be
2699 set to the component types listed in SLURM_TOPOL‐
2700 OGY_ADDR. Each component will be identified as
2701 either "switch" or "node". A period is used to
2702 separate each hardware component type.
2703
2704 SLURM_UMASK The umask in effect when the job was submitted.
2705
2706 SLURMD_NODENAME Name of the node running the task. In the case of
2707 a parallel job executing on multiple compute
2708 nodes, the various tasks will have this environ‐
2709 ment variable set to different values on each
2710 compute node.
2711
2712 SRUN_DEBUG Set to the logging level of the srun command.
2713 Default value is 3 (info level). The value is
2714 incremented or decremented based upon the --ver‐
2715 bose and --quiet options.
2716
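The "(x#)" repetition syntax used by SLURM_TASKS_PER_NODE can be expanded
inside a job script when a plain per-node list is needed. The following is a
minimal bash sketch, not part of Slurm itself; the function name is
illustrative:

expand_tasks_per_node() {
    # Expand the "(x#)" repetition syntax, e.g. "2(x3),1" -> "2 2 2 1"
    local item i out=""
    local IFS=','
    for item in $1; do
        if [[ $item =~ ^([0-9]+)\(x([0-9]+)\)$ ]]; then
            for ((i = 0; i < BASH_REMATCH[2]; i++)); do
                out="$out ${BASH_REMATCH[1]}"
            done
        else
            out="$out $item"
        fi
    done
    echo "${out# }"
}

> expand_tasks_per_node "2(x3),1"
2 2 2 1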
2717
SIGNALS AND ESCAPE SEQUENCES
2719 Signals sent to the srun command are automatically forwarded to the
2720 tasks it is controlling with a few exceptions. The escape sequence
2721 <control-c> will report the state of all tasks associated with the srun
2722 command. If <control-c> is entered twice within one second, then the
2723 associated SIGINT signal will be sent to all tasks and a termination
2724 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2725 spawned tasks. If a third <control-c> is received, the srun program
2726 will be terminated without waiting for remote tasks to exit or their
2727 I/O to complete.
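
Because SIGTERM is delivered before SIGKILL in this termination sequence, a
task can use it to clean up before being killed. A minimal sketch (task.sh
and cleanup.log are illustrative names):

> cat task.sh
#!/bin/sh
# Flush partial results when the termination sequence delivers SIGTERM,
# before the follow-up SIGKILL arrives.
cleanup() {
    echo "rank $SLURM_PROCID cleaning up" >> cleanup.log
    exit 1
}
trap cleanup TERM
sleep 300 &
wait $!

> srun -n4 task.sh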
2728
2729 The escape sequence <control-z> is presently ignored. Our intent is for
2730 this to put the srun command into a mode where various special actions may
2731 be invoked.
2732
2733
MPI SUPPORT
2735 MPI use depends upon the type of MPI being used. There are three fun‐
2736 damentally different modes of operation used by these various MPI
2737 implementations.
2738
2739 1. Slurm directly launches the tasks and performs initialization of
2740 communications through the PMI2 or PMIx APIs. For example: "srun -n16
2741 a.out".
2742
2743 2. Slurm creates a resource allocation for the job and then mpirun
2744 launches tasks using Slurm's infrastructure (OpenMPI).
2745
2746 3. Slurm creates a resource allocation for the job and then mpirun
2747 launches tasks using some mechanism other than Slurm, such as SSH or
2748 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
2749 trol. Slurm's epilog should be configured to purge these tasks when the
2750 job's allocation is relinquished; the use of pam_slurm_adopt is also
2751 strongly recommended.
2752
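In sketch form, the first two modes map onto command lines such as the
following (which --mpi plugins are available depends on the local build and
configuration):

# Mode 1: srun launches the tasks and initializes communications
# through PMI2 (or PMIx with --mpi=pmix).
srun --mpi=pmi2 -n16 a.out

# Mode 2: Slurm provides only the allocation; mpirun launches the
# tasks using Slurm's infrastructure.
salloc -n16 mpirun a.out

# Mode 3 looks like mode 2 on the command line, but mpirun starts the
# tasks over ssh/rsh, outside of Slurm's monitoring and control.
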
2753 See https://slurm.schedmd.com/mpi_guide.html for more information on
2754 the use of these various MPI implementations with Slurm.
2755
2756
MULTIPLE PROGRAM CONFIGURATION
2758 Comments in the configuration file must have a "#" in column one. The
2759 configuration file contains the following fields separated by white
2760 space:
2761
2762 Task rank
2763 One or more task ranks to use this configuration. Multiple val‐
2764 ues may be comma separated. Ranges may be indicated with two
2765 numbers separated with a '-' with the smaller number first (e.g.
2766 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
2767 ified, specify a rank of '*' as the last line of the file (shown in the second sketch below). If
2768 an attempt is made to initiate a task for which no executable
2769 program is defined, the following error message will be produced
2770 "No executable program specified for this task".
2771
2772 Executable
2773 The name of the program to execute. May be a fully qualified
2774 pathname if desired.
2775
2776 Arguments
2777 Program arguments. The expression "%t" will be replaced with
2778 the task's number. The expression "%o" will be replaced with
2779 the task's offset within this range (e.g. a configured task rank
2780 value of "1-5" would have offset values of "0-4"). Single
2781 quotes may be used to avoid having the enclosed values inter‐
2782 preted. This field is optional. Any arguments for the program
2783 entered on the command line will be added to the arguments spec‐
2784 ified in the configuration file.
2785
2786 For example:
2787 ###################################################################
2788 # srun multiple program configuration file
2789 #
2790 # srun -n8 -l --multi-prog silly.conf
2791 ###################################################################
2792 4-6 hostname
2793 1,7 echo task:%t
2794 0,2-3 echo offset:%o
2795
2796 > srun -n8 -l --multi-prog silly.conf
2797 0: offset:0
2798 1: task:1
2799 2: offset:1
2800 3: offset:2
2801 4: linux15.llnl.gov
2802 5: linux16.llnl.gov
2803 6: linux17.llnl.gov
2804 7: task:7
2805
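The '*' fallback rank described above gives any task that is not listed
explicitly a default program. A sketch of such a file (default.conf is an
illustrative name):

###################################################################
# srun multiple program configuration file with a default entry
#
# srun -n8 -l --multi-prog default.conf
###################################################################
0       echo master rank:%t
1-3     echo worker rank:%t offset:%o
*       hostname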
2806
2807
2808
EXAMPLES
2810 This simple example demonstrates the execution of the command hostname
2811 in eight tasks. At least eight processors will be allocated to the job
2812 (the same as the task count) on however many nodes are required to sat‐
2813 isfy the request. The output of each task will be preceded by its
2814 task number. (The machine "dev" in the example below has a total of
2815 two CPUs per node)
2816
2817
2818 > srun -n8 -l hostname
2819 0: dev0
2820 1: dev0
2821 2: dev1
2822 3: dev1
2823 4: dev2
2824 5: dev2
2825 6: dev3
2826 7: dev3
2827
2828
2829 The srun -r option is used within a job script to run two job steps on
2830 disjoint nodes in the following example. The script is run using allo‐
2831 cate mode instead of as a batch job in this case.
2832
2833
2834 > cat test.sh
2835 #!/bin/sh
2836 echo $SLURM_JOB_NODELIST
2837 srun -lN2 -r2 hostname
2838 srun -lN2 hostname
2839
2840 > salloc -N4 test.sh
2841 dev[7-10]
2842 0: dev9
2843 1: dev10
2844 0: dev7
2845 1: dev8
2846
2847
2848 The following script runs two job steps in parallel within an allocated
2849 set of nodes.
2850
2851
2852 > cat test.sh
2853 #!/bin/bash
2854 srun -lN2 -n4 -r 2 sleep 60 &
2855 srun -lN2 -r 0 sleep 60 &
2856 sleep 1
2857 squeue
2858 squeue -s
2859 wait
2860
2861 > salloc -N4 test.sh
2862 JOBID PARTITION NAME USER ST TIME NODES NODELIST
2863 65641 batch test.sh grondo R 0:01 4 dev[7-10]
2864
2865 STEPID PARTITION USER TIME NODELIST
2866 65641.0 batch grondo 0:01 dev[7-8]
2867 65641.1 batch grondo 0:01 dev[9-10]
2868
2869
2870 This example demonstrates how one executes a simple MPI job. We use
2871 srun to build a list of machines (nodes) to be used by mpirun in its
2872 required format. A sample command line and the script to be executed
2873 follow.
2874
2875
2876 > cat test.sh
2877 #!/bin/sh
2878 MACHINEFILE="nodes.$SLURM_JOB_ID"
2879
2880 # Generate Machinefile for mpi such that hosts are in the same
2881 # order as if run via srun
2882 #
2883 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
2884
2885 # Run using generated Machine file:
2886 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
2887
2888 rm $MACHINEFILE
2889
2890 > salloc -N2 -n4 test.sh
2891
2892
2893 This simple example demonstrates the execution of different commands on
2894 different nodes in the same srun. You can do this for any number of
2895 nodes or any number of commands. The executable run on each node is
2896 selected by the SLURM_NODEID environment variable, which ranges from 0
2897 to one less than the number of nodes specified on the srun command line.
2898
2899
2900 > cat test.sh
2901 case $SLURM_NODEID in
2902 0) echo "I am running on "
2903 hostname ;;
2904 1) hostname
2905 echo "is where I am running" ;;
2906 esac
2907
2908 > srun -N2 test.sh
2909 dev0
2910 is where I am running
2911 I am running on
2912 dev1
2913
2914
2915 This example demonstrates use of multi-core options to control layout
2916 of tasks. We request that four sockets per node and two cores per
2917 socket be dedicated to the job.
2918
2919
2920 > srun -N2 -B 4-4:2-2 a.out
2921
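The same layout can also be requested with the long-form options, which may
be easier to read than the -B syntax:

> srun -N2 --sockets-per-node=4 --cores-per-socket=2 a.out
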
2922 This example shows a script in which Slurm is used to provide resource
2923 management for a job by executing the various job steps as processors
2924 become available for their dedicated use.
2925
2926
2927 > cat my.script
2928 #!/bin/bash
2929 srun --exclusive -n4 prog1 &
2930 srun --exclusive -n3 prog2 &
2931 srun --exclusive -n1 prog3 &
2932 srun --exclusive -n1 prog4 &
2933 wait
2934
2935
2936 This example shows how to launch an application called "master" with
2937 one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another
2938 application called "slave" with 16 tasks, 1 CPU per task (the default)
2939 and 1 GB of memory per task.
2940
2941
2942 > srun -n1 -c16 --mem-per-cpu=1gb master : -n16 --mem-per-cpu=1gb slave
2943
2944
COPYING
2946 Copyright (C) 2006-2007 The Regents of the University of California.
2947 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2948 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2949 Copyright (C) 2010-2015 SchedMD LLC.
2950
2951 This file is part of Slurm, a resource management program. For
2952 details, see <https://slurm.schedmd.com/>.
2953
2954 Slurm is free software; you can redistribute it and/or modify it under
2955 the terms of the GNU General Public License as published by the Free
2956 Software Foundation; either version 2 of the License, or (at your
2957 option) any later version.
2958
2959 Slurm is distributed in the hope that it will be useful, but WITHOUT
2960 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2961 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2962 for more details.
2963
2964
SEE ALSO
2966 salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
2967 squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
2968
2969
2970
2971November 2018 Slurm Commands srun(1)