srun(1)                         Slurm Commands                         srun(1)
2
3
4
NAME
srun - Run parallel jobs
7
8
SYNOPSIS
srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]]
executable(N) [args(N)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
Run a parallel job on a cluster managed by Slurm. If necessary, srun
will first create a resource allocation in which to run the parallel
job.
22
23 The following document describes the influence of various options on
24 the allocation of cpus to jobs and tasks.
25 https://slurm.schedmd.com/cpu_management.html
26
27
RETURN VALUE
srun will return the highest exit code of all tasks run or the highest
signal (with the high-order bit set in an 8-bit integer -- e.g. 128 +
signal) of any task that exited with a signal.
The value 253 is reserved for out-of-memory errors.
33
34
EXECUTABLE PATH RESOLUTION
The executable is resolved in the following order:
37
38 1. If executable starts with ".", then path is constructed as: current
39 working directory / executable
40 2. If executable starts with a "/", then path is considered absolute.
41 3. If executable can be resolved through PATH. See path_resolution(7).
42 4. If executable is in current working directory.
43
44 Current working directory is the calling process working directory
45 unless the --chdir argument is passed, which will override the current
46 working directory.
47
48
OPTIONS
--accel-bind=<options>
51 Control how tasks are bound to generic resources of type gpu,
52 mic and nic. Multiple options may be specified. Supported
53 options include:
54
55 g Bind each task to GPUs which are closest to the allocated
56 CPUs.
57
58 m Bind each task to MICs which are closest to the allocated
59 CPUs.
60
61 n Bind each task to NICs which are closest to the allocated
62 CPUs.
63
64 v Verbose mode. Log how tasks are bound to GPU and NIC
65 devices.
66
67 This option applies to job allocations.
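
For example (a sketch assuming a GRES of type gpu is configured and
./my_app is a placeholder application), the following binds each task to
the GPUs closest to its allocated CPUs:

       srun -n4 --gres=gpu:4 --accel-bind=g ./my_app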
68
69
70 -A, --account=<account>
71 Charge resources used by this job to specified account. The
72 account is an arbitrary string. The account name may be changed
73 after job submission using the scontrol command. This option
74 applies to job allocations.
75
76
77 --acctg-freq
78 Define the job accounting and profiling sampling intervals.
79 This can be used to override the JobAcctGatherFrequency parame‐
80 ter in Slurm's configuration file, slurm.conf. The supported
format is as follows:
82
83 --acctg-freq=<datatype>=<interval>
84 where <datatype>=<interval> specifies the task sam‐
85 pling interval for the jobacct_gather plugin or a
86 sampling interval for a profiling type by the
87 acct_gather_profile plugin. Multiple, comma-sepa‐
88 rated <datatype>=<interval> intervals may be speci‐
89 fied. Supported datatypes are as follows:
90
91 task=<interval>
92 where <interval> is the task sampling inter‐
93 val in seconds for the jobacct_gather plugins
94 and for task profiling by the
95 acct_gather_profile plugin. NOTE: This fre‐
96 quency is used to monitor memory usage. If
97 memory limits are enforced the highest fre‐
98 quency a user can request is what is config‐
99 ured in the slurm.conf file. They can not
100 turn it off (=0) either.
101
102 energy=<interval>
103 where <interval> is the sampling interval in
104 seconds for energy profiling using the
105 acct_gather_energy plugin
106
107 network=<interval>
108 where <interval> is the sampling interval in
109 seconds for infiniband profiling using the
110 acct_gather_interconnect plugin.
111
112 filesystem=<interval>
113 where <interval> is the sampling interval in
114 seconds for filesystem profiling using the
115 acct_gather_filesystem plugin.
116
The default value for the task sampling interval is 30 seconds.
The default value for all other intervals is 0. An interval of 0
disables sampling of the specified type. If the task sampling
interval is 0, accounting information is collected only at job
termination (reducing Slurm interference with the job).
Smaller (non-zero) values have a greater impact upon job
performance, but a value of 30 seconds is not likely to be
noticeable for applications having less than 10,000 tasks. This
option applies to job allocations.
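
For example (assuming ./my_app is a placeholder application), the
following samples task accounting every 15 seconds and energy data every
60 seconds:

       srun -n8 --acctg-freq=task=15,energy=60 ./my_app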
128
129
-B, --extra-node-info=<sockets[:cores[:threads]]>
131 Restrict node selection to nodes with at least the specified
132 number of sockets, cores per socket and/or threads per core.
133 NOTE: These options do not specify the resource allocation size.
134 Each value specified is considered a minimum. An asterisk (*)
135 can be used as a placeholder indicating that all available
136 resources of that type are to be utilized. Values can also be
137 specified as min-max. The individual levels can also be speci‐
138 fied in separate options if desired:
139 --sockets-per-node=<sockets>
140 --cores-per-socket=<cores>
141 --threads-per-core=<threads>
142 If task/affinity plugin is enabled, then specifying an alloca‐
143 tion in this manner also sets a default --cpu-bind option of
144 threads if the -B option specifies a thread count, otherwise an
145 option of cores if a core count is specified, otherwise an
146 option of sockets. If SelectType is configured to
147 select/cons_res, it must have a parameter of CR_Core,
148 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
149 to be honored. If not specified, the scontrol show job will
150 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
151 tions. NOTE: This option is mutually exclusive with --hint,
152 --threads-per-core and --ntasks-per-core.
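
For example (assuming ./my_app is a placeholder application), either of
the following restricts selection to nodes with at least two sockets and
four cores per socket:

       srun -N2 -B 2:4 ./my_app
       srun -N2 --sockets-per-node=2 --cores-per-socket=4 ./my_app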
153
154
155 --bb=<spec>
156 Burst buffer specification. The form of the specification is
157 system dependent. Also see --bbf. This option applies to job
158 allocations.
159
160
161 --bbf=<file_name>
162 Path of file containing burst buffer specification. The form of
163 the specification is system dependent. Also see --bb. This
164 option applies to job allocations.
165
166
167 --bcast[=<dest_path>]
168 Copy executable file to allocated compute nodes. If a file name
169 is specified, copy the executable to the specified destination
170 file path. If the path specified ends with '/' it is treated as
171 a target directory, and the destination file name will be
172 slurm_bcast_<job_id>.<step_id>_<nodename>. If no dest_path is
173 specified, then the current working directory is used, and the
174 filename follows the above pattern. For example, "srun
175 --bcast=/tmp/mine -N3 a.out" will copy the file "a.out" from
176 your current directory to the file "/tmp/mine" on each of the
177 three allocated compute nodes and execute that file. This option
178 applies to step allocations.
179
180
181 -b, --begin=<time>
182 Defer initiation of this job until the specified time. It
183 accepts times of the form HH:MM:SS to run a job at a specific
184 time of day (seconds are optional). (If that time is already
185 past, the next day is assumed.) You may also specify midnight,
186 noon, fika (3 PM) or teatime (4 PM) and you can have a
187 time-of-day suffixed with AM or PM for running in the morning or
188 the evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
190 Combine date and time using the following format
191 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
192 count time-units, where the time-units can be seconds (default),
193 minutes, hours, days, or weeks and you can tell Slurm to run the
194 job today with the keyword today and to run the job tomorrow
195 with the keyword tomorrow. The value may be changed after job
196 submission using the scontrol command. For example:
197 --begin=16:00
198 --begin=now+1hour
199 --begin=now+60 (seconds by default)
200 --begin=2010-01-20T12:34:00
201
202
203 Notes on date/time specifications:
204 - Although the 'seconds' field of the HH:MM:SS time specifica‐
205 tion is allowed by the code, note that the poll time of the
206 Slurm scheduler is not precise enough to guarantee dispatch of
207 the job on the exact second. The job will be eligible to start
208 on the next poll following the specified time. The exact poll
209 interval depends on the Slurm scheduler (e.g., 60 seconds with
210 the default sched/builtin).
211 - If no time (HH:MM:SS) is specified, the default is
212 (00:00:00).
213 - If a date is specified without a year (e.g., MM/DD) then the
214 current year is assumed, unless the combination of MM/DD and
215 HH:MM:SS has already passed for that year, in which case the
216 next year is used.
217 This option applies to job allocations.
218
219
220 --cluster-constraint=<list>
221 Specifies features that a federated cluster must have to have a
222 sibling job submitted to it. Slurm will attempt to submit a sib‐
223 ling job to a cluster if it has at least one of the specified
224 features.
225
226
227 --comment=<string>
228 An arbitrary comment. This option applies to job allocations.
229
230
231 --compress[=type]
232 Compress file before sending it to compute hosts. The optional
233 argument specifies the data compression library to be used.
234 Supported values are "lz4" (default) and "zlib". Some compres‐
235 sion libraries may be unavailable on some systems. For use with
236 the --bcast option. This option applies to step allocations.
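
For example (assuming ./my_app is a placeholder application), the
executable can be compressed with lz4 while being broadcast to the
allocated nodes:

       srun -N4 --bcast=/tmp/my_app --compress=lz4 ./my_app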
237
238
239 -C, --constraint=<list>
240 Nodes can have features assigned to them by the Slurm adminis‐
241 trator. Users can specify which of these features are required
242 by their job using the constraint option. Only nodes having
243 features matching the job constraints will be used to satisfy
244 the request. Multiple constraints may be specified with AND,
245 OR, matching OR, resource counts, etc. (some operators are not
246 supported on all system types). Supported constraint options
247 include:
248
249 Single Name
250 Only nodes which have the specified feature will be used.
251 For example, --constraint="intel"
252
253 Node Count
254 A request can specify the number of nodes needed with
255 some feature by appending an asterisk and count after the
256 feature name. For example, --nodes=16 --con‐
257 straint="graphics*4 ..." indicates that the job requires
258 16 nodes and that at least four of those nodes must have
259 the feature "graphics."
260
AND    Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"
264
OR     Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR opera‐
tor. For example, --constraint="intel|amd"
268
269 Matching OR
270 If only one of a set of possible options should be used
271 for all allocated nodes, then use the OR operator and
272 enclose the options within square brackets. For example,
273 --constraint="[rack1|rack2|rack3|rack4]" might be used to
274 specify that all nodes must be allocated on a single rack
275 of the cluster, but any of those four racks can be used.
276
277 Multiple Counts
278 Specific counts of multiple resources may be specified by
279 using the AND operator and enclosing the options within
280 square brackets. For example, --con‐
281 straint="[rack1*2&rack2*4]" might be used to specify that
282 two nodes must be allocated from nodes with the feature
283 of "rack1" and four nodes must be allocated from nodes
284 with the feature "rack2".
285
286 NOTE: This construct does not support multiple Intel KNL
287 NUMA or MCDRAM modes. For example, while --con‐
288 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
289 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
290 Specification of multiple KNL modes requires the use of a
291 heterogeneous job.
292
293 Brackets
294 Brackets can be used to indicate that you are looking for
295 a set of nodes with the different requirements contained
296 within the brackets. For example, --con‐
297 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
298 node with either the "rack1" or "rack2" features and two
299 nodes with the "rack3" feature. The same request without
300 the brackets will try to find a single node that meets
301 those requirements.
302
Parentheses
Parentheses can be used to group like node features
305 together. For example, --con‐
306 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
307 specify that four nodes with the features "knl", "snc4"
308 and "flat" plus one node with the feature "haswell" are
309 required. All options within parenthesis should be
310 grouped with AND (e.g. "&") operands.
311
312 WARNING: When srun is executed from within salloc or sbatch, the
313 constraint value can only contain a single feature name. None of
314 the other operators are currently supported for job steps.
315 This option applies to job and step allocations.
316
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering. This option applies to job alloca‐
324 tions.
325
326
327 --cores-per-socket=<cores>
328 Restrict node selection to nodes with at least the specified
329 number of cores per socket. See additional information under -B
330 option above when task/affinity plugin is enabled. This option
331 applies to job allocations.
332
333
334 --cpu-bind=[{quiet,verbose},]type
335 Bind tasks to CPUs. Used only when the task/affinity or
336 task/cgroup plugin is enabled. NOTE: To have Slurm always
337 report on the selected CPU binding for all commands executed in
338 a shell, you can enable verbose mode by setting the
339 SLURM_CPU_BIND environment variable value to "verbose".
340
341 The following informational environment variables are set when
342 --cpu-bind is in use:
343 SLURM_CPU_BIND_VERBOSE
344 SLURM_CPU_BIND_TYPE
345 SLURM_CPU_BIND_LIST
346
347 See the ENVIRONMENT VARIABLES section for a more detailed
348 description of the individual SLURM_CPU_BIND variables. These
variables are available only if the task/affinity plugin is
configured.
351
352 When using --cpus-per-task to run multithreaded tasks, be aware
353 that CPU binding is inherited from the parent of the process.
354 This means that the multithreaded task should either specify or
355 clear the CPU binding itself to avoid having all threads of the
356 multithreaded task use the same mask/CPU as the parent. Alter‐
357 natively, fat masks (masks which specify more than one allowed
358 CPU) could be used for the tasks in order to provide multiple
359 CPUs for the multithreaded tasks.
360
361 Note that a job step can be allocated different numbers of CPUs
362 on each node or be allocated CPUs not starting at location zero.
363 Therefore one of the options which automatically generate the
364 task binding is recommended. Explicitly specified masks or
365 bindings are only honored when the job step has been allocated
366 every available CPU on the node.
367
368 Binding a task to a NUMA locality domain means to bind the task
369 to the set of CPUs that belong to the NUMA locality domain or
370 "NUMA node". If NUMA locality domain options are used on sys‐
371 tems with no NUMA support, then each socket is considered a
372 locality domain.
373
374 If the --cpu-bind option is not used, the default binding mode
375 will depend upon Slurm's configuration and the step's resource
376 allocation. If all allocated nodes have the same configured
377 CpuBind mode, that will be used. Otherwise if the job's Parti‐
378 tion has a configured CpuBind mode, that will be used. Other‐
379 wise if Slurm has a configured TaskPluginParam value, that mode
380 will be used. Otherwise automatic binding will be performed as
381 described below.
382
383
384 Auto Binding
385 Applies only when task/affinity is enabled. If the job
386 step allocation includes an allocation with a number of
387 sockets, cores, or threads equal to the number of tasks
388 times cpus-per-task, then the tasks will by default be
389 bound to the appropriate resources (auto binding). Dis‐
390 able this mode of operation by explicitly setting
391 "--cpu-bind=none". Use TaskPluginParam=auto‐
392 bind=[threads|cores|sockets] to set a default cpu binding
393 in case "auto binding" doesn't find a match.
394
395 Supported options include:
396
397 q[uiet]
398 Quietly bind before task runs (default)
399
400 v[erbose]
401 Verbosely report binding before task runs
402
403 no[ne] Do not bind tasks to CPUs (default unless auto
404 binding is applied)
405
406 rank Automatically bind by task rank. The lowest num‐
407 bered task on each node is bound to socket (or
408 core or thread) zero, etc. Not supported unless
409 the entire node is allocated to the job.
410
411 map_cpu:<list>
412 Bind by setting CPU masks on tasks (or ranks) as
413 specified where <list> is
414 <cpu_id_for_task_0>,<cpu_id_for_task_1>,... CPU
415 IDs are interpreted as decimal values unless they
are preceded with '0x' in which case they are
interpreted as hexadecimal values. If the number of
418 tasks (or ranks) exceeds the number of elements in
419 this list, elements in the list will be reused as
420 needed starting from the beginning of the list.
421 To simplify support for large task counts, the
422 lists may follow a map with an asterisk and repe‐
423 tition count. For example
424 "map_cpu:0x0f*4,0xf0*4". Not supported unless the
425 entire node is allocated to the job.
426
427 mask_cpu:<list>
428 Bind by setting CPU masks on tasks (or ranks) as
429 specified where <list> is
430 <cpu_mask_for_task_0>,<cpu_mask_for_task_1>,...
431 The mapping is specified for a node and identical
432 mapping is applied to the tasks on every node
433 (i.e. the lowest task ID on each node is mapped to
434 the first mask specified in the list, etc.). CPU
435 masks are always interpreted as hexadecimal values
436 but can be preceded with an optional '0x'. If the
437 number of tasks (or ranks) exceeds the number of
438 elements in this list, elements in the list will
439 be reused as needed starting from the beginning of
440 the list. To simplify support for large task
441 counts, the lists may follow a map with an aster‐
442 isk and repetition count. For example
443 "mask_cpu:0x0f*4,0xf0*4". Not supported unless
444 the entire node is allocated to the job.
445
446 rank_ldom
447 Bind to a NUMA locality domain by rank. Not sup‐
448 ported unless the entire node is allocated to the
449 job.
450
451 map_ldom:<list>
452 Bind by mapping NUMA locality domain IDs to tasks
453 as specified where <list> is
454 <ldom1>,<ldom2>,...<ldomN>. The locality domain
455 IDs are interpreted as decimal values unless they
456 are preceded with '0x' in which case they are
457 interpreted as hexadecimal values. Not supported
458 unless the entire node is allocated to the job.
459
460 mask_ldom:<list>
461 Bind by setting NUMA locality domain masks on
462 tasks as specified where <list> is
463 <mask1>,<mask2>,...<maskN>. NUMA locality domain
464 masks are always interpreted as hexadecimal values
465 but can be preceded with an optional '0x'. Not
466 supported unless the entire node is allocated to
467 the job.
468
469 sockets
470 Automatically generate masks binding tasks to
471 sockets. Only the CPUs on the socket which have
472 been allocated to the job will be used. If the
473 number of tasks differs from the number of allo‐
474 cated sockets this can result in sub-optimal bind‐
475 ing.
476
477 cores Automatically generate masks binding tasks to
478 cores. If the number of tasks differs from the
479 number of allocated cores this can result in
480 sub-optimal binding.
481
482 threads
483 Automatically generate masks binding tasks to
484 threads. If the number of tasks differs from the
485 number of allocated threads this can result in
486 sub-optimal binding.
487
488 ldoms Automatically generate masks binding tasks to NUMA
489 locality domains. If the number of tasks differs
490 from the number of allocated locality domains this
491 can result in sub-optimal binding.
492
493 boards Automatically generate masks binding tasks to
494 boards. If the number of tasks differs from the
495 number of allocated boards this can result in
496 sub-optimal binding. This option is supported by
497 the task/cgroup plugin only.
498
499 help Show help message for cpu-bind
500
501 This option applies to job and step allocations.
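
For example (assuming ./my_app is a placeholder application), the first
command binds one task per core and reports the binding, while the second
applies explicit per-node CPU masks:

       srun -n8 --cpu-bind=verbose,cores ./my_app
       srun -n2 --cpu-bind=mask_cpu:0x0f,0xf0 ./my_app

As noted above, the explicit mask form is only honored when the step has
been allocated every available CPU on the node.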
502
503
--cpu-freq=<p1[-p2[:p3]]>
505
506 Request that the job step initiated by this srun command be run
507 at some requested frequency if possible, on the CPUs selected
508 for the step on the compute node(s).
509
510 p1 can be [#### | low | medium | high | highm1] which will set
511 the frequency scaling_speed to the corresponding value, and set
512 the frequency scaling_governor to UserSpace. See below for defi‐
513 nition of the values.
514
515 p1 can be [Conservative | OnDemand | Performance | PowerSave]
516 which will set the scaling_governor to the corresponding value.
517 The governor has to be in the list set by the slurm.conf option
518 CpuFreqGovernors.
519
520 When p2 is present, p1 will be the minimum scaling frequency and
521 p2 will be the maximum scaling frequency.
522
p2 can be [#### | medium | high | highm1]. p2 must be greater
524 than p1.
525
526 p3 can be [Conservative | OnDemand | Performance | PowerSave |
527 UserSpace] which will set the governor to the corresponding
528 value.
529
530 If p3 is UserSpace, the frequency scaling_speed will be set by a
531 power or energy aware scheduling strategy to a value between p1
532 and p2 that lets the job run within the site's power goal. The
533 job may be delayed if p1 is higher than a frequency that allows
534 the job to run within the goal.
535
536 If the current frequency is < min, it will be set to min. Like‐
537 wise, if the current frequency is > max, it will be set to max.
538
539 Acceptable values at present include:
540
541 #### frequency in kilohertz
542
543 Low the lowest available frequency
544
545 High the highest available frequency
546
547 HighM1 (high minus one) will select the next highest
548 available frequency
549
550 Medium attempts to set a frequency in the middle of the
551 available range
552
553 Conservative attempts to use the Conservative CPU governor
554
555 OnDemand attempts to use the OnDemand CPU governor (the
556 default value)
557
558 Performance attempts to use the Performance CPU governor
559
560 PowerSave attempts to use the PowerSave CPU governor
561
562 UserSpace attempts to use the UserSpace CPU governor
563
564
The following informational environment variable is set in the
job step when the --cpu-freq option is requested.
568 SLURM_CPU_FREQ_REQ
569
570 This environment variable can also be used to supply the value
571 for the CPU frequency request if it is set when the 'srun' com‐
572 mand is issued. The --cpu-freq on the command line will over‐
573 ride the environment variable value. The form on the environ‐
574 ment variable is the same as the command line. See the ENVIRON‐
575 MENT VARIABLES section for a description of the
576 SLURM_CPU_FREQ_REQ variable.
577
578 NOTE: This parameter is treated as a request, not a requirement.
579 If the job step's node does not support setting the CPU fre‐
580 quency, or the requested value is outside the bounds of the
581 legal frequencies, an error is logged, but the job step is
582 allowed to continue.
583
584 NOTE: Setting the frequency for just the CPUs of the job step
585 implies that the tasks are confined to those CPUs. If task con‐
586 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
587 gin=task/cgroup with the "ConstrainCores" option) is not config‐
588 ured, this parameter is ignored.
589
590 NOTE: When the step completes, the frequency and governor of
591 each selected CPU is reset to the previous values.
592
NOTE: Submitting jobs with the --cpu-freq option with linuxproc
as the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result not
all of the accounting information will be present.
597
598 This option applies to job and step allocations.
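
For example (assuming ./my_app is a placeholder application and the
requested governor is listed in CpuFreqGovernors), a step can request a
fixed 2.4 GHz frequency, or a 1.6-2.4 GHz range managed by the OnDemand
governor:

       srun -n16 --cpu-freq=2400000 ./my_app
       srun -n16 --cpu-freq=1600000-2400000:OnDemand ./my_app

Numeric values are in kilohertz and, as noted above, are treated as
requests rather than requirements.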
599
600
601 --cpus-per-gpu=<ncpus>
602 Advise Slurm that ensuing job steps will require ncpus proces‐
603 sors per allocated GPU. Not compatible with the --cpus-per-task
604 option.
605
606
607 -c, --cpus-per-task=<ncpus>
608 Request that ncpus be allocated per process. This may be useful
609 if the job is multithreaded and requires more than one CPU per
610 task for optimal performance. The default is one CPU per
611 process. If -c is specified without -n, as many tasks will be
612 allocated per node as possible while satisfying the -c restric‐
613 tion. For instance on a cluster with 8 CPUs per node, a job
614 request for 4 nodes and 3 CPUs per task may be allocated 3 or 6
615 CPUs per node (1 or 2 tasks per node) depending upon resource
616 consumption by other jobs. Such a job may be unable to execute
617 more than a total of 4 tasks.
618
619 WARNING: There are configurations and options interpreted dif‐
620 ferently by job and job step requests which can result in incon‐
621 sistencies for this option. For example srun -c2
622 --threads-per-core=1 prog may allocate two cores for the job,
623 but if each of those cores contains two threads, the job alloca‐
624 tion will include four CPUs. The job step allocation will then
625 launch two threads per CPU for a total of two tasks.
626
627 WARNING: When srun is executed from within salloc or sbatch,
628 there are configurations and options which can result in incon‐
629 sistent allocations when -c has a value greater than -c on sal‐
630 loc or sbatch.
631
632 This option applies to job allocations.
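
For example (assuming ./my_threaded_app is a placeholder multithreaded
application), the following launches four tasks on one node with two CPUs
available to each task:

       srun -N1 -n4 -c2 ./my_threaded_app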
633
634
635 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
637 (start > (deadline - time[-min])). Default is no deadline.
638 Valid time formats are:
639 HH:MM[:SS] [AM|PM]
640 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
641 MM/DD[/YY]-HH:MM[:SS]
642 YYYY-MM-DD[THH:MM[:SS]]]
643 now[+count[seconds(default)|minutes|hours|days|weeks]]
644
645 This option applies only to job allocations.
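
For example (assuming ./my_app is a placeholder application), a job with
a 30-minute time limit that must be able to finish within the next four
hours could be requested with:

       srun -n1 -t 30 --deadline=now+4hours ./my_app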
646
647
648 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
650 specification if the job has been eligible to run for less than
651 this time period. If the job has waited for less than the spec‐
652 ified period, it will use only nodes which already have the
653 specified features. The argument is in units of minutes. A
654 default value may be set by a system administrator using the
655 delay_boot option of the SchedulerParameters configuration
656 parameter in the slurm.conf file, otherwise the default value is
657 zero (no delay).
658
659 This option applies only to job allocations.
660
661
662 -d, --dependency=<dependency_list>
Defer the start of this job until the specified dependencies
have been satisfied. This option does not apply to job steps
(executions of srun within an existing salloc or sbatch
allocation), only to job allocations. <dependency_list> is of
667 the form <type:job_id[:job_id][,type:job_id[:job_id]]> or
668 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
669 must be satisfied if the "," separator is used. Any dependency
670 may be satisfied if the "?" separator is used. Only one separa‐
671 tor may be used. Many jobs can share the same dependency and
672 these jobs may even belong to different users. The value may
673 be changed after job submission using the scontrol command.
674 Dependencies on remote jobs are allowed in a federation. Once a
675 job dependency fails due to the termination state of a preceding
676 job, the dependent job will never be run, even if the preceding
677 job is requeued and has a different termination state in a sub‐
678 sequent execution. This option applies to job allocations.
679
680 after:job_id[[+time][:jobid[+time]...]]
681 After the specified jobs start or are cancelled and
682 'time' in minutes from job start or cancellation happens,
683 this job can begin execution. If no 'time' is given then
684 there is no delay after start or cancellation.
685
686 afterany:job_id[:jobid...]
687 This job can begin execution after the specified jobs
688 have terminated.
689
690 afterburstbuffer:job_id[:jobid...]
691 This job can begin execution after the specified jobs
692 have terminated and any associated burst buffer stage out
693 operations have completed.
694
695 aftercorr:job_id[:jobid...]
696 A task of this job array can begin execution after the
697 corresponding task ID in the specified job has completed
698 successfully (ran to completion with an exit code of
699 zero).
700
701 afternotok:job_id[:jobid...]
702 This job can begin execution after the specified jobs
703 have terminated in some failed state (non-zero exit code,
704 node failure, timed out, etc).
705
706 afterok:job_id[:jobid...]
707 This job can begin execution after the specified jobs
708 have successfully executed (ran to completion with an
709 exit code of zero).
710
711 expand:job_id
712 Resources allocated to this job should be used to expand
713 the specified job. The job to expand must share the same
714 QOS (Quality of Service) and partition. Gang scheduling
715 of resources in the partition is also not supported.
716 "expand" is not allowed for jobs that didn't originate on
717 the same cluster as the submitted job.
718
719 singleton
720 This job can begin execution after any previously
721 launched jobs sharing the same job name and user have
722 terminated. In other words, only one job by that name
723 and owned by that user can be running or suspended at any
724 point in time. In a federation, a singleton dependency
725 must be fulfilled on all clusters unless DependencyParam‐
726 eters=disable_remote_singleton is used in slurm.conf.
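
For example (assuming 12345 is the id of a previously submitted job and
./postprocess is a placeholder application), the following job starts
only after that job completes successfully:

       srun -n1 --dependency=afterok:12345 ./postprocess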
727
728
729 -D, --chdir=<path>
730 Have the remote processes do a chdir to path before beginning
731 execution. The default is to chdir to the current working direc‐
732 tory of the srun process. The path can be specified as full path
733 or relative path to the directory where the command is executed.
734 This option applies to job allocations.
735
736
737 -e, --error=<filename pattern>
738 Specify how stderr is to be redirected. By default in interac‐
739 tive mode, srun redirects stderr to the same file as stdout, if
740 one is specified. The --error option is provided to allow stdout
741 and stderr to be redirected to different locations. See IO Re‐
742 direction below for more options. If the specified file already
743 exists, it will be overwritten. This option applies to job and
744 step allocations.
745
746
747 -E, --preserve-env
748 Pass the current values of environment variables SLURM_JOB_NODES
749 and SLURM_NTASKS through to the executable, rather than comput‐
750 ing them from commandline parameters. This option applies to job
751 allocations.
752
753
754 --exact
755 Allow a step access to only the resources requested for the
756 step. By default, all non-GRES resources on each node in the
757 step allocation will be used. Note that no other parallel step
758 will have access to those resources unless --overlap is speci‐
759 fied. This option applies to step allocations.
760
761
762 --epilog=<executable>
763 srun will run executable just after the job step completes. The
764 command line arguments for executable will be the command and
765 arguments of the job step. If executable is "none", then no
766 srun epilog will be run. This parameter overrides the SrunEpilog
767 parameter in slurm.conf. This parameter is completely indepen‐
768 dent from the Epilog parameter in slurm.conf. This option
769 applies to job allocations.
770
771
772
773 --exclusive[=user|mcs]
774 This option applies to job and job step allocations, and has two
775 slightly different meanings for each one. When used to initiate
776 a job, the job allocation cannot share nodes with other running
777 jobs (or just other users with the "=user" option or "=mcs"
778 option). The default shared/exclusive behavior depends on sys‐
779 tem configuration and the partition's OverSubscribe option takes
780 precedence over the job's option.
781
782 This option can also be used when initiating more than one job
783 step within an existing resource allocation (default), where you
784 want separate processors to be dedicated to each job step. If
785 sufficient processors are not available to initiate the job
786 step, it will be deferred. This can be thought of as providing a
787 mechanism for resource management to the job within its alloca‐
788 tion (--exact implied).
789
790 The exclusive allocation of CPUs applies to job steps by
791 default. In order to share the resources use the --overlap
792 option.
793
794 See EXAMPLE below.
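
As a sketch of the job step case (assuming ./step_a and ./step_b are
placeholder applications run from within an existing 8-task allocation
created by salloc or sbatch), the two steps below are given disjoint sets
of CPUs and can run concurrently:

       srun -n4 --exclusive ./step_a &
       srun -n4 --exclusive ./step_b &
       wait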
795
796
797 --export=<[ALL,]environment variables|ALL|NONE>
798 Identify which environment variables from the submission envi‐
799 ronment are propagated to the launched application.
800
801 --export=ALL
802 Default mode if --export is not specified. All of the
user's environment will be loaded from the caller's
environment.
805
806 --export=NONE
807 None of the user environment will be defined. User
808 must use absolute path to the binary to be executed
809 that will define the environment. User can not specify
810 explicit environment variables with NONE.
811 This option is particularly important for jobs that
812 are submitted on one cluster and execute on a differ‐
813 ent cluster (e.g. with different paths). To avoid
814 steps inheriting environment export settings (e.g.
NONE) from the sbatch command, either set --export=ALL
or set the environment variable SLURM_EXPORT_ENV to
ALL.
818
819 --export=<[ALL,]environment variables>
820 Exports all SLURM* environment variables along with
821 explicitly defined variables. Multiple environment
822 variable names should be comma separated. Environment
823 variable names may be specified to propagate the cur‐
824 rent value (e.g. "--export=EDITOR") or specific values
825 may be exported (e.g. "--export=EDITOR=/bin/emacs").
826 If ALL is specified, then all user environment vari‐
827 ables will be loaded and will take precedence over any
828 explicitly given environment variables.
829
830 Example: --export=EDITOR,ARG1=test
831 In this example, the propagated environment will only
832 contain the variable EDITOR from the user's environ‐
833 ment, SLURM_* environment variables, and ARG1=test.
834
835 Example: --export=ALL,EDITOR=/bin/emacs
836 There are two possible outcomes for this example. If
837 the caller has the EDITOR environment variable
838 defined, then the job's environment will inherit the
839 variable from the caller's environment. If the caller
840 doesn't have an environment variable defined for EDI‐
841 TOR, then the job's environment will use the value
842 given by --export.
843
844
845 -F, --nodefile=<node file>
846 Much like --nodelist, but the list is contained in a file of
847 name node file. The node names of the list may also span multi‐
848 ple lines in the file. Duplicate node names in the file will
849 be ignored. The order of the node names in the list is not
850 important; the node names will be sorted by Slurm.
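
For example (assuming node01 and node02 are valid node names on the
cluster), a node file can be created and used as follows:

       printf "node01\nnode02\n" > my_nodes.txt
       srun --nodefile=my_nodes.txt -N2 hostname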
851
852
853 --gid=<group>
854 If srun is run as root, and the --gid option is used, submit the
855 job with group's group access permissions. group may be the
856 group name or the numerical group ID. This option applies to job
857 allocations.
858
859
860 -G, --gpus=[<type>:]<number>
861 Specify the total number of GPUs required for the job. An
862 optional GPU type specification can be supplied. For example
863 "--gpus=volta:3". Multiple options can be requested in a comma
864 separated list, for example: "--gpus=volta:3,kepler:1". See
865 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
866 options.
867
868
869 --gpu-bind=[verbose,]<type>
870 Bind tasks to specific GPUs. By default every spawned task can
871 access every GPU allocated to the job. If "verbose," is speci‐
872 fied before <type>, then print out GPU binding information.
873
874 Supported type options:
875
876 closest Bind each task to the GPU(s) which are closest. In a
877 NUMA environment, each task may be bound to more than
878 one GPU (i.e. all GPUs in that NUMA environment).
879
880 map_gpu:<list>
881 Bind by setting GPU masks on tasks (or ranks) as spec‐
882 ified where <list> is
883 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
884 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
886 hexadecimal values. If the number of tasks (or ranks)
887 exceeds the number of elements in this list, elements
888 in the list will be reused as needed starting from the
889 beginning of the list. To simplify support for large
890 task counts, the lists may follow a map with an aster‐
891 isk and repetition count. For example
892 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
893 and ConstrainDevices is set in cgroup.conf, then the
894 GPU IDs are zero-based indexes relative to the GPUs
895 allocated to the job (e.g. the first GPU is 0, even if
896 the global ID is 3). Otherwise, the GPU IDs are global
897 IDs, and all GPUs on each node in the job should be
898 allocated for predictable binding results.
899
900 mask_gpu:<list>
901 Bind by setting GPU masks on tasks (or ranks) as spec‐
902 ified where <list> is
903 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
904 mapping is specified for a node and identical mapping
905 is applied to the tasks on every node (i.e. the lowest
906 task ID on each node is mapped to the first mask spec‐
907 ified in the list, etc.). GPU masks are always inter‐
908 preted as hexadecimal values but can be preceded with
909 an optional '0x'. To simplify support for large task
910 counts, the lists may follow a map with an asterisk
911 and repetition count. For example
912 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
913 is used and ConstrainDevices is set in cgroup.conf,
914 then the GPU IDs are zero-based indexes relative to
915 the GPUs allocated to the job (e.g. the first GPU is
916 0, even if the global ID is 3). Otherwise, the GPU IDs
917 are global IDs, and all GPUs on each node in the job
918 should be allocated for predictable binding results.
919
920 single:<tasks_per_gpu>
921 Like --gpu-bind=closest, except that each task can
922 only be bound to a single GPU, even when it can be
923 bound to multiple GPUs that are equally close. The
924 GPU to bind to is determined by <tasks_per_gpu>, where
925 the first <tasks_per_gpu> tasks are bound to the first
926 GPU available, the second <tasks_per_gpu> tasks are
927 bound to the second GPU available, etc. This is basi‐
928 cally a block distribution of tasks onto available
929 GPUs, where the available GPUs are determined by the
930 socket affinity of the task and the socket affinity of
931 the GPUs as specified in gres.conf's Cores parameter.
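
For example (a sketch assuming a GRES of type gpu is configured and
./my_gpu_app is a placeholder application), each task is bound to its
closest GPU(s) and the binding is logged:

       srun -n4 --gpus-per-node=4 --gpu-bind=verbose,closest ./my_gpu_app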
932
933
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
935 Request that GPUs allocated to the job are configured with spe‐
936 cific frequency values. This option can be used to indepen‐
937 dently configure the GPU and its memory frequencies. After the
938 job is completed, the frequencies of all affected GPUs will be
939 reset to the highest possible values. In some cases, system
940 power caps may override the requested values. The field type
941 can be "memory". If type is not specified, the GPU frequency is
942 implied. The value field can either be "low", "medium", "high",
943 "highm1" or a numeric value in megahertz (MHz). If the speci‐
944 fied numeric value is not possible, a value as close as possible
945 will be used. See below for definition of the values. The ver‐
946 bose option causes current GPU frequency information to be
947 logged. Examples of use include "--gpu-freq=medium,memory=high"
948 and "--gpu-freq=450".
949
950 Supported value definitions:
951
952 low the lowest available frequency.
953
954 medium attempts to set a frequency in the middle of the
955 available range.
956
957 high the highest available frequency.
958
959 highm1 (high minus one) will select the next highest avail‐
960 able frequency.
961
962
963 --gpus-per-node=[<type>:]<number>
964 Specify the number of GPUs required for the job on each node
965 included in the job's resource allocation. An optional GPU type
966 specification can be supplied. For example
967 "--gpus-per-node=volta:3". Multiple options can be requested in
968 a comma separated list, for example:
969 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
970 --gpus-per-socket and --gpus-per-task options.
971
972
973 --gpus-per-socket=[<type>:]<number>
974 Specify the number of GPUs required for the job on each socket
975 included in the job's resource allocation. An optional GPU type
976 specification can be supplied. For example
977 "--gpus-per-socket=volta:3". Multiple options can be requested
978 in a comma separated list, for example:
979 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
980 sockets per node count ( --sockets-per-node). See also the
981 --gpus, --gpus-per-node and --gpus-per-task options. This
982 option applies to job allocations.
983
984
985 --gpus-per-task=[<type>:]<number>
986 Specify the number of GPUs required for the job on each task to
987 be spawned in the job's resource allocation. An optional GPU
988 type specification can be supplied. For example
989 "--gpus-per-task=volta:1". Multiple options can be requested in
990 a comma separated list, for example:
991 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
992 --gpus-per-socket and --gpus-per-node options. This option
993 requires an explicit task count, e.g. -n, --ntasks or "--gpus=X
994 --gpus-per-task=Y" rather than an ambiguous range of nodes with
995 -N, --nodes.
996 NOTE: This option will not have any impact on GPU binding,
997 specifically it won't limit the number of devices set for
998 CUDA_VISIBLE_DEVICES.
999
1000
1001 --gres=<list>
1002 Specifies a comma delimited list of generic consumable
1003 resources. The format of each entry on the list is
1004 "name[[:type]:count]". The name is that of the consumable
1005 resource. The count is the number of those resources with a
1006 default value of 1. The count can have a suffix of "k" or "K"
1007 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
1008 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1009 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1010 x 1024 x 1024 x 1024). The specified resources will be allo‐
1011 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
1013 of available generic consumable resources will be printed and
1014 the command will exit if the option argument is "help". Exam‐
1015 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
1016 and "--gres=help". NOTE: This option applies to job and step
1017 allocations. By default, a job step is allocated all of the
1018 generic resources that have been allocated to the job. To
1019 change the behavior so that each job step is allocated no
1020 generic resources, explicitly set the value of --gres to specify
1021 zero counts for each generic resource OR set "--gres=none" OR
1022 set the SLURM_STEP_GRES environment variable to "none".
1023
1024
1025 --gres-flags=<type>
1026 Specify generic resource task binding options. This option
1027 applies to job allocations.
1028
1029 disable-binding
1030 Disable filtering of CPUs with respect to generic
1031 resource locality. This option is currently required to
1032 use more CPUs than are bound to a GRES (i.e. if a GPU is
1033 bound to the CPUs on one socket, but resources on more
1034 than one socket are required to run the job). This
1035 option may permit a job to be allocated resources sooner
1036 than otherwise possible, but may result in lower job per‐
1037 formance.
1038 NOTE: This option is specific to SelectType=cons_res.
1039
1040 enforce-binding
1041 The only CPUs available to the job will be those bound to
1042 the selected GRES (i.e. the CPUs identified in the
1043 gres.conf file will be strictly enforced). This option
1044 may result in delayed initiation of a job. For example a
1045 job requiring two GPUs and one CPU will be delayed until
1046 both GPUs on a single socket are available rather than
1047 using GPUs bound to separate sockets, however, the appli‐
1048 cation performance may be improved due to improved commu‐
1049 nication speed. Requires the node to be configured with
1050 more than one socket and resource filtering will be per‐
1051 formed on a per-socket basis.
1052 NOTE: This option is specific to SelectType=cons_tres.
1053
1054
1055 -H, --hold
1056 Specify the job is to be submitted in a held state (priority of
1057 zero). A held job can now be released using scontrol to reset
1058 its priority (e.g. "scontrol release <job_id>"). This option
1059 applies to job allocations.
1060
1061
1062 -h, --help
1063 Display help information and exit.
1064
1065
1066 --hint=<type>
1067 Bind tasks according to application hints.
1068 NOTE: This option cannot be used in conjunction with any of
1069 --ntasks-per-core, --threads-per-core, --cpu-bind (other than
1070 --cpu-bind=verbose) or -B. If --hint is specified as a command
1071 line argument, it will take precedence over the environment.
1072
1073 compute_bound
1074 Select settings for compute bound applications: use all
1075 cores in each socket, one thread per core.
1076
1077 memory_bound
1078 Select settings for memory bound applications: use only
1079 one core in each socket, one thread per core.
1080
1081 [no]multithread
1082 [don't] use extra threads with in-core multi-threading
1083 which can benefit communication intensive applications.
1084 Only supported with the task/affinity plugin.
1085
1086 help show this help message
1087
1088 This option applies to job allocations.
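
For example (assuming ./my_app is a placeholder application), the
following binds tasks using the settings recommended for compute bound
applications:

       srun -n8 --hint=compute_bound ./my_app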
1089
1090
1091 -I, --immediate[=<seconds>]
1092 exit if resources are not available within the time period spec‐
1093 ified. If no argument is given (seconds defaults to 1),
1094 resources must be available immediately for the request to suc‐
1095 ceed. If defer is configured in SchedulerParameters and sec‐
1096 onds=1 the allocation request will fail immediately; defer con‐
1097 flicts and takes precedence over this option. By default,
1098 --immediate is off, and the command will block until resources
1099 become available. Since this option's argument is optional, for
1100 proper parsing the single letter option must be followed immedi‐
1101 ately with the value and not include a space between them. For
1102 example "-I60" and not "-I 60". This option applies to job and
1103 step allocations.
1104
1105
1106 -i, --input=<mode>
Specify how stdin is to be redirected. By default, srun redirects
stdin from the terminal to all tasks. See IO Redirection below for
1109 more options. For OS X, the poll() function does not support
1110 stdin, so input from a terminal is not possible. This option
1111 applies to job and step allocations.
1112
1113
1114 -J, --job-name=<jobname>
1115 Specify a name for the job. The specified name will appear along
1116 with the job id number when querying running jobs on the system.
1117 The default is the supplied executable program's name. NOTE:
1118 This information may be written to the slurm_jobacct.log file.
This file is space delimited so if a space is used in the job
name it will cause problems in properly displaying the con‐
1121 tents of the slurm_jobacct.log file when the sacct command is
1122 used. This option applies to job and step allocations.
1123
1124
1125 --jobid=<jobid>
Initiate a job step under an already allocated job with job id
<jobid>. Using this option will cause srun to behave exactly as if
1128 the SLURM_JOB_ID environment variable was set. This option
1129 applies to step allocations.
1130
1131
1132 -K, --kill-on-bad-exit[=0|1]
1133 Controls whether or not to terminate a step if any task exits
1134 with a non-zero exit code. If this option is not specified, the
1135 default action will be based upon the Slurm configuration param‐
1136 eter of KillOnBadExit. If this option is specified, it will take
1137 precedence over KillOnBadExit. An option argument of zero will
1138 not terminate the job. A non-zero argument or no argument will
1139 terminate the job. Note: This option takes precedence over the
1140 -W, --wait option to terminate the job immediately if a task
1141 exits with a non-zero exit code. Since this option's argument
1142 is optional, for proper parsing the single letter option must be
1143 followed immediately with the value and not include a space
1144 between them. For example "-K1" and not "-K 1".
1145
1146
1147 -k, --no-kill [=off]
1148 Do not automatically terminate a job if one of the nodes it has
1149 been allocated fails. This option applies to job and step allo‐
1150 cations. The job will assume all responsibilities for
fault-tolerance. Tasks launched using this option will not be
1152 considered terminated (e.g. -K, --kill-on-bad-exit and -W,
1153 --wait options will have no effect upon the job step). The
1154 active job step (MPI job) will likely suffer a fatal error, but
1155 subsequent job steps may be run if this option is specified.
1156
Specify an optional argument of "off" to disable the effect of the
1158 SLURM_NO_KILL environment variable.
1159
1160 The default action is to terminate the job upon node failure.
1161
1162
1163 -l, --label
1164 Prepend task number to lines of stdout/err. The --label option
1165 will prepend lines of output with the remote task id. This
1166 option applies to step allocations.
1167
1168
1169 -L, --licenses=<license>
1170 Specification of licenses (or other resources available on all
1171 nodes of the cluster) which must be allocated to this job.
1172 License names can be followed by a colon and count (the default
1173 count is one). Multiple license names should be comma separated
1174 (e.g. "--licenses=foo:4,bar"). This option applies to job allo‐
1175 cations.
1176
1177
1178 -M, --clusters=<string>
1179 Clusters to issue commands to. Multiple cluster names may be
1180 comma separated. The job will be submitted to the one cluster
1181 providing the earliest expected job initiation time. The default
1182 value is the current cluster. A value of 'all' will query to run
1183 on all clusters. Note the --export option to control environ‐
1184 ment variables exported between clusters. This option applies
1185 only to job allocations. Note that the SlurmDBD must be up for
1186 this option to work properly.
1187
1188
-m, --distribution=*|block|cyclic|arbitrary|plane=<options>
[:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]][,Pack|NoPack]
1193
1194 Specify alternate distribution methods for remote processes.
1195 This option controls the distribution of tasks to the nodes on
1196 which resources have been allocated, and the distribution of
1197 those resources to tasks for binding (task affinity). The first
1198 distribution method (before the first ":") controls the distri‐
1199 bution of tasks to nodes. The second distribution method (after
1200 the first ":") controls the distribution of allocated CPUs
1201 across sockets for binding to tasks. The third distribution
1202 method (after the second ":") controls the distribution of allo‐
1203 cated CPUs across cores for binding to tasks. The second and
1204 third distributions apply only if task affinity is enabled. The
1205 third distribution is supported only if the task/cgroup plugin
1206 is configured. The default value for each distribution type is
1207 specified by *.
1208
1209 Note that with select/cons_res and select/cons_tres, the number
1210 of CPUs allocated to each socket and node may be different.
1211 Refer to https://slurm.schedmd.com/mc_support.html for more
1212 information on resource allocation, distribution of tasks to
1213 nodes, and binding of tasks to CPUs.
1214 First distribution method (distribution of tasks across nodes):
1215
1216
1217 * Use the default method for distributing tasks to nodes
1218 (block).
1219
1220 block The block distribution method will distribute tasks to a
1221 node such that consecutive tasks share a node. For exam‐
1222 ple, consider an allocation of three nodes each with two
1223 cpus. A four-task block distribution request will dis‐
1224 tribute those tasks to the nodes with tasks one and two
1225 on the first node, task three on the second node, and
1226 task four on the third node. Block distribution is the
1227 default behavior if the number of tasks exceeds the num‐
1228 ber of allocated nodes.
1229
1230 cyclic The cyclic distribution method will distribute tasks to a
1231 node such that consecutive tasks are distributed over
1232 consecutive nodes (in a round-robin fashion). For exam‐
1233 ple, consider an allocation of three nodes each with two
1234 cpus. A four-task cyclic distribution request will dis‐
1235 tribute those tasks to the nodes with tasks one and four
1236 on the first node, task two on the second node, and task
1237 three on the third node. Note that when SelectType is
1238 select/cons_res, the same number of CPUs may not be allo‐
1239 cated on each node. Task distribution will be round-robin
1240 among all the nodes with CPUs yet to be assigned to
1241 tasks. Cyclic distribution is the default behavior if
1242 the number of tasks is no larger than the number of allo‐
1243 cated nodes.
1244
1245 plane The tasks are distributed in blocks of a specified size.
1246 The number of tasks distributed to each node is the same
1247 as for cyclic distribution, but the taskids assigned to
1248 each node depend on the plane size. Additional distribu‐
1249 tion specifications cannot be combined with this option.
1250 For more details (including examples and diagrams),
1251 please see
1252 https://slurm.schedmd.com/mc_support.html
1253 and
1254 https://slurm.schedmd.com/dist_plane.html
1255
1256 arbitrary
1257 The arbitrary method of distribution will allocate pro‐
1258 cesses in-order as listed in file designated by the envi‐
ronment variable SLURM_HOSTFILE. If this variable is
set it will override any other method specified. If
not set the method will default to block. The
hostfile must contain at minimum the number of hosts
requested, one per line or comma separated. If
1264 specifying a task count (-n, --ntasks=<number>), your
1265 tasks will be laid out on the nodes in the order of the
1266 file.
1267 NOTE: The arbitrary distribution option on a job alloca‐
1268 tion only controls the nodes to be allocated to the job
1269 and not the allocation of CPUs on those nodes. This
1270 option is meant primarily to control a job step's task
1271 layout in an existing job allocation for the srun com‐
1272 mand.
1273 NOTE: If the number of tasks is given and a list of
1274 requested nodes is also given, the number of nodes used
1275 from that list will be reduced to match that of the num‐
1276 ber of tasks if the number of nodes in the list is
1277 greater than the number of tasks.
1278
1279
1280 Second distribution method (distribution of CPUs across sockets
1281 for binding):
1282
1283
1284 * Use the default method for distributing CPUs across sock‐
1285 ets (cyclic).
1286
1287 block The block distribution method will distribute allocated
1288 CPUs consecutively from the same socket for binding to
1289 tasks, before using the next consecutive socket.
1290
1291 cyclic The cyclic distribution method will distribute allocated
1292 CPUs for binding to a given task consecutively from the
1293 same socket, and from the next consecutive socket for the
1294 next task, in a round-robin fashion across sockets.
1295
1296 fcyclic
1297 The fcyclic distribution method will distribute allocated
1298 CPUs for binding to tasks from consecutive sockets in a
1299 round-robin fashion across the sockets.
1300
1301
1302 Third distribution method (distribution of CPUs across cores for
1303 binding):
1304
1305
1306 * Use the default method for distributing CPUs across cores
1307 (inherited from second distribution method).
1308
1309 block The block distribution method will distribute allocated
1310 CPUs consecutively from the same core for binding to
1311 tasks, before using the next consecutive core.
1312
1313 cyclic The cyclic distribution method will distribute allocated
1314 CPUs for binding to a given task consecutively from the
1315 same core, and from the next consecutive core for the
1316 next task, in a round-robin fashion across cores.
1317
1318 fcyclic
1319 The fcyclic distribution method will distribute allocated
1320 CPUs for binding to tasks from consecutive cores in a
1321 round-robin fashion across the cores.
1322
1323
1324
1325 Optional control for task distribution over nodes:
1326
1327
1328 Pack   Rather than distributing a job step's tasks evenly
1329 across its allocated nodes, pack them as tightly as pos‐
1330 sible on the nodes. This only applies when the "block"
1331 task distribution method is used.
1332
1333 NoPack Rather than packing a job step's tasks as tightly as pos‐
1334 sible on the nodes, distribute them evenly. This user
1335 option will supersede the SelectTypeParameters
1336 CR_Pack_Nodes configuration parameter.
1337
1338 This option applies to job and step allocations.
1339
1340
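       The sketches below are illustrative only; the executable name,
       task counts, plane size, and hostfile are hypothetical:

              # block over nodes, cyclic across sockets for binding
              srun -n 8 --distribution=block:cyclic ./a.out

              # plane distribution with a plane size of 4
              srun -n 16 --distribution=plane=4 ./a.out

              # arbitrary layout taken from a hostfile (one host per line)
              SLURM_HOSTFILE=./hosts.txt srun -n 4 --distribution=arbitrary ./a.out
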
1341 --mail-type=<type>
1342 Notify user by email when certain event types occur. Valid type
1343 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1344 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT),
1345 INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst
1346 buffer stage out and teardown completed), TIME_LIMIT,
1347 TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80
1348 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached
1349 50 percent of time limit). Multiple type values may be speci‐
1350 fied in a comma separated list. The user to be notified is
1351 indicated with --mail-user. This option applies to job alloca‐
1352 tions.
1353
1354
1355 --mail-user=<user>
1356 User to receive email notification of state changes as defined
1357 by --mail-type. The default value is the submitting user. This
1358 option applies to job allocations.
1359
1360
1361 --mcs-label=<mcs>
1362 Used only when the mcs/group plugin is enabled. This parameter
1363 is a group among the groups of the user. Default value is cal‐
1364 culated by the Plugin mcs if it's enabled. This option applies
1365 to job allocations.
1366
1367
1368 --mem=<size[units]>
1369 Specify the real memory required per node. Default units are
1370 megabytes. Different units can be specified using the suffix
1371 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1372 is MaxMemPerNode. If configured, both parameters can be seen
1373 using the scontrol show config command. This parameter would
1374 generally be used if whole nodes are allocated to jobs (Select‐
1375 Type=select/linear). Specifying a memory limit of zero for a
1376 job step will restrict the job step to the amount of memory
1377 allocated to the job, but not remove any of the job's memory
1378 allocation from being available to other job steps. Also see
1379 --mem-per-cpu and --mem-per-gpu. The --mem, --mem-per-cpu and
1380 --mem-per-gpu options are mutually exclusive. If --mem,
1381 --mem-per-cpu or --mem-per-gpu are specified as command line
1382 arguments, then they will take precedence over the environment
1383 (potentially inherited from salloc or sbatch).
1384
1385 NOTE: A memory size specification of zero is treated as a spe‐
1386 cial case and grants the job access to all of the memory on each
1387 node for newly submitted jobs and all available job memory to
1388 new job steps.
1389
1390 New memory limits specified for job steps are only advisory.
1391
1392 If the job is allocated multiple nodes in a heterogeneous clus‐
1393 ter, the memory limit on each node will be that of the node in
1394 the allocation with the smallest memory size (same limit will
1395 apply to every node in the job's allocation).
1396
1397 NOTE: Enforcement of memory limits currently relies upon the
1398 task/cgroup plugin or enabling of accounting, which samples mem‐
1399 ory use on a periodic basis (data need not be stored, just col‐
1400 lected). In both cases memory use is based upon the job's Resi‐
1401 dent Set Size (RSS). A task may exceed the memory limit until
1402 the next periodic accounting sample.
1403
1404 This option applies to job and step allocations.
1405
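       For example (a hedged sketch; the size and executable names are
       arbitrary), a job-level memory request and a step that is given
       access to all of the job's allocated memory:

              # request 16 GB of real memory per node for the job
              srun --mem=16G ./my_app

              # within an existing allocation, let the step use all of
              # the memory already allocated to the job
              srun --mem=0 ./my_step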
1406
1407 --mem-per-cpu=<size[units]>
1408 Minimum memory required per allocated CPU. Default units are
1409 megabytes. Different units can be specified using the suffix
1410 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1411 value is MaxMemPerCPU (see exception below). If configured, both
1412 parameters can be seen using the scontrol show config command.
1413 Note that if the job's --mem-per-cpu value exceeds the config‐
1414 ured MaxMemPerCPU, then the user's limit will be treated as a
1415 memory limit per task; --mem-per-cpu will be reduced to a value
1416 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1417 value of --cpus-per-task multiplied by the new --mem-per-cpu
1418 value will equal the original --mem-per-cpu value specified by
1419 the user. This parameter would generally be used if individual
1420 processors are allocated to jobs (SelectType=select/cons_res).
1421 If resources are allocated by core, socket, or whole nodes, then
1422 the number of CPUs allocated to a job may be higher than the
1423 task count and the value of --mem-per-cpu should be adjusted
1424 accordingly. Specifying a memory limit of zero for a job step
1425 will restrict the job step to the amount of memory allocated to
1426 the job, but not remove any of the job's memory allocation from
1427 being available to other job steps. Also see --mem and
1428 --mem-per-gpu. The --mem, --mem-per-cpu and --mem-per-gpu
1429 options are mutually exclusive.
1430
1431 NOTE: If the final amount of memory requested by a job can't be
1432 satisfied by any of the nodes configured in the partition, the
1433 job will be rejected. This could happen if --mem-per-cpu is
1434 used with the --exclusive option for a job allocation and
1435 --mem-per-cpu times the number of CPUs on a node is greater than
1436 the total memory of that node.
1437
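       As a worked example of the adjustment described above (the
       MaxMemPerCPU value here is hypothetical): with MaxMemPerCPU=2G,
       a request of --mem-per-cpu=8G is reduced to 2G and
       --cpus-per-task is set to 4, so 4 x 2G still equals the
       requested 8G per task:

              # assuming MaxMemPerCPU=2G in slurm.conf
              srun --mem-per-cpu=8G ./my_app
              # is treated approximately as
              srun --mem-per-cpu=2G --cpus-per-task=4 ./my_app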
1438
1439 --mem-per-gpu=<size[units]>
1440 Minimum memory required per allocated GPU. Default units are
1441 megabytes. Different units can be specified using the suffix
1442 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1443 both a global and per partition basis. If configured, the
1444 parameters can be seen using the scontrol show config and scon‐
1445 trol show partition commands. Also see --mem. The --mem,
1446 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1447
1448
1449 --mem-bind=[{quiet,verbose},]type
1450 Bind tasks to memory. Used only when the task/affinity plugin is
1451 enabled and the NUMA memory functions are available. Note that
1452 the resolution of CPU and memory binding may differ on some
1453 architectures. For example, CPU binding may be performed at the
1454 level of the cores within a processor while memory binding will
1455 be performed at the level of nodes, where the definition of
1456 "nodes" may differ from system to system. By default no memory
1457 binding is performed; any task using any CPU can use any memory.
1458 This option is typically used to ensure that each task is bound
1459 to the memory closest to its assigned CPU. The use of any type
1460 other than "none" or "local" is not recommended. If you want
1461 greater control, try running a simple test code with the options
1462 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
1463 the specific configuration.
1464
1465 NOTE: To have Slurm always report on the selected memory binding
1466 for all commands executed in a shell, you can enable verbose
1467 mode by setting the SLURM_MEM_BIND environment variable value to
1468 "verbose".
1469
1470 The following informational environment variables are set when
1471 --mem-bind is in use:
1472
1473 SLURM_MEM_BIND_LIST
1474 SLURM_MEM_BIND_PREFER
1475 SLURM_MEM_BIND_SORT
1476 SLURM_MEM_BIND_TYPE
1477 SLURM_MEM_BIND_VERBOSE
1478
1479 See the ENVIRONMENT VARIABLES section for a more detailed
1480 description of the individual SLURM_MEM_BIND* variables.
1481
1482 Supported options include:
1483
1484 help show this help message
1485
1486 local Use memory local to the processor in use
1487
1488 map_mem:<list>
1489 Bind by setting memory masks on tasks (or ranks) as spec‐
1490 ified where <list> is
1491 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1492 ping is specified for a node and identical mapping is
1493 applied to the tasks on every node (i.e. the lowest task
1494 ID on each node is mapped to the first ID specified in
1495 the list, etc.). NUMA IDs are interpreted as decimal
1496 values unless they are preceded with '0x' in which case
1497 they are interpreted as hexadecimal values. If the number of
1498 tasks (or ranks) exceeds the number of elements in this
1499 list, elements in the list will be reused as needed
1500 starting from the beginning of the list. To simplify
1501 support for large task counts, the lists may follow a map
1502 with an asterisk and repetition count. For example
1503 "map_mem:0x0f*4,0xf0*4". For predictable binding
1504 results, all CPUs for each node in the job should be
1505 allocated to the job.
1506
1507 mask_mem:<list>
1508 Bind by setting memory masks on tasks (or ranks) as spec‐
1509 ified where <list> is
1510 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1511 mapping is specified for a node and identical mapping is
1512 applied to the tasks on every node (i.e. the lowest task
1513 ID on each node is mapped to the first mask specified in
1514 the list, etc.). NUMA masks are always interpreted as
1515 hexadecimal values. Note that masks must be preceded
1516 with a '0x' if they don't begin with [0-9] so they are
1517 seen as numerical values. If the number of tasks (or
1518 ranks) exceeds the number of elements in this list, ele‐
1519 ments in the list will be reused as needed starting from
1520 the beginning of the list. To simplify support for large
1521 task counts, the lists may follow a mask with an asterisk
1522 and repetition count. For example "mask_mem:0*4,1*4".
1523 For predictable binding results, all CPUs for each node
1524 in the job should be allocated to the job.
1525
1526 no[ne] don't bind tasks to memory (default)
1527
1528 nosort avoid sorting free cache pages (default, LaunchParameters
1529 configuration parameter can override this default)
1530
1531 p[refer]
1532 Prefer use of first specified NUMA node, but permit
1533 use of other available NUMA nodes.
1534
1535 q[uiet]
1536 quietly bind before task runs (default)
1537
1538 rank bind by task rank (not recommended)
1539
1540 sort sort free cache pages (run zonesort on Intel KNL nodes)
1541
1542 v[erbose]
1543 verbosely report binding before task runs
1544
1545 This option applies to job and step allocations.
1546
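       For illustration (a non-authoritative sketch; the NUMA IDs and
       the executable are placeholders), binding tasks to local memory
       or alternately to NUMA nodes 0 and 1 with verbose reporting:

              srun -n 4 --mem-bind=verbose,local ./my_app
              # the NUMA ID list is reused cyclically across tasks
              srun -n 4 --mem-bind=verbose,map_mem:0,1 ./my_app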
1547
1548 --mincpus=<n>
1549 Specify a minimum number of logical cpus/processors per node.
1550 This option applies to job allocations.
1551
1552
1553 --msg-timeout=<seconds>
1554 Modify the job launch message timeout. The default value is
1555 MessageTimeout in the Slurm configuration file slurm.conf.
1556 Changes to this are typically not recommended, but could be use‐
1557 ful to diagnose problems. This option applies to job alloca‐
1558 tions.
1559
1560
1561 --mpi=<mpi_type>
1562 Identify the type of MPI to be used. May result in unique initi‐
1563 ation procedures.
1564
1565 list Lists available mpi types to choose from.
1566
1567 pmi2 To enable PMI2 support. The PMI2 support in Slurm works
1568 only if the MPI implementation supports it, in other
1569 words if the MPI has the PMI2 interface implemented. The
1570 --mpi=pmi2 option will load the library lib/slurm/mpi_pmi2.so
1571 which provides the server side functionality but the
1572 client side must implement PMI2_Init() and the other
1573 interface calls.
1574
1575 pmix To enable PMIx support (https://pmix.github.io). The PMIx
1576 support in Slurm can be used to launch parallel applica‐
1577 tions (e.g. MPI) if they support PMIx, PMI2 or PMI1. Slurm
1578 must be configured with pmix support by passing "--with-
1579 pmix=<PMIx installation path>" option to its "./config‐
1580 ure" script.
1581
1582 At the time of writing PMIx is supported in Open MPI
1583 starting from version 2.0. PMIx also supports backward
1584 compatibility with PMI1 and PMI2 and can be used if MPI
1585 was configured with PMI2/PMI1 support pointing to the
1586 PMIx library ("libpmix"). If MPI supports PMI1/PMI2 but
1587 doesn't provide a way to point to a specific implemen‐
1588 tation, a hackish solution leveraging LD_PRELOAD can be
1588 tation, a hack'ish solution leveraging LD_PRELOAD can be
1589 used to force "libpmix" usage.
1590
1591
1592 none No special MPI processing. This is the default and works
1593 with many other versions of MPI.
1594
1595 This option applies to step allocations.
1596
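       For example (a sketch; the application name is a placeholder),
       list the available MPI plugin types and then launch a step with
       PMIx support:

              srun --mpi=list
              srun -n 64 --mpi=pmix ./mpi_app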
1597
1598 --multi-prog
1599 Run a job with different programs and different arguments for
1600 each task. In this case, the executable program specified is
1601 actually a configuration file specifying the executable and
1602 arguments for each task. See MULTIPLE PROGRAM CONFIGURATION
1603 below for details on the configuration file contents. This
1604 option applies to step allocations.
1605
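       A minimal sketch, assuming a hypothetical configuration file
       named multi.conf; the exact file syntax is defined under
       MULTIPLE PROGRAM CONFIGURATION below:

              $ cat multi.conf
              0      ./master
              1-3    ./worker
              $ srun -n 4 --multi-prog multi.conf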
1606
1607 -N, --nodes=<minnodes[-maxnodes]>
1608 Request that a minimum of minnodes nodes be allocated to this
1609 job. A maximum node count may also be specified with maxnodes.
1610 If only one number is specified, this is used as both the mini‐
1611 mum and maximum node count. The partition's node limits super‐
1612 sede those of the job. If a job's node limits are outside of
1613 the range permitted for its associated partition, the job will
1614 be left in a PENDING state. This permits possible execution at
1615 a later time, when the partition limit is changed. If a job
1616 node limit exceeds the number of nodes configured in the parti‐
1617 tion, the job will be rejected. Note that the environment vari‐
1618 able SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compat‐
1619 ibility) will be set to the count of nodes actually allocated to
1620 the job. See the ENVIRONMENT VARIABLES section for more informa‐
1621 tion. If -N is not specified, the default behavior is to allo‐
1622 cate enough nodes to satisfy the requirements of the -n and -c
1623 options. The job will be allocated as many nodes as possible
1624 within the range specified and without delaying the initiation
1625 of the job. If the number of tasks is given and a number of
1626 requested nodes is also given, the number of nodes used from
1627 that request will be reduced to match that of the number of
1628 tasks if the number of nodes in the request is greater than the
1629 number of tasks. The node count specification may include a
1630 numeric value followed by a suffix of "k" (multiplies numeric
1631 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
1632 This option applies to job and step allocations.
1633
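       For instance (a hedged sketch with arbitrary counts), request
       between two and four nodes for eight tasks, or use the "k"
       suffix to multiply the node count by 1,024:

              srun -N 2-4 -n 8 ./my_app
              srun -N 2k -n 2048 ./my_app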
1634
1635 -n, --ntasks=<number>
1636 Specify the number of tasks to run. Request that srun allocate
1637 resources for ntasks tasks. The default is one task per node,
1638 but note that the --cpus-per-task option will change this
1639 default. This option applies to job and step allocations.
1640
1641
1642 --network=<type>
1643 Specify information pertaining to the switch or network. The
1644 interpretation of type is system dependent. This option is sup‐
1645 ported when running Slurm on a Cray natively. It is used to
1646 request the use of Network Performance Counters. Only one value per
1647 request is valid. All options are case-insensitive. In this
1648 configuration supported values include:
1649
1650 system
1651 Use the system-wide network performance counters. Only
1652 nodes requested will be marked in use for the job alloca‐
1653 tion. If the job does not fill up the entire system, the
1654 rest of the nodes cannot be used by other jobs
1655 using NPC; if idle, their state will appear as PerfCnts.
1656 These nodes are still available for other jobs not using
1657 NPC.
1658
1659 blade Use the blade network performance counters. Only nodes
1660 requested will be marked in use for the job allocation.
1661 If the job does not fill up the entire blade(s) allocated
1662 to the job, those blade(s) cannot be used by other
1663 jobs using NPC; if idle, their state will appear as PerfC‐
1664 nts. These nodes are still available for other jobs not
1665 using NPC.
1666
1667
1668 In all cases the job allocation request must specify the
1669 --exclusive option and the step cannot specify the --overlap
1670 option. Otherwise the request will be denied.
1671
1672 Also with any of these options steps are not allowed to share
1673 blades, so resources would remain idle inside an allocation if
1674 the step running on a blade does not take up all the nodes on
1675 the blade.
1676
1677 The network option is also supported on systems with IBM's Par‐
1678 allel Environment (PE). See IBM's LoadLeveler job command key‐
1679 word documentation about the keyword "network" for more informa‐
1680 tion. Multiple values may be specified in a comma separated
1681 list. All options are case-insensitive. Supported values
1682 include:
1683
1684 BULK_XFER[=<resources>]
1685 Enable bulk transfer of data using Remote Direct-
1686 Memory Access (RDMA). The optional resources speci‐
1687 fication is a numeric value which can have a suffix
1688 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1689 megabytes or gigabytes. NOTE: The resources speci‐
1690 fication is not supported by the underlying IBM in‐
1691 frastructure as of Parallel Environment version 2.2
1692 and no value should be specified at this time. The
1693 devices allocated to a job must all be of the same
1694 type. The default value depends upon
1695 what hardware is available and, in order of prefer‐
1696 ence, is IPONLY (which is not considered in User
1697 Space mode), HFI, IB, HPCE, and KMUX.
1698
1699 CAU=<count> Number of Collective Acceleration Units (CAU)
1700 required. Applies only to IBM Power7-IH processors.
1701 Default value is zero. Independent CAU will be
1702 allocated for each programming interface (MPI, LAPI,
1703 etc.)
1704
1705 DEVNAME=<name>
1706 Specify the device name to use for communications
1707 (e.g. "eth0" or "mlx4_0").
1708
1709 DEVTYPE=<type>
1710 Specify the device type to use for communications.
1711 The supported values of type are: "IB" (InfiniBand),
1712 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1713 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1714 nel Emulation of HPCE). The devices allocated to a
1715 job must all be of the same type. The default value
1716 depends upon what hardware is available
1717 and, in order of preference, is IPONLY (which is not
1718 considered in User Space mode), HFI, IB, HPCE, and
1719 KMUX.
1720
1721 IMMED =<count>
1722 Number of immediate send slots per window required.
1723 Applies only to IBM Power7-IH processors. Default
1724 value is zero.
1725
1726 INSTANCES =<count>
1727 Specify number of network connections for each task
1728 on each network. The default instance
1729 count is 1.
1730
1731 IPV4 Use Internet Protocol (IP) version 4 communications
1732 (default).
1733
1734 IPV6 Use Internet Protocol (IP) version 6 communications.
1735
1736 LAPI Use the LAPI programming interface.
1737
1738 MPI Use the MPI programming interface. MPI is the
1739 default interface.
1740
1741 PAMI Use the PAMI programming interface.
1742
1743 SHMEM Use the OpenSHMEM programming interface.
1744
1745 SN_ALL Use all available switch networks (default).
1746
1747 SN_SINGLE Use one available switch network.
1748
1749 UPC Use the UPC programming interface.
1750
1751 US Use User Space communications.
1752
1753
1754 Some examples of network specifications:
1755
1756 Instances=2,US,MPI,SN_ALL
1757 Create two user space connections for MPI communica‐
1758 tions on every switch network for each task.
1759
1760 US,MPI,Instances=3,Devtype=IB
1761 Create three user space connections for MPI communi‐
1762 cations on every InfiniBand network for each task.
1763
1764 IPV4,LAPI,SN_Single
1765 Create an IP version 4 connection for LAPI communica‐
1766 tions on one switch network for each task.
1767
1768 Instances=2,US,LAPI,MPI
1769 Create two user space connections each for LAPI and
1770 MPI communications on every switch network for each
1771 task. Note that SN_ALL is the default option so
1772 every switch network is used. Also note that
1773 Instances=2 specifies that two connections are
1774 established for each protocol (LAPI and MPI) and
1775 each task. If there are two networks and four tasks
1776 on the node then a total of 32 connections are
1777 established (2 instances x 2 protocols x 2 networks
1778 x 4 tasks).
1779
1780 This option applies to job and step allocations.
1781
1782
1783 --nice[=adjustment]
1784 Run the job with an adjusted scheduling priority within Slurm.
1785 With no adjustment value the scheduling priority is decreased by
1786 100. A negative nice value increases the priority, otherwise
1787 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1788 ileged users can specify a negative adjustment.
1789
1790
1791 --ntasks-per-core=<ntasks>
1792 Request the maximum ntasks be invoked on each core. This option
1793 applies to the job allocation, but not to step allocations.
1794 Meant to be used with the --ntasks option. Related to
1795 --ntasks-per-node except at the core level instead of the node
1796 level. Masks will automatically be generated to bind the tasks
1797 to specific cores unless --cpu-bind=none is specified. NOTE:
1798 This option is not supported unless SelectType=cons_res is con‐
1799 figured (either directly or indirectly on Cray systems) along
1800 with the node's core count.
1801
1802
1803 --ntasks-per-gpu=<ntasks>
1804 Request that there are ntasks tasks invoked for every GPU. This
1805 option can work in two ways: 1) either specify --ntasks in addi‐
1806 tion, in which case a type-less GPU specification will be auto‐
1807 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1808 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1809 --ntasks, and the total task count will be automatically deter‐
1810 mined. The number of CPUs needed will be automatically
1811 increased if necessary to allow for any calculated task count.
1812 This option will implicitly set --gpu-bind=single:<ntasks>, but
1813 that can be overridden with an explicit --gpu-bind specifica‐
1814 tion. This option is not compatible with a node range (i.e.
1815 -N<minnodes-maxnodes>). This option is not compatible with
1816 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1817 option is not supported unless SelectType=cons_tres is config‐
1818 ured (either directly or indirectly on Cray systems).
1819
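       The two usage modes described above might look like the
       following sketch (GPU counts and the executable are
       placeholders):

              # mode 1: task count given; a matching typeless GPU
              # count (here 4) is derived automatically
              srun -n 8 --ntasks-per-gpu=2 ./gpu_app

              # mode 2: GPU count given; the total task count (here 8)
              # is derived automatically
              srun --gpus=4 --ntasks-per-gpu=2 ./gpu_app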
1820
1821 --ntasks-per-node=<ntasks>
1822 Request that ntasks be invoked on each node. If used with the
1823 --ntasks option, the --ntasks option will take precedence and
1824 the --ntasks-per-node will be treated as a maximum count of
1825 tasks per node. Meant to be used with the --nodes option. This
1826 is related to --cpus-per-task=ncpus, but does not require knowl‐
1827 edge of the actual number of cpus on each node. In some cases,
1828 it is more convenient to be able to request that no more than a
1829 specific number of tasks be invoked on each node. Examples of
1830 this include submitting a hybrid MPI/OpenMP app where only one
1831 MPI "task/rank" should be assigned to each node while allowing
1832 the OpenMP portion to utilize all of the parallelism present in
1833 the node, or submitting a single setup/cleanup/monitoring job to
1834 each node of a pre-existing allocation as one step in a larger
1835 job script. This option applies to job allocations.
1836
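       For example (a sketch; the node and CPU counts are arbitrary),
       a hybrid MPI/OpenMP launch with one task per node that leaves
       the node's remaining parallelism to OpenMP threads:

              srun -N 4 --ntasks-per-node=1 --cpus-per-task=16 ./hybrid_app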
1837
1838 --ntasks-per-socket=<ntasks>
1839 Request the maximum ntasks be invoked on each socket. This
1840 option applies to the job allocation, but not to step alloca‐
1841 tions. Meant to be used with the --ntasks option. Related to
1842 --ntasks-per-node except at the socket level instead of the node
1843 level. Masks will automatically be generated to bind the tasks
1844 to specific sockets unless --cpu-bind=none is specified. NOTE:
1845 This option is not supported unless SelectType=cons_res is con‐
1846 figured (either directly or indirectly on Cray systems) along
1847 with the node's socket count.
1848
1849
1850 -O, --overcommit
1851 Overcommit resources. This option applies to job and step allo‐
1852 cations. When applied to job allocation, only one CPU is allo‐
1853 cated to the job per node and options used to specify the number
1854 of tasks per node, socket, core, etc. are ignored. When
1855 applied to job step allocations (the srun command when executed
1856 within an existing job allocation), this option can be used to
1857 launch more than one task per CPU. Normally, srun will not
1858 allocate more than one process per CPU. By specifying --over‐
1859 commit you are explicitly allowing more than one process per
1860 CPU. However no more than MAX_TASKS_PER_NODE tasks are permitted
1861 to execute per node. NOTE: MAX_TASKS_PER_NODE is defined in the
1862 file slurm.h and is not a variable, it is set at Slurm build
1863 time.
1864
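       Within an existing job allocation, a step can launch more tasks
       than allocated CPUs by overcommitting, as in this sketch (the
       task count is arbitrary):

              srun -n 32 --overcommit hostname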
1865
1866 --overlap
1867 Allow steps to overlap each other on the same resources. By
1868 default steps do not share resources with other parallel steps.
1869
1870
1871 -o, --output=<filename pattern>
1872 Specify the "filename pattern" for stdout redirection. By
1873 default in interactive mode, srun collects stdout from all tasks
1874 and sends this output via TCP/IP to the attached terminal. With
1875 --output stdout may be redirected to a file, to one file per
1876 task, or to /dev/null. See section IO Redirection below for the
1877 various forms of filename pattern. If the specified file
1878 already exists, it will be overwritten.
1879
1880 If --error is not also specified on the command line, both std‐
1881 out and stderr will be directed to the file specified by --output.
1882 This option applies to job and step allocations.
1883
1884
1885 --open-mode=<append|truncate>
1886 Open the output and error files using append or truncate mode as
1887 specified. For heterogeneous job steps the default value is
1888 "append". Otherwise the default value is specified by the sys‐
1889 tem configuration parameter JobFileAppend. This option applies
1890 to job and step allocations.
1891
1892
1893 --het-group=<expr>
1894 Identify each component in a heterogeneous job allocation for
1895 which a step is to be created. Applies only to srun commands
1896 issued inside a salloc allocation or sbatch script. <expr> is a
1897 set of integers corresponding to one or more option offsets on
1898 the salloc or sbatch command line. Examples: "--het-group=2",
1899 "--het-group=0,4", "--het-group=1,3-5". The default value is
1900 --het-group=0.
1901
1902
1903 -p, --partition=<partition_names>
1904 Request a specific partition for the resource allocation. If
1905 not specified, the default behavior is to allow the slurm con‐
1906 troller to select the default partition as designated by the
1907 system administrator. If the job can use more than one parti‐
1908 tion, specify their names in a comma separated list and the one
1909 offering earliest initiation will be used with no regard given
1910 to the partition name ordering (although higher priority parti‐
1911 tions will be considered first). When the job is initiated, the
1912 name of the partition used will be placed first in the job
1913 record partition string. This option applies to job allocations.
1914
1915
1916 --power=<flags>
1917 Comma separated list of power management plugin options. Cur‐
1918 rently available flags include: level (all nodes allocated to
1919 the job should have identical power caps, may be disabled by the
1920 Slurm configuration option PowerParameters=job_no_level). This
1921 option applies to job allocations.
1922
1923
1924 --priority=<value>
1925 Request a specific job priority. May be subject to configura‐
1926 tion specific constraints. value should either be a numeric
1927 value or "TOP" (for highest possible value). Only Slurm opera‐
1928 tors and administrators can set the priority of a job. This
1929 option applies to job allocations only.
1930
1931
1932 --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
1933 enables detailed data collection by the acct_gather_profile
1934 plugin. Detailed data are typically time-series that are stored
1935 in an HDF5 file for the job or an InfluxDB database depending on
1936 the configured plugin.
1937
1938
1939 All All data types are collected. (Cannot be combined with
1940 other values.)
1941
1942
1943 None No data types are collected. This is the default.
1944 (Cannot be combined with other values.)
1945
1946
1947 Energy Energy data is collected.
1948
1949
1950 Task Task (I/O, Memory, ...) data is collected.
1951
1952
1953 Filesystem
1954 Filesystem data is collected.
1955
1956
1957 Network Network (InfiniBand) data is collected.
1958
1959
1960 This option applies to job and step allocations.
1961
1962
1963 --prolog=<executable>
1964 srun will run executable just before launching the job step.
1965 The command line arguments for executable will be the command
1966 and arguments of the job step. If executable is "none", then no
1967 srun prolog will be run. This parameter overrides the SrunProlog
1968 parameter in slurm.conf. This parameter is completely indepen‐
1969 dent from the Prolog parameter in slurm.conf. This option
1970 applies to job allocations.
1971
1972
1973 --propagate[=rlimit[,rlimit...]]
1974 Allows users to specify which of the modifiable (soft) resource
1975 limits to propagate to the compute nodes and apply to their
1976 jobs. If no rlimit is specified, then all resource limits will
1977 be propagated. The following rlimit names are supported by
1978 Slurm (although some options may not be supported on some sys‐
1979 tems):
1980
1981 ALL All limits listed below (default)
1982
1983 NONE No limits listed below
1984
1985 AS The maximum address space for a process
1986
1987 CORE The maximum size of core file
1988
1989 CPU The maximum amount of CPU time
1990
1991 DATA The maximum size of a process's data segment
1992
1993 FSIZE The maximum size of files created. Note that if the
1994 user sets FSIZE to less than the current size of the
1995 slurmd.log, job launches will fail with a 'File size
1996 limit exceeded' error.
1997
1998 MEMLOCK The maximum size that may be locked into memory
1999
2000 NOFILE The maximum number of open files
2001
2002 NPROC The maximum number of processes available
2003
2004 RSS The maximum resident set size
2005
2006 STACK The maximum stack size
2007
2008 This option applies to job allocations.
2009
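       For example (a sketch), propagate only the stack size and open
       file limits, or propagate none at all:

              srun --propagate=STACK,NOFILE ./my_app
              srun --propagate=NONE ./my_app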
2010
2011 --pty Execute task zero in pseudo terminal mode. Implicitly sets
2012 --unbuffered. Implicitly sets --error and --output to /dev/null
2013 for all tasks except task zero, which may cause those tasks to
2014 exit immediately (e.g. shells will typically exit immediately in
2015 that situation). This option applies to step allocations.
2016
2017
2018 -q, --qos=<qos>
2019 Request a quality of service for the job. QOS values can be
2020 defined for each user/cluster/account association in the Slurm
2021 database. Users will be limited to their association's defined
2022 set of qos's when the Slurm configuration parameter, Account‐
2023 ingStorageEnforce, includes "qos" in its definition. This option
2024 applies to job allocations.
2025
2026
2027 -Q, --quiet
2028 Suppress informational messages from srun. Errors will still be
2029 displayed. This option applies to job and step allocations.
2030
2031
2032 --quit-on-interrupt
2033 Quit immediately on single SIGINT (Ctrl-C). Use of this option
2034 disables the status feature normally available when srun
2035 receives a single Ctrl-C and causes srun to instead immediately
2036 terminate the running job. This option applies to step alloca‐
2037 tions.
2038
2039
2040 -r, --relative=<n>
2041 Run a job step relative to node n of the current allocation.
2042 This option may be used to spread several job steps out among
2043 the nodes of the current job. If -r is used, the current job
2044 step will begin at node n of the allocated nodelist, where the
2045 first node is considered node 0. The -r option is not permitted
2046 with the -w or -x options and will result in a fatal error when not
2047 running within a prior allocation (i.e. when SLURM_JOB_ID is not
2048 set). The default for n is 0. If the value of --nodes exceeds
2049 the number of nodes identified with the --relative option, a
2050 warning message will be printed and the --relative option will
2051 take precedence. This option applies to step allocations.
2052
2053
2054 --reboot
2055 Force the allocated nodes to reboot before starting the job.
2056 This is only supported with some system configurations and will
2057 otherwise be silently ignored. Only root, SlurmUser or admins
2058 can reboot nodes. This option applies to job allocations.
2059
2060
2061 --resv-ports[=count]
2062 Reserve communication ports for this job. Users can specify the
2063 number of ports they want to reserve. The parameter Mpi‐
2064 Params=ports=12000-12999 must be specified in slurm.conf. If not
2065 specified and Slurm's OpenMPI plugin is used, then by default
2066 the number of reserved ports is equal to the highest number of tasks on
2067 any node in the job step allocation. If the number of reserved
2068 ports is zero then no ports are reserved. Used for OpenMPI. This
2069 option applies to job and step allocations.
2070
2071
2072 --reservation=<reservation_names>
2073 Allocate resources for the job from the named reservation. If
2074 the job can use more than one reservation, specify their names
2075 in a comma separated list and the one offering earliest initia‐
2076 tion will be used. Each reservation will be considered in the order it was
2077 requested. All reservations will be listed in scontrol/squeue
2078 through the life of the job. In accounting the first reserva‐
2079 tion will be seen and after the job starts the reservation used
2080 will replace it.
2081
2082
2083 -s, --oversubscribe
2084 The job allocation can over-subscribe resources with other run‐
2085 ning jobs. The resources to be over-subscribed can be nodes,
2086 sockets, cores, and/or hyperthreads depending upon configura‐
2087 tion. The default over-subscribe behavior depends on system
2088 configuration and the partition's OverSubscribe option takes
2089 precedence over the job's option. This option may result in the
2090 allocation being granted sooner than if the --oversubscribe
2091 option was not set and allow higher system utilization, but
2092 application performance will likely suffer due to competition
2093 for resources. This option applies to step allocations.
2094
2095
2096 -S, --core-spec=<num>
2097 Count of specialized cores per node reserved by the job for sys‐
2098 tem operations and not used by the application. The application
2099 will not use these cores, but will be charged for their alloca‐
2100 tion. Default value is dependent upon the node's configured
2101 CoreSpecCount value. If a value of zero is designated and the
2102 Slurm configuration option AllowSpecResourcesUsage is enabled,
2103 the job will be allowed to override CoreSpecCount and use the
2104 specialized resources on nodes it is allocated. This option can
2105 not be used with the --thread-spec option. This option applies
2106 to job allocations.
2107
2108
2109 --signal=[R:]<sig_num>[@<sig_time>]
2110 When a job is within sig_time seconds of its end time, send it
2111 the signal sig_num. Due to the resolution of event handling by
2112 Slurm, the signal may be sent up to 60 seconds earlier than
2113 specified. sig_num may either be a signal number or name (e.g.
2114 "10" or "USR1"). sig_time must have an integer value between 0
2115 and 65535. By default, no signal is sent before the job's end
2116 time. If a sig_num is specified without any sig_time, the
2117 default time will be 60 seconds. This option applies to job
2118 allocations. Use the "R:" option to allow this job to overlap
2119 with a reservation with MaxStartDelay set. To have the signal
2120 sent at preemption time see the preempt_send_user_signal Slurm‐
2121 ctldParameter.
2122
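       For instance (a sketch with arbitrary values), send SIGUSR1 ten
       minutes before the end time, or use the "R:" prefix so the job
       may overlap a reservation with MaxStartDelay set:

              srun --signal=USR1@600 ./my_app
              srun --signal=R:10@300 ./my_app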
2123
2124 --slurmd-debug=<level>
2125 Specify a debug level for slurmd(8). The level may be specified
2126 as either an integer value between 0 [quiet, only errors are dis‐
2127 played] and 4 [verbose operation] or one of the SlurmdDebug tags.
2128
2129 quiet Log nothing
2130
2131 fatal Log only fatal errors
2132
2133 error Log only errors
2134
2135 info Log errors and general informational messages
2136
2137 verbose Log errors and verbose informational messages
2138
2139
2140 The slurmd debug information is copied onto the stderr of
2141 the job. By default only errors are displayed. This option
2142 applies to job and step allocations.
2143
2144
2145 --sockets-per-node=<sockets>
2146 Restrict node selection to nodes with at least the specified
2147 number of sockets. See additional information under -B option
2148 above when task/affinity plugin is enabled. This option applies
2149 to job allocations.
2150
2151
2152 --spread-job
2153 Spread the job allocation over as many nodes as possible and
2154 attempt to evenly distribute tasks across the allocated nodes.
2155 This option disables the topology/tree plugin. This option
2156 applies to job allocations.
2157
2158
2159 --switches=<count>[@<max-time>]
2160 When a tree topology is used, this defines the maximum count of
2161 switches desired for the job allocation and optionally the maxi‐
2162 mum time to wait for that number of switches. If Slurm finds an
2163 allocation containing more switches than the count specified,
2164 the job remains pending until it either finds an allocation with
2165 desired switch count or the time limit expires. If there is no
2166 switch count limit, there is no delay in starting the job.
2167 Acceptable time formats include "minutes", "minutes:seconds",
2168 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2169 "days-hours:minutes:seconds". The job's maximum time delay may
2170 be limited by the system administrator using the SchedulerParam‐
2171 eters configuration parameter with the max_switch_wait parameter
2172 option. On a dragonfly network the only switch count supported
2173 is 1 since communication performance will be highest when a job
2174 is allocated resources on one leaf switch or more than 2 leaf
2175 switches. The default max-time is the max_switch_wait Sched‐
2176 ulerParameters value. This option applies to job allocations.
2177
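       For example (a sketch), request an allocation spanning at most
       one leaf switch and wait up to 30 minutes for it:

              srun --switches=1@30:00 ./my_app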
2178
2179 -T, --threads=<nthreads>
2180 Allows limiting the number of concurrent threads used to send
2181 the job request from the srun process to the slurmd processes on
2182 the allocated nodes. Default is to use one thread per allocated
2183 node up to a maximum of 60 concurrent threads. Specifying this
2184 option limits the number of concurrent threads to nthreads (less
2185 than or equal to 60). This should only be used to set a low
2186 thread count for testing on very small memory computers. This
2187 option applies to job allocations.
2188
2189
2190 -t, --time=<time>
2191 Set a limit on the total run time of the job allocation. If the
2192 requested time limit exceeds the partition's time limit, the job
2193 will be left in a PENDING state (possibly indefinitely). The
2194 default time limit is the partition's default time limit. When
2195 the time limit is reached, each task in each job step is sent
2196 SIGTERM followed by SIGKILL. The interval between signals is
2197 specified by the Slurm configuration parameter KillWait. The
2198 OverTimeLimit configuration parameter may permit the job to run
2199 longer than scheduled. Time resolution is one minute and second
2200 values are rounded up to the next minute.
2201
2202 A time limit of zero requests that no time limit be imposed.
2203 Acceptable time formats include "minutes", "minutes:seconds",
2204 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
2205 "days-hours:minutes:seconds". This option applies to job and
2206 step allocations.
2207
2208
2209 --task-epilog=<executable>
2210 The slurmstepd daemon will run executable just after each task
2211 terminates. This will be executed before any TaskEpilog parame‐
2212 ter in slurm.conf is executed. This is meant to be a very
2213 short-lived program. If it fails to terminate within a few sec‐
2214 onds, it will be killed along with any descendant processes.
2215 This option applies to step allocations.
2216
2217
2218 --task-prolog=<executable>
2219 The slurmstepd daemon will run executable just before launching
2220 each task. This will be executed after any TaskProlog parameter
2221 in slurm.conf is executed. Besides the normal environment vari‐
2222 ables, this has SLURM_TASK_PID available to identify the process
2223 ID of the task being started. Standard output from this program
2224 of the form "export NAME=value" will be used to set environment
2225 variables for the task being spawned. This option applies to
2226 step allocations.
2227
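       A minimal sketch of a task prolog, assuming a hypothetical
       script task_prolog.sh that uses the "export NAME=value"
       convention described above to add one variable to each task's
       environment:

              $ cat task_prolog.sh
              #!/bin/sh
              # SLURM_TASK_PID identifies the task being started; any
              # line of the form "export NAME=value" becomes part of
              # the spawned task's environment
              echo "export MY_TASK_PID=$SLURM_TASK_PID"

              $ srun --task-prolog=./task_prolog.sh -n 4 ./my_app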
2228
2229 --test-only
2230 Returns an estimate of when a job would be scheduled to run
2231 given the current job queue and all the other srun arguments
2232 specifying the job. This limits srun's behavior to just return
2233 information; no job is actually submitted. The program will be
2234 executed directly by the slurmd daemon. This option applies to
2235 job allocations.
2236
2237
2238 --thread-spec=<num>
2239 Count of specialized threads per node reserved by the job for
2240 system operations and not used by the application. The applica‐
2241 tion will not use these threads, but will be charged for their
2242 allocation. This option can not be used with the --core-spec
2243 option. This option applies to job allocations.
2244
2245
2246 --threads-per-core=<threads>
2247 Restrict node selection to nodes with at least the specified
2248 number of threads per core. In task layout, use the specified
2249 maximum number of threads per core. Implies --cpu-bind=threads.
2250 NOTE: "Threads" refers to the number of processing units on each
2251 core rather than the number of application tasks to be launched
2252 per core. See additional information under -B option above when
2253 task/affinity plugin is enabled. This option applies to job and
2254 step allocations.
2255
2256
2257 --time-min=<time>
2258 Set a minimum time limit on the job allocation. If specified,
2259 the job may have its --time limit lowered to a value no lower
2260 than --time-min if doing so permits the job to begin execution
2261 earlier than otherwise possible. The job's time limit will not
2262 be changed after the job is allocated resources. This is per‐
2263 formed by a backfill scheduling algorithm to allocate resources
2264 otherwise reserved for higher priority jobs. Acceptable time
2265 formats include "minutes", "minutes:seconds", "hours:min‐
2266 utes:seconds", "days-hours", "days-hours:minutes" and
2267 "days-hours:minutes:seconds". This option applies to job alloca‐
2268 tions.
2269
2270
2271 --tmp=<size[units]>
2272 Specify a minimum amount of temporary disk space per node.
2273 Default units are megabytes. Different units can be specified
2274 using the suffix [K|M|G|T]. This option applies to job alloca‐
2275 tions.
2276
2277
2278 -u, --unbuffered
2279 By default the connection between slurmstepd and the user
2280 launched application is over a pipe. The stdio output written by
2281 the application is buffered by glibc until it is flushed or
2282 the output is set as unbuffered. See setbuf(3). If this option
2283 is specified the tasks are executed with a pseudo terminal so
2284 that the application output is unbuffered. This option applies
2285 to step allocations.
2286
2287 --usage
2288 Display brief help message and exit.
2289
2290
2291 --uid=<user>
2292 Attempt to submit and/or run a job as user instead of the invok‐
2293 ing user id. The invoking user's credentials will be used to
2294 check access permissions for the target partition. User root may
2295 use this option to run jobs as a normal user in a RootOnly par‐
2296 tition for example. If run as root, srun will drop its permis‐
2297 sions to the uid specified after node allocation is successful.
2298 user may be the user name or numerical user ID. This option
2299 applies to job and step allocations.
2300
2301
2302 --use-min-nodes
2303 If a range of node counts is given, prefer the smaller count.
2304
2305
2306 -V, --version
2307 Display version information and exit.
2308
2309
2310 -v, --verbose
2311 Increase the verbosity of srun's informational messages. Multi‐
2312 ple -v's will further increase srun's verbosity. By default
2313 only errors will be displayed. This option applies to job and
2314 step allocations.
2315
2316
2317 -W, --wait=<seconds>
2318 Specify how long to wait after the first task terminates before
2319 terminating all remaining tasks. A value of 0 indicates an
2320 unlimited wait (a warning will be issued after 60 seconds). The
2321 default value is set by the WaitTime parameter in the slurm con‐
2322 figuration file (see slurm.conf(5)). This option can be useful
2323 to ensure that a job is terminated in a timely fashion in the
2324 event that one or more tasks terminate prematurely. Note: The
2325 -K, --kill-on-bad-exit option takes precedence over -W, --wait
2326 to terminate the job immediately if a task exits with a non-zero
2327 exit code. This option applies to job allocations.
2328
2329
2330 -w, --nodelist=<host1,host2,... or filename>
2331 Request a specific list of hosts. The job will contain all of
2332 these hosts and possibly additional hosts as needed to satisfy
2333 resource requirements. The list may be specified as a
2334 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
2335 for example), or a filename. The host list will be assumed to
2336 be a filename if it contains a "/" character. If you specify a
2337 minimum node or processor count larger than can be satisfied by
2338 the supplied host list, additional resources will be allocated
2339 on other nodes as needed. Rather than repeating a host name
2340 multiple times, an asterisk and a repetition count may be
2341 appended to a host name. For example "host1,host1" and "host1*2"
2342 are equivalent. If the number of tasks is given and a list of
2343 requested nodes is also given, the number of nodes used from
2344 that list will be reduced to match that of the number of tasks
2345 if the number of nodes in the list is greater than the number of
2346 tasks. This option applies to job and step allocations.
2347
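       Some examples of the host list forms accepted here (host names
       and the executable are placeholders):

              srun -w host1,host2 ./my_app          # explicit list
              srun -w host[1-5,7] ./my_app          # host range expression
              srun -w host1*2 ./my_app              # same as "host1,host1"
              srun -w /path/to/hostfile ./my_app    # "/" implies a filename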
2348
2349 --wckey=<wckey>
2350 Specify wckey to be used with job. If TrackWCKey=no (default)
2351 in the slurm.conf this value is ignored. This option applies to
2352 job allocations.
2353
2354
2355 -X, --disable-status
2356 Disable the display of task status when srun receives a single
2357 SIGINT (Ctrl-C). Instead immediately forward the SIGINT to the
2358 running job. Without this option a second Ctrl-C in one second
2359 is required to forcibly terminate the job and srun will immedi‐
2360 ately exit. May also be set via the environment variable
2361 SLURM_DISABLE_STATUS. This option applies to job allocations.
2362
2363
2364 -x, --exclude=<host1,host2,... or filename>
2365 Request that a specific list of hosts not be included in the
2366 resources allocated to this job. The host list will be assumed
2367 to be a filename if it contains a "/" character. This option
2368 applies to job and step allocations.
2369
2370
2371 --x11[=<all|first|last>]
2372 Sets up X11 forwarding on all, first or last node(s) of the
2373 allocation. This option is only enabled if Slurm was compiled
2374 with X11 support and PrologFlags=x11 is defined in the
2375 slurm.conf. Default is all.
2376
2377
2378 -Z, --no-allocate
2379 Run the specified tasks on a set of nodes without creating a
2380 Slurm "job" in the Slurm queue structure, bypassing the normal
2381 resource allocation step. The list of nodes must be specified
2382 with the -w, --nodelist option. This is a privileged option
2383 only available for the users "SlurmUser" and "root". This option
2384 applies to job allocations.
2385
2386
2387 srun will submit the job request to the slurm job controller, then ini‐
2388 tiate all processes on the remote nodes. If the request cannot be met
2389 immediately, srun will block until the resources are free to run the
2390 job. If the -I (--immediate) option is specified srun will terminate if
2391 resources are not immediately available.
2392
2393 When initiating remote processes srun will propagate the current work‐
2394 ing directory, unless --chdir=<path> is specified, in which case path
2395 will become the working directory for the remote processes.
2396
2397 The -n, -c, and -N options control how CPUs and nodes will be allo‐
2398 cated to the job. When specifying only the number of processes to run
2399 with -n, a default of one CPU per process is allocated. By specifying
2400 the number of CPUs required per task (-c), more than one CPU may be
2401 allocated per process. If the number of nodes is specified with -N,
2402 srun will attempt to allocate at least the number of nodes specified.
2403
2404 Combinations of the above three options may be used to change how pro‐
2405 cesses are distributed across nodes and cpus. For instance, by specify‐
2406 ing both the number of processes and number of nodes on which to run,
2407 the number of processes per node is implied. However, if the number of
2408 CPUs per process is more important, then the number of processes (-n) and
2409 the number of CPUs per process (-c) should be specified.
2410
2411 srun will refuse to allocate more than one process per CPU unless
2412 --overcommit (-O) is also specified.
2413
2414 srun will attempt to meet the above specifications "at a minimum." That
2415 is, if 16 nodes are requested for 32 processes, and some nodes do not
2416 have 2 CPUs, the allocation of nodes will be increased in order to meet
2417 the demand for CPUs. In other words, a minimum of 16 nodes are being
2418 requested. However, if 16 nodes are requested for 15 processes, srun
2419 will consider this an error, as 15 processes cannot run across 16
2420 nodes.
2421
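       The 16-node, 32-process case above could be written as in the
       following sketch (the executable is a placeholder); the second
       command instead emphasizes CPUs per process:

              # 32 processes spread over at least 16 nodes
              srun -N 16 -n 32 ./my_app

              # 8 processes, each allocated 4 CPUs
              srun -n 8 -c 4 ./my_app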
2422
2423 IO Redirection
2424
2425 By default, stdout and stderr will be redirected from all tasks to the
2426 stdout and stderr of srun, and stdin will be redirected from the stan‐
2427 dard input of srun to all remote tasks. If stdin is only to be read by
2428 a subset of the spawned tasks, specifying a file to read from rather
2429 than forwarding stdin from the srun command may be preferable as it
2430 avoids moving and storing data that will never be read.
2431
2432 For OS X, the poll() function does not support stdin, so input from a
2433 terminal is not possible.
2434
2435 This behavior may be changed with the --output, --error, and --input
2436 (-o, -e, -i) options. Valid format specifications for these options are
2437
2438 all       stdout and stderr are redirected from all tasks to srun. stdin is
2439 broadcast to all remote tasks. (This is the default behav‐
2440 ior)
2441
2442 none      stdout and stderr are not received from any task. stdin is
2443 not sent to any task (stdin is closed).
2444
2445 taskid stdout and/or stderr are redirected from only the task with
2446 relative id equal to taskid, where 0 <= taskid < ntasks,
2447 where ntasks is the total number of tasks in the current job
2448 step. stdin is redirected from the stdin of srun to this
2449 same task. This file will be written on the node executing
2450 the task.
2451
2452 filename srun will redirect stdout and/or stderr to the named file
2453 from all tasks. stdin will be redirected from the named file
2454 and broadcast to all tasks in the job. filename refers to a
2455 path on the host that runs srun. Depending on the cluster's
2456 file system layout, this may result in the output appearing
2457 in different places depending on whether the job is run in
2458 batch mode.
2459
2460 filename pattern
2461 srun allows for a filename pattern to be used to generate the
2462 named IO file described above. The following list of format
2463 specifiers may be used in the format string to generate a
2464 filename that will be unique to a given jobid, stepid, node,
2465 or task. In each case, the appropriate number of files are
2466 opened and associated with the corresponding tasks. Note that
2467 any format string containing %t, %n, and/or %N will be writ‐
2468 ten on the node executing the task rather than the node where
2469 srun executes. These format specifiers are not supported on a
2470 BGQ system.
2471
2472 \\ Do not process any of the replacement symbols.
2473
2474 %% The character "%".
2475
2476 %A Job array's master job allocation number.
2477
2478 %a Job array ID (index) number.
2479
2480 %J jobid.stepid of the running job. (e.g. "128.0")
2481
2482 %j jobid of the running job.
2483
2484 %s stepid of the running job.
2485
2486 %N short hostname. This will create a separate IO file
2487 per node.
2488
2489 %n Node identifier relative to current job (e.g. "0" is
2490 the first node of the running job) This will create a
2491 separate IO file per node.
2492
2493 %t task identifier (rank) relative to current job. This
2494 will create a separate IO file per task.
2495
2496 %u User name.
2497
2498 %x Job name.
2499
2500 A number placed between the percent character and format
2501 specifier may be used to zero-pad the result in the IO file‐
2502 name. This number is ignored if the format specifier corre‐
2503 sponds to non-numeric data (%N for example).
2504
2505 Some examples of how the format string may be used for a 4
2506 task job step with a Job ID of 128 and step id of 0 are
2507 included below:
2508
2509 job%J.out job128.0.out
2510
2511 job%4j.out job0128.out
2512
2513 job%j-%2t.out job128-00.out, job128-01.out, ...
2514
2515 PERFORMANCE
2516 Executing srun sends a remote procedure call to slurmctld. If enough
2517 calls from srun or other Slurm client commands that send remote proce‐
2518 dure calls to the slurmctld daemon come in at once, it can result in a
2519 degradation of performance of the slurmctld daemon, possibly resulting
2520 in a denial of service.
2521
2522 Do not run srun or other Slurm client commands that send remote proce‐
2523 dure calls to slurmctld from loops in shell scripts or other programs.
2524 Ensure that programs limit calls to srun to the minimum necessary for
2525 the information you are trying to gather.
2526
2527
2528 INPUT ENVIRONMENT VARIABLES
2529 Some srun options may be set via environment variables. These environ‐
2530 ment variables, along with their corresponding options, are listed
2531 below. Note: Command line options will always override these settings.
2532
2533 PMI_FANOUT This is used exclusively with PMI (MPICH2 and
2534 MVAPICH2) and controls the fanout of data commu‐
2535 nications. The srun command sends messages to
2536 application programs (via the PMI library) and
2537 those applications may be called upon to forward
2538 that data to up to this number of additional
2539 tasks. Higher values offload work from the srun
2540 command to the applications and likely increase
2541 the vulnerability to failures. The default value
2542 is 32.
2543
2544 PMI_FANOUT_OFF_HOST This is used exclusively with PMI (MPICH2 and
2545 MVAPICH2) and controls the fanout of data commu‐
2546 nications. The srun command sends messages to
2547 application programs (via the PMI library) and
2548 those applications may be called upon to forward
2549 that data to additional tasks. By default, srun
2550 sends one message per host and one task on that
2551 host forwards the data to other tasks on that
2552 host up to PMI_FANOUT. If PMI_FANOUT_OFF_HOST is
2553 defined, the user task may be required to forward
2554 the data to tasks on other hosts. Setting
2555 PMI_FANOUT_OFF_HOST may increase performance.
2556 Since more work is performed by the PMI library
2557 loaded by the user application, failures also can
2558 be more common and more difficult to diagnose.
2559
2560 PMI_TIME This is used exclusively with PMI (MPICH2 and
2561 MVAPICH2) and controls how much the communica‐
2562 tions from the tasks to the srun are spread out
2563 in time in order to avoid overwhelming the srun
2564 command with work. The default value is 500
2565 (microseconds) per task. On relatively slow pro‐
2566 cessors or systems with very large processor
2567 counts (and large PMI data sets), higher values
2568 may be required.
2569
2570 SLURM_CONF The location of the Slurm configuration file.
2571
2572 SLURM_ACCOUNT Same as -A, --account
2573
2574 SLURM_ACCTG_FREQ Same as --acctg-freq
2575
2576 SLURM_BCAST Same as --bcast
2577
2578 SLURM_BURST_BUFFER Same as --bb
2579
2580 SLURM_CLUSTERS Same as -M, --clusters
2581
2582 SLURM_COMPRESS Same as --compress
2583
2584 SLURM_CONSTRAINT Same as -C, --constraint
2585
2586 SLURM_CORE_SPEC Same as --core-spec
2587
2588 SLURM_CPU_BIND Same as --cpu-bind
2589
2590 SLURM_CPU_FREQ_REQ Same as --cpu-freq.
2591
2592 SLURM_CPUS_PER_GPU Same as --cpus-per-gpu
2593
2594 SLURM_CPUS_PER_TASK Same as -c, --cpus-per-task
2595
2596 SLURM_DEBUG Same as -v, --verbose
2597
2598 SLURM_DELAY_BOOT Same as --delay-boot
2599
2600 SLURMD_DEBUG Same as -d, --slurmd-debug
2601
2602 SLURM_DEPENDENCY Same as -P, --dependency=<jobid>
2603
2604 SLURM_DISABLE_STATUS Same as -X, --disable-status
2605
2606 SLURM_DIST_PLANESIZE Plane distribution size. Only used if --distribu‐
2607 tion=plane, without =<size>, is set.
2608
2609 SLURM_DISTRIBUTION Same as -m, --distribution
2610
2611 SLURM_EPILOG Same as --epilog
2612
2613 SLURM_EXCLUSIVE Same as --exclusive
2614
       SLURM_EXIT_ERROR      Specifies the exit code generated when a Slurm
                             error occurs (e.g. invalid options). This can be
                             used by a script to distinguish application exit
                             codes from various Slurm error conditions (see
                             the example at the end of this section). Also
                             see SLURM_EXIT_IMMEDIATE.
2620
2621 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
2622 --immediate option is used and resources are not
2623 currently available. This can be used by a
2624 script to distinguish application exit codes from
2625 various Slurm error conditions. Also see
2626 SLURM_EXIT_ERROR.
2627
2628 SLURM_EXPORT_ENV Same as --export
2629
2630 SLURM_GPUS Same as -G, --gpus
2631
2632 SLURM_GPU_BIND Same as --gpu-bind
2633
2634 SLURM_GPU_FREQ Same as --gpu-freq
2635
2636 SLURM_GPUS_PER_NODE Same as --gpus-per-node
2637
2638 SLURM_GPUS_PER_TASK Same as --gpus-per-task
2639
2640 SLURM_GRES_FLAGS Same as --gres-flags
2641
2642 SLURM_HINT Same as --hint
2643
2644 SLURM_GRES Same as --gres. Also see SLURM_STEP_GRES
2645
2646 SLURM_IMMEDIATE Same as -I, --immediate
2647
2648 SLURM_JOB_ID Same as --jobid
2649
2650 SLURM_JOB_NAME Same as -J, --job-name except within an existing
2651 allocation, in which case it is ignored to avoid
2652 using the batch job's name as the name of each
2653 job step.
2654
       SLURM_JOB_NODELIST    Same as -w, --nodelist=<host1,host2,... or
                             filename>. If the job has been resized, ensure
                             that this nodelist is adjusted (or undefined) to
                             avoid job steps being rejected due to down nodes.
2659
       SLURM_JOB_NUM_NODES   (and SLURM_NNODES for backwards compatibility)
                             Same as -N, --nodes. Total number of nodes in
                             the job's resource allocation.
2663
2664 SLURM_KILL_BAD_EXIT Same as -K, --kill-on-bad-exit
2665
2666 SLURM_LABELIO Same as -l, --label
2667
2668 SLURM_MEM_BIND Same as --mem-bind
2669
2670 SLURM_MEM_PER_CPU Same as --mem-per-cpu
2671
2672 SLURM_MEM_PER_GPU Same as --mem-per-gpu
2673
2674 SLURM_MEM_PER_NODE Same as --mem
2675
2676 SLURM_MPI_TYPE Same as --mpi
2677
2678 SLURM_NETWORK Same as --network
2679
2680 SLURM_NO_KILL Same as -k, --no-kill
2681
2682 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2683 Same as -n, --ntasks
2684
2685 SLURM_NTASKS_PER_CORE Same as --ntasks-per-core
2686
2687 SLURM_NTASKS_PER_GPU Same as --ntasks-per-gpu
2688
2689 SLURM_NTASKS_PER_NODE Same as --ntasks-per-node
2690
2691 SLURM_NTASKS_PER_SOCKET
2692 Same as --ntasks-per-socket
2693
2694 SLURM_OPEN_MODE Same as --open-mode
2695
2696 SLURM_OVERCOMMIT Same as -O, --overcommit
2697
2698 SLURM_OVERLAP Same as --overlap
2699
2700 SLURM_PARTITION Same as -p, --partition
2701
       SLURM_PMI_KVS_NO_DUP_KEYS
                             If set, then PMI key-pairs will contain no
                             duplicate keys. MPI can use this variable to
                             inform the PMI library that it will not use
                             duplicate keys so PMI can skip the check for
                             duplicate keys. This is the case for MPICH2 and
                             reduces the overhead of testing for duplicates,
                             improving performance.
2710
2711 SLURM_POWER Same as --power
2712
2713 SLURM_PROFILE Same as --profile
2714
2715 SLURM_PROLOG Same as --prolog
2716
2717 SLURM_QOS Same as --qos
2718
2719 SLURM_REMOTE_CWD Same as -D, --chdir=
2720
2721 SLURM_REQ_SWITCH When a tree topology is used, this defines the
2722 maximum count of switches desired for the job
2723 allocation and optionally the maximum time to
2724 wait for that number of switches. See --switches
2725
2726 SLURM_RESERVATION Same as --reservation
2727
2728 SLURM_RESV_PORTS Same as --resv-ports
2729
2730 SLURM_SIGNAL Same as --signal
2731
2732 SLURM_STDERRMODE Same as -e, --error
2733
2734 SLURM_STDINMODE Same as -i, --input
2735
2736 SLURM_SPREAD_JOB Same as --spread-job
2737
       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
                             If set and non-zero, successive task exit
                             messages with the same exit code will be printed
                             only once.
2742
2743 SLURM_STEP_GRES Same as --gres (only applies to job steps, not to
2744 job allocations). Also see SLURM_GRES
2745
       SLURM_STEP_KILLED_MSG_NODE_ID=ID
                             If set, only the specified node will log when
                             the job or step is killed by a signal.
2749
2750 SLURM_STDOUTMODE Same as -o, --output
2751
2752 SLURM_TASK_EPILOG Same as --task-epilog
2753
2754 SLURM_TASK_PROLOG Same as --task-prolog
2755
2756 SLURM_TEST_EXEC If defined, srun will verify existence of the
2757 executable program along with user execute per‐
2758 mission on the node where srun was called before
2759 attempting to launch it on nodes in the step.
2760
2761 SLURM_THREAD_SPEC Same as --thread-spec
2762
2763 SLURM_THREADS Same as -T, --threads
2764
       SLURM_THREADS_PER_CORE
                             Same as --threads-per-core
2767
2768 SLURM_TIMELIMIT Same as -t, --time
2769
2770 SLURM_UNBUFFEREDIO Same as -u, --unbuffered
2771
2772 SLURM_USE_MIN_NODES Same as --use-min-nodes
2773
2774 SLURM_WAIT Same as -W, --wait
2775
2776 SLURM_WAIT4SWITCH Max time waiting for requested switches. See
2777 --switches
2778
       SLURM_WCKEY           Same as --wckey
2780
2781 SLURM_WHOLE Same as --whole
2782
       SLURM_WORKING_DIR     Same as -D, --chdir
2784
2785 SRUN_EXPORT_ENV Same as --export, and will override any setting
2786 for SLURM_EXPORT_ENV.
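
       As referenced above for SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE, a
       script can reserve exit codes in order to tell Slurm error conditions
       apart from application exit codes. The sketch below assumes a
       hypothetical program ./app; the codes 200 and 201 are arbitrary
       choices.

          #!/bin/bash
          # Reserve distinct exit codes for Slurm error conditions.
          export SLURM_EXIT_ERROR=200
          export SLURM_EXIT_IMMEDIATE=201

          srun --immediate -n1 ./app
          rc=$?
          if [ "$rc" -eq "$SLURM_EXIT_IMMEDIATE" ]; then
              echo "resources were not immediately available"
          elif [ "$rc" -eq "$SLURM_EXIT_ERROR" ]; then
              echo "srun reported a Slurm error"
          else
              echo "./app exited with code $rc"
          fi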
2787
2788

OUTPUT ENVIRONMENT VARIABLES
2791 srun will set some environment variables in the environment of the exe‐
2792 cuting tasks on the remote compute nodes. These environment variables
2793 are:
2794
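       As a brief sketch, a job step can inspect several of the variables
       described below (the script name show_env.sh is illustrative):

          > cat show_env.sh
          #!/bin/sh
          # Each task reports its rank, node-local ID and the node it runs on.
          echo "task $SLURM_PROCID of $SLURM_NTASKS on $SLURMD_NODENAME (local ID $SLURM_LOCALID)"

          > srun -n4 -l ./show_env.sh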
2795
2796 SLURM_*_HET_GROUP_# For a heterogeneous job allocation, the environ‐
2797 ment variables are set separately for each compo‐
2798 nent.
2799
2800 SLURM_CLUSTER_NAME Name of the cluster on which the job is execut‐
2801 ing.
2802
2803 SLURM_CPU_BIND_VERBOSE
2804 --cpu-bind verbosity (quiet,verbose).
2805
2806 SLURM_CPU_BIND_TYPE --cpu-bind type (none,rank,map_cpu:,mask_cpu:).
2807
2808 SLURM_CPU_BIND_LIST --cpu-bind map or mask list (list of Slurm CPU
2809 IDs or masks for this node, CPU_ID = Board_ID x
2810 threads_per_board + Socket_ID x
2811 threads_per_socket + Core_ID x threads_per_core +
2812 Thread_ID).
2813
2814
       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency
                             on the srun command as a numerical frequency in
                             kilohertz, or a coded value for a request of
                             low, medium, highm1 or high for the frequency.
                             See the description of the --cpu-freq option or
                             the SLURM_CPU_FREQ_REQ input environment
                             variable.
2821
2822 SLURM_CPUS_ON_NODE Count of processors available to the job on this
2823 node. Note the select/linear plugin allocates
2824 entire nodes to jobs, so the value indicates the
2825 total count of CPUs on the node. For the
2826 select/cons_res plugin, this number indicates the
2827 number of cores on this node allocated to the
2828 job.
2829
2830 SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if
2831 the --cpus-per-task option is specified.
2832
2833 SLURM_DISTRIBUTION Distribution type for the allocated jobs. Set the
2834 distribution with -m, --distribution.
2835
2836 SLURM_GTIDS Global task IDs running on this node. Zero ori‐
2837 gin and comma separated.
2838
       SLURM_JOB_ACCOUNT     Account name associated with the job allocation.
2840
       SLURM_JOB_CPUS_PER_NODE
                             Number of CPUs per node.
2843
2844 SLURM_JOB_DEPENDENCY Set to value of the --dependency option.
2845
2846 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2847 Job id of the executing job.
2848
2849
2850 SLURM_JOB_NAME Set to the value of the --job-name option or the
2851 command name when srun is used to create a new
2852 job allocation. Not set when srun is used only to
2853 create a job step (i.e. within an existing job
2854 allocation).
2855
2856
2857 SLURM_JOB_PARTITION Name of the partition in which the job is run‐
2858 ning.
2859
2860
2861 SLURM_JOB_QOS Quality Of Service (QOS) of the job allocation.
2862
2863 SLURM_JOB_RESERVATION Advanced reservation containing the job alloca‐
2864 tion, if any.
2865
2866
2867 SLURM_LAUNCH_NODE_IPADDR
2868 IP address of the node from which the task launch
2869 was initiated (where the srun command ran from).
2870
2871 SLURM_LOCALID Node local task ID for the process within a job.
2872
2873
2874 SLURM_MEM_BIND_LIST --mem-bind map or mask list (<list of IDs or
2875 masks for this node>).
2876
2877 SLURM_MEM_BIND_PREFER --mem-bind prefer (prefer).
2878
2879 SLURM_MEM_BIND_SORT Sort free cache pages (run zonesort on Intel KNL
2880 nodes).
2881
2882 SLURM_MEM_BIND_TYPE --mem-bind type (none,rank,map_mem:,mask_mem:).
2883
2884 SLURM_MEM_BIND_VERBOSE
2885 --mem-bind verbosity (quiet,verbose).
2886
2887 SLURM_JOB_NODES Total number of nodes in the job's resource allo‐
2888 cation.
2889
       SLURM_NODE_ALIASES    Sets of node name, communication address and
                             hostname for nodes allocated to the job from the
                             cloud. Each element in the set is colon
                             separated and each set is comma separated. For
                             example:
                             SLURM_NODE_ALIASES=
                             ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2896
2897 SLURM_NODEID The relative node ID of the current node.
2898
2899 SLURM_JOB_NODELIST List of nodes allocated to the job.
2900
2901 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2902 Total number of processes in the current job or
2903 job step.
2904
       SLURM_HET_SIZE        Set to the count of components in the
                             heterogeneous job.
2906
2907 SLURM_PRIO_PROCESS The scheduling priority (nice value) at the time
2908 of job submission. This value is propagated to
2909 the spawned processes.
2910
2911 SLURM_PROCID The MPI rank (or relative process ID) of the cur‐
2912 rent process.
2913
2914 SLURM_SRUN_COMM_HOST IP address of srun communication host.
2915
2916 SLURM_SRUN_COMM_PORT srun communication port.
2917
2918 SLURM_STEP_LAUNCHER_PORT
2919 Step launcher port.
2920
2921 SLURM_STEP_NODELIST List of nodes allocated to the step.
2922
2923 SLURM_STEP_NUM_NODES Number of nodes allocated to the step.
2924
2925 SLURM_STEP_NUM_TASKS Number of processes in the job step or whole het‐
2926 erogeneous job step.
2927
2928 SLURM_STEP_TASKS_PER_NODE
2929 Number of processes per node within the step.
2930
2931 SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
2932 The step ID of the current job.
2933
2934 SLURM_SUBMIT_DIR The directory from which srun was invoked or, if
2935 applicable, the directory specified by the -D,
2936 --chdir option.
2937
       SLURM_SUBMIT_HOST     The hostname of the computer from which srun
                             was invoked.
2940
2941 SLURM_TASK_PID The process ID of the task being started.
2942
2943 SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
2944 Values are comma separated and in the same order
2945 as SLURM_JOB_NODELIST. If two or more consecu‐
2946 tive nodes are to have the same task count, that
2947 count is followed by "(x#)" where "#" is the rep‐
2948 etition count. For example,
2949 "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2950 first three nodes will each execute two tasks and
2951 the fourth node will execute one task.
2952
2953
       SLURM_TOPOLOGY_ADDR   This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the names of the network switches
                             which may be involved in the job's
                             communications, from the system's top level
                             switch down to the leaf switch and ending with
                             the node name. A period is used to separate each
                             hardware component name.
2961
       SLURM_TOPOLOGY_ADDR_PATTERN
                             This is set only if the system has the
                             topology/tree plugin configured. The value will
                             be set to the component types listed in
                             SLURM_TOPOLOGY_ADDR. Each component will be
                             identified as either "switch" or "node". A
                             period is used to separate each hardware
                             component type.
2969
2970 SLURM_UMASK The umask in effect when the job was submitted.
2971
2972 SLURMD_NODENAME Name of the node running the task. In the case of
2973 a parallel job executing on multiple compute
2974 nodes, the various tasks will have this environ‐
2975 ment variable set to different values on each
2976 compute node.
2977
2978 SRUN_DEBUG Set to the logging level of the srun command.
2979 Default value is 3 (info level). The value is
2980 incremented or decremented based upon the --ver‐
2981 bose and --quiet options.
2982

SIGNALS AND ESCAPE SEQUENCES
2985 Signals sent to the srun command are automatically forwarded to the
2986 tasks it is controlling with a few exceptions. The escape sequence
2987 <control-c> will report the state of all tasks associated with the srun
2988 command. If <control-c> is entered twice within one second, then the
2989 associated SIGINT signal will be sent to all tasks and a termination
2990 sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all
2991 spawned tasks. If a third <control-c> is received, the srun program
2992 will be terminated without waiting for remote tasks to exit or their
2993 I/O to complete.
2994
       The escape sequence <control-z> is presently ignored. Our intent is
       for this to put the srun command into a mode where various special
       actions may be invoked.
2998

MPI SUPPORT
       MPI use depends upon the type of MPI being used. There are three
       fundamentally different modes of operation used by these various MPI
       implementations.
3004
3005 1. Slurm directly launches the tasks and performs initialization of
3006 communications through the PMI2 or PMIx APIs. For example: "srun -n16
3007 a.out".
3008
3009 2. Slurm creates a resource allocation for the job and then mpirun
3010 launches tasks using Slurm's infrastructure (OpenMPI).
3011
3012 3. Slurm creates a resource allocation for the job and then mpirun
3013 launches tasks using some mechanism other than Slurm, such as SSH or
3014 RSH. These tasks are initiated outside of Slurm's monitoring or con‐
3015 trol. Slurm's epilog should be configured to purge these tasks when the
3016 job's allocation is relinquished, or the use of pam_slurm_adopt is
3017 highly recommended.
3018
       See https://slurm.schedmd.com/mpi_guide.html for more information on
       the use of these various MPI implementations with Slurm.
3021
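       The first two modes are sketched below. The program name mpi_app is
       illustrative, and the commands assume an MPI library built with the
       corresponding Slurm support (a configured pmix plugin for the first,
       a Slurm-aware mpirun such as Open MPI's for the second).

          # Mode 1: srun launches the tasks and initializes communications
          > srun --mpi=pmix -n16 ./mpi_app

          # Mode 2: create an allocation, then let mpirun launch tasks in it
          > salloc -N2 -n16
          > mpirun ./mpi_app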

MULTIPLE PROGRAM CONFIGURATION
3024 Comments in the configuration file must have a "#" in column one. The
3025 configuration file contains the following fields separated by white
3026 space:
3027
3028 Task rank
3029 One or more task ranks to use this configuration. Multiple val‐
3030 ues may be comma separated. Ranges may be indicated with two
3031 numbers separated with a '-' with the smaller number first (e.g.
3032 "0-4" and not "4-0"). To indicate all tasks not otherwise spec‐
3033 ified, specify a rank of '*' as the last line of the file. If
3034 an attempt is made to initiate a task for which no executable
3035 program is defined, the following error message will be produced
3036 "No executable program specified for this task".
3037
       Executable
              The name of the program to execute. May be a fully qualified
              pathname if desired.
3041
3042 Arguments
3043 Program arguments. The expression "%t" will be replaced with
3044 the task's number. The expression "%o" will be replaced with
3045 the task's offset within this range (e.g. a configured task rank
3046 value of "1-5" would have offset values of "0-4"). Single
3047 quotes may be used to avoid having the enclosed values inter‐
3048 preted. This field is optional. Any arguments for the program
3049 entered on the command line will be added to the arguments spec‐
3050 ified in the configuration file.
3051
3052 For example:
3053 ###################################################################
3054 # srun multiple program configuration file
3055 #
3056 # srun -n8 -l --multi-prog silly.conf
3057 ###################################################################
3058 4-6 hostname
3059 1,7 echo task:%t
3060 0,2-3 echo offset:%o
3061
3062 > srun -n8 -l --multi-prog silly.conf
3063 0: offset:0
3064 1: task:1
3065 2: offset:1
3066 3: offset:2
3067 4: linux15.llnl.gov
3068 5: linux16.llnl.gov
3069 6: linux17.llnl.gov
3070 7: task:7
3071
3072
3073

EXAMPLES
       This simple example demonstrates the execution of the command hostname
       in eight tasks. At least eight processors will be allocated to the job
       (the same as the task count) on however many nodes are required to
       satisfy the request. The output of each task will be preceded by its
       task number. (The machine "dev" in the example below has a total of
       two CPUs per node.)
3082
3083
3084 > srun -n8 -l hostname
3085 0: dev0
3086 1: dev0
3087 2: dev1
3088 3: dev1
3089 4: dev2
3090 5: dev2
3091 6: dev3
3092 7: dev3
3093
3094
3095 The srun -r option is used within a job script to run two job steps on
3096 disjoint nodes in the following example. The script is run using allo‐
3097 cate mode instead of as a batch job in this case.
3098
3099
3100 > cat test.sh
3101 #!/bin/sh
3102 echo $SLURM_JOB_NODELIST
3103 srun -lN2 -r2 hostname
3104 srun -lN2 hostname
3105
3106 > salloc -N4 test.sh
3107 dev[7-10]
3108 0: dev9
3109 1: dev10
3110 0: dev7
3111 1: dev8
3112
3113
3114 The following script runs two job steps in parallel within an allocated
3115 set of nodes.
3116
3117
3118 > cat test.sh
3119 #!/bin/bash
3120 srun -lN2 -n4 -r 2 sleep 60 &
3121 srun -lN2 -r 0 sleep 60 &
3122 sleep 1
3123 squeue
3124 squeue -s
3125 wait
3126
3127 > salloc -N4 test.sh
3128 JOBID PARTITION NAME USER ST TIME NODES NODELIST
3129 65641 batch test.sh grondo R 0:01 4 dev[7-10]
3130
3131 STEPID PARTITION USER TIME NODELIST
3132 65641.0 batch grondo 0:01 dev[7-8]
3133 65641.1 batch grondo 0:01 dev[9-10]
3134
3135
3136 This example demonstrates how one executes a simple MPI job. We use
3137 srun to build a list of machines (nodes) to be used by mpirun in its
3138 required format. A sample command line and the script to be executed
3139 follow.
3140
3141
3142 > cat test.sh
3143 #!/bin/sh
3144 MACHINEFILE="nodes.$SLURM_JOB_ID"
3145
3146 # Generate Machinefile for mpi such that hosts are in the same
3147 # order as if run via srun
3148 #
3149 srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE
3150
3151 # Run using generated Machine file:
3152 mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app
3153
3154 rm $MACHINEFILE
3155
3156 > salloc -N2 -n4 test.sh
3157
3158
       This simple example demonstrates the execution of different jobs on
       different nodes in the same srun. You can do this for any number of
       nodes or any number of jobs. The executables are run on the nodes
       identified by the SLURM_NODEID environment variable, starting at 0 and
       going up to the number of nodes specified on the srun command line.
3164
3165
3166 > cat test.sh
3167 case $SLURM_NODEID in
3168 0) echo "I am running on "
3169 hostname ;;
3170 1) hostname
3171 echo "is where I am running" ;;
3172 esac
3173
3174 > srun -N2 test.sh
3175 dev0
3176 is where I am running
3177 I am running on
3178 dev1
3179
3180
3181 This example demonstrates use of multi-core options to control layout
3182 of tasks. We request that four sockets per node and two cores per
3183 socket be dedicated to the job.
3184
3185
3186 > srun -N2 -B 4-4:2-2 a.out
3187
3188 This example shows a script in which Slurm is used to provide resource
3189 management for a job by executing the various job steps as processors
3190 become available for their dedicated use.
3191
3192
3193 > cat my.script
3194 #!/bin/bash
3195 srun -n4 prog1 &
3196 srun -n3 prog2 &
3197 srun -n1 prog3 &
3198 srun -n1 prog4 &
3199 wait
3200
3201
       This example shows how to launch an application called "server" with
       one task, 16 CPUs and 16 GB of memory (1 GB per CPU) plus another
       application called "client" with 16 tasks, 1 CPU per task (the
       default) and 1 GB of memory per task.
3206
3207
3208 > srun -n1 -c16 --mem-per-cpu=1gb server : -n16 --mem-per-cpu=1gb client
3209

COPYING
3212 Copyright (C) 2006-2007 The Regents of the University of California.
3213 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
3214 Copyright (C) 2008-2010 Lawrence Livermore National Security.
3215 Copyright (C) 2010-2015 SchedMD LLC.
3216
3217 This file is part of Slurm, a resource management program. For
3218 details, see <https://slurm.schedmd.com/>.
3219
3220 Slurm is free software; you can redistribute it and/or modify it under
3221 the terms of the GNU General Public License as published by the Free
3222 Software Foundation; either version 2 of the License, or (at your
3223 option) any later version.
3224
3225 Slurm is distributed in the hope that it will be useful, but WITHOUT
3226 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
3227 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
3228 for more details.
3229

SEE ALSO
       salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1),
       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)
3234
3235
3236
3237January 2021 Slurm Commands srun(1)