sbatch(1)                       Slurm Commands                       sbatch(1)
2
3
4
NAME
       sbatch - Submit a batch script to Slurm.
7
8
SYNOPSIS
       sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html
15
16
DESCRIPTION
       sbatch submits a batch script to Slurm. The batch script may be given
       to sbatch through a file name on the command line, or if no file name
       is specified, sbatch will read in a script from standard input. The
       batch script may contain options preceded with "#SBATCH" before any
       executable commands in the script. sbatch will stop processing
       further #SBATCH directives once the first non-comment non-whitespace
       line has been reached in the script.
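
       A minimal sketch of such a script is shown below; the job name, the
       resource and time values, and ./my_program are placeholders, not
       defaults:

              #!/bin/bash
              # All #SBATCH directives must appear before the first
              # executable command in the script.
              #SBATCH --job-name=demo
              #SBATCH --ntasks=1
              #SBATCH --time=00:10:00
              srun ./my_program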
25
26 sbatch exits immediately after the script is successfully transferred
27 to the Slurm controller and assigned a Slurm job ID. The batch script
       is not necessarily granted resources immediately; it may sit in the
29 queue of pending jobs for some time before its required resources be‐
30 come available.
31
32 By default both standard output and standard error are directed to a
33 file of the name "slurm-%j.out", where the "%j" is replaced with the
34 job allocation number. The file will be generated on the first node of
35 the job allocation. Other than the batch script itself, Slurm does no
36 movement of user files.
37
38 When the job allocation is finally granted for the batch script, Slurm
39 runs a single copy of the batch script on the first node in the set of
40 allocated nodes.
41
42 The following document describes the influence of various options on
43 the allocation of cpus to jobs and tasks.
44 https://slurm.schedmd.com/cpu_management.html
45
46
RETURN VALUE
       sbatch will return 0 on success or an error code on failure.
49
50
SCRIPT PATH RESOLUTION
       The batch script is resolved in the following order:
53
54 1. If script starts with ".", then path is constructed as: current
55 working directory / script
56 2. If script starts with a "/", then path is considered absolute.
57 3. If script is in current working directory.
58 4. If script can be resolved through PATH. See path_resolution(7).
59
60 Current working directory is the calling process working directory un‐
61 less the --chdir argument is passed, which will override the current
62 working directory.
63
64
OPTIONS
       -A, --account=<account>
              Charge resources used by this job to the specified account.
              The account is an arbitrary string. The account name may be
              changed after job submission using the scontrol command.
70
71
72 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
73 Define the job accounting and profiling sampling intervals in
74 seconds. This can be used to override the JobAcctGatherFre‐
75 quency parameter in the slurm.conf file. <datatype>=<interval>
76 specifies the task sampling interval for the jobacct_gather
77 plugin or a sampling interval for a profiling type by the
78 acct_gather_profile plugin. Multiple comma-separated
79 <datatype>=<interval> pairs may be specified. Supported datatype
80 values are:
81
82 task Sampling interval for the jobacct_gather plugins and
83 for task profiling by the acct_gather_profile
84 plugin.
85 NOTE: This frequency is used to monitor memory us‐
86 age. If memory limits are enforced, the highest fre‐
87 quency a user can request is what is configured in
88 the slurm.conf file. It can not be disabled.
89
90 energy Sampling interval for energy profiling using the
91 acct_gather_energy plugin.
92
93 network Sampling interval for infiniband profiling using the
94 acct_gather_interconnect plugin.
95
96 filesystem Sampling interval for filesystem profiling using the
97 acct_gather_filesystem plugin.
98
99
100 The default value for the task sampling interval is 30 seconds.
101 The default value for all other intervals is 0. An interval of
102 0 disables sampling of the specified type. If the task sampling
103 interval is 0, accounting information is collected only at job
104 termination (reducing Slurm interference with the job).
105 Smaller (non-zero) values have a greater impact upon job perfor‐
106 mance, but a value of 30 seconds is not likely to be noticeable
107 for applications having less than 10,000 tasks.
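
              For example, the following submission samples task accounting
              every 15 seconds and energy data every 60 seconds; the
              intervals and the script name my_job.sh are illustrative:

                     sbatch --acctg-freq=task=15,energy=60 my_job.sh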
108
109
110 -a, --array=<indexes>
111 Submit a job array, multiple jobs to be executed with identical
112 parameters. The indexes specification identifies what array in‐
113 dex values should be used. Multiple values may be specified us‐
114 ing a comma separated list and/or a range of values with a "-"
115 separator. For example, "--array=0-15" or "--array=0,6,16-32".
116 A step function can also be specified with a suffix containing a
117 colon and number. For example, "--array=0-15:4" is equivalent to
118 "--array=0,4,8,12". A maximum number of simultaneously running
119 tasks from the job array may be specified using a "%" separator.
120 For example "--array=0-15%4" will limit the number of simultane‐
121 ously running tasks from this job array to 4. The minimum index
              value is 0; the maximum value is one less than the configura‐
123 tion parameter MaxArraySize. NOTE: currently, federated job ar‐
124 rays only run on the local cluster.
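
              A minimal sketch of an array script; ./process_chunk is a
              placeholder for whatever processes one index value:

                     #!/bin/bash
                     #SBATCH --array=0-15%4
                     ./process_chunk "$SLURM_ARRAY_TASK_ID"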
125
126
127 --batch=<list>
128 Nodes can have features assigned to them by the Slurm adminis‐
129 trator. Users can specify which of these features are required
              by their batch script using this option. For example, a job's
131 allocation may include both Intel Haswell and KNL nodes with
132 features "haswell" and "knl" respectively. On such a configura‐
133 tion the batch script would normally benefit by executing on a
134 faster Haswell node. This would be specified using the option
135 "--batch=haswell". The specification can include AND and OR op‐
136 erators using the ampersand and vertical bar separators. For ex‐
137 ample: "--batch=haswell|broadwell" or "--batch=haswell|big_mem‐
138 ory". The --batch argument must be a subset of the job's --con‐
139 straint=<list> argument (i.e. the job can not request only KNL
140 nodes, but require the script to execute on a Haswell node). If
141 the request can not be satisfied from the resources allocated to
142 the job, the batch script will execute on the first node of the
143 job allocation.
144
145
146 --bb=<spec>
147 Burst buffer specification. The form of the specification is
148 system dependent. Also see --bbf. When the --bb option is
149 used, Slurm parses this option and creates a temporary burst
150 buffer script file that is used internally by the burst buffer
151 plugins. See Slurm's burst buffer guide for more information and
152 examples:
153 https://slurm.schedmd.com/burst_buffer.html
154
155
156 --bbf=<file_name>
157 Path of file containing burst buffer specification. The form of
158 the specification is system dependent. These burst buffer di‐
159 rectives will be inserted into the submitted batch script. See
160 Slurm's burst buffer guide for more information and examples:
161 https://slurm.schedmd.com/burst_buffer.html
162
163
164 -b, --begin=<time>
165 Submit the batch script to the Slurm controller immediately,
166 like normal, but tell the controller to defer the allocation of
167 the job until the specified time.
168
169 Time may be of the form HH:MM:SS to run a job at a specific time
170 of day (seconds are optional). (If that time is already past,
171 the next day is assumed.) You may also specify midnight, noon,
172 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
173 suffixed with AM or PM for running in the morning or the
174 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
176 Combine date and time using the following format
177 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
178 count time-units, where the time-units can be seconds (default),
179 minutes, hours, days, or weeks and you can tell Slurm to run the
180 job today with the keyword today and to run the job tomorrow
181 with the keyword tomorrow. The value may be changed after job
182 submission using the scontrol command. For example:
183 --begin=16:00
184 --begin=now+1hour
185 --begin=now+60 (seconds by default)
186 --begin=2010-01-20T12:34:00
187
188
189 Notes on date/time specifications:
190 - Although the 'seconds' field of the HH:MM:SS time specifica‐
191 tion is allowed by the code, note that the poll time of the
192 Slurm scheduler is not precise enough to guarantee dispatch of
193 the job on the exact second. The job will be eligible to start
194 on the next poll following the specified time. The exact poll
195 interval depends on the Slurm scheduler (e.g., 60 seconds with
196 the default sched/builtin).
197 - If no time (HH:MM:SS) is specified, the default is
198 (00:00:00).
199 - If a date is specified without a year (e.g., MM/DD) then the
200 current year is assumed, unless the combination of MM/DD and
201 HH:MM:SS has already passed for that year, in which case the
202 next year is used.
203
204
205 -D, --chdir=<directory>
206 Set the working directory of the batch script to directory be‐
207 fore it is executed. The path can be specified as full path or
208 relative path to the directory where the command is executed.
209
210
211 --cluster-constraint=[!]<list>
212 Specifies features that a federated cluster must have to have a
213 sibling job submitted to it. Slurm will attempt to submit a sib‐
214 ling job to a cluster if it has at least one of the specified
215 features. If the "!" option is included, Slurm will attempt to
216 submit a sibling job to a cluster that has none of the specified
217 features.
218
219
220 -M, --clusters=<string>
221 Clusters to issue commands to. Multiple cluster names may be
222 comma separated. The job will be submitted to the one cluster
223 providing the earliest expected job initiation time. The default
224 value is the current cluster. A value of 'all' will query to run
225 on all clusters. Note the --export option to control environ‐
226 ment variables exported between clusters. Note that the Slur‐
227 mDBD must be up for this option to work properly.
228
229
230 --comment=<string>
231 An arbitrary comment enclosed in double quotes if using spaces
232 or some special characters.
233
234
235 -C, --constraint=<list>
236 Nodes can have features assigned to them by the Slurm adminis‐
237 trator. Users can specify which of these features are required
238 by their job using the constraint option. Only nodes having
239 features matching the job constraints will be used to satisfy
240 the request. Multiple constraints may be specified with AND,
241 OR, matching OR, resource counts, etc. (some operators are not
242 supported on all system types). Supported constraint options
243 include:
244
245 Single Name
246 Only nodes which have the specified feature will be used.
247 For example, --constraint="intel"
248
249 Node Count
250 A request can specify the number of nodes needed with
251 some feature by appending an asterisk and count after the
252 feature name. For example, --nodes=16 --con‐
253 straint="graphics*4 ..." indicates that the job requires
254 16 nodes and that at least four of those nodes must have
255 the feature "graphics."
256
              AND    Only nodes with all of the specified features will be
                     used. The ampersand is used as the AND operator. For
                     example, --constraint="intel&gpu"
260
              OR     Only nodes with at least one of the specified features
                     will be used. The vertical bar is used as the OR
                     operator. For example, --constraint="intel|amd"
264
265 Matching OR
266 If only one of a set of possible options should be used
267 for all allocated nodes, then use the OR operator and en‐
268 close the options within square brackets. For example,
269 --constraint="[rack1|rack2|rack3|rack4]" might be used to
270 specify that all nodes must be allocated on a single rack
271 of the cluster, but any of those four racks can be used.
272
273 Multiple Counts
274 Specific counts of multiple resources may be specified by
275 using the AND operator and enclosing the options within
276 square brackets. For example, --con‐
277 straint="[rack1*2&rack2*4]" might be used to specify that
278 two nodes must be allocated from nodes with the feature
279 of "rack1" and four nodes must be allocated from nodes
280 with the feature "rack2".
281
282 NOTE: This construct does not support multiple Intel KNL
283 NUMA or MCDRAM modes. For example, while --con‐
284 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
285 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
286 Specification of multiple KNL modes requires the use of a
287 heterogeneous job.
288
289 Brackets
290 Brackets can be used to indicate that you are looking for
291 a set of nodes with the different requirements contained
292 within the brackets. For example, --con‐
293 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
294 node with either the "rack1" or "rack2" features and two
295 nodes with the "rack3" feature. The same request without
296 the brackets will try to find a single node that meets
297 those requirements.
298
299 NOTE: Brackets are only reserved for Multiple Counts and
300 Matching OR syntax. AND operators require a count for
301 each feature inside square brackets (i.e.
302 "[quad*2&hemi*1]").
303
              Parentheses
                     Parentheses can be used to group like node features
                     together. For example,
                     --constraint="[(knl&snc4&flat)*4&haswell*1]" might be
                     used to specify that four nodes with the features
                     "knl", "snc4" and "flat" plus one node with the feature
                     "haswell" are required. All options within parentheses
                     should be grouped with AND (e.g. "&") operators.
312
313
314 --container=<path_to_container>
315 Absolute path to OCI container bundle.
316
317
318 --contiguous
319 If set, then the allocated nodes must form a contiguous set.
320
321 NOTE: If SelectPlugin=cons_res this option won't be honored with
322 the topology/tree or topology/3d_torus plugins, both of which
323 can modify the node ordering.
324
325
326 -S, --core-spec=<num>
327 Count of specialized cores per node reserved by the job for sys‐
328 tem operations and not used by the application. The application
329 will not use these cores, but will be charged for their alloca‐
330 tion. Default value is dependent upon the node's configured
331 CoreSpecCount value. If a value of zero is designated and the
332 Slurm configuration option AllowSpecResourcesUsage is enabled,
333 the job will be allowed to override CoreSpecCount and use the
334 specialized resources on nodes it is allocated. This option can
335 not be used with the --thread-spec option.
336
337
338 --cores-per-socket=<cores>
339 Restrict node selection to nodes with at least the specified
340 number of cores per socket. See additional information under -B
341 option above when task/affinity plugin is enabled.
342 NOTE: This option may implicitly set the number of tasks (if -n
343 was not specified) as one task per requested thread.
344
345
346 --cpu-freq=<p1>[-p2[:p3]]
347
348 Request that job steps initiated by srun commands inside this
349 sbatch script be run at some requested frequency if possible, on
350 the CPUs selected for the step on the compute node(s).
351
352 p1 can be [#### | low | medium | high | highm1] which will set
353 the frequency scaling_speed to the corresponding value, and set
354 the frequency scaling_governor to UserSpace. See below for defi‐
355 nition of the values.
356
357 p1 can be [Conservative | OnDemand | Performance | PowerSave]
358 which will set the scaling_governor to the corresponding value.
359 The governor has to be in the list set by the slurm.conf option
360 CpuFreqGovernors.
361
362 When p2 is present, p1 will be the minimum scaling frequency and
363 p2 will be the maximum scaling frequency.
364
              p2 can be [#### | medium | high | highm1]. p2 must be greater
              than p1.
367
368 p3 can be [Conservative | OnDemand | Performance | PowerSave |
369 SchedUtil | UserSpace] which will set the governor to the corre‐
370 sponding value.
371
372 If p3 is UserSpace, the frequency scaling_speed will be set by a
373 power or energy aware scheduling strategy to a value between p1
374 and p2 that lets the job run within the site's power goal. The
375 job may be delayed if p1 is higher than a frequency that allows
376 the job to run within the goal.
377
378 If the current frequency is < min, it will be set to min. Like‐
379 wise, if the current frequency is > max, it will be set to max.
380
381 Acceptable values at present include:
382
383 #### frequency in kilohertz
384
385 Low the lowest available frequency
386
387 High the highest available frequency
388
389 HighM1 (high minus one) will select the next highest
390 available frequency
391
392 Medium attempts to set a frequency in the middle of the
393 available range
394
395 Conservative attempts to use the Conservative CPU governor
396
397 OnDemand attempts to use the OnDemand CPU governor (the de‐
398 fault value)
399
400 Performance attempts to use the Performance CPU governor
401
402 PowerSave attempts to use the PowerSave CPU governor
403
404 UserSpace attempts to use the UserSpace CPU governor
405
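              As an illustration, a frequency range in kilohertz with a
              governor can be requested as below; the numeric values are
              hypothetical and must match frequencies supported on the
              allocated nodes:

                     #SBATCH --cpu-freq=2200000-3000000:OnDemand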
406
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested:
                     SLURM_CPU_FREQ_REQ
411
412 This environment variable can also be used to supply the value
413 for the CPU frequency request if it is set when the 'srun' com‐
414 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value. The form of the environ‐
416 ment variable is the same as the command line. See the ENVIRON‐
417 MENT VARIABLES section for a description of the
418 SLURM_CPU_FREQ_REQ variable.
419
420 NOTE: This parameter is treated as a request, not a requirement.
421 If the job step's node does not support setting the CPU fre‐
422 quency, or the requested value is outside the bounds of the le‐
423 gal frequencies, an error is logged, but the job step is allowed
424 to continue.
425
426 NOTE: Setting the frequency for just the CPUs of the job step
427 implies that the tasks are confined to those CPUs. If task con‐
428 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
429 gin=task/cgroup with the "ConstrainCores" option) is not config‐
430 ured, this parameter is ignored.
431
432 NOTE: When the step completes, the frequency and governor of
433 each selected CPU is reset to the previous values.
434
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly before accounting is able to poll for job information.
              As a result, not all of the accounting information will be
              present.
439
440
441 --cpus-per-gpu=<ncpus>
442 Advise Slurm that ensuing job steps will require ncpus proces‐
443 sors per allocated GPU. Not compatible with the --cpus-per-task
444 option.
445
446
447 -c, --cpus-per-task=<ncpus>
448 Advise the Slurm controller that ensuing job steps will require
449 ncpus number of processors per task. Without this option, the
450 controller will just try to allocate one processor per task.
451
452 For instance, consider an application that has 4 tasks, each re‐
453 quiring 3 processors. If our cluster is comprised of quad-pro‐
454 cessors nodes and we simply ask for 12 processors, the con‐
455 troller might give us only 3 nodes. However, by using the
456 --cpus-per-task=3 options, the controller knows that each task
457 requires 3 processors on the same node, and the controller will
458 grant an allocation of 4 nodes, one for each of the 4 tasks.
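
              A sketch of that scenario; ./my_app is a placeholder, and
              SLURM_CPUS_PER_TASK is set by sbatch because --cpus-per-task
              is given:

                     #!/bin/bash
                     #SBATCH --ntasks=4
                     #SBATCH --cpus-per-task=3
                     srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./my_app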
459
460
461 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
463 (start > (deadline - time[-min])). Default is no deadline.
464 Valid time formats are:
465 HH:MM[:SS] [AM|PM]
466 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
467 MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
469 now[+count[seconds(default)|minutes|hours|days|weeks]]
470
471
472 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
474 specification if the job has been eligible to run for less than
475 this time period. If the job has waited for less than the spec‐
476 ified period, it will use only nodes which already have the
477 specified features. The argument is in units of minutes. A de‐
478 fault value may be set by a system administrator using the de‐
479 lay_boot option of the SchedulerParameters configuration parame‐
480 ter in the slurm.conf file, otherwise the default value is zero
481 (no delay).
482
483
484 -d, --dependency=<dependency_list>
485 Defer the start of this job until the specified dependencies
              have been satisfied. <dependency_list> is of the form
487 <type:job_id[:job_id][,type:job_id[:job_id]]> or
488 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
489 must be satisfied if the "," separator is used. Any dependency
490 may be satisfied if the "?" separator is used. Only one separa‐
491 tor may be used. Many jobs can share the same dependency and
492 these jobs may even belong to different users. The value may
493 be changed after job submission using the scontrol command. De‐
494 pendencies on remote jobs are allowed in a federation. Once a
495 job dependency fails due to the termination state of a preceding
496 job, the dependent job will never be run, even if the preceding
497 job is requeued and has a different termination state in a sub‐
498 sequent execution.
499
500 after:job_id[[+time][:jobid[+time]...]]
501 After the specified jobs start or are cancelled and
502 'time' in minutes from job start or cancellation happens,
503 this job can begin execution. If no 'time' is given then
504 there is no delay after start or cancellation.
505
506 afterany:job_id[:jobid...]
507 This job can begin execution after the specified jobs
508 have terminated.
509
510 afterburstbuffer:job_id[:jobid...]
511 This job can begin execution after the specified jobs
512 have terminated and any associated burst buffer stage out
513 operations have completed.
514
515 aftercorr:job_id[:jobid...]
516 A task of this job array can begin execution after the
517 corresponding task ID in the specified job has completed
518 successfully (ran to completion with an exit code of
519 zero).
520
521 afternotok:job_id[:jobid...]
522 This job can begin execution after the specified jobs
523 have terminated in some failed state (non-zero exit code,
524 node failure, timed out, etc).
525
526 afterok:job_id[:jobid...]
527 This job can begin execution after the specified jobs
528 have successfully executed (ran to completion with an
529 exit code of zero).
530
531 singleton
532 This job can begin execution after any previously
533 launched jobs sharing the same job name and user have
534 terminated. In other words, only one job by that name
535 and owned by that user can be running or suspended at any
536 point in time. In a federation, a singleton dependency
537 must be fulfilled on all clusters unless DependencyParam‐
538 eters=disable_remote_singleton is used in slurm.conf.
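
              For example, a job can be made to start only after another job
              completes successfully by capturing the first job ID with
              --parsable; the script names preprocess.sh and analyze.sh are
              placeholders:

                     first=$(sbatch --parsable preprocess.sh)
                     sbatch --dependency=afterok:$first analyze.sh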
539
540
541 -m, --distribution={*|block|cyclic|arbi‐
542 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
543
544 Specify alternate distribution methods for remote processes.
545 For job allocation, this sets environment variables that will be
546 used by subsequent srun requests and also affects which cores
547 will be selected for job allocation.
548
549 This option controls the distribution of tasks to the nodes on
550 which resources have been allocated, and the distribution of
551 those resources to tasks for binding (task affinity). The first
552 distribution method (before the first ":") controls the distri‐
553 bution of tasks to nodes. The second distribution method (after
554 the first ":") controls the distribution of allocated CPUs
555 across sockets for binding to tasks. The third distribution
556 method (after the second ":") controls the distribution of allo‐
557 cated CPUs across cores for binding to tasks. The second and
558 third distributions apply only if task affinity is enabled. The
559 third distribution is supported only if the task/cgroup plugin
560 is configured. The default value for each distribution type is
561 specified by *.
562
563 Note that with select/cons_res and select/cons_tres, the number
564 of CPUs allocated to each socket and node may be different. Re‐
565 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
566 mation on resource allocation, distribution of tasks to nodes,
567 and binding of tasks to CPUs.
568 First distribution method (distribution of tasks across nodes):
569
570
571 * Use the default method for distributing tasks to nodes
572 (block).
573
574 block The block distribution method will distribute tasks to a
575 node such that consecutive tasks share a node. For exam‐
576 ple, consider an allocation of three nodes each with two
577 cpus. A four-task block distribution request will dis‐
578 tribute those tasks to the nodes with tasks one and two
579 on the first node, task three on the second node, and
580 task four on the third node. Block distribution is the
581 default behavior if the number of tasks exceeds the num‐
582 ber of allocated nodes.
583
584 cyclic The cyclic distribution method will distribute tasks to a
585 node such that consecutive tasks are distributed over
586 consecutive nodes (in a round-robin fashion). For exam‐
587 ple, consider an allocation of three nodes each with two
588 cpus. A four-task cyclic distribution request will dis‐
589 tribute those tasks to the nodes with tasks one and four
590 on the first node, task two on the second node, and task
591 three on the third node. Note that when SelectType is
592 select/cons_res, the same number of CPUs may not be allo‐
593 cated on each node. Task distribution will be round-robin
594 among all the nodes with CPUs yet to be assigned to
595 tasks. Cyclic distribution is the default behavior if
596 the number of tasks is no larger than the number of allo‐
597 cated nodes.
598
599 plane The tasks are distributed in blocks of size <size>. The
600 size must be given or SLURM_DIST_PLANESIZE must be set.
601 The number of tasks distributed to each node is the same
602 as for cyclic distribution, but the taskids assigned to
603 each node depend on the plane size. Additional distribu‐
604 tion specifications cannot be combined with this option.
605 For more details (including examples and diagrams),
606 please see https://slurm.schedmd.com/mc_support.html and
607 https://slurm.schedmd.com/dist_plane.html
608
609 arbitrary
610 The arbitrary method of distribution will allocate pro‐
                    cesses in order as listed in the file designated by the
                    environment variable SLURM_HOSTFILE. If this variable is
                    set it will override any other method specified. If it is
                    not set the method will default to block. The hostfile
                    must contain at a minimum the number of hosts requested,
                    listed one per line or comma separated. If spec‐
617 ifying a task count (-n, --ntasks=<number>), your tasks
618 will be laid out on the nodes in the order of the file.
619 NOTE: The arbitrary distribution option on a job alloca‐
620 tion only controls the nodes to be allocated to the job
621 and not the allocation of CPUs on those nodes. This op‐
622 tion is meant primarily to control a job step's task lay‐
623 out in an existing job allocation for the srun command.
624 NOTE: If the number of tasks is given and a list of re‐
625 quested nodes is also given, the number of nodes used
626 from that list will be reduced to match that of the num‐
627 ber of tasks if the number of nodes in the list is
628 greater than the number of tasks.
629
630
631 Second distribution method (distribution of CPUs across sockets
632 for binding):
633
634
635 * Use the default method for distributing CPUs across sock‐
636 ets (cyclic).
637
638 block The block distribution method will distribute allocated
639 CPUs consecutively from the same socket for binding to
640 tasks, before using the next consecutive socket.
641
642 cyclic The cyclic distribution method will distribute allocated
643 CPUs for binding to a given task consecutively from the
644 same socket, and from the next consecutive socket for the
645 next task, in a round-robin fashion across sockets.
646 Tasks requiring more than one CPU will have all of those
647 CPUs allocated on a single socket if possible.
648
649 fcyclic
650 The fcyclic distribution method will distribute allocated
651 CPUs for binding to tasks from consecutive sockets in a
652 round-robin fashion across the sockets. Tasks requiring
                     more than one CPU will have each of those CPUs allocated in a
654 cyclic fashion across sockets.
655
656
657 Third distribution method (distribution of CPUs across cores for
658 binding):
659
660
661 * Use the default method for distributing CPUs across cores
662 (inherited from second distribution method).
663
664 block The block distribution method will distribute allocated
665 CPUs consecutively from the same core for binding to
666 tasks, before using the next consecutive core.
667
668 cyclic The cyclic distribution method will distribute allocated
669 CPUs for binding to a given task consecutively from the
670 same core, and from the next consecutive core for the
671 next task, in a round-robin fashion across cores.
672
673 fcyclic
674 The fcyclic distribution method will distribute allocated
675 CPUs for binding to tasks from consecutive cores in a
676 round-robin fashion across the cores.
677
678
679
680 Optional control for task distribution over nodes:
681
682
              Pack   Rather than distributing a job step's tasks evenly
684 across its allocated nodes, pack them as tightly as pos‐
685 sible on the nodes. This only applies when the "block"
686 task distribution method is used.
687
688 NoPack Rather than packing a job step's tasks as tightly as pos‐
689 sible on the nodes, distribute them evenly. This user
690 option will supersede the SelectTypeParameters
691 CR_Pack_Nodes configuration parameter.
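
              As an illustrative set of directives (the node and task counts
              are arbitrary), the following distributes tasks round-robin
              across nodes and binds CPUs block-wise within sockets:

                     #SBATCH --nodes=3
                     #SBATCH --ntasks=6
                     #SBATCH --distribution=cyclic:block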
692
693
694 -e, --error=<filename_pattern>
695 Instruct Slurm to connect the batch script's standard error di‐
696 rectly to the file name specified in the "filename pattern". By
697 default both standard output and standard error are directed to
698 the same file. For job arrays, the default file name is
699 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
700 the array index. For other jobs, the default file name is
701 "slurm-%j.out", where the "%j" is replaced by the job ID. See
702 the filename pattern section below for filename specification
703 options.
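
              For a job array, for example, the output and error streams of
              each task could be separated as below; the names follow the
              default pattern and are illustrative:

                     #SBATCH --output=slurm-%A_%a.out
                     #SBATCH --error=slurm-%A_%a.err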
704
705
706 -x, --exclude=<node_name_list>
707 Explicitly exclude certain nodes from the resources granted to
708 the job.
709
710
711 --exclusive[={user|mcs}]
712 The job allocation can not share nodes with other running jobs
713 (or just other users with the "=user" option or with the "=mcs"
714 option). If user/mcs are not specified (i.e. the job allocation
715 can not share nodes with other running jobs), the job is allo‐
716 cated all CPUs and GRES on all nodes in the allocation, but is
717 only allocated as much memory as it requested. This is by design
718 to support gang scheduling, because suspended jobs still reside
719 in memory. To request all the memory on a node, use --mem=0.
720 The default shared/exclusive behavior depends on system configu‐
721 ration and the partition's OverSubscribe option takes precedence
722 over the job's option.
723
724
725 --export={[ALL,]<environment_variables>|ALL|NONE}
726 Identify which environment variables from the submission envi‐
727 ronment are propagated to the launched application. Note that
728 SLURM_* variables are always propagated.
729
730 --export=ALL
731 Default mode if --export is not specified. All of the
732 user's environment will be loaded (either from the
733 caller's environment or from a clean environment if
734 --get-user-env is specified).
735
736 --export=NONE
737 Only SLURM_* variables from the user environment will
738 be defined. User must use absolute path to the binary
739 to be executed that will define the environment. User
740 can not specify explicit environment variables with
741 "NONE". --get-user-env will be ignored.
742
743 This option is particularly important for jobs that
744 are submitted on one cluster and execute on a differ‐
745 ent cluster (e.g. with different paths). To avoid
746 steps inheriting environment export settings (e.g.
747 "NONE") from sbatch command, the environment variable
748 SLURM_EXPORT_ENV should be set to "ALL" in the job
749 script.
750
751 --export=[ALL,]<environment_variables>
752 Exports all SLURM_* environment variables along with
753 explicitly defined variables. Multiple environment
754 variable names should be comma separated. Environment
755 variable names may be specified to propagate the cur‐
756 rent value (e.g. "--export=EDITOR") or specific values
757 may be exported (e.g. "--export=EDITOR=/bin/emacs").
758 If "ALL" is specified, then all user environment vari‐
759 ables will be loaded and will take precedence over any
760 explicitly given environment variables.
761
762 Example: --export=EDITOR,ARG1=test
763 In this example, the propagated environment will only
764 contain the variable EDITOR from the user's environ‐
765 ment, SLURM_* environment variables, and ARG1=test.
766
767 Example: --export=ALL,EDITOR=/bin/emacs
768 There are two possible outcomes for this example. If
769 the caller has the EDITOR environment variable de‐
770 fined, then the job's environment will inherit the
771 variable from the caller's environment. If the caller
772 doesn't have an environment variable defined for EDI‐
773 TOR, then the job's environment will use the value
774 given by --export.
775
776
777 --export-file={<filename>|<fd>}
778 If a number between 3 and OPEN_MAX is specified as the argument
779 to this option, a readable file descriptor will be assumed
780 (STDIN and STDOUT are not supported as valid arguments). Other‐
781 wise a filename is assumed. Export environment variables de‐
782 fined in <filename> or read from <fd> to the job's execution en‐
783 vironment. The content is one or more environment variable defi‐
784 nitions of the form NAME=value, each separated by a null charac‐
785 ter. This allows the use of special characters in environment
786 definitions.
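
              For example, such a file can be built with null separators from
              a shell and then passed to sbatch; the variable names, values
              and file names here are purely illustrative:

                     printf 'APP_MODE=prod\0SCRATCH=/tmp/run1\0' > env.list
                     sbatch --export-file=env.list job.sh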
787
788
789 -B, --extra-node-info=<sockets>[:cores[:threads]]
790 Restrict node selection to nodes with at least the specified
791 number of sockets, cores per socket and/or threads per core.
792 NOTE: These options do not specify the resource allocation size.
793 Each value specified is considered a minimum. An asterisk (*)
794 can be used as a placeholder indicating that all available re‐
795 sources of that type are to be utilized. Values can also be
796 specified as min-max. The individual levels can also be speci‐
797 fied in separate options if desired:
798 --sockets-per-node=<sockets>
799 --cores-per-socket=<cores>
800 --threads-per-core=<threads>
801 If task/affinity plugin is enabled, then specifying an alloca‐
802 tion in this manner also results in subsequently launched tasks
803 being bound to threads if the -B option specifies a thread
804 count, otherwise an option of cores if a core count is speci‐
805 fied, otherwise an option of sockets. If SelectType is config‐
806 ured to select/cons_res, it must have a parameter of CR_Core,
807 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
808 to be honored. If not specified, the scontrol show job will
809 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
810 tions.
811 NOTE: This option is mutually exclusive with --hint,
812 --threads-per-core and --ntasks-per-core.
813 NOTE: This option may implicitly set the number of tasks (if -n
814 was not specified) as one task per requested thread.
815
816
817 --get-user-env[=timeout][mode]
818 This option will tell sbatch to retrieve the login environment
819 variables for the user specified in the --uid option. The envi‐
820 ronment variables are retrieved by running something of this
821 sort "su - <username> -c /usr/bin/env" and parsing the output.
822 Be aware that any environment variables already set in sbatch's
823 environment will take precedence over any environment variables
824 in the user's login environment. Clear any environment variables
825 before calling sbatch that you do not want propagated to the
826 spawned program. The optional timeout value is in seconds. De‐
              fault value is 8 seconds. The optional mode value controls the
              "su" options. With a mode value of "S", "su" is executed
              without the "-" option. With a mode value of "L", "su" is
              executed with the "-" option, replicating the login
              environment. If mode is not specified, the mode established at
              Slurm build time is used. Examples of use include
              "--get-user-env", "--get-user-env=10", "--get-user-env=10L",
              and "--get-user-env=S".
834
835
836 --gid=<group>
837 If sbatch is run as root, and the --gid option is used, submit
838 the job with group's group access permissions. group may be the
839 group name or the numerical group ID.
840
841
842 --gpu-bind=[verbose,]<type>
843 Bind tasks to specific GPUs. By default every spawned task can
844 access every GPU allocated to the step. If "verbose," is speci‐
845 fied before <type>, then print out GPU binding debug information
846 to the stderr of the tasks. GPU binding is ignored if there is
847 only one task.
848
849 Supported type options:
850
851 closest Bind each task to the GPU(s) which are closest. In a
852 NUMA environment, each task may be bound to more than
853 one GPU (i.e. all GPUs in that NUMA environment).
854
855 map_gpu:<list>
                       Bind by mapping GPU IDs to tasks (or ranks) as
                       specified, where <list> is
                       <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
                       are interpreted as decimal values unless they are
                       preceded with '0x', in which case they are interpreted as
861 hexadecimal values. If the number of tasks (or ranks)
862 exceeds the number of elements in this list, elements
863 in the list will be reused as needed starting from the
864 beginning of the list. To simplify support for large
865 task counts, the lists may follow a map with an aster‐
866 isk and repetition count. For example
867 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
868 and ConstrainDevices is set in cgroup.conf, then the
869 GPU IDs are zero-based indexes relative to the GPUs
870 allocated to the job (e.g. the first GPU is 0, even if
871 the global ID is 3). Otherwise, the GPU IDs are global
872 IDs, and all GPUs on each node in the job should be
873 allocated for predictable binding results.
874
875 mask_gpu:<list>
876 Bind by setting GPU masks on tasks (or ranks) as spec‐
877 ified where <list> is
878 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
879 mapping is specified for a node and identical mapping
880 is applied to the tasks on every node (i.e. the lowest
881 task ID on each node is mapped to the first mask spec‐
882 ified in the list, etc.). GPU masks are always inter‐
883 preted as hexadecimal values but can be preceded with
884 an optional '0x'. To simplify support for large task
885 counts, the lists may follow a map with an asterisk
886 and repetition count. For example
887 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
888 is used and ConstrainDevices is set in cgroup.conf,
889 then the GPU IDs are zero-based indexes relative to
890 the GPUs allocated to the job (e.g. the first GPU is
891 0, even if the global ID is 3). Otherwise, the GPU IDs
892 are global IDs, and all GPUs on each node in the job
893 should be allocated for predictable binding results.
894
895 none Do not bind tasks to GPUs (turns off binding if
896 --gpus-per-task is requested).
897
898 per_task:<gpus_per_task>
                       Each task will be bound to the number of GPUs
                       specified in <gpus_per_task>. GPUs are assigned to
                       tasks in order: the first task is assigned the first
                       <gpus_per_task> GPUs on the node, the second task the
                       next <gpus_per_task> GPUs, and so on.
903
904 single:<tasks_per_gpu>
905 Like --gpu-bind=closest, except that each task can
906 only be bound to a single GPU, even when it can be
907 bound to multiple GPUs that are equally close. The
908 GPU to bind to is determined by <tasks_per_gpu>, where
909 the first <tasks_per_gpu> tasks are bound to the first
910 GPU available, the second <tasks_per_gpu> tasks are
911 bound to the second GPU available, etc. This is basi‐
912 cally a block distribution of tasks onto available
913 GPUs, where the available GPUs are determined by the
914 socket affinity of the task and the socket affinity of
915 the GPUs as specified in gres.conf's Cores parameter.
916
917
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
919 Request that GPUs allocated to the job are configured with spe‐
920 cific frequency values. This option can be used to indepen‐
921 dently configure the GPU and its memory frequencies. After the
922 job is completed, the frequencies of all affected GPUs will be
923 reset to the highest possible values. In some cases, system
924 power caps may override the requested values. The field type
925 can be "memory". If type is not specified, the GPU frequency is
926 implied. The value field can either be "low", "medium", "high",
927 "highm1" or a numeric value in megahertz (MHz). If the speci‐
928 fied numeric value is not possible, a value as close as possible
929 will be used. See below for definition of the values. The ver‐
930 bose option causes current GPU frequency information to be
931 logged. Examples of use include "--gpu-freq=medium,memory=high"
932 and "--gpu-freq=450".
933
934 Supported value definitions:
935
936 low the lowest available frequency.
937
938 medium attempts to set a frequency in the middle of the
939 available range.
940
941 high the highest available frequency.
942
943 highm1 (high minus one) will select the next highest avail‐
944 able frequency.
945
946
947 -G, --gpus=[type:]<number>
948 Specify the total number of GPUs required for the job. An op‐
949 tional GPU type specification can be supplied. For example
950 "--gpus=volta:3". Multiple options can be requested in a comma
951 separated list, for example: "--gpus=volta:3,kepler:1". See
952 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
953 options.
954
955
956 --gpus-per-node=[type:]<number>
957 Specify the number of GPUs required for the job on each node in‐
958 cluded in the job's resource allocation. An optional GPU type
959 specification can be supplied. For example
960 "--gpus-per-node=volta:3". Multiple options can be requested in
961 a comma separated list, for example:
962 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
963 --gpus-per-socket and --gpus-per-task options.
964
965
966 --gpus-per-socket=[type:]<number>
967 Specify the number of GPUs required for the job on each socket
968 included in the job's resource allocation. An optional GPU type
969 specification can be supplied. For example
970 "--gpus-per-socket=volta:3". Multiple options can be requested
971 in a comma separated list, for example:
972 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
973 sockets per node count ( --sockets-per-node). See also the
974 --gpus, --gpus-per-node and --gpus-per-task options.
975
976
977 --gpus-per-task=[type:]<number>
978 Specify the number of GPUs required for the job on each task to
979 be spawned in the job's resource allocation. An optional GPU
980 type specification can be supplied. For example
981 "--gpus-per-task=volta:1". Multiple options can be requested in
982 a comma separated list, for example:
983 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
984 --gpus-per-socket and --gpus-per-node options. This option re‐
985 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
986 --gpus-per-task=Y" rather than an ambiguous range of nodes with
987 -N, --nodes. This option will implicitly set
988 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
989 with an explicit --gpu-bind specification.
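
              For example, four tasks with one GPU each might be requested as
              below; the "volta" type name assumes such GPUs are configured
              on the cluster:

                     #SBATCH --ntasks=4
                     #SBATCH --gpus-per-task=volta:1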
990
991
992 --gres=<list>
993 Specifies a comma-delimited list of generic consumable re‐
994 sources. The format of each entry on the list is
995 "name[[:type]:count]". The name is that of the consumable re‐
996 source. The count is the number of those resources with a de‐
997 fault value of 1. The count can have a suffix of "k" or "K"
998 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
999 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
1000 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
1001 x 1024 x 1024 x 1024). The specified resources will be allo‐
              cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator. A list
1004 of available generic consumable resources will be printed and
1005 the command will exit if the option argument is "help". Exam‐
1006 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
1007 "--gres=help".
1008
1009
1010 --gres-flags=<type>
1011 Specify generic resource task binding options.
1012
1013 disable-binding
1014 Disable filtering of CPUs with respect to generic re‐
1015 source locality. This option is currently required to
1016 use more CPUs than are bound to a GRES (i.e. if a GPU is
1017 bound to the CPUs on one socket, but resources on more
1018 than one socket are required to run the job). This op‐
1019 tion may permit a job to be allocated resources sooner
1020 than otherwise possible, but may result in lower job per‐
1021 formance.
1022 NOTE: This option is specific to SelectType=cons_res.
1023
1024 enforce-binding
1025 The only CPUs available to the job will be those bound to
1026 the selected GRES (i.e. the CPUs identified in the
1027 gres.conf file will be strictly enforced). This option
1028 may result in delayed initiation of a job. For example a
1029 job requiring two GPUs and one CPU will be delayed until
1030 both GPUs on a single socket are available rather than
1031 using GPUs bound to separate sockets, however, the appli‐
1032 cation performance may be improved due to improved commu‐
1033 nication speed. Requires the node to be configured with
1034 more than one socket and resource filtering will be per‐
1035 formed on a per-socket basis.
1036 NOTE: This option is specific to SelectType=cons_tres.
1037
1038
1039 -h, --help
1040 Display help information and exit.
1041
1042
1043 --hint=<type>
1044 Bind tasks according to application hints.
1045 NOTE: This option cannot be used in conjunction with
1046 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
1047 fied as a command line argument, it will take precedence over
1048 the environment.
1049
1050 compute_bound
1051 Select settings for compute bound applications: use all
1052 cores in each socket, one thread per core.
1053
1054 memory_bound
1055 Select settings for memory bound applications: use only
1056 one core in each socket, one thread per core.
1057
1058 [no]multithread
1059 [don't] use extra threads with in-core multi-threading
1060 which can benefit communication intensive applications.
1061 Only supported with the task/affinity plugin.
1062
1063 help show this help message
1064
1065
1066 -H, --hold
1067 Specify the job is to be submitted in a held state (priority of
1068 zero). A held job can now be released using scontrol to reset
1069 its priority (e.g. "scontrol release <job_id>").
1070
1071
1072 --ignore-pbs
1073 Ignore all "#PBS" and "#BSUB" options specified in the batch
1074 script.
1075
1076
1077 -i, --input=<filename_pattern>
1078 Instruct Slurm to connect the batch script's standard input di‐
1079 rectly to the file name specified in the "filename pattern".
1080
1081 By default, "/dev/null" is open on the batch script's standard
1082 input and both standard output and standard error are directed
1083 to a file of the name "slurm-%j.out", where the "%j" is replaced
1084 with the job allocation number, as described below in the file‐
1085 name pattern section.
1086
1087
1088 -J, --job-name=<jobname>
1089 Specify a name for the job allocation. The specified name will
1090 appear along with the job id number when querying running jobs
1091 on the system. The default is the name of the batch script, or
1092 just "sbatch" if the script is read on sbatch's standard input.
1093
1094
1095 --kill-on-invalid-dep=<yes|no>
              If a job has an invalid dependency and therefore can never run,
              this parameter tells Slurm whether or not to terminate it. A
              terminated job's state will be JOB_CANCELLED. If this option is
              not specified, the system-wide behavior applies: by default the
              job stays pending with reason DependencyNeverSatisfied, or, if
              kill_invalid_depend is specified in slurm.conf, the job is
              terminated.
1102
1103
1104 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1105 Specification of licenses (or other resources available on all
1106 nodes of the cluster) which must be allocated to this job. Li‐
1107 cense names can be followed by a colon and count (the default
1108 count is one). Multiple license names should be comma separated
1109 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote li‐
1110 censes, those served by the slurmdbd, specify the name of the
1111 server providing the licenses. For example "--license=nas‐
1112 tran@slurmdb:12".
1113
1114
1115 --mail-type=<type>
1116 Notify user by email when certain event types occur. Valid type
1117 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1118 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1119 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1120 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1121 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1122 percent of time limit), TIME_LIMIT_50 (reached 50 percent of
1123 time limit) and ARRAY_TASKS (send emails for each array task).
1124 Multiple type values may be specified in a comma separated list.
1125 The user to be notified is indicated with --mail-user. Unless
1126 the ARRAY_TASKS option is specified, mail notifications on job
1127 BEGIN, END and FAIL apply to a job array as a whole rather than
1128 generating individual email messages for each task in the job
1129 array.
1130
1131
1132 --mail-user=<user>
1133 User to receive email notification of state changes as defined
1134 by --mail-type. The default value is the submitting user.
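
              For example, to receive mail at job completion or failure (the
              address below is a placeholder):

                     #SBATCH --mail-type=END,FAIL
                     #SBATCH --mail-user=user@example.com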
1135
1136
1137 --mcs-label=<mcs>
1138 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the groups of the user. The default value is
              calculated by the mcs plugin if it is enabled.
1141
1142
1143 --mem=<size>[units]
1144 Specify the real memory required per node. Default units are
1145 megabytes. Different units can be specified using the suffix
1146 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1147 is MaxMemPerNode. If configured, both parameters can be seen us‐
1148 ing the scontrol show config command. This parameter would gen‐
1149 erally be used if whole nodes are allocated to jobs (Select‐
1150 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1151 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1152 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1153 fied as command line arguments, then they will take precedence
1154 over the environment.
1155
1156 NOTE: A memory size specification of zero is treated as a spe‐
1157 cial case and grants the job access to all of the memory on each
1158 node. If the job is allocated multiple nodes in a heterogeneous
1159 cluster, the memory limit on each node will be that of the node
1160 in the allocation with the smallest memory size (same limit will
1161 apply to every node in the job's allocation).
1162
1163 NOTE: Enforcement of memory limits currently relies upon the
1164 task/cgroup plugin or enabling of accounting, which samples mem‐
1165 ory use on a periodic basis (data need not be stored, just col‐
1166 lected). In both cases memory use is based upon the job's Resi‐
1167 dent Set Size (RSS). A task may exceed the memory limit until
1168 the next periodic accounting sample.
1169
1170
1171 --mem-bind=[{quiet|verbose},]<type>
1172 Bind tasks to memory. Used only when the task/affinity plugin is
1173 enabled and the NUMA memory functions are available. Note that
1174 the resolution of CPU and memory binding may differ on some ar‐
1175 chitectures. For example, CPU binding may be performed at the
1176 level of the cores within a processor while memory binding will
1177 be performed at the level of nodes, where the definition of
1178 "nodes" may differ from system to system. By default no memory
1179 binding is performed; any task using any CPU can use any memory.
1180 This option is typically used to ensure that each task is bound
1181 to the memory closest to its assigned CPU. The use of any type
1182 other than "none" or "local" is not recommended.
1183
1184 NOTE: To have Slurm always report on the selected memory binding
1185 for all commands executed in a shell, you can enable verbose
1186 mode by setting the SLURM_MEM_BIND environment variable value to
1187 "verbose".
1188
1189 The following informational environment variables are set when
1190 --mem-bind is in use:
1191
1192 SLURM_MEM_BIND_LIST
1193 SLURM_MEM_BIND_PREFER
1194 SLURM_MEM_BIND_SORT
1195 SLURM_MEM_BIND_TYPE
1196 SLURM_MEM_BIND_VERBOSE
1197
1198 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1199 scription of the individual SLURM_MEM_BIND* variables.
1200
1201 Supported options include:
1202
1203 help show this help message
1204
1205 local Use memory local to the processor in use
1206
1207 map_mem:<list>
1208 Bind by setting memory masks on tasks (or ranks) as spec‐
1209 ified where <list> is
1210 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1211 ping is specified for a node and identical mapping is ap‐
1212 plied to the tasks on every node (i.e. the lowest task ID
1213 on each node is mapped to the first ID specified in the
1214 list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x' in which case they are in‐
1216 terpreted as hexadecimal values. If the number of tasks
1217 (or ranks) exceeds the number of elements in this list,
1218 elements in the list will be reused as needed starting
1219 from the beginning of the list. To simplify support for
1220 large task counts, the lists may follow a map with an as‐
1221 terisk and repetition count. For example
1222 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1223 sults, all CPUs for each node in the job should be allo‐
1224 cated to the job.
1225
1226 mask_mem:<list>
1227 Bind by setting memory masks on tasks (or ranks) as spec‐
1228 ified where <list> is
1229 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1230 mapping is specified for a node and identical mapping is
1231 applied to the tasks on every node (i.e. the lowest task
1232 ID on each node is mapped to the first mask specified in
1233 the list, etc.). NUMA masks are always interpreted as
1234 hexadecimal values. Note that masks must be preceded
1235 with a '0x' if they don't begin with [0-9] so they are
1236 seen as numerical values. If the number of tasks (or
1237 ranks) exceeds the number of elements in this list, ele‐
1238 ments in the list will be reused as needed starting from
1239 the beginning of the list. To simplify support for large
1240 task counts, the lists may follow a mask with an asterisk
1241 and repetition count. For example "mask_mem:0*4,1*4".
1242 For predictable binding results, all CPUs for each node
1243 in the job should be allocated to the job.
1244
1245 no[ne] don't bind tasks to memory (default)
1246
1247 p[refer]
1248 Prefer use of first specified NUMA node, but permit
1249 use of other available NUMA nodes.
1250
1251 q[uiet]
1252 quietly bind before task runs (default)
1253
1254 rank bind by task rank (not recommended)
1255
1256 sort sort free cache pages (run zonesort on Intel KNL nodes)
1257
1258 v[erbose]
1259 verbosely report binding before task runs
1260
1261
1262 --mem-per-cpu=<size>[units]
1263 Minimum memory required per allocated CPU. Default units are
1264 megabytes. The default value is DefMemPerCPU and the maximum
1265 value is MaxMemPerCPU (see exception below). If configured, both
1266 parameters can be seen using the scontrol show config command.
1267 Note that if the job's --mem-per-cpu value exceeds the config‐
1268 ured MaxMemPerCPU, then the user's limit will be treated as a
1269 memory limit per task; --mem-per-cpu will be reduced to a value
1270 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1271 value of --cpus-per-task multiplied by the new --mem-per-cpu
1272 value will equal the original --mem-per-cpu value specified by
1273 the user. This parameter would generally be used if individual
1274 processors are allocated to jobs (SelectType=select/cons_res).
1275 If resources are allocated by core, socket, or whole nodes, then
1276 the number of CPUs allocated to a job may be higher than the
1277 task count and the value of --mem-per-cpu should be adjusted ac‐
1278 cordingly. Also see --mem and --mem-per-gpu. The --mem,
1279 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1280
1281 NOTE: If the final amount of memory requested by a job can't be
1282 satisfied by any of the nodes configured in the partition, the
1283 job will be rejected. This could happen if --mem-per-cpu is
1284 used with the --exclusive option for a job allocation and
1285 --mem-per-cpu times the number of CPUs on a node is greater than
1286 the total memory of that node.
1287
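            For illustration (job.sh is a placeholder), the request below
            allocates 4 tasks with 2 CPUs each and 2 gigabytes per CPU,
            i.e. 8 CPUs and 16 gigabytes of memory in total:

                $ sbatch --ntasks=4 --cpus-per-task=2 --mem-per-cpu=2G job.sh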
1288
1289 --mem-per-gpu=<size>[units]
1290 Minimum memory required per allocated GPU. Default units are
1291 megabytes. Different units can be specified using the suffix
1292 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1293 both a global and per partition basis. If configured, the pa‐
1294 rameters can be seen using the scontrol show config and scontrol
1295 show partition commands. Also see --mem. The --mem,
1296 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
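
            For illustration (job.sh is a placeholder), the request below
            allocates two GPUs and 16 gigabytes of memory for each of them:

                $ sbatch --gpus=2 --mem-per-gpu=16G job.sh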
1297
1298
1299 --mincpus=<n>
1300 Specify a minimum number of logical cpus/processors per node.
1301
1302
1303 --network=<type>
1304 Specify information pertaining to the switch or network. The
1305 interpretation of type is system dependent. This option is sup‐
1306 ported when running Slurm on a Cray natively. It is used to re‐
1307 quest using Network Performance Counters. Only one value per
1308               request is valid. All options are case insensitive. In this
1309               configuration, supported values include:
1310
1311 system
1312 Use the system-wide network performance counters. Only
1313 nodes requested will be marked in use for the job alloca‐
1314                    tion. If the job does not fill up the entire system, the
1315                    rest of the nodes are not able to be used by other jobs
1316                    using NPC; if idle, their state will appear as PerfCnts.
1317 These nodes are still available for other jobs not using
1318 NPC.
1319
1320 blade Use the blade network performance counters. Only nodes re‐
1321 quested will be marked in use for the job allocation. If
1322                    the job does not fill up the entire blade(s) allocated to
1323                    the job, those blade(s) are not able to be used by other
1324                    jobs using NPC; if idle, their state will appear as PerfC‐
1325 nts. These nodes are still available for other jobs not
1326 using NPC.
1327
1328
1329 In all cases the job allocation request must specify the
1330 --exclusive option. Otherwise the request will be denied.
1331
1332               Also, with any of these options, steps are not allowed to share
1333 blades, so resources would remain idle inside an allocation if
1334 the step running on a blade does not take up all the nodes on
1335 the blade.
1336
1337 The network option is also supported on systems with IBM's Par‐
1338 allel Environment (PE). See IBM's LoadLeveler job command key‐
1339 word documentation about the keyword "network" for more informa‐
1340 tion. Multiple values may be specified in a comma separated
1341               list. All options are case insensitive. Supported values in‐
1342 clude:
1343
1344 BULK_XFER[=<resources>]
1345 Enable bulk transfer of data using Remote Di‐
1346 rect-Memory Access (RDMA). The optional resources
1347 specification is a numeric value which can have a
1348 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1349 bytes, megabytes or gigabytes. NOTE: The resources
1350 specification is not supported by the underlying IBM
1351 infrastructure as of Parallel Environment version
1352 2.2 and no value should be specified at this time.
1353
1354 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1355 quired. Applies only to IBM Power7-IH processors.
1356 Default value is zero. Independent CAU will be al‐
1357 located for each programming interface (MPI, LAPI,
1358 etc.)
1359
1360 DEVNAME=<name>
1361 Specify the device name to use for communications
1362 (e.g. "eth0" or "mlx4_0").
1363
1364 DEVTYPE=<type>
1365 Specify the device type to use for communications.
1366 The supported values of type are: "IB" (InfiniBand),
1367 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1368 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1369 nel Emulation of HPCE). The devices allocated to a
1370 job must all be of the same type. The default value
1371                          depends upon what hardware is available and,
1372                          in order of preference, is IPONLY (which is not
1373 considered in User Space mode), HFI, IB, HPCE, and
1374 KMUX.
1375
1376                     IMMED=<count>
1377 Number of immediate send slots per window required.
1378 Applies only to IBM Power7-IH processors. Default
1379 value is zero.
1380
1381                     INSTANCES=<count>
1382 Specify number of network connections for each task
1383                          on each network. The default instance
1384 count is 1.
1385
1386 IPV4 Use Internet Protocol (IP) version 4 communications
1387 (default).
1388
1389 IPV6 Use Internet Protocol (IP) version 6 communications.
1390
1391 LAPI Use the LAPI programming interface.
1392
1393 MPI Use the MPI programming interface. MPI is the de‐
1394 fault interface.
1395
1396 PAMI Use the PAMI programming interface.
1397
1398 SHMEM Use the OpenSHMEM programming interface.
1399
1400 SN_ALL Use all available switch networks (default).
1401
1402 SN_SINGLE Use one available switch network.
1403
1404 UPC Use the UPC programming interface.
1405
1406 US Use User Space communications.
1407
1408
1409 Some examples of network specifications:
1410
1411 Instances=2,US,MPI,SN_ALL
1412 Create two user space connections for MPI communica‐
1413 tions on every switch network for each task.
1414
1415 US,MPI,Instances=3,Devtype=IB
1416 Create three user space connections for MPI communi‐
1417 cations on every InfiniBand network for each task.
1418
1419 IPV4,LAPI,SN_Single
1420                          Create an IP version 4 connection for LAPI communica‐
1421 tions on one switch network for each task.
1422
1423 Instances=2,US,LAPI,MPI
1424 Create two user space connections each for LAPI and
1425 MPI communications on every switch network for each
1426 task. Note that SN_ALL is the default option so ev‐
1427 ery switch network is used. Also note that In‐
1428 stances=2 specifies that two connections are estab‐
1429 lished for each protocol (LAPI and MPI) and each
1430 task. If there are two networks and four tasks on
1431 the node then a total of 32 connections are estab‐
1432 lished (2 instances x 2 protocols x 2 networks x 4
1433 tasks).
1434
1435
1436 --nice[=adjustment]
1437 Run the job with an adjusted scheduling priority within Slurm.
1438 With no adjustment value the scheduling priority is decreased by
1439 100. A negative nice value increases the priority, otherwise de‐
1440 creases it. The adjustment range is +/- 2147483645. Only privi‐
1441 leged users can specify a negative adjustment.
1442
1443
1444 -k, --no-kill[=off]
1445 Do not automatically terminate a job if one of the nodes it has
1446 been allocated fails. The user will assume the responsibilities
1447 for fault-tolerance should a node fail. When there is a node
1448 failure, any active job steps (usually MPI jobs) on that node
1449 will almost certainly suffer a fatal error, but with --no-kill,
1450 the job allocation will not be revoked so the user may launch
1451 new job steps on the remaining nodes in their allocation.
1452
1453               Specify an optional argument of "off" to disable the effect
1454               of the SBATCH_NO_KILL environment variable.
1455
1456 By default Slurm terminates the entire job allocation if any
1457 node fails in its range of allocated nodes.
1458
1459
1460 --no-requeue
1461 Specifies that the batch job should never be requeued under any
1462               circumstances. Setting this option will prevent the job from
1463               being restarted by system administrators (for example, after a
1464               scheduled downtime), recovered from a node failure, or requeued
1465               upon preemption by a higher priority job. When a job is re‐
1466 queued, the batch script is initiated from its beginning. Also
1467 see the --requeue option. The JobRequeue configuration parame‐
1468 ter controls the default behavior on the cluster.
1469
1470
1471 -F, --nodefile=<node_file>
1472               Much like --nodelist, but the list is contained in a file named
1473               node_file. The node names in the list may also span multi‐
1474 ple lines in the file. Duplicate node names in the file will
1475 be ignored. The order of the node names in the list is not im‐
1476 portant; the node names will be sorted by Slurm.
1477
1478
1479 -w, --nodelist=<node_name_list>
1480 Request a specific list of hosts. The job will contain all of
1481 these hosts and possibly additional hosts as needed to satisfy
1482 resource requirements. The list may be specified as a
1483 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1484 for example), or a filename. The host list will be assumed to
1485 be a filename if it contains a "/" character. If you specify a
1486 minimum node or processor count larger than can be satisfied by
1487 the supplied host list, additional resources will be allocated
1488 on other nodes as needed. Duplicate node names in the list will
1489 be ignored. The order of the node names in the list is not im‐
1490 portant; the node names will be sorted by Slurm.
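
            For illustration (the node names and the file name are
            placeholders), a specific set of hosts may be requested either
            inline or from a file:

                $ sbatch --nodelist=node[1-2],node5 job.sh
                $ sbatch --nodefile=hosts.txt job.sh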
1491
1492
1493 -N, --nodes=<minnodes>[-maxnodes]
1494 Request that a minimum of minnodes nodes be allocated to this
1495 job. A maximum node count may also be specified with maxnodes.
1496 If only one number is specified, this is used as both the mini‐
1497 mum and maximum node count. The partition's node limits super‐
1498 sede those of the job. If a job's node limits are outside of
1499 the range permitted for its associated partition, the job will
1500 be left in a PENDING state. This permits possible execution at
1501 a later time, when the partition limit is changed. If a job
1502 node limit exceeds the number of nodes configured in the parti‐
1503 tion, the job will be rejected. Note that the environment vari‐
1504 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1505 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1506 tion for more information. If -N is not specified, the default
1507 behavior is to allocate enough nodes to satisfy the requirements
1508 of the -n and -c options. The job will be allocated as many
1509 nodes as possible within the range specified and without delay‐
1510 ing the initiation of the job. The node count specification may
1511 include a numeric value followed by a suffix of "k" (multiplies
1512 numeric value by 1,024) or "m" (multiplies numeric value by
1513 1,048,576).
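
            For illustration (job.sh is a placeholder), the request below
            accepts anywhere from 2 to 4 nodes for its 8 tasks, whichever
            can be allocated without delaying the job:

                $ sbatch -N 2-4 --ntasks=8 job.sh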
1514
1515
1516 -n, --ntasks=<number>
1517               sbatch does not launch tasks; it requests an allocation of re‐
1518 sources and submits a batch script. This option advises the
1519 Slurm controller that job steps run within the allocation will
1520 launch a maximum of number tasks and to provide for sufficient
1521 resources. The default is one task per node, but note that the
1522 --cpus-per-task option will change this default.
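
            For illustration (job.sh is a placeholder), the request below
            reserves resources for up to 16 tasks with 4 CPUs each; the
            tasks themselves are launched by srun inside the script:

                $ sbatch --ntasks=16 --cpus-per-task=4 job.sh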
1523
1524
1525 --ntasks-per-core=<ntasks>
1526 Request the maximum ntasks be invoked on each core. Meant to be
1527 used with the --ntasks option. Related to --ntasks-per-node ex‐
1528 cept at the core level instead of the node level. NOTE: This
1529 option is not supported when using SelectType=select/linear.
1530
1531
1532 --ntasks-per-gpu=<ntasks>
1533 Request that there are ntasks tasks invoked for every GPU. This
1534 option can work in two ways: 1) either specify --ntasks in addi‐
1535 tion, in which case a type-less GPU specification will be auto‐
1536 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1537 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1538 --ntasks, and the total task count will be automatically deter‐
1539 mined. The number of CPUs needed will be automatically in‐
1540 creased if necessary to allow for any calculated task count.
1541 This option will implicitly set --gpu-bind=single:<ntasks>, but
1542 that can be overridden with an explicit --gpu-bind specifica‐
1543 tion. This option is not compatible with a node range (i.e.
1544 -N<minnodes-maxnodes>). This option is not compatible with
1545 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1546 option is not supported unless SelectType=cons_tres is config‐
1547 ured (either directly or indirectly on Cray systems).
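
            For illustration (job.sh is a placeholder), the request below
            allocates 4 GPUs and, at two tasks per GPU, a total of 8 tasks:

                $ sbatch --gpus=4 --ntasks-per-gpu=2 job.sh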
1548
1549
1550 --ntasks-per-node=<ntasks>
1551 Request that ntasks be invoked on each node. If used with the
1552 --ntasks option, the --ntasks option will take precedence and
1553 the --ntasks-per-node will be treated as a maximum count of
1554 tasks per node. Meant to be used with the --nodes option. This
1555 is related to --cpus-per-task=ncpus, but does not require knowl‐
1556 edge of the actual number of cpus on each node. In some cases,
1557 it is more convenient to be able to request that no more than a
1558 specific number of tasks be invoked on each node. Examples of
1559 this include submitting a hybrid MPI/OpenMP app where only one
1560 MPI "task/rank" should be assigned to each node while allowing
1561 the OpenMP portion to utilize all of the parallelism present in
1562 the node, or submitting a single setup/cleanup/monitoring job to
1563 each node of a pre-existing allocation as one step in a larger
1564 job script.
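
            A sketch of the hybrid MPI/OpenMP case (the application name
            hybrid_app is a placeholder): one task per node, with the CPUs
            of each node given to that task's OpenMP threads:

                #!/bin/bash
                #SBATCH --nodes=4
                #SBATCH --ntasks-per-node=1
                #SBATCH --cpus-per-task=16
                export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
                srun ./hybrid_app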
1565
1566
1567 --ntasks-per-socket=<ntasks>
1568 Request the maximum ntasks be invoked on each socket. Meant to
1569 be used with the --ntasks option. Related to --ntasks-per-node
1570 except at the socket level instead of the node level. NOTE:
1571 This option is not supported when using SelectType=select/lin‐
1572 ear.
1573
1574
1575 --open-mode={append|truncate}
1576 Open the output and error files using append or truncate mode as
1577 specified. The default value is specified by the system config‐
1578 uration parameter JobFileAppend.
1579
1580
1581 -o, --output=<filename_pattern>
1582 Instruct Slurm to connect the batch script's standard output di‐
1583 rectly to the file name specified in the "filename pattern". By
1584 default both standard output and standard error are directed to
1585 the same file. For job arrays, the default file name is
1586 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1587 the array index. For other jobs, the default file name is
1588 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1589 the filename pattern section below for filename specification
1590 options.
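
            For illustration (the job name myjob is a placeholder), the
            directives below name the output file after the job name and
            job ID using the replacement symbols described in the filename
            pattern section:

                #SBATCH --job-name=myjob
                #SBATCH --output=%x-%j.out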
1591
1592
1593 -O, --overcommit
1594 Overcommit resources.
1595
1596 When applied to a job allocation (not including jobs requesting
1597 exclusive access to the nodes) the resources are allocated as if
1598 only one task per node is requested. This means that the re‐
1599               quested number of cpus per task (-c, --cpus-per-task) is allo‐
1600 cated per node rather than being multiplied by the number of
1601 tasks. Options used to specify the number of tasks per node,
1602 socket, core, etc. are ignored.
1603
1604 When applied to job step allocations (the srun command when exe‐
1605 cuted within an existing job allocation), this option can be
1606 used to launch more than one task per CPU. Normally, srun will
1607 not allocate more than one process per CPU. By specifying
1608 --overcommit you are explicitly allowing more than one process
1609 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1610 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1611               in the file slurm.h and is not a variable; it is set at Slurm
1612 build time.
1613
1614
1615 -s, --oversubscribe
1616 The job allocation can over-subscribe resources with other run‐
1617 ning jobs. The resources to be over-subscribed can be nodes,
1618 sockets, cores, and/or hyperthreads depending upon configura‐
1619 tion. The default over-subscribe behavior depends on system
1620 configuration and the partition's OverSubscribe option takes
1621 precedence over the job's option. This option may result in the
1622 allocation being granted sooner than if the --oversubscribe op‐
1623 tion was not set and allow higher system utilization, but appli‐
1624 cation performance will likely suffer due to competition for re‐
1625 sources. Also see the --exclusive option.
1626
1627
1628 --parsable
1629 Outputs only the job id number and the cluster name if present.
1630 The values are separated by a semicolon. Errors will still be
1631 displayed.
1632
1633
1634 -p, --partition=<partition_names>
1635 Request a specific partition for the resource allocation. If
1636 not specified, the default behavior is to allow the slurm con‐
1637 troller to select the default partition as designated by the
1638 system administrator. If the job can use more than one parti‐
1639               tion, specify their names in a comma separated list and the one
1640 offering earliest initiation will be used with no regard given
1641 to the partition name ordering (although higher priority parti‐
1642 tions will be considered first). When the job is initiated, the
1643 name of the partition used will be placed first in the job
1644 record partition string.
1645
1646
1647 --power=<flags>
1648 Comma separated list of power management plugin options. Cur‐
1649 rently available flags include: level (all nodes allocated to
1650 the job should have identical power caps, may be disabled by the
1651 Slurm configuration option PowerParameters=job_no_level).
1652
1653
1654 --priority=<value>
1655 Request a specific job priority. May be subject to configura‐
1656 tion specific constraints. value should either be a numeric
1657 value or "TOP" (for highest possible value). Only Slurm opera‐
1658 tors and administrators can set the priority of a job.
1659
1660
1661 --profile={all|none|<type>[,<type>...]}
1662 Enables detailed data collection by the acct_gather_profile
1663 plugin. Detailed data are typically time-series that are stored
1664 in an HDF5 file for the job or an InfluxDB database depending on
1665 the configured plugin.
1666
1667
1668 All All data types are collected. (Cannot be combined with
1669 other values.)
1670
1671
1672 None No data types are collected. This is the default.
1673 (Cannot be combined with other values.)
1674
1675
1676 Valid type values are:
1677
1678
1679 Energy Energy data is collected.
1680
1681
1682 Task Task (I/O, Memory, ...) data is collected.
1683
1684
1685 Lustre Lustre data is collected.
1686
1687
1688 Network
1689 Network (InfiniBand) data is collected.
1690
1691
1692 --propagate[=rlimit[,rlimit...]]
1693 Allows users to specify which of the modifiable (soft) resource
1694 limits to propagate to the compute nodes and apply to their
1695 jobs. If no rlimit is specified, then all resource limits will
1696 be propagated. The following rlimit names are supported by
1697 Slurm (although some options may not be supported on some sys‐
1698 tems):
1699
1700 ALL All limits listed below (default)
1701
1702 NONE No limits listed below
1703
1704 AS The maximum address space (virtual memory) for a
1705 process.
1706
1707 CORE The maximum size of core file
1708
1709 CPU The maximum amount of CPU time
1710
1711 DATA The maximum size of a process's data segment
1712
1713 FSIZE The maximum size of files created. Note that if the
1714 user sets FSIZE to less than the current size of the
1715 slurmd.log, job launches will fail with a 'File size
1716 limit exceeded' error.
1717
1718 MEMLOCK The maximum size that may be locked into memory
1719
1720 NOFILE The maximum number of open files
1721
1722 NPROC The maximum number of processes available
1723
1724 RSS The maximum resident set size. Note that this only has
1725 effect with Linux kernels 2.4.30 or older or BSD.
1726
1727 STACK The maximum stack size
1728
1729
1730 -q, --qos=<qos>
1731 Request a quality of service for the job. QOS values can be de‐
1732 fined for each user/cluster/account association in the Slurm
1733 database. Users will be limited to their association's defined
1734 set of qos's when the Slurm configuration parameter, Account‐
1735 ingStorageEnforce, includes "qos" in its definition.
1736
1737
1738 -Q, --quiet
1739 Suppress informational messages from sbatch such as Job ID. Only
1740 errors will still be displayed.
1741
1742
1743 --reboot
1744 Force the allocated nodes to reboot before starting the job.
1745 This is only supported with some system configurations and will
1746 otherwise be silently ignored. Only root, SlurmUser or admins
1747 can reboot nodes.
1748
1749
1750 --requeue
1751 Specifies that the batch job should be eligible for requeuing.
1752 The job may be requeued explicitly by a system administrator,
1753 after node failure, or upon preemption by a higher priority job.
1754 When a job is requeued, the batch script is initiated from its
1755 beginning. Also see the --no-requeue option. The JobRequeue
1756 configuration parameter controls the default behavior on the
1757 cluster.
1758
1759
1760 --reservation=<reservation_names>
1761 Allocate resources for the job from the named reservation. If
1762 the job can use more than one reservation, specify their names
1763               in a comma separated list and the one offering the earliest
1764               initiation will be used. Each reservation will be considered
1765               in the order it was requested. All reservations will be listed
1766               in scontrol/squeue through the life of the job. In accounting,
1767               the first reservation will be seen, and after the job starts,
1768               the reservation used will replace it.
1769
1770
1771 --signal=[{R|B}:]<sig_num>[@sig_time]
1772 When a job is within sig_time seconds of its end time, send it
1773 the signal sig_num. Due to the resolution of event handling by
1774 Slurm, the signal may be sent up to 60 seconds earlier than
1775 specified. sig_num may either be a signal number or name (e.g.
1776 "10" or "USR1"). sig_time must have an integer value between 0
1777 and 65535. By default, no signal is sent before the job's end
1778 time. If a sig_num is specified without any sig_time, the de‐
1779 fault time will be 60 seconds. Use the "B:" option to signal
1780               only the batch shell; none of the other processes will be sig‐
1781 naled. By default all job steps will be signaled, but not the
1782 batch shell itself. Use the "R:" option to allow this job to
1783 overlap with a reservation with MaxStartDelay set. To have the
1784 signal sent at preemption time see the preempt_send_user_signal
1785 SlurmctldParameter.
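
            A sketch of catching the signal in the batch shell (my_app and
            the cleanup action are placeholders): because "B:" signals only
            the shell, the step is started in the background and waited on
            so the trap can run:

                #!/bin/bash
                #SBATCH --time=01:00:00
                #SBATCH --signal=B:USR1@300
                trap 'echo "USR1 received, saving state"' USR1
                srun ./my_app &
                wait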
1786
1787
1788 --sockets-per-node=<sockets>
1789 Restrict node selection to nodes with at least the specified
1790 number of sockets. See additional information under -B option
1791 above when task/affinity plugin is enabled.
1792 NOTE: This option may implicitly set the number of tasks (if -n
1793 was not specified) as one task per requested thread.
1794
1795
1796 --spread-job
1797 Spread the job allocation over as many nodes as possible and at‐
1798 tempt to evenly distribute tasks across the allocated nodes.
1799 This option disables the topology/tree plugin.
1800
1801
1802 --switches=<count>[@max-time]
1803 When a tree topology is used, this defines the maximum count of
1804 leaf switches desired for the job allocation and optionally the
1805 maximum time to wait for that number of switches. If Slurm finds
1806 an allocation containing more switches than the count specified,
1807 the job remains pending until it either finds an allocation with
1808               desired switch count or the time limit expires. If there is no
1809 switch count limit, there is no delay in starting the job. Ac‐
1810 ceptable time formats include "minutes", "minutes:seconds",
1811 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1812 "days-hours:minutes:seconds". The job's maximum time delay may
1813 be limited by the system administrator using the SchedulerParam‐
1814 eters configuration parameter with the max_switch_wait parameter
1815 option. On a dragonfly network the only switch count supported
1816 is 1 since communication performance will be highest when a job
1817               is allocated resources on one leaf switch or more than 2 leaf
1818               switches. The default max-time is the max_switch_wait Sched‐
1819               ulerParameters value.
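
            For illustration (job.sh is a placeholder), the request below
            asks for 16 nodes on a single leaf switch and is willing to
            wait up to 60 minutes for such a placement:

                $ sbatch -N 16 --switches=1@60 job.sh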
1820
1821
1822 --test-only
1823 Validate the batch script and return an estimate of when a job
1824 would be scheduled to run given the current job queue and all
1825 the other arguments specifying the job requirements. No job is
1826 actually submitted.
1827
1828
1829 --thread-spec=<num>
1830 Count of specialized threads per node reserved by the job for
1831 system operations and not used by the application. The applica‐
1832 tion will not use these threads, but will be charged for their
1833 allocation. This option can not be used with the --core-spec
1834 option.
1835
1836
1837 --threads-per-core=<threads>
1838 Restrict node selection to nodes with at least the specified
1839 number of threads per core. In task layout, use the specified
1840 maximum number of threads per core. NOTE: "Threads" refers to
1841 the number of processing units on each core rather than the num‐
1842 ber of application tasks to be launched per core. See addi‐
1843 tional information under -B option above when task/affinity
1844 plugin is enabled.
1845 NOTE: This option may implicitly set the number of tasks (if -n
1846 was not specified) as one task per requested thread.
1847
1848
1849 -t, --time=<time>
1850 Set a limit on the total run time of the job allocation. If the
1851 requested time limit exceeds the partition's time limit, the job
1852 will be left in a PENDING state (possibly indefinitely). The
1853 default time limit is the partition's default time limit. When
1854 the time limit is reached, each task in each job step is sent
1855 SIGTERM followed by SIGKILL. The interval between signals is
1856 specified by the Slurm configuration parameter KillWait. The
1857 OverTimeLimit configuration parameter may permit the job to run
1858 longer than scheduled. Time resolution is one minute and second
1859 values are rounded up to the next minute.
1860
1861 A time limit of zero requests that no time limit be imposed.
1862 Acceptable time formats include "minutes", "minutes:seconds",
1863 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1864 "days-hours:minutes:seconds".
1865
1866
1867 --time-min=<time>
1868 Set a minimum time limit on the job allocation. If specified,
1869 the job may have its --time limit lowered to a value no lower
1870 than --time-min if doing so permits the job to begin execution
1871 earlier than otherwise possible. The job's time limit will not
1872 be changed after the job is allocated resources. This is per‐
1873 formed by a backfill scheduling algorithm to allocate resources
1874 otherwise reserved for higher priority jobs. Acceptable time
1875 formats include "minutes", "minutes:seconds", "hours:min‐
1876 utes:seconds", "days-hours", "days-hours:minutes" and
1877 "days-hours:minutes:seconds".
1878
1879
1880 --tmp=<size>[units]
1881 Specify a minimum amount of temporary disk space per node. De‐
1882 fault units are megabytes. Different units can be specified us‐
1883 ing the suffix [K|M|G|T].
1884
1885
1886 --uid=<user>
1887 Attempt to submit and/or run a job as user instead of the invok‐
1888 ing user id. The invoking user's credentials will be used to
1889 check access permissions for the target partition. User root may
1890 use this option to run jobs as a normal user in a RootOnly par‐
1891 tition for example. If run as root, sbatch will drop its permis‐
1892 sions to the uid specified after node allocation is successful.
1893 user may be the user name or numerical user ID.
1894
1895
1896 --usage
1897 Display brief help message and exit.
1898
1899
1900 --use-min-nodes
1901 If a range of node counts is given, prefer the smaller count.
1902
1903
1904 -v, --verbose
1905 Increase the verbosity of sbatch's informational messages. Mul‐
1906 tiple -v's will further increase sbatch's verbosity. By default
1907 only errors will be displayed.
1908
1909
1910 -V, --version
1911 Display version information and exit.
1912
1913
1914 -W, --wait
1915 Do not exit until the submitted job terminates. The exit code
1916 of the sbatch command will be the same as the exit code of the
1917 submitted job. If the job terminated due to a signal rather than
1918 a normal exit, the exit code will be set to 1. In the case of a
1919 job array, the exit code recorded will be the highest value for
1920 any task in the job array.
1921
1922
1923 --wait-all-nodes=<value>
1924 Controls when the execution of the command begins. By default
1925 the job will begin execution as soon as the allocation is made.
1926
1927 0 Begin execution as soon as allocation can be made. Do not
1928 wait for all nodes to be ready for use (i.e. booted).
1929
1930 1 Do not begin execution until all nodes are ready for use.
1931
1932
1933 --wckey=<wckey>
1934 Specify wckey to be used with job. If TrackWCKey=no (default)
1935 in the slurm.conf this value is ignored.
1936
1937
1938 --wrap=<command_string>
1939 Sbatch will wrap the specified command string in a simple "sh"
1940 shell script, and submit that script to the slurm controller.
1941 When --wrap is used, a script name and arguments may not be
1942 specified on the command line; instead the sbatch-generated
1943 wrapper script is used.
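
            For illustration, a one-line command can be submitted without
            writing a script file:

                $ sbatch -N1 --wrap="hostname; sleep 60"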
1944
1945
1946filename pattern
1947 sbatch allows for a filename pattern to contain one or more replacement
1948 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1949
1950 \\ Do not process any of the replacement symbols.
1951
1952 %% The character "%".
1953
1954 %A Job array's master job allocation number.
1955
1956 %a Job array ID (index) number.
1957
1958 %J jobid.stepid of the running job. (e.g. "128.0")
1959
1960 %j jobid of the running job.
1961
1962 %N short hostname. This will create a separate IO file per node.
1963
1964 %n Node identifier relative to current job (e.g. "0" is the first
1965        node of the running job). This will create a separate IO file per
1966 node.
1967
1968 %s stepid of the running job.
1969
1970 %t task identifier (rank) relative to current job. This will create
1971 a separate IO file per task.
1972
1973 %u User name.
1974
1975 %x Job name.
1976
1977 A number placed between the percent character and format specifier may
1978 be used to zero-pad the result in the IO filename. This number is ig‐
1979 nored if the format specifier corresponds to non-numeric data (%N for
1980 example).
1981
1982 Some examples of how the format string may be used for a 4 task job
1983 step with a Job ID of 128 and step id of 0 are included below:
1984
1985 job%J.out job128.0.out
1986
1987 job%4j.out job0128.out
1988
1989 job%j-%2t.out job128-00.out, job128-01.out, ...
1990
1991PERFORMANCE
1992 Executing sbatch sends a remote procedure call to slurmctld. If enough
1993 calls from sbatch or other Slurm client commands that send remote pro‐
1994 cedure calls to the slurmctld daemon come in at once, it can result in
1995 a degradation of performance of the slurmctld daemon, possibly result‐
1996 ing in a denial of service.
1997
1998 Do not run sbatch or other Slurm client commands that send remote pro‐
1999 cedure calls to slurmctld from loops in shell scripts or other pro‐
2000 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
2001 sary for the information you are trying to gather.
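
            For illustration (process.sh and the index range are
            placeholders), a job array submits many similar runs with a
            single call to sbatch rather than one call per iteration of a
            shell loop:

                # avoid: one remote procedure call per iteration
                $ for i in $(seq 1 100); do sbatch process.sh $i; done

                # prefer: one call covering all 100 runs (see the -a, --array option)
                $ sbatch --array=1-100 process.sh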
2002
2003
2004INPUT ENVIRONMENT VARIABLES
2005 Upon startup, sbatch will read and handle the options set in the fol‐
2006 lowing environment variables. The majority of these variables are set
2007 the same way the options are set, as defined above. For flag options
2008 that are defined to expect no argument, the option can be enabled by
2009 setting the environment variable without a value (empty or NULL
2010 string), the string 'yes', or a non-zero number. Any other value for
2011 the environment variable will result in the option not being set.
2012 There are a couple exceptions to these rules that are noted below.
2013 NOTE: Environment variables will override any options set in a batch
2014 script, and command line options will override any environment vari‐
2015 ables.
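
            For illustration (job.sh is a placeholder), the precedence
            rules work out as follows:

                $ export SBATCH_TIMELIMIT=30
                $ sbatch job.sh              # 30 minute limit from the environment
                $ sbatch --time=60 job.sh    # 60 minute limit, command line wins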
2016
2017
2018 SBATCH_ACCOUNT Same as -A, --account
2019
2020 SBATCH_ACCTG_FREQ Same as --acctg-freq
2021
2022 SBATCH_ARRAY_INX Same as -a, --array
2023
2024 SBATCH_BATCH Same as --batch
2025
2026 SBATCH_CLUSTERS or SLURM_CLUSTERS
2027 Same as --clusters
2028
2029 SBATCH_CONSTRAINT Same as -C, --constraint
2030
2031 SBATCH_CONTAINER Same as --container.
2032
2033 SBATCH_CORE_SPEC Same as --core-spec
2034
2035 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
2036
2037 SBATCH_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
2038 disable or enable the option.
2039
2040 SBATCH_DELAY_BOOT Same as --delay-boot
2041
2042 SBATCH_DISTRIBUTION Same as -m, --distribution
2043
2044 SBATCH_EXCLUSIVE Same as --exclusive
2045
2046 SBATCH_EXPORT Same as --export
2047
2048 SBATCH_GET_USER_ENV Same as --get-user-env
2049
2050 SBATCH_GPU_BIND Same as --gpu-bind
2051
2052 SBATCH_GPU_FREQ Same as --gpu-freq
2053
2054 SBATCH_GPUS Same as -G, --gpus
2055
2056 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
2057
2058 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
2059
2060 SBATCH_GRES Same as --gres
2061
2062 SBATCH_GRES_FLAGS Same as --gres-flags
2063
2064 SBATCH_HINT or SLURM_HINT
2065 Same as --hint
2066
2067 SBATCH_IGNORE_PBS Same as --ignore-pbs
2068
2069 SBATCH_JOB_NAME Same as -J, --job-name
2070
2071 SBATCH_MEM_BIND Same as --mem-bind
2072
2073 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
2074
2075 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
2076
2077 SBATCH_MEM_PER_NODE Same as --mem
2078
2079 SBATCH_NETWORK Same as --network
2080
2081 SBATCH_NO_KILL Same as -k, --no-kill
2082
2083 SBATCH_NO_REQUEUE Same as --no-requeue
2084
2085 SBATCH_OPEN_MODE Same as --open-mode
2086
2087 SBATCH_OVERCOMMIT Same as -O, --overcommit
2088
2089 SBATCH_PARTITION Same as -p, --partition
2090
2091 SBATCH_POWER Same as --power
2092
2093 SBATCH_PROFILE Same as --profile
2094
2095 SBATCH_QOS Same as --qos
2096
2097 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
2098 maximum count of switches desired for the job al‐
2099 location and optionally the maximum time to wait
2100 for that number of switches. See --switches
2101
2102 SBATCH_REQUEUE Same as --requeue
2103
2104 SBATCH_RESERVATION Same as --reservation
2105
2106 SBATCH_SIGNAL Same as --signal
2107
2108 SBATCH_SPREAD_JOB Same as --spread-job
2109
2110 SBATCH_THREAD_SPEC Same as --thread-spec
2111
2112 SBATCH_THREADS_PER_CORE
2113 Same as --threads-per-core
2114
2115 SBATCH_TIMELIMIT Same as -t, --time
2116
2117 SBATCH_USE_MIN_NODES Same as --use-min-nodes
2118
2119 SBATCH_WAIT Same as -W, --wait
2120
2121 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
2122 to disable or enable the option.
2123
2124 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
2125 --switches
2126
2127 SBATCH_WCKEY Same as --wckey
2128
2129 SLURM_CONF The location of the Slurm configuration file.
2130
2131 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2132 error occurs (e.g. invalid options). This can be
2133 used by a script to distinguish application exit
2134 codes from various Slurm error conditions.
2135
2136 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2137 If set, only the specified node will log when the
2138 job or step are killed by a signal.
2139
2140
2141OUTPUT ENVIRONMENT VARIABLES
2142 The Slurm controller will set the following variables in the environ‐
2143 ment of the batch script.
2144
2145 SBATCH_MEM_BIND
2146 Set to value of the --mem-bind option.
2147
2148 SBATCH_MEM_BIND_LIST
2149 Set to bit mask used for memory binding.
2150
2151 SBATCH_MEM_BIND_PREFER
2152 Set to "prefer" if the --mem-bind option includes the prefer op‐
2153 tion.
2154
2155 SBATCH_MEM_BIND_TYPE
2156 Set to the memory binding type specified with the --mem-bind op‐
2157               tion. Possible values are "none", "rank", "map_mem", "mask_mem"
2158 and "local".
2159
2160 SBATCH_MEM_BIND_VERBOSE
2161 Set to "verbose" if the --mem-bind option includes the verbose
2162 option. Set to "quiet" otherwise.
2163
2164 SLURM_*_HET_GROUP_#
2165 For a heterogeneous job allocation, the environment variables
2166 are set separately for each component.
2167
2168 SLURM_ARRAY_JOB_ID
2169 Job array's master job ID number.
2170
2171 SLURM_ARRAY_TASK_COUNT
2172 Total number of tasks in a job array.
2173
2174 SLURM_ARRAY_TASK_ID
2175 Job array ID (index) number.
2176
2177 SLURM_ARRAY_TASK_MAX
2178 Job array's maximum ID (index) number.
2179
2180 SLURM_ARRAY_TASK_MIN
2181 Job array's minimum ID (index) number.
2182
2183 SLURM_ARRAY_TASK_STEP
2184 Job array's index step size.
2185
2186 SLURM_CLUSTER_NAME
2187 Name of the cluster on which the job is executing.
2188
2189 SLURM_CPUS_ON_NODE
2190 Number of CPUs allocated to the batch step. NOTE: The se‐
2191 lect/linear plugin allocates entire nodes to jobs, so the value
2192 indicates the total count of CPUs on the node. For the se‐
2193           lect/cons_res and select/cons_tres plugins, this number indicates the
2194 number of CPUs on this node allocated to the step.
2195
2196 SLURM_CPUS_PER_GPU
2197 Number of CPUs requested per allocated GPU. Only set if the
2198 --cpus-per-gpu option is specified.
2199
2200 SLURM_CPUS_PER_TASK
2201 Number of cpus requested per task. Only set if the
2202 --cpus-per-task option is specified.
2203
2204 SLURM_CONTAINER
2205 OCI Bundle for job. Only set if --container is specified.
2206
2207 SLURM_DIST_PLANESIZE
2208 Plane distribution size. Only set for plane distributions. See
2209 -m, --distribution.
2210
2211 SLURM_DISTRIBUTION
2212 Same as -m, --distribution
2213
2214 SLURM_EXPORT_ENV
2215 Same as --export.
2216
2217 SLURM_GPU_BIND
2218 Requested binding of tasks to GPU. Only set if the --gpu-bind
2219 option is specified.
2220
2221 SLURM_GPU_FREQ
2222 Requested GPU frequency. Only set if the --gpu-freq option is
2223 specified.
2224
2225 SLURM_GPUS
2226 Number of GPUs requested. Only set if the -G, --gpus option is
2227 specified.
2228
2229 SLURM_GPUS_ON_NODE
2230 Number of GPUs allocated to the batch step.
2231
2232 SLURM_GPUS_PER_NODE
2233 Requested GPU count per allocated node. Only set if the
2234 --gpus-per-node option is specified.
2235
2236 SLURM_GPUS_PER_SOCKET
2237 Requested GPU count per allocated socket. Only set if the
2238 --gpus-per-socket option is specified.
2239
2240 SLURM_GPUS_PER_TASK
2241 Requested GPU count per allocated task. Only set if the
2242 --gpus-per-task option is specified.
2243
2244 SLURM_GTIDS
2245 Global task IDs running on this node. Zero origin and comma
2246 separated. It is read internally by pmi if Slurm was built with
2247 pmi support. Leaving the variable set may cause problems when
2248 using external packages from within the job (Abaqus and Ansys
2249 have been known to have problems when it is set - consult the
2250 appropriate documentation for 3rd party software).
2251
2252 SLURM_HET_SIZE
2253 Set to count of components in heterogeneous job.
2254
2255 SLURM_JOB_ACCOUNT
2256           Account name associated with the job allocation.
2257
2258 SLURM_JOB_ID
2259 The ID of the job allocation.
2260
2261 SLURM_JOB_CPUS_PER_NODE
2262 Count of CPUs available to the job on the nodes in the alloca‐
2263 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
2264 [(xnumber_of_nodes)] ...]. For example:
2265 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
2266 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
2267 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
2268 select/linear plugin allocates entire nodes to jobs, so the
2269 value indicates the total count of CPUs on allocated nodes. The
2270 select/cons_res and select/cons_tres plugins allocate individual
2271 CPUs to jobs, so this number indicates the number of CPUs allo‐
2272 cated to the job.
2273
2274 SLURM_JOB_DEPENDENCY
2275 Set to value of the --dependency option.
2276
2277 SLURM_JOB_NAME
2278 Name of the job.
2279
2280 SLURM_JOB_NODELIST
2281 List of nodes allocated to the job.
2282
2283 SLURM_JOB_NUM_NODES
2284 Total number of nodes in the job's resource allocation.
2285
2286 SLURM_JOB_PARTITION
2287 Name of the partition in which the job is running.
2288
2289 SLURM_JOB_QOS
2290 Quality Of Service (QOS) of the job allocation.
2291
2292 SLURM_JOB_RESERVATION
2293 Advanced reservation containing the job allocation, if any.
2294
2295 SLURM_JOBID
2296 The ID of the job allocation. See SLURM_JOB_ID. Included for
2297 backwards compatibility.
2298
2299 SLURM_LOCALID
2300 Node local task ID for the process within a job.
2301
2302 SLURM_MEM_PER_CPU
2303 Same as --mem-per-cpu
2304
2305 SLURM_MEM_PER_GPU
2306 Requested memory per allocated GPU. Only set if the
2307 --mem-per-gpu option is specified.
2308
2309 SLURM_MEM_PER_NODE
2310 Same as --mem
2311
2312 SLURM_NNODES
2313 Total number of nodes in the job's resource allocation. See
2314 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2315
2316 SLURM_NODE_ALIASES
2317 Sets of node name, communication address and hostname for nodes
2318           allocated to the job from the cloud. Each element in the set is
2319 colon separated and each set is comma separated. For example:
2320 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2321
2322 SLURM_NODEID
2323           ID of the node allocated.
2324
2325 SLURM_NODELIST
2326 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
2327 cluded for backwards compatibility.
2328
2329 SLURM_NPROCS
2330 Same as -n, --ntasks. See SLURM_NTASKS. Included for backwards
2331 compatibility.
2332
2333 SLURM_NTASKS
2334 Same as -n, --ntasks
2335
2336 SLURM_NTASKS_PER_CORE
2337 Number of tasks requested per core. Only set if the
2338 --ntasks-per-core option is specified.
2339
2340
2341 SLURM_NTASKS_PER_GPU
2342 Number of tasks requested per GPU. Only set if the
2343 --ntasks-per-gpu option is specified.
2344
2345 SLURM_NTASKS_PER_NODE
2346 Number of tasks requested per node. Only set if the
2347 --ntasks-per-node option is specified.
2348
2349 SLURM_NTASKS_PER_SOCKET
2350 Number of tasks requested per socket. Only set if the
2351 --ntasks-per-socket option is specified.
2352
2353 SLURM_OVERCOMMIT
2354 Set to 1 if --overcommit was specified.
2355
2356 SLURM_PRIO_PROCESS
2357 The scheduling priority (nice value) at the time of job submis‐
2358 sion. This value is propagated to the spawned processes.
2359
2360 SLURM_PROCID
2361 The MPI rank (or relative process ID) of the current process
2362
2363 SLURM_PROFILE
2364 Same as --profile
2365
2366 SLURM_RESTART_COUNT
2367 If the job has been restarted due to system failure or has been
2368           explicitly requeued, this will be set to the number of times
2369 the job has been restarted.
2370
2371 SLURM_SUBMIT_DIR
2372 The directory from which sbatch was invoked.
2373
2374 SLURM_SUBMIT_HOST
2375 The hostname of the computer from which sbatch was invoked.
2376
2377 SLURM_TASK_PID
2378 The process ID of the task being started.
2379
2380 SLURM_TASKS_PER_NODE
2381 Number of tasks to be initiated on each node. Values are comma
2382 separated and in the same order as SLURM_JOB_NODELIST. If two
2383 or more consecutive nodes are to have the same task count, that
2384 count is followed by "(x#)" where "#" is the repetition count.
2385 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2386 first three nodes will each execute two tasks and the fourth
2387 node will execute one task.
2388
2389 SLURM_THREADS_PER_CORE
2390 This is only set if --threads-per-core or
2391 SBATCH_THREADS_PER_CORE were specified. The value will be set to
2392 the value specified by --threads-per-core or
2393 SBATCH_THREADS_PER_CORE. This is used by subsequent srun calls
2394 within the job allocation.
2395
2396 SLURM_TOPOLOGY_ADDR
2397 This is set only if the system has the topology/tree plugin
2398           configured. The value will be set to the names of the network
2399           switches which may be involved in the job's communications,
2400           from the system's top level switch down to the leaf switch,
2401           ending with the node name. A period is used to separate each hard‐
2402 ware component name.
2403
2404 SLURM_TOPOLOGY_ADDR_PATTERN
2405 This is set only if the system has the topology/tree plugin
2406           configured. The value will be set to the component types listed in
2407 SLURM_TOPOLOGY_ADDR. Each component will be identified as ei‐
2408 ther "switch" or "node". A period is used to separate each
2409 hardware component type.
2410
2411 SLURMD_NODENAME
2412 Name of the node running the job script.
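
            A sketch of a batch script that reports some of these variables
            (the resource requests are placeholders):

                #!/bin/bash
                #SBATCH --nodes=2
                #SBATCH --ntasks-per-node=4
                echo "job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) on ${SLURM_JOB_NODELIST}"
                echo "${SLURM_JOB_NUM_NODES} nodes, ${SLURM_NTASKS_PER_NODE} tasks per node"
                srun hostname | sort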
2413
2414
2415EXAMPLES
2416 Specify a batch script by filename on the command line. The batch
2417 script specifies a 1 minute time limit for the job.
2418
2419 $ cat myscript
2420 #!/bin/sh
2421 #SBATCH --time=1
2422 srun hostname |sort
2423
2424 $ sbatch -N4 myscript
2425       sbatch: Submitted batch job 65537
2426
2427 $ cat slurm-65537.out
2428 host1
2429 host2
2430 host3
2431 host4
2432
2433
2434 Pass a batch script to sbatch on standard input:
2435
2436 $ sbatch -N4 <<EOF
2437 > #!/bin/sh
2438 > srun hostname |sort
2439 > EOF
2440 sbatch: Submitted batch job 65541
2441
2442 $ cat slurm-65541.out
2443 host1
2444 host2
2445 host3
2446 host4
2447
2448
2449 To create a heterogeneous job with 3 components, each allocating a
2450 unique set of nodes:
2451
2452 $ sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2453 Submitted batch job 34987
2454
2455
2456COPYING
2457 Copyright (C) 2006-2007 The Regents of the University of California.
2458 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2459 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2460 Copyright (C) 2010-2021 SchedMD LLC.
2461
2462 This file is part of Slurm, a resource management program. For de‐
2463 tails, see <https://slurm.schedmd.com/>.
2464
2465 Slurm is free software; you can redistribute it and/or modify it under
2466 the terms of the GNU General Public License as published by the Free
2467 Software Foundation; either version 2 of the License, or (at your op‐
2468 tion) any later version.
2469
2470 Slurm is distributed in the hope that it will be useful, but WITHOUT
2471 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2472 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2473 for more details.
2474
2475
2476SEE ALSO
2477 sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2478       slurm.conf(5), sched_setaffinity(2), numa(3)
2479
2480
2481
2482November 2021 Slurm Commands sbatch(1)