sbatch(1)                       Slurm Commands                      sbatch(1)


NAME
       sbatch - Submit a batch script to Slurm.


SYNOPSIS
       sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html


DESCRIPTION
       sbatch submits a batch script to Slurm.  The batch script may be given
       to sbatch through a file name on the command line, or if no file name
       is specified, sbatch will read in a script from standard input.  The
       batch script may contain options preceded with "#SBATCH" before any
       executable commands in the script.  sbatch will stop processing
       further #SBATCH directives once the first non-comment non-whitespace
       line has been reached in the script.
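
       For example, a minimal batch script might look like the following
       sketch; the resource options and the program name ("my_program") are
       only illustrative placeholders:

              #!/bin/bash
              #SBATCH --job-name=example
              #SBATCH --output=slurm-%j.out
              #SBATCH --ntasks=4
              #SBATCH --mem-per-cpu=1G

              srun ./my_program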

       sbatch exits immediately after the script is successfully transferred
       to the Slurm controller and assigned a Slurm job ID.  The batch script
       is not necessarily granted resources immediately; it may sit in the
       queue of pending jobs for some time before its required resources
       become available.

       By default both standard output and standard error are directed to a
       file of the name "slurm-%j.out", where the "%j" is replaced with the
       job allocation number.  The file will be generated on the first node
       of the job allocation.  Other than the batch script itself, Slurm does
       no movement of user files.

       When the job allocation is finally granted for the batch script, Slurm
       runs a single copy of the batch script on the first node in the set of
       allocated nodes.

       The following document describes the influence of various options on
       the allocation of cpus to jobs and tasks.
       https://slurm.schedmd.com/cpu_management.html

RETURN VALUE
       sbatch will return 0 on success or an error code on failure.


SCRIPT PATH RESOLUTION
       The batch script is resolved in the following order:

       1. If script starts with ".", then path is constructed as: current
          working directory / script
       2. If script starts with a "/", then path is considered absolute.
       3. If script is in current working directory.
       4. If script can be resolved through PATH. See path_resolution(7).

       The current working directory is the calling process's working
       directory unless the --chdir argument is passed, which will override
       the current working directory.
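
       For illustration only, assuming a script named "job.sh", the following
       invocations are resolved as described above:

              sbatch ./job.sh        # relative to current working directory
              sbatch /tmp/job.sh     # absolute path, used as given
              sbatch job.sh          # working directory first, then PATH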

OPTIONS
       -A, --account=<account>
              Charge resources used by this job to the specified account.
              The account is an arbitrary string.  The account name may be
              changed after job submission using the scontrol command.
70
71 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
72 Define the job accounting and profiling sampling intervals in
73 seconds. This can be used to override the JobAcctGatherFre‐
74 quency parameter in the slurm.conf file. <datatype>=<interval>
75 specifies the task sampling interval for the jobacct_gather
76 plugin or a sampling interval for a profiling type by the
77 acct_gather_profile plugin. Multiple comma-separated
78 <datatype>=<interval> pairs may be specified. Supported datatype
79 values are:
80
81 task Sampling interval for the jobacct_gather plugins and
82 for task profiling by the acct_gather_profile
83 plugin.
84 NOTE: This frequency is used to monitor memory us‐
85 age. If memory limits are enforced, the highest fre‐
86 quency a user can request is what is configured in
87 the slurm.conf file. It can not be disabled.
88
89 energy Sampling interval for energy profiling using the
90 acct_gather_energy plugin.
91
92 network Sampling interval for infiniband profiling using the
93 acct_gather_interconnect plugin.
94
95 filesystem Sampling interval for filesystem profiling using the
96 acct_gather_filesystem plugin.
97
98 The default value for the task sampling interval is 30 seconds.
99 The default value for all other intervals is 0. An interval of
100 0 disables sampling of the specified type. If the task sampling
101 interval is 0, accounting information is collected only at job
102 termination (reducing Slurm interference with the job).
103 Smaller (non-zero) values have a greater impact upon job perfor‐
104 mance, but a value of 30 seconds is not likely to be noticeable
105 for applications having less than 10,000 tasks.
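
              For example, the following illustrative setting samples task
              and memory statistics every 15 seconds and disables energy
              sampling:

                     #SBATCH --acctg-freq=task=15,energy=0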
106
107 -a, --array=<indexes>
108 Submit a job array, multiple jobs to be executed with identical
109 parameters. The indexes specification identifies what array in‐
110 dex values should be used. Multiple values may be specified us‐
111 ing a comma separated list and/or a range of values with a "-"
112 separator. For example, "--array=0-15" or "--array=0,6,16-32".
113 A step function can also be specified with a suffix containing a
114 colon and number. For example, "--array=0-15:4" is equivalent to
115 "--array=0,4,8,12". A maximum number of simultaneously running
116 tasks from the job array may be specified using a "%" separator.
117 For example "--array=0-15%4" will limit the number of simultane‐
              ously running tasks from this job array to 4.  The minimum
              index value is 0.  The maximum value is one less than the
              configuration parameter MaxArraySize.
              NOTE: Currently, federated job arrays only run on the local
              cluster.
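
              As a sketch, a job array script typically uses the
              SLURM_ARRAY_TASK_ID environment variable, which Slurm sets for
              each array task, to select its work item; the input file naming
              below is only an assumption:

                     #!/bin/bash
                     #SBATCH --array=0-15%4

                     srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat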
122
123 --batch=<list>
124 Nodes can have features assigned to them by the Slurm adminis‐
125 trator. Users can specify which of these features are required
              by their batch script using this option.  For example, a job's
127 allocation may include both Intel Haswell and KNL nodes with
128 features "haswell" and "knl" respectively. On such a configura‐
129 tion the batch script would normally benefit by executing on a
130 faster Haswell node. This would be specified using the option
131 "--batch=haswell". The specification can include AND and OR op‐
132 erators using the ampersand and vertical bar separators. For ex‐
133 ample: "--batch=haswell|broadwell" or "--batch=haswell|big_mem‐
134 ory". The --batch argument must be a subset of the job's --con‐
135 straint=<list> argument (i.e. the job can not request only KNL
136 nodes, but require the script to execute on a Haswell node). If
137 the request can not be satisfied from the resources allocated to
138 the job, the batch script will execute on the first node of the
139 job allocation.
140
141 --bb=<spec>
142 Burst buffer specification. The form of the specification is
143 system dependent. Also see --bbf. When the --bb option is
144 used, Slurm parses this option and creates a temporary burst
145 buffer script file that is used internally by the burst buffer
146 plugins. See Slurm's burst buffer guide for more information and
147 examples:
148 https://slurm.schedmd.com/burst_buffer.html
149
150 --bbf=<file_name>
151 Path of file containing burst buffer specification. The form of
152 the specification is system dependent. These burst buffer di‐
153 rectives will be inserted into the submitted batch script. See
154 Slurm's burst buffer guide for more information and examples:
155 https://slurm.schedmd.com/burst_buffer.html
156
157 -b, --begin=<time>
158 Submit the batch script to the Slurm controller immediately,
159 like normal, but tell the controller to defer the allocation of
160 the job until the specified time.
161
162 Time may be of the form HH:MM:SS to run a job at a specific time
163 of day (seconds are optional). (If that time is already past,
164 the next day is assumed.) You may also specify midnight, noon,
165 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
166 suffixed with AM or PM for running in the morning or the
167 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
169 Combine date and time using the following format
170 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
171 count time-units, where the time-units can be seconds (default),
172 minutes, hours, days, or weeks and you can tell Slurm to run the
173 job today with the keyword today and to run the job tomorrow
174 with the keyword tomorrow. The value may be changed after job
175 submission using the scontrol command. For example:
176
177 --begin=16:00
178 --begin=now+1hour
179 --begin=now+60 (seconds by default)
180 --begin=2010-01-20T12:34:00
181
182
183 Notes on date/time specifications:
184 - Although the 'seconds' field of the HH:MM:SS time specifica‐
185 tion is allowed by the code, note that the poll time of the
186 Slurm scheduler is not precise enough to guarantee dispatch of
187 the job on the exact second. The job will be eligible to start
188 on the next poll following the specified time. The exact poll
189 interval depends on the Slurm scheduler (e.g., 60 seconds with
190 the default sched/builtin).
191 - If no time (HH:MM:SS) is specified, the default is
192 (00:00:00).
193 - If a date is specified without a year (e.g., MM/DD) then the
194 current year is assumed, unless the combination of MM/DD and
195 HH:MM:SS has already passed for that year, in which case the
196 next year is used.
197
198 -D, --chdir=<directory>
199 Set the working directory of the batch script to directory be‐
              fore it is executed.  The path can be specified as a full path
              or as a path relative to the directory where the command is
              executed.
202
203 --cluster-constraint=[!]<list>
204 Specifies features that a federated cluster must have to have a
205 sibling job submitted to it. Slurm will attempt to submit a sib‐
206 ling job to a cluster if it has at least one of the specified
207 features. If the "!" option is included, Slurm will attempt to
208 submit a sibling job to a cluster that has none of the specified
209 features.
210
211 -M, --clusters=<string>
212 Clusters to issue commands to. Multiple cluster names may be
213 comma separated. The job will be submitted to the one cluster
214 providing the earliest expected job initiation time. The default
215 value is the current cluster. A value of 'all' will query to run
216 on all clusters. Note the --export option to control environ‐
217 ment variables exported between clusters. Note that the Slur‐
218 mDBD must be up for this option to work properly.
219
220 --comment=<string>
              An arbitrary comment.  Enclose the string in double quotes if
              it contains spaces or special characters.
223
224 -C, --constraint=<list>
225 Nodes can have features assigned to them by the Slurm adminis‐
226 trator. Users can specify which of these features are required
227 by their job using the constraint option. Only nodes having
228 features matching the job constraints will be used to satisfy
229 the request. Multiple constraints may be specified with AND,
230 OR, matching OR, resource counts, etc. (some operators are not
231 supported on all system types). Supported constraint options
232 include:
233
234 Single Name
235 Only nodes which have the specified feature will be used.
236 For example, --constraint="intel"
237
238 Node Count
239 A request can specify the number of nodes needed with
240 some feature by appending an asterisk and count after the
241 feature name. For example, --nodes=16 --con‐
242 straint="graphics*4 ..." indicates that the job requires
243 16 nodes and that at least four of those nodes must have
244 the feature "graphics."
245
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
                     example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR
                     operator.  For example, --constraint="intel|amd"
253
254 Matching OR
255 If only one of a set of possible options should be used
256 for all allocated nodes, then use the OR operator and en‐
257 close the options within square brackets. For example,
258 --constraint="[rack1|rack2|rack3|rack4]" might be used to
259 specify that all nodes must be allocated on a single rack
260 of the cluster, but any of those four racks can be used.
261
262 Multiple Counts
263 Specific counts of multiple resources may be specified by
264 using the AND operator and enclosing the options within
265 square brackets. For example, --con‐
266 straint="[rack1*2&rack2*4]" might be used to specify that
267 two nodes must be allocated from nodes with the feature
268 of "rack1" and four nodes must be allocated from nodes
269 with the feature "rack2".
270
271 NOTE: This construct does not support multiple Intel KNL
272 NUMA or MCDRAM modes. For example, while --con‐
273 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
274 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
275 Specification of multiple KNL modes requires the use of a
276 heterogeneous job.
277
278 Brackets
279 Brackets can be used to indicate that you are looking for
280 a set of nodes with the different requirements contained
281 within the brackets. For example, --con‐
282 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
283 node with either the "rack1" or "rack2" features and two
284 nodes with the "rack3" feature. The same request without
285 the brackets will try to find a single node that meets
286 those requirements.
287
288 NOTE: Brackets are only reserved for Multiple Counts and
289 Matching OR syntax. AND operators require a count for
290 each feature inside square brackets (i.e.
291 "[quad*2&hemi*1]"). Slurm will only allow a single set of
292 bracketed constraints per job.
293
              Parentheses
                     Parentheses can be used to group like node features
                     together.  For example,
                     --constraint="[(knl&snc4&flat)*4&haswell*1]" might be
                     used to specify that four nodes with the features "knl",
                     "snc4" and "flat" plus one node with the feature
                     "haswell" are required.  All options within parentheses
                     should be grouped with AND (e.g. "&") operators.
302
303 --container=<path_to_container>
304 Absolute path to OCI container bundle.
305
306 --contiguous
307 If set, then the allocated nodes must form a contiguous set.
308
              NOTE: If SelectType=select/cons_res, this option won't be
              honored with the topology/tree or topology/3d_torus plugins,
              both of which can modify the node ordering.
312
313 -S, --core-spec=<num>
314 Count of specialized cores per node reserved by the job for sys‐
315 tem operations and not used by the application. The application
316 will not use these cores, but will be charged for their alloca‐
317 tion. Default value is dependent upon the node's configured
318 CoreSpecCount value. If a value of zero is designated and the
319 Slurm configuration option AllowSpecResourcesUsage is enabled,
320 the job will be allowed to override CoreSpecCount and use the
321 specialized resources on nodes it is allocated. This option can
322 not be used with the --thread-spec option.
323
324 --cores-per-socket=<cores>
325 Restrict node selection to nodes with at least the specified
326 number of cores per socket. See additional information under -B
327 option above when task/affinity plugin is enabled.
328 NOTE: This option may implicitly set the number of tasks (if -n
329 was not specified) as one task per requested thread.
330
331 --cpu-freq=<p1>[-p2[:p3]]
332
333 Request that job steps initiated by srun commands inside this
334 sbatch script be run at some requested frequency if possible, on
335 the CPUs selected for the step on the compute node(s).
336
337 p1 can be [#### | low | medium | high | highm1] which will set
338 the frequency scaling_speed to the corresponding value, and set
339 the frequency scaling_governor to UserSpace. See below for defi‐
340 nition of the values.
341
342 p1 can be [Conservative | OnDemand | Performance | PowerSave]
343 which will set the scaling_governor to the corresponding value.
344 The governor has to be in the list set by the slurm.conf option
345 CpuFreqGovernors.
346
347 When p2 is present, p1 will be the minimum scaling frequency and
348 p2 will be the maximum scaling frequency.
349
              p2 can be [#### | medium | high | highm1].  p2 must be greater
              than p1.
352
353 p3 can be [Conservative | OnDemand | Performance | PowerSave |
354 SchedUtil | UserSpace] which will set the governor to the corre‐
355 sponding value.
356
357 If p3 is UserSpace, the frequency scaling_speed will be set by a
358 power or energy aware scheduling strategy to a value between p1
359 and p2 that lets the job run within the site's power goal. The
360 job may be delayed if p1 is higher than a frequency that allows
361 the job to run within the goal.
362
363 If the current frequency is < min, it will be set to min. Like‐
364 wise, if the current frequency is > max, it will be set to max.
365
366 Acceptable values at present include:
367
368 #### frequency in kilohertz
369
370 Low the lowest available frequency
371
372 High the highest available frequency
373
374 HighM1 (high minus one) will select the next highest
375 available frequency
376
377 Medium attempts to set a frequency in the middle of the
378 available range
379
380 Conservative attempts to use the Conservative CPU governor
381
382 OnDemand attempts to use the OnDemand CPU governor (the de‐
383 fault value)
384
385 Performance attempts to use the Performance CPU governor
386
387 PowerSave attempts to use the PowerSave CPU governor
388
389 UserSpace attempts to use the UserSpace CPU governor
390
391 The following informational environment variable is set in the job step
392 when --cpu-freq option is requested.
393 SLURM_CPU_FREQ_REQ
394
395 This environment variable can also be used to supply the value for the
396 CPU frequency request if it is set when the 'srun' command is issued.
397 The --cpu-freq on the command line will override the environment vari‐
398 able value. The form on the environment variable is the same as the
399 command line. See the ENVIRONMENT VARIABLES section for a description
400 of the SLURM_CPU_FREQ_REQ variable.
401
402 NOTE: This parameter is treated as a request, not a requirement. If
403 the job step's node does not support setting the CPU frequency, or the
404 requested value is outside the bounds of the legal frequencies, an er‐
405 ror is logged, but the job step is allowed to continue.
406
407 NOTE: Setting the frequency for just the CPUs of the job step implies
408 that the tasks are confined to those CPUs. If task confinement (i.e.
409 the task/affinity TaskPlugin is enabled, or the task/cgroup TaskPlugin
410 is enabled with "ConstrainCores=yes" set in cgroup.conf) is not config‐
411 ured, this parameter is ignored.
412
413 NOTE: When the step completes, the frequency and governor of each se‐
414 lected CPU is reset to the previous values.
415
       NOTE: When submitting jobs with the --cpu-freq option and linuxproc as
       the ProctrackType, jobs can run too quickly before accounting is able
       to poll for job information.  As a result not all of the accounting
       information will be present.
420
421 --cpus-per-gpu=<ncpus>
422 Advise Slurm that ensuing job steps will require ncpus proces‐
423 sors per allocated GPU. Not compatible with the --cpus-per-task
424 option.
425
426 -c, --cpus-per-task=<ncpus>
427 Advise the Slurm controller that ensuing job steps will require
428 ncpus number of processors per task. Without this option, the
429 controller will just try to allocate one processor per task.
430
431 For instance, consider an application that has 4 tasks, each re‐
              quiring 3 processors.  If our cluster is comprised of
              quad-processor nodes and we simply ask for 12 processors, the
              controller might give us only 3 nodes.  However, by using the
              --cpus-per-task=3 option, the controller knows that each task
436 requires 3 processors on the same node, and the controller will
437 grant an allocation of 4 nodes, one for each of the 4 tasks.
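
              A sketch of that example as a batch script (the program name is
              a placeholder):

                     #!/bin/bash
                     #SBATCH --ntasks=4
                     #SBATCH --cpus-per-task=3

                     srun ./my_program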
438
439 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
441 (start > (deadline - time[-min])). Default is no deadline.
442 Valid time formats are:
443 HH:MM[:SS] [AM|PM]
444 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
445 MM/DD[/YY]-HH:MM[:SS]
446 YYYY-MM-DD[THH:MM[:SS]]]
447 now[+count[seconds(default)|minutes|hours|days|weeks]]
448
449 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
451 specification if the job has been eligible to run for less than
452 this time period. If the job has waited for less than the spec‐
453 ified period, it will use only nodes which already have the
454 specified features. The argument is in units of minutes. A de‐
455 fault value may be set by a system administrator using the de‐
456 lay_boot option of the SchedulerParameters configuration parame‐
457 ter in the slurm.conf file, otherwise the default value is zero
458 (no delay).
459
460 -d, --dependency=<dependency_list>
461 Defer the start of this job until the specified dependencies
              have been satisfied.  <dependency_list> is of the form
463 <type:job_id[:job_id][,type:job_id[:job_id]]> or
464 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
465 must be satisfied if the "," separator is used. Any dependency
466 may be satisfied if the "?" separator is used. Only one separa‐
467 tor may be used. Many jobs can share the same dependency and
468 these jobs may even belong to different users. The value may
469 be changed after job submission using the scontrol command. De‐
470 pendencies on remote jobs are allowed in a federation. Once a
471 job dependency fails due to the termination state of a preceding
472 job, the dependent job will never be run, even if the preceding
473 job is requeued and has a different termination state in a sub‐
474 sequent execution.
475
476 after:job_id[[+time][:jobid[+time]...]]
477 After the specified jobs start or are cancelled and
478 'time' in minutes from job start or cancellation happens,
479 this job can begin execution. If no 'time' is given then
480 there is no delay after start or cancellation.
481
482 afterany:job_id[:jobid...]
483 This job can begin execution after the specified jobs
484 have terminated.
485
486 afterburstbuffer:job_id[:jobid...]
487 This job can begin execution after the specified jobs
488 have terminated and any associated burst buffer stage out
489 operations have completed.
490
491 aftercorr:job_id[:jobid...]
492 A task of this job array can begin execution after the
493 corresponding task ID in the specified job has completed
494 successfully (ran to completion with an exit code of
495 zero).
496
497 afternotok:job_id[:jobid...]
498 This job can begin execution after the specified jobs
499 have terminated in some failed state (non-zero exit code,
500 node failure, timed out, etc).
501
502 afterok:job_id[:jobid...]
503 This job can begin execution after the specified jobs
504 have successfully executed (ran to completion with an
505 exit code of zero).
506
507 singleton
508 This job can begin execution after any previously
509 launched jobs sharing the same job name and user have
510 terminated. In other words, only one job by that name
511 and owned by that user can be running or suspended at any
512 point in time. In a federation, a singleton dependency
513 must be fulfilled on all clusters unless DependencyParam‐
514 eters=disable_remote_singleton is used in slurm.conf.
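
              As an illustration, jobs are commonly chained from the shell by
              capturing each job ID; this sketch assumes the --parsable
              option, which makes sbatch print only the job ID, and
              placeholder script names:

                     jobid=$(sbatch --parsable first_step.sh)
                     sbatch --dependency=afterok:${jobid} second_step.sh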
515
516 -m, --distribution={*|block|cyclic|arbi‐
517 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
518
519 Specify alternate distribution methods for remote processes.
520 For job allocation, this sets environment variables that will be
521 used by subsequent srun requests and also affects which cores
522 will be selected for job allocation.
523
524 This option controls the distribution of tasks to the nodes on
525 which resources have been allocated, and the distribution of
526 those resources to tasks for binding (task affinity). The first
527 distribution method (before the first ":") controls the distri‐
528 bution of tasks to nodes. The second distribution method (after
529 the first ":") controls the distribution of allocated CPUs
530 across sockets for binding to tasks. The third distribution
531 method (after the second ":") controls the distribution of allo‐
532 cated CPUs across cores for binding to tasks. The second and
533 third distributions apply only if task affinity is enabled. The
534 third distribution is supported only if the task/cgroup plugin
535 is configured. The default value for each distribution type is
536 specified by *.
537
538 Note that with select/cons_res and select/cons_tres, the number
539 of CPUs allocated to each socket and node may be different. Re‐
540 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
541 mation on resource allocation, distribution of tasks to nodes,
542 and binding of tasks to CPUs.
543 First distribution method (distribution of tasks across nodes):
544
545
546 * Use the default method for distributing tasks to nodes
547 (block).
548
549 block The block distribution method will distribute tasks to a
550 node such that consecutive tasks share a node. For exam‐
551 ple, consider an allocation of three nodes each with two
552 cpus. A four-task block distribution request will dis‐
553 tribute those tasks to the nodes with tasks one and two
554 on the first node, task three on the second node, and
555 task four on the third node. Block distribution is the
556 default behavior if the number of tasks exceeds the num‐
557 ber of allocated nodes.
558
559 cyclic The cyclic distribution method will distribute tasks to a
560 node such that consecutive tasks are distributed over
561 consecutive nodes (in a round-robin fashion). For exam‐
562 ple, consider an allocation of three nodes each with two
563 cpus. A four-task cyclic distribution request will dis‐
564 tribute those tasks to the nodes with tasks one and four
565 on the first node, task two on the second node, and task
566 three on the third node. Note that when SelectType is
567 select/cons_res, the same number of CPUs may not be allo‐
568 cated on each node. Task distribution will be round-robin
569 among all the nodes with CPUs yet to be assigned to
570 tasks. Cyclic distribution is the default behavior if
571 the number of tasks is no larger than the number of allo‐
572 cated nodes.
573
574 plane The tasks are distributed in blocks of size <size>. The
575 size must be given or SLURM_DIST_PLANESIZE must be set.
576 The number of tasks distributed to each node is the same
577 as for cyclic distribution, but the taskids assigned to
578 each node depend on the plane size. Additional distribu‐
579 tion specifications cannot be combined with this option.
580 For more details (including examples and diagrams),
581 please see https://slurm.schedmd.com/mc_support.html and
582 https://slurm.schedmd.com/dist_plane.html
583
584 arbitrary
                     The arbitrary method of distribution will allocate
                     processes in order as listed in the file designated by
                     the environment variable SLURM_HOSTFILE.  If this
                     variable is set, it will override any other method
                     specified.  If not set, the method will default to
                     block.  The hostfile must contain at minimum the number
                     of hosts requested, one per line or comma separated.
                     If specifying a task count (-n, --ntasks=<number>), your
                     tasks
593 will be laid out on the nodes in the order of the file.
594 NOTE: The arbitrary distribution option on a job alloca‐
595 tion only controls the nodes to be allocated to the job
596 and not the allocation of CPUs on those nodes. This op‐
597 tion is meant primarily to control a job step's task lay‐
598 out in an existing job allocation for the srun command.
599 NOTE: If the number of tasks is given and a list of re‐
600 quested nodes is also given, the number of nodes used
601 from that list will be reduced to match that of the num‐
602 ber of tasks if the number of nodes in the list is
603 greater than the number of tasks.
604
605 Second distribution method (distribution of CPUs across sockets
606 for binding):
607
608
609 * Use the default method for distributing CPUs across sock‐
610 ets (cyclic).
611
612 block The block distribution method will distribute allocated
613 CPUs consecutively from the same socket for binding to
614 tasks, before using the next consecutive socket.
615
616 cyclic The cyclic distribution method will distribute allocated
617 CPUs for binding to a given task consecutively from the
618 same socket, and from the next consecutive socket for the
619 next task, in a round-robin fashion across sockets.
620 Tasks requiring more than one CPU will have all of those
621 CPUs allocated on a single socket if possible.
622
623 fcyclic
624 The fcyclic distribution method will distribute allocated
625 CPUs for binding to tasks from consecutive sockets in a
626 round-robin fashion across the sockets. Tasks requiring
                     more than one CPU will have each CPU allocated in a
628 cyclic fashion across sockets.
629
630 Third distribution method (distribution of CPUs across cores for
631 binding):
632
633
634 * Use the default method for distributing CPUs across cores
635 (inherited from second distribution method).
636
637 block The block distribution method will distribute allocated
638 CPUs consecutively from the same core for binding to
639 tasks, before using the next consecutive core.
640
641 cyclic The cyclic distribution method will distribute allocated
642 CPUs for binding to a given task consecutively from the
643 same core, and from the next consecutive core for the
644 next task, in a round-robin fashion across cores.
645
646 fcyclic
647 The fcyclic distribution method will distribute allocated
648 CPUs for binding to tasks from consecutive cores in a
649 round-robin fashion across the cores.
650
651 Optional control for task distribution over nodes:
652
653
              Pack   Rather than distributing a job step's tasks evenly
655 across its allocated nodes, pack them as tightly as pos‐
656 sible on the nodes. This only applies when the "block"
657 task distribution method is used.
658
659 NoPack Rather than packing a job step's tasks as tightly as pos‐
660 sible on the nodes, distribute them evenly. This user
661 option will supersede the SelectTypeParameters
662 CR_Pack_Nodes configuration parameter.
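
              For example, the following illustrative specification
              distributes tasks cyclically across nodes and, for binding,
              allocates each task's CPUs consecutively from the same socket:

                     #SBATCH --distribution=cyclic:block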
663
664 -e, --error=<filename_pattern>
665 Instruct Slurm to connect the batch script's standard error di‐
666 rectly to the file name specified in the "filename pattern". By
667 default both standard output and standard error are directed to
668 the same file. For job arrays, the default file name is
669 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
670 the array index. For other jobs, the default file name is
671 "slurm-%j.out", where the "%j" is replaced by the job ID. See
672 the filename pattern section below for filename specification
673 options.
674
675 -x, --exclude=<node_name_list>
676 Explicitly exclude certain nodes from the resources granted to
677 the job.
678
679 --exclusive[={user|mcs}]
680 The job allocation can not share nodes with other running jobs
681 (or just other users with the "=user" option or with the "=mcs"
682 option). If user/mcs are not specified (i.e. the job allocation
683 can not share nodes with other running jobs), the job is allo‐
684 cated all CPUs and GRES on all nodes in the allocation, but is
685 only allocated as much memory as it requested. This is by design
686 to support gang scheduling, because suspended jobs still reside
687 in memory. To request all the memory on a node, use --mem=0.
688 The default shared/exclusive behavior depends on system configu‐
689 ration and the partition's OverSubscribe option takes precedence
690 over the job's option. NOTE: Since shared GRES (MPS) cannot be
691 allocated at the same time as a sharing GRES (GPU) this option
692 only allocates all sharing GRES and no underlying shared GRES.
693
694 --export={[ALL,]<environment_variables>|ALL|NONE}
695 Identify which environment variables from the submission envi‐
696 ronment are propagated to the launched application. Note that
697 SLURM_* variables are always propagated.
698
699 --export=ALL
700 Default mode if --export is not specified. All of the
701 user's environment will be loaded (either from the
702 caller's environment or from a clean environment if
703 --get-user-env is specified).
704
705 --export=NONE
706 Only SLURM_* variables from the user environment will
707 be defined. User must use absolute path to the binary
708 to be executed that will define the environment. User
709 can not specify explicit environment variables with
710 "NONE". --get-user-env will be ignored.
711
712 This option is particularly important for jobs that
713 are submitted on one cluster and execute on a differ‐
714 ent cluster (e.g. with different paths). To avoid
715 steps inheriting environment export settings (e.g.
716 "NONE") from sbatch command, the environment variable
717 SLURM_EXPORT_ENV should be set to "ALL" in the job
718 script.
719
720 --export=[ALL,]<environment_variables>
721 Exports all SLURM_* environment variables along with
722 explicitly defined variables. Multiple environment
723 variable names should be comma separated. Environment
724 variable names may be specified to propagate the cur‐
725 rent value (e.g. "--export=EDITOR") or specific values
726 may be exported (e.g. "--export=EDITOR=/bin/emacs").
727 If "ALL" is specified, then all user environment vari‐
728 ables will be loaded and will take precedence over any
729 explicitly given environment variables.
730
731 Example: --export=EDITOR,ARG1=test
732 In this example, the propagated environment will only
733 contain the variable EDITOR from the user's environ‐
734 ment, SLURM_* environment variables, and ARG1=test.
735
736 Example: --export=ALL,EDITOR=/bin/emacs
737 There are two possible outcomes for this example. If
738 the caller has the EDITOR environment variable de‐
739 fined, then the job's environment will inherit the
740 variable from the caller's environment. If the caller
741 doesn't have an environment variable defined for EDI‐
742 TOR, then the job's environment will use the value
743 given by --export.
744
745 --export-file={<filename>|<fd>}
746 If a number between 3 and OPEN_MAX is specified as the argument
747 to this option, a readable file descriptor will be assumed
748 (STDIN and STDOUT are not supported as valid arguments). Other‐
749 wise a filename is assumed. Export environment variables de‐
750 fined in <filename> or read from <fd> to the job's execution en‐
751 vironment. The content is one or more environment variable defi‐
752 nitions of the form NAME=value, each separated by a null charac‐
753 ter. This allows the use of special characters in environment
754 definitions.
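
              As a sketch, a suitable file can be generated from the shell
              with null-separated NAME=value definitions; the file and
              variable names are placeholders:

                     printf 'EDITOR=/bin/emacs\0ARG1=test\0' > env_vars
                     sbatch --export-file=env_vars job.sh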
755
756 -B, --extra-node-info=<sockets>[:cores[:threads]]
757 Restrict node selection to nodes with at least the specified
758 number of sockets, cores per socket and/or threads per core.
759 NOTE: These options do not specify the resource allocation size.
760 Each value specified is considered a minimum. An asterisk (*)
761 can be used as a placeholder indicating that all available re‐
762 sources of that type are to be utilized. Values can also be
763 specified as min-max. The individual levels can also be speci‐
764 fied in separate options if desired:
765 --sockets-per-node=<sockets>
766 --cores-per-socket=<cores>
767 --threads-per-core=<threads>
768 If task/affinity plugin is enabled, then specifying an alloca‐
769 tion in this manner also results in subsequently launched tasks
770 being bound to threads if the -B option specifies a thread
771 count, otherwise an option of cores if a core count is speci‐
772 fied, otherwise an option of sockets. If SelectType is config‐
773 ured to select/cons_res, it must have a parameter of CR_Core,
774 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
775 to be honored. If not specified, the scontrol show job will
776 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
777 tions.
778 NOTE: This option is mutually exclusive with --hint,
779 --threads-per-core and --ntasks-per-core.
780 NOTE: This option may implicitly set the number of tasks (if -n
781 was not specified) as one task per requested thread.
782
783 --get-user-env[=timeout][mode]
784 This option will tell sbatch to retrieve the login environment
785 variables for the user specified in the --uid option. The envi‐
786 ronment variables are retrieved by running something of this
787 sort "su - <username> -c /usr/bin/env" and parsing the output.
788 Be aware that any environment variables already set in sbatch's
789 environment will take precedence over any environment variables
790 in the user's login environment. Clear any environment variables
791 before calling sbatch that you do not want propagated to the
792 spawned program. The optional timeout value is in seconds. De‐
              fault value is 8 seconds.  The optional mode value controls the
              "su" options.  With a mode value of "S", "su" is executed
              without the "-" option.  With a mode value of "L", "su" is
              executed with the "-" option, replicating the login
              environment.  If mode is not specified, the mode established at
              Slurm build time is used.  Examples of use include
              "--get-user-env", "--get-user-env=10", "--get-user-env=10L",
              and "--get-user-env=S".
800
801 --gid=<group>
802 If sbatch is run as root, and the --gid option is used, submit
              the job with the specified group's access permissions.  group
              may be the group name or the numerical group ID.
805
806 --gpu-bind=[verbose,]<type>
807 Bind tasks to specific GPUs. By default every spawned task can
808 access every GPU allocated to the step. If "verbose," is speci‐
809 fied before <type>, then print out GPU binding debug information
810 to the stderr of the tasks. GPU binding is ignored if there is
811 only one task.
812
813 Supported type options:
814
815 closest Bind each task to the GPU(s) which are closest. In a
816 NUMA environment, each task may be bound to more than
817 one GPU (i.e. all GPUs in that NUMA environment).
818
819 map_gpu:<list>
820 Bind by setting GPU masks on tasks (or ranks) as spec‐
821 ified where <list> is
822 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
823 are interpreted as decimal values unless they are pre‐
                     ceded with '0x', in which case they are interpreted as
825 hexadecimal values. If the number of tasks (or ranks)
826 exceeds the number of elements in this list, elements
827 in the list will be reused as needed starting from the
828 beginning of the list. To simplify support for large
829 task counts, the lists may follow a map with an aster‐
830 isk and repetition count. For example
831 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
832 and ConstrainDevices is set in cgroup.conf, then the
833 GPU IDs are zero-based indexes relative to the GPUs
834 allocated to the job (e.g. the first GPU is 0, even if
835 the global ID is 3). Otherwise, the GPU IDs are global
836 IDs, and all GPUs on each node in the job should be
837 allocated for predictable binding results.
838
839 mask_gpu:<list>
840 Bind by setting GPU masks on tasks (or ranks) as spec‐
841 ified where <list> is
842 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
843 mapping is specified for a node and identical mapping
844 is applied to the tasks on every node (i.e. the lowest
845 task ID on each node is mapped to the first mask spec‐
846 ified in the list, etc.). GPU masks are always inter‐
847 preted as hexadecimal values but can be preceded with
848 an optional '0x'. To simplify support for large task
849 counts, the lists may follow a map with an asterisk
850 and repetition count. For example
851 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
852 is used and ConstrainDevices is set in cgroup.conf,
853 then the GPU IDs are zero-based indexes relative to
854 the GPUs allocated to the job (e.g. the first GPU is
855 0, even if the global ID is 3). Otherwise, the GPU IDs
856 are global IDs, and all GPUs on each node in the job
857 should be allocated for predictable binding results.
858
859 none Do not bind tasks to GPUs (turns off binding if
860 --gpus-per-task is requested).
861
862 per_task:<gpus_per_task>
863 Each task will be bound to the number of gpus speci‐
864 fied in <gpus_per_task>. Gpus are assigned in order to
865 tasks. The first task will be assigned the first x
866 number of gpus on the node etc.
867
868 single:<tasks_per_gpu>
869 Like --gpu-bind=closest, except that each task can
870 only be bound to a single GPU, even when it can be
871 bound to multiple GPUs that are equally close. The
872 GPU to bind to is determined by <tasks_per_gpu>, where
873 the first <tasks_per_gpu> tasks are bound to the first
874 GPU available, the second <tasks_per_gpu> tasks are
875 bound to the second GPU available, etc. This is basi‐
876 cally a block distribution of tasks onto available
877 GPUs, where the available GPUs are determined by the
878 socket affinity of the task and the socket affinity of
879 the GPUs as specified in gres.conf's Cores parameter.
880
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
882 Request that GPUs allocated to the job are configured with spe‐
883 cific frequency values. This option can be used to indepen‐
884 dently configure the GPU and its memory frequencies. After the
885 job is completed, the frequencies of all affected GPUs will be
886 reset to the highest possible values. In some cases, system
887 power caps may override the requested values. The field type
888 can be "memory". If type is not specified, the GPU frequency is
889 implied. The value field can either be "low", "medium", "high",
890 "highm1" or a numeric value in megahertz (MHz). If the speci‐
891 fied numeric value is not possible, a value as close as possible
892 will be used. See below for definition of the values. The ver‐
893 bose option causes current GPU frequency information to be
894 logged. Examples of use include "--gpu-freq=medium,memory=high"
895 and "--gpu-freq=450".
896
897 Supported value definitions:
898
899 low the lowest available frequency.
900
901 medium attempts to set a frequency in the middle of the
902 available range.
903
904 high the highest available frequency.
905
906 highm1 (high minus one) will select the next highest avail‐
907 able frequency.
908
909 -G, --gpus=[type:]<number>
910 Specify the total number of GPUs required for the job. An op‐
911 tional GPU type specification can be supplied. For example
912 "--gpus=volta:3". Multiple options can be requested in a comma
913 separated list, for example: "--gpus=volta:3,kepler:1". See
914 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
915 options.
916 NOTE: The allocation has to contain at least one GPU per node.
917
918 --gpus-per-node=[type:]<number>
919 Specify the number of GPUs required for the job on each node in‐
920 cluded in the job's resource allocation. An optional GPU type
921 specification can be supplied. For example
922 "--gpus-per-node=volta:3". Multiple options can be requested in
923 a comma separated list, for example:
924 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
925 --gpus-per-socket and --gpus-per-task options.
926
927 --gpus-per-socket=[type:]<number>
928 Specify the number of GPUs required for the job on each socket
929 included in the job's resource allocation. An optional GPU type
930 specification can be supplied. For example
931 "--gpus-per-socket=volta:3". Multiple options can be requested
932 in a comma separated list, for example:
933 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
              sockets per node count (--sockets-per-node).  See also the
935 --gpus, --gpus-per-node and --gpus-per-task options.
936
937 --gpus-per-task=[type:]<number>
938 Specify the number of GPUs required for the job on each task to
939 be spawned in the job's resource allocation. An optional GPU
940 type specification can be supplied. For example
941 "--gpus-per-task=volta:1". Multiple options can be requested in
942 a comma separated list, for example:
943 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
944 --gpus-per-socket and --gpus-per-node options. This option re‐
945 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
946 --gpus-per-task=Y" rather than an ambiguous range of nodes with
947 -N, --nodes. This option will implicitly set
948 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
949 with an explicit --gpu-bind specification.
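
              For example, an illustrative request for four tasks with one
              GPU of type "volta" (type name taken from the examples above)
              bound to each task:

                     #SBATCH --ntasks=4
                     #SBATCH --gpus-per-task=volta:1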
950
951 --gres=<list>
952 Specifies a comma-delimited list of generic consumable re‐
953 sources. The format of each entry on the list is
954 "name[[:type]:count]". The name is that of the consumable re‐
955 source. The count is the number of those resources with a de‐
956 fault value of 1. The count can have a suffix of "k" or "K"
957 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
958 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
959 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
960 x 1024 x 1024 x 1024). The specified resources will be allo‐
961 cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator.  A list
963 of available generic consumable resources will be printed and
964 the command will exit if the option argument is "help". Exam‐
965 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
966 "--gres=help".
967
968 --gres-flags=<type>
969 Specify generic resource task binding options.
970
971 disable-binding
972 Disable filtering of CPUs with respect to generic re‐
973 source locality. This option is currently required to
974 use more CPUs than are bound to a GRES (i.e. if a GPU is
975 bound to the CPUs on one socket, but resources on more
976 than one socket are required to run the job). This op‐
977 tion may permit a job to be allocated resources sooner
978 than otherwise possible, but may result in lower job per‐
979 formance.
980 NOTE: This option is specific to SelectType=cons_res.
981
982 enforce-binding
983 The only CPUs available to the job will be those bound to
984 the selected GRES (i.e. the CPUs identified in the
985 gres.conf file will be strictly enforced). This option
986 may result in delayed initiation of a job. For example a
987 job requiring two GPUs and one CPU will be delayed until
988 both GPUs on a single socket are available rather than
989 using GPUs bound to separate sockets, however, the appli‐
990 cation performance may be improved due to improved commu‐
991 nication speed. Requires the node to be configured with
992 more than one socket and resource filtering will be per‐
993 formed on a per-socket basis.
994 NOTE: This option is specific to SelectType=cons_tres.
995
996 -h, --help
997 Display help information and exit.
998
999 --hint=<type>
1000 Bind tasks according to application hints.
1001 NOTE: This option cannot be used in conjunction with
1002 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
1003 fied as a command line argument, it will take precedence over
1004 the environment.
1005
1006 compute_bound
1007 Select settings for compute bound applications: use all
1008 cores in each socket, one thread per core.
1009
1010 memory_bound
1011 Select settings for memory bound applications: use only
1012 one core in each socket, one thread per core.
1013
1014 [no]multithread
1015 [don't] use extra threads with in-core multi-threading
1016 which can benefit communication intensive applications.
1017 Only supported with the task/affinity plugin.
1018
1019 help show this help message
1020
1021 -H, --hold
1022 Specify the job is to be submitted in a held state (priority of
1023 zero). A held job can now be released using scontrol to reset
1024 its priority (e.g. "scontrol release <job_id>").
1025
1026 --ignore-pbs
1027 Ignore all "#PBS" and "#BSUB" options specified in the batch
1028 script.
1029
1030 -i, --input=<filename_pattern>
1031 Instruct Slurm to connect the batch script's standard input di‐
1032 rectly to the file name specified in the "filename pattern".
1033
1034 By default, "/dev/null" is open on the batch script's standard
1035 input and both standard output and standard error are directed
1036 to a file of the name "slurm-%j.out", where the "%j" is replaced
1037 with the job allocation number, as described below in the file‐
1038 name pattern section.
1039
1040 -J, --job-name=<jobname>
1041 Specify a name for the job allocation. The specified name will
1042 appear along with the job id number when querying running jobs
1043 on the system. The default is the name of the batch script, or
1044 just "sbatch" if the script is read on sbatch's standard input.
1045
1046 --kill-on-invalid-dep=<yes|no>
              If a job has an invalid dependency and can never run, this
              parameter tells Slurm whether or not to terminate it.  A
              terminated job state will be JOB_CANCELLED.  If this option is
              not specified, the system-wide behavior applies.  By default
              the job stays pending with reason DependencyNeverSatisfied, or,
              if kill_invalid_depend is specified in slurm.conf, the job is
              terminated.
1053
1054 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1055 Specification of licenses (or other resources available on all
1056 nodes of the cluster) which must be allocated to this job. Li‐
1057 cense names can be followed by a colon and count (the default
1058 count is one). Multiple license names should be comma separated
1059 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote li‐
1060 censes, those served by the slurmdbd, specify the name of the
1061 server providing the licenses. For example "--license=nas‐
1062 tran@slurmdb:12".
1063
1064 NOTE: When submitting heterogeneous jobs, license requests only
1065 work correctly when made on the first component job. For exam‐
1066 ple "sbatch -L ansys:2 : script.sh".
1067
1068 --mail-type=<type>
1069 Notify user by email when certain event types occur. Valid type
1070 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1071 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1072 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1073 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1074 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1075 percent of time limit), TIME_LIMIT_50 (reached 50 percent of
1076 time limit) and ARRAY_TASKS (send emails for each array task).
1077 Multiple type values may be specified in a comma separated list.
1078 The user to be notified is indicated with --mail-user. Unless
1079 the ARRAY_TASKS option is specified, mail notifications on job
1080 BEGIN, END and FAIL apply to a job array as a whole rather than
1081 generating individual email messages for each task in the job
1082 array.
1083
1084 --mail-user=<user>
1085 User to receive email notification of state changes as defined
1086 by --mail-type. The default value is the submitting user.
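
              For example (the address below is a placeholder):

                     #SBATCH --mail-type=END,FAIL
                     #SBATCH --mail-user=user@example.com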
1087
1088 --mcs-label=<mcs>
1089 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the groups of the user.  The default value is
              calculated by the mcs plugin if it is enabled.
1092
1093 --mem=<size>[units]
1094 Specify the real memory required per node. Default units are
1095 megabytes. Different units can be specified using the suffix
1096 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1097 is MaxMemPerNode. If configured, both parameters can be seen us‐
1098 ing the scontrol show config command. This parameter would gen‐
1099 erally be used if whole nodes are allocated to jobs (Select‐
1100 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1101 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1102 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1103 fied as command line arguments, then they will take precedence
1104 over the environment.
1105
1106 NOTE: A memory size specification of zero is treated as a spe‐
1107 cial case and grants the job access to all of the memory on each
1108 node. If the job is allocated multiple nodes in a heterogeneous
1109 cluster, the memory limit on each node will be that of the node
1110 in the allocation with the smallest memory size (same limit will
1111 apply to every node in the job's allocation).
1112
1113 NOTE: Enforcement of memory limits currently relies upon the
1114 task/cgroup plugin or enabling of accounting, which samples mem‐
1115 ory use on a periodic basis (data need not be stored, just col‐
1116 lected). In both cases memory use is based upon the job's Resi‐
1117 dent Set Size (RSS). A task may exceed the memory limit until
1118 the next periodic accounting sample.
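
              For example, either of the following illustrative directives
              could be used, the first requesting 16 gigabytes per node and
              the second all of the memory on each node:

                     #SBATCH --mem=16G
                     #SBATCH --mem=0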
1119
1120 --mem-bind=[{quiet|verbose},]<type>
1121 Bind tasks to memory. Used only when the task/affinity plugin is
1122 enabled and the NUMA memory functions are available. Note that
1123 the resolution of CPU and memory binding may differ on some ar‐
1124 chitectures. For example, CPU binding may be performed at the
1125 level of the cores within a processor while memory binding will
1126 be performed at the level of nodes, where the definition of
1127 "nodes" may differ from system to system. By default no memory
1128 binding is performed; any task using any CPU can use any memory.
1129 This option is typically used to ensure that each task is bound
1130 to the memory closest to its assigned CPU. The use of any type
1131 other than "none" or "local" is not recommended.
1132
1133 NOTE: To have Slurm always report on the selected memory binding
1134 for all commands executed in a shell, you can enable verbose
1135 mode by setting the SLURM_MEM_BIND environment variable value to
1136 "verbose".
1137
1138 The following informational environment variables are set when
1139 --mem-bind is in use:
1140
1141 SLURM_MEM_BIND_LIST
1142 SLURM_MEM_BIND_PREFER
1143 SLURM_MEM_BIND_SORT
1144 SLURM_MEM_BIND_TYPE
1145 SLURM_MEM_BIND_VERBOSE
1146
1147 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1148 scription of the individual SLURM_MEM_BIND* variables.
1149
1150 Supported options include:
1151
1152 help show this help message
1153
1154 local Use memory local to the processor in use
1155
1156 map_mem:<list>
1157 Bind by setting memory masks on tasks (or ranks) as spec‐
1158 ified where <list> is
1159 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1160 ping is specified for a node and identical mapping is ap‐
1161 plied to the tasks on every node (i.e. the lowest task ID
1162 on each node is mapped to the first ID specified in the
1163 list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x', in which case they
                     are interpreted as hexadecimal values.  If the number
                     of tasks
1166 (or ranks) exceeds the number of elements in this list,
1167 elements in the list will be reused as needed starting
1168 from the beginning of the list. To simplify support for
1169 large task counts, the lists may follow a map with an as‐
1170 terisk and repetition count. For example
1171 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1172 sults, all CPUs for each node in the job should be allo‐
1173 cated to the job.
1174
1175 mask_mem:<list>
1176 Bind by setting memory masks on tasks (or ranks) as spec‐
1177 ified where <list> is
1178 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1179 mapping is specified for a node and identical mapping is
1180 applied to the tasks on every node (i.e. the lowest task
1181 ID on each node is mapped to the first mask specified in
1182 the list, etc.). NUMA masks are always interpreted as
1183 hexadecimal values. Note that masks must be preceded
1184 with a '0x' if they don't begin with [0-9] so they are
1185 seen as numerical values. If the number of tasks (or
1186 ranks) exceeds the number of elements in this list, ele‐
1187 ments in the list will be reused as needed starting from
1188 the beginning of the list. To simplify support for large
1189 task counts, the lists may follow a mask with an asterisk
1190 and repetition count. For example "mask_mem:0*4,1*4".
1191 For predictable binding results, all CPUs for each node
1192 in the job should be allocated to the job.
1193
1194 no[ne] don't bind tasks to memory (default)
1195
1196 p[refer]
1197 Prefer use of first specified NUMA node, but permit
1198 use of other available NUMA nodes.
1199
1200 q[uiet]
1201 quietly bind before task runs (default)
1202
1203 rank bind by task rank (not recommended)
1204
1205 sort sort free cache pages (run zonesort on Intel KNL nodes)
1206
1207 v[erbose]
1208 verbosely report binding before task runs
1209
1210 --mem-per-cpu=<size>[units]
1211 Minimum memory required per allocated CPU. Default units are
1212 megabytes. The default value is DefMemPerCPU and the maximum
1213 value is MaxMemPerCPU (see exception below). If configured, both
1214 parameters can be seen using the scontrol show config command.
1215 Note that if the job's --mem-per-cpu value exceeds the config‐
1216 ured MaxMemPerCPU, then the user's limit will be treated as a
1217 memory limit per task; --mem-per-cpu will be reduced to a value
1218 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1219 value of --cpus-per-task multiplied by the new --mem-per-cpu
1220 value will equal the original --mem-per-cpu value specified by
1221 the user. This parameter would generally be used if individual
1222 processors are allocated to jobs (SelectType=select/cons_res).
1223 If resources are allocated by core, socket, or whole nodes, then
1224 the number of CPUs allocated to a job may be higher than the
1225 task count and the value of --mem-per-cpu should be adjusted ac‐
1226 cordingly. Also see --mem and --mem-per-gpu. The --mem,
1227 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1228
1229 NOTE: If the final amount of memory requested by a job can't be
1230 satisfied by any of the nodes configured in the partition, the
1231 job will be rejected. This could happen if --mem-per-cpu is
1232 used with the --exclusive option for a job allocation and
1233 --mem-per-cpu times the number of CPUs on a node is greater than
1234 the total memory of that node.
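
        For example (a minimal sketch; the task count and memory value
        are placeholders), a job running 8 tasks that each need roughly
        2 GB per allocated CPU might request:

             #SBATCH --ntasks=8
             #SBATCH --mem-per-cpu=2G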
1235
1236 --mem-per-gpu=<size>[units]
1237 Minimum memory required per allocated GPU. Default units are
1238 megabytes. Different units can be specified using the suffix
1239 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1240 both a global and per partition basis. If configured, the pa‐
1241 rameters can be seen using the scontrol show config and scontrol
1242 show partition commands. Also see --mem. The --mem,
1243 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1244
1245 --mincpus=<n>
1246 Specify a minimum number of logical cpus/processors per node.
1247
1248 --network=<type>
1249 Specify information pertaining to the switch or network. The
1250 interpretation of type is system dependent. This option is sup‐
1251 ported when running Slurm on a Cray natively. It is used to re‐
1252 quest using Network Performance Counters. Only one value per
1253 request is valid. All options are case insensitive. In this
1254 configuration, supported values include:
1255
1256 system
1257 Use the system-wide network performance counters. Only
1258 nodes requested will be marked in use for the job alloca‐
1259 tion. If the job does not fill up the entire system, the rest
1260 of the nodes cannot be used by other jobs that use NPC;
1261 if idle, their state will appear as PerfCnts.
1262 These nodes are still available for other jobs not using
1263 NPC.
1264
1265 blade Use the blade network performance counters. Only nodes re‐
1266 quested will be marked in use for the job allocation. If
1267 the job does not fill up the entire blade(s) allocated to
1268 the job, those blade(s) cannot be used by other jobs that
1269 use NPC; if idle, their state will appear as PerfCnts.
1270 These nodes are still available for other jobs not
1271 using NPC.
1272
1273 In all cases the job allocation request must specify the --exclusive
1274 option. Otherwise the request will be denied.
1275
1276 Also, with any of these options, job steps are not allowed to share
1277 blades, so resources will remain idle inside an allocation if the step
1278 running on a blade does not take up all the nodes on the blade.
1279
1280 The network option is also supported on systems with IBM's Parallel En‐
1281 vironment (PE). See IBM's LoadLeveler job command keyword documenta‐
1282 tion about the keyword "network" for more information. Multiple values
1283 may be specified in a comma-separated list. All options are case
1284 insensitive. Supported values include:
1285
1286 BULK_XFER[=<resources>]
1287 Enable bulk transfer of data using Remote Di‐
1288 rect-Memory Access (RDMA). The optional resources
1289 specification is a numeric value which can have a
1290 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1291 bytes, megabytes or gigabytes. NOTE: The resources
1292 specification is not supported by the underlying IBM
1293 infrastructure as of Parallel Environment version
1294 2.2 and no value should be specified at this time.
1295
1296 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1297 quired. Applies only to IBM Power7-IH processors.
1298 Default value is zero. Independent CAU will be al‐
1299 located for each programming interface (MPI, LAPI,
1300 etc.)
1301
1302 DEVNAME=<name>
1303 Specify the device name to use for communications
1304 (e.g. "eth0" or "mlx4_0").
1305
1306 DEVTYPE=<type>
1307 Specify the device type to use for communications.
1308 The supported values of type are: "IB" (InfiniBand),
1309 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1310 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Kernel
1311 Emulation of HPCE). The devices allocated to a job
1312 must all be of the same type. The default value
1313 depends upon what hardware is available and, in order
1314 of preference, is IPONLY (which is not considered in
1315 User Space mode), HFI, IB, HPCE, and KMUX.
1318
1319 IMMED=<count>
1320 Number of immediate send slots per window required.
1321 Applies only to IBM Power7-IH processors. Default
1322 value is zero.
1323
1324 INSTANCES=<count>
1325 Specify the number of network connections for each
1326 task on each network. The default instance count
1327 is 1.
1328
1329 IPV4 Use Internet Protocol (IP) version 4 communications
1330 (default).
1331
1332 IPV6 Use Internet Protocol (IP) version 6 communications.
1333
1334 LAPI Use the LAPI programming interface.
1335
1336 MPI Use the MPI programming interface. MPI is the de‐
1337 fault interface.
1338
1339 PAMI Use the PAMI programming interface.
1340
1341 SHMEM Use the OpenSHMEM programming interface.
1342
1343 SN_ALL Use all available switch networks (default).
1344
1345 SN_SINGLE Use one available switch network.
1346
1347 UPC Use the UPC programming interface.
1348
1349 US Use User Space communications.
1350
1351 Some examples of network specifications:
1352
1353 Instances=2,US,MPI,SN_ALL
1354 Create two user space connections for MPI communications
1355 on every switch network for each task.
1356
1357 US,MPI,Instances=3,Devtype=IB
1358 Create three user space connections for MPI communica‐
1359 tions on every InfiniBand network for each task.
1360
1361 IPV4,LAPI,SN_Single
1362 Create an IP version 4 connection for LAPI communications
1363 on one switch network for each task.
1364
1365 Instances=2,US,LAPI,MPI
1366 Create two user space connections each for LAPI and MPI
1367 communications on every switch network for each task.
1368 Note that SN_ALL is the default option so every switch
1369 network is used. Also note that Instances=2 specifies
1370 that two connections are established for each protocol
1371 (LAPI and MPI) and each task. If there are two networks
1372 and four tasks on the node then a total of 32 connections
1373 are established (2 instances x 2 protocols x 2 networks x
1374 4 tasks).
1375
1376 --nice[=adjustment]
1377 Run the job with an adjusted scheduling priority within Slurm.
1378 With no adjustment value the scheduling priority is decreased by
1379 100. A negative nice value increases the priority, otherwise de‐
1380 creases it. The adjustment range is +/- 2147483645. Only privi‐
1381 leged users can specify a negative adjustment.
1382
1383 -k, --no-kill[=off]
1384 Do not automatically terminate a job if one of the nodes it has
1385 been allocated fails. The user will assume the responsibilities
1386 for fault-tolerance should a node fail. When there is a node
1387 failure, any active job steps (usually MPI jobs) on that node
1388 will almost certainly suffer a fatal error, but with --no-kill,
1389 the job allocation will not be revoked so the user may launch
1390 new job steps on the remaining nodes in their allocation.
1391
1392 Specify an optional argument of "off" to disable the effect of the
1393 SBATCH_NO_KILL environment variable.
1394
1395 By default Slurm terminates the entire job allocation if any
1396 node fails in its range of allocated nodes.
1397
1398 --no-requeue
1399 Specifies that the batch job should never be requeued under any
1400 circumstances. Setting this option will prevent the job from
1401 being restarted by system administrators (for example, after a
1402 scheduled downtime), recovered after a node failure, or requeued
1403 upon preemption by a higher priority job. When a job is re‐
1404 queued, the batch script is initiated from its beginning. Also
1405 see the --requeue option. The JobRequeue configuration parame‐
1406 ter controls the default behavior on the cluster.
1407
1408 -F, --nodefile=<node_file>
1409 Much like --nodelist, but the list is contained in a file of
1410 name node_file. The node names in the list may also span multi‐
1411 ple lines in the file. Duplicate node names in the file will
1412 be ignored. The order of the node names in the list is not im‐
1413 portant; the node names will be sorted by Slurm.
1414
1415 -w, --nodelist=<node_name_list>
1416 Request a specific list of hosts. The job will contain all of
1417 these hosts and possibly additional hosts as needed to satisfy
1418 resource requirements. The list may be specified as a
1419 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1420 for example), or a filename. The host list will be assumed to
1421 be a filename if it contains a "/" character. If you specify a
1422 minimum node or processor count larger than can be satisfied by
1423 the supplied host list, additional resources will be allocated
1424 on other nodes as needed. Duplicate node names in the list will
1425 be ignored. The order of the node names in the list is not im‐
1426 portant; the node names will be sorted by Slurm.
1427
1428 -N, --nodes=<minnodes>[-maxnodes]
1429 Request that a minimum of minnodes nodes be allocated to this
1430 job. A maximum node count may also be specified with maxnodes.
1431 If only one number is specified, this is used as both the mini‐
1432 mum and maximum node count. The partition's node limits super‐
1433 sede those of the job. If a job's node limits are outside of
1434 the range permitted for its associated partition, the job will
1435 be left in a PENDING state. This permits possible execution at
1436 a later time, when the partition limit is changed. If a job
1437 node limit exceeds the number of nodes configured in the parti‐
1438 tion, the job will be rejected. Note that the environment vari‐
1439 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1440 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1441 tion for more information. If -N is not specified, the default
1442 behavior is to allocate enough nodes to satisfy the requested
1443 resources as expressed by per-job specification options, e.g.
1444 -n, -c and --gpus. The job will be allocated as many nodes as
1445 possible within the range specified and without delaying the
1446 initiation of the job. The node count specification may include
1447 a numeric value followed by a suffix of "k" (multiplies numeric
1448 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
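
        For example (illustrative values), the following directives ask
        for at least 2 and at most 4 nodes for 16 tasks, leaving the
        final node count to the scheduler:

             #SBATCH --nodes=2-4
             #SBATCH --ntasks=16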
1449
1450 -n, --ntasks=<number>
1451 sbatch does not launch tasks, it requests an allocation of re‐
1452 sources and submits a batch script. This option advises the
1453 Slurm controller that job steps run within the allocation will
1454 launch a maximum of number tasks and to provide for sufficient
1455 resources. The default is one task per node, but note that the
1456 --cpus-per-task option will change this default.
1457
1458 --ntasks-per-core=<ntasks>
1459 Request the maximum ntasks be invoked on each core. Meant to be
1460 used with the --ntasks option. Related to --ntasks-per-node ex‐
1461 cept at the core level instead of the node level. NOTE: This
1462 option is not supported when using SelectType=select/linear.
1463
1464 --ntasks-per-gpu=<ntasks>
1465 Request that there are ntasks tasks invoked for every GPU. This
1466 option can work in two ways: 1) either specify --ntasks in addi‐
1467 tion, in which case a type-less GPU specification will be auto‐
1468 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1469 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1470 --ntasks, and the total task count will be automatically deter‐
1471 mined. The number of CPUs needed will be automatically in‐
1472 creased if necessary to allow for any calculated task count.
1473 This option will implicitly set --gpu-bind=single:<ntasks>, but
1474 that can be overridden with an explicit --gpu-bind specifica‐
1475 tion. This option is not compatible with a node range (i.e.
1476 -N<minnodes-maxnodes>). This option is not compatible with
1477 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1478 option is not supported unless SelectType=cons_tres is config‐
1479 ured (either directly or indirectly on Cray systems).
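
        As an illustrative sketch (the GPU and task counts are
        placeholders), the following requests 4 GPUs with 2 tasks per
        GPU, for 8 tasks in total:

             #SBATCH --gpus=4
             #SBATCH --ntasks-per-gpu=2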
1480
1481 --ntasks-per-node=<ntasks>
1482 Request that ntasks be invoked on each node. If used with the
1483 --ntasks option, the --ntasks option will take precedence and
1484 the --ntasks-per-node will be treated as a maximum count of
1485 tasks per node. Meant to be used with the --nodes option. This
1486 is related to --cpus-per-task=ncpus, but does not require knowl‐
1487 edge of the actual number of cpus on each node. In some cases,
1488 it is more convenient to be able to request that no more than a
1489 specific number of tasks be invoked on each node. Examples of
1490 this include submitting a hybrid MPI/OpenMP app where only one
1491 MPI "task/rank" should be assigned to each node while allowing
1492 the OpenMP portion to utilize all of the parallelism present in
1493 the node, or submitting a single setup/cleanup/monitoring job to
1494 each node of a pre-existing allocation as one step in a larger
1495 job script.
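
        For example (a sketch; the node and CPU counts are placeholders),
        a hybrid MPI/OpenMP job might place one task on each of 4 nodes
        and give each task 32 CPUs for its OpenMP threads:

             #SBATCH --nodes=4
             #SBATCH --ntasks-per-node=1
             #SBATCH --cpus-per-task=32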
1496
1497 --ntasks-per-socket=<ntasks>
1498 Request the maximum ntasks be invoked on each socket. Meant to
1499 be used with the --ntasks option. Related to --ntasks-per-node
1500 except at the socket level instead of the node level. NOTE:
1501 This option is not supported when using SelectType=select/lin‐
1502 ear.
1503
1504 --open-mode={append|truncate}
1505 Open the output and error files using append or truncate mode as
1506 specified. The default value is specified by the system config‐
1507 uration parameter JobFileAppend.
1508
1509 -o, --output=<filename_pattern>
1510 Instruct Slurm to connect the batch script's standard output di‐
1511 rectly to the file name specified in the "filename pattern". By
1512 default both standard output and standard error are directed to
1513 the same file. For job arrays, the default file name is
1514 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1515 the array index. For other jobs, the default file name is
1516 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1517 the filename pattern section below for filename specification
1518 options.
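
        For example (a sketch; the file names are placeholders), a job
        array could write separate per-task output and error files
        using the "%A" and "%a" replacement symbols described below:

             #SBATCH --output=myjob_%A_%a.out
             #SBATCH --error=myjob_%A_%a.err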
1519
1520 -O, --overcommit
1521 Overcommit resources.
1522
1523 When applied to a job allocation (not including jobs requesting
1524 exclusive access to the nodes) the resources are allocated as if
1525 only one task per node is requested. This means that the re‐
1526 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1527 cated per node rather than being multiplied by the number of
1528 tasks. Options used to specify the number of tasks per node,
1529 socket, core, etc. are ignored.
1530
1531 When applied to job step allocations (the srun command when exe‐
1532 cuted within an existing job allocation), this option can be
1533 used to launch more than one task per CPU. Normally, srun will
1534 not allocate more than one process per CPU. By specifying
1535 --overcommit you are explicitly allowing more than one process
1536 per CPU. However, no more than MAX_TASKS_PER_NODE tasks are
1537 permitted to execute per node. NOTE: MAX_TASKS_PER_NODE is de‐
1538 fined in the file slurm.h and is not a variable; it is set at
1539 Slurm build time.
1540
1541 -s, --oversubscribe
1542 The job allocation can over-subscribe resources with other run‐
1543 ning jobs. The resources to be over-subscribed can be nodes,
1544 sockets, cores, and/or hyperthreads depending upon configura‐
1545 tion. The default over-subscribe behavior depends on system
1546 configuration and the partition's OverSubscribe option takes
1547 precedence over the job's option. This option may result in the
1548 allocation being granted sooner than if the --oversubscribe op‐
1549 tion was not set and allow higher system utilization, but appli‐
1550 cation performance will likely suffer due to competition for re‐
1551 sources. Also see the --exclusive option.
1552
1553 --parsable
1554 Outputs only the job id number and the cluster name if present.
1555 The values are separated by a semicolon. Errors will still be
1556 displayed.
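
        For example (a sketch; job.sh and post.sh are placeholder script
        names), the job ID can be captured in a shell variable and used
        to submit a dependent job:

             $ jobid=$(sbatch --parsable job.sh)
             $ sbatch --dependency=afterok:${jobid%%;*} post.sh

        The parameter expansion strips the optional cluster name suffix
        from the --parsable output.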
1557
1558 -p, --partition=<partition_names>
1559 Request a specific partition for the resource allocation. If
1560 not specified, the default behavior is to allow the slurm con‐
1561 troller to select the default partition as designated by the
1562 system administrator. If the job can use more than one parti‐
1563 tion, specify their names in a comma-separated list and the one
1564 offering earliest initiation will be used with no regard given
1565 to the partition name ordering (although higher priority parti‐
1566 tions will be considered first). When the job is initiated, the
1567 name of the partition used will be placed first in the job
1568 record partition string.
1569
1570 --power=<flags>
1571 Comma separated list of power management plugin options. Cur‐
1572 rently available flags include: level (all nodes allocated to
1573 the job should have identical power caps, may be disabled by the
1574 Slurm configuration option PowerParameters=job_no_level).
1575
1576 --priority=<value>
1577 Request a specific job priority. May be subject to configura‐
1578 tion specific constraints. value should either be a numeric
1579 value or "TOP" (for highest possible value). Only Slurm opera‐
1580 tors and administrators can set the priority of a job.
1581
1582 --profile={all|none|<type>[,<type>...]}
1583 Enables detailed data collection by the acct_gather_profile
1584 plugin. Detailed data are typically time-series that are stored
1585 in an HDF5 file for the job or an InfluxDB database depending on
1586 the configured plugin.
1587
1588 All All data types are collected. (Cannot be combined with
1589 other values.)
1590
1591 None No data types are collected. This is the default.
1592 (Cannot be combined with other values.)
1593
1594 Valid type values are:
1595
1596 Energy Energy data is collected.
1597
1598 Task Task (I/O, Memory, ...) data is collected.
1599
1600 Lustre Lustre data is collected.
1601
1602 Network
1603 Network (InfiniBand) data is collected.
1604
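        For example (illustrative only; the sampling interval is a
        placeholder), task-level profiling could be enabled together
        with a 30 second task sampling interval:

             #SBATCH --profile=task
             #SBATCH --acctg-freq=task=30
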
1605 --propagate[=rlimit[,rlimit...]]
1606 Allows users to specify which of the modifiable (soft) resource
1607 limits to propagate to the compute nodes and apply to their
1608 jobs. If no rlimit is specified, then all resource limits will
1609 be propagated. The following rlimit names are supported by
1610 Slurm (although some options may not be supported on some sys‐
1611 tems):
1612
1613 ALL All limits listed below (default)
1614
1615 NONE No limits listed below
1616
1617 AS The maximum address space (virtual memory) for a
1618 process.
1619
1620 CORE The maximum size of core file
1621
1622 CPU The maximum amount of CPU time
1623
1624 DATA The maximum size of a process's data segment
1625
1626 FSIZE The maximum size of files created. Note that if the
1627 user sets FSIZE to less than the current size of the
1628 slurmd.log, job launches will fail with a 'File size
1629 limit exceeded' error.
1630
1631 MEMLOCK The maximum size that may be locked into memory
1632
1633 NOFILE The maximum number of open files
1634
1635 NPROC The maximum number of processes available
1636
1637 RSS The maximum resident set size. Note that this only has
1638 effect with Linux kernels 2.4.30 or older or BSD.
1639
1640 STACK The maximum stack size
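
        For example (a sketch), to propagate only the locked-memory and
        stack limits from the submission environment, using the rlimit
        names listed above:

             #SBATCH --propagate=MEMLOCK,STACK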
1641
1642 -q, --qos=<qos>
1643 Request a quality of service for the job. QOS values can be de‐
1644 fined for each user/cluster/account association in the Slurm
1645 database. Users will be limited to their association's defined
1646 set of qos's when the Slurm configuration parameter, Account‐
1647 ingStorageEnforce, includes "qos" in its definition.
1648
1649 -Q, --quiet
1650 Suppress informational messages from sbatch, such as the job ID.
1651 Errors will still be displayed.
1652
1653 --reboot
1654 Force the allocated nodes to reboot before starting the job.
1655 This is only supported with some system configurations and will
1656 otherwise be silently ignored. Only root, SlurmUser or admins
1657 can reboot nodes.
1658
1659 --requeue
1660 Specifies that the batch job should be eligible for requeuing.
1661 The job may be requeued explicitly by a system administrator,
1662 after node failure, or upon preemption by a higher priority job.
1663 When a job is requeued, the batch script is initiated from its
1664 beginning. Also see the --no-requeue option. The JobRequeue
1665 configuration parameter controls the default behavior on the
1666 cluster.
1667
1668 --reservation=<reservation_names>
1669 Allocate resources for the job from the named reservation. If
1670 the job can use more than one reservation, specify their names
1671 in a comma-separated list and the one offering the earliest
1672 initiation will be used. Each reservation will be considered in
1673 the order it was requested. All reservations will be listed in
1674 scontrol/squeue through the life of the job. In accounting, the
1675 first reservation will be seen; after the job starts, the reser‐
1676 vation actually used will replace it.
1677
1678 --signal=[{R|B}:]<sig_num>[@sig_time]
1679 When a job is within sig_time seconds of its end time, send it
1680 the signal sig_num. Due to the resolution of event handling by
1681 Slurm, the signal may be sent up to 60 seconds earlier than
1682 specified. sig_num may either be a signal number or name (e.g.
1683 "10" or "USR1"). sig_time must have an integer value between 0
1684 and 65535. By default, no signal is sent before the job's end
1685 time. If a sig_num is specified without any sig_time, the de‐
1686 fault time will be 60 seconds. Use the "B:" option to signal
1687 only the batch shell; none of the other processes will be sig‐
1688 naled. By default all job steps will be signaled, but not the
1689 batch shell itself. Use the "R:" option to allow this job to
1690 overlap with a reservation with MaxStartDelay set. To have the
1691 signal sent at preemption time see the preempt_send_user_signal
1692 SlurmctldParameter.
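
        For example (an illustrative sketch; the signal, lead time,
        application name and cleanup action are placeholders), a batch
        script could ask for USR1 to be sent to the batch shell 300
        seconds before the time limit and trap it to save state:

             #!/bin/sh
             #SBATCH --signal=B:USR1@300
             trap 'echo "caught USR1, saving state"' USR1
             srun ./my_app &
             wait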
1693
1694 --sockets-per-node=<sockets>
1695 Restrict node selection to nodes with at least the specified
1696 number of sockets. See additional information under -B option
1697 above when task/affinity plugin is enabled.
1698 NOTE: This option may implicitly set the number of tasks (if -n
1699 was not specified) as one task per requested thread.
1700
1701 --spread-job
1702 Spread the job allocation over as many nodes as possible and at‐
1703 tempt to evenly distribute tasks across the allocated nodes.
1704 This option disables the topology/tree plugin.
1705
1706 --switches=<count>[@max-time]
1707 When a tree topology is used, this defines the maximum count of
1708 leaf switches desired for the job allocation and optionally the
1709 maximum time to wait for that number of switches. If Slurm finds
1710 an allocation containing more switches than the count specified,
1711 the job remains pending until it either finds an allocation with
1712 desired switch count or the time limit expires. If there is no
1713 switch count limit, there is no delay in starting the job. Ac‐
1714 ceptable time formats include "minutes", "minutes:seconds",
1715 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1716 "days-hours:minutes:seconds". The job's maximum time delay may
1717 be limited by the system administrator using the SchedulerParam‐
1718 eters configuration parameter with the max_switch_wait parameter
1719 option. On a dragonfly network the only switch count supported
1720 is 1, since communication performance will be highest when a job
1721 is allocated resources on one leaf switch or on more than 2 leaf
1722 switches. The default max-time is the value of the
1723 max_switch_wait SchedulerParameters option.
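
        For example (illustrative values), to ask that the allocation
        span at most one leaf switch, waiting up to 60 minutes for such
        an allocation to become available:

             #SBATCH --switches=1@60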
1724
1725 --test-only
1726 Validate the batch script and return an estimate of when a job
1727 would be scheduled to run given the current job queue and all
1728 the other arguments specifying the job requirements. No job is
1729 actually submitted.
1730
1731 --thread-spec=<num>
1732 Count of specialized threads per node reserved by the job for
1733 system operations and not used by the application. The applica‐
1734 tion will not use these threads, but will be charged for their
1735 allocation. This option can not be used with the --core-spec
1736 option.
1737
1738 --threads-per-core=<threads>
1739 Restrict node selection to nodes with at least the specified
1740 number of threads per core. In task layout, use the specified
1741 maximum number of threads per core. NOTE: "Threads" refers to
1742 the number of processing units on each core rather than the num‐
1743 ber of application tasks to be launched per core. See addi‐
1744 tional information under -B option above when task/affinity
1745 plugin is enabled.
1746 NOTE: This option may implicitly set the number of tasks (if -n
1747 was not specified) as one task per requested thread.
1748
1749 -t, --time=<time>
1750 Set a limit on the total run time of the job allocation. If the
1751 requested time limit exceeds the partition's time limit, the job
1752 will be left in a PENDING state (possibly indefinitely). The
1753 default time limit is the partition's default time limit. When
1754 the time limit is reached, each task in each job step is sent
1755 SIGTERM followed by SIGKILL. The interval between signals is
1756 specified by the Slurm configuration parameter KillWait. The
1757 OverTimeLimit configuration parameter may permit the job to run
1758 longer than scheduled. Time resolution is one minute and second
1759 values are rounded up to the next minute.
1760
1761 A time limit of zero requests that no time limit be imposed.
1762 Acceptable time formats include "minutes", "minutes:seconds",
1763 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1764 "days-hours:minutes:seconds".
1765
1766 --time-min=<time>
1767 Set a minimum time limit on the job allocation. If specified,
1768 the job may have its --time limit lowered to a value no lower
1769 than --time-min if doing so permits the job to begin execution
1770 earlier than otherwise possible. The job's time limit will not
1771 be changed after the job is allocated resources. This is per‐
1772 formed by a backfill scheduling algorithm to allocate resources
1773 otherwise reserved for higher priority jobs. Acceptable time
1774 formats include "minutes", "minutes:seconds", "hours:min‐
1775 utes:seconds", "days-hours", "days-hours:minutes" and
1776 "days-hours:minutes:seconds".
1777
1778 --tmp=<size>[units]
1779 Specify a minimum amount of temporary disk space per node. De‐
1780 fault units are megabytes. Different units can be specified us‐
1781 ing the suffix [K|M|G|T].
1782
1783 --uid=<user>
1784 Attempt to submit and/or run a job as user instead of the invok‐
1785 ing user id. The invoking user's credentials will be used to
1786 check access permissions for the target partition. User root may
1787 use this option to run jobs as a normal user in a RootOnly par‐
1788 tition for example. If run as root, sbatch will drop its permis‐
1789 sions to the uid specified after node allocation is successful.
1790 user may be the user name or numerical user ID.
1791
1792 --usage
1793 Display brief help message and exit.
1794
1795 --use-min-nodes
1796 If a range of node counts is given, prefer the smaller count.
1797
1798 -v, --verbose
1799 Increase the verbosity of sbatch's informational messages. Mul‐
1800 tiple -v's will further increase sbatch's verbosity. By default
1801 only errors will be displayed.
1802
1803 -V, --version
1804 Display version information and exit.
1805
1806 -W, --wait
1807 Do not exit until the submitted job terminates. The exit code
1808 of the sbatch command will be the same as the exit code of the
1809 submitted job. If the job terminated due to a signal rather than
1810 a normal exit, the exit code will be set to 1. In the case of a
1811 job array, the exit code recorded will be the highest value for
1812 any task in the job array.
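
        For example (a sketch; job.sh is a placeholder script name), a
        submission can block until the job finishes and then branch on
        its exit status:

             $ sbatch --wait job.sh && echo "job succeeded" || echo "job failed"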
1813
1814 --wait-all-nodes=<value>
1815 Controls when the execution of the command begins. By default
1816 the job will begin execution as soon as the allocation is made.
1817
1818 0 Begin execution as soon as allocation can be made. Do not
1819 wait for all nodes to be ready for use (i.e. booted).
1820
1821 1 Do not begin execution until all nodes are ready for use.
1822
1823 --wckey=<wckey>
1824 Specify wckey to be used with job. If TrackWCKey=no (default)
1825 in the slurm.conf this value is ignored.
1826
1827 --wrap=<command_string>
1828 Sbatch will wrap the specified command string in a simple "sh"
1829 shell script, and submit that script to the slurm controller.
1830 When --wrap is used, a script name and arguments may not be
1831 specified on the command line; instead the sbatch-generated
1832 wrapper script is used.
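
        For example (illustrative command), a one-line command can be
        submitted without writing a separate script file:

             $ sbatch -N1 --wrap="hostname; date"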
1833
1834filename pattern
1835 sbatch allows for a filename pattern to contain one or more replacement
1836 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1837
1838
1839 \\ Do not process any of the replacement symbols.
1840
1841 %% The character "%".
1842
1843 %A Job array's master job allocation number.
1844
1845 %a Job array ID (index) number.
1846
1847 %J jobid.stepid of the running job. (e.g. "128.0")
1848
1849 %j jobid of the running job.
1850
1851 %N short hostname. This will create a separate IO file per node.
1852
1853 %n Node identifier relative to current job (e.g. "0" is the first
1854 node of the running job). This will create a separate IO file per
1855 node.
1856
1857 %s stepid of the running job.
1858
1859 %t task identifier (rank) relative to current job. This will create
1860 a separate IO file per task.
1861
1862 %u User name.
1863
1864 %x Job name.
1865
1866 A number placed between the percent character and format specifier may
1867 be used to zero-pad the result in the IO filename. This number is ig‐
1868 nored if the format specifier corresponds to non-numeric data (%N for
1869 example).
1870
1871 Some examples of how the format string may be used for a 4 task job
1872 step with a Job ID of 128 and step id of 0 are included below:
1873
1874
1875 job%J.out job128.0.out
1876
1877 job%4j.out job0128.out
1878
1879 job%j-%2t.out job128-00.out, job128-01.out, ...
1880
1881PERFORMANCE
1882 Executing sbatch sends a remote procedure call to slurmctld. If enough
1883 calls from sbatch or other Slurm client commands that send remote pro‐
1884 cedure calls to the slurmctld daemon come in at once, it can result in
1885 a degradation of performance of the slurmctld daemon, possibly result‐
1886 ing in a denial of service.
1887
1888 Do not run sbatch or other Slurm client commands that send remote pro‐
1889 cedure calls to slurmctld from loops in shell scripts or other pro‐
1890 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
1891 sary for the information you are trying to gather.
1892
1893
1894INPUT ENVIRONMENT VARIABLES
1895 Upon startup, sbatch will read and handle the options set in the fol‐
1896 lowing environment variables. The majority of these variables are set
1897 the same way the options are set, as defined above. For flag options
1898 that are defined to expect no argument, the option can be enabled by
1899 setting the environment variable without a value (empty or NULL
1900 string), the string 'yes', or a non-zero number. Any other value for
1901 the environment variable will result in the option not being set.
1902 There are a couple of exceptions to these rules that are noted below.
1903 NOTE: Environment variables will override any options set in a batch
1904 script, and command line options will override any environment vari‐
1905 ables.
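
        For example (a sketch; the partition name and time limit are
        placeholders), options may be supplied through the environment
        instead of the command line or #SBATCH directives:

             $ export SBATCH_PARTITION=debug
             $ export SBATCH_TIMELIMIT=30
             $ sbatch job.sh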
1906
1907
1908 SBATCH_ACCOUNT Same as -A, --account
1909
1910 SBATCH_ACCTG_FREQ Same as --acctg-freq
1911
1912 SBATCH_ARRAY_INX Same as -a, --array
1913
1914 SBATCH_BATCH Same as --batch
1915
1916 SBATCH_CLUSTERS or SLURM_CLUSTERS
1917 Same as --clusters
1918
1919 SBATCH_CONSTRAINT Same as -C, --constraint
1920
1921 SBATCH_CONTAINER Same as --container.
1922
1923 SBATCH_CORE_SPEC Same as --core-spec
1924
1925 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
1926
1927 SBATCH_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
1928 disable or enable the option.
1929
1930 SBATCH_DELAY_BOOT Same as --delay-boot
1931
1932 SBATCH_DISTRIBUTION Same as -m, --distribution
1933
1934 SBATCH_EXCLUSIVE Same as --exclusive
1935
1936 SBATCH_EXPORT Same as --export
1937
1938 SBATCH_GET_USER_ENV Same as --get-user-env
1939
1940 SBATCH_GPU_BIND Same as --gpu-bind
1941
1942 SBATCH_GPU_FREQ Same as --gpu-freq
1943
1944 SBATCH_GPUS Same as -G, --gpus
1945
1946 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
1947
1948 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
1949
1950 SBATCH_GRES Same as --gres
1951
1952 SBATCH_GRES_FLAGS Same as --gres-flags
1953
1954 SBATCH_HINT or SLURM_HINT
1955 Same as --hint
1956
1957 SBATCH_IGNORE_PBS Same as --ignore-pbs
1958
1959 SBATCH_JOB_NAME Same as -J, --job-name
1960
1961 SBATCH_MEM_BIND Same as --mem-bind
1962
1963 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
1964
1965 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
1966
1967 SBATCH_MEM_PER_NODE Same as --mem
1968
1969 SBATCH_NETWORK Same as --network
1970
1971 SBATCH_NO_KILL Same as -k, --no-kill
1972
1973 SBATCH_NO_REQUEUE Same as --no-requeue
1974
1975 SBATCH_OPEN_MODE Same as --open-mode
1976
1977 SBATCH_OVERCOMMIT Same as -O, --overcommit
1978
1979 SBATCH_PARTITION Same as -p, --partition
1980
1981 SBATCH_POWER Same as --power
1982
1983 SBATCH_PROFILE Same as --profile
1984
1985 SBATCH_QOS Same as --qos
1986
1987 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
1988 maximum count of switches desired for the job al‐
1989 location and optionally the maximum time to wait
1990 for that number of switches. See --switches
1991
1992 SBATCH_REQUEUE Same as --requeue
1993
1994 SBATCH_RESERVATION Same as --reservation
1995
1996 SBATCH_SIGNAL Same as --signal
1997
1998 SBATCH_SPREAD_JOB Same as --spread-job
1999
2000 SBATCH_THREAD_SPEC Same as --thread-spec
2001
2002 SBATCH_THREADS_PER_CORE
2003 Same as --threads-per-core
2004
2005 SBATCH_TIMELIMIT Same as -t, --time
2006
2007 SBATCH_USE_MIN_NODES Same as --use-min-nodes
2008
2009 SBATCH_WAIT Same as -W, --wait
2010
2011 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
2012 to disable or enable the option.
2013
2014 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
2015 --switches
2016
2017 SBATCH_WCKEY Same as --wckey
2018
2019 SLURM_CONF The location of the Slurm configuration file.
2020
2021 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2022 error occurs (e.g. invalid options). This can be
2023 used by a script to distinguish application exit
2024 codes from various Slurm error conditions.
2025
2026 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2027 If set, only the specified node will log when the
2028 job or step is killed by a signal.
2029
2030OUTPUT ENVIRONMENT VARIABLES
2031 The Slurm controller will set the following variables in the environ‐
2032 ment of the batch script.
2033
2034
2035 SBATCH_MEM_BIND
2036 Set to value of the --mem-bind option.
2037
2038 SBATCH_MEM_BIND_LIST
2039 Set to bit mask used for memory binding.
2040
2041 SBATCH_MEM_BIND_PREFER
2042 Set to "prefer" if the --mem-bind option includes the prefer op‐
2043 tion.
2044
2045 SBATCH_MEM_BIND_TYPE
2046 Set to the memory binding type specified with the --mem-bind op‐
2047 tion. Possible values are "none", "rank", "map_mem", "mask_mem"
2048 and "local".
2049
2050 SBATCH_MEM_BIND_VERBOSE
2051 Set to "verbose" if the --mem-bind option includes the verbose
2052 option. Set to "quiet" otherwise.
2053
2054 SLURM_*_HET_GROUP_#
2055 For a heterogeneous job allocation, the environment variables
2056 are set separately for each component.
2057
2058 SLURM_ARRAY_JOB_ID
2059 Job array's master job ID number.
2060
2061 SLURM_ARRAY_TASK_COUNT
2062 Total number of tasks in a job array.
2063
2064 SLURM_ARRAY_TASK_ID
2065 Job array ID (index) number.
2066
2067 SLURM_ARRAY_TASK_MAX
2068 Job array's maximum ID (index) number.
2069
2070 SLURM_ARRAY_TASK_MIN
2071 Job array's minimum ID (index) number.
2072
2073 SLURM_ARRAY_TASK_STEP
2074 Job array's index step size.
2075
2076 SLURM_CLUSTER_NAME
2077 Name of the cluster on which the job is executing.
2078
2079 SLURM_CPUS_ON_NODE
2080 Number of CPUs allocated to the batch step. NOTE: The se‐
2081 lect/linear plugin allocates entire nodes to jobs, so the value
2082 indicates the total count of CPUs on the node. For the se‐
2083 lect/cons_res and select/cons_tres plugins, this number indi‐
2084 cates the number of CPUs on this node allocated to the step.
2085
2086 SLURM_CPUS_PER_GPU
2087 Number of CPUs requested per allocated GPU. Only set if the
2088 --cpus-per-gpu option is specified.
2089
2090 SLURM_CPUS_PER_TASK
2091 Number of cpus requested per task. Only set if the
2092 --cpus-per-task option is specified.
2093
2094 SLURM_CONTAINER
2095 OCI Bundle for job. Only set if --container is specified.
2096
2097 SLURM_DIST_PLANESIZE
2098 Plane distribution size. Only set for plane distributions. See
2099 -m, --distribution.
2100
2101 SLURM_DISTRIBUTION
2102 Same as -m, --distribution
2103
2104 SLURM_EXPORT_ENV
2105 Same as --export.
2106
2107 SLURM_GPU_BIND
2108 Requested binding of tasks to GPU. Only set if the --gpu-bind
2109 option is specified.
2110
2111 SLURM_GPU_FREQ
2112 Requested GPU frequency. Only set if the --gpu-freq option is
2113 specified.
2114
2115 SLURM_GPUS
2116 Number of GPUs requested. Only set if the -G, --gpus option is
2117 specified.
2118
2119 SLURM_GPUS_ON_NODE
2120 Number of GPUs allocated to the batch step.
2121
2122 SLURM_GPUS_PER_NODE
2123 Requested GPU count per allocated node. Only set if the
2124 --gpus-per-node option is specified.
2125
2126 SLURM_GPUS_PER_SOCKET
2127 Requested GPU count per allocated socket. Only set if the
2128 --gpus-per-socket option is specified.
2129
2130 SLURM_GPUS_PER_TASK
2131 Requested GPU count per allocated task. Only set if the
2132 --gpus-per-task option is specified.
2133
2134 SLURM_GTIDS
2135 Global task IDs running on this node. Zero origin and comma
2136 separated. It is read internally by pmi if Slurm was built with
2137 pmi support. Leaving the variable set may cause problems when
2138 using external packages from within the job (Abaqus and Ansys
2139 have been known to have problems when it is set - consult the
2140 appropriate documentation for 3rd party software).
2141
2142 SLURM_HET_SIZE
2143 Set to count of components in heterogeneous job.
2144
2145 SLURM_JOB_ACCOUNT
2146 Account name associated with the job allocation.
2147
2148 SLURM_JOB_ID
2149 The ID of the job allocation.
2150
2151 SLURM_JOB_CPUS_PER_NODE
2152 Count of CPUs available to the job on the nodes in the alloca‐
2153 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
2154 [(xnumber_of_nodes)] ...]. For example:
2155 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
2156 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
2157 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
2158 select/linear plugin allocates entire nodes to jobs, so the
2159 value indicates the total count of CPUs on allocated nodes. The
2160 select/cons_res and select/cons_tres plugins allocate individual
2161 CPUs to jobs, so this number indicates the number of CPUs allo‐
2162 cated to the job.
2163
2164 SLURM_JOB_DEPENDENCY
2165 Set to value of the --dependency option.
2166
2167 SLURM_JOB_NAME
2168 Name of the job.
2169
2170 SLURM_JOB_NODELIST
2171 List of nodes allocated to the job.
2172
2173 SLURM_JOB_NUM_NODES
2174 Total number of nodes in the job's resource allocation.
2175
2176 SLURM_JOB_PARTITION
2177 Name of the partition in which the job is running.
2178
2179 SLURM_JOB_QOS
2180 Quality Of Service (QOS) of the job allocation.
2181
2182 SLURM_JOB_RESERVATION
2183 Advanced reservation containing the job allocation, if any.
2184
2185 SLURM_JOBID
2186 The ID of the job allocation. See SLURM_JOB_ID. Included for
2187 backwards compatibility.
2188
2189 SLURM_LOCALID
2190 Node local task ID for the process within a job.
2191
2192 SLURM_MEM_PER_CPU
2193 Same as --mem-per-cpu
2194
2195 SLURM_MEM_PER_GPU
2196 Requested memory per allocated GPU. Only set if the
2197 --mem-per-gpu option is specified.
2198
2199 SLURM_MEM_PER_NODE
2200 Same as --mem
2201
2202 SLURM_NNODES
2203 Total number of nodes in the job's resource allocation. See
2204 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2205
2206 SLURM_NODE_ALIASES
2207 Sets of node name, communication address and hostname for nodes
2208 allocated to the job from the cloud. Each element in the set is
2209 colon separated and each set is comma separated. For example:
2210 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2211
2212 SLURM_NODEID
2213 ID of the node, relative to the nodes allocated to the job.
2214
2215 SLURM_NODELIST
2216 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
2217 cluded for backwards compatibility.
2218
2219 SLURM_NPROCS
2220 Same as -n, --ntasks. See SLURM_NTASKS. Included for backwards
2221 compatibility.
2222
2223 SLURM_NTASKS
2224 Same as -n, --ntasks
2225
2226 SLURM_NTASKS_PER_CORE
2227 Number of tasks requested per core. Only set if the
2228 --ntasks-per-core option is specified.
2229
2230
2231 SLURM_NTASKS_PER_GPU
2232 Number of tasks requested per GPU. Only set if the
2233 --ntasks-per-gpu option is specified.
2234
2235 SLURM_NTASKS_PER_NODE
2236 Number of tasks requested per node. Only set if the
2237 --ntasks-per-node option is specified.
2238
2239 SLURM_NTASKS_PER_SOCKET
2240 Number of tasks requested per socket. Only set if the
2241 --ntasks-per-socket option is specified.
2242
2243 SLURM_OVERCOMMIT
2244 Set to 1 if --overcommit was specified.
2245
2246 SLURM_PRIO_PROCESS
2247 The scheduling priority (nice value) at the time of job submis‐
2248 sion. This value is propagated to the spawned processes.
2249
2250 SLURM_PROCID
2251 The MPI rank (or relative process ID) of the current process
2252
2253 SLURM_PROFILE
2254 Same as --profile
2255
2256 SLURM_RESTART_COUNT
2257 If the job has been restarted due to system failure or has been
2258 explicitly requeued, this will be set to the number of times
2259 the job has been restarted.
2260
2261 SLURM_SUBMIT_DIR
2262 The directory from which sbatch was invoked.
2263
2264 SLURM_SUBMIT_HOST
2265 The hostname of the computer from which sbatch was invoked.
2266
2267 SLURM_TASK_PID
2268 The process ID of the task being started.
2269
2270 SLURM_TASKS_PER_NODE
2271 Number of tasks to be initiated on each node. Values are comma
2272 separated and in the same order as SLURM_JOB_NODELIST. If two
2273 or more consecutive nodes are to have the same task count, that
2274 count is followed by "(x#)" where "#" is the repetition count.
2275 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2276 first three nodes will each execute two tasks and the fourth
2277 node will execute one task.
2278
2279 SLURM_THREADS_PER_CORE
2280 This is only set if --threads-per-core or
2281 SBATCH_THREADS_PER_CORE were specified. The value will be set to
2282 the value specified by --threads-per-core or
2283 SBATCH_THREADS_PER_CORE. This is used by subsequent srun calls
2284 within the job allocation.
2285
2286 SLURM_TOPOLOGY_ADDR
2287 This is set only if the system has the topology/tree plugin
2288 configured. The value will be set to the names of the network
2289 switches which may be involved in the job's communications,
2290 from the system's top level switch down to the leaf switch and
2291 ending with the node name. A period is used to separate each hard‐
2292 ware component name.
2293
2294 SLURM_TOPOLOGY_ADDR_PATTERN
2295 This is set only if the system has the topology/tree plugin
2296 configured. The value will be set to the component types listed in
2297 SLURM_TOPOLOGY_ADDR. Each component will be identified as ei‐
2298 ther "switch" or "node". A period is used to separate each
2299 hardware component type.
2300
2301 SLURMD_NODENAME
2302 Name of the node running the job script.
2303
2304EXAMPLES
2305 Specify a batch script by filename on the command line. The batch
2306 script specifies a 1 minute time limit for the job.
2307
2308 $ cat myscript
2309 #!/bin/sh
2310 #SBATCH --time=1
2311 srun hostname |sort
2312
2313 $ sbatch -N4 myscript
2314 sbatch: Submitted batch job 65537
2315
2316 $ cat slurm-65537.out
2317 host1
2318 host2
2319 host3
2320 host4
2321
2322
2323 Pass a batch script to sbatch on standard input:
2324
2325 $ sbatch -N4 <<EOF
2326 > #!/bin/sh
2327 > srun hostname |sort
2328 > EOF
2329 sbatch: Submitted batch job 65541
2330
2331 $ cat slurm-65541.out
2332 host1
2333 host2
2334 host3
2335 host4
2336
2337
2338 To create a heterogeneous job with 3 components, each allocating a
2339 unique set of nodes:
2340
2341 $ sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2342 Submitted batch job 34987
2343
2344
2344COPYING
2346 Copyright (C) 2006-2007 The Regents of the University of California.
2347 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2348 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2349 Copyright (C) 2010-2022 SchedMD LLC.
2350
2351 This file is part of Slurm, a resource management program. For de‐
2352 tails, see <https://slurm.schedmd.com/>.
2353
2354 Slurm is free software; you can redistribute it and/or modify it under
2355 the terms of the GNU General Public License as published by the Free
2356 Software Foundation; either version 2 of the License, or (at your op‐
2357 tion) any later version.
2358
2359 Slurm is distributed in the hope that it will be useful, but WITHOUT
2360 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2361 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2362 for more details.
2363
2364
2365SEE ALSO
2366 sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2367 slurm.conf(5), sched_setaffinity(2), numa(3)
2368
2369
2370
2371April 2022 Slurm Commands sbatch(1)