sbatch(1)                       Slurm Commands                       sbatch(1)


NAME
       sbatch - Submit a batch script to Slurm.


SYNOPSIS
       sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html
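
       For example, a heterogeneous job with two components might be
       submitted as follows (the resource requests and script name are
       illustrative):

              sbatch --ntasks=1 --mem=8G : --ntasks=16 --mem=2G batch.sh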


DESCRIPTION
       sbatch submits a batch script to Slurm. The batch script may be given
       to sbatch through a file name on the command line, or if no file name
       is specified, sbatch will read in a script from standard input. The
       batch script may contain options preceded with "#SBATCH" before any
       executable commands in the script. sbatch will stop processing further
       #SBATCH directives once the first non-comment non-whitespace line has
       been reached in the script.
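
       For example, a minimal batch script might look as follows (the job
       name, resource requests and command are illustrative):

              #!/bin/bash
              #SBATCH --job-name=example
              #SBATCH --ntasks=1
              #SBATCH --time=00:10:00

              srun hostname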

       sbatch exits immediately after the script is successfully transferred
       to the Slurm controller and assigned a Slurm job ID. The batch script
       is not necessarily granted resources immediately; it may sit in the
       queue of pending jobs for some time before its required resources
       become available.
31
32 By default both standard output and standard error are directed to a
33 file of the name "slurm-%j.out", where the "%j" is replaced with the
34 job allocation number. The file will be generated on the first node of
35 the job allocation. Other than the batch script itself, Slurm does no
36 movement of user files.
37
38 When the job allocation is finally granted for the batch script, Slurm
39 runs a single copy of the batch script on the first node in the set of
40 allocated nodes.
41
       The following document describes the influence of various options on
       the allocation of CPUs to jobs and tasks.
       https://slurm.schedmd.com/cpu_management.html
45

RETURN VALUE
       sbatch will return 0 on success or an error code on failure.
49

SCRIPT PATH RESOLUTION
       The batch script is resolved in the following order:

       1. If the script starts with ".", then the path is constructed as:
          current working directory / script
       2. If the script starts with a "/", then the path is considered
          absolute.
       3. If the script is in the current working directory.
       4. If the script can be resolved through PATH. See path_resolution(7).

       The current working directory is the calling process's working
       directory unless the --chdir argument is passed, which will override
       the current working directory.


OPTIONS
       -A, --account=<account>
67 Charge resources used by this job to specified account. The ac‐
68 count is an arbitrary string. The account name may be changed
69 after job submission using the scontrol command.
70
71 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
72 Define the job accounting and profiling sampling intervals in
73 seconds. This can be used to override the JobAcctGatherFre‐
74 quency parameter in the slurm.conf file. <datatype>=<interval>
75 specifies the task sampling interval for the jobacct_gather
76 plugin or a sampling interval for a profiling type by the
77 acct_gather_profile plugin. Multiple comma-separated
78 <datatype>=<interval> pairs may be specified. Supported datatype
79 values are:
80
81 task Sampling interval for the jobacct_gather plugins and
82 for task profiling by the acct_gather_profile
83 plugin.
84 NOTE: This frequency is used to monitor memory us‐
85 age. If memory limits are enforced, the highest fre‐
86 quency a user can request is what is configured in
87 the slurm.conf file. It can not be disabled.
88
89 energy Sampling interval for energy profiling using the
90 acct_gather_energy plugin.
91
92 network Sampling interval for infiniband profiling using the
93 acct_gather_interconnect plugin.
94
95 filesystem Sampling interval for filesystem profiling using the
96 acct_gather_filesystem plugin.
97
98 The default value for the task sampling interval is 30 seconds.
99 The default value for all other intervals is 0. An interval of
100 0 disables sampling of the specified type. If the task sampling
101 interval is 0, accounting information is collected only at job
102 termination (reducing Slurm interference with the job).
103 Smaller (non-zero) values have a greater impact upon job perfor‐
104 mance, but a value of 30 seconds is not likely to be noticeable
105 for applications having less than 10,000 tasks.
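
              For example, to sample task accounting every 15 seconds and
              energy data every 30 seconds (the intervals are illustrative):

                     #SBATCH --acctg-freq=task=15,energy=30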
106
107 -a, --array=<indexes>
108 Submit a job array, multiple jobs to be executed with identical
109 parameters. The indexes specification identifies what array in‐
110 dex values should be used. Multiple values may be specified us‐
111 ing a comma separated list and/or a range of values with a "-"
112 separator. For example, "--array=0-15" or "--array=0,6,16-32".
113 A step function can also be specified with a suffix containing a
114 colon and number. For example, "--array=0-15:4" is equivalent to
115 "--array=0,4,8,12". A maximum number of simultaneously running
116 tasks from the job array may be specified using a "%" separator.
117 For example "--array=0-15%4" will limit the number of simultane‐
              ously running tasks from this job array to 4. The minimum index
              value is 0. The maximum value is one less than the
              configuration parameter MaxArraySize.
              NOTE: Currently, federated job arrays only run on the local
              cluster.
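
              For example, to submit a 16-task job array running at most four
              tasks at a time (the script name is illustrative):

                     $ sbatch --array=0-15%4 array_job.sh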
122
123 --batch=<list>
124 Nodes can have features assigned to them by the Slurm adminis‐
              trator. Users can specify which of these features are required
              by their batch script using this option. For example, a job's
127 allocation may include both Intel Haswell and KNL nodes with
128 features "haswell" and "knl" respectively. On such a configura‐
129 tion the batch script would normally benefit by executing on a
130 faster Haswell node. This would be specified using the option
131 "--batch=haswell". The specification can include AND and OR op‐
132 erators using the ampersand and vertical bar separators. For ex‐
133 ample: "--batch=haswell|broadwell" or "--batch=haswell|big_mem‐
134 ory". The --batch argument must be a subset of the job's --con‐
135 straint=<list> argument (i.e. the job can not request only KNL
136 nodes, but require the script to execute on a Haswell node). If
137 the request can not be satisfied from the resources allocated to
138 the job, the batch script will execute on the first node of the
139 job allocation.
140
141 --bb=<spec>
142 Burst buffer specification. The form of the specification is
143 system dependent. Also see --bbf. When the --bb option is
144 used, Slurm parses this option and creates a temporary burst
145 buffer script file that is used internally by the burst buffer
146 plugins. See Slurm's burst buffer guide for more information and
147 examples:
148 https://slurm.schedmd.com/burst_buffer.html
149
150 --bbf=<file_name>
151 Path of file containing burst buffer specification. The form of
152 the specification is system dependent. These burst buffer di‐
153 rectives will be inserted into the submitted batch script. See
154 Slurm's burst buffer guide for more information and examples:
155 https://slurm.schedmd.com/burst_buffer.html
156
157 -b, --begin=<time>
158 Submit the batch script to the Slurm controller immediately,
159 like normal, but tell the controller to defer the allocation of
160 the job until the specified time.
161
162 Time may be of the form HH:MM:SS to run a job at a specific time
163 of day (seconds are optional). (If that time is already past,
164 the next day is assumed.) You may also specify midnight, noon,
165 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
166 suffixed with AM or PM for running in the morning or the
              evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
              Combine date and time using the following format
170 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
171 count time-units, where the time-units can be seconds (default),
172 minutes, hours, days, or weeks and you can tell Slurm to run the
173 job today with the keyword today and to run the job tomorrow
174 with the keyword tomorrow. The value may be changed after job
175 submission using the scontrol command. For example:
176
177 --begin=16:00
178 --begin=now+1hour
179 --begin=now+60 (seconds by default)
180 --begin=2010-01-20T12:34:00
181
182
183 Notes on date/time specifications:
184 - Although the 'seconds' field of the HH:MM:SS time specifica‐
185 tion is allowed by the code, note that the poll time of the
186 Slurm scheduler is not precise enough to guarantee dispatch of
187 the job on the exact second. The job will be eligible to start
188 on the next poll following the specified time. The exact poll
189 interval depends on the Slurm scheduler (e.g., 60 seconds with
190 the default sched/builtin).
191 - If no time (HH:MM:SS) is specified, the default is
192 (00:00:00).
193 - If a date is specified without a year (e.g., MM/DD) then the
194 current year is assumed, unless the combination of MM/DD and
195 HH:MM:SS has already passed for that year, in which case the
196 next year is used.
197
198 -D, --chdir=<directory>
199 Set the working directory of the batch script to directory be‐
              fore it is executed. The path can be specified as a full path or
              a relative path to the directory where the command is executed.
202
203 --cluster-constraint=[!]<list>
204 Specifies features that a federated cluster must have to have a
205 sibling job submitted to it. Slurm will attempt to submit a sib‐
206 ling job to a cluster if it has at least one of the specified
207 features. If the "!" option is included, Slurm will attempt to
208 submit a sibling job to a cluster that has none of the specified
209 features.
210
211 -M, --clusters=<string>
212 Clusters to issue commands to. Multiple cluster names may be
213 comma separated. The job will be submitted to the one cluster
214 providing the earliest expected job initiation time. The default
215 value is the current cluster. A value of 'all' will query to run
216 on all clusters. Note the --export option to control environ‐
217 ment variables exported between clusters. Note that the Slur‐
218 mDBD must be up for this option to work properly.
219
220 --comment=<string>
221 An arbitrary comment enclosed in double quotes if using spaces
222 or some special characters.
223
224 -C, --constraint=<list>
225 Nodes can have features assigned to them by the Slurm adminis‐
226 trator. Users can specify which of these features are required
              by their job using the constraint option. If you are looking for
              'soft' constraints please see --prefer for more information.
229 Only nodes having features matching the job constraints will be
230 used to satisfy the request. Multiple constraints may be speci‐
231 fied with AND, OR, matching OR, resource counts, etc. (some op‐
232 erators are not supported on all system types).
233
234 NOTE: If features that are part of the node_features/helpers
235 plugin are requested, then only the Single Name and AND options
236 are supported.
237
238 Supported --constraint options include:
239
240 Single Name
241 Only nodes which have the specified feature will be used.
242 For example, --constraint="intel"
243
244 Node Count
245 A request can specify the number of nodes needed with
246 some feature by appending an asterisk and count after the
247 feature name. For example, --nodes=16 --con‐
248 straint="graphics*4 ..." indicates that the job requires
249 16 nodes and that at least four of those nodes must have
250 the feature "graphics."
251
              AND    Only nodes with all of the specified features will be
                     used. The ampersand is used for an AND operator. For
                     example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features
                     will be used. The vertical bar is used for an OR
                     operator. For example, --constraint="intel|amd"
259
260 Matching OR
261 If only one of a set of possible options should be used
262 for all allocated nodes, then use the OR operator and en‐
263 close the options within square brackets. For example,
264 --constraint="[rack1|rack2|rack3|rack4]" might be used to
265 specify that all nodes must be allocated on a single rack
266 of the cluster, but any of those four racks can be used.
267
268 Multiple Counts
269 Specific counts of multiple resources may be specified by
270 using the AND operator and enclosing the options within
271 square brackets. For example, --con‐
272 straint="[rack1*2&rack2*4]" might be used to specify that
273 two nodes must be allocated from nodes with the feature
274 of "rack1" and four nodes must be allocated from nodes
275 with the feature "rack2".
276
277 NOTE: This construct does not support multiple Intel KNL
278 NUMA or MCDRAM modes. For example, while --con‐
279 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
280 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
281 Specification of multiple KNL modes requires the use of a
282 heterogeneous job.
283
284 NOTE: Multiple Counts can cause jobs to be allocated with
285 a non-optimal network layout.
286
287 Brackets
288 Brackets can be used to indicate that you are looking for
289 a set of nodes with the different requirements contained
290 within the brackets. For example, --con‐
291 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
292 node with either the "rack1" or "rack2" features and two
293 nodes with the "rack3" feature. The same request without
294 the brackets will try to find a single node that meets
295 those requirements.
296
297 NOTE: Brackets are only reserved for Multiple Counts and
298 Matching OR syntax. AND operators require a count for
299 each feature inside square brackets (i.e.
300 "[quad*2&hemi*1]"). Slurm will only allow a single set of
301 bracketed constraints per job.
302
              Parentheses
                     Parentheses can be used to group like node features
                     together. For example,
                     --constraint="[(knl&snc4&flat)*4&haswell*1]" might be
                     used to specify that four nodes with the features "knl",
                     "snc4" and "flat" plus one node with the feature
                     "haswell" are required. All options within parentheses
                     should be grouped with AND (e.g. "&") operators.
311
312 --container=<path_to_container>
313 Absolute path to OCI container bundle.
314
315 --contiguous
316 If set, then the allocated nodes must form a contiguous set.
317
318 NOTE: If SelectPlugin=cons_res this option won't be honored with
319 the topology/tree or topology/3d_torus plugins, both of which
320 can modify the node ordering.
321
322 -S, --core-spec=<num>
323 Count of specialized cores per node reserved by the job for sys‐
324 tem operations and not used by the application. The application
325 will not use these cores, but will be charged for their alloca‐
326 tion. Default value is dependent upon the node's configured
327 CoreSpecCount value. If a value of zero is designated and the
328 Slurm configuration option AllowSpecResourcesUsage is enabled,
329 the job will be allowed to override CoreSpecCount and use the
330 specialized resources on nodes it is allocated. This option can
331 not be used with the --thread-spec option.
332
333 NOTE: Explicitly setting a job's specialized core value implic‐
334 itly sets its --exclusive option, reserving entire nodes for the
335 job.
336
337 --cores-per-socket=<cores>
338 Restrict node selection to nodes with at least the specified
339 number of cores per socket. See additional information under -B
340 option above when task/affinity plugin is enabled.
341 NOTE: This option may implicitly set the number of tasks (if -n
342 was not specified) as one task per requested thread.
343
344 --cpu-freq=<p1>[-p2[:p3]]
345
346 Request that job steps initiated by srun commands inside this
347 sbatch script be run at some requested frequency if possible, on
348 the CPUs selected for the step on the compute node(s).
349
350 p1 can be [#### | low | medium | high | highm1] which will set
351 the frequency scaling_speed to the corresponding value, and set
352 the frequency scaling_governor to UserSpace. See below for defi‐
353 nition of the values.
354
355 p1 can be [Conservative | OnDemand | Performance | PowerSave]
356 which will set the scaling_governor to the corresponding value.
357 The governor has to be in the list set by the slurm.conf option
358 CpuFreqGovernors.
359
360 When p2 is present, p1 will be the minimum scaling frequency and
361 p2 will be the maximum scaling frequency.
362
              p2 can be [#### | medium | high | highm1]. p2 must be greater
              than p1.
365
366 p3 can be [Conservative | OnDemand | Performance | PowerSave |
367 SchedUtil | UserSpace] which will set the governor to the corre‐
368 sponding value.
369
370 If p3 is UserSpace, the frequency scaling_speed will be set by a
371 power or energy aware scheduling strategy to a value between p1
372 and p2 that lets the job run within the site's power goal. The
373 job may be delayed if p1 is higher than a frequency that allows
374 the job to run within the goal.
375
376 If the current frequency is < min, it will be set to min. Like‐
377 wise, if the current frequency is > max, it will be set to max.
378
379 Acceptable values at present include:
380
381 #### frequency in kilohertz
382
383 Low the lowest available frequency
384
385 High the highest available frequency
386
387 HighM1 (high minus one) will select the next highest
388 available frequency
389
390 Medium attempts to set a frequency in the middle of the
391 available range
392
393 Conservative attempts to use the Conservative CPU governor
394
395 OnDemand attempts to use the OnDemand CPU governor (the de‐
396 fault value)
397
398 Performance attempts to use the Performance CPU governor
399
400 PowerSave attempts to use the PowerSave CPU governor
401
402 UserSpace attempts to use the UserSpace CPU governor
403
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
                    SLURM_CPU_FREQ_REQ

              This environment variable can also be used to supply the value
              for the CPU frequency request if it is set when the 'srun'
              command is issued. The --cpu-freq option on the command line
              will override the environment variable value. The form of the
              environment variable is the same as the command line. See the
              ENVIRONMENT VARIABLES section for a description of the
              SLURM_CPU_FREQ_REQ variable.

              NOTE: This parameter is treated as a request, not a
              requirement. If the job step's node does not support setting
              the CPU frequency, or the requested value is outside the bounds
              of the legal frequencies, an error is logged, but the job step
              is allowed to continue.

              NOTE: Setting the frequency for just the CPUs of the job step
              implies that the tasks are confined to those CPUs. If task
              confinement (i.e. the task/affinity TaskPlugin is enabled, or
              the task/cgroup TaskPlugin is enabled with "ConstrainCores=yes"
              set in cgroup.conf) is not configured, this parameter is
              ignored.

              NOTE: When the step completes, the frequency and governor of
              each selected CPU is reset to the previous values.

              NOTE: Submitting jobs with the --cpu-freq option with linuxproc
              as the ProctrackType can cause jobs to run too quickly before
              accounting is able to poll for job information. As a result,
              not all accounting information will be present.
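
              For example, a batch script might request that its srun steps
              run with the Performance governor, or within an illustrative
              frequency range in kilohertz (the available governors depend on
              the CpuFreqGovernors setting in slurm.conf):

                     #SBATCH --cpu-freq=Performance
              or
                     #SBATCH --cpu-freq=2200000-3000000:OnDemand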
433
434 --cpus-per-gpu=<ncpus>
435 Advise Slurm that ensuing job steps will require ncpus proces‐
436 sors per allocated GPU. Not compatible with the --cpus-per-task
437 option.
438
439 -c, --cpus-per-task=<ncpus>
440 Advise the Slurm controller that ensuing job steps will require
441 ncpus number of processors per task. Without this option, the
442 controller will just try to allocate one processor per task.
443
              For instance, consider an application that has 4 tasks, each
              requiring 3 processors. If our cluster is composed of
              quad-processor nodes and we simply ask for 12 processors, the
              controller might give us only 3 nodes. However, by using the
              --cpus-per-task=3 option, the controller knows that each task
              requires 3 processors on the same node, and the controller will
              grant an allocation of 4 nodes, one for each of the 4 tasks.
451
452 NOTE: Beginning with 22.05, srun will not inherit the
453 --cpus-per-task value requested by salloc or sbatch. It must be
454 requested again with the call to srun or set with the
455 SRUN_CPUS_PER_TASK environment variable if desired for the
456 task(s).
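
              For example, on 22.05 or newer a batch script can pass the
              value down to srun explicitly (a sketch; the application name
              is illustrative):

                     #SBATCH --cpus-per-task=4
                     export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
                     srun ./my_app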
457
458 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
              (start > (deadline - time[-min])). Default is no deadline.
              Valid time formats are:
              HH:MM[:SS] [AM|PM]
              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
              now[+count[seconds(default)|minutes|hours|days|weeks]]
467
468 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
              specification if the job has been eligible to run for less than
471 this time period. If the job has waited for less than the spec‐
472 ified period, it will use only nodes which already have the
473 specified features. The argument is in units of minutes. A de‐
474 fault value may be set by a system administrator using the de‐
475 lay_boot option of the SchedulerParameters configuration parame‐
476 ter in the slurm.conf file, otherwise the default value is zero
477 (no delay).
478
479 -d, --dependency=<dependency_list>
              Defer the start of this job until the specified dependencies
              have been satisfied. <dependency_list> is of the form
              <type:job_id[:job_id][,type:job_id[:job_id]]> or
              <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
              must be satisfied if the "," separator is used. Any dependency
              may be satisfied if the "?" separator is used. Only one
              separator may be used. Many jobs can share the same dependency
              and these jobs may even belong to different users. The value
              may be changed after job submission using the scontrol command.
              Dependencies on remote jobs are allowed in a federation. Once a
              job dependency fails due to the termination state of a
              preceding job, the dependent job will never be run, even if the
              preceding job is requeued and has a different termination state
              in a subsequent execution. An example of chaining jobs with
              dependencies is shown after the list of dependency types below.
494
495 after:job_id[[+time][:jobid[+time]...]]
496 After the specified jobs start or are cancelled and
497 'time' in minutes from job start or cancellation happens,
498 this job can begin execution. If no 'time' is given then
499 there is no delay after start or cancellation.
500
501 afterany:job_id[:jobid...]
502 This job can begin execution after the specified jobs
503 have terminated.
504
505 afterburstbuffer:job_id[:jobid...]
506 This job can begin execution after the specified jobs
507 have terminated and any associated burst buffer stage out
508 operations have completed.
509
510 aftercorr:job_id[:jobid...]
511 A task of this job array can begin execution after the
512 corresponding task ID in the specified job has completed
513 successfully (ran to completion with an exit code of
514 zero).
515
516 afternotok:job_id[:jobid...]
517 This job can begin execution after the specified jobs
518 have terminated in some failed state (non-zero exit code,
519 node failure, timed out, etc).
520
521 afterok:job_id[:jobid...]
522 This job can begin execution after the specified jobs
523 have successfully executed (ran to completion with an
524 exit code of zero).
525
526 singleton
527 This job can begin execution after any previously
528 launched jobs sharing the same job name and user have
529 terminated. In other words, only one job by that name
530 and owned by that user can be running or suspended at any
531 point in time. In a federation, a singleton dependency
532 must be fulfilled on all clusters unless DependencyParam‐
533 eters=disable_remote_singleton is used in slurm.conf.
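
              For example, a second (hypothetical) script can be made to wait
              for the successful completion of a first one by capturing the
              first job's ID with sbatch's --parsable output:

                     $ jid=$(sbatch --parsable first_job.sh)
                     $ sbatch --dependency=afterok:$jid second_job.sh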
534
535 -m, --distribution={*|block|cyclic|arbi‐
536 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
537
538 Specify alternate distribution methods for remote processes.
539 For job allocation, this sets environment variables that will be
540 used by subsequent srun requests and also affects which cores
541 will be selected for job allocation.
542
543 This option controls the distribution of tasks to the nodes on
544 which resources have been allocated, and the distribution of
545 those resources to tasks for binding (task affinity). The first
546 distribution method (before the first ":") controls the distri‐
547 bution of tasks to nodes. The second distribution method (after
548 the first ":") controls the distribution of allocated CPUs
549 across sockets for binding to tasks. The third distribution
550 method (after the second ":") controls the distribution of allo‐
551 cated CPUs across cores for binding to tasks. The second and
552 third distributions apply only if task affinity is enabled. The
553 third distribution is supported only if the task/cgroup plugin
554 is configured. The default value for each distribution type is
555 specified by *.
556
557 Note that with select/cons_res and select/cons_tres, the number
558 of CPUs allocated to each socket and node may be different. Re‐
559 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
560 mation on resource allocation, distribution of tasks to nodes,
561 and binding of tasks to CPUs.
562 First distribution method (distribution of tasks across nodes):
563
564
565 * Use the default method for distributing tasks to nodes
566 (block).
567
568 block The block distribution method will distribute tasks to a
569 node such that consecutive tasks share a node. For exam‐
570 ple, consider an allocation of three nodes each with two
571 cpus. A four-task block distribution request will dis‐
572 tribute those tasks to the nodes with tasks one and two
573 on the first node, task three on the second node, and
574 task four on the third node. Block distribution is the
575 default behavior if the number of tasks exceeds the num‐
576 ber of allocated nodes.
577
578 cyclic The cyclic distribution method will distribute tasks to a
579 node such that consecutive tasks are distributed over
580 consecutive nodes (in a round-robin fashion). For exam‐
581 ple, consider an allocation of three nodes each with two
582 cpus. A four-task cyclic distribution request will dis‐
583 tribute those tasks to the nodes with tasks one and four
584 on the first node, task two on the second node, and task
585 three on the third node. Note that when SelectType is
586 select/cons_res, the same number of CPUs may not be allo‐
587 cated on each node. Task distribution will be round-robin
588 among all the nodes with CPUs yet to be assigned to
589 tasks. Cyclic distribution is the default behavior if
590 the number of tasks is no larger than the number of allo‐
591 cated nodes.
592
593 plane The tasks are distributed in blocks of size <size>. The
594 size must be given or SLURM_DIST_PLANESIZE must be set.
595 The number of tasks distributed to each node is the same
596 as for cyclic distribution, but the taskids assigned to
597 each node depend on the plane size. Additional distribu‐
598 tion specifications cannot be combined with this option.
599 For more details (including examples and diagrams),
600 please see https://slurm.schedmd.com/mc_support.html and
601 https://slurm.schedmd.com/dist_plane.html
602
603 arbitrary
604 The arbitrary method of distribution will allocate pro‐
605 cesses in-order as listed in file designated by the envi‐
                     ronment variable SLURM_HOSTFILE. If this variable is
                     set, it will override any other method specified. If not
                     set, the method will default to block. The hostfile must
                     contain at minimum the number of hosts requested, one
                     per line or comma separated. If spec‐
611 ifying a task count (-n, --ntasks=<number>), your tasks
612 will be laid out on the nodes in the order of the file.
613 NOTE: The arbitrary distribution option on a job alloca‐
614 tion only controls the nodes to be allocated to the job
615 and not the allocation of CPUs on those nodes. This op‐
616 tion is meant primarily to control a job step's task lay‐
617 out in an existing job allocation for the srun command.
618 NOTE: If the number of tasks is given and a list of re‐
619 quested nodes is also given, the number of nodes used
620 from that list will be reduced to match that of the num‐
621 ber of tasks if the number of nodes in the list is
622 greater than the number of tasks.
623
624 Second distribution method (distribution of CPUs across sockets
625 for binding):
626
627
628 * Use the default method for distributing CPUs across sock‐
629 ets (cyclic).
630
631 block The block distribution method will distribute allocated
632 CPUs consecutively from the same socket for binding to
633 tasks, before using the next consecutive socket.
634
635 cyclic The cyclic distribution method will distribute allocated
636 CPUs for binding to a given task consecutively from the
637 same socket, and from the next consecutive socket for the
638 next task, in a round-robin fashion across sockets.
639 Tasks requiring more than one CPU will have all of those
640 CPUs allocated on a single socket if possible.
641
642 fcyclic
643 The fcyclic distribution method will distribute allocated
644 CPUs for binding to tasks from consecutive sockets in a
645 round-robin fashion across the sockets. Tasks requiring
646 more than one CPU will have each CPUs allocated in a
647 cyclic fashion across sockets.
648
649 Third distribution method (distribution of CPUs across cores for
650 binding):
651
652
653 * Use the default method for distributing CPUs across cores
654 (inherited from second distribution method).
655
656 block The block distribution method will distribute allocated
657 CPUs consecutively from the same core for binding to
658 tasks, before using the next consecutive core.
659
660 cyclic The cyclic distribution method will distribute allocated
661 CPUs for binding to a given task consecutively from the
662 same core, and from the next consecutive core for the
663 next task, in a round-robin fashion across cores.
664
665 fcyclic
666 The fcyclic distribution method will distribute allocated
667 CPUs for binding to tasks from consecutive cores in a
668 round-robin fashion across the cores.
669
670 Optional control for task distribution over nodes:
671
672
              Pack   Rather than distributing a job step's tasks evenly
                     across its allocated nodes, pack them as tightly as pos‐
675 sible on the nodes. This only applies when the "block"
676 task distribution method is used.
677
678 NoPack Rather than packing a job step's tasks as tightly as pos‐
679 sible on the nodes, distribute them evenly. This user
680 option will supersede the SelectTypeParameters
681 CR_Pack_Nodes configuration parameter.
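
              For example, a batch script might request a round-robin task
              layout across nodes with block-style binding of CPUs within
              sockets (the node and task counts are illustrative):

                     #SBATCH --nodes=2
                     #SBATCH --ntasks=8
                     #SBATCH --distribution=cyclic:block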
682
683 -e, --error=<filename_pattern>
684 Instruct Slurm to connect the batch script's standard error di‐
685 rectly to the file name specified in the "filename pattern". By
686 default both standard output and standard error are directed to
687 the same file. For job arrays, the default file name is
688 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
689 the array index. For other jobs, the default file name is
690 "slurm-%j.out", where the "%j" is replaced by the job ID. See
691 the filename pattern section below for filename specification
692 options.
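
              For example, to write standard error for each task of a job
              array to its own file (the name is illustrative; see the
              filename pattern section below for the %A and %a
              substitutions):

                     #SBATCH --error=myjob-%A_%a.err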
693
694 -x, --exclude=<node_name_list>
695 Explicitly exclude certain nodes from the resources granted to
696 the job.
697
698 --exclusive[={user|mcs}]
699 The job allocation can not share nodes with other running jobs
700 (or just other users with the "=user" option or with the "=mcs"
701 option). If user/mcs are not specified (i.e. the job allocation
702 can not share nodes with other running jobs), the job is allo‐
703 cated all CPUs and GRES on all nodes in the allocation, but is
704 only allocated as much memory as it requested. This is by design
705 to support gang scheduling, because suspended jobs still reside
706 in memory. To request all the memory on a node, use --mem=0.
707 The default shared/exclusive behavior depends on system configu‐
708 ration and the partition's OverSubscribe option takes precedence
709 over the job's option. NOTE: Since shared GRES (MPS) cannot be
710 allocated at the same time as a sharing GRES (GPU) this option
711 only allocates all sharing GRES and no underlying shared GRES.
712
713 --export={[ALL,]<environment_variables>|ALL|NONE}
714 Identify which environment variables from the submission envi‐
715 ronment are propagated to the launched application. Note that
716 SLURM_* variables are always propagated.
717
718 --export=ALL
719 Default mode if --export is not specified. All of the
720 user's environment will be loaded (either from the
721 caller's environment or from a clean environment if
722 --get-user-env is specified).
723
724 --export=NONE
725 Only SLURM_* variables from the user environment will
726 be defined. User must use absolute path to the binary
727 to be executed that will define the environment. User
728 can not specify explicit environment variables with
729 "NONE". --get-user-env will be ignored.
730
731 This option is particularly important for jobs that
732 are submitted on one cluster and execute on a differ‐
733 ent cluster (e.g. with different paths). To avoid
734 steps inheriting environment export settings (e.g.
735 "NONE") from sbatch command, the environment variable
736 SLURM_EXPORT_ENV should be set to "ALL" in the job
737 script.
738
739 --export=[ALL,]<environment_variables>
740 Exports all SLURM_* environment variables along with
741 explicitly defined variables. Multiple environment
742 variable names should be comma separated. Environment
743 variable names may be specified to propagate the cur‐
744 rent value (e.g. "--export=EDITOR") or specific values
745 may be exported (e.g. "--export=EDITOR=/bin/emacs").
746 If "ALL" is specified, then all user environment vari‐
747 ables will be loaded and will take precedence over any
748 explicitly given environment variables.
749
750 Example: --export=EDITOR,ARG1=test
751 In this example, the propagated environment will only
752 contain the variable EDITOR from the user's environ‐
753 ment, SLURM_* environment variables, and ARG1=test.
754
755 Example: --export=ALL,EDITOR=/bin/emacs
756 There are two possible outcomes for this example. If
757 the caller has the EDITOR environment variable de‐
758 fined, then the job's environment will inherit the
759 variable from the caller's environment. If the caller
760 doesn't have an environment variable defined for EDI‐
761 TOR, then the job's environment will use the value
762 given by --export.
763
764 --export-file={<filename>|<fd>}
765 If a number between 3 and OPEN_MAX is specified as the argument
766 to this option, a readable file descriptor will be assumed
767 (STDIN and STDOUT are not supported as valid arguments). Other‐
768 wise a filename is assumed. Export environment variables de‐
769 fined in <filename> or read from <fd> to the job's execution en‐
770 vironment. The content is one or more environment variable defi‐
771 nitions of the form NAME=value, each separated by a null charac‐
772 ter. This allows the use of special characters in environment
773 definitions.
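
              For example, a null-separated environment file might be
              generated and passed to sbatch as follows (the variable names
              and file names are illustrative):

                     $ printf 'ARG1=test\0EDITOR=/bin/emacs\0' > myenv
                     $ sbatch --export-file=myenv job.sh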
774
775 -B, --extra-node-info=<sockets>[:cores[:threads]]
776 Restrict node selection to nodes with at least the specified
777 number of sockets, cores per socket and/or threads per core.
778 NOTE: These options do not specify the resource allocation size.
779 Each value specified is considered a minimum. An asterisk (*)
780 can be used as a placeholder indicating that all available re‐
781 sources of that type are to be utilized. Values can also be
782 specified as min-max. The individual levels can also be speci‐
783 fied in separate options if desired:
784 --sockets-per-node=<sockets>
785 --cores-per-socket=<cores>
786 --threads-per-core=<threads>
787 If task/affinity plugin is enabled, then specifying an alloca‐
788 tion in this manner also results in subsequently launched tasks
789 being bound to threads if the -B option specifies a thread
790 count, otherwise an option of cores if a core count is speci‐
791 fied, otherwise an option of sockets. If SelectType is config‐
792 ured to select/cons_res, it must have a parameter of CR_Core,
793 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
794 to be honored. If not specified, the scontrol show job will
795 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
796 tions.
797 NOTE: This option is mutually exclusive with --hint,
798 --threads-per-core and --ntasks-per-core.
799 NOTE: This option may implicitly set the number of tasks (if -n
800 was not specified) as one task per requested thread.
801
802 --get-user-env[=timeout][mode]
803 This option will tell sbatch to retrieve the login environment
804 variables for the user specified in the --uid option. The envi‐
805 ronment variables are retrieved by running something of this
806 sort "su - <username> -c /usr/bin/env" and parsing the output.
807 Be aware that any environment variables already set in sbatch's
808 environment will take precedence over any environment variables
809 in the user's login environment. Clear any environment variables
810 before calling sbatch that you do not want propagated to the
811 spawned program. The optional timeout value is in seconds. De‐
              fault value is 8 seconds. The optional mode value controls the
              "su" options. With a mode value of "S", "su" is executed with‐
              out the "-" option. With a mode value of "L", "su" is executed
              with the "-" option, replicating the login environment. If the
              mode is not specified, the mode established at Slurm build time
              is used. Examples of use include "--get-user-env",
              "--get-user-env=10", "--get-user-env=10L", and
              "--get-user-env=S".
819
820 --gid=<group>
821 If sbatch is run as root, and the --gid option is used, submit
822 the job with group's group access permissions. group may be the
823 group name or the numerical group ID.
824
825 --gpu-bind=[verbose,]<type>
826 Bind tasks to specific GPUs. By default every spawned task can
827 access every GPU allocated to the step. If "verbose," is speci‐
828 fied before <type>, then print out GPU binding debug information
829 to the stderr of the tasks. GPU binding is ignored if there is
830 only one task.
831
832 Supported type options:
833
834 closest Bind each task to the GPU(s) which are closest. In a
835 NUMA environment, each task may be bound to more than
836 one GPU (i.e. all GPUs in that NUMA environment).
837
838 map_gpu:<list>
839 Bind by setting GPU masks on tasks (or ranks) as spec‐
840 ified where <list> is
841 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
                     are interpreted as decimal values unless they are
                     preceded with '0x', in which case they are interpreted
                     as hexadecimal values. If the number of tasks (or ranks)
845 exceeds the number of elements in this list, elements
846 in the list will be reused as needed starting from the
847 beginning of the list. To simplify support for large
848 task counts, the lists may follow a map with an aster‐
849 isk and repetition count. For example
850 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
851 and ConstrainDevices is set in cgroup.conf, then the
852 GPU IDs are zero-based indexes relative to the GPUs
853 allocated to the job (e.g. the first GPU is 0, even if
854 the global ID is 3). Otherwise, the GPU IDs are global
855 IDs, and all GPUs on each node in the job should be
856 allocated for predictable binding results.
857
858 mask_gpu:<list>
859 Bind by setting GPU masks on tasks (or ranks) as spec‐
860 ified where <list> is
861 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
862 mapping is specified for a node and identical mapping
863 is applied to the tasks on every node (i.e. the lowest
864 task ID on each node is mapped to the first mask spec‐
865 ified in the list, etc.). GPU masks are always inter‐
866 preted as hexadecimal values but can be preceded with
867 an optional '0x'. To simplify support for large task
868 counts, the lists may follow a map with an asterisk
869 and repetition count. For example
870 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
871 is used and ConstrainDevices is set in cgroup.conf,
872 then the GPU IDs are zero-based indexes relative to
873 the GPUs allocated to the job (e.g. the first GPU is
874 0, even if the global ID is 3). Otherwise, the GPU IDs
875 are global IDs, and all GPUs on each node in the job
876 should be allocated for predictable binding results.
877
878 none Do not bind tasks to GPUs (turns off binding if
879 --gpus-per-task is requested).
880
881 per_task:<gpus_per_task>
                     Each task will be bound to the number of GPUs specified
                     in <gpus_per_task>. GPUs are assigned to tasks in
                     order. The first task will be assigned the first x
                     number of GPUs on the node, etc.
886
887 single:<tasks_per_gpu>
888 Like --gpu-bind=closest, except that each task can
889 only be bound to a single GPU, even when it can be
890 bound to multiple GPUs that are equally close. The
891 GPU to bind to is determined by <tasks_per_gpu>, where
892 the first <tasks_per_gpu> tasks are bound to the first
893 GPU available, the second <tasks_per_gpu> tasks are
894 bound to the second GPU available, etc. This is basi‐
895 cally a block distribution of tasks onto available
896 GPUs, where the available GPUs are determined by the
897 socket affinity of the task and the socket affinity of
898 the GPUs as specified in gres.conf's Cores parameter.
899
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
901 Request that GPUs allocated to the job are configured with spe‐
902 cific frequency values. This option can be used to indepen‐
903 dently configure the GPU and its memory frequencies. After the
904 job is completed, the frequencies of all affected GPUs will be
905 reset to the highest possible values. In some cases, system
906 power caps may override the requested values. The field type
907 can be "memory". If type is not specified, the GPU frequency is
908 implied. The value field can either be "low", "medium", "high",
909 "highm1" or a numeric value in megahertz (MHz). If the speci‐
910 fied numeric value is not possible, a value as close as possible
911 will be used. See below for definition of the values. The ver‐
912 bose option causes current GPU frequency information to be
913 logged. Examples of use include "--gpu-freq=medium,memory=high"
914 and "--gpu-freq=450".
915
916 Supported value definitions:
917
918 low the lowest available frequency.
919
920 medium attempts to set a frequency in the middle of the
921 available range.
922
923 high the highest available frequency.
924
925 highm1 (high minus one) will select the next highest avail‐
926 able frequency.
927
928 -G, --gpus=[type:]<number>
929 Specify the total number of GPUs required for the job. An op‐
930 tional GPU type specification can be supplied. For example
931 "--gpus=volta:3". Multiple options can be requested in a comma
932 separated list, for example: "--gpus=volta:3,kepler:1". See
933 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
934 options.
935 NOTE: The allocation has to contain at least one GPU per node.
936
937 --gpus-per-node=[type:]<number>
938 Specify the number of GPUs required for the job on each node in‐
939 cluded in the job's resource allocation. An optional GPU type
940 specification can be supplied. For example
941 "--gpus-per-node=volta:3". Multiple options can be requested in
942 a comma separated list, for example:
943 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
944 --gpus-per-socket and --gpus-per-task options.
945
946 --gpus-per-socket=[type:]<number>
947 Specify the number of GPUs required for the job on each socket
948 included in the job's resource allocation. An optional GPU type
949 specification can be supplied. For example
950 "--gpus-per-socket=volta:3". Multiple options can be requested
951 in a comma separated list, for example:
952 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
953 sockets per node count ( --sockets-per-node). See also the
954 --gpus, --gpus-per-node and --gpus-per-task options.
955
956 --gpus-per-task=[type:]<number>
957 Specify the number of GPUs required for the job on each task to
958 be spawned in the job's resource allocation. An optional GPU
959 type specification can be supplied. For example
960 "--gpus-per-task=volta:1". Multiple options can be requested in
961 a comma separated list, for example:
962 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
963 --gpus-per-socket and --gpus-per-node options. This option re‐
964 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
965 --gpus-per-task=Y" rather than an ambiguous range of nodes with
966 -N, --nodes. This option will implicitly set
967 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
968 with an explicit --gpu-bind specification.
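
              For example, an illustrative request for four tasks with one
              GPU bound to each task:

                     #SBATCH --ntasks=4
                     #SBATCH --gpus-per-task=1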
969
970 --gres=<list>
971 Specifies a comma-delimited list of generic consumable re‐
972 sources. The format of each entry on the list is
973 "name[[:type]:count]". The name is that of the consumable re‐
974 source. The count is the number of those resources with a de‐
975 fault value of 1. The count can have a suffix of "k" or "K"
976 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
977 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
978 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
979 x 1024 x 1024 x 1024). The specified resources will be allo‐
              cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator. A list
              of available generic consumable resources will be printed and
983 the command will exit if the option argument is "help". Exam‐
984 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
985 "--gres=help".
986
987 --gres-flags=<type>
988 Specify generic resource task binding options.
989
990 disable-binding
991 Disable filtering of CPUs with respect to generic re‐
992 source locality. This option is currently required to
993 use more CPUs than are bound to a GRES (i.e. if a GPU is
994 bound to the CPUs on one socket, but resources on more
995 than one socket are required to run the job). This op‐
996 tion may permit a job to be allocated resources sooner
997 than otherwise possible, but may result in lower job per‐
998 formance.
999 NOTE: This option is specific to SelectType=cons_res.
1000
1001 enforce-binding
1002 The only CPUs available to the job will be those bound to
1003 the selected GRES (i.e. the CPUs identified in the
1004 gres.conf file will be strictly enforced). This option
1005 may result in delayed initiation of a job. For example a
1006 job requiring two GPUs and one CPU will be delayed until
1007 both GPUs on a single socket are available rather than
                     using GPUs bound to separate sockets. However, the appli‐
                     cation performance may be improved due to improved commu‐
1010 nication speed. Requires the node to be configured with
1011 more than one socket and resource filtering will be per‐
1012 formed on a per-socket basis.
1013 NOTE: This option is specific to SelectType=cons_tres.
1014
1015 -h, --help
1016 Display help information and exit.
1017
1018 --hint=<type>
1019 Bind tasks according to application hints.
1020 NOTE: This option cannot be used in conjunction with
1021 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
1022 fied as a command line argument, it will take precedence over
1023 the environment.
1024
1025 compute_bound
1026 Select settings for compute bound applications: use all
1027 cores in each socket, one thread per core.
1028
1029 memory_bound
1030 Select settings for memory bound applications: use only
1031 one core in each socket, one thread per core.
1032
1033 [no]multithread
1034 [don't] use extra threads with in-core multi-threading
1035 which can benefit communication intensive applications.
1036 Only supported with the task/affinity plugin.
1037
1038 help show this help message
1039
1040 -H, --hold
1041 Specify the job is to be submitted in a held state (priority of
1042 zero). A held job can now be released using scontrol to reset
1043 its priority (e.g. "scontrol release <job_id>").
1044
1045 --ignore-pbs
1046 Ignore all "#PBS" and "#BSUB" options specified in the batch
1047 script.
1048
1049 -i, --input=<filename_pattern>
1050 Instruct Slurm to connect the batch script's standard input di‐
1051 rectly to the file name specified in the "filename pattern".
1052
1053 By default, "/dev/null" is open on the batch script's standard
1054 input and both standard output and standard error are directed
1055 to a file of the name "slurm-%j.out", where the "%j" is replaced
1056 with the job allocation number, as described below in the file‐
1057 name pattern section.
1058
1059 -J, --job-name=<jobname>
1060 Specify a name for the job allocation. The specified name will
1061 appear along with the job id number when querying running jobs
1062 on the system. The default is the name of the batch script, or
1063 just "sbatch" if the script is read on sbatch's standard input.
1064
1065 --kill-on-invalid-dep=<yes|no>
              If a job has an invalid dependency and it can never run, this
              parameter tells Slurm whether or not to terminate it. A
              terminated job state will be JOB_CANCELLED. If this option is
              not specified, the system wide behavior applies. By default the
              job stays pending with reason DependencyNeverSatisfied, or, if
              kill_invalid_depend is specified in slurm.conf, the job is
              terminated.
1072
1073 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1074 Specification of licenses (or other resources available on all
1075 nodes of the cluster) which must be allocated to this job. Li‐
1076 cense names can be followed by a colon and count (the default
1077 count is one). Multiple license names should be comma separated
1078 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote li‐
1079 censes, those served by the slurmdbd, specify the name of the
1080 server providing the licenses. For example "--license=nas‐
1081 tran@slurmdb:12".
1082
1083 NOTE: When submitting heterogeneous jobs, license requests only
1084 work correctly when made on the first component job. For exam‐
1085 ple "sbatch -L ansys:2 : script.sh".
1086
1087 --mail-type=<type>
1088 Notify user by email when certain event types occur. Valid type
1089 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1090 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1091 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1092 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1093 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1094 percent of time limit), TIME_LIMIT_50 (reached 50 percent of
1095 time limit) and ARRAY_TASKS (send emails for each array task).
1096 Multiple type values may be specified in a comma separated list.
1097 The user to be notified is indicated with --mail-user. Unless
1098 the ARRAY_TASKS option is specified, mail notifications on job
1099 BEGIN, END and FAIL apply to a job array as a whole rather than
1100 generating individual email messages for each task in the job
1101 array.
1102
1103 --mail-user=<user>
1104 User to receive email notification of state changes as defined
1105 by --mail-type. The default value is the submitting user.
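
              For example, to be notified by email when the job ends or
              fails (the address is illustrative):

                     #SBATCH --mail-type=END,FAIL
                     #SBATCH --mail-user=user@example.com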
1106
1107 --mcs-label=<mcs>
1108 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the groups of the user. The default value is
              calculated by the mcs plugin if it is enabled.
1111
1112 --mem=<size>[units]
1113 Specify the real memory required per node. Default units are
1114 megabytes. Different units can be specified using the suffix
1115 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1116 is MaxMemPerNode. If configured, both parameters can be seen us‐
1117 ing the scontrol show config command. This parameter would gen‐
1118 erally be used if whole nodes are allocated to jobs (Select‐
1119 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1120 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1121 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1122 fied as command line arguments, then they will take precedence
1123 over the environment.
1124
1125 NOTE: A memory size specification of zero is treated as a spe‐
1126 cial case and grants the job access to all of the memory on each
1127 node.
1128
1129 NOTE: Enforcement of memory limits currently relies upon the
1130 task/cgroup plugin or enabling of accounting, which samples mem‐
1131 ory use on a periodic basis (data need not be stored, just col‐
1132 lected). In both cases memory use is based upon the job's Resi‐
1133 dent Set Size (RSS). A task may exceed the memory limit until
1134 the next periodic accounting sample.
1135
1136 --mem-bind=[{quiet|verbose},]<type>
1137 Bind tasks to memory. Used only when the task/affinity plugin is
1138 enabled and the NUMA memory functions are available. Note that
1139 the resolution of CPU and memory binding may differ on some ar‐
1140 chitectures. For example, CPU binding may be performed at the
1141 level of the cores within a processor while memory binding will
1142 be performed at the level of nodes, where the definition of
1143 "nodes" may differ from system to system. By default no memory
1144 binding is performed; any task using any CPU can use any memory.
1145 This option is typically used to ensure that each task is bound
1146 to the memory closest to its assigned CPU. The use of any type
1147 other than "none" or "local" is not recommended.
1148
1149 NOTE: To have Slurm always report on the selected memory binding
1150 for all commands executed in a shell, you can enable verbose
1151 mode by setting the SLURM_MEM_BIND environment variable value to
1152 "verbose".
1153
1154 The following informational environment variables are set when
1155 --mem-bind is in use:
1156
1157 SLURM_MEM_BIND_LIST
1158 SLURM_MEM_BIND_PREFER
1159 SLURM_MEM_BIND_SORT
1160 SLURM_MEM_BIND_TYPE
1161 SLURM_MEM_BIND_VERBOSE
1162
1163 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1164 scription of the individual SLURM_MEM_BIND* variables.
1165
1166 Supported options include:
1167
1168 help show this help message
1169
1170 local Use memory local to the processor in use
1171
1172 map_mem:<list>
1173 Bind by setting memory masks on tasks (or ranks) as spec‐
1174 ified where <list> is
1175 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1176 ping is specified for a node and identical mapping is ap‐
1177 plied to the tasks on every node (i.e. the lowest task ID
1178 on each node is mapped to the first ID specified in the
                     list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x', in which case they
                     are interpreted as hexadecimal values. If the number of tasks
1182 (or ranks) exceeds the number of elements in this list,
1183 elements in the list will be reused as needed starting
1184 from the beginning of the list. To simplify support for
1185 large task counts, the lists may follow a map with an as‐
1186 terisk and repetition count. For example
1187 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1188 sults, all CPUs for each node in the job should be allo‐
1189 cated to the job.
1190
1191 mask_mem:<list>
1192 Bind by setting memory masks on tasks (or ranks) as spec‐
1193 ified where <list> is
1194 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1195 mapping is specified for a node and identical mapping is
1196 applied to the tasks on every node (i.e. the lowest task
1197 ID on each node is mapped to the first mask specified in
1198 the list, etc.). NUMA masks are always interpreted as
1199 hexadecimal values. Note that masks must be preceded
1200 with a '0x' if they don't begin with [0-9] so they are
1201 seen as numerical values. If the number of tasks (or
1202 ranks) exceeds the number of elements in this list, ele‐
1203 ments in the list will be reused as needed starting from
1204 the beginning of the list. To simplify support for large
1205 task counts, the lists may follow a mask with an asterisk
1206 and repetition count. For example "mask_mem:0*4,1*4".
1207 For predictable binding results, all CPUs for each node
1208 in the job should be allocated to the job.
1209
1210 no[ne] don't bind tasks to memory (default)
1211
1212 p[refer]
1213 Prefer use of first specified NUMA node, but permit
1214 use of other available NUMA nodes.
1215
1216 q[uiet]
1217 quietly bind before task runs (default)
1218
1219 rank bind by task rank (not recommended)
1220
1221 sort sort free cache pages (run zonesort on Intel KNL nodes)
1222
1223 v[erbose]
1224 verbosely report binding before task runs
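
              For example, on a hypothetical node with two NUMA domains, the
              map_mem form above could bind the first task on each node to
              NUMA node 0 and the second task to NUMA node 1. This is a
              sketch; the two-domain layout is an assumption, and the binding
              only takes effect where NUMA support is enabled as described
              above:

                 $ sbatch --ntasks-per-node=2 --mem-bind=verbose,map_mem:0,1 myscript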
1225
1226 --mem-per-cpu=<size>[units]
1227 Minimum memory required per usable allocated CPU. Default units
1228 are megabytes. The default value is DefMemPerCPU and the maxi‐
1229 mum value is MaxMemPerCPU (see exception below). If configured,
1230 both parameters can be seen using the scontrol show config com‐
1231 mand. Note that if the job's --mem-per-cpu value exceeds the
1232 configured MaxMemPerCPU, then the user's limit will be treated
1233 as a memory limit per task; --mem-per-cpu will be reduced to a
1234 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1235 and the value of --cpus-per-task multiplied by the new
1236 --mem-per-cpu value will equal the original --mem-per-cpu value
1237 specified by the user. This parameter would generally be used
1238 if individual processors are allocated to jobs (SelectType=se‐
1239 lect/cons_res). If resources are allocated by core, socket, or
1240 whole nodes, then the number of CPUs allocated to a job may be
1241 higher than the task count and the value of --mem-per-cpu should
1242 be adjusted accordingly. Also see --mem and --mem-per-gpu. The
1243 --mem, --mem-per-cpu and --mem-per-gpu options are mutually ex‐
1244 clusive.
1245
1246 NOTE: If the final amount of memory requested by a job can't be
1247 satisfied by any of the nodes configured in the partition, the
1248 job will be rejected. This could happen if --mem-per-cpu is
1249 used with the --exclusive option for a job allocation and
1250 --mem-per-cpu times the number of CPUs on a node is greater than
1251 the total memory of that node.
1252
1253 NOTE: This applies to usable allocated CPUs in a job allocation.
1254 This is important when more than one thread per core is config‐
1255 ured. If a job requests --threads-per-core with fewer threads
1256 on a core than exist on the core (or --hint=nomultithread which
1257 implies --threads-per-core=1), the job will be unable to use
1258 those extra threads on the core and those threads will not be
1259 included in the memory per CPU calculation. But if the job has
1260 access to all threads on the core, those threads will be in‐
1261 cluded in the memory per CPU calculation even if the job did not
1262 explicitly request those threads.
1263
1264 In the following examples, each core has two threads.
1265
1266 In this first example, two tasks can run on separate hyper‐
1267 threads in the same core because --threads-per-core is not used.
1268 The third task uses both threads of the second core. The allo‐
1269 cated memory per cpu includes all threads:
1270
1271 $ salloc -n3 --mem-per-cpu=100
1272 salloc: Granted job allocation 17199
1273 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1274 JobID ReqTRES AllocTRES
1275 ------- ----------------------------------- -----------------------------------
1276 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1277
1278 In this second example, because of --threads-per-core=1, each
1279 task is allocated an entire core but is only able to use one
1280 thread per core. Allocated CPUs includes all threads on each
1281 core. However, allocated memory per cpu includes only the usable
1282 thread in each core.
1283
1284 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1285 salloc: Granted job allocation 17200
1286 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1287 JobID ReqTRES AllocTRES
1288 ------- ----------------------------------- -----------------------------------
1289 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1290
1291 --mem-per-gpu=<size>[units]
1292 Minimum memory required per allocated GPU. Default units are
1293 megabytes. Different units can be specified using the suffix
1294 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1295 both a global and per partition basis. If configured, the pa‐
1296 rameters can be seen using the scontrol show config and scontrol
1297 show partition commands. Also see --mem. The --mem,
1298 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1299
1300 --mincpus=<n>
1301 Specify a minimum number of logical cpus/processors per node.
1302
1303 --network=<type>
1304 Specify information pertaining to the switch or network. The
1305          interpretation of type is system dependent. This option is sup‐
1306          ported when running Slurm on a Cray natively. It is used to re‐
1307          quest the use of Network Performance Counters. Only one value
1308          per request is valid. All options are case-insensitive. In this
1309          configuration, supported values include:
1310
1311 system
1312 Use the system-wide network performance counters. Only
1313 nodes requested will be marked in use for the job alloca‐
1314                 tion. If the job does not fill up the entire system, the
1315                 rest of the nodes cannot be used by other jobs using NPC;
1316                 if idle, their state will appear as PerfCnts. These nodes
1317                 are still available for other jobs not using NPC.
1319
1320 blade Use the blade network performance counters. Only nodes re‐
1321 quested will be marked in use for the job allocation. If
1322                 the job does not fill up the entire blade(s) allocated to
1323                 the job, those blade(s) cannot be used by other jobs using
1324                 NPC; if idle, their state will appear as PerfCnts. These
1325                 nodes are still available for other jobs not using NPC.
1327
1328 In all cases the job allocation request must specify the --exclusive
1329 option. Otherwise the request will be denied.
1330
1331 Also with any of these options steps are not allowed to share blades,
1332 so resources would remain idle inside an allocation if the step running
1333 on a blade does not take up all the nodes on the blade.
1334
1335 --nice[=adjustment]
1336 Run the job with an adjusted scheduling priority within Slurm.
1337 With no adjustment value the scheduling priority is decreased by
1338 100. A negative nice value increases the priority, otherwise de‐
1339 creases it. The adjustment range is +/- 2147483645. Only privi‐
1340 leged users can specify a negative adjustment.
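
              For example, to lower the job's scheduling priority by 200
              instead of the default 100:

                 $ sbatch --nice=200 myscript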
1341
1342 -k, --no-kill[=off]
1343 Do not automatically terminate a job if one of the nodes it has
1344 been allocated fails. The user will assume the responsibilities
1345 for fault-tolerance should a node fail. When there is a node
1346 failure, any active job steps (usually MPI jobs) on that node
1347 will almost certainly suffer a fatal error, but with --no-kill,
1348 the job allocation will not be revoked so the user may launch
1349 new job steps on the remaining nodes in their allocation.
1350
1351          Specify an optional argument of "off" to disable the effect of
1352          the SBATCH_NO_KILL environment variable.
1353
1354 By default Slurm terminates the entire job allocation if any
1355 node fails in its range of allocated nodes.
1356
1357 --no-requeue
1358 Specifies that the batch job should never be requeued under any
1359 circumstances. Setting this option will prevent system adminis‐
1360 trators from being able to restart the job (for example, after a
1361          scheduled downtime), recover it from a node failure, or requeue
1362          it upon preemption by a higher priority job. When a job is re‐
1363 queued, the batch script is initiated from its beginning. Also
1364 see the --requeue option. The JobRequeue configuration parame‐
1365 ter controls the default behavior on the cluster.
1366
1367 -F, --nodefile=<node_file>
1368 Much like --nodelist, but the list is contained in a file of
1369          name node_file. The node names in the list may also span multi‐
1370 ple lines in the file. Duplicate node names in the file will
1371 be ignored. The order of the node names in the list is not im‐
1372 portant; the node names will be sorted by Slurm.
1373
1374 -w, --nodelist=<node_name_list>
1375 Request a specific list of hosts. The job will contain all of
1376 these hosts and possibly additional hosts as needed to satisfy
1377 resource requirements. The list may be specified as a
1378 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1379 for example), or a filename. The host list will be assumed to
1380 be a filename if it contains a "/" character. If you specify a
1381 minimum node or processor count larger than can be satisfied by
1382 the supplied host list, additional resources will be allocated
1383 on other nodes as needed. Duplicate node names in the list will
1384 be ignored. The order of the node names in the list is not im‐
1385 portant; the node names will be sorted by Slurm.
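
              For example, to request that the allocation include the hosts
              host1 through host5 and host7, mirroring the range syntax
              described above:

                 $ sbatch -w host[1-5,7] myscript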
1386
1387 -N, --nodes=<minnodes>[-maxnodes]
1388 Request that a minimum of minnodes nodes be allocated to this
1389 job. A maximum node count may also be specified with maxnodes.
1390 If only one number is specified, this is used as both the mini‐
1391 mum and maximum node count. The partition's node limits super‐
1392 sede those of the job. If a job's node limits are outside of
1393 the range permitted for its associated partition, the job will
1394 be left in a PENDING state. This permits possible execution at
1395 a later time, when the partition limit is changed. If a job
1396 node limit exceeds the number of nodes configured in the parti‐
1397 tion, the job will be rejected. Note that the environment vari‐
1398 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1399 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1400 tion for more information. If -N is not specified, the default
1401 behavior is to allocate enough nodes to satisfy the requested
1402 resources as expressed by per-job specification options, e.g.
1403 -n, -c and --gpus. The job will be allocated as many nodes as
1404 possible within the range specified and without delaying the
1405 initiation of the job. The node count specification may include
1406 a numeric value followed by a suffix of "k" (multiplies numeric
1407 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
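
              For example, the following requests at least two and at most
              four nodes, letting Slurm choose the exact count within that
              range:

                 $ sbatch -N 2-4 myscript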
1408
1409 -n, --ntasks=<number>
1410 sbatch does not launch tasks, it requests an allocation of re‐
1411 sources and submits a batch script. This option advises the
1412 Slurm controller that job steps run within the allocation will
1413 launch a maximum of number tasks and to provide for sufficient
1414 resources. The default is one task per node, but note that the
1415 --cpus-per-task option will change this default.
1416
1417 --ntasks-per-core=<ntasks>
1418 Request the maximum ntasks be invoked on each core. Meant to be
1419 used with the --ntasks option. Related to --ntasks-per-node ex‐
1420 cept at the core level instead of the node level. NOTE: This
1421 option is not supported when using SelectType=select/linear.
1422
1423 --ntasks-per-gpu=<ntasks>
1424          Request that ntasks tasks be invoked for every GPU. This
1425 option can work in two ways: 1) either specify --ntasks in addi‐
1426 tion, in which case a type-less GPU specification will be auto‐
1427 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1428 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1429 --ntasks, and the total task count will be automatically deter‐
1430 mined. The number of CPUs needed will be automatically in‐
1431 creased if necessary to allow for any calculated task count.
1432 This option will implicitly set --gpu-bind=single:<ntasks>, but
1433 that can be overridden with an explicit --gpu-bind specifica‐
1434 tion. This option is not compatible with a node range (i.e.
1435 -N<minnodes-maxnodes>). This option is not compatible with
1436 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1437 option is not supported unless SelectType=cons_tres is config‐
1438 ured (either directly or indirectly on Cray systems).
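
              For example, assuming a cluster with SelectType=cons_tres and
              GPUs configured, either of the following sketches results in
              eight tasks and four GPUs, illustrating the two ways described
              above:

                 $ sbatch --ntasks=8 --ntasks-per-gpu=2 myscript
                 $ sbatch --gpus=4 --ntasks-per-gpu=2 myscript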
1439
1440 --ntasks-per-node=<ntasks>
1441 Request that ntasks be invoked on each node. If used with the
1442 --ntasks option, the --ntasks option will take precedence and
1443 the --ntasks-per-node will be treated as a maximum count of
1444 tasks per node. Meant to be used with the --nodes option. This
1445 is related to --cpus-per-task=ncpus, but does not require knowl‐
1446 edge of the actual number of cpus on each node. In some cases,
1447 it is more convenient to be able to request that no more than a
1448 specific number of tasks be invoked on each node. Examples of
1449 this include submitting a hybrid MPI/OpenMP app where only one
1450 MPI "task/rank" should be assigned to each node while allowing
1451 the OpenMP portion to utilize all of the parallelism present in
1452 the node, or submitting a single setup/cleanup/monitoring job to
1453 each node of a pre-existing allocation as one step in a larger
1454 job script.
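
              For example, a hybrid MPI/OpenMP job might place one MPI rank
              on each of four nodes and give each rank a whole node's worth
              of CPUs with directives like the following (the count of 16
              CPUs per node is an assumption about the hardware):

                 #SBATCH --nodes=4
                 #SBATCH --ntasks-per-node=1
                 #SBATCH --cpus-per-task=16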
1455
1456 --ntasks-per-socket=<ntasks>
1457 Request the maximum ntasks be invoked on each socket. Meant to
1458 be used with the --ntasks option. Related to --ntasks-per-node
1459 except at the socket level instead of the node level. NOTE:
1460 This option is not supported when using SelectType=select/lin‐
1461 ear.
1462
1463 --open-mode={append|truncate}
1464 Open the output and error files using append or truncate mode as
1465 specified. The default value is specified by the system config‐
1466 uration parameter JobFileAppend.
1467
1468 -o, --output=<filename_pattern>
1469 Instruct Slurm to connect the batch script's standard output di‐
1470 rectly to the file name specified in the "filename pattern". By
1471 default both standard output and standard error are directed to
1472 the same file. For job arrays, the default file name is
1473 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1474 the array index. For other jobs, the default file name is
1475 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1476 the filename pattern section below for filename specification
1477 options.
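
              For example, to name the output file after the job name and
              job ID (the pattern shown is one possibility; see the filename
              pattern section below):

                 #SBATCH --output=%x-%j.out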
1478
1479 -O, --overcommit
1480 Overcommit resources.
1481
1482 When applied to a job allocation (not including jobs requesting
1483 exclusive access to the nodes) the resources are allocated as if
1484 only one task per node is requested. This means that the re‐
1485 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1486 cated per node rather than being multiplied by the number of
1487 tasks. Options used to specify the number of tasks per node,
1488 socket, core, etc. are ignored.
1489
1490 When applied to job step allocations (the srun command when exe‐
1491 cuted within an existing job allocation), this option can be
1492 used to launch more than one task per CPU. Normally, srun will
1493 not allocate more than one process per CPU. By specifying
1494 --overcommit you are explicitly allowing more than one process
1495 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1496 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1497 in the file slurm.h and is not a variable, it is set at Slurm
1498 build time.
1499
1500 -s, --oversubscribe
1501 The job allocation can over-subscribe resources with other run‐
1502 ning jobs. The resources to be over-subscribed can be nodes,
1503 sockets, cores, and/or hyperthreads depending upon configura‐
1504 tion. The default over-subscribe behavior depends on system
1505 configuration and the partition's OverSubscribe option takes
1506 precedence over the job's option. This option may result in the
1507 allocation being granted sooner than if the --oversubscribe op‐
1508 tion was not set and allow higher system utilization, but appli‐
1509 cation performance will likely suffer due to competition for re‐
1510 sources. Also see the --exclusive option.
1511
1512 --parsable
1513 Outputs only the job id number and the cluster name if present.
1514 The values are separated by a semicolon. Errors will still be
1515 displayed.
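
              This is convenient for capturing the job ID in a shell script,
              for example (a sketch assuming a single cluster, so only the
              job ID is printed; postprocess.sh is illustrative):

                 $ jobid=$(sbatch --parsable myscript)
                 $ sbatch --dependency=afterok:$jobid postprocess.sh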
1516
1517 -p, --partition=<partition_names>
1518 Request a specific partition for the resource allocation. If
1519 not specified, the default behavior is to allow the slurm con‐
1520 troller to select the default partition as designated by the
1521 system administrator. If the job can use more than one parti‐
1522          tion, specify their names in a comma-separated list and the one
1523 offering earliest initiation will be used with no regard given
1524 to the partition name ordering (although higher priority parti‐
1525 tions will be considered first). When the job is initiated, the
1526 name of the partition used will be placed first in the job
1527 record partition string.
1528
1529 --power=<flags>
1530 Comma separated list of power management plugin options. Cur‐
1531 rently available flags include: level (all nodes allocated to
1532 the job should have identical power caps, may be disabled by the
1533 Slurm configuration option PowerParameters=job_no_level).
1534
1535 --prefer=<list>
1536 Nodes can have features assigned to them by the Slurm adminis‐
1537 trator. Users can specify which of these features are desired
1538 but not required by their job using the prefer option. This op‐
1539          tion operates independently from --constraint and will override
1540          whatever is set there if possible. When scheduling, the features
1541          in --prefer are tried first; if a node set isn't available with
1542          those features, then --constraint is attempted. See --constraint
1543          for more information; this option behaves the same way.
1544
1546 --priority=<value>
1547 Request a specific job priority. May be subject to configura‐
1548 tion specific constraints. value should either be a numeric
1549 value or "TOP" (for highest possible value). Only Slurm opera‐
1550 tors and administrators can set the priority of a job.
1551
1552 --profile={all|none|<type>[,<type>...]}
1553 Enables detailed data collection by the acct_gather_profile
1554 plugin. Detailed data are typically time-series that are stored
1555 in an HDF5 file for the job or an InfluxDB database depending on
1556 the configured plugin.
1557
1558 All All data types are collected. (Cannot be combined with
1559 other values.)
1560
1561 None No data types are collected. This is the default.
1562 (Cannot be combined with other values.)
1563
1564 Valid type values are:
1565
1566 Energy Energy data is collected.
1567
1568 Task Task (I/O, Memory, ...) data is collected.
1569
1570 Lustre Lustre data is collected.
1571
1572 Network
1573 Network (InfiniBand) data is collected.
1574
1575 --propagate[=rlimit[,rlimit...]]
1576 Allows users to specify which of the modifiable (soft) resource
1577 limits to propagate to the compute nodes and apply to their
1578 jobs. If no rlimit is specified, then all resource limits will
1579 be propagated. The following rlimit names are supported by
1580 Slurm (although some options may not be supported on some sys‐
1581 tems):
1582
1583 ALL All limits listed below (default)
1584
1585 NONE No limits listed below
1586
1587 AS The maximum address space (virtual memory) for a
1588 process.
1589
1590 CORE The maximum size of core file
1591
1592 CPU The maximum amount of CPU time
1593
1594 DATA The maximum size of a process's data segment
1595
1596 FSIZE The maximum size of files created. Note that if the
1597 user sets FSIZE to less than the current size of the
1598 slurmd.log, job launches will fail with a 'File size
1599 limit exceeded' error.
1600
1601 MEMLOCK The maximum size that may be locked into memory
1602
1603 NOFILE The maximum number of open files
1604
1605 NPROC The maximum number of processes available
1606
1607 RSS The maximum resident set size. Note that this only has
1608 effect with Linux kernels 2.4.30 or older or BSD.
1609
1610 STACK The maximum stack size
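
              For example, to propagate only the locked-memory and stack
              limits from the submission environment:

                 $ sbatch --propagate=MEMLOCK,STACK myscript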
1611
1612 -q, --qos=<qos>
1613 Request a quality of service for the job. QOS values can be de‐
1614 fined for each user/cluster/account association in the Slurm
1615 database. Users will be limited to their association's defined
1616 set of qos's when the Slurm configuration parameter, Account‐
1617 ingStorageEnforce, includes "qos" in its definition.
1618
1619 -Q, --quiet
1620 Suppress informational messages from sbatch such as Job ID. Only
1621 errors will still be displayed.
1622
1623 --reboot
1624 Force the allocated nodes to reboot before starting the job.
1625 This is only supported with some system configurations and will
1626 otherwise be silently ignored. Only root, SlurmUser or admins
1627 can reboot nodes.
1628
1629 --requeue
1630 Specifies that the batch job should be eligible for requeuing.
1631 The job may be requeued explicitly by a system administrator,
1632 after node failure, or upon preemption by a higher priority job.
1633 When a job is requeued, the batch script is initiated from its
1634 beginning. Also see the --no-requeue option. The JobRequeue
1635 configuration parameter controls the default behavior on the
1636 cluster.
1637
1638 --reservation=<reservation_names>
1639 Allocate resources for the job from the named reservation. If
1640 the job can use more than one reservation, specify their names
1641          in a comma-separated list and the one offering the earliest
1642          initiation will be used. Each reservation will be considered in
1643          the order it was requested. All reservations will be listed in
1644          scontrol/squeue through the life of the job. In accounting, the
1645          first reservation will be seen and, after the job starts, the
1646          reservation actually used will replace it.
1647
1648 --signal=[{R|B}:]<sig_num>[@sig_time]
1649 When a job is within sig_time seconds of its end time, send it
1650 the signal sig_num. Due to the resolution of event handling by
1651 Slurm, the signal may be sent up to 60 seconds earlier than
1652 specified. sig_num may either be a signal number or name (e.g.
1653 "10" or "USR1"). sig_time must have an integer value between 0
1654 and 65535. By default, no signal is sent before the job's end
1655 time. If a sig_num is specified without any sig_time, the de‐
1656 fault time will be 60 seconds. Use the "B:" option to signal
1657          only the batch shell; none of the other processes will be sig‐
1658 naled. By default all job steps will be signaled, but not the
1659 batch shell itself. Use the "R:" option to allow this job to
1660 overlap with a reservation with MaxStartDelay set. To have the
1661 signal sent at preemption time see the preempt_send_user_signal
1662 SlurmctldParameter.
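
              For example, the following sketch asks Slurm to send SIGUSR1 to
              the batch shell roughly five minutes before the time limit; the
              application name and trap handler are illustrative:

                 #!/bin/sh
                 #SBATCH --signal=B:USR1@300
                 trap 'echo "time limit approaching" >&2' USR1
                 srun ./my_app &   # run the step in the background so the trap can fire
                 wait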
1663
1664 --sockets-per-node=<sockets>
1665 Restrict node selection to nodes with at least the specified
1666 number of sockets. See additional information under -B option
1667 above when task/affinity plugin is enabled.
1668 NOTE: This option may implicitly set the number of tasks (if -n
1669 was not specified) as one task per requested thread.
1670
1671 --spread-job
1672 Spread the job allocation over as many nodes as possible and at‐
1673 tempt to evenly distribute tasks across the allocated nodes.
1674 This option disables the topology/tree plugin.
1675
1676 --switches=<count>[@max-time]
1677 When a tree topology is used, this defines the maximum count of
1678 leaf switches desired for the job allocation and optionally the
1679 maximum time to wait for that number of switches. If Slurm finds
1680 an allocation containing more switches than the count specified,
1681 the job remains pending until it either finds an allocation with
1682          desired switch count or the time limit expires. If there is no
1683 switch count limit, there is no delay in starting the job. Ac‐
1684 ceptable time formats include "minutes", "minutes:seconds",
1685 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1686 "days-hours:minutes:seconds". The job's maximum time delay may
1687 be limited by the system administrator using the SchedulerParam‐
1688 eters configuration parameter with the max_switch_wait parameter
1689 option. On a dragonfly network the only switch count supported
1690 is 1 since communication performance will be highest when a job
1691          is allocated resources on one leaf switch or more than 2 leaf
1692          switches. The default max-time is the max_switch_wait Sched‐
1693          ulerParameters value.
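
              For example, to prefer an allocation confined to a single leaf
              switch but accept any allocation after waiting at most 60
              minutes:

                 $ sbatch --switches=1@60 myscript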
1694
1695 --test-only
1696 Validate the batch script and return an estimate of when a job
1697 would be scheduled to run given the current job queue and all
1698 the other arguments specifying the job requirements. No job is
1699 actually submitted.
1700
1701 --thread-spec=<num>
1702 Count of specialized threads per node reserved by the job for
1703 system operations and not used by the application. The applica‐
1704 tion will not use these threads, but will be charged for their
1705 allocation. This option can not be used with the --core-spec
1706 option.
1707
1708 NOTE: Explicitly setting a job's specialized thread value im‐
1709 plicitly sets its --exclusive option, reserving entire nodes for
1710 the job.
1711
1712 --threads-per-core=<threads>
1713 Restrict node selection to nodes with at least the specified
1714 number of threads per core. In task layout, use the specified
1715 maximum number of threads per core. NOTE: "Threads" refers to
1716 the number of processing units on each core rather than the num‐
1717 ber of application tasks to be launched per core. See addi‐
1718 tional information under -B option above when task/affinity
1719 plugin is enabled.
1720 NOTE: This option may implicitly set the number of tasks (if -n
1721 was not specified) as one task per requested thread.
1722
1723 -t, --time=<time>
1724 Set a limit on the total run time of the job allocation. If the
1725 requested time limit exceeds the partition's time limit, the job
1726 will be left in a PENDING state (possibly indefinitely). The
1727 default time limit is the partition's default time limit. When
1728 the time limit is reached, each task in each job step is sent
1729 SIGTERM followed by SIGKILL. The interval between signals is
1730 specified by the Slurm configuration parameter KillWait. The
1731 OverTimeLimit configuration parameter may permit the job to run
1732 longer than scheduled. Time resolution is one minute and second
1733 values are rounded up to the next minute.
1734
1735 A time limit of zero requests that no time limit be imposed.
1736 Acceptable time formats include "minutes", "minutes:seconds",
1737 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1738 "days-hours:minutes:seconds".
1739
1740 --time-min=<time>
1741 Set a minimum time limit on the job allocation. If specified,
1742 the job may have its --time limit lowered to a value no lower
1743 than --time-min if doing so permits the job to begin execution
1744 earlier than otherwise possible. The job's time limit will not
1745 be changed after the job is allocated resources. This is per‐
1746 formed by a backfill scheduling algorithm to allocate resources
1747 otherwise reserved for higher priority jobs. Acceptable time
1748 formats include "minutes", "minutes:seconds", "hours:min‐
1749 utes:seconds", "days-hours", "days-hours:minutes" and
1750 "days-hours:minutes:seconds".
1751
1752 --tmp=<size>[units]
1753 Specify a minimum amount of temporary disk space per node. De‐
1754 fault units are megabytes. Different units can be specified us‐
1755 ing the suffix [K|M|G|T].
1756
1757 --uid=<user>
1758 Attempt to submit and/or run a job as user instead of the invok‐
1759 ing user id. The invoking user's credentials will be used to
1760 check access permissions for the target partition. User root may
1761 use this option to run jobs as a normal user in a RootOnly par‐
1762 tition for example. If run as root, sbatch will drop its permis‐
1763 sions to the uid specified after node allocation is successful.
1764 user may be the user name or numerical user ID.
1765
1766 --usage
1767 Display brief help message and exit.
1768
1769 --use-min-nodes
1770 If a range of node counts is given, prefer the smaller count.
1771
1772 -v, --verbose
1773 Increase the verbosity of sbatch's informational messages. Mul‐
1774 tiple -v's will further increase sbatch's verbosity. By default
1775 only errors will be displayed.
1776
1777 -V, --version
1778 Display version information and exit.
1779
1780 -W, --wait
1781 Do not exit until the submitted job terminates. The exit code
1782 of the sbatch command will be the same as the exit code of the
1783 submitted job. If the job terminated due to a signal rather than
1784 a normal exit, the exit code will be set to 1. In the case of a
1785 job array, the exit code recorded will be the highest value for
1786 any task in the job array.
1787
1788 --wait-all-nodes=<value>
1789 Controls when the execution of the command begins. By default
1790 the job will begin execution as soon as the allocation is made.
1791
1792 0 Begin execution as soon as allocation can be made. Do not
1793 wait for all nodes to be ready for use (i.e. booted).
1794
1795 1 Do not begin execution until all nodes are ready for use.
1796
1797 --wckey=<wckey>
1798 Specify wckey to be used with job. If TrackWCKey=no (default)
1799 in the slurm.conf this value is ignored.
1800
1801 --wrap=<command_string>
1802 Sbatch will wrap the specified command string in a simple "sh"
1803 shell script, and submit that script to the slurm controller.
1804 When --wrap is used, a script name and arguments may not be
1805 specified on the command line; instead the sbatch-generated
1806 wrapper script is used.
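
              For example, to submit a one-line command on two nodes without
              writing a script file:

                 $ sbatch -N2 --wrap="srun hostname | sort"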
1807
1808 filename pattern
1809    sbatch allows for a filename pattern to contain one or more replacement
1810 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1811
1812
1813 \\ Do not process any of the replacement symbols.
1814
1815 %% The character "%".
1816
1817 %A Job array's master job allocation number.
1818
1819 %a Job array ID (index) number.
1820
1821 %J jobid.stepid of the running job. (e.g. "128.0")
1822
1823 %j jobid of the running job.
1824
1825 %N short hostname. This will create a separate IO file per node.
1826
1827 %n Node identifier relative to current job (e.g. "0" is the first
1828 node of the running job) This will create a separate IO file per
1829 node.
1830
1831 %s stepid of the running job.
1832
1833 %t task identifier (rank) relative to current job. This will create
1834 a separate IO file per task.
1835
1836 %u User name.
1837
1838 %x Job name.
1839
1840 A number placed between the percent character and format specifier may
1841 be used to zero-pad the result in the IO filename. This number is ig‐
1842 nored if the format specifier corresponds to non-numeric data (%N for
1843 example).
1844
1845 Some examples of how the format string may be used for a 4 task job
1846 step with a Job ID of 128 and step id of 0 are included below:
1847
1848
1849 job%J.out job128.0.out
1850
1851 job%4j.out job0128.out
1852
1853 job%j-%2t.out job128-00.out, job128-01.out, ...
1854
1855 PERFORMANCE
1856    Executing sbatch sends a remote procedure call to slurmctld. If enough
1857 calls from sbatch or other Slurm client commands that send remote pro‐
1858 cedure calls to the slurmctld daemon come in at once, it can result in
1859 a degradation of performance of the slurmctld daemon, possibly result‐
1860 ing in a denial of service.
1861
1862 Do not run sbatch or other Slurm client commands that send remote pro‐
1863 cedure calls to slurmctld from loops in shell scripts or other pro‐
1864 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
1865 sary for the information you are trying to gather.
1866
1867
1868 INPUT ENVIRONMENT VARIABLES
1869    Upon startup, sbatch will read and handle the options set in the fol‐
1870 lowing environment variables. The majority of these variables are set
1871 the same way the options are set, as defined above. For flag options
1872 that are defined to expect no argument, the option can be enabled by
1873 setting the environment variable without a value (empty or NULL
1874 string), the string 'yes', or a non-zero number. Any other value for
1875 the environment variable will result in the option not being set.
1876    There are a couple of exceptions to these rules that are noted below.
1877 NOTE: Environment variables will override any options set in a batch
1878 script, and command line options will override any environment vari‐
1879 ables.
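
       For example, a flag option such as --requeue can be enabled from the
       environment, and a value option such as -p can be given a default
       that a later command-line option still overrides (the partition names
       are illustrative):

          $ export SBATCH_REQUEUE=yes
          $ export SBATCH_PARTITION=debug
          $ sbatch myscript            # requeue-eligible, submitted to "debug"
          $ sbatch -p long myscript    # command-line -p overrides the default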
1880
1881
1882 SBATCH_ACCOUNT Same as -A, --account
1883
1884 SBATCH_ACCTG_FREQ Same as --acctg-freq
1885
1886 SBATCH_ARRAY_INX Same as -a, --array
1887
1888 SBATCH_BATCH Same as --batch
1889
1890 SBATCH_CLUSTERS or SLURM_CLUSTERS
1891 Same as --clusters
1892
1893 SBATCH_CONSTRAINT Same as -C, --constraint
1894
1895 SBATCH_CONTAINER Same as --container.
1896
1897 SBATCH_CORE_SPEC Same as --core-spec
1898
1899 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
1900
1901 SBATCH_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
1902 disable or enable the option.
1903
1904 SBATCH_DELAY_BOOT Same as --delay-boot
1905
1906 SBATCH_DISTRIBUTION Same as -m, --distribution
1907
1908 SBATCH_ERROR Same as -e, --error
1909
1910 SBATCH_EXCLUSIVE Same as --exclusive
1911
1912 SBATCH_EXPORT Same as --export
1913
1914 SBATCH_GET_USER_ENV Same as --get-user-env
1915
1916 SBATCH_GPU_BIND Same as --gpu-bind
1917
1918 SBATCH_GPU_FREQ Same as --gpu-freq
1919
1920 SBATCH_GPUS Same as -G, --gpus
1921
1922 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
1923
1924 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
1925
1926 SBATCH_GRES Same as --gres
1927
1928 SBATCH_GRES_FLAGS Same as --gres-flags
1929
1930 SBATCH_HINT or SLURM_HINT
1931 Same as --hint
1932
1933 SBATCH_IGNORE_PBS Same as --ignore-pbs
1934
1935 SBATCH_INPUT Same as -i, --input
1936
1937 SBATCH_JOB_NAME Same as -J, --job-name
1938
1939 SBATCH_MEM_BIND Same as --mem-bind
1940
1941 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
1942
1943 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
1944
1945 SBATCH_MEM_PER_NODE Same as --mem
1946
1947 SBATCH_NETWORK Same as --network
1948
1949 SBATCH_NO_KILL Same as -k, --no-kill
1950
1951 SBATCH_NO_REQUEUE Same as --no-requeue
1952
1953 SBATCH_OPEN_MODE Same as --open-mode
1954
1955 SBATCH_OUTPUT Same as -o, --output
1956
1957 SBATCH_OVERCOMMIT Same as -O, --overcommit
1958
1959 SBATCH_PARTITION Same as -p, --partition
1960
1961 SBATCH_POWER Same as --power
1962
1963 SBATCH_PROFILE Same as --profile
1964
1965 SBATCH_QOS Same as --qos
1966
1967 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
1968 maximum count of switches desired for the job al‐
1969 location and optionally the maximum time to wait
1970 for that number of switches. See --switches
1971
1972 SBATCH_REQUEUE Same as --requeue
1973
1974 SBATCH_RESERVATION Same as --reservation
1975
1976 SBATCH_SIGNAL Same as --signal
1977
1978 SBATCH_SPREAD_JOB Same as --spread-job
1979
1980 SBATCH_THREAD_SPEC Same as --thread-spec
1981
1982 SBATCH_THREADS_PER_CORE
1983 Same as --threads-per-core
1984
1985 SBATCH_TIMELIMIT Same as -t, --time
1986
1987 SBATCH_USE_MIN_NODES Same as --use-min-nodes
1988
1989 SBATCH_WAIT Same as -W, --wait
1990
1991 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
1992 to disable or enable the option.
1993
1994 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
1995 --switches
1996
1997 SBATCH_WCKEY Same as --wckey
1998
1999 SLURM_CONF The location of the Slurm configuration file.
2000
2001 SLURM_DEBUG_FLAGS Specify debug flags for sbatch to use. See De‐
2002 bugFlags in the slurm.conf(5) man page for a full
2003 list of flags. The environment variable takes
2004 precedence over the setting in the slurm.conf.
2005
2006 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2007 error occurs (e.g. invalid options). This can be
2008 used by a script to distinguish application exit
2009 codes from various Slurm error conditions.
2010
2011 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2012 If set, only the specified node will log when the
2013                        job or step is killed by a signal.
2014
2015 SLURM_UMASK If defined, Slurm will use the defined umask to
2016 set permissions when creating the output/error
2017 files for the job.
2018
2019 OUTPUT ENVIRONMENT VARIABLES
2020    The Slurm controller will set the following variables in the environ‐
2021 ment of the batch script.
2022
2023
2024 SBATCH_MEM_BIND
2025 Set to value of the --mem-bind option.
2026
2027 SBATCH_MEM_BIND_LIST
2028 Set to bit mask used for memory binding.
2029
2030 SBATCH_MEM_BIND_PREFER
2031 Set to "prefer" if the --mem-bind option includes the prefer op‐
2032 tion.
2033
2034 SBATCH_MEM_BIND_TYPE
2035 Set to the memory binding type specified with the --mem-bind op‐
2036          tion. Possible values are "none", "rank", "map_mem", "mask_mem"
2037 and "local".
2038
2039 SBATCH_MEM_BIND_VERBOSE
2040 Set to "verbose" if the --mem-bind option includes the verbose
2041 option. Set to "quiet" otherwise.
2042
2043 SLURM_*_HET_GROUP_#
2044 For a heterogeneous job allocation, the environment variables
2045 are set separately for each component.
2046
2047 SLURM_ARRAY_JOB_ID
2048 Job array's master job ID number.
2049
2050 SLURM_ARRAY_TASK_COUNT
2051 Total number of tasks in a job array.
2052
2053 SLURM_ARRAY_TASK_ID
2054 Job array ID (index) number.
2055
2056 SLURM_ARRAY_TASK_MAX
2057 Job array's maximum ID (index) number.
2058
2059 SLURM_ARRAY_TASK_MIN
2060 Job array's minimum ID (index) number.
2061
2062 SLURM_ARRAY_TASK_STEP
2063 Job array's index step size.
2064
2065 SLURM_CLUSTER_NAME
2066 Name of the cluster on which the job is executing.
2067
2068 SLURM_CPUS_ON_NODE
2069 Number of CPUs allocated to the batch step. NOTE: The se‐
2070 lect/linear plugin allocates entire nodes to jobs, so the value
2071 indicates the total count of CPUs on the node. For the se‐
2072          lect/cons_res and select/cons_tres plugins, this number indi‐
2073          cates the number of CPUs on this node allocated to the step.
2074
2075 SLURM_CPUS_PER_GPU
2076 Number of CPUs requested per allocated GPU. Only set if the
2077 --cpus-per-gpu option is specified.
2078
2079 SLURM_CPUS_PER_TASK
2080 Number of cpus requested per task. Only set if the
2081 --cpus-per-task option is specified.
2082
2083 SLURM_CONTAINER
2084 OCI Bundle for job. Only set if --container is specified.
2085
2086 SLURM_DIST_PLANESIZE
2087 Plane distribution size. Only set for plane distributions. See
2088 -m, --distribution.
2089
2090 SLURM_DISTRIBUTION
2091 Same as -m, --distribution
2092
2093 SLURM_EXPORT_ENV
2094 Same as --export.
2095
2096 SLURM_GPU_BIND
2097 Requested binding of tasks to GPU. Only set if the --gpu-bind
2098 option is specified.
2099
2100 SLURM_GPU_FREQ
2101 Requested GPU frequency. Only set if the --gpu-freq option is
2102 specified.
2103
2104 SLURM_GPUS
2105 Number of GPUs requested. Only set if the -G, --gpus option is
2106 specified.
2107
2108 SLURM_GPUS_ON_NODE
2109 Number of GPUs allocated to the batch step.
2110
2111 SLURM_GPUS_PER_NODE
2112 Requested GPU count per allocated node. Only set if the
2113 --gpus-per-node option is specified.
2114
2115 SLURM_GPUS_PER_SOCKET
2116 Requested GPU count per allocated socket. Only set if the
2117 --gpus-per-socket option is specified.
2118
2119 SLURM_GPUS_PER_TASK
2120 Requested GPU count per allocated task. Only set if the
2121 --gpus-per-task option is specified.
2122
2123 SLURM_GTIDS
2124 Global task IDs running on this node. Zero origin and comma
2125 separated. It is read internally by pmi if Slurm was built with
2126 pmi support. Leaving the variable set may cause problems when
2127 using external packages from within the job (Abaqus and Ansys
2128 have been known to have problems when it is set - consult the
2129 appropriate documentation for 3rd party software).
2130
2131 SLURM_HET_SIZE
2132 Set to count of components in heterogeneous job.
2133
2134 SLURM_JOB_ACCOUNT
2135          Account name associated with the job allocation.
2136
2137 SLURM_JOB_GPUS
2138 The global GPU IDs of the GPUs allocated to this job. The GPU
2139 IDs are not relative to any device cgroup, even if devices are
2140 constrained with task/cgroup. Only set in batch and interactive
2141 jobs.
2142
2143 SLURM_JOB_ID
2144 The ID of the job allocation.
2145
2146 SLURM_JOB_CPUS_PER_NODE
2147 Count of CPUs available to the job on the nodes in the alloca‐
2148 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
2149 [(xnumber_of_nodes)] ...]. For example:
2150 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
2151 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
2152 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
2153 select/linear plugin allocates entire nodes to jobs, so the
2154 value indicates the total count of CPUs on allocated nodes. The
2155 select/cons_res and select/cons_tres plugins allocate individual
2156 CPUs to jobs, so this number indicates the number of CPUs allo‐
2157 cated to the job.
2158
2159 SLURM_JOB_DEPENDENCY
2160 Set to value of the --dependency option.
2161
2162 SLURM_JOB_NAME
2163 Name of the job.
2164
2165 SLURM_JOB_NODELIST
2166 List of nodes allocated to the job.
2167
2168 SLURM_JOB_NUM_NODES
2169 Total number of nodes in the job's resource allocation.
2170
2171 SLURM_JOB_PARTITION
2172 Name of the partition in which the job is running.
2173
2174 SLURM_JOB_QOS
2175 Quality Of Service (QOS) of the job allocation.
2176
2177 SLURM_JOB_RESERVATION
2178 Advanced reservation containing the job allocation, if any.
2179
2180 SLURM_JOBID
2181 The ID of the job allocation. See SLURM_JOB_ID. Included for
2182 backwards compatibility.
2183
2184 SLURM_LOCALID
2185 Node local task ID for the process within a job.
2186
2187 SLURM_MEM_PER_CPU
2188 Same as --mem-per-cpu
2189
2190 SLURM_MEM_PER_GPU
2191 Requested memory per allocated GPU. Only set if the
2192 --mem-per-gpu option is specified.
2193
2194 SLURM_MEM_PER_NODE
2195 Same as --mem
2196
2197 SLURM_NNODES
2198 Total number of nodes in the job's resource allocation. See
2199 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2200
2201 SLURM_NODE_ALIASES
2202 Sets of node name, communication address and hostname for nodes
2203          allocated to the job from the cloud. Each element in the set is
2204 colon separated and each set is comma separated. For example:
2205 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2206
2207 SLURM_NODEID
2208          ID of the node allocated.
2209
2210 SLURM_NODELIST
2211 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
2212 cluded for backwards compatibility.
2213
2214 SLURM_NPROCS
2215 Same as -n, --ntasks. See SLURM_NTASKS. Included for backwards
2216 compatibility.
2217
2218 SLURM_NTASKS
2219 Same as -n, --ntasks
2220
2221 SLURM_NTASKS_PER_CORE
2222 Number of tasks requested per core. Only set if the
2223 --ntasks-per-core option is specified.
2224
2226 SLURM_NTASKS_PER_GPU
2227 Number of tasks requested per GPU. Only set if the
2228 --ntasks-per-gpu option is specified.
2229
2230 SLURM_NTASKS_PER_NODE
2231 Number of tasks requested per node. Only set if the
2232 --ntasks-per-node option is specified.
2233
2234 SLURM_NTASKS_PER_SOCKET
2235 Number of tasks requested per socket. Only set if the
2236 --ntasks-per-socket option is specified.
2237
2238 SLURM_OVERCOMMIT
2239 Set to 1 if --overcommit was specified.
2240
2241 SLURM_PRIO_PROCESS
2242 The scheduling priority (nice value) at the time of job submis‐
2243 sion. This value is propagated to the spawned processes.
2244
2245 SLURM_PROCID
2246          The MPI rank (or relative process ID) of the current process.
2247
2248 SLURM_PROFILE
2249 Same as --profile
2250
2251 SLURM_RESTART_COUNT
2252 If the job has been restarted due to system failure or has been
2253          explicitly requeued, this will be set to the number of times
2254 the job has been restarted.
2255
2256 SLURM_SHARDS_ON_NODE
2257 Number of GPU Shards available to the step on this node.
2258
2259 SLURM_SUBMIT_DIR
2260 The directory from which sbatch was invoked.
2261
2262 SLURM_SUBMIT_HOST
2263 The hostname of the computer from which sbatch was invoked.
2264
2265 SLURM_TASK_PID
2266 The process ID of the task being started.
2267
2268 SLURM_TASKS_PER_NODE
2269 Number of tasks to be initiated on each node. Values are comma
2270 separated and in the same order as SLURM_JOB_NODELIST. If two
2271 or more consecutive nodes are to have the same task count, that
2272 count is followed by "(x#)" where "#" is the repetition count.
2273 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2274 first three nodes will each execute two tasks and the fourth
2275 node will execute one task.
2276
2277 SLURM_THREADS_PER_CORE
2278 This is only set if --threads-per-core or
2279 SBATCH_THREADS_PER_CORE were specified. The value will be set to
2280 the value specified by --threads-per-core or
2281 SBATCH_THREADS_PER_CORE. This is used by subsequent srun calls
2282 within the job allocation.
2283
2284 SLURM_TOPOLOGY_ADDR
2285 This is set only if the system has the topology/tree plugin
2286          configured. The value will be set to the names of the network
2287          switches which may be involved in the job's communications,
2288          from the system's top level switch down to the leaf switch, and
2289          ending with the node name. A period is used to separate each hard‐
2290 ware component name.
2291
2292 SLURM_TOPOLOGY_ADDR_PATTERN
2293 This is set only if the system has the topology/tree plugin
2294          configured. The value will be set to the component types listed
2295          in
2295 SLURM_TOPOLOGY_ADDR. Each component will be identified as ei‐
2296 ther "switch" or "node". A period is used to separate each
2297 hardware component type.
2298
2299 SLURMD_NODENAME
2300 Name of the node running the job script.
2301
2302 EXAMPLES
2303    Specify a batch script by filename on the command line. The batch
2304 script specifies a 1 minute time limit for the job.
2305
2306 $ cat myscript
2307 #!/bin/sh
2308 #SBATCH --time=1
2309 srun hostname |sort
2310
2311 $ sbatch -N4 myscript
2312    Submitted batch job 65537
2313
2314 $ cat slurm-65537.out
2315 host1
2316 host2
2317 host3
2318 host4
2319
2320
2321 Pass a batch script to sbatch on standard input:
2322
2323 $ sbatch -N4 <<EOF
2324 > #!/bin/sh
2325 > srun hostname |sort
2326 > EOF
2327 sbatch: Submitted batch job 65541
2328
2329 $ cat slurm-65541.out
2330 host1
2331 host2
2332 host3
2333 host4
2334
2335
2336 To create a heterogeneous job with 3 components, each allocating a
2337 unique set of nodes:
2338
2339 $ sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2340 Submitted batch job 34987
2341
2342
2343 COPYING
2344    Copyright (C) 2006-2007 The Regents of the University of California.
2345 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2346 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2347 Copyright (C) 2010-2022 SchedMD LLC.
2348
2349 This file is part of Slurm, a resource management program. For de‐
2350 tails, see <https://slurm.schedmd.com/>.
2351
2352 Slurm is free software; you can redistribute it and/or modify it under
2353 the terms of the GNU General Public License as published by the Free
2354 Software Foundation; either version 2 of the License, or (at your op‐
2355 tion) any later version.
2356
2357 Slurm is distributed in the hope that it will be useful, but WITHOUT
2358 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2359 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2360 for more details.
2361
2362
2363 SEE ALSO
2364    sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2365 slurm.conf(5), sched_setaffinity (2), numa (3)
2366
2367
2368
2369 September 2022                  Slurm Commands                    sbatch(1)