sbatch(1)                       Slurm Commands                      sbatch(1)


NAME
       sbatch - Submit a batch script to Slurm.


SYNOPSIS
       sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html


DESCRIPTION
       sbatch submits a batch script to Slurm. The batch script may be given
       to sbatch through a file name on the command line, or if no file name
       is specified, sbatch will read in a script from standard input. The
       batch script may contain options preceded with "#SBATCH" before any
       executable commands in the script. sbatch will stop processing further
       #SBATCH directives once the first non-comment non-whitespace line has
       been reached in the script.
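
       A minimal batch script illustrating the directive placement described
       above might look like the following (job name, resource values, file
       names and the printed job ID are only examples):

              #!/bin/bash
              #SBATCH --job-name=example        # directives must precede any
              #SBATCH --ntasks=1                # executable command
              #SBATCH --time=00:10:00
              #SBATCH --output=slurm-%j.out

              srun hostname                     # first executable line

       and would be submitted with:

              $ sbatch example.sh
              Submitted batch job 123456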

       sbatch exits immediately after the script is successfully transferred
       to the Slurm controller and assigned a Slurm job ID. The batch script
       is not necessarily granted resources immediately; it may sit in the
       queue of pending jobs for some time before its required resources
       become available.

       By default both standard output and standard error are directed to a
       file of the name "slurm-%j.out", where the "%j" is replaced with the
       job allocation number. The file will be generated on the first node of
       the job allocation. Other than the batch script itself, Slurm does no
       movement of user files.

       When the job allocation is finally granted for the batch script, Slurm
       runs a single copy of the batch script on the first node in the set of
       allocated nodes.

       The following document describes the influence of various options on
       the allocation of cpus to jobs and tasks.
       https://slurm.schedmd.com/cpu_management.html

RETURN VALUE
       sbatch will return 0 on success or an error code on failure.

SCRIPT PATH RESOLUTION
       The batch script is resolved in the following order:

       1. If script starts with ".", then path is constructed as: current
          working directory / script
       2. If script starts with a "/", then path is considered absolute.
       3. If script is in current working directory.
       4. If script can be resolved through PATH. See path_resolution(7).

       Current working directory is the calling process working directory
       unless the --chdir argument is passed, which will override the current
       working directory.


OPTIONS
       -A, --account=<account>
              Charge resources used by this job to specified account. The
              account is an arbitrary string. The account name may be changed
              after job submission using the scontrol command.
70
71 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
72 Define the job accounting and profiling sampling intervals in
73 seconds. This can be used to override the JobAcctGatherFre‐
74 quency parameter in the slurm.conf file. <datatype>=<interval>
75 specifies the task sampling interval for the jobacct_gather
76 plugin or a sampling interval for a profiling type by the
77 acct_gather_profile plugin. Multiple comma-separated
78 <datatype>=<interval> pairs may be specified. Supported datatype
79 values are:
80
81 task Sampling interval for the jobacct_gather plugins and
82 for task profiling by the acct_gather_profile
83 plugin.
84 NOTE: This frequency is used to monitor memory us‐
85 age. If memory limits are enforced, the highest fre‐
86 quency a user can request is what is configured in
87 the slurm.conf file. It can not be disabled.
88
89 energy Sampling interval for energy profiling using the
90 acct_gather_energy plugin.
91
92 network Sampling interval for infiniband profiling using the
93 acct_gather_interconnect plugin.
94
95 filesystem Sampling interval for filesystem profiling using the
96 acct_gather_filesystem plugin.
97
98 The default value for the task sampling interval is 30 seconds.
99 The default value for all other intervals is 0. An interval of
100 0 disables sampling of the specified type. If the task sampling
101 interval is 0, accounting information is collected only at job
102 termination (reducing Slurm interference with the job).
103 Smaller (non-zero) values have a greater impact upon job perfor‐
104 mance, but a value of 30 seconds is not likely to be noticeable
105 for applications having less than 10,000 tasks.
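
       For example, the following directive (interval values chosen
       arbitrarily) samples task accounting every 15 seconds and energy
       data every 60 seconds:

              #SBATCH --acctg-freq=task=15,energy=60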
106
107 -a, --array=<indexes>
108 Submit a job array, multiple jobs to be executed with identical
109 parameters. The indexes specification identifies what array in‐
110 dex values should be used. Multiple values may be specified us‐
111 ing a comma separated list and/or a range of values with a "-"
112 separator. For example, "--array=0-15" or "--array=0,6,16-32".
113 A step function can also be specified with a suffix containing a
114 colon and number. For example, "--array=0-15:4" is equivalent to
115 "--array=0,4,8,12". A maximum number of simultaneously running
116 tasks from the job array may be specified using a "%" separator.
117 For example "--array=0-15%4" will limit the number of simultane‐
118 ously running tasks from this job array to 4. The minimum index
value is 0; the maximum value is one less than the configuration
parameter MaxArraySize. NOTE: Currently, federated job arrays
only run on the local cluster.
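
       A job array script typically selects its work with the
       SLURM_ARRAY_TASK_ID environment variable; the program and file names
       below are only illustrative:

              #!/bin/bash
              #SBATCH --array=0-15%4            # 16 tasks, at most 4 at a time
              #SBATCH --output=slurm-%A_%a.out

              srun ./process_one "input_${SLURM_ARRAY_TASK_ID}.dat"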
122
123 --batch=<list>
124 Nodes can have features assigned to them by the Slurm adminis‐
125 trator. Users can specify which of these features are required
by their batch script using this option. For example, a job's
127 allocation may include both Intel Haswell and KNL nodes with
128 features "haswell" and "knl" respectively. On such a configura‐
129 tion the batch script would normally benefit by executing on a
130 faster Haswell node. This would be specified using the option
131 "--batch=haswell". The specification can include AND and OR op‐
132 erators using the ampersand and vertical bar separators. For ex‐
133 ample: "--batch=haswell|broadwell" or "--batch=haswell|big_mem‐
134 ory". The --batch argument must be a subset of the job's --con‐
135 straint=<list> argument (i.e. the job can not request only KNL
136 nodes, but require the script to execute on a Haswell node). If
137 the request can not be satisfied from the resources allocated to
138 the job, the batch script will execute on the first node of the
139 job allocation.
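
       For example, a job that may be allocated either node type from the
       configuration described above, but wants its batch script to run on
       a Haswell node, could request:

              #SBATCH --constraint="haswell|knl"
              #SBATCH --batch=haswell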
140
141 --bb=<spec>
142 Burst buffer specification. The form of the specification is
143 system dependent. Also see --bbf. When the --bb option is
144 used, Slurm parses this option and creates a temporary burst
145 buffer script file that is used internally by the burst buffer
146 plugins. See Slurm's burst buffer guide for more information and
147 examples:
148 https://slurm.schedmd.com/burst_buffer.html
149
150 --bbf=<file_name>
151 Path of file containing burst buffer specification. The form of
152 the specification is system dependent. These burst buffer di‐
153 rectives will be inserted into the submitted batch script. See
154 Slurm's burst buffer guide for more information and examples:
155 https://slurm.schedmd.com/burst_buffer.html
156
157 -b, --begin=<time>
158 Submit the batch script to the Slurm controller immediately,
159 like normal, but tell the controller to defer the allocation of
160 the job until the specified time.
161
162 Time may be of the form HH:MM:SS to run a job at a specific time
163 of day (seconds are optional). (If that time is already past,
164 the next day is assumed.) You may also specify midnight, noon,
165 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
166 suffixed with AM or PM for running in the morning or the
167 evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY, MM/DD/YY or YYYY-MM-DD.
169 Combine date and time using the following format
170 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
171 count time-units, where the time-units can be seconds (default),
172 minutes, hours, days, or weeks and you can tell Slurm to run the
173 job today with the keyword today and to run the job tomorrow
174 with the keyword tomorrow. The value may be changed after job
175 submission using the scontrol command. For example:
176
177 --begin=16:00
178 --begin=now+1hour
179 --begin=now+60 (seconds by default)
180 --begin=2010-01-20T12:34:00
181
182
183 Notes on date/time specifications:
184 - Although the 'seconds' field of the HH:MM:SS time specifica‐
185 tion is allowed by the code, note that the poll time of the
186 Slurm scheduler is not precise enough to guarantee dispatch of
187 the job on the exact second. The job will be eligible to start
188 on the next poll following the specified time. The exact poll
189 interval depends on the Slurm scheduler (e.g., 60 seconds with
190 the default sched/builtin).
191 - If no time (HH:MM:SS) is specified, the default is
192 (00:00:00).
193 - If a date is specified without a year (e.g., MM/DD) then the
194 current year is assumed, unless the combination of MM/DD and
195 HH:MM:SS has already passed for that year, in which case the
196 next year is used.
197
198 -D, --chdir=<directory>
199 Set the working directory of the batch script to directory be‐
200 fore it is executed. The path can be specified as full path or
201 relative path to the directory where the command is executed.
202
203 --cluster-constraint=[!]<list>
204 Specifies features that a federated cluster must have to have a
205 sibling job submitted to it. Slurm will attempt to submit a sib‐
206 ling job to a cluster if it has at least one of the specified
207 features. If the "!" option is included, Slurm will attempt to
208 submit a sibling job to a cluster that has none of the specified
209 features.
210
211 -M, --clusters=<string>
212 Clusters to issue commands to. Multiple cluster names may be
213 comma separated. The job will be submitted to the one cluster
214 providing the earliest expected job initiation time. The default
215 value is the current cluster. A value of 'all' will query to run
216 on all clusters. Note the --export option to control environ‐
217 ment variables exported between clusters. Note that the Slur‐
218 mDBD must be up for this option to work properly.
219
220 --comment=<string>
221 An arbitrary comment enclosed in double quotes if using spaces
222 or some special characters.
223
224 -C, --constraint=<list>
225 Nodes can have features assigned to them by the Slurm adminis‐
226 trator. Users can specify which of these features are required
227 by their job using the constraint option. If you are looking for
'soft' constraints, please see --prefer for more information.
Only nodes having features matching the job constraints will be
used to satisfy the request. Multiple constraints may be specified
with AND, OR, matching OR, resource counts, etc. (some operators
are not supported on all system types). An example combining
several of these operators appears after the list of supported
options below.
233
234 NOTE: If features that are part of the node_features/helpers
235 plugin are requested, then only the Single Name and AND options
236 are supported.
237
238 Supported --constraint options include:
239
240 Single Name
241 Only nodes which have the specified feature will be used.
242 For example, --constraint="intel"
243
244 Node Count
245 A request can specify the number of nodes needed with
246 some feature by appending an asterisk and count after the
247 feature name. For example, --nodes=16 --con‐
248 straint="graphics*4 ..." indicates that the job requires
249 16 nodes and that at least four of those nodes must have
250 the feature "graphics."
251
AND    Only nodes with all of the specified features will be
       used. The ampersand is used for an AND operator. For
       example, --constraint="intel&gpu"

OR     Only nodes with at least one of the specified features
       will be used. The vertical bar is used for an OR
       operator. For example, --constraint="intel|amd"
259
260 Matching OR
261 If only one of a set of possible options should be used
262 for all allocated nodes, then use the OR operator and en‐
263 close the options within square brackets. For example,
264 --constraint="[rack1|rack2|rack3|rack4]" might be used to
265 specify that all nodes must be allocated on a single rack
266 of the cluster, but any of those four racks can be used.
267
268 Multiple Counts
269 Specific counts of multiple resources may be specified by
270 using the AND operator and enclosing the options within
271 square brackets. For example, --con‐
272 straint="[rack1*2&rack2*4]" might be used to specify that
273 two nodes must be allocated from nodes with the feature
274 of "rack1" and four nodes must be allocated from nodes
275 with the feature "rack2".
276
277 NOTE: This construct does not support multiple Intel KNL
278 NUMA or MCDRAM modes. For example, while --con‐
279 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
280 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
281 Specification of multiple KNL modes requires the use of a
282 heterogeneous job.
283
284 NOTE: Multiple Counts can cause jobs to be allocated with
285 a non-optimal network layout.
286
287 Brackets
288 Brackets can be used to indicate that you are looking for
289 a set of nodes with the different requirements contained
290 within the brackets. For example, --con‐
291 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
292 node with either the "rack1" or "rack2" features and two
293 nodes with the "rack3" feature. The same request without
294 the brackets will try to find a single node that meets
295 those requirements.
296
297 NOTE: Brackets are only reserved for Multiple Counts and
298 Matching OR syntax. AND operators require a count for
299 each feature inside square brackets (i.e.
300 "[quad*2&hemi*1]"). Slurm will only allow a single set of
301 bracketed constraints per job.
302
Parentheses
       Parentheses can be used to group like node features
       together. For example,
       --constraint="[(knl&snc4&flat)*4&haswell*1]" might be
       used to specify that four nodes with the features "knl",
       "snc4" and "flat" plus one node with the feature
       "haswell" are required. All options within parentheses
       should be grouped with AND (e.g. "&") operators.
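
       As referenced above, a sketch combining several of these operators
       (feature names are examples and must exist on the cluster); the two
       directives below are alternatives rather than meant to be used
       together:

              # AND: every allocated node must have both features
              #SBATCH --constraint="intel&gpu"

              # Matching OR: all nodes must come from a single rack
              #SBATCH --constraint="[rack1|rack2|rack3|rack4]"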
311
312 --container=<path_to_container>
313 Absolute path to OCI container bundle.
314
315 --contiguous
316 If set, then the allocated nodes must form a contiguous set.
317
318 NOTE: If SelectPlugin=cons_res this option won't be honored with
319 the topology/tree or topology/3d_torus plugins, both of which
320 can modify the node ordering.
321
322 -S, --core-spec=<num>
323 Count of specialized cores per node reserved by the job for sys‐
324 tem operations and not used by the application. The application
325 will not use these cores, but will be charged for their alloca‐
326 tion. Default value is dependent upon the node's configured
327 CoreSpecCount value. If a value of zero is designated and the
328 Slurm configuration option AllowSpecResourcesUsage is enabled,
329 the job will be allowed to override CoreSpecCount and use the
330 specialized resources on nodes it is allocated. This option can
331 not be used with the --thread-spec option.
332
333 NOTE: Explicitly setting a job's specialized core value implic‐
334 itly sets its --exclusive option, reserving entire nodes for the
335 job.
336
337 --cores-per-socket=<cores>
338 Restrict node selection to nodes with at least the specified
339 number of cores per socket. See additional information under -B
340 option above when task/affinity plugin is enabled.
341 NOTE: This option may implicitly set the number of tasks (if -n
342 was not specified) as one task per requested thread.
343
344 --cpu-freq=<p1>[-p2[:p3]]
345
346 Request that job steps initiated by srun commands inside this
347 sbatch script be run at some requested frequency if possible, on
348 the CPUs selected for the step on the compute node(s).
349
350 p1 can be [#### | low | medium | high | highm1] which will set
351 the frequency scaling_speed to the corresponding value, and set
352 the frequency scaling_governor to UserSpace. See below for defi‐
353 nition of the values.
354
355 p1 can be [Conservative | OnDemand | Performance | PowerSave]
356 which will set the scaling_governor to the corresponding value.
357 The governor has to be in the list set by the slurm.conf option
358 CpuFreqGovernors.
359
360 When p2 is present, p1 will be the minimum scaling frequency and
361 p2 will be the maximum scaling frequency.
362
p2 can be [#### | medium | high | highm1]. p2 must be greater
than p1.
365
366 p3 can be [Conservative | OnDemand | Performance | PowerSave |
367 SchedUtil | UserSpace] which will set the governor to the corre‐
368 sponding value.
369
370 If p3 is UserSpace, the frequency scaling_speed will be set by a
371 power or energy aware scheduling strategy to a value between p1
372 and p2 that lets the job run within the site's power goal. The
373 job may be delayed if p1 is higher than a frequency that allows
374 the job to run within the goal.
375
376 If the current frequency is < min, it will be set to min. Like‐
377 wise, if the current frequency is > max, it will be set to max.
378
379 Acceptable values at present include:
380
381 #### frequency in kilohertz
382
383 Low the lowest available frequency
384
385 High the highest available frequency
386
387 HighM1 (high minus one) will select the next highest
388 available frequency
389
390 Medium attempts to set a frequency in the middle of the
391 available range
392
393 Conservative attempts to use the Conservative CPU governor
394
395 OnDemand attempts to use the OnDemand CPU governor (the de‐
396 fault value)
397
398 Performance attempts to use the Performance CPU governor
399
400 PowerSave attempts to use the PowerSave CPU governor
401
402 UserSpace attempts to use the UserSpace CPU governor
403
404 The following informational environment variable is set in the job step
405 when --cpu-freq option is requested.
406 SLURM_CPU_FREQ_REQ
407
408 This environment variable can also be used to supply the value for the
409 CPU frequency request if it is set when the 'srun' command is issued.
410 The --cpu-freq on the command line will override the environment vari‐
able value. The form of the environment variable is the same as the
412 command line. See the ENVIRONMENT VARIABLES section for a description
413 of the SLURM_CPU_FREQ_REQ variable.
414
415 NOTE: This parameter is treated as a request, not a requirement. If
416 the job step's node does not support setting the CPU frequency, or the
417 requested value is outside the bounds of the legal frequencies, an er‐
418 ror is logged, but the job step is allowed to continue.
419
420 NOTE: Setting the frequency for just the CPUs of the job step implies
421 that the tasks are confined to those CPUs. If task confinement (i.e.
422 the task/affinity TaskPlugin is enabled, or the task/cgroup TaskPlugin
423 is enabled with "ConstrainCores=yes" set in cgroup.conf) is not config‐
424 ured, this parameter is ignored.
425
426 NOTE: When the step completes, the frequency and governor of each se‐
427 lected CPU is reset to the previous values.
428
NOTE: Submitting jobs with the --cpu-freq option when linuxproc is
the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result, not
all of the accounting information will be present.
433
434 --cpus-per-gpu=<ncpus>
435 Advise Slurm that ensuing job steps will require ncpus proces‐
436 sors per allocated GPU. Not compatible with the --cpus-per-task
437 option.
438
439 -c, --cpus-per-task=<ncpus>
440 Advise the Slurm controller that ensuing job steps will require
441 ncpus number of processors per task. Without this option, the
442 controller will just try to allocate one processor per task.
443
444 For instance, consider an application that has 4 tasks, each re‐
445 quiring 3 processors. If our cluster is comprised of quad-pro‐
446 cessors nodes and we simply ask for 12 processors, the con‐
447 troller might give us only 3 nodes. However, by using the
448 --cpus-per-task=3 options, the controller knows that each task
449 requires 3 processors on the same node, and the controller will
450 grant an allocation of 4 nodes, one for each of the 4 tasks.
451
452 NOTE: Beginning with 22.05, srun will not inherit the
453 --cpus-per-task value requested by salloc or sbatch. It must be
454 requested again with the call to srun or set with the
455 SRUN_CPUS_PER_TASK environment variable if desired for the
456 task(s).
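
       Following the NOTE above, a script that wants its srun steps to use
       the same per-task CPU count can re-export the value (task and CPU
       counts here are arbitrary, and the application name is a
       placeholder):

              #!/bin/bash
              #SBATCH --ntasks=4
              #SBATCH --cpus-per-task=3

              # srun no longer inherits --cpus-per-task from sbatch (22.05+)
              export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
              srun ./my_app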
457
458 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
460 (start > (deadline - time[-min])). Default is no deadline.
461 Valid time formats are:
462 HH:MM[:SS] [AM|PM]
463 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
464 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
466 now[+count[seconds(default)|minutes|hours|days|weeks]]
467
468 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
470 specification if the job has been eligible to run for less than
471 this time period. If the job has waited for less than the spec‐
472 ified period, it will use only nodes which already have the
473 specified features. The argument is in units of minutes. A de‐
474 fault value may be set by a system administrator using the de‐
475 lay_boot option of the SchedulerParameters configuration parame‐
476 ter in the slurm.conf file, otherwise the default value is zero
477 (no delay).
478
479 -d, --dependency=<dependency_list>
480 Defer the start of this job until the specified dependencies
have been satisfied. <dependency_list> is of the form
482 <type:job_id[:job_id][,type:job_id[:job_id]]> or
483 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
484 must be satisfied if the "," separator is used. Any dependency
485 may be satisfied if the "?" separator is used. Only one separa‐
486 tor may be used. For instance:
487 -d afterok:20:21,afterany:23
488
489 means that the job can run only after a 0 return code of jobs 20
490 and 21 AND the completion of job 23. However:
491 -d afterok:20:21?afterany:23
492 means that any of the conditions (afterok:20 OR afterok:21 OR
493 afterany:23) will be enough to release the job. Many jobs can
494 share the same dependency and these jobs may even belong to dif‐
495 ferent users. The value may be changed after job submission
496 using the scontrol command. Dependencies on remote jobs are al‐
497 lowed in a federation. Once a job dependency fails due to the
498 termination state of a preceding job, the dependent job will
499 never be run, even if the preceding job is requeued and has a
500 different termination state in a subsequent execution.
501
502 after:job_id[[+time][:jobid[+time]...]]
503 After the specified jobs start or are cancelled and
504 'time' in minutes from job start or cancellation happens,
505 this job can begin execution. If no 'time' is given then
506 there is no delay after start or cancellation.
507
508 afterany:job_id[:jobid...]
509 This job can begin execution after the specified jobs
510 have terminated. This is the default dependency type.
511
512 afterburstbuffer:job_id[:jobid...]
513 This job can begin execution after the specified jobs
514 have terminated and any associated burst buffer stage out
515 operations have completed.
516
517 aftercorr:job_id[:jobid...]
518 A task of this job array can begin execution after the
519 corresponding task ID in the specified job has completed
520 successfully (ran to completion with an exit code of
521 zero).
522
523 afternotok:job_id[:jobid...]
524 This job can begin execution after the specified jobs
525 have terminated in some failed state (non-zero exit code,
526 node failure, timed out, etc).
527
528 afterok:job_id[:jobid...]
529 This job can begin execution after the specified jobs
530 have successfully executed (ran to completion with an
531 exit code of zero).
532
533 singleton
534 This job can begin execution after any previously
535 launched jobs sharing the same job name and user have
536 terminated. In other words, only one job by that name
537 and owned by that user can be running or suspended at any
538 point in time. In a federation, a singleton dependency
539 must be fulfilled on all clusters unless DependencyParam‐
540 eters=disable_remote_singleton is used in slurm.conf.
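
       Dependency chains are commonly built by capturing job IDs at
       submission time, for example with sbatch's --parsable output option
       (script names are illustrative):

              $ jobid=$(sbatch --parsable preprocess.sh)
              $ sbatch --dependency=afterok:$jobid analysis.sh
              $ sbatch --dependency=singleton --job-name=nightly cleanup.sh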
541
542 -m, --distribution={*|block|cyclic|arbi‐
543 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
544
545 Specify alternate distribution methods for remote processes.
546 For job allocation, this sets environment variables that will be
547 used by subsequent srun requests and also affects which cores
548 will be selected for job allocation.
549
550 This option controls the distribution of tasks to the nodes on
551 which resources have been allocated, and the distribution of
552 those resources to tasks for binding (task affinity). The first
553 distribution method (before the first ":") controls the distri‐
554 bution of tasks to nodes. The second distribution method (after
555 the first ":") controls the distribution of allocated CPUs
556 across sockets for binding to tasks. The third distribution
557 method (after the second ":") controls the distribution of allo‐
558 cated CPUs across cores for binding to tasks. The second and
559 third distributions apply only if task affinity is enabled. The
560 third distribution is supported only if the task/cgroup plugin
561 is configured. The default value for each distribution type is
562 specified by *.
563
564 Note that with select/cons_res and select/cons_tres, the number
565 of CPUs allocated to each socket and node may be different. Re‐
566 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
567 mation on resource allocation, distribution of tasks to nodes,
568 and binding of tasks to CPUs.
569 First distribution method (distribution of tasks across nodes):
570
571
572 * Use the default method for distributing tasks to nodes
573 (block).
574
575 block The block distribution method will distribute tasks to a
576 node such that consecutive tasks share a node. For exam‐
577 ple, consider an allocation of three nodes each with two
578 cpus. A four-task block distribution request will dis‐
579 tribute those tasks to the nodes with tasks one and two
580 on the first node, task three on the second node, and
581 task four on the third node. Block distribution is the
582 default behavior if the number of tasks exceeds the num‐
583 ber of allocated nodes.
584
585 cyclic The cyclic distribution method will distribute tasks to a
586 node such that consecutive tasks are distributed over
587 consecutive nodes (in a round-robin fashion). For exam‐
588 ple, consider an allocation of three nodes each with two
589 cpus. A four-task cyclic distribution request will dis‐
590 tribute those tasks to the nodes with tasks one and four
591 on the first node, task two on the second node, and task
592 three on the third node. Note that when SelectType is
593 select/cons_res, the same number of CPUs may not be allo‐
594 cated on each node. Task distribution will be round-robin
595 among all the nodes with CPUs yet to be assigned to
596 tasks. Cyclic distribution is the default behavior if
597 the number of tasks is no larger than the number of allo‐
598 cated nodes.
599
600 plane The tasks are distributed in blocks of size <size>. The
601 size must be given or SLURM_DIST_PLANESIZE must be set.
602 The number of tasks distributed to each node is the same
603 as for cyclic distribution, but the taskids assigned to
604 each node depend on the plane size. Additional distribu‐
605 tion specifications cannot be combined with this option.
606 For more details (including examples and diagrams),
607 please see https://slurm.schedmd.com/mc_support.html and
608 https://slurm.schedmd.com/dist_plane.html
609
610 arbitrary
611 The arbitrary method of distribution will allocate pro‐
612 cesses in-order as listed in file designated by the envi‐
613 ronment variable SLURM_HOSTFILE. If this variable is
set, it will override any other method specified. If
not set, the method will default to block. The hostfile
must contain at minimum the number of hosts requested,
listed one per line or comma separated. If spec‐
618 ifying a task count (-n, --ntasks=<number>), your tasks
619 will be laid out on the nodes in the order of the file.
620 NOTE: The arbitrary distribution option on a job alloca‐
621 tion only controls the nodes to be allocated to the job
622 and not the allocation of CPUs on those nodes. This op‐
623 tion is meant primarily to control a job step's task lay‐
624 out in an existing job allocation for the srun command.
625 NOTE: If the number of tasks is given and a list of re‐
626 quested nodes is also given, the number of nodes used
627 from that list will be reduced to match that of the num‐
628 ber of tasks if the number of nodes in the list is
629 greater than the number of tasks.
630
631 Second distribution method (distribution of CPUs across sockets
632 for binding):
633
634
635 * Use the default method for distributing CPUs across sock‐
636 ets (cyclic).
637
638 block The block distribution method will distribute allocated
639 CPUs consecutively from the same socket for binding to
640 tasks, before using the next consecutive socket.
641
642 cyclic The cyclic distribution method will distribute allocated
643 CPUs for binding to a given task consecutively from the
644 same socket, and from the next consecutive socket for the
645 next task, in a round-robin fashion across sockets.
646 Tasks requiring more than one CPU will have all of those
647 CPUs allocated on a single socket if possible.
648
649 fcyclic
650 The fcyclic distribution method will distribute allocated
651 CPUs for binding to tasks from consecutive sockets in a
652 round-robin fashion across the sockets. Tasks requiring
more than one CPU will have each of those CPUs allocated in a
654 cyclic fashion across sockets.
655
656 Third distribution method (distribution of CPUs across cores for
657 binding):
658
659
660 * Use the default method for distributing CPUs across cores
661 (inherited from second distribution method).
662
663 block The block distribution method will distribute allocated
664 CPUs consecutively from the same core for binding to
665 tasks, before using the next consecutive core.
666
667 cyclic The cyclic distribution method will distribute allocated
668 CPUs for binding to a given task consecutively from the
669 same core, and from the next consecutive core for the
670 next task, in a round-robin fashion across cores.
671
672 fcyclic
673 The fcyclic distribution method will distribute allocated
674 CPUs for binding to tasks from consecutive cores in a
675 round-robin fashion across the cores.
676
677 Optional control for task distribution over nodes:
678
679
Pack   Rather than distributing a job step's tasks evenly
681 across its allocated nodes, pack them as tightly as pos‐
682 sible on the nodes. This only applies when the "block"
683 task distribution method is used.
684
685 NoPack Rather than packing a job step's tasks as tightly as pos‐
686 sible on the nodes, distribute them evenly. This user
687 option will supersede the SelectTypeParameters
688 CR_Pack_Nodes configuration parameter.
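
       For instance, to distribute tasks round-robin across the allocated
       nodes while allocating each task's CPUs for binding consecutively
       from a single socket:

              #SBATCH --distribution=cyclic:block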
689
690 -e, --error=<filename_pattern>
691 Instruct Slurm to connect the batch script's standard error di‐
692 rectly to the file name specified in the "filename pattern". By
693 default both standard output and standard error are directed to
694 the same file. For job arrays, the default file name is
695 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
696 the array index. For other jobs, the default file name is
697 "slurm-%j.out", where the "%j" is replaced by the job ID. See
698 the filename pattern section below for filename specification
699 options.
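
       For example, to split output and error into separate per-job files
       (the logs/ directory is illustrative and must already exist):

              #SBATCH --output=logs/job-%j.out
              #SBATCH --error=logs/job-%j.err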
700
701 -x, --exclude=<node_name_list>
702 Explicitly exclude certain nodes from the resources granted to
703 the job.
704
705 --exclusive[={user|mcs}]
706 The job allocation can not share nodes with other running jobs
707 (or just other users with the "=user" option or with the "=mcs"
708 option). If user/mcs are not specified (i.e. the job allocation
709 can not share nodes with other running jobs), the job is allo‐
710 cated all CPUs and GRES on all nodes in the allocation, but is
711 only allocated as much memory as it requested. This is by design
712 to support gang scheduling, because suspended jobs still reside
713 in memory. To request all the memory on a node, use --mem=0.
714 The default shared/exclusive behavior depends on system configu‐
715 ration and the partition's OverSubscribe option takes precedence
716 over the job's option. NOTE: Since shared GRES (MPS) cannot be
717 allocated at the same time as a sharing GRES (GPU) this option
718 only allocates all sharing GRES and no underlying shared GRES.
719
720 --export={[ALL,]<environment_variables>|ALL|NONE}
721 Identify which environment variables from the submission envi‐
722 ronment are propagated to the launched application. Note that
723 SLURM_* variables are always propagated.
724
725 --export=ALL
726 Default mode if --export is not specified. All of the
727 user's environment will be loaded (either from the
728 caller's environment or from a clean environment if
729 --get-user-env is specified).
730
731 --export=NONE
732 Only SLURM_* variables from the user environment will
733 be defined. User must use absolute path to the binary
734 to be executed that will define the environment. User
735 can not specify explicit environment variables with
736 "NONE". --get-user-env will be ignored.
737
738 This option is particularly important for jobs that
739 are submitted on one cluster and execute on a differ‐
740 ent cluster (e.g. with different paths). To avoid
741 steps inheriting environment export settings (e.g.
742 "NONE") from sbatch command, the environment variable
743 SLURM_EXPORT_ENV should be set to "ALL" in the job
744 script.
745
746 --export=[ALL,]<environment_variables>
747 Exports all SLURM_* environment variables along with
748 explicitly defined variables. Multiple environment
749 variable names should be comma separated. Environment
750 variable names may be specified to propagate the cur‐
751 rent value (e.g. "--export=EDITOR") or specific values
752 may be exported (e.g. "--export=EDITOR=/bin/emacs").
753 If "ALL" is specified, then all user environment vari‐
754 ables will be loaded and will take precedence over any
755 explicitly given environment variables.
756
757 Example: --export=EDITOR,ARG1=test
758 In this example, the propagated environment will only
759 contain the variable EDITOR from the user's environ‐
760 ment, SLURM_* environment variables, and ARG1=test.
761
762 Example: --export=ALL,EDITOR=/bin/emacs
763 There are two possible outcomes for this example. If
764 the caller has the EDITOR environment variable de‐
765 fined, then the job's environment will inherit the
766 variable from the caller's environment. If the caller
767 doesn't have an environment variable defined for EDI‐
768 TOR, then the job's environment will use the value
769 given by --export.
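
       Tying the above together, a job submitted with --export=NONE (for
       example to run on a different cluster) might restore normal export
       behavior for its own steps, as described in the NONE entry above
       (paths are illustrative):

              $ sbatch --export=NONE /home/user/jobs/run.sh

              # inside run.sh:
              export SLURM_EXPORT_ENV=ALL   # let srun steps inherit the environment
              srun /home/user/bin/solver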
770
771 --export-file={<filename>|<fd>}
772 If a number between 3 and OPEN_MAX is specified as the argument
773 to this option, a readable file descriptor will be assumed
774 (STDIN and STDOUT are not supported as valid arguments). Other‐
775 wise a filename is assumed. Export environment variables de‐
776 fined in <filename> or read from <fd> to the job's execution en‐
777 vironment. The content is one or more environment variable defi‐
778 nitions of the form NAME=value, each separated by a null charac‐
779 ter. This allows the use of special characters in environment
780 definitions.
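
       A sketch of building the null-separated variable file described
       above (file and variable names are arbitrary):

              $ printf 'DATASET=/scratch/data\0OMP_NUM_THREADS=8\0' > job.env
              $ sbatch --export-file=job.env job.sh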
781
782 -B, --extra-node-info=<sockets>[:cores[:threads]]
783 Restrict node selection to nodes with at least the specified
784 number of sockets, cores per socket and/or threads per core.
785 NOTE: These options do not specify the resource allocation size.
786 Each value specified is considered a minimum. An asterisk (*)
787 can be used as a placeholder indicating that all available re‐
788 sources of that type are to be utilized. Values can also be
789 specified as min-max. The individual levels can also be speci‐
790 fied in separate options if desired:
791 --sockets-per-node=<sockets>
792 --cores-per-socket=<cores>
793 --threads-per-core=<threads>
794 If task/affinity plugin is enabled, then specifying an alloca‐
795 tion in this manner also results in subsequently launched tasks
796 being bound to threads if the -B option specifies a thread
797 count, otherwise an option of cores if a core count is speci‐
798 fied, otherwise an option of sockets. If SelectType is config‐
799 ured to select/cons_res, it must have a parameter of CR_Core,
800 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
801 to be honored. If not specified, the scontrol show job will
802 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
803 tions.
804 NOTE: This option is mutually exclusive with --hint,
805 --threads-per-core and --ntasks-per-core.
806 NOTE: This option may implicitly set the number of tasks (if -n
807 was not specified) as one task per requested thread.
808
809 --get-user-env[=timeout][mode]
810 This option will tell sbatch to retrieve the login environment
811 variables for the user specified in the --uid option. The envi‐
812 ronment variables are retrieved by running something of this
813 sort "su - <username> -c /usr/bin/env" and parsing the output.
814 Be aware that any environment variables already set in sbatch's
815 environment will take precedence over any environment variables
816 in the user's login environment. Clear any environment variables
817 before calling sbatch that you do not want propagated to the
818 spawned program. The optional timeout value is in seconds. De‐
fault value is 8 seconds. The optional mode value controls the
820 "su" options. With a mode value of "S", "su" is executed with‐
821 out the "-" option. With a mode value of "L", "su" is executed
822 with the "-" option, replicating the login environment. If mode
823 not specified, the mode established at Slurm build time is used.
Examples of use include "--get-user-env", "--get-user-env=10",
"--get-user-env=10L", and "--get-user-env=S".
826
827 --gid=<group>
828 If sbatch is run as root, and the --gid option is used, submit
829 the job with group's group access permissions. group may be the
830 group name or the numerical group ID.
831
832 --gpu-bind=[verbose,]<type>
833 Bind tasks to specific GPUs. By default every spawned task can
834 access every GPU allocated to the step. If "verbose," is speci‐
835 fied before <type>, then print out GPU binding debug information
836 to the stderr of the tasks. GPU binding is ignored if there is
837 only one task.
838
839 Supported type options:
840
841 closest Bind each task to the GPU(s) which are closest. In a
842 NUMA environment, each task may be bound to more than
843 one GPU (i.e. all GPUs in that NUMA environment).
844
845 map_gpu:<list>
846 Bind by setting GPU masks on tasks (or ranks) as spec‐
847 ified where <list> is
848 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
849 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
851 hexadecimal values. If the number of tasks (or ranks)
852 exceeds the number of elements in this list, elements
853 in the list will be reused as needed starting from the
854 beginning of the list. To simplify support for large
855 task counts, the lists may follow a map with an aster‐
856 isk and repetition count. For example
857 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
858 and ConstrainDevices is set in cgroup.conf, then the
859 GPU IDs are zero-based indexes relative to the GPUs
860 allocated to the job (e.g. the first GPU is 0, even if
861 the global ID is 3). Otherwise, the GPU IDs are global
862 IDs, and all GPUs on each node in the job should be
863 allocated for predictable binding results.
864
865 mask_gpu:<list>
866 Bind by setting GPU masks on tasks (or ranks) as spec‐
867 ified where <list> is
868 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
869 mapping is specified for a node and identical mapping
870 is applied to the tasks on every node (i.e. the lowest
871 task ID on each node is mapped to the first mask spec‐
872 ified in the list, etc.). GPU masks are always inter‐
873 preted as hexadecimal values but can be preceded with
874 an optional '0x'. To simplify support for large task
875 counts, the lists may follow a map with an asterisk
876 and repetition count. For example
877 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
878 is used and ConstrainDevices is set in cgroup.conf,
879 then the GPU IDs are zero-based indexes relative to
880 the GPUs allocated to the job (e.g. the first GPU is
881 0, even if the global ID is 3). Otherwise, the GPU IDs
882 are global IDs, and all GPUs on each node in the job
883 should be allocated for predictable binding results.
884
885 none Do not bind tasks to GPUs (turns off binding if
886 --gpus-per-task is requested).
887
888 per_task:<gpus_per_task>
889 Each task will be bound to the number of gpus speci‐
fied in <gpus_per_task>. GPUs are assigned to tasks in
order: the first task will be assigned the first
<gpus_per_task> GPUs on the node, and so on.
893
894 single:<tasks_per_gpu>
895 Like --gpu-bind=closest, except that each task can
896 only be bound to a single GPU, even when it can be
897 bound to multiple GPUs that are equally close. The
898 GPU to bind to is determined by <tasks_per_gpu>, where
899 the first <tasks_per_gpu> tasks are bound to the first
900 GPU available, the second <tasks_per_gpu> tasks are
901 bound to the second GPU available, etc. This is basi‐
902 cally a block distribution of tasks onto available
903 GPUs, where the available GPUs are determined by the
904 socket affinity of the task and the socket affinity of
905 the GPUs as specified in gres.conf's Cores parameter.
906
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
908 Request that GPUs allocated to the job are configured with spe‐
909 cific frequency values. This option can be used to indepen‐
910 dently configure the GPU and its memory frequencies. After the
911 job is completed, the frequencies of all affected GPUs will be
912 reset to the highest possible values. In some cases, system
913 power caps may override the requested values. The field type
914 can be "memory". If type is not specified, the GPU frequency is
915 implied. The value field can either be "low", "medium", "high",
916 "highm1" or a numeric value in megahertz (MHz). If the speci‐
917 fied numeric value is not possible, a value as close as possible
918 will be used. See below for definition of the values. The ver‐
919 bose option causes current GPU frequency information to be
920 logged. Examples of use include "--gpu-freq=medium,memory=high"
921 and "--gpu-freq=450".
922
923 Supported value definitions:
924
925 low the lowest available frequency.
926
927 medium attempts to set a frequency in the middle of the
928 available range.
929
930 high the highest available frequency.
931
932 highm1 (high minus one) will select the next highest avail‐
933 able frequency.
934
935 -G, --gpus=[type:]<number>
936 Specify the total number of GPUs required for the job. An op‐
937 tional GPU type specification can be supplied. For example
938 "--gpus=volta:3". Multiple options can be requested in a comma
939 separated list, for example: "--gpus=volta:3,kepler:1". See
940 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
941 options.
942 NOTE: The allocation has to contain at least one GPU per node.
943
944 --gpus-per-node=[type:]<number>
945 Specify the number of GPUs required for the job on each node in‐
946 cluded in the job's resource allocation. An optional GPU type
947 specification can be supplied. For example
948 "--gpus-per-node=volta:3". Multiple options can be requested in
949 a comma separated list, for example:
950 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
951 --gpus-per-socket and --gpus-per-task options.
952
953 --gpus-per-socket=[type:]<number>
954 Specify the number of GPUs required for the job on each socket
955 included in the job's resource allocation. An optional GPU type
956 specification can be supplied. For example
957 "--gpus-per-socket=volta:3". Multiple options can be requested
958 in a comma separated list, for example:
959 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
sockets per node count (--sockets-per-node). See also the
961 --gpus, --gpus-per-node and --gpus-per-task options.
962
963 --gpus-per-task=[type:]<number>
964 Specify the number of GPUs required for the job on each task to
965 be spawned in the job's resource allocation. An optional GPU
966 type specification can be supplied. For example
967 "--gpus-per-task=volta:1". Multiple options can be requested in
968 a comma separated list, for example:
969 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
970 --gpus-per-socket and --gpus-per-node options. This option re‐
971 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
972 --gpus-per-task=Y" rather than an ambiguous range of nodes with
973 -N, --nodes. This option will implicitly set
974 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
975 with an explicit --gpu-bind specification.
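
       As noted above, --gpus-per-task requires an explicit task count, for
       example (GPU type and counts are illustrative):

              #SBATCH --ntasks=4
              #SBATCH --gpus-per-task=volta:1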
976
977 --gres=<list>
978 Specifies a comma-delimited list of generic consumable re‐
979 sources. The format of each entry on the list is
980 "name[[:type]:count]". The name is that of the consumable re‐
981 source. The count is the number of those resources with a de‐
982 fault value of 1. The count can have a suffix of "k" or "K"
983 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
984 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
985 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
986 x 1024 x 1024 x 1024). The specified resources will be allo‐
987 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
989 of available generic consumable resources will be printed and
990 the command will exit if the option argument is "help". Exam‐
991 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
992 "--gres=help".
993
994 --gres-flags=<type>
995 Specify generic resource task binding options.
996
997 disable-binding
998 Disable filtering of CPUs with respect to generic re‐
999 source locality. This option is currently required to
1000 use more CPUs than are bound to a GRES (i.e. if a GPU is
1001 bound to the CPUs on one socket, but resources on more
1002 than one socket are required to run the job). This op‐
1003 tion may permit a job to be allocated resources sooner
1004 than otherwise possible, but may result in lower job per‐
1005 formance.
1006 NOTE: This option is specific to SelectType=cons_res.
1007
1008 enforce-binding
1009 The only CPUs available to the job will be those bound to
1010 the selected GRES (i.e. the CPUs identified in the
1011 gres.conf file will be strictly enforced). This option
1012 may result in delayed initiation of a job. For example a
1013 job requiring two GPUs and one CPU will be delayed until
1014 both GPUs on a single socket are available rather than
1015 using GPUs bound to separate sockets, however, the appli‐
1016 cation performance may be improved due to improved commu‐
1017 nication speed. Requires the node to be configured with
1018 more than one socket and resource filtering will be per‐
1019 formed on a per-socket basis.
1020 NOTE: This option is specific to SelectType=cons_tres.
1021
1022 -h, --help
1023 Display help information and exit.
1024
1025 --hint=<type>
1026 Bind tasks according to application hints.
1027 NOTE: This option cannot be used in conjunction with
1028 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
1029 fied as a command line argument, it will take precedence over
1030 the environment.
1031
1032 compute_bound
1033 Select settings for compute bound applications: use all
1034 cores in each socket, one thread per core.
1035
1036 memory_bound
1037 Select settings for memory bound applications: use only
1038 one core in each socket, one thread per core.
1039
1040 [no]multithread
1041 [don't] use extra threads with in-core multi-threading
1042 which can benefit communication intensive applications.
1043 Only supported with the task/affinity plugin.
1044
1045 help show this help message
1046
1047 -H, --hold
1048 Specify the job is to be submitted in a held state (priority of
1049 zero). A held job can now be released using scontrol to reset
1050 its priority (e.g. "scontrol release <job_id>").
1051
1052 --ignore-pbs
1053 Ignore all "#PBS" and "#BSUB" options specified in the batch
1054 script.
1055
1056 -i, --input=<filename_pattern>
1057 Instruct Slurm to connect the batch script's standard input di‐
1058 rectly to the file name specified in the "filename pattern".
1059
1060 By default, "/dev/null" is open on the batch script's standard
1061 input and both standard output and standard error are directed
1062 to a file of the name "slurm-%j.out", where the "%j" is replaced
1063 with the job allocation number, as described below in the file‐
1064 name pattern section.
1065
1066 -J, --job-name=<jobname>
1067 Specify a name for the job allocation. The specified name will
1068 appear along with the job id number when querying running jobs
1069 on the system. The default is the name of the batch script, or
1070 just "sbatch" if the script is read on sbatch's standard input.
1071
1072 --kill-on-invalid-dep=<yes|no>
If a job has an invalid dependency and can never run, this
parameter tells Slurm whether or not to terminate it. A
terminated job state will be JOB_CANCELLED. If this option is
not specified, the system-wide behavior applies: by default the
job stays pending with reason DependencyNeverSatisfied, or, if
kill_invalid_depend is specified in slurm.conf, the job is
terminated.
1079
1080 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1081 Specification of licenses (or other resources available on all
1082 nodes of the cluster) which must be allocated to this job. Li‐
1083 cense names can be followed by a colon and count (the default
1084 count is one). Multiple license names should be comma separated
1085 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote li‐
1086 censes, those served by the slurmdbd, specify the name of the
1087 server providing the licenses. For example "--license=nas‐
1088 tran@slurmdb:12".
1089
1090 NOTE: When submitting heterogeneous jobs, license requests only
1091 work correctly when made on the first component job. For exam‐
1092 ple "sbatch -L ansys:2 : script.sh".
1093
1094 --mail-type=<type>
1095 Notify user by email when certain event types occur. Valid type
1096 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1097 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1098 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1099 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1100 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1101 percent of time limit), TIME_LIMIT_50 (reached 50 percent of
1102 time limit) and ARRAY_TASKS (send emails for each array task).
1103 Multiple type values may be specified in a comma separated list.
1104 The user to be notified is indicated with --mail-user. Unless
1105 the ARRAY_TASKS option is specified, mail notifications on job
1106 BEGIN, END and FAIL apply to a job array as a whole rather than
1107 generating individual email messages for each task in the job
1108 array.
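
       For example, to be notified only when the job ends or fails (the
       address is a placeholder):

              #SBATCH --mail-type=END,FAIL
              #SBATCH --mail-user=user@example.com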
1109
1110 --mail-user=<user>
1111 User to receive email notification of state changes as defined
1112 by --mail-type. The default value is the submitting user.
1113
1114 --mcs-label=<mcs>
1115 Used only when the mcs/group plugin is enabled. This parameter
1116 is a group among the groups of the user. Default value is cal‐
culated by the mcs plugin if it is enabled.
1118
1119 --mem=<size>[units]
1120 Specify the real memory required per node. Default units are
1121 megabytes. Different units can be specified using the suffix
1122 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1123 is MaxMemPerNode. If configured, both parameters can be seen us‐
1124 ing the scontrol show config command. This parameter would gen‐
1125 erally be used if whole nodes are allocated to jobs (Select‐
1126 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1127 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1128 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1129 fied as command line arguments, then they will take precedence
1130 over the environment.
1131
1132 NOTE: A memory size specification of zero is treated as a spe‐
1133 cial case and grants the job access to all of the memory on each
1134 node.
1135
1136 NOTE: Enforcement of memory limits currently relies upon the
1137 task/cgroup plugin or enabling of accounting, which samples mem‐
1138 ory use on a periodic basis (data need not be stored, just col‐
1139 lected). In both cases memory use is based upon the job's Resi‐
1140 dent Set Size (RSS). A task may exceed the memory limit until
1141 the next periodic accounting sample.
1142
1143 --mem-bind=[{quiet|verbose},]<type>
1144 Bind tasks to memory. Used only when the task/affinity plugin is
1145 enabled and the NUMA memory functions are available. Note that
1146 the resolution of CPU and memory binding may differ on some ar‐
1147 chitectures. For example, CPU binding may be performed at the
1148 level of the cores within a processor while memory binding will
1149 be performed at the level of nodes, where the definition of
1150 "nodes" may differ from system to system. By default no memory
1151 binding is performed; any task using any CPU can use any memory.
1152 This option is typically used to ensure that each task is bound
1153 to the memory closest to its assigned CPU. The use of any type
1154 other than "none" or "local" is not recommended.
1155
1156 NOTE: To have Slurm always report on the selected memory binding
1157 for all commands executed in a shell, you can enable verbose
1158 mode by setting the SLURM_MEM_BIND environment variable value to
1159 "verbose".
1160
1161 The following informational environment variables are set when
1162 --mem-bind is in use:
1163
1164 SLURM_MEM_BIND_LIST
1165 SLURM_MEM_BIND_PREFER
1166 SLURM_MEM_BIND_SORT
1167 SLURM_MEM_BIND_TYPE
1168 SLURM_MEM_BIND_VERBOSE
1169
1170 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1171 scription of the individual SLURM_MEM_BIND* variables.
1172
1173 Supported options include:
1174
1175 help show this help message
1176
1177 local Use memory local to the processor in use
1178
1179 map_mem:<list>
1180 Bind by setting memory masks on tasks (or ranks) as spec‐
1181 ified where <list> is
1182 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1183 ping is specified for a node and identical mapping is ap‐
1184 plied to the tasks on every node (i.e. the lowest task ID
1185 on each node is mapped to the first ID specified in the
1186 list, etc.). NUMA IDs are interpreted as decimal values
unless they are preceded with '0x' in which case they are
interpreted as hexadecimal values. If the number of tasks
1189 (or ranks) exceeds the number of elements in this list,
1190 elements in the list will be reused as needed starting
1191 from the beginning of the list. To simplify support for
1192 large task counts, the lists may follow a map with an as‐
1193 terisk and repetition count. For example
1194 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1195 sults, all CPUs for each node in the job should be allo‐
1196 cated to the job.
1197
1198 mask_mem:<list>
1199 Bind by setting memory masks on tasks (or ranks) as spec‐
1200 ified where <list> is
1201 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1202 mapping is specified for a node and identical mapping is
1203 applied to the tasks on every node (i.e. the lowest task
1204 ID on each node is mapped to the first mask specified in
1205 the list, etc.). NUMA masks are always interpreted as
1206 hexadecimal values. Note that masks must be preceded
1207 with a '0x' if they don't begin with [0-9] so they are
1208 seen as numerical values. If the number of tasks (or
1209 ranks) exceeds the number of elements in this list, ele‐
1210 ments in the list will be reused as needed starting from
1211 the beginning of the list. To simplify support for large
1212 task counts, the lists may follow a mask with an asterisk
1213 and repetition count. For example "mask_mem:0*4,1*4".
1214 For predictable binding results, all CPUs for each node
1215 in the job should be allocated to the job.
1216
1217 no[ne] don't bind tasks to memory (default)
1218
1219 p[refer]
1220 Prefer use of first specified NUMA node, but permit
1221 use of other available NUMA nodes.
1222
1223 q[uiet]
1224 quietly bind before task runs (default)
1225
1226 rank bind by task rank (not recommended)
1227
1228 sort sort free cache pages (run zonesort on Intel KNL nodes)
1229
1230 v[erbose]
1231 verbosely report binding before task runs
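
              For illustration only, a sketch of a submission that binds each
              task to the memory local to its assigned CPU (the script name
              job.sh is a placeholder):

                   $ sbatch --ntasks=8 --mem-bind=verbose,local job.sh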
1232
1233 --mem-per-cpu=<size>[units]
1234 Minimum memory required per usable allocated CPU. Default units
1235 are megabytes. The default value is DefMemPerCPU and the maxi‐
1236 mum value is MaxMemPerCPU (see exception below). If configured,
1237 both parameters can be seen using the scontrol show config com‐
1238 mand. Note that if the job's --mem-per-cpu value exceeds the
1239 configured MaxMemPerCPU, then the user's limit will be treated
1240 as a memory limit per task; --mem-per-cpu will be reduced to a
1241 value no larger than MaxMemPerCPU; --cpus-per-task will be set
1242 and the value of --cpus-per-task multiplied by the new
1243 --mem-per-cpu value will equal the original --mem-per-cpu value
1244 specified by the user. This parameter would generally be used
1245 if individual processors are allocated to jobs (SelectType=se‐
1246 lect/cons_res). If resources are allocated by core, socket, or
1247 whole nodes, then the number of CPUs allocated to a job may be
1248 higher than the task count and the value of --mem-per-cpu should
1249 be adjusted accordingly. Also see --mem and --mem-per-gpu. The
1250 --mem, --mem-per-cpu and --mem-per-gpu options are mutually ex‐
1251 clusive.
1252
1253 NOTE: If the final amount of memory requested by a job can't be
1254 satisfied by any of the nodes configured in the partition, the
1255 job will be rejected. This could happen if --mem-per-cpu is
1256 used with the --exclusive option for a job allocation and
1257 --mem-per-cpu times the number of CPUs on a node is greater than
1258 the total memory of that node.
1259
1260 NOTE: This applies to usable allocated CPUs in a job allocation.
1261 This is important when more than one thread per core is config‐
1262 ured. If a job requests --threads-per-core with fewer threads
1263 on a core than exist on the core (or --hint=nomultithread which
1264 implies --threads-per-core=1), the job will be unable to use
1265 those extra threads on the core and those threads will not be
1266 included in the memory per CPU calculation. But if the job has
1267 access to all threads on the core, those threads will be in‐
1268 cluded in the memory per CPU calculation even if the job did not
1269 explicitly request those threads.
1270
1271 In the following examples, each core has two threads.
1272
1273 In this first example, two tasks can run on separate hyper‐
1274 threads in the same core because --threads-per-core is not used.
1275 The third task uses both threads of the second core. The allo‐
1276 cated memory per cpu includes all threads:
1277
1278 $ salloc -n3 --mem-per-cpu=100
1279 salloc: Granted job allocation 17199
1280 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1281 JobID ReqTRES AllocTRES
1282 ------- ----------------------------------- -----------------------------------
1283 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
1284
1285 In this second example, because of --threads-per-core=1, each
1286 task is allocated an entire core but is only able to use one
1287 thread per core. Allocated CPUs includes all threads on each
1288 core. However, allocated memory per cpu includes only the usable
1289 thread in each core.
1290
1291 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
1292 salloc: Granted job allocation 17200
1293 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
1294 JobID ReqTRES AllocTRES
1295 ------- ----------------------------------- -----------------------------------
1296 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
1297
1298 --mem-per-gpu=<size>[units]
1299 Minimum memory required per allocated GPU. Default units are
1300 megabytes. Different units can be specified using the suffix
1301 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1302 both a global and per partition basis. If configured, the pa‐
1303 rameters can be seen using the scontrol show config and scontrol
1304 show partition commands. Also see --mem. The --mem,
1305 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
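
              For example, a sketch (the sizes and script name are arbitrary)
              requesting two GPUs with 16 gigabytes of memory per GPU:

                   $ sbatch --gpus=2 --mem-per-gpu=16G job.sh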
1306
1307 --mincpus=<n>
1308 Specify a minimum number of logical cpus/processors per node.
1309
1310 --network=<type>
1311 Specify information pertaining to the switch or network. The
1312 interpretation of type is system dependent. This option is sup‐
1313 ported when running Slurm on a Cray natively. It is used to re‐
1314 quest using Network Performance Counters. Only one value per
1315               request is valid.  All options are case-insensitive.  In this
1316               configuration the supported values include:
1317
1318 system
1319 Use the system-wide network performance counters. Only
1320 nodes requested will be marked in use for the job alloca‐
1321 tion. If the job does not fill up the entire system the
1322 rest of the nodes are not able to be used by other jobs
1323 using NPC, if idle their state will appear as PerfCnts.
1324 These nodes are still available for other jobs not using
1325 NPC.
1326
1327 blade Use the blade network performance counters. Only nodes re‐
1328 quested will be marked in use for the job allocation. If
1329 the job does not fill up the entire blade(s) allocated to
1330 the job those blade(s) are not able to be used by other
1331 jobs using NPC, if idle their state will appear as PerfC‐
1332 nts. These nodes are still available for other jobs not
1333 using NPC.
1334
1335 In all cases the job allocation request must specify the --exclusive
1336 option. Otherwise the request will be denied.
1337
1338       Also, with any of these options, job steps are not allowed to share
1339       blades, so resources will remain idle inside an allocation if the step
1340       running on a blade does not use all of the nodes on the blade.
1341
1342 --nice[=adjustment]
1343 Run the job with an adjusted scheduling priority within Slurm.
1344 With no adjustment value the scheduling priority is decreased by
1345 100. A negative nice value increases the priority, otherwise de‐
1346 creases it. The adjustment range is +/- 2147483645. Only privi‐
1347 leged users can specify a negative adjustment.
1348
1349 -k, --no-kill[=off]
1350 Do not automatically terminate a job if one of the nodes it has
1351 been allocated fails. The user will assume the responsibilities
1352 for fault-tolerance should a node fail. The job allocation will
1353 not be revoked so the user may launch new job steps on the re‐
1354 maining nodes in their allocation. This option does not set the
1355 SLURM_NO_KILL environment variable. Therefore, when a node
1356 fails, steps running on that node will be killed unless the
1357 SLURM_NO_KILL environment variable was explicitly set or srun
1358 calls within the job allocation explicitly requested --no-kill.
1359
1360 Specify an optional argument of "off" to disable the effect of
1361 the SBATCH_NO_KILL environment variable.
1362
1363 By default Slurm terminates the entire job allocation if any
1364 node fails in its range of allocated nodes.
1365
1366 --no-requeue
1367 Specifies that the batch job should never be requeued under any
1368               circumstances.  Setting this option will prevent the job from being
1369               restarted by a system administrator (for example, after a scheduled
1370               downtime), recovered after a node failure, or requeued upon preemp‐
1371               tion by a higher priority job.  When a job is requeued, the batch
1372               script is initiated from its beginning.  Also
1373 see the --requeue option. The JobRequeue configuration parame‐
1374 ter controls the default behavior on the cluster.
1375
1376 -F, --nodefile=<node_file>
1377 Much like --nodelist, but the list is contained in a file of
1378               name node_file.  The node names in the list may also span multi‐
1379 ple lines in the file. Duplicate node names in the file will
1380 be ignored. The order of the node names in the list is not im‐
1381 portant; the node names will be sorted by Slurm.
1382
1383 -w, --nodelist=<node_name_list>
1384 Request a specific list of hosts. The job will contain all of
1385 these hosts and possibly additional hosts as needed to satisfy
1386 resource requirements. The list may be specified as a
1387 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1388 for example), or a filename. The host list will be assumed to
1389 be a filename if it contains a "/" character. If you specify a
1390 minimum node or processor count larger than can be satisfied by
1391 the supplied host list, additional resources will be allocated
1392 on other nodes as needed. Duplicate node names in the list will
1393 be ignored. The order of the node names in the list is not im‐
1394 portant; the node names will be sorted by Slurm.
1395
1396 -N, --nodes=<minnodes>[-maxnodes]
1397 Request that a minimum of minnodes nodes be allocated to this
1398 job. A maximum node count may also be specified with maxnodes.
1399 If only one number is specified, this is used as both the mini‐
1400 mum and maximum node count. The partition's node limits super‐
1401 sede those of the job. If a job's node limits are outside of
1402 the range permitted for its associated partition, the job will
1403 be left in a PENDING state. This permits possible execution at
1404 a later time, when the partition limit is changed. If a job
1405 node limit exceeds the number of nodes configured in the parti‐
1406 tion, the job will be rejected. Note that the environment vari‐
1407 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1408 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1409 tion for more information. If -N is not specified, the default
1410 behavior is to allocate enough nodes to satisfy the requested
1411 resources as expressed by per-job specification options, e.g.
1412 -n, -c and --gpus. The job will be allocated as many nodes as
1413 possible within the range specified and without delaying the
1414 initiation of the job. The node count specification may include
1415 a numeric value followed by a suffix of "k" (multiplies numeric
1416 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
1417
1418 -n, --ntasks=<number>
1419 sbatch does not launch tasks, it requests an allocation of re‐
1420 sources and submits a batch script. This option advises the
1421 Slurm controller that job steps run within the allocation will
1422 launch a maximum of number tasks and to provide for sufficient
1423 resources. The default is one task per node, but note that the
1424 --cpus-per-task option will change this default.
1425
1426 --ntasks-per-core=<ntasks>
1427 Request the maximum ntasks be invoked on each core. Meant to be
1428 used with the --ntasks option. Related to --ntasks-per-node ex‐
1429 cept at the core level instead of the node level. NOTE: This
1430 option is not supported when using SelectType=select/linear.
1431
1432 --ntasks-per-gpu=<ntasks>
1433 Request that there are ntasks tasks invoked for every GPU. This
1434 option can work in two ways: 1) either specify --ntasks in addi‐
1435 tion, in which case a type-less GPU specification will be auto‐
1436 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1437 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1438 --ntasks, and the total task count will be automatically deter‐
1439 mined. The number of CPUs needed will be automatically in‐
1440 creased if necessary to allow for any calculated task count.
1441 This option will implicitly set --gpu-bind=single:<ntasks>, but
1442 that can be overridden with an explicit --gpu-bind specifica‐
1443 tion. This option is not compatible with a node range (i.e.
1444 -N<minnodes-maxnodes>). This option is not compatible with
1445 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1446 option is not supported unless SelectType=cons_tres is config‐
1447 ured (either directly or indirectly on Cray systems).
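
              For illustration (a sketch; job.sh is a placeholder), the two
              usages described above:

                   # Task count given; the GPU count is derived (2 GPUs here).
                   $ sbatch --ntasks=8 --ntasks-per-gpu=4 job.sh

                   # GPU count given; the task count is derived (8 tasks here).
                   $ sbatch --gpus=2 --ntasks-per-gpu=4 job.sh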
1448
1449 --ntasks-per-node=<ntasks>
1450 Request that ntasks be invoked on each node. If used with the
1451 --ntasks option, the --ntasks option will take precedence and
1452 the --ntasks-per-node will be treated as a maximum count of
1453 tasks per node. Meant to be used with the --nodes option. This
1454 is related to --cpus-per-task=ncpus, but does not require knowl‐
1455 edge of the actual number of cpus on each node. In some cases,
1456 it is more convenient to be able to request that no more than a
1457 specific number of tasks be invoked on each node. Examples of
1458 this include submitting a hybrid MPI/OpenMP app where only one
1459 MPI "task/rank" should be assigned to each node while allowing
1460 the OpenMP portion to utilize all of the parallelism present in
1461 the node, or submitting a single setup/cleanup/monitoring job to
1462 each node of a pre-existing allocation as one step in a larger
1463 job script.
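
              A sketch of the hybrid MPI/OpenMP case described above (the
              program name and CPU count are hypothetical):

                   #!/bin/sh
                   #SBATCH --nodes=4
                   #SBATCH --ntasks-per-node=1
                   #SBATCH --cpus-per-task=16
                   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
                   srun ./hybrid_app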
1464
1465 --ntasks-per-socket=<ntasks>
1466 Request the maximum ntasks be invoked on each socket. Meant to
1467 be used with the --ntasks option. Related to --ntasks-per-node
1468 except at the socket level instead of the node level. NOTE:
1469 This option is not supported when using SelectType=select/lin‐
1470 ear.
1471
1472 --open-mode={append|truncate}
1473 Open the output and error files using append or truncate mode as
1474 specified. The default value is specified by the system config‐
1475 uration parameter JobFileAppend.
1476
1477 -o, --output=<filename_pattern>
1478 Instruct Slurm to connect the batch script's standard output di‐
1479 rectly to the file name specified in the "filename pattern". By
1480 default both standard output and standard error are directed to
1481 the same file. For job arrays, the default file name is
1482 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1483 the array index. For other jobs, the default file name is
1484 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1485 the filename pattern section below for filename specification
1486 options.
1487
1488 -O, --overcommit
1489 Overcommit resources.
1490
1491 When applied to a job allocation (not including jobs requesting
1492 exclusive access to the nodes) the resources are allocated as if
1493 only one task per node is requested. This means that the re‐
1494 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1495 cated per node rather than being multiplied by the number of
1496 tasks. Options used to specify the number of tasks per node,
1497 socket, core, etc. are ignored.
1498
1499 When applied to job step allocations (the srun command when exe‐
1500 cuted within an existing job allocation), this option can be
1501 used to launch more than one task per CPU. Normally, srun will
1502 not allocate more than one process per CPU. By specifying
1503 --overcommit you are explicitly allowing more than one process
1504 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1505 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1506               in the file slurm.h and is not a variable; it is set at Slurm
1507 build time.
1508
1509 -s, --oversubscribe
1510 The job allocation can over-subscribe resources with other run‐
1511 ning jobs. The resources to be over-subscribed can be nodes,
1512 sockets, cores, and/or hyperthreads depending upon configura‐
1513 tion. The default over-subscribe behavior depends on system
1514 configuration and the partition's OverSubscribe option takes
1515 precedence over the job's option. This option may result in the
1516 allocation being granted sooner than if the --oversubscribe op‐
1517 tion was not set and allow higher system utilization, but appli‐
1518 cation performance will likely suffer due to competition for re‐
1519 sources. Also see the --exclusive option.
1520
1521 --parsable
1522 Outputs only the job id number and the cluster name if present.
1523 The values are separated by a semicolon. Errors will still be
1524 displayed.
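
              For example, a sketch of capturing the job ID in a shell vari‐
              able (job.sh is a placeholder; any cluster name following the
              semicolon is stripped):

                   $ jobid=$(sbatch --parsable job.sh | cut -d ';' -f 1)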
1525
1526 -p, --partition=<partition_names>
1527 Request a specific partition for the resource allocation. If
1528               not specified, the default behavior is to allow the Slurm con‐
1529               troller to select the default partition as designated by the
1530               system administrator.  If the job can use more than one parti‐
1531               tion, specify their names in a comma separated list and the one
1532 offering earliest initiation will be used with no regard given
1533 to the partition name ordering (although higher priority parti‐
1534 tions will be considered first). When the job is initiated, the
1535 name of the partition used will be placed first in the job
1536 record partition string.
1537
1538 --power=<flags>
1539 Comma separated list of power management plugin options. Cur‐
1540 rently available flags include: level (all nodes allocated to
1541 the job should have identical power caps, may be disabled by the
1542 Slurm configuration option PowerParameters=job_no_level).
1543
1544 --prefer=<list>
1545 Nodes can have features assigned to them by the Slurm adminis‐
1546 trator. Users can specify which of these features are desired
1547 but not required by their job using the prefer option. This op‐
1548 tion operates independently from --constraint and will override
1549               whatever is set there if possible.  When scheduling, the features
1550               in --prefer are tried first; if a node set isn't available with
1551               those features, then --constraint is attempted.  See --constraint
1552               for more information; this option behaves the same way.
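
              As a sketch (the feature names gpu and fast are hypothetical
              and site specific), require the "gpu" feature while preferring
              nodes that also have the "fast" feature:

                   $ sbatch --constraint=gpu --prefer=fast job.sh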
1553
1554
1555 --priority=<value>
1556 Request a specific job priority. May be subject to configura‐
1557 tion specific constraints. value should either be a numeric
1558 value or "TOP" (for highest possible value). Only Slurm opera‐
1559 tors and administrators can set the priority of a job.
1560
1561 --profile={all|none|<type>[,<type>...]}
1562 Enables detailed data collection by the acct_gather_profile
1563 plugin. Detailed data are typically time-series that are stored
1564 in an HDF5 file for the job or an InfluxDB database depending on
1565 the configured plugin.
1566
1567 All All data types are collected. (Cannot be combined with
1568 other values.)
1569
1570 None No data types are collected. This is the default.
1571 (Cannot be combined with other values.)
1572
1573 Valid type values are:
1574
1575 Energy Energy data is collected.
1576
1577 Task Task (I/O, Memory, ...) data is collected.
1578
1579 Lustre Lustre data is collected.
1580
1581 Network
1582 Network (InfiniBand) data is collected.
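
              For example, a sketch of a directive that collects task and
              energy time-series for the job (the rest of the script is
              omitted):

                   #SBATCH --profile=Task,Energy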
1583
1584 --propagate[=rlimit[,rlimit...]]
1585 Allows users to specify which of the modifiable (soft) resource
1586 limits to propagate to the compute nodes and apply to their
1587 jobs. If no rlimit is specified, then all resource limits will
1588 be propagated. The following rlimit names are supported by
1589 Slurm (although some options may not be supported on some sys‐
1590 tems):
1591
1592 ALL All limits listed below (default)
1593
1594 NONE No limits listed below
1595
1596 AS The maximum address space (virtual memory) for a
1597 process.
1598
1599 CORE The maximum size of core file
1600
1601 CPU The maximum amount of CPU time
1602
1603 DATA The maximum size of a process's data segment
1604
1605 FSIZE The maximum size of files created. Note that if the
1606 user sets FSIZE to less than the current size of the
1607 slurmd.log, job launches will fail with a 'File size
1608 limit exceeded' error.
1609
1610 MEMLOCK The maximum size that may be locked into memory
1611
1612 NOFILE The maximum number of open files
1613
1614 NPROC The maximum number of processes available
1615
1616 RSS The maximum resident set size. Note that this only has
1617 effect with Linux kernels 2.4.30 or older or BSD.
1618
1619 STACK The maximum stack size
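
              For illustration (a sketch; job.sh is a placeholder), propagate
              only the open-file and locked-memory limits:

                   $ sbatch --propagate=NOFILE,MEMLOCK job.sh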
1620
1621 -q, --qos=<qos>
1622 Request a quality of service for the job. QOS values can be de‐
1623 fined for each user/cluster/account association in the Slurm
1624 database. Users will be limited to their association's defined
1625 set of qos's when the Slurm configuration parameter, Account‐
1626 ingStorageEnforce, includes "qos" in its definition.
1627
1628 -Q, --quiet
1629 Suppress informational messages from sbatch such as Job ID. Only
1630 errors will still be displayed.
1631
1632 --reboot
1633 Force the allocated nodes to reboot before starting the job.
1634 This is only supported with some system configurations and will
1635 otherwise be silently ignored. Only root, SlurmUser or admins
1636 can reboot nodes.
1637
1638 --requeue
1639 Specifies that the batch job should be eligible for requeuing.
1640 The job may be requeued explicitly by a system administrator,
1641 after node failure, or upon preemption by a higher priority job.
1642 When a job is requeued, the batch script is initiated from its
1643 beginning. Also see the --no-requeue option. The JobRequeue
1644 configuration parameter controls the default behavior on the
1645 cluster.
1646
1647 --reservation=<reservation_names>
1648 Allocate resources for the job from the named reservation. If
1649 the job can use more than one reservation, specify their names
1650               in a comma separated list and the one offering the earliest initia‐
1651               tion will be used.  Each reservation will be considered in the order
1652               it was requested.  All reservations will be listed in scontrol/squeue
1653               through the life of the job.  In accounting, the first reservation
1654               will be recorded; after the job starts, the reservation actually used
1655               will replace it.
1656
1657 --signal=[{R|B}:]<sig_num>[@sig_time]
1658 When a job is within sig_time seconds of its end time, send it
1659 the signal sig_num. Due to the resolution of event handling by
1660 Slurm, the signal may be sent up to 60 seconds earlier than
1661 specified. sig_num may either be a signal number or name (e.g.
1662 "10" or "USR1"). sig_time must have an integer value between 0
1663 and 65535. By default, no signal is sent before the job's end
1664 time. If a sig_num is specified without any sig_time, the de‐
1665 fault time will be 60 seconds. Use the "B:" option to signal
1666               only the batch shell; none of the other processes will be sig‐
1667 naled. By default all job steps will be signaled, but not the
1668 batch shell itself. Use the "R:" option to allow this job to
1669 overlap with a reservation with MaxStartDelay set. To have the
1670 signal sent at preemption time see the preempt_send_user_signal
1671 SlurmctldParameter.
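
              For example, a sketch that asks Slurm to signal the batch shell
              with USR1 roughly five minutes before the time limit so the
              script can react (the application name and handler are hypo‐
              thetical):

                   #!/bin/sh
                   #SBATCH --time=01:00:00
                   #SBATCH --signal=B:USR1@300
                   trap 'echo "caught USR1, saving state"' USR1
                   srun ./my_app &
                   wait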
1672
1673 --sockets-per-node=<sockets>
1674 Restrict node selection to nodes with at least the specified
1675 number of sockets. See additional information under -B option
1676 above when task/affinity plugin is enabled.
1677 NOTE: This option may implicitly set the number of tasks (if -n
1678 was not specified) as one task per requested thread.
1679
1680 --spread-job
1681 Spread the job allocation over as many nodes as possible and at‐
1682 tempt to evenly distribute tasks across the allocated nodes.
1683 This option disables the topology/tree plugin.
1684
1685 --switches=<count>[@max-time]
1686 When a tree topology is used, this defines the maximum count of
1687 leaf switches desired for the job allocation and optionally the
1688 maximum time to wait for that number of switches. If Slurm finds
1689 an allocation containing more switches than the count specified,
1690 the job remains pending until it either finds an allocation with
1691               desired switch count or the time limit expires.  If there is no
1692 switch count limit, there is no delay in starting the job. Ac‐
1693 ceptable time formats include "minutes", "minutes:seconds",
1694 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1695 "days-hours:minutes:seconds". The job's maximum time delay may
1696 be limited by the system administrator using the SchedulerParam‐
1697 eters configuration parameter with the max_switch_wait parameter
1698 option. On a dragonfly network the only switch count supported
1699               is 1, since communication performance will be highest when a job
1700               is allocated resources on one leaf switch or on more than 2 leaf
1701               switches.  The default max-time is the value of the
1702               max_switch_wait SchedulerParameters option.
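
              As a sketch (job.sh is a placeholder), limit the allocation to
              a single leaf switch, but accept any allocation after waiting
              at most one hour:

                   $ sbatch --switches=1@01:00:00 job.sh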
1703
1704 --test-only
1705 Validate the batch script and return an estimate of when a job
1706 would be scheduled to run given the current job queue and all
1707 the other arguments specifying the job requirements. No job is
1708 actually submitted.
1709
1710 --thread-spec=<num>
1711 Count of specialized threads per node reserved by the job for
1712 system operations and not used by the application. The applica‐
1713 tion will not use these threads, but will be charged for their
1714 allocation. This option can not be used with the --core-spec
1715 option.
1716
1717 NOTE: Explicitly setting a job's specialized thread value im‐
1718 plicitly sets its --exclusive option, reserving entire nodes for
1719 the job.
1720
1721 --threads-per-core=<threads>
1722 Restrict node selection to nodes with at least the specified
1723 number of threads per core. In task layout, use the specified
1724 maximum number of threads per core. NOTE: "Threads" refers to
1725 the number of processing units on each core rather than the num‐
1726 ber of application tasks to be launched per core. See addi‐
1727 tional information under -B option above when task/affinity
1728 plugin is enabled.
1729 NOTE: This option may implicitly set the number of tasks (if -n
1730 was not specified) as one task per requested thread.
1731
1732 -t, --time=<time>
1733 Set a limit on the total run time of the job allocation. If the
1734 requested time limit exceeds the partition's time limit, the job
1735 will be left in a PENDING state (possibly indefinitely). The
1736 default time limit is the partition's default time limit. When
1737 the time limit is reached, each task in each job step is sent
1738 SIGTERM followed by SIGKILL. The interval between signals is
1739 specified by the Slurm configuration parameter KillWait. The
1740 OverTimeLimit configuration parameter may permit the job to run
1741 longer than scheduled. Time resolution is one minute and second
1742 values are rounded up to the next minute.
1743
1744 A time limit of zero requests that no time limit be imposed.
1745 Acceptable time formats include "minutes", "minutes:seconds",
1746 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1747 "days-hours:minutes:seconds".
1748
1749 --time-min=<time>
1750 Set a minimum time limit on the job allocation. If specified,
1751 the job may have its --time limit lowered to a value no lower
1752 than --time-min if doing so permits the job to begin execution
1753 earlier than otherwise possible. The job's time limit will not
1754 be changed after the job is allocated resources. This is per‐
1755 formed by a backfill scheduling algorithm to allocate resources
1756 otherwise reserved for higher priority jobs. Acceptable time
1757 formats include "minutes", "minutes:seconds", "hours:min‐
1758 utes:seconds", "days-hours", "days-hours:minutes" and
1759 "days-hours:minutes:seconds".
1760
1761 --tmp=<size>[units]
1762 Specify a minimum amount of temporary disk space per node. De‐
1763 fault units are megabytes. Different units can be specified us‐
1764 ing the suffix [K|M|G|T].
1765
1766 --uid=<user>
1767 Attempt to submit and/or run a job as user instead of the invok‐
1768 ing user id. The invoking user's credentials will be used to
1769 check access permissions for the target partition. User root may
1770 use this option to run jobs as a normal user in a RootOnly par‐
1771 tition for example. If run as root, sbatch will drop its permis‐
1772 sions to the uid specified after node allocation is successful.
1773 user may be the user name or numerical user ID.
1774
1775 --usage
1776 Display brief help message and exit.
1777
1778 --use-min-nodes
1779 If a range of node counts is given, prefer the smaller count.
1780
1781 -v, --verbose
1782 Increase the verbosity of sbatch's informational messages. Mul‐
1783 tiple -v's will further increase sbatch's verbosity. By default
1784 only errors will be displayed.
1785
1786 -V, --version
1787 Display version information and exit.
1788
1789 -W, --wait
1790 Do not exit until the submitted job terminates. The exit code
1791 of the sbatch command will be the same as the exit code of the
1792 submitted job. If the job terminated due to a signal rather than
1793 a normal exit, the exit code will be set to 1. In the case of a
1794 job array, the exit code recorded will be the highest value for
1795 any task in the job array.
1796
1797 --wait-all-nodes=<value>
1798 Controls when the execution of the command begins. By default
1799 the job will begin execution as soon as the allocation is made.
1800
1801 0 Begin execution as soon as allocation can be made. Do not
1802 wait for all nodes to be ready for use (i.e. booted).
1803
1804 1 Do not begin execution until all nodes are ready for use.
1805
1806 --wckey=<wckey>
1807 Specify wckey to be used with job. If TrackWCKey=no (default)
1808 in the slurm.conf this value is ignored.
1809
1810 --wrap=<command_string>
1811 Sbatch will wrap the specified command string in a simple "sh"
1812               shell script, and submit that script to the Slurm controller.
1813 When --wrap is used, a script name and arguments may not be
1814 specified on the command line; instead the sbatch-generated
1815 wrapper script is used.
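
              For example, a sketch of submitting a one-line command without
              writing a script file:

                   $ sbatch --wrap="hostname; sleep 30"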
1816
1817 filename pattern
1818       sbatch allows for a filename pattern to contain one or more replacement
1819 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1820
1821
1822 \\ Do not process any of the replacement symbols.
1823
1824 %% The character "%".
1825
1826 %A Job array's master job allocation number.
1827
1828 %a Job array ID (index) number.
1829
1830 %J jobid.stepid of the running job. (e.g. "128.0")
1831
1832 %j jobid of the running job.
1833
1834 %N short hostname. This will create a separate IO file per node.
1835
1836 %n Node identifier relative to current job (e.g. "0" is the first
1837          node of the running job).  This will create a separate IO file per
1838 node.
1839
1840 %s stepid of the running job.
1841
1842 %t task identifier (rank) relative to current job. This will create
1843 a separate IO file per task.
1844
1845 %u User name.
1846
1847 %x Job name.
1848
1849 A number placed between the percent character and format specifier may
1850 be used to zero-pad the result in the IO filename. This number is ig‐
1851 nored if the format specifier corresponds to non-numeric data (%N for
1852 example).
1853
1854 Some examples of how the format string may be used for a 4 task job
1855 step with a Job ID of 128 and step id of 0 are included below:
1856
1857
1858 job%J.out job128.0.out
1859
1860 job%4j.out job0128.out
1861
1862 job%j-%2t.out job128-00.out, job128-01.out, ...
1863
1864 PERFORMANCE
1865       Executing sbatch sends a remote procedure call to slurmctld. If enough
1866 calls from sbatch or other Slurm client commands that send remote pro‐
1867 cedure calls to the slurmctld daemon come in at once, it can result in
1868 a degradation of performance of the slurmctld daemon, possibly result‐
1869 ing in a denial of service.
1870
1871 Do not run sbatch or other Slurm client commands that send remote pro‐
1872 cedure calls to slurmctld from loops in shell scripts or other pro‐
1873 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
1874 sary for the information you are trying to gather.
1875
1876
1877 INPUT ENVIRONMENT VARIABLES
1878       Upon startup, sbatch will read and handle the options set in the fol‐
1879 lowing environment variables. The majority of these variables are set
1880 the same way the options are set, as defined above. For flag options
1881 that are defined to expect no argument, the option can be enabled by
1882 setting the environment variable without a value (empty or NULL
1883 string), the string 'yes', or a non-zero number. Any other value for
1884 the environment variable will result in the option not being set.
1885 There are a couple exceptions to these rules that are noted below.
1886 NOTE: Environment variables will override any options set in a batch
1887 script, and command line options will override any environment vari‐
1888 ables.
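
       For example, a sketch of selecting a partition and account for a sub‐
       mission through the environment (the partition, account and script
       names are hypothetical):

            $ SBATCH_PARTITION=debug SBATCH_ACCOUNT=myproj sbatch job.sh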
1889
1890
1891 SBATCH_ACCOUNT Same as -A, --account
1892
1893 SBATCH_ACCTG_FREQ Same as --acctg-freq
1894
1895 SBATCH_ARRAY_INX Same as -a, --array
1896
1897 SBATCH_BATCH Same as --batch
1898
1899 SBATCH_CLUSTERS or SLURM_CLUSTERS
1900 Same as --clusters
1901
1902 SBATCH_CONSTRAINT Same as -C, --constraint
1903
1904 SBATCH_CONTAINER Same as --container.
1905
1906 SBATCH_CORE_SPEC Same as --core-spec
1907
1908 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
1909
1910 SBATCH_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
1911 disable or enable the option.
1912
1913 SBATCH_DELAY_BOOT Same as --delay-boot
1914
1915 SBATCH_DISTRIBUTION Same as -m, --distribution
1916
1917 SBATCH_ERROR Same as -e, --error
1918
1919 SBATCH_EXCLUSIVE Same as --exclusive
1920
1921 SBATCH_EXPORT Same as --export
1922
1923 SBATCH_GET_USER_ENV Same as --get-user-env
1924
1925 SBATCH_GPU_BIND Same as --gpu-bind
1926
1927 SBATCH_GPU_FREQ Same as --gpu-freq
1928
1929 SBATCH_GPUS Same as -G, --gpus
1930
1931 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
1932
1933 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
1934
1935 SBATCH_GRES Same as --gres
1936
1937 SBATCH_GRES_FLAGS Same as --gres-flags
1938
1939 SBATCH_HINT or SLURM_HINT
1940 Same as --hint
1941
1942 SBATCH_IGNORE_PBS Same as --ignore-pbs
1943
1944 SBATCH_INPUT Same as -i, --input
1945
1946 SBATCH_JOB_NAME Same as -J, --job-name
1947
1948 SBATCH_MEM_BIND Same as --mem-bind
1949
1950 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
1951
1952 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
1953
1954 SBATCH_MEM_PER_NODE Same as --mem
1955
1956 SBATCH_NETWORK Same as --network
1957
1958 SBATCH_NO_KILL Same as -k, --no-kill
1959
1960 SBATCH_NO_REQUEUE Same as --no-requeue
1961
1962 SBATCH_OPEN_MODE Same as --open-mode
1963
1964 SBATCH_OUTPUT Same as -o, --output
1965
1966 SBATCH_OVERCOMMIT Same as -O, --overcommit
1967
1968 SBATCH_PARTITION Same as -p, --partition
1969
1970 SBATCH_POWER Same as --power
1971
1972 SBATCH_PROFILE Same as --profile
1973
1974 SBATCH_QOS Same as --qos
1975
1976 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
1977 maximum count of switches desired for the job al‐
1978 location and optionally the maximum time to wait
1979 for that number of switches. See --switches
1980
1981 SBATCH_REQUEUE Same as --requeue
1982
1983 SBATCH_RESERVATION Same as --reservation
1984
1985 SBATCH_SIGNAL Same as --signal
1986
1987 SBATCH_SPREAD_JOB Same as --spread-job
1988
1989 SBATCH_THREAD_SPEC Same as --thread-spec
1990
1991 SBATCH_THREADS_PER_CORE
1992 Same as --threads-per-core
1993
1994 SBATCH_TIMELIMIT Same as -t, --time
1995
1996 SBATCH_USE_MIN_NODES Same as --use-min-nodes
1997
1998 SBATCH_WAIT Same as -W, --wait
1999
2000 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
2001 to disable or enable the option.
2002
2003 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
2004 --switches
2005
2006 SBATCH_WCKEY Same as --wckey
2007
2008 SLURM_CONF The location of the Slurm configuration file.
2009
2010 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2011 error occurs (e.g. invalid options). This can be
2012 used by a script to distinguish application exit
2013 codes from various Slurm error conditions.
2014
2015 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2016 If set, only the specified node will log when the
2017 job or step are killed by a signal.
2018
2019 SLURM_UMASK If defined, Slurm will use the defined umask to
2020 set permissions when creating the output/error
2021 files for the job.
2022
2023 OUTPUT ENVIRONMENT VARIABLES
2024       The Slurm controller will set the following variables in the environ‐
2025 ment of the batch script.
2026
2027
2028 SBATCH_MEM_BIND
2029 Set to value of the --mem-bind option.
2030
2031 SBATCH_MEM_BIND_LIST
2032 Set to bit mask used for memory binding.
2033
2034 SBATCH_MEM_BIND_PREFER
2035 Set to "prefer" if the --mem-bind option includes the prefer op‐
2036 tion.
2037
2038 SBATCH_MEM_BIND_TYPE
2039 Set to the memory binding type specified with the --mem-bind op‐
2040               tion.  Possible values are "none", "rank", "map_mem", "mask_mem"
2041 and "local".
2042
2043 SBATCH_MEM_BIND_VERBOSE
2044 Set to "verbose" if the --mem-bind option includes the verbose
2045 option. Set to "quiet" otherwise.
2046
2047 SLURM_*_HET_GROUP_#
2048 For a heterogeneous job allocation, the environment variables
2049 are set separately for each component.
2050
2051 SLURM_ARRAY_JOB_ID
2052 Job array's master job ID number.
2053
2054 SLURM_ARRAY_TASK_COUNT
2055 Total number of tasks in a job array.
2056
2057 SLURM_ARRAY_TASK_ID
2058 Job array ID (index) number.
2059
2060 SLURM_ARRAY_TASK_MAX
2061 Job array's maximum ID (index) number.
2062
2063 SLURM_ARRAY_TASK_MIN
2064 Job array's minimum ID (index) number.
2065
2066 SLURM_ARRAY_TASK_STEP
2067 Job array's index step size.
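
       As an illustration only (the array range, program and input file names
       are hypothetical), a job array script might select its input with
       these variables:

            #!/bin/sh
            #SBATCH --array=1-10
            srun ./process_file input.$SLURM_ARRAY_TASK_ID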
2068
2069 SLURM_CLUSTER_NAME
2070 Name of the cluster on which the job is executing.
2071
2072 SLURM_CPUS_ON_NODE
2073 Number of CPUs allocated to the batch step. NOTE: The se‐
2074 lect/linear plugin allocates entire nodes to jobs, so the value
2075 indicates the total count of CPUs on the node. For the se‐
2076               lect/cons_res and select/cons_tres plugins, this number indicates the
2077 number of CPUs on this node allocated to the step.
2078
2079 SLURM_CPUS_PER_GPU
2080 Number of CPUs requested per allocated GPU. Only set if the
2081 --cpus-per-gpu option is specified.
2082
2083 SLURM_CPUS_PER_TASK
2084 Number of cpus requested per task. Only set if the
2085 --cpus-per-task option is specified.
2086
2087 SLURM_CONTAINER
2088 OCI Bundle for job. Only set if --container is specified.
2089
2090 SLURM_DIST_PLANESIZE
2091 Plane distribution size. Only set for plane distributions. See
2092 -m, --distribution.
2093
2094 SLURM_DISTRIBUTION
2095 Same as -m, --distribution
2096
2097 SLURM_EXPORT_ENV
2098 Same as --export.
2099
2100 SLURM_GPU_BIND
2101 Requested binding of tasks to GPU. Only set if the --gpu-bind
2102 option is specified.
2103
2104 SLURM_GPU_FREQ
2105 Requested GPU frequency. Only set if the --gpu-freq option is
2106 specified.
2107
2108 SLURM_GPUS
2109 Number of GPUs requested. Only set if the -G, --gpus option is
2110 specified.
2111
2112 SLURM_GPUS_ON_NODE
2113 Number of GPUs allocated to the batch step.
2114
2115 SLURM_GPUS_PER_NODE
2116 Requested GPU count per allocated node. Only set if the
2117 --gpus-per-node option is specified.
2118
2119 SLURM_GPUS_PER_SOCKET
2120 Requested GPU count per allocated socket. Only set if the
2121 --gpus-per-socket option is specified.
2122
2123 SLURM_GPUS_PER_TASK
2124 Requested GPU count per allocated task. Only set if the
2125 --gpus-per-task option is specified.
2126
2127 SLURM_GTIDS
2128 Global task IDs running on this node. Zero origin and comma
2129 separated. It is read internally by pmi if Slurm was built with
2130 pmi support. Leaving the variable set may cause problems when
2131 using external packages from within the job (Abaqus and Ansys
2132 have been known to have problems when it is set - consult the
2133 appropriate documentation for 3rd party software).
2134
2135 SLURM_HET_SIZE
2136 Set to count of components in heterogeneous job.
2137
2138 SLURM_JOB_ACCOUNT
2139               Account name associated with the job allocation.
2140
2141 SLURM_JOB_GPUS
2142 The global GPU IDs of the GPUs allocated to this job. The GPU
2143 IDs are not relative to any device cgroup, even if devices are
2144 constrained with task/cgroup. Only set in batch and interactive
2145 jobs.
2146
2147 SLURM_JOB_ID
2148 The ID of the job allocation.
2149
2150 SLURM_JOB_CPUS_PER_NODE
2151 Count of CPUs available to the job on the nodes in the alloca‐
2152 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
2153 [(xnumber_of_nodes)] ...]. For example:
2154 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
2155 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
2156 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
2157 select/linear plugin allocates entire nodes to jobs, so the
2158 value indicates the total count of CPUs on allocated nodes. The
2159 select/cons_res and select/cons_tres plugins allocate individual
2160 CPUs to jobs, so this number indicates the number of CPUs allo‐
2161 cated to the job.
2162
2163 SLURM_JOB_DEPENDENCY
2164 Set to value of the --dependency option.
2165
2166 SLURM_JOB_NAME
2167 Name of the job.
2168
2169 SLURM_JOB_NODELIST
2170 List of nodes allocated to the job.
2171
2172 SLURM_JOB_NUM_NODES
2173 Total number of nodes in the job's resource allocation.
2174
2175 SLURM_JOB_PARTITION
2176 Name of the partition in which the job is running.
2177
2178 SLURM_JOB_QOS
2179 Quality Of Service (QOS) of the job allocation.
2180
2181 SLURM_JOB_RESERVATION
2182 Advanced reservation containing the job allocation, if any.
2183
2184 SLURM_JOBID
2185 The ID of the job allocation. See SLURM_JOB_ID. Included for
2186 backwards compatibility.
2187
2188 SLURM_LOCALID
2189 Node local task ID for the process within a job.
2190
2191 SLURM_MEM_PER_CPU
2192 Same as --mem-per-cpu
2193
2194 SLURM_MEM_PER_GPU
2195 Requested memory per allocated GPU. Only set if the
2196 --mem-per-gpu option is specified.
2197
2198 SLURM_MEM_PER_NODE
2199 Same as --mem
2200
2201 SLURM_NNODES
2202 Total number of nodes in the job's resource allocation. See
2203 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2204
2205 SLURM_NODE_ALIASES
2206 Sets of node name, communication address and hostname for nodes
2207               allocated to the job from the cloud.  Each element in the set is
2208 colon separated and each set is comma separated. For example:
2209 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2210
2211 SLURM_NODEID
2212               The relative node ID of the current node within the job's allocation.
2213
2214 SLURM_NODELIST
2215 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
2216 cluded for backwards compatibility.
2217
2218 SLURM_NPROCS
2219 Same as -n, --ntasks. See SLURM_NTASKS. Included for backwards
2220 compatibility.
2221
2222 SLURM_NTASKS
2223 Same as -n, --ntasks
2224
2225 SLURM_NTASKS_PER_CORE
2226 Number of tasks requested per core. Only set if the
2227 --ntasks-per-core option is specified.
2228
2229
2230 SLURM_NTASKS_PER_GPU
2231 Number of tasks requested per GPU. Only set if the
2232 --ntasks-per-gpu option is specified.
2233
2234 SLURM_NTASKS_PER_NODE
2235 Number of tasks requested per node. Only set if the
2236 --ntasks-per-node option is specified.
2237
2238 SLURM_NTASKS_PER_SOCKET
2239 Number of tasks requested per socket. Only set if the
2240 --ntasks-per-socket option is specified.
2241
2242 SLURM_OVERCOMMIT
2243 Set to 1 if --overcommit was specified.
2244
2245 SLURM_PRIO_PROCESS
2246 The scheduling priority (nice value) at the time of job submis‐
2247 sion. This value is propagated to the spawned processes.
2248
2249 SLURM_PROCID
2250 The MPI rank (or relative process ID) of the current process
2251
2252 SLURM_PROFILE
2253 Same as --profile
2254
2255 SLURM_RESTART_COUNT
2256 If the job has been restarted due to system failure or has been
2257               explicitly requeued, this will be set to the number of times
2258 the job has been restarted.
2259
2260 SLURM_SHARDS_ON_NODE
2261 Number of GPU Shards available to the step on this node.
2262
2263 SLURM_SUBMIT_DIR
2264 The directory from which sbatch was invoked.
2265
2266 SLURM_SUBMIT_HOST
2267 The hostname of the computer from which sbatch was invoked.
2268
2269 SLURM_TASK_PID
2270 The process ID of the task being started.
2271
2272 SLURM_TASKS_PER_NODE
2273 Number of tasks to be initiated on each node. Values are comma
2274 separated and in the same order as SLURM_JOB_NODELIST. If two
2275 or more consecutive nodes are to have the same task count, that
2276 count is followed by "(x#)" where "#" is the repetition count.
2277 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2278 first three nodes will each execute two tasks and the fourth
2279 node will execute one task.
2280
2281 SLURM_THREADS_PER_CORE
2282 This is only set if --threads-per-core or
2283 SBATCH_THREADS_PER_CORE were specified. The value will be set to
2284 the value specified by --threads-per-core or
2285 SBATCH_THREADS_PER_CORE. This is used by subsequent srun calls
2286 within the job allocation.
2287
2288 SLURM_TOPOLOGY_ADDR
2289 This is set only if the system has the topology/tree plugin
2290              configured.  The value will be set to the names of the network
2291              switches which may be involved in the job's communications,
2292              from the system's top level switch down to the leaf switch,
2293              ending with the node name.  A period is used to separate each hard‐
2294 ware component name.
2295
2296 SLURM_TOPOLOGY_ADDR_PATTERN
2297 This is set only if the system has the topology/tree plugin
2298              configured.  The value will be set to the component types listed in
2299 SLURM_TOPOLOGY_ADDR. Each component will be identified as ei‐
2300 ther "switch" or "node". A period is used to separate each
2301 hardware component type.
2302
2303 SLURMD_NODENAME
2304 Name of the node running the job script.
2305
2306 EXAMPLES
2307       Specify a batch script by filename on the command line.  The batch
2308 script specifies a 1 minute time limit for the job.
2309
2310 $ cat myscript
2311 #!/bin/sh
2312 #SBATCH --time=1
2313 srun hostname |sort
2314
2315 $ sbatch -N4 myscript
2316 salloc: Granted job allocation 65537
2317
2318 $ cat slurm-65537.out
2319 host1
2320 host2
2321 host3
2322 host4
2323
2324
2325 Pass a batch script to sbatch on standard input:
2326
2327 $ sbatch -N4 <<EOF
2328 > #!/bin/sh
2329 > srun hostname |sort
2330 > EOF
2331 sbatch: Submitted batch job 65541
2332
2333 $ cat slurm-65541.out
2334 host1
2335 host2
2336 host3
2337 host4
2338
2339
2340 To create a heterogeneous job with 3 components, each allocating a
2341 unique set of nodes:
2342
2343 $ sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2344 Submitted batch job 34987
2345
2346
2347 COPYING
2348       Copyright (C) 2006-2007 The Regents of the University of California.
2349 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2350 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2351 Copyright (C) 2010-2022 SchedMD LLC.
2352
2353 This file is part of Slurm, a resource management program. For de‐
2354 tails, see <https://slurm.schedmd.com/>.
2355
2356 Slurm is free software; you can redistribute it and/or modify it under
2357 the terms of the GNU General Public License as published by the Free
2358 Software Foundation; either version 2 of the License, or (at your op‐
2359 tion) any later version.
2360
2361 Slurm is distributed in the hope that it will be useful, but WITHOUT
2362 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2363 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2364 for more details.
2365
2366
2367 SEE ALSO
2368       sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2369 slurm.conf(5), sched_setaffinity (2), numa (3)
2370
2371
2372
2373 December 2022                  Slurm Commands                     sbatch(1)