sbatch(1)                       Slurm Commands                      sbatch(1)


NAME
       sbatch - Submit a batch script to Slurm.

SYNOPSIS
       sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]

       Option(s) define multiple jobs in a co-scheduled heterogeneous job.
       For more details about heterogeneous jobs see the document
       https://slurm.schedmd.com/heterogeneous_jobs.html

DESCRIPTION
18 sbatch submits a batch script to Slurm. The batch script may be given
19 to sbatch through a file name on the command line, or if no file name
20 is specified, sbatch will read in a script from standard input. The
21 batch script may contain options preceded with "#SBATCH" before any ex‐
22 ecutable commands in the script. sbatch will stop processing further
23 #SBATCH directives once the first non-comment non-whitespace line has
24 been reached in the script.
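
       For example, a minimal batch script and its submission might look like
       the following (the script name, time limit and job ID shown are only
       illustrative):

              $ cat my_job.sh
              #!/bin/bash
              #SBATCH --job-name=my_job
              #SBATCH --ntasks=1
              #SBATCH --time=10:00
              srun hostname

              $ sbatch my_job.sh
              Submitted batch job 123456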
25
26 sbatch exits immediately after the script is successfully transferred
27 to the Slurm controller and assigned a Slurm job ID. The batch script
       is not necessarily granted resources immediately; it may sit in the
29 queue of pending jobs for some time before its required resources be‐
30 come available.
31
32 By default both standard output and standard error are directed to a
33 file of the name "slurm-%j.out", where the "%j" is replaced with the
34 job allocation number. The file will be generated on the first node of
35 the job allocation. Other than the batch script itself, Slurm does no
36 movement of user files.
37
38 When the job allocation is finally granted for the batch script, Slurm
39 runs a single copy of the batch script on the first node in the set of
40 allocated nodes.
41
42 The following document describes the influence of various options on
43 the allocation of cpus to jobs and tasks.
44 https://slurm.schedmd.com/cpu_management.html
45
46
RETURN VALUE
       sbatch will return 0 on success or an error code on failure.
49
50
SCRIPT PATH RESOLUTION
       The batch script is resolved in the following order:
53
54 1. If script starts with ".", then path is constructed as: current
55 working directory / script
56 2. If script starts with a "/", then path is considered absolute.
57 3. If script is in current working directory.
58 4. If script can be resolved through PATH. See path_resolution(7).
59
       The current working directory is the calling process's working directory un‐
61 less the --chdir argument is passed, which will override the current
62 working directory.
63
64
OPTIONS
       -a, --array=<indexes>
67 Submit a job array, multiple jobs to be executed with identical
68 parameters. The indexes specification identifies what array in‐
69 dex values should be used. Multiple values may be specified us‐
70 ing a comma separated list and/or a range of values with a "-"
71 separator. For example, "--array=0-15" or "--array=0,6,16-32".
72 A step function can also be specified with a suffix containing a
73 colon and number. For example, "--array=0-15:4" is equivalent to
74 "--array=0,4,8,12". A maximum number of simultaneously running
75 tasks from the job array may be specified using a "%" separator.
76 For example "--array=0-15%4" will limit the number of simultane‐
77 ously running tasks from this job array to 4. The minimum index
              value is 0; the maximum value is one less than the configuration
              parameter MaxArraySize.  NOTE: Currently, federated job arrays
              only run on the local cluster.
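
              A sketch of an array submission (the script and program names
              are hypothetical):

                     #!/bin/bash
                     #SBATCH --array=0-15%4
                     #SBATCH --output=slurm-%A_%a.out
                     # Each array task sees its own index in the
                     # SLURM_ARRAY_TASK_ID environment variable.
                     ./my_program input.${SLURM_ARRAY_TASK_ID}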
81
82
83 -A, --account=<account>
84 Charge resources used by this job to specified account. The ac‐
85 count is an arbitrary string. The account name may be changed
86 after job submission using the scontrol command.
87
88
89 --acctg-freq
90 Define the job accounting and profiling sampling intervals.
91 This can be used to override the JobAcctGatherFrequency parame‐
92 ter in Slurm's configuration file, slurm.conf. The supported
93 format is as follows:
94
95 --acctg-freq=<datatype>=<interval>
96 where <datatype>=<interval> specifies the task sam‐
97 pling interval for the jobacct_gather plugin or a
98 sampling interval for a profiling type by the
99 acct_gather_profile plugin. Multiple, comma-sepa‐
100 rated <datatype>=<interval> intervals may be speci‐
101 fied. Supported datatypes are as follows:
102
103 task=<interval>
104 where <interval> is the task sampling inter‐
105 val in seconds for the jobacct_gather plugins
106 and for task profiling by the
107 acct_gather_profile plugin. NOTE: This fre‐
108 quency is used to monitor memory usage. If
                            memory limits are enforced, the highest fre‐
                            quency a user can request is the one config‐
                            ured in the slurm.conf file, and sampling can‐
                            not be turned off (=0) either.
113
114 energy=<interval>
115 where <interval> is the sampling interval in
116 seconds for energy profiling using the
117 acct_gather_energy plugin
118
119 network=<interval>
120 where <interval> is the sampling interval in
121 seconds for infiniband profiling using the
122 acct_gather_interconnect plugin.
123
124 filesystem=<interval>
125 where <interval> is the sampling interval in
126 seconds for filesystem profiling using the
127 acct_gather_filesystem plugin.
128
129 The default value for the task sampling in‐
130 terval is 30 seconds.
131 The default value for all other intervals is 0. An interval of
132 0 disables sampling of the specified type. If the task sampling
133 interval is 0, accounting information is collected only at job
134 termination (reducing Slurm interference with the job).
135 Smaller (non-zero) values have a greater impact upon job perfor‐
136 mance, but a value of 30 seconds is not likely to be noticeable
137 for applications having less than 10,000 tasks.
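
              For example, to sample task accounting every 15 seconds and
              energy data every 60 seconds (interval values are illustrative):

                     $ sbatch --acctg-freq=task=15,energy=60 my_job.sh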
138
139
       -B, --extra-node-info=<sockets[:cores[:threads]]>
141 Restrict node selection to nodes with at least the specified
142 number of sockets, cores per socket and/or threads per core.
143 NOTE: These options do not specify the resource allocation size.
144 Each value specified is considered a minimum. An asterisk (*)
145 can be used as a placeholder indicating that all available re‐
146 sources of that type are to be utilized. Values can also be
147 specified as min-max. The individual levels can also be speci‐
148 fied in separate options if desired:
149 --sockets-per-node=<sockets>
150 --cores-per-socket=<cores>
151 --threads-per-core=<threads>
              If the task/affinity plugin is enabled, then specifying an allo‐
              cation in this manner also results in subsequently launched
              tasks being bound to threads if the -B option specifies a thread
              count, to cores if a core count is specified, or otherwise to
              sockets.  If SelectType is config‐
157 ured to select/cons_res, it must have a parameter of CR_Core,
158 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
159 to be honored. If not specified, the scontrol show job will
160 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
161 tions. NOTE: This option is mutually exclusive with --hint,
162 --threads-per-core and --ntasks-per-core.
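
              For example, the following (values are illustrative) restricts
              selection to nodes with at least two sockets and at least four
              cores per socket, using all available threads per core:

                     $ sbatch -B 2:4:* my_job.sh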
163
164
165 --batch=<list>
166 Nodes can have features assigned to them by the Slurm adminis‐
167 trator. Users can specify which of these features are required
              by their batch script using this option.  For example, a job's
169 allocation may include both Intel Haswell and KNL nodes with
170 features "haswell" and "knl" respectively. On such a configura‐
171 tion the batch script would normally benefit by executing on a
172 faster Haswell node. This would be specified using the option
173 "--batch=haswell". The specification can include AND and OR op‐
174 erators using the ampersand and vertical bar separators. For ex‐
175 ample: "--batch=haswell|broadwell" or "--batch=haswell|big_mem‐
176 ory". The --batch argument must be a subset of the job's --con‐
177 straint=<list> argument (i.e. the job can not request only KNL
178 nodes, but require the script to execute on a Haswell node). If
179 the request can not be satisfied from the resources allocated to
180 the job, the batch script will execute on the first node of the
181 job allocation.
182
183
184 --bb=<spec>
185 Burst buffer specification. The form of the specification is
186 system dependent. Note the burst buffer may not be accessible
              from a login node, but may instead only be accessible from one
              of the job's allocated compute nodes.
189
190
191 --bbf=<file_name>
192 Path of file containing burst buffer specification. The form of
193 the specification is system dependent. These burst buffer di‐
194 rectives will be inserted into the submitted batch script.
195
196
197 -b, --begin=<time>
198 Submit the batch script to the Slurm controller immediately,
199 like normal, but tell the controller to defer the allocation of
200 the job until the specified time.
201
202 Time may be of the form HH:MM:SS to run a job at a specific time
203 of day (seconds are optional). (If that time is already past,
204 the next day is assumed.) You may also specify midnight, noon,
205 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
206 suffixed with AM or PM for running in the morning or the
207 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
209 Combine date and time using the following format
210 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
211 count time-units, where the time-units can be seconds (default),
212 minutes, hours, days, or weeks and you can tell Slurm to run the
213 job today with the keyword today and to run the job tomorrow
214 with the keyword tomorrow. The value may be changed after job
215 submission using the scontrol command. For example:
216 --begin=16:00
217 --begin=now+1hour
218 --begin=now+60 (seconds by default)
219 --begin=2010-01-20T12:34:00
220
221
222 Notes on date/time specifications:
223 - Although the 'seconds' field of the HH:MM:SS time specifica‐
224 tion is allowed by the code, note that the poll time of the
225 Slurm scheduler is not precise enough to guarantee dispatch of
226 the job on the exact second. The job will be eligible to start
227 on the next poll following the specified time. The exact poll
228 interval depends on the Slurm scheduler (e.g., 60 seconds with
229 the default sched/builtin).
230 - If no time (HH:MM:SS) is specified, the default is
231 (00:00:00).
232 - If a date is specified without a year (e.g., MM/DD) then the
233 current year is assumed, unless the combination of MM/DD and
234 HH:MM:SS has already passed for that year, in which case the
235 next year is used.
236
237
238 --cluster-constraint=[!]<list>
239 Specifies features that a federated cluster must have to have a
240 sibling job submitted to it. Slurm will attempt to submit a sib‐
241 ling job to a cluster if it has at least one of the specified
242 features. If the "!" option is included, Slurm will attempt to
243 submit a sibling job to a cluster that has none of the specified
244 features.
245
246
247 --comment=<string>
              An arbitrary comment.  Enclose it in double quotes if it
              contains spaces or some special characters.
250
251
252 -C, --constraint=<list>
253 Nodes can have features assigned to them by the Slurm adminis‐
254 trator. Users can specify which of these features are required
255 by their job using the constraint option. Only nodes having
256 features matching the job constraints will be used to satisfy
257 the request. Multiple constraints may be specified with AND,
258 OR, matching OR, resource counts, etc. (some operators are not
259 supported on all system types). Supported constraint options
260 include:
261
262 Single Name
263 Only nodes which have the specified feature will be used.
264 For example, --constraint="intel"
265
266 Node Count
267 A request can specify the number of nodes needed with
268 some feature by appending an asterisk and count after the
269 feature name. For example, --nodes=16 --con‐
270 straint="graphics*4 ..." indicates that the job requires
271 16 nodes and that at least four of those nodes must have
272 the feature "graphics."
273
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
                     example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR opera‐
                     tor.  For example, --constraint="intel|amd"
281
282 Matching OR
283 If only one of a set of possible options should be used
284 for all allocated nodes, then use the OR operator and en‐
285 close the options within square brackets. For example,
286 --constraint="[rack1|rack2|rack3|rack4]" might be used to
287 specify that all nodes must be allocated on a single rack
288 of the cluster, but any of those four racks can be used.
289
290 Multiple Counts
291 Specific counts of multiple resources may be specified by
292 using the AND operator and enclosing the options within
293 square brackets. For example, --con‐
294 straint="[rack1*2&rack2*4]" might be used to specify that
295 two nodes must be allocated from nodes with the feature
296 of "rack1" and four nodes must be allocated from nodes
297 with the feature "rack2".
298
299 NOTE: This construct does not support multiple Intel KNL
300 NUMA or MCDRAM modes. For example, while --con‐
301 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
302 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
303 Specification of multiple KNL modes requires the use of a
304 heterogeneous job.
305
306 Brackets
307 Brackets can be used to indicate that you are looking for
308 a set of nodes with the different requirements contained
309 within the brackets. For example, --con‐
310 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
311 node with either the "rack1" or "rack2" features and two
312 nodes with the "rack3" feature. The same request without
313 the brackets will try to find a single node that meets
314 those requirements.
315
              Parentheses
                     Parentheses can be used to group like node features to‐
                     gether.  For example, --con‐
                     straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
                     specify that four nodes with the features "knl", "snc4"
                     and "flat" plus one node with the feature "haswell" are
                     required.  All options within parentheses should be
                     grouped with AND (e.g. "&") operators.
324
325
326 --contiguous
327 If set, then the allocated nodes must form a contiguous set.
328
329 NOTE: If SelectPlugin=cons_res this option won't be honored with
330 the topology/tree or topology/3d_torus plugins, both of which
331 can modify the node ordering.
332
333
334 --cores-per-socket=<cores>
335 Restrict node selection to nodes with at least the specified
336 number of cores per socket. See additional information under -B
337 option above when task/affinity plugin is enabled.
338
339
       --cpu-freq=<p1[-p2[:p3]]>
341
342 Request that job steps initiated by srun commands inside this
343 sbatch script be run at some requested frequency if possible, on
344 the CPUs selected for the step on the compute node(s).
345
346 p1 can be [#### | low | medium | high | highm1] which will set
347 the frequency scaling_speed to the corresponding value, and set
348 the frequency scaling_governor to UserSpace. See below for defi‐
349 nition of the values.
350
351 p1 can be [Conservative | OnDemand | Performance | PowerSave]
352 which will set the scaling_governor to the corresponding value.
353 The governor has to be in the list set by the slurm.conf option
354 CpuFreqGovernors.
355
356 When p2 is present, p1 will be the minimum scaling frequency and
357 p2 will be the maximum scaling frequency.
358
              p2 can be [#### | medium | high | highm1].  p2 must be greater
360 than p1.
361
362 p3 can be [Conservative | OnDemand | Performance | PowerSave |
363 UserSpace] which will set the governor to the corresponding
364 value.
365
366 If p3 is UserSpace, the frequency scaling_speed will be set by a
367 power or energy aware scheduling strategy to a value between p1
368 and p2 that lets the job run within the site's power goal. The
369 job may be delayed if p1 is higher than a frequency that allows
370 the job to run within the goal.
371
372 If the current frequency is < min, it will be set to min. Like‐
373 wise, if the current frequency is > max, it will be set to max.
374
375 Acceptable values at present include:
376
377 #### frequency in kilohertz
378
379 Low the lowest available frequency
380
381 High the highest available frequency
382
383 HighM1 (high minus one) will select the next highest
384 available frequency
385
386 Medium attempts to set a frequency in the middle of the
387 available range
388
389 Conservative attempts to use the Conservative CPU governor
390
391 OnDemand attempts to use the OnDemand CPU governor (the de‐
392 fault value)
393
394 Performance attempts to use the Performance CPU governor
395
396 PowerSave attempts to use the PowerSave CPU governor
397
398 UserSpace attempts to use the UserSpace CPU governor
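
              As a brief sketch (the numeric frequency, given in kilohertz, is
              illustrative), the directive

                     #SBATCH --cpu-freq=low-2400000:OnDemand

              requests that srun steps in the script run between the lowest
              available frequency and 2.4 GHz under the OnDemand governor.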
399
400
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested.
                     SLURM_CPU_FREQ_REQ
405
406 This environment variable can also be used to supply the value
407 for the CPU frequency request if it is set when the 'srun' com‐
408 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value.  The form of the environ‐
              ment variable is the same as the command line.  See the ENVIRON‐
411 MENT VARIABLES section for a description of the
412 SLURM_CPU_FREQ_REQ variable.
413
414 NOTE: This parameter is treated as a request, not a requirement.
415 If the job step's node does not support setting the CPU fre‐
416 quency, or the requested value is outside the bounds of the le‐
417 gal frequencies, an error is logged, but the job step is allowed
418 to continue.
419
420 NOTE: Setting the frequency for just the CPUs of the job step
421 implies that the tasks are confined to those CPUs. If task con‐
422 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
423 gin=task/cgroup with the "ConstrainCores" option) is not config‐
424 ured, this parameter is ignored.
425
426 NOTE: When the step completes, the frequency and governor of
427 each selected CPU is reset to the previous values.
428
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result not all of the accounting information will be
              present.
433
434
435 --cpus-per-gpu=<ncpus>
436 Advise Slurm that ensuing job steps will require ncpus proces‐
437 sors per allocated GPU. Not compatible with the --cpus-per-task
438 option.
439
440
441 -c, --cpus-per-task=<ncpus>
442 Advise the Slurm controller that ensuing job steps will require
443 ncpus number of processors per task. Without this option, the
444 controller will just try to allocate one processor per task.
445
446 For instance, consider an application that has 4 tasks, each re‐
              quiring 3 processors.  If our cluster is composed of quad-pro‐
              cessor nodes and we simply ask for 12 processors, the con‐
              troller might give us only 3 nodes.  However, by using the
              --cpus-per-task=3 option, the controller knows that each task
451 requires 3 processors on the same node, and the controller will
452 grant an allocation of 4 nodes, one for each of the 4 tasks.
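
              Expressed as batch directives, the example above might be
              written as:

                     #SBATCH --ntasks=4
                     #SBATCH --cpus-per-task=3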
453
454
455 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
457 (start > (deadline - time[-min])). Default is no deadline.
458 Valid time formats are:
459 HH:MM[:SS] [AM|PM]
460 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
461 MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
463 now[+count[seconds(default)|minutes|hours|days|weeks]]
464
465
466 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
468 specification if the job has been eligible to run for less than
469 this time period. If the job has waited for less than the spec‐
470 ified period, it will use only nodes which already have the
471 specified features. The argument is in units of minutes. A de‐
472 fault value may be set by a system administrator using the de‐
473 lay_boot option of the SchedulerParameters configuration parame‐
474 ter in the slurm.conf file, otherwise the default value is zero
475 (no delay).
476
477
478 -d, --dependency=<dependency_list>
              Defer the start of this job until the specified dependencies
              have been satisfied.  <dependency_list> is of the form
481 <type:job_id[:job_id][,type:job_id[:job_id]]> or
482 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
483 must be satisfied if the "," separator is used. Any dependency
484 may be satisfied if the "?" separator is used. Only one separa‐
485 tor may be used. Many jobs can share the same dependency and
486 these jobs may even belong to different users. The value may
487 be changed after job submission using the scontrol command. De‐
488 pendencies on remote jobs are allowed in a federation. Once a
489 job dependency fails due to the termination state of a preceding
490 job, the dependent job will never be run, even if the preceding
491 job is requeued and has a different termination state in a sub‐
              sequent execution.  (See the job-chaining example following the
              list of dependency types below.)
493
494 after:job_id[[+time][:jobid[+time]...]]
                     This job can begin execution after the specified jobs
                     start or are cancelled, once 'time' minutes have elapsed
                     from their start or cancellation.  If no 'time' is given
                     then there is no delay after start or cancellation.
499
500 afterany:job_id[:jobid...]
501 This job can begin execution after the specified jobs
502 have terminated.
503
504 afterburstbuffer:job_id[:jobid...]
505 This job can begin execution after the specified jobs
506 have terminated and any associated burst buffer stage out
507 operations have completed.
508
509 aftercorr:job_id[:jobid...]
510 A task of this job array can begin execution after the
511 corresponding task ID in the specified job has completed
512 successfully (ran to completion with an exit code of
513 zero).
514
515 afternotok:job_id[:jobid...]
516 This job can begin execution after the specified jobs
517 have terminated in some failed state (non-zero exit code,
518 node failure, timed out, etc).
519
520 afterok:job_id[:jobid...]
521 This job can begin execution after the specified jobs
522 have successfully executed (ran to completion with an
523 exit code of zero).
524
525 expand:job_id
526 Resources allocated to this job should be used to expand
527 the specified job. The job to expand must share the same
528 QOS (Quality of Service) and partition. Gang scheduling
529 of resources in the partition is also not supported.
530 "expand" is not allowed for jobs that didn't originate on
531 the same cluster as the submitted job.
532
533 singleton
534 This job can begin execution after any previously
535 launched jobs sharing the same job name and user have
536 terminated. In other words, only one job by that name
537 and owned by that user can be running or suspended at any
538 point in time. In a federation, a singleton dependency
539 must be fulfilled on all clusters unless DependencyParam‐
540 eters=disable_remote_singleton is used in slurm.conf.
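
              A sketch of chaining jobs with dependencies (the script names
              are hypothetical, and the example assumes the --parsable option,
              which prints only the job ID, is available):

                     jobid=$(sbatch --parsable first_step.sh)
                     sbatch --dependency=afterok:$jobid second_step.sh
                     sbatch --dependency=afternotok:$jobid cleanup.sh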
541
542
543 -D, --chdir=<directory>
544 Set the working directory of the batch script to directory be‐
545 fore it is executed. The path can be specified as full path or
546 relative path to the directory where the command is executed.
547
548
549 -e, --error=<filename pattern>
550 Instruct Slurm to connect the batch script's standard error di‐
551 rectly to the file name specified in the "filename pattern". By
552 default both standard output and standard error are directed to
553 the same file. For job arrays, the default file name is
554 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
555 the array index. For other jobs, the default file name is
556 "slurm-%j.out", where the "%j" is replaced by the job ID. See
557 the filename pattern section below for filename specification
558 options.
559
560
561 --exclusive[=user|mcs]
562 The job allocation can not share nodes with other running jobs
563 (or just other users with the "=user" option or with the "=mcs"
564 option). The default shared/exclusive behavior depends on sys‐
565 tem configuration and the partition's OverSubscribe option takes
566 precedence over the job's option.
567
568
569 --export=<[ALL,]environment variables|ALL|NONE>
570 Identify which environment variables from the submission envi‐
571 ronment are propagated to the launched application. Note that
572 SLURM_* variables are always propagated.
573
574 --export=ALL
575 Default mode if --export is not specified. All of the
                     user's environment will be loaded (either from the
                     caller's environment, or from a clean environment if
                     --get-user-env is specified).
579
580 --export=NONE
581 Only SLURM_* variables from the user environment will
                     be defined.  The user must use an absolute path to the
                     binary to be executed, which will define the environment.
                     The user cannot specify explicit environment variables
                     with NONE.  --get-user-env will be ignored.  This option
                     is particularly important for jobs that are submitted on
                     one cluster and execute on a different cluster (e.g. with
                     different paths).  To avoid steps inheriting environment
                     export settings (e.g. NONE) from the sbatch command, the
                     environment variable SLURM_EXPORT_ENV should be set to
                     ALL in the job script (see the example at the end of this
                     option's description).
592
593 --export=<[ALL,]environment variables>
594 Exports all SLURM_* environment variables along with
595 explicitly defined variables. Multiple environment
596 variable names should be comma separated. Environment
597 variable names may be specified to propagate the cur‐
598 rent value (e.g. "--export=EDITOR") or specific values
599 may be exported (e.g. "--export=EDITOR=/bin/emacs").
600 If ALL is specified, then all user environment vari‐
601 ables will be loaded and will take precedence over any
602 explicitly given environment variables.
603
604 Example: --export=EDITOR,ARG1=test
605 In this example, the propagated environment will only
606 contain the variable EDITOR from the user's environ‐
607 ment, SLURM_* environment variables, and ARG1=test.
608
609 Example: --export=ALL,EDITOR=/bin/emacs
610 There are two possible outcomes for this example. If
611 the caller has the EDITOR environment variable de‐
612 fined, then the job's environment will inherit the
613 variable from the caller's environment. If the caller
614 doesn't have an environment variable defined for EDI‐
615 TOR, then the job's environment will use the value
616 given by --export.
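
              As a sketch of the --export=NONE case described above (the
              script name is hypothetical):

                     $ sbatch --export=NONE my_job.sh

              and inside my_job.sh, before any steps are launched:

                     export SLURM_EXPORT_ENV=ALL
                     srun ./my_app

              so that srun steps inherit the batch script's full environment
              rather than the NONE setting.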
617
618
619 --export-file=<filename | fd>
620 If a number between 3 and OPEN_MAX is specified as the argument
621 to this option, a readable file descriptor will be assumed
622 (STDIN and STDOUT are not supported as valid arguments). Other‐
623 wise a filename is assumed. Export environment variables de‐
624 fined in <filename> or read from <fd> to the job's execution en‐
625 vironment. The content is one or more environment variable defi‐
626 nitions of the form NAME=value, each separated by a null charac‐
627 ter. This allows the use of special characters in environment
628 definitions.
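
              For example, a file of null-separated definitions can be created
              with the shell's printf and passed to sbatch (the file name is
              hypothetical):

                     $ printf 'EDITOR=/bin/emacs\0ARG1=test\0' > my_env
                     $ sbatch --export-file=my_env my_job.sh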
629
630
631 -F, --nodefile=<node file>
632 Much like --nodelist, but the list is contained in a file of
633 name node file. The node names of the list may also span multi‐
634 ple lines in the file. Duplicate node names in the file will
635 be ignored. The order of the node names in the list is not im‐
636 portant; the node names will be sorted by Slurm.
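
              For example (node and file names are hypothetical):

                     $ cat my_nodes
                     node001
                     node002
                     node003
                     $ sbatch --nodefile=my_nodes my_job.sh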
637
638
639 --get-user-env[=timeout][mode]
640 This option will tell sbatch to retrieve the login environment
641 variables for the user specified in the --uid option. The envi‐
642 ronment variables are retrieved by running something of this
643 sort "su - <username> -c /usr/bin/env" and parsing the output.
644 Be aware that any environment variables already set in sbatch's
645 environment will take precedence over any environment variables
646 in the user's login environment. Clear any environment variables
647 before calling sbatch that you do not want propagated to the
648 spawned program. The optional timeout value is in seconds. De‐
              fault value is 8 seconds.  The optional mode value controls the
              "su" options.  With a mode value of "S", "su" is executed with‐
              out the "-" option.  With a mode value of "L", "su" is executed
              with the "-" option, replicating the login environment.  If mode
              is not specified, the mode established at Slurm build time is
              used.  Examples of use include "--get-user-env",
              "--get-user-env=10", "--get-user-env=10L", and
              "--get-user-env=S".
656
657
658 --gid=<group>
659 If sbatch is run as root, and the --gid option is used, submit
660 the job with group's group access permissions. group may be the
661 group name or the numerical group ID.
662
663
664 -G, --gpus=[<type>:]<number>
665 Specify the total number of GPUs required for the job. An op‐
666 tional GPU type specification can be supplied. For example
667 "--gpus=volta:3". Multiple options can be requested in a comma
668 separated list, for example: "--gpus=volta:3,kepler:1". See
669 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
670 options.
671
672
673 --gpu-bind=[verbose,]<type>
674 Bind tasks to specific GPUs. By default every spawned task can
675 access every GPU allocated to the job. If "verbose," is speci‐
676 fied before <type>, then print out GPU binding information.
677
678 Supported type options:
679
680 closest Bind each task to the GPU(s) which are closest. In a
681 NUMA environment, each task may be bound to more than
682 one GPU (i.e. all GPUs in that NUMA environment).
683
684 map_gpu:<list>
                     Bind by mapping GPU IDs to tasks (or ranks) as spec‐
686 ified where <list> is
687 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
688 are interpreted as decimal values unless they are pre‐
                     ceded with '0x' in which case they are interpreted as
690 hexadecimal values. If the number of tasks (or ranks)
691 exceeds the number of elements in this list, elements
692 in the list will be reused as needed starting from the
693 beginning of the list. To simplify support for large
694 task counts, the lists may follow a map with an aster‐
695 isk and repetition count. For example
696 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
697 and ConstrainDevices is set in cgroup.conf, then the
698 GPU IDs are zero-based indexes relative to the GPUs
699 allocated to the job (e.g. the first GPU is 0, even if
700 the global ID is 3). Otherwise, the GPU IDs are global
701 IDs, and all GPUs on each node in the job should be
702 allocated for predictable binding results.
703
704 mask_gpu:<list>
705 Bind by setting GPU masks on tasks (or ranks) as spec‐
706 ified where <list> is
707 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
708 mapping is specified for a node and identical mapping
709 is applied to the tasks on every node (i.e. the lowest
710 task ID on each node is mapped to the first mask spec‐
711 ified in the list, etc.). GPU masks are always inter‐
712 preted as hexadecimal values but can be preceded with
713 an optional '0x'. To simplify support for large task
714 counts, the lists may follow a map with an asterisk
715 and repetition count. For example
716 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
717 is used and ConstrainDevices is set in cgroup.conf,
718 then the GPU IDs are zero-based indexes relative to
719 the GPUs allocated to the job (e.g. the first GPU is
720 0, even if the global ID is 3). Otherwise, the GPU IDs
721 are global IDs, and all GPUs on each node in the job
722 should be allocated for predictable binding results.
723
724 single:<tasks_per_gpu>
725 Like --gpu-bind=closest, except that each task can
726 only be bound to a single GPU, even when it can be
727 bound to multiple GPUs that are equally close. The
728 GPU to bind to is determined by <tasks_per_gpu>, where
729 the first <tasks_per_gpu> tasks are bound to the first
730 GPU available, the second <tasks_per_gpu> tasks are
731 bound to the second GPU available, etc. This is basi‐
732 cally a block distribution of tasks onto available
733 GPUs, where the available GPUs are determined by the
734 socket affinity of the task and the socket affinity of
735 the GPUs as specified in gres.conf's Cores parameter.
736
737
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
739 Request that GPUs allocated to the job are configured with spe‐
740 cific frequency values. This option can be used to indepen‐
741 dently configure the GPU and its memory frequencies. After the
742 job is completed, the frequencies of all affected GPUs will be
743 reset to the highest possible values. In some cases, system
744 power caps may override the requested values. The field type
745 can be "memory". If type is not specified, the GPU frequency is
746 implied. The value field can either be "low", "medium", "high",
747 "highm1" or a numeric value in megahertz (MHz). If the speci‐
748 fied numeric value is not possible, a value as close as possible
749 will be used. See below for definition of the values. The ver‐
750 bose option causes current GPU frequency information to be
751 logged. Examples of use include "--gpu-freq=medium,memory=high"
752 and "--gpu-freq=450".
753
754 Supported value definitions:
755
756 low the lowest available frequency.
757
758 medium attempts to set a frequency in the middle of the
759 available range.
760
761 high the highest available frequency.
762
763 highm1 (high minus one) will select the next highest avail‐
764 able frequency.
765
766
767 --gpus-per-node=[<type>:]<number>
768 Specify the number of GPUs required for the job on each node in‐
769 cluded in the job's resource allocation. An optional GPU type
770 specification can be supplied. For example
771 "--gpus-per-node=volta:3". Multiple options can be requested in
772 a comma separated list, for example:
773 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
774 --gpus-per-socket and --gpus-per-task options.
775
776
777 --gpus-per-socket=[<type>:]<number>
778 Specify the number of GPUs required for the job on each socket
779 included in the job's resource allocation. An optional GPU type
780 specification can be supplied. For example
781 "--gpus-per-socket=volta:3". Multiple options can be requested
782 in a comma separated list, for example:
783 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
              sockets per node count (--sockets-per-node).  See also the
785 --gpus, --gpus-per-node and --gpus-per-task options.
786
787
788 --gpus-per-task=[<type>:]<number>
789 Specify the number of GPUs required for the job on each task to
790 be spawned in the job's resource allocation. An optional GPU
791 type specification can be supplied. For example
792 "--gpus-per-task=volta:1". Multiple options can be requested in
793 a comma separated list, for example:
794 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
795 --gpus-per-socket and --gpus-per-node options. This option re‐
796 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
797 --gpus-per-task=Y" rather than an ambiguous range of nodes with
798 -N, --nodes.
799 NOTE: This option will not have any impact on GPU binding,
800 specifically it won't limit the number of devices set for
801 CUDA_VISIBLE_DEVICES.
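
              For example, a request for four tasks with one GPU each (the GPU
              type is illustrative) could be written as:

                     $ sbatch -n 4 --gpus-per-task=volta:1 my_job.sh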
802
803
804 --gres=<list>
805 Specifies a comma delimited list of generic consumable re‐
806 sources. The format of each entry on the list is
807 "name[[:type]:count]". The name is that of the consumable re‐
808 source. The count is the number of those resources with a de‐
809 fault value of 1. The count can have a suffix of "k" or "K"
810 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
811 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
812 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
813 x 1024 x 1024 x 1024). The specified resources will be allo‐
814 cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator.  A list
816 of available generic consumable resources will be printed and
817 the command will exit if the option argument is "help". Exam‐
818 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
819 and "--gres=help".
820
821
822 --gres-flags=<type>
823 Specify generic resource task binding options.
824
825 disable-binding
826 Disable filtering of CPUs with respect to generic re‐
827 source locality. This option is currently required to
828 use more CPUs than are bound to a GRES (i.e. if a GPU is
829 bound to the CPUs on one socket, but resources on more
830 than one socket are required to run the job). This op‐
831 tion may permit a job to be allocated resources sooner
832 than otherwise possible, but may result in lower job per‐
833 formance.
834 NOTE: This option is specific to SelectType=cons_res.
835
836 enforce-binding
837 The only CPUs available to the job will be those bound to
838 the selected GRES (i.e. the CPUs identified in the
839 gres.conf file will be strictly enforced). This option
840 may result in delayed initiation of a job. For example a
841 job requiring two GPUs and one CPU will be delayed until
842 both GPUs on a single socket are available rather than
843 using GPUs bound to separate sockets, however, the appli‐
844 cation performance may be improved due to improved commu‐
845 nication speed. Requires the node to be configured with
846 more than one socket and resource filtering will be per‐
847 formed on a per-socket basis.
848 NOTE: This option is specific to SelectType=cons_tres.
849
850
851 -H, --hold
852 Specify the job is to be submitted in a held state (priority of
853 zero). A held job can now be released using scontrol to reset
854 its priority (e.g. "scontrol release <job_id>").
855
856
857 -h, --help
858 Display help information and exit.
859
860
861 --hint=<type>
862 Bind tasks according to application hints.
863 NOTE: This option cannot be used in conjunction with
864 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
865 fied as a command line argument, it will take precedence over
866 the environment.
867
868 compute_bound
869 Select settings for compute bound applications: use all
870 cores in each socket, one thread per core.
871
872 memory_bound
873 Select settings for memory bound applications: use only
874 one core in each socket, one thread per core.
875
876 [no]multithread
877 [don't] use extra threads with in-core multi-threading
878 which can benefit communication intensive applications.
879 Only supported with the task/affinity plugin.
880
881 help show this help message
882
883
884 --ignore-pbs
885 Ignore all "#PBS" and "#BSUB" options specified in the batch
886 script.
887
888
889 -i, --input=<filename pattern>
890 Instruct Slurm to connect the batch script's standard input di‐
891 rectly to the file name specified in the "filename pattern".
892
893 By default, "/dev/null" is open on the batch script's standard
894 input and both standard output and standard error are directed
895 to a file of the name "slurm-%j.out", where the "%j" is replaced
896 with the job allocation number, as described below in the file‐
897 name pattern section.
898
899
900 -J, --job-name=<jobname>
901 Specify a name for the job allocation. The specified name will
902 appear along with the job id number when querying running jobs
903 on the system. The default is the name of the batch script, or
904 just "sbatch" if the script is read on sbatch's standard input.
905
906
907 -k, --no-kill [=off]
908 Do not automatically terminate a job if one of the nodes it has
909 been allocated fails. The user will assume the responsibilities
910 for fault-tolerance should a node fail. When there is a node
911 failure, any active job steps (usually MPI jobs) on that node
912 will almost certainly suffer a fatal error, but with --no-kill,
913 the job allocation will not be revoked so the user may launch
914 new job steps on the remaining nodes in their allocation.
915
              Specify an optional argument of "off" to disable the effect of the
917 SBATCH_NO_KILL environment variable.
918
919 By default Slurm terminates the entire job allocation if any
920 node fails in its range of allocated nodes.
921
922
923 --kill-on-invalid-dep=<yes|no>
              If a job has an invalid dependency and therefore can never run,
              this parameter tells Slurm whether to terminate it or not.  A
              terminated job
926 state will be JOB_CANCELLED. If this option is not specified
927 the system wide behavior applies. By default the job stays
928 pending with reason DependencyNeverSatisfied or if the kill_in‐
929 valid_depend is specified in slurm.conf the job is terminated.
930
931
932 -L, --licenses=<license>
933 Specification of licenses (or other resources available on all
934 nodes of the cluster) which must be allocated to this job. Li‐
935 cense names can be followed by a colon and count (the default
936 count is one). Multiple license names should be comma separated
937 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote li‐
938 censes, those served by the slurmdbd, specify the name of the
939 server providing the licenses. For example "--license=nas‐
940 tran@slurmdb:12".
941
942
943 -M, --clusters=<string>
944 Clusters to issue commands to. Multiple cluster names may be
945 comma separated. The job will be submitted to the one cluster
946 providing the earliest expected job initiation time. The default
947 value is the current cluster. A value of 'all' will query to run
948 on all clusters. Note the --export option to control environ‐
949 ment variables exported between clusters. Note that the Slur‐
950 mDBD must be up for this option to work properly.
951
952
953 -m, --distribution=
954 arbitrary|<block|cyclic|plane=<options>[:block|cyclic|fcyclic]>
955
956 Specify alternate distribution methods for remote processes. In
957 sbatch, this only sets environment variables that will be used
958 by subsequent srun requests. This option controls the assign‐
959 ment of tasks to the nodes on which resources have been allo‐
960 cated, and the distribution of those resources to tasks for
961 binding (task affinity). The first distribution method (before
962 the ":") controls the distribution of resources across nodes.
963 The optional second distribution method (after the ":") controls
964 the distribution of resources across sockets within a node.
965 Note that with select/cons_res, the number of cpus allocated on
966 each socket and node may be different. Refer to
967 https://slurm.schedmd.com/mc_support.html for more information
968 on resource allocation, assignment of tasks to nodes, and bind‐
969 ing of tasks to CPUs.
970
971 First distribution method:
972
973 block The block distribution method will distribute tasks to a
974 node such that consecutive tasks share a node. For exam‐
975 ple, consider an allocation of three nodes each with two
976 cpus. A four-task block distribution request will dis‐
977 tribute those tasks to the nodes with tasks one and two
978 on the first node, task three on the second node, and
979 task four on the third node. Block distribution is the
980 default behavior if the number of tasks exceeds the num‐
981 ber of allocated nodes.
982
983 cyclic The cyclic distribution method will distribute tasks to a
984 node such that consecutive tasks are distributed over
985 consecutive nodes (in a round-robin fashion). For exam‐
986 ple, consider an allocation of three nodes each with two
987 cpus. A four-task cyclic distribution request will dis‐
988 tribute those tasks to the nodes with tasks one and four
989 on the first node, task two on the second node, and task
990 three on the third node. Note that when SelectType is
991 select/cons_res, the same number of CPUs may not be allo‐
992 cated on each node. Task distribution will be round-robin
993 among all the nodes with CPUs yet to be assigned to
994 tasks. Cyclic distribution is the default behavior if
995 the number of tasks is no larger than the number of allo‐
996 cated nodes.
997
998 plane The tasks are distributed in blocks of a specified size.
999 The number of tasks distributed to each node is the same
1000 as for cyclic distribution, but the taskids assigned to
1001 each node depend on the plane size. Additional distribu‐
1002 tion specifications cannot be combined with this option.
1003 For more details (including examples and diagrams),
1004 please see
1005 https://slurm.schedmd.com/mc_support.html
1006 and
1007 https://slurm.schedmd.com/dist_plane.html
1008
1009 arbitrary
1010 The arbitrary method of distribution will allocate pro‐
1011 cesses in-order as listed in file designated by the envi‐
                     ronment variable SLURM_HOSTFILE.  If this variable is
                     set it will override any other method specified.  If
                     not set the method will default to block.  The
                     hostfile must contain at least the number of hosts
                     requested, one per line or comma separated.  If spec‐
1017 ifying a task count (-n, --ntasks=<number>), your tasks
1018 will be laid out on the nodes in the order of the file.
1019 NOTE: The arbitrary distribution option on a job alloca‐
1020 tion only controls the nodes to be allocated to the job
1021 and not the allocation of CPUs on those nodes. This op‐
1022 tion is meant primarily to control a job step's task lay‐
1023 out in an existing job allocation for the srun command.
1024
1025
1026 Second distribution method:
1027
1028 block The block distribution method will distribute tasks to
1029 sockets such that consecutive tasks share a socket.
1030
1031 cyclic The cyclic distribution method will distribute tasks to
1032 sockets such that consecutive tasks are distributed over
1033 consecutive sockets (in a round-robin fashion). Tasks
1034 requiring more than one CPU will have all of those CPUs
1035 allocated on a single socket if possible.
1036
1037 fcyclic
1038 The fcyclic distribution method will distribute tasks to
1039 sockets such that consecutive tasks are distributed over
1040 consecutive sockets (in a round-robin fashion). Tasks
                     requiring more than one CPU will have each CPU allocated
1042 in a cyclic fashion across sockets.
1043
1044
1045 --mail-type=<type>
1046 Notify user by email when certain event types occur. Valid type
1047 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1048 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1049 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1050 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1051 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1052 percent of time limit), TIME_LIMIT_50 (reached 50 percent of
1053 time limit) and ARRAY_TASKS (send emails for each array task).
1054 Multiple type values may be specified in a comma separated list.
1055 The user to be notified is indicated with --mail-user. Unless
1056 the ARRAY_TASKS option is specified, mail notifications on job
1057 BEGIN, END and FAIL apply to a job array as a whole rather than
1058 generating individual email messages for each task in the job
1059 array.
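
              For example (the address is hypothetical):

                     #SBATCH --mail-type=END,FAIL,TIME_LIMIT_80
                     #SBATCH --mail-user=user@example.com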
1060
1061
1062 --mail-user=<user>
1063 User to receive email notification of state changes as defined
1064 by --mail-type. The default value is the submitting user.
1065
1066
1067 --mcs-label=<mcs>
1068 Used only when the mcs/group plugin is enabled. This parameter
1069 is a group among the groups of the user. Default value is cal‐
              culated by the mcs plugin if it is enabled.
1071
1072
1073 --mem=<size[units]>
1074 Specify the real memory required per node. Default units are
1075 megabytes. Different units can be specified using the suffix
1076 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1077 is MaxMemPerNode. If configured, both parameters can be seen us‐
1078 ing the scontrol show config command. This parameter would gen‐
1079 erally be used if whole nodes are allocated to jobs (Select‐
1080 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1081 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1082 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1083 fied as command line arguments, then they will take precedence
1084 over the environment.
1085
1086 NOTE: A memory size specification of zero is treated as a spe‐
1087 cial case and grants the job access to all of the memory on each
1088 node. If the job is allocated multiple nodes in a heterogeneous
1089 cluster, the memory limit on each node will be that of the node
1090 in the allocation with the smallest memory size (same limit will
1091 apply to every node in the job's allocation).
1092
1093 NOTE: Enforcement of memory limits currently relies upon the
1094 task/cgroup plugin or enabling of accounting, which samples mem‐
1095 ory use on a periodic basis (data need not be stored, just col‐
1096 lected). In both cases memory use is based upon the job's Resi‐
1097 dent Set Size (RSS). A task may exceed the memory limit until
1098 the next periodic accounting sample.
1099
1100
1101 --mem-per-cpu=<size[units]>
1102 Minimum memory required per allocated CPU. Default units are
1103 megabytes. The default value is DefMemPerCPU and the maximum
1104 value is MaxMemPerCPU (see exception below). If configured, both
1105 parameters can be seen using the scontrol show config command.
1106 Note that if the job's --mem-per-cpu value exceeds the config‐
1107 ured MaxMemPerCPU, then the user's limit will be treated as a
1108 memory limit per task; --mem-per-cpu will be reduced to a value
1109 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1110 value of --cpus-per-task multiplied by the new --mem-per-cpu
1111 value will equal the original --mem-per-cpu value specified by
1112 the user. This parameter would generally be used if individual
1113 processors are allocated to jobs (SelectType=select/cons_res).
1114 If resources are allocated by core, socket, or whole nodes, then
1115 the number of CPUs allocated to a job may be higher than the
1116 task count and the value of --mem-per-cpu should be adjusted ac‐
1117 cordingly. Also see --mem and --mem-per-gpu. The --mem,
1118 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
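
              As a worked example of the adjustment described above (the
              numbers are illustrative): if MaxMemPerCPU is 2GB and a job re‐
              quests --mem-per-cpu=6G with --cpus-per-task=1, Slurm will set
              --cpus-per-task=3 and --mem-per-cpu=2G, so that 3 CPUs times 2GB
              still provides the 6GB per task originally requested.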
1119
1120 NOTE: If the final amount of memory requested by a job can't be
1121 satisfied by any of the nodes configured in the partition, the
1122 job will be rejected. This could happen if --mem-per-cpu is
1123 used with the --exclusive option for a job allocation and
1124 --mem-per-cpu times the number of CPUs on a node is greater than
1125 the total memory of that node.
1126
1127
1128 --mem-per-gpu=<size[units]>
1129 Minimum memory required per allocated GPU. Default units are
1130 megabytes. Different units can be specified using the suffix
1131 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1132 both a global and per partition basis. If configured, the pa‐
1133 rameters can be seen using the scontrol show config and scontrol
1134 show partition commands. Also see --mem. The --mem,
1135 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1136
1137
1138 --mem-bind=[{quiet,verbose},]type
1139 Bind tasks to memory. Used only when the task/affinity plugin is
1140 enabled and the NUMA memory functions are available. Note that
1141 the resolution of CPU and memory binding may differ on some ar‐
1142 chitectures. For example, CPU binding may be performed at the
1143 level of the cores within a processor while memory binding will
1144 be performed at the level of nodes, where the definition of
1145 "nodes" may differ from system to system. By default no memory
1146 binding is performed; any task using any CPU can use any memory.
1147 This option is typically used to ensure that each task is bound
1148 to the memory closest to its assigned CPU. The use of any type
1149 other than "none" or "local" is not recommended.
1150
1151 NOTE: To have Slurm always report on the selected memory binding
1152 for all commands executed in a shell, you can enable verbose
1153 mode by setting the SLURM_MEM_BIND environment variable value to
1154 "verbose".
1155
1156 The following informational environment variables are set when
1157 --mem-bind is in use:
1158
1159 SLURM_MEM_BIND_LIST
1160 SLURM_MEM_BIND_PREFER
1161 SLURM_MEM_BIND_SORT
1162 SLURM_MEM_BIND_TYPE
1163 SLURM_MEM_BIND_VERBOSE
1164
1165 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1166 scription of the individual SLURM_MEM_BIND* variables.
1167
1168 Supported options include:
1169
1170 help show this help message
1171
1172 local Use memory local to the processor in use
1173
1174 map_mem:<list>
1175 Bind by setting memory masks on tasks (or ranks) as spec‐
1176 ified where <list> is
1177 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1178 ping is specified for a node and identical mapping is ap‐
1179 plied to the tasks on every node (i.e. the lowest task ID
1180 on each node is mapped to the first ID specified in the
1181 list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x' in which case they are
                     interpreted as hexadecimal values.  If the number of tasks
1184 (or ranks) exceeds the number of elements in this list,
1185 elements in the list will be reused as needed starting
1186 from the beginning of the list. To simplify support for
1187 large task counts, the lists may follow a map with an as‐
1188 terisk and repetition count. For example
1189 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1190 sults, all CPUs for each node in the job should be allo‐
1191 cated to the job.
1192
1193 mask_mem:<list>
1194 Bind by setting memory masks on tasks (or ranks) as spec‐
1195 ified where <list> is
1196 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1197 mapping is specified for a node and identical mapping is
1198 applied to the tasks on every node (i.e. the lowest task
1199 ID on each node is mapped to the first mask specified in
1200 the list, etc.). NUMA masks are always interpreted as
1201 hexadecimal values. Note that masks must be preceded
1202 with a '0x' if they don't begin with [0-9] so they are
1203 seen as numerical values. If the number of tasks (or
1204 ranks) exceeds the number of elements in this list, ele‐
1205 ments in the list will be reused as needed starting from
1206 the beginning of the list. To simplify support for large
1207 task counts, the lists may follow a mask with an asterisk
1208 and repetition count. For example "mask_mem:0*4,1*4".
1209 For predictable binding results, all CPUs for each node
1210 in the job should be allocated to the job.
1211
1212 no[ne] don't bind tasks to memory (default)
1213
1214 p[refer]
1215 Prefer use of first specified NUMA node, but permit
1216 use of other available NUMA nodes.
1217
1218 q[uiet]
1219 quietly bind before task runs (default)
1220
1221 rank bind by task rank (not recommended)
1222
1223 sort sort free cache pages (run zonesort on Intel KNL nodes)
1224
1225 v[erbose]
1226 verbosely report binding before task runs
1227
1228
1229 --mincpus=<n>
1230 Specify a minimum number of logical cpus/processors per node.
1231
1232
1233 -N, --nodes=<minnodes[-maxnodes]>
1234 Request that a minimum of minnodes nodes be allocated to this
1235 job. A maximum node count may also be specified with maxnodes.
1236 If only one number is specified, this is used as both the mini‐
1237 mum and maximum node count. The partition's node limits super‐
1238 sede those of the job. If a job's node limits are outside of
1239 the range permitted for its associated partition, the job will
1240 be left in a PENDING state. This permits possible execution at
1241 a later time, when the partition limit is changed. If a job
1242 node limit exceeds the number of nodes configured in the parti‐
1243 tion, the job will be rejected. Note that the environment vari‐
1244 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1245 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1246 tion for more information. If -N is not specified, the default
1247 behavior is to allocate enough nodes to satisfy the requirements
1248 of the -n and -c options. The job will be allocated as many
1249 nodes as possible within the range specified and without delay‐
1250 ing the initiation of the job. The node count specification may
1251 include a numeric value followed by a suffix of "k" (multiplies
1252 numeric value by 1,024) or "m" (multiplies numeric value by
1253 1,048,576).
1254
1255
1256 -n, --ntasks=<number>
1257 sbatch does not launch tasks, it requests an allocation of re‐
1258 sources and submits a batch script. This option advises the
1259 Slurm controller that job steps run within the allocation will
1260 launch a maximum of number tasks and to provide for sufficient
1261 resources. The default is one task per node, but note that the
1262 --cpus-per-task option will change this default.
1263
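A minimal sketch combining -N and -n (my_mpi_app is a placeholder): advise the controller that job steps will launch up to eight tasks spread over two nodes.

       #!/bin/sh
       #SBATCH --nodes=2
       #SBATCH --ntasks=8
       srun ./my_mpi_app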
1264
1265 --network=<type>
1266 Specify information pertaining to the switch or network. The
1267 interpretation of type is system dependent. This option is sup‐
1268 ported when running Slurm on a Cray natively. It is used to re‐
1269 quest using Network Performance Counters. Only one value per
1270 request is valid. All options are case-insensitive. In this
1271 configuration supported values include:
1272
1273 system
1274 Use the system-wide network performance counters. Only
1275 nodes requested will be marked in use for the job alloca‐
1276 tion. If the job does not fill up the entire system, the
1277 rest of the nodes are not able to be used by other jobs
1278 using NPC; if idle, their state will appear as PerfCnts.
1279 These nodes are still available for other jobs not using
1280 NPC.
1281
1282 blade Use the blade network performance counters. Only nodes re‐
1283 quested will be marked in use for the job allocation. If
1284 the job does not fill up the entire blade(s) allocated to
1285 the job, those blade(s) are not able to be used by other
1286 jobs using NPC; if idle, their state will appear as PerfC‐
1287 nts. These nodes are still available for other jobs not
1288 using NPC.
1289
1290
1291 In all cases the job allocation request must specify the
1292 --exclusive option. Otherwise the request will be denied.
1293
1294 Also with any of these options steps are not allowed to share
1295 blades, so resources would remain idle inside an allocation if
1296 the step running on a blade does not take up all the nodes on
1297 the blade.
1298
1299 The network option is also supported on systems with IBM's Par‐
1300 allel Environment (PE). See IBM's LoadLeveler job command key‐
1301 word documentation about the keyword "network" for more informa‐
1302 tion. Multiple values may be specified in a comma separated
1303 list. All options are case-insensitive. Supported values in‐
1304 clude:
1305
1306 BULK_XFER[=<resources>]
1307 Enable bulk transfer of data using Remote Di‐
1308 rect-Memory Access (RDMA). The optional resources
1309 specification is a numeric value which can have a
1310 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1311 bytes, megabytes or gigabytes. NOTE: The resources
1312 specification is not supported by the underlying IBM
1313 infrastructure as of Parallel Environment version
1314 2.2 and no value should be specified at this time.
1315
1316 CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1317 quired. Applies only to IBM Power7-IH processors.
1318 Default value is zero. Independent CAU will be al‐
1319 located for each programming interface (MPI, LAPI,
1320 etc.)
1321
1322 DEVNAME=<name>
1323 Specify the device name to use for communications
1324 (e.g. "eth0" or "mlx4_0").
1325
1326 DEVTYPE=<type>
1327 Specify the device type to use for communications.
1328 The supported values of type are: "IB" (InfiniBand),
1329 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1330 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1331 nel Emulation of HPCE). The devices allocated to a
1332 job must all be of the same type. The default value
1333 depends upon what hardware is available
1334 and in order of preference is IPONLY (which is not
1335 considered in User Space mode), HFI, IB, HPCE, and
1336 KMUX.
1337
1338 IMMED=<count>
1339 Number of immediate send slots per window required.
1340 Applies only to IBM Power7-IH processors. Default
1341 value is zero.
1342
1343 INSTANCES=<count>
1344 Specify number of network connections for each task
1345 on each network. The default instance
1346 count is 1.
1347
1348 IPV4 Use Internet Protocol (IP) version 4 communications
1349 (default).
1350
1351 IPV6 Use Internet Protocol (IP) version 6 communications.
1352
1353 LAPI Use the LAPI programming interface.
1354
1355 MPI Use the MPI programming interface. MPI is the de‐
1356 fault interface.
1357
1358 PAMI Use the PAMI programming interface.
1359
1360 SHMEM Use the OpenSHMEM programming interface.
1361
1362 SN_ALL Use all available switch networks (default).
1363
1364 SN_SINGLE Use one available switch network.
1365
1366 UPC Use the UPC programming interface.
1367
1368 US Use User Space communications.
1369
1370
1371 Some examples of network specifications:
1372
1373 Instances=2,US,MPI,SN_ALL
1374 Create two user space connections for MPI communica‐
1375 tions on every switch network for each task.
1376
1377 US,MPI,Instances=3,Devtype=IB
1378 Create three user space connections for MPI communi‐
1379 cations on every InfiniBand network for each task.
1380
1381 IPV4,LAPI,SN_Single
1382 Create an IP version 4 connection for LAPI communica‐
1383 tions on one switch network for each task.
1384
1385 Instances=2,US,LAPI,MPI
1386 Create two user space connections each for LAPI and
1387 MPI communications on every switch network for each
1388 task. Note that SN_ALL is the default option so ev‐
1389 ery switch network is used. Also note that In‐
1390 stances=2 specifies that two connections are estab‐
1391 lished for each protocol (LAPI and MPI) and each
1392 task. If there are two networks and four tasks on
1393 the node then a total of 32 connections are estab‐
1394 lished (2 instances x 2 protocols x 2 networks x 4
1395 tasks).
1396
1397
1398 --nice[=adjustment]
1399 Run the job with an adjusted scheduling priority within Slurm.
1400 With no adjustment value the scheduling priority is decreased by
1401 100. A negative nice value increases the priority, otherwise de‐
1402 creases it. The adjustment range is +/- 2147483645. Only privi‐
1403 leged users can specify a negative adjustment.
1404
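For example, to submit a low-priority housekeeping job whose priority is reduced by 200 rather than the default 100 (cleanup.sh is a placeholder script name):

       $ sbatch --nice=200 cleanup.sh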
1405
1406 --no-requeue
1407 Specifies that the batch job should never be requeued under any
1408 circumstances. Setting this option will prevent the job from
1409 being requeued by system administrators (for example, after a
1410 scheduled downtime), after a node failure, or upon preemption
1411 by a higher priority job. When a job is re‐
1412 queued, the batch script is initiated from its beginning. Also
1413 see the --requeue option. The JobRequeue configuration parame‐
1414 ter controls the default behavior on the cluster.
1415
1416
1417 --ntasks-per-core=<ntasks>
1418 Request the maximum ntasks be invoked on each core. Meant to be
1419 used with the --ntasks option. Related to --ntasks-per-node ex‐
1420 cept at the core level instead of the node level. NOTE: This
1421 option is not supported when using SelectType=select/linear.
1422
1423
1424 --ntasks-per-gpu=<ntasks>
1425 Request that there are ntasks tasks invoked for every GPU. This
1426 option can work in two ways: 1) either specify --ntasks in addi‐
1427 tion, in which case a type-less GPU specification will be auto‐
1428 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1429 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1430 --ntasks, and the total task count will be automatically deter‐
1431 mined. The number of CPUs needed will be automatically in‐
1432 creased if necessary to allow for any calculated task count.
1433 This option will implicitly set --gpu-bind=single:<ntasks>, but
1434 that can be overridden with an explicit --gpu-bind specifica‐
1435 tion. This option is not compatible with a node range (i.e.
1436 -N<minnodes-maxnodes>). This option is not compatible with
1437 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1438 option is not supported unless SelectType=cons_tres is config‐
1439 ured (either directly or indirectly on Cray systems).
1440
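As an illustration of the second form described above, requesting four GPUs with two tasks per GPU results in eight tasks in total without an explicit --ntasks:

       #SBATCH --gpus=4
       #SBATCH --ntasks-per-gpu=2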
1441
1442 --ntasks-per-node=<ntasks>
1443 Request that ntasks be invoked on each node. If used with the
1444 --ntasks option, the --ntasks option will take precedence and
1445 the --ntasks-per-node will be treated as a maximum count of
1446 tasks per node. Meant to be used with the --nodes option. This
1447 is related to --cpus-per-task=ncpus, but does not require knowl‐
1448 edge of the actual number of cpus on each node. In some cases,
1449 it is more convenient to be able to request that no more than a
1450 specific number of tasks be invoked on each node. Examples of
1451 this include submitting a hybrid MPI/OpenMP app where only one
1452 MPI "task/rank" should be assigned to each node while allowing
1453 the OpenMP portion to utilize all of the parallelism present in
1454 the node, or submitting a single setup/cleanup/monitoring job to
1455 each node of a pre-existing allocation as one step in a larger
1456 job script.
1457
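A minimal hybrid MPI/OpenMP sketch along the lines described above (hybrid_app is a placeholder and the per-node CPU count is an assumption): one task per node, with each task's CPUs used for OpenMP threads.

       #!/bin/sh
       #SBATCH --nodes=4
       #SBATCH --ntasks-per-node=1
       #SBATCH --cpus-per-task=16
       # assumes nodes with at least 16 CPUs: one rank per node, 16 threads per rank
       export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
       srun ./hybrid_app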
1458
1459 --ntasks-per-socket=<ntasks>
1460 Request the maximum ntasks be invoked on each socket. Meant to
1461 be used with the --ntasks option. Related to --ntasks-per-node
1462 except at the socket level instead of the node level. NOTE:
1463 This option is not supported when using SelectType=select/lin‐
1464 ear.
1465
1466
1467 -O, --overcommit
1468 Overcommit resources. When applied to job allocation, only one
1469 CPU is allocated to the job per node and options used to specify
1470 the number of tasks per node, socket, core, etc. are ignored.
1471 When applied to job step allocations (the srun command when exe‐
1472 cuted within an existing job allocation), this option can be
1473 used to launch more than one task per CPU. Normally, srun will
1474 not allocate more than one process per CPU. By specifying
1475 --overcommit you are explicitly allowing more than one process
1476 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1477 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1478 in the file slurm.h and is not a variable; it is set at Slurm
1479 build time.
1480
1481
1482 -o, --output=<filename pattern>
1483 Instruct Slurm to connect the batch script's standard output di‐
1484 rectly to the file name specified in the "filename pattern". By
1485 default both standard output and standard error are directed to
1486 the same file. For job arrays, the default file name is
1487 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1488 the array index. For other jobs, the default file name is
1489 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1490 the filename pattern section below for filename specification
1491 options.
1492
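For example, to name the output file after the job name and job ID, and to keep standard error in a separate file with the companion -e, --error option:

       #SBATCH --output=%x-%j.out
       #SBATCH --error=%x-%j.err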
1493
1494 --open-mode=append|truncate
1495 Open the output and error files using append or truncate mode as
1496 specified. The default value is specified by the system config‐
1497 uration parameter JobFileAppend.
1498
1499
1500 --parsable
1501 Outputs only the job id number and the cluster name if present.
1502 The values are separated by a semicolon. Errors will still be
1503 displayed.
1504
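A sketch of capturing the job ID in a shell script and submitting a dependent job with it (job.sh and post.sh are placeholders); the parameter expansion strips a trailing ";cluster" field if one is present:

       $ jobid=$(sbatch --parsable job.sh)
       $ sbatch --dependency=afterok:${jobid%%;*} post.sh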
1505
1506 -p, --partition=<partition_names>
1507 Request a specific partition for the resource allocation. If
1508 not specified, the default behavior is to allow the slurm con‐
1509 troller to select the default partition as designated by the
1510 system administrator. If the job can use more than one parti‐
1511 tion, specify their names in a comma separated list and the one
1512 offering earliest initiation will be used with no regard given
1513 to the partition name ordering (although higher priority parti‐
1514 tions will be considered first). When the job is initiated, the
1515 name of the partition used will be placed first in the job
1516 record partition string.
1517
1518
1519 --power=<flags>
1520 Comma separated list of power management plugin options. Cur‐
1521 rently available flags include: level (all nodes allocated to
1522 the job should have identical power caps, may be disabled by the
1523 Slurm configuration option PowerParameters=job_no_level).
1524
1525
1526 --priority=<value>
1527 Request a specific job priority. May be subject to configura‐
1528 tion specific constraints. value should either be a numeric
1529 value or "TOP" (for highest possible value). Only Slurm opera‐
1530 tors and administrators can set the priority of a job.
1531
1532
1533 --profile=<all|none|[energy[,|task[,|lustre[,|network]]]]>
1534 enables detailed data collection by the acct_gather_profile
1535 plugin. Detailed data are typically time-series that are stored
1536 in an HDF5 file for the job or an InfluxDB database depending on
1537 the configured plugin.
1538
1539
1540 All All data types are collected. (Cannot be combined with
1541 other values.)
1542
1543
1544 None No data types are collected. This is the default.
1545 (Cannot be combined with other values.)
1546
1547
1548 Energy Energy data is collected.
1549
1550
1551 Task Task (I/O, Memory, ...) data is collected.
1552
1553
1554 Lustre Lustre data is collected.
1555
1556
1557 Network Network (InfiniBand) data is collected.
1558
1559
1560 --propagate[=rlimit[,rlimit...]]
1561 Allows users to specify which of the modifiable (soft) resource
1562 limits to propagate to the compute nodes and apply to their
1563 jobs. If no rlimit is specified, then all resource limits will
1564 be propagated. The following rlimit names are supported by
1565 Slurm (although some options may not be supported on some sys‐
1566 tems):
1567
1568 ALL All limits listed below (default)
1569
1570 NONE No limits listed below
1571
1572 AS The maximum address space for a process
1573
1574 CORE The maximum size of core file
1575
1576 CPU The maximum amount of CPU time
1577
1578 DATA The maximum size of a process's data segment
1579
1580 FSIZE The maximum size of files created. Note that if the
1581 user sets FSIZE to less than the current size of the
1582 slurmd.log, job launches will fail with a 'File size
1583 limit exceeded' error.
1584
1585 MEMLOCK The maximum size that may be locked into memory
1586
1587 NOFILE The maximum number of open files
1588
1589 NPROC The maximum number of processes available
1590
1591 RSS The maximum resident set size
1592
1593 STACK The maximum stack size
1594
1595
1596 -q, --qos=<qos>
1597 Request a quality of service for the job. QOS values can be de‐
1598 fined for each user/cluster/account association in the Slurm
1599 database. Users will be limited to their association's defined
1600 set of qos's when the Slurm configuration parameter, Account‐
1601 ingStorageEnforce, includes "qos" in its definition.
1602
1603
1604 -Q, --quiet
1605 Suppress informational messages from sbatch such as Job ID.
1606 Errors will still be displayed.
1607
1608
1609 --reboot
1610 Force the allocated nodes to reboot before starting the job.
1611 This is only supported with some system configurations and will
1612 otherwise be silently ignored. Only root, SlurmUser or admins
1613 can reboot nodes.
1614
1615
1616 --requeue
1617 Specifies that the batch job should be eligible for requeuing.
1618 The job may be requeued explicitly by a system administrator,
1619 after node failure, or upon preemption by a higher priority job.
1620 When a job is requeued, the batch script is initiated from its
1621 beginning. Also see the --no-requeue option. The JobRequeue
1622 configuration parameter controls the default behavior on the
1623 cluster.
1624
1625
1626 --reservation=<reservation_names>
1627 Allocate resources for the job from the named reservation. If
1628 the job can use more than one reservation, specify their names
1629 in a comma separated list and the one offering the earliest
1630 initiation will be used. Each reservation will be considered in the order it was
1631 requested. All reservations will be listed in scontrol/squeue
1632 through the life of the job. In accounting the first reserva‐
1633 tion will be seen and after the job starts the reservation used
1634 will replace it.
1635
1636
1637 -s, --oversubscribe
1638 The job allocation can over-subscribe resources with other run‐
1639 ning jobs. The resources to be over-subscribed can be nodes,
1640 sockets, cores, and/or hyperthreads depending upon configura‐
1641 tion. The default over-subscribe behavior depends on system
1642 configuration and the partition's OverSubscribe option takes
1643 precedence over the job's option. This option may result in the
1644 allocation being granted sooner than if the --oversubscribe op‐
1645 tion was not set and allow higher system utilization, but appli‐
1646 cation performance will likely suffer due to competition for re‐
1647 sources. Also see the --exclusive option.
1648
1649
1650 -S, --core-spec=<num>
1651 Count of specialized cores per node reserved by the job for sys‐
1652 tem operations and not used by the application. The application
1653 will not use these cores, but will be charged for their alloca‐
1654 tion. Default value is dependent upon the node's configured
1655 CoreSpecCount value. If a value of zero is designated and the
1656 Slurm configuration option AllowSpecResourcesUsage is enabled,
1657 the job will be allowed to override CoreSpecCount and use the
1658 specialized resources on nodes it is allocated. This option can
1659 not be used with the --thread-spec option.
1660
1661
1662 --signal=[[R][B]:]<sig_num>[@<sig_time>]
1663 When a job is within sig_time seconds of its end time, send it
1664 the signal sig_num. Due to the resolution of event handling by
1665 Slurm, the signal may be sent up to 60 seconds earlier than
1666 specified. sig_num may either be a signal number or name (e.g.
1667 "10" or "USR1"). sig_time must have an integer value between 0
1668 and 65535. By default, no signal is sent before the job's end
1669 time. If a sig_num is specified without any sig_time, the de‐
1670 fault time will be 60 seconds. Use the "B:" option to signal
1671 only the batch shell, none of the other processes will be sig‐
1672 naled. By default all job steps will be signaled, but not the
1673 batch shell itself. Use the "R:" option to allow this job to
1674 overlap with a reservation with MaxStartDelay set. To have the
1675 signal sent at preemption time see the preempt_send_user_signal
1676 SlurmctldParameter.
1677
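For example, to have the batch shell receive SIGUSR1 ten minutes before the time limit so the script can react (long_running_app is a placeholder); the job step runs in the background so the shell can handle the signal while waiting:

       #!/bin/sh
       #SBATCH --time=04:00:00
       #SBATCH --signal=B:USR1@600
       trap 'echo "USR1 received, time limit approaching" >&2' USR1
       srun ./long_running_app &
       wait    # interrupted when USR1 arrives
       wait    # resume waiting for the job step to finish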
1678
1679 --sockets-per-node=<sockets>
1680 Restrict node selection to nodes with at least the specified
1681 number of sockets. See additional information under -B option
1682 above when task/affinity plugin is enabled.
1683
1684
1685 --spread-job
1686 Spread the job allocation over as many nodes as possible and at‐
1687 tempt to evenly distribute tasks across the allocated nodes.
1688 This option disables the topology/tree plugin.
1689
1690
1691 --switches=<count>[@<max-time>]
1692 When a tree topology is used, this defines the maximum count of
1693 switches desired for the job allocation and optionally the maxi‐
1694 mum time to wait for that number of switches. If Slurm finds an
1695 allocation containing more switches than the count specified,
1696 the job remains pending until it either finds an allocation with
1697 desired switch count or the time limit expires. If there is no
1698 switch count limit, there is no delay in starting the job. Ac‐
1699 ceptable time formats include "minutes", "minutes:seconds",
1700 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1701 "days-hours:minutes:seconds". The job's maximum time delay may
1702 be limited by the system administrator using the SchedulerParam‐
1703 eters configuration parameter with the max_switch_wait parameter
1704 option. On a dragonfly network the only switch count supported
1705 is 1 since communication performance will be highest when a job
1706 is allocated resources on one leaf switch or more than 2 leaf
1707 switches. The default max-time is the value of the max_switch_wait
1708 SchedulerParameters option.
1709
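For example, to prefer an allocation confined to a single leaf switch but accept any layout after waiting at most 30 minutes:

       #SBATCH --switches=1@30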
1710
1711 -t, --time=<time>
1712 Set a limit on the total run time of the job allocation. If the
1713 requested time limit exceeds the partition's time limit, the job
1714 will be left in a PENDING state (possibly indefinitely). The
1715 default time limit is the partition's default time limit. When
1716 the time limit is reached, each task in each job step is sent
1717 SIGTERM followed by SIGKILL. The interval between signals is
1718 specified by the Slurm configuration parameter KillWait. The
1719 OverTimeLimit configuration parameter may permit the job to run
1720 longer than scheduled. Time resolution is one minute and second
1721 values are rounded up to the next minute.
1722
1723 A time limit of zero requests that no time limit be imposed.
1724 Acceptable time formats include "minutes", "minutes:seconds",
1725 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1726 "days-hours:minutes:seconds".
1727
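For example, each of the following directives requests the stated limit (only one would normally appear in a script):

       #SBATCH --time=90             # 90 minutes
       #SBATCH --time=1:30:00        # 1 hour and 30 minutes
       #SBATCH --time=2-00:00:00     # 2 days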
1728
1729 --test-only
1730 Validate the batch script and return an estimate of when a job
1731 would be scheduled to run given the current job queue and all
1732 the other arguments specifying the job requirements. No job is
1733 actually submitted.
1734
1735
1736 --thread-spec=<num>
1737 Count of specialized threads per node reserved by the job for
1738 system operations and not used by the application. The applica‐
1739 tion will not use these threads, but will be charged for their
1740 allocation. This option can not be used with the --core-spec
1741 option.
1742
1743
1744 --threads-per-core=<threads>
1745 Restrict node selection to nodes with at least the specified
1746 number of threads per core. In task layout, use the specified
1747 maximum number of threads per core. NOTE: "Threads" refers to
1748 the number of processing units on each core rather than the num‐
1749 ber of application tasks to be launched per core. See addi‐
1750 tional information under -B option above when task/affinity
1751 plugin is enabled.
1752
1753
1754 --time-min=<time>
1755 Set a minimum time limit on the job allocation. If specified,
1756 the job may have its --time limit lowered to a value no lower
1757 than --time-min if doing so permits the job to begin execution
1758 earlier than otherwise possible. The job's time limit will not
1759 be changed after the job is allocated resources. This is per‐
1760 formed by a backfill scheduling algorithm to allocate resources
1761 otherwise reserved for higher priority jobs. Acceptable time
1762 formats include "minutes", "minutes:seconds", "hours:min‐
1763 utes:seconds", "days-hours", "days-hours:minutes" and
1764 "days-hours:minutes:seconds".
1765
1766
1767 --tmp=<size[units]>
1768 Specify a minimum amount of temporary disk space per node. De‐
1769 fault units are megabytes. Different units can be specified us‐
1770 ing the suffix [K|M|G|T].
1771
1772
1773 --usage
1774 Display brief help message and exit.
1775
1776
1777 --uid=<user>
1778 Attempt to submit and/or run a job as user instead of the invok‐
1779 ing user id. The invoking user's credentials will be used to
1780 check access permissions for the target partition. User root may
1781 use this option to run jobs as a normal user in a RootOnly par‐
1782 tition for example. If run as root, sbatch will drop its permis‐
1783 sions to the uid specified after node allocation is successful.
1784 user may be the user name or numerical user ID.
1785
1786
1787 --use-min-nodes
1788 If a range of node counts is given, prefer the smaller count.
1789
1790
1791 -V, --version
1792 Display version information and exit.
1793
1794
1795 -v, --verbose
1796 Increase the verbosity of sbatch's informational messages. Mul‐
1797 tiple -v's will further increase sbatch's verbosity. By default
1798 only errors will be displayed.
1799
1800
1801 -w, --nodelist=<node name list>
1802 Request a specific list of hosts. The job will contain all of
1803 these hosts and possibly additional hosts as needed to satisfy
1804 resource requirements. The list may be specified as a
1805 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1806 for example), or a filename. The host list will be assumed to
1807 be a filename if it contains a "/" character. If you specify a
1808 minimum node or processor count larger than can be satisfied by
1809 the supplied host list, additional resources will be allocated
1810 on other nodes as needed. Duplicate node names in the list will
1811 be ignored. The order of the node names in the list is not im‐
1812 portant; the node names will be sorted by Slurm.
1813
1814
1815 -W, --wait
1816 Do not exit until the submitted job terminates. The exit code
1817 of the sbatch command will be the same as the exit code of the
1818 submitted job. If the job terminated due to a signal rather than
1819 a normal exit, the exit code will be set to 1. In the case of a
1820 job array, the exit code recorded will be the highest value for
1821 any task in the job array.
1822
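A sketch of using -W so that a calling shell script can react to the job's outcome (job.sh is a placeholder):

       $ sbatch -W job.sh && echo "job succeeded" || echo "job failed"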
1823
1824 --wait-all-nodes=<value>
1825 Controls when the execution of the command begins. By default
1826 the job will begin execution as soon as the allocation is made.
1827
1828 0 Begin execution as soon as allocation can be made. Do not
1829 wait for all nodes to be ready for use (i.e. booted).
1830
1831 1 Do not begin execution until all nodes are ready for use.
1832
1833
1834 --wckey=<wckey>
1835 Specify wckey to be used with job. If TrackWCKey=no (default)
1836 in the slurm.conf this value is ignored.
1837
1838
1839 --wrap=<command string>
1840 Sbatch will wrap the specified command string in a simple "sh"
1841 shell script, and submit that script to the slurm controller.
1842 When --wrap is used, a script name and arguments may not be
1843 specified on the command line; instead the sbatch-generated
1844 wrapper script is used.
1845
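For example, to submit a one-line command without writing a separate script file:

       $ sbatch -N4 --wrap="srun hostname | sort"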
1846
1847 -x, --exclude=<node name list>
1848 Explicitly exclude certain nodes from the resources granted to
1849 the job.
1850
1851
FILENAME PATTERN
1853 sbatch allows for a filename pattern to contain one or more replacement
1854 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1855
1856 \\ Do not process any of the replacement symbols.
1857
1858 %% The character "%".
1859
1860 %A Job array's master job allocation number.
1861
1862 %a Job array ID (index) number.
1863
1864 %J jobid.stepid of the running job. (e.g. "128.0")
1865
1866 %j jobid of the running job.
1867
1868 %N short hostname. This will create a separate IO file per node.
1869
1870 %n Node identifier relative to current job (e.g. "0" is the first
1871 node of the running job). This will create a separate IO file per
1872 node.
1873
1874 %s stepid of the running job.
1875
1876 %t task identifier (rank) relative to current job. This will create
1877 a separate IO file per task.
1878
1879 %u User name.
1880
1881 %x Job name.
1882
1883 A number placed between the percent character and format specifier may
1884 be used to zero-pad the result in the IO filename. This number is ig‐
1885 nored if the format specifier corresponds to non-numeric data (%N for
1886 example).
1887
1888 Some examples of how the format string may be used for a 4 task job
1889 step with a Job ID of 128 and step id of 0 are included below:
1890
1891 job%J.out job128.0.out
1892
1893 job%4j.out job0128.out
1894
1895 job%j-%2t.out job128-00.out, job128-01.out, ...
1896
PERFORMANCE
1898 Executing sbatch sends a remote procedure call to slurmctld. If enough
1899 calls from sbatch or other Slurm client commands that send remote pro‐
1900 cedure calls to the slurmctld daemon come in at once, it can result in
1901 a degradation of performance of the slurmctld daemon, possibly result‐
1902 ing in a denial of service.
1903
1904 Do not run sbatch or other Slurm client commands that send remote pro‐
1905 cedure calls to slurmctld from loops in shell scripts or other pro‐
1906 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
1907 sary for the information you are trying to gather.
1908
1909
INPUT ENVIRONMENT VARIABLES
1911 Upon startup, sbatch will read and handle the options set in the fol‐
1912 lowing environment variables. Note that environment variables will
1913 override any options set in a batch script, and command line options
1914 will override any environment variables.
1915
1916
1917 SBATCH_ACCOUNT Same as -A, --account
1918
1919 SBATCH_ACCTG_FREQ Same as --acctg-freq
1920
1921 SBATCH_ARRAY_INX Same as -a, --array
1922
1923 SBATCH_BATCH Same as --batch
1924
1925 SBATCH_CLUSTERS or SLURM_CLUSTERS
1926 Same as --clusters
1927
1928 SBATCH_CONSTRAINT Same as -C, --constraint
1929
1930 SBATCH_CORE_SPEC Same as --core-spec
1931
1932 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
1933
1934 SBATCH_DEBUG Same as -v, --verbose
1935
1936 SBATCH_DELAY_BOOT Same as --delay-boot
1937
1938 SBATCH_DISTRIBUTION Same as -m, --distribution
1939
1940 SBATCH_EXCLUSIVE Same as --exclusive
1941
1942 SBATCH_EXPORT Same as --export
1943
1944 SBATCH_GET_USER_ENV Same as --get-user-env
1945
1946 SBATCH_GPU_BIND Same as --gpu-bind
1947
1948 SBATCH_GPU_FREQ Same as --gpu-freq
1949
1950 SBATCH_GPUS Same as -G, --gpus
1951
1952 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
1953
1954 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
1955
1956 SBATCH_GRES Same as --gres
1957
1958 SBATCH_GRES_FLAGS Same as --gres-flags
1959
1960 SBATCH_HINT or SLURM_HINT
1961 Same as --hint
1962
1963 SBATCH_IGNORE_PBS Same as --ignore-pbs
1964
1965 SBATCH_JOB_NAME Same as -J, --job-name
1966
1967 SBATCH_MEM_BIND Same as --mem-bind
1968
1969 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
1970
1971 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
1972
1973 SBATCH_MEM_PER_NODE Same as --mem
1974
1975 SBATCH_NETWORK Same as --network
1976
1977 SBATCH_NO_KILL Same as -k, --no-kill
1978
1979 SBATCH_NO_REQUEUE Same as --no-requeue
1980
1981 SBATCH_OPEN_MODE Same as --open-mode
1982
1983 SBATCH_OVERCOMMIT Same as -O, --overcommit
1984
1985 SBATCH_PARTITION Same as -p, --partition
1986
1987 SBATCH_POWER Same as --power
1988
1989 SBATCH_PROFILE Same as --profile
1990
1991 SBATCH_QOS Same as --qos
1992
1993 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
1994 maximum count of switches desired for the job al‐
1995 location and optionally the maximum time to wait
1996 for that number of switches. See --switches
1997
1998 SBATCH_REQUEUE Same as --requeue
1999
2000 SBATCH_RESERVATION Same as --reservation
2001
2002 SBATCH_SIGNAL Same as --signal
2003
2004 SBATCH_SPREAD_JOB Same as --spread-job
2005
2006 SBATCH_THREAD_SPEC Same as --thread-spec
2007
2008 SBATCH_TIMELIMIT Same as -t, --time
2009
2010 SBATCH_USE_MIN_NODES Same as --use-min-nodes
2011
2012 SBATCH_WAIT Same as -W, --wait
2013
2014 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes
2015
2016 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
2017 --switches
2018
2019 SBATCH_WCKEY Same as --wckey
2020
2021 SLURM_CONF The location of the Slurm configuration file.
2022
2023 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2024 error occurs (e.g. invalid options). This can be
2025 used by a script to distinguish application exit
2026 codes from various Slurm error conditions.
2027
2028 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2029 If set, only the specified node will log when the
2030 job or step is killed by a signal.
2031
2032
OUTPUT ENVIRONMENT VARIABLES
2034 The Slurm controller will set the following variables in the environ‐
2035 ment of the batch script.
2036
2037 SBATCH_MEM_BIND
2038 Set to value of the --mem-bind option.
2039
2040 SBATCH_MEM_BIND_LIST
2041 Set to bit mask used for memory binding.
2042
2043 SBATCH_MEM_BIND_PREFER
2044 Set to "prefer" if the --mem-bind option includes the prefer op‐
2045 tion.
2046
2047 SBATCH_MEM_BIND_TYPE
2048 Set to the memory binding type specified with the --mem-bind op‐
2049 tion. Possible values are "none", "rank", "map_mem", "mask_mem"
2050 and "local".
2051
2052 SBATCH_MEM_BIND_VERBOSE
2053 Set to "verbose" if the --mem-bind option includes the verbose
2054 option. Set to "quiet" otherwise.
2055
2056 SLURM_*_HET_GROUP_#
2057 For a heterogeneous job allocation, the environment variables
2058 are set separately for each component.
2059
2060 SLURM_ARRAY_JOB_ID
2061 Job array's master job ID number.
2062
2063 SLURM_ARRAY_TASK_COUNT
2064 Total number of tasks in a job array.
2065
2066 SLURM_ARRAY_TASK_ID
2067 Job array ID (index) number.
2068
2069 SLURM_ARRAY_TASK_MAX
2070 Job array's maximum ID (index) number.
2071
2072 SLURM_ARRAY_TASK_MIN
2073 Job array's minimum ID (index) number.
2074
2075 SLURM_ARRAY_TASK_STEP
2076 Job array's index step size.
2077
2078 SLURM_CLUSTER_NAME
2079 Name of the cluster on which the job is executing.
2080
2081 SLURM_CPUS_ON_NODE
2082 Number of CPUs on the allocated node.
2083
2084 SLURM_CPUS_PER_GPU
2085 Number of CPUs requested per allocated GPU. Only set if the
2086 --cpus-per-gpu option is specified.
2087
2088 SLURM_CPUS_PER_TASK
2089 Number of cpus requested per task. Only set if the
2090 --cpus-per-task option is specified.
2091
2092 SLURM_DIST_PLANESIZE
2093 Plane distribution size. Only set for plane distributions. See
2094 -m, --distribution.
2095
2096 SLURM_DISTRIBUTION
2097 Same as -m, --distribution
2098
2099 SLURM_EXPORT_ENV
2100 Same as --export.
2101
2102 SLURM_GPU_BIND
2103 Requested binding of tasks to GPU. Only set if the --gpu-bind
2104 option is specified.
2105
2106 SLURM_GPU_FREQ
2107 Requested GPU frequency. Only set if the --gpu-freq option is
2108 specified.
2109
2110 SLURM_GPUS
2111 Number of GPUs requested. Only set if the -G, --gpus option is
2112 specified.
2113
2114 SLURM_GPUS_PER_NODE
2115 Requested GPU count per allocated node. Only set if the
2116 --gpus-per-node option is specified.
2117
2118 SLURM_GPUS_PER_SOCKET
2119 Requested GPU count per allocated socket. Only set if the
2120 --gpus-per-socket option is specified.
2121
2122 SLURM_GPUS_PER_TASK
2123 Requested GPU count per allocated task. Only set if the
2124 --gpus-per-task option is specified.
2125
2126 SLURM_GTIDS
2127 Global task IDs running on this node. Zero origin and comma
2128 separated.
2129
2130 SLURM_HET_SIZE
2131 Set to count of components in heterogeneous job.
2132
2133 SLURM_JOB_ACCOUNT
2134 Account name associated with the job allocation.
2135
2136 SLURM_JOB_ID
2137 The ID of the job allocation.
2138
2139 SLURM_JOB_CPUS_PER_NODE
2140 Count of processors available to the job on this node. Note the
2141 select/linear plugin allocates entire nodes to jobs, so the
2142 value indicates the total count of CPUs on the node. The se‐
2143 lect/cons_res plugin allocates individual processors to jobs, so
2144 this number indicates the number of processors on this node al‐
2145 located to the job.
2146
2147 SLURM_JOB_DEPENDENCY
2148 Set to value of the --dependency option.
2149
2150 SLURM_JOB_NAME
2151 Name of the job.
2152
2153 SLURM_JOB_NODELIST
2154 List of nodes allocated to the job.
2155
2156 SLURM_JOB_NUM_NODES
2157 Total number of nodes in the job's resource allocation.
2158
2159 SLURM_JOB_PARTITION
2160 Name of the partition in which the job is running.
2161
2162 SLURM_JOB_QOS
2163 Quality Of Service (QOS) of the job allocation.
2164
2165 SLURM_JOB_RESERVATION
2166 Advanced reservation containing the job allocation, if any.
2167
2168 SLURM_JOBID
2169 The ID of the job allocation. See SLURM_JOB_ID. Included for
2170 backwards compatibility.
2171
2172 SLURM_LOCALID
2173 Node local task ID for the process within a job.
2174
2175 SLURM_MEM_PER_CPU
2176 Same as --mem-per-cpu
2177
2178 SLURM_MEM_PER_GPU
2179 Requested memory per allocated GPU. Only set if the
2180 --mem-per-gpu option is specified.
2181
2182 SLURM_MEM_PER_NODE
2183 Same as --mem
2184
2185 SLURM_NNODES
2186 Total number of nodes in the job's resource allocation. See
2187 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2188
2189 SLURM_NODE_ALIASES
2190 Sets of node name, communication address and hostname for nodes
2191 allocated to the job from the cloud. Each element in the set is
2192 colon separated and each set is comma separated. For example:
2193 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2194
2195 SLURM_NODEID
2196 ID of the node relative to the nodes allocated to the job.
2197
2198 SLURM_NODELIST
2199 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
2200 cluded for backwards compatibility.
2201
2202 SLURM_NPROCS
2203 Same as -n, --ntasks. See SLURM_NTASKS. Included for backwards
2204 compatibility.
2205
2206 SLURM_NTASKS
2207 Same as -n, --ntasks
2208
2209 SLURM_NTASKS_PER_CORE
2210 Number of tasks requested per core. Only set if the
2211 --ntasks-per-core option is specified.
2212
2213
2214 SLURM_NTASKS_PER_GPU
2215 Number of tasks requested per GPU. Only set if the
2216 --ntasks-per-gpu option is specified.
2217
2218 SLURM_NTASKS_PER_NODE
2219 Number of tasks requested per node. Only set if the
2220 --ntasks-per-node option is specified.
2221
2222 SLURM_NTASKS_PER_SOCKET
2223 Number of tasks requested per socket. Only set if the
2224 --ntasks-per-socket option is specified.
2225
2226 SLURM_OVERCOMMIT
2227 Set to 1 if --overcommit was specified.
2228
2229 SLURM_PRIO_PROCESS
2230 The scheduling priority (nice value) at the time of job submis‐
2231 sion. This value is propagated to the spawned processes.
2232
2233 SLURM_PROCID
2234 The MPI rank (or relative process ID) of the current process
2235
2236 SLURM_PROFILE
2237 Same as --profile
2238
2239 SLURM_RESTART_COUNT
2240 If the job has been restarted due to system failure or has been
2241 explicitly requeued, this will be set to the number of times
2242 the job has been restarted.
2243
2244 SLURM_SUBMIT_DIR
2245 The directory from which sbatch was invoked.
2246
2247 SLURM_SUBMIT_HOST
2248 The hostname of the computer from which sbatch was invoked.
2249
2250 SLURM_TASK_PID
2251 The process ID of the task being started.
2252
2253 SLURM_TASKS_PER_NODE
2254 Number of tasks to be initiated on each node. Values are comma
2255 separated and in the same order as SLURM_JOB_NODELIST. If two
2256 or more consecutive nodes are to have the same task count, that
2257 count is followed by "(x#)" where "#" is the repetition count.
2258 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2259 first three nodes will each execute two tasks and the fourth
2260 node will execute one task.
2261
2262 SLURM_TOPOLOGY_ADDR
2263 This is set only if the system has the topology/tree plugin
2264 configured. The value will be set to the names of the network
2265 switches which may be involved in the job's communications,
2266 from the system's top level switch down to the leaf switch,
2267 ending with the node name. A period is used to separate each hard‐
2268 ware component name.
2269
2270 SLURM_TOPOLOGY_ADDR_PATTERN
2271 This is set only if the system has the topology/tree plugin
2272 configured. The value will be set to the component types listed in
2273 SLURM_TOPOLOGY_ADDR. Each component will be identified as ei‐
2274 ther "switch" or "node". A period is used to separate each
2275 hardware component type.
2276
2277 SLURMD_NODENAME
2278 Name of the node running the job script.
2279
2280
EXAMPLES
2282 Specify a batch script by filename on the command line. The batch
2283 script specifies a 1 minute time limit for the job.
2284
2285 $ cat myscript
2286 #!/bin/sh
2287 #SBATCH --time=1
2288 srun hostname |sort
2289
2290 $ sbatch -N4 myscript
2291 sbatch: Submitted batch job 65537
2292
2293 $ cat slurm-65537.out
2294 host1
2295 host2
2296 host3
2297 host4
2298
2299
2300 Pass a batch script to sbatch on standard input:
2301
2302 $ sbatch -N4 <<EOF
2303 > #!/bin/sh
2304 > srun hostname |sort
2305 > EOF
2306 sbatch: Submitted batch job 65541
2307
2308 $ cat slurm-65541.out
2309 host1
2310 host2
2311 host3
2312 host4
2313
2314
2315 To create a heterogeneous job with 3 components, each allocating a
2316 unique set of nodes:
2317
2318 $ sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2319 Submitted batch job 34987
2320
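As a further illustration (the input file names and the process program are hypothetical), a job array submitted with the -a, --array option can use the SLURM_ARRAY_TASK_ID environment variable to select its input:

       $ cat array_script
       #!/bin/sh
       #SBATCH --array=1-3
       srun ./process input_${SLURM_ARRAY_TASK_ID}.dat

       $ sbatch array_script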
2321
COPYING
2323 Copyright (C) 2006-2007 The Regents of the University of California.
2324 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2325 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2326 Copyright (C) 2010-2017 SchedMD LLC.
2327
2328 This file is part of Slurm, a resource management program. For de‐
2329 tails, see <https://slurm.schedmd.com/>.
2330
2331 Slurm is free software; you can redistribute it and/or modify it under
2332 the terms of the GNU General Public License as published by the Free
2333 Software Foundation; either version 2 of the License, or (at your op‐
2334 tion) any later version.
2335
2336 Slurm is distributed in the hope that it will be useful, but WITHOUT
2337 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2338 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2339 for more details.
2340
2341
SEE ALSO
2343 sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2344 slurm.conf(5), sched_setaffinity (2), numa (3)
2345
2346
2347
2348April 2021 Slurm Commands sbatch(1)