sbatch(1)                        Slurm Commands                       sbatch(1)
2
3
NAME
6 sbatch - Submit a batch script to Slurm.
7
SYNOPSIS
10 sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]
11
12 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
13 For more details about heterogeneous jobs see the document
14 https://slurm.schedmd.com/heterogeneous_jobs.html
15
DESCRIPTION
18 sbatch submits a batch script to Slurm. The batch script may be given
19 to sbatch through a file name on the command line, or if no file name
20 is specified, sbatch will read in a script from standard input. The
21 batch script may contain options preceded with "#SBATCH" before any
22 executable commands in the script. sbatch will stop processing further
23 #SBATCH directives once the first non-comment non-whitespace line has
24 been reached in the script.
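
       A minimal batch script of this form might look as follows (the
       resource values and the program name my_app are illustrative
       placeholders):

            #!/bin/bash
            #SBATCH --job-name=test
            #SBATCH --ntasks=1
            #SBATCH --mem=1G
            srun ./my_app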
25
26 sbatch exits immediately after the script is successfully transferred
27 to the Slurm controller and assigned a Slurm job ID. The batch script
       is not necessarily granted resources immediately; it may sit in the
29 queue of pending jobs for some time before its required resources
30 become available.
31
32 By default both standard output and standard error are directed to a
33 file of the name "slurm-%j.out", where the "%j" is replaced with the
34 job allocation number. The file will be generated on the first node of
35 the job allocation. Other than the batch script itself, Slurm does no
36 movement of user files.
37
38 When the job allocation is finally granted for the batch script, Slurm
39 runs a single copy of the batch script on the first node in the set of
40 allocated nodes.
41
42 The following document describes the influence of various options on
43 the allocation of cpus to jobs and tasks.
44 https://slurm.schedmd.com/cpu_management.html
45
RETURN VALUE
       sbatch will return 0 on success or an error code on failure.
49
SCRIPT PATH RESOLUTION
52 The batch script is resolved in the following order:
53
54 1. If script starts with ".", then path is constructed as: current
55 working directory / script
56 2. If script starts with a "/", then path is considered absolute.
57 3. If script is in current working directory.
58 4. If script can be resolved through PATH. See path_resolution(7).
59
       The current working directory is the calling process's working
       directory unless the --chdir argument is passed, which will override
       the current working directory.
63
OPTIONS
66 -a, --array=<indexes>
67 Submit a job array, multiple jobs to be executed with identical
68 parameters. The indexes specification identifies what array
69 index values should be used. Multiple values may be specified
70 using a comma separated list and/or a range of values with a "-"
71 separator. For example, "--array=0-15" or "--array=0,6,16-32".
72 A step function can also be specified with a suffix containing a
73 colon and number. For example, "--array=0-15:4" is equivalent to
74 "--array=0,4,8,12". A maximum number of simultaneously running
75 tasks from the job array may be specified using a "%" separator.
76 For example "--array=0-15%4" will limit the number of simultane‐
77 ously running tasks from this job array to 4. The minimum index
              value is 0; the maximum value is one less than the configura‐
79 tion parameter MaxArraySize. NOTE: currently, federated job
80 arrays only run on the local cluster.
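
              For example, a sketch of an array submission in which each
              task selects its input by index (my_app and the input file
              naming are placeholders; SLURM_ARRAY_TASK_ID is the array
              index Slurm sets in each task's environment):

                   #SBATCH --array=0-15%4
                   srun ./my_app input_${SLURM_ARRAY_TASK_ID}.dat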
81
82
83 -A, --account=<account>
84 Charge resources used by this job to specified account. The
85 account is an arbitrary string. The account name may be changed
86 after job submission using the scontrol command.
87
88
89 --acctg-freq
90 Define the job accounting and profiling sampling intervals.
91 This can be used to override the JobAcctGatherFrequency parame‐
92 ter in Slurm's configuration file, slurm.conf. The supported
93 format is as follows:
94
95 --acctg-freq=<datatype>=<interval>
96 where <datatype>=<interval> specifies the task sam‐
97 pling interval for the jobacct_gather plugin or a
98 sampling interval for a profiling type by the
99 acct_gather_profile plugin. Multiple, comma-sepa‐
100 rated <datatype>=<interval> intervals may be speci‐
101 fied. Supported datatypes are as follows:
102
103 task=<interval>
104 where <interval> is the task sampling inter‐
105 val in seconds for the jobacct_gather plugins
106 and for task profiling by the
107 acct_gather_profile plugin. NOTE: This fre‐
108 quency is used to monitor memory usage. If
109 memory limits are enforced the highest fre‐
110 quency a user can request is what is config‐
111 ured in the slurm.conf file. They can not
112 turn it off (=0) either.
113
114 energy=<interval>
115 where <interval> is the sampling interval in
116 seconds for energy profiling using the
117 acct_gather_energy plugin
118
119 network=<interval>
120 where <interval> is the sampling interval in
121 seconds for infiniband profiling using the
122 acct_gather_interconnect plugin.
123
124 filesystem=<interval>
125 where <interval> is the sampling interval in
126 seconds for filesystem profiling using the
127 acct_gather_filesystem plugin.
128
129 The default value for the task sampling
130 interval is 30 seconds.
131 The default value for all other intervals is 0. An interval of
132 0 disables sampling of the specified type. If the task sampling
133 interval is 0, accounting information is collected only at job
134 termination (reducing Slurm interference with the job).
135 Smaller (non-zero) values have a greater impact upon job perfor‐
136 mance, but a value of 30 seconds is not likely to be noticeable
137 for applications having less than 10,000 tasks.
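
              For example, a sketch that samples task (memory) usage every
              15 seconds and energy use every 60 seconds (the energy
              interval only has an effect if the acct_gather_energy plugin
              is configured):

                   #SBATCH --acctg-freq=task=15,energy=60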
138
139
       -B, --extra-node-info=<sockets[:cores[:threads]]>
141 Restrict node selection to nodes with at least the specified
142 number of sockets, cores per socket and/or threads per core.
143 NOTE: These options do not specify the resource allocation size.
144 Each value specified is considered a minimum. An asterisk (*)
145 can be used as a placeholder indicating that all available
146 resources of that type are to be utilized. Values can also be
147 specified as min-max. The individual levels can also be speci‐
148 fied in separate options if desired:
149 --sockets-per-node=<sockets>
150 --cores-per-socket=<cores>
151 --threads-per-core=<threads>
              If the task/affinity plugin is enabled, then specifying an
              allocation in this manner also results in subsequently launched
              tasks being bound to threads if the -B option specifies a
              thread count, to cores if a core count is specified, or
              otherwise to sockets.  If SelectType is config‐
157 ured to select/cons_res, it must have a parameter of CR_Core,
158 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
159 to be honored. If not specified, the scontrol show job will
160 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
161 tions. NOTE: This option is mutually exclusive with --hint,
162 --threads-per-core and --ntasks-per-core.
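
              For example, a sketch restricting selection to nodes with at
              least two sockets, eight cores per socket and two threads per
              core:

                   #SBATCH --extra-node-info=2:8:2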
163
164
165 --batch=<list>
166 Nodes can have features assigned to them by the Slurm adminis‐
167 trator. Users can specify which of these features are required
              by their batch script using this option.  For example, a job's
169 allocation may include both Intel Haswell and KNL nodes with
170 features "haswell" and "knl" respectively. On such a configura‐
171 tion the batch script would normally benefit by executing on a
172 faster Haswell node. This would be specified using the option
173 "--batch=haswell". The specification can include AND and OR
174 operators using the ampersand and vertical bar separators. For
175 example: "--batch=haswell|broadwell" or
176 "--batch=haswell|big_memory". The --batch argument must be a
177 subset of the job's --constraint=<list> argument (i.e. the job
178 can not request only KNL nodes, but require the script to exe‐
179 cute on a Haswell node). If the request can not be satisfied
180 from the resources allocated to the job, the batch script will
181 execute on the first node of the job allocation.
182
183
184 --bb=<spec>
185 Burst buffer specification. The form of the specification is
186 system dependent. Note the burst buffer may not be accessible
              from a login node, but may require that salloc spawn a shell on one
188 of its allocated compute nodes.
189
190
191 --bbf=<file_name>
192 Path of file containing burst buffer specification. The form of
193 the specification is system dependent. These burst buffer
194 directives will be inserted into the submitted batch script.
195
196
197 -b, --begin=<time>
198 Submit the batch script to the Slurm controller immediately,
199 like normal, but tell the controller to defer the allocation of
200 the job until the specified time.
201
202 Time may be of the form HH:MM:SS to run a job at a specific time
203 of day (seconds are optional). (If that time is already past,
204 the next day is assumed.) You may also specify midnight, noon,
205 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
206 suffixed with AM or PM for running in the morning or the
207 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
209 Combine date and time using the following format
210 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
211 count time-units, where the time-units can be seconds (default),
212 minutes, hours, days, or weeks and you can tell Slurm to run the
213 job today with the keyword today and to run the job tomorrow
214 with the keyword tomorrow. The value may be changed after job
215 submission using the scontrol command. For example:
216 --begin=16:00
217 --begin=now+1hour
218 --begin=now+60 (seconds by default)
219 --begin=2010-01-20T12:34:00
220
221
222 Notes on date/time specifications:
223 - Although the 'seconds' field of the HH:MM:SS time specifica‐
224 tion is allowed by the code, note that the poll time of the
225 Slurm scheduler is not precise enough to guarantee dispatch of
226 the job on the exact second. The job will be eligible to start
227 on the next poll following the specified time. The exact poll
228 interval depends on the Slurm scheduler (e.g., 60 seconds with
229 the default sched/builtin).
230 - If no time (HH:MM:SS) is specified, the default is
231 (00:00:00).
232 - If a date is specified without a year (e.g., MM/DD) then the
233 current year is assumed, unless the combination of MM/DD and
234 HH:MM:SS has already passed for that year, in which case the
235 next year is used.
236
237
238 --cluster-constraint=[!]<list>
239 Specifies features that a federated cluster must have to have a
240 sibling job submitted to it. Slurm will attempt to submit a sib‐
241 ling job to a cluster if it has at least one of the specified
242 features. If the "!" option is included, Slurm will attempt to
243 submit a sibling job to a cluster that has none of the specified
244 features.
245
246
247 --comment=<string>
              An arbitrary comment.  Enclose it in double quotes if it
              contains spaces or special characters.
250
251
252 -C, --constraint=<list>
253 Nodes can have features assigned to them by the Slurm adminis‐
254 trator. Users can specify which of these features are required
255 by their job using the constraint option. Only nodes having
256 features matching the job constraints will be used to satisfy
257 the request. Multiple constraints may be specified with AND,
258 OR, matching OR, resource counts, etc. (some operators are not
259 supported on all system types). Supported constraint options
260 include:
261
262 Single Name
263 Only nodes which have the specified feature will be used.
264 For example, --constraint="intel"
265
266 Node Count
267 A request can specify the number of nodes needed with
268 some feature by appending an asterisk and count after the
269 feature name. For example, --nodes=16 --con‐
270 straint="graphics*4 ..." indicates that the job requires
271 16 nodes and that at least four of those nodes must have
272 the feature "graphics."
273
              AND    Only nodes with all of the specified features will be
                     used.  The ampersand is used for an AND operator.  For
276 example, --constraint="intel&gpu"
277
              OR     Only nodes with at least one of the specified features
                     will be used.  The vertical bar is used for an OR opera‐
280 tor. For example, --constraint="intel|amd"
281
282 Matching OR
283 If only one of a set of possible options should be used
284 for all allocated nodes, then use the OR operator and
285 enclose the options within square brackets. For example,
286 --constraint="[rack1|rack2|rack3|rack4]" might be used to
287 specify that all nodes must be allocated on a single rack
288 of the cluster, but any of those four racks can be used.
289
290 Multiple Counts
291 Specific counts of multiple resources may be specified by
292 using the AND operator and enclosing the options within
293 square brackets. For example, --con‐
294 straint="[rack1*2&rack2*4]" might be used to specify that
295 two nodes must be allocated from nodes with the feature
296 of "rack1" and four nodes must be allocated from nodes
297 with the feature "rack2".
298
299 NOTE: This construct does not support multiple Intel KNL
300 NUMA or MCDRAM modes. For example, while --con‐
301 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
302 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
303 Specification of multiple KNL modes requires the use of a
304 heterogeneous job.
305
306 Brackets
307 Brackets can be used to indicate that you are looking for
308 a set of nodes with the different requirements contained
309 within the brackets. For example, --con‐
310 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
311 node with either the "rack1" or "rack2" features and two
312 nodes with the "rack3" feature. The same request without
313 the brackets will try to find a single node that meets
314 those requirements.
315
              Parentheses
                     Parentheses can be used to group like node features
318 together. For example, --con‐
319 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
320 specify that four nodes with the features "knl", "snc4"
321 and "flat" plus one node with the feature "haswell" are
                     required.  All options within parentheses should be
323 grouped with AND (e.g. "&") operands.
324
325
326 --contiguous
327 If set, then the allocated nodes must form a contiguous set.
328
329 NOTE: If SelectPlugin=cons_res this option won't be honored with
330 the topology/tree or topology/3d_torus plugins, both of which
331 can modify the node ordering.
332
333
334 --cores-per-socket=<cores>
335 Restrict node selection to nodes with at least the specified
336 number of cores per socket. See additional information under -B
337 option above when task/affinity plugin is enabled.
338
339
       --cpu-freq=<p1[-p2[:p3]]>
341
342 Request that job steps initiated by srun commands inside this
343 sbatch script be run at some requested frequency if possible, on
344 the CPUs selected for the step on the compute node(s).
345
346 p1 can be [#### | low | medium | high | highm1] which will set
347 the frequency scaling_speed to the corresponding value, and set
348 the frequency scaling_governor to UserSpace. See below for defi‐
349 nition of the values.
350
351 p1 can be [Conservative | OnDemand | Performance | PowerSave]
352 which will set the scaling_governor to the corresponding value.
353 The governor has to be in the list set by the slurm.conf option
354 CpuFreqGovernors.
355
356 When p2 is present, p1 will be the minimum scaling frequency and
357 p2 will be the maximum scaling frequency.
358
              p2 can be [#### | medium | high | highm1].  p2 must be greater
360 than p1.
361
362 p3 can be [Conservative | OnDemand | Performance | PowerSave |
363 UserSpace] which will set the governor to the corresponding
364 value.
365
366 If p3 is UserSpace, the frequency scaling_speed will be set by a
367 power or energy aware scheduling strategy to a value between p1
368 and p2 that lets the job run within the site's power goal. The
369 job may be delayed if p1 is higher than a frequency that allows
370 the job to run within the goal.
371
372 If the current frequency is < min, it will be set to min. Like‐
373 wise, if the current frequency is > max, it will be set to max.
374
375 Acceptable values at present include:
376
377 #### frequency in kilohertz
378
379 Low the lowest available frequency
380
381 High the highest available frequency
382
383 HighM1 (high minus one) will select the next highest
384 available frequency
385
386 Medium attempts to set a frequency in the middle of the
387 available range
388
389 Conservative attempts to use the Conservative CPU governor
390
391 OnDemand attempts to use the OnDemand CPU governor (the
392 default value)
393
394 Performance attempts to use the Performance CPU governor
395
396 PowerSave attempts to use the PowerSave CPU governor
397
398 UserSpace attempts to use the UserSpace CPU governor
399
400
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested:
                   SLURM_CPU_FREQ_REQ
405
406 This environment variable can also be used to supply the value
407 for the CPU frequency request if it is set when the 'srun' com‐
408 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value.  The form of the environ‐
410 ment variable is the same as the command line. See the ENVIRON‐
411 MENT VARIABLES section for a description of the
412 SLURM_CPU_FREQ_REQ variable.
413
414 NOTE: This parameter is treated as a request, not a requirement.
415 If the job step's node does not support setting the CPU fre‐
416 quency, or the requested value is outside the bounds of the
417 legal frequencies, an error is logged, but the job step is
418 allowed to continue.
419
420 NOTE: Setting the frequency for just the CPUs of the job step
421 implies that the tasks are confined to those CPUs. If task con‐
422 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
423 gin=task/cgroup with the "ConstrainCores" option) is not config‐
424 ured, this parameter is ignored.
425
426 NOTE: When the step completes, the frequency and governor of
427 each selected CPU is reset to the previous values.
428
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all of the accounting information will be
              present.
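
              For example, sketches requesting either the Performance
              governor or a fixed 2.4 GHz (expressed in kilohertz) for
              srun-launched steps in the script (the governor must appear in
              the CpuFreqGovernors list):

                   #SBATCH --cpu-freq=Performance
                        or
                   #SBATCH --cpu-freq=2400000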
433
434
435 --cpus-per-gpu=<ncpus>
436 Advise Slurm that ensuing job steps will require ncpus proces‐
437 sors per allocated GPU. Not compatible with the --cpus-per-task
438 option.
439
440
441 -c, --cpus-per-task=<ncpus>
442 Advise the Slurm controller that ensuing job steps will require
443 ncpus number of processors per task. Without this option, the
444 controller will just try to allocate one processor per task.
445
446 For instance, consider an application that has 4 tasks, each
              requiring 3 processors.  If our cluster is composed of
              quad-processor nodes and we simply ask for 12 processors, the
              controller might give us only 3 nodes.  However, by using the
              --cpus-per-task=3 option, the controller knows that each task
451 requires 3 processors on the same node, and the controller will
452 grant an allocation of 4 nodes, one for each of the 4 tasks.
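
              The example above as a batch script sketch (my_app is a
              placeholder):

                   #SBATCH --ntasks=4
                   #SBATCH --cpus-per-task=3
                   srun ./my_app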
453
454
455 --deadline=<OPT>
              Remove the job if no ending is possible before this deadline
457 (start > (deadline - time[-min])). Default is no deadline.
458 Valid time formats are:
459 HH:MM[:SS] [AM|PM]
460 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
461 MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]
463 now[+count[seconds(default)|minutes|hours|days|weeks]]
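
              For example, a sketch that removes the job if it cannot finish
              within the next two hours:

                   #SBATCH --deadline=now+2hours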
464
465
466 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
468 specification if the job has been eligible to run for less than
469 this time period. If the job has waited for less than the spec‐
470 ified period, it will use only nodes which already have the
471 specified features. The argument is in units of minutes. A
472 default value may be set by a system administrator using the
473 delay_boot option of the SchedulerParameters configuration
474 parameter in the slurm.conf file, otherwise the default value is
475 zero (no delay).
476
477
478 -d, --dependency=<dependency_list>
479 Defer the start of this job until the specified dependencies
              have been satisfied.  <dependency_list> is of the form
481 <type:job_id[:job_id][,type:job_id[:job_id]]> or
482 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
483 must be satisfied if the "," separator is used. Any dependency
484 may be satisfied if the "?" separator is used. Only one separa‐
485 tor may be used. Many jobs can share the same dependency and
486 these jobs may even belong to different users. The value may
487 be changed after job submission using the scontrol command.
488 Dependencies on remote jobs are allowed in a federation. Once a
489 job dependency fails due to the termination state of a preceding
490 job, the dependent job will never be run, even if the preceding
491 job is requeued and has a different termination state in a sub‐
492 sequent execution.
493
494 after:job_id[[+time][:jobid[+time]...]]
495 After the specified jobs start or are cancelled and
496 'time' in minutes from job start or cancellation happens,
497 this job can begin execution. If no 'time' is given then
498 there is no delay after start or cancellation.
499
500 afterany:job_id[:jobid...]
501 This job can begin execution after the specified jobs
502 have terminated.
503
504 afterburstbuffer:job_id[:jobid...]
505 This job can begin execution after the specified jobs
506 have terminated and any associated burst buffer stage out
507 operations have completed.
508
509 aftercorr:job_id[:jobid...]
510 A task of this job array can begin execution after the
511 corresponding task ID in the specified job has completed
512 successfully (ran to completion with an exit code of
513 zero).
514
515 afternotok:job_id[:jobid...]
516 This job can begin execution after the specified jobs
517 have terminated in some failed state (non-zero exit code,
518 node failure, timed out, etc).
519
520 afterok:job_id[:jobid...]
521 This job can begin execution after the specified jobs
522 have successfully executed (ran to completion with an
523 exit code of zero).
524
525 expand:job_id
526 Resources allocated to this job should be used to expand
527 the specified job. The job to expand must share the same
528 QOS (Quality of Service) and partition. Gang scheduling
529 of resources in the partition is also not supported.
530 "expand" is not allowed for jobs that didn't originate on
531 the same cluster as the submitted job.
532
533 singleton
534 This job can begin execution after any previously
535 launched jobs sharing the same job name and user have
536 terminated. In other words, only one job by that name
537 and owned by that user can be running or suspended at any
538 point in time. In a federation, a singleton dependency
539 must be fulfilled on all clusters unless DependencyParam‐
540 eters=disable_remote_singleton is used in slurm.conf.
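
              A sketch of a two-job pipeline chained with afterok (pre.sh
              and post.sh are placeholder scripts; the --parsable option,
              which makes sbatch print only the job ID, is assumed to be
              available in your version):

                   jobid=$(sbatch --parsable pre.sh)
                   sbatch --dependency=afterok:${jobid} post.sh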
541
542
543 -D, --chdir=<directory>
544 Set the working directory of the batch script to directory
545 before it is executed. The path can be specified as full path or
546 relative path to the directory where the command is executed.
547
548
549 -e, --error=<filename pattern>
550 Instruct Slurm to connect the batch script's standard error
551 directly to the file name specified in the "filename pattern".
552 By default both standard output and standard error are directed
553 to the same file. For job arrays, the default file name is
554 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
555 the array index. For other jobs, the default file name is
556 "slurm-%j.out", where the "%j" is replaced by the job ID. See
557 the filename pattern section below for filename specification
558 options.
559
560
561 --exclusive[=user|mcs]
562 The job allocation can not share nodes with other running jobs
563 (or just other users with the "=user" option or with the "=mcs"
564 option). The default shared/exclusive behavior depends on sys‐
565 tem configuration and the partition's OverSubscribe option takes
566 precedence over the job's option.
567
568
569 --export=<[ALL,]environment variables|ALL|NONE>
570 Identify which environment variables from the submission envi‐
571 ronment are propagated to the launched application. Note that
572 SLURM_* variables are always propagated.
573
574 --export=ALL
                  Default mode if --export is not specified.  All of the
                  user's environment will be loaded (either from the
                  caller's environment or from a clean environment if
                  --get-user-env is specified).
579
580 --export=NONE
581 Only SLURM_* variables from the user environment will
582 be defined. User must use absolute path to the binary
583 to be executed that will define the environment. User
584 can not specify explicit environment variables with
585 NONE. --get-user-env will be ignored.
586 This option is particularly important for jobs that
587 are submitted on one cluster and execute on a differ‐
588 ent cluster (e.g. with different paths). To avoid
589 steps inheriting environment export settings (e.g.
590 NONE) from sbatch command, the environment variable
591 SLURM_EXPORT_ENV should be set to ALL in the job
592 script.
593
594 --export=<[ALL,]environment variables>
595 Exports all SLURM_* environment variables along with
596 explicitly defined variables. Multiple environment
597 variable names should be comma separated. Environment
598 variable names may be specified to propagate the cur‐
599 rent value (e.g. "--export=EDITOR") or specific values
600 may be exported (e.g. "--export=EDITOR=/bin/emacs").
601 If ALL is specified, then all user environment vari‐
602 ables will be loaded and will take precedence over any
603 explicitly given environment variables.
604
605 Example: --export=EDITOR,ARG1=test
606 In this example, the propagated environment will only
607 contain the variable EDITOR from the user's environ‐
608 ment, SLURM_* environment variables, and ARG1=test.
609
610 Example: --export=ALL,EDITOR=/bin/emacs
611 There are two possible outcomes for this example. If
612 the caller has the EDITOR environment variable
613 defined, then the job's environment will inherit the
614 variable from the caller's environment. If the caller
615 doesn't have an environment variable defined for EDI‐
616 TOR, then the job's environment will use the value
617 given by --export.
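
              For example, a sketch of a job submitted with --export=NONE
              whose own srun steps should still receive a full environment,
              following the SLURM_EXPORT_ENV advice above (job.sh and my_app
              are placeholders):

                   sbatch --export=NONE job.sh

                   # inside job.sh:
                   export SLURM_EXPORT_ENV=ALL
                   srun ./my_app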
618
619
620 --export-file=<filename | fd>
621 If a number between 3 and OPEN_MAX is specified as the argument
622 to this option, a readable file descriptor will be assumed
623 (STDIN and STDOUT are not supported as valid arguments). Other‐
624 wise a filename is assumed. Export environment variables
625 defined in <filename> or read from <fd> to the job's execution
626 environment. The content is one or more environment variable
627 definitions of the form NAME=value, each separated by a null
628 character. This allows the use of special characters in envi‐
629 ronment definitions.
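
              For example, a sketch that builds a null-separated definition
              file with the shell's printf and passes it to sbatch (the
              variable names and job.sh are placeholders):

                   printf 'EDITOR=/bin/emacs\0ARG1=test\0' > env.list
                   sbatch --export-file=env.list job.sh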
630
631
632 -F, --nodefile=<node file>
633 Much like --nodelist, but the list is contained in a file of
634 name node file. The node names of the list may also span multi‐
635 ple lines in the file. Duplicate node names in the file will
636 be ignored. The order of the node names in the list is not
637 important; the node names will be sorted by Slurm.
638
639
640 --get-user-env[=timeout][mode]
641 This option will tell sbatch to retrieve the login environment
642 variables for the user specified in the --uid option. The envi‐
643 ronment variables are retrieved by running something of this
644 sort "su - <username> -c /usr/bin/env" and parsing the output.
645 Be aware that any environment variables already set in sbatch's
646 environment will take precedence over any environment variables
647 in the user's login environment. Clear any environment variables
648 before calling sbatch that you do not want propagated to the
649 spawned program. The optional timeout value is in seconds.
              Default value is 8 seconds.  The optional mode value controls the
651 "su" options. With a mode value of "S", "su" is executed with‐
652 out the "-" option. With a mode value of "L", "su" is executed
              with the "-" option, replicating the login environment.  If mode
              is not specified, the mode established at Slurm build time is
              used.  Examples of use include "--get-user-env",
              "--get-user-env=10", "--get-user-env=10L", and
              "--get-user-env=S".
657
658
659 --gid=<group>
660 If sbatch is run as root, and the --gid option is used, submit
661 the job with group's group access permissions. group may be the
662 group name or the numerical group ID.
663
664
665 -G, --gpus=[<type>:]<number>
666 Specify the total number of GPUs required for the job. An
667 optional GPU type specification can be supplied. For example
668 "--gpus=volta:3". Multiple options can be requested in a comma
669 separated list, for example: "--gpus=volta:3,kepler:1". See
670 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
671 options.
672
673
674 --gpu-bind=[verbose,]<type>
675 Bind tasks to specific GPUs. By default every spawned task can
676 access every GPU allocated to the job. If "verbose," is speci‐
677 fied before <type>, then print out GPU binding information.
678
679 Supported type options:
680
681 closest Bind each task to the GPU(s) which are closest. In a
682 NUMA environment, each task may be bound to more than
683 one GPU (i.e. all GPUs in that NUMA environment).
684
685 map_gpu:<list>
686 Bind by setting GPU masks on tasks (or ranks) as spec‐
687 ified where <list> is
688 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
689 are interpreted as decimal values unless they are pre‐
                     ceded with '0x', in which case they are interpreted as
691 hexadecimal values. If the number of tasks (or ranks)
692 exceeds the number of elements in this list, elements
693 in the list will be reused as needed starting from the
694 beginning of the list. To simplify support for large
695 task counts, the lists may follow a map with an aster‐
696 isk and repetition count. For example
697 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
698 and ConstrainDevices is set in cgroup.conf, then the
699 GPU IDs are zero-based indexes relative to the GPUs
700 allocated to the job (e.g. the first GPU is 0, even if
701 the global ID is 3). Otherwise, the GPU IDs are global
702 IDs, and all GPUs on each node in the job should be
703 allocated for predictable binding results.
704
705 mask_gpu:<list>
706 Bind by setting GPU masks on tasks (or ranks) as spec‐
707 ified where <list> is
708 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
709 mapping is specified for a node and identical mapping
710 is applied to the tasks on every node (i.e. the lowest
711 task ID on each node is mapped to the first mask spec‐
712 ified in the list, etc.). GPU masks are always inter‐
713 preted as hexadecimal values but can be preceded with
714 an optional '0x'. To simplify support for large task
715 counts, the lists may follow a map with an asterisk
716 and repetition count. For example
717 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
718 is used and ConstrainDevices is set in cgroup.conf,
719 then the GPU IDs are zero-based indexes relative to
720 the GPUs allocated to the job (e.g. the first GPU is
721 0, even if the global ID is 3). Otherwise, the GPU IDs
722 are global IDs, and all GPUs on each node in the job
723 should be allocated for predictable binding results.
724
725 single:<tasks_per_gpu>
726 Like --gpu-bind=closest, except that each task can
727 only be bound to a single GPU, even when it can be
728 bound to multiple GPUs that are equally close. The
729 GPU to bind to is determined by <tasks_per_gpu>, where
730 the first <tasks_per_gpu> tasks are bound to the first
731 GPU available, the second <tasks_per_gpu> tasks are
732 bound to the second GPU available, etc. This is basi‐
733 cally a block distribution of tasks onto available
734 GPUs, where the available GPUs are determined by the
735 socket affinity of the task and the socket affinity of
736 the GPUs as specified in gres.conf's Cores parameter.
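
              For example, a sketch that binds each task to its closest
              allocated GPU and prints the resulting binding (it assumes
              GPUs have been requested with one of the --gpus* or --gres
              options):

                   #SBATCH --gres=gpu:4
                   #SBATCH --gpu-bind=verbose,closest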
737
738
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
740 Request that GPUs allocated to the job are configured with spe‐
741 cific frequency values. This option can be used to indepen‐
742 dently configure the GPU and its memory frequencies. After the
743 job is completed, the frequencies of all affected GPUs will be
744 reset to the highest possible values. In some cases, system
745 power caps may override the requested values. The field type
746 can be "memory". If type is not specified, the GPU frequency is
747 implied. The value field can either be "low", "medium", "high",
748 "highm1" or a numeric value in megahertz (MHz). If the speci‐
749 fied numeric value is not possible, a value as close as possible
750 will be used. See below for definition of the values. The ver‐
751 bose option causes current GPU frequency information to be
752 logged. Examples of use include "--gpu-freq=medium,memory=high"
753 and "--gpu-freq=450".
754
755 Supported value definitions:
756
757 low the lowest available frequency.
758
759 medium attempts to set a frequency in the middle of the
760 available range.
761
762 high the highest available frequency.
763
764 highm1 (high minus one) will select the next highest avail‐
765 able frequency.
766
767
768 --gpus-per-node=[<type>:]<number>
769 Specify the number of GPUs required for the job on each node
770 included in the job's resource allocation. An optional GPU type
771 specification can be supplied. For example
772 "--gpus-per-node=volta:3". Multiple options can be requested in
773 a comma separated list, for example:
774 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
775 --gpus-per-socket and --gpus-per-task options.
776
777
778 --gpus-per-socket=[<type>:]<number>
779 Specify the number of GPUs required for the job on each socket
780 included in the job's resource allocation. An optional GPU type
781 specification can be supplied. For example
782 "--gpus-per-socket=volta:3". Multiple options can be requested
783 in a comma separated list, for example:
784 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
785 sockets per node count ( --sockets-per-node). See also the
786 --gpus, --gpus-per-node and --gpus-per-task options.
787
788
789 --gpus-per-task=[<type>:]<number>
790 Specify the number of GPUs required for the job on each task to
791 be spawned in the job's resource allocation. An optional GPU
792 type specification can be supplied. For example
793 "--gpus-per-task=volta:1". Multiple options can be requested in
794 a comma separated list, for example:
795 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
796 --gpus-per-socket and --gpus-per-node options. This option
797 requires an explicit task count, e.g. -n, --ntasks or "--gpus=X
798 --gpus-per-task=Y" rather than an ambiguous range of nodes with
799 -N, --nodes.
800 NOTE: This option will not have any impact on GPU binding,
801 specifically it won't limit the number of devices set for
802 CUDA_VISIBLE_DEVICES.
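
              For example, a sketch satisfying the explicit task count
              requirement noted above by pairing the option with --ntasks:

                   #SBATCH --ntasks=8
                   #SBATCH --gpus-per-task=1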
803
804
805 --gres=<list>
806 Specifies a comma delimited list of generic consumable
807 resources. The format of each entry on the list is
808 "name[[:type]:count]". The name is that of the consumable
809 resource. The count is the number of those resources with a
810 default value of 1. The count can have a suffix of "k" or "K"
811 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
812 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
813 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
814 x 1024 x 1024 x 1024). The specified resources will be allo‐
815 cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator.  A list
817 of available generic consumable resources will be printed and
818 the command will exit if the option argument is "help". Exam‐
819 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
820 and "--gres=help".
821
822
823 --gres-flags=<type>
824 Specify generic resource task binding options.
825
826 disable-binding
827 Disable filtering of CPUs with respect to generic
828 resource locality. This option is currently required to
829 use more CPUs than are bound to a GRES (i.e. if a GPU is
830 bound to the CPUs on one socket, but resources on more
831 than one socket are required to run the job). This
832 option may permit a job to be allocated resources sooner
833 than otherwise possible, but may result in lower job per‐
834 formance.
835 NOTE: This option is specific to SelectType=cons_res.
836
837 enforce-binding
838 The only CPUs available to the job will be those bound to
839 the selected GRES (i.e. the CPUs identified in the
840 gres.conf file will be strictly enforced). This option
841 may result in delayed initiation of a job. For example a
842 job requiring two GPUs and one CPU will be delayed until
843 both GPUs on a single socket are available rather than
844 using GPUs bound to separate sockets, however, the appli‐
845 cation performance may be improved due to improved commu‐
846 nication speed. Requires the node to be configured with
847 more than one socket and resource filtering will be per‐
848 formed on a per-socket basis.
849 NOTE: This option is specific to SelectType=cons_tres.
850
851
852 -H, --hold
853 Specify the job is to be submitted in a held state (priority of
854 zero). A held job can now be released using scontrol to reset
855 its priority (e.g. "scontrol release <job_id>").
856
857
858 -h, --help
859 Display help information and exit.
860
861
862 --hint=<type>
863 Bind tasks according to application hints.
864 NOTE: This option cannot be used in conjunction with
865 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
866 fied as a command line argument, it will take precedence over
867 the environment.
868
869 compute_bound
870 Select settings for compute bound applications: use all
871 cores in each socket, one thread per core.
872
873 memory_bound
874 Select settings for memory bound applications: use only
875 one core in each socket, one thread per core.
876
877 [no]multithread
878 [don't] use extra threads with in-core multi-threading
879 which can benefit communication intensive applications.
880 Only supported with the task/affinity plugin.
881
882 help show this help message
883
884
885 --ignore-pbs
886 Ignore all "#PBS" and "#BSUB" options specified in the batch
887 script.
888
889
890 -i, --input=<filename pattern>
891 Instruct Slurm to connect the batch script's standard input
892 directly to the file name specified in the "filename pattern".
893
894 By default, "/dev/null" is open on the batch script's standard
895 input and both standard output and standard error are directed
896 to a file of the name "slurm-%j.out", where the "%j" is replaced
897 with the job allocation number, as described below in the file‐
898 name pattern section.
899
900
901 -J, --job-name=<jobname>
902 Specify a name for the job allocation. The specified name will
903 appear along with the job id number when querying running jobs
904 on the system. The default is the name of the batch script, or
905 just "sbatch" if the script is read on sbatch's standard input.
906
907
908 -k, --no-kill [=off]
909 Do not automatically terminate a job if one of the nodes it has
910 been allocated fails. The user will assume the responsibilities
911 for fault-tolerance should a node fail. When there is a node
912 failure, any active job steps (usually MPI jobs) on that node
913 will almost certainly suffer a fatal error, but with --no-kill,
914 the job allocation will not be revoked so the user may launch
915 new job steps on the remaining nodes in their allocation.
916
              Specify an optional argument of "off" to disable the effect of the
918 SBATCH_NO_KILL environment variable.
919
920 By default Slurm terminates the entire job allocation if any
921 node fails in its range of allocated nodes.
922
923
924 --kill-on-invalid-dep=<yes|no>
              If a job has an invalid dependency and can never run, this
              parameter tells Slurm whether or not to terminate it.  A
              terminated job
927 state will be JOB_CANCELLED. If this option is not specified
928 the system wide behavior applies. By default the job stays
929 pending with reason DependencyNeverSatisfied or if the
930 kill_invalid_depend is specified in slurm.conf the job is termi‐
931 nated.
932
933
934 -L, --licenses=<license>
935 Specification of licenses (or other resources available on all
936 nodes of the cluster) which must be allocated to this job.
937 License names can be followed by a colon and count (the default
938 count is one). Multiple license names should be comma separated
939 (e.g. "--licenses=foo:4,bar"). To submit jobs using remote
940 licenses, those served by the slurmdbd, specify the name of the
941 server providing the licenses. For example "--license=nas‐
942 tran@slurmdb:12".
943
944
945 -M, --clusters=<string>
946 Clusters to issue commands to. Multiple cluster names may be
947 comma separated. The job will be submitted to the one cluster
948 providing the earliest expected job initiation time. The default
949 value is the current cluster. A value of 'all' will query to run
950 on all clusters. Note the --export option to control environ‐
951 ment variables exported between clusters. Note that the Slur‐
952 mDBD must be up for this option to work properly.
953
954
955 -m, --distribution=
956 arbitrary|<block|cyclic|plane=<options>[:block|cyclic|fcyclic]>
957
958 Specify alternate distribution methods for remote processes. In
959 sbatch, this only sets environment variables that will be used
960 by subsequent srun requests. This option controls the assign‐
961 ment of tasks to the nodes on which resources have been allo‐
962 cated, and the distribution of those resources to tasks for
963 binding (task affinity). The first distribution method (before
964 the ":") controls the distribution of resources across nodes.
965 The optional second distribution method (after the ":") controls
966 the distribution of resources across sockets within a node.
967 Note that with select/cons_res, the number of cpus allocated on
968 each socket and node may be different. Refer to
969 https://slurm.schedmd.com/mc_support.html for more information
970 on resource allocation, assignment of tasks to nodes, and bind‐
971 ing of tasks to CPUs.
972
973 First distribution method:
974
975 block The block distribution method will distribute tasks to a
976 node such that consecutive tasks share a node. For exam‐
977 ple, consider an allocation of three nodes each with two
978 cpus. A four-task block distribution request will dis‐
979 tribute those tasks to the nodes with tasks one and two
980 on the first node, task three on the second node, and
981 task four on the third node. Block distribution is the
982 default behavior if the number of tasks exceeds the num‐
983 ber of allocated nodes.
984
985 cyclic The cyclic distribution method will distribute tasks to a
986 node such that consecutive tasks are distributed over
987 consecutive nodes (in a round-robin fashion). For exam‐
988 ple, consider an allocation of three nodes each with two
989 cpus. A four-task cyclic distribution request will dis‐
990 tribute those tasks to the nodes with tasks one and four
991 on the first node, task two on the second node, and task
992 three on the third node. Note that when SelectType is
993 select/cons_res, the same number of CPUs may not be allo‐
994 cated on each node. Task distribution will be round-robin
995 among all the nodes with CPUs yet to be assigned to
996 tasks. Cyclic distribution is the default behavior if
997 the number of tasks is no larger than the number of allo‐
998 cated nodes.
999
1000 plane The tasks are distributed in blocks of a specified size.
1001 The number of tasks distributed to each node is the same
1002 as for cyclic distribution, but the taskids assigned to
1003 each node depend on the plane size. Additional distribu‐
1004 tion specifications cannot be combined with this option.
1005 For more details (including examples and diagrams),
1006 please see
1007 https://slurm.schedmd.com/mc_support.html
1008 and
1009 https://slurm.schedmd.com/dist_plane.html
1010
1011 arbitrary
1012 The arbitrary method of distribution will allocate pro‐
1013 cesses in-order as listed in file designated by the envi‐
1014 ronment variable SLURM_HOSTFILE. If this variable is
1015 listed it will override any other method specified. If
                     not set, the method will default to block.  The
                     hostfile must contain at minimum the number of hosts
                     requested, with one host per line or comma separated.  If
1019 specifying a task count (-n, --ntasks=<number>), your
1020 tasks will be laid out on the nodes in the order of the
1021 file.
1022 NOTE: The arbitrary distribution option on a job alloca‐
1023 tion only controls the nodes to be allocated to the job
1024 and not the allocation of CPUs on those nodes. This
1025 option is meant primarily to control a job step's task
1026 layout in an existing job allocation for the srun com‐
1027 mand.
1028
1029
1030 Second distribution method:
1031
1032 block The block distribution method will distribute tasks to
1033 sockets such that consecutive tasks share a socket.
1034
1035 cyclic The cyclic distribution method will distribute tasks to
1036 sockets such that consecutive tasks are distributed over
1037 consecutive sockets (in a round-robin fashion). Tasks
1038 requiring more than one CPU will have all of those CPUs
1039 allocated on a single socket if possible.
1040
1041 fcyclic
1042 The fcyclic distribution method will distribute tasks to
1043 sockets such that consecutive tasks are distributed over
1044 consecutive sockets (in a round-robin fashion). Tasks
                     requiring more than one CPU will have each of those CPUs
1046 in a cyclic fashion across sockets.
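
              For example, a sketch distributing consecutive tasks over
              consecutive nodes while packing consecutive tasks onto the
              same socket within each node (in sbatch this only sets
              environment variables consumed by later srun calls, as noted
              above):

                   #SBATCH --distribution=cyclic:block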
1047
1048
1049 --mail-type=<type>
1050 Notify user by email when certain event types occur. Valid type
1051 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1052 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT),
1053 INVALID_DEPEND (dependency never satisfied), STAGE_OUT (burst
1054 buffer stage out and teardown completed), TIME_LIMIT,
1055 TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80
1056 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50
1057 percent of time limit) and ARRAY_TASKS (send emails for each
1058 array task). Multiple type values may be specified in a comma
1059 separated list. The user to be notified is indicated with
1060 --mail-user. Unless the ARRAY_TASKS option is specified, mail
1061 notifications on job BEGIN, END and FAIL apply to a job array as
1062 a whole rather than generating individual email messages for
1063 each task in the job array.
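
              For example, a sketch requesting mail only when the job ends
              or fails (the address is a placeholder):

                   #SBATCH --mail-type=END,FAIL
                   #SBATCH --mail-user=user@example.com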
1064
1065
1066 --mail-user=<user>
1067 User to receive email notification of state changes as defined
1068 by --mail-type. The default value is the submitting user.
1069
1070
1071 --mcs-label=<mcs>
1072 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the groups of the user.  The default value is
              calculated by the mcs plugin if it is enabled.
1075
1076
1077 --mem=<size[units]>
1078 Specify the real memory required per node. Default units are
1079 megabytes. Different units can be specified using the suffix
1080 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
1081 is MaxMemPerNode. If configured, both parameters can be seen
1082 using the scontrol show config command. This parameter would
1083 generally be used if whole nodes are allocated to jobs (Select‐
1084 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1085 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1086 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1087 fied as command line arguments, then they will take precedence
1088 over the environment.
1089
1090 NOTE: A memory size specification of zero is treated as a spe‐
1091 cial case and grants the job access to all of the memory on each
1092 node. If the job is allocated multiple nodes in a heterogeneous
1093 cluster, the memory limit on each node will be that of the node
1094 in the allocation with the smallest memory size (same limit will
1095 apply to every node in the job's allocation).
1096
1097 NOTE: Enforcement of memory limits currently relies upon the
1098 task/cgroup plugin or enabling of accounting, which samples mem‐
1099 ory use on a periodic basis (data need not be stored, just col‐
1100 lected). In both cases memory use is based upon the job's Resi‐
1101 dent Set Size (RSS). A task may exceed the memory limit until
1102 the next periodic accounting sample.
1103
1104
1105 --mem-per-cpu=<size[units]>
1106 Minimum memory required per allocated CPU. Default units are
1107 megabytes. The default value is DefMemPerCPU and the maximum
1108 value is MaxMemPerCPU (see exception below). If configured, both
1109 parameters can be seen using the scontrol show config command.
1110 Note that if the job's --mem-per-cpu value exceeds the config‐
1111 ured MaxMemPerCPU, then the user's limit will be treated as a
1112 memory limit per task; --mem-per-cpu will be reduced to a value
1113 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1114 value of --cpus-per-task multiplied by the new --mem-per-cpu
1115 value will equal the original --mem-per-cpu value specified by
1116 the user. This parameter would generally be used if individual
1117 processors are allocated to jobs (SelectType=select/cons_res).
1118 If resources are allocated by core, socket, or whole nodes, then
1119 the number of CPUs allocated to a job may be higher than the
1120 task count and the value of --mem-per-cpu should be adjusted
1121 accordingly. Also see --mem and --mem-per-gpu. The --mem,
1122 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1123
1124 NOTE: If the final amount of memory requested by a job can't be
1125 satisfied by any of the nodes configured in the partition, the
1126 job will be rejected. This could happen if --mem-per-cpu is
1127 used with the --exclusive option for a job allocation and
1128 --mem-per-cpu times the number of CPUs on a node is greater than
1129 the total memory of that node.
1130
1131
1132 --mem-per-gpu=<size[units]>
1133 Minimum memory required per allocated GPU. Default units are
1134 megabytes. Different units can be specified using the suffix
1135 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1136 both a global and per partition basis. If configured, the
1137 parameters can be seen using the scontrol show config and scon‐
1138 trol show partition commands. Also see --mem. The --mem,
1139 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1140
1141
1142 --mem-bind=[{quiet,verbose},]type
1143 Bind tasks to memory. Used only when the task/affinity plugin is
1144 enabled and the NUMA memory functions are available. Note that
1145 the resolution of CPU and memory binding may differ on some
1146 architectures. For example, CPU binding may be performed at the
1147 level of the cores within a processor while memory binding will
1148 be performed at the level of nodes, where the definition of
1149 "nodes" may differ from system to system. By default no memory
1150 binding is performed; any task using any CPU can use any memory.
1151 This option is typically used to ensure that each task is bound
1152 to the memory closest to its assigned CPU. The use of any type
1153 other than "none" or "local" is not recommended.
1154
1155 NOTE: To have Slurm always report on the selected memory binding
1156 for all commands executed in a shell, you can enable verbose
1157 mode by setting the SLURM_MEM_BIND environment variable value to
1158 "verbose".
1159
1160 The following informational environment variables are set when
1161 --mem-bind is in use:
1162
1163 SLURM_MEM_BIND_LIST
1164 SLURM_MEM_BIND_PREFER
1165 SLURM_MEM_BIND_SORT
1166 SLURM_MEM_BIND_TYPE
1167 SLURM_MEM_BIND_VERBOSE
1168
1169 See the ENVIRONMENT VARIABLES section for a more detailed
1170 description of the individual SLURM_MEM_BIND* variables.
1171
1172 Supported options include:
1173
1174 help show this help message
1175
1176 local Use memory local to the processor in use
1177
1178 map_mem:<list>
1179 Bind by setting memory masks on tasks (or ranks) as spec‐
1180 ified where <list> is
1181 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1182 ping is specified for a node and identical mapping is
1183 applied to the tasks on every node (i.e. the lowest task
1184 ID on each node is mapped to the first ID specified in
1185 the list, etc.). NUMA IDs are interpreted as decimal
                     values unless they are preceded with '0x', in which case
                     they are interpreted as hexadecimal values.  If the number of
1188 tasks (or ranks) exceeds the number of elements in this
1189 list, elements in the list will be reused as needed
1190 starting from the beginning of the list. To simplify
1191 support for large task counts, the lists may follow a map
1192 with an asterisk and repetition count. For example
1193 "map_mem:0x0f*4,0xf0*4". For predictable binding
1194 results, all CPUs for each node in the job should be
1195 allocated to the job.
1196
1197 mask_mem:<list>
1198 Bind by setting memory masks on tasks (or ranks) as spec‐
1199 ified where <list> is
1200 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1201 mapping is specified for a node and identical mapping is
1202 applied to the tasks on every node (i.e. the lowest task
1203 ID on each node is mapped to the first mask specified in
1204 the list, etc.). NUMA masks are always interpreted as
1205 hexadecimal values. Note that masks must be preceded
1206 with a '0x' if they don't begin with [0-9] so they are
1207 seen as numerical values. If the number of tasks (or
1208 ranks) exceeds the number of elements in this list, ele‐
1209 ments in the list will be reused as needed starting from
1210 the beginning of the list. To simplify support for large
1211 task counts, the lists may follow a mask with an asterisk
1212 and repetition count. For example "mask_mem:0*4,1*4".
1213 For predictable binding results, all CPUs for each node
1214 in the job should be allocated to the job.
1215
1216 no[ne] don't bind tasks to memory (default)
1217
1218 p[refer]
1219 Prefer use of first specified NUMA node, but permit
1220 use of other available NUMA nodes.
1221
1222 q[uiet]
1223 quietly bind before task runs (default)
1224
1225 rank bind by task rank (not recommended)
1226
1227 sort sort free cache pages (run zonesort on Intel KNL nodes)
1228
1229 v[erbose]
1230 verbosely report binding before task runs
1231
1232
1233 --mincpus=<n>
1234 Specify a minimum number of logical cpus/processors per node.
1235
1236
1237 -N, --nodes=<minnodes[-maxnodes]>
1238 Request that a minimum of minnodes nodes be allocated to this
1239 job. A maximum node count may also be specified with maxnodes.
1240 If only one number is specified, this is used as both the mini‐
1241 mum and maximum node count. The partition's node limits super‐
1242 sede those of the job. If a job's node limits are outside of
1243 the range permitted for its associated partition, the job will
1244 be left in a PENDING state. This permits possible execution at
1245 a later time, when the partition limit is changed. If a job
1246 node limit exceeds the number of nodes configured in the parti‐
1247 tion, the job will be rejected. Note that the environment vari‐
1248 able SLURM_JOB_NODES will be set to the count of nodes actually
1249 allocated to the job. See the ENVIRONMENT VARIABLES section for
1250 more information. If -N is not specified, the default behavior
1251 is to allocate enough nodes to satisfy the requirements of the
1252 -n and -c options. The job will be allocated as many nodes as
1253 possible within the range specified and without delaying the
1254 initiation of the job. The node count specification may include
1255 a numeric value followed by a suffix of "k" (multiplies numeric
1256 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
1257
1258
1259 -n, --ntasks=<number>
1260 sbatch does not launch tasks, it requests an allocation of
1261 resources and submits a batch script. This option advises the
1262 Slurm controller that job steps run within the allocation will
1263 launch a maximum of number tasks and to provide for sufficient
1264 resources. The default is one task per node, but note that the
1265 --cpus-per-task option will change this default.
1266
1267
1268 --network=<type>
1269 Specify information pertaining to the switch or network. The
1270 interpretation of type is system dependent. This option is sup‐
1271 ported when running Slurm on a Cray natively. It is used to
1272 request using Network Performance Counters. Only one value per
              request is valid. All options are case insensitive. In this
1274 configuration supported values include:
1275
1276 system
1277 Use the system-wide network performance counters. Only
1278 nodes requested will be marked in use for the job alloca‐
                   tion. If the job does not fill up the entire system, the
                   rest of the nodes cannot be used by other jobs using NPC;
                   if idle, their state will appear as PerfCnts. These nodes
                   are still available for other jobs not using NPC.
1284
1285 blade Use the blade network performance counters. Only nodes
1286 requested will be marked in use for the job allocation.
                   If the job does not fill up the entire blade(s) allocated
                   to the job, those blade(s) cannot be used by other jobs
                   using NPC; if idle, their state will appear as PerfCnts.
                   These nodes are still available for other jobs not using
                   NPC.
1292
1293
1294 In all cases the job allocation request must specify the
1295 --exclusive option. Otherwise the request will be denied.
1296
1297 Also with any of these options steps are not allowed to share
1298 blades, so resources would remain idle inside an allocation if
1299 the step running on a blade does not take up all the nodes on
1300 the blade.
1301
1302 The network option is also supported on systems with IBM's Par‐
1303 allel Environment (PE). See IBM's LoadLeveler job command key‐
1304 word documentation about the keyword "network" for more informa‐
1305 tion. Multiple values may be specified in a comma separated
              list. All options are case insensitive. Supported values
1307 include:
1308
1309 BULK_XFER[=<resources>]
1310 Enable bulk transfer of data using Remote
1311 Direct-Memory Access (RDMA). The optional resources
1312 specification is a numeric value which can have a
1313 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1314 bytes, megabytes or gigabytes. NOTE: The resources
1315 specification is not supported by the underlying IBM
1316 infrastructure as of Parallel Environment version
1317 2.2 and no value should be specified at this time.
1318
1319 CAU=<count> Number of Collective Acceleration Units (CAU)
1320 required. Applies only to IBM Power7-IH processors.
1321 Default value is zero. Independent CAU will be
1322 allocated for each programming interface (MPI, LAPI,
1323 etc.)
1324
1325 DEVNAME=<name>
1326 Specify the device name to use for communications
1327 (e.g. "eth0" or "mlx4_0").
1328
1329 DEVTYPE=<type>
1330 Specify the device type to use for communications.
1331 The supported values of type are: "IB" (InfiniBand),
1332 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1333 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1334 nel Emulation of HPCE). The devices allocated to a
1335 job must all be of the same type. The default value
                          depends upon what hardware is available and, in
                          order of preference, is IPONLY (which is not
                          considered in User Space mode), HFI, IB, HPCE, and
                          KMUX.
1340
1341 IMMED =<count>
1342 Number of immediate send slots per window required.
1343 Applies only to IBM Power7-IH processors. Default
1344 value is zero.
1345
1346 INSTANCES =<count>
                          Specify the number of network connections for each
                          task on each network. The default instance count is
                          1.
1350
1351 IPV4 Use Internet Protocol (IP) version 4 communications
1352 (default).
1353
1354 IPV6 Use Internet Protocol (IP) version 6 communications.
1355
1356 LAPI Use the LAPI programming interface.
1357
1358 MPI Use the MPI programming interface. MPI is the
1359 default interface.
1360
1361 PAMI Use the PAMI programming interface.
1362
1363 SHMEM Use the OpenSHMEM programming interface.
1364
1365 SN_ALL Use all available switch networks (default).
1366
1367 SN_SINGLE Use one available switch network.
1368
1369 UPC Use the UPC programming interface.
1370
1371 US Use User Space communications.
1372
1373
1374 Some examples of network specifications:
1375
1376 Instances=2,US,MPI,SN_ALL
1377 Create two user space connections for MPI communica‐
1378 tions on every switch network for each task.
1379
1380 US,MPI,Instances=3,Devtype=IB
1381 Create three user space connections for MPI communi‐
1382 cations on every InfiniBand network for each task.
1383
1384 IPV4,LAPI,SN_Single
                          Create an IP version 4 connection for LAPI communica‐
1386 tions on one switch network for each task.
1387
1388 Instances=2,US,LAPI,MPI
1389 Create two user space connections each for LAPI and
1390 MPI communications on every switch network for each
1391 task. Note that SN_ALL is the default option so
1392 every switch network is used. Also note that
1393 Instances=2 specifies that two connections are
1394 established for each protocol (LAPI and MPI) and
1395 each task. If there are two networks and four tasks
1396 on the node then a total of 32 connections are
1397 established (2 instances x 2 protocols x 2 networks
1398 x 4 tasks).
1399
1400
1401 --nice[=adjustment]
1402 Run the job with an adjusted scheduling priority within Slurm.
1403 With no adjustment value the scheduling priority is decreased by
              100. A negative nice value increases the priority; a positive
              value decreases it. The adjustment range is +/- 2147483645.
              Only privileged users can specify a negative adjustment.
1407
1408
1409 --no-requeue
1410 Specifies that the batch job should never be requeued under any
              circumstances. Setting this option will prevent the job from
              being restarted by system administrators (for example, after a
              scheduled downtime), recovered after a node failure, or requeued
              upon preemption by a higher priority job. When a job is
1415 requeued, the batch script is initiated from its beginning.
1416 Also see the --requeue option. The JobRequeue configuration
1417 parameter controls the default behavior on the cluster.
1418
1419
1420 --ntasks-per-core=<ntasks>
1421 Request the maximum ntasks be invoked on each core. Meant to be
1422 used with the --ntasks option. Related to --ntasks-per-node
1423 except at the core level instead of the node level. NOTE: This
1424 option is not supported unless SelectType=cons_res is configured
1425 (either directly or indirectly on Cray systems) along with the
1426 node's core count.
1427
1428
1429 --ntasks-per-gpu=<ntasks>
1430 Request that there are ntasks tasks invoked for every GPU. This
1431 option can work in two ways: 1) either specify --ntasks in addi‐
1432 tion, in which case a type-less GPU specification will be auto‐
1433 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1434 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1435 --ntasks, and the total task count will be automatically deter‐
1436 mined. The number of CPUs needed will be automatically
1437 increased if necessary to allow for any calculated task count.
1438 This option will implicitly set --gpu-bind=single:<ntasks>, but
1439 that can be overridden with an explicit --gpu-bind specifica‐
1440 tion. This option is not compatible with a node range (i.e.
1441 -N<minnodes-maxnodes>). This option is not compatible with
1442 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1443 option is not supported unless SelectType=cons_tres is config‐
1444 ured (either directly or indirectly on Cray systems).
1445
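              For example, assuming a cluster configured with
              SelectType=cons_tres and GPU-equipped nodes, the following
              requests four GPUs with two tasks per GPU, for eight tasks in
              total:

              #SBATCH --gpus=4
              #SBATCH --ntasks-per-gpu=2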
1446
1447 --ntasks-per-node=<ntasks>
1448 Request that ntasks be invoked on each node. If used with the
1449 --ntasks option, the --ntasks option will take precedence and
1450 the --ntasks-per-node will be treated as a maximum count of
1451 tasks per node. Meant to be used with the --nodes option. This
1452 is related to --cpus-per-task=ncpus, but does not require knowl‐
1453 edge of the actual number of cpus on each node. In some cases,
1454 it is more convenient to be able to request that no more than a
1455 specific number of tasks be invoked on each node. Examples of
1456 this include submitting a hybrid MPI/OpenMP app where only one
1457 MPI "task/rank" should be assigned to each node while allowing
1458 the OpenMP portion to utilize all of the parallelism present in
1459 the node, or submitting a single setup/cleanup/monitoring job to
1460 each node of a pre-existing allocation as one step in a larger
1461 job script.
1462
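              A sketch of the hybrid MPI/OpenMP case described above, with a
              hypothetical application name and illustrative sizes:

              #!/bin/sh
              #SBATCH --nodes=4
              #SBATCH --ntasks-per-node=1
              #SBATCH --cpus-per-task=8
              export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
              srun ./hybrid_app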
1463
1464 --ntasks-per-socket=<ntasks>
1465 Request the maximum ntasks be invoked on each socket. Meant to
1466 be used with the --ntasks option. Related to --ntasks-per-node
1467 except at the socket level instead of the node level. NOTE:
1468 This option is not supported unless SelectType=cons_res is con‐
1469 figured (either directly or indirectly on Cray systems) along
1470 with the node's socket count.
1471
1472
1473 -O, --overcommit
1474 Overcommit resources. When applied to job allocation, only one
1475 CPU is allocated to the job per node and options used to specify
1476 the number of tasks per node, socket, core, etc. are ignored.
1477 When applied to job step allocations (the srun command when exe‐
1478 cuted within an existing job allocation), this option can be
1479 used to launch more than one task per CPU. Normally, srun will
1480 not allocate more than one process per CPU. By specifying
1481 --overcommit you are explicitly allowing more than one process
1482 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1483 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1484 in the file slurm.h and is not a variable, it is set at Slurm
1485 build time.
1486
1487
1488 -o, --output=<filename pattern>
1489 Instruct Slurm to connect the batch script's standard output
1490 directly to the file name specified in the "filename pattern".
1491 By default both standard output and standard error are directed
1492 to the same file. For job arrays, the default file name is
1493 "slurm-%A_%a.out", "%A" is replaced by the job ID and "%a" with
1494 the array index. For other jobs, the default file name is
1495 "slurm-%j.out", where the "%j" is replaced by the job ID. See
1496 the filename pattern section below for filename specification
1497 options.
1498
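              For example, to write standard output to a per-job file in the
              submission directory (and, with the related --error option,
              standard error to a separate file):

              #SBATCH --output=myjob-%j.out
              #SBATCH --error=myjob-%j.err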
1499
1500 --open-mode=append|truncate
1501 Open the output and error files using append or truncate mode as
1502 specified. The default value is specified by the system config‐
1503 uration parameter JobFileAppend.
1504
1505
1506 --parsable
1507 Outputs only the job id number and the cluster name if present.
1508 The values are separated by a semicolon. Errors will still be
1509 displayed.
1510
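              For example, the job ID can be captured by a submitting shell
              script (hypothetical script name my.job; on a single cluster
              only the ID is printed):

              $ jobid=$(sbatch --parsable my.job)
              $ scontrol show job "$jobid"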
1511
1512 -p, --partition=<partition_names>
1513 Request a specific partition for the resource allocation. If
1514 not specified, the default behavior is to allow the slurm con‐
1515 troller to select the default partition as designated by the
1516 system administrator. If the job can use more than one parti‐
              tion, specify their names in a comma separated list and the one
1518 offering earliest initiation will be used with no regard given
1519 to the partition name ordering (although higher priority parti‐
1520 tions will be considered first). When the job is initiated, the
1521 name of the partition used will be placed first in the job
1522 record partition string.
1523
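              For example, to let the job start in whichever of two partitions
              (hypothetical names) can run it first:

              #SBATCH --partition=debug,batch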
1524
1525 --power=<flags>
1526 Comma separated list of power management plugin options. Cur‐
1527 rently available flags include: level (all nodes allocated to
1528 the job should have identical power caps, may be disabled by the
1529 Slurm configuration option PowerParameters=job_no_level).
1530
1531
1532 --priority=<value>
1533 Request a specific job priority. May be subject to configura‐
1534 tion specific constraints. value should either be a numeric
1535 value or "TOP" (for highest possible value). Only Slurm opera‐
1536 tors and administrators can set the priority of a job.
1537
1538
1539 --profile=<all|none|[energy[,|task[,|lustre[,|network]]]]>
              Enables detailed data collection by the acct_gather_profile
1541 plugin. Detailed data are typically time-series that are stored
1542 in an HDF5 file for the job or an InfluxDB database depending on
1543 the configured plugin.
1544
1545
1546 All All data types are collected. (Cannot be combined with
1547 other values.)
1548
1549
1550 None No data types are collected. This is the default.
1551 (Cannot be combined with other values.)
1552
1553
1554 Energy Energy data is collected.
1555
1556
1557 Task Task (I/O, Memory, ...) data is collected.
1558
1559
1560 Lustre Lustre data is collected.
1561
1562
1563 Network Network (InfiniBand) data is collected.
1564
1565
1566 --propagate[=rlimit[,rlimit...]]
1567 Allows users to specify which of the modifiable (soft) resource
1568 limits to propagate to the compute nodes and apply to their
1569 jobs. If no rlimit is specified, then all resource limits will
1570 be propagated. The following rlimit names are supported by
1571 Slurm (although some options may not be supported on some sys‐
1572 tems):
1573
1574 ALL All limits listed below (default)
1575
1576 NONE No limits listed below
1577
1578 AS The maximum address space for a process
1579
1580 CORE The maximum size of core file
1581
1582 CPU The maximum amount of CPU time
1583
1584 DATA The maximum size of a process's data segment
1585
1586 FSIZE The maximum size of files created. Note that if the
1587 user sets FSIZE to less than the current size of the
1588 slurmd.log, job launches will fail with a 'File size
1589 limit exceeded' error.
1590
1591 MEMLOCK The maximum size that may be locked into memory
1592
1593 NOFILE The maximum number of open files
1594
1595 NPROC The maximum number of processes available
1596
1597 RSS The maximum resident set size
1598
1599 STACK The maximum stack size
1600
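              For example, to propagate only the open file and locked memory
              limits from the submission environment:

              #SBATCH --propagate=NOFILE,MEMLOCK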
1601
1602 -q, --qos=<qos>
1603 Request a quality of service for the job. QOS values can be
1604 defined for each user/cluster/account association in the Slurm
1605 database. Users will be limited to their association's defined
1606 set of qos's when the Slurm configuration parameter, Account‐
1607 ingStorageEnforce, includes "qos" in its definition.
1608
1609
1610 -Q, --quiet
              Suppress informational messages from sbatch such as the job ID.
              Errors will still be displayed.
1613
1614
1615 --reboot
1616 Force the allocated nodes to reboot before starting the job.
1617 This is only supported with some system configurations and will
1618 otherwise be silently ignored. Only root, SlurmUser or admins
1619 can reboot nodes.
1620
1621
1622 --requeue
1623 Specifies that the batch job should be eligible for requeuing.
1624 The job may be requeued explicitly by a system administrator,
1625 after node failure, or upon preemption by a higher priority job.
1626 When a job is requeued, the batch script is initiated from its
1627 beginning. Also see the --no-requeue option. The JobRequeue
1628 configuration parameter controls the default behavior on the
1629 cluster.
1630
1631
1632 --reservation=<reservation_names>
1633 Allocate resources for the job from the named reservation. If
1634 the job can use more than one reservation, specify their names
              in a comma separated list; the one offering the earliest
              initiation will be used. Each reservation will be considered in
              the order it was requested. All reservations will be listed in
              scontrol/squeue through the life of the job. In accounting, the
              first reservation will be shown initially and, after the job
              starts, will be replaced by the reservation actually used.
1641
1642
1643 -s, --oversubscribe
1644 The job allocation can over-subscribe resources with other run‐
1645 ning jobs. The resources to be over-subscribed can be nodes,
1646 sockets, cores, and/or hyperthreads depending upon configura‐
1647 tion. The default over-subscribe behavior depends on system
1648 configuration and the partition's OverSubscribe option takes
1649 precedence over the job's option. This option may result in the
1650 allocation being granted sooner than if the --oversubscribe
1651 option was not set and allow higher system utilization, but
1652 application performance will likely suffer due to competition
1653 for resources. Also see the --exclusive option.
1654
1655
1656 -S, --core-spec=<num>
1657 Count of specialized cores per node reserved by the job for sys‐
1658 tem operations and not used by the application. The application
1659 will not use these cores, but will be charged for their alloca‐
1660 tion. Default value is dependent upon the node's configured
1661 CoreSpecCount value. If a value of zero is designated and the
1662 Slurm configuration option AllowSpecResourcesUsage is enabled,
1663 the job will be allowed to override CoreSpecCount and use the
1664 specialized resources on nodes it is allocated. This option can
1665 not be used with the --thread-spec option.
1666
1667
1668 --signal=[[R][B]:]<sig_num>[@<sig_time>]
1669 When a job is within sig_time seconds of its end time, send it
1670 the signal sig_num. Due to the resolution of event handling by
1671 Slurm, the signal may be sent up to 60 seconds earlier than
1672 specified. sig_num may either be a signal number or name (e.g.
1673 "10" or "USR1"). sig_time must have an integer value between 0
1674 and 65535. By default, no signal is sent before the job's end
1675 time. If a sig_num is specified without any sig_time, the
1676 default time will be 60 seconds. Use the "B:" option to signal
              only the batch shell; none of the other processes will be sig‐
1678 naled. By default all job steps will be signaled, but not the
1679 batch shell itself. Use the "R:" option to allow this job to
1680 overlap with a reservation with MaxStartDelay set. To have the
1681 signal sent at preemption time see the preempt_send_user_signal
1682 SlurmctldParameter.
1683
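              For example, to have each job step sent SIGTERM two minutes
              before the end of the time limit so the application can shut
              down cleanly:

              #SBATCH --signal=TERM@120

              To signal only the batch shell instead, prefix the signal with
              "B:", e.g. "--signal=B:USR1@120".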
1684
1685 --sockets-per-node=<sockets>
1686 Restrict node selection to nodes with at least the specified
1687 number of sockets. See additional information under -B option
1688 above when task/affinity plugin is enabled.
1689
1690
1691 --spread-job
1692 Spread the job allocation over as many nodes as possible and
1693 attempt to evenly distribute tasks across the allocated nodes.
1694 This option disables the topology/tree plugin.
1695
1696
1697 --switches=<count>[@<max-time>]
1698 When a tree topology is used, this defines the maximum count of
1699 switches desired for the job allocation and optionally the maxi‐
1700 mum time to wait for that number of switches. If Slurm finds an
1701 allocation containing more switches than the count specified,
1702 the job remains pending until it either finds an allocation with
              desired switch count or the time limit expires. If there is no
1704 switch count limit, there is no delay in starting the job.
1705 Acceptable time formats include "minutes", "minutes:seconds",
1706 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1707 "days-hours:minutes:seconds". The job's maximum time delay may
1708 be limited by the system administrator using the SchedulerParam‐
1709 eters configuration parameter with the max_switch_wait parameter
1710 option. On a dragonfly network the only switch count supported
1711 is 1 since communication performance will be highest when a job
              is allocated resources on one leaf switch or more than 2 leaf
              switches. The default max-time is the max_switch_wait value in
              SchedulerParameters.
1715
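              For example, to prefer an allocation confined to a single leaf
              switch but wait no more than 60 minutes for one before accepting
              whatever is available:

              #SBATCH --switches=1@60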
1716
1717 -t, --time=<time>
1718 Set a limit on the total run time of the job allocation. If the
1719 requested time limit exceeds the partition's time limit, the job
1720 will be left in a PENDING state (possibly indefinitely). The
1721 default time limit is the partition's default time limit. When
1722 the time limit is reached, each task in each job step is sent
1723 SIGTERM followed by SIGKILL. The interval between signals is
1724 specified by the Slurm configuration parameter KillWait. The
1725 OverTimeLimit configuration parameter may permit the job to run
1726 longer than scheduled. Time resolution is one minute and second
1727 values are rounded up to the next minute.
1728
1729 A time limit of zero requests that no time limit be imposed.
1730 Acceptable time formats include "minutes", "minutes:seconds",
1731 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1732 "days-hours:minutes:seconds".
1733
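              For example, either of the following equivalent forms requests a
              36 hour limit:

              #SBATCH --time=36:00:00
              #SBATCH --time=1-12:00:00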
1734
1735 --test-only
1736 Validate the batch script and return an estimate of when a job
1737 would be scheduled to run given the current job queue and all
1738 the other arguments specifying the job requirements. No job is
1739 actually submitted.
1740
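              For example, to see when a script (hypothetical name my.job)
              might be expected to start, without submitting it:

              $ sbatch --test-only -N4 my.job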
1741
1742 --thread-spec=<num>
1743 Count of specialized threads per node reserved by the job for
1744 system operations and not used by the application. The applica‐
1745 tion will not use these threads, but will be charged for their
1746 allocation. This option can not be used with the --core-spec
1747 option.
1748
1749
1750 --threads-per-core=<threads>
1751 Restrict node selection to nodes with at least the specified
1752 number of threads per core. In task layout, use the specified
1753 maximum number of threads per core. NOTE: "Threads" refers to
1754 the number of processing units on each core rather than the num‐
1755 ber of application tasks to be launched per core. See addi‐
1756 tional information under -B option above when task/affinity
1757 plugin is enabled.
1758
1759
1760 --time-min=<time>
1761 Set a minimum time limit on the job allocation. If specified,
1762 the job may have its --time limit lowered to a value no lower
1763 than --time-min if doing so permits the job to begin execution
1764 earlier than otherwise possible. The job's time limit will not
1765 be changed after the job is allocated resources. This is per‐
1766 formed by a backfill scheduling algorithm to allocate resources
1767 otherwise reserved for higher priority jobs. Acceptable time
1768 formats include "minutes", "minutes:seconds", "hours:min‐
1769 utes:seconds", "days-hours", "days-hours:minutes" and
1770 "days-hours:minutes:seconds".
1771
1772
1773 --tmp=<size[units]>
1774 Specify a minimum amount of temporary disk space per node.
1775 Default units are megabytes. Different units can be specified
1776 using the suffix [K|M|G|T].
1777
1778
1779 --usage
1780 Display brief help message and exit.
1781
1782
1783 --uid=<user>
1784 Attempt to submit and/or run a job as user instead of the invok‐
1785 ing user id. The invoking user's credentials will be used to
1786 check access permissions for the target partition. User root may
1787 use this option to run jobs as a normal user in a RootOnly par‐
1788 tition for example. If run as root, sbatch will drop its permis‐
1789 sions to the uid specified after node allocation is successful.
1790 user may be the user name or numerical user ID.
1791
1792
1793 --use-min-nodes
1794 If a range of node counts is given, prefer the smaller count.
1795
1796
1797 -V, --version
1798 Display version information and exit.
1799
1800
1801 -v, --verbose
1802 Increase the verbosity of sbatch's informational messages. Mul‐
1803 tiple -v's will further increase sbatch's verbosity. By default
1804 only errors will be displayed.
1805
1806
1807 -w, --nodelist=<node name list>
1808 Request a specific list of hosts. The job will contain all of
1809 these hosts and possibly additional hosts as needed to satisfy
1810 resource requirements. The list may be specified as a
1811 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1812 for example), or a filename. The host list will be assumed to
1813 be a filename if it contains a "/" character. If you specify a
1814 minimum node or processor count larger than can be satisfied by
1815 the supplied host list, additional resources will be allocated
1816 on other nodes as needed. Duplicate node names in the list will
1817 be ignored. The order of the node names in the list is not
1818 important; the node names will be sorted by Slurm.
1819
1820
1821 -W, --wait
1822 Do not exit until the submitted job terminates. The exit code
1823 of the sbatch command will be the same as the exit code of the
1824 submitted job. If the job terminated due to a signal rather than
1825 a normal exit, the exit code will be set to 1. In the case of a
1826 job array, the exit code recorded will be the highest value for
1827 any task in the job array.
1828
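              For example, a driver script can block until the job finishes
              and then branch on its exit code (job script name hypothetical):

              $ sbatch --wait my.job && echo "job succeeded" || echo "job failed"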
1829
1830 --wait-all-nodes=<value>
1831 Controls when the execution of the command begins. By default
1832 the job will begin execution as soon as the allocation is made.
1833
1834 0 Begin execution as soon as allocation can be made. Do not
1835 wait for all nodes to be ready for use (i.e. booted).
1836
1837 1 Do not begin execution until all nodes are ready for use.
1838
1839
1840 --wckey=<wckey>
1841 Specify wckey to be used with job. If TrackWCKey=no (default)
1842 in the slurm.conf this value is ignored.
1843
1844
1845 --wrap=<command string>
1846 Sbatch will wrap the specified command string in a simple "sh"
1847 shell script, and submit that script to the slurm controller.
1848 When --wrap is used, a script name and arguments may not be
1849 specified on the command line; instead the sbatch-generated
1850 wrapper script is used.
1851
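              For example, a single command can be submitted without writing a
              separate script file:

              $ sbatch -N1 --time=5 --wrap="hostname"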
1852
1853 -x, --exclude=<node name list>
1854 Explicitly exclude certain nodes from the resources granted to
1855 the job.
1856
1857
filename pattern
       sbatch allows for a filename pattern to contain one or more replacement
1860 symbols, which are a percent sign "%" followed by a letter (e.g. %j).
1861
1862 \\ Do not process any of the replacement symbols.
1863
1864 %% The character "%".
1865
1866 %A Job array's master job allocation number.
1867
1868 %a Job array ID (index) number.
1869
1870 %J jobid.stepid of the running job. (e.g. "128.0")
1871
1872 %j jobid of the running job.
1873
1874 %N short hostname. This will create a separate IO file per node.
1875
1876 %n Node identifier relative to current job (e.g. "0" is the first
1877 node of the running job) This will create a separate IO file per
1878 node.
1879
1880 %s stepid of the running job.
1881
1882 %t task identifier (rank) relative to current job. This will create
1883 a separate IO file per task.
1884
1885 %u User name.
1886
1887 %x Job name.
1888
1889 A number placed between the percent character and format specifier may
1890 be used to zero-pad the result in the IO filename. This number is
1891 ignored if the format specifier corresponds to non-numeric data (%N
1892 for example).
1893
1894 Some examples of how the format string may be used for a 4 task job
1895 step with a Job ID of 128 and step id of 0 are included below:
1896
1897 job%J.out job128.0.out
1898
1899 job%4j.out job0128.out
1900
1901 job%j-%2t.out job128-00.out, job128-01.out, ...
1902
PERFORMANCE
       Executing sbatch sends a remote procedure call to slurmctld. If enough
1905 calls from sbatch or other Slurm client commands that send remote pro‐
1906 cedure calls to the slurmctld daemon come in at once, it can result in
1907 a degradation of performance of the slurmctld daemon, possibly result‐
1908 ing in a denial of service.
1909
1910 Do not run sbatch or other Slurm client commands that send remote pro‐
1911 cedure calls to slurmctld from loops in shell scripts or other pro‐
1912 grams. Ensure that programs limit calls to sbatch to the minimum neces‐
1913 sary for the information you are trying to gather.
1914
1915
INPUT ENVIRONMENT VARIABLES
       Upon startup, sbatch will read and handle the options set in the fol‐
1918 lowing environment variables. Note that environment variables will
1919 override any options set in a batch script, and command line options
1920 will override any environment variables.
1921
1922
1923 SBATCH_ACCOUNT Same as -A, --account
1924
1925 SBATCH_ACCTG_FREQ Same as --acctg-freq
1926
1927 SBATCH_ARRAY_INX Same as -a, --array
1928
1929 SBATCH_BATCH Same as --batch
1930
1931 SBATCH_CLUSTERS or SLURM_CLUSTERS
1932 Same as --clusters
1933
1934 SBATCH_CONSTRAINT Same as -C, --constraint
1935
1936 SBATCH_CORE_SPEC Same as --core-spec
1937
1938 SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
1939
1940 SBATCH_DEBUG Same as -v, --verbose
1941
1942 SBATCH_DELAY_BOOT Same as --delay-boot
1943
1944 SBATCH_DISTRIBUTION Same as -m, --distribution
1945
1946 SBATCH_EXCLUSIVE Same as --exclusive
1947
1948 SBATCH_EXPORT Same as --export
1949
1950 SBATCH_GET_USER_ENV Same as --get-user-env
1951
1952 SBATCH_GPUS Same as -G, --gpus
1953
1954 SBATCH_GPU_BIND Same as --gpu-bind
1955
1956 SBATCH_GPU_FREQ Same as --gpu-freq
1957
1958 SBATCH_GPUS_PER_NODE Same as --gpus-per-node
1959
1960 SBATCH_GPUS_PER_TASK Same as --gpus-per-task
1961
1962 SBATCH_GRES Same as --gres
1963
1964 SBATCH_GRES_FLAGS Same as --gres-flags
1965
1966 SBATCH_HINT or SLURM_HINT
1967 Same as --hint
1968
1969 SBATCH_IGNORE_PBS Same as --ignore-pbs
1970
1971 SBATCH_JOB_NAME Same as -J, --job-name
1972
1973 SBATCH_MEM_BIND Same as --mem-bind
1974
1975 SBATCH_MEM_PER_CPU Same as --mem-per-cpu
1976
1977 SBATCH_MEM_PER_GPU Same as --mem-per-gpu
1978
1979 SBATCH_MEM_PER_NODE Same as --mem
1980
1981 SBATCH_NETWORK Same as --network
1982
1983 SBATCH_NO_KILL Same as -k, --no-kill
1984
1985 SBATCH_NO_REQUEUE Same as --no-requeue
1986
1987 SBATCH_OPEN_MODE Same as --open-mode
1988
1989 SBATCH_OVERCOMMIT Same as -O, --overcommit
1990
1991 SBATCH_PARTITION Same as -p, --partition
1992
1993 SBATCH_POWER Same as --power
1994
1995 SBATCH_PROFILE Same as --profile
1996
1997 SBATCH_QOS Same as --qos
1998
1999 SBATCH_RESERVATION Same as --reservation
2000
2001 SBATCH_REQ_SWITCH When a tree topology is used, this defines the
2002 maximum count of switches desired for the job
2003 allocation and optionally the maximum time to
2004 wait for that number of switches. See --switches
2005
2006 SBATCH_REQUEUE Same as --requeue
2007
2008 SBATCH_SIGNAL Same as --signal
2009
2010 SBATCH_SPREAD_JOB Same as --spread-job
2011
2012 SBATCH_THREAD_SPEC Same as --thread-spec
2013
2014 SBATCH_TIMELIMIT Same as -t, --time
2015
2016 SBATCH_USE_MIN_NODES Same as --use-min-nodes
2017
2018 SBATCH_WAIT Same as -W, --wait
2019
2020 SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes
2021
2022 SBATCH_WAIT4SWITCH Max time waiting for requested switches. See
2023 --switches
2024
2025 SBATCH_WCKEY Same as --wckey
2026
2027 SLURM_CONF The location of the Slurm configuration file.
2028
2029 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
2030 error occurs (e.g. invalid options). This can be
2031 used by a script to distinguish application exit
2032 codes from various Slurm error conditions.
2033
2034 SLURM_STEP_KILLED_MSG_NODE_ID=ID
2035 If set, only the specified node will log when the
                               job or step is killed by a signal.
2037
2038
OUTPUT ENVIRONMENT VARIABLES
       The Slurm controller will set the following variables in the environ‐
2041 ment of the batch script.
2042
2043 SBATCH_MEM_BIND
2044 Set to value of the --mem-bind option.
2045
2046 SBATCH_MEM_BIND_LIST
2047 Set to bit mask used for memory binding.
2048
2049 SBATCH_MEM_BIND_PREFER
2050 Set to "prefer" if the --mem-bind option includes the prefer
2051 option.
2052
2053 SBATCH_MEM_BIND_TYPE
2054 Set to the memory binding type specified with the --mem-bind
              option. Possible values are "none", "rank", "map_mem",
2056 "mask_mem" and "local".
2057
2058 SBATCH_MEM_BIND_VERBOSE
2059 Set to "verbose" if the --mem-bind option includes the verbose
2060 option. Set to "quiet" otherwise.
2061
2062 SLURM_*_HET_GROUP_#
2063 For a heterogeneous job allocation, the environment variables
2064 are set separately for each component.
2065
2066 SLURM_ARRAY_TASK_COUNT
2067 Total number of tasks in a job array.
2068
2069 SLURM_ARRAY_TASK_ID
2070 Job array ID (index) number.
2071
2072 SLURM_ARRAY_TASK_MAX
2073 Job array's maximum ID (index) number.
2074
2075 SLURM_ARRAY_TASK_MIN
2076 Job array's minimum ID (index) number.
2077
2078 SLURM_ARRAY_TASK_STEP
2079 Job array's index step size.
2080
2081 SLURM_ARRAY_JOB_ID
2082 Job array's master job ID number.
2083
2084 SLURM_CLUSTER_NAME
2085 Name of the cluster on which the job is executing.
2086
2087 SLURM_CPUS_ON_NODE
2088 Number of CPUS on the allocated node.
2089
2090 SLURM_CPUS_PER_GPU
2091 Number of CPUs requested per allocated GPU. Only set if the
2092 --cpus-per-gpu option is specified.
2093
2094 SLURM_CPUS_PER_TASK
2095 Number of cpus requested per task. Only set if the
2096 --cpus-per-task option is specified.
2097
2098 SLURM_DIST_PLANESIZE
2099 Plane distribution size. Only set for plane distributions. See
2100 -m, --distribution.
2101
2102 SLURM_DISTRIBUTION
2103 Same as -m, --distribution
2104
2105 SLURM_EXPORT_ENV
2106 Same as -e, --export.
2107
2108 SLURM_GPUS
2109 Number of GPUs requested. Only set if the -G, --gpus option is
2110 specified.
2111
2112 SLURM_GPU_BIND
2113 Requested binding of tasks to GPU. Only set if the --gpu-bind
2114 option is specified.
2115
2116 SLURM_GPU_FREQ
2117 Requested GPU frequency. Only set if the --gpu-freq option is
2118 specified.
2119
2120 SLURM_GPUS_PER_NODE
2121 Requested GPU count per allocated node. Only set if the
2122 --gpus-per-node option is specified.
2123
2124 SLURM_GPUS_PER_SOCKET
2125 Requested GPU count per allocated socket. Only set if the
2126 --gpus-per-socket option is specified.
2127
2128 SLURM_GPUS_PER_TASK
2129 Requested GPU count per allocated task. Only set if the
2130 --gpus-per-task option is specified.
2131
2132 SLURM_GTIDS
2133 Global task IDs running on this node. Zero origin and comma
2134 separated.
2135
2136 SLURM_JOB_ACCOUNT
              Account name associated with the job allocation.
2138
2139 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
2140 The ID of the job allocation.
2141
2142 SLURM_JOB_CPUS_PER_NODE
2143 Count of processors available to the job on this node. Note the
2144 select/linear plugin allocates entire nodes to jobs, so the
2145 value indicates the total count of CPUs on the node. The
2146 select/cons_res plugin allocates individual processors to jobs,
2147 so this number indicates the number of processors on this node
2148 allocated to the job.
2149
2150 SLURM_JOB_DEPENDENCY
2151 Set to value of the --dependency option.
2152
2153 SLURM_JOB_NAME
2154 Name of the job.
2155
2156 SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)
2157 List of nodes allocated to the job.
2158
2159 SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
2160 Total number of nodes in the job's resource allocation.
2161
2162 SLURM_JOB_PARTITION
2163 Name of the partition in which the job is running.
2164
2165 SLURM_JOB_QOS
2166 Quality Of Service (QOS) of the job allocation.
2167
2168 SLURM_JOB_RESERVATION
2169 Advanced reservation containing the job allocation, if any.
2170
2171 SLURM_LOCALID
2172 Node local task ID for the process within a job.
2173
2174 SLURM_MEM_PER_CPU
2175 Same as --mem-per-cpu
2176
2177 SLURM_MEM_PER_GPU
2178 Requested memory per allocated GPU. Only set if the
2179 --mem-per-gpu option is specified.
2180
2181 SLURM_MEM_PER_NODE
2182 Same as --mem
2183
2184 SLURM_NODE_ALIASES
2185 Sets of node name, communication address and hostname for nodes
              allocated to the job from the cloud. Each element in the set is
2187 colon separated and each set is comma separated. For example:
2188 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2189
2190 SLURM_NODEID
2191 ID of the nodes allocated.
2192
2193 SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
2194 Same as -n, --ntasks
2195
2196 SLURM_NTASKS_PER_CORE
2197 Number of tasks requested per core. Only set if the
2198 --ntasks-per-core option is specified.
2199
2201 SLURM_NTASKS_PER_GPU
2202 Number of tasks requested per GPU. Only set if the
2203 --ntasks-per-gpu option is specified.
2204
2205 SLURM_NTASKS_PER_NODE
2206 Number of tasks requested per node. Only set if the
2207 --ntasks-per-node option is specified.
2208
2209 SLURM_NTASKS_PER_SOCKET
2210 Number of tasks requested per socket. Only set if the
2211 --ntasks-per-socket option is specified.
2212
2213 SLURM_HET_SIZE
2214 Set to count of components in heterogeneous job.
2215
2216 SLURM_PRIO_PROCESS
2217 The scheduling priority (nice value) at the time of job submis‐
2218 sion. This value is propagated to the spawned processes.
2219
2220 SLURM_PROCID
2221 The MPI rank (or relative process ID) of the current process
2222
2223 SLURM_PROFILE
2224 Same as --profile
2225
2226 SLURM_RESTART_COUNT
2227 If the job has been restarted due to system failure or has been
              explicitly requeued, this will be set to the number of times
2229 the job has been restarted.
2230
2231 SLURM_SUBMIT_DIR
2232 The directory from which sbatch was invoked or, if applicable,
2233 the directory specified by the -D, --chdir option.
2234
2235 SLURM_SUBMIT_HOST
2236 The hostname of the computer from which sbatch was invoked.
2237
2238 SLURM_TASKS_PER_NODE
2239 Number of tasks to be initiated on each node. Values are comma
2240 separated and in the same order as SLURM_JOB_NODELIST. If two
2241 or more consecutive nodes are to have the same task count, that
2242 count is followed by "(x#)" where "#" is the repetition count.
2243 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2244 first three nodes will each execute two tasks and the fourth
2245 node will execute one task.
2246
2247 SLURM_TASK_PID
2248 The process ID of the task being started.
2249
2250 SLURM_TOPOLOGY_ADDR
2251 This is set only if the system has the topology/tree plugin
              configured. The value will be set to the names of the network
2253 switches which may be involved in the job's communications
2254 from the system's top level switch down to the leaf switch and
2255 ending with node name. A period is used to separate each hard‐
2256 ware component name.
2257
2258 SLURM_TOPOLOGY_ADDR_PATTERN
2259 This is set only if the system has the topology/tree plugin
              configured. The value will be set to the component types listed in
2261 SLURM_TOPOLOGY_ADDR. Each component will be identified as
2262 either "switch" or "node". A period is used to separate each
2263 hardware component type.
2264
2265 SLURMD_NODENAME
2266 Name of the node running the job script.
2267
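       As an illustration, a batch script can read several of these variables
       at run time (a sketch; the echoed layout is arbitrary):

       #!/bin/sh
       #SBATCH --ntasks=4
       echo "Job $SLURM_JOB_ID on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
       echo "Submitted from $SLURM_SUBMIT_HOST:$SLURM_SUBMIT_DIR"
       srun hostname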
2268
EXAMPLES
       Specify a batch script by filename on the command line. The batch
2271 script specifies a 1 minute time limit for the job.
2272
2273 $ cat myscript
2274 #!/bin/sh
2275 #SBATCH --time=1
2276 srun hostname |sort
2277
2278 $ sbatch -N4 myscript
       Submitted batch job 65537
2280
2281 $ cat slurm-65537.out
2282 host1
2283 host2
2284 host3
2285 host4
2286
2287
2288 Pass a batch script to sbatch on standard input:
2289
2290 $ sbatch -N4 <<EOF
2291 > #!/bin/sh
2292 > srun hostname |sort
2293 > EOF
2294 sbatch: Submitted batch job 65541
2295
2296 $ cat slurm-65541.out
2297 host1
2298 host2
2299 host3
2300 host4
2301
2302
2303 To create a heterogeneous job with 3 components, each allocating a
2304 unique set of nodes:
2305
2306 sbatch -w node[2-3] : -w node4 : -w node[5-7] work.bash
2307 Submitted batch job 34987
2308
2309
COPYING
       Copyright (C) 2006-2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf. DISCLAIMER).
2313 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2314 Copyright (C) 2010-2017 SchedMD LLC.
2315
2316 This file is part of Slurm, a resource management program. For
2317 details, see <https://slurm.schedmd.com/>.
2318
2319 Slurm is free software; you can redistribute it and/or modify it under
2320 the terms of the GNU General Public License as published by the Free
2321 Software Foundation; either version 2 of the License, or (at your
2322 option) any later version.
2323
2324 Slurm is distributed in the hope that it will be useful, but WITHOUT
2325 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2326 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2327 for more details.
2328
2329
SEE ALSO
       sinfo(1), sattach(1), salloc(1), squeue(1), scancel(1), scontrol(1),
2332 slurm.conf(5), sched_setaffinity (2), numa (3)
2333
2334
2335
2336November 2020 Slurm Commands sbatch(1)