1 salloc(1)                        Slurm Commands                       salloc(1)
2
3
4
5 NAME
6 salloc - Obtain a Slurm job allocation (a set of nodes), execute a com‐
7 mand, and then release the allocation when the command is finished.
8
9
10 SYNOPSIS
11 salloc [OPTIONS(0)...] [ : [OPTIONS(n)...]] command(0) [args(0)...]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
18 DESCRIPTION
19 salloc is used to allocate a Slurm job allocation, which is a set of
20 resources (nodes), possibly with some set of constraints (e.g. number
21 of processors per node). When salloc successfully obtains the
22 requested allocation, it then runs the command specified by the user.
23 Finally, when the user specified command is complete, salloc relin‐
24 quishes the job allocation.
25
26 The command may be any program the user wishes. Some typical commands
27 are xterm, a shell script containing srun commands, and srun (see the
28 EXAMPLES section). If no command is specified, then the value of Sal‐
29 locDefaultCommand in slurm.conf is used. If SallocDefaultCommand is not
30 set, then salloc runs the user's default shell.
31
32 The following document describes the influence of various options on
33 the allocation of cpus to jobs and tasks.
34 https://slurm.schedmd.com/cpu_management.html
35
36 NOTE: The salloc logic includes support to save and restore the termi‐
37 nal line settings and is designed to be executed in the foreground. If
38 you need to execute salloc in the background, set its standard input to
39 some file, for example: "salloc -n16 a.out </dev/null &"
40
41
42 RETURN VALUE
43 If salloc is unable to execute the user command, it will return 1 and
44 print errors to stderr. If the command completes successfully, or if
45 salloc is killed by the signals HUP, INT, KILL, or QUIT, it will return 0.
46
47
48 COMMAND PATH RESOLUTION
49 If provided, the command is resolved in the following order:
50
51 1. If command starts with ".", then path is constructed as: current
52 working directory / command
53
54 2. If command starts with a "/", then path is considered absolute.
55
56 3. If command can be resolved through PATH. See path_resolution(7).
57
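       For example, each of the three resolution rules above can be seen with
       commands of the following form (the script name is illustrative):

              salloc -N1 ./local_script.sh     # rule 1: relative to the current directory
              salloc -N1 /usr/bin/hostname     # rule 2: absolute path
              salloc -N1 hostname              # rule 3: resolved through PATH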
58
59 OPTIONS
60 -A, --account=<account>
61 Charge resources used by this job to specified account. The
62 account is an arbitrary string. The account name may be changed
63 after job submission using the scontrol command.
64
65
66 --acctg-freq
67 Define the job accounting and profiling sampling intervals.
68 This can be used to override the JobAcctGatherFrequency parame‐
69 ter in Slurm's configuration file, slurm.conf. The supported
70 format is as follows:
71
72 --acctg-freq=<datatype>=<interval>
73 where <datatype>=<interval> specifies the task sam‐
74 pling interval for the jobacct_gather plugin or a
75 sampling interval for a profiling type by the
76 acct_gather_profile plugin. Multiple, comma-sepa‐
77 rated <datatype>=<interval> intervals may be speci‐
78 fied. Supported datatypes are as follows:
79
80 task=<interval>
81 where <interval> is the task sampling inter‐
82 val in seconds for the jobacct_gather plugins
83 and for task profiling by the
84 acct_gather_profile plugin. NOTE: This fre‐
85 quency is used to monitor memory usage. If
86 memory limits are enforced the highest fre‐
87 quency a user can request is what is config‐
88 ured in the slurm.conf file. They can not
89 turn it off (=0) either.
90
91 energy=<interval>
92 where <interval> is the sampling interval in
93 seconds for energy profiling using the
94 acct_gather_energy plugin
95
96 network=<interval>
97 where <interval> is the sampling interval in
98 seconds for infiniband profiling using the
99 acct_gather_infiniband plugin.
100
101 filesystem=<interval>
102 where <interval> is the sampling interval in
103 seconds for filesystem profiling using the
104 acct_gather_filesystem plugin.
105
106 The default value for the task sampling interval is 30 seconds.
107 The default value for all other intervals is 0. An
109 interval of 0 disables sampling of the specified type. If the
110 task sampling interval is 0, accounting information is collected
111 only at job termination (reducing Slurm interference with the
112 job).
113 Smaller (non-zero) values have a greater impact upon job perfor‐
114 mance, but a value of 30 seconds is not likely to be noticeable
115 for applications having less than 10,000 tasks.
116
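       For example, a request of roughly the following form might be used to
       sample task accounting every 15 seconds and energy data every 30
       seconds while the supplied script runs (the script name is
       illustrative, and the energy interval only takes effect if an
       acct_gather_energy plugin is configured):

              # intervals are in seconds
              salloc -N1 --acctg-freq=task=15,energy=30 ./my_analysis.sh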
117
118 -B --extra-node-info=<sockets[:cores[:threads]]>
119 Restrict node selection to nodes with at least the specified
120 number of sockets, cores per socket and/or threads per core.
121 NOTE: These options do not specify the resource allocation size.
122 Each value specified is considered a minimum. An asterisk (*)
123 can be used as a placeholder indicating that all available
124 resources of that type are to be utilized. Values can also be
125 specified as min-max. The individual levels can also be speci‐
126 fied in separate options if desired:
127 --sockets-per-node=<sockets>
128 --cores-per-socket=<cores>
129 --threads-per-core=<threads>
130 If task/affinity plugin is enabled, then specifying an alloca‐
131 tion in this manner also results in subsequently launched tasks
132 being bound to threads if the -B option specifies a thread
133 count, to cores if a core count is specified, and otherwise to
134 sockets. If SelectType is config‐
135 ured to select/cons_res, it must have a parameter of CR_Core,
136 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
137 to be honored. If not specified, the scontrol show job will
138 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
139 tions.
140
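       As a sketch, the following might restrict node selection to nodes with
       at least two sockets and four cores per socket before starting an
       interactive shell (the counts are illustrative and must match hardware
       actually present at the site):

              # minimum sockets:cores layout; threads per core left unspecified
              salloc -N1 -B 2:4 $SHELL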
141
142 --bb=<spec>
143 Burst buffer specification. The form of the specification is
144 system dependent. Note the burst buffer may not be accessible
145 from a login node, but may require that salloc spawn a shell on
146 one of its allocated compute nodes. See the description of Sal‐
147 locDefaultCommand in the slurm.conf man page for more informa‐
148 tion about how to spawn a remote shell.
149
150
151 --bbf=<file_name>
152 Path of file containing burst buffer specification. The form of
153 the specification is system dependent. Also see --bb. Note the
154 burst buffer may not be accessible from a login node, but may
155 require that salloc spawn a shell on one of its allocated com‐
156 pute nodes. See the description of SallocDefaultCommand in the
157 slurm.conf man page for more information about how to spawn a
158 remote shell.
159
160
161 --begin=<time>
162 Defer eligibility of this job allocation until the specified
163 time.
164
165 Time may be of the form HH:MM:SS to run a job at a specific time
166 of day (seconds are optional). (If that time is already past,
167 the next day is assumed.) You may also specify midnight, noon,
168 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
169 suffixed with AM or PM for running in the morning or the
170 evening. You can also say what day the job will be run, by
171 specifying a date of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.
172 Combine date and time using the following format
173 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
174 count time-units, where the time-units can be seconds (default),
175 minutes, hours, days, or weeks and you can tell Slurm to run the
176 job today with the keyword today and to run the job tomorrow
177 with the keyword tomorrow. The value may be changed after job
178 submission using the scontrol command. For example:
179 --begin=16:00
180 --begin=now+1hour
181 --begin=now+60 (seconds by default)
182 --begin=2010-01-20T12:34:00
183
184
185 Notes on date/time specifications:
186 - Although the 'seconds' field of the HH:MM:SS time specifica‐
187 tion is allowed by the code, note that the poll time of the
188 Slurm scheduler is not precise enough to guarantee dispatch of
189 the job on the exact second. The job will be eligible to start
190 on the next poll following the specified time. The exact poll
191 interval depends on the Slurm scheduler (e.g., 60 seconds with
192 the default sched/builtin).
193 - If no time (HH:MM:SS) is specified, the default is
194 (00:00:00).
195 - If a date is specified without a year (e.g., MM/DD) then the
196 current year is assumed, unless the combination of MM/DD and
197 HH:MM:SS has already passed for that year, in which case the
198 next year is used.
199
200
201 --bell Force salloc to ring the terminal bell when the job allocation
202 is granted (and only if stdout is a tty). By default, salloc
203 only rings the bell if the allocation is pending for more than
204 ten seconds (and only if stdout is a tty). Also see the option
205 --no-bell.
206
207
208 --cluster-constraint=<list>
209 Specifies features that a federated cluster must have to have a
210 sibling job submitted to it. Slurm will attempt to submit a sib‐
211 ling job to a cluster if it has at least one of the specified
212 features.
213
214
215 --comment=<string>
216 An arbitrary comment.
217
218
219 -C, --constraint=<list>
220 Nodes can have features assigned to them by the Slurm adminis‐
221 trator. Users can specify which of these features are required
222 by their job using the constraint option. Only nodes having
223 features matching the job constraints will be used to satisfy
224 the request. Multiple constraints may be specified with AND,
225 OR, matching OR, resource counts, etc. (some operators are not
226 supported on all system types). Supported constraint options
227 include:
228
229 Single Name
230 Only nodes which have the specified feature will be used.
231 For example, --constraint="intel"
232
233 Node Count
234 A request can specify the number of nodes needed with
235 some feature by appending an asterisk and count after the
236 feature name. For example "--nodes=16 --con‐
237 straint=graphics*4 ..." indicates that the job requires
238 16 nodes and that at least four of those nodes must have
239 the feature "graphics."
240
241 AND    Only nodes with all of the specified features will be
242 used. The ampersand is used for an AND operator. For
243 example, --constraint="intel&gpu"
244
245 OR     Only nodes with at least one of the specified features
246 will be used. The vertical bar is used for an OR opera‐
247 tor. For example, --constraint="intel|amd"
248
249 Matching OR
250 If only one of a set of possible options should be used
251 for all allocated nodes, then use the OR operator and
252 enclose the options within square brackets. For example:
253 "--constraint=[rack1|rack2|rack3|rack4]" might be used to
254 specify that all nodes must be allocated on a single rack
255 of the cluster, but any of those four racks can be used.
256
257 Multiple Counts
258 Specific counts of multiple resources may be specified by
259 using the AND operator and enclosing the options within
260 square brackets. For example: "--con‐
261 straint=[rack1*2&rack2*4]" might be used to specify that
262 two nodes must be allocated from nodes with the feature
263 of "rack1" and four nodes must be allocated from nodes
264 with the feature "rack2".
265
266 NOTE: This construct does not support multiple Intel KNL
267 NUMA or MCDRAM modes. For example, while "--con‐
268 straint=[(knl&quad)*2&(knl&hemi)*4]" is not supported,
269 "--constraint=[haswell*2&(knl&hemi)*4]" is supported.
270 Specification of multiple KNL modes requires the use of a
271 heterogeneous job.
272
273
274 Parentheses
275 Parentheses can be used to group like node features
276 together. For example "--con‐
277 straint=[(knl&snc4&flat)*4&haswell*1]" might be used to
278 specify that four nodes with the features "knl", "snc4"
279 and "flat" plus one node with the feature "haswell" are
280 required. All options within parentheses should be
281 grouped with AND (e.g. "&") operands.
282
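       As an illustration, a node count and an AND of two features might be
       combined as follows (the feature names are site-defined and purely
       illustrative):

              # 8 nodes, all of which must have both the "intel" and "gpu" features
              salloc -N8 --constraint="intel&gpu" ./run_experiment.sh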
283
284 --contiguous
285 If set, then the allocated nodes must form a contiguous set.
286 Not honored with the topology/tree or topology/3d_torus plugins,
287 both of which can modify the node ordering.
288
289
290 --cores-per-socket=<cores>
291 Restrict node selection to nodes with at least the specified
292 number of cores per socket. See additional information under -B
293 option above when task/affinity plugin is enabled.
294
295
296 --cpu-freq=<p1[-p2[:p3]]>
297
298 Request that job steps initiated by srun commands inside this
299 allocation be run at some requested frequency if possible, on
300 the CPUs selected for the step on the compute node(s).
301
302 p1 can be [#### | low | medium | high | highm1] which will set
303 the frequency scaling_speed to the corresponding value, and set
304 the frequency scaling_governor to UserSpace. See below for defi‐
305 nition of the values.
306
307 p1 can be [Conservative | OnDemand | Performance | PowerSave]
308 which will set the scaling_governor to the corresponding value.
309 The governor has to be in the list set by the slurm.conf option
310 CpuFreqGovernors.
311
312 When p2 is present, p1 will be the minimum scaling frequency and
313 p2 will be the maximum scaling frequency.
314
315 p2 can be [#### | medium | high | highm1]. p2 must be greater
316 than p1.
317
318 p3 can be [Conservative | OnDemand | Performance | PowerSave |
319 UserSpace] which will set the governor to the corresponding
320 value.
321
322 If p3 is UserSpace, the frequency scaling_speed will be set by a
323 power or energy aware scheduling strategy to a value between p1
324 and p2 that lets the job run within the site's power goal. The
325 job may be delayed if p1 is higher than a frequency that allows
326 the job to run within the goal.
327
328 If the current frequency is < min, it will be set to min. Like‐
329 wise, if the current frequency is > max, it will be set to max.
330
331 Acceptable values at present include:
332
333 #### frequency in kilohertz
334
335 Low the lowest available frequency
336
337 High the highest available frequency
338
339 HighM1 (high minus one) will select the next highest
340 available frequency
341
342 Medium attempts to set a frequency in the middle of the
343 available range
344
345 Conservative attempts to use the Conservative CPU governor
346
347 OnDemand attempts to use the OnDemand CPU governor (the
348 default value)
349
350 Performance attempts to use the Performance CPU governor
351
352 PowerSave attempts to use the PowerSave CPU governor
353
354 UserSpace attempts to use the UserSpace CPU governor
355
356
357 The following informational environment variable is set in the
358 job step when the --cpu-freq option is requested.
360 SLURM_CPU_FREQ_REQ
361
362 This environment variable can also be used to supply the value
363 for the CPU frequency request if it is set when the 'srun' com‐
364 mand is issued. The --cpu-freq on the command line will over‐
365 ride the environment variable value. The form of the environ‐
366 ment variable is the same as the command line. See the ENVIRON‐
367 MENT VARIABLES section for a description of the
368 SLURM_CPU_FREQ_REQ variable.
369
370 NOTE: This parameter is treated as a request, not a requirement.
371 If the job step's node does not support setting the CPU fre‐
372 quency, or the requested value is outside the bounds of the
373 legal frequencies, an error is logged, but the job step is
374 allowed to continue.
375
376 NOTE: Setting the frequency for just the CPUs of the job step
377 implies that the tasks are confined to those CPUs. If task con‐
378 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
379 gin=task/cgroup with the "ConstrainCores" option) is not config‐
380 ured, this parameter is ignored.
381
382 NOTE: When the step completes, the frequency and governor of
383 each selected CPU is reset to the previous values.
384
385 NOTE: Submitting jobs with the --cpu-freq option when linuxproc
386 is configured as the ProctrackType can cause jobs to run too
387 quickly, before accounting is able to poll for job information.
388 As a result, not all of the accounting information will be present.
389
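       For example, a request of the following form might ask that srun steps
       inside the allocation run between a minimum and maximum frequency
       under the UserSpace governor (the frequencies, given in kilohertz, are
       illustrative, and the governor must appear in CpuFreqGovernors):

              # scale step CPUs between 1.8 GHz and 2.4 GHz
              salloc -N1 --cpu-freq=1800000-2400000:UserSpace srun ./bench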
390
391 -c, --cpus-per-task=<ncpus>
392 Advise Slurm that ensuing job steps will require ncpus proces‐
393 sors per task. By default Slurm will allocate one processor per
394 task.
395
396 For instance, consider an application that has 4 tasks, each
397 requiring 3 processors. If our cluster is comprised of
398 quad-processor nodes and we simply ask for 12 processors, the
399 controller might give us only 3 nodes. However, by using the
400 --cpus-per-task=3 option, the controller knows that each task
401 requires 3 processors on the same node, and the controller will
402 grant an allocation of 4 nodes, one for each of the 4 tasks.
403
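       The scenario above might be expressed as follows, assuming the
       application is launched with srun inside the allocation (the
       executable name is illustrative):

              # 4 tasks with 3 CPUs each; enough nodes will be allocated to honor this
              salloc -n4 -c3 srun ./my_mpi_app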
404
405 --deadline=<OPT>
406 remove the job if no ending is possible before this deadline
407 (start > (deadline - time[-min])). Default is no deadline.
408 Valid time formats are:
409 HH:MM[:SS] [AM|PM]
410 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
411 MM/DD[/YY]-HH:MM[:SS]
412 YYYY-MM-DD[THH:MM[:SS]]
413
414
415 --delay-boot=<minutes>
416 Do not reboot nodes in order to satisfy this job's feature
417 specification if the job has been eligible to run for less than
418 this time period. If the job has waited for less than the spec‐
419 ified period, it will use only nodes which already have the
420 specified features. The argument is in units of minutes. A
421 default value may be set by a system administrator using the
422 delay_boot option of the SchedulerParameters configuration
423 parameter in the slurm.conf file, otherwise the default value is
424 zero (no delay).
425
426
427 -d, --dependency=<dependency_list>
428 Defer the start of this job until the specified dependencies
429 have been satisfied. <dependency_list> is of the form
430 <type:job_id[:job_id][,type:job_id[:job_id]]> or
431 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
432 must be satisfied if the "," separator is used. Any dependency
433 may be satisfied if the "?" separator is used. Many jobs can
434 share the same dependency and these jobs may even belong to dif‐
435 ferent users. The value may be changed after job submission
436 using the scontrol command. Once a job dependency fails due to
437 the termination state of a preceding job, the dependent job will
438 never be run, even if the preceding job is requeued and has a
439 different termination state in a subsequent execution.
440
441 after:job_id[:jobid...]
442 This job can begin execution after the specified jobs
443 have begun execution.
444
445 afterany:job_id[:jobid...]
446 This job can begin execution after the specified jobs
447 have terminated.
448
449 afterburstbuffer:job_id[:jobid...]
450 This job can begin execution after the specified jobs
451 have terminated and any associated burst buffer stage out
452 operations have completed.
453
454 aftercorr:job_id[:jobid...]
455 A task of this job array can begin execution after the
456 corresponding task ID in the specified job has completed
457 successfully (ran to completion with an exit code of
458 zero).
459
460 afternotok:job_id[:jobid...]
461 This job can begin execution after the specified jobs
462 have terminated in some failed state (non-zero exit code,
463 node failure, timed out, etc).
464
465 afterok:job_id[:jobid...]
466 This job can begin execution after the specified jobs
467 have successfully executed (ran to completion with an
468 exit code of zero).
469
470 expand:job_id
471 Resources allocated to this job should be used to expand
472 the specified job. The job to expand must share the same
473 QOS (Quality of Service) and partition. Gang scheduling
474 of resources in the partition is also not supported.
475
476 singleton
477 This job can begin execution after any previously
478 launched jobs sharing the same job name and user have
479 terminated. In other words, only one job by that name
480 and owned by that user can be running or suspended at any
481 point in time.
482
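       As a sketch, with the job IDs below standing in for real ones:

              # start only after jobs 1234 and 1235 have completed successfully
              salloc --dependency=afterok:1234:1235 ./postprocess.sh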
483
484 -D, --chdir=<path>
485 Change directory to path before beginning execution. The path
486 can be specified as full path or relative path to the directory
487 where the command is executed.
488
489
490 --exclusive[=user|mcs]
491 The job allocation can not share nodes with other running jobs
492 (or just other users with the "=user" option or with the "=mcs"
493 option). The default shared/exclusive behavior depends on sys‐
494 tem configuration and the partition's OverSubscribe option takes
495 precedence over the job's option.
496
497
498 -F, --nodefile=<node file>
499 Much like --nodelist, but the list is contained in a file of
500 name node file. The node names of the list may also span multi‐
501 ple lines in the file. Duplicate node names in the file will
502 be ignored. The order of the node names in the list is not
503 important; the node names will be sorted by Slurm.
504
505
506 --get-user-env[=timeout][mode]
507 This option will load login environment variables for the user
508 specified in the --uid option. The environment variables are
509 retrieved by running something of this sort "su - <username> -c
510 /usr/bin/env" and parsing the output. Be aware that any envi‐
511 ronment variables already set in salloc's environment will take
512 precedence over any environment variables in the user's login
513 environment. The optional timeout value is in seconds. Default
514 value is 3 seconds. The optional mode value controls the "su"
515 options. With a mode value of "S", "su" is executed without the
516 "-" option. With a mode value of "L", "su" is executed with the
517 "-" option, replicating the login environment. If mode not
518 specified, the mode established at Slurm build time is used.
519 Examples of use include "--get-user-env", "--get-user-env=10",
520 "--get-user-env=10L", and "--get-user-env=S". NOTE: This option
521 only works if the caller has an effective uid of "root". This
522 option was originally created for use by Moab.
523
524
525 --gid=<group>
526 Submit the job with the specified group's group access permis‐
527 sions. group may be the group name or the numerical group ID.
528 In the default Slurm configuration, this option is only valid
529 when used by the user root.
530
531
532 --gres=<list>
533 Specifies a comma delimited list of generic consumable
534 resources. The format of each entry on the list is
535 "name[[:type]:count]". The name is that of the consumable
536 resource. The count is the number of those resources with a
537 default value of 1. The specified resources will be allocated
538 to the job on each node. The available generic consumable
539 resources are configurable by the system administrator. A list
540 of available generic consumable resources will be printed and
541 the command will exit if the option argument is "help". Exam‐
542 ples of use include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2",
543 and "--gres=help".
544
545
546 --gres-flags=<type>
547 Specify generic resource task binding options.
548
549 disable-binding
550 Disable filtering of CPUs with respect to generic
551 resource locality. This option is currently required to
552 use more CPUs than are bound to a GRES (i.e. if a GPU is
553 bound to the CPUs on one socket, but resources on more
554 than one socket are required to run the job). This
555 option may permit a job to be allocated resources sooner
556 than otherwise possible, but may result in lower job per‐
557 formance.
558
559 enforce-binding
560 The only CPUs available to the job will be those bound to
561 the selected GRES (i.e. the CPUs identified in the
562 gres.conf file will be strictly enforced). This option
563 may result in delayed initiation of a job. For example a
564 job requiring two GPUs and one CPU will be delayed until
565 both GPUs on a single socket are available rather than
566 using GPUs bound to separate sockets, however the appli‐
567 cation performance may be improved due to improved commu‐
568 nication speed. Requires the node to be configured with
569 more than one socket and resource filtering will be per‐
570 formed on a per-socket basis.
571
572
573 -H, --hold
574 Specify the job is to be submitted in a held state (priority of
575 zero). A held job can now be released using scontrol to reset
576 its priority (e.g. "scontrol release <job_id>").
577
578
579 -h, --help
580 Display help information and exit.
581
582
583 --hint=<type>
584 Bind tasks according to application hints.
585
586 compute_bound
587 Select settings for compute bound applications: use all
588 cores in each socket, one thread per core.
589
590 memory_bound
591 Select settings for memory bound applications: use only
592 one core in each socket, one thread per core.
593
594 [no]multithread
595 [don't] use extra threads with in-core multi-threading
596 which can benefit communication intensive applications.
597 Only supported with the task/affinity plugin.
598
599 help show this help message
600
601
602 -I, --immediate[=<seconds>]
603 exit if resources are not available within the time period spec‐
604 ified. If no argument is given, resources must be available
605 immediately for the request to succeed. By default, --immediate
606 is off, and the command will block until resources become avail‐
607 able. Since this option's argument is optional, for proper pars‐
608 ing the single letter option must be followed immediately with
609 the value and not include a space between them. For example
610 "-I60" and not "-I 60".
611
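       For example, the following might be used when blocking for resources
       is not acceptable (the command is illustrative):

              # give up if the allocation cannot be granted within 60 seconds
              salloc -I60 -N1 ./quick_test.sh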
612
613 -J, --job-name=<jobname>
614 Specify a name for the job allocation. The specified name will
615 appear along with the job id number when querying running jobs
616 on the system. The default job name is the name of the "com‐
617 mand" specified on the command line.
618
619
620 --jobid=<jobid>
621 Allocate resources as the specified job id. NOTE: Only valid
622 for users root and SlurmUser.
623
624
625 -K, --kill-command[=signal]
626 salloc always runs a user-specified command once the allocation
627 is granted. salloc will wait indefinitely for that command to
628 exit. If you specify the --kill-command option salloc will send
629 a signal to your command any time that the Slurm controller
630 tells salloc that its job allocation has been revoked. The job
631 allocation can be revoked for a couple of reasons: someone used
632 scancel to revoke the allocation, or the allocation reached its
633 time limit. If you do not specify a signal name or number and
634 Slurm is configured to signal the spawned command at job termi‐
635 nation, the default signal is SIGHUP for interactive and SIGTERM
636 for non-interactive sessions. Since this option's argument is
637 optional, for proper parsing the single letter option must be
638 followed immediately with the value and not include a space
639 between them. For example "-K1" and not "-K 1".
640
641
642 -k, --no-kill
643 Do not automatically terminate a job if one of the nodes it has
644 been allocated fails. The user will assume the responsibilities
645 for fault-tolerance should a node fail. When there is a node
646 failure, any active job steps (usually MPI jobs) on that node
647 will almost certainly suffer a fatal error, but with --no-kill,
648 the job allocation will not be revoked so the user may launch
649 new job steps on the remaining nodes in their allocation.
650
651 By default Slurm terminates the entire job allocation if any
652 node fails in its range of allocated nodes.
653
654
655 -L, --licenses=<license>
656 Specification of licenses (or other resources available on all
657 nodes of the cluster) which must be allocated to this job.
658 License names can be followed by a colon and count (the default
659 count is one). Multiple license names should be comma separated
660 (e.g. "--licenses=foo:4,bar").
661
662
663 -M, --clusters=<string>
664 Clusters to issue commands to. Multiple cluster names may be
665 comma separated. The job will be submitted to the one cluster
666 providing the earliest expected job initiation time. The default
667 value is the current cluster. A value of 'all' will query to run
668 on all clusters. Note that the SlurmDBD must be up for this
669 option to work properly.
670
671
672 -m, --distribution=
673 arbitrary|<block|cyclic|plane=<options>[:block|cyclic|fcyclic]>
674
675 Specify alternate distribution methods for remote processes. In
676 salloc, this only sets environment variables that will be used
677 by subsequent srun requests. This option controls the assign‐
678 ment of tasks to the nodes on which resources have been allo‐
679 cated, and the distribution of those resources to tasks for
680 binding (task affinity). The first distribution method (before
681 the ":") controls the distribution of resources across nodes.
682 The optional second distribution method (after the ":") controls
683 the distribution of resources across sockets within a node.
684 Note that with select/cons_res, the number of cpus allocated on
685 each socket and node may be different. Refer to
686 https://slurm.schedmd.com/mc_support.html for more information
687 on resource allocation, assignment of tasks to nodes, and bind‐
688 ing of tasks to CPUs.
689
690 First distribution method:
691
692 block The block distribution method will distribute tasks to a
693 node such that consecutive tasks share a node. For exam‐
694 ple, consider an allocation of three nodes each with two
695 cpus. A four-task block distribution request will dis‐
696 tribute those tasks to the nodes with tasks one and two
697 on the first node, task three on the second node, and
698 task four on the third node. Block distribution is the
699 default behavior if the number of tasks exceeds the num‐
700 ber of allocated nodes.
701
702 cyclic The cyclic distribution method will distribute tasks to a
703 node such that consecutive tasks are distributed over
704 consecutive nodes (in a round-robin fashion). For exam‐
705 ple, consider an allocation of three nodes each with two
706 cpus. A four-task cyclic distribution request will dis‐
707 tribute those tasks to the nodes with tasks one and four
708 on the first node, task two on the second node, and task
709 three on the third node. Note that when SelectType is
710 select/cons_res, the same number of CPUs may not be allo‐
711 cated on each node. Task distribution will be round-robin
712 among all the nodes with CPUs yet to be assigned to
713 tasks. Cyclic distribution is the default behavior if
714 the number of tasks is no larger than the number of allo‐
715 cated nodes.
716
717 plane The tasks are distributed in blocks of a specified size.
718 The options include a number representing the size of the
719 task block. This is followed by an optional specifica‐
720 tion of the task distribution scheme within a block of
721 tasks and between the blocks of tasks. The number of
722 tasks distributed to each node is the same as for cyclic
723 distribution, but the taskids assigned to each node
724 depend on the plane size. For more details (including
725 examples and diagrams), please see
726 https://slurm.schedmd.com/mc_support.html
727 and
728 https://slurm.schedmd.com/dist_plane.html
729
730 arbitrary
731 The arbitrary method of distribution will allocate pro‐
732 cesses in-order as listed in file designated by the envi‐
733 ronment variable SLURM_HOSTFILE. If this variable is
734 listed it will override any other method specified. If
735 not set the method will default to block. The
736 hostfile must contain at minimum the number of hosts
737 requested, one per line or comma separated. If
738 specifying a task count (-n, --ntasks=<number>), your
739 tasks will be laid out on the nodes in the order of the
740 file.
741 NOTE: The arbitrary distribution option on a job alloca‐
742 tion only controls the nodes to be allocated to the job
743 and not the allocation of CPUs on those nodes. This
744 option is meant primarily to control a job step's task
745 layout in an existing job allocation for the srun com‐
746 mand.
747
748
749 Second distribution method:
750
751 block The block distribution method will distribute tasks to
752 sockets such that consecutive tasks share a socket.
753
754 cyclic The cyclic distribution method will distribute tasks to
755 sockets such that consecutive tasks are distributed over
756 consecutive sockets (in a round-robin fashion). Tasks
757 requiring more than one CPU will have all of those CPUs
758 allocated on a single socket if possible.
759
760 fcyclic
761 The fcyclic distribution method will distribute tasks to
762 sockets such that consecutive tasks are distributed over
763 consecutive sockets (in a round-robin fashion). Tasks
764 requiring more than one CPU will have each CPU allocated
765 in a cyclic fashion across sockets.
766
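       For instance, a request of the following form might be used so that
       srun steps launched inside the allocation inherit a cyclic
       distribution across nodes and a block distribution within each node
       (the executable name is illustrative):

              # salloc only exports the distribution; the inner srun applies it
              salloc -N3 -n6 -m cyclic:block srun ./my_app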
767
768 --mail-type=<type>
769 Notify user by email when certain event types occur. Valid type
770 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
771 BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buf‐
772 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
773 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
774 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
775 time limit). Multiple type values may be specified in a comma
776 separated list. The user to be notified is indicated with
777 --mail-user.
778
779
780 --mail-user=<user>
781 User to receive email notification of state changes as defined
782 by --mail-type. The default value is the submitting user.
783
784
785 --mcs-label=<mcs>
786 Used only when the mcs/group plugin is enabled. This parameter
787 is a group among the groups of the user. Default value is cal‐
788 culated by the mcs plugin if it is enabled.
789
790
791 --mem=<size[units]>
792 Specify the real memory required per node. Default units are
793 megabytes unless the SchedulerParameters configuration parameter
794 includes the "default_gbytes" option for gigabytes. Different
795 units can be specified using the suffix [K|M|G|T]. Default
796 value is DefMemPerNode and the maximum value is MaxMemPerNode.
797 If configured, both parameters can be seen using the scontrol
798 show config command. This parameter would generally be used if
799 whole nodes are allocated to jobs (SelectType=select/linear).
800 Also see --mem-per-cpu. The --mem and --mem-per-cpu options are
801 mutually exclusive.
802
803 NOTE: A memory size specification of zero is treated as a spe‐
804 cial case and grants the job access to all of the memory on each
805 node. If the job is allocated multiple nodes in a heterogeneous
806 cluster, the memory limit on each node will be that of the node
807 in the allocation with the smallest memory size (same limit will
808 apply to every node in the job's allocation).
809
810 NOTE: Enforcement of memory limits currently relies upon the
811 task/cgroup plugin or enabling of accounting, which samples mem‐
812 ory use on a periodic basis (data need not be stored, just col‐
813 lected). In both cases memory use is based upon the job's Resi‐
814 dent Set Size (RSS). A task may exceed the memory limit until
815 the next periodic accounting sample.
816
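       For example, when whole nodes are allocated to jobs, a per-node memory
       request might look like the following (the size and script name are
       illustrative):

              # two nodes, each with at least 16 GB of real memory
              salloc -N2 --mem=16G ./node_level_job.sh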
817
818 --mem-per-cpu=<size[units]>
819 Minimum memory required per allocated CPU. Default units are
820 megabytes unless the SchedulerParameters configuration parameter
821 includes the "default_gbytes" option for gigabytes. Different
822 units can be specified using the suffix [K|M|G|T]. Default
823 value is DefMemPerCPU and the maximum value is MaxMemPerCPU (see
824 exception below). If configured, both parameters can be seen
825 using the scontrol show config command. Note that if the job's
826 --mem-per-cpu value exceeds the configured MaxMemPerCPU, then
827 the user's limit will be treated as a memory limit per task;
828 --mem-per-cpu will be reduced to a value no larger than MaxMem‐
829 PerCPU; --cpus-per-task will be set and the value of
830 --cpus-per-task multiplied by the new --mem-per-cpu value will
831 equal the original --mem-per-cpu value specified by the user.
832 This parameter would generally be used if individual processors
833 are allocated to jobs (SelectType=select/cons_res). If
834 resources are allocated by the core, socket, or whole node, the
835 number of CPUs allocated to a job may be higher than the task
836 count and the value of --mem-per-cpu should be adjusted accord‐
837 ingly. Also see --mem. --mem and --mem-per-cpu are mutually
838 exclusive.
839
840
841 --mem-bind=[{quiet,verbose},]type
842 Bind tasks to memory. Used only when the task/affinity plugin is
843 enabled and the NUMA memory functions are available. Note that
844 the resolution of CPU and memory binding may differ on some
845 architectures. For example, CPU binding may be performed at the
846 level of the cores within a processor while memory binding will
847 be performed at the level of nodes, where the definition of
848 "nodes" may differ from system to system. By default no memory
849 binding is performed; any task using any CPU can use any memory.
850 This option is typically used to ensure that each task is bound
851 to the memory closest to its assigned CPU. The use of any type
852 other than "none" or "local" is not recommended. If you want
853 greater control, try running a simple test code with the options
854 "--cpu-bind=verbose,none --mem-bind=verbose,none" to determine
855 the specific configuration.
856
857 NOTE: To have Slurm always report on the selected memory binding
858 for all commands executed in a shell, you can enable verbose
859 mode by setting the SLURM_MEM_BIND environment variable value to
860 "verbose".
861
862 The following informational environment variables are set when
863 --mem-bind is in use:
864
865 SLURM_MEM_BIND_LIST
866 SLURM_MEM_BIND_PREFER
867 SLURM_MEM_BIND_SORT
868 SLURM_MEM_BIND_TYPE
869 SLURM_MEM_BIND_VERBOSE
870
871 See the ENVIRONMENT VARIABLES section for a more detailed
872 description of the individual SLURM_MEM_BIND* variables.
873
874 Supported options include:
875
876 help show this help message
877
878 local Use memory local to the processor in use
879
880 map_mem:<list>
881 Bind by setting memory masks on tasks (or ranks) as spec‐
882 ified where <list> is
883 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
884 ping is specified for a node and identical mapping is
885 applied to the tasks on every node (i.e. the lowest task
886 ID on each node is mapped to the first ID specified in
887 the list, etc.). NUMA IDs are interpreted as decimal
888 values unless they are preceded with '0x' in which case
889 they are interpreted as hexadecimal values. If the number of
890 tasks (or ranks) exceeds the number of elements in this
891 list, elements in the list will be reused as needed
892 starting from the beginning of the list. To simplify
893 support for large task counts, the lists may follow a map
894 with an asterisk and repetition count. For example,
895 "map_mem:0x0f*4,0xf0*4". Not supported unless the entire
896 node is allocated to the job.
897
898 mask_mem:<list>
899 Bind by setting memory masks on tasks (or ranks) as spec‐
900 ified where <list> is
901 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
902 mapping is specified for a node and identical mapping is
903 applied to the tasks on every node (i.e. the lowest task
904 ID on each node is mapped to the first mask specified in
905 the list, etc.). NUMA masks are always interpreted as
906 hexadecimal values. Note that masks must be preceded
907 with a '0x' if they don't begin with [0-9] so they are
908 seen as numerical values. If the number of tasks (or
909 ranks) exceeds the number of elements in this list, ele‐
910 ments in the list will be reused as needed starting from
911 the beginning of the list. To simplify support for large
912 task counts, the lists may follow a mask with an asterisk
913 and repetition count. For example, "mask_mem:0*4,1*4". Not
914 supported unless the entire node is allocated to the job.
915
916 no[ne] don't bind tasks to memory (default)
917
918 p[refer]
919 Prefer use of first specified NUMA node, but permit
920 use of other available NUMA nodes.
921
922 q[uiet]
923 quietly bind before task runs (default)
924
925 rank bind by task rank (not recommended)
926
927 sort sort free cache pages (run zonesort on Intel KNL nodes)
928
929 v[erbose]
930 verbosely report binding before task runs
931
932
933 --mincpus=<n>
934 Specify a minimum number of logical cpus/processors per node.
935
936
937 -N, --nodes=<minnodes[-maxnodes]>
938 Request that a minimum of minnodes nodes be allocated to this
939 job. A maximum node count may also be specified with maxnodes.
940 If only one number is specified, this is used as both the mini‐
941 mum and maximum node count. The partition's node limits super‐
942 sede those of the job. If a job's node limits are outside of
943 the range permitted for its associated partition, the job will
944 be left in a PENDING state. This permits possible execution at
945 a later time, when the partition limit is changed. If a job
946 node limit exceeds the number of nodes configured in the parti‐
947 tion, the job will be rejected. Note that the environment vari‐
948 able SLURM_JOB_NODES will be set to the count of nodes actually
949 allocated to the job. See the ENVIRONMENT VARIABLES section for
950 more information. If -N is not specified, the default behavior
951 is to allocate enough nodes to satisfy the requirements of the
952 -n and -c options. The job will be allocated as many nodes as
953 possible within the range specified and without delaying the
954 initiation of the job. The node count specification may include
955 a numeric value followed by a suffix of "k" (multiplies numeric
956 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
957
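       For instance, a flexible node-count range might be requested as
       follows (the script name is illustrative):

              # between 2 and 4 nodes, whichever allows the earliest start
              salloc -N2-4 ./flexible_job.sh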
958
959 -n, --ntasks=<number>
960 salloc does not launch tasks, it requests an allocation of
961 resources and executes some command. This option advises the
962 Slurm controller that job steps run within this allocation will
963 launch a maximum of number tasks and sufficient resources are
964 allocated to accomplish this. The default is one task per node,
965 but note that the --cpus-per-task option will change this
966 default.
967
968
969 --network=<type>
970 Specify information pertaining to the switch or network. The
971 interpretation of type is system dependent. This option is sup‐
972 ported when running Slurm on a Cray natively. It is used to
973 request using Network Performance Counters. Only one value per
974 request is valid. All options are case insensitive. In this
975 configuration supported values include:
976
977 system
978 Use the system-wide network performance counters. Only
979 nodes requested will be marked in use for the job alloca‐
980 tion. If the job does not fill up the entire system the
981 rest of the nodes are not able to be used by other jobs
982 using NPC; if idle, their state will appear as PerfCnts.
983 These nodes are still available for other jobs not using
984 NPC.
985
986 blade Use the blade network performance counters. Only nodes
987 requested will be marked in use for the job allocation.
988 If the job does not fill up the entire blade(s) allocated
989 to the job those blade(s) are not able to be used by other
990 jobs using NPC; if idle, their state will appear as PerfC‐
991 nts. These nodes are still available for other jobs not
992 using NPC.
993
994
995 In all cases the job allocation request must specify the
996 --exclusive option. Otherwise the request will be denied.
997
998 Also with any of these options steps are not allowed to share
999 blades, so resources would remain idle inside an allocation if
1000 the step running on a blade does not take up all the nodes on
1001 the blade.
1002
1003 The network option is also supported on systems with IBM's Par‐
1004 allel Environment (PE). See IBM's LoadLeveler job command key‐
1005 word documentation about the keyword "network" for more informa‐
1006 tion. Multiple values may be specified in a comma separated
1007 list. All options are case insensitive. Supported values
1008 include:
1009
1010 BULK_XFER[=<resources>]
1011 Enable bulk transfer of data using Remote Direct-
1012 Memory Access (RDMA). The optional resources speci‐
1013 fication is a numeric value which can have a suffix
1014 of "k", "K", "m", "M", "g" or "G" for kilobytes,
1015 megabytes or gigabytes. NOTE: The resources speci‐
1016 fication is not supported by the underlying IBM in‐
1017 frastructure as of Parallel Environment version 2.2
1018 and no value should be specified at this time.
1019
1020 CAU=<count> Number of Collective Acceleration Units (CAU)
1021 required. Applies only to IBM Power7-IH processors.
1022 Default value is zero. Independent CAU will be
1023 allocated for each programming interface (MPI, LAPI,
1024 etc.)
1025
1026 DEVNAME=<name>
1027 Specify the device name to use for communications
1028 (e.g. "eth0" or "mlx4_0").
1029
1030 DEVTYPE=<type>
1031 Specify the device type to use for communications.
1032 The supported values of type are: "IB" (InfiniBand),
1033 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1034 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1035 nel Emulation of HPCE). The devices allocated to a
1036 job must all be of the same type. The default value
1037 depends upon what hardware is available
1038 and in order of preference is IPONLY (which is not
1039 considered in User Space mode), HFI, IB, HPCE, and
1040 KMUX.
1041
1042 IMMED =<count>
1043 Number of immediate send slots per window required.
1044 Applies only to IBM Power7-IH processors. Default
1045 value is zero.
1046
1047 INSTANCES =<count>
1048 Specify number of network connections for each task
1049 on each network connection. The default instance
1050 count is 1.
1051
1052 IPV4 Use Internet Protocol (IP) version 4 communications
1053 (default).
1054
1055 IPV6 Use Internet Protocol (IP) version 6 communications.
1056
1057 LAPI Use the LAPI programming interface.
1058
1059 MPI Use the MPI programming interface. MPI is the
1060 default interface.
1061
1062 PAMI Use the PAMI programming interface.
1063
1064 SHMEM Use the OpenSHMEM programming interface.
1065
1066 SN_ALL Use all available switch networks (default).
1067
1068 SN_SINGLE Use one available switch network.
1069
1070 UPC Use the UPC programming interface.
1071
1072 US Use User Space communications.
1073
1074
1075 Some examples of network specifications:
1076
1077 Instances=2,US,MPI,SN_ALL
1078 Create two user space connections for MPI communica‐
1079 tions on every switch network for each task.
1080
1081 US,MPI,Instances=3,Devtype=IB
1082 Create three user space connections for MPI communi‐
1083 cations on every InfiniBand network for each task.
1084
1085 IPV4,LAPI,SN_Single
1086 Create an IP version 4 connection for LAPI communica‐
1087 tions on one switch network for each task.
1088
1089 Instances=2,US,LAPI,MPI
1090 Create two user space connections each for LAPI and
1091 MPI communications on every switch network for each
1092 task. Note that SN_ALL is the default option so
1093 every switch network is used. Also note that
1094 Instances=2 specifies that two connections are
1095 established for each protocol (LAPI and MPI) and
1096 each task. If there are two networks and four tasks
1097 on the node then a total of 32 connections are
1098 established (2 instances x 2 protocols x 2 networks
1099 x 4 tasks).
1100
1101
1102 --nice[=adjustment]
1103 Run the job with an adjusted scheduling priority within Slurm.
1104 With no adjustment value the scheduling priority is decreased by
1105 100. A negative nice value increases the priority, otherwise
1106 decreases it. The adjustment range is +/- 2147483645. Only priv‐
1107 ileged users can specify a negative adjustment.
1108
1109
1110 --ntasks-per-core=<ntasks>
1111 Request the maximum ntasks be invoked on each core. Meant to be
1112 used with the --ntasks option. Related to --ntasks-per-node
1113 except at the core level instead of the node level. NOTE: This
1114 option is not supported unless SelectType=cons_res is configured
1115 (either directly or indirectly on Cray systems) along with the
1116 node's core count.
1117
1118
1119 --ntasks-per-node=<ntasks>
1120 Request that ntasks be invoked on each node. If used with the
1121 --ntasks option, the --ntasks option will take precedence and
1122 the --ntasks-per-node will be treated as a maximum count of
1123 tasks per node. Meant to be used with the --nodes option. This
1124 is related to --cpus-per-task=ncpus, but does not require knowl‐
1125 edge of the actual number of cpus on each node. In some cases,
1126 it is more convenient to be able to request that no more than a
1127 specific number of tasks be invoked on each node. Examples of
1128 this include submitting a hybrid MPI/OpenMP app where only one
1129 MPI "task/rank" should be assigned to each node while allowing
1130 the OpenMP portion to utilize all of the parallelism present in
1131 the node, or submitting a single setup/cleanup/monitoring job to
1132 each node of a pre-existing allocation as one step in a larger
1133 job script.
1134
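       The hybrid MPI/OpenMP case described above might be requested roughly
       as follows (the node count, CPU count, and executable name are
       illustrative):

              # one MPI rank per node, leaving the remaining CPUs for OpenMP threads
              salloc -N4 --ntasks-per-node=1 -c16 srun ./hybrid_app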
1135
1136 --ntasks-per-socket=<ntasks>
1137 Request the maximum ntasks be invoked on each socket. Meant to
1138 be used with the --ntasks option. Related to --ntasks-per-node
1139 except at the socket level instead of the node level. NOTE:
1140 This option is not supported unless SelectType=cons_res is con‐
1141 figured (either directly or indirectly on Cray systems) along
1142 with the node's socket count.
1143
1144
1145 --no-bell
1146 Silence salloc's use of the terminal bell. Also see the option
1147 --bell.
1148
1149
1150 --no-shell
1151 immediately exit after allocating resources, without running a
1152 command. However, the Slurm job will still be created and will
1153 remain active and will own the allocated resources as long as it
1154 is active. You will have a Slurm job id with no associated pro‐
1155 cesses or tasks. You can submit srun commands against this
1156 resource allocation, if you specify the --jobid= option with the
1157 job id of this Slurm job. Or, this can be used to temporarily
1158 reserve a set of resources so that other jobs cannot use them
1159 for some period of time. (Note that the Slurm job is subject to
1160 the normal constraints on jobs, including time limits, so that
1161 eventually the job will terminate and the resources will be
1162 freed, or you can terminate the job manually using the scancel
1163 command.)
1164
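       A sketch of this workflow, with the job ID below standing in for the
       one actually reported by salloc:

              # create an allocation with no attached command
              salloc -N2 --no-shell
              # later, run steps against that allocation by job id
              srun --jobid=1234 -N2 hostname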
1165
1166 -O, --overcommit
1167 Overcommit resources. When applied to job allocation, only one
1168 CPU is allocated to the job per node and options used to specify
1169 the number of tasks per node, socket, core, etc. are ignored.
1170 When applied to job step allocations (the srun command when exe‐
1171 cuted within an existing job allocation), this option can be
1172 used to launch more than one task per CPU. Normally, srun will
1173 not allocate more than one process per CPU. By specifying
1174 --overcommit you are explicitly allowing more than one process
1175 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1176 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1177 in the file slurm.h and is not a variable, it is set at Slurm
1178 build time.
1179
1180
1181 -p, --partition=<partition_names>
1182 Request a specific partition for the resource allocation. If
1183 not specified, the default behavior is to allow the slurm con‐
1184 troller to select the default partition as designated by the
1185 system administrator. If the job can use more than one parti‐
1186 tion, specify their names in a comma separated list and the one
1187 offering earliest initiation will be used with no regard given
1188 to the partition name ordering (although higher priority parti‐
1189 tions will be considered first). When the job is initiated, the
1190 name of the partition used will be placed first in the job
1191 record partition string.
1192
1193
1194 --power=<flags>
1195 Comma separated list of power management plugin options. Cur‐
1196 rently available flags include: level (all nodes allocated to
1197 the job should have identical power caps, may be disabled by the
1198 Slurm configuration option PowerParameters=job_no_level).
1199
1200
1201 --priority=<value>
1202 Request a specific job priority. May be subject to configura‐
1203 tion specific constraints. value should either be a numeric
1204 value or "TOP" (for highest possible value). Only Slurm opera‐
1205 tors and administrators can set the priority of a job.
1206
1207
1208 --profile=<all|none|[energy[,|task[,|lustre[,|network]]]]>
1209 enables detailed data collection by the acct_gather_profile
1210 plugin. Detailed data are typically time-series that are stored
1211 in an HDF5 file for the job or an InfluxDB database depending on
1212 the configured plugin.
1213
1214
1215 All All data types are collected. (Cannot be combined with
1216 other values.)
1217
1218
1219 None No data types are collected. This is the default.
1220 (Cannot be combined with other values.)
1221
1222
1223 Energy Energy data is collected.
1224
1225
1226 Task Task (I/O, Memory, ...) data is collected.
1227
1228
1229 Lustre Lustre data is collected.
1230
1231
1232 Network Network (InfiniBand) data is collected.
1233
1234
1235 -q, --qos=<qos>
1236 Request a quality of service for the job. QOS values can be
1237 defined for each user/cluster/account association in the Slurm
1238 database. Users will be limited to their association's defined
1239 set of qos's when the Slurm configuration parameter, Account‐
1240 ingStorageEnforce, includes "qos" in it's definition.
1241
1242
1243 -Q, --quiet
1244 Suppress informational messages from salloc. Errors will still
1245 be displayed.
1246
1247
1248 --reboot
1249 Force the allocated nodes to reboot before starting the job.
1250 This is only supported with some system configurations and will
1251 otherwise be silently ignored.
1252
1253
1254 --reservation=<name>
1255 Allocate resources for the job from the named reservation.
1256
1257 --share The --share option has been replaced by the --oversub‐
1258 scribe option described below.
1259
1260
1261 -s, --oversubscribe
1262 The job allocation can over-subscribe resources with other run‐
1263 ning jobs. The resources to be over-subscribed can be nodes,
1264 sockets, cores, and/or hyperthreads depending upon configura‐
1265 tion. The default over-subscribe behavior depends on system
1266 configuration and the partition's OverSubscribe option takes
1267 precedence over the job's option. This option may result in the
1268 allocation being granted sooner than if the --oversubscribe
1269 option was not set and allow higher system utilization, but
1270 application performance will likely suffer due to competition
1271 for resources. Also see the --exclusive option.
1272
1273
1274 -S, --core-spec=<num>
1275 Count of specialized cores per node reserved by the job for sys‐
1276 tem operations and not used by the application. The application
1277 will not use these cores, but will be charged for their alloca‐
1278 tion. Default value is dependent upon the node's configured
1279 CoreSpecCount value. If a value of zero is designated and the
1280 Slurm configuration option AllowSpecResourcesUsage is enabled,
1281 the job will be allowed to override CoreSpecCount and use the
1282 specialized resources on nodes it is allocated. This option can
1283 not be used with the --thread-spec option.
1284
1285
1286 --signal=<sig_num>[@<sig_time>]
1287 When a job is within sig_time seconds of its end time, send it
1288 the signal sig_num. Due to the resolution of event handling by
1289 Slurm, the signal may be sent up to 60 seconds earlier than
1290 specified. sig_num may either be a signal number or name (e.g.
1291 "10" or "USR1"). sig_time must have an integer value between 0
1292 and 65535. By default, no signal is sent before the job's end
1293 time. If a sig_num is specified without any sig_time, the
1294 default time will be 60 seconds.
1295
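       For example, an application that checkpoints on SIGUSR1 might be run
       as follows (the time limit and script name are illustrative):

              # deliver SIGUSR1 roughly two minutes before the time limit expires
              salloc --signal=USR1@120 -t 30 ./checkpointing_job.sh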
1296
1297 --sockets-per-node=<sockets>
1298 Restrict node selection to nodes with at least the specified
1299 number of sockets. See additional information under -B option
1300 above when task/affinity plugin is enabled.
1301
1302
1303 --spread-job
1304 Spread the job allocation over as many nodes as possible and
1305 attempt to evenly distribute tasks across the allocated nodes.
1306 This option disables the topology/tree plugin.
1307
1308
1309 --switches=<count>[@<max-time>]
1310 When a tree topology is used, this defines the maximum count of
1311 switches desired for the job allocation and optionally the maxi‐
1312 mum time to wait for that number of switches. If Slurm finds an
1313 allocation containing more switches than the count specified,
1314 the job remains pending until it either finds an allocation with
1315 desired switch count or the time limit expires. If there is no
1316 switch count limit, there is no delay in starting the job.
1317 Acceptable time formats include "minutes", "minutes:seconds",
1318 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1319 "days-hours:minutes:seconds". The job's maximum time delay may
1320 be limited by the system administrator using the SchedulerParam‐
1321 eters configuration parameter with the max_switch_wait parameter
1322 option. On a dragonfly network the only switch count supported
1323 is 1 since communication performance will be highest when a job
1324              is allocated resources on one leaf switch or on more than 2
1325              leaf switches. The default max-time is the max_switch_wait
1326              SchedulerParameters value.
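
              For example (node count and program are illustrative), to require
              that the allocation fit within a single leaf switch, waiting up
              to ten minutes for such an allocation:

                   salloc -N8 --switches=1@10:00 ./my_program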
1327
1328
1329 -t, --time=<time>
1330 Set a limit on the total run time of the job allocation. If the
1331 requested time limit exceeds the partition's time limit, the job
1332 will be left in a PENDING state (possibly indefinitely). The
1333 default time limit is the partition's default time limit. When
1334 the time limit is reached, each task in each job step is sent
1335 SIGTERM followed by SIGKILL. The interval between signals is
1336 specified by the Slurm configuration parameter KillWait. The
1337 OverTimeLimit configuration parameter may permit the job to run
1338 longer than scheduled. Time resolution is one minute and second
1339 values are rounded up to the next minute.
1340
1341 A time limit of zero requests that no time limit be imposed.
1342 Acceptable time formats include "minutes", "minutes:seconds",
1343 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1344 "days-hours:minutes:seconds".
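
              For example (the command is illustrative), the following two
              requests express the same one-hour limit in different formats:

                   salloc -N1 --time=60 ./my_program
                   salloc -N1 --time=1:00:00 ./my_program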
1345
1346
1347 --thread-spec=<num>
1348 Count of specialized threads per node reserved by the job for
1349 system operations and not used by the application. The applica‐
1350 tion will not use these threads, but will be charged for their
1351              allocation. This option cannot be used with the --core-spec
1352              option.
1353
1354
1355 --threads-per-core=<threads>
1356 Restrict node selection to nodes with at least the specified
1357 number of threads per core. NOTE: "Threads" refers to the num‐
1358 ber of processing units on each core rather than the number of
1359 application tasks to be launched per core. See additional
1360 information under -B option above when task/affinity plugin is
1361 enabled.
1362
1363
1364 --time-min=<time>
1365 Set a minimum time limit on the job allocation. If specified,
1366              the job may have its --time limit lowered to a value no lower
1367 than --time-min if doing so permits the job to begin execution
1368 earlier than otherwise possible. The job's time limit will not
1369 be changed after the job is allocated resources. This is per‐
1370 formed by a backfill scheduling algorithm to allocate resources
1371 otherwise reserved for higher priority jobs. Acceptable time
1372 formats include "minutes", "minutes:seconds", "hours:min‐
1373 utes:seconds", "days-hours", "days-hours:minutes" and
1374 "days-hours:minutes:seconds".
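
              For example (values are illustrative), to ask for four hours but
              allow the backfill scheduler to start the job early with as
              little as one hour:

                   salloc -N2 --time=4:00:00 --time-min=1:00:00 ./my_program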
1375
1376
1377 --tmp=<size[units]>
1378 Specify a minimum amount of temporary disk space per node.
1379 Default units are megabytes unless the SchedulerParameters con‐
1380 figuration parameter includes the "default_gbytes" option for
1381 gigabytes. Different units can be specified using the suffix
1382 [K|M|G|T].
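
              For example (the size is illustrative), to require at least 20
              gigabytes of temporary disk space on each allocated node:

                   salloc -N1 --tmp=20G ./my_program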
1383
1384
1385 -u, --usage
1386 Display brief help message and exit.
1387
1388
1389 --uid=<user>
1390 Attempt to submit and/or run a job as user instead of the invok‐
1391 ing user id. The invoking user's credentials will be used to
1392              check access permissions for the target partition. This option
1393              is only valid for user root, who may use it, for example, to
1394              run jobs as a normal user in a RootOnly partition. If run as
1395              root, salloc will drop its permissions to the uid specified
1396              after node allocation is successful. user may be the user name
1397              or numerical user ID.
1398
1399
1400 --use-min-nodes
1401 If a range of node counts is given, prefer the smaller count.
1402
1403
1404 -V, --version
1405 Display version information and exit.
1406
1407
1408 -v, --verbose
1409 Increase the verbosity of salloc's informational messages. Mul‐
1410 tiple -v's will further increase salloc's verbosity. By default
1411 only errors will be displayed.
1412
1413
1414 -w, --nodelist=<node name list>
1415 Request a specific list of hosts. The job will contain all of
1416 these hosts and possibly additional hosts as needed to satisfy
1417 resource requirements. The list may be specified as a
1418 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1419 for example), or a filename. The host list will be assumed to
1420 be a filename if it contains a "/" character. If you specify a
1421 minimum node or processor count larger than can be satisfied by
1422 the supplied host list, additional resources will be allocated
1423 on other nodes as needed. Duplicate node names in the list will
1424 be ignored. The order of the node names in the list is not
1425 important; the node names will be sorted by Slurm.
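
              For example (host names and the file path are illustrative),
              hosts may be listed directly or read from a file:

                   salloc -w node[01-04,07] ./my_program
                   salloc -w /home/user/hosts.txt ./my_program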
1426
1427
1428 --wait-all-nodes=<value>
1429 Controls when the execution of the command begins with respect
1430 to when nodes are ready for use (i.e. booted). By default, the
1431 salloc command will return as soon as the allocation is made.
1432 This default can be altered using the salloc_wait_nodes option
1433 to the SchedulerParameters parameter in the slurm.conf file.
1434
1435 0 Begin execution as soon as allocation can be made. Do not
1436 wait for all nodes to be ready for use (i.e. booted).
1437
1438 1 Do not begin execution until all nodes are ready for use.
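
              For example (node count and program are illustrative), to delay
              the command until every allocated node has finished booting:

                   salloc -N16 --wait-all-nodes=1 ./my_program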
1439
1440
1441 --wckey=<wckey>
1442              Specify the wckey to be used with the job. If TrackWCKey=no
1443              (the default) in slurm.conf, this value is ignored.
1444
1445
1446 -x, --exclude=<node name list>
1447 Explicitly exclude certain nodes from the resources granted to
1448 the job.
1449
1450
1451 --x11[=<all|first|last>]
1452 Sets up X11 forwarding on all, first or last node(s) of the
1453 allocation. This option is only enabled if Slurm was compiled
1454 with X11 support and PrologFlags=x11 is defined in the
1455 slurm.conf. Default is all.
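
              For example (the client program is illustrative), to run an X11
              application on the first node of a two-node allocation:

                   salloc -N2 --x11=first xterm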
1456
1457
1458INPUT ENVIRONMENT VARIABLES
1459   Upon startup, salloc will read and handle the options set in the fol‐
1460   lowing environment variables. Note: Command line options always
1461   override environment variable settings.
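
       For example (the partition names are illustrative), an option may be
       set once in the environment and overridden on the command line:

            export SALLOC_PARTITION=debug
            salloc -N1 ./my_program              # runs in partition debug
            salloc -N1 -p batch ./my_program     # -p overrides SALLOC_PARTITION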
1462
1463
1464 SALLOC_ACCOUNT Same as -A, --account
1465
1466 SALLOC_ACCTG_FREQ Same as --acctg-freq
1467
1468 SALLOC_BELL Same as --bell
1469
1470 SALLOC_BURST_BUFFER Same as --bb
1471
1472 SALLOC_CLUSTERS or SLURM_CLUSTERS
1473 Same as --clusters
1474
1475 SALLOC_CONSTRAINT Same as -C, --constraint
1476
1477 SALLOC_CORE_SPEC Same as --core-spec
1478
1479 SALLOC_DEBUG Same as -v, --verbose
1480
1481 SALLOC_DELAY_BOOT Same as --delay-boot
1482
1483 SALLOC_EXCLUSIVE Same as --exclusive
1484
1485 SALLOC_GRES Same as --gres
1486
1487 SALLOC_GRES_FLAGS Same as --gres-flags
1488
1489 SALLOC_HINT or SLURM_HINT
1490 Same as --hint
1491
1492 SALLOC_IMMEDIATE Same as -I, --immediate
1493
1494 SALLOC_JOBID Same as --jobid
1495
1496 SALLOC_KILL_CMD Same as -K, --kill-command
1497
1498 SALLOC_MEM_BIND Same as --mem-bind
1499
1500 SALLOC_NETWORK Same as --network
1501
1502 SALLOC_NO_BELL Same as --no-bell
1503
1504 SALLOC_OVERCOMMIT Same as -O, --overcommit
1505
1506 SALLOC_PARTITION Same as -p, --partition
1507
1508 SALLOC_POWER Same as --power
1509
1510 SALLOC_PROFILE Same as --profile
1511
1512 SALLOC_QOS Same as --qos
1513
1514 SALLOC_REQ_SWITCH When a tree topology is used, this defines the
1515 maximum count of switches desired for the job
1516 allocation and optionally the maximum time to
1517 wait for that number of switches. See --switches.
1518
1519 SALLOC_RESERVATION Same as --reservation
1520
1521 SALLOC_SIGNAL Same as --signal
1522
1523 SALLOC_SPREAD_JOB Same as --spread-job
1524
1525 SALLOC_THREAD_SPEC Same as --thread-spec
1526
1527 SALLOC_TIMELIMIT Same as -t, --time
1528
1529 SALLOC_USE_MIN_NODES Same as --use-min-nodes
1530
1531 SALLOC_WAIT_ALL_NODES Same as --wait-all-nodes
1532
1533 SALLOC_WCKEY Same as --wckey
1534
1535 SALLOC_WAIT4SWITCH Max time waiting for requested switches. See
1536 --switches
1537
1538 SLURM_CONF The location of the Slurm configuration file.
1539
1540 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
1541 error occurs (e.g. invalid options). This can be
1542 used by a script to distinguish application exit
1543 codes from various Slurm error conditions. Also
1544 see SLURM_EXIT_IMMEDIATE.
1545
1546 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the
1547 --immediate option is used and resources are not
1548 currently available. This can be used by a
1549 script to distinguish application exit codes from
1550 various Slurm error conditions. Also see
1551 SLURM_EXIT_ERROR.
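
       For example (the exit-code values and program are illustrative), a
       wrapper script can distinguish Slurm failures from the application's
       own exit status:

            export SLURM_EXIT_ERROR=213
            export SLURM_EXIT_IMMEDIATE=214
            salloc -I -N4 ./my_program
            rc=$?
            if [ "$rc" -eq 214 ]; then
                echo "resources not immediately available" >&2
            elif [ "$rc" -eq 213 ]; then
                echo "salloc failed with a Slurm error" >&2
            else
                echo "application exited with status $rc"
            fi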
1552
1553
1554OUTPUT ENVIRONMENT VARIABLES
1555   salloc will set the following environment variables in the environment
1556 of the executed program:
1557
1558 BASIL_RESERVATION_ID
1559 The reservation ID on Cray systems running ALPS/BASIL only.
1560
1561 SLURM_*_PACK_GROUP_#
1562              For a heterogeneous job allocation, the environment variables are
1563 set separately for each component.
1564
1565 SLURM_CLUSTER_NAME
1566 Name of the cluster on which the job is executing.
1567
1568 SLURM_CPUS_PER_TASK
1569 Number of CPUs requested per task. Only set if the
1570 --cpus-per-task option is specified.
1571
1572 SLURM_DISTRIBUTION
1573 Only set if the -m, --distribution option is specified.
1574
1575 SLURM_JOB_ACCOUNT
1576              Account name associated with the job allocation.
1577
1578 SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
1579 The ID of the job allocation.
1580
1581 SLURM_JOB_CPUS_PER_NODE
1582 Count of processors available to the job on this node. Note the
1583 select/linear plugin allocates entire nodes to jobs, so the
1584 value indicates the total count of CPUs on each node. The
1585 select/cons_res plugin allocates individual processors to jobs,
1586 so this number indicates the number of processors on each node
1587 allocated to the job allocation.
1588
1589 SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)
1590 List of nodes allocated to the job.
1591
1592 SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
1593 Total number of nodes in the job allocation.
1594
1595 SLURM_JOB_PARTITION
1596 Name of the partition in which the job is running.
1597
1598 SLURM_JOB_QOS
1599 Quality Of Service (QOS) of the job allocation.
1600
1601 SLURM_JOB_RESERVATION
1602 Advanced reservation containing the job allocation, if any.
1603
1604 SLURM_MEM_BIND
1605 Set to value of the --mem-bind option.
1606
1607 SLURM_MEM_BIND_LIST
1608 Set to bit mask used for memory binding.
1609
1610 SLURM_MEM_BIND_PREFER
1611 Set to "prefer" if the --mem-bind option includes the prefer
1612 option.
1613
1614 SLURM_MEM_BIND_SORT
1615 Sort free cache pages (run zonesort on Intel KNL nodes)
1616
1617 SLURM_MEM_BIND_TYPE
1618 Set to the memory binding type specified with the --mem-bind
1619 option. Possible values are "none", "rank", "map_map",
1620 "mask_mem" and "local".
1621
1622 SLURM_MEM_BIND_VERBOSE
1623 Set to "verbose" if the --mem-bind option includes the verbose
1624 option. Set to "quiet" otherwise.
1625
1626 SLURM_MEM_PER_CPU
1627 Same as --mem-per-cpu
1628
1629 SLURM_MEM_PER_NODE
1630 Same as --mem
1631
1632 SLURM_PACK_SIZE
1633 Set to count of components in heterogeneous job.
1634
1635 SLURM_SUBMIT_DIR
1636 The directory from which salloc was invoked.
1637
1638 SLURM_SUBMIT_HOST
1639 The hostname of the computer from which salloc was invoked.
1640
1641 SLURM_NODE_ALIASES
1642 Sets of node name, communication address and hostname for nodes
1643              allocated to the job from the cloud. Each element in the set is
1644 colon separated and each set is comma separated. For example:
1645 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
1646
1647 SLURM_NTASKS
1648 Same as -n, --ntasks
1649
1650 SLURM_NTASKS_PER_CORE
1651 Set to value of the --ntasks-per-core option, if specified.
1652
1653 SLURM_NTASKS_PER_NODE
1654 Set to value of the --ntasks-per-node option, if specified.
1655
1656 SLURM_NTASKS_PER_SOCKET
1657 Set to value of the --ntasks-per-socket option, if specified.
1658
1659 SLURM_PROFILE
1660 Same as --profile
1661
1662 SLURM_TASKS_PER_NODE
1663 Number of tasks to be initiated on each node. Values are comma
1664 separated and in the same order as SLURM_JOB_NODELIST. If two
1665 or more consecutive nodes are to have the same task count, that
1666 count is followed by "(x#)" where "#" is the repetition count.
1667 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
1668              first three nodes will each execute two tasks and the fourth
1669 node will execute one task.
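
              As an illustrative sketch, the compressed value can be expanded
              to one task count per node with standard shell tools:

                   echo "$SLURM_TASKS_PER_NODE" | tr ',' '\n' |
                        awk -F'[(x)]' '{ n = ($3 == "") ? 1 : $3
                                         for (i = 0; i < n; i++) print $1 }'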
1670
1671
1672SIGNALS
1673   While salloc is waiting for a PENDING job allocation, most signals will
1674 cause salloc to revoke the allocation request and exit.
1675
1676   However, if the allocation has been granted and salloc has already
1677 started the specified command, then salloc will ignore most signals.
1678 salloc will not exit or release the allocation until the command exits.
1679 One notable exception is SIGHUP. A SIGHUP signal will cause salloc to
1680 release the allocation and exit without waiting for the command to fin‐
1681 ish. Another exception is SIGTERM, which will be forwarded to the
1682 spawned process.
1683
1684
1685EXAMPLES
1686   To get an allocation and open a new xterm in which srun commands may
1687 be typed interactively:
1688
1689 $ salloc -N16 xterm
1690 salloc: Granted job allocation 65537
1691 (at this point the xterm appears, and salloc waits for xterm to
1692 exit)
1693 salloc: Relinquishing job allocation 65537
1694
1695   To grab an allocation of nodes and launch a parallel application on one
1696   command line:
1697
1698 salloc -N5 srun -n10 myprogram
1699
1700   To create a heterogeneous job with 3 components, each allocating a
1701 unique set of nodes:
1702
1703 salloc -w node[2-3] : -w node4 : -w node[5-7] bash
1704 salloc: job 32294 queued and waiting for resources
1705 salloc: job 32294 has been allocated resources
1706 salloc: Granted job allocation 32294
1707
1708
1709COPYING
1710   Copyright (C) 2006-2007 The Regents of the University of California.
1711 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
1712 Copyright (C) 2008-2010 Lawrence Livermore National Security.
1713 Copyright (C) 2010-2018 SchedMD LLC.
1714
1715 This file is part of Slurm, a resource management program. For
1716 details, see <https://slurm.schedmd.com/>.
1717
1718 Slurm is free software; you can redistribute it and/or modify it under
1719 the terms of the GNU General Public License as published by the Free
1720 Software Foundation; either version 2 of the License, or (at your
1721 option) any later version.
1722
1723 Slurm is distributed in the hope that it will be useful, but WITHOUT
1724 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
1725 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
1726 for more details.
1727
1728
1729SEE ALSO
1730   sinfo(1), sattach(1), sbatch(1), squeue(1), scancel(1), scontrol(1),
1731   slurm.conf(5), sched_setaffinity(2), numa(3)
1732
1733
1734
1735February 2019 Slurm Commands salloc(1)