salloc(1)                       Slurm Commands                      salloc(1)
2
3
4
NAME
       salloc - Obtain a Slurm job allocation (a set of nodes), execute a
       command, and then release the allocation when the command is finished.
8
9
SYNOPSIS
       salloc [OPTIONS(0)...] [ : [OPTIONS(N)...]] [command(0) [args(0)...]]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
       salloc is used to obtain a Slurm job allocation, which is a set of
       resources (nodes), possibly with some set of constraints (e.g. number
       of processors per node). When salloc successfully obtains the
       requested allocation, it then runs the command specified by the user.
       Finally, when the user-specified command is complete, salloc
       relinquishes the job allocation.
25
26 The command may be any program the user wishes. Some typical commands
27 are xterm, a shell script containing srun commands, and srun (see the
28 EXAMPLES section). If no command is specified, then salloc runs the
29 user's default shell.
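
       As an illustrative sketch (the node count, task count, and commands
       run inside the allocation are arbitrary), a common pattern is to
       obtain an allocation and then launch job steps with srun from the
       shell that salloc spawns:

              $ salloc -N2 -n4
              $ srun hostname
              $ exit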
30
31 The following document describes the influence of various options on
32 the allocation of cpus to jobs and tasks.
33 https://slurm.schedmd.com/cpu_management.html
34
35 NOTE: The salloc logic includes support to save and restore the termi‐
36 nal line settings and is designed to be executed in the foreground. If
37 you need to execute salloc in the background, set its standard input to
38 some file, for example: "salloc -n16 a.out </dev/null &"
39
40
RETURN VALUE
       If salloc is unable to execute the user command, it will return 1 and
       print errors to stderr. Otherwise, on success, or if killed by the
       HUP, INT, KILL, or QUIT signal, it will return 0.
45
46
COMMAND PATH RESOLUTION
       If provided, the command is resolved in the following order:
49
50 1. If command starts with ".", then path is constructed as: current
51 working directory / command
52 2. If command starts with a "/", then path is considered absolute.
53 3. If command can be resolved through PATH. See path_resolution(7).
54 4. If command is in current working directory.
55
56 Current working directory is the calling process working directory un‐
57 less the --chdir argument is passed, which will override the current
58 working directory.
59
60
OPTIONS
       -A, --account=<account>
              Charge resources used by this job to the specified account.
              The account is an arbitrary string. The account name may be
              changed after job submission using the scontrol command.
66
67
68 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
69 Define the job accounting and profiling sampling intervals in
70 seconds. This can be used to override the JobAcctGatherFre‐
71 quency parameter in the slurm.conf file. <datatype>=<interval>
72 specifies the task sampling interval for the jobacct_gather
73 plugin or a sampling interval for a profiling type by the
74 acct_gather_profile plugin. Multiple comma-separated
75 <datatype>=<interval> pairs may be specified. Supported datatype
76 values are:
77
78 task Sampling interval for the jobacct_gather plugins and
79 for task profiling by the acct_gather_profile
80 plugin.
81 NOTE: This frequency is used to monitor memory us‐
                        age. If memory limits are enforced, the highest
                        frequency a user can request is what is configured
                        in the slurm.conf file. It cannot be disabled.
85
86 energy Sampling interval for energy profiling using the
87 acct_gather_energy plugin.
88
89 network Sampling interval for infiniband profiling using the
90 acct_gather_interconnect plugin.
91
92 filesystem Sampling interval for filesystem profiling using the
93 acct_gather_filesystem plugin.
94
95
96 The default value for the task sampling interval is 30 seconds.
97 The default value for all other intervals is 0. An interval of
98 0 disables sampling of the specified type. If the task sampling
99 interval is 0, accounting information is collected only at job
100 termination (reducing Slurm interference with the job).
101 Smaller (non-zero) values have a greater impact upon job perfor‐
102 mance, but a value of 30 seconds is not likely to be noticeable
103 for applications having less than 10,000 tasks.
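
              For example, the following hypothetical invocation samples task
              (memory) data every 15 seconds and energy data every 60 seconds
              (the script name is a placeholder):

                     salloc -N1 --acctg-freq=task=15,energy=60 ./my_script.sh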
104
105
106 --bb=<spec>
107 Burst buffer specification. The form of the specification is
              system dependent. Note that the burst buffer may not be
              accessible from a login node and may require that salloc spawn
              a shell on one of its allocated compute nodes. When the --bb
              option is used,
111 Slurm parses this option and creates a temporary burst buffer
112 script file that is used internally by the burst buffer plugins.
113 See Slurm's burst buffer guide for more information and exam‐
114 ples:
115 https://slurm.schedmd.com/burst_buffer.html
116
117
118 --bbf=<file_name>
119 Path of file containing burst buffer specification. The form of
              the specification is system dependent. Also see --bb. Note
              that the burst buffer may not be accessible from a login node
              and may require that salloc spawn a shell on one of its
              allocated compute nodes. See Slurm's burst buffer guide for
              more information and
124 examples:
125 https://slurm.schedmd.com/burst_buffer.html
126
127
128 --begin=<time>
129 Defer eligibility of this job allocation until the specified
130 time.
131
132 Time may be of the form HH:MM:SS to run a job at a specific time
133 of day (seconds are optional). (If that time is already past,
134 the next day is assumed.) You may also specify midnight, noon,
135 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
136 suffixed with AM or PM for running in the morning or the
137 evening. You can also say what day the job will be run, by
              specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
139 Combine date and time using the following format
140 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
141 count time-units, where the time-units can be seconds (default),
142 minutes, hours, days, or weeks and you can tell Slurm to run the
143 job today with the keyword today and to run the job tomorrow
144 with the keyword tomorrow. The value may be changed after job
145 submission using the scontrol command. For example:
146 --begin=16:00
147 --begin=now+1hour
148 --begin=now+60 (seconds by default)
149 --begin=2010-01-20T12:34:00
150
151
152 Notes on date/time specifications:
153 - Although the 'seconds' field of the HH:MM:SS time specifica‐
154 tion is allowed by the code, note that the poll time of the
155 Slurm scheduler is not precise enough to guarantee dispatch of
156 the job on the exact second. The job will be eligible to start
157 on the next poll following the specified time. The exact poll
158 interval depends on the Slurm scheduler (e.g., 60 seconds with
159 the default sched/builtin).
160 - If no time (HH:MM:SS) is specified, the default is
161 (00:00:00).
162 - If a date is specified without a year (e.g., MM/DD) then the
163 current year is assumed, unless the combination of MM/DD and
164 HH:MM:SS has already passed for that year, in which case the
165 next year is used.
166
167
168 --bell Force salloc to ring the terminal bell when the job allocation
169 is granted (and only if stdout is a tty). By default, salloc
170 only rings the bell if the allocation is pending for more than
171 ten seconds (and only if stdout is a tty). Also see the option
172 --no-bell.
173
174
175 -D, --chdir=<path>
              Change directory to path before beginning execution. The path
              can be specified as a full path or as a path relative to the
              directory where the command is executed.
179
180
181 --cluster-constraint=<list>
182 Specifies features that a federated cluster must have to have a
183 sibling job submitted to it. Slurm will attempt to submit a sib‐
184 ling job to a cluster if it has at least one of the specified
185 features.
186
187
188 -M, --clusters=<string>
189 Clusters to issue commands to. Multiple cluster names may be
190 comma separated. The job will be submitted to the one cluster
191 providing the earliest expected job initiation time. The default
192 value is the current cluster. A value of 'all' will query to run
193 on all clusters. Note that the SlurmDBD must be up for this op‐
194 tion to work properly.
195
196
197 --comment=<string>
198 An arbitrary comment.
199
200
201 -C, --constraint=<list>
202 Nodes can have features assigned to them by the Slurm adminis‐
203 trator. Users can specify which of these features are required
204 by their job using the constraint option. Only nodes having
205 features matching the job constraints will be used to satisfy
206 the request. Multiple constraints may be specified with AND,
207 OR, matching OR, resource counts, etc. (some operators are not
208 supported on all system types). Supported constraint options
209 include:
210
211 Single Name
212 Only nodes which have the specified feature will be used.
213 For example, --constraint="intel"
214
215 Node Count
216 A request can specify the number of nodes needed with
217 some feature by appending an asterisk and count after the
218 feature name. For example, --nodes=16 --con‐
219 straint="graphics*4 ..." indicates that the job requires
220 16 nodes and that at least four of those nodes must have
221 the feature "graphics."
222
              AND    Only nodes with all of the specified features will be
                     used. The ampersand is used for an AND operator. For
                     example, --constraint="intel&gpu"
226
              OR     Only nodes with at least one of the specified features
                     will be used. The vertical bar is used for an OR
                     operator. For example, --constraint="intel|amd"
230
231 Matching OR
232 If only one of a set of possible options should be used
233 for all allocated nodes, then use the OR operator and en‐
234 close the options within square brackets. For example,
235 --constraint="[rack1|rack2|rack3|rack4]" might be used to
236 specify that all nodes must be allocated on a single rack
237 of the cluster, but any of those four racks can be used.
238
239 Multiple Counts
240 Specific counts of multiple resources may be specified by
241 using the AND operator and enclosing the options within
242 square brackets. For example, --con‐
243 straint="[rack1*2&rack2*4]" might be used to specify that
244 two nodes must be allocated from nodes with the feature
245 of "rack1" and four nodes must be allocated from nodes
246 with the feature "rack2".
247
248 NOTE: This construct does not support multiple Intel KNL
249 NUMA or MCDRAM modes. For example, while --con‐
250 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
251 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
252 Specification of multiple KNL modes requires the use of a
253 heterogeneous job.
254
255 Brackets
256 Brackets can be used to indicate that you are looking for
257 a set of nodes with the different requirements contained
258 within the brackets. For example, --con‐
259 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
260 node with either the "rack1" or "rack2" features and two
261 nodes with the "rack3" feature. The same request without
262 the brackets will try to find a single node that meets
263 those requirements.
264
265 NOTE: Brackets are only reserved for Multiple Counts and
266 Matching OR syntax. AND operators require a count for
267 each feature inside square brackets (i.e.
268 "[quad*2&hemi*1]").
269
              Parentheses
                     Parentheses can be used to group like node features
                     together. For example,
                     --constraint="[(knl&snc4&flat)*4&haswell*1]" might be
                     used to specify that four nodes with the features "knl",
                     "snc4" and "flat" plus one node with the feature
                     "haswell" are required. All options within parentheses
                     should be grouped with AND (e.g. "&") operators.
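
              As an illustrative example (the feature names "rack1", "rack2",
              "intel" and "gpu" are site-specific and assumed here), the
              constructs above can be combined on the command line:

                     salloc -N4 --constraint="[rack1*2&rack2*2]" ./my_script.sh
                     salloc -N2 --constraint="intel&gpu" ./my_script.sh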
278
279
280 --container=<path_to_container>
281 Absolute path to OCI container bundle.
282
283
284 --contiguous
285 If set, then the allocated nodes must form a contiguous set.
286
287 NOTE: If SelectPlugin=cons_res this option won't be honored with
288 the topology/tree or topology/3d_torus plugins, both of which
289 can modify the node ordering.
290
291
292 -S, --core-spec=<num>
293 Count of specialized cores per node reserved by the job for sys‐
294 tem operations and not used by the application. The application
295 will not use these cores, but will be charged for their alloca‐
296 tion. Default value is dependent upon the node's configured
297 CoreSpecCount value. If a value of zero is designated and the
298 Slurm configuration option AllowSpecResourcesUsage is enabled,
299 the job will be allowed to override CoreSpecCount and use the
300 specialized resources on nodes it is allocated. This option can
301 not be used with the --thread-spec option.
302
303
304 --cores-per-socket=<cores>
305 Restrict node selection to nodes with at least the specified
              number of cores per socket. See additional information under
              the -B option when the task/affinity plugin is enabled.
308 NOTE: This option may implicitly set the number of tasks (if -n
309 was not specified) as one task per requested thread.
310
311
312 --cpu-freq=<p1>[-p2[:p3]]
313
314 Request that job steps initiated by srun commands inside this
315 allocation be run at some requested frequency if possible, on
316 the CPUs selected for the step on the compute node(s).
317
318 p1 can be [#### | low | medium | high | highm1] which will set
319 the frequency scaling_speed to the corresponding value, and set
320 the frequency scaling_governor to UserSpace. See below for defi‐
321 nition of the values.
322
323 p1 can be [Conservative | OnDemand | Performance | PowerSave]
324 which will set the scaling_governor to the corresponding value.
325 The governor has to be in the list set by the slurm.conf option
326 CpuFreqGovernors.
327
328 When p2 is present, p1 will be the minimum scaling frequency and
329 p2 will be the maximum scaling frequency.
330
              p2 can be [#### | medium | high | highm1]. p2 must be greater
              than p1.
333
334 p3 can be [Conservative | OnDemand | Performance | PowerSave |
335 SchedUtil | UserSpace] which will set the governor to the corre‐
336 sponding value.
337
338 If p3 is UserSpace, the frequency scaling_speed will be set by a
339 power or energy aware scheduling strategy to a value between p1
340 and p2 that lets the job run within the site's power goal. The
341 job may be delayed if p1 is higher than a frequency that allows
342 the job to run within the goal.
343
344 If the current frequency is < min, it will be set to min. Like‐
345 wise, if the current frequency is > max, it will be set to max.
346
347 Acceptable values at present include:
348
349 #### frequency in kilohertz
350
351 Low the lowest available frequency
352
353 High the highest available frequency
354
355 HighM1 (high minus one) will select the next highest
356 available frequency
357
358 Medium attempts to set a frequency in the middle of the
359 available range
360
361 Conservative attempts to use the Conservative CPU governor
362
363 OnDemand attempts to use the OnDemand CPU governor (the de‐
364 fault value)
365
366 Performance attempts to use the Performance CPU governor
367
368 PowerSave attempts to use the PowerSave CPU governor
369
370 UserSpace attempts to use the UserSpace CPU governor
371
372
              The following informational environment variable is set in the
              job step when the --cpu-freq option is requested:
              SLURM_CPU_FREQ_REQ
377
378 This environment variable can also be used to supply the value
379 for the CPU frequency request if it is set when the 'srun' com‐
380 mand is issued. The --cpu-freq on the command line will over‐
              ride the environment variable value. The form of the environ‐
382 ment variable is the same as the command line. See the ENVIRON‐
383 MENT VARIABLES section for a description of the
384 SLURM_CPU_FREQ_REQ variable.
385
386 NOTE: This parameter is treated as a request, not a requirement.
387 If the job step's node does not support setting the CPU fre‐
388 quency, or the requested value is outside the bounds of the le‐
389 gal frequencies, an error is logged, but the job step is allowed
390 to continue.
391
392 NOTE: Setting the frequency for just the CPUs of the job step
393 implies that the tasks are confined to those CPUs. If task con‐
394 finement (i.e., TaskPlugin=task/affinity or TaskPlu‐
395 gin=task/cgroup with the "ConstrainCores" option) is not config‐
396 ured, this parameter is ignored.
397
398 NOTE: When the step completes, the frequency and governor of
399 each selected CPU is reset to the previous values.
400
              NOTE: Submitting jobs with the --cpu-freq option when linuxproc
              is configured as the ProctrackType can cause jobs to run too
              quickly, before accounting is able to poll for job information.
              As a result, not all accounting information will be present.
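
              As an illustrative sketch (the frequency values, given in
              kilohertz, are hardware-dependent), a scaling range with an
              explicit governor could be requested as:

                     salloc -N1 --cpu-freq=2200000-2800000:UserSpace ./my_script.sh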
405
406
407 --cpus-per-gpu=<ncpus>
408 Advise Slurm that ensuing job steps will require ncpus proces‐
409 sors per allocated GPU. Not compatible with the --cpus-per-task
410 option.
411
412
413 -c, --cpus-per-task=<ncpus>
414 Advise Slurm that ensuing job steps will require ncpus proces‐
415 sors per task. By default Slurm will allocate one processor per
416 task.
417
              For instance, consider an application that has 4 tasks, each
              requiring 3 processors. If our cluster is comprised of
              quad-processor nodes and we simply ask for 12 processors, the
              controller might give us only 3 nodes. However, by using the
              --cpus-per-task=3 option, the controller knows that each task
              requires 3 processors on the same node, and the controller will
              grant an allocation of 4 nodes, one for each of the 4 tasks.
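
              A minimal invocation matching that example (the launcher script
              is a placeholder) would be:

                     salloc --ntasks=4 --cpus-per-task=3 ./my_mpi_launcher.sh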
425
426
427 --deadline=<OPT>
428 remove the job if no ending is possible before this deadline
429 (start > (deadline - time[-min])). Default is no deadline.
430 Valid time formats are:
431 HH:MM[:SS] [AM|PM]
432 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
433 MM/DD[/YY]-HH:MM[:SS]
434 YYYY-MM-DD[THH:MM[:SS]]]
435 now[+count[seconds(default)|minutes|hours|days|weeks]]
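
              For example, the following hypothetical request asks for a
              30-minute allocation that must be able to finish before 6 PM on
              the day of submission:

                     salloc --time=30 --deadline=18:00 ./my_script.sh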
436
437
438 --delay-boot=<minutes>
              Do not reboot nodes in order to satisfy this job's feature
440 specification if the job has been eligible to run for less than
441 this time period. If the job has waited for less than the spec‐
442 ified period, it will use only nodes which already have the
443 specified features. The argument is in units of minutes. A de‐
444 fault value may be set by a system administrator using the de‐
445 lay_boot option of the SchedulerParameters configuration parame‐
446 ter in the slurm.conf file, otherwise the default value is zero
447 (no delay).
448
449
450 -d, --dependency=<dependency_list>
451 Defer the start of this job until the specified dependencies
              have been satisfied. <dependency_list> is of the form
453 <type:job_id[:job_id][,type:job_id[:job_id]]> or
454 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
455 must be satisfied if the "," separator is used. Any dependency
456 may be satisfied if the "?" separator is used. Only one separa‐
457 tor may be used. Many jobs can share the same dependency and
458 these jobs may even belong to different users. The value may
459 be changed after job submission using the scontrol command. De‐
460 pendencies on remote jobs are allowed in a federation. Once a
461 job dependency fails due to the termination state of a preceding
462 job, the dependent job will never be run, even if the preceding
463 job is requeued and has a different termination state in a sub‐
464 sequent execution.
465
466 after:job_id[[+time][:jobid[+time]...]]
                     This job can begin execution after the specified jobs
                     start or are cancelled, once the optional 'time' (in
                     minutes) from that start or cancellation has elapsed.
                     If no 'time' is given then there is no delay after start
                     or cancellation.
471
472 afterany:job_id[:jobid...]
473 This job can begin execution after the specified jobs
474 have terminated.
475
476 afterburstbuffer:job_id[:jobid...]
477 This job can begin execution after the specified jobs
478 have terminated and any associated burst buffer stage out
479 operations have completed.
480
481 aftercorr:job_id[:jobid...]
482 A task of this job array can begin execution after the
483 corresponding task ID in the specified job has completed
484 successfully (ran to completion with an exit code of
485 zero).
486
487 afternotok:job_id[:jobid...]
488 This job can begin execution after the specified jobs
489 have terminated in some failed state (non-zero exit code,
490 node failure, timed out, etc).
491
492 afterok:job_id[:jobid...]
493 This job can begin execution after the specified jobs
494 have successfully executed (ran to completion with an
495 exit code of zero).
496
497 singleton
498 This job can begin execution after any previously
499 launched jobs sharing the same job name and user have
500 terminated. In other words, only one job by that name
501 and owned by that user can be running or suspended at any
502 point in time. In a federation, a singleton dependency
503 must be fulfilled on all clusters unless DependencyParam‐
504 eters=disable_remote_singleton is used in slurm.conf.
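
              As an illustrative example (the job ID 1234 is a placeholder),
              an allocation that should start only after another job finishes
              successfully could be requested as:

                     salloc -N1 --dependency=afterok:1234 ./my_script.sh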
505
506
507 -m, --distribution={*|block|cyclic|arbi‐
508 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
509
510 Specify alternate distribution methods for remote processes.
511 For job allocation, this sets environment variables that will be
512 used by subsequent srun requests and also affects which cores
513 will be selected for job allocation.
514
515 This option controls the distribution of tasks to the nodes on
516 which resources have been allocated, and the distribution of
517 those resources to tasks for binding (task affinity). The first
518 distribution method (before the first ":") controls the distri‐
519 bution of tasks to nodes. The second distribution method (after
520 the first ":") controls the distribution of allocated CPUs
521 across sockets for binding to tasks. The third distribution
522 method (after the second ":") controls the distribution of allo‐
523 cated CPUs across cores for binding to tasks. The second and
524 third distributions apply only if task affinity is enabled. The
525 third distribution is supported only if the task/cgroup plugin
526 is configured. The default value for each distribution type is
527 specified by *.
528
529 Note that with select/cons_res and select/cons_tres, the number
530 of CPUs allocated to each socket and node may be different. Re‐
531 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
532 mation on resource allocation, distribution of tasks to nodes,
533 and binding of tasks to CPUs.
534 First distribution method (distribution of tasks across nodes):
535
536
537 * Use the default method for distributing tasks to nodes
538 (block).
539
540 block The block distribution method will distribute tasks to a
541 node such that consecutive tasks share a node. For exam‐
542 ple, consider an allocation of three nodes each with two
543 cpus. A four-task block distribution request will dis‐
544 tribute those tasks to the nodes with tasks one and two
545 on the first node, task three on the second node, and
546 task four on the third node. Block distribution is the
547 default behavior if the number of tasks exceeds the num‐
548 ber of allocated nodes.
549
550 cyclic The cyclic distribution method will distribute tasks to a
551 node such that consecutive tasks are distributed over
552 consecutive nodes (in a round-robin fashion). For exam‐
553 ple, consider an allocation of three nodes each with two
554 cpus. A four-task cyclic distribution request will dis‐
555 tribute those tasks to the nodes with tasks one and four
556 on the first node, task two on the second node, and task
557 three on the third node. Note that when SelectType is
558 select/cons_res, the same number of CPUs may not be allo‐
559 cated on each node. Task distribution will be round-robin
560 among all the nodes with CPUs yet to be assigned to
561 tasks. Cyclic distribution is the default behavior if
562 the number of tasks is no larger than the number of allo‐
563 cated nodes.
564
565 plane The tasks are distributed in blocks of size <size>. The
566 size must be given or SLURM_DIST_PLANESIZE must be set.
567 The number of tasks distributed to each node is the same
568 as for cyclic distribution, but the taskids assigned to
569 each node depend on the plane size. Additional distribu‐
570 tion specifications cannot be combined with this option.
571 For more details (including examples and diagrams),
572 please see https://slurm.schedmd.com/mc_support.html and
573 https://slurm.schedmd.com/dist_plane.html
574
575 arbitrary
                     The arbitrary method of distribution will allocate
                     processes in order as listed in the file designated by
                     the environment variable SLURM_HOSTFILE. If this
                     variable is set, it will override any other method
                     specified. If not set, the method will default to
                     block. The hostfile must contain at minimum the number
                     of hosts requested, listed one per line or comma
                     separated. If specifying a task count (-n,
                     --ntasks=<number>), your tasks will be laid out on the
                     nodes in the order of the file.
585 NOTE: The arbitrary distribution option on a job alloca‐
586 tion only controls the nodes to be allocated to the job
587 and not the allocation of CPUs on those nodes. This op‐
588 tion is meant primarily to control a job step's task lay‐
589 out in an existing job allocation for the srun command.
590 NOTE: If the number of tasks is given and a list of re‐
591 quested nodes is also given, the number of nodes used
592 from that list will be reduced to match that of the num‐
593 ber of tasks if the number of nodes in the list is
594 greater than the number of tasks.
595
596
597 Second distribution method (distribution of CPUs across sockets
598 for binding):
599
600
601 * Use the default method for distributing CPUs across sock‐
602 ets (cyclic).
603
604 block The block distribution method will distribute allocated
605 CPUs consecutively from the same socket for binding to
606 tasks, before using the next consecutive socket.
607
608 cyclic The cyclic distribution method will distribute allocated
609 CPUs for binding to a given task consecutively from the
610 same socket, and from the next consecutive socket for the
611 next task, in a round-robin fashion across sockets.
612 Tasks requiring more than one CPU will have all of those
613 CPUs allocated on a single socket if possible.
614
615 fcyclic
616 The fcyclic distribution method will distribute allocated
617 CPUs for binding to tasks from consecutive sockets in a
618 round-robin fashion across the sockets. Tasks requiring
                     more than one CPU will have each CPU allocated in a
620 cyclic fashion across sockets.
621
622
623 Third distribution method (distribution of CPUs across cores for
624 binding):
625
626
627 * Use the default method for distributing CPUs across cores
628 (inherited from second distribution method).
629
630 block The block distribution method will distribute allocated
631 CPUs consecutively from the same core for binding to
632 tasks, before using the next consecutive core.
633
634 cyclic The cyclic distribution method will distribute allocated
635 CPUs for binding to a given task consecutively from the
636 same core, and from the next consecutive core for the
637 next task, in a round-robin fashion across cores.
638
639 fcyclic
640 The fcyclic distribution method will distribute allocated
641 CPUs for binding to tasks from consecutive cores in a
642 round-robin fashion across the cores.
643
644
645
646 Optional control for task distribution over nodes:
647
648
              Pack   Rather than distributing a job step's tasks evenly
650 across its allocated nodes, pack them as tightly as pos‐
651 sible on the nodes. This only applies when the "block"
652 task distribution method is used.
653
654 NoPack Rather than packing a job step's tasks as tightly as pos‐
655 sible on the nodes, distribute them evenly. This user
656 option will supersede the SelectTypeParameters
657 CR_Pack_Nodes configuration parameter.
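
              As an illustrative sketch (node and task counts are arbitrary),
              a cyclic task layout combined with a block distribution of CPUs
              across sockets could be requested as:

                     salloc -N3 -n6 -m cyclic:block ./my_script.sh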
658
659
660 -x, --exclude=<node_name_list>
661 Explicitly exclude certain nodes from the resources granted to
662 the job.
663
664
665 --exclusive[={user|mcs}]
666 The job allocation can not share nodes with other running jobs
667 (or just other users with the "=user" option or with the "=mcs"
668 option). If user/mcs are not specified (i.e. the job allocation
669 can not share nodes with other running jobs), the job is allo‐
670 cated all CPUs and GRES on all nodes in the allocation, but is
671 only allocated as much memory as it requested. This is by design
672 to support gang scheduling, because suspended jobs still reside
673 in memory. To request all the memory on a node, use --mem=0.
674 The default shared/exclusive behavior depends on system configu‐
675 ration and the partition's OverSubscribe option takes precedence
676 over the job's option.
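
              For example, to request two whole nodes together with all of
              their memory (the command is a placeholder):

                     salloc --exclusive --mem=0 -N2 ./my_script.sh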
677
678
679 -B, --extra-node-info=<sockets>[:cores[:threads]]
680 Restrict node selection to nodes with at least the specified
681 number of sockets, cores per socket and/or threads per core.
682 NOTE: These options do not specify the resource allocation size.
683 Each value specified is considered a minimum. An asterisk (*)
684 can be used as a placeholder indicating that all available re‐
685 sources of that type are to be utilized. Values can also be
686 specified as min-max. The individual levels can also be speci‐
687 fied in separate options if desired:
688 --sockets-per-node=<sockets>
689 --cores-per-socket=<cores>
690 --threads-per-core=<threads>
691 If task/affinity plugin is enabled, then specifying an alloca‐
692 tion in this manner also results in subsequently launched tasks
              being bound to threads if the -B option specifies a thread
              count, otherwise to cores if a core count is specified,
              otherwise to sockets. If SelectType is config‐
696 ured to select/cons_res, it must have a parameter of CR_Core,
697 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
698 to be honored. If not specified, the scontrol show job will
699 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
700 tions.
701 NOTE: This option is mutually exclusive with --hint,
702 --threads-per-core and --ntasks-per-core.
703 NOTE: This option may implicitly set the number of tasks (if -n
704 was not specified) as one task per requested thread.
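
              As an illustrative sketch (the socket, core and thread counts
              are hardware-dependent), the same restriction can be written
              either combined or with the individual options:

                     salloc -B 2:8:2 ./my_script.sh
                     salloc --sockets-per-node=2 --cores-per-socket=8 --threads-per-core=2 ./my_script.sh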
705
706
707 --get-user-env[=timeout][mode]
708 This option will load login environment variables for the user
709 specified in the --uid option. The environment variables are
710 retrieved by running something of this sort "su - <username> -c
711 /usr/bin/env" and parsing the output. Be aware that any envi‐
712 ronment variables already set in salloc's environment will take
713 precedence over any environment variables in the user's login
714 environment. The optional timeout value is in seconds. Default
              value is 3 seconds. The optional mode value controls the "su"
              options. With a mode value of "S", "su" is executed without the
              "-" option. With a mode value of "L", "su" is executed with the
              "-" option, replicating the login environment. If mode is not
              specified, the mode established at Slurm build time is used.
              Examples of use include "--get-user-env", "--get-user-env=10",
              "--get-user-env=10L", and "--get-user-env=S". NOTE: This option
722 only works if the caller has an effective uid of "root".
723
724
725 --gid=<group>
726 Submit the job with the specified group's group access permis‐
727 sions. group may be the group name or the numerical group ID.
728 In the default Slurm configuration, this option is only valid
729 when used by the user root.
730
731
732 --gpu-bind=[verbose,]<type>
733 Bind tasks to specific GPUs. By default every spawned task can
734 access every GPU allocated to the step. If "verbose," is speci‐
735 fied before <type>, then print out GPU binding debug information
736 to the stderr of the tasks. GPU binding is ignored if there is
737 only one task.
738
739 Supported type options:
740
741 closest Bind each task to the GPU(s) which are closest. In a
742 NUMA environment, each task may be bound to more than
743 one GPU (i.e. all GPUs in that NUMA environment).
744
745 map_gpu:<list>
746 Bind by setting GPU masks on tasks (or ranks) as spec‐
747 ified where <list> is
748 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
749 are interpreted as decimal values unless they are pre‐
                         ceded with '0x', in which case they are interpreted as
751 hexadecimal values. If the number of tasks (or ranks)
752 exceeds the number of elements in this list, elements
753 in the list will be reused as needed starting from the
754 beginning of the list. To simplify support for large
755 task counts, the lists may follow a map with an aster‐
756 isk and repetition count. For example
757 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
758 and ConstrainDevices is set in cgroup.conf, then the
759 GPU IDs are zero-based indexes relative to the GPUs
760 allocated to the job (e.g. the first GPU is 0, even if
761 the global ID is 3). Otherwise, the GPU IDs are global
762 IDs, and all GPUs on each node in the job should be
763 allocated for predictable binding results.
764
765 mask_gpu:<list>
766 Bind by setting GPU masks on tasks (or ranks) as spec‐
767 ified where <list> is
768 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
769 mapping is specified for a node and identical mapping
770 is applied to the tasks on every node (i.e. the lowest
771 task ID on each node is mapped to the first mask spec‐
772 ified in the list, etc.). GPU masks are always inter‐
773 preted as hexadecimal values but can be preceded with
774 an optional '0x'. To simplify support for large task
775 counts, the lists may follow a map with an asterisk
776 and repetition count. For example
777 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
778 is used and ConstrainDevices is set in cgroup.conf,
779 then the GPU IDs are zero-based indexes relative to
780 the GPUs allocated to the job (e.g. the first GPU is
781 0, even if the global ID is 3). Otherwise, the GPU IDs
782 are global IDs, and all GPUs on each node in the job
783 should be allocated for predictable binding results.
784
785 none Do not bind tasks to GPUs (turns off binding if
786 --gpus-per-task is requested).
787
788 per_task:<gpus_per_task>
789 Each task will be bound to the number of gpus speci‐
                         fied in <gpus_per_task>. GPUs are assigned to tasks
                         in order: the first task will be assigned the first
                         <gpus_per_task> GPUs on the node, and so on.
793
794 single:<tasks_per_gpu>
795 Like --gpu-bind=closest, except that each task can
796 only be bound to a single GPU, even when it can be
797 bound to multiple GPUs that are equally close. The
798 GPU to bind to is determined by <tasks_per_gpu>, where
799 the first <tasks_per_gpu> tasks are bound to the first
800 GPU available, the second <tasks_per_gpu> tasks are
801 bound to the second GPU available, etc. This is basi‐
802 cally a block distribution of tasks onto available
803 GPUs, where the available GPUs are determined by the
804 socket affinity of the task and the socket affinity of
805 the GPUs as specified in gres.conf's Cores parameter.
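
              As an illustrative example (GPU and task counts are arbitrary),
              binding one GPU to each task with verbose reporting could be
              requested as:

                     salloc -N1 -n4 --gpus-per-node=4 --gpu-bind=verbose,per_task:1 ./my_script.sh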
806
807
       --gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
809 Request that GPUs allocated to the job are configured with spe‐
810 cific frequency values. This option can be used to indepen‐
811 dently configure the GPU and its memory frequencies. After the
812 job is completed, the frequencies of all affected GPUs will be
813 reset to the highest possible values. In some cases, system
814 power caps may override the requested values. The field type
815 can be "memory". If type is not specified, the GPU frequency is
816 implied. The value field can either be "low", "medium", "high",
817 "highm1" or a numeric value in megahertz (MHz). If the speci‐
818 fied numeric value is not possible, a value as close as possible
819 will be used. See below for definition of the values. The ver‐
820 bose option causes current GPU frequency information to be
821 logged. Examples of use include "--gpu-freq=medium,memory=high"
822 and "--gpu-freq=450".
823
824 Supported value definitions:
825
826 low the lowest available frequency.
827
828 medium attempts to set a frequency in the middle of the
829 available range.
830
831 high the highest available frequency.
832
833 highm1 (high minus one) will select the next highest avail‐
834 able frequency.
835
836
837 -G, --gpus=[type:]<number>
838 Specify the total number of GPUs required for the job. An op‐
839 tional GPU type specification can be supplied. For example
840 "--gpus=volta:3". Multiple options can be requested in a comma
841 separated list, for example: "--gpus=volta:3,kepler:1". See
842 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
843 options.
844
845
846 --gpus-per-node=[type:]<number>
847 Specify the number of GPUs required for the job on each node in‐
848 cluded in the job's resource allocation. An optional GPU type
849 specification can be supplied. For example
850 "--gpus-per-node=volta:3". Multiple options can be requested in
851 a comma separated list, for example:
852 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
853 --gpus-per-socket and --gpus-per-task options.
854
855
856 --gpus-per-socket=[type:]<number>
857 Specify the number of GPUs required for the job on each socket
858 included in the job's resource allocation. An optional GPU type
859 specification can be supplied. For example
860 "--gpus-per-socket=volta:3". Multiple options can be requested
861 in a comma separated list, for example:
862 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
863 sockets per node count ( --sockets-per-node). See also the
864 --gpus, --gpus-per-node and --gpus-per-task options.
865
866
867 --gpus-per-task=[type:]<number>
868 Specify the number of GPUs required for the job on each task to
869 be spawned in the job's resource allocation. An optional GPU
870 type specification can be supplied. For example
871 "--gpus-per-task=volta:1". Multiple options can be requested in
872 a comma separated list, for example:
873 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
874 --gpus-per-socket and --gpus-per-node options. This option re‐
875 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
876 --gpus-per-task=Y" rather than an ambiguous range of nodes with
877 -N, --nodes. This option will implicitly set
878 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
879 with an explicit --gpu-bind specification.
880
881
882 --gres=<list>
883 Specifies a comma-delimited list of generic consumable re‐
884 sources. The format of each entry on the list is
885 "name[[:type]:count]". The name is that of the consumable re‐
886 source. The count is the number of those resources with a de‐
887 fault value of 1. The count can have a suffix of "k" or "K"
888 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
889 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
890 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
891 x 1024 x 1024 x 1024). The specified resources will be allo‐
892 cated to the job on each node. The available generic consumable
              resources are configurable by the system administrator. A list
894 of available generic consumable resources will be printed and
895 the command will exit if the option argument is "help". Exam‐
896 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
897 "--gres=help".
898
899
900 --gres-flags=<type>
901 Specify generic resource task binding options.
902
903 disable-binding
904 Disable filtering of CPUs with respect to generic re‐
905 source locality. This option is currently required to
906 use more CPUs than are bound to a GRES (i.e. if a GPU is
907 bound to the CPUs on one socket, but resources on more
908 than one socket are required to run the job). This op‐
909 tion may permit a job to be allocated resources sooner
910 than otherwise possible, but may result in lower job per‐
911 formance.
912 NOTE: This option is specific to SelectType=cons_res.
913
914 enforce-binding
915 The only CPUs available to the job will be those bound to
916 the selected GRES (i.e. the CPUs identified in the
917 gres.conf file will be strictly enforced). This option
918 may result in delayed initiation of a job. For example a
919 job requiring two GPUs and one CPU will be delayed until
920 both GPUs on a single socket are available rather than
              using GPUs bound to separate sockets; however, the appli‐
922 cation performance may be improved due to improved commu‐
923 nication speed. Requires the node to be configured with
924 more than one socket and resource filtering will be per‐
925 formed on a per-socket basis.
926 NOTE: This option is specific to SelectType=cons_tres.
927
928
929 -h, --help
930 Display help information and exit.
931
932
933 --hint=<type>
934 Bind tasks according to application hints.
935 NOTE: This option cannot be used in conjunction with
936 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
937 fied as a command line argument, it will take precedence over
938 the environment.
939
940 compute_bound
941 Select settings for compute bound applications: use all
942 cores in each socket, one thread per core.
943
944 memory_bound
945 Select settings for memory bound applications: use only
946 one core in each socket, one thread per core.
947
948 [no]multithread
949 [don't] use extra threads with in-core multi-threading
950 which can benefit communication intensive applications.
951 Only supported with the task/affinity plugin.
952
953 help show this help message
954
955
956 -H, --hold
957 Specify the job is to be submitted in a held state (priority of
958 zero). A held job can now be released using scontrol to reset
959 its priority (e.g. "scontrol release <job_id>").
960
961
962 -I, --immediate[=<seconds>]
963 exit if resources are not available within the time period spec‐
964 ified. If no argument is given (seconds defaults to 1), re‐
965 sources must be available immediately for the request to suc‐
966 ceed. If defer is configured in SchedulerParameters and sec‐
967 onds=1 the allocation request will fail immediately; defer con‐
968 flicts and takes precedence over this option. By default, --im‐
969 mediate is off, and the command will block until resources be‐
970 come available. Since this option's argument is optional, for
971 proper parsing the single letter option must be followed immedi‐
972 ately with the value and not include a space between them. For
973 example "-I60" and not "-I 60".
974
975
976 -J, --job-name=<jobname>
977 Specify a name for the job allocation. The specified name will
978 appear along with the job id number when querying running jobs
979 on the system. The default job name is the name of the "com‐
980 mand" specified on the command line.
981
982
983 -K, --kill-command[=signal]
984 salloc always runs a user-specified command once the allocation
985 is granted. salloc will wait indefinitely for that command to
986 exit. If you specify the --kill-command option salloc will send
987 a signal to your command any time that the Slurm controller
988 tells salloc that its job allocation has been revoked. The job
989 allocation can be revoked for a couple of reasons: someone used
990 scancel to revoke the allocation, or the allocation reached its
991 time limit. If you do not specify a signal name or number and
992 Slurm is configured to signal the spawned command at job termi‐
993 nation, the default signal is SIGHUP for interactive and SIGTERM
994 for non-interactive sessions. Since this option's argument is
995 optional, for proper parsing the single letter option must be
996 followed immediately with the value and not include a space be‐
997 tween them. For example "-K1" and not "-K 1".
998
999
1000 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
1001 Specification of licenses (or other resources available on all
1002 nodes of the cluster) which must be allocated to this job. Li‐
1003 cense names can be followed by a colon and count (the default
1004 count is one). Multiple license names should be comma separated
1005 (e.g. "--licenses=foo:4,bar").
1006
1007
1008 --mail-type=<type>
1009 Notify user by email when certain event types occur. Valid type
1010 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
1011 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
1012 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
1013 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
1014 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
1015 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
1016 time limit). Multiple type values may be specified in a comma
1017 separated list. The user to be notified is indicated with
1018 --mail-user.
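
              For example (the address is a placeholder), to be notified when
              the allocation begins and when it ends or fails:

                     salloc -N1 --mail-type=BEGIN,END,FAIL --mail-user=user@example.com ./my_script.sh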
1019
1020
1021 --mail-user=<user>
1022 User to receive email notification of state changes as defined
1023 by --mail-type. The default value is the submitting user.
1024
1025
1026 --mcs-label=<mcs>
1027 Used only when the mcs/group plugin is enabled. This parameter
              is a group among the groups of the user. The default value is
              calculated by the mcs plugin if it is enabled.
1030
1031
1032 --mem=<size>[units]
1033 Specify the real memory required per node. Default units are
1034 megabytes. Different units can be specified using the suffix
1035 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
              is MaxMemPerNode. If configured, both parameters can be seen
1037 using the scontrol show config command. This parameter would
1038 generally be used if whole nodes are allocated to jobs (Select‐
1039 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1040 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1041 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1042 fied as command line arguments, then they will take precedence
1043 over the environment.
1044
1045 NOTE: A memory size specification of zero is treated as a spe‐
1046 cial case and grants the job access to all of the memory on each
1047 node. If the job is allocated multiple nodes in a heterogeneous
1048 cluster, the memory limit on each node will be that of the node
1049 in the allocation with the smallest memory size (same limit will
1050 apply to every node in the job's allocation).
1051
1052 NOTE: Enforcement of memory limits currently relies upon the
1053 task/cgroup plugin or enabling of accounting, which samples mem‐
1054 ory use on a periodic basis (data need not be stored, just col‐
1055 lected). In both cases memory use is based upon the job's Resi‐
1056 dent Set Size (RSS). A task may exceed the memory limit until
1057 the next periodic accounting sample.
1058
1059
1060 --mem-bind=[{quiet|verbose},]<type>
1061 Bind tasks to memory. Used only when the task/affinity plugin is
1062 enabled and the NUMA memory functions are available. Note that
1063 the resolution of CPU and memory binding may differ on some ar‐
1064 chitectures. For example, CPU binding may be performed at the
1065 level of the cores within a processor while memory binding will
1066 be performed at the level of nodes, where the definition of
1067 "nodes" may differ from system to system. By default no memory
1068 binding is performed; any task using any CPU can use any memory.
1069 This option is typically used to ensure that each task is bound
1070 to the memory closest to its assigned CPU. The use of any type
1071 other than "none" or "local" is not recommended.
1072
1073 NOTE: To have Slurm always report on the selected memory binding
1074 for all commands executed in a shell, you can enable verbose
1075 mode by setting the SLURM_MEM_BIND environment variable value to
1076 "verbose".
1077
1078 The following informational environment variables are set when
1079 --mem-bind is in use:
1080
1081 SLURM_MEM_BIND_LIST
1082 SLURM_MEM_BIND_PREFER
1083 SLURM_MEM_BIND_SORT
1084 SLURM_MEM_BIND_TYPE
1085 SLURM_MEM_BIND_VERBOSE
1086
1087 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1088 scription of the individual SLURM_MEM_BIND* variables.
1089
1090 Supported options include:
1091
1092 help show this help message
1093
1094 local Use memory local to the processor in use
1095
1096 map_mem:<list>
1097 Bind by setting memory masks on tasks (or ranks) as spec‐
1098 ified where <list> is
1099 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1100 ping is specified for a node and identical mapping is ap‐
1101 plied to the tasks on every node (i.e. the lowest task ID
1102 on each node is mapped to the first ID specified in the
1103 list, etc.). NUMA IDs are interpreted as decimal values
                     unless they are preceded with '0x', in which case
                     they are interpreted as hexadecimal values. If the
                     number of tasks
1106 (or ranks) exceeds the number of elements in this list,
1107 elements in the list will be reused as needed starting
1108 from the beginning of the list. To simplify support for
1109 large task counts, the lists may follow a map with an as‐
1110 terisk and repetition count. For example
1111 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1112 sults, all CPUs for each node in the job should be allo‐
1113 cated to the job.
1114
1115 mask_mem:<list>
1116 Bind by setting memory masks on tasks (or ranks) as spec‐
1117 ified where <list> is
1118 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1119 mapping is specified for a node and identical mapping is
1120 applied to the tasks on every node (i.e. the lowest task
1121 ID on each node is mapped to the first mask specified in
1122 the list, etc.). NUMA masks are always interpreted as
1123 hexadecimal values. Note that masks must be preceded
1124 with a '0x' if they don't begin with [0-9] so they are
1125 seen as numerical values. If the number of tasks (or
1126 ranks) exceeds the number of elements in this list, ele‐
1127 ments in the list will be reused as needed starting from
1128 the beginning of the list. To simplify support for large
1129 task counts, the lists may follow a mask with an asterisk
1130 and repetition count. For example "mask_mem:0*4,1*4".
1131 For predictable binding results, all CPUs for each node
1132 in the job should be allocated to the job.
1133
1134 no[ne] don't bind tasks to memory (default)
1135
1136 p[refer]
1137 Prefer use of first specified NUMA node, but permit
1138 use of other available NUMA nodes.
1139
1140 q[uiet]
1141 quietly bind before task runs (default)
1142
1143 rank bind by task rank (not recommended)
1144
1145 sort sort free cache pages (run zonesort on Intel KNL nodes)
1146
1147 v[erbose]
1148 verbosely report binding before task runs
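
              As an illustrative example, verbose, locality-based memory
              binding for subsequently launched job steps could be requested
              as:

                     salloc -N1 --mem-bind=verbose,local ./my_script.sh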
1149
1150
1151 --mem-per-cpu=<size>[units]
1152 Minimum memory required per allocated CPU. Default units are
1153 megabytes. Different units can be specified using the suffix
1154 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1155 value is MaxMemPerCPU (see exception below). If configured, both
1156 parameters can be seen using the scontrol show config command.
1157 Note that if the job's --mem-per-cpu value exceeds the config‐
1158 ured MaxMemPerCPU, then the user's limit will be treated as a
1159 memory limit per task; --mem-per-cpu will be reduced to a value
1160 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1161 value of --cpus-per-task multiplied by the new --mem-per-cpu
1162 value will equal the original --mem-per-cpu value specified by
1163 the user. This parameter would generally be used if individual
1164 processors are allocated to jobs (SelectType=select/cons_res).
1165 If resources are allocated by core, socket, or whole nodes, then
1166 the number of CPUs allocated to a job may be higher than the
1167 task count and the value of --mem-per-cpu should be adjusted ac‐
1168 cordingly. Also see --mem and --mem-per-gpu. The --mem,
1169 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1170
1171 NOTE: If the final amount of memory requested by a job can't be
1172 satisfied by any of the nodes configured in the partition, the
1173 job will be rejected. This could happen if --mem-per-cpu is
1174 used with the --exclusive option for a job allocation and
1175 --mem-per-cpu times the number of CPUs on a node is greater than
1176 the total memory of that node.
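
              For example (the task count and memory size are illustrative),
              a job needing 2 GB of memory for each of 8 allocated CPUs could
              be requested as:

                     salloc -n8 --mem-per-cpu=2G ./my_script.sh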
1177
1178
1179 --mem-per-gpu=<size>[units]
1180 Minimum memory required per allocated GPU. Default units are
1181 megabytes. Different units can be specified using the suffix
1182 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1183 both a global and per partition basis. If configured, the pa‐
1184 rameters can be seen using the scontrol show config and scontrol
1185 show partition commands. Also see --mem. The --mem,
1186 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1187
1188
1189 --mincpus=<n>
1190 Specify a minimum number of logical cpus/processors per node.
1191
1192
1193 --network=<type>
1194 Specify information pertaining to the switch or network. The
1195 interpretation of type is system dependent. This option is sup‐
1196 ported when running Slurm on a Cray natively. It is used to re‐
1197 quest using Network Performance Counters. Only one value per
              request is valid. All options are case-insensitive. In this
1199 configuration supported values include:
1200
1201 system
1202 Use the system-wide network performance counters. Only
1203 nodes requested will be marked in use for the job alloca‐
                     tion. If the job does not fill up the entire system, the
                     rest of the nodes are not able to be used by other jobs
                     using NPC; if idle, their state will appear as PerfCnts.
1207 These nodes are still available for other jobs not using
1208 NPC.
1209
1210 blade Use the blade network performance counters. Only nodes re‐
1211 quested will be marked in use for the job allocation. If
                     the job does not fill up the entire blade(s) allocated
                     to the job, those blade(s) are not able to be used by
                     other jobs using NPC; if idle, their state will appear
                     as PerfCnts. These nodes are still available for other
                     jobs not using NPC.
1217
1218
1219 In all cases the job allocation request must specify the
1220 --exclusive option. Otherwise the request will be denied.
1221
1222 Also with any of these options steps are not allowed to share
1223 blades, so resources would remain idle inside an allocation if
1224 the step running on a blade does not take up all the nodes on
1225 the blade.
1226
1227 The network option is also supported on systems with IBM's Par‐
1228 allel Environment (PE). See IBM's LoadLeveler job command key‐
1229 word documentation about the keyword "network" for more informa‐
1230 tion. Multiple values may be specified in a comma separated
              list. All options are case-insensitive. Supported values in‐
1232 clude:
1233
1234 BULK_XFER[=<resources>]
1235 Enable bulk transfer of data using Remote Di‐
1236 rect-Memory Access (RDMA). The optional resources
1237 specification is a numeric value which can have a
1238 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1239 bytes, megabytes or gigabytes. NOTE: The resources
1240 specification is not supported by the underlying IBM
1241 infrastructure as of Parallel Environment version
1242 2.2 and no value should be specified at this time.
1243
              CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1245 quired. Applies only to IBM Power7-IH processors.
1246 Default value is zero. Independent CAU will be al‐
1247 located for each programming interface (MPI, LAPI,
1248 etc.)
1249
1250 DEVNAME=<name>
1251 Specify the device name to use for communications
1252 (e.g. "eth0" or "mlx4_0").
1253
1254 DEVTYPE=<type>
1255 Specify the device type to use for communications.
1256 The supported values of type are: "IB" (InfiniBand),
1257 "HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
1258 interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Ker‐
1259 nel Emulation of HPCE). The devices allocated to a
1260 job must all be of the same type. The default value
                         depends upon what hardware is available and, in
                         order of preference, is IPONLY (which is not
1263 considered in User Space mode), HFI, IB, HPCE, and
1264 KMUX.
1265
1266 IMMED =<count>
1267 Number of immediate send slots per window required.
1268 Applies only to IBM Power7-IH processors. Default
1269 value is zero.
1270
1271 INSTANCES =<count>
1272 Specify number of network connections for each task
1273 on each network connection. The default instance
1274 count is 1.
1275
1276 IPV4 Use Internet Protocol (IP) version 4 communications
1277 (default).
1278
1279 IPV6 Use Internet Protocol (IP) version 6 communications.
1280
1281 LAPI Use the LAPI programming interface.
1282
1283 MPI Use the MPI programming interface. MPI is the de‐
1284 fault interface.
1285
1286 PAMI Use the PAMI programming interface.
1287
1288 SHMEM Use the OpenSHMEM programming interface.
1289
1290 SN_ALL Use all available switch networks (default).
1291
1292 SN_SINGLE Use one available switch network.
1293
1294 UPC Use the UPC programming interface.
1295
1296 US Use User Space communications.
1297
1298
1299 Some examples of network specifications:
1300
1301 Instances=2,US,MPI,SN_ALL
1302 Create two user space connections for MPI communica‐
1303 tions on every switch network for each task.
1304
1305 US,MPI,Instances=3,Devtype=IB
1306 Create three user space connections for MPI communi‐
1307 cations on every InfiniBand network for each task.
1308
1309 IPV4,LAPI,SN_Single
                        Create an IP version 4 connection for LAPI communica‐
1311 tions on one switch network for each task.
1312
1313 Instances=2,US,LAPI,MPI
1314 Create two user space connections each for LAPI and
1315 MPI communications on every switch network for each
1316 task. Note that SN_ALL is the default option so ev‐
1317 ery switch network is used. Also note that In‐
1318 stances=2 specifies that two connections are estab‐
1319 lished for each protocol (LAPI and MPI) and each
1320 task. If there are two networks and four tasks on
1321 the node then a total of 32 connections are estab‐
1322 lished (2 instances x 2 protocols x 2 networks x 4
1323 tasks).
1324
1325
1326 --nice[=adjustment]
1327 Run the job with an adjusted scheduling priority within Slurm.
1328 With no adjustment value the scheduling priority is decreased by
              100. A negative nice value increases the priority; a positive
              value decreases it. The adjustment range is +/- 2147483645.
              Only privi‐
1331 leged users can specify a negative adjustment.
1332
1333
1334 --no-bell
1335 Silence salloc's use of the terminal bell. Also see the option
1336 --bell.
1337
1338
1339 -k, --no-kill[=off]
1340 Do not automatically terminate a job if one of the nodes it has
1341 been allocated fails. The user will assume the responsibilities
1342 for fault-tolerance should a node fail. When there is a node
1343 failure, any active job steps (usually MPI jobs) on that node
1344 will almost certainly suffer a fatal error, but with --no-kill,
1345 the job allocation will not be revoked so the user may launch
1346 new job steps on the remaining nodes in their allocation.
1347
              Specify an optional argument of "off" to disable the effect of
              the SALLOC_NO_KILL environment variable.
1350
1351 By default Slurm terminates the entire job allocation if any
1352 node fails in its range of allocated nodes.
1353
1354
1355 --no-shell
              Immediately exit after allocating resources, without running a
              command. However, the Slurm job will still be created, will
              remain active, and will own the allocated resources as long as it
1359 is active. You will have a Slurm job id with no associated pro‐
1360 cesses or tasks. You can submit srun commands against this re‐
1361 source allocation, if you specify the --jobid= option with the
1362 job id of this Slurm job. Or, this can be used to temporarily
1363 reserve a set of resources so that other jobs cannot use them
1364 for some period of time. (Note that the Slurm job is subject to
1365 the normal constraints on jobs, including time limits, so that
1366 eventually the job will terminate and the resources will be
1367 freed, or you can terminate the job manually using the scancel
1368 command.)
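
              For example, one possible sequence is sketched below; the job
              id shown is hypothetical and will differ on a real system:

                 $ salloc -N2 --no-shell
                 salloc: Granted job allocation 65538
                 $ srun --jobid=65538 -N2 hostname
                 $ scancel 65538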
1369
1370
1371 -F, --nodefile=<node_file>
              Much like --nodelist, but the list is contained in a file of
              name node_file. The node names in the list may also span multi‐
1374 ple lines in the file. Duplicate node names in the file will
1375 be ignored. The order of the node names in the list is not im‐
1376 portant; the node names will be sorted by Slurm.
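
              For example, assuming a file named my_nodes.txt containing
              hypothetical node names, one per line:

                 $ cat my_nodes.txt
                 node01
                 node02
                 node03
                 $ salloc --nodefile=my_nodes.txt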
1377
1378
1379 -w, --nodelist=<node_name_list>
1380 Request a specific list of hosts. The job will contain all of
1381 these hosts and possibly additional hosts as needed to satisfy
1382 resource requirements. The list may be specified as a
1383 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1384 for example), or a filename. The host list will be assumed to
1385 be a filename if it contains a "/" character. If you specify a
1386 minimum node or processor count larger than can be satisfied by
1387 the supplied host list, additional resources will be allocated
1388 on other nodes as needed. Duplicate node names in the list will
1389 be ignored. The order of the node names in the list is not im‐
1390 portant; the node names will be sorted by Slurm.
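
              For example, the following requests four nodes, two of which
              must be node1 and node2 (the host names are illustrative):

                 $ salloc -N4 --nodelist=node[1-2] ./my_job.sh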
1391
1392
1393 -N, --nodes=<minnodes>[-maxnodes]
1394 Request that a minimum of minnodes nodes be allocated to this
1395 job. A maximum node count may also be specified with maxnodes.
1396 If only one number is specified, this is used as both the mini‐
1397 mum and maximum node count. The partition's node limits super‐
1398 sede those of the job. If a job's node limits are outside of
1399 the range permitted for its associated partition, the job will
1400 be left in a PENDING state. This permits possible execution at
1401 a later time, when the partition limit is changed. If a job
1402 node limit exceeds the number of nodes configured in the parti‐
1403 tion, the job will be rejected. Note that the environment vari‐
1404 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1405 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1406 tion for more information. If -N is not specified, the default
1407 behavior is to allocate enough nodes to satisfy the requirements
1408 of the -n and -c options. The job will be allocated as many
1409 nodes as possible within the range specified and without delay‐
1410 ing the initiation of the job. The node count specification may
1411 include a numeric value followed by a suffix of "k" (multiplies
1412 numeric value by 1,024) or "m" (multiplies numeric value by
1413 1,048,576).
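
              For example, to accept anywhere from two to four nodes for an
              illustrative script:

                 $ salloc -N2-4 ./my_job.sh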
1414
1415
1416 -n, --ntasks=<number>
              salloc does not launch tasks; it requests an allocation of
              resources and executes some command. This option advises the Slurm
1419 controller that job steps run within this allocation will launch
1420 a maximum of number tasks and sufficient resources are allocated
1421 to accomplish this. The default is one task per node, but note
1422 that the --cpus-per-task option will change this default.
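
              For example, the following advises Slurm that job steps will
              launch at most eight tasks, each wanting two CPUs, and lets
              Slurm choose the node count (the program name is illustrative):

                 $ salloc -n8 --cpus-per-task=2 srun ./my_program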
1423
1424
1425 --ntasks-per-core=<ntasks>
1426 Request the maximum ntasks be invoked on each core. Meant to be
1427 used with the --ntasks option. Related to --ntasks-per-node ex‐
1428 cept at the core level instead of the node level. NOTE: This
1429 option is not supported when using SelectType=select/linear.
1430
1431
1432 --ntasks-per-gpu=<ntasks>
1433 Request that there are ntasks tasks invoked for every GPU. This
1434 option can work in two ways: 1) either specify --ntasks in addi‐
1435 tion, in which case a type-less GPU specification will be auto‐
1436 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1437 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1438 --ntasks, and the total task count will be automatically deter‐
1439 mined. The number of CPUs needed will be automatically in‐
1440 creased if necessary to allow for any calculated task count.
1441 This option will implicitly set --gpu-bind=single:<ntasks>, but
1442 that can be overridden with an explicit --gpu-bind specifica‐
1443 tion. This option is not compatible with a node range (i.e.
1444 -N<minnodes-maxnodes>). This option is not compatible with
1445 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1446 option is not supported unless SelectType=cons_tres is config‐
1447 ured (either directly or indirectly on Cray systems).
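
              For example, both of the following request eight tasks sharing
              two GPUs; the program name is illustrative:

                 # 2 GPUs requested, so a total of 8 tasks is implied:
                 $ salloc --gpus=2 --ntasks-per-gpu=4 srun ./gpu_program
                 # 8 tasks requested, so 2 type-less GPUs are implied:
                 $ salloc --ntasks=8 --ntasks-per-gpu=4 srun ./gpu_program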
1448
1449
1450 --ntasks-per-node=<ntasks>
1451 Request that ntasks be invoked on each node. If used with the
1452 --ntasks option, the --ntasks option will take precedence and
1453 the --ntasks-per-node will be treated as a maximum count of
1454 tasks per node. Meant to be used with the --nodes option. This
1455 is related to --cpus-per-task=ncpus, but does not require knowl‐
1456 edge of the actual number of cpus on each node. In some cases,
1457 it is more convenient to be able to request that no more than a
1458 specific number of tasks be invoked on each node. Examples of
1459 this include submitting a hybrid MPI/OpenMP app where only one
1460 MPI "task/rank" should be assigned to each node while allowing
1461 the OpenMP portion to utilize all of the parallelism present in
1462 the node, or submitting a single setup/cleanup/monitoring job to
1463 each node of a pre-existing allocation as one step in a larger
1464 job script.
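
              For example, a hybrid MPI/OpenMP job that places one rank per
              node and gives each rank 16 CPUs for its OpenMP threads might
              be requested as follows (the CPU count and program name are
              illustrative):

                 $ salloc -N4 --ntasks-per-node=1 --cpus-per-task=16 srun ./hybrid_program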
1465
1466
1467 --ntasks-per-socket=<ntasks>
1468 Request the maximum ntasks be invoked on each socket. Meant to
1469 be used with the --ntasks option. Related to --ntasks-per-node
1470 except at the socket level instead of the node level. NOTE:
1471 This option is not supported when using SelectType=select/lin‐
1472 ear.
1473
1474
1475 -O, --overcommit
1476 Overcommit resources.
1477
1478 When applied to a job allocation (not including jobs requesting
1479 exclusive access to the nodes) the resources are allocated as if
1480 only one task per node is requested. This means that the re‐
              quested number of cpus per task (-c, --cpus-per-task) is
              allocated per node rather than being multiplied by the number of
1483 tasks. Options used to specify the number of tasks per node,
1484 socket, core, etc. are ignored.
1485
1486 When applied to job step allocations (the srun command when exe‐
1487 cuted within an existing job allocation), this option can be
1488 used to launch more than one task per CPU. Normally, srun will
1489 not allocate more than one process per CPU. By specifying
1490 --overcommit you are explicitly allowing more than one process
1491 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1492 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1493 in the file slurm.h and is not a variable, it is set at Slurm
1494 build time.
1495
1496
1497 -s, --oversubscribe
1498 The job allocation can over-subscribe resources with other run‐
1499 ning jobs. The resources to be over-subscribed can be nodes,
1500 sockets, cores, and/or hyperthreads depending upon configura‐
1501 tion. The default over-subscribe behavior depends on system
1502 configuration and the partition's OverSubscribe option takes
1503 precedence over the job's option. This option may result in the
1504 allocation being granted sooner than if the --oversubscribe op‐
1505 tion was not set and allow higher system utilization, but appli‐
1506 cation performance will likely suffer due to competition for re‐
1507 sources. Also see the --exclusive option.
1508
1509
1510 -p, --partition=<partition_names>
1511 Request a specific partition for the resource allocation. If
              not specified, the default behavior is to allow the Slurm
              controller to select the default partition as designated by the
              system administrator. If the job can use more than one
              partition, specify their names in a comma-separated list and
              the one offering the earliest initiation will be used, with no
              regard given
1517 to the partition name ordering (although higher priority parti‐
1518 tions will be considered first). When the job is initiated, the
1519 name of the partition used will be placed first in the job
1520 record partition string.
1521
1522
1523 --power=<flags>
1524 Comma separated list of power management plugin options. Cur‐
1525 rently available flags include: level (all nodes allocated to
1526 the job should have identical power caps, may be disabled by the
1527 Slurm configuration option PowerParameters=job_no_level).
1528
1529
1530 --priority=<value>
1531 Request a specific job priority. May be subject to configura‐
1532 tion specific constraints. value should either be a numeric
1533 value or "TOP" (for highest possible value). Only Slurm opera‐
1534 tors and administrators can set the priority of a job.
1535
1536
1537 --profile={all|none|<type>[,<type>...]}
1538 Enables detailed data collection by the acct_gather_profile
1539 plugin. Detailed data are typically time-series that are stored
1540 in an HDF5 file for the job or an InfluxDB database depending on
1541 the configured plugin.
1542
1543
1544 All All data types are collected. (Cannot be combined with
1545 other values.)
1546
1547
1548 None No data types are collected. This is the default.
1549 (Cannot be combined with other values.)
1550
1551
1552 Valid type values are:
1553
1554
1555 Energy Energy data is collected.
1556
1557
1558 Task Task (I/O, Memory, ...) data is collected.
1559
1560
1561 Lustre Lustre data is collected.
1562
1563
1564 Network
1565 Network (InfiniBand) data is collected.
1566
1567
1568 -q, --qos=<qos>
1569 Request a quality of service for the job. QOS values can be de‐
1570 fined for each user/cluster/account association in the Slurm
1571 database. Users will be limited to their association's defined
1572 set of qos's when the Slurm configuration parameter, Account‐
1573 ingStorageEnforce, includes "qos" in its definition.
1574
1575
1576 -Q, --quiet
1577 Suppress informational messages from salloc. Errors will still
1578 be displayed.
1579
1580
1581 --reboot
1582 Force the allocated nodes to reboot before starting the job.
1583 This is only supported with some system configurations and will
1584 otherwise be silently ignored. Only root, SlurmUser or admins
1585 can reboot nodes.
1586
1587
1588 --reservation=<reservation_names>
1589 Allocate resources for the job from the named reservation. If
1590 the job can use more than one reservation, specify their names
              in a comma-separated list and the one offering the earliest
              initiation will be used. Each reservation will be considered
              in the order it was
1593 requested. All reservations will be listed in scontrol/squeue
              through the life of the job. In accounting, the first
              reservation will be seen, and after the job starts, the
              reservation actually used will replace it.
1597
1598
1599 --signal=[R:]<sig_num>[@sig_time]
1600 When a job is within sig_time seconds of its end time, send it
1601 the signal sig_num. Due to the resolution of event handling by
1602 Slurm, the signal may be sent up to 60 seconds earlier than
1603 specified. sig_num may either be a signal number or name (e.g.
1604 "10" or "USR1"). sig_time must have an integer value between 0
1605 and 65535. By default, no signal is sent before the job's end
1606 time. If a sig_num is specified without any sig_time, the de‐
1607 fault time will be 60 seconds. Use the "R:" option to allow
1608 this job to overlap with a reservation with MaxStartDelay set.
1609 To have the signal sent at preemption time see the pre‐
1610 empt_send_user_signal SlurmctldParameter.
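
              For example, to ask that SIGUSR1 be delivered roughly five
              minutes before the time limit expires (the script name is
              illustrative):

                 $ salloc --signal=USR1@300 -N1 ./checkpointing_job.sh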
1611
1612
1613 --sockets-per-node=<sockets>
1614 Restrict node selection to nodes with at least the specified
1615 number of sockets. See additional information under -B option
1616 above when task/affinity plugin is enabled.
1617 NOTE: This option may implicitly set the number of tasks (if -n
1618 was not specified) as one task per requested thread.
1619
1620
1621 --spread-job
1622 Spread the job allocation over as many nodes as possible and at‐
1623 tempt to evenly distribute tasks across the allocated nodes.
1624 This option disables the topology/tree plugin.
1625
1626
1627 --switches=<count>[@max-time]
1628 When a tree topology is used, this defines the maximum count of
1629 leaf switches desired for the job allocation and optionally the
1630 maximum time to wait for that number of switches. If Slurm finds
1631 an allocation containing more switches than the count specified,
1632 the job remains pending until it either finds an allocation with
              desired switch count or the time limit expires. If there is no
1634 switch count limit, there is no delay in starting the job. Ac‐
1635 ceptable time formats include "minutes", "minutes:seconds",
1636 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1637 "days-hours:minutes:seconds". The job's maximum time delay may
1638 be limited by the system administrator using the SchedulerParam‐
1639 eters configuration parameter with the max_switch_wait parameter
1640 option. On a dragonfly network the only switch count supported
1641 is 1 since communication performance will be highest when a job
              is allocated resources on one leaf switch or on more than 2
              leaf switches. The default max-time is the max_switch_wait
              SchedulerParameters value.
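
              For example, to prefer an allocation on a single leaf switch
              but accept any placement after waiting up to 60 minutes (the
              values and script name are illustrative):

                 $ salloc -N16 --switches=1@60 ./tightly_coupled_job.sh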
1645
1646
1647 --thread-spec=<num>
1648 Count of specialized threads per node reserved by the job for
1649 system operations and not used by the application. The applica‐
1650 tion will not use these threads, but will be charged for their
1651 allocation. This option can not be used with the --core-spec
1652 option.
1653
1654
1655 --threads-per-core=<threads>
1656 Restrict node selection to nodes with at least the specified
1657 number of threads per core. In task layout, use the specified
1658 maximum number of threads per core. NOTE: "Threads" refers to
1659 the number of processing units on each core rather than the num‐
1660 ber of application tasks to be launched per core. See addi‐
1661 tional information under -B option above when task/affinity
1662 plugin is enabled.
1663 NOTE: This option may implicitly set the number of tasks (if -n
1664 was not specified) as one task per requested thread.
1665
1666
1667 -t, --time=<time>
1668 Set a limit on the total run time of the job allocation. If the
1669 requested time limit exceeds the partition's time limit, the job
1670 will be left in a PENDING state (possibly indefinitely). The
1671 default time limit is the partition's default time limit. When
1672 the time limit is reached, each task in each job step is sent
1673 SIGTERM followed by SIGKILL. The interval between signals is
1674 specified by the Slurm configuration parameter KillWait. The
1675 OverTimeLimit configuration parameter may permit the job to run
1676 longer than scheduled. Time resolution is one minute and second
1677 values are rounded up to the next minute.
1678
1679 A time limit of zero requests that no time limit be imposed.
1680 Acceptable time formats include "minutes", "minutes:seconds",
1681 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1682 "days-hours:minutes:seconds".
1683
1684
1685 --time-min=<time>
1686 Set a minimum time limit on the job allocation. If specified,
1687 the job may have its --time limit lowered to a value no lower
1688 than --time-min if doing so permits the job to begin execution
1689 earlier than otherwise possible. The job's time limit will not
1690 be changed after the job is allocated resources. This is per‐
1691 formed by a backfill scheduling algorithm to allocate resources
1692 otherwise reserved for higher priority jobs. Acceptable time
1693 formats include "minutes", "minutes:seconds", "hours:min‐
1694 utes:seconds", "days-hours", "days-hours:minutes" and
1695 "days-hours:minutes:seconds".
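
              For example, a job that would like four hours but can make
              useful progress in as little as one hour might be submitted as
              follows (values and script name are illustrative):

                 $ salloc -N2 --time=4:00:00 --time-min=1:00:00 ./restartable_job.sh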
1696
1697
1698 --tmp=<size>[units]
1699 Specify a minimum amount of temporary disk space per node. De‐
1700 fault units are megabytes. Different units can be specified us‐
1701 ing the suffix [K|M|G|T].
1702
1703
1704 --uid=<user>
1705 Attempt to submit and/or run a job as user instead of the invok‐
1706 ing user id. The invoking user's credentials will be used to
1707 check access permissions for the target partition. This option
              is only valid for user root. For example, user root may use
              this option to run jobs as a normal user in a RootOnly
              partition. If run as root, salloc will drop
1711 its permissions to the uid specified after node allocation is
1712 successful. user may be the user name or numerical user ID.
1713
1714
1715 --usage
1716 Display brief help message and exit.
1717
1718
1719 --use-min-nodes
1720 If a range of node counts is given, prefer the smaller count.
1721
1722
1723 -v, --verbose
1724 Increase the verbosity of salloc's informational messages. Mul‐
1725 tiple -v's will further increase salloc's verbosity. By default
1726 only errors will be displayed.
1727
1728
1729 -V, --version
1730 Display version information and exit.
1731
1732
1733 --wait-all-nodes=<value>
1734 Controls when the execution of the command begins with respect
1735 to when nodes are ready for use (i.e. booted). By default, the
1736 salloc command will return as soon as the allocation is made.
1737 This default can be altered using the salloc_wait_nodes option
1738 to the SchedulerParameters parameter in the slurm.conf file.
1739
1740 0 Begin execution as soon as allocation can be made. Do not
1741 wait for all nodes to be ready for use (i.e. booted).
1742
1743 1 Do not begin execution until all nodes are ready for use.
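
              For example, to delay the start of the command until all eight
              requested nodes have booted (the script name is illustrative):

                 $ salloc -N8 --wait-all-nodes=1 ./needs_all_nodes.sh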
1744
1745
1746 --wckey=<wckey>
1747 Specify wckey to be used with job. If TrackWCKey=no (default)
1748 in the slurm.conf this value is ignored.
1749
1750
1751 --x11[={all|first|last}]
1752 Sets up X11 forwarding on "all", "first" or "last" node(s) of
1753 the allocation. This option is only enabled if Slurm was com‐
1754 piled with X11 support and PrologFlags=x11 is defined in the
1755 slurm.conf. Default is "all".
1756
1757
PERFORMANCE
       Executing salloc sends a remote procedure call to slurmctld. If enough
1760 calls from salloc or other Slurm client commands that send remote pro‐
1761 cedure calls to the slurmctld daemon come in at once, it can result in
1762 a degradation of performance of the slurmctld daemon, possibly result‐
1763 ing in a denial of service.
1764
1765 Do not run salloc or other Slurm client commands that send remote pro‐
1766 cedure calls to slurmctld from loops in shell scripts or other pro‐
1767 grams. Ensure that programs limit calls to salloc to the minimum neces‐
1768 sary for the information you are trying to gather.
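
       For instance, a loop such as the following sketch should be avoided,
       since it generates one allocation request per iteration; request a
       single allocation and run the work inside it instead (the script names
       are illustrative):

          # Discouraged: one salloc RPC per input file
          $ for f in inputs/*; do salloc -N1 ./process_one.sh "$f"; done

          # Preferred: a single allocation processing all inputs
          $ salloc -N1 ./process_all.sh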
1769
1770
INPUT ENVIRONMENT VARIABLES
       Upon startup, salloc will read and handle the options set in the fol‐
1773 lowing environment variables. The majority of these variables are set
1774 the same way the options are set, as defined above. For flag options
1775 that are defined to expect no argument, the option can be enabled by
1776 setting the environment variable without a value (empty or NULL
1777 string), the string 'yes', or a non-zero number. Any other value for
1778 the environment variable will result in the option not being set.
       There are a couple of exceptions to these rules that are noted below.
1780 NOTE: Command line options always override environment variables set‐
1781 tings.
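
       For example, the following enables two options through the environment
       before invoking salloc; the partition name is hypothetical:

          $ export SALLOC_PARTITION=debug      # same as -p debug
          $ export SALLOC_EXCLUSIVE=yes        # flag option enabled with "yes"
          $ salloc -N1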
1782
1783
1784 SALLOC_ACCOUNT Same as -A, --account
1785
1786 SALLOC_ACCTG_FREQ Same as --acctg-freq
1787
1788 SALLOC_BELL Same as --bell
1789
1790 SALLOC_BURST_BUFFER Same as --bb
1791
1792 SALLOC_CLUSTERS or SLURM_CLUSTERS
1793 Same as --clusters
1794
1795 SALLOC_CONSTRAINT Same as -C, --constraint
1796
1797 SALLOC_CONTAINER Same as --container.
1798
1799 SALLOC_CORE_SPEC Same as --core-spec
1800
1801 SALLOC_CPUS_PER_GPU Same as --cpus-per-gpu
1802
1803 SALLOC_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
1804 disable or enable the option.
1805
1806 SALLOC_DELAY_BOOT Same as --delay-boot
1807
1808 SALLOC_EXCLUSIVE Same as --exclusive
1809
1810 SALLOC_GPU_BIND Same as --gpu-bind
1811
1812 SALLOC_GPU_FREQ Same as --gpu-freq
1813
1814 SALLOC_GPUS Same as -G, --gpus
1815
1816 SALLOC_GPUS_PER_NODE Same as --gpus-per-node
1817
1818 SALLOC_GPUS_PER_TASK Same as --gpus-per-task
1819
1820 SALLOC_GRES Same as --gres
1821
1822 SALLOC_GRES_FLAGS Same as --gres-flags
1823
1824 SALLOC_HINT or SLURM_HINT
1825 Same as --hint
1826
1827 SALLOC_IMMEDIATE Same as -I, --immediate
1828
1829 SALLOC_KILL_CMD Same as -K, --kill-command
1830
1831 SALLOC_MEM_BIND Same as --mem-bind
1832
1833 SALLOC_MEM_PER_CPU Same as --mem-per-cpu
1834
1835 SALLOC_MEM_PER_GPU Same as --mem-per-gpu
1836
1837 SALLOC_MEM_PER_NODE Same as --mem
1838
1839 SALLOC_NETWORK Same as --network
1840
1841 SALLOC_NO_BELL Same as --no-bell
1842
1843 SALLOC_NO_KILL Same as -k, --no-kill
1844
1845 SALLOC_OVERCOMMIT Same as -O, --overcommit
1846
1847 SALLOC_PARTITION Same as -p, --partition
1848
1849 SALLOC_POWER Same as --power
1850
1851 SALLOC_PROFILE Same as --profile
1852
1853 SALLOC_QOS Same as --qos
1854
1855 SALLOC_REQ_SWITCH When a tree topology is used, this defines the
1856 maximum count of switches desired for the job al‐
1857 location and optionally the maximum time to wait
1858 for that number of switches. See --switches.
1859
1860 SALLOC_RESERVATION Same as --reservation
1861
1862 SALLOC_SIGNAL Same as --signal
1863
1864 SALLOC_SPREAD_JOB Same as --spread-job
1865
1866 SALLOC_THREAD_SPEC Same as --thread-spec
1867
1868 SALLOC_THREADS_PER_CORE
1869 Same as --threads-per-core
1870
1871 SALLOC_TIMELIMIT Same as -t, --time
1872
1873 SALLOC_USE_MIN_NODES Same as --use-min-nodes
1874
1875 SALLOC_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
1876 to disable or enable the option.
1877
1878 SALLOC_WAIT4SWITCH Max time waiting for requested switches. See
1879 --switches
1880
1881 SALLOC_WCKEY Same as --wckey
1882
1883 SLURM_CONF The location of the Slurm configuration file.
1884
1885 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
1886 error occurs (e.g. invalid options). This can be
1887 used by a script to distinguish application exit
1888 codes from various Slurm error conditions. Also
1889 see SLURM_EXIT_IMMEDIATE.
1890
1891 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
1892 mediate option is used and resources are not cur‐
1893 rently available. This can be used by a script
1894 to distinguish application exit codes from vari‐
1895 ous Slurm error conditions. Also see
1896 SLURM_EXIT_ERROR.
1897
1898
OUTPUT ENVIRONMENT VARIABLES
       salloc will set the following environment variables in the environment
1901 of the executed program:
1902
1903 SLURM_*_HET_GROUP_#
1904 For a heterogeneous job allocation, the environment variables
1905 are set separately for each component.
1906
1907 SLURM_CLUSTER_NAME
1908 Name of the cluster on which the job is executing.
1909
1910 SLURM_CONTAINER
1911 OCI Bundle for job. Only set if --container is specified.
1912
1913 SLURM_CPUS_PER_GPU
1914 Number of CPUs requested per allocated GPU. Only set if the
1915 --cpus-per-gpu option is specified.
1916
1917 SLURM_CPUS_PER_TASK
1918 Number of CPUs requested per task. Only set if the
1919 --cpus-per-task option is specified.
1920
1921 SLURM_DIST_PLANESIZE
1922 Plane distribution size. Only set for plane distributions. See
1923 -m, --distribution.
1924
1925 SLURM_DISTRIBUTION
1926 Only set if the -m, --distribution option is specified.
1927
1928 SLURM_GPU_BIND
1929 Requested binding of tasks to GPU. Only set if the --gpu-bind
1930 option is specified.
1931
1932 SLURM_GPU_FREQ
1933 Requested GPU frequency. Only set if the --gpu-freq option is
1934 specified.
1935
1936 SLURM_GPUS
1937 Number of GPUs requested. Only set if the -G, --gpus option is
1938 specified.
1939
1940 SLURM_GPUS_PER_NODE
1941 Requested GPU count per allocated node. Only set if the
1942 --gpus-per-node option is specified.
1943
1944 SLURM_GPUS_PER_SOCKET
1945 Requested GPU count per allocated socket. Only set if the
1946 --gpus-per-socket option is specified.
1947
1948 SLURM_GPUS_PER_TASK
1949 Requested GPU count per allocated task. Only set if the
1950 --gpus-per-task option is specified.
1951
1952 SLURM_HET_SIZE
1953 Set to count of components in heterogeneous job.
1954
1955 SLURM_JOB_ACCOUNT
1956 Account name associated of the job allocation.
1957
1958 SLURM_JOB_ID
1959 The ID of the job allocation.
1960
1961 SLURM_JOB_CPUS_PER_NODE
1962 Count of CPUs available to the job on the nodes in the alloca‐
1963 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
1964 [(xnumber_of_nodes)] ...]. For example:
1965 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
1966 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
1967 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
1968 select/linear plugin allocates entire nodes to jobs, so the
1969 value indicates the total count of CPUs on allocated nodes. The
1970 select/cons_res and select/cons_tres plugins allocate individual
1971 CPUs to jobs, so this number indicates the number of CPUs allo‐
1972 cated to the job.
1973
1974 SLURM_JOB_NODELIST
1975 List of nodes allocated to the job.
1976
1977 SLURM_JOB_NUM_NODES
1978 Total number of nodes in the job allocation.
1979
1980 SLURM_JOB_PARTITION
1981 Name of the partition in which the job is running.
1982
1983 SLURM_JOB_QOS
1984 Quality Of Service (QOS) of the job allocation.
1985
1986 SLURM_JOB_RESERVATION
1987 Advanced reservation containing the job allocation, if any.
1988
1989 SLURM_JOBID
1990 The ID of the job allocation. See SLURM_JOB_ID. Included for
1991 backwards compatibility.
1992
1993 SLURM_MEM_BIND
1994 Set to value of the --mem-bind option.
1995
1996 SLURM_MEM_BIND_LIST
1997 Set to bit mask used for memory binding.
1998
1999 SLURM_MEM_BIND_PREFER
2000 Set to "prefer" if the --mem-bind option includes the prefer op‐
2001 tion.
2002
2003 SLURM_MEM_BIND_SORT
2004 Sort free cache pages (run zonesort on Intel KNL nodes)
2005
2006 SLURM_MEM_BIND_TYPE
2007 Set to the memory binding type specified with the --mem-bind op‐
2008 tion. Possible values are "none", "rank", "map_map", "mask_mem"
2009 and "local".
2010
2011 SLURM_MEM_BIND_VERBOSE
2012 Set to "verbose" if the --mem-bind option includes the verbose
2013 option. Set to "quiet" otherwise.
2014
2015 SLURM_MEM_PER_CPU
2016 Same as --mem-per-cpu
2017
2018 SLURM_MEM_PER_GPU
2019 Requested memory per allocated GPU. Only set if the
2020 --mem-per-gpu option is specified.
2021
2022 SLURM_MEM_PER_NODE
2023 Same as --mem
2024
2025 SLURM_NNODES
2026 Total number of nodes in the job allocation. See
2027 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
2028
2029 SLURM_NODELIST
2030 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
              cluded for backwards compatibility.
2032
2033 SLURM_NODE_ALIASES
2034 Sets of node name, communication address and hostname for nodes
              allocated to the job from the cloud. Each element in the set is
2036 colon separated and each set is comma separated. For example:
2037 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
2038
2039 SLURM_NTASKS
2040 Same as -n, --ntasks
2041
2042 SLURM_NTASKS_PER_CORE
2043 Set to value of the --ntasks-per-core option, if specified.
2044
2045 SLURM_NTASKS_PER_GPU
2046 Set to value of the --ntasks-per-gpu option, if specified.
2047
2048 SLURM_NTASKS_PER_NODE
2049 Set to value of the --ntasks-per-node option, if specified.
2050
2051 SLURM_NTASKS_PER_SOCKET
2052 Set to value of the --ntasks-per-socket option, if specified.
2053
2054 SLURM_OVERCOMMIT
2055 Set to 1 if --overcommit was specified.
2056
2057 SLURM_PROFILE
2058 Same as --profile
2059
2060 SLURM_SUBMIT_DIR
2061 The directory from which salloc was invoked or, if applicable,
2062 the directory specified by the -D, --chdir option.
2063
2064 SLURM_SUBMIT_HOST
2065 The hostname of the computer from which salloc was invoked.
2066
2067 SLURM_TASKS_PER_NODE
2068 Number of tasks to be initiated on each node. Values are comma
2069 separated and in the same order as SLURM_JOB_NODELIST. If two
2070 or more consecutive nodes are to have the same task count, that
2071 count is followed by "(x#)" where "#" is the repetition count.
2072 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
2073 first three nodes will each execute two tasks and the fourth
2074 node will execute one task.
2075
2076 SLURM_THREADS_PER_CORE
2077 This is only set if --threads-per-core or SAL‐
2078 LOC_THREADS_PER_CORE were specified. The value will be set to
2079 the value specified by --threads-per-core or SAL‐
2080 LOC_THREADS_PER_CORE. This is used by subsequent srun calls
2081 within the job allocation.
2082
2083
SIGNALS
       While salloc is waiting for a PENDING job allocation, most signals will
2086 cause salloc to revoke the allocation request and exit.
2087
2088 However if the allocation has been granted and salloc has already
2089 started the specified command, then salloc will ignore most signals.
2090 salloc will not exit or release the allocation until the command exits.
2091 One notable exception is SIGHUP. A SIGHUP signal will cause salloc to
2092 release the allocation and exit without waiting for the command to fin‐
2093 ish. Another exception is SIGTERM, which will be forwarded to the
2094 spawned process.
2095
2096
EXAMPLES
       To get an allocation, and open a new xterm in which srun commands may
2099 be typed interactively:
2100
2101 $ salloc -N16 xterm
2102 salloc: Granted job allocation 65537
2103 # (at this point the xterm appears, and salloc waits for xterm to exit)
2104 salloc: Relinquishing job allocation 65537
2105
2106
2107 To grab an allocation of nodes and launch a parallel application on one
2108 command line:
2109
2110 $ salloc -N5 srun -n10 myprogram
2111
2112
2113 To create a heterogeneous job with 3 components, each allocating a
2114 unique set of nodes:
2115
2116 $ salloc -w node[2-3] : -w node4 : -w node[5-7] bash
2117 salloc: job 32294 queued and waiting for resources
2118 salloc: job 32294 has been allocated resources
2119 salloc: Granted job allocation 32294
2120
2121
COPYING
       Copyright (C) 2006-2007 The Regents of the University of California.
2124 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2125 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2126 Copyright (C) 2010-2021 SchedMD LLC.
2127
2128 This file is part of Slurm, a resource management program. For de‐
2129 tails, see <https://slurm.schedmd.com/>.
2130
2131 Slurm is free software; you can redistribute it and/or modify it under
2132 the terms of the GNU General Public License as published by the Free
2133 Software Foundation; either version 2 of the License, or (at your op‐
2134 tion) any later version.
2135
2136 Slurm is distributed in the hope that it will be useful, but WITHOUT
2137 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2138 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2139 for more details.
2140
2141
SEE ALSO
       sinfo(1), sattach(1), sbatch(1), squeue(1), scancel(1), scontrol(1),
2144 slurm.conf(5), sched_setaffinity (2), numa (3)
2145
2146
2147
2148November 2021 Slurm Commands salloc(1)